# **Classification Challenge**

`Tópicos Especiais em Computação VIII`

Using Random Forests and Multi Layer Perceptrons to predict hospital readmissions of diabetic patients

*Luiz Henrique Rigo Faccio*

## **Importing Libraries and loading dataset**

In [157]:
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import classification_report, accuracy_score
import joblib


In [39]:
# Loading informations
folder = "diabetes"
diabetes = pd.read_csv(f'{folder}/diabetic_data.csv')
mapping = pd.read_csv(f'{folder}/IDS_mapping.csv')
admission_type_mapping = mapping[0:7]
discharge_disposition_mapping = mapping[10:40].reset_index(drop=True)
admission_source_mapping = mapping[42:67].reset_index(drop=True)

## **Observing the dataset**

In [40]:
print(diabetes.shape)
print(diabetes.info())
print(diabetes.describe(include='all'))

print(diabetes["readmitted"].value_counts())

print()
for c in diabetes.columns:
    print(f"Column {c}", end="\n\t\t")
    print(diabetes[c].unique(), end="\n\n")

(101766, 50)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101766 entries, 0 to 101765
Data columns (total 50 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   encounter_id              101766 non-null  int64 
 1   patient_nbr               101766 non-null  int64 
 2   race                      101766 non-null  object
 3   gender                    101766 non-null  object
 4   age                       101766 non-null  object
 5   weight                    101766 non-null  object
 6   admission_type_id         101766 non-null  int64 
 7   discharge_disposition_id  101766 non-null  int64 
 8   admission_source_id       101766 non-null  int64 
 9   time_in_hospital          101766 non-null  int64 
 10  payer_code                101766 non-null  object
 11  medical_specialty         101766 non-null  object
 12  num_lab_procedures        101766 non-null  int64 
 13  num_procedures            101766 non-null  int

## **Data treatment**

The `pre_process_diabetes_data()` function can be used to process new information so that it can be used in the model

In [41]:
def pre_process_diabetes_data(data):
    """
        Args:
            df (pandas DataFrame): Dataset to be processed

        Returns:
            df (pandas DataFrame): Processed dataset

        This function processes diabetes data from the original form to the model-ready form.
    """    

    # Dropping IDs and unnecessary columns and standadizing the missing values

    df = data.copy()

    df.drop(columns=['encounter_id', 'patient_nbr', 'payer_code'], inplace=True)
    df.replace(["?", 'Unknown/Invalid'], pd.NA, inplace=True)

    # Also drppping columns with too little information (Mostly null values)
    
    df.drop(columns=['weight', 'medical_specialty', 'max_glu_serum', 'A1Cresult'], inplace=True)

    # Joining tables

    df = df.join(admission_type_mapping["description"], how='left', on='admission_type_id').rename(columns={"description": "admission_type"}).drop(columns=['admission_type_id'])
    df = df.join(discharge_disposition_mapping["description"], how='left', on='discharge_disposition_id').rename(columns={"description": "discharge_disposition"}).drop(columns=['discharge_disposition_id'])
    df = df.join(admission_source_mapping["description"], how='left', on='admission_source_id').rename(columns={"description": "admission_source"}).drop(columns=['admission_source_id'])

    # Dropping rows with missing values

    df.dropna(axis=0, how='any', inplace=True)
    df.reset_index(drop=True, inplace=True)

    # Treating the target 

    y = df["readmitted"].replace({'NO': 0, '>30': 1, '<30': 2})
    df.drop(columns=['readmitted'], inplace=True)

    # Scaling numerical variables

    numerical_cols =df.select_dtypes(include=['int64', 'float64']).columns
    scaler = StandardScaler()
    df[numerical_cols] = scaler.fit_transform(df[numerical_cols])

    # Getting dummies for categorical variables 

    categorical_columns = df.select_dtypes(include=['object']).columns

    temp = pd.DataFrame()
    for column in categorical_columns:
        dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
        temp = pd.concat([temp, dummies], axis=1)
        df.drop(columns=[column], inplace=True)

    # Removing columns with low variance
    
    selector = VarianceThreshold(threshold=0.01)    ## Remove collumns with variance lower than 0.01, that is, 
                                                    ## those collumns have the same value for 99.9% of the rows
                                                    
    temp = pd.DataFrame(selector.fit_transform(temp), columns=temp.columns[selector.get_support()])

    final = pd.concat([df, temp, y], axis=1)
    return final

In [42]:
# Processing the dataset

diabetes_processed = pre_process_diabetes_data(diabetes)
diabetes_processed.info(verbose=True, memory_usage=True)
diabetes_processed.head(7)

  y = df["readmitted"].replace({'NO': 0, '>30': 1, '<30': 2})


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93054 entries, 0 to 93053
Data columns (total 130 columns):
 #    Column                                                                             Dtype  
---   ------                                                                             -----  
 0    time_in_hospital                                                                   float64
 1    num_lab_procedures                                                                 float64
 2    num_procedures                                                                     float64
 3    num_medications                                                                    float64
 4    number_outpatient                                                                  float64
 5    number_emergency                                                                   float64
 6    number_inpatient                                                                   float64
 7    number_diag

Unnamed: 0,time_in_hospital,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,number_diagnoses,race_AfricanAmerican,race_Caucasian,race_Hispanic,race_Other,gender_Female,gender_Male,age_[20-30),age_[30-40),age_[40-50),age_[50-60),age_[60-70),age_[70-80),age_[80-90),age_[90-100),diag_1_250.6,diag_1_250.8,diag_1_276,diag_1_38,diag_1_410,diag_1_414,diag_1_427,diag_1_428,diag_1_434,diag_1_435,diag_1_486,diag_1_491,diag_1_493,diag_1_518,diag_1_577,diag_1_584,diag_1_599,diag_1_682,diag_1_715,diag_1_780,diag_1_786,diag_1_820,diag_1_996,diag_1_V57,diag_2_250,diag_2_250.01,diag_2_250.02,diag_2_276,diag_2_285,diag_2_401,diag_2_403,diag_2_411,diag_2_414,diag_2_424,diag_2_425,diag_2_427,diag_2_428,diag_2_486,diag_2_491,diag_2_496,diag_2_518,diag_2_584,diag_2_585,diag_2_599,diag_2_682,diag_2_707,diag_2_780,diag_3_250,diag_3_250.02,diag_3_250.6,diag_3_272,diag_3_276,diag_3_285,diag_3_401,diag_3_403,diag_3_414,diag_3_424,diag_3_425,diag_3_427,diag_3_428,diag_3_496,diag_3_585,diag_3_599,diag_3_707,diag_3_780,diag_3_V45,metformin_No,metformin_Steady,metformin_Up,repaglinide_No,repaglinide_Steady,glimepiride_No,glimepiride_Steady,glipizide_No,glipizide_Steady,glyburide_No,glyburide_Steady,pioglitazone_No,pioglitazone_Steady,rosiglitazone_No,rosiglitazone_Steady,insulin_Down,insulin_No,insulin_Steady,insulin_Up,change_Ch,change_No,diabetesMed_No,diabetesMed_Yes,admission_type_Elective,admission_type_Newborn,admission_type_Trauma Center,admission_type_Urgent,discharge_disposition_Discharged/transferred to ICF,discharge_disposition_Discharged/transferred to SNF,discharge_disposition_Discharged/transferred to a long term care hospital.,discharge_disposition_Discharged/transferred to another short term hospital,discharge_disposition_Discharged/transferred to home with home health service,"discharge_disposition_Expired at home. Medicaid only, hospice.",discharge_disposition_Left AMA,discharge_disposition_Still patient or expected to return for outpatient services,admission_source_ Court/Law Enforcement,admission_source_ Emergency Room,admission_source_ Transfer from a Skilled Nursing Facility (SNF),admission_source_Clinic Referral,admission_source_HMO Referral,admission_source_Readmission to Same Home Health Agency,readmitted
0,-0.482483,0.773285,-0.794457,0.230064,-0.279855,-0.209539,-0.511512,0.80858,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1
1,-0.815415,-1.693726,2.102883,-0.381563,1.349633,-0.209539,0.272334,-0.826894,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0
2,-0.815415,0.002344,-0.214989,-0.014586,-0.279855,-0.209539,-0.511512,-0.281736,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0
3,-1.148348,0.362116,-0.794457,-0.99319,-0.279855,-0.209539,-0.511512,-1.372052,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0
4,-0.482483,-0.665805,2.682352,-0.014586,-0.279855,-0.209539,-0.511512,0.80858,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,0,1,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1
5,-0.14955,1.338642,-0.214989,0.597041,-0.279855,-0.209539,-0.511512,-0.281736,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
6,0.183382,1.49283,-0.794457,-0.503888,-0.279855,-0.209539,-0.511512,0.263422,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,0,0,1,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1


## **The model** 

Two different models were used: RandomForest and Multi Layer Perceptron Classifier (RNN)

A Grid Search was executed to find the best combination of parameters for this problem, to each of the two models

### **Finding out the best parameters**

A grid Search performs a series of model fits using many different pre-defined parameters and makes it easy to compare wich are the best

In [121]:
def best_model_Random_Forest(processed_data: pd.DataFrame, target: str, param_grid: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            param_grid (dict): Dictionary containing hyperparameters for Random Forest model

        Returns:
            grid_search, best_model (Grid_Search, RandomForestClassifier): Grid Search results and best Random Forest model after hyperparameter tuning

        This function performs hyperparameter tuning for a Random Forest model using GridSearchCV on the processed diabetes dataset.
    """

    # Splitting Data
    X = processed_data.drop(columns=[target])
    y = processed_data[target]
    
    scoring = 'accuracy'
    
    grid_search = GridSearchCV(RandomForestClassifier(verbose=0), 
                               param_grid, 
                               cv=3, 
                               scoring=scoring, 
                               n_jobs=-1, 
                               verbose=1)
    grid_search.fit(X, y)

    # Visualizing the grid search results
    cv_results_df = pd.DataFrame(grid_search.cv_results_)
    print("Score type: ", scoring)
    display(pd.concat([cv_results_df[f'rank_test_score'], 
                        cv_results_df[f'mean_test_score'], 
                        cv_results_df[f'std_test_score'],
                        cv_results_df[('param_' + str(k) for k in param_grid.keys())]], 
                    axis=1).sort_values(by='rank_test_score'))
    
    return grid_search, grid_search.best_estimator_


In [144]:
def best_model_MLPClassifier(processed_data: pd.DataFrame, target: str, param_grid: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            param_grid (dict): Dictionary containing hyperparameters for MLPClassifier model

        Returns:
            grid_search, best_model (Grid_Search, MLPClassifier): Grid Serach results and best MLPClassifier model after hyperparameter tuning

        This function performs hyperparameter tuning for a MLPCLassifier model using GridSearchCV on the processed diabetes dataset.
    """

    # Splitting Data
    X = processed_data.drop(columns=[target])
    y = processed_data[target]
    
    scoring= ['accuracy','f1_macro']

    grid_search = GridSearchCV(MLPClassifier(max_iter=250, 
                                             verbose=False, 
                                             early_stopping=True, 
                                             n_iter_no_change=30), 
                               param_grid, 
                               cv=3, 
                               scoring=scoring, 
                               n_jobs=-1,
                               refit=scoring[0],
                               verbose=1)
    grid_search.fit(X, y)

    # Visualizing the grid search results
    cv_results_df = pd.DataFrame(grid_search.cv_results_)
    
    if len(scoring) == 1:
        columns = ['rank_test_score', 'mean_test_score', 'std_test_score']
        sort = 'rank_test_score'
    else:
        columns = [f'rank_test_{s}' for s in scoring] + [f'mean_test_{s}' for s in scoring] + [f'std_test_{s}' for s in scoring]
        sort = [f'rank_test_{s}' for s in scoring]
    
    print("Score type: ", scoring)
    display(pd.concat([
         cv_results_df[columns], 
         cv_results_df[[('param_' + str(k)) for k in param_grid.keys()]]], axis=1)
        .sort_values(by=sort))
    
    return grid_search, grid_search.best_estimator_


In [None]:
## Fiding ou the best parameters for the Random Forest model (Abou 42 Min of processing time for 192 fits)

# Defining the parameters to be combined and tested for Random Forest
param_grid_rf = {
    'n_estimators': [200, 500],             
    'max_depth': [20, 50],             
    'min_samples_split': [5, 10],               
    'max_features': ['sqrt', 'log2'],      
    'bootstrap': [True],                  
    'criterion': ['gini', 'entropy'],             
    'class_weight': ['balanced', 'balanced_subsample']  # To handle unbalanced target classes
}

grid_rf, m_rf = best_model_Random_Forest(diabetes_processed, "readmitted", param_grid_rf)

Fitting 3 folds for each of 64 candidates, totalling 192 fits
Score type:  accuracy


Unnamed: 0,rank_test_score,mean_test_score,std_test_score,param_n_estimators,param_max_depth,param_min_samples_split,param_max_features,param_bootstrap,param_criterion,param_class_weight
57,1,0.566983,0.005813,500,50,5,sqrt,True,entropy,balanced_subsample
25,2,0.566456,0.005751,500,50,5,sqrt,True,entropy,balanced
61,3,0.565467,0.004747,500,50,5,log2,True,entropy,balanced_subsample
29,4,0.565209,0.004830,500,50,5,log2,True,entropy,balanced
45,5,0.565102,0.005554,500,50,5,log2,True,gini,balanced_subsample
...,...,...,...,...,...,...,...,...,...,...
38,60,0.527597,0.026985,200,20,10,log2,True,gini,balanced_subsample
6,61,0.526855,0.027977,200,20,10,log2,True,gini,balanced
55,62,0.526673,0.029115,500,20,10,log2,True,entropy,balanced_subsample
22,63,0.525931,0.027957,200,20,10,log2,True,entropy,balanced


In [153]:
# Finding out the best parameters for the MLPClassifier model (Aboutv 7 Min of processing time)

param_grid_mlp = {
    'hidden_layer_sizes': [(128,), (128, 64), (256, 128)],
    'activation': ['relu', 'tanh'],
    'alpha': [1e-3, 1e-2],
    'learning_rate_init': [0.001, 0.005],
    'solver': ['adam', 'sgd']
}

grid_mlp, m_mlp = best_model_MLPClassifier(diabetes_processed, "readmitted", param_grid_mlp)

Fitting 3 folds for each of 48 candidates, totalling 144 fits
Score type:  ['accuracy', 'f1_macro']


Unnamed: 0,rank_test_accuracy,rank_test_f1_macro,mean_test_accuracy,mean_test_f1_macro,std_test_accuracy,std_test_f1_macro,param_hidden_layer_sizes,param_activation,param_alpha,param_learning_rate_init,param_solver
45,1,18,0.571528,0.38258,0.00591,0.019751,"(256, 128)",tanh,0.01,0.001,sgd
14,2,43,0.571002,0.376697,0.007059,0.02131,"(128,)",relu,0.01,0.005,adam
26,3,5,0.570411,0.3894,0.004243,0.018981,"(128,)",tanh,0.001,0.005,adam
4,4,13,0.570067,0.38481,0.006295,0.01504,"(128, 64)",relu,0.001,0.001,adam
24,5,4,0.570035,0.39039,0.004888,0.011951,"(128,)",tanh,0.001,0.001,adam
22,6,1,0.570024,0.391536,0.008124,0.022262,"(256, 128)",relu,0.01,0.005,adam
5,7,39,0.570013,0.378444,0.005719,0.019432,"(128, 64)",relu,0.001,0.001,sgd
25,8,30,0.569658,0.380457,0.007594,0.025704,"(128,)",tanh,0.001,0.001,sgd
21,9,41,0.569411,0.377912,0.005251,0.018311,"(256, 128)",relu,0.01,0.001,sgd
43,10,8,0.569089,0.387051,0.005086,0.016675,"(128, 64)",tanh,0.01,0.005,sgd


In [154]:
def single_model_RNN(processed_data: pd.DataFrame, target: str, params: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            params (dict): Dictionary containing hyperparameters for MLPClassifier model

        Returns:
            model (MLPClassifier): Trained MLP model
            X_test (pandas DataFrame): Test data to be used for evaluation
            y_test (pandas Series): Test data to be used for evaluation

        This function trains a Multi-Layer Perceptron (MLP) model on the processed diabetes dataset, using predefined hyperparameters.
    """    

    # Splitting the data into train and test sets
    X = processed_data.drop(columns=[target])
    y = processed_data[target]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = MLPClassifier(**params, max_iter=250,
                                    early_stopping=True, 
                                    n_iter_no_change=30)

    model.fit(X_train, y_train)

    return model, X_test, y_test


In [155]:
def single_model_Random_Forest(processed_data: pd.DataFrame, target: str, params: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            params (dict): Dictionary containing hyperparameters for Random Forest model

        Returns:
            rf (RandomForestClassifier): Trained Random Forest model
            X_test (pandas DataFrame): Test data to be used for evaluation
            y_test (pandas Series): Test data to be used for evaluation

        This function trains a Random Forest model on the processed diabetes dataset, using predefined hyperparameters.
    """    

    # Splitting the data into train and test sets
    X = processed_data.drop(columns=[target])
    y = processed_data[target]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Creating and training the Random Forest model
    rf = RandomForestClassifier(**params, n_jobs=-1, verbose=0)
    
    rf.fit(X_train, y_train)

    return rf, X_test, y_test


In [None]:
rf, rf_X_test, rf_Y_test = single_model_Random_Forest(diabetes_processed, "readmitted", grid_rf.best_params_)

## rf_best_params: {'bootstrap': True, 'class_weight': 'balanced_subsample', 'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5, 'n_estimators': 500}

mlp, mlp_X_test, mlp_y_test = single_model_RNN(diabetes_processed, "readmitted", grid_mlp.best_params_)

## mlp_best_params: {'activation': 'tanh', 'alpha': 0.01, 'hidden_layer_sizes': (256, 128), 'learning_rate_init': 0.001, 'solver': 'sgd'}

In [None]:
# Save the Random Forest model
joblib.dump(rf, 'random_forest_model.pkl')

# Save the MLP model
joblib.dump(mlp, 'mlp_model.pkl')

## **Evaluating the models**

In [160]:
print("------------------RANDOM FOREST------------------", end="\n\n")
rf_predictions = rf.predict(rf_X_test)

print("Accuracy:", accuracy_score(rf_Y_test, rf_predictions))
print("\nClassification Report:\n", classification_report(rf_Y_test, rf_predictions))

print("Confusion Matrix:")
display(pd.crosstab(rf_Y_test, rf_predictions, rownames=['Actual'], colnames=['Predicted'], margins=True))

print("Feature Importance:")
display(pd.DataFrame(rf.feature_importances_, index=rf_X_test.columns, columns=['Importance']).sort_values(by='Importance', ascending=False).head(10))

print("---------------MULTI LAYER PERCEPTRON---------------", end="\n\n")
mlp_predictions = mlp.predict(mlp_X_test)

print("Accuracy:", accuracy_score(mlp_y_test, mlp_predictions))
print("\nClassification Report:\n", classification_report(mlp_y_test, mlp_predictions))

print("Confusion Matrix:")
display(pd.crosstab(mlp_y_test, mlp_predictions, rownames=['Actual'], colnames=['Predicted'], margins=True))


------------------RANDOM FOREST------------------

Accuracy: 0.5765944871312665

Classification Report:
               precision    recall  f1-score   support

           0       0.62      0.79      0.69      9838
           1       0.49      0.44      0.46      6572
           2       0.44      0.05      0.10      2201

    accuracy                           0.58     18611
   macro avg       0.52      0.43      0.42     18611
weighted avg       0.55      0.58      0.54     18611

Confusion Matrix:


Predicted,0,1,2,All
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,7736,2032,70,9838
1,3612,2875,85,6572
2,1107,974,120,2201
All,12455,5881,275,18611


Feature Importance:


Unnamed: 0,Importance
num_lab_procedures,0.082952
num_medications,0.073155
time_in_hospital,0.05395
number_inpatient,0.045664
num_procedures,0.038134
number_diagnoses,0.036529
number_outpatient,0.01701
number_emergency,0.014543
gender_Female,0.014187
gender_Male,0.014018


---------------MULTI LAYER PERCEPTRON---------------

Accuracy: 0.5744989522325507

Classification Report:
               precision    recall  f1-score   support

           0       0.61      0.83      0.70      9838
           1       0.49      0.38      0.43      6572
           2       0.37      0.02      0.04      2201

    accuracy                           0.57     18611
   macro avg       0.49      0.41      0.39     18611
weighted avg       0.54      0.57      0.53     18611

Confusion Matrix:


Predicted,0,1,2,All
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,8174,1637,27,9838
1,4048,2470,54,6572
2,1263,890,48,2201
All,13485,4997,129,18611
