# **Classification Challenge**

`Tópicos Especiais em Computação VIII`

Using Random Forests and Multi Layer Perceptrons to predict hospital readmissions of diabetic patients

*Luiz Henrique Rigo Faccio*

## **Importing Libraries and loading dataset**

In [87]:
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import classification_report, accuracy_score
from sklearn.decomposition import PCA


In [2]:
# Loading informations
folder = "diabetes"
diabetes = pd.read_csv(f'{folder}/diabetic_data.csv')
mapping = pd.read_csv(f'{folder}/IDS_mapping.csv')
admission_type_mapping = mapping[0:7]
discharge_disposition_mapping = mapping[10:40].reset_index(drop=True)
admission_source_mapping = mapping[42:67].reset_index(drop=True)

## **Observing the dataset**

In [3]:
print(diabetes.shape)
print(diabetes.info())
print(diabetes.describe(include='all'))

print(diabetes["readmitted"].value_counts())

print()
for c in diabetes.columns:
    print(f"Column {c}", end="\n\t\t")
    print(diabetes[c].unique(), end="\n\n")

(101766, 50)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101766 entries, 0 to 101765
Data columns (total 50 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   encounter_id              101766 non-null  int64 
 1   patient_nbr               101766 non-null  int64 
 2   race                      101766 non-null  object
 3   gender                    101766 non-null  object
 4   age                       101766 non-null  object
 5   weight                    101766 non-null  object
 6   admission_type_id         101766 non-null  int64 
 7   discharge_disposition_id  101766 non-null  int64 
 8   admission_source_id       101766 non-null  int64 
 9   time_in_hospital          101766 non-null  int64 
 10  payer_code                101766 non-null  object
 11  medical_specialty         101766 non-null  object
 12  num_lab_procedures        101766 non-null  int64 
 13  num_procedures            101766 non-null  int

## **Data treatment**

The `pre_process_new_diabetes_data()` function can be used to process new information so that it can be used in the model

In [None]:
def pre_process_diabetes_data(data):
    """
        Args:
            df (pandas DataFrame): Dataset to be processed

        Returns:
            df (pandas DataFrame): Processed dataset
            features (list): List of features used in the model

        This function processes diabetes data from the original form to the training-ready form, to be used to train the models.
    """    

    # Dropping IDs and unnecessary columns and standadizing the missing values

    df = data.copy()

    df.drop(columns=['encounter_id', 'patient_nbr', 'payer_code'], inplace=True)
    df.replace(["?", 'Unknown/Invalid'], pd.NA, inplace=True)

    # Also drppping columns with too little information (Mostly null values)
    
    df.drop(columns=['weight', 'medical_specialty', 'max_glu_serum', 'A1Cresult'], inplace=True)

    # Joining tables

    df = df.join(admission_type_mapping["description"], how='left', on='admission_type_id').rename(columns={"description": "admission_type"}).drop(columns=['admission_type_id'])
    df = df.join(discharge_disposition_mapping["description"], how='left', on='discharge_disposition_id').rename(columns={"description": "discharge_disposition"}).drop(columns=['discharge_disposition_id'])
    df = df.join(admission_source_mapping["description"], how='left', on='admission_source_id').rename(columns={"description": "admission_source"}).drop(columns=['admission_source_id'])

    # Dropping rows with missing values

    df.dropna(axis=0, how='any', inplace=True)
    df.reset_index(drop=True, inplace=True)

    # Treating the target 

    y = df["readmitted"].replace({'NO': 0, '>30': 1, '<30': 2})
    df.drop(columns=['readmitted'], inplace=True)

    # Scaling numerical variables

    numerical_cols =df.select_dtypes(include=['int64', 'float64']).columns
    scaler = StandardScaler()
    df[numerical_cols] = scaler.fit_transform(df[numerical_cols])

    # Getting dummies for categorical variables 

    categorical_columns = df.select_dtypes(include=['object']).columns

    temp = pd.DataFrame()
    for column in categorical_columns:
        dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
        temp = pd.concat([temp, dummies], axis=1)
        df.drop(columns=[column], inplace=True)


    # Removing columns with low variance
    
    selector = VarianceThreshold(threshold=0.01)    ## Remove collumns with variance lower than 0.01, that is, 
                                                    ## those collumns have the same value for 99.9% of the rows
                                                    
    temp = pd.DataFrame(selector.fit_transform(temp), columns=temp.columns[selector.get_support()])

    final = pd.concat([df, temp, y], axis=1)
    return final, final.columns.tolist()

In [49]:
def pre_process_new_diabetes_data(data, features):
    """
        Args:
            df (pandas DataFrame): Dataset to be processed
            features (list): List of features used in the model

        Returns:
            df (pandas DataFrame): Processed dataset

        This function processes diabetes data from the original form to the model-ready form.
    """    

    # Dropping IDs and unnecessary columns and standadizing the missing values

    df = data.copy()

    df.drop(columns=['encounter_id', 'patient_nbr', 'payer_code'], inplace=True)
    df.replace(["?", 'Unknown/Invalid'], pd.NA, inplace=True)

    # Also drppping columns with too little information (Mostly null values)
    
    df.drop(columns=['weight', 'medical_specialty', 'max_glu_serum', 'A1Cresult'], inplace=True)

    # Joining tables

    df = df.join(admission_type_mapping["description"], how='left', on='admission_type_id').rename(columns={"description": "admission_type"}).drop(columns=['admission_type_id'])
    df = df.join(discharge_disposition_mapping["description"], how='left', on='discharge_disposition_id').rename(columns={"description": "discharge_disposition"}).drop(columns=['discharge_disposition_id'])
    df = df.join(admission_source_mapping["description"], how='left', on='admission_source_id').rename(columns={"description": "admission_source"}).drop(columns=['admission_source_id'])

    # Dropping rows with missing values

    df.dropna(axis=0, how='any', inplace=True)
    df.reset_index(drop=True, inplace=True)

    # Treating the target 

    y = df["readmitted"].replace({'NO': 0, '>30': 1, '<30': 2})
    df.drop(columns=['readmitted'], inplace=True)

    # Scaling numerical variables

    numerical_cols =df.select_dtypes(include=['int64', 'float64']).columns
    scaler = StandardScaler()
    df[numerical_cols] = scaler.fit_transform(df[numerical_cols])

    # Getting dummies for categorical variables 

    categorical_columns = df.select_dtypes(include=['object']).columns

    temp = pd.DataFrame()
    for column in categorical_columns:
        dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
        temp = pd.concat([temp, dummies], axis=1)
        df.drop(columns=[column], inplace=True)
    
    

    final = pd.concat([df, temp, y], axis=1)

    current_cols = final.columns.tolist()
    missing_cols = set(features) - (set(current_cols))

    for col in missing_cols:
        final[col] = 0

    return final[features]

In [98]:
# Processing the dataset

diabetes_processed, features = pre_process_diabetes_data(diabetes)
diabetes_processed.info(verbose=True, memory_usage=True)
diabetes_processed.head(7)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93054 entries, 0 to 93053
Data columns (total 358 columns):
 #    Column              Dtype  
---   ------              -----  
 0    time_in_hospital    float64
 1    num_lab_procedures  float64
 2    num_procedures      float64
 3    num_medications     float64
 4    number_outpatient   float64
 5    number_emergency    float64
 6    number_inpatient    float64
 7    number_diagnoses    float64
 8    pca_1               float64
 9    pca_2               float64
 10   pca_3               float64
 11   pca_4               float64
 12   pca_5               float64
 13   pca_6               float64
 14   pca_7               float64
 15   pca_8               float64
 16   pca_9               float64
 17   pca_10              float64
 18   pca_11              float64
 19   pca_12              float64
 20   pca_13              float64
 21   pca_14              float64
 22   pca_15              float64
 23   pca_16              float64
 24   

Unnamed: 0,time_in_hospital,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,number_diagnoses,pca_1,pca_2,pca_3,pca_4,pca_5,pca_6,pca_7,pca_8,pca_9,pca_10,pca_11,pca_12,pca_13,pca_14,pca_15,pca_16,pca_17,pca_18,pca_19,pca_20,pca_21,pca_22,pca_23,pca_24,pca_25,pca_26,pca_27,pca_28,pca_29,pca_30,pca_31,pca_32,pca_33,pca_34,pca_35,pca_36,pca_37,pca_38,pca_39,pca_40,pca_41,pca_42,pca_43,pca_44,pca_45,pca_46,pca_47,pca_48,pca_49,pca_50,pca_51,pca_52,pca_53,pca_54,pca_55,pca_56,pca_57,pca_58,pca_59,pca_60,pca_61,pca_62,pca_63,pca_64,pca_65,pca_66,pca_67,pca_68,pca_69,pca_70,pca_71,pca_72,pca_73,pca_74,pca_75,pca_76,pca_77,pca_78,pca_79,pca_80,pca_81,pca_82,pca_83,pca_84,pca_85,pca_86,pca_87,pca_88,pca_89,pca_90,pca_91,pca_92,pca_93,pca_94,pca_95,pca_96,pca_97,pca_98,pca_99,pca_100,pca_101,pca_102,pca_103,pca_104,pca_105,pca_106,pca_107,pca_108,pca_109,pca_110,pca_111,pca_112,pca_113,pca_114,pca_115,pca_116,pca_117,pca_118,pca_119,pca_120,pca_121,pca_122,pca_123,pca_124,pca_125,pca_126,pca_127,pca_128,pca_129,pca_130,pca_131,pca_132,pca_133,pca_134,pca_135,pca_136,pca_137,pca_138,pca_139,pca_140,pca_141,pca_142,pca_143,pca_144,pca_145,pca_146,pca_147,pca_148,pca_149,pca_150,pca_151,pca_152,pca_153,pca_154,pca_155,pca_156,pca_157,pca_158,pca_159,pca_160,pca_161,pca_162,pca_163,pca_164,pca_165,pca_166,pca_167,pca_168,pca_169,pca_170,pca_171,pca_172,pca_173,pca_174,pca_175,pca_176,pca_177,pca_178,pca_179,pca_180,pca_181,pca_182,pca_183,pca_184,pca_185,pca_186,pca_187,pca_188,pca_189,pca_190,pca_191,pca_192,pca_193,pca_194,pca_195,pca_196,pca_197,pca_198,pca_199,pca_200,pca_201,pca_202,pca_203,pca_204,pca_205,pca_206,pca_207,pca_208,pca_209,pca_210,pca_211,pca_212,pca_213,pca_214,pca_215,pca_216,pca_217,pca_218,pca_219,pca_220,pca_221,pca_222,pca_223,pca_224,pca_225,pca_226,pca_227,pca_228,pca_229,pca_230,pca_231,pca_232,pca_233,pca_234,pca_235,pca_236,pca_237,pca_238,pca_239,pca_240,pca_241,pca_242,pca_243,pca_244,pca_245,pca_246,pca_247,pca_248,pca_249,pca_250,pca_251,pca_252,pca_253,pca_254,pca_255,pca_256,pca_257,pca_258,pca_259,pca_260,pca_261,pca_262,pca_263,pca_264,pca_265,pca_266,pca_267,pca_268,pca_269,pca_270,pca_271,pca_272,pca_273,pca_274,pca_275,pca_276,pca_277,pca_278,pca_279,pca_280,pca_281,pca_282,pca_283,pca_284,pca_285,pca_286,pca_287,pca_288,pca_289,pca_290,pca_291,pca_292,pca_293,pca_294,pca_295,pca_296,pca_297,pca_298,pca_299,pca_300,pca_301,pca_302,pca_303,pca_304,pca_305,pca_306,pca_307,pca_308,pca_309,pca_310,pca_311,pca_312,pca_313,pca_314,pca_315,pca_316,pca_317,pca_318,pca_319,pca_320,pca_321,pca_322,pca_323,pca_324,pca_325,pca_326,pca_327,pca_328,pca_329,pca_330,pca_331,pca_332,pca_333,pca_334,pca_335,pca_336,pca_337,pca_338,pca_339,pca_340,pca_341,pca_342,pca_343,pca_344,pca_345,pca_346,pca_347,pca_348,pca_349,readmitted
0,-0.482483,0.773285,-0.794457,0.230064,-0.279855,-0.209539,-0.511512,0.80858,0.769268,-0.757539,0.497789,0.057625,0.040615,-0.057489,-1.000206,0.101826,-0.189102,-0.158361,-0.0989,-0.012402,-0.064456,-0.047946,-0.239136,0.055188,-0.050543,-0.726064,0.348675,-0.090542,0.011158,-0.162231,-0.058811,0.098186,0.012489,0.009834,0.03922,-0.051286,-0.168501,-0.032248,-0.018271,-0.094385,0.172524,-0.087761,-0.114393,-0.042958,0.025713,0.049369,-0.067771,-0.034557,0.007795,0.025082,-0.036497,0.02769,-0.002985,-0.019629,0.024106,0.14047,-0.026791,-0.13926,-0.040722,0.034904,-0.072724,-0.010634,-0.039741,-0.229648,-0.067005,0.002648,-0.012962,0.083386,-0.477854,-0.244892,0.369777,-0.218192,0.332493,0.34409,0.091328,0.221074,-0.270087,0.245735,-0.019882,-0.011109,0.068602,0.151665,-0.050074,0.022724,0.107045,-0.357192,-0.114613,0.05473,-0.302441,0.470706,-0.373551,0.028134,-0.297156,0.270441,-0.161224,-0.116116,-0.10277,-0.106794,-0.142817,-0.086037,-0.146141,-0.064041,-0.158779,0.073206,0.024134,-0.085089,-0.021692,0.031956,-0.056387,0.024949,0.003167,-0.021594,-0.00372,0.000945,0.010032,-0.027723,-0.018157,0.006389,-0.02453,0.021955,0.030553,-0.004303,0.035365,-0.038856,-0.003469,0.009976,-0.036188,-0.002787,0.037424,-0.049962,-0.040601,-0.065054,-0.008128,-0.054064,-0.064082,-0.141555,0.114002,0.292942,0.327055,-0.017831,0.000795,-0.041931,0.007276,-0.013396,0.065788,0.011273,-0.032904,0.021356,0.120184,-0.028126,0.014616,-0.056949,-0.058198,0.259435,0.214345,0.30498,0.082414,0.042851,-0.044458,0.039697,0.088105,-0.028461,0.046866,0.055489,-0.105855,0.025227,0.125644,-0.071496,0.038741,-0.193511,0.14635,-0.0357,0.087382,0.064646,0.014819,0.002066,-0.00149,0.091068,0.027802,-0.104387,0.41228,0.068517,-0.307485,-0.048889,0.056366,-0.06929,-0.007851,-0.00873,-0.072484,0.099453,-0.031893,-0.000332,0.027833,-0.003106,-0.030301,-0.03994,0.040529,-0.034452,-0.004019,0.020858,0.040099,0.052134,-0.02878,0.029588,0.054498,0.010706,-0.008559,-0.0573,-0.036556,0.000379,0.030626,0.040283,-0.049338,-0.008617,0.057574,-0.066075,0.018769,0.069849,0.092801,0.086639,-0.000704,0.101284,-0.060603,-0.033036,0.027685,0.040962,-0.026758,-0.018697,0.034242,0.026047,-0.008102,0.018213,-0.011717,0.028431,-0.017243,-0.012109,0.005354,0.00442,0.003039,-0.010267,0.007636,-0.012907,-0.008159,0.01781,0.016285,-0.001552,0.00329,0.003709,0.015653,0.016779,0.005237,-0.001248,0.008404,-0.001451,0.012345,-0.010937,0.01335,0.01661,-0.01233,0.020519,0.015252,0.042501,0.058889,0.035077,0.03053,0.016047,-0.033829,0.080855,0.015949,0.057139,0.004455,0.004369,-0.022177,0.017299,-0.009542,-0.004068,-0.011281,0.006249,-0.006721,0.000757,-0.002792,0.00528,0.001129,-0.002253,-0.004753,-0.004028,0.008422,0.003157,0.008386,0.007643,-0.007519,-0.001825,0.023102,-0.009503,0.01176,0.005733,-0.004234,0.017791,-0.00541,0.040532,0.008134,0.008451,-0.011306,-0.008632,0.02027,-0.005924,-0.001742,-0.012882,0.005103,0.008764,-0.010907,-0.007181,0.007305,0.009341,0.001205,-0.01079,-0.000344,0.010101,0.005051,-0.018986,0.012386,0.002769,0.002659,0.007951,-0.009835,0.005995,0.001101,-0.005453,-0.002918,0.008361,-0.009029,-0.004254,0.010638,-0.001211,-0.008722,0.001787,-0.002136,0.006302,-0.012438,-0.007012,0.006838,0.011375,-0.009,-0.003657,0.00587,0.000428,-0.009237,0.013506,-0.002857,-0.00881,0.005696,0.014069,0.016236,0.000408,-0.001267,-0.004061,-0.017196,1
1,-0.815415,-1.693726,2.102883,-0.381563,1.349633,-0.209539,0.272334,-0.826894,-0.508048,-0.808922,0.563323,-0.453813,0.962157,0.067774,0.312559,0.225287,1.399037,-0.111642,-0.006306,0.534871,0.00423,-0.000522,-0.412749,-0.014541,-0.180019,-0.210726,-0.244959,0.04592,0.131648,0.127134,0.080286,0.121548,-0.109797,-0.140182,0.017109,-0.1353,-0.168722,-0.052896,0.105627,-0.330465,0.398293,0.396192,0.439768,-0.01662,-0.262725,0.119742,0.072955,-0.142582,-0.026678,-0.153368,-0.024124,0.056448,-0.086502,0.071477,0.079319,0.203549,-0.013198,-0.097649,-0.077872,-0.049256,0.00183,0.046215,-0.037705,-0.091071,-0.075857,0.04057,-0.107032,-0.16885,-0.323718,0.158059,-0.085769,0.201853,-0.050814,0.008177,-0.253309,0.575886,-0.183233,0.105497,-0.282084,0.48587,0.117541,-0.056556,0.046143,0.137243,0.066444,0.014883,-0.032504,0.077239,0.08308,-0.074217,0.135914,-0.022109,0.06105,0.077436,0.109263,0.042631,-0.034729,0.077985,-0.056803,0.008676,-0.070508,-0.045787,0.010145,-0.025399,0.054641,-0.020395,0.048904,0.043548,0.012184,0.024134,-0.0394,0.012084,-0.046217,0.022908,0.064482,0.033795,0.039441,0.001364,-0.057087,0.040428,0.040583,0.064572,-0.009704,0.026381,0.033459,0.037737,-0.008898,0.042907,0.001833,0.047565,0.003063,0.035885,0.003833,-0.004346,0.032474,0.041178,-0.050871,-0.165747,-0.132429,-0.024747,0.002055,0.001916,-0.029524,0.010703,-0.07045,-0.009175,-0.001089,-0.042829,-0.004375,0.001967,0.018285,-0.027121,-0.0218,-0.04104,-0.042627,-0.056057,0.002289,-0.008385,-0.014175,-0.006504,-0.015519,-0.026064,-0.028051,-0.0587,0.031282,-0.023399,-0.034755,-0.009034,-0.002342,-0.011758,-0.004853,-0.027331,-0.003893,0.010984,-0.007944,-0.007416,-0.002221,0.037088,-0.007814,-0.028267,0.136056,0.012303,-0.099776,-0.024873,0.008345,-0.023195,-0.011632,-0.00389,-0.019521,0.037484,-0.011764,0.011613,0.022803,0.001685,-0.020782,-0.012919,-0.024596,0.012642,0.000898,0.005005,0.044308,0.024718,-0.019903,0.02357,0.009234,-0.015419,-0.000906,-0.02438,0.007555,0.006228,0.029933,0.015999,-0.029621,-0.043926,0.009824,-0.092212,0.046017,0.018291,0.063237,-0.019109,-0.066858,0.053098,-0.01215,-0.012497,-0.027603,0.088333,-0.011101,-0.006068,0.023051,0.061371,-0.039483,0.037076,-0.03038,0.037868,0.051338,-0.019927,0.022836,-0.06609,0.102281,0.047344,-0.03441,0.014854,-0.103393,-0.033654,0.029238,0.066881,-0.231396,0.039595,-0.07951,0.022246,0.280246,0.117102,-0.070264,-0.031227,-0.120919,0.066415,-0.098066,-0.055624,0.049454,-0.121177,-0.098258,-0.222477,-0.193264,-0.1644,-0.174235,-0.140342,0.171232,-0.346247,-0.02809,-0.188598,0.025371,-0.063562,0.069351,-0.082341,-0.010323,0.015539,0.043064,-0.025335,-0.027236,-0.030817,0.027122,-0.012997,-0.008987,-0.01087,0.021637,0.027102,0.01665,-0.065567,-0.001453,0.026212,-0.026395,0.030939,-0.035946,0.033723,0.108921,-0.001858,-0.054677,0.078943,0.052496,-0.02117,-0.000654,0.168856,0.133763,0.146239,0.169765,-0.035341,0.000784,0.018983,0.02389,-0.011051,0.005307,-0.001035,0.020453,-0.002789,-0.001333,0.010567,0.001662,0.004272,0.015176,0.031598,0.003634,-0.031684,0.005368,0.012157,0.000324,-0.008993,-0.011423,0.006147,0.023633,-0.011936,0.002938,-0.004854,-0.01536,0.015716,0.009187,-0.006324,0.004262,-0.000614,-0.010212,-0.014629,-0.008436,-0.003432,0.001418,-0.012649,-0.00081,-0.000179,0.004515,-0.012054,0.009302,0.004524,-0.007692,-0.000734,-0.001652,-0.008192,0.014856,0.001451,0.009426,0
2,-0.815415,0.002344,-0.214989,-0.014586,-0.279855,-0.209539,-0.511512,-0.281736,0.811809,-0.614768,-0.878979,0.000422,-0.019705,0.177425,-0.860285,0.102217,-0.199747,-0.116148,-0.147359,-0.030799,-0.167097,-0.017664,-0.257815,0.065459,-0.063654,-0.721232,0.357744,-0.092843,0.010378,-0.193652,-0.061395,0.088067,-0.053451,-0.078944,0.003651,-0.019057,-0.159559,-0.049132,0.07091,-0.242579,0.834233,-0.173482,-0.231678,0.131273,0.089423,-0.017911,-0.132711,0.001498,-0.005173,-0.02127,-0.057897,-0.006105,-0.270463,-0.099255,-0.149272,-0.16046,0.011016,0.160231,0.681123,0.291378,-0.028678,-0.146479,0.067024,0.144185,0.067303,-0.02191,-0.035502,0.174909,0.08017,-0.135998,0.060232,-0.032361,-0.159811,0.028976,-0.314241,-0.128772,-0.042093,0.132816,-0.070845,-0.079758,-0.037607,-0.08232,-0.008064,0.084468,0.041125,-0.072321,0.011793,0.005632,-0.057232,-0.054235,0.026272,0.129699,-0.010836,-0.055033,-0.037866,-0.02033,0.017942,-0.064429,0.03219,-0.024631,0.071647,-0.030007,0.019206,-0.062021,-0.041284,0.036125,-0.048782,0.005296,0.043281,0.031264,0.002199,0.045977,-0.014995,-0.016298,0.03985,0.00222,0.007727,0.001614,0.029861,-0.018368,0.019634,0.024928,0.02068,0.07461,0.034904,-0.009126,-0.002621,-0.027907,0.00486,-0.046969,0.00554,0.044534,0.016766,-0.006689,-0.013056,0.068944,-0.002945,-0.0475,-0.087471,-0.042521,-0.014715,-0.003132,-0.07327,0.000876,-0.006002,-0.115748,0.024628,-0.029924,-0.06385,0.012942,-0.024513,0.018493,0.027535,-0.013473,0.005262,-0.13626,-0.062206,-0.02293,-0.165172,-0.111296,-0.076249,-0.139095,-0.069032,-0.078949,0.160195,-0.372854,0.633753,-0.20867,-0.282141,0.105463,0.255765,0.144892,0.142901,0.145477,0.022702,-0.068576,-0.026463,-0.013508,0.039609,-0.019622,-0.006159,0.02113,0.0314,0.0289,0.023921,-0.010165,-0.036977,0.038266,-0.010167,0.0173,0.011588,-0.024381,0.006251,-0.029765,0.025306,-0.010746,0.028329,-0.003227,-0.006744,-0.000992,-0.009218,-0.010467,0.014432,-0.016977,0.012292,0.024325,-0.016515,0.019181,-0.027952,0.004683,-0.005509,-0.015527,-0.004401,0.003125,0.010223,-0.049265,0.013236,0.036021,0.012782,-0.004302,0.003834,0.012224,0.003991,-0.008615,-0.001813,0.002334,-0.000664,0.014926,-0.020612,0.013276,0.001923,-0.011439,0.000528,-0.001609,-0.012974,0.001435,-0.024128,-0.015435,-0.018729,0.004728,-0.015002,-0.010353,-0.001875,-0.014855,0.000115,0.01232,0.003365,0.005674,0.004884,0.001545,-0.008942,-0.007845,-0.009772,0.004678,0.006789,-0.002935,0.013978,0.004936,0.014467,0.019754,0.022547,0.004373,0.003452,-0.008373,0.013576,0.011747,-0.021631,0.017817,0.002137,0.013179,-0.018942,0.002031,0.004654,-0.00523,0.001736,-0.000265,-0.007289,0.002848,-0.013261,0.002093,-0.00453,0.002486,-0.011965,-0.018888,-0.004223,-0.010555,-0.013489,0.020854,0.01628,-0.006414,0.012646,-0.011594,0.00749,-0.002262,-0.022466,-0.001736,-0.001623,0.002366,-0.002328,0.008444,0.008119,-0.003307,0.014156,-0.005993,0.009725,-0.000425,-0.003977,-0.008845,-0.007584,-0.013306,0.008417,-0.003029,0.008778,-0.006895,0.005082,-0.002232,0.002199,0.011242,0.009628,-0.004465,-0.00214,-0.00992,-0.001298,0.005854,0.00035,0.008038,0.005676,0.010826,0.005843,-0.004883,0.008167,-0.005476,-0.007794,-0.00212,0.011023,-0.009484,-0.004069,-0.001431,0.00301,-0.012297,0.005,0.0114,0.005511,-0.000405,-0.004101,0.003488,1.6e-05,-0.001044,0.002094,0.006262,0.002449,0.000301,0.004956,-0.002589,0.014356,0.00111,0.005917,0
3,-1.148348,0.362116,-0.794457,-0.99319,-0.279855,-0.209539,-0.511512,-1.372052,1.00418,-0.58589,-0.9166,-0.357332,-0.060465,-0.449172,-0.326678,0.11741,1.085014,-0.190247,-0.268346,-0.017031,-0.220438,-0.37715,0.46705,0.315216,-0.257865,0.401667,0.834808,-0.066054,0.768247,0.43057,-0.029358,0.07711,-0.126622,0.044789,0.129999,-0.043297,0.023782,0.036244,-0.022106,0.053868,-0.065112,-0.060835,0.060119,-0.080147,-0.01807,0.152976,-0.034728,0.006197,0.007581,-0.126133,-0.099786,0.082187,0.023544,-0.039877,-0.094515,-0.03203,-0.018369,-0.023003,-0.013327,0.037028,-0.00844,5.4e-05,-0.048416,-0.008337,-0.078881,-0.009931,0.011451,-0.005195,-0.005669,-0.005286,0.01491,-0.038518,-0.005499,-0.032806,0.009657,0.023069,0.008218,0.028866,-0.015876,-0.020813,-0.057788,0.022313,0.015957,0.029669,0.004054,0.028562,-0.009578,-0.050834,-0.018114,0.017772,0.002014,0.016929,0.012619,-0.007735,0.015899,-0.043842,-0.001585,-0.01076,-0.018913,-0.005221,0.021952,0.015417,-0.001927,-0.021454,-0.011841,0.057332,-0.051405,0.001194,-0.004537,0.029255,0.013745,0.027383,-0.012815,-0.045873,-0.003703,-0.007915,0.006746,0.023619,0.014979,-0.01154,0.01677,0.008608,0.027855,0.042083,0.041313,-0.006268,-0.023389,-0.009443,0.00808,-0.05927,-0.001744,-0.035901,-0.013972,0.030812,-0.014025,0.025798,-0.023913,-0.005616,-0.028637,-0.051912,-0.010578,-0.026748,-0.059594,0.040932,-0.040783,-0.123971,0.019356,0.038726,-0.030365,-0.016448,-0.012553,-0.025412,-0.094038,-0.123434,-0.046833,0.118648,0.016812,0.020606,0.04141,-0.018316,-0.012915,0.025458,0.039586,-0.018637,0.011506,-0.00638,-0.00198,0.003631,-0.0074,0.007231,0.024447,0.011066,-0.005439,-0.018606,0.029709,-0.095743,0.073456,0.141828,0.025909,0.077066,-0.044947,0.01823,-0.050666,0.012581,-0.007892,-0.014724,0.038003,-0.018813,0.022507,-0.023111,-0.049122,-0.060828,-0.006752,0.044526,-0.042288,0.003269,0.008681,0.018769,-0.083641,-0.027415,0.003723,0.004856,0.034944,0.062881,-0.088204,0.022624,-0.036775,0.016519,-0.10489,0.096018,0.109464,0.048413,0.2235,-0.01959,-0.112508,-0.088637,-0.262048,0.017221,0.114758,0.033753,0.056018,0.035521,-0.193437,0.203121,-0.043313,0.121835,0.147833,0.370902,0.10859,-0.126935,0.045876,0.038273,-0.012425,0.041845,-0.025384,-0.05335,0.206073,-0.035244,-0.130368,0.065937,-0.010263,-0.100863,-0.096158,0.041965,-0.427418,-0.12097,0.071379,0.191535,-0.075293,-0.085424,0.086689,0.076519,0.00973,-0.03986,-0.045973,0.001437,-0.020588,0.066808,0.034314,0.020189,0.064803,-0.031408,-0.031024,-0.039913,-0.002275,0.024081,-0.024859,0.031847,-0.018629,0.054271,0.034883,-0.04774,0.036621,0.022006,0.019174,-0.019546,0.017999,-0.001702,0.022438,0.01897,0.019372,0.028549,0.013181,0.033483,0.006513,-0.040234,-0.019619,-0.004613,-0.110499,-0.028903,0.016936,-0.060502,0.104654,0.006431,-0.04727,-0.027949,0.015209,-0.091923,0.022695,0.033713,0.015348,0.035889,0.014302,-0.003563,0.030294,-0.016193,-0.0235,-0.006087,0.026132,-0.010261,-0.004066,-0.021658,0.015583,0.005492,0.001042,0.006893,-0.002045,0.007004,0.000169,0.008794,-0.009874,0.008761,0.028072,-0.028994,0.027397,4.9e-05,0.014622,0.021073,-0.002773,0.008054,0.029998,0.008288,-0.035384,-0.001229,-0.011931,0.006259,-0.007821,0.002964,-0.010876,-0.0088,-0.041486,0.00851,0.025019,-0.035753,-0.02119,-0.016703,-0.0036,0.006932,0.043274,-0.018289,-0.019076,0.022871,0.026467,-0.011542,-0.00063,-0.007442,0.006462,0
4,-0.482483,-0.665805,2.682352,-0.014586,-0.279855,-0.209539,-0.511512,0.80858,-0.059334,0.712355,-0.782346,-1.13596,-0.358935,-0.693476,-0.090077,-0.065136,-0.015176,0.908375,-0.339025,-0.105052,0.686554,0.024487,0.010172,-0.209501,0.078956,0.400834,0.71158,-0.172484,-0.175734,0.147112,0.327626,0.464945,0.826735,0.280176,-0.567091,0.007117,0.201461,0.265053,0.161873,-0.008564,0.084704,-0.02577,0.11575,-0.044523,-0.048547,-0.000363,-0.047321,0.030027,0.062166,-0.153524,-0.086653,0.11547,0.072904,0.085423,-0.1548,0.002443,-0.023684,0.00337,-0.031942,0.016487,0.057459,-0.159159,0.240534,-0.119213,0.018684,-0.21563,0.016695,-0.019367,0.06505,0.128056,-0.061059,0.00527,0.052291,0.046028,-0.02572,-0.055608,-0.090056,-0.001981,0.285016,0.186171,-0.036212,0.072811,0.22363,0.228925,0.035275,0.060143,-0.011684,-0.07757,-0.015862,0.073228,-0.002379,-0.013892,-0.086292,-0.027685,-0.033593,0.138165,-0.053078,0.154087,-0.255813,-0.292881,0.62799,0.100608,-0.171971,0.100503,0.051201,0.047941,0.129736,0.195164,-0.080012,0.027058,-0.019278,-0.047481,-0.005199,0.021091,-0.041253,-0.015743,-0.078043,-0.037504,-0.201008,-0.188072,0.136449,-0.121416,-0.136981,0.019448,0.009837,-0.009472,-0.020741,0.037289,-0.043262,0.040644,-0.011494,0.018969,0.026033,-0.005972,-0.020308,-0.025196,0.019512,-0.024095,0.020533,0.007806,0.007743,-0.003072,-0.003624,0.000709,0.013252,0.029255,0.001196,-0.013359,0.014364,-0.001368,-0.006607,0.000388,0.001708,0.006642,0.002108,-0.01429,0.014611,0.004977,0.000859,-0.002479,0.005053,0.010269,0.007472,-0.007319,-0.003561,0.010325,0.002183,-0.000431,-0.011943,0.006511,-0.000381,0.007523,0.003623,0.00926,0.008048,0.000396,0.004261,0.009119,0.003577,-0.010064,-0.009271,0.004742,-0.007242,0.007018,-0.006962,0.00246,0.002248,-0.014649,0.000708,0.002037,0.009796,-0.009934,-0.009755,0.005494,0.009581,0.000585,-0.008087,0.002266,0.001211,0.004329,-0.006163,-0.001707,-0.000136,0.002131,0.002925,0.007709,0.005407,-0.003977,0.001178,0.004749,0.005101,-0.001132,0.011829,0.009024,0.001945,0.003955,-0.006247,-0.015662,0.008064,0.002782,-0.001182,-0.015446,0.007191,0.003957,0.001908,0.002921,-0.004977,0.001819,0.007554,-0.003158,0.007063,-0.003316,0.000134,0.001686,0.003636,5.6e-05,-0.00894,-0.016428,-0.005703,-0.017252,0.0099,-0.000914,-0.005554,-0.000423,7.6e-05,0.003843,-0.002304,0.003004,-0.003187,0.000715,0.001715,0.002177,-0.000848,0.005222,-0.001678,0.009839,-0.003864,0.00437,-0.009091,-0.002553,2.6e-05,0.003833,-0.007631,-0.006839,0.002254,0.001251,-0.000119,0.002349,0.003298,-0.012133,-0.005704,0.001919,-0.001972,0.002982,-0.000594,-0.003968,-0.029947,-0.0036,-0.001911,-0.031917,-0.003732,-0.000921,-0.001576,0.000453,-0.000223,0.00023,-0.002633,-0.001711,0.000953,0.001356,0.005278,0.001187,-0.004957,0.000269,-0.004293,0.002616,-0.001062,0.000199,-0.001797,0.001069,0.001861,-0.002726,0.002674,-0.00404,-0.00076,-0.003373,0.00414,-0.004563,0.002026,0.002033,-0.000648,0.003009,0.001379,0.003204,0.00339,-0.000405,-0.000663,-0.000196,0.00237,0.004791,-0.004311,-0.002159,0.001732,-0.00691,0.003497,0.001645,-0.003518,-0.002793,0.001876,-0.001595,-0.001756,-0.00633,3.2e-05,-0.007929,-0.003381,0.000284,-0.011999,-0.005694,0.002743,0.001731,0.001365,-0.004177,0.001163,-0.002757,-0.000997,0.005032,-0.003796,0.000486,-0.001898,0.000522,0.001673,-0.00117,-0.000467,0.00215,0.000167,0.003606,-0.001124,1
5,-0.14955,1.338642,-0.214989,0.597041,-0.279855,-0.209539,-0.511512,-0.281736,1.262284,0.999351,-0.758566,-0.101339,0.360056,-0.675209,0.176679,-0.635869,-0.42799,-0.47384,0.257832,-0.62627,-0.337886,0.091565,0.176785,-0.0133,0.531738,-0.017643,0.01323,0.217345,-0.580667,1.252159,-0.021191,-0.342055,0.949676,0.24266,-0.692914,0.039365,0.072649,0.299024,0.177387,-0.162201,0.053875,-0.055463,-0.030275,0.002762,-0.064319,-0.219666,-0.098056,0.017412,-0.046094,-0.151831,0.050753,0.001114,0.017532,0.138824,-0.040052,0.113578,0.015634,0.096493,-0.007742,-0.133616,0.055493,-0.253089,0.351797,-0.070836,0.006813,-0.035263,-0.020178,-0.095026,0.034473,0.015957,-0.027054,0.070297,0.129387,-0.027023,0.000509,-0.047149,-0.111784,-0.001345,0.39846,0.348377,-0.047764,-0.093846,0.219546,0.112726,0.01568,0.125017,0.035918,-0.115785,-0.051059,0.022806,-0.037151,-0.044184,-0.108006,-0.102549,-0.19554,0.06733,-0.201847,-0.023463,-0.433407,-0.689495,0.39539,0.522518,0.074828,-0.078318,0.249539,0.222234,0.176118,0.137914,0.026054,-0.033059,-0.044669,0.049248,0.023713,0.043199,-0.019289,-0.008906,-0.086086,-0.030436,-0.17616,-0.224657,0.137785,-0.112751,-0.172288,0.024843,-0.009293,-0.025611,0.045517,0.027558,-0.065963,0.039358,-0.001004,0.016328,0.017829,0.019194,-0.025042,0.021293,0.001802,0.004533,0.039798,0.006289,-0.000437,-0.00324,-0.007021,-0.00794,0.033738,0.012831,0.001391,-0.006519,0.009066,0.015168,-0.018679,0.007788,0.019971,0.020799,0.007844,-0.000828,0.001859,-0.00544,0.004879,-0.040448,-0.006333,0.00355,-0.009741,0.006091,-0.001888,0.015027,0.000178,-0.002808,-0.020706,0.001645,-0.0067,0.020792,0.007176,0.009348,0.013308,0.001188,0.00298,0.013106,0.005585,-0.007466,-0.001051,-0.001489,-0.004857,0.010824,-0.002051,0.010115,0.026897,-0.003565,0.00043,0.000749,0.009046,-0.012404,-0.012233,0.00289,0.013055,0.000926,-0.008704,-0.0075,-0.007157,0.003516,-0.00394,0.003019,0.001543,-0.001105,0.000817,0.016242,0.001818,-0.004652,0.007373,-0.001249,0.009031,0.000282,0.013667,0.00092,0.000829,0.005396,-0.004145,-0.004597,0.005517,0.005927,0.001644,-0.017086,0.016747,0.004086,0.005018,0.002741,-0.002596,0.005052,-0.000797,-0.00403,0.003451,-0.008024,0.002885,-0.006458,0.005187,-0.006977,-0.008961,-0.009008,-0.00718,-0.020239,0.008162,0.000484,-0.000474,-0.001232,-0.000371,0.006593,0.0024,-0.002047,-0.003118,-0.00422,0.003466,-0.001554,-0.00124,0.009405,-0.004561,0.001744,0.00469,0.000321,-0.007916,-0.0044,-0.003438,0.003706,-0.011547,8.2e-05,-0.006452,0.005602,0.001378,0.000292,0.004013,-0.013245,-0.003193,0.003933,-0.004217,0.005033,0.001303,-0.011429,-0.028231,-0.007909,0.003454,-0.032611,-0.002753,-0.0022,-0.001393,-0.002134,-0.002301,0.001171,-0.00285,-0.002963,2.1e-05,-0.004909,0.003993,0.004392,-0.001909,-0.002324,-0.003313,0.000467,-0.001559,-0.000116,-0.000437,0.001424,-0.00128,-0.004388,-0.001772,-0.001883,0.001461,-0.00276,-0.002819,-0.000759,0.001046,0.002532,-0.000858,0.003217,0.002129,0.002585,-0.004263,0.000569,0.000584,0.001462,0.00095,0.001261,-0.002426,-3e-06,-0.000598,-0.005423,0.003911,0.003335,-0.000172,0.000703,0.002208,-0.002658,-0.000116,-0.002255,-0.003282,-0.005585,-0.001467,0.003384,-0.011415,-0.00303,0.003301,0.000102,0.000627,-0.00763,0.002993,-0.002438,-0.00044,0.004145,-0.005778,-0.00194,3.7e-05,0.003374,-0.004377,-0.004209,-0.003138,0.005607,-0.002798,0.007503,7.2e-05,0
6,0.183382,1.49283,-0.794457,-0.503888,-0.279855,-0.209539,-0.511512,0.263422,-0.540204,-0.497325,-0.968531,0.29218,0.039697,-0.624063,0.273212,0.876191,-0.436748,-0.084969,0.391758,1.234487,0.222663,-0.095156,-0.25875,-0.035527,-0.159372,0.337128,0.589644,-0.397389,-0.097138,0.019677,0.003743,0.112813,-0.566064,0.515383,-0.035194,0.080546,0.397153,0.04297,0.132835,-0.078156,0.029274,-0.181906,0.162377,0.081289,-0.146541,-0.035517,-0.058466,0.032815,-0.067296,-0.110293,-0.018743,0.10198,-0.023989,0.041532,-0.120388,-0.026119,0.013831,-0.016093,-0.023582,-0.020773,0.036014,0.034398,-0.073857,-0.053365,-0.00439,-0.043166,0.00543,0.011591,-0.006928,0.052391,-0.003674,-0.010409,-0.016076,-0.020899,0.032362,0.022085,-0.033935,0.029034,-0.013863,-0.010675,-0.007018,0.035673,0.03344,-0.018693,0.002177,0.004078,0.002792,-0.009933,0.011269,0.030018,-0.06252,0.038064,-0.012051,0.035956,0.001009,0.063374,0.019945,-0.005972,0.00372,0.005735,2.4e-05,0.022285,0.00321,-0.021144,0.006456,0.067179,-0.00083,0.004017,-0.032258,-0.034975,0.042446,0.025021,0.013347,-0.042738,-0.028979,-0.019448,-0.010048,0.015585,0.005149,-0.018144,-0.008272,0.019904,0.004665,-0.014343,0.001715,0.000405,-0.019697,0.002261,-3.7e-05,-0.023949,0.002596,-0.034247,-0.026117,0.031565,0.014425,0.017824,-0.044573,0.026581,-0.010244,0.012095,-0.00759,-0.00564,0.014751,-0.017786,-0.00619,-0.028219,-0.006927,0.019112,0.02442,-0.018935,-0.017013,0.002142,0.013589,-0.007919,-0.001367,0.024982,-0.027777,-0.007417,-0.010011,0.001562,0.010199,0.000968,0.014512,0.006636,0.022938,-0.004217,-0.008838,0.003463,-0.004432,-0.00128,0.007169,0.018677,-0.016869,-0.022519,-0.006907,-0.007675,-0.024162,-0.029962,0.006495,-0.017349,0.016598,-0.006049,0.012146,-0.008543,-0.018086,0.008385,-0.006483,0.015604,-0.004412,-0.004754,0.005724,-0.005241,0.020106,-0.022984,0.031343,0.021636,0.021112,-0.002765,0.024977,-0.009405,-0.020217,-0.002545,-0.029247,-0.017296,0.004783,-0.013061,-0.001449,0.010323,0.000954,-0.004126,-0.021815,-0.012258,0.009019,0.006353,-0.007767,-0.014383,0.017234,0.004276,-0.030871,-0.042112,0.01967,-0.004375,0.012978,-0.004072,-0.005755,0.003175,0.003874,-0.014342,-0.036618,-0.007577,0.017951,-0.011707,0.016615,-0.011385,3.1e-05,-0.008487,0.012457,0.011077,0.018506,0.0285,0.000725,-0.016473,-0.00193,-0.038955,-0.01154,-0.027547,0.004142,0.05807,-0.021166,-0.024287,0.010131,0.028718,0.000833,0.000444,0.005836,0.046348,0.004112,0.036821,0.048281,0.016491,0.035953,-0.003294,-0.048154,-0.054319,0.023922,0.004411,-0.067124,0.043894,-0.026794,-0.04271,-0.073864,-0.003713,0.111722,0.021788,-0.013081,0.004938,0.057212,-0.093686,0.065958,0.090183,-0.20098,-0.05893,0.063349,0.09613,0.106209,-0.502513,-0.443086,0.094125,0.279312,0.169084,-0.327951,0.116284,0.156684,-0.077714,0.248759,-0.093929,-0.049937,-0.018806,0.008752,-0.018191,0.023263,-0.00125,-0.015391,0.079831,-0.045468,-0.037638,0.021412,-0.007484,-0.057426,0.002892,-0.013734,-0.027127,-0.023579,-0.008898,-0.029616,-0.015078,-0.002648,0.013632,-0.044944,-0.023884,-0.036256,0.033481,-0.051105,0.019851,0.002226,0.00835,0.018997,-0.021785,-0.013585,-0.012942,0.029857,0.037566,0.031573,-0.004003,0.004216,0.00044,0.002625,-0.01077,0.014723,0.013437,-0.008227,0.002792,-0.015975,0.004394,-0.014056,0.005586,-0.007263,0.005053,5.5e-05,-0.012763,-0.019765,0.018914,-0.000648,-0.002888,-0.008898,0.004704,-0.016507,1


## **The model** 

Two different models were used: RandomForest and Multi Layer Perceptron Classifier (RNN)

A Grid Search was executed to find the best combination of parameters for this problem, to each of the two models

### **Finding out the best parameters**

A grid Search performs a series of model fits using many different pre-defined parameters and makes it easy to compare wich are the best

In [6]:
def best_model_Random_Forest(processed_data: pd.DataFrame, target: str, param_grid: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            param_grid (dict): Dictionary containing hyperparameters for Random Forest model

        Returns:
            grid_search, best_model (Grid_Search, RandomForestClassifier): Grid Search results and best Random Forest model after hyperparameter tuning

        This function performs hyperparameter tuning for a Random Forest model using GridSearchCV on the processed diabetes dataset.
    """

    # Splitting Data
    X = processed_data.drop(columns=[target])
    y = processed_data[target]
    
    scoring = 'accuracy'
    
    grid_search = GridSearchCV(RandomForestClassifier(verbose=0), 
                               param_grid, 
                               cv=3, 
                               scoring=scoring, 
                               n_jobs=-1, 
                               verbose=1)
    grid_search.fit(X, y)

    # Visualizing the grid search results
    cv_results_df = pd.DataFrame(grid_search.cv_results_)
    print("Score type: ", scoring)
    display(pd.concat([cv_results_df[f'rank_test_score'], 
                        cv_results_df[f'mean_test_score'], 
                        cv_results_df[f'std_test_score'],
                        cv_results_df[('param_' + str(k) for k in param_grid.keys())]], 
                    axis=1).sort_values(by='rank_test_score'))
    
    return grid_search, grid_search.best_estimator_


In [7]:
def best_model_MLPClassifier(processed_data: pd.DataFrame, target: str, param_grid: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            param_grid (dict): Dictionary containing hyperparameters for MLPClassifier model

        Returns:
            grid_search, best_model (Grid_Search, MLPClassifier): Grid Serach results and best MLPClassifier model after hyperparameter tuning

        This function performs hyperparameter tuning for a MLPCLassifier model using GridSearchCV on the processed diabetes dataset.
    """

    # Splitting Data
    X = processed_data.drop(columns=[target])
    y = processed_data[target]
    
    scoring= ['accuracy','f1_macro']

    grid_search = GridSearchCV(MLPClassifier(max_iter=250, 
                                             verbose=False, 
                                             early_stopping=True, 
                                             n_iter_no_change=30), 
                               param_grid, 
                               cv=3, 
                               scoring=scoring, 
                               n_jobs=-1,
                               refit=scoring[0],
                               verbose=1)
    grid_search.fit(X, y)

    # Visualizing the grid search results
    cv_results_df = pd.DataFrame(grid_search.cv_results_)
    
    if len(scoring) == 1:
        columns = ['rank_test_score', 'mean_test_score', 'std_test_score']
        sort = 'rank_test_score'
    else:
        columns = [f'rank_test_{s}' for s in scoring] + [f'mean_test_{s}' for s in scoring] + [f'std_test_{s}' for s in scoring]
        sort = [f'rank_test_{s}' for s in scoring]
    
    print("Score type: ", scoring)
    display(pd.concat([
         cv_results_df[columns], 
         cv_results_df[[('param_' + str(k)) for k in param_grid.keys()]]], axis=1)
        .sort_values(by=sort))
    
    return grid_search, grid_search.best_estimator_


#### DO NOT RUN THE NEXT TWO CELLS

They might take hours to process

This process has already been done and the best parameters are in the next section

In [None]:
## Fiding ou the best parameters for the Random Forest model (Abou 42 Min of processing time for 192 fits)

# Defining the parameters to be combined and tested for Random Forest
param_grid_rf = {
    'n_estimators': [200, 500],             
    'max_depth': [20, 50],             
    'min_samples_split': [5, 10],               
    'max_features': ['sqrt', 'log2'],      
    'bootstrap': [True],                  
    'criterion': ['gini', 'entropy'],             
    'class_weight': ['balanced', 'balanced_subsample']  # To handle unbalanced target classes
}

grid_rf, m_rf = best_model_Random_Forest(diabetes_processed, "readmitted", param_grid_rf)

Fitting 3 folds for each of 64 candidates, totalling 192 fits
Score type:  accuracy


Unnamed: 0,rank_test_score,mean_test_score,std_test_score,param_n_estimators,param_max_depth,param_min_samples_split,param_max_features,param_bootstrap,param_criterion,param_class_weight
57,1,0.566983,0.005813,500,50,5,sqrt,True,entropy,balanced_subsample
25,2,0.566456,0.005751,500,50,5,sqrt,True,entropy,balanced
61,3,0.565467,0.004747,500,50,5,log2,True,entropy,balanced_subsample
29,4,0.565209,0.004830,500,50,5,log2,True,entropy,balanced
45,5,0.565102,0.005554,500,50,5,log2,True,gini,balanced_subsample
...,...,...,...,...,...,...,...,...,...,...
38,60,0.527597,0.026985,200,20,10,log2,True,gini,balanced_subsample
6,61,0.526855,0.027977,200,20,10,log2,True,gini,balanced
55,62,0.526673,0.029115,500,20,10,log2,True,entropy,balanced_subsample
22,63,0.525931,0.027957,200,20,10,log2,True,entropy,balanced


In [153]:
# Finding out the best parameters for the MLPClassifier model (Aboutv 7 Min of processing time)

param_grid_mlp = {
    'hidden_layer_sizes': [(128,), (128, 64), (256, 128)],
    'activation': ['relu', 'tanh'],
    'alpha': [1e-3, 1e-2],
    'learning_rate_init': [0.001, 0.005],
    'solver': ['adam', 'sgd']
}

grid_mlp, m_mlp = best_model_MLPClassifier(diabetes_processed, "readmitted", param_grid_mlp)

Fitting 3 folds for each of 48 candidates, totalling 144 fits
Score type:  ['accuracy', 'f1_macro']


Unnamed: 0,rank_test_accuracy,rank_test_f1_macro,mean_test_accuracy,mean_test_f1_macro,std_test_accuracy,std_test_f1_macro,param_hidden_layer_sizes,param_activation,param_alpha,param_learning_rate_init,param_solver
45,1,18,0.571528,0.38258,0.00591,0.019751,"(256, 128)",tanh,0.01,0.001,sgd
14,2,43,0.571002,0.376697,0.007059,0.02131,"(128,)",relu,0.01,0.005,adam
26,3,5,0.570411,0.3894,0.004243,0.018981,"(128,)",tanh,0.001,0.005,adam
4,4,13,0.570067,0.38481,0.006295,0.01504,"(128, 64)",relu,0.001,0.001,adam
24,5,4,0.570035,0.39039,0.004888,0.011951,"(128,)",tanh,0.001,0.001,adam
22,6,1,0.570024,0.391536,0.008124,0.022262,"(256, 128)",relu,0.01,0.005,adam
5,7,39,0.570013,0.378444,0.005719,0.019432,"(128, 64)",relu,0.001,0.001,sgd
25,8,30,0.569658,0.380457,0.007594,0.025704,"(128,)",tanh,0.001,0.001,sgd
21,9,41,0.569411,0.377912,0.005251,0.018311,"(256, 128)",relu,0.01,0.001,sgd
43,10,8,0.569089,0.387051,0.005086,0.016675,"(128, 64)",tanh,0.01,0.005,sgd


### **Training the final models with the defined parameters**

**Best parameters found:**

rf_best_params: `{'bootstrap': True, 'class_weight': 'balanced_subsample', 'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5, 'n_estimators': 500}`

mlp_best_params: `{'activation': 'tanh', 'alpha': 0.01, 'hidden_layer_sizes': (256, 128), 'learning_rate_init': 0.001, 'solver': 'sgd'}`


In [82]:
def single_model_RNN(processed_data: pd.DataFrame, target: str, params: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            params (dict): Dictionary containing hyperparameters for MLPClassifier model

        Returns:
            model (MLPClassifier): Trained MLP model
            X_test (pandas DataFrame): Test data to be used for evaluation
            y_test (pandas Series): Test data to be used for evaluation

        This function trains a Multi-Layer Perceptron (MLP) model on the processed diabetes dataset, using predefined hyperparameters.
    """    

    # Splitting the data into train and test sets
    X = processed_data.drop(columns=[target])
    y = processed_data[target]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = MLPClassifier(**params, early_stopping=True,
                                    max_iter=250, 
                                    n_iter_no_change=30)

    model.fit(X_train, y_train)

    return model, X_test, y_test


In [10]:
def single_model_Random_Forest(processed_data: pd.DataFrame, target: str, params: dict):
    """
        Args:
            processed_data (pandas DataFrame): Processed diabetes dataset to be used for training
            target (str): Target variable name
            params (dict): Dictionary containing hyperparameters for Random Forest model

        Returns:
            rf (RandomForestClassifier): Trained Random Forest model
            X_test (pandas DataFrame): Test data to be used for evaluation
            y_test (pandas Series): Test data to be used for evaluation

        This function trains a Random Forest model on the processed diabetes dataset, using predefined hyperparameters.
    """    

    # Splitting the data into train and test sets
    X = processed_data.drop(columns=[target])
    y = processed_data[target]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Creating and training the Random Forest model
    rf = RandomForestClassifier(**params, n_jobs=-1, verbose=0)
    
    rf.fit(X_train, y_train)

    return rf, X_test, y_test


In [99]:
rf, rf_X_test, rf_Y_test = single_model_Random_Forest(diabetes_processed, "readmitted", {'bootstrap': True, 'class_weight': 'balanced_subsample', 'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5, 'n_estimators': 2000})

## rf_best_params: {'bootstrap': True, 'class_weight': 'balanced_subsample', 'criterion': 'entropy', 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_split': 5, 'n_estimators': 500}

mlp, mlp_X_test, mlp_y_test = single_model_RNN(diabetes_processed, "readmitted", {'activation': 'tanh', 'alpha': 0.001, 'hidden_layer_sizes': (256, 128), 'learning_rate': 'adaptive', 'solver': 'sgd'})

## mlp_best_params: {'activation': 'tanh', 'alpha': 0.01, 'hidden_layer_sizes': (256, 128), 'learning_rate_init': 0.001, 'solver': 'sgd'}



## **Evaluating the models**

In [100]:
print("------------------RANDOM FOREST------------------", end="\n\n")
rf_predictions = rf.predict(rf_X_test)

print("Accuracy:", accuracy_score(rf_Y_test, rf_predictions))
print("\nClassification Report:\n", classification_report(rf_Y_test, rf_predictions))

print("Confusion Matrix:")
display(pd.crosstab(rf_Y_test, rf_predictions, rownames=['Actual'], colnames=['Predicted'], margins=True))

print("Feature Importance:")
display(pd.DataFrame(rf.feature_importances_, index=rf_X_test.columns, columns=['Importance']).sort_values(by='Importance', ascending=False).head(10))

print("---------------MULTI LAYER PERCEPTRON---------------", end="\n\n")
mlp_predictions = mlp.predict(mlp_X_test)

print("Accuracy:", accuracy_score(mlp_y_test, mlp_predictions))
print("\nClassification Report:\n", classification_report(mlp_y_test, mlp_predictions))

print("Confusion Matrix:")
display(pd.crosstab(mlp_y_test, mlp_predictions, rownames=['Actual'], colnames=['Predicted'], margins=True))


------------------RANDOM FOREST------------------

Accuracy: 0.5648809843640857

Classification Report:
               precision    recall  f1-score   support

           0       0.58      0.91      0.70      9838
           1       0.51      0.24      0.33      6572
           2       0.53      0.01      0.03      2201

    accuracy                           0.56     18611
   macro avg       0.54      0.39      0.35     18611
weighted avg       0.55      0.56      0.49     18611

Confusion Matrix:


Predicted,0,1,2,All
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,8910,915,13,9838
1,4985,1572,15,6572
2,1594,576,31,2201
All,15489,3063,59,18611


Feature Importance:


Unnamed: 0,Importance
number_inpatient,0.013294
pca_5,0.00465
pca_46,0.004446
pca_30,0.004305
pca_28,0.003978
pca_16,0.003847
pca_15,0.00379
pca_6,0.003756
pca_54,0.003705
pca_2,0.003692


---------------MULTI LAYER PERCEPTRON---------------

Accuracy: 0.5783676320455644

Classification Report:
               precision    recall  f1-score   support

           0       0.61      0.83      0.70      9838
           1       0.50      0.38      0.43      6572
           2       0.41      0.03      0.06      2201

    accuracy                           0.58     18611
   macro avg       0.51      0.42      0.40     18611
weighted avg       0.55      0.58      0.53     18611

Confusion Matrix:


Predicted,0,1,2,All
Actual,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,8177,1620,41,9838
1,3992,2511,69,6572
2,1259,866,76,2201
All,13428,4997,186,18611
