# 🧠 Introduction

Welcome to the second notebook in Team Quail’s submission for the **FLIQ Hackathon**.

Our project, **Quail** – *Quantum Understanding and AI for Interpretability and Learning* – explores how hybrid quantum-classical models and interpretable machine learning can tackle challenging problems in drug discovery.

In this notebook, we focus on preparing the training and test data for modeling. We are not removing any features or addressing label imbalance at this stage; the goal is to **standardize and normalize** the descriptors (z-score standardization for continuous descriptors, min-max normalization for discrete ones) so that they are clean, consistent, and ready to be used in both classical and quantum pipelines.

Specifically, we will:
- Normalize numerical descriptors to ensure consistent scale across features  
- Apply transformations uniformly to both train and test sets  
- Ensure feature alignment and format consistency  
- Structure the datasets to be directly usable by classical and quantum machine learning models  

This ensures that all downstream experiments are reproducible and built on a well-defined input structure.
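
Here is a minimal sketch of a "feature alignment" check. It assumes train and test are pandas DataFrames that should contain exactly the same columns. `align_test_columns` is a hypothetical helper, and the two tiny frames are toy data rather than the DIA files.

```python
import pandas as pd

def align_test_columns(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Return `test` with its columns in the same order as `train`.

    Hypothetical helper: raises ValueError if the two frames do not
    contain exactly the same set of columns.
    """
    missing = set(train.columns) - set(test.columns)
    extra = set(test.columns) - set(train.columns)
    if missing or extra:
        raise ValueError(f"column mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    # Reorder the test columns to mirror the training layout
    return test[list(train.columns)]

# Toy frames (not the real dataset): same columns, different order
train = pd.DataFrame({"Label": [0, 1], "BalabanJ": [1.8, 2.4], "NOCount": [10, 4]})
test = pd.DataFrame({"NOCount": [3], "Label": [1], "BalabanJ": [3.5]})

aligned = align_test_columns(train, test)
print(list(aligned.columns))  # ['Label', 'BalabanJ', 'NOCount']
```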

## Data Loading

In [32]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# show all columns
pd.set_option('display.max_columns', None)  

In [5]:
# import the training file
df_train = pd.read_csv("drug+induced+autoimmunity+prediction/DIA_trainingset_RDKit_descriptors.csv")
df_train.head()

Unnamed: 0,Label,SMILES,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomCount,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,RingCount,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA8,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,SlogP_VSA9,TPSA,VSA_EState1,VSA_EState10,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,VSA_EState8,VSA_EState9,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,0,COC(=O)N(C)c1c(N)nc(nc1N)c2nn(Cc3ccccc3F)c4ncc...,1.821,1266.407,22.121,16.781,16.781,14.901,9.203,9.203,6.668,6.668,4.719,4.719,3.241,3.241,6.093,9.185,0.0,35.509,22.291,4.9,20.224,41.21,0.0,24.787,11.467,422.162,0.15,-4.28,31,403.272,12184750.0,20.012,7.859,3.564,175.833,14.164,0.452,14.164,0.413,0.018,0.413,-0.668,-0.452,2.443,113.689,422.424,4,10,0,0,0,1,3,4,9,2,11,0,6,0,0,0,158,16.204,17.199,23.107,0.0,0.0,6.093,4.9,28.819,5.099,0.0,18.199,18.199,18.808,19.041,4,13.922,34.45,0.0,24.732,0.0,6.545,30.525,53.976,0,11.518,16.367,26.508,0.0,0.0,44.983,11.282,5.817,5.563,42.595,0.0,22.552,0,138.07,0,0.0,0,0,0,0,0,0,15.727,62.94,0,0,0,2,0,5,0,0,0,0,1,1,0,0,0,6,0,2,0,0,0,0,0,0,0,0,0,1,0,3,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,0,C[C@H](N(O)C(=O)N)c1cc2ccccc2s1,2.363,490.434,11.707,8.752,9.569,7.592,4.854,5.67,3.545,4.661,2.449,3.735,1.52,2.621,12.073,10.002,0.0,0.0,5.063,14.963,0.0,18.261,30.332,0.0,5.734,236.062,0.182,-1.59,16,224.2,5774.616,10.912,4.112,1.868,96.858,10.832,0.35,10.832,0.339,0.421,0.339,-0.843,-0.35,2.732,63.449,236.296,3,4,0,0,0,1,1,2,3,2,5,0,3,0,0,0,84,5.734,0.0,0.0,0.0,0.0,6.031,5.207,4.795,5.063,11.337,18.199,24.443,9.577,6.042,2,10.002,27.454,0.0,5.063,5.734,12.966,0.0,35.209,0,0.0,5.734,4.795,0.0,11.337,16.302,0.0,0.0,17.843,30.332,0.0,10.086,0,66.56,0,1.542,0,0,0,0,0,0,0.0,39.292,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,1,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1
2,0,C[N+](C)(C)CC(=O)[O-],3.551,93.092,6.784,5.471,5.471,3.417,2.42,2.42,2.82,2.82,0.603,0.603,0.387,0.387,5.969,9.901,0.0,6.545,4.483,0.0,0.0,21.143,0.0,0.0,0.0,117.079,0.8,-0.57,8,106.06,23.005,7.43,2.132,5.43,49.125,9.888,0.544,9.888,0.118,0.069,0.118,-1.002,-0.544,-1.557,27.906,117.148,0,3,0,0,0,0,0,0,2,0,3,0,2,0,0,0,48,14.384,6.545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27.112,0,14.384,5.969,0.0,0.0,0.0,0.0,27.688,0.0,0,0.0,5.107,0.0,0.0,0.0,38.14,4.795,0.0,0.0,0.0,0.0,0.0,0,40.13,0,0.0,0,0,0,0,0,0,0.419,24.248,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,1,CC(C)n1c(\C=C\[C@H](O)C[C@H](O)CC(=O)O)c(c2ccc...,2.076,1053.003,21.836,16.995,16.995,14.274,9.926,9.926,7.662,7.662,4.998,4.998,3.768,3.768,24.598,19.398,0.0,18.28,0.0,27.724,12.133,24.285,24.265,18.415,5.107,411.185,0.292,-2.96,30,385.265,4553841.0,21.742,9.238,4.777,174.259,13.498,0.481,13.498,0.305,0.073,0.305,-1.13,-0.481,4.628,115.634,411.473,3,5,0,0,0,2,1,3,4,3,6,0,8,0,0,0,158,19.887,5.817,0.0,0.0,0.0,5.969,4.795,4.39,0.0,0.0,36.408,43.686,34.623,18.629,3,24.505,22.948,0.0,4.567,0.0,44.939,0.0,66.118,0,11.127,0.0,4.39,0.0,0.0,38.064,4.795,5.817,38.425,54.607,0.0,28.106,0,82.69,0,0.0,0,0,0,0,0,0,15.642,62.025,1,2,2,0,0,1,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,C\C(=C(\C#N)/C(=O)Nc1ccc(cc1)C(F)(F)F)\O,2.888,549.823,14.629,9.746,9.746,8.752,5.04,5.04,3.601,3.601,2.171,2.171,1.232,1.232,28.979,17.966,0.0,5.687,0.0,31.189,6.069,0.0,0.0,5.317,10.368,270.062,0.167,-2.49,19,261.138,10134.11,14.571,5.436,3.741,106.326,12.3,0.511,12.3,0.416,0.09,0.416,-4.456,-0.511,2.999,61.016,270.21,2,4,0,0,0,1,0,1,3,2,7,0,4,0,0,0,100,10.423,11.828,5.573,0.0,5.907,6.176,4.795,0.0,18.433,0.0,0.0,31.189,5.687,5.563,1,23.072,11.595,5.262,0.0,0.0,13.1,5.317,41.161,0,6.069,5.317,18.859,0.0,0.0,11.014,10.971,11.331,12.487,35.598,0.0,0.0,0,73.12,0,0.0,0,0,0,0,0,0,36.9,30.684,0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,3,1,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [6]:
# import the test file
df_test = pd.read_csv("drug+induced+autoimmunity+prediction/DIA_testset_RDKit_descriptors.csv")
df_test.head()

Unnamed: 0,Label,SMILES,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomCount,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,RingCount,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA8,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,SlogP_VSA9,TPSA,VSA_EState1,VSA_EState10,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,VSA_EState8,VSA_EState9,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,0,C[C@H](\C=C\[C@H](O)C1CC1)[C@@H]2CC[C@@H]3\C(=...,1.484,743.207,21.466,18.764,18.764,14.292,12.106,12.106,10.736,10.736,8.532,8.532,6.626,6.626,12.208,15.32,0.0,6.104,41.928,17.567,44.098,0.0,0.0,44.73,0.0,412.298,0.704,-1.16,30,372.294,8390163.0,22.049,8.763,4.397,181.829,10.227,0.393,10.227,0.081,0.25,0.081,-0.621,-0.393,5.091,121.76,412.614,3,3,4,0,4,0,0,0,3,3,3,0,5,4,0,4,166,15.32,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,50.304,91.599,6.421,18.312,4,15.32,0.0,0.0,0.0,29.087,89.947,0.0,47.602,0,0.0,0.0,0.0,0.0,0.0,33.631,0.0,29.087,71.635,47.602,0.0,0.0,0,60.69,0,0.0,0,0,0,0,0,0,0.0,62.083,0,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,OCCN1CCN(CCCN2c3ccccc3Sc4ccc(cc24)C(F)(F)F)CC1,1.472,868.947,21.14,16.736,17.553,14.453,10.268,11.084,7.662,8.746,5.69,6.82,4.077,5.153,11.74,13.171,0.0,6.607,18.777,54.623,23.895,6.066,29.165,9.8,5.107,437.175,0.455,-1.74,30,411.323,6669991.0,21.49,9.177,4.771,178.82,13.311,0.416,13.311,0.416,0.186,0.395,-4.351,-0.395,4.308,113.598,437.531,1,4,0,2,2,2,0,2,5,1,8,0,7,0,1,1,162,14.906,0.0,0.0,0.0,0.0,6.176,4.9,0.0,13.171,0.0,23.895,43.297,49.06,23.545,4,18.278,23.137,0.0,9.8,0.0,22.388,57.32,48.028,0,0.0,4.9,24.546,0.0,11.762,67.327,6.176,0.0,11.984,52.256,0.0,0.0,0,29.95,0,1.527,0,0,0,0,0,0,39.932,33.458,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,2,0,2,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,C[C@@H]1O[C@H](C[C@H](O)[C@@H]1O)O[C@@H]2[C@H]...,0.837,1409.004,39.189,32.904,32.904,26.011,20.941,20.941,18.816,18.816,15.952,15.952,12.762,12.762,90.926,35.434,0.0,61.111,18.76,44.098,0.0,19.923,13.847,6.924,33.158,780.43,0.927,-1.27,55,716.437,1460000000000.0,40.507,15.081,6.95,323.322,12.568,0.458,12.568,0.331,0.045,0.331,-1.012,-0.458,2.218,192.611,780.949,6,14,4,4,8,0,0,0,14,6,14,0,7,4,3,7,312,63.797,24.919,18.87,0.0,0.0,5.969,0.0,4.795,0.0,0.0,13.847,106.798,30.753,54.433,8,68.592,5.969,0.0,0.0,34.502,196.862,6.607,11.649,0,0.0,0.0,0.0,0.0,0.0,134.83,37.953,34.502,105.248,11.649,0.0,0.0,0,203.06,0,0.0,0,0,0,0,0,0,0.0,128.583,0,6,5,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,NC(=O)Cc1cccc(C(=O)c2ccccc2)c1N,2.406,621.298,13.828,10.297,10.297,9.092,5.847,5.847,4.217,4.217,2.842,2.842,1.898,1.898,5.907,9.589,0.0,12.204,22.378,0.0,0.0,42.465,6.066,0.0,11.467,254.106,0.067,-2.62,19,240.177,20828.75,12.827,5.351,2.796,110.586,12.309,0.398,12.309,0.221,0.03,0.221,-0.476,-0.398,1.528,73.627,254.289,4,4,0,0,0,2,0,2,3,2,4,0,4,0,0,0,96,11.467,0.0,5.783,5.907,0.0,0.0,9.589,0.0,0.0,0.0,42.465,11.63,16.814,6.421,2,9.589,17.378,0.0,0.0,5.734,6.421,5.734,65.221,0,0.0,11.467,5.687,0.0,0.0,11.69,11.215,0.0,21.485,48.531,0.0,0.0,0,86.18,0,0.0,0,0,0,0,0,0,0.0,49.5,0,0,0,1,0,0,0,0,0,0,2,2,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,COc1cc2c(CCN[C@]23CS[C@@H]4[C@@H]5[C@@H]6N(C)[...,1.32,2127.996,37.955,30.849,31.666,25.91,18.066,19.115,14.93,16.06,12.266,13.739,9.995,11.687,47.081,24.909,0.0,42.443,70.202,16.69,32.905,12.133,31.861,10.217,28.421,761.262,0.487,-4.41,54,718.506,1640000000000.0,35.302,12.564,4.742,315.754,14.682,0.504,14.682,0.331,0.023,0.331,-1.358,-0.504,3.413,194.579,761.85,4,14,0,7,7,3,0,3,15,4,15,0,4,0,2,2,286,43.741,18.584,40.036,6.793,0.0,11.939,19.911,4.795,0.0,11.762,6.066,68.123,47.517,37.595,10,53.33,23.7,0.0,15.117,0.0,74.797,46.965,62.707,0,40.247,29.001,0.0,40.247,11.762,102.334,32.707,13.847,68.765,18.199,0.0,0.0,0,168.72,0,1.468,0,0,0,0,0,0,0.0,124.449,0,1,1,0,0,0,0,2,0,0,2,2,0,0,0,2,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,3,0,9,0,0,0,2,6,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,2,2,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0


### Sanity Check
Before scaling anything, we check each column's data type, because continuous and discrete descriptors will need different treatment.

In [13]:
# check the datatypes
df_train.dtypes

Label                 int64
SMILES               object
BalabanJ            float64
BertzCT             float64
Chi0                float64
                     ...   
fr_thiazole           int64
fr_thiocyan           int64
fr_thiophene          int64
fr_unbrch_alkane      int64
fr_urea               int64
Length: 198, dtype: object

### Observation
![image.png](attachment:a9b16ef2-6a66-4ff6-a7c0-3d9600428267.png)
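The descriptors mix `float64` columns (continuous quantities such as `BalabanJ`, `MolLogP` or `TPSA`) with `int64` columns (mostly counts such as `NumHDonors`, `RingCount` and the `fr_*` fragment counts). We therefore scale the two groups separately: `StandardScaler` for the continuous descriptors and `MinMaxScaler` for the discrete ones.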
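
You can also split columns by dtype with `DataFrame.select_dtypes`. One caveat is worth checking, and it is an assumption rather than a confirmed fact about this dataset. Pandas infers `int64` for a CSV column only when every value is a whole number. So area-type descriptors such as `SMR_VSA8`, `SlogP_VSA9` and `VSA_EState1` to `VSA_EState7` most likely land in the integer group because they are zero for every training molecule; they are zero in all rows shown above. The toy sketch below separates the two groups and reports constant columns with `nunique()`. Since this notebook does not drop features, the check only reports these columns and leaves them in place.

```python
import pandas as pd

# Toy frame standing in for df_train (illustrative values only)
df = pd.DataFrame({
    "Label": [0, 1, 0],
    "SMILES": ["CCO", "c1ccccc1", "CC(=O)O"],
    "BalabanJ": [1.82, 2.36, 3.55],  # float64 -> continuous descriptor
    "NOCount": [10, 4, 3],           # int64 -> discrete count
    "SMR_VSA8": [0, 0, 0],           # area descriptor that is 0 everywhere, so inferred as int64
})

# Split descriptors by dtype, keeping the Label target out of the integer group
cont_cols = df.select_dtypes(include="float64").columns.tolist()
disc_cols = [c for c in df.select_dtypes(include="int64").columns if c != "Label"]

# Columns with a single distinct value carry no information for a model
constant_cols = [c for c in cont_cols + disc_cols if df[c].nunique() == 1]

print(cont_cols)      # ['BalabanJ']
print(disc_cols)      # ['NOCount', 'SMR_VSA8']
print(constant_cols)  # ['SMR_VSA8']
```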

In [20]:
# get the integer-valued descriptor columns, excluding the Label target (also int64)
int_list = [col for col in df_train.select_dtypes(include="int64").columns if col != "Label"]
int_list

['HeavyAtomCount',
 'NHOHCount',
 'NOCount',
 'NumAliphaticCarbocycles',
 'NumAliphaticHeterocycles',
 'NumAliphaticRings',
 'NumAromaticCarbocycles',
 'NumAromaticHeterocycles',
 'NumAromaticRings',
 'NumHAcceptors',
 'NumHDonors',
 'NumHeteroatoms',
 'NumRadicalElectrons',
 'NumRotatableBonds',
 'NumSaturatedCarbocycles',
 'NumSaturatedHeterocycles',
 'NumSaturatedRings',
 'NumValenceElectrons',
 'RingCount',
 'SMR_VSA8',
 'SlogP_VSA9',
 'VSA_EState1',
 'VSA_EState2',
 'VSA_EState3',
 'VSA_EState4',
 'VSA_EState5',
 'VSA_EState6',
 'VSA_EState7',
 'fr_Al_COO',
 'fr_Al_OH',
 'fr_Al_OH_noTert',
 'fr_ArN',
 'fr_Ar_COO',
 'fr_Ar_N',
 'fr_Ar_NH',
 'fr_Ar_OH',
 'fr_COO',
 'fr_COO2',
 'fr_C_O',
 'fr_C_O_noCOO',
 'fr_C_S',
 'fr_HOCCN',
 'fr_Imine',
 'fr_NH0',
 'fr_NH1',
 'fr_NH2',
 'fr_N_O',
 'fr_Ndealkylation1',
 'fr_Ndealkylation2',
 'fr_Nhpyrrole',
 'fr_SH',
 'fr_aldehyde',
 'fr_alkyl_carbamate',
 'fr_alkyl_halide',
 'fr_allylic_oxid',
 'fr_amide',
 'fr_amidine',
 'fr_aniline',
 'fr_aryl_

In [23]:
# get columns which are floats
float_list = df_train.dtypes[df_train.dtypes == "float64"].index
float_list

Index(['BalabanJ', 'BertzCT', 'Chi0', 'Chi0n', 'Chi0v', 'Chi1', 'Chi1n',
       'Chi1v', 'Chi2n', 'Chi2v', 'Chi3n', 'Chi3v', 'Chi4n', 'Chi4v',
       'EState_VSA1', 'EState_VSA10', 'EState_VSA11', 'EState_VSA2',
       'EState_VSA3', 'EState_VSA4', 'EState_VSA5', 'EState_VSA6',
       'EState_VSA7', 'EState_VSA8', 'EState_VSA9', 'ExactMolWt',
       'FractionCSP3', 'HallKierAlpha', 'HeavyAtomMolWt', 'Ipc', 'Kappa1',
       'Kappa2', 'Kappa3', 'LabuteASA', 'MaxAbsEStateIndex',
       'MaxAbsPartialCharge', 'MaxEStateIndex', 'MaxPartialCharge',
       'MinAbsEStateIndex', 'MinAbsPartialCharge', 'MinEStateIndex',
       'MinPartialCharge', 'MolLogP', 'MolMR', 'MolWt', 'PEOE_VSA1',
       'PEOE_VSA10', 'PEOE_VSA11', 'PEOE_VSA12', 'PEOE_VSA13', 'PEOE_VSA14',
       'PEOE_VSA2', 'PEOE_VSA3', 'PEOE_VSA4', 'PEOE_VSA5', 'PEOE_VSA6',
       'PEOE_VSA7', 'PEOE_VSA8', 'PEOE_VSA9', 'SMR_VSA1', 'SMR_VSA10',
       'SMR_VSA2', 'SMR_VSA3', 'SMR_VSA4', 'SMR_VSA5', 'SMR_VSA6', 'SMR_VSA7',
       'SMR_VS

In [22]:
# check the split: 198 columns minus Label and SMILES should leave 196 descriptors
len(int_list) + len(float_list)

196
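
The length check confirms the total count. It does not confirm that the two lists are disjoint, or that together they cover every descriptor. The stricter sketch below uses set arithmetic. `check_partition` is a hypothetical helper, and it assumes that `Label` and `SMILES` are the only non-descriptor columns, as the `dtypes` output suggests.

```python
def check_partition(int_cols, float_cols, all_columns, non_descriptors=("Label", "SMILES")):
    """Raise ValueError unless int_cols and float_cols split the descriptors exactly.

    Hypothetical helper; `non_descriptors` assumes Label and SMILES are the
    only columns that are not molecular descriptors.
    """
    ints, floats = set(int_cols), set(float_cols)
    descriptors = set(all_columns) - set(non_descriptors)
    checks = {
        "in both groups": ints & floats,
        "in neither group": descriptors - (ints | floats),
        "not a descriptor": (ints | floats) - descriptors,
    }
    for problem, cols in checks.items():
        if cols:
            raise ValueError(f"columns {problem}: {sorted(cols)}")
    return True

# Toy stand-ins for df_train.columns, int_list and float_list
all_columns = ["Label", "SMILES", "BalabanJ", "TPSA", "NOCount", "RingCount"]
print(check_partition(["NOCount", "RingCount"], ["BalabanJ", "TPSA"], all_columns))  # True
```

In the notebook itself, the call would be `check_partition(int_list, float_list, df_train.columns)`.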

In [25]:
# Separate train and test into their continuous and discrete parts
df_train_cont = df_train[float_list]
df_train_cont

Unnamed: 0,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,TPSA,VSA_EState10,VSA_EState8,VSA_EState9
0,1.821,1266.407,22.121,16.781,16.781,14.901,9.203,9.203,6.668,6.668,4.719,4.719,3.241,3.241,6.093,9.185,0.0,35.509,22.291,4.900,20.224,41.210,0.000,24.787,11.467,422.162,0.150,-4.28,403.272,1.218475e+07,20.012,7.859,3.564,175.833,14.164,0.452,14.164,0.413,0.018,0.413,-0.668,-0.452,2.443,113.689,422.424,16.204,17.199,23.107,0.000,0.000,6.093,4.900,28.819,5.099,0.000,18.199,18.199,18.808,19.041,13.922,34.450,0.000,24.732,0.000,6.545,30.525,53.976,11.518,16.367,26.508,0.000,0.000,44.983,11.282,5.817,5.563,42.595,0.000,22.552,138.07,0.000,15.727,62.940
1,2.363,490.434,11.707,8.752,9.569,7.592,4.854,5.670,3.545,4.661,2.449,3.735,1.520,2.621,12.073,10.002,0.0,0.000,5.063,14.963,0.000,18.261,30.332,0.000,5.734,236.062,0.182,-1.59,224.200,5.774616e+03,10.912,4.112,1.868,96.858,10.832,0.350,10.832,0.339,0.421,0.339,-0.843,-0.350,2.732,63.449,236.296,5.734,0.000,0.000,0.000,0.000,6.031,5.207,4.795,5.063,11.337,18.199,24.443,9.577,6.042,10.002,27.454,0.000,5.063,5.734,12.966,0.000,35.209,0.000,5.734,4.795,0.000,11.337,16.302,0.000,0.000,17.843,30.332,0.000,10.086,66.56,1.542,0.000,39.292
2,3.551,93.092,6.784,5.471,5.471,3.417,2.420,2.420,2.820,2.820,0.603,0.603,0.387,0.387,5.969,9.901,0.0,6.545,4.483,0.000,0.000,21.143,0.000,0.000,0.000,117.079,0.800,-0.57,106.060,2.300500e+01,7.430,2.132,5.430,49.125,9.888,0.544,9.888,0.118,0.069,0.118,-1.002,-0.544,-1.557,27.906,117.148,14.384,6.545,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,27.112,14.384,5.969,0.000,0.000,0.000,0.000,27.688,0.000,0.000,5.107,0.000,0.000,0.000,38.140,4.795,0.000,0.000,0.000,0.000,0.000,40.13,0.000,0.419,24.248
3,2.076,1053.003,21.836,16.995,16.995,14.274,9.926,9.926,7.662,7.662,4.998,4.998,3.768,3.768,24.598,19.398,0.0,18.280,0.000,27.724,12.133,24.285,24.265,18.415,5.107,411.185,0.292,-2.96,385.265,4.553841e+06,21.742,9.238,4.777,174.259,13.498,0.481,13.498,0.305,0.073,0.305,-1.130,-0.481,4.628,115.634,411.473,19.887,5.817,0.000,0.000,0.000,5.969,4.795,4.390,0.000,0.000,36.408,43.686,34.623,18.629,24.505,22.948,0.000,4.567,0.000,44.939,0.000,66.118,11.127,0.000,4.390,0.000,0.000,38.064,4.795,5.817,38.425,54.607,0.000,28.106,82.69,0.000,15.642,62.025
4,2.888,549.823,14.629,9.746,9.746,8.752,5.040,5.040,3.601,3.601,2.171,2.171,1.232,1.232,28.979,17.966,0.0,5.687,0.000,31.189,6.069,0.000,0.000,5.317,10.368,270.062,0.167,-2.49,261.138,1.013411e+04,14.571,5.436,3.741,106.326,12.300,0.511,12.300,0.416,0.090,0.416,-4.456,-0.511,2.999,61.016,270.210,10.423,11.828,5.573,0.000,5.907,6.176,4.795,0.000,18.433,0.000,0.000,31.189,5.687,5.563,23.072,11.595,5.262,0.000,0.000,13.100,5.317,41.161,6.069,5.317,18.859,0.000,0.000,11.014,10.971,11.331,12.487,35.598,0.000,0.000,73.12,0.000,36.900,30.684
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,2.022,537.932,10.795,9.110,9.110,7.933,5.672,5.672,4.070,4.070,2.981,2.981,2.192,2.192,0.000,0.000,0.0,0.000,0.000,25.346,16.336,0.000,0.000,52.774,0.000,210.116,0.214,-1.83,196.168,1.143541e+04,9.400,3.969,1.668,95.012,4.443,0.372,4.443,0.101,0.917,0.101,0.917,-0.372,2.384,67.957,210.280,5.317,5.836,0.000,0.000,0.000,0.000,4.992,0.000,0.000,0.000,42.465,16.336,12.966,6.545,0.000,16.608,0.000,5.317,4.992,6.421,13.090,48.028,0.000,5.317,0.000,0.000,0.000,18.925,6.421,0.000,5.563,47.457,0.000,10.772,24.39,0.000,0.000,30.667
473,1.602,848.658,17.897,15.202,15.202,12.389,10.003,10.003,9.607,9.607,8.154,8.154,6.938,6.938,17.120,15.008,0.0,17.575,18.591,47.459,12.842,6.066,6.066,7.048,4.737,356.186,0.667,-1.59,330.234,1.889143e+06,15.466,4.448,1.550,152.710,12.800,0.504,12.800,0.174,0.079,0.174,-0.931,-0.504,1.670,94.143,356.442,19.433,11.643,23.386,0.000,0.000,0.000,4.795,0.000,0.000,0.000,6.066,30.892,30.744,25.552,24.227,5.783,0.000,0.000,5.918,61.687,20.137,23.260,11.499,4.737,0.000,11.499,0.000,58.363,16.630,5.918,43.231,12.133,0.000,0.000,66.76,0.000,0.902,55.931
474,1.766,910.031,21.129,14.986,15.802,13.845,8.129,9.178,5.826,6.931,4.077,5.417,2.834,4.161,35.293,24.285,0.0,35.102,0.000,4.900,31.201,6.066,0.000,20.047,10.151,424.069,0.313,-3.48,408.263,2.993395e+06,20.259,8.261,3.893,167.193,12.570,0.477,12.570,0.404,0.140,0.404,-1.353,-0.477,-0.536,97.474,424.391,30.149,30.829,5.760,5.712,11.814,12.063,14.489,9.589,0.000,11.762,5.156,12.133,11.326,6.263,38.276,41.350,0.000,10.217,10.889,11.416,19.470,35.426,0.000,11.050,4.795,0.000,11.762,70.480,23.958,0.000,5.760,39.239,0.000,0.000,173.76,1.213,5.136,78.485
475,1.831,926.191,18.518,13.372,16.396,12.525,7.566,9.078,5.406,7.094,3.543,4.678,2.258,3.239,0.000,0.000,0.0,6.607,32.347,11.127,0.000,48.922,10.764,10.140,51.241,426.981,0.111,-1.72,416.030,9.533438e+05,19.053,8.780,4.921,170.455,6.322,0.391,6.322,0.143,0.203,0.143,0.203,-0.391,6.118,106.477,429.134,9.405,12.319,0.000,0.000,0.000,0.000,0.000,4.984,0.000,0.000,57.626,30.332,38.588,17.895,4.838,52.115,0.000,9.551,5.156,13.152,0.000,86.337,0.000,0.000,0.000,0.000,46.404,15.263,17.989,0.000,11.127,60.275,20.091,0.000,39.41,24.367,1.863,34.381


In [26]:
df_train_discrete = df_train[int_list]
df_train_discrete

Unnamed: 0,HeavyAtomCount,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,RingCount,SMR_VSA8,SlogP_VSA9,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,31,4,10,0,0,0,1,3,4,9,2,11,0,6,0,0,0,158,4,0,0,0,0,0,0,0,0,0,0,0,0,2,0,5,0,0,0,0,1,1,0,0,0,6,0,2,0,0,0,0,0,0,0,0,0,1,0,3,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,16,3,4,0,0,0,1,1,2,3,2,5,0,3,0,0,0,84,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,1,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1
2,8,0,3,0,0,0,0,0,0,2,0,3,0,2,0,0,0,48,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,30,3,5,0,0,0,2,1,3,4,3,6,0,8,0,0,0,158,3,0,0,0,0,0,0,0,0,0,1,2,2,0,0,1,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,19,2,4,0,0,0,1,0,1,3,2,7,0,4,0,0,0,100,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,3,1,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,16,1,2,0,1,1,2,0,2,2,1,2,0,2,0,0,0,80,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
473,26,2,5,3,2,5,1,0,1,4,2,5,0,2,2,1,3,138,6,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
474,29,4,12,0,2,2,0,1,1,9,3,13,0,9,0,1,1,154,3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,4,3,0,0,0,2,1,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0
475,26,0,4,0,0,0,2,1,3,4,0,8,0,6,0,0,0,134,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,4,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
# do the same for test
df_test_cont = df_test[float_list]
df_test_discrete = df_test[int_list]

### Scale and simplify
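Here we scale the descriptors so that features with very different ranges end up on comparable scales for the machine learning models later on. For example, `BertzCT` runs into the thousands, while `FractionCSP3` lies between 0 and 1. Each scaler is fitted on the training set only and then applied unchanged to the test set.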
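
The toy example below (made-up numbers) shows what `StandardScaler` computes. The fitted transform is `(x - mean_) / scale_`, where `mean_` and `scale_` are the training-set mean and population standard deviation (`ddof=0`). As a result, the scaled test data generally do not have mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (illustrative values, not from the DIA dataset)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[5.0], [6.0]])

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training rows
Z_test = scaler.transform(X_test)        # reuses the training mean_ and scale_

# The same numbers by hand: StandardScaler uses the population std (ddof=0)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
assert np.allclose(Z_train, (X_train - mu) / sigma)
assert np.allclose(Z_test, (X_test - mu) / sigma)

print(scaler.mean_, scaler.scale_)  # [2.5] [1.11803399]
print(Z_test.mean())                # far from 0: test data are not re-centered
```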

In [34]:
# Use StandardScaler for continuous values
standard_scaler = StandardScaler()

In [37]:
# fit_transform df_train_cont
df_train_cont_scaled = standard_scaler.fit_transform(df_train_cont.values)
df_train_cont_scaled

array([[-0.45287476,  1.34446538,  0.551264  , ..., -0.28323031,
         0.46629423,  0.39772629],
       [ 0.31032188, -0.63225659, -0.88719363, ...,  0.06627984,
        -0.48429991, -0.53802509],
       [ 1.98315882, -1.64444987, -1.56719429, ..., -0.28323031,
        -0.45897411, -1.13331618],
       ...,
       [-0.53032092,  0.43662932,  0.41424172, ..., -0.00829139,
        -0.17386235,  1.01284196],
       [-0.43879364,  0.47779548,  0.05359136, ...,  5.23980062,
        -0.37169376, -0.73235336],
       [-0.74012959,  2.10608515,  1.71484832, ..., -1.28982859,
         3.6141891 ,  0.01302674]])

In [39]:
# do the same for df_test_cont
df_test_cont_scaled = standard_scaler.transform(df_test_cont.values)
df_test_cont_scaled

array([[-0.92740847,  0.01166007,  0.46079062, ..., -0.28323031,
        -0.48429991,  0.3638148 ],
       [-0.94430582,  0.3319715 ,  0.41576112, ...,  0.06287993,
         1.92932793, -0.76887647],
       [-1.83845687,  1.70771851,  2.90882064, ..., -0.28323031,
        -0.48429991,  2.99521986],
       ...,
       [-0.66549966, -0.36973572, -0.38206212, ...,  0.1383578 ,
        -0.48429991, -0.49129292],
       [ 0.64826874,  0.41711616,  0.59284031, ..., -0.28323031,
        -0.48429991,  1.07279051],
       [ 0.94538035, -1.60188009, -1.05598518, ..., -0.28323031,
        -0.48429991, -1.12334454]])

In [42]:
# use MinMaxScaler for discrete values
minmax_scaler = MinMaxScaler()

In [46]:
# fit_transform df_train_discrete
df_train_discrete_scaled = minmax_scaler.fit_transform(df_train_discrete.values)
df_train_discrete_scaled

array([[0.42857143, 0.25      , 0.43478261, ..., 0.        , 0.        ,
        0.        ],
       [0.19047619, 0.1875    , 0.17391304, ..., 0.5       , 0.        ,
        1.        ],
       [0.06349206, 0.        , 0.13043478, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.3968254 , 0.25      , 0.52173913, ..., 0.        , 0.        ,
        0.        ],
       [0.34920635, 0.        , 0.17391304, ..., 0.        , 0.        ,
        0.        ],
       [0.6031746 , 0.        , 0.30434783, ..., 0.        , 0.        ,
        0.        ]])

In [47]:
# do the same for df_test_discrete
df_test_discrete_scaled = minmax_scaler.transform(df_test_discrete.values)
df_test_discrete_scaled

array([[0.41269841, 0.1875    , 0.13043478, ..., 0.        , 0.        ,
        0.        ],
       [0.41269841, 0.0625    , 0.17391304, ..., 0.        , 0.        ,
        0.        ],
       [0.80952381, 0.375     , 0.60869565, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.28571429, 0.0625    , 0.08695652, ..., 0.5       , 0.        ,
        0.        ],
       [0.41269841, 0.0625    , 0.39130435, ..., 0.        , 0.05555556,
        0.        ],
       [0.15873016, 0.0625    , 0.04347826, ..., 0.        , 0.5       ,
        0.        ]])
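
`MinMaxScaler` maps each column through `(x - data_min_) / (data_max_ - data_min_)`, using the training-set minimum and maximum. The toy numbers below illustrate two consequences:

- Test values outside the training range fall outside [0, 1]. None are visible in the truncated preview above, but that is not guaranteed for every column.
- A column that is constant in training has zero range. In that case scikit-learn substitutes a range of 1, so its test values are only shifted, not rescaled.

If bounded inputs matter downstream, `MinMaxScaler(clip=True)` (scikit-learn 0.24 or later) clamps the output to `feature_range`. Whether clipping suits the quantum pipeline is an assumption to verify.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training counts: column 0 ranges 0..4, column 1 is constant 0 (zero range)
X_train = np.array([[0, 0], [2, 0], [4, 0]], dtype=float)
X_test = np.array([[6, 0], [2, 3]], dtype=float)

scaler = MinMaxScaler().fit(X_train)
# Column 0: 6 -> 1.5 (above the training max); column 1: 3 -> 3.0 (range treated as 1)
print(scaler.transform(X_test).tolist())  # [[1.5, 0.0], [0.5, 3.0]]

# clip=True clamps transformed values to feature_range, here [0, 1]
clipped = MinMaxScaler(clip=True).fit(X_train).transform(X_test)
print(clipped.tolist())  # [[1.0, 0.0], [0.5, 1.0]]
```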

### Combine and reconstitute
Now that we've scaled these quantities separately, let's combine them again!
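
The combination step can be done with `pd.concat(axis=1)`. This minimal sketch makes two assumptions. First, the scaled arrays are wrapped in DataFrames that reuse the original row index, so rows stay aligned. Second, `Label` is carried over unscaled. `recombine` is a hypothetical helper, and the arrays are toy stand-ins for the scaled blocks.

```python
import numpy as np
import pandas as pd

def recombine(cont_scaled, cont_cols, disc_scaled, disc_cols, labels):
    """Join scaled continuous/discrete arrays and the label into one DataFrame."""
    index = labels.index  # reuse the original row index so rows stay aligned
    cont_df = pd.DataFrame(cont_scaled, columns=cont_cols, index=index)
    disc_df = pd.DataFrame(disc_scaled, columns=disc_cols, index=index)
    return pd.concat([labels, cont_df, disc_df], axis=1)

# Toy stand-ins for df_train["Label"], df_train_cont_scaled and df_train_discrete_scaled
labels = pd.Series([0, 1, 0], name="Label")
cont = np.array([[-1.2], [0.0], [1.2]])
disc = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]])

combined = recombine(cont, ["BalabanJ"], disc, ["NOCount", "RingCount"], labels)
print(combined.columns.tolist())  # ['Label', 'BalabanJ', 'NOCount', 'RingCount']
print(combined.shape)             # (3, 4)
```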

In [53]:
# wrap the scaled continuous training array in a DataFrame with the original column names and row index
df_train_cont_scaled_df = pd.DataFrame(
    df_train_cont_scaled, columns=df_train_cont.columns, index=df_train_cont.index
)
df_train_cont_scaled_df

Unnamed: 0,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,TPSA,VSA_EState10,VSA_EState8,VSA_EState9
0,-0.452875,1.344465,0.551264,0.395663,0.309064,0.627503,0.204954,0.062833,-0.001312,-0.175467,-0.046856,-0.216005,-0.134956,-0.295263,-0.525136,-0.384981,-0.181361,1.395076,0.554831,-0.812459,0.507059,2.249669,-0.893830,0.593069,0.084085,0.435157,-1.274949,-2.173874,0.494435,-0.077296,0.224120,0.054235,-0.073153,0.460355,1.022116,0.090102,1.022116,1.398748,-0.410428,1.551494,0.396527,-0.140761,0.042382,0.514900,0.433896,0.012270,0.836808,2.541196,-0.547020,-0.461025,0.310042,-0.363430,4.605039,0.582421,-0.362256,-0.293023,-0.790225,0.005257,0.500367,-0.350813,1.067887,-0.190321,2.397158,-0.706031,-0.991136,1.069443,0.578333,1.199439,1.377778,3.798498,-0.496764,-0.536573,0.116134,-0.150281,-0.197533,-1.047197,0.565206,-0.322665,2.851332,0.972146,-0.283230,0.466294,0.397726
1,0.310322,-0.632257,-0.887194,-0.923257,-0.867786,-0.887752,-0.954128,-0.864476,-0.953691,-0.776947,-0.874703,-0.567973,-0.887579,-0.560062,-0.263887,-0.305248,-0.181361,-1.140124,-0.755872,-0.247407,-0.915263,0.516790,0.771000,-0.948309,-0.452991,-0.848802,-1.152678,0.324123,-0.825238,-0.077299,-0.936127,-0.884150,-0.090552,-0.870688,-0.235308,-0.957155,-0.235308,0.715441,0.315142,0.817511,0.283780,0.874895,0.156859,-0.809262,-0.849028,-0.875052,-0.898272,-0.599476,-0.547020,-0.461025,0.299272,-0.319031,0.145245,0.575623,1.532125,-0.293023,-0.541730,-0.626900,-0.622577,-0.618204,0.591976,-0.190321,-0.248317,-0.136601,-0.797708,-0.948979,-0.131601,-0.591507,-0.152654,0.184222,-0.496764,0.540251,-0.902692,-1.186703,-0.759979,-0.572578,-0.003722,-0.322665,0.995063,-0.339273,0.066280,-0.484300,-0.538025
2,1.983159,-1.644450,-1.567194,-1.462226,-1.536495,-1.753286,-1.602831,-1.717507,-1.174784,-1.328678,-1.547920,-1.688262,-1.383059,-1.514193,-0.530553,-0.315105,-0.181361,-0.672837,-0.799998,-1.087602,-0.915263,0.734410,-0.893830,-0.948309,-0.990161,-1.669700,1.208687,1.271318,-1.695872,-0.077299,-1.380081,-1.380014,-0.054011,-1.675180,-0.591554,1.034686,-0.591554,-1.325244,-0.318607,-1.374517,0.181342,-1.056843,-1.542071,-1.746059,-1.670280,-0.141973,-0.237995,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,-0.744894,-0.380377,-0.362256,-1.138449,-1.514497,-1.282752,1.197596,-0.319299,-0.869567,-0.190321,-0.929289,-0.706031,-1.188299,0.881851,-1.463516,-0.591507,-0.242899,-0.613938,-0.496764,-0.536573,-0.126947,-0.746210,-0.759979,-1.262206,-1.410942,-0.322665,-0.506809,-0.823971,-0.283230,-0.458974,-1.133316
3,-0.093806,0.800838,0.511898,0.430817,0.343985,0.497517,0.397646,0.252599,0.301815,0.122425,0.054892,-0.116209,0.095510,-0.070183,0.283293,0.611727,-0.181361,0.164995,-1.141063,0.469142,-0.061968,0.971662,0.438001,0.196827,-0.511729,0.359423,-0.732370,-0.948091,0.361732,-0.077298,0.444694,0.399587,-0.060710,0.433827,0.770782,0.387851,0.770782,0.401489,-0.311405,0.480276,0.098875,-0.429526,0.907890,0.566164,0.358414,0.324401,-0.311438,-0.599476,-0.547020,-0.461025,0.288501,-0.378616,0.070061,-0.380377,-0.362256,0.552867,0.224091,1.088299,0.464775,0.371076,0.285450,-0.190321,-0.315029,-0.706031,0.165452,-0.948979,1.037651,1.138642,-0.977961,0.116807,-0.496764,-0.536573,-0.129647,-0.746210,-0.197533,0.222912,1.122490,-0.322665,3.678359,-0.043465,-0.283230,0.461157,0.361520
4,1.049581,-0.480968,-0.483586,-0.759973,-0.838903,-0.647268,-0.904556,-1.029833,-0.936613,-1.094620,-0.976086,-1.127402,-1.013526,-1.153297,0.474686,0.471975,-0.181361,-0.734095,-1.141063,0.663707,-0.488440,-0.862098,-0.893830,-0.617672,-0.018871,-0.614226,-1.209993,-0.511638,-0.553023,-0.077299,-0.469606,-0.552572,-0.071338,-0.711115,0.318683,0.695868,0.318683,1.426449,-0.280798,1.581250,-2.043958,-0.728248,0.262622,-0.873388,-0.615269,-0.477664,0.294968,0.157998,-0.547020,1.298737,0.324460,-0.378616,-0.744894,3.100159,-0.362256,-1.138449,-0.273257,-0.893295,-0.663957,0.273328,-0.486851,4.342494,-0.929289,-0.706031,-0.793672,-0.597400,0.093556,0.352168,-0.212673,2.525270,-0.496764,-0.536573,-1.090536,-0.178851,0.335615,-0.779586,0.240588,-0.322665,-0.506809,-0.218969,-0.283230,1.746063,-0.878644
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,-0.169844,-0.511260,-1.013166,-0.864449,-0.942685,-0.817058,-0.736117,-0.863952,-0.793589,-0.954065,-0.680687,-0.837672,-0.593701,-0.743286,-0.791321,-1.281364,-0.181361,-1.140124,-1.141063,0.335614,0.233622,-0.862098,-0.893830,2.333439,-0.990161,-1.027811,-1.030407,0.101253,-1.031820,-0.077299,-1.128907,-0.919962,-0.092603,-0.901801,-2.646379,-0.731276,-2.646379,-1.482220,1.208151,-1.543135,1.417691,0.655832,0.019012,-0.690446,-1.028349,-0.910393,-0.309521,-0.599476,-0.547020,-0.461025,-0.748403,-0.350125,-0.744894,-0.380377,-0.362256,0.834242,-0.864367,-0.394815,-0.579125,-1.300462,-0.145836,-0.190321,-0.214154,-0.210287,-0.994871,-0.083421,0.353327,-0.591507,-0.212673,-0.613938,-0.496764,-0.536573,-0.809516,-0.596837,-0.759979,-1.047197,0.790773,-0.322665,1.097213,-1.112627,-0.283230,-0.484300,-0.879316
473,-0.761251,0.280287,-0.032186,0.136281,0.051404,0.106731,0.418168,0.272810,0.894955,0.705324,1.205854,1.012664,1.481805,1.283709,-0.043399,0.183298,-0.181361,0.114661,0.273335,1.577292,-0.012105,-0.404055,-0.560886,-0.510029,-0.546392,-0.020031,0.700497,0.324123,-0.043819,-0.077299,-0.355494,-0.800003,-0.093814,0.070641,0.507372,0.623997,0.507372,-0.808148,-0.300603,-0.819071,0.227085,-0.658546,-0.263813,-0.000269,-0.020898,0.285925,0.276304,2.579117,-0.547020,-0.461025,-0.748403,-0.378616,-0.744894,-0.380377,-0.362256,-0.856656,-0.285077,0.822658,1.062832,0.352113,-0.882220,-0.190321,-0.929289,-0.118328,0.669972,0.382552,-0.583618,1.196485,-0.296154,-0.613938,1.906586,-0.536573,0.591428,0.341014,-0.187768,0.408663,-0.848045,-0.322665,-0.506809,-0.335605,-0.283230,-0.429780,0.120380
474,-0.530321,0.436629,0.414242,0.100799,0.149312,0.408580,-0.081285,0.056271,-0.258085,-0.096649,-0.280987,0.033663,-0.312944,0.097665,0.750526,1.088659,-0.181361,1.366018,-1.141063,-0.812459,1.279054,-0.404055,-0.893830,0.298312,-0.039200,0.448313,-0.652130,-1.430975,0.531216,-0.077298,0.255612,0.154910,-0.069778,0.314737,0.420575,0.346782,0.420575,1.315643,-0.190777,1.462226,-0.044796,-0.389696,-1.137640,0.087526,0.447454,1.194096,2.211838,0.183415,0.506104,3.058500,1.347120,1.023351,1.035199,-0.380377,1.603141,-0.898930,-1.031636,-0.507125,-0.603486,1.310425,1.537268,-0.190321,0.444895,0.375330,-0.844401,0.338447,-0.123392,-0.591507,0.612490,0.184222,-0.496764,0.580619,1.021856,1.014201,-0.759979,-1.039583,0.409508,-0.322665,-0.506809,1.626664,-0.008291,-0.173862,1.012842
475,-0.438794,0.477795,0.053591,-0.164332,0.246240,0.134926,-0.231334,0.030024,-0.386167,-0.047799,-0.475732,-0.230670,-0.564839,-0.296117,-0.791321,-1.281364,-0.181361,-0.668410,1.319889,-0.462804,-0.915263,2.832002,-0.303028,-0.317754,3.810173,0.468404,-1.423968,0.203402,0.588455,-0.077299,0.101847,0.284887,-0.059233,0.369715,-1.937285,-0.536199,-1.937285,-1.094398,-0.077350,-1.126550,0.957684,0.466641,1.498100,0.324815,0.480146,-0.563938,0.344501,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,0.180331,-0.380377,-0.362256,1.538539,-0.307363,1.359830,0.401367,-0.970452,2.269570,-0.190321,0.355318,-0.194001,-0.792105,-0.948979,1.802512,-0.591507,-0.977961,-0.613938,-0.496764,3.871027,-0.939600,0.465858,-0.759979,-0.832150,1.385450,7.102301,-0.506809,-0.837175,5.239801,-0.371694,-0.732353
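
If the continuous block was standardized (the negative, roughly zero-centred values above suggest `StandardScaler`, which is imported at the top of the notebook), every column of the training table should have a mean close to 0 and a population standard deviation close to 1. A minimal sketch of such a check follows; `off_standard_columns` is a hypothetical helper and the toy data stands in for the real descriptors. The test block is not expected to pass this check, because it is transformed with the training statistics.

```python
import pandas as pd

def off_standard_columns(df, tol=1e-6):
    """Return the columns whose mean is not ~0 or whose population std is not ~1.
    Constant columns (std 0) are flagged as well."""
    means = df.mean()
    stds = df.std(ddof=0)  # StandardScaler divides by the population std (ddof=0)
    bad = (means.abs() > tol) | ((stds - 1).abs() > tol)
    return bad[bad].index.tolist()

# toy data standing in for the continuous descriptors (hypothetical values)
raw = pd.DataFrame({"MolLogP": [1.0, 2.0, 3.0, 4.0], "TPSA": [20.0, 40.0, 60.0, 80.0]})
z = (raw - raw.mean()) / raw.std(ddof=0)

print(off_standard_columns(z))    # standardized columns: nothing flagged
print(off_standard_columns(raw))  # raw columns: both flagged
```

In the notebook this would be called as `off_standard_columns(df_train_cont_scaled_df)`; an empty list is what a correctly fitted standardization should produce.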


In [54]:
# do the same for the scaled continuous test array
df_test_cont_scaled_df = pd.DataFrame(df_test_cont_scaled, columns=df_test_cont.columns)
df_test_cont_scaled_df

Unnamed: 0,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,TPSA,VSA_EState10,VSA_EState8,VSA_EState9
0,-0.927408,0.011660,0.460791,0.721410,0.632649,0.501249,0.978653,0.824786,1.239251,1.043675,1.343707,1.147872,1.345362,1.150456,-0.257990,0.213746,-0.181361,-0.704323,2.048809,-0.101188,2.186080,-0.862098,-0.893830,1.833223,-0.990161,0.367102,0.841873,0.723430,0.266143,-0.077297,0.483836,0.280629,-0.064608,0.561412,-0.463622,-0.515664,-0.463622,-1.666898,0.007270,-1.741508,0.426808,0.446727,1.091291,0.727625,0.366279,-0.062648,-0.898272,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,-0.744894,-0.380377,-0.362256,1.198399,2.130903,-0.843029,0.437391,-0.255452,-1.275615,-0.190321,-0.929289,2.182529,1.521281,-0.948979,0.337212,-0.591507,-0.977961,-0.613938,-0.496764,-0.536573,-0.287119,-1.186703,2.052441,1.506471,0.797500,-0.322665,-0.506809,-0.446922,-0.283230,-0.484300,0.363815
1,-0.944306,0.331971,0.415761,0.388271,0.435039,0.534627,0.488795,0.556540,0.301815,0.447290,0.307258,0.535504,0.230641,0.521344,-0.278435,0.004021,-0.181361,-0.668410,0.287486,1.979562,0.765235,-0.404055,0.706947,-0.338897,-0.511729,0.538736,-0.109551,0.184829,0.553767,-0.077298,0.412564,0.384310,-0.060772,0.510698,0.700213,-0.279518,0.700213,1.426449,-0.107957,1.372958,-1.976310,0.426812,0.781134,0.512501,0.538024,-0.097734,-0.898272,-0.599476,-0.547020,-0.461025,0.324460,-0.363430,-0.744894,2.106583,-0.362256,-0.028418,0.208610,2.076973,0.889454,-0.053681,0.298307,-0.190321,0.388808,-0.706031,-0.513878,2.841224,0.353327,-0.591507,-0.272693,3.471910,-0.496764,0.580619,0.909853,-0.619344,-0.759979,-0.799027,1.013418,-0.322665,-0.506809,-1.010662,0.062880,1.929328,-0.768876
2,-1.838457,1.707719,2.908821,3.044182,2.940006,2.930757,3.333331,3.143715,3.703299,3.465178,4.049707,3.801941,4.028740,3.771113,3.180967,2.176713,-0.181361,3.222955,0.286193,1.388567,-0.915263,0.642288,-0.133811,-0.517740,2.116130,2.906952,1.693951,0.621282,2.802308,0.264547,2.837225,1.862886,-0.038419,2.946133,0.419820,0.151705,0.419820,0.641570,-0.361817,0.738162,0.174899,-0.200505,-0.046743,2.595025,2.905102,4.045733,1.615622,1.965309,-0.547020,-0.461025,0.288501,-1.072078,0.145245,-0.380377,-0.362256,-0.495193,2.735783,0.823274,3.557774,3.378344,-0.869567,-0.190321,-0.929289,2.720280,4.742008,-0.512100,-1.022848,-0.591507,-0.977961,-0.613938,-0.496764,-0.536573,3.307741,2.299853,2.576017,2.805607,-0.870499,-0.322665,-0.506809,2.163995,-0.283230,-0.484300,2.995220
3,0.370871,-0.298892,-0.594226,-0.669461,-0.748991,-0.576782,-0.689477,-0.818019,-0.748760,-0.910010,-0.731379,-0.887392,-0.722273,-0.868852,-0.533262,-0.345554,-0.181361,-0.268807,0.561449,-1.087602,-0.915263,2.344434,-0.560886,-0.948309,0.084085,-0.724311,-1.592091,-0.632359,-0.707495,-0.077299,-0.691965,-0.573859,-0.081032,-0.639317,0.322079,-0.464328,0.322079,-0.374156,-0.388823,-0.352893,0.520226,0.396940,-0.320061,-0.541003,-0.725008,-0.389186,-0.898272,0.186541,0.542057,-0.461025,-0.748403,0.314702,-0.744894,-0.380377,-0.362256,0.834242,-1.051654,-0.131296,-0.589837,-0.646376,-0.093456,-0.190321,-0.929289,-0.136601,-0.994871,-0.569826,1.003718,-0.591507,0.672510,0.332701,-0.496764,-0.536573,-1.066522,-0.156436,-0.759979,-0.431815,0.840600,-0.322665,-0.506809,0.020538,-0.283230,-0.484300,-0.134095
4,-1.158339,3.539286,2.738372,2.706608,2.737990,2.909818,2.567095,2.664443,2.518238,2.639230,2.705460,3.010370,2.818683,3.311986,1.265508,1.149556,-0.181361,1.890135,4.199888,-0.150433,1.398894,0.054065,0.854922,-0.312965,1.672361,2.774706,0.012721,-2.294595,2.817556,0.306693,2.173589,1.232538,-0.061069,2.818582,1.217598,0.623997,1.217598,0.641570,-0.401426,0.738162,-0.048018,-0.658546,0.426613,2.646895,2.773458,2.346006,0.976530,4.842163,0.705409,-0.461025,1.325580,1.807491,0.145245,-0.380377,1.603141,-0.856656,1.196620,1.971305,2.103191,2.337290,0.336606,-0.190321,1.103943,-0.706031,1.064900,2.156515,0.908616,5.666542,3.196219,-0.613938,7.915066,0.580619,2.153396,1.817929,0.578887,1.395547,-0.566619,-0.322665,-0.506809,1.534235,0.049507,-0.484300,2.831637
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,-0.893614,0.989616,1.136924,0.912127,0.955418,1.057057,0.803818,0.883579,0.770838,1.114103,0.624903,1.004795,0.480787,0.997982,1.999232,2.024469,-0.181361,-0.205550,-0.219890,-0.812459,0.601159,2.996690,-0.893830,-0.287034,-0.990161,1.083642,-0.021668,-1.551696,1.136143,-0.077282,0.921543,0.452429,-0.065891,1.051105,0.698703,1.075755,0.698703,0.586167,-0.396025,0.678650,-0.055749,-1.096672,-1.549201,0.805403,1.082720,1.209775,0.862937,-0.599476,1.631133,-0.461025,2.351540,2.410131,0.145245,-0.380377,1.603141,0.270609,-0.466433,0.386907,-0.106933,1.010155,1.948214,-0.190321,2.477993,-0.706031,0.288329,0.349291,-0.105650,-0.591507,1.287533,0.184222,-0.496764,0.580619,1.792272,1.015579,-0.759979,-0.010842,-0.003722,-0.322665,-0.506809,1.360749,-0.001718,-0.484300,1.838114
116,0.753877,-0.331759,-0.909708,-0.933606,-1.011384,-0.861216,-0.977848,-1.102013,-0.998214,-1.155157,-0.910807,-1.063376,-0.880144,-1.023033,-0.791321,-0.813410,-0.181361,-0.345986,-1.141063,-0.143077,-0.915263,1.054648,0.105221,-0.328449,-0.497209,-1.021194,-1.530955,-0.437348,-0.987721,-0.077299,-1.037107,-0.914954,-0.092193,-0.948992,-0.034543,-1.213835,-0.034543,0.041369,-0.221384,0.093448,0.602048,1.123831,-0.284807,-0.906861,-1.021842,-0.938614,0.275296,-0.599476,-0.547020,1.195064,-0.748403,0.342181,-0.744894,0.613199,-0.362256,-1.138449,-0.293275,0.336915,-1.144527,-1.300462,-1.275615,4.342494,0.411404,-0.706031,-0.979719,-0.948979,0.511300,2.082317,-0.177842,-0.613938,-0.496764,-0.536573,-1.127692,-1.186703,1.005096,-0.827125,0.230799,-0.322665,1.150074,-0.284623,-0.283230,-0.484300,-0.457263
117,-0.665500,-0.369736,-0.382062,-0.177145,-0.126788,-0.227873,-0.025583,0.066507,-0.151045,-0.013335,-0.059620,0.106275,0.001050,0.135677,-0.791321,-0.782961,-0.181361,-1.140124,-0.244007,0.722049,1.122151,-0.862098,0.394366,1.253224,-0.990161,-0.303040,-0.036952,0.574851,-0.338246,-0.077299,-0.342106,-0.086761,-0.073523,-0.194776,-0.548910,0.665066,-0.548910,-1.316010,0.426768,-1.364599,1.138079,-0.698376,0.764497,-0.009125,-0.303223,-0.928190,-0.318197,-0.599476,-0.547020,-0.461025,-0.748403,-0.363430,-0.744894,-0.380377,1.532125,0.028628,1.163429,-0.086783,-1.144527,-0.952103,-0.504402,-0.190321,-0.270240,-0.706031,0.169398,-0.083421,0.492840,0.302566,-0.977961,-0.613938,0.705015,0.540251,-0.446722,0.582803,-0.759979,0.120258,0.245877,-0.322665,-0.506809,-1.129498,0.138358,-0.484300,-0.491293
118,0.648269,0.417116,0.592840,0.543834,0.456252,0.475335,0.275848,0.132650,0.121281,-0.054992,-0.134746,-0.302209,-0.127084,-0.287575,0.203870,0.641492,-0.181361,1.440912,0.149097,-1.087602,0.864679,1.687195,-0.893830,-0.617672,0.341147,0.407642,-0.208896,-1.170961,0.413120,-0.077298,0.615671,0.594677,-0.056012,0.435985,0.548129,0.172240,0.548129,0.696973,-0.388823,0.797674,0.258654,-0.220420,0.251530,0.376448,0.406477,0.293892,-0.231740,-0.599476,-0.547020,1.233197,1.325580,0.390629,1.035199,-0.380377,-0.362256,-0.574816,-0.190916,0.815330,1.852928,0.658795,-0.076586,-0.190321,-0.214154,-0.706031,0.008144,0.394917,0.900181,-0.591507,-0.212673,0.332701,-0.496764,-0.536573,0.055959,0.999686,0.217943,0.251938,0.760571,-0.322665,-0.506809,0.585744,-0.283230,-0.484300,1.072791
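
Each conversion cell in this section wraps a bare scaled array and then re-attaches the column names. Below is a minimal sketch of a reusable helper, assuming each scaled output is a plain 2-D array with the same shape as its source DataFrame. The name `to_scaled_frame` is hypothetical. It also copies the source's row `index`. That matters because `pd.concat(..., axis=1)` matches rows by index label, not by position. In this notebook both pieces appear to share a default `0..n-1` index, so the positional wrap works, but passing the index explicitly guards against silent misalignment.

```python
import numpy as np
import pandas as pd

def to_scaled_frame(scaled, source):
    """Wrap a scaled 2-D array in a DataFrame that reuses the column
    labels and the row index of the unscaled `source` DataFrame."""
    if scaled.shape != source.shape:
        raise ValueError(f"shape mismatch: {scaled.shape} vs {source.shape}")
    return pd.DataFrame(scaled, columns=source.columns, index=source.index)

# toy frames with a non-default index (hypothetical values)
source = pd.DataFrame({"MolWt": [150.0, 300.0], "TPSA": [20.0, 60.0]}, index=[10, 11])
scaled = np.array([[-1.0, -1.0], [1.0, 1.0]])
labels = pd.DataFrame({"Label": [0, 1]}, index=[10, 11])

# keeping the index: concat(axis=1) lines the rows up by label
aligned = pd.concat([labels, to_scaled_frame(scaled, source)], axis=1)

# dropping the index: the bare frame gets labels 0, 1, which share nothing with 10, 11
bare = pd.DataFrame(scaled, columns=source.columns)
misaligned = pd.concat([labels, bare], axis=1)

print(aligned.shape, misaligned.shape)
```

With this helper, each cell would reduce to a single call such as `to_scaled_frame(df_test_cont_scaled, df_test_cont)` (hypothetical usage).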


In [55]:
# convert the scaled discrete train array back into a DataFrame with its column names
df_train_discrete_scaled_df = pd.DataFrame(df_train_discrete_scaled, columns=df_train_discrete.columns)
df_train_discrete_scaled_df

Unnamed: 0,HeavyAtomCount,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,RingCount,SMR_VSA8,SlogP_VSA9,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,0.428571,0.2500,0.434783,0.0,0.000000,0.000000,0.333333,0.75,0.8,0.473684,0.166667,0.416667,0.0,0.214286,0.0,0.0,0.000000,0.385057,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,1.0,0.0,0.833333,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.857143,0.000000,0.50,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.1,0.0,1.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.1,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.190476,0.1875,0.173913,0.0,0.000000,0.000000,0.333333,0.25,0.4,0.157895,0.166667,0.166667,0.0,0.107143,0.0,0.0,0.000000,0.172414,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.25,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.2,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,1.0
2,0.063492,0.0000,0.130435,0.0,0.000000,0.000000,0.000000,0.00,0.0,0.105263,0.000000,0.083333,0.0,0.071429,0.0,0.0,0.000000,0.068966,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.412698,0.1875,0.217391,0.0,0.000000,0.000000,0.666667,0.25,0.6,0.210526,0.250000,0.208333,0.0,0.285714,0.0,0.0,0.000000,0.385057,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.250,0.250,0.0,0.0,0.166667,0.0,0.00,0.2,0.2,0.1,0.0,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.238095,0.1250,0.173913,0.0,0.000000,0.000000,0.333333,0.00,0.2,0.157895,0.166667,0.250000,0.0,0.142857,0.0,0.0,0.000000,0.218391,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.142857,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.083333,0.1,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,0.190476,0.0625,0.086957,0.0,0.166667,0.142857,0.666667,0.00,0.4,0.105263,0.083333,0.041667,0.0,0.071429,0.0,0.0,0.000000,0.160920,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.142857,0.142857,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,1.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
473,0.349206,0.1250,0.217391,0.6,0.333333,0.714286,0.333333,0.00,0.2,0.210526,0.166667,0.166667,0.0,0.071429,0.4,0.2,0.500000,0.327586,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.000,0.0,0.0,0.000000,0.0,0.25,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,1.0,1.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,0.0,0.0,0.00,0.1,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.5,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.333333,0.333333,0.0,0.0,0.25,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
474,0.396825,0.2500,0.521739,0.0,0.333333,0.285714,0.000000,0.25,0.2,0.473684,0.250000,0.500000,0.0,0.321429,0.0,0.2,0.166667,0.373563,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.4,0.3,0.0,0.0,0.00,0.285714,0.142857,0.25,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.3,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.090909,0.0,0.0,0.0,0.00,0.1,1.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
475,0.349206,0.0000,0.173913,0.0,0.000000,0.000000,0.666667,0.25,0.6,0.210526,0.000000,0.291667,0.0,0.214286,0.0,0.0,0.000000,0.316092,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.333333,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.428571,0.000000,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.500,0.0,0.0,0.5,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [56]:
# convert the scaled discrete test array back into a DataFrame with its column names
df_test_discrete_scaled_df = pd.DataFrame(df_test_discrete_scaled, columns=df_test_discrete.columns)
df_test_discrete_scaled_df

Unnamed: 0,HeavyAtomCount,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,RingCount,SMR_VSA8,SlogP_VSA9,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,0.412698,0.1875,0.130435,0.8,0.000000,0.571429,0.000000,0.00,0.0,0.157895,0.250000,0.083333,0.0,0.178571,0.8,0.0,0.666667,0.408046,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.375,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.333333,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
1,0.412698,0.0625,0.173913,0.0,0.333333,0.285714,0.666667,0.00,0.4,0.263158,0.083333,0.291667,0.0,0.250000,0.0,0.2,0.166667,0.396552,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.428571,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.000000,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.666667,0.0,0.181818,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
2,0.809524,0.3750,0.608696,0.8,0.666667,1.142857,0.000000,0.00,0.0,0.736842,0.500000,0.541667,0.0,0.250000,0.8,0.6,1.166667,0.827586,0.888889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.750,0.625,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.454545,0.0,0.0,0.0,0.25,0.7,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
3,0.238095,0.2500,0.173913,0.0,0.000000,0.000000,0.666667,0.00,0.4,0.157895,0.166667,0.125000,0.0,0.142857,0.0,0.0,0.000000,0.206897,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.5,0.0,0.000000,0.0,0.00,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.000000,0.000000,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.1,0.0,0.333333,0.0,0.0,0.0,0.0,0.666667,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
4,0.793651,0.2500,0.608696,0.0,1.166667,1.000000,1.000000,0.00,0.6,0.789474,0.333333,0.583333,0.0,0.142857,0.0,0.4,0.333333,0.752874,1.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.50,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.285714,0.142857,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.2,0.0,0.0,0.0,1.000000,0.0,0.818182,0.0,0.0,0.0,0.50,0.6,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.666667,0.666667,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,0.507937,0.1250,0.521739,0.0,0.500000,0.428571,0.333333,0.00,0.2,0.421053,0.166667,0.500000,0.0,0.321429,0.0,0.6,0.500000,0.482759,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.6,0.6,0.0,0.0,0.0,0.428571,0.285714,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.6,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.5,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,1.0
116,0.190476,0.0625,0.173913,0.0,0.000000,0.000000,0.000000,0.50,0.4,0.157895,0.083333,0.125000,0.0,0.035714,0.0,0.0,0.000000,0.155172,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.333333,0.5,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.142857,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.2,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
117,0.285714,0.0625,0.086957,0.2,0.000000,0.142857,0.333333,0.25,0.4,0.157895,0.083333,0.083333,0.0,0.214286,0.0,0.0,0.000000,0.270115,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.000000,0.0
118,0.412698,0.0625,0.391304,0.0,0.166667,0.142857,0.333333,0.00,0.2,0.421053,0.083333,0.333333,0.0,0.357143,0.0,0.0,0.000000,0.396552,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.166667,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,1.0,0.0,0.50,0.3,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.166667,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0
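
Some test-set entries in the table above fall outside [0, 1]. For example, row 4 has 1.166667 under `NumAliphaticHeterocycles`. This is consistent with the min-max scaler having been fit on the training set and then only applied to the test set. The test data is scaled with the training minimum and maximum, so any test value beyond the training range maps outside [0, 1]. The sketch below reproduces the effect on hypothetical toy counts, using the formula `MinMaxScaler` applies with its default `feature_range=(0, 1)`.

```python
import pandas as pd

# toy counts for one discrete descriptor (hypothetical values)
train = pd.Series([0, 1, 2, 6], name="NumAliphaticHeterocycles")
test = pd.Series([1, 7], name="NumAliphaticHeterocycles")

# the scaling parameters come from the training data only
train_min, train_max = train.min(), train.max()

def minmax(x):
    # same formula MinMaxScaler uses with feature_range=(0, 1)
    return (x - train_min) / (train_max - train_min)

train_scaled = minmax(train)
test_scaled = minmax(test)

print(train_scaled.tolist())  # within [0, 1] by construction
print(test_scaled.tolist())   # 7 exceeds the training max of 6, so it maps above 1
```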


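Before recombining, a quick check can confirm that the continuous and discrete groups cover the descriptor columns exactly once each. Then the column selection by `df_train.columns[2:]` neither drops nor duplicates a feature, and no rows come back as NaN. The sketch below uses toy frames; `check_partition` is a hypothetical helper, and it assumes, as the notebook does, that the first two columns of the original frame are `Label` and `SMILES`.

```python
import pandas as pd

def check_partition(feature_cols, *groups):
    """True if the column groups together cover feature_cols exactly once each."""
    combined = [c for g in groups for c in g]
    no_duplicates = len(combined) == len(set(combined))
    same_set = set(combined) == set(feature_cols)
    return no_duplicates and same_set

# toy frames standing in for the real descriptor groups (hypothetical values)
full = pd.DataFrame(columns=["Label", "SMILES", "BalabanJ", "HeavyAtomCount", "TPSA"])
cont = pd.DataFrame({"BalabanJ": [0.1, -0.2], "TPSA": [1.0, 0.5]})
disc = pd.DataFrame({"HeavyAtomCount": [0.4, 0.2]})

features = full.columns[2:]  # drop Label and SMILES
print(check_partition(features, cont.columns, disc.columns))

# concatenate column-wise, then restore the original feature order
combined = pd.concat([disc, cont], axis=1)[features]
print(combined.isna().any().any())  # any NaN would signal misaligned row indices
```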
In [60]:
# combine the scaled discrete and continuous train features column-wise
df_train_scaled_df = pd.concat([df_train_discrete_scaled_df, df_train_cont_scaled_df], axis=1)

# restore the original descriptor order (columns 0 and 1 of df_train are Label and SMILES)
df_train_scaled_df = df_train_scaled_df[df_train.columns[2:]]
df_train_scaled_df

Unnamed: 0,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomCount,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,RingCount,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA8,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,SlogP_VSA9,TPSA,VSA_EState1,VSA_EState10,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,VSA_EState8,VSA_EState9,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,-0.452875,1.344465,0.551264,0.395663,0.309064,0.627503,0.204954,0.062833,-0.001312,-0.175467,-0.046856,-0.216005,-0.134956,-0.295263,-0.525136,-0.384981,-0.181361,1.395076,0.554831,-0.812459,0.507059,2.249669,-0.893830,0.593069,0.084085,0.435157,-1.274949,-2.173874,0.428571,0.494435,-0.077296,0.224120,0.054235,-0.073153,0.460355,1.022116,0.090102,1.022116,1.398748,-0.410428,1.551494,0.396527,-0.140761,0.042382,0.514900,0.433896,0.2500,0.434783,0.0,0.000000,0.000000,0.333333,0.75,0.8,0.473684,0.166667,0.416667,0.0,0.214286,0.0,0.0,0.000000,0.385057,0.012270,0.836808,2.541196,-0.547020,-0.461025,0.310042,-0.363430,4.605039,0.582421,-0.362256,-0.293023,-0.790225,0.005257,0.500367,0.444444,-0.350813,1.067887,-0.190321,2.397158,-0.706031,-0.991136,1.069443,0.578333,0.0,1.199439,1.377778,3.798498,-0.496764,-0.536573,0.116134,-0.150281,-0.197533,-1.047197,0.565206,-0.322665,2.851332,0.0,0.972146,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,0.466294,0.397726,0.0,0.000,0.000,1.0,0.0,0.833333,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.857143,0.000000,0.50,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.1,0.0,1.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.1,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.310322,-0.632257,-0.887194,-0.923257,-0.867786,-0.887752,-0.954128,-0.864476,-0.953691,-0.776947,-0.874703,-0.567973,-0.887579,-0.560062,-0.263887,-0.305248,-0.181361,-1.140124,-0.755872,-0.247407,-0.915263,0.516790,0.771000,-0.948309,-0.452991,-0.848802,-1.152678,0.324123,0.190476,-0.825238,-0.077299,-0.936127,-0.884150,-0.090552,-0.870688,-0.235308,-0.957155,-0.235308,0.715441,0.315142,0.817511,0.283780,0.874895,0.156859,-0.809262,-0.849028,0.1875,0.173913,0.0,0.000000,0.000000,0.333333,0.25,0.4,0.157895,0.166667,0.166667,0.0,0.107143,0.0,0.0,0.000000,0.172414,-0.875052,-0.898272,-0.599476,-0.547020,-0.461025,0.299272,-0.319031,0.145245,0.575623,1.532125,-0.293023,-0.541730,-0.626900,-0.622577,0.222222,-0.618204,0.591976,-0.190321,-0.248317,-0.136601,-0.797708,-0.948979,-0.131601,0.0,-0.591507,-0.152654,0.184222,-0.496764,0.540251,-0.902692,-1.186703,-0.759979,-0.572578,-0.003722,-0.322665,0.995063,0.0,-0.339273,0.0,0.066280,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,-0.538025,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.25,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.2,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,1.0
2,1.983159,-1.644450,-1.567194,-1.462226,-1.536495,-1.753286,-1.602831,-1.717507,-1.174784,-1.328678,-1.547920,-1.688262,-1.383059,-1.514193,-0.530553,-0.315105,-0.181361,-0.672837,-0.799998,-1.087602,-0.915263,0.734410,-0.893830,-0.948309,-0.990161,-1.669700,1.208687,1.271318,0.063492,-1.695872,-0.077299,-1.380081,-1.380014,-0.054011,-1.675180,-0.591554,1.034686,-0.591554,-1.325244,-0.318607,-1.374517,0.181342,-1.056843,-1.542071,-1.746059,-1.670280,0.0000,0.130435,0.0,0.000000,0.000000,0.000000,0.00,0.0,0.105263,0.000000,0.083333,0.0,0.071429,0.0,0.0,0.000000,0.068966,-0.141973,-0.237995,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,-0.744894,-0.380377,-0.362256,-1.138449,-1.514497,-1.282752,1.197596,0.000000,-0.319299,-0.869567,-0.190321,-0.929289,-0.706031,-1.188299,0.881851,-1.463516,0.0,-0.591507,-0.242899,-0.613938,-0.496764,-0.536573,-0.126947,-0.746210,-0.759979,-1.262206,-1.410942,-0.322665,-0.506809,0.0,-0.823971,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.458974,-1.133316,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,-0.093806,0.800838,0.511898,0.430817,0.343985,0.497517,0.397646,0.252599,0.301815,0.122425,0.054892,-0.116209,0.095510,-0.070183,0.283293,0.611727,-0.181361,0.164995,-1.141063,0.469142,-0.061968,0.971662,0.438001,0.196827,-0.511729,0.359423,-0.732370,-0.948091,0.412698,0.361732,-0.077298,0.444694,0.399587,-0.060710,0.433827,0.770782,0.387851,0.770782,0.401489,-0.311405,0.480276,0.098875,-0.429526,0.907890,0.566164,0.358414,0.1875,0.217391,0.0,0.000000,0.000000,0.666667,0.25,0.6,0.210526,0.250000,0.208333,0.0,0.285714,0.0,0.0,0.000000,0.385057,0.324401,-0.311438,-0.599476,-0.547020,-0.461025,0.288501,-0.378616,0.070061,-0.380377,-0.362256,0.552867,0.224091,1.088299,0.464775,0.333333,0.371076,0.285450,-0.190321,-0.315029,-0.706031,0.165452,-0.948979,1.037651,0.0,1.138642,-0.977961,0.116807,-0.496764,-0.536573,-0.129647,-0.746210,-0.197533,0.222912,1.122490,-0.322665,3.678359,0.0,-0.043465,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,0.461157,0.361520,0.2,0.250,0.250,0.0,0.0,0.166667,0.0,0.00,0.2,0.2,0.1,0.0,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.049581,-0.480968,-0.483586,-0.759973,-0.838903,-0.647268,-0.904556,-1.029833,-0.936613,-1.094620,-0.976086,-1.127402,-1.013526,-1.153297,0.474686,0.471975,-0.181361,-0.734095,-1.141063,0.663707,-0.488440,-0.862098,-0.893830,-0.617672,-0.018871,-0.614226,-1.209993,-0.511638,0.238095,-0.553023,-0.077299,-0.469606,-0.552572,-0.071338,-0.711115,0.318683,0.695868,0.318683,1.426449,-0.280798,1.581250,-2.043958,-0.728248,0.262622,-0.873388,-0.615269,0.1250,0.173913,0.0,0.000000,0.000000,0.333333,0.00,0.2,0.157895,0.166667,0.250000,0.0,0.142857,0.0,0.0,0.000000,0.218391,-0.477664,0.294968,0.157998,-0.547020,1.298737,0.324460,-0.378616,-0.744894,3.100159,-0.362256,-1.138449,-0.273257,-0.893295,-0.663957,0.111111,0.273328,-0.486851,4.342494,-0.929289,-0.706031,-0.793672,-0.597400,0.093556,0.0,0.352168,-0.212673,2.525270,-0.496764,-0.536573,-1.090536,-0.178851,0.335615,-0.779586,0.240588,-0.322665,-0.506809,0.0,-0.218969,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,1.746063,-0.878644,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.142857,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.083333,0.1,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,-0.169844,-0.511260,-1.013166,-0.864449,-0.942685,-0.817058,-0.736117,-0.863952,-0.793589,-0.954065,-0.680687,-0.837672,-0.593701,-0.743286,-0.791321,-1.281364,-0.181361,-1.140124,-1.141063,0.335614,0.233622,-0.862098,-0.893830,2.333439,-0.990161,-1.027811,-1.030407,0.101253,0.190476,-1.031820,-0.077299,-1.128907,-0.919962,-0.092603,-0.901801,-2.646379,-0.731276,-2.646379,-1.482220,1.208151,-1.543135,1.417691,0.655832,0.019012,-0.690446,-1.028349,0.0625,0.086957,0.0,0.166667,0.142857,0.666667,0.00,0.4,0.105263,0.083333,0.041667,0.0,0.071429,0.0,0.0,0.000000,0.160920,-0.910393,-0.309521,-0.599476,-0.547020,-0.461025,-0.748403,-0.350125,-0.744894,-0.380377,-0.362256,0.834242,-0.864367,-0.394815,-0.579125,0.333333,-1.300462,-0.145836,-0.190321,-0.214154,-0.210287,-0.994871,-0.083421,0.353327,0.0,-0.591507,-0.212673,-0.613938,-0.496764,-0.536573,-0.809516,-0.596837,-0.759979,-1.047197,0.790773,-0.322665,1.097213,0.0,-1.112627,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,-0.879316,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.142857,0.142857,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,1.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
473,-0.761251,0.280287,-0.032186,0.136281,0.051404,0.106731,0.418168,0.272810,0.894955,0.705324,1.205854,1.012664,1.481805,1.283709,-0.043399,0.183298,-0.181361,0.114661,0.273335,1.577292,-0.012105,-0.404055,-0.560886,-0.510029,-0.546392,-0.020031,0.700497,0.324123,0.349206,-0.043819,-0.077299,-0.355494,-0.800003,-0.093814,0.070641,0.507372,0.623997,0.507372,-0.808148,-0.300603,-0.819071,0.227085,-0.658546,-0.263813,-0.000269,-0.020898,0.1250,0.217391,0.6,0.333333,0.714286,0.333333,0.00,0.2,0.210526,0.166667,0.166667,0.0,0.071429,0.4,0.2,0.500000,0.327586,0.285925,0.276304,2.579117,-0.547020,-0.461025,-0.748403,-0.378616,-0.744894,-0.380377,-0.362256,-0.856656,-0.285077,0.822658,1.062832,0.666667,0.352113,-0.882220,-0.190321,-0.929289,-0.118328,0.669972,0.382552,-0.583618,0.0,1.196485,-0.296154,-0.613938,1.906586,-0.536573,0.591428,0.341014,-0.187768,0.408663,-0.848045,-0.322665,-0.506809,0.0,-0.335605,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.429780,0.120380,0.0,0.125,0.000,0.0,0.0,0.000000,0.0,0.25,0.0,0.0,0.1,0.1,0.0,0.0,0.00,0.142857,0.000000,0.00,0.000000,1.0,1.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,0.0,0.0,0.00,0.1,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.5,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.333333,0.333333,0.0,0.0,0.25,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
474,-0.530321,0.436629,0.414242,0.100799,0.149312,0.408580,-0.081285,0.056271,-0.258085,-0.096649,-0.280987,0.033663,-0.312944,0.097665,0.750526,1.088659,-0.181361,1.366018,-1.141063,-0.812459,1.279054,-0.404055,-0.893830,0.298312,-0.039200,0.448313,-0.652130,-1.430975,0.396825,0.531216,-0.077298,0.255612,0.154910,-0.069778,0.314737,0.420575,0.346782,0.420575,1.315643,-0.190777,1.462226,-0.044796,-0.389696,-1.137640,0.087526,0.447454,0.2500,0.521739,0.0,0.333333,0.285714,0.000000,0.25,0.2,0.473684,0.250000,0.500000,0.0,0.321429,0.0,0.2,0.166667,0.373563,1.194096,2.211838,0.183415,0.506104,3.058500,1.347120,1.023351,1.035199,-0.380377,1.603141,-0.898930,-1.031636,-0.507125,-0.603486,0.333333,1.310425,1.537268,-0.190321,0.444895,0.375330,-0.844401,0.338447,-0.123392,0.0,-0.591507,0.612490,0.184222,-0.496764,0.580619,1.021856,1.014201,-0.759979,-1.039583,0.409508,-0.322665,-0.506809,0.0,1.626664,0.0,-0.008291,0.0,0.0,0.0,0.0,0.0,0.0,-0.173862,1.012842,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.4,0.3,0.0,0.0,0.00,0.285714,0.142857,0.25,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.3,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.090909,0.0,0.0,0.0,0.00,0.1,1.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
475,-0.438794,0.477795,0.053591,-0.164332,0.246240,0.134926,-0.231334,0.030024,-0.386167,-0.047799,-0.475732,-0.230670,-0.564839,-0.296117,-0.791321,-1.281364,-0.181361,-0.668410,1.319889,-0.462804,-0.915263,2.832002,-0.303028,-0.317754,3.810173,0.468404,-1.423968,0.203402,0.349206,0.588455,-0.077299,0.101847,0.284887,-0.059233,0.369715,-1.937285,-0.536199,-1.937285,-1.094398,-0.077350,-1.126550,0.957684,0.466641,1.498100,0.324815,0.480146,0.0000,0.173913,0.0,0.000000,0.000000,0.666667,0.25,0.6,0.210526,0.000000,0.291667,0.0,0.214286,0.0,0.0,0.000000,0.316092,-0.563938,0.344501,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,0.180331,-0.380377,-0.362256,1.538539,-0.307363,1.359830,0.401367,0.333333,-0.970452,2.269570,-0.190321,0.355318,-0.194001,-0.792105,-0.948979,1.802512,0.0,-0.591507,-0.977961,-0.613938,-0.496764,3.871027,-0.939600,0.465858,-0.759979,-0.832150,1.385450,7.102301,-0.506809,0.0,-0.837175,0.0,5.239801,0.0,0.0,0.0,0.0,0.0,0.0,-0.371694,-0.732353,0.0,0.000,0.000,0.0,0.0,0.333333,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.428571,0.000000,0.00,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.666667,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.500,0.0,0.0,0.5,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [61]:
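# apply the same recombination to the test set: join the scaled discrete and continuous
# blocks, then restore the original descriptor order (columns[2:] drops Label and SMILES)
df_test_scaled_df = pd.concat([df_test_discrete_scaled_df, df_test_cont_scaled_df], axis=1)

df_test_scaled_df = df_test_scaled_df[df_test.columns[2:]]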
df_test_scaled_df

Unnamed: 0,BalabanJ,BertzCT,Chi0,Chi0n,Chi0v,Chi1,Chi1n,Chi1v,Chi2n,Chi2v,Chi3n,Chi3v,Chi4n,Chi4v,EState_VSA1,EState_VSA10,EState_VSA11,EState_VSA2,EState_VSA3,EState_VSA4,EState_VSA5,EState_VSA6,EState_VSA7,EState_VSA8,EState_VSA9,ExactMolWt,FractionCSP3,HallKierAlpha,HeavyAtomCount,HeavyAtomMolWt,Ipc,Kappa1,Kappa2,Kappa3,LabuteASA,MaxAbsEStateIndex,MaxAbsPartialCharge,MaxEStateIndex,MaxPartialCharge,MinAbsEStateIndex,MinAbsPartialCharge,MinEStateIndex,MinPartialCharge,MolLogP,MolMR,MolWt,NHOHCount,NOCount,NumAliphaticCarbocycles,NumAliphaticHeterocycles,NumAliphaticRings,NumAromaticCarbocycles,NumAromaticHeterocycles,NumAromaticRings,NumHAcceptors,NumHDonors,NumHeteroatoms,NumRadicalElectrons,NumRotatableBonds,NumSaturatedCarbocycles,NumSaturatedHeterocycles,NumSaturatedRings,NumValenceElectrons,PEOE_VSA1,PEOE_VSA10,PEOE_VSA11,PEOE_VSA12,PEOE_VSA13,PEOE_VSA14,PEOE_VSA2,PEOE_VSA3,PEOE_VSA4,PEOE_VSA5,PEOE_VSA6,PEOE_VSA7,PEOE_VSA8,PEOE_VSA9,RingCount,SMR_VSA1,SMR_VSA10,SMR_VSA2,SMR_VSA3,SMR_VSA4,SMR_VSA5,SMR_VSA6,SMR_VSA7,SMR_VSA8,SMR_VSA9,SlogP_VSA1,SlogP_VSA10,SlogP_VSA11,SlogP_VSA12,SlogP_VSA2,SlogP_VSA3,SlogP_VSA4,SlogP_VSA5,SlogP_VSA6,SlogP_VSA7,SlogP_VSA8,SlogP_VSA9,TPSA,VSA_EState1,VSA_EState10,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7,VSA_EState8,VSA_EState9,fr_Al_COO,fr_Al_OH,fr_Al_OH_noTert,fr_ArN,fr_Ar_COO,fr_Ar_N,fr_Ar_NH,fr_Ar_OH,fr_COO,fr_COO2,fr_C_O,fr_C_O_noCOO,fr_C_S,fr_HOCCN,fr_Imine,fr_NH0,fr_NH1,fr_NH2,fr_N_O,fr_Ndealkylation1,fr_Ndealkylation2,fr_Nhpyrrole,fr_SH,fr_aldehyde,fr_alkyl_carbamate,fr_alkyl_halide,fr_allylic_oxid,fr_amide,fr_amidine,fr_aniline,fr_aryl_methyl,fr_azide,fr_azo,fr_barbitur,fr_benzene,fr_benzodiazepine,fr_bicyclic,fr_diazo,fr_dihydropyridine,fr_epoxide,fr_ester,fr_ether,fr_furan,fr_guanido,fr_halogen,fr_hdrzine,fr_hdrzone,fr_imidazole,fr_imide,fr_isocyan,fr_isothiocyan,fr_ketone,fr_ketone_Topliss,fr_lactam,fr_lactone,fr_methoxy,fr_morpholine,fr_nitrile,fr_nitro,fr_nitro_arom,fr_nitro_arom_nonortho,fr_nitroso,fr_oxazole,fr_oxime,fr_para_hydroxylation,fr_phenol,fr_phenol_noOrthoHbond,fr_phos_acid,fr_phos_ester,fr_piperdine,fr_piperzine,fr_priamide,fr_prisulfonamd,fr_pyridine,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_unbrch_alkane,fr_urea
0,-0.927408,0.011660,0.460791,0.721410,0.632649,0.501249,0.978653,0.824786,1.239251,1.043675,1.343707,1.147872,1.345362,1.150456,-0.257990,0.213746,-0.181361,-0.704323,2.048809,-0.101188,2.186080,-0.862098,-0.893830,1.833223,-0.990161,0.367102,0.841873,0.723430,0.412698,0.266143,-0.077297,0.483836,0.280629,-0.064608,0.561412,-0.463622,-0.515664,-0.463622,-1.666898,0.007270,-1.741508,0.426808,0.446727,1.091291,0.727625,0.366279,0.1875,0.130435,0.8,0.000000,0.571429,0.000000,0.00,0.0,0.157895,0.250000,0.083333,0.0,0.178571,0.8,0.0,0.666667,0.408046,-0.062648,-0.898272,-0.599476,-0.547020,-0.461025,-0.748403,-1.072078,-0.744894,-0.380377,-0.362256,1.198399,2.130903,-0.843029,0.437391,0.444444,-0.255452,-1.275615,-0.190321,-0.929289,2.182529,1.521281,-0.948979,0.337212,0.0,-0.591507,-0.977961,-0.613938,-0.496764,-0.536573,-0.287119,-1.186703,2.052441,1.506471,0.797500,-0.322665,-0.506809,0.0,-0.446922,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,0.363815,0.0,0.375,0.375,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.333333,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
1,-0.944306,0.331971,0.415761,0.388271,0.435039,0.534627,0.488795,0.556540,0.301815,0.447290,0.307258,0.535504,0.230641,0.521344,-0.278435,0.004021,-0.181361,-0.668410,0.287486,1.979562,0.765235,-0.404055,0.706947,-0.338897,-0.511729,0.538736,-0.109551,0.184829,0.412698,0.553767,-0.077298,0.412564,0.384310,-0.060772,0.510698,0.700213,-0.279518,0.700213,1.426449,-0.107957,1.372958,-1.976310,0.426812,0.781134,0.512501,0.538024,0.0625,0.173913,0.0,0.333333,0.285714,0.666667,0.00,0.4,0.263158,0.083333,0.291667,0.0,0.250000,0.0,0.2,0.166667,0.396552,-0.097734,-0.898272,-0.599476,-0.547020,-0.461025,0.324460,-0.363430,-0.744894,2.106583,-0.362256,-0.028418,0.208610,2.076973,0.889454,0.444444,-0.053681,0.298307,-0.190321,0.388808,-0.706031,-0.513878,2.841224,0.353327,0.0,-0.591507,-0.272693,3.471910,-0.496764,0.580619,0.909853,-0.619344,-0.759979,-0.799027,1.013418,-0.322665,-0.506809,0.0,-1.010662,0.0,0.062880,0.0,0.0,0.0,0.0,0.0,0.0,1.929328,-0.768876,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.428571,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.000000,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.666667,0.0,0.181818,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
2,-1.838457,1.707719,2.908821,3.044182,2.940006,2.930757,3.333331,3.143715,3.703299,3.465178,4.049707,3.801941,4.028740,3.771113,3.180967,2.176713,-0.181361,3.222955,0.286193,1.388567,-0.915263,0.642288,-0.133811,-0.517740,2.116130,2.906952,1.693951,0.621282,0.809524,2.802308,0.264547,2.837225,1.862886,-0.038419,2.946133,0.419820,0.151705,0.419820,0.641570,-0.361817,0.738162,0.174899,-0.200505,-0.046743,2.595025,2.905102,0.3750,0.608696,0.8,0.666667,1.142857,0.000000,0.00,0.0,0.736842,0.500000,0.541667,0.0,0.250000,0.8,0.6,1.166667,0.827586,4.045733,1.615622,1.965309,-0.547020,-0.461025,0.288501,-1.072078,0.145245,-0.380377,-0.362256,-0.495193,2.735783,0.823274,3.557774,0.888889,3.378344,-0.869567,-0.190321,-0.929289,2.720280,4.742008,-0.512100,-1.022848,0.0,-0.591507,-0.977961,-0.613938,-0.496764,-0.536573,3.307741,2.299853,2.576017,2.805607,-0.870499,-0.322665,-0.506809,0.0,2.163995,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,2.995220,0.0,0.750,0.625,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.454545,0.0,0.0,0.0,0.25,0.7,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
3,0.370871,-0.298892,-0.594226,-0.669461,-0.748991,-0.576782,-0.689477,-0.818019,-0.748760,-0.910010,-0.731379,-0.887392,-0.722273,-0.868852,-0.533262,-0.345554,-0.181361,-0.268807,0.561449,-1.087602,-0.915263,2.344434,-0.560886,-0.948309,0.084085,-0.724311,-1.592091,-0.632359,0.238095,-0.707495,-0.077299,-0.691965,-0.573859,-0.081032,-0.639317,0.322079,-0.464328,0.322079,-0.374156,-0.388823,-0.352893,0.520226,0.396940,-0.320061,-0.541003,-0.725008,0.2500,0.173913,0.0,0.000000,0.000000,0.666667,0.00,0.4,0.157895,0.166667,0.125000,0.0,0.142857,0.0,0.0,0.000000,0.206897,-0.389186,-0.898272,0.186541,0.542057,-0.461025,-0.748403,0.314702,-0.744894,-0.380377,-0.362256,0.834242,-1.051654,-0.131296,-0.589837,0.222222,-0.646376,-0.093456,-0.190321,-0.929289,-0.136601,-0.994871,-0.569826,1.003718,0.0,-0.591507,0.672510,0.332701,-0.496764,-0.536573,-1.066522,-0.156436,-0.759979,-0.431815,0.840600,-0.322665,-0.506809,0.0,0.020538,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,-0.134095,0.0,0.000,0.000,0.5,0.0,0.000000,0.0,0.00,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.000000,0.000000,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.1,0.0,0.333333,0.0,0.0,0.0,0.0,0.666667,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.000000,0.000000,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
4,-1.158339,3.539286,2.738372,2.706608,2.737990,2.909818,2.567095,2.664443,2.518238,2.639230,2.705460,3.010370,2.818683,3.311986,1.265508,1.149556,-0.181361,1.890135,4.199888,-0.150433,1.398894,0.054065,0.854922,-0.312965,1.672361,2.774706,0.012721,-2.294595,0.793651,2.817556,0.306693,2.173589,1.232538,-0.061069,2.818582,1.217598,0.623997,1.217598,0.641570,-0.401426,0.738162,-0.048018,-0.658546,0.426613,2.646895,2.773458,0.2500,0.608696,0.0,1.166667,1.000000,1.000000,0.00,0.6,0.789474,0.333333,0.583333,0.0,0.142857,0.0,0.4,0.333333,0.752874,2.346006,0.976530,4.842163,0.705409,-0.461025,1.325580,1.807491,0.145245,-0.380377,1.603141,-0.856656,1.196620,1.971305,2.103191,1.111111,2.337290,0.336606,-0.190321,1.103943,-0.706031,1.064900,2.156515,0.908616,0.0,5.666542,3.196219,-0.613938,7.915066,0.580619,2.153396,1.817929,0.578887,1.395547,-0.566619,-0.322665,-0.506809,0.0,1.534235,0.0,0.049507,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,2.831637,0.0,0.125,0.125,0.0,0.0,0.000000,0.0,0.50,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.285714,0.142857,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.2,0.0,0.0,0.0,1.000000,0.0,0.818182,0.0,0.0,0.0,0.50,0.6,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.666667,0.666667,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,-0.893614,0.989616,1.136924,0.912127,0.955418,1.057057,0.803818,0.883579,0.770838,1.114103,0.624903,1.004795,0.480787,0.997982,1.999232,2.024469,-0.181361,-0.205550,-0.219890,-0.812459,0.601159,2.996690,-0.893830,-0.287034,-0.990161,1.083642,-0.021668,-1.551696,0.507937,1.136143,-0.077282,0.921543,0.452429,-0.065891,1.051105,0.698703,1.075755,0.698703,0.586167,-0.396025,0.678650,-0.055749,-1.096672,-1.549201,0.805403,1.082720,0.1250,0.521739,0.0,0.500000,0.428571,0.333333,0.00,0.2,0.421053,0.166667,0.500000,0.0,0.321429,0.0,0.6,0.500000,0.482759,1.209775,0.862937,-0.599476,1.631133,-0.461025,2.351540,2.410131,0.145245,-0.380377,1.603141,0.270609,-0.466433,0.386907,-0.106933,0.444444,1.010155,1.948214,-0.190321,2.477993,-0.706031,0.288329,0.349291,-0.105650,0.0,-0.591507,1.287533,0.184222,-0.496764,0.580619,1.792272,1.015579,-0.759979,-0.010842,-0.003722,-0.322665,-0.506809,0.0,1.360749,0.0,-0.001718,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,1.838114,0.2,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.2,0.2,0.6,0.6,0.0,0.0,0.0,0.428571,0.285714,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.6,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.5,0.0,0.0,0.000000,0.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,1.0
116,0.753877,-0.331759,-0.909708,-0.933606,-1.011384,-0.861216,-0.977848,-1.102013,-0.998214,-1.155157,-0.910807,-1.063376,-0.880144,-1.023033,-0.791321,-0.813410,-0.181361,-0.345986,-1.141063,-0.143077,-0.915263,1.054648,0.105221,-0.328449,-0.497209,-1.021194,-1.530955,-0.437348,0.190476,-0.987721,-0.077299,-1.037107,-0.914954,-0.092193,-0.948992,-0.034543,-1.213835,-0.034543,0.041369,-0.221384,0.093448,0.602048,1.123831,-0.284807,-0.906861,-1.021842,0.0625,0.173913,0.0,0.000000,0.000000,0.000000,0.50,0.4,0.157895,0.083333,0.125000,0.0,0.035714,0.0,0.0,0.000000,0.155172,-0.938614,0.275296,-0.599476,-0.547020,1.195064,-0.748403,0.342181,-0.744894,0.613199,-0.362256,-1.138449,-0.293275,0.336915,-1.144527,0.222222,-1.300462,-1.275615,4.342494,0.411404,-0.706031,-0.979719,-0.948979,0.511300,0.0,2.082317,-0.177842,-0.613938,-0.496764,-0.536573,-1.127692,-1.186703,1.005096,-0.827125,0.230799,-0.322665,1.150074,0.0,-0.284623,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,-0.457263,0.0,0.000,0.000,0.0,0.0,0.333333,0.5,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.142857,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.2,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0
117,-0.665500,-0.369736,-0.382062,-0.177145,-0.126788,-0.227873,-0.025583,0.066507,-0.151045,-0.013335,-0.059620,0.106275,0.001050,0.135677,-0.791321,-0.782961,-0.181361,-1.140124,-0.244007,0.722049,1.122151,-0.862098,0.394366,1.253224,-0.990161,-0.303040,-0.036952,0.574851,0.285714,-0.338246,-0.077299,-0.342106,-0.086761,-0.073523,-0.194776,-0.548910,0.665066,-0.548910,-1.316010,0.426768,-1.364599,1.138079,-0.698376,0.764497,-0.009125,-0.303223,0.0625,0.086957,0.2,0.000000,0.142857,0.333333,0.25,0.4,0.157895,0.083333,0.083333,0.0,0.214286,0.0,0.0,0.000000,0.270115,-0.928190,-0.318197,-0.599476,-0.547020,-0.461025,-0.748403,-0.363430,-0.744894,-0.380377,1.532125,0.028628,1.163429,-0.086783,-1.144527,0.333333,-0.952103,-0.504402,-0.190321,-0.270240,-0.706031,0.169398,-0.083421,0.492840,0.0,0.302566,-0.977961,-0.613938,0.705015,0.540251,-0.446722,0.582803,-0.759979,0.120258,0.245877,-0.322665,-0.506809,0.0,-1.129498,0.0,0.138358,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,-0.491293,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.090909,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.000000,0.0
118,0.648269,0.417116,0.592840,0.543834,0.456252,0.475335,0.275848,0.132650,0.121281,-0.054992,-0.134746,-0.302209,-0.127084,-0.287575,0.203870,0.641492,-0.181361,1.440912,0.149097,-1.087602,0.864679,1.687195,-0.893830,-0.617672,0.341147,0.407642,-0.208896,-1.170961,0.412698,0.413120,-0.077298,0.615671,0.594677,-0.056012,0.435985,0.548129,0.172240,0.548129,0.696973,-0.388823,0.797674,0.258654,-0.220420,0.251530,0.376448,0.406477,0.0625,0.391304,0.0,0.166667,0.142857,0.333333,0.00,0.2,0.421053,0.083333,0.333333,0.0,0.357143,0.0,0.0,0.000000,0.396552,0.293892,-0.231740,-0.599476,-0.547020,1.233197,1.325580,0.390629,1.035199,-0.380377,-0.362256,-0.574816,-0.190916,0.815330,1.852928,0.222222,0.658795,-0.076586,-0.190321,-0.214154,-0.706031,0.008144,0.394917,0.900181,0.0,-0.591507,-0.212673,0.332701,-0.496764,-0.536573,0.055959,0.999686,0.217943,0.251938,0.760571,-0.322665,-0.506809,0.0,0.585744,0.0,-0.283230,0.0,0.0,0.0,0.0,0.0,0.0,-0.484300,1.072791,0.0,0.000,0.000,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.166667,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.000000,0.0,1.0,0.0,0.50,0.3,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.166667,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0
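Because `pd.concat(..., axis=1)` aligns rows on the index rather than by position, two scaled blocks whose indexes differ (for example, if one was rebuilt without passing `index=`) would be silently outer-joined and padded with NaNs. A minimal sanity check before export might look like the sketch below; `check_alignment` is a hypothetical helper, and the toy columns stand in for the real descriptors.

```python
import pandas as pd

def check_alignment(train_df: pd.DataFrame, test_df: pd.DataFrame) -> None:
    """Raise AssertionError if the scaled train/test frames are not aligned."""
    # Same feature names, in the same order, in both sets
    assert list(train_df.columns) == list(test_df.columns), "column mismatch"
    # Index misalignment during concat shows up as NaNs
    assert not train_df.isna().any().any(), "NaNs in train"
    assert not test_df.isna().any().any(), "NaNs in test"

# Toy illustration with hypothetical values
cont = pd.DataFrame({"MolLogP": [0.1, -0.2]}, index=[0, 1])
disc_ok = pd.DataFrame({"fr_amide": [0.0, 1.0]}, index=[0, 1])
disc_bad = pd.DataFrame({"fr_amide": [0.0, 1.0]}, index=[5, 6])  # index not reset

good = pd.concat([disc_ok, cont], axis=1)[["MolLogP", "fr_amide"]]
bad = pd.concat([disc_bad, cont], axis=1)[["MolLogP", "fr_amide"]]

print(good.shape)  # (2, 2)
print(bad.shape)   # (4, 2): union of both indexes, padded with NaN
check_alignment(good, good)  # passes silently
```

Calling `check_alignment(df_train_scaled_df, df_test_scaled_df)` on the real frames would be the corresponding check in this notebook.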


### Exporting the datasets
Now that both the training and test sets are scaled, we export them to `processed_data/` for use in the modeling notebooks that follow.

In [62]:
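# export the scaled feature tables (index=False so no extra index column is written)
import os
os.makedirs("processed_data", exist_ok=True)  # create the output folder if it is missing
df_train_scaled_df.to_csv("processed_data/DIA_trainingset_scaled.csv", index=False)
df_test_scaled_df.to_csv("processed_data/DIA_testset_scaled.csv", index=False)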
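Two things are worth checking about the exported files. First, because the scaled frames were built from `columns[2:]`, they contain descriptor columns only: `Label` and `SMILES` are not in either CSV, so downstream notebooks need to read the labels from the original files. Second, writing without the index avoids an extra `Unnamed: 0` column appearing when the file is read back with `pd.read_csv`. The sketch below is a minimal round-trip check under those assumptions; `roundtrip_ok` is a hypothetical helper and `toy` is made-up data, not the real descriptors.

```python
import os
import tempfile

import numpy as np
import pandas as pd

def roundtrip_ok(df: pd.DataFrame, path: str) -> bool:
    """Write df to CSV without its index and check it reads back unchanged."""
    df.to_csv(path, index=False)
    back = pd.read_csv(path)
    same_cols = list(back.columns) == list(df.columns)
    # allclose tolerates last-digit differences from float parsing
    same_vals = np.allclose(back.to_numpy(dtype=float), df.to_numpy(dtype=float))
    return same_cols and same_vals

# Toy stand-in for a scaled descriptor table (hypothetical values)
toy = pd.DataFrame({"BalabanJ": [-0.45, 0.31], "fr_urea": [0.0, 1.0]})

with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip_ok(toy, os.path.join(tmp, "toy_scaled.csv")))  # True
```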

# ✅ Data Preparation Complete

At this point, we’ve successfully completed the core preprocessing steps needed to prepare our dataset for modeling:

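- All descriptor features have been **scaled** to a consistent range: continuous descriptors were standardized with `StandardScaler`, and discrete count descriptors (e.g. `NumHDonors` and the `fr_*` fragment counts) were rescaled with `MinMaxScaler`.
- The test set was transformed with the scalers fitted on the training data only, ensuring **no data leakage** of test-set statistics into preprocessing.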
- Both the training and test sets now share the same structure and feature space, ready for downstream models.
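The fit-on-train, transform-both scheme summarized above can be sketched on toy data. This assumes, as the variable names `df_test_discrete_scaled_df` / `df_test_cont_scaled_df` and the value ranges in the outputs suggest, that continuous descriptors were standardized and discrete counts min-max scaled; the column split and values below are hypothetical. It also shows why some discrete columns in the scaled test set exceed 1: the min and max come from the training set only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data standing in for the real descriptor tables
train = pd.DataFrame({"MolLogP": [1.0, 2.0, 3.0], "NumHDonors": [0, 2, 4]})
test = pd.DataFrame({"MolLogP": [2.5, 5.0], "NumHDonors": [1, 6]})

cont_cols = ["MolLogP"]     # assumed continuous descriptors
disc_cols = ["NumHDonors"]  # assumed discrete (count) descriptors

# Fit both scalers on the training set only
std = StandardScaler().fit(train[cont_cols])
mm = MinMaxScaler().fit(train[disc_cols])

def scale(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the train-fitted scalers and restore the original column order."""
    parts = [
        pd.DataFrame(std.transform(df[cont_cols]), columns=cont_cols, index=df.index),
        pd.DataFrame(mm.transform(df[disc_cols]), columns=disc_cols, index=df.index),
    ]
    return pd.concat(parts, axis=1)[list(df.columns)]

train_scaled = scale(train)
test_scaled = scale(test)
print(test_scaled)  # NumHDonors = 6 maps to (6 - 0) / (4 - 0) = 1.5
```

Passing `index=df.index` when rebuilding each block keeps the later `pd.concat(axis=1)` aligned row by row.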

We have deliberately left task-specific optimizations, such as label balancing and feature selection, for later stages, focusing here on **reproducibility and clean input formatting**.

Next up: we benchmark performance using a suite of **fully classical machine learning models** to establish a strong baseline before exploring more advanced or hybrid approaches.
