# Abstract

Hyperparameters matter in data science because they directly control the behaviour of the training algorithm and have a significant impact on model performance. Finding good hyperparameter values is a strenuous task. The aim of this project is to make that process easier and to determine the important hyperparameters for the dataset. H2O AutoML is used to achieve this. Models are generated for runtimes of 300, 500, 800, 1000 and 1200 seconds.

In [1]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
import random, os, sys
from datetime import datetime
import time

In [2]:
# Loading the data set using pandas
df=pd.read_csv("indian_liver_patient.csv", sep=',')

In [3]:
df.head()

Unnamed: 0,Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
0,65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
1,62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
2,62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
3,58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
4,72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1


In [4]:
df.describe()

Unnamed: 0,Age,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
count,583.0,583.0,583.0,583.0,583.0,583.0,583.0,583.0,579.0,583.0
mean,44.746141,3.298799,1.486106,290.576329,80.713551,109.910806,6.48319,3.141852,0.947064,1.286449
std,16.189833,6.209522,2.808498,242.937989,182.620356,288.918529,1.085451,0.795519,0.319592,0.45249
min,4.0,0.4,0.1,63.0,10.0,10.0,2.7,0.9,0.3,1.0
25%,33.0,0.8,0.2,175.5,23.0,25.0,5.8,2.6,0.7,1.0
50%,45.0,1.0,0.3,208.0,35.0,42.0,6.6,3.1,0.93,1.0
75%,58.0,2.6,1.3,298.0,60.5,87.0,7.2,3.8,1.1,2.0
max,90.0,75.0,19.7,2110.0,2000.0,4929.0,9.6,5.5,2.8,2.0


# Data cleaning

In [5]:
df.shape

(583, 11)

In [6]:
#To check the data types
df.dtypes

Age                             int64
Gender                         object
Total_Bilirubin               float64
Direct_Bilirubin              float64
Alkaline_Phosphotase            int64
Alamine_Aminotransferase        int64
Aspartate_Aminotransferase      int64
Total_Protiens                float64
Albumin                       float64
Albumin_and_Globulin_Ratio    float64
Dataset                         int64
dtype: object

In [7]:
# Count the missing (null) values in each column
df.isnull().sum()

Age                           0
Gender                        0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    4
Dataset                       0
dtype: int64

Filling null values with the column median

In [8]:
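# Fill missing ratios with the column median. fillna(..., inplace=True) returns None,
# so assign the filled column back instead of capturing the return value.
ratio = df['Albumin_and_Globulin_Ratio']
df['Albumin_and_Globulin_Ratio'] = ratio.fillna(ratio.median())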

# H2O

In [2]:
import h2o
from h2o.automl import H2OAutoML
import random, os, sys
from datetime import datetime
import pandas as pd
import logging
import csv
import optparse
import time
import json
from distutils.util import strtobool
import psutil

import warnings
warnings.filterwarnings('ignore')

In [3]:
port_no=random.randint(5555,55555)
h2o.init(strict_version_check=False,min_mem_size_GB=5,port=port_no)

Checking whether there is an H2O instance running at http://localhost:53521 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
  Starting server from C:\Users\Manvi\Anaconda3\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\Manvi\AppData\Local\Temp\tmplvftmp0a
  JVM stdout: C:\Users\Manvi\AppData\Local\Temp\tmplvftmp0a\h2o_Manvi_started_from_python.out
  JVM stderr: C:\Users\Manvi\AppData\Local\Temp\tmplvftmp0a\h2o_Manvi_started_from_python.err
  Server is running at http://127.0.0.1:53521
Connecting to H2O server at http://127.0.0.1:53521 ... successful.


0,1
H2O cluster uptime:,02 secs
H2O cluster timezone:,America/New_York
H2O data parsing timezone:,UTC
H2O cluster version:,3.24.0.1
H2O cluster version age:,25 days
H2O cluster name:,H2O_from_python_Manvi_s8kupb
H2O cluster total nodes:,1
H2O cluster free memory:,4.792 Gb
H2O cluster total cores:,8
H2O cluster allowed cores:,8


In [4]:
# Importing the data into the H2O cluster. This re-reads the raw CSV, so the pandas median fill above is not carried over.
df = h2o.import_file(path="indian_liver_patient.csv")

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [5]:
df.head()

Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1
46,Male,1.8,0.7,208,19,14,7.6,4.4,1.3,1
26,Female,0.9,0.2,154,16,12,7.0,3.5,1.0,1
29,Female,0.9,0.3,202,14,11,6.7,3.6,1.1,1
17,Male,0.9,0.3,202,22,19,7.4,4.1,1.2,2
55,Male,0.7,0.2,290,53,58,6.8,3.4,1.0,1




In [6]:
df.isna()

isNA(Age),isNA(Gender),isNA(Total_Bilirubin),isNA(Direct_Bilirubin),isNA(Alkaline_Phosphotase),isNA(Alamine_Aminotransferase),isNA(Aspartate_Aminotransferase),isNA(Total_Protiens),isNA(Albumin),isNA(Albumin_and_Globulin_Ratio),isNA(Dataset)
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0




In [7]:
target = 'Dataset'
run_time=300
pct_memory=0.5
server_path=None 
data_path=None
all_variables=None
test_path=None
model_path=None
nthreads=1 
name=None 
virtual_memory=psutil.virtual_memory()
min_mem_size=int(round(int(pct_memory*virtual_memory.available)/1073741824,0))
run_id='SOME_ID_20180617_221529' # Just some arbitrary ID
classification=True
scale=False
max_models=None
balance_y=False # balance_classes=balance_y
balance_threshold=0.2
project ="automl_test"
analysis=0
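
The `min_mem_size` setting above converts a share of the currently available RAM from bytes to whole gibibytes (1073741824 bytes each). A minimal sketch of that arithmetic follows; the helper name `min_mem_gib` and the byte counts are hypothetical stand-ins for `psutil.virtual_memory().available`, not measurements from this run.

```python
BYTES_PER_GIB = 1073741824  # 2**30, the divisor used in the cell above


def min_mem_gib(available_bytes, pct_memory=0.5):
    """Return the requested share of available memory in whole GiB."""
    share_in_bytes = int(pct_memory * available_bytes)
    return int(round(share_in_bytes / BYTES_PER_GIB, 0))


# Hypothetical amounts of free memory (assumptions for illustration only).
plenty_free = 8 * BYTES_PER_GIB      # 8 GiB available
little_free = 900 * 1024 * 1024      # roughly 0.88 GiB available

print(min_mem_gib(plenty_free))   # half of 8 GiB rounds to 4
print(min_mem_gib(little_free))   # half of 0.88 GiB rounds down to 0
```

When less than about 1 GiB is free, the result rounds to 0, which matches the `min_mem_size` of 0 recorded in the metadata of this run. The value is only stored in the metadata here; `h2o.init` was called earlier with a fixed `min_mem_size_GB=5`.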

Defining helper functions

In [8]:
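def alphabet(n):
  # Build a random string of n characters drawn from digits and ASCII letters
  alpha='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
  result=''    # not named 'str', which would shadow the built-in type
  r=len(alpha)-1
  while len(result)<n:
    i=random.randint(0,r)
    result+=alpha[i]
  return result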


def set_meta_data(analysis,run_id,server,data,test,model_path,target,run_time,classification,scale,model,balance,balance_threshold,name,path,nthreads,min_mem_size):
  m_data={}
  m_data['start_time'] = time.time()
  m_data['target']=target
  m_data['server_path']=server
  m_data['data_path']=data 
  m_data['test_path']=test
  m_data['max_models']=model
  m_data['run_time']=run_time
  m_data['run_id'] =run_id
  m_data['scale']=scale
  m_data['classification']=classification
  m_data['model_path']=model_path
  m_data['balance']=balance
  m_data['balance_threshold']=balance_threshold
  m_data['project'] =name
  m_data['end_time'] = time.time()
  m_data['execution_time'] = 0.0
  m_data['run_path'] =path
  m_data['nthreads'] = nthreads
  m_data['min_mem_size'] = min_mem_size
  m_data['analysis'] = analysis
  return m_data


def automl(maxruntime,X,Y,df):
    # Run H2O AutoML for at most maxruntime seconds, skipping deep learning models
    aml = H2OAutoML(max_runtime_secs=maxruntime,exclude_algos = ['DeepLearning'])
    aml.train(x=X,y=Y,training_frame=df)
    return aml


def dict_to_json(dct,n):
    # Write the dictionary dct to the file n as indented JSON
    j = json.dumps(dct, indent=4)
    with open(n, 'w') as f:
        print(j, file=f)
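
The helpers above generate a random run ID and write dictionaries to JSON files. The sketch below shows the same two steps using only the standard library and checks that the saved metadata can be read back unchanged. The names `random_id` and `write_json`, and the three-key `meta` dictionary, are illustrative assumptions rather than part of the notebook.

```python
import json
import os
import random
import string
import tempfile


def random_id(n):
    """Return n random characters drawn from digits and ASCII letters."""
    characters = string.digits + string.ascii_letters
    return ''.join(random.choice(characters) for _ in range(n))


def write_json(dct, path):
    """Write a dictionary to path as indented JSON; the file is closed on exit."""
    with open(path, 'w') as f:
        json.dump(dct, f, indent=4)


run_id = random_id(9)
# Illustrative subset of the metadata keys used in this notebook.
meta = {'run_id': run_id, 'run_time': 300, 'target': 'Dataset'}

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, run_id + '_300_meta_data.json')
    write_json(meta, path)
    with open(path) as f:
        restored = json.load(f)

print(len(run_id), restored == meta)
```

Reading the file back is a cheap way to confirm that every value in the dictionary is JSON-serialisable before a long AutoML run depends on it.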

Generating a unique random ID for each run

In [9]:
run_id=alphabet(9)
if server_path is None:
  server_path=os.path.abspath(os.curdir)
os.chdir(server_path) 
run_dir = os.path.join(server_path,run_id)
os.mkdir(run_dir)
os.chdir(run_dir)    

# run_id to std out
print (run_id+"_300sec")

wnRs4lzNR_300sec


In [10]:
# meta data
meta_data = set_meta_data(analysis, run_id,server_path,data_path,test_path,model_path,target,run_time,classification,scale,max_models,balance_y,balance_threshold,name,run_dir,nthreads,min_mem_size)
print(meta_data)

{'start_time': 1556263975.8294196, 'target': 'Dataset', 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder', 'data_path': None, 'test_path': None, 'max_models': None, 'run_time': 300, 'run_id': 'wnRs4lzNR', 'scale': False, 'classification': True, 'model_path': None, 'balance': False, 'balance_threshold': 0.2, 'project': None, 'end_time': 1556263975.8294196, 'execution_time': 0.0, 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\wnRs4lzNR', 'nthreads': 1, 'min_mem_size': 0, 'analysis': 0}


In [11]:
y = target
X=[name for name in df.columns if name != y]
print(X)
print(y)

['Age', 'Gender', 'Total_Bilirubin', 'Direct_Bilirubin', 'Alkaline_Phosphotase', 'Alamine_Aminotransferase', 'Aspartate_Aminotransferase', 'Total_Protiens', 'Albumin', 'Albumin_and_Globulin_Ratio']
Dataset


In [12]:
meta_data['X']=X  
model_start_time = time.time()

In [13]:
if analysis == 3:
  classification=False
elif analysis == 2:
  classification=True
elif analysis == 1:
  classification=True

The dependent variable is categorical, so it is converted to a factor for classification

In [14]:
if classification:
    df[y] = df[y].asfactor()

In [15]:
classification=True
if classification:
    print(df[y].levels())

[['1', '2']]


# Runtime: 300sec

In [17]:
aml4 = automl(300,X,y,df)

AutoML progress: |████████████████████████████████████████████████████████| 100%


In [18]:
meta_data['run_time'] = 300
meta_data['end_time'] = time.time()
meta_data['execution_time'] = meta_data['end_time'] - meta_data['start_time']

Generating a leaderboard of the best models

In [19]:
aml4.leaderboard

model_id,auc,logloss,mean_per_class_error,rmse,mse
GBM_grid_1_AutoML_20190426_033319_model_3,0.770677,0.503864,0.288663,0.413461,0.17095
GBM_grid_1_AutoML_20190426_033319_model_7,0.762357,0.503831,0.284553,0.413728,0.171171
GBM_1_AutoML_20190426_033309,0.760191,0.520464,0.291088,0.419356,0.175859
GLM_grid_1_AutoML_20190426_033309_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
GLM_grid_1_AutoML_20190426_033319_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
GBM_grid_1_AutoML_20190426_033319_model_12,0.750187,0.514211,0.292837,0.418036,0.174754
GBM_1_AutoML_20190426_033319,0.749503,0.530475,0.295306,0.425634,0.181164
GBM_grid_1_AutoML_20190426_033319_model_1,0.745588,0.512969,0.297688,0.416552,0.173516
GBM_grid_1_AutoML_20190426_033319_model_4,0.745523,0.515055,0.289951,0.41692,0.173822
GBM_5_AutoML_20190426_033319,0.74417,0.516124,0.303763,0.417484,0.174293




In [20]:
aml4_leaderboard_df=aml4.leaderboard.as_data_frame()
aml4_leaderboard_df

Unnamed: 0,model_id,auc,logloss,mean_per_class_error,rmse,mse
0,GBM_grid_1_AutoML_20190426_033319_model_3,0.770677,0.503864,0.288663,0.413461,0.17095
1,GBM_grid_1_AutoML_20190426_033319_model_7,0.762357,0.503831,0.284553,0.413728,0.171171
2,GBM_1_AutoML_20190426_033309,0.760191,0.520464,0.291088,0.419356,0.175859
3,GLM_grid_1_AutoML_20190426_033309_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
4,GLM_grid_1_AutoML_20190426_033319_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
5,GBM_grid_1_AutoML_20190426_033319_model_12,0.750187,0.514211,0.292837,0.418036,0.174754
6,GBM_1_AutoML_20190426_033319,0.749503,0.530475,0.295306,0.425634,0.181164
7,GBM_grid_1_AutoML_20190426_033319_model_1,0.745588,0.512969,0.297688,0.416552,0.173516
8,GBM_grid_1_AutoML_20190426_033319_model_4,0.745523,0.515055,0.289951,0.41692,0.173822
9,GBM_5_AutoML_20190426_033319,0.74417,0.516124,0.303763,0.417484,0.174293
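
The leaderboard above is ordered by `auc`, but a different metric can put a different model first. The sketch below uses the top five rows copied from the table and picks the best model under three metrics; the tuple layout and variable names are assumptions made for illustration.

```python
# Top five leaderboard rows copied from the table above:
# (model_id, auc, logloss, mean_per_class_error)
rows = [
    ("GBM_grid_1_AutoML_20190426_033319_model_3", 0.770677, 0.503864, 0.288663),
    ("GBM_grid_1_AutoML_20190426_033319_model_7", 0.762357, 0.503831, 0.284553),
    ("GBM_1_AutoML_20190426_033309", 0.760191, 0.520464, 0.291088),
    ("GLM_grid_1_AutoML_20190426_033309_model_1", 0.751699, 0.503103, 0.2955),
    ("GLM_grid_1_AutoML_20190426_033319_model_1", 0.751699, 0.503103, 0.2955),
]

# Higher is better for AUC; lower is better for logloss and per-class error.
best_by_auc = max(rows, key=lambda r: r[1])[0]
best_by_logloss = min(rows, key=lambda r: r[2])[0]  # ties keep the first row
best_by_error = min(rows, key=lambda r: r[3])[0]

print("auc:", best_by_auc)
print("logloss:", best_by_logloss)
print("mean_per_class_error:", best_by_error)
```

Among the rows shown, a GBM leads on AUC and on mean per-class error, while a GLM has the lowest logloss, so the choice of ranking metric should be stated alongside any claim about the best model.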


In [21]:
# Record the number of models on the leaderboard
length = len(aml4_leaderboard_df)
meta_data["models_generated"] = length

In [22]:
# save leaderboard
leaderboard_stats=run_id+'_300_leaderboard.csv'
aml4_leaderboard_df.to_csv(leaderboard_stats)

In [23]:
aml4_leaderboard_df=aml4.leaderboard.as_data_frame()
model4_set=aml4_leaderboard_df['model_id']
model4_set

0      GBM_grid_1_AutoML_20190426_033319_model_3
1      GBM_grid_1_AutoML_20190426_033319_model_7
2                   GBM_1_AutoML_20190426_033309
3      GLM_grid_1_AutoML_20190426_033309_model_1
4      GLM_grid_1_AutoML_20190426_033319_model_1
5     GBM_grid_1_AutoML_20190426_033319_model_12
6                   GBM_1_AutoML_20190426_033319
7      GBM_grid_1_AutoML_20190426_033319_model_1
8      GBM_grid_1_AutoML_20190426_033319_model_4
9                   GBM_5_AutoML_20190426_033319
10    GBM_grid_1_AutoML_20190426_033319_model_11
11                  GBM_4_AutoML_20190426_033319
12                  GBM_2_AutoML_20190426_033319
13     GBM_grid_1_AutoML_20190426_033319_model_2
14    GBM_grid_1_AutoML_20190426_033319_model_10
15                  GBM_3_AutoML_20190426_033319
16                  DRF_1_AutoML_20190426_033309
17                  GBM_2_AutoML_20190426_033309
18     GBM_grid_1_AutoML_20190426_033319_model_6
19                  XRT_1_AutoML_20190426_033319
20                  

Retrieving the parameters of each model and storing them in a JSON file

In [24]:
# Save the parameters of every model on the leaderboard
for model_id in model4_set:
    mod_best=h2o.get_model(model_id)
    parameters = mod_best.params
    n= str(model_id)+'__300'
    dict_to_json(parameters,n)

In [25]:
# Update and save meta data
n=run_id+'_300'+'_meta_data.json'
dict_to_json(meta_data,n)

In [26]:
meta_data

{'start_time': 1556263975.8294196,
 'target': 'Dataset',
 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder',
 'data_path': None,
 'test_path': None,
 'max_models': None,
 'run_time': 300,
 'run_id': 'wnRs4lzNR',
 'scale': False,
 'classification': True,
 'model_path': None,
 'balance': False,
 'balance_threshold': 0.2,
 'project': None,
 'end_time': 1556264020.5950823,
 'execution_time': 44.7656626701355,
 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\wnRs4lzNR',
 'nthreads': 1,
 'min_mem_size': 0,
 'analysis': 0,
 'X': ['Age',
  'Gender',
  'Total_Bilirubin',
  'Direct_Bilirubin',
  'Alkaline_Phosphotase',
  'Alamine_Aminotransferase',
  'Aspartate_Aminotransferase',
  'Total_Protiens',
  'Albumin',
  'Albumin_and_Globulin_Ratio'],
 'models_generated': 25}

In [27]:
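# mod_best is the last model processed by the loop above, not the leaderboard leader.
# coef_norm is a method; without parentheses, print shows the model summary and the bound method.
mods=mod_best.coef_norm
print(mods)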

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_grid_1_AutoML_20190426_033319_model_5


ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.20052044842921568
RMSE: 0.44779509647741306
LogLoss: 0.5894901589588553
Mean Per-Class Error: 0.1374582565637955
AUC: 0.9392489060340856
pr_auc: 0.8706669682403423
Gini: 0.8784978120681712
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2915087856939063: 


0,1,2,3,4
,1.0,2.0,Error,Rate
1,391.0,25.0,0.0601,(25.0/416.0)
2,43.0,124.0,0.2575,(43.0/167.0)
Total,434.0,149.0,0.1166,(68.0/583.0)


Maximum Metrics: Maximum metrics at their respective thresholds



0,1,2,3
metric,threshold,value,idx
max f1,0.2915088,0.7848101,115.0
max f2,0.2881433,0.8472687,171.0
max f0point5,0.2921180,0.8345120,105.0
max accuracy,0.2921180,0.8867925,105.0
max precision,0.2999945,1.0,0.0
max recall,0.2825824,1.0,287.0
max specificity,0.2999945,1.0,0.0
max absolute_mcc,0.2921180,0.7134960,105.0
max min_per_class_accuracy,0.2888808,0.8533654,154.0
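
The Error column of the training-data confusion matrix and the max f1 value can be recomputed from the four counts. This is a minimal sketch; it assumes class 2 is the positive class, an assumption that reproduces the reported f1.

```python
# Counts copied from the training-data confusion matrix above
# (outer keys are actual classes, inner keys are predicted classes).
matrix = {
    1: {1: 391, 2: 25},
    2: {1: 43, 2: 124},
}


def class_error(actual):
    """Share of rows of one actual class that were assigned to the other class."""
    row = matrix[actual]
    wrong = sum(count for predicted, count in row.items() if predicted != actual)
    return wrong / sum(row.values())


total = sum(sum(row.values()) for row in matrix.values())
total_error = (matrix[1][2] + matrix[2][1]) / total

# Assumption: class 2 is treated as the positive class.
tp, fp, fn = matrix[2][2], matrix[1][2], matrix[2][1]
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(class_error(1), 4), round(class_error(2), 4), round(total_error, 4))
print(round(f1, 7))
```

The results agree with the Error column (0.0601, 0.2575, 0.1166) and with the max f1 of 0.7848101 listed above.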


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.64 %



0,1,2,3,4,5,6,7,8,9,10,11,12,13
,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.2983277,3.4910180,3.4910180,1.0,0.2990308,1.0,0.2990308,0.0359281,0.0359281,249.1017964,249.1017964
,2,0.0205832,0.2976581,3.4910180,3.4910180,1.0,0.2980337,1.0,0.2985322,0.0359281,0.0718563,249.1017964,249.1017964
,3,0.0308748,0.2971469,3.4910180,3.4910180,1.0,0.2973459,1.0,0.2981368,0.0359281,0.1077844,249.1017964,249.1017964
,4,0.0411664,0.2968343,3.4910180,3.4910180,1.0,0.2969591,1.0,0.2978424,0.0359281,0.1437126,249.1017964,249.1017964
,5,0.0514580,0.2966422,3.4910180,3.4910180,1.0,0.2967251,1.0,0.2976189,0.0359281,0.1796407,249.1017964,249.1017964
,6,0.1012007,0.2947305,3.3706380,3.4318482,0.9655172,0.2955868,0.9830508,0.2966201,0.1676647,0.3473054,237.0638034,243.1848168
,7,0.1509434,0.2938776,2.8891183,3.2529940,0.8275862,0.2943240,0.9318182,0.2958634,0.1437126,0.4910180,188.9118315,225.2994012
,8,0.2006861,0.2929045,2.6483585,3.1031271,0.7586207,0.2934072,0.8888889,0.2952546,0.1317365,0.6227545,164.8358456,210.3127079
,9,0.3001715,0.2902359,1.6251291,2.6132763,0.4655172,0.2916352,0.7485714,0.2940550,0.1616766,0.7844311,62.5129052,161.3276305
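
The `lift` and `gain` columns in the table above follow from the overall response rate. The sketch below assumes the base rate is 167 positive rows out of 583, the class 2 total in the confusion matrix; the function names are illustrative.

```python
# Overall response rate: class 2 rows divided by all rows.
positives, total = 167, 583
base_rate = positives / total


def lift(response_rate):
    """How many times more positives a group holds than a random sample would."""
    return response_rate / base_rate


def gain_percent(response_rate):
    """Lift expressed as a percentage improvement over the base rate."""
    return (lift(response_rate) - 1) * 100


print(round(base_rate * 100, 2))      # average response rate in percent
print(round(lift(1.0), 4))            # group 1: every row is positive
print(round(gain_percent(1.0), 4))
print(round(lift(0.9655172), 4))      # group 6: response rate from the table
```

A group whose rows are all positive reaches the maximum possible lift of 583/167, which is why groups 1 to 5 share the same value.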




ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.20290388717585647
RMSE: 0.4504485399863746
LogLoss: 0.5952747974969533
Mean Per-Class Error: 0.3746329456471672
AUC: 0.6529249193919853
pr_auc: 0.36870760586396106
Gini: 0.30584983878397054
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.27993274011344466: 


0,1,2,3,4
,1.0,2.0,Error,Rate
1,113.0,303.0,0.7284,(303.0/416.0)
2,6.0,161.0,0.0359,(6.0/167.0)
Total,119.0,464.0,0.53,(309.0/583.0)


Maximum Metrics: Maximum metrics at their respective thresholds



0,1,2,3
metric,threshold,value,idx
max f1,0.2799327,0.5103011,312.0
max f2,0.2799327,0.7111307,312.0
max f0point5,0.2875480,0.4372093,173.0
max accuracy,0.3058468,0.7118353,0.0
max precision,0.2998547,0.4285714,24.0
max recall,0.2749926,1.0,371.0
max specificity,0.3058468,0.9975962,0.0
max absolute_mcc,0.2799327,0.2643893,312.0
max min_per_class_accuracy,0.2866435,0.6107784,194.0


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.64 %



0,1,2,3,4,5,6,7,8,9,10,11,12,13
,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.3043622,0.5818363,0.5818363,0.1666667,0.3053452,0.1666667,0.3053452,0.0059880,0.0059880,-41.8163673,-41.8163673
,2,0.0205832,0.3029111,1.7455090,1.1636727,0.5,0.3037590,0.3333333,0.3045521,0.0179641,0.0239521,74.5508982,16.3672655
,3,0.0308748,0.3015517,0.5818363,0.9697272,0.1666667,0.3023653,0.2777778,0.3038232,0.0059880,0.0299401,-41.8163673,-3.0272788
,4,0.0411664,0.3008025,2.9091816,1.4545908,0.8333333,0.3012313,0.4166667,0.3031752,0.0299401,0.0598802,190.9181637,45.4590818
,5,0.0514580,0.2997372,1.1636727,1.3964072,0.3333333,0.3000932,0.4,0.3025588,0.0119760,0.0718563,16.3672655,39.6407186
,6,0.1012007,0.2971396,1.0834194,1.2425657,0.3103448,0.2983089,0.3559322,0.3004699,0.0538922,0.1257485,8.3419368,24.2565716
,7,0.1509434,0.2951599,1.0834194,1.1901198,0.3103448,0.2959984,0.3409091,0.2989963,0.0538922,0.1796407,8.3419368,19.0119760
,8,0.2006861,0.2928795,1.9260789,1.3725370,0.5517241,0.2941728,0.3931624,0.2978008,0.0958084,0.2754491,92.6078877,37.2536977
,9,0.3001715,0.2895204,1.2037993,1.3166125,0.3448276,0.2909939,0.3771429,0.2955448,0.1197605,0.3952096,20.3799298,31.6612489



Cross-Validation Metrics Summary: 


0,1,2,3,4,5,6,7
,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6381521,0.0507575,0.6410257,0.6324787,0.5982906,0.7672414,0.5517241
auc,0.7345716,0.0397529,0.7482853,0.7209302,0.6582376,0.8305556,0.7148494
err,0.3618479,0.0507575,0.3589744,0.3675214,0.4017094,0.2327586,0.4482759
err_count,42.2,5.9228373,42.0,43.0,47.0,27.0,52.0
f0point5,0.4902629,0.0526814,0.5063291,0.4662379,0.4078014,0.625,0.4459459
f1,0.5813351,0.0415618,0.6037736,0.5742574,0.4946237,0.6746988,0.5593221
f2,0.7212971,0.0331138,0.7476636,0.7474227,0.6284153,0.7329843,0.75
lift_top_group,1.0408181,0.6041802,0.0,1.8870968,0.0,1.6111112,1.7058823
logloss,0.5953192,0.0134694,0.6142051,0.5758563,0.5699882,0.6153972,0.6011494
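
The `mean` column of the summary above is the plain average of the five fold values. A minimal sketch with the accuracy and AUC rows copied from the table:

```python
from statistics import mean

# Per-fold values copied from the cross-validation summary above (cv_1 to cv_5).
fold_accuracy = [0.6410257, 0.6324787, 0.5982906, 0.7672414, 0.5517241]
fold_auc = [0.7482853, 0.7209302, 0.6582376, 0.8305556, 0.7148494]

mean_accuracy = mean(fold_accuracy)
mean_auc = mean(fold_auc)

print(round(mean_accuracy, 7))
print(round(mean_auc, 7))
```

The fold-averaged AUC (about 0.7346) is not the same quantity as the AUC of 0.6529 in the cross-validation block earlier in this output. That figure appears to be computed on the combined holdout predictions of all folds rather than averaged per fold, so the two need not agree.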


Scoring History: 


0,1,2,3,4,5,6,7,8,9
,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error
,2019-04-26 03:33:33,3.202 sec,0.0,0.4521019,0.5989418,0.5,0.0,1.0,0.7135506
,2019-04-26 03:33:33,3.217 sec,5.0,0.4513670,0.5973186,0.9108634,0.8092588,3.4910180,0.1680961
,2019-04-26 03:33:33,3.237 sec,10.0,0.4506510,0.5957412,0.9239694,0.8427321,3.4910180,0.1509434
,2019-04-26 03:33:33,3.249 sec,15.0,0.4499062,0.5941047,0.9337359,0.8565656,3.4910180,0.1389365
,2019-04-26 03:33:33,3.262 sec,20.0,0.4492057,0.5925690,0.9406236,0.8726717,3.4910180,0.1149228
,2019-04-26 03:33:33,3.275 sec,25.0,0.4484838,0.5909917,0.9391553,0.8712559,3.4910180,0.1063465
,2019-04-26 03:33:33,3.288 sec,30.0,0.4477951,0.5894902,0.9392489,0.8706670,3.4910180,0.1166381


Variable Importances: 


0,1,2,3
variable,relative_importance,scaled_importance,percentage
Total_Bilirubin,272.5380249,1.0,0.2494325
Alkaline_Phosphotase,166.2353821,0.6099530,0.1521421
Alamine_Aminotransferase,142.9506836,0.5245165,0.1308315
Direct_Bilirubin,128.7798767,0.4725208,0.1178620
Age,116.1490021,0.4261754,0.1063020
Aspartate_Aminotransferase,73.0823517,0.2681547,0.0668865
Albumin,66.6127014,0.2444162,0.0609653
Albumin_and_Globulin_Ratio,57.3744545,0.2105191,0.0525103
Total_Protiens,55.7629128,0.2046060,0.0510354
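
The `scaled_importance` column is each variable's relative importance divided by the largest one. The sketch below recomputes it from the nine rows shown above; the `percentage` column is left out because it depends on the total over all predictors, and only nine of them are listed here.

```python
# Relative importances copied from the table above.
relative_importance = {
    "Total_Bilirubin": 272.5380249,
    "Alkaline_Phosphotase": 166.2353821,
    "Alamine_Aminotransferase": 142.9506836,
    "Direct_Bilirubin": 128.7798767,
    "Age": 116.1490021,
    "Aspartate_Aminotransferase": 73.0823517,
    "Albumin": 66.6127014,
    "Albumin_and_Globulin_Ratio": 57.3744545,
    "Total_Protiens": 55.7629128,
}

# Scale by the largest value so the top variable gets 1.0.
largest = max(relative_importance.values())
scaled = {name: value / largest for name, value in relative_importance.items()}

# Rank variables from most to least important.
ranked = sorted(scaled, key=scaled.get, reverse=True)

for name in ranked:
    print(f"{name}: {scaled[name]:.7f}")
```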


<bound method ModelBase.coef_norm of >


# Conclusion

1. Models were generated with H2OAutoML for a runtime of 300 seconds.

2. A leaderboard listing the best models was obtained.

3. The models are compared on metrics such as RMSE, MSE, AUC and logloss.

4. A GBM model (GBM_grid_1_AutoML_20190426_033319_model_3, AUC 0.770677) ranks first on the leaderboard.

# Contribution

Selected a dataset and ran H2O AutoML on it to generate a leaderboard of the best models.

# Citations

https://github.com/prabhuSub/Hyperparamter-Samples
    
https://machinelearningmastery.com/vector-norms-machine-learning/
    
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html?highlight=hyperparameters#supported-grid-search-hyperparameters


# License

Copyright 2019 Manogjna Potluri 

Copyright 2019 Manvitha Jagadam


Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
