# Abstract

Hyperparameters matter in data science because they directly control the behaviour of the training algorithm and have a significant impact on model performance. Finding good hyperparameter values by hand is a strenuous task. The aim of this project is to make that process easier and to determine which hyperparameters are important for a given dataset. H2O AutoML is used to achieve this. Models are generated for runtimes of 300, 500, 800, 1000 and 1200 seconds.

In [2]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
import random, os, sys
from datetime import datetime
import time

In [3]:
# Loading the data set using pandas
df=pd.read_csv("indian_liver_patient.csv", sep=',')

In [4]:
df.head()

Unnamed: 0,Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
0,65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
1,62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
2,62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
3,58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
4,72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1


In [5]:
df.describe()

Unnamed: 0,Age,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
count,583.0,583.0,583.0,583.0,583.0,583.0,583.0,583.0,579.0,583.0
mean,44.746141,3.298799,1.486106,290.576329,80.713551,109.910806,6.48319,3.141852,0.947064,1.286449
std,16.189833,6.209522,2.808498,242.937989,182.620356,288.918529,1.085451,0.795519,0.319592,0.45249
min,4.0,0.4,0.1,63.0,10.0,10.0,2.7,0.9,0.3,1.0
25%,33.0,0.8,0.2,175.5,23.0,25.0,5.8,2.6,0.7,1.0
50%,45.0,1.0,0.3,208.0,35.0,42.0,6.6,3.1,0.93,1.0
75%,58.0,2.6,1.3,298.0,60.5,87.0,7.2,3.8,1.1,2.0
max,90.0,75.0,19.7,2110.0,2000.0,4929.0,9.6,5.5,2.8,2.0


# data cleaning

In [6]:
df.shape

(583, 11)

In [7]:
#To check the data types
df.dtypes

Age                             int64
Gender                         object
Total_Bilirubin               float64
Direct_Bilirubin              float64
Alkaline_Phosphotase            int64
Alamine_Aminotransferase        int64
Aspartate_Aminotransferase      int64
Total_Protiens                float64
Albumin                       float64
Albumin_and_Globulin_Ratio    float64
Dataset                         int64
dtype: object

checking for nulls

In [8]:
# Count the missing (null) values in each column
df.isnull().sum()

Age                           0
Gender                        0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    4
Dataset                       0
dtype: int64

filling the null values with median

In [9]:
# Fill the missing ratio values with the column median and write the result back to df
ratio_col = 'Albumin_and_Globulin_Ratio'
df[ratio_col] = df[ratio_col].fillna(df[ratio_col].median())
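
A minimal sketch of median imputation on a made-up column (the toy values are assumptions for illustration, not rows from the liver dataset). It shows that the filled column has to be assigned back, since `fillna` returns a new Series by default:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"ratio": [0.9, np.nan, 0.4, 1.0, np.nan]})

median = toy["ratio"].median()          # NaN values are skipped when computing the median
toy["ratio"] = toy["ratio"].fillna(median)

print(median)
print(toy["ratio"].isnull().sum())
```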

# H2O

In [1]:
import h2o
from h2o.automl import H2OAutoML
import random, os, sys
from datetime import datetime
import pandas as pd
import logging
import csv
import optparse
import time
import json
from distutils.util import strtobool
import psutil

import warnings
warnings.filterwarnings('ignore')

In [2]:
port_no=random.randint(5555,55555)
h2o.init(strict_version_check=False,min_mem_size_GB=5,port=port_no)

Checking whether there is an H2O instance running at http://localhost:21833 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
  Starting server from C:\Users\Manvi\Anaconda3\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\Manvi\AppData\Local\Temp\tmpye4ugl9g
  JVM stdout: C:\Users\Manvi\AppData\Local\Temp\tmpye4ugl9g\h2o_Manvi_started_from_python.out
  JVM stderr: C:\Users\Manvi\AppData\Local\Temp\tmpye4ugl9g\h2o_Manvi_started_from_python.err
  Server is running at http://127.0.0.1:21833
Connecting to H2O server at http://127.0.0.1:21833 ... successful.


H2O cluster uptime:,02 secs
H2O cluster timezone:,America/New_York
H2O data parsing timezone:,UTC
H2O cluster version:,3.24.0.1
H2O cluster version age:,25 days
H2O cluster name:,H2O_from_python_Manvi_znvqu5
H2O cluster total nodes:,1
H2O cluster free memory:,4.792 Gb
H2O cluster total cores:,8
H2O cluster allowed cores:,8
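
Picking a port with `random.randint` can land on one that is already in use. One possible alternative, sketched here with only the standard library, asks the operating system for a currently free port. `find_free_port` is a hypothetical helper, not part of H2O:

```python
import socket

def find_free_port():
    # Binding to port 0 lets the OS choose an unused port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port_no = find_free_port()
print(port_no)
```

The returned number could then be passed as `port=port_no` to `h2o.init`. Another process may still claim the port in between, so this reduces collisions rather than ruling them out.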


In [3]:
#importing data to the server
df = h2o.import_file(path="indian_liver_patient.csv")

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [4]:
df.head()

Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1
46,Male,1.8,0.7,208,19,14,7.6,4.4,1.3,1
26,Female,0.9,0.2,154,16,12,7.0,3.5,1.0,1
29,Female,0.9,0.3,202,14,11,6.7,3.6,1.1,1
17,Male,0.9,0.3,202,22,19,7.4,4.1,1.2,2
55,Male,0.7,0.2,290,53,58,6.8,3.4,1.0,1




In [5]:
df.isna()

isNA(Age),isNA(Gender),isNA(Total_Bilirubin),isNA(Direct_Bilirubin),isNA(Alkaline_Phosphotase),isNA(Alamine_Aminotransferase),isNA(Aspartate_Aminotransferase),isNA(Total_Protiens),isNA(Albumin),isNA(Albumin_and_Globulin_Ratio),isNA(Dataset)
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0




In [6]:
target = 'Dataset'
run_time=500
pct_memory=0.5
server_path=None 
data_path=None
all_variables=None
test_path=None
model_path=None
nthreads=1 
name=None 
virtual_memory=psutil.virtual_memory()
min_mem_size=int(round(int(pct_memory*virtual_memory.available)/1073741824,0))
run_id='SOME_ID_20180617_221529' # Just some arbitrary ID
classification=True
scale=False
max_models=None
balance_y=False # balance_classes=balance_y
balance_threshold=0.2
project ="automl_test"
analysis=0
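
The `min_mem_size` expression converts a fraction of the available memory from bytes to whole gibibytes (1073741824 = 2**30). A small sketch of that arithmetic, with a hypothetical helper `mem_gb` and made-up byte counts, shows how the result rounds to 0 when less than about 1 GiB is available:

```python
GIB = 1073741824  # 2**30 bytes

def mem_gb(available_bytes, pct_memory=0.5):
    # Same arithmetic as the settings cell: fraction of available memory, in whole GiB
    return int(round(int(pct_memory * available_bytes) / GIB, 0))

print(mem_gb(16 * GIB))        # 8
print(mem_gb(900 * 1024**2))   # 0
```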

defining functions

In [7]:
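def alphabet(n):
  # Build a random alphanumeric string of length n (used as the run ID)
  alpha='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
  result=''   # avoid shadowing the built-in str
  r=len(alpha)-1
  while len(result)<n:
    i=random.randint(0,r)
    result+=alpha[i]
  return result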


def set_meta_data(analysis,run_id,server,data,test,model_path,target,run_time,classification,scale,model,balance,balance_threshold,name,path,nthreads,min_mem_size):
  m_data={}
  m_data['start_time'] = time.time()
  m_data['target']=target
  m_data['server_path']=server
  m_data['data_path']=data 
  m_data['test_path']=test
  m_data['max_models']=model
  m_data['run_time']=run_time
  m_data['run_id'] =run_id
  m_data['scale']=scale
  m_data['classification']=classification
  m_data['model_path']=model_path
  m_data['balance']=balance
  m_data['balance_threshold']=balance_threshold
  m_data['project'] =name
  m_data['end_time'] = time.time()
  m_data['execution_time'] = 0.0
  m_data['run_path'] =path
  m_data['nthreads'] = nthreads
  m_data['min_mem_size'] = min_mem_size
  m_data['analysis'] = analysis
  return m_data


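def automl(maxruntime,X,Y,df):
    # Run AutoML for at most maxruntime seconds, skipping deep learning and stacked ensembles
    aml = H2OAutoML(max_runtime_secs=maxruntime,exclude_algos = ['DeepLearning', 'StackedEnsemble'])
    aml.train(x=X,y=Y,training_frame=df)  # use the Y argument, not the global y
    return aml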


def dict_to_json(dct,n):
    # Write dct to the file n as indented JSON; the with-block closes the file even on error
    j = json.dumps(dct, indent=4)
    with open(n, 'w') as f:
        print(j, file=f)
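
For comparison, the same kind of random alphanumeric ID can be sketched with standard-library helpers. `random_id` is a hypothetical name. Like `alphabet`, it produces random IDs that are very unlikely, but not guaranteed, to be unique:

```python
import random
import string

def random_id(n, rng=random):
    # Draw n characters uniformly from digits and ASCII letters (62 symbols)
    chars = string.digits + string.ascii_letters
    return ''.join(rng.choices(chars, k=n))

run_id_example = random_id(9)
print(run_id_example)
```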

generating a random run ID for every runtime (random, so collisions are unlikely but not impossible)

In [8]:
run_id=alphabet(9)
if server_path is None:
  server_path=os.path.abspath(os.curdir)
os.chdir(server_path) 
run_dir = os.path.join(server_path,run_id)
os.mkdir(run_dir)
os.chdir(run_dir)    

# run_id to std out
print ("500_"+run_id)

500_hBs8QswFL


In [9]:
# meta data
meta_data = set_meta_data(analysis, run_id,server_path,data_path,test_path,model_path,target,run_time,classification,scale,max_models,balance_y,balance_threshold,name,run_dir,nthreads,min_mem_size)
print(meta_data)

{'start_time': 1556264280.595097, 'target': 'Dataset', 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder', 'data_path': None, 'test_path': None, 'max_models': None, 'run_time': 500, 'run_id': 'hBs8QswFL', 'scale': False, 'classification': True, 'model_path': None, 'balance': False, 'balance_threshold': 0.2, 'project': None, 'end_time': 1556264280.595097, 'execution_time': 0.0, 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\hBs8QswFL', 'nthreads': 1, 'min_mem_size': 0, 'analysis': 0}


In [10]:
y = target
X=[name for name in df.columns if name != y]
print(X)
print(y)

['Age', 'Gender', 'Total_Bilirubin', 'Direct_Bilirubin', 'Alkaline_Phosphotase', 'Alamine_Aminotransferase', 'Aspartate_Aminotransferase', 'Total_Protiens', 'Albumin', 'Albumin_and_Globulin_Ratio']
Dataset


In [11]:
meta_data['X']=X  
model_start_time = time.time()

In [12]:
if analysis == 3:
  classification=False
elif analysis == 2:
  classification=True
elif analysis == 1:
  classification=True

problem type - the dependent variable is of classification type

In [13]:
if classification:
    df[y] = df[y].asfactor()

In [14]:
classification=True
if classification:
    print(df[y].levels())

[['1', '2']]


# Runtime: 500sec

In [15]:
aml11 = automl(500,X,y,df)

AutoML progress: |████████████████████████████████████████████████████████| 100%


In [16]:
meta_data['run_time'] = 500
meta_data['end_time'] = time.time()
meta_data['execution_time'] = meta_data['end_time'] - meta_data['start_time']

generating leaderboard

In [17]:
aml11.leaderboard

model_id,auc,logloss,mean_per_class_error,rmse,mse
GBM_grid_1_AutoML_20190426_033825_model_6,0.759946,1.36508,0.285496,0.481891,0.232219
GBM_grid_1_AutoML_20190426_033825_model_15,0.752353,0.507743,0.300639,0.413897,0.171311
GLM_grid_1_AutoML_20190426_033825_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
GBM_grid_1_AutoML_20190426_033825_model_11,0.750727,0.616482,0.31209,0.443597,0.196778
GBM_grid_1_AutoML_20190426_033825_model_8,0.746042,0.589001,0.305706,0.442185,0.195527
GBM_5_AutoML_20190426_033825,0.745782,0.514145,0.292398,0.416652,0.173599
GBM_grid_1_AutoML_20190426_033825_model_21,0.74525,0.523523,0.298408,0.419545,0.176018
GBM_grid_1_AutoML_20190426_033825_model_14,0.740968,0.521504,0.296047,0.42118,0.177393
GBM_1_AutoML_20190426_033825,0.740241,0.536871,0.294104,0.426313,0.181743
GBM_grid_1_AutoML_20190426_033825_model_17,0.739802,0.543169,0.315739,0.425547,0.18109




In [18]:
aml11_leaderboard_df=aml11.leaderboard.as_data_frame()
aml11_leaderboard_df

Unnamed: 0,model_id,auc,logloss,mean_per_class_error,rmse,mse
0,GBM_grid_1_AutoML_20190426_033825_model_6,0.759946,1.365079,0.285496,0.481891,0.232219
1,GBM_grid_1_AutoML_20190426_033825_model_15,0.752353,0.507743,0.300639,0.413897,0.171311
2,GLM_grid_1_AutoML_20190426_033825_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
3,GBM_grid_1_AutoML_20190426_033825_model_11,0.750727,0.616482,0.31209,0.443597,0.196778
4,GBM_grid_1_AutoML_20190426_033825_model_8,0.746042,0.589001,0.305706,0.442185,0.195527
5,GBM_5_AutoML_20190426_033825,0.745782,0.514145,0.292398,0.416652,0.173599
6,GBM_grid_1_AutoML_20190426_033825_model_21,0.74525,0.523523,0.298408,0.419545,0.176018
7,GBM_grid_1_AutoML_20190426_033825_model_14,0.740968,0.521504,0.296047,0.42118,0.177393
8,GBM_1_AutoML_20190426_033825,0.740241,0.536871,0.294104,0.426313,0.181743
9,GBM_grid_1_AutoML_20190426_033825_model_17,0.739802,0.543169,0.315739,0.425547,0.18109
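
Which model counts as best depends on the metric. The sketch below copies the first five leaderboard rows from above into a plain pandas frame (model ids shortened for readability) and picks the winner under each metric. auc is maximised, the others are minimised:

```python
import pandas as pd

# First five leaderboard rows from the output above (ids shortened)
lb = pd.DataFrame({
    "model_id": ["GBM_grid_model_6", "GBM_grid_model_15", "GLM_grid_model_1",
                 "GBM_grid_model_11", "GBM_grid_model_8"],
    "auc":      [0.759946, 0.752353, 0.751699, 0.750727, 0.746042],
    "logloss":  [1.365079, 0.507743, 0.503103, 0.616482, 0.589001],
    "rmse":     [0.481891, 0.413897, 0.413658, 0.443597, 0.442185],
}).set_index("model_id")

best = {
    "auc": lb["auc"].idxmax(),          # higher is better
    "logloss": lb["logloss"].idxmin(),  # lower is better
    "rmse": lb["rmse"].idxmin(),        # lower is better
}
print(best)
```

On these rows the GBM with the highest auc also has the highest logloss and rmse, while the GLM wins on both of those, so the choice of ranking metric matters.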


In [19]:
length = len(aml11_leaderboard_df)
length
meta_data["models_generated"] = length

loading the leaderboard into csv file

In [20]:
# save leaderboard
leaderboard_stats=run_id+ '500sec'+ '_leaderboard.csv'
aml11_leaderboard_df.to_csv(leaderboard_stats)

In [21]:
aml11_leaderboard_df=aml11.leaderboard.as_data_frame()
model11_set=aml11_leaderboard_df['model_id']
model11_set

0      GBM_grid_1_AutoML_20190426_033825_model_6
1     GBM_grid_1_AutoML_20190426_033825_model_15
2      GLM_grid_1_AutoML_20190426_033825_model_1
3     GBM_grid_1_AutoML_20190426_033825_model_11
4      GBM_grid_1_AutoML_20190426_033825_model_8
5                   GBM_5_AutoML_20190426_033825
6     GBM_grid_1_AutoML_20190426_033825_model_21
7     GBM_grid_1_AutoML_20190426_033825_model_14
8                   GBM_1_AutoML_20190426_033825
9     GBM_grid_1_AutoML_20190426_033825_model_17
10                  GBM_3_AutoML_20190426_033825
11                  XRT_1_AutoML_20190426_033825
12     GBM_grid_1_AutoML_20190426_033825_model_1
13    GBM_grid_1_AutoML_20190426_033825_model_19
14     GBM_grid_1_AutoML_20190426_033825_model_5
15                  GBM_4_AutoML_20190426_033825
16    GBM_grid_1_AutoML_20190426_033825_model_16
17     GBM_grid_1_AutoML_20190426_033825_model_4
18    GBM_grid_1_AutoML_20190426_033825_model_18
19    GBM_grid_1_AutoML_20190426_033825_model_20
20     GBM_grid_1_Au

loading the obtained parameters into a json file

In [22]:
# Save the hyperparameters of every leaderboard model to its own JSON file
for model_id in model11_set:
    mod_best=h2o.get_model(model_id)   # after the loop this holds the last (lowest-ranked) model
    parameters = mod_best.params
    n= str(model_id)+'__500'
    dict_to_json(parameters,n)
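
The JSON files written here hold every parameter of each model. To focus on the hyperparameters that were actually tuned, one option is to keep only the entries whose value differs from the default. The sketch assumes each entry has the shape `{'default': ..., 'actual': ...}`, which is an assumption about the layout of `model.params`. The sample dictionary and the helper name `changed_params` are made up for illustration:

```python
def changed_params(params):
    # Keep parameters whose actual value differs from the default
    return {name: v["actual"] for name, v in params.items()
            if v.get("actual") != v.get("default")}

# Hypothetical sample in the assumed layout (values invented)
sample = {
    "ntrees":     {"default": 50,  "actual": 30},
    "max_depth":  {"default": 5,   "actual": 5},
    "learn_rate": {"default": 0.1, "actual": 0.001},
}
print(changed_params(sample))
```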

In [23]:
# Update and save meta data
n=run_id+'_meta_data.json'
dict_to_json(meta_data,n)

In [24]:
meta_data

{'start_time': 1556264280.595097,
 'target': 'Dataset',
 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder',
 'data_path': None,
 'test_path': None,
 'max_models': None,
 'run_time': 500,
 'run_id': 'hBs8QswFL',
 'scale': False,
 'classification': True,
 'model_path': None,
 'balance': False,
 'balance_threshold': 0.2,
 'project': None,
 'end_time': 1556264326.9088955,
 'execution_time': 46.31379842758179,
 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\hBs8QswFL',
 'nthreads': 1,
 'min_mem_size': 0,
 'analysis': 0,
 'X': ['Age',
  'Gender',
  'Total_Bilirubin',
  'Direct_Bilirubin',
  'Alkaline_Phosphotase',
  'Alamine_Aminotransferase',
  'Aspartate_Aminotransferase',
  'Total_Protiens',
  'Albumin',
  'Albumin_and_Globulin_Ratio'],
 'models_generated': 29}

generating normalized coefficients

In [25]:
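# coef_norm is a method; without parentheses this prints the bound method,
# whose repr is the model summary shown below rather than a coefficient table
mods=mod_best.coef_norm
print(mods)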

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_grid_1_AutoML_20190426_033825_model_2


ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.20124583489834685
RMSE: 0.4486043188583307
LogLoss: 0.5912540160241964
Mean Per-Class Error: 0.18531926531552279
AUC: 0.9020972478120682
pr_auc: 0.8060860269018478
Gini: 0.8041944956241365
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.29037238827693107: 


,1.0,2.0,Error,Rate
1,367.0,49.0,0.1178,(49.0/416.0)
2,43.0,124.0,0.2575,(43.0/167.0)
Total,410.0,173.0,0.1578,(92.0/583.0)
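
The `max f1` value reported in the Maximum Metrics table can be reproduced from this confusion matrix by treating class 2 as the positive class. A short worked check using the counts above:

```python
# Counts from the training confusion matrix (rows = actual, columns = predicted)
tp = 124   # actual 2, predicted 2
fn = 43    # actual 2, predicted 1
fp = 49    # actual 1, predicted 2
tn = 367   # actual 1, predicted 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
error_rate = (fp + fn) / (tp + fn + fp + tn)

print(round(f1, 7), round(error_rate, 4))
```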


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.2903724,0.7294118,124.0
max f2,0.2844579,0.8023483,253.0
max f0point5,0.2926311,0.7992895,72.0
max accuracy,0.2926311,0.8524871,72.0
max precision,0.2992219,1.0,0.0
max recall,0.2822922,1.0,292.0
max specificity,0.2992219,1.0,0.0
max absolute_mcc,0.2926311,0.6228689,72.0
max min_per_class_accuracy,0.2890572,0.8083832,155.0


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.65 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.2976186,3.4910180,3.4910180,1.0,0.2985919,1.0,0.2985919,0.0359281,0.0359281,249.1017964,249.1017964
,2,0.0205832,0.2968132,3.4910180,3.4910180,1.0,0.2972844,1.0,0.2979381,0.0359281,0.0718563,249.1017964,249.1017964
,3,0.0308748,0.2961875,3.4910180,3.4910180,1.0,0.2964799,1.0,0.2974520,0.0359281,0.1077844,249.1017964,249.1017964
,4,0.0411664,0.2956417,2.9091816,3.3455589,0.8333333,0.2958810,0.9583333,0.2970593,0.0299401,0.1377246,190.9181637,234.5558882
,5,0.0514580,0.2954356,3.4910180,3.3746507,1.0,0.2955413,0.9666667,0.2967557,0.0359281,0.1736527,249.1017964,237.4650699
,6,0.1012007,0.2936175,3.1298782,3.2543388,0.8965517,0.2944476,0.9322034,0.2956212,0.1556886,0.3293413,212.9878175,225.4338780
,7,0.1509434,0.2929193,3.1298782,3.2133234,0.8965517,0.2932331,0.9204545,0.2948342,0.1556886,0.4850299,212.9878175,221.3323353
,8,0.2006861,0.2920515,1.9260789,2.8942628,0.5517241,0.2924828,0.8290598,0.2942514,0.0958084,0.5808383,92.6078877,189.4262757
,9,0.3001715,0.2902829,1.6251291,2.4736356,0.4655172,0.2911991,0.7085714,0.2932397,0.1616766,0.7425150,62.5129052,147.3635586
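
In the gains/lift table, lift is the response rate of a group divided by the overall response rate (167 of 583 rows are class 2, about 28.64 %), and gain is the lift expressed as a percentage improvement over that baseline. A sketch that reproduces the first and fourth rows of the table above:

```python
baseline = 167 / 583   # overall share of class 2, about 0.2864

def lift_and_gain(response_rate):
    # lift: how many times more positives than average; gain: percent above the baseline
    lift = response_rate / baseline
    return lift, (lift - 1) * 100

lift1, gain1 = lift_and_gain(1.0)          # group 1
lift4, gain4 = lift_and_gain(0.8333333)    # group 4
print(round(lift1, 4), round(gain1, 2))
print(round(lift4, 4), round(gain4, 2))
```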




ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.20291466940598216
RMSE: 0.45046050815358074
LogLoss: 0.5953065431245199
Mean Per-Class Error: 0.37301358820819897
AUC: 0.6491896015660986
pr_auc: 0.37935142307949304
Gini: 0.29837920313219723
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2827973498347851: 


,1.0,2.0,Error,Rate
1,148.0,268.0,0.6442,(268.0/416.0)
2,17.0,150.0,0.1018,(17.0/167.0)
Total,165.0,418.0,0.4889,(285.0/583.0)


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.2827973,0.5128205,273.0
max f2,0.2785738,0.703125,324.0
max f0point5,0.2886080,0.4221770,153.0
max accuracy,0.3012545,0.7186964,9.0
max precision,0.3012545,0.6363636,9.0
max recall,0.2751566,1.0,369.0
max specificity,0.3039446,0.9975962,0.0
max absolute_mcc,0.2827973,0.2548954,273.0
max min_per_class_accuracy,0.2866921,0.59375,188.0


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.65 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.3019614,1.1636727,1.1636727,0.3333333,0.3024690,0.3333333,0.3024690,0.0119760,0.0119760,16.3672655,16.3672655
,2,0.0205832,0.3010647,2.9091816,2.0364271,0.8333333,0.3016017,0.5833333,0.3020354,0.0299401,0.0419162,190.9181637,103.6427146
,3,0.0308748,0.3005288,1.1636727,1.7455090,0.3333333,0.3007282,0.5,0.3015996,0.0119760,0.0538922,16.3672655,74.5508982
,4,0.0411664,0.2997881,0.5818363,1.4545908,0.1666667,0.3001772,0.4166667,0.3012440,0.0059880,0.0598802,-41.8163673,45.4590818
,5,0.0514580,0.2993630,1.7455090,1.5127745,0.5,0.2995489,0.4333333,0.3009050,0.0179641,0.0778443,74.5508982,51.2774451
,6,0.1012007,0.2974161,1.2037993,1.3609053,0.3448276,0.2983876,0.3898305,0.2996676,0.0598802,0.1377246,20.3799298,36.0905308
,7,0.1509434,0.2951420,1.0834194,1.2694611,0.3103448,0.2964346,0.3636364,0.2986022,0.0538922,0.1916168,8.3419368,26.9461078
,8,0.2006861,0.2930179,1.8056989,1.4023747,0.5172414,0.2942103,0.4017094,0.2975136,0.0898204,0.2814371,80.5698947,40.2374738
,9,0.3001715,0.2897203,1.2639893,1.3565098,0.3620690,0.2911993,0.3885714,0.2954209,0.1257485,0.4071856,26.3989263,35.6509837



Cross-Validation Metrics Summary: 


,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6656057,0.0437184,0.7264957,0.6495727,0.5726496,0.7413793,0.6379311
auc,0.7456766,0.0320202,0.7362826,0.7344336,0.6957855,0.83125,0.7306313
err,0.3343943,0.0437184,0.2735043,0.3504274,0.4273504,0.2586207,0.3620690
err_count,39.0,5.138093,32.0,41.0,50.0,30.0,42.0
f0point5,0.5074604,0.0480959,0.5701754,0.4785478,0.4088050,0.5970149,0.4827586
f1,0.593398,0.0397912,0.6190476,0.5858586,0.5098040,0.6808510,0.5714286
f2,0.7202908,0.0324319,0.6770833,0.7552083,0.6770833,0.7920792,0.7
lift_top_group,1.6880403,0.7243694,1.625,1.8870968,0.0,3.2222223,1.7058823
logloss,0.5953510,0.0136306,0.6145981,0.5758864,0.5696281,0.615754,0.6008886


Scoring History: 


,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error
,2019-04-26 03:38:35,0.618 sec,0.0,0.4521019,0.5989418,0.5,0.0,1.0,0.7135506
,2019-04-26 03:38:35,0.632 sec,5.0,0.4514937,0.5975981,0.8649960,0.6961833,3.4910180,0.2521441
,2019-04-26 03:38:35,0.643 sec,10.0,0.4509181,0.5963289,0.8895526,0.7918393,3.4910180,0.1629503
,2019-04-26 03:38:35,0.655 sec,15.0,0.4503215,0.5950164,0.8947346,0.8053875,3.4910180,0.1423671
,2019-04-26 03:38:35,0.666 sec,20.0,0.4497376,0.5937340,0.8949217,0.8003669,3.4910180,0.1835334
,2019-04-26 03:38:35,0.677 sec,25.0,0.4491489,0.5924443,0.9010896,0.8094671,3.4910180,0.1612350
,2019-04-26 03:38:35,0.689 sec,30.0,0.4486043,0.5912540,0.9020972,0.8060860,3.4910180,0.1578045


Variable Importances: 


variable,relative_importance,scaled_importance,percentage
Total_Bilirubin,266.8583374,1.0,0.3156220
Alkaline_Phosphotase,115.8033142,0.4339505,0.1369643
Age,90.1130676,0.3376813,0.1065796
Direct_Bilirubin,84.5292435,0.3167570,0.0999755
Alamine_Aminotransferase,74.4399261,0.2789492,0.0880425
Aspartate_Aminotransferase,67.6440277,0.2534829,0.0800048
Total_Protiens,48.5343475,0.1818731,0.0574031
Albumin,44.6886330,0.1674620,0.0528547
Albumin_and_Globulin_Ratio,43.5121841,0.1630535,0.0514633
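
The `scaled_importance` column is each variable's `relative_importance` divided by the largest one. A sketch using three rows from the table above. The `percentage` column is not recomputed here because it is a share of the total over all predictors, and Gender does not appear in the table as shown:

```python
relative = {
    "Total_Bilirubin": 266.8583374,
    "Alkaline_Phosphotase": 115.8033142,
    "Age": 90.1130676,
}

top = max(relative.values())
scaled = {name: value / top for name, value in relative.items()}

for name, value in scaled.items():
    print(name, round(value, 7))
```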


<bound method ModelBase.coef_norm of >


# conclusion

1. Models have been generated through H2OAutoML with a maximum runtime of 500 secs.

2. A leaderboard is obtained listing the best models.

3. Best models are chosen based on metrics like rmse, mse, auc and logloss; the leaderboard above is sorted by auc.

4. A model obtained through GBM is considered the best: it tops the leaderboard with an auc of 0.759946.

# contribution 

Selected a dataset and ran H2O AutoML on it to generate a leaderboard of the best models.

# citations


https://github.com/prabhuSub/Hyperparamter-Samples
https://machinelearningmastery.com/vector-norms-machine-learning/
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html?highlight=hyperparameters#supported-grid-search-hyperparameters
    

# License

Copyright 2019 Manogjna Potluri 
Copyright 2019 Manvitha Jagadam


Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.