# Abstract

Hyperparameters matter in data science because they directly control the behaviour of the training algorithm and have a significant impact on the performance of the model. Finding good hyperparameters is a strenuous task. The aim of this project is to make the process easier and to determine the important hyperparameters for a given dataset. H2O's AutoML is used to achieve this. Models are generated with run-time limits of 300, 500, 800, 1000 and 1200 seconds; this notebook shows the 1000-second run.

In [1]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
import random, os, sys
from datetime import datetime
import time

In [2]:
# Loading the data set using pandas
df=pd.read_csv("indian_liver_patient.csv", sep=',')

In [3]:
df.head()

,Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
0,65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
1,62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
2,62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
3,58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
4,72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1


In [4]:
df.describe()

,Age,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
count,583.0,583.0,583.0,583.0,583.0,583.0,583.0,583.0,579.0,583.0
mean,44.746141,3.298799,1.486106,290.576329,80.713551,109.910806,6.48319,3.141852,0.947064,1.286449
std,16.189833,6.209522,2.808498,242.937989,182.620356,288.918529,1.085451,0.795519,0.319592,0.45249
min,4.0,0.4,0.1,63.0,10.0,10.0,2.7,0.9,0.3,1.0
25%,33.0,0.8,0.2,175.5,23.0,25.0,5.8,2.6,0.7,1.0
50%,45.0,1.0,0.3,208.0,35.0,42.0,6.6,3.1,0.93,1.0
75%,58.0,2.6,1.3,298.0,60.5,87.0,7.2,3.8,1.1,2.0
max,90.0,75.0,19.7,2110.0,2000.0,4929.0,9.6,5.5,2.8,2.0


# Data cleaning

In [5]:
df.shape

(583, 11)

In [6]:
# Check the data type of each column
df.dtypes

Age                             int64
Gender                         object
Total_Bilirubin               float64
Direct_Bilirubin              float64
Alkaline_Phosphotase            int64
Alamine_Aminotransferase        int64
Aspartate_Aminotransferase      int64
Total_Protiens                float64
Albumin                       float64
Albumin_and_Globulin_Ratio    float64
Dataset                         int64
dtype: object

In [7]:
# Count the null values in each column
df.isnull().sum()

Age                           0
Gender                        0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    4
Dataset                       0
dtype: int64

Filling the null values with the column median

In [8]:
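# Replace the 4 missing ratios with the column median.
# fillna returns a new Series, so assign it back instead of using inplace=True
median_ratio = df['Albumin_and_Globulin_Ratio'].median()
df['Albumin_and_Globulin_Ratio'] = df['Albumin_and_Globulin_Ratio'].fillna(median_ratio)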
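For intuition, `fillna` with the column median replaces each missing entry by the median of the observed values (pandas' `median()` skips NaN by default). A minimal pure-Python sketch of the same idea, using `None` to stand in for a missing value; `fill_with_median` is a hypothetical helper, not part of pandas:

```python
from statistics import median


def fill_with_median(values):
    """Replace None entries with the median of the non-missing values."""
    observed = [v for v in values if v is not None]
    m = median(observed)  # the median ignores the missing entries
    return [m if v is None else v for v in values]


# Toy list built from ratios in the first rows of the table above, plus two gaps
print(fill_with_median([0.9, None, 0.74, 1.0, None]))  # → [0.9, 0.9, 0.74, 1.0, 0.9]
```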

# H2O

In [9]:
import h2o
from h2o.automl import H2OAutoML
import json
import os
import random
import time
import psutil

import warnings
warnings.filterwarnings('ignore')

In [10]:
port_no=random.randint(5555,55555)
h2o.init(strict_version_check=False,min_mem_size_GB=5,port=port_no)

Checking whether there is an H2O instance running at http://localhost:39276 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
  Starting server from C:\Users\Manvi\Anaconda3\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\Manvi\AppData\Local\Temp\tmpudkkovc5
  JVM stdout: C:\Users\Manvi\AppData\Local\Temp\tmpudkkovc5\h2o_Manvi_started_from_python.out
  JVM stderr: C:\Users\Manvi\AppData\Local\Temp\tmpudkkovc5\h2o_Manvi_started_from_python.err
  Server is running at http://127.0.0.1:39276
Connecting to H2O server at http://127.0.0.1:39276 ... successful.


H2O cluster uptime:,02 secs
H2O cluster timezone:,America/New_York
H2O data parsing timezone:,UTC
H2O cluster version:,3.24.0.1
H2O cluster version age:,25 days
H2O cluster name:,H2O_from_python_Manvi_jbzc1q
H2O cluster total nodes:,1
H2O cluster free memory:,4.792 Gb
H2O cluster total cores:,8
H2O cluster allowed cores:,8


In [11]:
# Upload the cleaned pandas DataFrame to the H2O server
# (h2o.import_file would re-read the raw CSV and lose the median imputation)
df = h2o.H2OFrame(df)

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [12]:
df.head()

Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
58,Male,1.0,0.4,182,14,20,6.8,3.4,1.0,1
72,Male,3.9,2.0,195,27,59,7.3,2.4,0.4,1
46,Male,1.8,0.7,208,19,14,7.6,4.4,1.3,1
26,Female,0.9,0.2,154,16,12,7.0,3.5,1.0,1
29,Female,0.9,0.3,202,14,11,6.7,3.6,1.1,1
17,Male,0.9,0.3,202,22,19,7.4,4.1,1.2,2
55,Male,0.7,0.2,290,53,58,6.8,3.4,1.0,1




In [20]:
df.isna()

isNA(Age),isNA(Gender),isNA(Total_Bilirubin),isNA(Direct_Bilirubin),isNA(Alkaline_Phosphotase),isNA(Alamine_Aminotransferase),isNA(Aspartate_Aminotransferase),isNA(Total_Protiens),isNA(Albumin),isNA(Albumin_and_Globulin_Ratio),isNA(Dataset)
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0




In [36]:
target = 'Dataset'
run_time=1000
pct_memory=0.5
server_path=None 
data_path=None
all_variables=None
test_path=None
model_path=None
nthreads=1 
name=None 
virtual_memory=psutil.virtual_memory()
# pct_memory of the currently available RAM, converted from bytes to whole GiB
min_mem_size=int(round(int(pct_memory*virtual_memory.available)/1073741824,0))
run_id='SOME_ID_20180617_221529' # Just some arbitrary ID
classification=True
scale=False
max_models=None
balance_y=False # balance_classes=balance_y
balance_threshold=0.2
project ="automl_test"
analysis=0
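The `min_mem_size` line takes `pct_memory` of the RAM that is currently available and converts bytes to whole GiB (1 GiB = 1073741824 bytes). The sketch below wraps the same arithmetic in a hypothetical helper, `min_mem_size_gib`, to make the rounding visible: with `pct_memory=0.5`, a machine with roughly 1 GiB or less free ends up with 0. The value is only stored in the meta data; `h2o.init` above is called with its own `min_mem_size_GB=5`.

```python
GIB = 1073741824  # bytes in one GiB (2**30)


def min_mem_size_gib(available_bytes, pct_memory=0.5):
    """Same arithmetic as the cell above: pct_memory of available RAM, in whole GiB."""
    return int(round(int(pct_memory * available_bytes) / GIB, 0))


print(min_mem_size_gib(8 * GIB))         # → 4
print(min_mem_size_gib(int(0.9 * GIB)))  # → 0
```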

Defining helper functions

In [37]:
def alphabet(n):
    """Return a random alphanumeric string of length n (used as a run ID)."""
    alpha = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    return ''.join(random.choice(alpha) for _ in range(n))


def set_meta_data(analysis, run_id, server, data, test, model_path, target, run_time,
                  classification, scale, model, balance, balance_threshold, name, path,
                  nthreads, min_mem_size):
    """Collect the settings of this run in a dict that is later saved as JSON."""
    m_data = {}
    m_data['start_time'] = time.time()
    m_data['target'] = target
    m_data['server_path'] = server
    m_data['data_path'] = data
    m_data['test_path'] = test
    m_data['max_models'] = model
    m_data['run_time'] = run_time
    m_data['run_id'] = run_id
    m_data['scale'] = scale
    m_data['classification'] = classification
    m_data['model_path'] = model_path
    m_data['balance'] = balance
    m_data['balance_threshold'] = balance_threshold
    m_data['project'] = name
    m_data['end_time'] = time.time()
    m_data['execution_time'] = 0.0
    m_data['run_path'] = path
    m_data['nthreads'] = nthreads
    m_data['min_mem_size'] = min_mem_size
    m_data['analysis'] = analysis
    return m_data


def automl(maxruntime, X, y, df):
    """Train H2O AutoML (deep learning excluded) for at most maxruntime seconds."""
    aml = H2OAutoML(max_runtime_secs=maxruntime, exclude_algos=['DeepLearning'])
    aml.train(x=X, y=y, training_frame=df)
    return aml


def dict_to_json(dct, n):
    """Write dct to the file n as indented JSON."""
    with open(n, 'w') as f:
        json.dump(dct, f, indent=4)

Generating a random ID for this run and creating a working directory named after it

In [38]:
run_id = alphabet(9)
if server_path is None:
    server_path = os.path.abspath(os.curdir)
os.chdir(server_path)
# Create a directory named after the run ID and work inside it
run_dir = os.path.join(server_path, run_id)
os.mkdir(run_dir)
os.chdir(run_dir)

# Print the run ID together with the run-time setting
print(run_id + "_1000sec")

0IwyUgWQK_1000sec
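A random ID is unique only with high probability, and `os.mkdir` raises `FileExistsError` if a directory with the same name already exists. The chance of a clash can be estimated with the birthday bound, p ≈ 1 - exp(-k(k-1)/(2N)), where k is the number of IDs drawn and N = 62^9 is the number of possible 9-character IDs over the 62-symbol alphabet used by `alphabet`. A short sketch; `collision_probability` is a hypothetical helper:

```python
import math


def collision_probability(num_ids, alphabet_size=62, length=9):
    """Birthday-bound estimate of the chance that any two random IDs coincide."""
    space = alphabet_size ** length  # number of distinct IDs (62**9 here)
    exponent = num_ids * (num_ids - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate even for tiny x


# One ID per run-time setting (300, 500, 800, 1000 and 1200 seconds)
print(collision_probability(5))
```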


In [39]:
# meta data
meta_data = set_meta_data(analysis, run_id,server_path,data_path,test_path,model_path,target,run_time,classification,scale,max_models,balance_y,balance_threshold,name,run_dir,nthreads,min_mem_size)
print(meta_data)

{'start_time': 1556263586.0162807, 'target': 'Dataset', 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\MGN5nmDff', 'data_path': None, 'test_path': None, 'max_models': None, 'run_time': 1000, 'run_id': '0IwyUgWQK', 'scale': False, 'classification': True, 'model_path': None, 'balance': False, 'balance_threshold': 0.2, 'project': None, 'end_time': 1556263586.0162807, 'execution_time': 0.0, 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\MGN5nmDff\\0IwyUgWQK', 'nthreads': 1, 'min_mem_size': 0, 'analysis': 0}


In [40]:
y = target
X=[name for name in df.columns if name != y]
print(X)
print(y)

['Age', 'Gender', 'Total_Bilirubin', 'Direct_Bilirubin', 'Alkaline_Phosphotase', 'Alamine_Aminotransferase', 'Aspartate_Aminotransferase', 'Total_Protiens', 'Albumin', 'Albumin_and_Globulin_Ratio']
Dataset


In [41]:
meta_data['X']=X  
model_start_time = time.time()

In [42]:
if analysis == 3:
  classification=False
elif analysis == 2:
  classification=True
elif analysis == 1:
  classification=True

The target variable `Dataset` is categorical (levels 1 and 2), so it is converted to a factor for classification.

In [43]:
if classification:
    df[y] = df[y].asfactor()

In [44]:
classification=True
if classification:
    print(df[y].levels())

[['1', '2']]


# Runtime: 1000sec

In [46]:
aml3 = automl(1000,X,y,df)

AutoML progress: |████████████████████████████████████████████████████████| 100%


In [47]:
meta_data['run_time'] = 1000
meta_data['end_time'] = time.time()
meta_data['execution_time'] = meta_data['end_time'] - meta_data['start_time']

Viewing the leaderboard of the trained models, which AutoML sorts by AUC for binary classification

In [48]:
aml3.leaderboard

model_id,auc,logloss,mean_per_class_error,rmse,mse
GBM_1_AutoML_20190426_032640,0.755722,0.528181,0.30667,0.42215,0.17821
GBM_grid_1_AutoML_20190426_032651_model_7,0.75285,0.540896,0.301733,0.424004,0.179779
GLM_grid_1_AutoML_20190426_032640_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
GLM_grid_1_AutoML_20190426_032651_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
GBM_grid_1_AutoML_20190426_032651_model_12,0.748683,0.510543,0.30752,0.416134,0.173168
GBM_1_AutoML_20190426_032651,0.748316,0.528733,0.302014,0.422315,0.17835
GBM_grid_1_AutoML_20190426_032651_model_11,0.74661,0.513226,0.302431,0.417403,0.174225
DRF_1_AutoML_20190426_032640,0.745617,0.626455,0.310384,0.421702,0.177832
GBM_grid_1_AutoML_20190426_032651_model_17,0.744811,0.565738,0.298408,0.436902,0.190883
GBM_5_AutoML_20190426_032651,0.744674,0.515969,0.297249,0.417489,0.174297




In [49]:
aml3_leaderboard_df=aml3.leaderboard.as_data_frame()
aml3_leaderboard_df

,model_id,auc,logloss,mean_per_class_error,rmse,mse
0,GBM_1_AutoML_20190426_032640,0.755722,0.528181,0.30667,0.42215,0.17821
1,GBM_grid_1_AutoML_20190426_032651_model_7,0.75285,0.540896,0.301733,0.424004,0.179779
2,GLM_grid_1_AutoML_20190426_032640_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
3,GLM_grid_1_AutoML_20190426_032651_model_1,0.751699,0.503103,0.2955,0.413658,0.171113
4,GBM_grid_1_AutoML_20190426_032651_model_12,0.748683,0.510543,0.30752,0.416134,0.173168
5,GBM_1_AutoML_20190426_032651,0.748316,0.528733,0.302014,0.422315,0.17835
6,GBM_grid_1_AutoML_20190426_032651_model_11,0.74661,0.513226,0.302431,0.417403,0.174225
7,DRF_1_AutoML_20190426_032640,0.745617,0.626455,0.310384,0.421702,0.177832
8,GBM_grid_1_AutoML_20190426_032651_model_17,0.744811,0.565738,0.298408,0.436902,0.190883
9,GBM_5_AutoML_20190426_032651,0.744674,0.515969,0.297249,0.417489,0.174297
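The leaderboard is sorted by AUC only, and the other columns do not always agree with it. The sketch below takes the first five rows shown above and picks the best model for each metric (highest AUC, lowest value for the error-type metrics); `best_by_metric` is a hypothetical helper written for this illustration.

```python
# First five rows of the leaderboard above:
# (model_id, auc, logloss, mean_per_class_error, rmse, mse)
ROWS = [
    ("GBM_1_AutoML_20190426_032640", 0.755722, 0.528181, 0.30667, 0.42215, 0.17821),
    ("GBM_grid_1_AutoML_20190426_032651_model_7", 0.75285, 0.540896, 0.301733, 0.424004, 0.179779),
    ("GLM_grid_1_AutoML_20190426_032640_model_1", 0.751699, 0.503103, 0.2955, 0.413658, 0.171113),
    ("GLM_grid_1_AutoML_20190426_032651_model_1", 0.751699, 0.503103, 0.2955, 0.413658, 0.171113),
    ("GBM_grid_1_AutoML_20190426_032651_model_12", 0.748683, 0.510543, 0.30752, 0.416134, 0.173168),
]
METRICS = ["auc", "logloss", "mean_per_class_error", "rmse", "mse"]
HIGHER_IS_BETTER = {"auc"}


def best_by_metric(rows, metrics, higher_is_better):
    """Map each metric to the model_id that scores best on it (first row wins ties)."""
    best = {}
    for col, metric in enumerate(metrics, start=1):
        pick = max if metric in higher_is_better else min
        best[metric] = pick(rows, key=lambda row: row[col])[0]
    return best


for metric, model_id in best_by_metric(ROWS, METRICS, HIGHER_IS_BETTER).items():
    print(f"{metric:>22}: {model_id}")
```

On these rows the GBM leads on AUC, while GLM_grid_1_AutoML_20190426_032640_model_1 has the lowest logloss, mean per-class error, RMSE and MSE.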


In [50]:
# Record how many models AutoML trained
length = len(aml3_leaderboard_df)
meta_data["models_generated"] = length

In [51]:
# save leaderboard
leaderboard_stats=run_id+'_1000_leaderboard.csv'
aml3_leaderboard_df.to_csv(leaderboard_stats)

In [52]:
# IDs of all models on the leaderboard, best first
model3_set = aml3_leaderboard_df['model_id']
model3_set

0                   GBM_1_AutoML_20190426_032640
1      GBM_grid_1_AutoML_20190426_032651_model_7
2      GLM_grid_1_AutoML_20190426_032640_model_1
3      GLM_grid_1_AutoML_20190426_032651_model_1
4     GBM_grid_1_AutoML_20190426_032651_model_12
5                   GBM_1_AutoML_20190426_032651
6     GBM_grid_1_AutoML_20190426_032651_model_11
7                   DRF_1_AutoML_20190426_032640
8     GBM_grid_1_AutoML_20190426_032651_model_17
9                   GBM_5_AutoML_20190426_032651
10     GBM_grid_1_AutoML_20190426_032651_model_3
11    GBM_grid_1_AutoML_20190426_032651_model_20
12    GBM_grid_1_AutoML_20190426_032651_model_19
13    GBM_grid_1_AutoML_20190426_032651_model_21
14     GBM_grid_1_AutoML_20190426_032651_model_5
15     GBM_grid_1_AutoML_20190426_032651_model_2
16                  XRT_1_AutoML_20190426_032651
17    GBM_grid_1_AutoML_20190426_032651_model_13
18                  DRF_1_AutoML_20190426_032651
19                  GBM_2_AutoML_20190426_032651
20                  

Saving the parameters of every model on the leaderboard to its own JSON file

In [53]:
# Save the parameters of every leaderboard model to its own JSON file
for model_id in model3_set:
    mod_best = h2o.get_model(model_id)
    parameters = mod_best.params
    n = str(model_id) + '__1000.json'
    dict_to_json(parameters, n)
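Each saved file holds the dictionary returned by the model's `params` property. Assuming each entry maps a parameter name to a dict with `"default"` and `"actual"` keys (check this against the files your H2O version produces), the hyperparameters AutoML actually changed can be listed by comparing the two. The helper `non_default_params` and the parameter values below are hypothetical, for illustration only.

```python
import json
import os
import tempfile


def non_default_params(path):
    """Return {name: actual} for parameters whose actual value differs from the default.

    Assumes the JSON file maps each parameter name to a dict with
    "default" and "actual" keys.
    """
    with open(path) as f:
        params = json.load(f)
    return {
        name: entry["actual"]
        for name, entry in params.items()
        if isinstance(entry, dict) and entry.get("actual") != entry.get("default")
    }


# Hypothetical parameter values, for illustration only
example = {
    "ntrees": {"default": 50, "actual": 30},
    "max_depth": {"default": 5, "actual": 5},
    "learn_rate": {"default": 0.1, "actual": 0.1},
    "sample_rate": {"default": 1.0, "actual": 0.8},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_model__1000.json")
    with open(path, "w") as f:
        json.dump(example, f, indent=4)
    changed = non_default_params(path)

print(changed)  # → {'ntrees': 30, 'sample_rate': 0.8}
```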

In [54]:
# Update and save meta data
n=run_id+'_1000'+'_meta_data.json'
dict_to_json(meta_data,n)

In [55]:
meta_data

{'start_time': 1556263586.0162807,
 'target': 'Dataset',
 'server_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\MGN5nmDff',
 'data_path': None,
 'test_path': None,
 'max_models': None,
 'run_time': 1000,
 'run_id': '0IwyUgWQK',
 'scale': False,
 'classification': True,
 'model_path': None,
 'balance': False,
 'balance_threshold': 0.2,
 'project': None,
 'end_time': 1556263628.5971706,
 'execution_time': 42.58088994026184,
 'run_path': 'C:\\Users\\Manvi\\Anaconda3\\indian-liver-patient-records\\New folder\\MGN5nmDff\\0IwyUgWQK',
 'nthreads': 1,
 'min_mem_size': 0,
 'analysis': 0,
 'X': ['Age',
  'Gender',
  'Total_Bilirubin',
  'Direct_Bilirubin',
  'Alkaline_Phosphotase',
  'Alamine_Aminotransferase',
  'Aspartate_Aminotransferase',
  'Total_Protiens',
  'Albumin',
  'Albumin_and_Globulin_Ratio'],
 'models_generated': 34}

In [56]:
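# mod_best is the last model fetched in the loop above, not necessarily the leader
# (aml3.leader holds the top-ranked model). coef_norm() only applies to GLM models,
# so print the model summary instead.
print(mod_best)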

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_grid_1_AutoML_20190426_032651_model_10


ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.20300791755604158
RMSE: 0.4505639994007972
LogLoss: 0.5955451923981491
Mean Per-Class Error: 0.26224234223859977
AUC: 0.7980481345002304
pr_auc: 0.5701186056474624
Gini: 0.5960962690004608
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2876321470623677: 


,1.0,2.0,Error,Rate
1,285.0,131.0,0.3149,(131.0/416.0)
2,35.0,132.0,0.2096,(35.0/167.0)
Total,320.0,263.0,0.2847,(166.0/583.0)
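Several of the summary numbers above can be recomputed from the training confusion matrix and from one another: per-class error is the off-diagonal count divided by the row total, the mean per-class error averages those rates, Gini equals 2·AUC - 1, and RMSE is the square root of MSE. A quick check using the printed values:

```python
import math

# Training confusion matrix above (rows: actual class, columns: predicted class)
confusion = {
    "1": {"1": 285, "2": 131},
    "2": {"1": 35, "2": 132},
}

# Error rate per actual class: misclassified rows / all rows of that class
class_error = {
    actual: (sum(preds.values()) - preds[actual]) / sum(preds.values())
    for actual, preds in confusion.items()
}
total_rows = sum(sum(preds.values()) for preds in confusion.values())
total_errors = sum(sum(preds.values()) - preds[a] for a, preds in confusion.items())

overall_error = total_errors / total_rows  # the Total row: 166/583
mean_per_class_error = sum(class_error.values()) / len(class_error)

auc, mse = 0.7980481345002304, 0.20300791755604158  # reported above
gini = 2 * auc - 1
rmse = math.sqrt(mse)

print(class_error)
print(round(overall_error, 4), round(mean_per_class_error, 8))
print(round(gini, 8), round(rmse, 8))
```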


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.2876321,0.6139535,167.0
max f2,0.2836293,0.7502287,284.0
max f0point5,0.2894218,0.5751015,97.0
max accuracy,0.2905630,0.7598628,45.0
max precision,0.2925442,1.0,0.0
max recall,0.2806901,1.0,386.0
max specificity,0.2925442,1.0,0.0
max absolute_mcc,0.2876321,0.4320326,167.0
max min_per_class_accuracy,0.2881212,0.7185629,149.0


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.64 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.2916769,2.3273453,2.3273453,0.6666667,0.2921361,0.6666667,0.2921361,0.0239521,0.0239521,132.7345309,132.7345309
,2,0.0205832,0.2915105,2.9091816,2.6182635,0.8333333,0.2916084,0.75,0.2918723,0.0299401,0.0538922,190.9181637,161.8263473
,3,0.0308748,0.2914106,1.7455090,2.3273453,0.5,0.2914430,0.6666667,0.2917292,0.0179641,0.0718563,74.5508982,132.7345309
,4,0.0411664,0.2913202,2.3273453,2.3273453,0.6666667,0.2913544,0.6666667,0.2916355,0.0239521,0.0958084,132.7345309,132.7345309
,5,0.0514580,0.2911665,2.3273453,2.3273453,0.6666667,0.2912377,0.6666667,0.2915559,0.0239521,0.1197605,132.7345309,132.7345309
,6,0.1012007,0.2905650,2.6483585,2.4851314,0.7586207,0.2908105,0.7118644,0.2911895,0.1317365,0.2514970,164.8358456,148.5131432
,7,0.1509434,0.2901804,1.4445592,2.1422156,0.4137931,0.2904069,0.6136364,0.2909316,0.0718563,0.3233533,44.4559158,114.2215569
,8,0.2006861,0.2897318,1.8056989,2.0588055,0.5172414,0.2899646,0.5897436,0.2906919,0.0898204,0.4131737,80.5698947,105.8805466
,9,0.3001715,0.2888653,1.5649391,1.8951240,0.4482759,0.2893426,0.5428571,0.2902447,0.1556886,0.5688623,56.4939087,89.5124038
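In the gains/lift table, a group's lift is its response rate divided by the average response rate, and gain is (lift - 1) × 100. Here the average response rate is the share of class "2" rows, 167/583 ≈ 28.64%. Recomputing a few entries of the training table:

```python
POSITIVES, TOTAL = 167, 583            # class "2" rows and all rows (confusion matrix above)
avg_response_rate = POSITIVES / TOTAL  # ≈ 0.2864, the "Avg response rate" in the header


def lift_and_gain(response_rate, baseline):
    """Lift relative to the baseline response rate, and gain in percent."""
    lift = response_rate / baseline
    return lift, (lift - 1) * 100


# response_rate of groups 1 and 2, then cumulative_response_rate of group 2
for rate in (0.6666667, 0.8333333, 0.75):
    lift, gain = lift_and_gain(rate, avg_response_rate)
    print(f"rate={rate:.7f}  lift={lift:.7f}  gain={gain:.7f}")
```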




ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.20353498798076544
RMSE: 0.45114852097814245
LogLoss: 0.5968293551795263
Mean Per-Class Error: 0.3873934822662368
AUC: 0.6015373099953938
pr_auc: 0.3517442816326592
Gini: 0.20307461999078757
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.2824158658102547: 


,1.0,2.0,Error,Rate
1,146.0,270.0,0.649,(270.0/416.0)
2,21.0,146.0,0.1257,(21.0/167.0)
Total,167.0,416.0,0.4991,(291.0/583.0)


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.2824159,0.5008576,277.0
max f2,0.2799921,0.6938422,320.0
max f0point5,0.2824159,0.3986892,277.0
max accuracy,0.2987303,0.7135506,1.0
max precision,0.2987303,0.5,1.0
max recall,0.2759622,1.0,386.0
max specificity,0.2990493,0.9975962,0.0
max absolute_mcc,0.2823856,0.2256390,278.0
max min_per_class_accuracy,0.2859404,0.5144231,202.0


Gains/Lift Table: Avg response rate: 28.64 %, avg score: 28.64 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0102916,0.2979950,0.5818363,0.5818363,0.1666667,0.2984656,0.1666667,0.2984656,0.0059880,0.0059880,-41.8163673,-41.8163673
,2,0.0205832,0.2975376,1.7455090,1.1636727,0.5,0.2978423,0.3333333,0.2981539,0.0179641,0.0239521,74.5508982,16.3672655
,3,0.0308748,0.2972454,2.3273453,1.5515635,0.6666667,0.2973603,0.4444444,0.2978894,0.0239521,0.0479042,132.7345309,55.1563540
,4,0.0411664,0.2969389,1.1636727,1.4545908,0.3333333,0.2971200,0.4166667,0.2976970,0.0119760,0.0598802,16.3672655,45.4590818
,5,0.0514580,0.2967761,1.7455090,1.5127745,0.5,0.2968593,0.4333333,0.2975295,0.0179641,0.0778443,74.5508982,51.2774451
,6,0.1012007,0.2955090,1.5649391,1.5384147,0.4482759,0.2962400,0.4406780,0.2968957,0.0778443,0.1556886,56.4939087,53.8414696
,7,0.1509434,0.2942653,1.3241792,1.4678144,0.3793103,0.2949755,0.4204545,0.2962629,0.0658683,0.2215569,32.4179228,46.7814371
,8,0.2006861,0.2930706,0.8426595,1.3128615,0.2413793,0.2936996,0.3760684,0.2956275,0.0419162,0.2634731,-15.7340491,31.2861457
,9,0.3001715,0.2896211,1.0834194,1.2368178,0.3103448,0.2913015,0.3542857,0.2941937,0.1077844,0.3712575,8.3419368,23.6817793



Cross-Validation Metrics Summary: 


,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6980843,0.0255204,0.7264957,0.6837607,0.7008547,0.7413793,0.6379311
auc,0.7522242,0.0217422,0.7493141,0.7659415,0.7088122,0.8010417,0.7360115
err,0.3019157,0.0255204,0.2735043,0.3162393,0.2991453,0.2586207,0.3620690
err_count,35.2,2.946184,32.0,37.0,35.0,30.0,42.0
f0point5,0.5259607,0.0329689,0.5681818,0.4981550,0.4816514,0.5952381,0.4865772
f1,0.5990568,0.0282026,0.6097561,0.5934066,0.5454546,0.6666667,0.58
f2,0.6991461,0.0340805,0.6578947,0.7336956,0.6287425,0.7575757,0.7178218
lift_top_group,1.792061,0.8500546,1.625,3.7741935,1.95,1.6111112,0.0
logloss,0.5968756,0.0139331,0.6160719,0.5768826,0.5707051,0.6182513,0.602467
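The `mean` column of the cross-validation summary is the average over the five folds, and `err` is 1 - `accuracy` within each fold. A short check with the values from the `accuracy` and `err_count` rows (the `sd` column is not reproduced here):

```python
from statistics import mean

# Per-fold values from the cross-validation summary above (cv_1_valid ... cv_5_valid)
fold_accuracy = [0.7264957, 0.6837607, 0.7008547, 0.7413793, 0.6379311]
fold_err_count = [32.0, 37.0, 35.0, 30.0, 42.0]

mean_accuracy = mean(fold_accuracy)
mean_err = mean(1 - acc for acc in fold_accuracy)  # err = 1 - accuracy per fold
mean_err_count = mean(fold_err_count)

print(round(mean_accuracy, 7), round(mean_err, 7), mean_err_count)
```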


Scoring History: 


,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error
,2019-04-26 03:27:03,2.296 sec,0.0,0.4521019,0.5989418,0.5,0.0,1.0,0.7135506
,2019-04-26 03:27:03,2.304 sec,5.0,0.4518615,0.5984103,0.7882456,0.5683848,3.0546407,0.3739280
,2019-04-26 03:27:03,2.311 sec,10.0,0.4516346,0.5979087,0.7889509,0.5585189,2.3273453,0.2881647
,2019-04-26 03:27:03,2.318 sec,15.0,0.4513954,0.5973804,0.7919234,0.5613203,2.9091816,0.2950257
,2019-04-26 03:27:03,2.325 sec,20.0,0.4510953,0.5967180,0.7948598,0.5738246,2.9091816,0.2847341
,2019-04-26 03:27:03,2.332 sec,25.0,0.4508387,0.5961515,0.7969038,0.5710858,2.9091816,0.2915952
,2019-04-26 03:27:03,2.338 sec,30.0,0.4505640,0.5955452,0.7980481,0.5701186,2.3273453,0.2847341


Variable Importances: 


variable,relative_importance,scaled_importance,percentage
Alkaline_Phosphotase,75.9798126,1.0,0.2233020
Total_Bilirubin,56.1800461,0.7394075,0.1651112
Aspartate_Aminotransferase,53.3104439,0.7016396,0.1566775
Direct_Bilirubin,49.1745148,0.6472050,0.1445222
Alamine_Aminotransferase,46.7647018,0.6154885,0.1374398
Age,22.2756271,0.2931782,0.0654673
Albumin_and_Globulin_Ratio,20.2417564,0.2664097,0.0594898
Albumin,10.1361151,0.1334054,0.0297897
Total_Protiens,4.4956541,0.0591691,0.0132126
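In the variable-importance table, `scaled_importance` is each variable's `relative_importance` divided by the largest one (here `Alkaline_Phosphotase`), and `percentage` is its share of the total importance. The percentages listed here sum to about 0.995, so that total appears to include a predictor not shown in this extract; the sketch therefore checks only the scaled column, using the printed values:

```python
# (variable, relative_importance, scaled_importance) from the table above
VARIMP = [
    ("Alkaline_Phosphotase", 75.9798126, 1.0),
    ("Total_Bilirubin", 56.1800461, 0.7394075),
    ("Aspartate_Aminotransferase", 53.3104439, 0.7016396),
    ("Direct_Bilirubin", 49.1745148, 0.6472050),
    ("Alamine_Aminotransferase", 46.7647018, 0.6154885),
    ("Age", 22.2756271, 0.2931782),
    ("Albumin_and_Globulin_Ratio", 20.2417564, 0.2664097),
    ("Albumin", 10.1361151, 0.1334054),
    ("Total_Protiens", 4.4956541, 0.0591691),
]

# Scaled importance = relative importance / largest relative importance
top = max(rel for _, rel, _ in VARIMP)
recomputed = {name: rel / top for name, rel, _ in VARIMP}

for name, _, reported in VARIMP:
    print(f"{name:>28}: reported {reported:.7f}, recomputed {recomputed[name]:.7f}")
```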




# Conclusion

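1. Models were generated with H2OAutoML using `max_runtime_secs=1000`, which caps the AutoML run at 1000 seconds.

2. A leaderboard was obtained listing the trained models.

3. Models are compared using AUC, logloss, mean per-class error, RMSE and MSE; the leaderboard is sorted by AUC.

4. A GBM model, GBM_1_AutoML_20190426_032640, ranks first by AUC (0.7557), although the GLM models have lower logloss, mean per-class error, RMSE and MSE.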

# Contribution

Selected a dataset and ran H2O AutoML on it to generate a leaderboard of the best models.

# Citations

https://github.com/prabhuSub/Hyperparamter-Samples
    
https://machinelearningmastery.com/vector-norms-machine-learning/
    
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html?highlight=hyperparameters#supported-grid-search-hyperparameters


# License

Copyright 2019 Manogjna Potluri 

Copyright 2019 Manvitha Jagadam


Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE
