# H2O

Let's use [H2O AutoML](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) and see what we can build. This looks like stacking, the easy way out: AutoML trains a batch of models and then combines them into stacked ensembles on its own.
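To see what "stacking" means here, below is a minimal sketch of doing it by hand. It uses scikit-learn's `StackingClassifier` on synthetic three-class data. scikit-learn, the synthetic data, and the choice of base learners are assumptions for illustration only; none of them come from this notebook, and H2O builds its ensembles with its own metalearner. The base models produce out-of-fold class probabilities, and a logistic-regression metalearner learns how to combine them.

```python
# Manual stacking sketch with scikit-learn (not H2O), on hypothetical synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for damage_grade 1/2/3.
X_demo, y_demo = make_classification(n_samples=2000, n_features=20, n_informative=8,
                                     n_classes=3, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1)

# Base learners: their 5-fold out-of-fold probabilities become the metalearner's inputs.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("lr", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5, stack_method="predict_proba")
stack.fit(X_tr, y_tr)

valid_accuracy = stack.score(X_va, y_va)
print(f"validation accuracy: {valid_accuracy:.3f}")
```

AutoML automates this loop and goes further. It picks the base models, tunes them, and cross-validates everything.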

In [1]:
import h2o
from h2o.automl import H2OAutoML

h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "11.0.10" 2021-01-19 LTS; Java(TM) SE Runtime Environment 18.9 (build 11.0.10+8-LTS-162); Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.10+8-LTS-162, mixed mode)
  Starting server from /Users/king/opt/anaconda3/envs/tabular/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/wn/c096zq791xd853brbq55tmq80000gn/T/tmpwgzdhugk
  JVM stdout: /var/folders/wn/c096zq791xd853brbq55tmq80000gn/T/tmpwgzdhugk/h2o_king_started_from_python.out
  JVM stderr: /var/folders/wn/c096zq791xd853brbq55tmq80000gn/T/tmpwgzdhugk/h2o_king_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime: 02 secs
H2O_cluster_timezone: Europe/Athens
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.32.0.4
H2O_cluster_version_age: 1 month and 12 days
H2O_cluster_name: H2O_from_python_king_ucprj7
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 2 Gb
H2O_cluster_total_cores: 8
H2O_cluster_allowed_cores: 8


In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import pickle
from pathlib import Path
from tqdm.notebook import trange, tqdm
### USE FOR LOCAL JUPYTER NOTEBOOKS ###
DOWNLOAD_DIR = Path('../download')
DATA_DIR = Path('../data')
SUBMISSIONS_DIR = Path('../submissions')
MODEL_DIR = Path('../models')
#######################################

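# h2o.import_file expects string paths, so convert the Path objects
X = h2o.import_file(path=str(DOWNLOAD_DIR / 'train_values.csv'))
y = h2o.import_file(path=str(DOWNLOAD_DIR / 'train_labels.csv'))
# Treat the label as categorical so AutoML runs multiclass classification, not regression
y['damage_grade'] = y['damage_grade'].asfactor()
# Join features and labels on their shared column, building_id
data = X.merge(y)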
y_str = 'damage_grade'
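If you think in pandas, here is a rough equivalent of the cell above. It runs on a hypothetical three-row stand-in for the two CSV files. `H2OFrame.merge` joins on the columns the two frames share (here `building_id`), and `asfactor()` corresponds roughly to a pandas categorical. The `validate=` check is an extra safeguard that the H2O code does not perform.

```python
import io
import pandas as pd

# Hypothetical miniature versions of train_values.csv and train_labels.csv.
values_csv = io.StringIO("building_id,age,foundation_type\n10,25,r\n11,0,r\n12,80,w\n")
labels_csv = io.StringIO("building_id,damage_grade\n12,1\n10,2\n11,3\n")

values_df = pd.read_csv(values_csv)
labels_df = pd.read_csv(labels_csv)

# Counterpart of asfactor(): the label becomes categorical, so a model sees classes, not numbers.
labels_df["damage_grade"] = labels_df["damage_grade"].astype("category")

# Join on the shared key; validate= raises if building_id is duplicated on either side.
data_df = values_df.merge(labels_df, on="building_id", how="inner", validate="one_to_one")
print(data_df)
```

The merge matches on the key and ignores row order, so the labels line up even though the label file lists the buildings in a different order.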

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [3]:
# drop() returns a new frame and leaves `data` unchanged; this cell only previews the features
data.drop('building_id')

geo_level_1_id,geo_level_2_id,geo_level_3_id,count_floors_pre_eq,age,area_percentage,height_percentage,land_surface_condition,foundation_type,roof_type,ground_floor_type,other_floor_type,position,plan_configuration,has_superstructure_adobe_mud,has_superstructure_mud_mortar_stone,has_superstructure_stone_flag,has_superstructure_cement_mortar_stone,has_superstructure_mud_mortar_brick,has_superstructure_cement_mortar_brick,has_superstructure_timber,has_superstructure_bamboo,has_superstructure_rc_non_engineered,has_superstructure_rc_engineered,has_superstructure_other,legal_ownership_status,count_families,has_secondary_use,has_secondary_use_agriculture,has_secondary_use_hotel,has_secondary_use_rental,has_secondary_use_institution,has_secondary_use_school,has_secondary_use_industry,has_secondary_use_health_post,has_secondary_use_gov_office,has_secondary_use_use_police,has_secondary_use_other,damage_grade
30,266,1224,1,25,5,2,t,r,n,f,j,s,d,0,1,0,0,0,0,0,0,0,0,0,v,0,0,0,0,0,0,0,0,0,0,0,0,2
17,409,12182,2,0,13,7,t,r,n,f,q,s,d,0,1,0,0,0,0,0,0,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,3
17,716,7056,2,5,12,6,o,r,q,f,q,s,d,0,1,0,0,0,0,0,0,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,3
4,651,105,2,80,5,4,n,r,n,f,q,s,d,0,1,0,0,0,0,0,0,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,2
3,1387,3909,5,40,5,10,t,r,n,f,q,o,d,0,0,0,0,1,0,0,0,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,2
26,1132,6645,2,0,6,6,t,w,n,f,x,s,d,0,0,0,0,0,0,1,0,0,0,0,a,1,0,0,0,0,0,0,0,0,0,0,0,1
8,1297,9721,2,0,2,6,t,r,n,f,x,s,d,0,1,1,0,0,0,0,0,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,3
6,398,4512,2,30,10,5,t,r,n,f,q,t,d,0,1,0,0,0,0,0,0,0,0,0,v,0,1,1,0,0,0,0,0,0,0,0,0,3
7,555,2763,3,40,5,6,t,r,n,f,q,s,d,0,1,0,0,0,0,0,0,0,0,0,v,2,0,0,0,0,0,0,0,0,0,0,0,2
20,508,10459,2,5,7,6,t,w,q,f,q,s,d,0,1,0,0,0,0,0,1,0,0,0,v,1,0,0,0,0,0,0,0,0,0,0,0,1




In [4]:
aml = H2OAutoML(max_models=30, seed=1)
aml.train(y=y_str, training_frame=data.drop('building_id'))

AutoML progress: |████████████████████████████████████████████████████████| 100%


In [5]:
lb = aml.leaderboard
lb.head(rows=lb.nrows)

model_id,mean_per_class_error,logloss,rmse,mse,auc,aucpr
XGBoost_grid__1_AutoML_20210313_140139_model_9,0.328603,0.588312,0.435151,0.189356,,
XGBoost_grid__1_AutoML_20210313_140139_model_6,0.329042,0.573337,0.432224,0.186817,,
StackedEnsemble_AllModels_AutoML_20210313_140139,0.329807,0.578588,0.432323,0.186903,,
XGBoost_grid__1_AutoML_20210313_140139_model_7,0.329828,0.572672,0.431514,0.186205,,
XGBoost_grid__1_AutoML_20210313_140139_model_5,0.331112,0.579035,0.433962,0.188323,,
XGBoost_grid__1_AutoML_20210313_140139_model_3,0.331267,0.588748,0.435529,0.189685,,
XGBoost_2_AutoML_20210313_140139,0.331618,0.593365,0.436941,0.190918,,
XGBoost_1_AutoML_20210313_140139,0.333698,0.578879,0.434243,0.188567,,
StackedEnsemble_BestOfFamily_AutoML_20210313_140139,0.336732,0.588068,0.436551,0.190576,,
XGBoost_grid__1_AutoML_20210313_140139_model_4,0.345828,0.591415,0.440666,0.194186,,
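For a multiclass target, the leaderboard is ranked by `mean_per_class_error` by default. Ranking by another metric can change the leader. You can rank by `logloss` directly with `H2OAutoML(..., sort_metric="logloss")`, or re-sort afterwards. The sketch below re-sorts in pandas, using the rows above copied by hand. In the notebook, `lb.as_data_frame()` would produce the same DataFrame.

```python
import io
import pandas as pd

# Leaderboard rows copied from the output above (auc/aucpr are empty for this multiclass task).
lb_csv = io.StringIO("""model_id,mean_per_class_error,logloss
XGBoost_grid__1_AutoML_20210313_140139_model_9,0.328603,0.588312
XGBoost_grid__1_AutoML_20210313_140139_model_6,0.329042,0.573337
StackedEnsemble_AllModels_AutoML_20210313_140139,0.329807,0.578588
XGBoost_grid__1_AutoML_20210313_140139_model_7,0.329828,0.572672
XGBoost_grid__1_AutoML_20210313_140139_model_5,0.331112,0.579035
XGBoost_grid__1_AutoML_20210313_140139_model_3,0.331267,0.588748
XGBoost_2_AutoML_20210313_140139,0.331618,0.593365
XGBoost_1_AutoML_20210313_140139,0.333698,0.578879
StackedEnsemble_BestOfFamily_AutoML_20210313_140139,0.336732,0.588068
XGBoost_grid__1_AutoML_20210313_140139_model_4,0.345828,0.591415
""")
lb_df = pd.read_csv(lb_csv)

# Re-rank by logloss (lower is better) instead of mean_per_class_error.
by_logloss = lb_df.sort_values("logloss").reset_index(drop=True)
print(by_logloss[["model_id", "logloss"]])
```

Ranked by logloss, `model_7` comes first, and the default leader `model_9` falls to seventh place.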




In [6]:
aml.leader

Model Details
H2OXGBoostEstimator :  XGBoost
Model Key:  XGBoost_grid__1_AutoML_20210313_140139_model_9


Model Summary: 


number_of_trees
92




ModelMetricsMultinomial: xgboost
** Reported on train data. **

MSE: 0.10055552790234056
RMSE: 0.31710491623804976
LogLoss: 0.33330343766730575
Mean Per-Class Error: 0.15641388177426996
AUC: NaN
AUCPR: NaN
Multinomial auc values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).
Multinomial auc_pr values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


Actual/Predicted,1,2,3,Error,Rate
1,19813,4977,334,0.211391,"5,311 / 25,124"
2,1793,138085,8381,0.068623,"10,174 / 148,259"
3,381,16123,70714,0.189227,"16,504 / 87,218"
Total,21987,159185,79429,0.122751,"31,989 / 260,601"



Top-3 Hit Ratios: 


k,hit_ratio
1,0.877249
2,0.993553
3,1.0



ModelMetricsMultinomial: xgboost
** Reported on cross-validation data. **

MSE: 0.18935641260901265
RMSE: 0.43515102275992945
LogLoss: 0.5883120669077554
Mean Per-Class Error: 0.3286031781439564
AUC: NaN
AUCPR: NaN
Multinomial auc values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).
Multinomial auc_pr values: Table is not computed because it is disabled (model parameter 'auc_type' is set to AUTO or NONE) or due to domain size (maximum is 50 domains).

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


Actual/Predicted,1,2,3,Error,Rate
1,13467,11154,503,0.463979,"11,657 / 25,124"
2,6132,123004,19123,0.170344,"25,255 / 148,259"
3,629,30027,56562,0.351487,"30,656 / 87,218"
Total,20228,164185,76188,0.259278,"67,568 / 260,601"
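The headline metrics can be recomputed from this cross-validation confusion matrix:

- The per-class error is the off-diagonal share of each row.
- `mean_per_class_error` is the unweighted average of those errors, so grade 1 (25,124 rows) counts as much as grade 2 (148,259 rows).
- Accuracy is the diagonal divided by the grand total.

The sketch below uses NumPy with counts copied from the table.

```python
import numpy as np

# Rows = actual damage_grade 1, 2, 3; columns = predicted 1, 2, 3 (CV counts from the table).
cm = np.array([
    [13467, 11154, 503],
    [6132, 123004, 19123],
    [629, 30027, 56562],
])

row_totals = cm.sum(axis=1)
correct = np.diag(cm)
per_class_error = 1 - correct / row_totals       # the "Error" column
mean_per_class_error = per_class_error.mean()    # unweighted across classes
accuracy = correct.sum() / cm.sum()              # weighted by class frequency

print(per_class_error.round(6))
print(round(mean_per_class_error, 6), round(accuracy, 6))
```

The results are about 0.3286 (the leaderboard's `mean_per_class_error` for this model) and about 0.7407 (the top-1 hit ratio and the CV `accuracy`).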



Top-3 Hit Ratios: 


k,hit_ratio
1,0.740722
2,0.976412
3,1.0
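A top-k hit ratio counts a row as correct when the true class is among the k most probable classes. With three classes, k=3 is therefore always 1.0. The sketch below uses NumPy on hypothetical probabilities; the helper `top_k_hit_ratio` is illustrative and is not an H2O function.

```python
import numpy as np

def top_k_hit_ratio(probs, y_true, k):
    """Fraction of rows whose true class index is among the k highest-probability classes."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    # argsort is ascending, so the last k columns hold the indices of the k largest probabilities
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == y_true[:, None]).any(axis=1)
    return hits.mean()

# Hypothetical probabilities for 4 rows over classes 0, 1, 2 (standing in for grades 1, 2, 3).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 2, 2, 2])

for k in (1, 2, 3):
    print(k, top_k_hit_ratio(probs, y_true, k))
```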



Cross-Validation Metrics Summary: 


metric,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.7407224,0.0017923272,0.73914546,0.74167305,0.7431312,0.7388143,0.74084806
auc,,0.0,,,,,
aucpr,,0.0,,,,,
err,0.25927758,0.0017923272,0.26085454,0.25832695,0.25686875,0.26118574,0.25915197
err_count,13513.6,93.473526,13596.0,13464.0,13388.0,13613.0,13507.0
logloss,0.58831203,0.0064428956,0.59351325,0.58420044,0.5795625,0.59498745,0.58929664
max_per_class_error,0.46395004,0.005510371,0.46662742,0.45450917,0.46414787,0.4658976,0.46856812
mean_per_class_accuracy,0.67140377,0.0022805822,0.6693269,0.6747946,0.6726718,0.66995215,0.6702733
mean_per_class_error,0.32859623,0.0022805822,0.3306731,0.3252054,0.3273282,0.33004782,0.3297267
mse,0.1893564,0.001390354,0.1903943,0.1883722,0.18759157,0.19095622,0.18946774
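The `mean` and `sd` columns summarize the five fold values. For accuracy, the reported `sd` matches the sample standard deviation (`ddof=1`) rather than the population one. Checking this with the fold values from the table:

```python
import numpy as np

# Per-fold validation accuracy, copied from cv_1_valid ... cv_5_valid above.
fold_accuracy = np.array([0.73914546, 0.74167305, 0.7431312, 0.7388143, 0.74084806])

mean_acc = fold_accuracy.mean()
sd_acc = fold_accuracy.std(ddof=1)       # sample standard deviation across folds
sd_pop = fold_accuracy.std(ddof=0)       # population version, for comparison

print(f"mean={mean_acc:.7f} sd(ddof=1)={sd_acc:.10f} sd(ddof=0)={sd_pop:.10f}")
```

An sd of about 0.0018 on about 0.741 means the folds agree closely. Differences between leaderboard models smaller than this should be read with some caution.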



Scoring History: 


timestamp,duration,number_of_trees,training_rmse,training_logloss,training_classification_error,training_auc,training_pr_auc
2021-03-14 05:17:34,12:32:58.927,0,0.666667,1.098612,0.66532,,
2021-03-14 05:18:22,12:33:47.176,5,0.46844,0.634673,0.232835,,
2021-03-14 05:19:10,12:34:34.370,10,0.42049,0.535461,0.215997,,
2021-03-14 05:19:57,12:35:22.214,15,0.398864,0.490581,0.203119,,
2021-03-14 05:20:43,12:36:07.977,20,0.387153,0.467089,0.19417,,
2021-03-14 05:21:31,12:36:55.515,25,0.378896,0.450472,0.186814,,
2021-03-14 05:22:18,12:37:42.359,30,0.372198,0.436842,0.180544,,
2021-03-14 05:23:04,12:38:28.922,35,0.367301,0.427003,0.175368,,
2021-03-14 05:23:52,12:39:16.702,40,0.361265,0.415102,0.169032,,
2021-03-14 05:24:40,12:40:04.672,45,0.356198,0.405174,0.163407,,



Variable Importances: 


variable,relative_importance,scaled_importance,percentage
geo_level_1_id,102377.023438,1.0,0.202757
geo_level_2_id,94625.773438,0.924287,0.187405
geo_level_3_id,82079.992188,0.801742,0.162559
foundation_type.r,33353.753906,0.325793,0.066057
area_percentage,33165.886719,0.323958,0.065685
age,32660.517578,0.319022,0.064684
height_percentage,18750.371094,0.18315,0.037135
has_superstructure_mud_mortar_stone,11508.138672,0.112409,0.022792
ground_floor_type.v,10634.819336,0.103879,0.021062
count_floors_pre_eq,6107.184082,0.059654,0.012095
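`scaled_importance` is each feature's `relative_importance` divided by the largest one, so `geo_level_1_id` gets 1.0. `percentage` divides by the sum over all features, which needs the full table rather than the ten rows shown. Below is a quick pandas check of the scaled column, with values copied from the table:

```python
import pandas as pd

# First four rows of the variable-importance table above.
varimp = pd.DataFrame({
    "variable": ["geo_level_1_id", "geo_level_2_id", "geo_level_3_id", "foundation_type.r"],
    "relative_importance": [102377.023438, 94625.773438, 82079.992188, 33353.753906],
})
# Scale by the maximum so the most important feature gets 1.0.
varimp["scaled_importance"] = varimp["relative_importance"] / varimp["relative_importance"].max()
print(varimp)
```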



See the whole table with table.as_data_frame()




In [7]:
X_test = h2o.import_file(path=str(DOWNLOAD_DIR / 'test_values.csv'))

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [8]:
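# aml.predict() scores with the leader model; keep only the predicted class
# (the remaining columns hold per-class probabilities)
preds = aml.predict(X_test)['predict']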

xgboost prediction progress: |████████████████████████████████████████████| 100%


In [9]:
building_id_df = h2o.as_list(X_test['building_id'])
preds_df = h2o.as_list(preds)
# Predictions come back in the same row order as X_test, so a column-wise concat lines them up
my_sub = pd.concat([building_id_df, preds_df], axis=1)
my_sub = my_sub.set_index('building_id')
title = SUBMISSIONS_DIR / '03-13 h2o AutoML - 30 models - seed=1 - no data preprocessing.csv'
my_sub.to_csv(title)
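Below is a pandas-only sketch of the same submission step. It uses the first three `building_id`s and predictions from the `my_sub` output below as hypothetical stand-ins for the H2O frames. It also renames `predict` to `damage_grade`, assuming the submission should reuse the column names from `train_labels.csv`; the cell above keeps H2O's `predict` header. The sketch writes to an in-memory buffer so it runs anywhere, and the variable names are chosen so they do not overwrite the notebook's `my_sub`.

```python
import io
import pandas as pd

# Hypothetical stand-ins for h2o.as_list(X_test['building_id']) and h2o.as_list(preds).
demo_ids = pd.DataFrame({"building_id": [300051, 99355, 890251]})
demo_preds = pd.DataFrame({"predict": [3, 2, 2]})

# Column-wise concat relies on both frames sharing the same default RangeIndex.
demo_sub = pd.concat([demo_ids, demo_preds], axis=1)
# Assumption: the submission uses the label column name from train_labels.csv.
demo_sub = demo_sub.rename(columns={"predict": "damage_grade"}).set_index("building_id")

buffer = io.StringIO()          # in the notebook this would be a path under SUBMISSIONS_DIR
demo_sub.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```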

In [10]:
my_sub

building_id,predict
300051,3
99355,2
890251,2
745817,1
421793,3
...,...
310028,2
663567,2
1049160,2
442785,2
