# Modeling with h2o

This notebook uses the h2o library's AutoML to train a regression model for `price` on the dataset produced by the 'Processing' notebook, and then writes predictions for the test set to a CSV file.

### Importing and initializing h2o

In [1]:
import h2o
from h2o.automl import H2OAutoML

In [2]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


H2O_cluster_uptime:,15 mins 30 secs
H2O_cluster_timezone:,Europe/Berlin
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.36.0.3
H2O_cluster_version_age:,"7 days, 15 hours and 47 minutes"
H2O_cluster_name:,H2O_from_python_diego_nfazhu
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,1.889 Gb
H2O_cluster_total_cores:,8
H2O_cluster_allowed_cores:,8


### Running h2o 

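AutoML is trained on `paket_h2o.csv` with `price` as the target and every other column as a predictor. The search is capped at 45 base models (`max_models=45`) and uses a fixed seed.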

In [64]:
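# Load the training data produced by the 'Processing' notebook
train = h2o.import_file('paket_h2o.csv')

# Target column, and every other column as a predictor
X = train.columns
y = 'price'
X.remove(y)

# Train up to 45 base models (stacked ensembles are not counted in max_models)
aml = H2OAutoML(max_models=45, seed=666)
aml.train(x=X, y=y, training_frame=train)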

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%
Model Details
H2OStackedEnsembleEstimator :  Stacked Ensemble
Model Key:  StackedEnsemble_BestOfFamily_7_AutoML_8_20220224_143041

No model summary for this model

ModelMetricsRegressionGLM: stackedensemble
** Reported on train data. **

MSE: 4950.241807906329
RMSE: 70.35795483032696
MAE: 46.967183297213225
RMSLE: 0.3549683005887495
R^2: 0.5439820257291558
Mean Residual Deviance: 4950.241807906329
Null degrees of freedom: 3376
Residual degrees of freedom: 3370
Null deviance: 36658569.46106011
Residual deviance: 16716966.585299673
AIC: 38328.29723919774

ModelMetricsRegressionGLM: stackedensemble
** Reported on cross-validation data. **

MSE: 6099.666845800276
RMSE: 78.10036392873131
MAE: 51.08668843948363
RMSLE: 0.3833067531078766
R^2: 0.4380965967548739
Mean Residual Deviance: 6099.666845800276
Nul



### Displaying h2o leaderboard

In [61]:
lb=aml.leaderboard

lb.head(rows=lb.nrows)

model_id,mean_residual_deviance,rmse,mse,mae,rmsle
StackedEnsemble_AllModels_4_AutoML_7_20220224_134638,6001.36,77.4684,6001.36,50.3211,0.376633
StackedEnsemble_AllModels_3_AutoML_7_20220224_134638,6011.37,77.533,6011.37,50.1835,0.375806
StackedEnsemble_BestOfFamily_7_AutoML_7_20220224_134638,6019.83,77.5876,6019.83,50.4918,0.380658
StackedEnsemble_BestOfFamily_4_AutoML_7_20220224_134638,6021.32,77.5972,6021.32,50.4172,0.384015
StackedEnsemble_BestOfFamily_3_AutoML_7_20220224_134638,6032.49,77.6691,6032.49,50.616,
StackedEnsemble_AllModels_7_AutoML_7_20220224_134638,6041.64,77.728,6041.64,50.637,0.379281
StackedEnsemble_AllModels_2_AutoML_7_20220224_134638,6044.26,77.7449,6044.26,50.5495,0.37685
StackedEnsemble_BestOfFamily_2_AutoML_7_20220224_134638,6051.96,77.7943,6051.96,50.6635,
StackedEnsemble_AllModels_1_AutoML_7_20220224_134638,6092.01,78.0513,6092.01,50.6772,
DeepLearning_grid_1_AutoML_7_20220224_134638_model_1,6100.33,78.1046,6100.33,50.3109,
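The leaderboard above is ranked by mean residual deviance, which, as far as the H2O documentation describes, is AutoML's default sort metric for regression. The ranking depends on the metric chosen. The sketch below re-ranks the first three rows, copied from the output above, using only the standard library. The helper `best_by` is a hypothetical name introduced for illustration and is not part of h2o.

```python
import csv
import io

# First three leaderboard rows, copied verbatim from the output above.
LEADERBOARD_CSV = """model_id,mean_residual_deviance,rmse,mse,mae,rmsle
StackedEnsemble_AllModels_4_AutoML_7_20220224_134638,6001.36,77.4684,6001.36,50.3211,0.376633
StackedEnsemble_AllModels_3_AutoML_7_20220224_134638,6011.37,77.533,6011.37,50.1835,0.375806
StackedEnsemble_BestOfFamily_7_AutoML_7_20220224_134638,6019.83,77.5876,6019.83,50.4918,0.380658
"""


def best_by(rows, metric):
    """Return the model_id with the lowest value of `metric` (hypothetical helper)."""
    return min(rows, key=lambda row: float(row[metric]))["model_id"]


rows = list(csv.DictReader(io.StringIO(LEADERBOARD_CSV)))

best_deviance = best_by(rows, "mean_residual_deviance")
best_mae = best_by(rows, "mae")

print("lowest mean residual deviance:", best_deviance)
print("lowest MAE:", best_mae)
```

Among these three rows, the model with the lowest deviance is not the one with the lowest MAE, so it is worth deciding which error measure matters for the task before taking `aml.leader` at face value.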




In [62]:
aml.leader

Model Details
H2OStackedEnsembleEstimator :  Stacked Ensemble
Model Key:  StackedEnsemble_AllModels_4_AutoML_7_20220224_134638

No model summary for this model

ModelMetricsRegressionGLM: stackedensemble
** Reported on train data. **

MSE: 4367.06907930702
RMSE: 66.08380345672471
MAE: 43.522073421041526
RMSLE: 0.3305753881561138
R^2: 0.5977040976330199
Mean Residual Deviance: 4367.06907930702
Null degrees of freedom: 3376
Residual degrees of freedom: 3364
Null deviance: 36658569.46106011
Residual deviance: 14747592.280819807
AIC: 37917.00942797332

ModelMetricsRegressionGLM: stackedensemble
** Reported on cross-validation data. **

MSE: 6001.35639184639
RMSE: 77.46842190109716
MAE: 50.32112679813683
RMSLE: 0.37663259436246005
R^2: 0.4471529895133237
Mean Residual Deviance: 6001.35639184639
Null degrees of freedom: 3376
Residual degrees of freedom: 3365
Null deviance: 36689922.40283111
Residual deviance: 20266580.53526526
AIC: 38988.535494235715




### Preparing h2o prediction 

In [63]:
test=h2o.import_file('paket_test.csv')

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [51]:
A = aml.leader.predict(test)
A.shape

stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%


(1389, 1)

### Exporting the prediction

In [52]:
type(A)

h2o.frame.H2OFrame

In [53]:
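import pandas as pd  # as_data_frame() needs pandas to be installed

# Convert the H2OFrame of predictions to a pandas DataFrame
A = A.as_data_frame()

# Sequential string ids ('0', '1', ...) in the row order of the test set
A['id'] = [str(i) for i in range(len(A))]

# Store the prediction under the target name and drop the original column
A['price'] = A['predict']
A = A.drop('predict', axis=1)

# Write the two columns (id, price) to disk
A.to_csv("WE.csv", index=False, header=True)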

In [54]:
# Confirm the exported frame has one row per test record and two columns
print(A.shape)

(1389, 2)
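To make the layout of the exported file explicit, here is a minimal sketch that writes the same two columns, `id` and `price`, with the standard library instead of pandas. The function name `write_submission` and the three prediction values are made up for illustration; the real file holds the 1389 predictions computed above.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write 'id,price' rows; ids are '0', '1', ... in row order (hypothetical helper)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "price"])
    for i, price in enumerate(predictions):
        writer.writerow([str(i), price])


# Made-up predictions, used only to show the resulting layout.
buffer = io.StringIO()
write_submission([101.5, 87.25, 140.0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Because the ids are generated from row positions, they only line up with the test records if the prediction frame keeps the row order of `paket_test.csv`.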
