# Week 1 - Iris Deep Learning

To get familiar with the H2O interface and flow, let's train a deep learning model with H2O on the Iris dataset and see how little code it takes.

In [1]:
# import h2o - that's it!
import h2o

In [2]:
# check to see if H2O is running already and connect, or start it up
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_232"; OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~19.04.1-b09); OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
  Starting server from /home/megan/Projects/h2oclass/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp_akayob7
  JVM stdout: /tmp/tmp_akayob7/h2o_megan_started_from_python.out
  JVM stderr: /tmp/tmp_akayob7/h2o_megan_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O cluster uptime: 01 secs
H2O cluster timezone: America/Chicago
H2O data parsing timezone: UTC
H2O cluster version: 3.28.0.3
H2O cluster version age: 4 days
H2O cluster name: H2O_from_python_megan_o30yrl
H2O cluster total nodes: 1
H2O cluster free memory: 1.520 Gb
H2O cluster total cores: 3
H2O cluster allowed cores: 3


In [3]:
# where we will download the iris data set from
url = 'http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv'

In [4]:
# import the iris dataset
iris = h2o.import_file(url)

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [5]:
# split the data into 80% train and 20% test (approximately)
train, test = iris.split_frame([0.8])

In [6]:
# review a summary of our training data
train.summary()

| | sepal_len | sepal_wid | petal_len | petal_wid | class |
|---|---|---|---|---|---|
| type | real | real | real | real | enum |
| mins | 4.3 | 2.0 | 1.1 | 0.1 | |
| mean | 5.902521008403363 | 3.023529411764705 | 3.9117647058823533 | 1.252941176470588 | |
| maxs | 7.9 | 4.4 | 6.9 | 2.5 | |
| sigma | 0.8513660194834988 | 0.42240351066549164 | 1.7448314216365453 | 0.7477450314804432 | |
| zeros | 0 | 0 | 0 | 0 | |
| missing | 0 | 0 | 0 | 0 | 0 |
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 2 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [7]:
# how many rows in training - not exactly 80%
train.nrows

119

In [8]:
# how many rows in testing - not exactly 20%
test.nrows

31
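The counts above (119 and 31 rather than exactly 120 and 30) are what a probabilistic split produces: each row is assigned to a partition by its own random draw, so the proportions are only approximately 80/20. The following is a minimal pure-Python sketch of that idea, not H2O's actual implementation; `probabilistic_split` and its arguments are hypothetical names used only for illustration.

```python
import random


def probabilistic_split(n_rows, ratio, seed):
    """Assign each row index to train with probability `ratio`, else to test."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    train_idx, test_idx = [], []
    for i in range(n_rows):
        if rng.random() < ratio:
            train_idx.append(i)
        else:
            test_idx.append(i)
    return train_idx, test_idx


# 150 rows, as in the Iris dataset; the sizes land near 120/30 but rarely exactly there
train_idx, test_idx = probabilistic_split(150, 0.8, seed=42)
print(len(train_idx), len(test_idx))
```

Because the split is random, rerunning the notebook gives different row counts and metrics. `split_frame` also accepts a `seed` argument if a repeatable split is wanted.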

In [9]:
# import the deep learning estimator (like sklearn interface)
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

In [10]:
# train the model
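# can specify column numbers or names - names are more readable imo
# keyword arguments make it explicit which are predictors (x) and which is the response (y)
mDL = H2ODeepLearningEstimator()
mDL.train(x=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid'], y='class', training_frame=train)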

deeplearning Model Build progress: |██████████████████████████████████████| 100%


In [11]:
# view the model
mDL

Model Details
H2ODeepLearningEstimator :  Deep Learning
Model Key:  DeepLearning_model_python_1581297734311_1


Status of Neuron Layers: predicting class, 3-class classification, multinomial distribution, CrossEntropy loss, 41,803 weights/biases, 498.2 KB, 1,190 training samples, mini-batch size 1


| layer | units | type | dropout | l1 | l2 | mean_rate | rate_rms | momentum | mean_weight | weight_rms | mean_bias | bias_rms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | Input | 0.0 | | | | | | | | | |
| 2 | 200 | Rectifier | 0.0 | 0.0 | 0.0 | 0.00369556 | 0.00338471 | 0.0 | -0.00382494 | 0.102527 | 0.489776 | 0.00834126 |
| 3 | 200 | Rectifier | 0.0 | 0.0 | 0.0 | 0.0136753 | 0.069432 | 0.0 | -0.000822689 | 0.0698648 | 0.999005 | 0.00346705 |
| 4 | 3 | Softmax | | 0.0 | 0.0 | 0.00909187 | 0.0706261 | 0.0 | 0.0139605 | 0.409415 | -0.000829714 | 0.000349181 |
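The 41,803 weights/biases reported in the status line follow from the layer sizes (4, 200, 200, 3): each fully connected layer contributes one weight per input-output pair plus one bias per output unit. Likewise, the 1,190 training samples are the 119 training rows seen over 10 epochs. A short check, assuming plain dense layers with no other parameters (`count_parameters` is an illustrative helper, not part of H2O):

```python
def count_parameters(units):
    """Count weights and biases in a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(units, units[1:]))


layer_units = [4, 200, 200, 3]  # input, two hidden layers, softmax output
total_parameters = count_parameters(layer_units)
print(total_parameters)  # 41803

# 10 epochs over the 119 training rows
training_samples = 10 * 119
print(training_samples)  # 1190
```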




ModelMetricsMultinomial: deeplearning
** Reported on train data. **

MSE: 0.01809790230710012
RMSE: 0.1345284442305794
LogLoss: 0.06472600646062063
Mean Per-Class Error: 0.015151515151515152

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


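| Actual class | Iris-setosa | Iris-versicolor | Iris-virginica | Error | Rate |
|---|---|---|---|---|---|
| Iris-setosa | 35.0 | 0.0 | 0.0 | 0.0 | 0 / 35 |
| Iris-versicolor | 0.0 | 40.0 | 0.0 | 0.0 | 0 / 40 |
| Iris-virginica | 0.0 | 2.0 | 42.0 | 0.045455 | 2 / 44 |
| Total | 35.0 | 42.0 | 42.0 | 0.016807 | 2 / 119 |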
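The summary metrics can be recomputed from the confusion matrix itself. The sketch below uses the training counts shown above and reproduces the per-class errors, the Mean Per-Class Error (about 0.01515), and the overall error (2 / 119, about 0.016807). The helper names are illustrative only.

```python
# rows are actual classes, columns are predicted classes
# (order: Iris-setosa, Iris-versicolor, Iris-virginica)
confusion = [
    [35, 0, 0],
    [0, 40, 0],
    [0, 2, 42],
]


def per_class_errors(matrix):
    """Fraction of each actual class that was predicted as some other class."""
    return [(sum(row) - row[i]) / sum(row) for i, row in enumerate(matrix)]


def overall_error(matrix):
    """Fraction of all rows that were misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


errors = per_class_errors(confusion)
mean_per_class_error = sum(errors) / len(errors)
error_rate = overall_error(confusion)

print(errors)
print(mean_per_class_error)
print(error_rate)
```

Mean per-class error weights every class equally, while the overall error weights every row equally, which is why the two numbers differ slightly here.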



Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 0.983193 |
| 2 | 1.0 |
| 3 | 1.0 |
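A top-k hit ratio is the fraction of rows whose true class is among the k classes with the highest predicted probability, so k = 1 is ordinary accuracy (117 / 119 here). The sketch below illustrates the calculation on a small made-up set of probabilities; `toy_probs` and `toy_truth` are invented for illustration and are not taken from this model.

```python
def hit_ratio(prob_rows, true_idx, k):
    """Fraction of rows whose true class index is among the k most probable classes."""
    hits = 0
    for probs, truth in zip(prob_rows, true_idx):
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(prob_rows)


# illustrative data only: three rows, three classes
toy_probs = [
    [0.9, 0.1, 0.0],  # true class 0, ranked first
    [0.2, 0.5, 0.3],  # true class 1, ranked first
    [0.1, 0.6, 0.3],  # true class 2, ranked second
]
toy_truth = [0, 1, 2]

for k in (1, 2, 3):
    print(k, hit_ratio(toy_probs, toy_truth, k))
```

In the toy data the third row is missed at k = 1 but counted at k = 2, the same pattern as the table above, where the ratio reaches 1.0 once the second-ranked class is allowed.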



Scoring History: 


| timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_r2 | training_classification_error |
|---|---|---|---|---|---|---|---|---|---|
| 2020-02-09 19:33:11 | 0.000 sec | | 0.0 | 0 | 0.0 | | | | |
| 2020-02-09 19:33:11 | 0.591 sec | 411 obs/sec | 1.0 | 1 | 119.0 | 0.568377 | 1.715992 | 0.509147 | 0.336134 |
| 2020-02-09 19:33:12 | 1.236 sec | 1287 obs/sec | 10.0 | 10 | 1190.0 | 0.134528 | 0.064726 | 0.972502 | 0.016807 |



Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| petal_wid | 1.0 | 1.0 | 0.270421 |
| sepal_wid | 0.93165 | 0.93165 | 0.251938 |
| petal_len | 0.897872 | 0.897872 | 0.242804 |
| sepal_len | 0.868409 | 0.868409 | 0.234837 |
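The three importance columns are related by simple normalization: the numbers in the table are consistent with `scaled_importance` being the relative importance divided by the largest value, and `percentage` being the relative importance divided by the column total. A quick check using the values shown above (the dictionary names are illustrative):

```python
relative_importance = {
    "petal_wid": 1.0,
    "sepal_wid": 0.93165,
    "petal_len": 0.897872,
    "sepal_len": 0.868409,
}

largest = max(relative_importance.values())
total = sum(relative_importance.values())

# scale so the most important variable is 1.0
scaled_importance = {name: value / largest for name, value in relative_importance.items()}
# normalize so the shares sum to 1.0
percentage = {name: value / total for name, value in relative_importance.items()}

for name in relative_importance:
    print(name, round(scaled_importance[name], 6), round(percentage[name], 6))
```

All four shares fall between roughly 23% and 27%, so this particular model does not single out any one measurement as dominant.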




In [12]:
# predict on the test set
p = mDL.predict(test)

deeplearning prediction progress: |███████████████████████████████████████| 100%


In [13]:
# review predictions
p

| predict | Iris-setosa | Iris-versicolor | Iris-virginica |
|---|---|---|---|
| Iris-setosa | 0.997844 | 0.0021562 | 4.01707e-18 |
| Iris-setosa | 0.998985 | 0.00101524 | 7.80419e-18 |
| Iris-setosa | 0.997612 | 0.00238806 | 9.5431e-18 |
| Iris-setosa | 0.995164 | 0.00483646 | 4.30798e-18 |
| Iris-setosa | 0.996422 | 0.00357776 | 2.17748e-17 |
| Iris-setosa | 0.997411 | 0.00258868 | 1.44822e-17 |
| Iris-setosa | 0.999884 | 0.000115778 | 1.41429e-19 |
| Iris-setosa | 0.981846 | 0.0181541 | 5.20557e-15 |
| Iris-setosa | 0.995715 | 0.00428472 | 2.24941e-16 |
| Iris-setosa | 0.999199 | 0.00080114 | 1.33253e-19 |
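Each row of the prediction frame holds the predicted label followed by one probability per class; the label is simply the class with the highest probability. The sketch below reproduces that step for the first three rows shown above and computes their log loss, the negative mean log of the probability given to the true class. It assumes the true class of these rows is Iris-setosa, which the frame itself does not show; the helper names are illustrative.

```python
import math

classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# first three rows of class probabilities from the prediction frame
prob_rows = [
    [0.997844, 0.0021562, 4.01707e-18],
    [0.998985, 0.00101524, 7.80419e-18],
    [0.997612, 0.00238806, 9.5431e-18],
]


def predicted_label(probs):
    """Return the class name with the highest predicted probability."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return classes[best]


def log_loss(rows, true_idx, eps=1e-15):
    """Negative mean log-probability assigned to the true class of each row."""
    losses = [-math.log(max(probs[t], eps)) for probs, t in zip(rows, true_idx)]
    return sum(losses) / len(losses)


labels = [predicted_label(probs) for probs in prob_rows]
true_idx = [0, 0, 0]  # assumption: all three rows are actually Iris-setosa
loss = log_loss(prob_rows, true_idx)

print(labels)
print(loss)
```

Confident, correct predictions like these contribute very little to the log loss; a single confident wrong prediction would raise it sharply.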




In [14]:
# review model performance on the test set
mDL.model_performance(test)


ModelMetricsMultinomial: deeplearning
** Reported on test data. **

MSE: 0.00957316266244938
RMSE: 0.0978425401471639
LogLoss: 0.030119761640012316
Mean Per-Class Error: 0.03333333333333333

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


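| Actual class | Iris-setosa | Iris-versicolor | Iris-virginica | Error | Rate |
|---|---|---|---|---|---|
| Iris-setosa | 15.0 | 0.0 | 0.0 | 0.0 | 0 / 15 |
| Iris-versicolor | 0.0 | 9.0 | 1.0 | 0.1 | 1 / 10 |
| Iris-virginica | 0.0 | 0.0 | 6.0 | 0.0 | 0 / 6 |
| Total | 15.0 | 9.0 | 7.0 | 0.032258 | 1 / 31 |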



Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 0.967742 |
| 2 | 1.0 |
| 3 | 1.0 |


