# Iris H2O Example

## Acquisitor and Cleaner

In [1]:
import pandas as pd
from marvin_python_toolbox.common.data import MarvinData

file_path = MarvinData.download_file(url="https://s3.amazonaws.com/marvin-engines-data/Iris.csv")

iris = pd.read_csv(file_path)

marvin_initial_dataset = iris

## Training Preparator

In [2]:
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for evaluation; both splits still contain the 'Species' target column.
X_train, X_test = train_test_split(marvin_initial_dataset, random_state=1, test_size=0.3)

marvin_dataset = {'train_X': X_train,  'test_X': X_test}
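
The split above is purely random, so the class proportions in the two parts can drift from those of the full dataset. One optional variation, not part of the original engine, is to pass `stratify` to `train_test_split`. The sketch below uses a small made-up frame (the `toy` DataFrame and its values are assumptions for illustration) rather than the Iris file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 3 classes with 10 rows each (not the real Iris file).
toy = pd.DataFrame({
    'PetalLengthCm': [float(i) for i in range(30)],
    'Species': ['Iris-setosa'] * 10 + ['Iris-versicolor'] * 10 + ['Iris-virginica'] * 10,
})

# stratify keeps the class proportions of 'Species' the same in both splits.
toy_train, toy_test = train_test_split(
    toy, random_state=1, test_size=0.3, stratify=toy['Species']
)

print(toy_train['Species'].value_counts().to_dict())
print(toy_test['Species'].value_counts().to_dict())
```

Applied to the engine, the equivalent change would be adding `stratify=marvin_initial_dataset['Species']` to the existing call.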


## Trainer

In [3]:
import h2o
from h2o.automl import H2OAutoML

h2o.init()

train_X_frame = h2o.H2OFrame.from_python(marvin_dataset['train_X'])
test_X_frame = h2o.H2OFrame.from_python(marvin_dataset['test_X'])

# Use every column except the target as a predictor.
x = train_X_frame.columns
y = 'Species'
x.remove(y)

# Train at most 20 models with a fixed seed.
automl = H2OAutoML(max_models=20, seed=1)
automl.train(x=x, y=y, training_frame=train_X_frame)

marvin_model = automl

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


H2O cluster uptime: 1 day 0 hours 48 mins
H2O cluster timezone: America/Sao_Paulo
H2O data parsing timezone: UTC
H2O cluster version: 3.26.0.3
H2O cluster version age: 20 days
H2O cluster name: H2O_from_python_fernandozagatti_7kn410
H2O cluster total nodes: 1
H2O cluster free memory: 3.791 Gb
H2O cluster total cores: 4
H2O cluster allowed cores: 4


Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
AutoML progress: |████████████████████████████████████████████████████████| 100%


## Metrics Evaluator

In [4]:
import h2o
from sklearn import metrics

# Assumes the H2O cluster started in the Trainer cell is still running.

# Separate the target from the features without modifying marvin_dataset in place,
# which avoids the pandas SettingWithCopyWarning raised by drop(..., inplace=True).
y_test = marvin_dataset['test_X']['Species']
X_test = marvin_dataset['test_X'].drop(columns='Species')

test_frame = h2o.H2OFrame.from_python(X_test)
preds = marvin_model.predict(test_frame).as_data_frame()['predict'].values
marvin_metrics = metrics.accuracy_score(y_test, preds)


Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
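
`metrics.accuracy_score` reports the fraction of test rows whose predicted label equals the true label. As a minimal sketch of that calculation, the helper below reproduces it in plain Python; the name `accuracy` and the sample labels are illustrative and are not taken from the run above.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the two label sequences agree."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of empty inputs")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Made-up labels for illustration only.
sample_true = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor']
sample_pred = ['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor']
print(accuracy(sample_true, sample_pred))  # 3 of 4 labels match -> 0.75
```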


## Prediction Preparator

In [11]:
import h2o
import pandas as pd

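# input_message is expected to arrive as a sequence of four measurements in the
# order: sepal length, sepal width, petal length, petal width (all in cm).
input_message = {'SepalLengthCm': [input_message[0]], 'SepalWidthCm': [input_message[1]],
                 'PetalLengthCm': [input_message[2]], 'PetalWidthCm': [input_message[3]]}
input_message = pd.DataFrame(data=input_message)
input_message = h2o.H2OFrame.from_python(input_message)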

Parse progress: |█████████████████████████████████████████████████████████| 100%
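
The cell above indexes `input_message` by position, so a message with too few values fails with an `IndexError` and non-numeric values are passed on unchecked. A small validation step before building the DataFrame may help. In the sketch below, the helper name `build_feature_row` and the `FEATURE_COLUMNS` constant are hypothetical, and the column order is assumed to match the cell above.

```python
FEATURE_COLUMNS = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']


def build_feature_row(values):
    """Map four positional measurements to single-element column lists."""
    values = list(values)
    if len(values) != len(FEATURE_COLUMNS):
        raise ValueError(
            "expected %d values, got %d" % (len(FEATURE_COLUMNS), len(values))
        )
    # float() raises ValueError/TypeError for entries that are not numeric.
    return {name: [float(value)] for name, value in zip(FEATURE_COLUMNS, values)}


row = build_feature_row([5.9, 3.0, 4.2, 1.5])  # made-up measurements
print(row)
```

The resulting dictionary has the same shape as the one passed to `pd.DataFrame(data=...)` in the cell above.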


## Predictor

In [12]:
# The first column of the first row of the prediction frame holds the predicted class label.
final_prediction = marvin_model.predict(input_message).as_data_frame().values[0][0]

gbm prediction progress: |████████████████████████████████████████████████| 100%


In [13]:
print(final_prediction)

Iris-versicolor
