# Introduction

In this notebook, we build machine learning models with [h2o](https://www.h2o.ai/), training up to 100 models on each of four datasets.
In particular, we focus on the [AutoML](https://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/automl/index.html) feature.
The best models are saved for future use.

# Libraries

```python
import h2o  # needed for h2o.init() below

from source import h2o_modelling
```


# Setup 

Start the h2o cluster.

```python
h2o.init()
```


```
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
```

| Setting | Value |
|---|---|
| H2O_cluster_uptime | 1 hour 26 mins |
| H2O_cluster_timezone | Etc/UTC |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.34.0.7 |
| H2O_cluster_version_age | 5 months and 24 days !!! |
| H2O_cluster_name | H2O_from_python_vscode_puvpf3 |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 3.649 Gb |
| H2O_cluster_total_cores | 4 |
| H2O_cluster_allowed_cores | 4 |


# Data

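Load the data from the parquet files and import it into `h2o`. Each dataset is stored as a pair of parquet files (a "nobuy" file and a "buy" file), which `parquet_to_h2o` imports as a single `h2o` frame.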

```python
bb_line_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_line.parquet.gzip", "data/bb_buy_line.parquet.gzip")
bb_candle_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_candle.parquet.gzip", "data/bb_buy_candle.parquet.gzip")
macd_line_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_line.parquet.gzip", "data/macd_buy_line.parquet.gzip")
macd_candle_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_candle.parquet.gzip", "data/macd_buy_candle.parquet.gzip")
```
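The `parquet_to_h2o` helper itself is not shown in this notebook. A minimal sketch of what such a helper might do is below: read both parquet files with pandas, stack the rows, and hand the result to `h2o.H2OFrame`. The names `combine_parquet_frames` and `parquet_to_h2o_sketch` are hypothetical, and the sketch assumes both files share the same schema and already contain the `label` column.

```python
import pandas as pd


def combine_parquet_frames(nobuy: pd.DataFrame, buy: pd.DataFrame) -> pd.DataFrame:
    """Stack the "nobuy" rows on top of the "buy" rows with a fresh 0..n-1 index."""
    return pd.concat([nobuy, buy], ignore_index=True)


def parquet_to_h2o_sketch(nobuy_path: str, buy_path: str):
    """Hypothetical stand-in for h2o_modelling.parquet_to_h2o.

    Assumes both files share one schema and already contain the `label` column.
    """
    import h2o  # imported lazily; needs a running cluster started with h2o.init()

    nobuy = pd.read_parquet(nobuy_path)  # requires pyarrow or fastparquet
    buy = pd.read_parquet(buy_path)
    return h2o.H2OFrame(combine_parquet_frames(nobuy, buy))
```

Usage would mirror the cell above, e.g. `parquet_to_h2o_sketch("data/bb_nobuy_line.parquet.gzip", "data/bb_buy_line.parquet.gzip")`. Keeping the pandas step in its own function lets it be checked without an `h2o` cluster.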


Prepare each `h2o` frame by converting the `label` column to a categorical variable. `prepare_h2o_df` returns the outcome (`y`) and the predictors (`x`) used for training.

```python
y = 'label'

bb_line_y, bb_line_x = h2o_modelling.prepare_h2o_df(df=bb_line_df, outcome=y)
bb_candle_y, bb_candle_x = h2o_modelling.prepare_h2o_df(df=bb_candle_df, outcome=y)
macd_line_y, macd_line_x = h2o_modelling.prepare_h2o_df(df=macd_line_df, outcome=y)
macd_candle_y, macd_candle_x = h2o_modelling.prepare_h2o_df(df=macd_candle_df, outcome=y)
```
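A possible shape for `prepare_h2o_df` is sketched below. This is an assumption, not the notebook's actual helper: it converts the outcome with `H2OFrame.asfactor()`, which makes H2O treat the task as classification rather than regression, and then takes every other column as a predictor. The helper names `split_outcome_predictors` and `prepare_h2o_df_sketch` are hypothetical.

```python
def split_outcome_predictors(columns, outcome):
    """Return the outcome name and every other column as a predictor."""
    if outcome not in columns:
        raise KeyError(f"outcome column {outcome!r} not found")
    return outcome, [col for col in columns if col != outcome]


def prepare_h2o_df_sketch(df, outcome):
    """Hypothetical stand-in for h2o_modelling.prepare_h2o_df.

    `df` is an h2o.H2OFrame. Converting the outcome to a factor (enum) column
    makes H2O train classifiers instead of regressors. The frame is modified in place.
    """
    df[outcome] = df[outcome].asfactor()
    return split_outcome_predictors(df.columns, outcome)
```

The return order `(outcome, predictors)` matches how the cell above unpacks `bb_line_y, bb_line_x`.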


# Modelling

Build the models! Each AutoML run stops after training 100 models or after 6 hours, whichever comes first.
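The `train_and_save` helper is not shown here either. A minimal sketch of how such a helper could wrap `H2OAutoML` follows, assuming it converts the minute budget to `max_runtime_secs`, saves only the leader model with `h2o.save_model`, and returns the leaderboard. The `seed` argument and the names `minutes_to_secs` and `train_and_save_sketch` are hypothetical additions for illustration.

```python
def minutes_to_secs(minutes):
    """Convert a budget in minutes to the seconds H2OAutoML expects."""
    return int(minutes * 60)


def train_and_save_sketch(df, outcome, x, path, max_models=100, max_runtime_min=6 * 60, seed=1):
    """Hypothetical stand-in for h2o_modelling.train_and_save.

    AutoML stops at max_models or max_runtime_secs, whichever is hit first.
    Only the leader (top-ranked leaderboard model) is saved; `seed` is an assumption.
    """
    import h2o
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(
        max_models=max_models,
        max_runtime_secs=minutes_to_secs(max_runtime_min),
        seed=seed,
    )
    aml.train(x=x, y=outcome, training_frame=df)
    h2o.save_model(aml.leader, path=path, force=True)  # force=True overwrites an existing file
    return aml.leaderboard
```

Saving only `aml.leader` keeps disk usage small; keeping several "best models" would mean fetching more model IDs from the leaderboard and saving each one.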

```python
# Python does not allow a positional argument after keyword arguments, so the
# first four arguments are passed positionally in the order (df, outcome, x, path).
bb_line_lb = h2o_modelling.train_and_save(bb_line_df, bb_line_y, bb_line_x, "models/bb-models/bb_line", max_models=100, max_runtime_min=6*60)
bb_candle_lb = h2o_modelling.train_and_save(bb_candle_df, bb_candle_y, bb_candle_x, "models/bb-models/bb_candle", max_models=100, max_runtime_min=6*60)
macd_line_lb = h2o_modelling.train_and_save(macd_line_df, macd_line_y, macd_line_x, "models/macd-models/macd_line", max_models=100, max_runtime_min=6*60)
macd_candle_lb = h2o_modelling.train_and_save(macd_candle_df, macd_candle_y, macd_candle_x, "models/macd-models/macd_candle", max_models=100, max_runtime_min=6*60)
```
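The data paths, frames and model directories in this notebook follow one naming pattern per indicator (`bb`, `macd`) and chart type (`line`, `candle`). As a sketch, the whole pipeline could be driven by a loop instead of twelve near-identical lines. The names `data_paths`, `model_dir` and `run_all` are hypothetical; the loop assumes `train_and_save` accepts `(df, outcome, x, path)` positionally, and the module is passed in as a parameter so the loop can be exercised without a cluster.

```python
from itertools import product

INDICATORS = ("bb", "macd")
CHARTS = ("line", "candle")


def data_paths(indicator, chart):
    """Return the (nobuy, buy) parquet paths used in this notebook."""
    return (f"data/{indicator}_nobuy_{chart}.parquet.gzip",
            f"data/{indicator}_buy_{chart}.parquet.gzip")


def model_dir(indicator, chart):
    """Return the model output path used in this notebook."""
    return f"models/{indicator}-models/{indicator}_{chart}"


def run_all(h2o_modelling, outcome="label", max_models=100, max_runtime_min=6 * 60):
    """Load, prepare and train every indicator/chart combination; return leaderboards by name."""
    leaderboards = {}
    for indicator, chart in product(INDICATORS, CHARTS):
        df = h2o_modelling.parquet_to_h2o(*data_paths(indicator, chart))
        y, x = h2o_modelling.prepare_h2o_df(df=df, outcome=outcome)
        leaderboards[f"{indicator}_{chart}"] = h2o_modelling.train_and_save(
            df, y, x, model_dir(indicator, chart),
            max_models=max_models, max_runtime_min=max_runtime_min,
        )
    return leaderboards
```

With the real module, `run_all(h2o_modelling)["bb_line"]` would correspond to `bb_line_lb`.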

