# Set-Up

In [1]:
%pwd  
%cd /workspaces/image-classification-for-technical-indicators

/workspaces/image-classification-for-technical-indicators


# Introduction

In this notebook, we build up to 100 machine learning models for each of four datasets (`bb` and `macd` indicator charts, each rendered as line and candle images) with [h2o](https://www.h2o.ai/).
In particular, we use h2o's [AutoML](https://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/automl/index.html) feature.
The best models are saved for future use.

# Libraries

In [2]:
import h2o

from source import h2o_modelling


# Setup 

Start a local H2O cluster, or connect to one that is already running (as happens below).

In [3]:
h2o.init()


Checking whether there is an H2O instance running at http://localhost:54321 . connected.


| Property | Value |
|---|---|
| H2O_cluster_uptime | 4 hours 42 mins |
| H2O_cluster_timezone | Etc/UTC |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.34.0.7 |
| H2O_cluster_version_age | 6 months and 20 days !!! |
| H2O_cluster_name | H2O_from_python_vscode_hkygsb |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 3.069 Gb |
| H2O_cluster_total_cores | 4 |
| H2O_cluster_allowed_cores | 4 |


# Data

Load the data from the parquet files and import it into `h2o`. Each dataset is built from a "no buy" file and a "buy" file.
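`h2o_modelling.parquet_to_h2o` is a project helper whose source is not shown here. Judging by the file names, it takes a "no buy" file and a "buy" file and returns a single frame. The sketch below shows one way to do that; it assumes the label is 0 for rows from the first file and 1 for rows from the second, and the names `label_and_stack` and `parquet_to_h2o_sketch` are hypothetical.

```python
import pandas as pd


def label_and_stack(nobuy: pd.DataFrame, buy: pd.DataFrame, outcome: str = "label") -> pd.DataFrame:
    """Stack both classes into one frame with an integer outcome column.

    Assumption: 0 marks "no buy" images and 1 marks "buy" images.
    """
    nobuy = nobuy.assign(**{outcome: 0})
    buy = buy.assign(**{outcome: 1})
    combined = pd.concat([nobuy, buy], ignore_index=True)
    # Put the outcome column first, matching the column order printed later in the notebook.
    cols = [outcome] + [c for c in combined.columns if c != outcome]
    return combined[cols]


def parquet_to_h2o_sketch(nobuy_path: str, buy_path: str):
    """Hypothetical stand-in for h2o_modelling.parquet_to_h2o."""
    import h2o  # imported here so label_and_stack works without h2o installed

    # read_parquet needs pyarrow or fastparquet; the gzip codec inside the file is handled by the engine.
    nobuy = pd.read_parquet(nobuy_path)
    buy = pd.read_parquet(buy_path)
    # H2OFrame uploads the pandas DataFrame to the connected cluster.
    return h2o.H2OFrame(label_and_stack(nobuy, buy))
```

Keeping the labelling step in plain pandas makes it easy to check on a small frame before anything is uploaded to the cluster.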

In [4]:
bb_line_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_line.parquet.gzip", "data/bb_buy_line.parquet.gzip")
bb_candle_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_candle.parquet.gzip", "data/bb_buy_candle.parquet.gzip")
macd_line_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_line.parquet.gzip", "data/macd_buy_line.parquet.gzip")
macd_candle_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_candle.parquet.gzip", "data/macd_buy_candle.parquet.gzip")


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


Prepare each `h2o` frame by converting the `label` column to a categorical variable (a factor) and splitting the column names into the outcome `y` and the predictors `x`.
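The source of `h2o_modelling.prepare_h2o_df` is not shown, so here is a minimal sketch of what it presumably does, under the assumption that the predictors are every column except the outcome. `prepare_frame`, `predictor_columns` and the `exclude` parameter are hypothetical names; `asfactor()` and `columns` are real `H2OFrame` members.

```python
from typing import Iterable, List, Sequence, Tuple


def predictor_columns(columns: Sequence[str], outcome: str, exclude: Iterable[str] = ()) -> List[str]:
    """Every column except the outcome and any explicitly excluded ones, in original order."""
    skip = {outcome, *exclude}
    return [c for c in columns if c not in skip]


def prepare_frame(frame, outcome: str, exclude: Iterable[str] = ()) -> Tuple[object, str, List[str]]:
    """Hypothetical stand-in for h2o_modelling.prepare_h2o_df; `frame` is an h2o.H2OFrame."""
    # A categorical response makes H2O AutoML treat the task as classification;
    # a numeric 0/1 column would be treated as regression instead.
    frame[outcome] = frame[outcome].asfactor()
    return frame, outcome, predictor_columns(frame.columns, outcome, exclude)
```

Passing `exclude=["name"]` would keep the `name` column out of the predictors up front; the AutoML log in the Modelling section shows H2O dropping that column on its own.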

In [5]:
y = 'label'

bb_line_df_h2o, bb_line_y, bb_line_x = h2o_modelling.prepare_h2o_df(df=bb_line_df, outcome=y)
bb_candle_df_h2o, bb_candle_y, bb_candle_x = h2o_modelling.prepare_h2o_df(df=bb_candle_df, outcome=y)
macd_line_df_h2o, macd_line_y, macd_line_x = h2o_modelling.prepare_h2o_df(df=macd_line_df, outcome=y)
macd_candle_df_h2o, macd_candle_y, macd_candle_x = h2o_modelling.prepare_h2o_df(df=macd_candle_df, outcome=y)


['label', 'pixel_0', 'pixel_1', 'pixel_2', 'pixel_3', 'pixel_4', 'pixel_5', 'pixel_6', 'pixel_7', 'pixel_8', 'pixel_9', 'pixel_10', 'pixel_11', 'pixel_12', 'pixel_13', 'pixel_14', 'pixel_15', 'pixel_16', 'pixel_17', 'pixel_18', 'pixel_19', 'pixel_20', 'pixel_21', 'pixel_22', 'pixel_23', 'pixel_24', 'pixel_25', 'pixel_26', 'pixel_27', 'pixel_28', 'pixel_29', 'pixel_30', 'pixel_31', 'pixel_32', 'pixel_33', 'pixel_34', 'pixel_35', 'pixel_36', 'pixel_37', 'pixel_38', 'pixel_39', 'pixel_40', 'pixel_41', 'pixel_42', 'pixel_43', 'pixel_44', 'pixel_45', 'pixel_46', 'pixel_47', 'pixel_48', 'pixel_49', 'pixel_50', 'pixel_51', 'pixel_52', 'pixel_53', 'pixel_54', 'pixel_55', 'pixel_56', 'pixel_57', 'pixel_58', 'pixel_59', 'pixel_60', 'pixel_61', 'pixel_62', 'pixel_63', 'pixel_64', 'pixel_65', 'pixel_66', 'pixel_67', 'pixel_68', 'pixel_69', 'pixel_70', 'pixel_71', 'pixel_72', 'pixel_73', 'pixel_74', 'pixel_75', 'pixel_76', 'pixel_77', 'pixel_78', 'pixel_79', 'pixel_80', 'pixel_81', 'pixel_82', 'pix

# Modelling

Build the models! Each AutoML run (one per dataset) stops after 6 hours or after 100 models, whichever comes first.
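`h2o_modelling.train_and_save` is also project code, so the following is only a sketch of how such a helper could be built on `h2o.automl.H2OAutoML`. It assumes the runtime is given in minutes (as the `max_runtime_min` argument suggests) and converted to the seconds that `H2OAutoML` expects; the fixed `seed=1` is an arbitrary assumption, and `automl_settings` and `train_and_save_sketch` are hypothetical names. When both `max_models` and `max_runtime_secs` are set, AutoML stops as soon as either limit is reached.

```python
def automl_settings(max_models: int, max_runtime_min: float, seed: int = 1) -> dict:
    """Keyword arguments for h2o.automl.H2OAutoML."""
    return {
        "max_models": max_models,
        "max_runtime_secs": int(max_runtime_min * 60),  # H2OAutoML takes seconds
        "seed": seed,  # assumption: any fixed seed
    }


def train_and_save_sketch(frame, outcome, predictors, save_path, max_models=100, max_runtime_min=6 * 60):
    """Hypothetical stand-in for h2o_modelling.train_and_save: run AutoML and save the leader."""
    import h2o
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**automl_settings(max_models, max_runtime_min))
    aml.train(x=predictors, y=outcome, training_frame=frame)
    # Save the best model as a binary model inside save_path; force=True overwrites an existing file.
    h2o.save_model(model=aml.leader, path=save_path, force=True)
    # The leaderboard is an H2OFrame of all trained models, ranked by the default metric.
    return aml.leaderboard
```

This sketch saves only the leader, whereas the introduction mentions saving the best models. A saved model can later be restored with `h2o.load_model`, using the path that `h2o.save_model` returns.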

In [18]:
bb_line_lb = h2o_modelling.train_and_save(df=bb_line_df_h2o, outcome=bb_line_y, predictors=bb_line_x, save_path="models/bb-models/bb_line", max_models=100, max_runtime_min=6*60)
print(bb_line_lb[:10])


AutoML progress: |
20:40:56.110: _train param, Dropping bad and constant columns: [name]

███████████████████████████████████████████████████████████████| (done) 100%


In [None]:
bb_candle_lb = h2o_modelling.train_and_save(df=bb_candle_df_h2o, outcome=bb_candle_y, predictors=bb_candle_x, save_path="models/bb-models/bb_candle", max_models=100, max_runtime_min=6*60)
print(bb_candle_lb[:10])


In [None]:
macd_line_lb = h2o_modelling.train_and_save(df=macd_line_df_h2o, outcome=macd_line_y, predictors=macd_line_x, save_path="models/macd-models/macd_line", max_models=100, max_runtime_min=6*60)
print(macd_line_lb[:10])


In [None]:
macd_candle_lb = h2o_modelling.train_and_save(df=macd_candle_df_h2o, outcome=macd_candle_y, predictors=macd_candle_x, save_path="models/macd-models/macd_candle", max_models=100, max_runtime_min=6*60)
print(macd_candle_lb[:10])
