# Set-Up

In [1]:
%pwd  
%cd /workspaces/image-classification-for-technical-indicators

/workspaces/image-classification-for-technical-indicators


# Introduction

In this notebook, we build machine learning models with [h2o](https://www.h2o.ai/), training up to 100 models for each of four datasets (the `bb` and `macd` indicators, each as `line` and `candle` charts).
In particular, we focus on the [AutoML](https://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/automl/index.html) feature.
The best models are saved for future use.

# Libraries

In [2]:
import h2o

from source import h2o_modelling


# Cluster Setup

Start the h2o cluster.

In [3]:
h2o.init()


Checking whether there is an H2O instance running at http://localhost:54321 . connected.


| Property | Value |
|---|---|
| H2O_cluster_uptime | 33 days 0 hours 11 mins |
| H2O_cluster_timezone | Etc/UTC |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.34.0.7 |
| H2O_cluster_version_age | 7 months and 19 days !!! |
| H2O_cluster_name | H2O_from_python_vscode_tj9xpd |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 1.769 Gb |
| H2O_cluster_total_cores | 30 |
| H2O_cluster_allowed_cores | 30 |


# Data

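Load the data from the parquet files and import it into `h2o`. Each dataset is built from a pair of files: one with the no-buy examples and one with the buy examples.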

In [21]:
bb_line_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_line.parquet.gzip", "data/bb_buy_line.parquet.gzip")
bb_candle_df = h2o_modelling.parquet_to_h2o("data/bb_nobuy_candle.parquet.gzip", "data/bb_buy_candle.parquet.gzip")
macd_line_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_line.parquet.gzip", "data/macd_buy_line.parquet.gzip")
macd_candle_df = h2o_modelling.parquet_to_h2o("data/macd_nobuy_candle.parquet.gzip", "data/macd_buy_candle.parquet.gzip")


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
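The four calls above follow one naming pattern, so the file pairs can also be generated rather than typed out. The sketch below is only an illustration of that pattern; `dataset_paths`, `DATA_DIR` and `COMBINATIONS` are hypothetical names that do not exist in `h2o_modelling`.

```python
from pathlib import Path
from typing import Tuple

# Assumption: the parquet files live in ./data relative to the working directory.
DATA_DIR = Path("data")


def dataset_paths(indicator: str, chart: str) -> Tuple[str, str]:
    """Return the (no-buy, buy) parquet paths for one indicator/chart pair."""
    nobuy = DATA_DIR / f"{indicator}_nobuy_{chart}.parquet.gzip"
    buy = DATA_DIR / f"{indicator}_buy_{chart}.parquet.gzip"
    return nobuy.as_posix(), buy.as_posix()


# The four indicator/chart combinations used in this notebook.
COMBINATIONS = [(i, c) for i in ("bb", "macd") for c in ("line", "candle")]
paths = {f"{i}_{c}": dataset_paths(i, c) for i, c in COMBINATIONS}

for name, (nobuy, buy) in paths.items():
    print(name, nobuy, buy)
```

Each pair in `paths` could then be passed to `h2o_modelling.parquet_to_h2o(nobuy, buy)` in a loop, storing the resulting frames in a dictionary keyed by dataset name.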


Prepare each `h2o` dataframe by converting the `label` column to a categorical variable. The helper returns the prepared frame together with the outcome and predictor column names.

In [22]:
y = 'label'

bb_line_df_h2o, bb_line_y, bb_line_x = h2o_modelling.prepare_h2o_df(df=bb_line_df, outcome=y)
bb_candle_df_h2o, bb_candle_y, bb_candle_x = h2o_modelling.prepare_h2o_df(df=bb_candle_df, outcome=y)
macd_line_df_h2o, macd_line_y, macd_line_x = h2o_modelling.prepare_h2o_df(df=macd_line_df, outcome=y)
macd_candle_df_h2o, macd_candle_y, macd_candle_x = h2o_modelling.prepare_h2o_df(df=macd_candle_df, outcome=y)
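The source of `h2o_modelling.prepare_h2o_df` is not shown here. As a rough idea of what such a helper might do, the sketch below converts the outcome column to a factor with `H2OFrame.asfactor()` and treats every other column as a predictor. The function name `prepare_frame` and its exact behaviour are assumptions, not the project's implementation.

```python
def prepare_frame(df, outcome):
    """Sketch of a preparation step for an h2o.H2OFrame (assumed behaviour).

    Converts the outcome column to a categorical (factor) column so that
    H2O treats the task as classification, then lists the predictors.
    """
    # H2OFrame supports column assignment and asfactor() on a column.
    df[outcome] = df[outcome].asfactor()
    # Assumption: every column except the outcome is a predictor.
    predictors = [col for col in df.columns if col != outcome]
    # Same return order as the calls above: frame, outcome, predictors.
    return df, outcome, predictors
```

Without the factor conversion, a numeric 0/1 `label` column would be modelled as a regression target rather than a binary classification target.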


# Modelling

Build the models! Each AutoML run stops after a maximum of 6 hours or once it has created 100 models, whichever comes first.

In [24]:
bb_line_lb = h2o_modelling.train_and_save(df=bb_line_df_h2o, outcome=bb_line_y, predictors=bb_line_x, save_path="../models/bb-models/bb_line", max_models=100, max_runtime_min=6*60)
print(bb_line_lb[:10])


AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_6_AutoML_14_20220809_172134 | 0.99535 | 0.0797838 | 0.994618 | 0.0294698 | 0.151006 | 0.0228029 |
| StackedEnsemble_BestOfFamily_7_AutoML_14_20220809_172134 | 0.99535 | 0.0805795 | 0.994639 | 0.0306373 | 0.151943 | 0.0230866 |
| StackedEnsemble_BestOfFamily_4_AutoML_14_20220809_172134 | 0.995341 | 0.0807789 | 0.994644 | 0.0304878 | 0.151971 | 0.0230953 |
| StackedEnsemble_AllModels_3_AutoML_14_20220809_172134 | 0.995339 | 0.0801739 | 0.994656 | 0.0301456 | 0.151237 | 0.0228727 |
| StackedEnsemble_BestOfFamily_6_AutoML_14_20220809_172134 | 0.995163 | 0.0834108 | 0.994067 | 0.0298615 | 0.151872 | 0.023065 |
| StackedEnsemble_AllModels_2_AutoML_14_20220809_172134 | 0.995139 | 0.081982 | 0.994401 | 0.0306175 | 0.152945 | 0.0233921 |
| StackedEnsemble_AllModels_1_AutoML_14_20220809_172134 | 0.995126 | 0.0822674 | 0.994416 | 0.0306297 | 0.153093 | 0.0234374 |
| StackedEnsemble_AllModels_5_AutoML_14_20220809_172134 | 0.995121 | 0.0839209 | 0.993756 | 0.0303471 | 0.1518 | 0.0230432 |
| GBM_grid_1_AutoML_14_20220809_172134_model_31 | 0.995109 | 0.0838724 | 0.994288 | 0.0313807 | 0.154505 | 0.0238717 |
| GBM_grid_1_AutoML_14_20220809_172134_model_27 | 0.99506 | 0.083909 | 0.994315 | 0.0310299 | 0.154128 | 0.0237553 |
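The source of `h2o_modelling.train_and_save` is not shown in this notebook. The sketch below is one plausible shape for such a helper, assuming it wraps `H2OAutoML` and saves the leader with `h2o.save_model`. The names `train_and_save_sketch` and `minutes_to_seconds`, the `seed` argument, and the choice to save only the top model are all assumptions.

```python
def minutes_to_seconds(minutes):
    """H2OAutoML takes its time budget in seconds, the helper above in minutes."""
    return int(minutes * 60)


def train_and_save_sketch(df, outcome, predictors, save_path,
                          max_models=100, max_runtime_min=6 * 60, seed=1):
    """Run AutoML on an H2OFrame, save the leader, and return the leaderboard."""
    # Imported lazily so this module can be loaded without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(
        max_models=max_models,  # the run stops when either limit is reached
        max_runtime_secs=minutes_to_seconds(max_runtime_min),
        seed=seed,              # assumption: fixed seed for repeatability
    )
    aml.train(x=predictors, y=outcome, training_frame=df)

    # Assumption: only the top-ranked model is persisted.
    h2o.save_model(model=aml.leader, path=save_path, force=True)
    return aml.leaderboard
```

In H2O AutoML, `max_models` does not count the stacked ensembles, which is why a leaderboard can list ensembles in addition to the base models.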





In [27]:
bb_candle_lb = h2o_modelling.train_and_save(df=bb_candle_df_h2o, outcome=bb_candle_y, predictors=bb_candle_x, save_path="../models/bb-models/bb_candle", max_models=100, max_runtime_min=6*60)
print(bb_candle_lb[:10])


AutoML progress:
18:35:47.814: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:36:11.862: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:39:15.913: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:40:59.39: _train param, Dropping unused columns: [pixel_804, pixel_787]
18:41:01.63: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:41:22.92: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:42:31.159: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:43:32.230: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:44:44.299: _train param, Dropping bad and constant columns: [pixel_804, pixel_787]
18:46:32.349: _train param, Dropping unused columns: [pixel_804, pixel_787]
18:46:34.384: _train param, Dropping unused columns: [pixel_804, pixel_787]
18:46:36.410

| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_6_AutoML_15_20220809_183547 | 0.982375 | 0.160821 | 0.978551 | 0.0622856 | 0.217256 | 0.0472003 |
| StackedEnsemble_AllModels_3_AutoML_15_20220809_183547 | 0.982344 | 0.160926 | 0.97848 | 0.0617453 | 0.217304 | 0.0472209 |
| StackedEnsemble_AllModels_2_AutoML_15_20220809_183547 | 0.98193 | 0.162735 | 0.977868 | 0.0625918 | 0.218273 | 0.0476432 |
| StackedEnsemble_BestOfFamily_4_AutoML_15_20220809_183547 | 0.981919 | 0.163409 | 0.977905 | 0.0635059 | 0.218997 | 0.0479598 |
| StackedEnsemble_BestOfFamily_7_AutoML_15_20220809_183547 | 0.981898 | 0.163853 | 0.977975 | 0.0634044 | 0.219016 | 0.0479679 |
| StackedEnsemble_AllModels_5_AutoML_15_20220809_183547 | 0.981873 | 0.162318 | 0.976735 | 0.0621695 | 0.216793 | 0.0469992 |
| StackedEnsemble_AllModels_1_AutoML_15_20220809_183547 | 0.981868 | 0.16352 | 0.977872 | 0.0638006 | 0.21878 | 0.0478648 |
| StackedEnsemble_BestOfFamily_3_AutoML_15_20220809_183547 | 0.98167 | 0.163636 | 0.977252 | 0.0634187 | 0.218927 | 0.0479289 |
| StackedEnsemble_BestOfFamily_2_AutoML_15_20220809_183547 | 0.981621 | 0.164384 | 0.9775 | 0.0636149 | 0.219442 | 0.0481548 |
| StackedEnsemble_BestOfFamily_1_AutoML_15_20220809_183547 | 0.981577 | 0.164711 | 0.977447 | 0.063791 | 0.219562 | 0.0482076 |
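The training log for this run reports that AutoML dropped two constant pixel columns. A column that never varies carries no information for the model, so it can be useful to find such columns before training. The following is a minimal, library-free sketch of that check on toy data; the column names and values are made up for illustration, and `constant_columns` is a hypothetical helper, not part of `h2o` or `h2o_modelling`.

```python
def constant_columns(columns):
    """Return the names of columns whose values never vary.

    `columns` maps a column name to its list of values.
    """
    return [name for name, values in columns.items() if len(set(values)) <= 1]


# Toy pixel data (illustrative values only, not taken from the real datasets).
pixels = {
    "pixel_0": [0, 0, 0, 0],          # constant: always background
    "pixel_1": [255, 255, 255, 255],  # constant: always foreground
    "pixel_2": [0, 128, 255, 64],     # varies between images
}

print(constant_columns(pixels))
```

On a real `H2OFrame` the same idea would be applied per column, but H2O already handles it automatically, as the log shows.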





In [28]:
macd_line_lb = h2o_modelling.train_and_save(df=macd_line_df_h2o, outcome=macd_line_y, predictors=macd_line_x, save_path="../models/macd-models/macd_line", max_models=100, max_runtime_min=6*60)
print(macd_line_lb[:10])


AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_3_AutoML_16_20220809_200853 | 0.876778 | 0.435764 | 0.855521 | 0.204057 | 0.374794 | 0.14047 |
| StackedEnsemble_AllModels_6_AutoML_16_20220809_200853 | 0.876757 | 0.435715 | 0.855352 | 0.202988 | 0.374745 | 0.140434 |
| StackedEnsemble_AllModels_1_AutoML_16_20220809_200853 | 0.875384 | 0.438222 | 0.853756 | 0.204357 | 0.375885 | 0.14129 |
| StackedEnsemble_AllModels_2_AutoML_16_20220809_200853 | 0.875373 | 0.43814 | 0.853795 | 0.205074 | 0.375938 | 0.14133 |
| StackedEnsemble_AllModels_5_AutoML_16_20220809_200853 | 0.874845 | 0.437818 | 0.851764 | 0.202288 | 0.37559 | 0.141068 |
| StackedEnsemble_BestOfFamily_4_AutoML_16_20220809_200853 | 0.874154 | 0.44001 | 0.8517 | 0.205201 | 0.376804 | 0.141982 |
| StackedEnsemble_BestOfFamily_7_AutoML_16_20220809_200853 | 0.874127 | 0.440126 | 0.851932 | 0.206076 | 0.376822 | 0.141995 |
| GBM_grid_1_AutoML_16_20220809_200853_model_27 | 0.872609 | 0.442593 | 0.849825 | 0.205536 | 0.378009 | 0.142891 |
| StackedEnsemble_BestOfFamily_3_AutoML_16_20220809_200853 | 0.872608 | 0.44242 | 0.850708 | 0.204767 | 0.378132 | 0.142984 |
| StackedEnsemble_BestOfFamily_2_AutoML_16_20220809_200853 | 0.872357 | 0.442963 | 0.850532 | 0.209047 | 0.378178 | 0.143018 |





In [29]:
macd_candle_lb = h2o_modelling.train_and_save(df=macd_candle_df_h2o, outcome=macd_candle_y, predictors=macd_candle_x, save_path="../models/macd-models/macd_candle", max_models=100, max_runtime_min=6*60)
print(macd_candle_lb[:10])


AutoML progress:
20:59:46.542: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:00:18.584: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:02:50.698: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:04:32.359: _train param, Dropping unused columns: [pixel_787, pixel_786]
21:04:34.452: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:05:11.489: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:07:33.468: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:08:38.96: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:10:01.270: _train param, Dropping bad and constant columns: [pixel_787, pixel_786]
21:12:38.412: _train param, Dropping unused columns: [pixel_787, pixel_786]
21:12:40.445: _train param, Dropping unused columns: [pixel_787, pixel_786]
21:1

| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| StackedEnsemble_Best1000_1_AutoML_17_20220809_205946 | 0.889018 | 0.412323 | 0.861921 | 0.189487 | 0.363447 | 0.132093 |
| StackedEnsemble_AllModels_3_AutoML_17_20220809_205946 | 0.888412 | 0.41329 | 0.861665 | 0.190101 | 0.363923 | 0.13244 |
| StackedEnsemble_AllModels_6_AutoML_17_20220809_205946 | 0.888326 | 0.413526 | 0.861645 | 0.191423 | 0.364081 | 0.132555 |
| StackedEnsemble_AllModels_5_AutoML_17_20220809_205946 | 0.887908 | 0.411402 | 0.858668 | 0.188839 | 0.363098 | 0.13184 |
| StackedEnsemble_BestOfFamily_8_AutoML_17_20220809_205946 | 0.887132 | 0.415845 | 0.860224 | 0.197629 | 0.365208 | 0.133377 |
| StackedEnsemble_AllModels_1_AutoML_17_20220809_205946 | 0.886375 | 0.417398 | 0.858832 | 0.195057 | 0.366013 | 0.133966 |
| StackedEnsemble_AllModels_2_AutoML_17_20220809_205946 | 0.886347 | 0.417354 | 0.85892 | 0.194455 | 0.366022 | 0.133972 |
| StackedEnsemble_BestOfFamily_7_AutoML_17_20220809_205946 | 0.88584 | 0.418435 | 0.856845 | 0.194667 | 0.366191 | 0.134095 |
| StackedEnsemble_BestOfFamily_4_AutoML_17_20220809_205946 | 0.884565 | 0.420424 | 0.856535 | 0.191831 | 0.367399 | 0.134982 |
| XGBoost_lr_search_selection_AutoML_17_20220809_205946_select_grid_model_3 | 0.884086 | 0.421955 | 0.855226 | 0.193653 | 0.36779 | 0.135269 |
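To compare the four runs side by side, the top AUC from each leaderboard can be collected and ranked. The values below are copied by hand from the first row of each leaderboard in this notebook; the dictionary and variable names are illustrative only.

```python
# Top-row AUC from each leaderboard above (copied from the printed output).
best_auc = {
    "bb_line": 0.99535,
    "bb_candle": 0.982375,
    "macd_line": 0.876778,
    "macd_candle": 0.889018,
}

# Rank datasets from highest to lowest AUC.
ranked = sorted(best_auc.items(), key=lambda item: item[1], reverse=True)

for name, auc in ranked:
    print(f"{name:<12} {auc:.4f}")
```

On these numbers both `bb` datasets score higher than both `macd` datasets, while the better chart type differs by indicator: `line` for `bb` and `candle` for `macd`.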



