### Automatic Machine Learning

This notebook ingests a dataset and trains many machine learning models, automatically searching their hyperparameters for good values. A leaderboard ranks the trained models. Finally, a stacked ensemble is built by combining some of the base learners, and it is added to the leaderboard as well. The best model on the leaderboard (the leader) is used in production.
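The stacking step can be illustrated without H2O. A stacked ensemble feeds each base learner's predicted probability into a second-level model (the metalearner), which learns how much to trust each one. The minimal sketch below assumes two base learners whose held-out predictions are already available. The data and the helper names (`fit_metalearner`, `predict_ensemble`, `logloss`) are hypothetical, and the metalearner here is plain logistic regression fit by batch gradient descent, not H2O's own implementation.

```python
import math

# Hypothetical level-one data: each row holds the held-out predicted
# probability of a bad loan from two base learners (not real H2O output).
level_one = [[0.9, 0.7], [0.8, 0.6], [0.3, 0.4], [0.2, 0.5], [0.7, 0.8], [0.1, 0.2]]
labels = [1, 1, 0, 0, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ensemble(row, weights, bias):
    """Metalearner output: logistic combination of base-learner probabilities."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, row)))


def fit_metalearner(rows, targets, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log loss."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(rows, targets):
            err = predict_ensemble(row, weights, bias) - y  # d(logloss)/dz
            for j, p in enumerate(row):
                grad_w[j] += err * p
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def logloss(probs, targets, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(targets)


weights, bias = fit_metalearner(level_one, labels)
ensemble = [predict_ensemble(row, weights, bias) for row in level_one]
print("metalearner weights:", weights, "bias:", bias)
print("ensemble logloss:", logloss(ensemble, labels))
```

Starting from all-zero weights, the metalearner predicts 0.5 everywhere (log loss ln 2); training moves the weights toward whichever base learners separate the classes best.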


In [None]:
import h2o
from h2o.automl import H2OAutoML

In [None]:
%%capture
# Start (or connect to) a local H2O cluster: 1 thread, at most 2 GB of memory
h2o.init(nthreads=1, max_mem_size=2)

In [None]:
# Import some data from Amazon S3
df = h2o.import_file("https://s3-us-west-1.amazonaws.com/dsclouddata/LendingClubData/LoansGoodBad.csv")

# Stratified 70/30 split into train/test, preserving the Bad_Loan class proportions
stratsplit = df["Bad_Loan"].stratified_split(test_frac=0.3, seed=12349453)
train = df[stratsplit=="train"]
test = df[stratsplit=="test"]
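`stratified_split` tags each row as "train" or "test" so that each class of `Bad_Loan` appears in both sets in roughly its original proportion. The sketch below shows the idea in plain Python by shuffling each class separately and sending the same fraction of it to the test set. The function name, the synthetic labels, and the exact assignment rule are assumptions for illustration; H2O's implementation may assign rows differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=12349453):
    """Tag each row "train" or "test", sampling test_frac of every class separately."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        rows_by_class[label].append(i)
    tags = ["train"] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        for i in rows[:round(len(rows) * test_frac)]:
            tags[i] = "test"
    return tags


# Synthetic labels: 80 good loans (0) and 20 bad loans (1), i.e. a 20% bad rate.
labels = [0] * 80 + [1] * 20
tags = stratified_split(labels)
test_labels = [y for y, t in zip(labels, tags) if t == "test"]
print(len(test_labels), sum(test_labels))  # 30 6: the test set keeps the 20% bad rate
```

Splitting per class guarantees that a rare class such as bad loans is not under- or over-represented in the test set by chance.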


In [None]:
test.head(10)

In [None]:
# Identify predictors and response
x = train.columns
y = "Bad_Loan"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

In [None]:
%%capture
# Run AutoML for up to 10 minutes (600 seconds)
aml = H2OAutoML(max_runtime_secs=600)
aml.train(x=x, y=y,
          training_frame=train,
          leaderboard_frame=test)

## Leaderboard
Display the trained models ranked by AUC in descending order, so the best model is at the top.
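AUC, the metric the leaderboard is sorted by, is the probability that a randomly chosen bad loan gets a higher predicted score than a randomly chosen good loan. A minimal pure-Python sketch of that definition follows, together with sorting the three leaderboard rows from this notebook's output by AUC. The `auc` helper and the example scores are hypothetical, and H2O computes AUC internally rather than with this O(n²) pairwise loop.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.4, 0.65, 0.2], [1, 1, 0, 0])
print(example_auc)  # 0.75

# (model_id, auc, logloss) rows copied from the leaderboard output in this notebook
rows = [
    ("DRF_model_1496424679945_3", 0.739127, 0.599221),
    ("XRT_model_1496424679945_421", 0.730277, 0.657249),
    ("StackedEnsemble_model_1496424679945_791", 0.756107, 0.584872),
]
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
print(ranked[0][0])  # StackedEnsemble_model_1496424679945_791
```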

In [26]:
leaders = aml.leaderboard
leaders

|   | model_id | auc | logloss |
|---|---|---|---|
| 0 | StackedEnsemble_model_1496424679945_791 | 0.756107 | 0.584872 |
| 1 | DRF_model_1496424679945_3 | 0.739127 | 0.599221 |
| 2 | XRT_model_1496424679945_421 | 0.730277 | 0.657249 |




In [10]:
# Score the test set with the leader (the top model on the leaderboard)
preds = aml.predict(test)

