# Build an Experiment with Python API

ref: http://docs.h2o.ai/driverless-ai/1-8-lts/docs/userguide/python_client.html

Note: this notebook uses the legacy Python client (`h2oai_client`).

## Sign In
Import the required modules and log in.

Pass your credentials to the `Client` class, which creates an authentication token to send to the Driverless AI server. In plain English: to sign in to Driverless AI, instantiate `Client` with your Driverless AI address and login credentials.

In [1]:
## install matplotlib
# conda install -c conda-forge matplotlib

In [2]:
## install packages
import sys
!conda install --yes --prefix {sys.prefix} pandas
!conda install --yes --prefix {sys.prefix} matplotlib
# the conda package is named scikit-learn; it is imported as sklearn
!conda install --yes --prefix {sys.prefix} scikit-learn

In [None]:
from h2oai_client import Client
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import model_selection

In [None]:
# address = 'http://ip_where_driverless_is_running:12345'
address = 'http://18.140.223.189:12345'

username = 'h2oai'
# TODO: confirm the current password for this server
password = 'h2oai'

h2oai = Client(address=address, username=username, password=password)

# Use the same username and password as when signing in through the GUI.
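
Hard-coding credentials in a notebook makes them easy to leak. One alternative, shown here as a minimal sketch, is to read them from environment variables. The variable names `DAI_ADDRESS`, `DAI_USERNAME`, and `DAI_PASSWORD` and the helper `load_dai_credentials` are hypothetical and not part of the client.

```python
import os


def load_dai_credentials(env=None):
    """Read connection settings from environment variables.

    The variable names below are hypothetical; pick whatever fits
    your deployment.
    """
    env = os.environ if env is None else env
    names = ("DAI_ADDRESS", "DAI_USERNAME", "DAI_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["DAI_ADDRESS"],
        "username": env["DAI_USERNAME"],
        "password": env["DAI_PASSWORD"],
    }


# Example with an explicit mapping so the snippet runs without a server.
creds = load_dai_credentials({
    "DAI_ADDRESS": "http://ip_where_driverless_is_running:12345",
    "DAI_USERNAME": "h2oai",
    "DAI_PASSWORD": "h2oai",
})
print(sorted(creds))
```

Because the returned keys match the keyword arguments used in the sign-in cell, the result could be passed on as `Client(**load_dai_credentials())`.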

## Upload Datasets
Upload training and testing datasets from the Driverless AI /data folder.

You can provide a training, validation, and testing dataset for an experiment. The validation and testing datasets are optional. In this example, we will provide only training and testing datasets.

In [None]:
## split data for modelling

cc = pd.read_csv('./data/CreditCard.csv')
# hold out 20% of the rows for testing; the fixed seed makes the split repeatable
train_cc, test_cc = model_selection.train_test_split(cc, test_size=0.2, random_state=2018)
train_cc.to_csv("CreditCard-train.csv", index=False)
test_cc.to_csv("CreditCard-test.csv", index=False)

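## load data
# these paths refer to the Driverless AI server's /data folder, so the split files must be available there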

train_path = '/data/CreditCard-train.csv'
test_path = '/data/CreditCard-test.csv'

train = h2oai.create_dataset_sync(train_path)
test = h2oai.create_dataset_sync(test_path)
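
To make the hold-out split above more concrete, here is a minimal sketch of a seeded 80/20 split using only the standard library. The helper `split_rows` is hypothetical, and it will not reproduce the exact row assignment of scikit-learn's `train_test_split`; it only illustrates the idea of shuffling with a fixed seed and cutting off a test fraction.

```python
import random


def split_rows(rows, test_size=0.2, seed=2018):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for the rows of CreditCard.csv
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Using the same seed gives the same split on every run, which keeps the training and testing files consistent between sessions.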

## Set Experiment Parameters
We will now set the parameters of our experiment. Some of the parameters include:

* **Target Column:** The column we are trying to predict.
* **Dropped Columns:** The columns we do not want to use as predictors, such as ID columns or columns with data leakage.
* **Weight Column:** The column that indicates the per-row observation weights. If None, each row has an observation weight of 1.
* **Fold Column:** The column that indicates the fold. If None, the folds are determined by Driverless AI.
* **Is Time Series:** Whether or not the experiment is a time-series use case.

For information on the experiment settings, refer to the Experiment Settings section of the Driverless AI documentation.
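
The settings described above can be gathered in one place before calling the client. The sketch below uses a hypothetical `ExperimentSettings` dataclass (not part of `h2oai_client`) with a small sanity check against the dataset's column names. The column list passed to `validate` is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExperimentSettings:
    """Hypothetical container for the settings described above."""
    target_col: str
    cols_to_drop: List[str] = field(default_factory=list)
    weight_col: Optional[str] = None  # None: every row has weight 1
    fold_col: Optional[str] = None    # None: Driverless AI picks the folds
    is_time_series: bool = False

    def validate(self, columns):
        """Raise ValueError if the settings do not fit `columns`."""
        referenced = [self.target_col, *self.cols_to_drop]
        referenced += [c for c in (self.weight_col, self.fold_col) if c]
        unknown = [c for c in referenced if c not in columns]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        if self.target_col in self.cols_to_drop:
            raise ValueError("the target column cannot be dropped")


settings = ExperimentSettings(
    target_col="default payment next month",
    cols_to_drop=["ID"],
)
# Illustrative column names; use the header of your own dataset.
settings.validate(["ID", "LIMIT_BAL", "default payment next month"])
```

Checking column names up front catches typos in the target or dropped columns before an experiment is submitted.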

For this example, we will predict `default payment next month`. Three parameters control the experiment process: accuracy, time, and interpretability. We can use the `get_experiment_preview_sync` function to get a sense of what will happen during the experiment.

We will start out by seeing what the experiment will look like with accuracy, time, and interpretability all set to 5.

In [None]:
target = "default payment next month"
exp_preview = h2oai.get_experiment_preview_sync(
    dataset_key=train.key,
    validset_key='',
    classification=True,
    dropped_cols=[],
    target_col=target,
    is_time_series=False,
    enable_gpus=False,
    accuracy=5, time=5, interpretability=5,
    reproducible=True,
    resumed_experiment_id='',
    time_col='',
    config_overrides=None,
)
exp_preview

With these settings, the Driverless AI experiment will train about 124 models: 
* 16 for model and feature tuning
* 104 for feature evolution
* 4 for the final pipeline

When we start the experiment, we can either: 

* specify parameters
* use Driverless AI to suggest parameters

Driverless AI can suggest the parameters based on the dataset and target column. Below we will use the **`get_experiment_tuning_suggestion`** function to see what settings Driverless AI suggests.

In [None]:
# let Driverless suggest parameters for experiment
params = h2oai.get_experiment_tuning_suggestion(dataset_key = train.key, target_col = target, 
                                                is_classification = True, is_time_series = False,
                                                config_overrides = None, cols_to_drop=[])

params.dump()

Driverless AI suggests setting **`accuracy = 5`**, **`time = 4`**, and **`interpretability = 6`**. It has selected **`AUC`** as the scorer (this is the default scorer for binomial problems).
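
For readers unfamiliar with the AUC scorer, the sketch below computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the positive row receives the higher score, with ties counted as half. This is a plain-Python illustration on toy data, not how Driverless AI computes the metric internally.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (O(n^2))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why it suits binary targets such as `default payment next month`.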

## Launch Experiment: Feature Engineering + Final Model Training

Launch the experiment using the parameters that Driverless AI suggested, along with the test set, scorer, and seed added here. We could also launch the experiment with parameters of our own choosing.

In [None]:
experiment = h2oai.start_experiment_sync(dataset_key=train.key,
                                         testset_key = test.key,
                                         target_col=target,
                                         is_classification=True,
                                         accuracy=5,
                                         time=4,
                                         interpretability=6,
                                         scorer="AUC",
                                         enable_gpus=True,
                                         seed=1234,
                                         cols_to_drop=['ID'])
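
The launch call above types the suggested values in by hand. A minimal sketch of combining suggested settings with manual additions programmatically is shown below. The helper `build_experiment_kwargs` is hypothetical, and the dictionaries simply repeat the values from the text. Whether the suggestion object converts cleanly to such a dictionary is an assumption to check against your client version.

```python
def build_experiment_kwargs(suggested, overrides):
    """Return suggested settings updated with explicit overrides.

    Keys set to None in `overrides` are ignored so they do not wipe out
    a suggested value.
    """
    merged = dict(suggested)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


# Values copied from the suggestion discussed in the text.
suggested = {"accuracy": 5, "time": 4, "interpretability": 6, "scorer": "AUC"}
# Additions made by hand in the launch cell; scorer=None keeps the suggestion.
overrides = {"seed": 1234, "cols_to_drop": ["ID"], "scorer": None}

kwargs = build_experiment_kwargs(suggested, overrides)
print(kwargs)
```

Keeping suggestions and overrides separate makes it obvious which values came from Driverless AI and which were chosen manually.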

## Examine Experiment

View the final model score for the validation and test datasets. When feature engineering is complete, an ensemble model can be built depending on the accuracy setting. The experiment object also contains the score on the validation and test data for this ensemble model.  In this case, the validation score is the score on the training cross-validation predictions.

In [None]:
print("Final Model Score on Validation Data: " + str(round(experiment.valid_score, 3)))
print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))

The experiment object also stores the scores calculated for each iteration on bootstrapped samples of the validation data.
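
If a score is missing, calling `round()` on it raises an error. The sketch below shows one defensive way to print the two scores. The helper `summarize_scores` is hypothetical, and the stand-in object uses made-up numbers, since a real experiment object comes from the client.

```python
from types import SimpleNamespace


def summarize_scores(experiment, digits=3):
    """Return printable lines for the validation and test scores.

    A score of None is reported as "n/a" instead of raising.
    """
    lines = []
    for label, attr in (("Validation", "valid_score"), ("Test", "test_score")):
        value = getattr(experiment, attr, None)
        shown = "n/a" if value is None else f"{value:.{digits}f}"
        lines.append(f"Final Model Score on {label} Data: {shown}")
    return lines


# Stand-in object with made-up numbers, for illustration only.
fake_experiment = SimpleNamespace(valid_score=0.77912, test_score=None)
for line in summarize_scores(fake_experiment):
    print(line)
```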

## Build an Experiment in Web UI and Access Through Python

It is also possible to use the Python API to examine an experiment that was started through the Web UI using the experiment key.

You can get a pointer to the experiment by referencing the experiment key in the Web UI.

In [None]:
# Get list of experiments
experiment_list = [model.key for model in h2oai.list_models(offset=0, limit=100).models]
experiment_list

In [None]:
# Get pointer to experiment
experiment = h2oai.get_model_job(experiment_list[0]).entity
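
Indexing `experiment_list[0]` fails when the server has no experiments, and it silently picks whichever experiment happens to be listed first. A minimal sketch of a safer lookup follows. The helper `pick_experiment_key` is hypothetical, and the stand-in objects with made-up keys only mimic the `.key` attribute used above.

```python
from types import SimpleNamespace


def pick_experiment_key(summaries, key=None):
    """Return the requested key, or the first listed key when none is requested."""
    keys = [s.key for s in summaries]
    if not keys:
        raise LookupError("no experiments found on the server")
    if key is None:
        return keys[0]
    if key not in keys:
        raise LookupError(f"experiment {key!r} not found")
    return key


# Stand-ins for the items of h2oai.list_models(...).models; the keys are made up.
summaries = [SimpleNamespace(key="exp_a"), SimpleNamespace(key="exp_b")]
print(pick_experiment_key(summaries))           # first experiment in the list
print(pick_experiment_key(summaries, "exp_b"))  # a specific experiment
```

The returned key can then be passed to `h2oai.get_model_job(...)` as in the cell above.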