In [None]:
%load_ext autoreload
%autoreload 2

# Overview

> yoda aims to simplify running jobs on Google AI Platform and to organize your modeling process in a config file.

In this section, we go through a few examples to see how yoda works.

## Run locally

Here is an example of a config file `config1.yaml`.

In [None]:
config1 = '../data/configs/config1.yaml'
with open(config1) as f:
    print(f.read())

data: 
  input_path: "../data/iris_data.csv"
  eval_path: "../data/iris_data.csv"
  output_path: "../output/"
  features: "sepal_length,sepal_width,petal_length"
  label: species
train:
  estimator: xgboost.XGBClassifier
  params:
    max_depth: 4
    num_estimator: 50
eval:
  metrics: "accuracy,f1_macro"


We can run this config file locally with:

```{shell}
yoda run config1.yaml
```

### How yoda processes the config file (you can safely skip this part)

In [None]:
import yaml

# load the config file into a nested dict
with open(config1) as f:
    conf_dict = yaml.load(f, Loader=yaml.SafeLoader)

In [None]:
conf_dict

{'data': {'input_path': '../data/iris_data.csv',
  'eval_path': '../data/iris_data.csv',
  'output_path': '../output/',
  'features': 'sepal_length,sepal_width,petal_length',
  'label': 'species'},
 'train': {'estimator': 'xgboost.XGBClassifier',
  'params': {'max_depth': 4, 'num_estimator': 50}},
 'eval': {'metrics': 'accuracy,f1_macro'}}

During the ***Data*** stage, yoda builds a `Data` object from the `data` section of the config and reads the data from `input_path`. The data looks like this:

In [None]:
data = Data(**conf_dict['data'])

In [None]:
data.X.head()

| | sepal_length | sepal_width | petal_length |
|---|---|---|---|
| 0 | 0.0 | 1.0 | 2.0 |
| 1 | 5.1 | 3.5 | 1.4 |
| 2 | 4.9 | 3.0 | 1.4 |
| 3 | 4.7 | 3.2 | 1.3 |
| 4 | 4.6 | 3.1 | 1.5 |


In [None]:
data.y.value_counts()

0    51
2    50
1    50
Name: species, dtype: int64
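The `features` and `label` entries in the config are plain strings. Below is a minimal, standard-library-only sketch of how a Data stage could turn them into a feature matrix and a label vector, assuming `features` is a comma-separated list of column names. `load_xy` and the sample CSV are hypothetical illustrations, not yoda's `Data` class (which appears to use pandas).

```python
import csv
import io


def load_xy(csv_text, features, label):
    """Hypothetical helper: split CSV text into feature rows and labels.

    `features` is a comma-separated string of column names, as in the config.
    """
    feature_cols = [c.strip() for c in features.split(",") if c.strip()]
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the configured feature columns, converted to floats
        X.append([float(row[c]) for c in feature_cols])
        # Labels are kept as the raw strings read from the file
        y.append(row[label])
    return X, y


sample = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
"""
X, y = load_xy(sample, "sepal_length,sepal_width,petal_length", "species")
print(X)  # [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]]
print(y)  # ['0', '0']
```

Columns not listed in `features` (here `petal_width`) are simply ignored, which mirrors the three feature columns shown in `data.X` above.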

Next, it creates an object for the ***Train*** stage:

In [None]:
train = Train(**conf_dict['train'])

In [None]:
train.fit(data.X, data.y)
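The `estimator` entry is a dotted import path (`xgboost.XGBClassifier`) and `params` holds its keyword arguments. Here is a minimal sketch of how such a path can be resolved and instantiated with `importlib`, assuming yoda does something similar inside `Train`. `build_estimator` is a hypothetical name, and the example uses a standard-library class so it runs without xgboost installed.

```python
import importlib


def build_estimator(dotted_path, params=None):
    """Hypothetical helper: import `module.Class` and instantiate it with params."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # Config params are passed straight through as keyword arguments
    return cls(**(params or {}))


# Stand-in for {'estimator': 'xgboost.XGBClassifier', 'params': {...}}
counter = build_estimator("collections.Counter", {"a": 2, "b": 1})
print(counter)  # Counter({'a': 2, 'b': 1})
```

Because keyword arguments are forwarded unchanged, a misspelled key in `params` reaches the estimator as-is; for `XGBClassifier` the number of trees is set by `n_estimators`.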

In [None]:
run_eval(conf_dict, data, train.estimator, output_dir=None)

{'accuracy': {'sd': 0, 'avg': 1.0}, 'f1_macro': {'sd': 0, 'avg': 1.0}}

In [None]:
conf_dict["eval"]["cv"] = 5
data.eval_path = None

In [None]:
run_eval(conf_dict, data, train.estimator, output_dir=None)

{'accuracy': {'sd': 0.0574503560948385, 'avg': 0.9402150537634408},
 'f1_macro': {'sd': 0.05748872061476293, 'avg': 0.9398830409356724}}
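Each metric is reported as an average (`avg`) and a standard deviation (`sd`): a single evaluation on `eval_path` gives an `sd` of 0, while `cv: 5` produces five fold scores. The sketch below shows that summary step, assuming the per-fold scores are already computed. `summarize_scores` is hypothetical, and using the population (rather than sample) standard deviation is an assumption.

```python
import statistics


def summarize_scores(scores_by_metric):
    """Hypothetical helper: collapse per-fold scores into {'sd': ..., 'avg': ...}."""
    summary = {}
    for metric, scores in scores_by_metric.items():
        summary[metric] = {
            # Population standard deviation; reported as 0 for a single score
            "sd": statistics.pstdev(scores) if len(scores) > 1 else 0,
            "avg": statistics.mean(scores),
        }
    return summary


print(summarize_scores({"accuracy": [1.0]}))
# {'accuracy': {'sd': 0, 'avg': 1.0}}
print(summarize_scores({"accuracy": [0.9, 1.0, 0.95, 0.85, 1.0]}))
```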

## Run on GCP AI Platform

Before running on AI Platform, we need to build a Docker image with all dependencies installed and push it to Google Container Registry:

```{shell}
export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
export IMAGE_REPO_NAME=yoda
export IMAGE_TAG=basic
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG

docker build -f ../docker/Dockerfile.basic -t $IMAGE_URI ./
docker push $IMAGE_URI
```

The config for GCP looks like this:

In [None]:
config2 = '../data/configs/config2.yaml'
with open(config2) as f:
    print(f.read())

data: 
  input_path: !format "gs://{BUCKET}/{USER}/test/iris_data.csv"
  eval_path: !format "gs://{BUCKET}/{USER}/test/iris_data.csv"
  output_path: !format "gs://{BUCKET}/{USER}/test/output/"
  features: "sepal_length,sepal_width,petal_length"
  label: species
train:
  estimator: xgboost.XGBClassifier
  params:
    max_depth: 4
    num_estimator: 50
eval:
  metrics: "accuracy,f1_macro"


In [None]:
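import os
import yaml

# the `!format` paths in config2 are filled in from environment variables (BUCKET, USER)
os.environ["BUCKET"] = "testjobsubmit"
with open(config2) as f:
    conf_dict2 = yaml.safe_load(f)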

In [None]:
conf_dict2

{'data': {'input_path': 'gs://testjobsubmit/j0l04cl/test/iris_data.csv',
  'eval_path': 'gs://testjobsubmit/j0l04cl/test/iris_data.csv',
  'output_path': 'gs://testjobsubmit/j0l04cl/test/output/',
  'features': 'sepal_length,sepal_width,petal_length',
  'label': 'species'},
 'train': {'estimator': 'xgboost.XGBClassifier',
  'params': {'max_depth': 4, 'num_estimator': 50}},
 'eval': {'metrics': 'accuracy,f1_macro'}}
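Judging from the output above, `!format` fills `{BUCKET}` and `{USER}` from environment variables. Here is a minimal sketch of how such a tag can be registered with PyYAML, assuming the substitution is `str.format_map(os.environ)`. `format_constructor` and the example values are hypothetical, and yoda's own implementation may differ.

```python
import os

import yaml


def format_constructor(loader, node):
    """Hypothetical `!format` handler: fill {NAME} fields from environment variables."""
    template = loader.construct_scalar(node)
    # Raises KeyError if a referenced variable is not set
    return template.format_map(os.environ)


# Register the tag on SafeLoader so yaml.safe_load understands it
yaml.add_constructor("!format", format_constructor, Loader=yaml.SafeLoader)

# Example values only
os.environ["BUCKET"] = "my-bucket"
os.environ["USER"] = "alice"
conf = yaml.safe_load('input_path: !format "gs://{BUCKET}/{USER}/test/iris_data.csv"')
print(conf)  # {'input_path': 'gs://my-bucket/alice/test/iris_data.csv'}
```

Failing loudly on a missing variable (rather than leaving `{BUCKET}` in the path) makes a misconfigured environment surface before a job is submitted.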

In [None]:
from nbdev.export import notebook2script
notebook2script()

Converted 00_core.ipynb.
Converted 01_runner.ipynb.
Converted 02_cli.ipynb.
