# CASE STUDY - AAVAIL AI Enterprise Workflow Capstone

You will be building your own workflow template in this tutorial.  You already have a Dockerfile and a basic Flask application to build an API.  Let's combine what you have learned about logging to build a ``workflow-template`` that can be used to deploy models in a way that facilitates performance monitoring.

There are six main parts to this case study.

1. Write unit tests for a logger and a logging API endpoint
2. Add logging to your Docker container
3. Add an API endpoint for logging
4. Make sure all tests pass
5. Create model performance investigative tooling
6. Swap out the iris data for the AAVAIL churn data

You may eventually want to rename the directory, because in this case study you will swap out the iris data for `aavail-target.csv`.  In reality you will eventually want a library of workflow templates to work from, and the naming convention you decide on can help with organization.  This notebook should reside in that source directory regardless of the name.  We suggest that you go through all of the tasks **first** using the iris data, **then** copy the template to a new folder and make it work for the AAVAIL churn data.

In [34]:
import os
import sys
import csv
import requests
from collections import Counter
from datetime import date
from ast import literal_eval
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import time

from model import model_load
from model import model_predict

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

from cslib import fetch_ts, engineer_features

%matplotlib inline

## Getting started

The ``workflow-template.zip`` is a workflow template.  Unpack the directory in a location where you would like the source code to exist.  Leaving out the ``static`` directory that contains CSS and JavaScript to render a landing page, the important pieces are shown in the following tree.

```
├── app.py
├── cslib.py
├── Dockerfile
├── logger.py
├── model.py
├── README.rst
├── requirements.txt
├── run-tests.py
├── templates
│   ├── base.html
│   ├── dashboard.html
│   ├── index.html
│   └── running.html
└── unittests
    ├── ApiTests.py
    ├── __init__.py
    └── ModelTests.py
```

If you plan on modifying the HTML website you will need to modify the files in ``templates``.  The rest of the files you should be familiar with at this point.

We will be working with a Flask API to interact with our model. To access the different endpoints of this API, make sure the app is running. Open a new command prompt and run the app with the command:


```
python path/to/working/directory/app.py -d
```

## TASK 1: Write unit tests for a logger

1. Using `model.py` and `./unittests/ModelTests.py` as an example, complete `logger.py` and `./unittests/LoggerTests.py`.
2. Modify the files so that there are at a minimum the following tests:
 
    * ensure predict log is automatically created
    * ensure train log is automatically created
    * ensure that content can be retrieved from predict log file
    * ensure that content can be retrieved from train log file
    
> IMPORTANT: when writing to a log file from a unit test you will want to ensure that you do not modify or delete existing 'production' logs.  You can test your function with the following code (although it is likely easier to work directly in a terminal).

In [10]:
!python ./unittests/LoggerTests.py

....
----------------------------------------------------------------------
Ran 4 tests in 0.030s

OK
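
As a starting point for Task 1, here is a minimal sketch of a train logger together with two of the tests listed above. It is not the template's actual `logger.py`: the function names `train_log_path` and `update_train_log`, the `LOG_DIR` location, and the monthly file-naming scheme are assumptions made for illustration. The column names are the ones that appear in the train log shown later in this notebook. The `test` flag routes writes to `train-test.log` so that production logs are never modified.

```python
import csv
import os
import tempfile
import time
import unittest
import uuid
from datetime import date

# Assumption: logs live in a scratch directory; the template may use another path.
LOG_DIR = os.path.join(tempfile.gettempdir(), "workflow-template-logs")

# Column names as they appear in the train log shown in this notebook.
TRAIN_HEADER = ["unique_id", "timestamp", "x_shape", "eval_test",
                "model_version", "model_version_note", "runtime"]


def train_log_path(test=False):
    """Return the train log path; production logs get one file per month."""
    if test:
        return os.path.join(LOG_DIR, "train-test.log")
    today = date.today()
    # Hypothetical naming scheme for monthly production logs.
    return os.path.join(LOG_DIR, "train-{}-{:02d}.log".format(today.year, today.month))


def update_train_log(x_shape, eval_test, runtime, model_version,
                     model_version_note, test=False):
    """Append one row to the train log, writing the header for a new file."""
    os.makedirs(LOG_DIR, exist_ok=True)
    path = train_log_path(test=test)
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(TRAIN_HEADER)
        writer.writerow([uuid.uuid4(), time.time(), x_shape, eval_test,
                         model_version, model_version_note, runtime])
    return path


class LoggerTest(unittest.TestCase):
    """Two of the four tests; the predict-log tests follow the same pattern."""

    def test_train_log_is_created(self):
        path = train_log_path(test=True)
        if os.path.exists(path):
            os.remove(path)  # only the test log is ever removed
        update_train_log((100, 10), {"rmse": 0.5}, "00:00:01",
                         0.1, "test model", test=True)
        self.assertTrue(os.path.exists(path))

    def test_train_log_content(self):
        path = update_train_log((100, 10), {"rmse": 0.5}, "00:00:01",
                                0.1, "test model", test=True)
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        self.assertEqual(rows[-1]["model_version_note"], "test model")
        self.assertEqual(rows[-1]["x_shape"], "(100, 10)")
```

Adding a call to `unittest.main()` under an `if __name__ == "__main__":` guard at the bottom of `LoggerTests.py` would let the file be run directly from a terminal.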


## TASK 2: Add an API endpoint for logging

In addition to the `predict` and `train` endpoints, create a third endpoint that returns logs.  Remember that there are `train` and `predict` log files and that they are set up to create new files each month.  Your endpoint needs to accommodate this, and the best way to ensure that it does is to **first write the unit tests** and then write the code.

Flask has several functions to help with the sending of files. One example is [send_from_directory](https://flask.palletsprojects.com/en/1.1.x/api/#flask.send_from_directory).
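
One way such an endpoint might look is sketched below. This is an illustration under stated assumptions, not the template's `app.py`: the `LOG_DIR` location and the rule that log names consist of letters, digits, hyphens, and underscores followed by `.log` are both hypothetical. Because the file name is part of the URL, the same route serves any monthly `train` or `predict` log that exists in the directory.

```python
import os
import re
import tempfile

from flask import Flask, abort, send_from_directory

# Assumption: logs live in a scratch directory; the template may use another path.
LOG_DIR = os.path.join(tempfile.gettempdir(), "workflow-template-logs")
os.makedirs(LOG_DIR, exist_ok=True)

# Hypothetical rule: accept only simple names that end in .log.
LOG_NAME = re.compile(r"^[A-Za-z0-9_-]+\.log$")

app = Flask(__name__)


@app.route("/logs/<filename>", methods=["GET"])
def logs(filename):
    """Return the requested log file, or an error code if it is unavailable."""
    if not LOG_NAME.match(filename):
        abort(400)  # not a log file name
    if not os.path.isfile(os.path.join(LOG_DIR, filename)):
        abort(404)  # no log with that name (for example, a month with no activity)
    return send_from_directory(LOG_DIR, filename)
```

Flask's built-in test client (`app.test_client()`) can exercise this route from a unit test without starting the server.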

In [11]:
# The API is ready, so we can test it. Take a close look at the ApiTests.py script.
!python ./unittests/ApiTests.py

....
----------------------------------------------------------------------
Ran 4 tests in 5.520s

OK


### SOLUTION NOTE

We can now access the logs through the API

In [12]:
r = requests.get('http://127.0.0.1:8081/logs/train-test.log')
print(r.text)

unique_id,timestamp,x_shape,eval_test,model_version,model_version_note,runtime
a288a6a6-8f5e-47e5-b94c-4ee746ddb0de,1610266371.87012,"(100, 10)",{'rmse': 0.5},0.1,test model,00:00:01
739ea33c-9b31-4977-8389-05e186970a3c,1610266371.87012,"(100, 10)",{'rmse': 0.5},0.1,test model,00:00:01
fc19a4e2-36ae-4ec4-a136-387579f654eb,1610266413.6052504,"(900, 4)","{'0.0': {'precision': 0.8347107438016529, 'recall': 0.9439252336448598, 'f1-score': 0.8859649122807018, 'support': 214}, '1.0': {'precision': 0.7818181818181819, 'recall': 0.5180722891566265, 'f1-score': 0.6231884057971014, 'support': 83}, 'accuracy': 0.8249158249158249, 'macro avg': {'precision': 0.8082644628099174, 'recall': 0.7309987614007432, 'f1-score': 0.7545766590389016, 'support': 297}, 'weighted avg': {'precision': 0.819929320755767, 'recall': 0.8249158249158249, 'f1-score': 0.8125290535664297, 'support': 297}}",0.1,RF on AAVAIL churn,000:00:05
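
The log is plain CSV, so the text returned by the endpoint can be parsed back into Python objects for later analysis. The sketch below uses the first two rows of the output above as sample input; the helper name `parse_train_log` is hypothetical. It relies on `csv.DictReader` for the columns and `ast.literal_eval` to recover the tuple in `x_shape` and the dictionary in `eval_test`.

```python
import csv
import io
from ast import literal_eval

# First rows of the train-test log shown above, as returned in r.text.
log_text = """unique_id,timestamp,x_shape,eval_test,model_version,model_version_note,runtime
a288a6a6-8f5e-47e5-b94c-4ee746ddb0de,1610266371.87012,"(100, 10)",{'rmse': 0.5},0.1,test model,00:00:01
739ea33c-9b31-4977-8389-05e186970a3c,1610266371.87012,"(100, 10)",{'rmse': 0.5},0.1,test model,00:00:01
"""


def parse_train_log(text):
    """Turn the CSV log text into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["timestamp"] = float(row["timestamp"])
        row["x_shape"] = literal_eval(row["x_shape"])      # "(100, 10)" -> tuple
        row["eval_test"] = literal_eval(row["eval_test"])  # text -> dict of metrics
        rows.append(row)
    return rows


records = parse_train_log(log_text)
for record in records:
    print(record["unique_id"], record["x_shape"], record["eval_test"])
```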



## TASK 3: Make sure all tests pass

You have been working on specific suites of unit tests.  It is a best practice to double-check that all tests pass after making major changes like the ones you have just completed.

> Make sure you modify `./unittests/__init__.py` so that the LoggerTest suite is also included when running all tests.
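
The idea of collecting several test cases into a single suite can be sketched with the standard `unittest` loader. The two classes below are placeholders standing in for the real test cases in `ModelTests.py` and `LoggerTests.py`, and `build_main_suite` is a hypothetical helper name; the template's `__init__.py` may be organized differently.

```python
import unittest


class ModelTest(unittest.TestCase):
    """Stand-in for the test case defined in ModelTests.py."""

    def test_placeholder(self):
        self.assertTrue(True)


class LoggerTest(unittest.TestCase):
    """Stand-in for the test case defined in LoggerTests.py."""

    def test_placeholder(self):
        self.assertTrue(True)


def build_main_suite():
    """Combine the individual test cases into one suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (ModelTest, LoggerTest):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return suite


main_suite = build_main_suite()
print("tests collected:", main_suite.countTestCases())
```

A runner such as `unittest.TextTestRunner().run(main_suite)` then executes every collected test in one pass.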

In [31]:
!python run-tests.py

... test flag on
...... subseting data
...... subseting countries
data\cs-train\ts-data
... loading ts data from files
... saving test version of model: models\test-all-0_1.joblib
... saving test version of model: models\test-united_kingdom-0_1.joblib
C:\Users\NewizZ\Workshop\aavail-ai-enterprise-workflow-capstone\data\cs-train\ts-data
... loading ts data from files
C:\Users\NewizZ\Workshop\aavail-ai-enterprise-workflow-capstone\data\cs-train\ts-data
... loading ts data from files
2018-08-01


..
----------------------------------------------------------------------
Ran 11 tests in 71.826s

OK


## TASK 4: Create model performance investigative tooling

There are a lot of convenience functions you could create here.  Create them directly in this notebook or create them as scripts that you may call from this notebook.  

First write a script that accomplishes the following:

* train one model, then select another type of machine learning model and train again,  ensuring that each has separate version numbers.
* simulate a couple of hundred predictions for each model.

At a minimum create a tabular summary and/or a simple plot that accomplishes the following:

1. Compare model performance for the two models
2. Determine if there was any drift from the first model to the second using a novelty detection algorithm.
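
For the drift check, one possible approach is to fit a novelty detector on the features the first model was trained on and then measure how many recent inputs it flags as novel. The sketch below uses scikit-learn's `OneClassSVM` on synthetic data; the data, the choice of four features, the `nu` value, and the helper name `novelty_fraction` are all assumptions made for illustration, not results from the AAVAIL data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)

# Synthetic stand-ins for real feature matrices (4 features chosen arbitrarily).
X_reference = rng.normal(loc=0.0, scale=1.0, size=(500, 4))       # training-time data
X_recent_ok = rng.normal(loc=0.0, scale=1.0, size=(200, 4))       # same distribution
X_recent_shifted = rng.normal(loc=3.0, scale=1.0, size=(200, 4))  # shifted distribution

# nu bounds the fraction of reference rows treated as outliers during fitting.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(X_reference)


def novelty_fraction(model, X):
    """Fraction of rows the fitted detector labels as novel (-1)."""
    return float(np.mean(model.predict(X) == -1))


frac_ok = novelty_fraction(detector, X_recent_ok)
frac_shifted = novelty_fraction(detector, X_recent_shifted)
print("novel fraction, same distribution:    {:.2f}".format(frac_ok))
print("novel fraction, shifted distribution: {:.2f}".format(frac_shifted))
```

A novel fraction for recent data that is well above the fraction seen on held-out reference data is a signal worth investigating; where to set the alert threshold is a judgment call that depends on the application.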

In [33]:
! python run-model-train.py

C:\Users\NewizZ\Workshop\aavail-ai-enterprise-workflow-capstone\data\cs-train\ts-data
... loading ts data from files
... saving model: models\sl-all-0_1.joblib
... saving model: models\sl-eire-0_1.joblib
... saving model: models\sl-france-0_1.joblib
... saving model: models\sl-germany-0_1.joblib
... saving model: models\sl-hong_kong-0_1.joblib
... saving model: models\sl-netherlands-0_1.joblib
... saving model: models\sl-norway-0_1.joblib
... saving model: models\sl-portugal-0_1.joblib
... saving model: models\sl-singapore-0_1.joblib
... saving model: models\sl-spain-0_1.joblib
... saving model: models\sl-united_kingdom-0_1.joblib
C:\Users\NewizZ\Workshop\aavail-ai-enterprise-workflow-capstone\data\cs-train\ts-data
... loading ts data from files
model training complete.




In [35]:
data_dir = os.path.join("data","cs-train")

ts_all = fetch_ts(data_dir,clean=False)

data\cs-train\ts-data
... loading ts data from files


In [36]:
ts_all

{'all':            date  purchases  unique_invoices  unique_streams  total_views  \
 0    2017-11-01          0                0               0            0   
 1    2017-11-02          0                0               0            0   
 2    2017-11-03          0                0               0            0   
 3    2017-11-04          0                0               0            0   
 4    2017-11-05          0                0               0            0   
 ..          ...        ...              ...             ...          ...   
 602  2019-06-26       1358               67             999         6420   
 603  2019-06-27       1620               80             944         9435   
 604  2019-06-28       1027               70             607         5539   
 605  2019-06-29          0                0               0            0   
 606  2019-06-30        602               27             423         2534   
 
     year_month  revenue  
 0      2017-11     0.00  
 1      2017-

In [37]:
X,y,dates = engineer_features(ts_all['all'])
        
## Perform a train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True, random_state=42)

In [38]:
param_grid_rf = {
    'rf__criterion': ['mse','mae'],
    'rf__n_estimators': [10,15,20,25,50,100]
    }

time_start = time.time()
pipe_rf = Pipeline(steps=[('scaler', StandardScaler()), ('rf', RandomForestRegressor())])

grid = GridSearchCV(pipe_rf, param_grid=param_grid_rf, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

rf_mae =  mean_absolute_error(y_test, y_pred)
rf_mse =  mean_squared_error(y_test, y_pred)
rf_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(rf_mae))
print("mse = {:.0f}".format(rf_mse))
print("r2_score = {:.3f}".format(rf_r2_score))
print("best params =", grid.best_params_)



train time =  00:00:05
mae = 11348
mse = 262818322
r2_score = 0.960
best params = {'rf__criterion': 'mse', 'rf__n_estimators': 100}


In [39]:
param_grid_rf = {
    'rf__criterion': ['mse','mae'],
    'rf__n_estimators': [10,15,20,25,50,100]
    }

time_start = time.time()
pipe_rf = Pipeline(steps=[('scaler', StandardScaler()), ('rf', RandomForestRegressor())])

grid = GridSearchCV(pipe_rf, param_grid=param_grid_rf, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

rf_mae =  mean_absolute_error(y_test, y_pred)
rf_mse =  mean_squared_error(y_test, y_pred)
rf_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(rf_mae))
print("mse = {:.0f}".format(rf_mse))
print("r2_score = {:.3f}".format(rf_r2_score))
print("best params =", grid.best_params_)



train time =  00:00:02
mae = 11868
mse = 294755145
r2_score = 0.955
best params = {'rf__criterion': 'mse', 'rf__n_estimators': 100}


In [40]:
param_grid_gb = {
    'gb__criterion': ['mse','mae'],
    'gb__n_estimators': [10,15,20,25,50,100]
    }

time_start = time.time()
pipe_gb = Pipeline(steps=[('scaler', StandardScaler()), ('gb', GradientBoostingRegressor())])

grid = GridSearchCV(pipe_gb, param_grid=param_grid_gb, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

gb_mae =  mean_absolute_error(y_test, y_pred)
gb_mse =  mean_squared_error(y_test, y_pred)
gb_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(gb_mae))
print("mse = {:.0f}".format(gb_mse))
print("r2_score = {:.3f}".format(gb_r2_score))
print("best params =", grid.best_params_)

train time =  00:00:02
mae = 16209
mse = 459616927
r2_score = 0.930
best params = {'gb__criterion': 'mse', 'gb__n_estimators': 100}




In [41]:
param_grid_gb = {
    'gb__criterion': ['mse','mae'],
    'gb__n_estimators': [10,15,20,25,50,100]
    }

time_start = time.time()
pipe_gb = Pipeline(steps=[('scaler', StandardScaler()), ('gb', GradientBoostingRegressor())])

grid = GridSearchCV(pipe_gb, param_grid=param_grid_gb, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

gb_mae =  mean_absolute_error(y_test, y_pred)
gb_mse =  mean_squared_error(y_test, y_pred)
gb_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(gb_mae))
print("mse = {:.0f}".format(gb_mse))
print("r2_score = {:.3f}".format(gb_r2_score))
print("best params =", grid.best_params_)

train time =  00:00:02
mae = 16443
mse = 471243254
r2_score = 0.928
best params = {'gb__criterion': 'mse', 'gb__n_estimators': 100}




In [42]:
param_grid_dt = {
    'dt__criterion': ['mse','mae'],
    'dt__max_depth': [5,10,20,50],
    'dt__min_samples_leaf': [1,2,3,4,5]
    }

time_start = time.time()
pipe_ts = Pipeline(steps=[('scaler', StandardScaler()), ('dt', DecisionTreeRegressor())])

grid = GridSearchCV(pipe_ts, param_grid=param_grid_dt, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

dt_mae =  mean_absolute_error(y_test, y_pred)
dt_mse =  mean_squared_error(y_test, y_pred)
dt_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(dt_mae))
print("mse = {:.0f}".format(dt_mse))
print("r2_score = {:.3f}".format(dt_r2_score))
print("best params =", grid.best_params_)

train time =  00:00:00
mae = 11998
mse = 400134904
r2_score = 0.939
best params = {'dt__criterion': 'mse', 'dt__max_depth': 50, 'dt__min_samples_leaf': 2}




In [43]:
param_grid_dt = {
    'dt__criterion': ['mse','mae'],
    'dt__max_depth': [5,10,20,50],
    'dt__min_samples_leaf': [1,2,3,4,5]
    }

time_start = time.time()
pipe_ts = Pipeline(steps=[('scaler', StandardScaler()), ('dt', DecisionTreeRegressor())])

grid = GridSearchCV(pipe_ts, param_grid=param_grid_dt, cv=5, iid=False, n_jobs=-1)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

dt_mae =  mean_absolute_error(y_test, y_pred)
dt_mse =  mean_squared_error(y_test, y_pred)
dt_r2_score = r2_score(y_test, y_pred)

print("train time = ", time.strftime('%H:%M:%S', time.gmtime(time.time()-time_start)))
print("mae = {:.0f}".format(dt_mae))
print("mse = {:.0f}".format(dt_mse))
print("r2_score = {:.3f}".format(dt_r2_score))
print("best params =", grid.best_params_)

train time =  00:00:00
mae = 12476
mse = 448740966
r2_score = 0.932
best params = {'dt__criterion': 'mse', 'dt__max_depth': 50, 'dt__min_samples_leaf': 2}
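
A tabular summary of the runs above could be assembled along the following lines. The numbers are copied from the first run of each grid search in this notebook; the dictionary layout and the added `rmse` column are choices made for this sketch.

```python
import pandas as pd

# Metrics copied from the first run of each grid search above.
results = {
    "random_forest": {"mae": 11348, "mse": 262818322, "r2": 0.960},
    "gradient_boosting": {"mae": 16209, "mse": 459616927, "r2": 0.930},
    "decision_tree": {"mae": 11998, "mse": 400134904, "r2": 0.939},
}

summary = pd.DataFrame.from_dict(results, orient="index")
summary["rmse"] = summary["mse"] ** 0.5  # same units as the target
summary = summary.sort_values("mae")     # best (lowest) MAE first
print(summary.round(3))
```

Sorting by MAE puts the random forest first, followed by the decision tree and then gradient boosting, which matches the ordering of the individual cell outputs.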


