## REUSING BLUEPRINTS, HYPERPARAMETERS and DEPLOYING MODELS via API

**Author**: Tim Whittaker

The point of this script is to illustrate the following:
<a id="toc"></a>
1. [Pull blueprint for a model from an existing project](#ebp)
2. [Train that blueprint in a new project with a new data set](#tbp)
3. [Deploy the model (or replace in a current deployment)](#deploy)
4. [Keep the hyperparameters for step 2](#savehp)

## Requirements
* DataRobot Modeling API
Please use `pip install datarobot --upgrade` to get the latest version.  

__This example assumes that you have built a project using the wine quality dataset, and the project id and a specific model id are available.  If not, see wine_autopilot.py__

In [1]:
# # DataRobot upgrade command below if needed
# !pip install datarobot --upgrade

In [2]:
import pandas as pd
import datarobot as dr
from config import DATAROBOT_API_TOKEN, DATAROBOT_ENDPOINT
from datetime import datetime
import numpy as np
import yaml
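
The `config` import above expects a local `config.py` that defines `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT`. The original `config.py` is not shown here, so the following is only a minimal sketch of what such a file might look like. Reading the values from environment variables and the default endpoint URL are both assumptions, not the author's actual setup.

```python
# config.py (hypothetical sketch, not the author's file)
import os

# Assumption: credentials are supplied through environment variables so that
# the token never appears in a script or notebook that might be shared.
DATAROBOT_API_TOKEN = os.environ.get("DATAROBOT_API_TOKEN", "")

# Assumption: fall back to the managed cloud endpoint when nothing is set.
DATAROBOT_ENDPOINT = os.environ.get(
    "DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2"
)

if not DATAROBOT_API_TOKEN:
    print("DATAROBOT_API_TOKEN is not set; API calls will fail until it is.")
```

Keeping this file out of version control (for example through `.gitignore`) is one way to avoid sharing the token by accident.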



## Get Data

In [3]:
## get data
## we are actually going to split this up and hold out roughly a quarter as new data
## (the mask below keeps about 75% of the rows as old data).
## the project we are pulling from was built on the entire wine-quality dataset.
wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", delimiter=";")
np.random.seed(1)
msk = np.random.rand(len(wine)) < 0.75
old_data = wine[msk]
new_data = wine[~msk]

In [4]:
new_data.describe()

statistic,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0,1219.0
mean,6.845406,0.276948,0.336678,6.399754,0.045771,35.106235,137.548811,0.993951,3.187916,0.484668,10.564731,5.922888
std,0.861629,0.096943,0.126456,5.078021,0.02284,16.396451,42.710354,0.003004,0.149502,0.107361,1.246939,0.902811
min,3.8,0.08,0.0,0.6,0.014,3.0,18.0,0.98713,2.79,0.23,8.4,3.0
25%,6.3,0.21,0.27,1.7,0.035,23.0,107.0,0.991665,3.09,0.4,9.5,5.0
50%,6.8,0.26,0.32,5.2,0.042,33.0,133.0,0.9937,3.18,0.47,10.4,6.0
75%,7.4,0.32,0.39,10.0,0.05,45.0,167.0,0.99592,3.28,0.54,11.4,6.0
max,11.8,1.005,1.66,31.6,0.301,118.5,366.5,1.0103,3.81,1.0,14.0,9.0


<a id="ebp"></a>
## Get an existing blueprint
[Table of Contents](#toc) 

In [5]:
## config datarobot Client
## Don't keep api token in script.  Place your token in a config file
## so there is no concern of accidentally sharing.  See config.py
dr.Client(token=DATAROBOT_API_TOKEN, endpoint=DATAROBOT_ENDPOINT)

## the original project id and original model id are accessible from the gui url.
## just click on the desired model in the leaderboard and pull the appropriate ids
## from the url, for example:
## https://app.datarobot.com/projects/<project_id>/models/<model_id>/blueprint
original_pid = "5cf71ab5d9436e2c4d0c7a7b"
original_mid = "5cf71c005ff3772856c2a81b"

## ============================================================================#
## 1. Pull blueprint for a model from an existing project
## all we know at this point is the original project id
## as well as the model id we want to use.
## the project id and model id are available in gui by clicking on the model
## and pulling info from url, for example
## https://app.datarobot.com/projects/<project_id>/models/<model_id>/blueprint

re_orig_project = dr.Project.get(project_id=original_pid)
blueprints = re_orig_project.get_blueprints()
models = re_orig_project.get_models()

xgb_model = [m for m in models if m.id == original_mid].pop()

xgb_blueprint = [bp for bp in blueprints if bp.id == xgb_model.blueprint_id].pop()
## instead of finding the particular blueprint, we could just use
# xgb_blueprint = xgb_model.blueprint_id
## be advised this returns a string and not an actual Blueprint object.
## ============================================================================#
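
The list comprehensions above end in `.pop()`, which raises a bare `IndexError` when no model or blueprint matches the id. A slightly more defensive lookup could look like the sketch below. `find_by_id` is a hypothetical helper, and the `DemoModel` and `DemoBlueprint` tuples are stand-ins that only carry the attributes used in the cell above; they are not DataRobot classes.

```python
from collections import namedtuple

# Stand-ins for the objects returned by get_models() / get_blueprints().
# Only the attributes used in the lookup are modelled here.
DemoModel = namedtuple("DemoModel", ["id", "blueprint_id"])
DemoBlueprint = namedtuple("DemoBlueprint", ["id"])


def find_by_id(items, wanted_id, kind="item"):
    """Return the first element whose .id equals wanted_id, or raise LookupError."""
    match = next((item for item in items if item.id == wanted_id), None)
    if match is None:
        raise LookupError("no {} with id {!r}".format(kind, wanted_id))
    return match


demo_models = [DemoModel("m1", "bp1"), DemoModel("m2", "bp2")]
demo_blueprints = [DemoBlueprint("bp1"), DemoBlueprint("bp2")]

demo_model = find_by_id(demo_models, "m2", kind="model")
demo_blueprint = find_by_id(demo_blueprints, demo_model.blueprint_id, kind="blueprint")
print(demo_blueprint.id)
```

With the real objects, the same helper could be called as `find_by_id(models, original_mid, kind="model")`, so a mistyped id produces a message that names the missing id.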


<a id="tbp"></a>
## Train that blueprint in a new project with a new data set
[Table of Contents](#toc)

In [6]:
## ============================================================================#
## 2. Train on that blueprint in a new project with a new data set
## one thing to consider - do we want the exact same set of hyperparameters and blueprint
## used in the previous project (case a), or do we just want the same blueprint (case b)
## and let DataRobot figure out the new set of best hyperparameters for the data?
## there is a chance it will learn the same hyperparameters on the new data.  

## create a new project
new_project = dr.Project.create(sourcedata=new_data,
                           project_name='new wine data {}'.format(datetime.now()))
new_project.set_target(target="quality", mode="manual")

## here we are using the blueprint only.
## as DataRobot runs the model it will select the best hyperparameters based on the
## data.  It is entirely possible that DataRobot will select the same hyperparameters 
## as in the original project.  
new_project.train(xgb_blueprint, source_project_id=original_pid, sample_pct=64)
## the following would also have worked
# new_project.train(xgb_model.blueprint_id, source_project_id=original_pid, sample_pct=64)
model_job = new_project.get_model_jobs()
done = model_job[0].get_result_when_complete()

In [13]:
new_features = new_project.get_features()
orig_features = re_orig_project.get_features()

new_features = set( [ (f.name, f.feature_type) for f in new_features])
orig_features = set( [ (f.name, f.feature_type) for f in orig_features])


In [17]:
if orig_features != new_features:
    print("features in new project are different from old project on basis of name and type")
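
The check above only reports that the two feature sets differ. To see which features differ, the set differences can be taken in both directions. The sketch below uses made-up `(name, feature_type)` pairs in the same shape as the sets built above; the names and types are illustrative values, not output from either project.

```python
# (name, feature_type) pairs in the same shape as orig_features / new_features.
# These values are invented for illustration.
demo_orig = {("alcohol", "Numeric"), ("pH", "Numeric"), ("quality", "Numeric")}
demo_new = {("alcohol", "Numeric"), ("pH", "Text")}

# Present in the original project but absent (or typed differently) in the new one.
missing_in_new = sorted(demo_orig - demo_new)
# Present in the new project but absent (or typed differently) in the original.
extra_in_new = sorted(demo_new - demo_orig)

for name, ftype in missing_in_new:
    print("only in original project: {} ({})".format(name, ftype))
for name, ftype in extra_in_new:
    print("only in new project: {} ({})".format(name, ftype))
```

A feature whose type changed shows up in both lists, once with each type, which makes type mismatches easy to spot.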

### Reuse hyperparameters from original project

In [7]:
## now suppose that I wanted to use the same exact set of hyperparameters as used
## in the original project.
hyper_params = xgb_model.get_advanced_tuning_parameters()
## PLEASE BE ADVISED: as of DR Python API 2.17, the "default_value" key contains
## best of searched parameters.  This may change in a later version.
## PLEASE BE ADVISED: some models aren't tunable, in which case an exception will be raised.
## If `get_advanced_tuning_parameters` raises an exception with a 500 internal
## server error message, please reach out to support.

## the best of searched hyperparameters.
best_hyper_params = {
    param["parameter_id"]: param["default_value"]
    for param in hyper_params["tuning_parameters"]
}
new_xgb = new_project.get_models()[0]
model_job = new_xgb.advanced_tune(best_hyper_params, description="hyperparameters from original project {}".format(original_pid))
## PLEASE BE ADVISED: an exception will be raised if a model on the leaderboard with the same
## blueprint already has the same set of hyperparameters.  This means that when we trained on the
## blueprint and let DR learn the hyperparameters, it learned the same set.  
new_xgb_tuned = model_job.get_result_when_complete()

## next open the leaderboard browser to view the models.
## the model with the old hyperparameters will have a description as set above.
new_project.open_leaderboard_browser()

True
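
To make the shape of the data in the cell above concrete, here is a small sketch of the mapping step on a hand-written dictionary. Only the keys used above (`tuning_parameters`, `parameter_id`, `default_value`) are assumed; the ids and values are invented, and `best_values` is a hypothetical helper name.

```python
# Hand-written stand-in for the result of get_advanced_tuning_parameters().
# Only the keys used in the cell above are included; ids and values are invented.
demo_hyper_params = {
    "tuning_parameters": [
        {"parameter_id": "id-1", "default_value": 0.05},
        {"parameter_id": "id-2", "default_value": 6},
    ]
}


def best_values(params):
    """Map each parameter id to its default_value entry."""
    return {
        p["parameter_id"]: p["default_value"]
        for p in params["tuning_parameters"]
    }


demo_best = best_values(demo_hyper_params)
print(demo_best)
```

The resulting dictionary has the same form as `best_hyper_params`, which is what gets passed to `advanced_tune` above.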

<a id="deploy"></a>
## Deploy
[Table of Contents](#toc)

In [8]:
## 3. Deploy the model (or replace in a current deployment)
## model deployment is available as of python api 2.17.0.
print("=")
prediction_server = dr.PredictionServer.list()[0]
prediction_server.id

## grab current deployments
deployments = dr.Deployment.list()

## let's deploy the xgboost from the original project
deployment = dr.Deployment.create_from_learning_model(
    xgb_model.id, label='xgBoost Model', description='A new deployment',
    default_prediction_server_id=prediction_server.id)

print(deployment.id)  ## this is also available via gui url
deployment_id = deployment.id
## clean up
del deployment, deployments

=
5cf913b787cf0a073b663311
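
The cell above fetches the current deployments but does not use the list. One possible use, sketched here under assumptions, is to check whether a deployment with the same label already exists before creating another. `DemoDeployment` is a stand-in that carries only `id` and `label`, the ids are invented, and `find_by_label` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for deployment objects; only id and label are modelled here.
DemoDeployment = namedtuple("DemoDeployment", ["id", "label"])


def find_by_label(deployments, label):
    """Return every deployment whose label matches exactly."""
    return [d for d in deployments if d.label == label]


demo_deployments = [
    DemoDeployment("d1", "xgBoost Model"),
    DemoDeployment("d2", "another model"),
]

existing = find_by_label(demo_deployments, "xgBoost Model")
if existing:
    print("label already in use by:", [d.id for d in existing])
else:
    print("no deployment with that label yet")
```

When a match is found, replacing the model in the existing deployment may be preferable to creating a second deployment with the same label.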


### Replace Deployment

In [9]:
## oops, we should have deployed the new xgBoost model tuned with the
## original model's hyperparameters
from datarobot.enums import MODEL_REPLACEMENT_REASON

deployment = dr.Deployment.get(deployment_id=deployment_id)

print("current deployment details\n\tmodel type:{}\n\tmodel id:{}".format(deployment.model['type'],deployment.model['id']))

deployment.replace_model(new_xgb_tuned.id, MODEL_REPLACEMENT_REASON.OTHER)

print("new deployment details\n\tmodel type:{}\n\tmodel id:{}".format(deployment.model['type'],deployment.model['id']))

current deployment details
	model type:eXtreme Gradient Boosted Trees Regressor
	model id:5cf71c005ff3772856c2a81b
new deployment details
	model type:eXtreme Gradient Boosted Trees Regressor (Least-Squares Loss)
	model id:5cf91396d9436e76200c7ace


Example of the new model on the leaderboard as well as a description.
<img src="img/scree-grab.png"></img>

<a id="savehp"></a>
## Stash hyperparameters
[Table of Contents](#toc)

In [18]:
## the next cell was written against PyYAML 5.1; warn when a different version is installed
if yaml.__version__ != "5.1":
    print("loading hyperparameters from yaml may throw an exception.  Try setting Loader=None")

In [19]:
## 4. Keep the hyperparameters for step 2
## In any event, regardless of which model we want to keep, we can easily store hyperparameters on disk.
## options include yaml, pickle, etc.  Yaml example below.

with open("model_hyperparameters.yaml", "w") as f:
    f.write( yaml.dump(hyper_params))

## load hyperparameters back into python dictionary.
with open("model_hyperparameters.yaml", "r") as f:
    hyperparams_dict = yaml.load(f, Loader=yaml.FullLoader)
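
As noted above, YAML is only one storage option. The sketch below shows the same round trip with JSON from the standard library, which avoids the PyYAML version question. It is an alternative to the author's YAML approach, not a replacement for it, and it assumes the hyperparameter dictionary contains only JSON-serializable values. The dictionary and file location here are invented for illustration.

```python
import json
import os
import tempfile

# Invented stand-in for hyper_params; assumes JSON-serializable values only.
demo_hyper_params = {
    "tuning_parameters": [
        {"parameter_id": "id-1", "default_value": 0.05},
        {"parameter_id": "id-2", "default_value": 6},
    ]
}

# Write to a temporary directory so the sketch does not touch the project folder.
demo_path = os.path.join(tempfile.mkdtemp(), "model_hyperparameters.json")

with open(demo_path, "w") as f:
    json.dump(demo_hyper_params, f, indent=2)

# Load the hyperparameters back into a python dictionary.
with open(demo_path, "r") as f:
    demo_restored = json.load(f)

print(demo_restored == demo_hyper_params)
```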