### 6. Python API Training - Continuous Model Training [Solution]

<b>Author:</b> Thodoris Petropoulos <br>
<b>Contributors:</b> Rajiv Shah

This is the 6th exercise you need to complete to finish your `Python API Training for DataRobot` course! It teaches you how to deploy a trained model, make predictions (**Warning**: there are multiple ways of getting predictions out of DataRobot), and monitor data drift so you know when to replace a model.

Here are the sections of the notebook, with the estimated time to complete each:

1. Connect to DataRobot. [3min]
2. Retrieve the first project created in `Exercise 4 - Model Factory`. [5min]
3. Search for the `recommended for deployment` model and deploy it as a REST API. [20min]
4. Create a scoring procedure using dataset (1) that will force data drift on that deployment. [25min]
5. Check data drift. Does it look like data is drifting? [3min]
6. Create a new project using data (2). [5min]
7. Replace the previously deployed model with the new `recommended for deployment` model from the new project. [10min]

Each section will have specific instructions so do not worry if things are still blurry!

As always, consult:

- [API Documentation](https://datarobot-public-api-client.readthedocs-hosted.com)
- [Samples](https://github.com/datarobot-community/examples-for-data-scientists)
- [Tutorials](https://github.com/datarobot-community/tutorials-for-data-scientists)

The last two links should provide you with the snippets you need to complete most of these exercises.

<b>Data</b>

(1) The dataset we will be using throughout these exercises is the well-known `readmissions dataset`. You can access it or directly download it through DataRobot's public S3 bucket [here](https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv).

(2) This dataset will be used to retrain the model. It can be accessed [here](https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes_scoring.csv) through DataRobot's public S3 bucket.

### Import Libraries
Add imports here as you discover which libraries you need. The DataRobot package is already included for your convenience.

In [1]:
import datarobot as dr

#Proposed Libraries needed
import pandas as pd

### 1. Connect to DataRobot [3min]

In [2]:
#Possible solution
dr.Client(config_path='../../github/config.yaml')

<datarobot.rest.RESTClientObject at 0x113535d30>
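
If you would rather not keep a `config.yaml` on disk, one alternative is to read the credentials from environment variables and pass them to `dr.Client` explicitly. The sketch below is only an illustration: the helper names `client_settings_from_env` and `connect` are hypothetical, and the default endpoint assumes the US managed cloud, so adjust it for your installation.

```python
import os


def client_settings_from_env(environ=None):
    """Collect DataRobot connection settings from environment variables.

    Hypothetical helper: returns keyword arguments suitable for dr.Client.
    """
    environ = os.environ if environ is None else environ
    token = environ.get("DATAROBOT_API_TOKEN")
    # Assumption: fall back to the US managed cloud endpoint when none is set.
    endpoint = environ.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before connecting.")
    return {"token": token, "endpoint": endpoint}


def connect():
    """Create the DataRobot client from the environment settings."""
    import datarobot as dr  # imported lazily so the helper above works without it

    return dr.Client(**client_settings_from_env())
```

Calling `connect()` in place of the `config_path` version should leave the rest of the notebook unchanged.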

### 2. Retrieve the first project created in `Exercise 4 - Model Factory`. [5min]

This should be the first project created during the exercise, not one of the projects created using a sample of `readmission_type_id`.

In [3]:
#Proposed Solution
project = dr.Project.get('YOUR_PROJECT_ID')

### 3. Search for the `recommended for deployment` model and deploy it as a REST API. [10min]

**Hint**: The recommended model can be found using the `dr.ModelRecommendation.get` method.

**Hint 2**: Use the `update_drift_tracking_settings` method on the DataRobot Deployment object to enable data drift tracking.

In [31]:
# Proposed Solution

#Find the recommended model
recommended_model = dr.ModelRecommendation.get(project.id).get_model()

#Deploy the model
prediction_server = dr.PredictionServer.list()[0]

deployment = dr.Deployment.create_from_learning_model(recommended_model.id, label='Readmissions Deployment', default_prediction_server_id=prediction_server.id)
deployment.update_drift_tracking_settings(feature_drift_enabled=True)

### 4. Create a scoring procedure using dataset (1) that will force data drift on that deployment.  [25min]

**Instructions**
1. Take the first 100 rows of dataset (1) and save them to a Pandas DataFrame
2. Score 5 times using these observations to force drift.
3. Use the deployment you created during `question 3`.

**Hint**: The easiest way to score using a deployed model in DataRobot is to go to the `Deployments` page within DataRobot and navigate to the `Integrations` and `scoring code` tabs. There you will find sample Python code that you can use to score.

**Hint 2**: The only changes the code needs are pointing the `filename` variable at the CSV file to be scored and wrapping the call in a for loop.
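
The instructions above boil down to reading the file once and sending it several times. A minimal sketch of that loop, kept separate from the HTTP call, might look like the following. `score_file_repeatedly` and its `score_fn` parameter are hypothetical names; in the notebook, `score_fn` would be the scoring function copied from the `Integrations` tab.

```python
from pathlib import Path

MAX_PREDICTION_FILE_SIZE_BYTES = 52428800  # 50 MB, the limit used by the sample script


def score_file_repeatedly(filename, deployment_id, score_fn, times=5):
    """Send the same CSV payload to a deployment several times.

    score_fn is any callable taking (data, deployment_id) and returning
    the parsed prediction response.
    """
    data = Path(filename).read_bytes()
    # len() on a bytes object is the exact payload size in bytes.
    if len(data) >= MAX_PREDICTION_FILE_SIZE_BYTES:
        raise ValueError("Input file is too large: {} bytes.".format(len(data)))
    return [score_fn(data, deployment_id) for _ in range(times)]
```

Passing the scoring function in as an argument makes the loop easy to try with a stub before pointing it at a real deployment.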

In [32]:
# Proposed Solution 

#Save the dataset that is going to be scored as a csv file
scoring_dataset = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv').head(100)
scoring_dataset.to_csv('scoring_dataset.csv', index=False)

#This has been copied from the `integrations` tab.
#The only things you actually have to do are change the filename variable at the bottom of the script and
#create the for loop.

"""
Usage:
    python datarobot-predict.py <input-file.csv>
 
This example uses the requests library which you can install with:
    pip install requests
We highly recommend that you update SSL certificates with:
    pip install -U urllib3[secure] certifi
"""
import sys
import json
import requests
 
DATAROBOT_KEY = ''
API_KEY = ''
USERNAME = ''
 
DEPLOYMENT_ID = ''
MAX_PREDICTION_FILE_SIZE_BYTES = 52428800  # 50 MB
 
 
class DataRobotPredictionError(Exception):
    """Raised if there are issues getting predictions from DataRobot"""
 
 
def make_datarobot_deployment_predictions(data, deployment_id):
    """
    Make predictions on data provided using DataRobot deployment_id provided.
    See docs for details:
         https://app.eu.datarobot.com/docs/users-guide/predictions/api/new-prediction-api.html
 
    Parameters
    ----------
    data : str
        Feature1,Feature2
        numeric_value,string
    deployment_id : str
        The ID of the deployment to make predictions with.
 
    Returns
    -------
    Response schema:
        https://app.eu.datarobot.com/docs/users-guide/predictions/api/new-prediction-api.html#response-schema
 
    Raises
    ------
    DataRobotPredictionError if there are issues getting predictions from DataRobot
    """
    # Set HTTP headers. The charset should match the contents of the file.
    headers = {'Content-Type': 'text/plain; charset=UTF-8', 'datarobot-key': DATAROBOT_KEY}
 
    url = 'https://cfds.orm.eu.datarobot.com/predApi/v1.0/deployments/{deployment_id}/'\
          'predictions'.format(deployment_id=deployment_id)
    # Make API request for predictions
    predictions_response = requests.post(
        url,
        auth=(USERNAME, API_KEY),
        data=data,
        headers=headers,
    )
    _raise_dataroboterror_for_status(predictions_response)
    # Return a Python dict following the schema in the documentation
    return predictions_response.json()
 
 
def _raise_dataroboterror_for_status(response):
    """Raise DataRobotPredictionError if the request fails along with the response returned"""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        err_msg = '{code} Error: {msg}'.format(
            code=response.status_code, msg=response.text)
        raise DataRobotPredictionError(err_msg)
 
 
def main(filename, deployment_id):
    """
    Return an exit code on script completion or error. Codes > 0 are errors to the shell.
    Also useful as a usage demonstration of
    `make_datarobot_deployment_predictions(data, deployment_id)`
    """
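    if not filename:
        print(
            'Input file is a required argument. '
            'Usage: python datarobot-predict.py <input-file.csv>')
        return 1
    # Read the whole file and close the handle afterwards
    with open(filename, 'rb') as input_file:
        data = input_file.read()
    # len() is the payload size in bytes; sys.getsizeof() would add object overhead
    data_size = len(data)
    if data_size >= MAX_PREDICTION_FILE_SIZE_BYTES:
        # Format the message before printing; print() itself returns None
        print(
            'Input file is too large: {} bytes. '
            'Max allowed size is: {} bytes.'.format(
                data_size, MAX_PREDICTION_FILE_SIZE_BYTES))
        return 1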
    try:
        predictions = make_datarobot_deployment_predictions(data, deployment_id)
    except DataRobotPredictionError as exc:
        print(exc)
        return 1
    print(json.dumps(predictions, indent=4))
    return 0
 
# Score the same 100 rows five times to force drift on the deployment
filename = 'scoring_dataset.csv'
for _ in range(5):
    main(filename, DEPLOYMENT_ID)

{
    "data": [
        {
            "predictionValues": [
                {
                    "value": 0.1951537877,
                    "label": 1.0
                },
                {
                    "value": 0.8048462123,
                    "label": 0.0
                }
            ],
            "predictionThreshold": 0.5,
            "prediction": 0.0,
            "rowId": 0
        },
        {
            "predictionValues": [
                {
                    "value": 0.2464775145,
                    "label": 1.0
                },
                {
                    "value": 0.7535224855,
                    "label": 0.0
                }
            ],
            "predictionThreshold": 0.5,
            "prediction": 0.0,
            "rowId": 1
        },
        {
            "predictionValues": [
                {
                    "value": 0.523460269,
                    "label": 1.0
                },
                {
                    "value": 0.4
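
The response printed above nests the class probabilities under `data` and `predictionValues`. As a rough sketch, assuming the response has exactly the shape shown in that output, the probability of the positive class can be pulled out as follows. The function name `positive_class_probabilities` is hypothetical.

```python
def positive_class_probabilities(response, positive_label=1.0):
    """Return the probability of `positive_label` for each scored row."""
    probabilities = []
    for row in response["data"]:
        for entry in row["predictionValues"]:
            # Each entry pairs a class label with its predicted probability
            if entry["label"] == positive_label:
                probabilities.append(entry["value"])
                break
    return probabilities


# Two rows copied from the output above
sample_response = {
    "data": [
        {"predictionValues": [{"value": 0.1951537877, "label": 1.0},
                              {"value": 0.8048462123, "label": 0.0}],
         "predictionThreshold": 0.5, "prediction": 0.0, "rowId": 0},
        {"predictionValues": [{"value": 0.2464775145, "label": 1.0},
                              {"value": 0.7535224855, "label": 0.0}],
         "predictionThreshold": 0.5, "prediction": 0.0, "rowId": 1},
    ]
}
print(positive_class_probabilities(sample_response))
```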


### 5. Check data drift. Does it look like data is drifting? [3min]

Check data drift from within the `Deployments` page in the UI. Is data drift marked as red?
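
To build some intuition for why scoring the same 100 rows repeatedly registers as drift, here is a small sketch of the Population Stability Index (PSI), one common drift metric. It is an illustration only, not DataRobot's implementation: the equal-width binning, the `eps` floor, and the function name are all assumptions made for this example.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare two samples of a numeric feature using equal-width bins.

    `expected` plays the role of the training data, `actual` the scoring data.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm below stays defined
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Toy stand-in for the exercise: a baseline feature, then its first 100 values scored 5 times
baseline = list(range(1000))
scored = baseline[:100] * 5
print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, scored))
```

Identical samples give a PSI of zero, while the repeated subset concentrates all its mass in one bin and produces a much larger value.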

### 6. Create a new project using data (2). [5min]

Link to data: https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes_scoring.csv

In [None]:
#Proposed solution
new_project = dr.Project.create(
    sourcedata='https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes_scoring.csv',
    project_name='06_New_Project')

#worker_count=-1 requests all available workers
new_project.set_target(target='readmitted', mode='quick', worker_count=-1)
new_project.wait_for_autopilot()

In progress: 4, queued: 9 (waited: 0s)
In progress: 4, queued: 9 (waited: 1s)
In progress: 4, queued: 9 (waited: 2s)
In progress: 4, queued: 9 (waited: 4s)
In progress: 4, queued: 9 (waited: 6s)
In progress: 4, queued: 9 (waited: 8s)
In progress: 4, queued: 9 (waited: 13s)


### 7. Replace the previously deployed model with the new `recommended for deployment` model from the new project. [10min]

**Hint**: You will have to provide a reason why you are replacing the model. Try: `dr.enums.MODEL_REPLACEMENT_REASON.DATA_DRIFT`.

In [None]:
#Proposed Solution
new_recommended_model = dr.ModelRecommendation.get(new_project.id).get_model()
deployment.replace_model(new_recommended_model.id, dr.enums.MODEL_REPLACEMENT_REASON.DATA_DRIFT)