<a href="https://colab.research.google.com/github/datarobot-community/DRU-MLOps/blob/master/10Dec2021_MLOps_II_Laboratory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MLOps II Laboratory**

Welcome to the MLOps II Hands-On Lab!

**Pre-requisites:**
1. You will need a DataRobot account and an API key.  
2. Add your API key to the `API_KEY` variable in step 3 of this notebook. The API key is found under Developer Tools, which you reach from the profile icon in the DataRobot GUI.
3. Once you create a model package and deploy it, you will need the model ID and deployment ID.


**Documentation:**

The MLOps Agent tarball includes documentation in the /docs folder.




### ***You will complete certain lines of code in this notebook to provide the necessary functionality!***

HINTS: 
* Shell commands that take no parameters are shown as ___
* API calls that take no parameters are shown as \_\_\_()
* API calls that take 1 parameter are shown as \_\_\_(\_\_\_)
* API calls that take 2 parameters are shown as \_\_\_(\___ , \_\__)
* You get the idea.


# 0.- Some necessary Python modules will be installed

In [None]:
!pip install folium==0.2.1

### You will get an error message when installing the next module - please disregard it!

In [None]:
!pip install -U 'boto3<2'

In [None]:
!pip install 'urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1'

# 1.- Create and deploy a remote model package via the GUI

This is something that you have done already via the MLOps GUI.

# 2.- Specify Model ID and Deployment ID


We need to supply the Deployment ID and Model ID. Both are found in the code sample that MLOps provides under "Predictions" -> "Monitoring".

In [None]:
DEPLOYMENT_ID = ""
MODEL_ID = ""
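
Before moving on, it can help to catch copy-and-paste mistakes in the two IDs. The sketch below assumes that DataRobot IDs are 24-character lowercase hexadecimal strings; that format is an assumption here, and `looks_like_datarobot_id` is a hypothetical helper, not part of any DataRobot library. The sample ID is made up.

```python
import re

def looks_like_datarobot_id(value):
    """Return True if `value` is a 24-character lowercase hexadecimal string.

    The 24-hex-character format is an assumption, not taken from this notebook.
    """
    return bool(re.fullmatch(r"[0-9a-f]{24}", value))

# Made-up values for illustration only.
print(looks_like_datarobot_id("5f3a1c2b4d5e6f7a8b9c0d1e"))
print(looks_like_datarobot_id(""))
```

In the notebook you could call it as `looks_like_datarobot_id(DEPLOYMENT_ID)` and `looks_like_datarobot_id(MODEL_ID)` after filling in the cell above.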

# 3.- Add your API_KEY and the location of the DataRobot instance you are using.  


In [None]:
import yaml
import requests
import re
API_KEY = ""
DR_URL = "https://app.datarobot.com"

The following two shell commands will show you \
a) where we are within the Colab runtime and \
b) what is contained within it.

In [None]:
%pwd

In [None]:
%ls -al

# 4.- Download the MLOps Agent tarball to the local Colab directory.

In [None]:
# This cell downloads the MLOps Agent tarball
url = DR_URL + "/api/v2/mlopsInstaller"

headers = {'Authorization': 'Bearer {}'.format(API_KEY)}
response = requests.get(url, headers=headers)
if response.status_code == 401:
    print('Put your real API key in')
# Stop on any HTTP error so that an error page is not saved as the tarball
response.raise_for_status()
with open("mlops-agents.tar.gz", "wb") as f:
    f.write(response.content)

In [None]:
# Let's grab the filename, which carries the latest version of the tarball
d = response.headers['content-disposition']
fname = re.findall(r"filename=(.+)\.tar\.gz", d)[0]
# Drop the trailing "-<suffix>" to get the directory name used after untarring
n = fname.rfind("-")
filename = fname[:n]
filename

In [None]:
%ls

As shown by the output of the previous shell command, we now have the MLOps Agent tarball within the runtime.
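
The directory name used in the next step is derived from the `content-disposition` header of the download response. The sketch below replays that parsing on a made-up header value, so you can see which part of the name is kept. The header your DataRobot instance returns may carry a different version and suffix, and `package_dir_from_header` is a hypothetical helper name.

```python
import re

# Hypothetical header value, for illustration only; not taken from a real response.
sample_header = "attachment; filename=datarobot_mlops_package-8.0.2-1234.tar.gz"

def package_dir_from_header(header):
    """Return the package directory name encoded in a content-disposition header."""
    # Capture everything between "filename=" and the ".tar.gz" suffix.
    base = re.findall(r"filename=(.+)\.tar\.gz", header)[0]
    # Drop the trailing "-<suffix>" component, as the notebook cell does.
    return base[: base.rfind("-")]

print(package_dir_from_header(sample_header))
```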

# 5.- Untar the MLOps Agent tarball, and then create a tmp directory to spool the predictions


In [None]:
# Untar the tarball
!tar -xvf /content/mlops-agents.tar.gz

In [None]:
# Here we create the directory where the spool file will be located
# This is where the MLOps Agent will look for prediction data
%cd $filename
!mkdir -p /tmp/ta
%ls -al

# 6.-  Install the MLOps library.

### The tarball contains wheel files that will be used to install the MLOps Agent libraries:  
### * **lib/datarobot_mlops-8.0.2-py2.py3-none-any.whl** 
### * **lib/datarobot_mlops_connected_client-8.0.2-py2.py3-none-any.whl**


In [None]:
# We now install the MLOps Agent libraries
!pip install lib/datarobot_mlops-8.0.2-py2.py3-none-any.whl   # With a newer version of the agent, the filename may differ

In [None]:
!pip install lib/datarobot_mlops_connected_client-8.0.2-py2.py3-none-any.whl   # With a newer version of the agent, the filename may differ

# 7.- Edit mlops.agent.conf.yaml

This file contains the properties used in the configuration of the MLOps service.  For this notebook, you will only need to set the DR host and your API token.

For this purpose, we will edit the Configuration YAML file by reading it into a dictionary, modifying the corresponding fields in it, and then writing this dictionary back to the YAML file.

In [None]:
with open('conf/mlops.agent.conf.yaml') as file:       # read the yaml file as a dictionary
    documents = yaml.safe_load(file)                   # safe_load: yaml.load needs an explicit Loader in recent PyYAML

# Set your DR host:
documents['mlopsUrl'] = DR_URL                         # set the required values in this dictionary

# Set your API token
documents['apiToken'] = API_KEY

with open('conf/mlops.agent.conf.yaml', "w") as f:     # write back the dictionary to the yaml file
    yaml.dump(documents, f)

In this notebook we will use FS_SPOOL as the messaging channel. More sophisticated monitoring will likely use other channels.

channelConfigs:
   - type: "FS_SPOOL"
     details: {name: "bench", spoolDirectoryPath: "/tmp/ta"}
   - type: "SQS_SPOOL"
     details: {name: "sqsSpool", queueUrl: "https://SQS_URL"}
   - type: "PUBSUB_SPOOL"
     details: {name: "pubsubSpool", projectId: "yourprojectId", topicName: "yourtopicName"}
   - type: "RABBITMQ_SPOOL"
     details: {name: "rabbit", queueName: "rabbitmq", queueUrl: "https://SQS_URL"}

Verify the changes in the mlops.agent.conf.yaml.  You should see the correct MLOps URL and API token.
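
If you are sharing your screen or the notebook, you may prefer not to display the API token in full. The sketch below masks all but the last few characters. It assumes the token sits on a line that starts with `apiToken:`, which is the key the earlier cell sets. `mask_token` is a hypothetical helper, not part of the MLOps Agent, and the sample text is made up.

```python
import re

def mask_token(config_text, visible=4):
    """Replace all but the last `visible` characters of the apiToken value with '*'."""
    def _mask(match):
        token = match.group(2)
        hidden = "*" * max(len(token) - visible, 0)
        keep = token[-visible:] if visible > 0 else ""
        return match.group(1) + hidden + keep
    # Match "apiToken: <value>" at the start of a line.
    return re.sub(r"(?m)^(apiToken:\s*)(\S+)", _mask, config_text)

# Made-up configuration text for illustration.
sample = "apiToken: abcdef123456\nmlopsUrl: https://app.datarobot.com\n"
print(mask_token(sample))
```

In the notebook, the same helper could be applied to the real file with `print(mask_token(open('conf/mlops.agent.conf.yaml').read()))`.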


In [None]:
print(open('conf/mlops.agent.conf.yaml').read())

# 8.- Start the agent and get its status

The following shell commands are required to \
a) start the MLOps Agent service. \
b) get the status of the MLOps Agent service. 

In [None]:
# Start the agent
!___

In [None]:
# Get agent status
!___

# 9.- Load scoring data 

* We now load the scoring data that will be used to obtain predictions. Navigate to the folder where the class material is and select the file named "**surgical-complication-scoring.csv**"

In [None]:
# Data for surgical complications is loaded ("surgical-complication-scoring.csv"). 
# The target is "complication"
from google.colab import files
uploaded = files.upload()

In [None]:
# Some required modules
import pandas as pd
import numpy as np
import time
import csv
import datetime
import joblib
import warnings
warnings.filterwarnings("ignore")

# The scoring data is read into a dataframe
scoring_df = pd.read_csv("surgical-complication-scoring.csv")

columns = list(scoring_df.columns)

# We make a copy of the scoring data as a Numpy array
scoring_data = scoring_df.to_numpy()

print("Done!")

In [None]:
# We grab the target column name (the last column) and the original labels in that column
target_column_name = columns[-1]
orig_labels = scoring_df[target_column_name].tolist()

# 10.- Upload a pickle file with a pre-trained model pipeline to Google Colab

We will load a pickle file named "pipeline.pkl" (found in the zip file that contains the class material); this file contains a pre-trained ML model pipeline. Navigate to the folder where the class material is and select the file named "**pipeline.pkl**"

In [None]:
uploaded = files.upload()

# 11.- Make predictions

## 11.1.- We call the remote model's predict function and send prediction data to MLOps. Note that the model is supplied using the pickle file uploaded in the previous step.

In [None]:
# MLOps Agent Library imports
from datarobot.mlops.mlops import MLOps
from datarobot.mlops.connected.client import MLOpsClient
from datarobot.mlops.common.exception import DRConnectedException
from datarobot.mlops.constants import Constants

In [None]:
# Some necessary variables will be defined first

CLASS_NAMES = ["0", "1"]

# Here we define the parameters of the spool file that is used as messaging channel
# Spool directory path must match the Monitoring Agent path configured by admin in the YAML configuration file.
SPOOL_DIR = "/tmp/ta"
MLOPS_FILESYSTEM_MAX_FILE_SIZE = 104857600  # 100 * 1024 * 1024 bytes = 100 MiB
MLOPS_FILESYSTEM_MAX_NUM_FILES = 5

# name of the file that contains actuals
ACTUALS_OUTPUT_FILE = "actuals.csv"

In [None]:
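# Spool file parameters are defined as environment variables.
# os.environ is used because `!export` runs in a separate subshell and would not persist.
import os
os.environ["MLOPS_FILESYSTEM_MAX_FILE_SIZE"] = str(MLOPS_FILESYSTEM_MAX_FILE_SIZE)
os.environ["MLOPS_FILESYSTEM_MAX_NUM_FILES"] = str(MLOPS_FILESYSTEM_MAX_NUM_FILES)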

## We are now ready to make predictions

In [None]:
# load pickle file with model pipeline
model = joblib.load(filename="pipeline.pkl")

# Get predictions
start_time = time.time()
predictions = model.predict_proba(scoring_data).tolist()
end_time = time.time()

# number of predictions
num_predictions = len(predictions)

# time required to generate the predictions
prediction_time = end_time - start_time

In [None]:
# Generate association ids for the predictions so we can match them with actuals.
# This is necessary for accuracy monitoring.
# The association ids are generated by taking the current time and appending a row counter to it.
def generate_unique_association_ids(num_samples):
    ts = time.time()
    return ["x_{}_{}".format(ts, i) for i in range(num_samples)]

patient_id = generate_unique_association_ids(len(scoring_data))
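
The IDs above are unique as long as no two runs share the same timestamp. As an alternative, and not the approach this notebook uses, the sketch below builds the IDs from a random UUID instead of the clock. `generate_uuid_association_ids` and its `prefix` parameter are hypothetical names chosen for illustration.

```python
import uuid

def generate_uuid_association_ids(num_samples, prefix="x"):
    """Return `num_samples` IDs of the form '<prefix>_<run id>_<row index>'."""
    run_id = uuid.uuid4().hex  # random per-call identifier instead of a timestamp
    return ["{}_{}_{}".format(prefix, run_id, i) for i in range(num_samples)]

ids = generate_uuid_association_ids(3)
print(ids)
```

Either scheme works for matching actuals to predictions, provided the same list of IDs is used when reporting predictions and when writing the actuals file.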

In [None]:
# Initialize a MLOPS instance
mlops = MLOps().___(___) \
               .___(___) \
               .___(___) \
               .___()

In [None]:
# MLOPS: report the number of predictions in the request and the execution time.
print("Send MLOps deployment stats")
mlops.___(___, ___)

print("Done!")

In [None]:
# MLOPS: report the predictions data: features, predictions, class_names
print("Send MLOps prediction data")
mlops.___(features_df=___,  predictions=___, class_names=___, association_ids=___)

print("Done!")

## 11.2.- In the next steps we are simulating a situation in which we receive a file with actual outcomes observed by the business. 

In [None]:
# We are now going to define a function to write a simulated actuals file to the Colab runtime
import pytz
def write_actuals_file(out_filename, test_data_labels, association_ids):
    """
      Generate a CSV file with the association ids and labels. This example
      uses a dataset that already has labels.
      In a real use case, actuals (labels) arrive after the prediction is made.

    :param out_filename:      name of csv file
    :param test_data_labels:  actual values (labels)
    :param association_ids:   association id list used for predictions
    """
    # newline="" lets the csv module control line endings
    with open(out_filename, mode="w", newline="") as actuals_csv_file:
        writer = csv.writer(actuals_csv_file, delimiter=",")
        writer.writerow(
            [
                Constants.ACTUALS_ASSOCIATION_ID_KEY,
                Constants.ACTUALS_VALUE_KEY,
                Constants.ACTUALS_TIMESTAMP_KEY
            ]
        )
        tz = pytz.timezone("America/Los_Angeles")
        for (association_id, label) in zip(association_ids, test_data_labels):
            # now(tz) gives a correctly localized time; replace(tzinfo=...) is unreliable with pytz zones
            actual_timestamp = datetime.datetime.now(tz).isoformat()
            writer.writerow([association_id, "1" if label else "0", actual_timestamp])
    print("Wrote actuals file: %s" % out_filename)
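
To see the shape of the file this function produces, the sketch below writes two rows to an in-memory buffer instead of a file. The column names are placeholders, since the notebook takes the real ones from `Constants` in the MLOps library. The IDs and labels are made up, and the timestamps use UTC from the standard library rather than the `America/Los_Angeles` zone used above.

```python
import csv
import datetime
import io

# Placeholder header names; the notebook takes the real ones from Constants.
HEADER = ["association_id", "actual_value", "timestamp"]

def actuals_rows(association_ids, labels, now=None):
    """Build rows pairing each association ID with a "1"/"0" label and a UTC timestamp."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return [[aid, "1" if label else "0", now.isoformat()]
            for aid, label in zip(association_ids, labels)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(actuals_rows(["x_1_0", "x_1_1"], [1, 0]))
print(buffer.getvalue())
```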

In [None]:
# Write csv file with labels and association IDs
write_actuals_file(ACTUALS_OUTPUT_FILE, orig_labels, patient_id)

print("Done!")

In [None]:
# MLOPS: release MLOps resources when finished.
mlops.___()

print("Done!")

# 12.- Upload actuals back to MLOps

In [None]:
# A couple of utility functions

# If we deal with regression we return a number, otherwise a string
def _get_correct_actual_value(deployment_type, value):
    if deployment_type == "Regression":
        return float(value)
    return str(value)

# Convert True/False strings to boolean values
def _get_correct_flag_value(value_str):
    return value_str == "True"

In [None]:
# We now define another function to
# 1) Read data from the "actuals.csv" file
# 2) Place the actual values in an array named "actuals"
# 3) Submit the actuals to MLOps through the connected client

def upload_actuals():
    print("Connect MLOps client")           # create connected client object
    mlops_connected_client = ___(___, ___)

    # get deployment type
    deployment_type = mlops_connected_client.___(___)

    # read actuals file
    actuals = []           # THIS IS THE ARRAY THAT WILL CONTAIN ACTUALS (AS REPORTED BY THE HOSPITAL)
    with open(ACTUALS_OUTPUT_FILE, mode="r") as actuals_csv_file:
        reader = csv.DictReader(actuals_csv_file)
        for row in reader:
            actual = {}
            for key, value in row.items():
                if key == Constants.ACTUALS_WAS_ACTED_ON_KEY:
                    value = _get_correct_flag_value(value)
                if key == Constants.ACTUALS_VALUE_KEY:
                    value = _get_correct_actual_value(deployment_type, value)
                actual[key] = value
            actuals.append(actual)
         
    # Upload actuals to MLOps
    print("Submit actuals")
    mlops_connected_client.___(___, ___)
    
    print("Done!")    

In [None]:
upload_actuals()
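
The conversion loop inside `upload_actuals` can be tried without a DataRobot connection. The sketch below applies the same idea to CSV text held in memory. The column names are placeholders for the values in `Constants`, the deployment type string `"Binary"` is an assumed example of a non-regression type, and `parse_actuals` is a hypothetical helper name.

```python
import csv
import io

# Placeholder column names; the notebook reads the real ones from Constants.
VALUE_KEY = "actual_value"
ACTED_ON_KEY = "was_acted_on"

def parse_actuals(csv_text, deployment_type):
    """Turn CSV text into a list of dicts, converting the value and flag columns."""
    actuals = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = dict(row)
        if ACTED_ON_KEY in actual:
            # "True" becomes True; any other string becomes False.
            actual[ACTED_ON_KEY] = actual[ACTED_ON_KEY] == "True"
        if VALUE_KEY in actual:
            raw = actual[VALUE_KEY]
            # Regression actuals are numbers; all other types stay strings.
            actual[VALUE_KEY] = float(raw) if deployment_type == "Regression" else str(raw)
        actuals.append(actual)
    return actuals

# Made-up rows for illustration.
sample = "association_id,actual_value,was_acted_on\nx_1_0,1,True\nx_1_1,0,False\n"
print(parse_actuals(sample, "Binary"))
```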

# 13.- Stop the mlops service

In [None]:
%ls bin/

In [None]:
!___

# 14.- Inspect the MLOps agent logs

In [None]:
# With a newer version of the agent, the directory name may differ
!cat /content/datarobot_mlops_package-8.0.2/logs/mlops.agent.log