[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datarobot-community/custom-models/blob/master/tracking_agents/python/Main_Script.ipynb)

## MLOps Agent - Python End to End

**Original Author**: Matthew Cohen

**Modified by**: Rodrigo Miranda, Mandie Quartly

#### Scope
This notebook shows how to use DataRobot's MLOps agent end to end: install and configure the agent, create an external deployment, and report predictions and actuals to MLOps.

#### Tested With
- Python 3.7.13
- MLOps Agent 8.0.7

Your versions might differ, but the procedure below should remain the same.

In [1]:
#Clone the repository
!git clone https://github.com/datarobot-community/custom-models

Cloning into 'custom-models'...
remote: Enumerating objects: 1407, done.
remote: Counting objects: 100% (147/147), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 1407 (delta 135), reused 132 (delta 132), pack-reused 1260
Receiving objects: 100% (1407/1407), 110.12 MiB | 22.23 MiB/s, done.
Resolving deltas: 100% (638/638), done.
Checking out files: 100% (196/196), done.


In [None]:
#Install needed packages
!pip install datarobot-mlops-connected-client
!pip install -r /content/custom-models/tracking_agents/python/requirements.txt

### Configuring the Agent

To configure the agent, we just need to define the DataRobot MLOps location and our API token. By default, the agent expects the data to be spooled on the local file system. Make sure that default location (`/tmp/ta`) exists.
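A minimal sketch of creating the default spool location before the agent starts. The helper name `ensure_spool_dir` and the constant `DEFAULT_SPOOL_DIR` are introduced here for illustration only; the path is the default mentioned above.

```python
import os

# Default spool location mentioned above; change it if your agent config differs.
DEFAULT_SPOOL_DIR = "/tmp/ta"


def ensure_spool_dir(path):
    """Create the spool directory if it does not exist and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


spool_dir = ensure_spool_dir(DEFAULT_SPOOL_DIR)
print("Spool directory ready:", os.path.isdir(spool_dir))
```

Using `exist_ok=True` makes the cell safe to re-run after a runtime restart.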

Set `token` to your personal API token, found under Developer Tools in your DataRobot instance. The endpoint below is the DataRobot trial endpoint; change it if your instance uses a different URL.


In [4]:
import datarobot as dr
import os

In [5]:
token = "YOUR_API_TOKEN"
endpoint = "https://app2.datarobot.com"
# Connect to the DataRobot platform with the Python client.
client = dr.Client(token, "{}/api/v2".format(endpoint))

In [6]:
# Download the MLOps agent tarball from the DataRobot instance
mlops_agents_tb = client.get("mlopsInstaller")
with open("/content/custom-models/tracking_agents/python/mlops-agent.tar.gz", "wb") as f:
    f.write(mlops_agents_tb.content)

Once the tarball is downloaded and saved to your local filesystem, extract it. The command below extracts into the current working directory (`/content` in Colab).

In [7]:
!tar -xf /content/custom-models/tracking_agents/python/mlops-agent.tar.gz

In [9]:
# Find the extracted package directory and derive the version from its name
with os.popen("ls /content") as pipe:
    for line in pipe:
        if line.startswith('datarobot_mlops_package'):
            mlops_package = line.strip()
            # Take everything after the last hyphen so versions such as 8.0.10 also work
            version = mlops_package.rsplit('-', 1)[-1]
print(mlops_package)
print(version)

datarobot_mlops_package-8.0.7
8.0.7


In [10]:
#Execute command and install mlops-agent
os.system('pip install /content/{}/lib/datarobot_mlops-{}-py2.py3-none-any.whl'.format(mlops_package, version))

0

If installing `datarobot-mlops-connected-client` in the earlier cell prompts you to restart the runtime, re-run the cell that sets `mlops_package` and `version` after restarting, because a restart clears all Python variables.
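One way to recover the package variables after a restart is sketched below. The helper `find_mlops_package` is hypothetical (it is not part of any DataRobot library) and assumes the extracted directory is named `datarobot_mlops_package-<version>`, as in the listing earlier in this notebook.

```python
import glob
import os


def find_mlops_package(base_dir="/content"):
    """Return (directory_name, version) for the extracted agent package.

    Assumes the directory is named datarobot_mlops_package-<version>.
    If several match, the last one in sorted order is returned.
    """
    pattern = os.path.join(base_dir, "datarobot_mlops_package-*")
    matches = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    if not matches:
        raise FileNotFoundError(
            "No datarobot_mlops_package-* directory in " + base_dir
        )
    name = os.path.basename(matches[-1])
    version = name.rsplit("-", 1)[-1]
    return name, version


# In Colab, after a restart:
# mlops_package, version = find_mlops_package()
```

Unlike parsing the output of `ls`, this fails loudly when the package directory is missing instead of leaving the variables undefined.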

### Open Quick Start

As noted in the code comments on the Deployment Integrations tab, open the quick-start guide to walk through the agent configuration steps: `.../{agent install dir}/docs/html/index.html`

Edit `.../{agent install dir}/conf/mlops.agent.conf.yaml` to set the values shown below; everything else can stay at its default. This file contains the properties used by the MLOps service: the DataRobot host URL, your authentication token, and the spool the agent uses to queue data sent to MLOps.
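If you prefer to make this edit from the notebook rather than by hand, the sketch below replaces single-line settings in the configuration text. The helper `set_conf_value` and the placeholder values in `sample` are illustrative assumptions; the key names `mlopsUrl` and `apiToken` are the ones read from this file later in the notebook. It only handles simple top-level `key: value` lines, not nested settings such as `channelConfigs`.

```python
import re


def set_conf_value(conf_text, key, value):
    """Replace the value of a top-level `key: ...` line in YAML text.

    Only handles one-line scalar settings; raises KeyError if the key is absent.
    """
    pattern = re.compile(r"^{}:.*$".format(re.escape(key)), flags=re.MULTILINE)
    new_line = '{}: "{}"'.format(key, value)
    # A function replacement avoids backslash interpretation in the new value.
    new_text, count = pattern.subn(lambda _match: new_line, conf_text)
    if count == 0:
        raise KeyError("{} not found in configuration".format(key))
    return new_text


# Hypothetical stand-in for the contents of mlops.agent.conf.yaml
sample = 'mlopsUrl: "https://<MLOPS_HOST>"\napiToken: "<MLOPS_API_TOKEN>"\n'
updated = set_conf_value(sample, "mlopsUrl", "https://app2.datarobot.com")
updated = set_conf_value(updated, "apiToken", "YOUR_API_TOKEN")
print(updated)
```

To apply it to the real file, read the file into a string, call the helper, and write the result back; keep a backup copy first.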

In [None]:
"""
# Set your DR host:
mlopsUrl: "https://app2.datarobot.com"

# Set your API token
apiToken: "NWQ1NDA3ZTdmNTU1Y2Q......"

# Create the spool directory on your system that you want MLOps to use, eg /tmp/ta
channelConfigs:
  - type: "FS_SPOOL"
    details: {name: "bench", spoolDirectoryPath: "/tmp/ta"}
"""

### Commands to get you started 

These commands start, check the status of, and stop the MLOps agent service. For now you only need to run start; run status whenever you want to check on the service.
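The three control scripts follow the same naming pattern, so the command for each can be built from the package directory. This is a small sketch; `agent_command` and `AGENT_ACTIONS` are names made up for this example, and the version in the path should match your download.

```python
import os

# Actions that have a matching <action>-agent.sh script in the package's bin directory
AGENT_ACTIONS = ("start", "status", "stop")


def agent_command(package_dir, action):
    """Build the shell command for one of the agent's control scripts."""
    if action not in AGENT_ACTIONS:
        raise ValueError("action must be one of {}".format(AGENT_ACTIONS))
    script = os.path.join(package_dir, "bin", "{}-agent.sh".format(action))
    return ["bash", script]


cmd = agent_command("/content/datarobot_mlops_package-8.0.7", "status")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```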

In [14]:
#!rm /content/datarobot_mlops_package-8.0.7/bin/PID.agent
!bash /content/datarobot_mlops_package-8.0.7/bin/start-agent.sh      #Change version based on the downloaded file

INFO: AGENT_CONFIG_YAML=/content/datarobot_mlops_package-8.0.7/conf/mlops.agent.conf.yaml
INFO: AGENT_LOG_PROPERTIES=/content/datarobot_mlops_package-8.0.7/conf/mlops.log4j2.properties
INFO: AGENT_JVM_OPT=-Xmx1G
INFO: AGENT_JAR_PATH=/content/datarobot_mlops_package-8.0.7/lib/mlops-agent-8.0.7.jar
INFO: AGENT_LOG_PATH=/content/datarobot_mlops_package-8.0.7/logs/mlops.agent.log

Starting MLOps-Agent


DataRobot MLOps-Agent is running.


In [15]:
!bash /content/datarobot_mlops_package-8.0.7/bin/status-agent.sh

DataRobot MLOps-Agent is running as a service.


In [None]:
# Shutdown - DON'T RUN THIS CELL; IT ONLY SHOWS HOW TO SHUT DOWN THE AGENT
#!bash /content/datarobot_mlops_package-8.0.7/bin/stop-agent.sh

## Create an MLOps Model Package for a model and deploy it

### Train a simple RandomForestClassifier model to use for this example

In [None]:
import pandas as pd
import numpy as np
import time
import csv
import pytz
import json
import yaml
import datetime
from sklearn.ensemble import RandomForestClassifier

TRAINING_DATA = '/content/{}/examples/data/mlops-example-surgical-dataset.csv'.format(mlops_package)

df = pd.read_csv(TRAINING_DATA)

columns = list(df.columns)
arr = df.to_numpy()

np.random.shuffle(arr)

split_ratio = 0.8
prediction_threshold = 0.5

train_data_len = int(arr.shape[0] * split_ratio)

train_data = arr[:train_data_len, :-1]
label = arr[:train_data_len, -1]
test_data = arr[train_data_len:, :-1]
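# Build the test frame from the shuffled array so its rows line up with test_data
test_df = pd.DataFrame(arr[train_data_len:], columns=columns)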

# train the model
clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
clf.fit(train_data, label)

RandomForestClassifier(max_depth=2, n_estimators=10, random_state=0)

### Create empty deployment in DataRobot MLOps

Using the MLOps client, create a new model package to represent the random forest model we just trained. This includes uploading the training data and enabling data drift tracking.

In [None]:
!cat /content/datarobot_mlops_package-8.0.7/examples/model_config/surgical_binary_classification.json

{
    "name": "MLOps Example Surgical Model",
    "modelDescription": {
        "modelName": "Binary Model for Surgical Complications",
        "description": "Binary classification on surgical dataset",
        "location": "/tmp/myModel"
    },
    "target": {
        "type": "Binary",
        "name": "complication",
        "classNames": ["1","0"],
        "predictionThreshold": 0.5
    }
}
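Before sending a configuration like this to MLOps, a few local sanity checks can catch typos early. The checks below are assumptions made for this sketch, not DataRobot's own validation rules, and `check_binary_model_config` is a hypothetical helper.

```python
def check_binary_model_config(config):
    """Return a list of problems found in a binary-classification model config."""
    problems = []
    target = config.get("target", {})
    if target.get("type") != "Binary":
        problems.append("target.type should be 'Binary'")
    class_names = target.get("classNames", [])
    if len(class_names) != 2 or len(set(class_names)) != 2:
        problems.append("target.classNames should list two distinct classes")
    threshold = target.get("predictionThreshold")
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        problems.append("target.predictionThreshold should be between 0 and 1")
    return problems


# Same values as the example configuration printed above
example = {
    "name": "MLOps Example Surgical Model",
    "target": {
        "type": "Binary",
        "name": "complication",
        "classNames": ["1", "0"],
        "predictionThreshold": 0.5,
    },
}
print(check_binary_model_config(example))  # []
```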


In [None]:
from datarobot.mlops.mlops import MLOps
# from datarobot.mlops.common.enums import OutputType
from datarobot.mlops.connected.client import MLOpsClient
from datarobot.mlops.common.exception import DRConnectedException
from datarobot.mlops.constants import Constants

# Read the model configuration info from the example. This is used to create the model package.
with open('/content/{}/examples/model_config/surgical_binary_classification.json'.format(mlops_package), "r") as f:
    model_info = json.load(f)

# Read the MLOps connection info from the agent configuration file
with open('/content/{}/conf/mlops.agent.conf.yaml'.format(mlops_package)) as file:
    # FullLoader converts the YAML document into a Python dictionary
    agent_yaml_dict = yaml.load(file, Loader=yaml.FullLoader)

MLOPS_URL = agent_yaml_dict['mlopsUrl']
API_TOKEN = agent_yaml_dict['apiToken']

# Create connected client
mlops_connected_client = MLOpsClient(MLOPS_URL, API_TOKEN)

# Add training_data to model configuration
print("Uploading training data - {}. This may take some time...".format(TRAINING_DATA))
dataset_id = mlops_connected_client.upload_dataset(TRAINING_DATA)
print("Training dataset uploaded. Catalog ID {}.".format(dataset_id))
model_info["datasets"] = {"trainingDataCatalogId": dataset_id}

# Create the model package
print('Create model package')
model_pkg_id = mlops_connected_client.create_model_package(model_info)
model_pkg = mlops_connected_client.get_model_package(model_pkg_id)
model_id = model_pkg["modelId"]

# Create Prediction Environment (needed for Challengers)
print('Create Prediction Environment')
predEnv = {"name": "External Prediction Environment / Notebook",
           "description": "Notebook",
           "platform": 'gcp',
           "supportedModelFormats": ['externalModel']
           }
prediction_environment_id = mlops_connected_client.create_prediction_environment(predEnv)

# Deploy the model package
print('Deploy model package')

# Give the deployment a name:
DEPLOYMENT_NAME="Python binary classification remote model " + str(datetime.datetime.now())

deployment_id = mlops_connected_client.deploy_model_package(model_pkg["id"],
                                                            DEPLOYMENT_NAME,
                                                            prediction_environment_id=prediction_environment_id)

# Enable data drift tracking
print('Enable feature drift')
enable_feature_drift = TRAINING_DATA is not None
mlops_connected_client.update_deployment_settings(deployment_id, target_drift=True,
                                                  feature_drift=enable_feature_drift)
_ = mlops_connected_client.get_deployment_settings(deployment_id)

print("\nDone.")
print("\nDEPLOYMENT_ID=%s, MODEL_ID=%s" % (deployment_id, model_id))

DEPLOYMENT_ID = deployment_id
MODEL_ID = model_id

Uploading training data - /content/datarobot_mlops_package-8.0.7/examples/data/mlops-example-surgical-dataset.csv. This may take some time...
Training dataset uploaded. Catalog ID 62417df534ae96fc63bdc451.
Create model package
Create Prediction Environment
Deploy model package
Enable feature drift

Done.

DEPLOYMENT_ID=62417e32ebe6673cf978b81a, MODEL_ID=62417e304e43c6957ec5dad6


## Run Model Predictions

### Call the external model's predict function and send prediction data to MLOps

You can find the Deployment ID and Model ID under `Deployments` --> `Predictions` --> `Monitoring` tab.

In [None]:
# variables in case runtime is restarted, replace with your own
DEPLOYMENT_ID='62417e32ebe6673cf978b81a'
MODEL_ID='62417e304e43c6957ec5dad6'

In [None]:
import sys
import time
import random
import pandas as pd
 
from datarobot.mlops.mlops import MLOps

CLASS_NAMES = ["1", "0"]
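# Must match spoolDirectoryPath in mlops.agent.conf.yaml, otherwise the agent never sees the spooled data
SPOOL_DIR = "/content/tmp/ta"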
ACTUALS_OUTPUT_FILE = '/content/actuals.csv'

In [None]:
mlops = MLOps() \
        .set_deployment_id(DEPLOYMENT_ID) \
        .set_model_id(MODEL_ID) \
        .set_filesystem_spooler(SPOOL_DIR) \
        .init()

In [None]:
# Get predictions and time only the prediction call
start_time = time.time()
predictions = clf.predict_proba(test_data).tolist()
end_time = time.time()
num_predictions = len(predictions)
print(num_predictions)

# Get association IDs for the predictions so we can track them with the actuals
def _generate_unique_association_ids(num_samples):
    ts = time.time()
    return ["x_{}_{}".format(ts, i) for i in range(num_samples)]

association_ids = _generate_unique_association_ids(len(test_data))

400
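The timestamp-based IDs above are unique within one call, but two calls that land on the same clock reading would repeat them. A variant that does not depend on the clock, sketched with the standard library's `uuid` module (the function name and `prefix` parameter are illustrative):

```python
import uuid


def generate_association_ids(num_samples, prefix="x"):
    """Return num_samples association IDs built from random UUIDs."""
    return ["{}_{}".format(prefix, uuid.uuid4().hex) for _ in range(num_samples)]


ids = generate_association_ids(400)
print(len(ids), len(set(ids)))
```

Whichever scheme you use, keep the list around: the same IDs must be written next to the actuals later so MLOps can match each actual to its prediction.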


In [None]:
# MLOPS: report the number of predictions in the request and the execution time.
mlops.report_deployment_stats(num_predictions, end_time - start_time)

True

In [None]:
# MLOPS: report the predictions data: features, predictions, class_names
mlops.report_predictions_data(features_df=test_df, 
                                predictions=predictions, 
                                class_names=CLASS_NAMES,
                                association_ids=association_ids)

True
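scikit-learn returns `predict_proba` columns in the order of `clf.classes_`, which for 0/1 labels is `[0, 1]`, while `CLASS_NAMES` here is `["1", "0"]`. If the reporting call pairs each probability with the class name at the same position (an assumption in this sketch, so check the MLOps library documentation), the columns should be reordered first. The helper `align_probabilities` is hypothetical and uses plain lists with made-up probabilities.

```python
def align_probabilities(rows, model_classes, class_names):
    """Reorder each probability row from model_classes order to class_names order."""
    index_of = {str(c): i for i, c in enumerate(model_classes)}
    order = [index_of[name] for name in class_names]
    return [[row[i] for i in order] for row in rows]


# Hypothetical probabilities from a model whose classes are ordered [0, 1]
rows = [[0.8, 0.2], [0.3, 0.7]]
aligned = align_probabilities(rows, model_classes=[0, 1], class_names=["1", "0"])
print(aligned)  # [[0.2, 0.8], [0.7, 0.3]]
```

With the classifier trained earlier, the corresponding call would be along the lines of `align_probabilities(predictions, [int(c) for c in clf.classes_], CLASS_NAMES)`, assuming the labels are 0 and 1.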

In [None]:
# MLOPS: release MLOps resources when finished.
mlops.shutdown()

### Writing and uploading actuals to MLOps

In [None]:
from datarobot.mlops.constants import Constants

target_column_name = columns[-1]
orig_labels = test_df[target_column_name].tolist()

print("Writing actuals file: %s" % ACTUALS_OUTPUT_FILE)
def write_actuals_file(out_filename, test_data_labels, association_ids):
    """
    Generate a CSV file with the association IDs and labels. This example
    uses a dataset that already has labels.
    In a real use case, actuals (labels) arrive after the prediction is made.

    :param out_filename:      name of csv file
    :param test_data_labels:  actual values (labels)
    :param association_ids:   association id list used for predictions
    """
    with open(out_filename, mode="w", newline="") as actuals_csv_file:
        writer = csv.writer(actuals_csv_file, delimiter=",")
        writer.writerow(
            [
                Constants.ACTUALS_ASSOCIATION_ID_KEY,
                Constants.ACTUALS_VALUE_KEY,
                Constants.ACTUALS_TIMESTAMP_KEY
            ]
        )
        tz = pytz.timezone("America/Los_Angeles")
        for (association_id, label) in zip(association_ids, test_data_labels):
            # now(tz) applies the zone's current UTC offset; replace(tzinfo=tz) with a pytz zone would use its LMT offset
            actual_timestamp = datetime.datetime.now(tz).isoformat()
            writer.writerow([association_id, "1" if label else "0", actual_timestamp])


# Write the CSV file with the labels and association IDs
write_actuals_file(ACTUALS_OUTPUT_FILE, orig_labels, association_ids)

Writing actuals file: /content/actuals.csv
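If you do not need a specific local time zone, a timezone-aware timestamp can be produced with the standard library alone. This sketch assumes an ISO 8601 string with a UTC offset is acceptable for the actuals timestamp column, which is the same format the cell above writes; `actual_timestamp_utc` is a name chosen for this example.

```python
from datetime import datetime, timezone


def actual_timestamp_utc():
    """Return the current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


stamp = actual_timestamp_utc()
print(stamp)
```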


In [None]:
from datarobot.mlops.connected.client import MLOpsClient

# Read the MLOps connection info from the agent configuration file
with open('/content/{}/conf/mlops.agent.conf.yaml'.format(mlops_package)) as file:
    # FullLoader converts the YAML document into a Python dictionary
    agent_yaml_dict = yaml.load(file, Loader=yaml.FullLoader)

MLOPS_URL = agent_yaml_dict['mlopsUrl']
API_TOKEN = agent_yaml_dict['apiToken']


def _get_correct_actual_value(deployment_type, value):
    if deployment_type == "Regression":
        return float(value)
    return str(value)

def _get_correct_flag_value(value_str):
    # Only the exact string "True" counts as true
    return value_str == "True"
    
def upload_actuals():
    print("Connect MLOps client")
    mlops_connected_client = MLOpsClient(MLOPS_URL, API_TOKEN)
    deployment_type = mlops_connected_client.get_deployment_type(DEPLOYMENT_ID)

    actuals = []
    with open(ACTUALS_OUTPUT_FILE, mode="r") as actuals_csv_file:
        reader = csv.DictReader(actuals_csv_file)
        for row in reader:
            actual = {}
            for key, value in row.items():
                if key == Constants.ACTUALS_WAS_ACTED_ON_KEY:
                    value = _get_correct_flag_value(value)
                if key == Constants.ACTUALS_VALUE_KEY:
                    value = _get_correct_actual_value(deployment_type, value)
                actual[key] = value
            actuals.append(actual)

            if len(actuals) == 10000:
                mlops_connected_client.submit_actuals(DEPLOYMENT_ID, actuals)
                actuals = []

    # Submit any remaining actuals; skip the call if the last batch was exactly full
    if actuals:
        print("Submit actuals")
        mlops_connected_client.submit_actuals(DEPLOYMENT_ID, actuals)
    
    print("Done.")    

upload_actuals()

Connect MLOps client
Submit actuals
Done.
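The upload above sends actuals in groups of at most 10,000 rows. The same batching idea can be isolated in a small generator, sketched here with plain lists; `batches` is a hypothetical helper, not part of the MLOps library.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sizes = [len(batch) for batch in batches(list(range(25000)), 10000)]
print(sizes)  # [10000, 10000, 5000]
```

With the full list of actuals in memory, the submission loop becomes `for batch in batches(actuals, 10000): mlops_connected_client.submit_actuals(DEPLOYMENT_ID, batch)`, and an empty list results in no calls at all.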


In [None]:
!bash /content/datarobot_mlops_package-8.0.7/bin/stop-agent.sh    #Change version based on the downloaded file

DataRobot MLOps-Agent shutdown done.


In [None]:
# !rm /content/datarobot_mlops_package-8.0.7/bin/PID.agent   # Use to remove PID if agent wasn't closed cleanly