## MLOps Agent - Python End to End
**Author**: Matthew Cohen

#### Scope
This notebook shows how to use DataRobot's MLOps agent to monitor a Python model deployed outside of DataRobot, from installing the agent through reporting predictions and actuals.

#### Requirements

- Python 3.7.3
- MLOps Agent 6.1.3

Your versions might be different (the cell output in this notebook was produced with agent 6.2.1), but the procedure below should remain the same.

### Installing the Agent

In DataRobot, navigate to your deployment of the external model:
- Click on the Deployment itself.
- Then click on "Integrations" and follow the instructions you see for your selected language (Python, Java, R).
- You will need to install the MLOps library. In Python this is done with:
  - pip install <unpacked DataRobot's Monitoring Agent tar file>/lib/datarobot_mlops-*-py2.py3-none-any.whl


Usage:
    DataRobot's Monitoring Agent is an advanced feature that enables monitoring for models deployed
    outside of DataRobot. Follow these steps to use this feature.
    1. Download and extract DataRobot's Monitoring Agent tar file. This is available through the
       DataRobot MLOps UI via User icon -> "Developer Tools" -> "External Monitoring Agent".
    2. Open your preferred browser. In toolbar, click "File" -> "Open File".
       Then choose this file <your unpacked directory>/docs/html/quickstart.html.
    3. Follow the "Quick Start" instructions to set up Monitoring Agent.
    4. This example uses the DataRobot's MLOps library which you can install with:
       pip install <unpacked DataRobot's Monitoring Agent tar file>/lib/datarobot_mlops-*-py2.py3-none-any.whl
    5. MLOps library requires the following parameters to be provided:
       deployment ID, model ID, output type [STDOUT, OUTPUT_DIR, NULL].
       If output type is OUTPUT_DIR, which means that Monitoring Agent will send data to the DataRobot MLOps service,
       the following parameters also have to be configured: spool directory, max size per spool file,
       max number of spool files. Spool directory path must match the Monitoring Agent path configured by admin.
       For advanced usage, see the examples.
       These parameters can be configured with APIs, check the MLOps library init() call,
       or with environment variables. For example:
       # export MLOPS_DEPLOYMENT_ID='YOUR_DEPLOYMENT_ID'
       # export MLOPS_MODEL_ID='YOUR_MODEL_ID'
       # export MLOPS_OUTPUT_TYPE=OUTPUT_DIR
       # export MLOPS_SPOOLER_DIR_PATH=/tmp/ta
       # export MLOPS_SPOOLER_FILE_MAX_SIZE=104857600
       # export MLOPS_SPOOLER_MAX_FILES=5

       Notes:
              - parameter configuration via the API takes precedence over environment variables.
              - for testing purposes, you can start with STDOUT output type,
                which doesn't require providing any of the spooler-related parameters.
    6. Run current snippet:
       python datarobot-report-stats.py
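
The rules in step 5 can be sketched in plain Python. This is only an illustration of what the usage notes say, not part of the MLOps library: the helper name `read_mlops_settings` and the shape of the returned dictionary are assumptions, while the environment variable names are the ones listed above.

```python
import os

# Environment variable names are taken from the usage notes above.
REQUIRED = ("MLOPS_DEPLOYMENT_ID", "MLOPS_MODEL_ID", "MLOPS_OUTPUT_TYPE")
SPOOLER = (
    "MLOPS_SPOOLER_DIR_PATH",
    "MLOPS_SPOOLER_FILE_MAX_SIZE",
    "MLOPS_SPOOLER_MAX_FILES",
)
OUTPUT_TYPES = ("STDOUT", "OUTPUT_DIR", "NULL")


def read_mlops_settings(env=None):
    """Hypothetical helper: collect and sanity-check the MLOps settings."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    output_type = env.get("MLOPS_OUTPUT_TYPE")
    if output_type and output_type not in OUTPUT_TYPES:
        raise ValueError("Unknown output type: {}".format(output_type))
    # Spooler settings are only needed when data is spooled to a directory.
    if output_type == "OUTPUT_DIR":
        missing += [name for name in SPOOLER if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: {}".format(", ".join(missing)))
    names = REQUIRED + (SPOOLER if output_type == "OUTPUT_DIR" else ())
    return {name: env[name] for name in names}
```

Calling `read_mlops_settings()` with no argument reads the real process environment; passing a dictionary makes the check easy to try out without exporting anything.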

#### Extract the downloaded file
Once the tar file is downloaded and saved to your local filesystem, uncompress it.

In [1]:
#
# Change directory to where you saved the tar file and uncompressed it
#  .../{agent install dir}
#
%cd ~/Documents/DR/MLOps/_Agent_example
!tar -xvf datarobot-mlops-agent-6.2.1-363.tar.gz
%cd ./datarobot-mlops-agent-6.2.1

/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example
x datarobot-mlops-agent-6.2.1/
x datarobot-mlops-agent-6.2.1/bin/
x datarobot-mlops-agent-6.2.1/bin/stop-bosun.sh
x datarobot-mlops-agent-6.2.1/bin/start-bosun.sh
x datarobot-mlops-agent-6.2.1/bin/start-agent.sh
x datarobot-mlops-agent-6.2.1/bin/status-agent.sh
x datarobot-mlops-agent-6.2.1/bin/run-agent-once.sh
x datarobot-mlops-agent-6.2.1/bin/stop-agent.sh
x datarobot-mlops-agent-6.2.1/bin/status-bosun.sh
x datarobot-mlops-agent-6.2.1/logs/
x datarobot-mlops-agent-6.2.1/logs/mlops.agent.log
x datarobot-mlops-agent-6.2.1/logs/mlops.bosun.log
x datarobot-mlops-agent-6.2.1/bosun_plugins/
x datarobot-mlops-agent-6.2.1/bosun_plugins/mlops-bosun-plugin-externalcommand-6.2.1.jar
x datarobot-mlops-agent-6.2.1/.BUILD.info
x datarobot-mlops-agent-6.2.1/README.md
x datarobot-mlops-agent-6.2.1/conf/
x datarobot-mlops-agent-6.2.1/conf/stdout.mlops.log4j2.properties
x datarobot-mlops-agent-6.2.1/conf/mlops.log4j2.properties
x datarobot-mlops-

x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/com/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/com/datarobot/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/com/datarobot/mlops/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/com/datarobot/mlops/examples/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/java/com/datarobot/mlops/examples/CodeGenExample.java
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/resources/
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/src/main/resources/log4j2.xml
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/run_example.sh
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/README.md
x datarobot-mlops-agent-6.2.1/examples/java/CodeGenExample/create_deployment.sh
x datarobot-mlops-agent-6.2.1/examples/java/CodeGe

x datarobot-mlops-agent-6.2.1/docs/html/_sources/spark_monitoring_use_case.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/agent_installation.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/java_examples.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/java_spark_api.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/index.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/python_api.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/r_examples.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/mlops_python.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_sources/overview.rst.txt
x datarobot-mlops-agent-6.2.1/docs/html/_static/
x datarobot-mlops-agent-6.2.1/docs/html/_static/minus.png
x datarobot-mlops-agent-6.2.1/docs/html/_static/pygments.css
x datarobot-mlops-agent-6.2.1/docs/html/_static/comment-close.png
x datarobot-mlops-agent-6.2.1/docs/html/_static/doctools.js
x datarobot-mlops-agent-6.2.1/docs/html/_static/up.png
x datarobot-ml

/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example/datarobot-mlops-agent-6.2.1


####  To see the file structure...

In [13]:
!ls -l

total 64
-rw-r--r--@  1 matthew.cohen  staff   1816 Aug  4 09:06 LICENSE.txt
-rw-r--r--@  1 matthew.cohen  staff   1624 Aug  4 09:06 README.md
-rw-r--r--   1 matthew.cohen  staff  23927 Aug 12 06:52 actuals.csv
drwxr-xr-x@  9 matthew.cohen  staff    288 Aug 12 06:52 bin
drwxr-xr-x@  3 matthew.cohen  staff     96 Aug  4 09:06 bosun_plugins
drwxr-xr-x@  8 matthew.cohen  staff    256 Aug 11 17:25 conf
drwxr-xr-x@  3 matthew.cohen  staff     96 Aug  4 09:06 docs
drwxr-xr-x@ 11 matthew.cohen  staff    352 Aug  4 09:06 examples
drwxr-xr-x@  6 matthew.cohen  staff    192 Aug  4 09:06 lib
drwxr-xr-x@  5 matthew.cohen  staff    160 Aug 11 17:26 logs
drwxr-xr-x@  7 matthew.cohen  staff    224 Aug  4 09:06 tools


#### Install dependency libraries

In [14]:
!pip install lib/datarobot_mlops-6.2.1-py2.py3-none-any.whl  # The filename will differ if you have a newer version of the agent

You should consider upgrading via the '/Users/matthew.cohen/opt/anaconda3/bin/python -m pip install --upgrade pip' command.


#### Open Quick Start

As noted in the usage comment from the deployment's Integrations tab (shown above), open the Quick Start page to begin configuring the agent software:

.../{agent install dir}/docs/html/quickstart.html

#### Edit .../{agent install dir}/conf/mlops.agent.conf.yaml

Set the values shown in the next cell; everything else can stay at its default. This file contains the properties used by the MLOps service: the DataRobot host URL, your authentication token, and the spool used to queue data to send to MLOps.

In [15]:
"""
# Set your DR host:
mlopsURL: "https://app.datarobot.com"
    
# Set your API token
apiToken: "NWQ1NDA3ZTdmNTU1Y2Q......"

# Create the spool directory on your system that you want MLOps to use, eg /tmp/ta
channelConfigs:
  - type: "FS_SPOOL"
    details: {name: "bench", spoolDirectoryPath: "/tmp/ta"}
"""
!mkdir /tmp/ta

mkdir: /tmp/ta: File exists
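
The `File exists` message above is harmless, but it can be avoided by creating the spool directory idempotently. The following is a minimal sketch, assuming the agent configuration has already been parsed into a dictionary shaped like the YAML in the cell above; the helper name `ensure_spool_dirs` is hypothetical, and the demo points at a temporary directory rather than `/tmp/ta`.

```python
import os
import tempfile


def ensure_spool_dirs(agent_config):
    """Create every FS_SPOOL directory named in a parsed agent config dict."""
    created = []
    for channel in agent_config.get("channelConfigs", []):
        if channel.get("type") != "FS_SPOOL":
            continue
        path = channel["details"]["spoolDirectoryPath"]
        # exist_ok=True makes the call safe to repeat, unlike a bare `mkdir`.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


# Usage with a config shaped like the YAML above, pointed at a temporary directory.
demo_dir = os.path.join(tempfile.mkdtemp(), "ta")
demo_config = {
    "channelConfigs": [
        {"type": "FS_SPOOL", "details": {"name": "bench", "spoolDirectoryPath": demo_dir}}
    ]
}
ensure_spool_dirs(demo_config)
ensure_spool_dirs(demo_config)  # the second call is a no-op rather than an error
```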


#### Commands to get you started 

These scripts start, report the status of, and stop the MLOps agent service. You only need to run start for now; run status whenever you want to check on the service.

In [16]:
# Start
!./bin/start-agent.sh

INFO: AGENT_CONFIG_YAML=/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example/datarobot-mlops-agent-6.2.1/conf/mlops.agent.conf.yaml
INFO: AGENT_LOG_PROPERTIES=/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example/datarobot-mlops-agent-6.2.1/conf/mlops.log4j2.properties
INFO: AGENT_JVM_OPT=-Xmx1G
INFO: AGENT_JAR_PATH=/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example/datarobot-mlops-agent-6.2.1/lib/mlops-agent-6.2.1.jar
INFO: AGENT_LOG_PATH=/Users/matthew.cohen/Documents/DR/MLOps/_Agent_example/datarobot-mlops-agent-6.2.1/logs/mlops.agent.log

Starting MLOps-Agent


DataRobot MLOps-Agent is running.


In [17]:
# Get Status
!./bin/status-agent.sh

DataRobot MLOps-Agent is running as a service.


In [6]:
# Shutdown - DON'T RUN THIS CELL, IT'S JUST SHOWING YOU HOW TO SHUTDOWN
!./bin/stop-agent.sh
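
Outside a notebook, the same scripts can be driven from Python with `subprocess`. This is a hedged sketch rather than anything shipped with the agent: `run_agent_script` is a hypothetical helper, and it only assumes the `bin/` layout shown in the tar listing above. Whether the scripts' exit codes reflect the agent's state is not confirmed by this notebook, so the output text is returned as well.

```python
import subprocess
from pathlib import Path


def run_agent_script(install_dir, script_name, timeout=60):
    """Run one of the agent's bin/ scripts and return (exit code, output)."""
    script = Path(install_dir) / "bin" / script_name
    if not script.is_file():
        raise FileNotFoundError("No such agent script: {}".format(script))
    result = subprocess.run(
        [str(script)],
        cwd=str(install_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,   # return text instead of bytes
        timeout=timeout,
    )
    return result.returncode, result.stdout
```

For example, `run_agent_script(".", "status-agent.sh")` would run the status script from the agent install directory and return whatever it prints.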

## Create an MLOps Model Package for a model and deploy it

#### Train a simple RandomForestClassifier model to use for this example

In [18]:
# Change the notebook shell working directory to the install location.
# %cd /usr/local/opt/datarobot-mlops-agent-6.1.3

import pandas as pd
import numpy as np
import time
import csv
import pytz
import json
import yaml
import datetime
from sklearn.ensemble import RandomForestClassifier

TRAINING_DATA = './examples/data/surgical-dataset.csv'

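df = pd.read_csv(TRAINING_DATA)

# Shuffle the rows of the DataFrame itself so that `arr` (used for training and
# scoring) and `test_df` (reported to MLOps) stay aligned row for row.
df = df.sample(frac=1).reset_index(drop=True)

columns = list(df.columns)
arr = df.to_numpy()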

split_ratio = 0.8
prediction_threshold = 0.5

train_data_len = int(arr.shape[0] * split_ratio)

train_data = arr[:train_data_len, :-1]
label = arr[:train_data_len, -1]
test_data = arr[train_data_len:, :-1]
test_df = df[train_data_len:]

# train the model
clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
clf.fit(train_data, label)

RandomForestClassifier(max_depth=2, n_estimators=10, random_state=0)
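
The shuffle-then-split step in the cell above can be illustrated without NumPy or pandas. This is a small sketch, not the notebook's method: `split_rows` is a hypothetical helper and the toy rows are made up. The point it shows is that shuffling a single list of row indices keeps each row's features and label together, and that a fixed seed makes the split repeatable.

```python
import random


def split_rows(rows, split_ratio=0.8, seed=0):
    """Shuffle row indices once, then slice them into train and test rows."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(rows) * split_ratio)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Each toy row keeps its features and label together: (features, label).
rows = [([i, i * 2], i % 2) for i in range(10)]
train_rows, test_rows = split_rows(rows)
```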

#### Using the MLOps client, create a new model package to represent the random forest model we just created.  This includes uploading the training data and enabling data drift.

In [19]:
from datarobot.mlops.mlops import MLOps
from datarobot.mlops.common.enums import OutputType
from datarobot.mlops.connected.client import MLOpsClient
from datarobot.mlops.common.exception import DRConnectedException
from datarobot.mlops.constants import Constants

# Read the model configuration info from the example.  This is used to create the model package.
with open('examples/model_config/surgical_binary_classification.json', "r") as f:
    model_info = json.load(f)

# Read the MLOps connection info from the agent configuration file
with open('./conf/mlops.agent.conf.yaml') as file:
    # safe_load parses the YAML into a Python dictionary without
    # constructing arbitrary Python objects
    agent_yaml_dict = yaml.safe_load(file)

MLOPS_URL = agent_yaml_dict['mlopsURL']
API_TOKEN = agent_yaml_dict['apiToken']

# Create connected client
mlops_connected_client = MLOpsClient(MLOPS_URL, API_TOKEN)

# Add training_data to model configuration
print("Uploading training data - {}. This may take some time...".format(TRAINING_DATA))
dataset_id = mlops_connected_client.upload_dataset(TRAINING_DATA)
print("Training dataset uploaded. Catalog ID {}.".format(dataset_id))
model_info["datasets"] = {"trainingDataCatalogId": dataset_id}

# Create the model package
print('Create model package')
model_pkg_id = mlops_connected_client.create_model_package(model_info)
model_pkg = mlops_connected_client.get_model_package(model_pkg_id)
model_id = model_pkg["modelId"]

# Deploy the model package
print('Deploy model package')

# Give the deployment a name:
DEPLOYMENT_NAME="Python binary classification remote model " + str(datetime.datetime.now())

deployment_id = mlops_connected_client.deploy_model_package(model_pkg["id"],
                                                            DEPLOYMENT_NAME)

# Enable data drift tracking
print('Enable feature drift')
enable_feature_drift = TRAINING_DATA is not None
mlops_connected_client.update_deployment_settings(deployment_id, target_drift=True,
                                                  feature_drift=enable_feature_drift)
_ = mlops_connected_client.get_deployment_settings(deployment_id)

print("\nDone.")
print("DEPLOYMENT_ID=%s, MODEL_ID=%s" % (deployment_id, model_id))

DEPLOYMENT_ID = deployment_id
MODEL_ID = model_id

Uploading training data - ./examples/data/surgical-dataset.csv. This may take some time...
Training dataset uploaded. Catalog ID 5f33fb45fa5ce90daaa8bc40.
Create model package
Deploy model package
Enable feature drift

Done.
DEPLOYMENT_ID=5f33fb6a735d1f1236a97347, MODEL_ID=5f33fb6899bcc941e5d28050


## Run Model Predictions

#### Call the external model's predict function and send prediction data to MLOps

In [20]:
#
# This code can be found:
# 1. under the Integrations tab for your deployment in DataRobot MLOps, or
# 2. in the agent example code on your filesystem in ./examples/python/ and ./tools/
#    This example is from BinaryClassificationExample
#
CLASS_NAMES = ['1', "0"]
OUTPUT_TYPE = OutputType.OUTPUT_DIR

# Spool directory path must match the Monitoring Agent path configured by admin.
SPOOL_DIR = "/tmp/ta"
SPOOL_MAX_FILE_SIZE = 104857600
SPOOL_MAX_FILES = 5

ACTUALS_OUTPUT_FILE = "actuals.csv"
               
def process_predictions(deployment_id, model_id, output_type, spool_dir, spool_max_file_size, spool_max_files, class_names):
    """
    This is a binary classification algorithm example.
    User can call the DataRobot MLOps library functions to report statistics.
    """

    # Get predictions
    start_time = time.time()
    predictions = clf.predict_proba(test_data).tolist()
    num_predictions = len(predictions)
    end_time = time.time()
    
    # Get association IDs for the predictions so we can match them with the actuals later
    def _generate_unique_association_ids(num_samples):
        ts = time.time()
        return ["x_{}_{}".format(ts, i) for i in range(num_samples)]
    association_ids = _generate_unique_association_ids(len(test_data))

    # MLOPS: initialize the MLOps instance
    print("Get an MLOps instance")
    mlops = MLOps() \
        .set_deployment_id(deployment_id) \
        .set_model_id(model_id) \
        .set_output_type(output_type) \
        .set_spool_dir(spool_dir) \
        .set_spool_file_max_size(spool_max_file_size) \
        .set_spool_max_files(spool_max_files) \
        .init()

    # MLOPS: report the number of predictions in the request and the execution time.
    print("Send MLOps deployment stats")
    mlops.report_deployment_stats(num_predictions, end_time - start_time)

    # MLOPS: report the predictions data: features, predictions, class_names
    print("Send MLOps prediction data")
    mlops.report_predictions_data(features_df=test_df, 
                                  predictions=predictions, 
                                  class_names=class_names,
                                  association_ids=association_ids)
    
    target_column_name = columns[-1]
    orig_labels = test_df[target_column_name].tolist()

    def write_actuals_file(out_filename, test_data_labels, association_ids):
        """
        Generate a CSV file with the association IDs and labels. This example
        uses a dataset that already has labels; in a real use case the actuals
        (labels) only become available after the prediction is made.

        :param out_filename:      name of csv file
        :param test_data_labels:  actual values (labels)
        :param association_ids:   association id list used for predictions
        """
        with open(out_filename, mode="w", newline="") as actuals_csv_file:
            writer = csv.writer(actuals_csv_file, delimiter=",")
            writer.writerow(
                [
                    Constants.ACTUALS_ASSOCIATION_ID_KEY,
                    Constants.ACTUALS_VALUE_KEY,
                    Constants.ACTUALS_TIMESTAMP_KEY
                ]
            )
            tz = pytz.timezone("America/Los_Angeles")
            for (association_id, label) in zip(association_ids, test_data_labels):
                # Ask for the current time in the target zone directly. With pytz,
                # datetime.now().replace(tzinfo=tz) attaches the zone's historical
                # local-mean-time offset instead of the current UTC offset.
                actual_timestamp = datetime.datetime.now(tz).isoformat()
                writer.writerow([association_id, "1" if label else "0", actual_timestamp])

    # Write a CSV file with the labels and association IDs
    write_actuals_file(ACTUALS_OUTPUT_FILE, orig_labels, association_ids)
    print("Wrote actuals file: %s" % ACTUALS_OUTPUT_FILE)

    # MLOPS: release MLOps resources when finished.
    mlops.shutdown()

    print("Done.")

process_predictions(DEPLOYMENT_ID, MODEL_ID, OUTPUT_TYPE, SPOOL_DIR, SPOOL_MAX_FILE_SIZE, SPOOL_MAX_FILES, CLASS_NAMES)

Get an MLOps instance
Send MLOps deployment stats
Send MLOps prediction data
Wrote actuals file: actuals.csv
Done.
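
The actuals file pairs each association ID with a label and a timestamp. The sketch below shows that pairing with the standard library only, under stated assumptions: the header names are placeholders (the notebook takes the real ones from `Constants`), `make_association_ids` and `actuals_csv` are hypothetical helpers, and UUIDs plus UTC timestamps are swapped in for the notebook's time-based IDs and Los Angeles timestamps.

```python
import csv
import datetime
import io
import uuid


def make_association_ids(num_samples):
    """Return one globally unique ID per prediction row."""
    return [uuid.uuid4().hex for _ in range(num_samples)]


def actuals_csv(association_ids, labels, header=("association_id", "value", "timestamp")):
    """Render actuals as CSV text; the `header` names are placeholders."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for association_id, label in zip(association_ids, labels):
        # A timezone-aware UTC timestamp serializes with an explicit +00:00 offset.
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        writer.writerow([association_id, "1" if label else "0", stamp])
    return buffer.getvalue()


ids = make_association_ids(3)
text = actuals_csv(ids, [1, 0, 1])
```

Using UUIDs means two runs started within the same clock tick cannot produce colliding IDs, which matters because the same ID is what later links an actual back to its prediction.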


### Upload actuals back to MLOps

In [21]:
def _get_correct_actual_value(deployment_type, value):
    if deployment_type == "Regression":
        return float(value)
    return str(value)


def _get_correct_flag_value(value_str):
    # Only the literal string "True" counts as a set flag
    return value_str == "True"
    
def upload_actuals():
    print("Connect MLOps client")
    mlops_connected_client = MLOpsClient(MLOPS_URL, API_TOKEN)
    deployment_type = mlops_connected_client.get_deployment_type(DEPLOYMENT_ID)

    actuals = []
    with open(ACTUALS_OUTPUT_FILE, mode="r") as actuals_csv_file:
        reader = csv.DictReader(actuals_csv_file)
        for row in reader:
            actual = {}
            for key, value in row.items():
                if key == Constants.ACTUALS_WAS_ACTED_ON_KEY:
                    value = _get_correct_flag_value(value)
                if key == Constants.ACTUALS_VALUE_KEY:
                    value = _get_correct_actual_value(deployment_type, value)
                actual[key] = value
            actuals.append(actual)

            # Submit in batches of 10,000 rows
            if len(actuals) == 10000:
                mlops_connected_client.submit_actuals(DEPLOYMENT_ID, actuals)
                actuals = []

    # Submit any remaining actuals (skipped if the last batch was already sent)
    print("Submit actuals")
    if actuals:
        mlops_connected_client.submit_actuals(DEPLOYMENT_ID, actuals)
    
    print("Done.")    

upload_actuals()

Connect MLOps client
Submit actuals
Done.
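
The batching logic in `upload_actuals` can be isolated into a small, testable function. This is a sketch, with `submit_in_batches` as a hypothetical helper: it takes any callable for the submit step, so in this notebook that callable would wrap `mlops_connected_client.submit_actuals(DEPLOYMENT_ID, batch)`, while the demo below uses a plain list as a stand-in.

```python
def submit_in_batches(records, submit, batch_size=10000):
    """Call `submit(batch)` for consecutive slices of `records`; return the number of calls."""
    calls = 0
    for start in range(0, len(records), batch_size):
        submit(records[start:start + batch_size])
        calls += 1
    return calls


# Usage with a stand-in for the real submit call.
sent = []
num_calls = submit_in_batches(list(range(25)), sent.append, batch_size=10)
```

Slicing by index means an empty input results in no submit calls at all, and the final partial batch needs no special handling.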


### Stop the MLOps service

In [22]:
!bin/stop-agent.sh

DataRobot MLOps-Agent shutdown done.
