<a href="https://colab.research.google.com/github/bluetarget-ai/demos/blob/main/Getting_started_monitoring_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview
BlueTarget is an MLOps platform that lets ML engineers and data scientists deploy and monitor their machine learning models.

In this section we'll see how to create a **Monitor** for a classification model.

# API KEY
First of all, we need an API key to run this tutorial.

1.   Click [here](https://deploy.bluetarget.ai/signup) to create a new account.
2.   Get your API key by clicking [here](https://deploy.bluetarget.ai/api-keys).




In [45]:
API_KEY = 'nuKS4WiD7aysMyCMu87CGV'
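
Hardcoding a key in a shared notebook makes it easy to leak. A minimal sketch of reading it from the environment instead is shown below; the variable name `BLUETARGET_API_KEY` and the helper `load_api_key` are assumptions made for this example, not part of the BlueTarget library.

```python
import os


def load_api_key(env_var: str = "BLUETARGET_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

With the variable exported in your shell or Colab session, `API_KEY = load_api_key()` could replace the literal assignment above.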

# Installation 

You need to install the BlueTarget library to get started with monitoring. We use scikit-learn to train a model and emulate real-time inference.

In [None]:
pip install bluetarget scikit-learn

# Train
Let's consider a simple model trained on the iris dataset to get started with monitoring.

In [None]:
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

iris_frame = pd.DataFrame(iris.data, columns = feature_names)
X = iris.data
y = np.array([iris.target_names[i] for i in iris.target])

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train the model on the training split only, so X_test stays unseen
clf = svm.SVC(gamma='scale', kernel='rbf', probability=True)
clf.fit(X_train, y_train)
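
Before wiring up monitoring, it can help to know how well the classifier does on the held-out split, so that the numbers shown later have a baseline. The following is a self-contained sketch of that check; the fixed `random_state` is an assumption added here for reproducibility and is not part of the original cell.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris_data = datasets.load_iris()

# random_state is fixed only so that this sketch is reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_data.data, iris_data.target, test_size=0.2, random_state=0
)

# Same model configuration as in the cell above, fitted on the training split.
baseline_clf = svm.SVC(gamma="scale", kernel="rbf", probability=True)
baseline_clf.fit(X_tr, y_tr)

# Accuracy on the 30 held-out samples.
baseline_accuracy = accuracy_score(y_te, baseline_clf.predict(X_te))
print(f"held-out accuracy: {baseline_accuracy:.3f}")
```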

# Monitor
A monitor is the representation of a machine learning model. 

A monitor can be identified by your own ID or by an auto-generated one.

To learn more about monitors, see the [documentation](https://docs.deploy.bluetarget.ai/monitoring/defining-the-model-schema).


For this example, let's create a monitor to manage our **iris model**.





In [None]:
import uuid
from bluetarget import Monitor, ModelSchema, MonitorPredictionType

monitor = Monitor(api_key=API_KEY)

MONITOR_ID = uuid.uuid4().hex

monitor.create(
    ModelSchema(
        monitorId=MONITOR_ID,
        name="Iris sklearn",
        description="sklearn model, rbf kernel",
        predictionType=MonitorPredictionType.CATEGORICAL,
    )
)


# Monitor version

A version represents one iteration of the machine learning model. For example, if you have a model in production and want to retrain it, you should create a new monitor version. Working this way, you can compare performance, data drift, and data quality across iterations.




In [None]:
from bluetarget import ModelSchemaVersion, MonitorSchemaType

MONITOR_VERSION_ID = "v1"

monitor.create_version(MONITOR_ID, ModelSchemaVersion(
    versionId=MONITOR_VERSION_ID,
    model_schema={
        "sepal_length": MonitorSchemaType.FLOAT,
        "sepal_width": MonitorSchemaType.FLOAT,
        "petal_length": MonitorSchemaType.FLOAT,
        "petal_width": MonitorSchemaType.FLOAT
    }
))

# Logging the predictions

Let's create fake inference data to show how it works.

For this example, we consider the following aspects.


*   Number of predictions: **3000**.
*   Date for predictions: **Random dates between today and 2 weeks ago**.






In [50]:
import random
from datetime import datetime, timedelta

# Generate random date between today and 2 weeks ago
def random_date():
  current_date = datetime.now()
  diff = random.randint(0, 14)
  return current_date - timedelta(days=diff)
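
The helper above produces only 15 distinct day offsets, all sharing the current time of day. If you would rather spread the fake traffic evenly over the two-week window, a variant along these lines may be useful; `random_timestamp` and `WINDOW` are hypothetical names introduced for this sketch.

```python
import random
from datetime import datetime, timedelta

# Length of the window to sample from (two weeks, as in the tutorial).
WINDOW = timedelta(days=14)


def random_timestamp(now=None):
    """Return a timestamp drawn uniformly from the last two weeks."""
    now = now or datetime.now()
    offset_seconds = random.uniform(0, WINDOW.total_seconds())
    return now - timedelta(seconds=offset_seconds)


reference_now = datetime.now()
samples = [random_timestamp(reference_now) for _ in range(5)]
```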


In [None]:
from bluetarget import Prediction

# Predictions
prediction = clf.predict(X_test)
prediction_probabilities = clf.predict_proba(X_test)

# Generate fake inference to test with more data
X_test_inc = np.repeat(X_test, 100, axis=0)
prediction_inc = np.repeat(prediction, 100, axis=0)
prediction_probabilities_inc = np.repeat(prediction_probabilities, 100, axis=0)
actuals_inc = np.repeat(y_test, 100)

# Store the prediction ids so we can attach the actuals later
prediction_ids = []

# Accumulate predictions here and send them in batches
prediction_batch = []

CHUNK_SIZE = 300

for i in range(3000):

    features = {feature_names[j]: float(X_test_inc[i][j]) for j in range(4)}

    probabilities = {iris.target_names[j]: float(
        prediction_probabilities_inc[i][j]) for j in range(3)}
    
    prediction_id = uuid.uuid4().hex
    prediction_ids.append(prediction_id)
     
    # Fill the batch array to send in chunks of 300 
    prediction_batch.append(Prediction(
            prediction_id=prediction_id,
            features=features,
            value=prediction_inc[i],
            probabilities=probabilities,
            created_at=random_date()
    ))
    
    if len(prediction_batch) == CHUNK_SIZE:
      monitor.log_predictions(prediction_batch)
      prediction_batch = []
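
Note that the loop above only sends a batch when it reaches exactly `CHUNK_SIZE` items. This works here because 3000 is a multiple of 300, but a trailing partial batch would otherwise be dropped silently. A small generic helper avoids that; `chunked` is a hypothetical name for this sketch and is not part of the BlueTarget library.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, including the tail."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        yield batch


batches = list(chunked(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In the notebook, one could first build the full list of `Prediction` objects and then call `monitor.log_predictions(batch)` for each `batch` produced by `chunked(..., CHUNK_SIZE)`.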

# Logging actuals

Let's assume that we have the ground truth for the previous inferences, so that you can see the performance metrics on the platform.

For this example, we consider the following aspects.

*   Number of actuals: **3000** (one per logged prediction).
*   Date for actuals: **Random dates between today and 2 weeks ago**.

In [52]:
def random_actual():
  return iris.target_names[random.randint(0, 2)]

In [53]:
from bluetarget import PredictionActual

actual_batch = []

for i in range(3000):
  
  # Fill the batch array to send in chunks of 300 
  actual_batch.append(PredictionActual(
    prediction_id=prediction_ids[i],
    value=random_actual(),
    created_at=random_date()
  ))

  if len(actual_batch) == CHUNK_SIZE:
    monitor.log_actuals(actual_batch)
    actual_batch = []
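
Because the actuals in this tutorial are drawn at random from the three classes, one would expect the accuracy derived from them to sit near one third rather than near the model's real accuracy. The sketch below simulates that locally with scikit-learn; the randomly generated `predicted` labels are stand-ins for the logged predictions, and the seed is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
labels = np.array(["setosa", "versicolor", "virginica"])

# Stand-ins for the 3000 logged predictions and their random actuals.
predicted = rng.choice(labels, size=3000)
actual = rng.choice(labels, size=3000)

# With independent uniform actuals, a match happens about 1 time in 3.
simulated_accuracy = accuracy_score(actual, predicted)
print(f"accuracy with random actuals: {simulated_accuracy:.3f}")
```

To see realistic metrics instead, the true labels already computed in `actuals_inc` could be sent in place of `random_actual()`.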

# Reference dataset


Adding a reference dataset, typically the dataset that was used to train your model, helps you understand when and where your data is drifting in production.


Let's replicate our original dataset to have more information.

In [54]:
from bluetarget import ColumnMapping

iris_frame['target'] = [iris.target_names[i] for i in iris.target]
iris_frame['prediction'] = clf.predict( iris.data )

reference_df = pd.concat([iris_frame] * 10, ignore_index=True)

monitor.add_reference_dataset(reference_df, column_mapping=ColumnMapping(
    features=feature_names, target="target", prediction="prediction"
))
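
To build intuition for what comparing production data against a reference dataset means, here is a small local illustration using a two-sample Kolmogorov-Smirnov test from SciPy. This is only one common way to flag drift in a single numeric feature; it is not a description of how BlueTarget computes drift, and the artificial shift of 1.0 is an assumption made for the example.

```python
from scipy.stats import ks_2samp
from sklearn import datasets

iris_data = datasets.load_iris()

# Reference distribution: petal length (column 2) from the training data.
reference_feature = iris_data.data[:, 2]

# Two simulated production samples: one unchanged, one artificially shifted.
production_same = reference_feature.copy()
production_shifted = reference_feature + 1.0

same_result = ks_2samp(reference_feature, production_same)
shifted_result = ks_2samp(reference_feature, production_shifted)

# A small p-value suggests the two samples come from different distributions.
print(f"unchanged: p={same_result.pvalue:.3f}")
print(f"shifted:   p={shifted_result.pvalue:.3g}")
```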

# Download inference data

This section shows how to fetch the inference data through the Python API. The result can be used to retrain the model on real-world data.

In [None]:
inference_df = monitor.get_inference_dataset(
    start_time=datetime(2022, 12, 1),
    end_time=datetime(2022, 12, 14)
)

inference_df
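
The predictions logged earlier carry timestamps from the last two weeks, so a fixed date range may return nothing when the notebook is run at a later date. A minimal sketch of computing a window relative to the current time follows; the names `window_start` and `window_end` are hypothetical.

```python
from datetime import datetime, timedelta

# Cover the same two-week span used when generating the fake predictions.
window_end = datetime.now()
window_start = window_end - timedelta(days=14)

print(window_start.isoformat(), "to", window_end.isoformat())
```

These values could then be passed as `start_time` and `end_time` in the cell above.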