#### A Jupyter notebook to log an external/unsupported ML model on MLflow

This notebook is a simplified implementation of the tutorial [here](https://mlflow.org/docs/latest/model-registry.html#registering-an-unsupported-machine-learning-model).

In some cases, you might use an ML framework that has no built-in MLflow Model flavor. For instance, the [vaderSentiment](https://pypi.org/project/vaderSentiment/) library is a standard Natural Language Processing (NLP) library used for sentiment analysis. Since it lacks a built-in MLflow Model flavor, you cannot log or register the model using the MLflow Model fluent APIs.

To work around this problem, you can create an instance of an `mlflow.pyfunc` model flavor and embed your NLP model inside it, which allows you to save, log, or register the model. Once the model is registered, you can load it from the Model Registry and score inputs with its `predict` function.

The code sections below demonstrate how to create a class that extends `mlflow.pyfunc.PythonModel` with a `vaderSentiment` model embedded in it, then save, log, and register it, load it from the Model Registry, and score inputs. We will use the `SentimentIntensityAnalyzer` class, which returns sentiment scores for an input text.

In [95]:
# To use this example, install vaderSentiment in your environment first (pip install vaderSentiment)
# You may also install your own model via pip or conda, or clone it from a GitHub repo, and then import it here.
import os
import pandas as pd
# Import the MLflow package that helps you work with unsupported/external models
import mlflow.pyfunc
# Import the external package
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

#### MLflow sync and set up
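In the following cell, we set the experiment name, the run name, `model_path`, and the name under which the model will be registered. We also sync the experiment data from the S3 bucket and point MLflow to the local tracking location.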

In [96]:
experiment_name = 'Test'
run_name = 'test_vaderSentiment'
model_path = "vader" # name of folder in which the model will be saved
registered_model_name = "PyFuncVaderSentiments" # name with which the model will be registered
# Remote location of the S3 bucket (on AWS)
# It is read from the CUSTOM_KEY environment variable,
# which must already be defined in your environment
s3_bucket = os.environ['CUSTOM_KEY']
# Location to store the ML experiments locally
# This is also the location that you sync with the
# S3 bucket (see below)
tracking_uri = '/tmp/mlflow/db/'
# Sync all contents from the S3 bucket (remote) to the local location
os.system(f"aws s3 sync {s3_bucket} {tracking_uri} --quiet")
# Let MLflow know where you are storing your ML experiments
mlflow.set_tracking_uri(tracking_uri)
# If no experiment with this name exists yet, create one
existing_experiment = mlflow.get_experiment_by_name(experiment_name)
if existing_experiment is None:
    mlflow.create_experiment(experiment_name)
else:
    print(f'Experiment with name {experiment_name} exists. Loading it...')
# Make the experiment active so that the runs below
# are tracked under it
mlflow.set_experiment(experiment_name)


Experiment with name Test exists. Loading it...


<Experiment: artifact_location='/tmp/mlflow/db/794067698698294328', creation_time=1714987543542, experiment_id='794067698698294328', last_update_time=1714987543542, lifecycle_stage='active', name='Test', tags={}>

#### Define a class and extend from PythonModel

In the cell below we define a class that extends `PythonModel` from `mlflow.pyfunc`. Within the class we embed a `SentimentIntensityAnalyzer` instance.
This step brings the external model into the `mlflow.pyfunc` framework. You can replace `SentimentIntensityAnalyzer` with the corresponding class from the library you imported.

1. Within the `__init__()` function, we embed an instance of `SentimentIntensityAnalyzer`. Replace it with the model class from your own library.
2. Within the `_score()` function, we define the code that evaluates the sentiment of an input. Set it up based on your library and its API.
3. Within the `predict()` function, we call the `_score()` function and return its output. Note that `predict()` is the method MLflow calls for inference on a `PythonModel`, so it must be defined with the parameters shown below. For most purposes, you will not need to modify it. We will use this function for inference later.

In [97]:
# Define a class and extend from PythonModel
class SocialMediaAnalyserModel(mlflow.pyfunc.PythonModel):
    def __init__(self):
        super().__init__()
        # embed your vader model instance
        # Replace this by your own model
        self._analyser = SentimentIntensityAnalyzer()

    # Score the input text with the VADER sentiment model
    def _score(self, txt):
        # Replace this by the functions of your package
        prediction_scores = self._analyser.polarity_scores(txt)
        return prediction_scores

    def predict(self, context, model_input, params=None):
        # Call the score function
        model_output = self._score(model_input)
        return model_output

#### Test it out with an example

In [98]:
vader_model = SocialMediaAnalyserModel()
# Positive example
print(vader_model._score('Amazing movie'))
# Negative example
print(vader_model._score('Disappointing sequels'))
# Neutral example
print(vader_model._score('T-rex saves the world'))

{'neg': 0.0, 'neu': 0.208, 'pos': 0.792, 'compound': 0.5859}
{'neg': 0.762, 'neu': 0.238, 'pos': 0.0, 'compound': -0.4939}
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
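
The `compound` value in each dictionary above is the overall sentiment score, ranging from -1 (most negative) to 1 (most positive). Here is a minimal sketch of turning it into a label, assuming the thresholds of ±0.05 suggested in the vaderSentiment README. `label_from_scores` is a hypothetical helper and not part of the library.

```python
def label_from_scores(scores, threshold=0.05):
    """Map a VADER score dictionary to 'positive', 'negative', or 'neutral'.

    `threshold` is an assumption taken from the vaderSentiment README;
    tune it for your own data.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dictionaries copied from the output above
examples = [
    {'neg': 0.0, 'neu': 0.208, 'pos': 0.792, 'compound': 0.5859},
    {'neg': 0.762, 'neu': 0.238, 'pos': 0.0, 'compound': -0.4939},
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
]
labels = [label_from_scores(scores) for scores in examples]
print(labels)
```

If you want the registered model to return labels instead of raw scores, a mapping like this could be called from `_score()` before returning.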


#### Save the model using mlflow.pyfunc

In [99]:
with mlflow.start_run(run_name=run_name) as run:
    run_id = run.info.run_id  # keep the run ID so that we can log metrics to this run later
    # Name under which the model is saved within the run (you can set it to anything)
    model_path = f"{model_path}-{run_id}"
    mlflow.log_param("algorithm", "VADER")
    # Save a copy of the model to the local folder `model_path`
    mlflow.pyfunc.save_model(path=model_path, python_model=vader_model)
    # Log the model as a run artifact and register it in the Model Registry
    mlflow.pyfunc.log_model(
        artifact_path=model_path,  # artifact path of the model within the run
        python_model=vader_model,  # the ML model
        registered_model_name=registered_model_name,  # name under which the model is registered
    )
# The run ends automatically when the `with` block exits

Registered model 'PyFuncVaderSentiments' already exists. Creating a new version of this model...
Created version '8' of model 'PyFuncVaderSentiments'.


#### Log metrics
Log some dummy metrics to the run created above.

In [100]:
# Create dummy metrics
metrics = {"mse": 2500.00, "rmse": 50.00}
# Log a batch of metrics to the run_id above
with mlflow.start_run(run_id=run_id):
    mlflow.log_metrics(metrics)
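
The metric values above are placeholders. As a rough sketch of how such a dictionary could be computed instead, the snippet below derives `mse` and `rmse` by comparing `compound` scores with reference values. The reference values are made up for illustration only, and `regression_metrics` is a hypothetical helper.

```python
import math


def regression_metrics(predicted, reference):
    """Return MSE and RMSE between two equally long lists of numbers."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    mse = sum(squared_errors) / len(squared_errors)
    return {"mse": mse, "rmse": math.sqrt(mse)}


# Compound scores from the test cell earlier in this notebook
predicted = [0.5859, -0.4939, 0.0]
# Made-up reference scores (assumption, for illustration only)
reference = [1.0, -1.0, 0.0]

metrics = regression_metrics(predicted, reference)
print(metrics)
```

The resulting dictionary has the same shape as the dummy one, so it can be passed to `mlflow.log_metrics` in the same way.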

#### MLflow sync (IMPORTANT)
Once your ML experiment has ended, sync your local copy with the S3 bucket.
Failing to do so will lead to the loss of your experiment logs.

In [101]:
# Sync the local contents with the S3 bucket
os.system(f"aws s3 sync {tracking_uri} {s3_bucket} --quiet")

0
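
`os.system` only returns a status code (the `0` above), so a failed sync can go unnoticed. Here is a minimal sketch of a stricter variant that uses `subprocess.run` with `check=True`, which raises an exception when the command exits with a non-zero status. `build_sync_command` and `sync` are hypothetical helper names, and the bucket URI is a placeholder.

```python
import subprocess


def build_sync_command(source, destination, quiet=True):
    """Build the `aws s3 sync` command as an argument list (no shell quoting needed)."""
    command = ["aws", "s3", "sync", source, destination]
    if quiet:
        command.append("--quiet")
    return command


def sync(source, destination):
    """Run the sync and raise subprocess.CalledProcessError if it fails."""
    subprocess.run(build_sync_command(source, destination), check=True)


# Placeholder locations (assumptions): replace them with your own
upload_command = build_sync_command("/tmp/mlflow/db/", "s3://example-bucket/mlflow")
print(upload_command)
```

Calling `sync(tracking_uri, s3_bucket)` would then replace the `os.system` call, and a failed upload would stop the notebook instead of silently losing logs.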

#### Inference
In this step we load the model back from MLflow for inference purposes.

In [102]:
# Load the registered model as a generic Python function
version_number = 7  # version of the registered model to load; change it to the version you need
model_uri = f"models:/{registered_model_name}/{version_number}"  # models:/<name_of_registered_model>/<version_number>
loaded_model = mlflow.pyfunc.load_model(model_uri)
# Inference on a positive example
print(loaded_model.predict('happy'))
# Inference on a negative example
print(loaded_model.predict('sad'))
# Inference on a neutral example
print(loaded_model.predict('sun'))


{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.5719}
{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.4767}
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
