# Hands-on Workshop - *Predicting product ratings from reviews*

This workshop is focused on the creation, deployment, monitoring and management of a machine learning model for predicting product ratings from reviews.

In this notebook we will explore the data and go through the steps to train the machine learning model itself. We will use a fine-tuned [DistilBERT Hugging Face transformer model](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert), which is "a small, fast, cheap and light Transformer model trained by distilling BERT base". For the sake of time, we will use a pretrained model rather than training the model in the workshop.

We will then deploy our trained model using the Seldon Deploy SDK and view our running deployments in the Seldon Deploy UI. 

We will deploy a second TensorFlow model, trained with slightly different settings, as a Canary model to demonstrate the A/B testing functionality Seldon Deploy provides. This time we will deploy directly from the UI.

Then we will begin to add the advanced monitoring that Seldon Alibi is famed for. 

-----------------------------------

Firstly, we will install and import the relevant packages which we will use throughout the exploration, training, and deployment process. Google Colab comes with a number of packages pre-installed, so we only need to install any additional packages we may need.

In [None]:
!pip install transformers
!pip install seldon_deploy_sdk
!pip install alibi_detect==0.8.1
!pip install datasets
!pip install nltk

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd 
import datasets
import nltk

import tensorflow as tf
tf.get_logger().setLevel('INFO')

from transformers import AutoTokenizer, DefaultDataCollator, TFAutoModelForSequenceClassification

from sklearn.model_selection import train_test_split

from seldon_deploy_sdk import Configuration, ApiClient, SeldonDeploymentsApi, ModelMetadataServiceApi, DriftDetectorApi, BatchJobsApi, BatchJobDefinition
from seldon_deploy_sdk.auth import OIDCAuthenticator

from alibi_detect.cd import KSDrift
from alibi_detect.utils.saving import save_detector, load_detector

from google.cloud import storage

## Retrieving the data

The reviews data is held in a Google Storage bucket. We can download the data using the gsutil tool, which enables us to access Google Cloud storage from the command line. We then load the data into a Pandas DataFrame.

In [None]:
!gsutil cp gs://kelly-seldon/nlp-ratings/review_data.csv review_data.csv

In [None]:
df = pd.read_csv("review_data.csv", delimiter=";")
df.head()

## Preprocessing the data

First, we can drop all the columns we don't need, leaving just ```rating``` and ```review```.

In [None]:
df.drop(columns=['product', 'user_id', 'date_created'], axis=1, inplace=True)
df.head()

Next, let's check for any missing data in our DataFrame and drop every row that contains a missing value.

In [None]:
is_NaN = df.isnull()
row_has_NaN = is_NaN.any(axis=1)
rows_with_NaN = df[row_has_NaN]
print(rows_with_NaN.head(), "\n\n", "Number of rows with missing values:", len(rows_with_NaN))

In [None]:
df = df.drop(rows_with_NaN.index)
df.reset_index(inplace=True, drop=True)
df.head()

We can then ensure that our reviews and ratings columns are strings.

In [None]:
df['review'] = df['review'].astype(str)
df['rating'] = df['rating'].astype(str)

Then we can map our string rating categories to integers, which will be the output labels for the model. 

In [None]:
rating_mapping = {
    '1.0': 0,
    '1.5': 1,
    '2.0': 2,
    '2.5': 3,
    '3.0': 4,
    '3.5': 5,
    '4.0': 6, 
    '4.5': 7,
    '5.0': 8
}

df['label'] = df['rating'].apply(lambda x: rating_mapping[x])
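When we interpret model outputs later on, it helps to go the other way, from an integer label back to a rating. A minimal sketch, assuming the same mapping as above; the names `inverse_rating_mapping` and `label_to_rating` are introduced here purely for illustration:

```python
# Same mapping as in the cell above, repeated so this snippet runs on its own
rating_mapping = {
    '1.0': 0, '1.5': 1, '2.0': 2, '2.5': 3, '3.0': 4,
    '3.5': 5, '4.0': 6, '4.5': 7, '5.0': 8,
}

# Invert the dictionary: integer label -> rating string
inverse_rating_mapping = {label: rating for rating, label in rating_mapping.items()}


def label_to_rating(label):
    """Return the rating string for an integer class label (hypothetical helper)."""
    return inverse_rating_mapping[int(label)]


print(label_to_rating(8))
```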

We can then drop the rating column to leave us with ```review``` and ```label```.

In [None]:
df.drop(columns="rating", axis=1, inplace=True)
df.head()

Now we can take a look at some of the reviews.

In [None]:
df.review[90]

In [None]:
df.review[29]

Here we can see a clear need for some text preprocessing of the reviews. 

We will carry out the following preprocessing steps:

- Removing punctuation
- Lowercasing
- Removing stopwords
- Lemmatisation

First we can remove all punctuation using Python's built-in `string` module, whose `string.punctuation` constant lists the ASCII punctuation characters.

In [None]:
import string
string.punctuation

In [None]:
df_proc = df.copy()

In [None]:
def remove_punctuation(text):
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree

# Store the punctuation-free text
df_proc['processed_review']= df_proc['review'].apply(lambda x:remove_punctuation(x))
df_proc.head()

Then we can ensure all text is lowercase.

In [None]:
df_proc['processed_review']= df_proc['processed_review'].apply(lambda x: x.lower())
df_proc.head()

Then we can remove stopwords that don't add any predictive power. The NLTK library consists of a list of words that are considered stopwords for the English language.

In [None]:
nltk.download('stopwords')
#Stop words present in the library
stopwords = nltk.corpus.stopwords.words('english')

In [None]:
def remove_stopwords(text):
    text = ' '.join([word for word in text.split() if word not in stopwords])
    return text

In [None]:
df_proc['processed_review']= df_proc['processed_review'].apply(lambda x:remove_stopwords(x))
df_proc.head()

And finally we can carry out lemmatisation, which reduces each word to its dictionary form (its lemma). Unlike cruder stemming, the result is still a real word that keeps its meaning.

In [None]:
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.stem import WordNetLemmatizer
# Defining the object for Lemmatisation
wordnet_lemmatizer = WordNetLemmatizer()

In [None]:
def lemmatizer(text):
    lemm_text = ' '. join([wordnet_lemmatizer.lemmatize(word) for word in text.split()])
    return lemm_text

df_proc['processed_review']=df_proc['processed_review'].apply(lambda x:lemmatizer(x))
df_proc.head()

In [None]:
df_proc.drop(columns="review", inplace=True, axis=1)
df_proc.head()
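The four steps above can also be chained into a single function, which is handy when the same preprocessing has to be repeated at inference time. This is only a sketch: `preprocess_review` is a hypothetical helper, and the toy stopword set and toy lemmatiser below are stand-ins so that the snippet runs without downloading any NLTK data.

```python
import string


def preprocess_review(text, stopwords, lemmatize):
    """Apply the four preprocessing steps in the same order as above."""
    # 1. Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Lowercase
    text = text.lower()
    # 3. Remove stopwords, then 4. lemmatise each remaining word
    words = [lemmatize(word) for word in text.split() if word not in stopwords]
    return " ".join(words)


# Toy stand-ins (assumptions, not the NLTK resources used in the notebook)
toy_stopwords = {"the", "is", "it", "i"}


def toy_lemmatize(word):
    # Crude plural stripping, for illustration only
    return word[:-1] if word.endswith("s") else word


print(preprocess_review("I love the Products, it is GREAT!", toy_stopwords, toy_lemmatize))
```

In the notebook itself, the same function could be called with `set(stopwords)` and `wordnet_lemmatizer.lemmatize` in place of the toy stand-ins.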

## Training the transformer model

Now we will run through the steps to train a Hugging Face DistilBERT transformer using TensorFlow. As mentioned previously, we won't actually train the model here, as this was done prior to the workshop, but we will load the trained model into the notebook so we can evaluate it.

First we split the data into train and test sets. 

In [None]:
train, test = train_test_split(df_proc, test_size=0.3, random_state=42)

Then we convert to Hugging Face Datasets for ease of preprocessing.

In [None]:
train_ds = datasets.Dataset.from_pandas(train, preserve_index=False)
test_ds = datasets.Dataset.from_pandas(test, preserve_index=False)
comp_ds = datasets.DatasetDict({"train":train_ds,"test":test_ds})

Next we tokenize the text using a pretrained tokenizer. We first instantiate the DistilBERT tokenizer from a pre-trained model vocabulary. Then we create a preprocessing function that tokenizes the text, truncates sequences to DistilBERT’s maximum input length, and pads shorter sequences to that same length, so that all inputs are of uniform length.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

In [None]:
def preprocess_function(df):
    return tokenizer(df["processed_review"], padding="max_length", truncation=True)
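To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a toy sketch of what happens to one sequence of token ids. It is not the DistilBERT tokenizer: `pad_and_truncate` is a hypothetical helper, the token ids are made up, and a padding id of 0 is an assumption.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]          # truncate inputs that are too long
    mask = [1] * len(ids)                 # 1 marks a real token
    padding = max_length - len(ids)       # number of pad positions to add
    return ids + [pad_id] * padding, mask + [0] * padding


input_ids, attention_mask = pad_and_truncate([11, 22, 33, 44], max_length=6)
print(input_ids)
print(attention_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why both `input_ids` and `attention_mask` are passed on to the model later.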

We can use the Hugging Face Datasets ```map``` function to apply the preprocessing function over the entire dataset. We can speed up ```map``` by setting ```batched=True``` to process multiple elements of the dataset at once.

In [None]:
tokenized_revs = comp_ds.map(preprocess_function, batched=True)

Then we initialise a simple data collator that creates batches of examples and returns tensors of the correct type.

In [None]:
data_collator = DefaultDataCollator(return_tensors="tf")

In order to fine-tune a model in TensorFlow, we then convert our train and test datasets to the tf.data.Dataset format.

In [None]:
tf_train_set = tokenized_revs["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids"],
    label_cols=["labels"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator
)

tf_test_set = tokenized_revs["test"].to_tf_dataset(
    columns=["attention_mask", "input_ids"],
    label_cols=["labels"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator
)

We need to load DistilBERT with ```TFAutoModelForSequenceClassification```, along with setting the number of expected labels, which in our case is 9.

In [None]:
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=9)

We can run ```model.summary()``` to see the model architecture.

In [None]:
model.summary()

We can see that the first layer is the pretrained DistilBERT layer, so we can set ```trainable``` to ```False``` for it. Freezing this layer saves time during training, because only the layers on top of it are updated.

Here we also set up an optimizer with a fixed learning rate, along with the loss and accuracy metric to monitor during training. We then compile the model to configure it for training.

In [None]:
model.layers[0].trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)
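The loss above is created with `from_logits=True`, meaning the model outputs raw scores and the softmax is applied inside the loss. As a rough illustration of the quantity being computed for a single example, here is a plain-Python sketch (the function name is hypothetical, and this is not the Keras implementation):

```python
import math


def sparse_categorical_crossentropy_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    # Subtract the max logit for numerical stability before exponentiating
    max_logit = max(logits)
    shifted = [z - max_logit for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return -(shifted[label] - log_sum_exp)


# With 9 identical logits the model is maximally unsure, so the loss is log(9)
uniform_logits = [0.0] * 9
print(sparse_categorical_crossentropy_from_logits(uniform_logits, label=8))
```

A loss close to `log(9)` (about 2.2) therefore suggests the model is doing no better than guessing uniformly across the 9 classes.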

Finally, we call ```fit``` to fine-tune the model, using 5 epochs for training.

We will not run this and instead we will load a pre-trained model from a Google Storage bucket.

```model.fit(tf_train_set, validation_data=tf_test_set, epochs=5)```

## Evaluating the model

We can load in the pre-trained model for evaluation purposes.

The model is stored in the ```kelly-seldon``` Google Storage bucket at the path ```nlp-ratings/model/1```.

In [None]:
from pathlib import Path
Path("1").mkdir(parents=True, exist_ok=True)

In [None]:
client = storage.Client.create_anonymous_client()
bucket = client.bucket('kelly-seldon')

In [None]:
def load_model(bucket):
    blobs = bucket.list_blobs(prefix="nlp-ratings/model/1/")
    for blob in blobs:
        filename = blob.name.split('/')[-1]
        blob.download_to_filename("1/" + filename)
    model = TFAutoModelForSequenceClassification.from_pretrained("1", num_labels=9)
    return model

In [None]:
model = load_model(bucket)

We can call ```model.summary()``` again to double check the loaded model.

In [None]:
model.summary()

We can compile the model again after loading. 

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)

To save time in the workshop, we evaluate on a batch of 200 test examples; usually we would use the whole test set. The test set was also passed as the validation set during model training, so validation loss and accuracy could be seen there as well, but since we aren't training the model in the workshop, we evaluate it here.

In [None]:
test_batch = test[:200]
test_batch_ds = datasets.Dataset.from_pandas(test_batch, preserve_index=False)
tokenized_revs_batch = test_batch_ds.map(preprocess_function, batched=True)

test_tf = tokenized_revs_batch.to_tf_dataset(
    columns=["attention_mask", "input_ids"],
    label_cols=["labels"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator
)

**Long computation in this cell!** ~ 3 mins

In [None]:
loss, accuracy = model.evaluate(test_tf)

print("Model accuracy: {:2.2%}".format(accuracy))
print("Model loss: {}".format(loss))

As we can see here, the accuracy is fairly low. If we look more closely, we see that it classifies almost all reviews as ```8```, translating to a rating of ```5.0```. If we then go back and look at the training set we see that it is very heavily skewed towards a rating of 5.0. We did go back and compute class weights and pass these during model training, however this didn't improve the model performance. This would require a lot more EDA and retraining until we achieved more success with classifying the other ratings, however for now we will move on to model deployment and monitoring to see Seldon's capabilities.
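The notebook does not show how the class weights were computed. One common inverse-frequency heuristic is sketched below, together with the accuracy a "always predict the majority class" baseline would reach. The function name and the toy labels are assumptions for illustration, not the workshop data.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {
        cls: n_samples / (num_classes * counts[cls])
        for cls in range(num_classes)
        if counts[cls] > 0  # classes absent from the data get no weight
    }


# Hypothetical, heavily skewed sample of labels
toy_labels = [8] * 8 + [0] * 2
weights = balanced_class_weights(toy_labels, num_classes=9)
print(weights)

# Accuracy obtained by always predicting the most frequent class
majority_baseline = Counter(toy_labels).most_common(1)[0][1] / len(toy_labels)
print(majority_baseline)
```

A dictionary of this shape is what Keras accepts through the `class_weight` argument of `fit`. Comparing the model's accuracy with the majority baseline is a quick way to tell whether it has learned anything beyond the skew in the data.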

## Deploying the Model

As we have seen in the previous sections, the reviews are pre-processed using a variety of techniques. We have 2 options for handling this pre-processing logic in production:

1. **Custom Model:** Incorporate the pre-processing directly in the predict method of a custom model. This provides simplicity when creating the deployment as there is only a single code base to worry about and a single component to be deployed.
2. **Input Transformer:** Make use of a separate container to perform all of the input transformation and then pass the vectors to the model for prediction. The schematic below outlines how this would work.

```
            ________________________________________
            |            SeldonDeployment          |
            |                                      |
Request -->  Input transformer   -->     Model    -->  Response
            |  (Pre-processing)       (TensorFlow) |
            |                                      |
            ________________________________________
```
         
The use of an input transformer allows us to separate the pre-processing logic from the prediction logic. This means we can leverage the pre-packaged Tensorflow server provided by Seldon to serve our model, and each of the components can be upgraded independently of one another. However, it does introduce additional complexity in the deployment which is generated, and how that then interacts with advanced monitoring components such as outlier and drift detectors. 

This workshop will focus on the generation of a **custom model for this case**, therefore we need to define an `__init__` and `predict` method which shall load and perform inference respectively in our new deployment. 

-----


## Set up 

We then define our Seldon custom model. The component parts required to build the custom model are outlined below. Each of the files plays a key part in building the eventual Seldon Docker container.

---

### ReviewRatings.py


This is the critical file as it contains the logic associated with the deployment wrapped as part of a class by the same name as the Python file.

A key thing to note about the way this has been structured is that we have focused on making this deployment reusable. The ```__init__``` method accepts a custom predictor parameter - the path to the saved model (```model_path```).

The advantage of this is that it allows us to upgrade the model without having to re-build the container image. Additionally, if the logic was more general it could be used to accept a wider variety of objects for greater reusability.



```
import pandas as pd
import numpy as np
import datasets
from transformers import AutoTokenizer, DefaultDataCollator, TFAutoModelForSequenceClassification
from google.cloud import storage
import logging
import string
import nltk
from nltk.stem import WordNetLemmatizer

from pathlib import Path

Path("1").mkdir(parents=True, exist_ok=True)

nltk.download("stopwords", download_dir="./nltk")
nltk.download("wordnet", download_dir="./nltk")
nltk.download("omw-1.4", download_dir="./nltk")
nltk.data.path.append("./nltk")
# Stop words present in the library
stopwords = nltk.corpus.stopwords.words('english')

logger = logging.getLogger(__name__)


class ReviewRatings(object):
    def __init__(self, model_path):
        logger.info("Connecting to GCS")
        self.client = storage.Client.create_anonymous_client()
        self.bucket = self.client.bucket('kelly-seldon')

        logger.info(f"Model name: {model_path}")
        self.model = None
        self.prefix = model_path
        self.local_dir = "1/"

        self.wordnet_lemmatizer = WordNetLemmatizer()

        logger.info("Loading tokenizer and data collator")
        self.tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        self.data_collator = DefaultDataCollator(return_tensors="tf")

        self.ready = False

    def load_model(self):
        logger.info("Getting model artifact from GCS")
        blobs = self.bucket.list_blobs(prefix=self.prefix)
        for blob in blobs:
            filename = blob.name.split('/')[-1]
            blob.download_to_filename(self.local_dir + filename)
        logger.info("Loading model")
        self.model = TFAutoModelForSequenceClassification.from_pretrained("1", num_labels=9)
        # summary() prints rather than returns, so route its lines to the logger
        self.model.summary(print_fn=logger.info)

    def preprocess_text(self, text, feature_names):
        logger.info("Preprocessing text")
        logger.info(f"Incoming text: {text}")
        text_list = text[0]
        dict_text = {"review": text_list}
        df = pd.DataFrame(data=dict_text)
        logger.info(f"Dataframe created: {df}")
        logger.info("Removing punctuation")
        df['review'] = df['review'].apply(lambda x: self.remove_punctuation(x))
        logger.info("Lowercase all characters")
        df['review'] = df['review'].apply(lambda x: x.lower())
        logger.info("Removing stopwords")
        df['review'] = df['review'].apply(lambda x: self.remove_stopwords(x))
        logger.info("Carrying out lemmatization")
        df['review'] = df['review'].apply(lambda x: self.lemmatizer(x))

        len_df = len(df)
        logger.info(f"{len(df)}")

        dataset = datasets.Dataset.from_pandas(df, preserve_index=False)
        logger.info(f"Dataset created: {dataset}")

        tokenized_revs = dataset.map(self.tokenize, batched=True)
        logger.info(f"Tokenized reviews: {tokenized_revs}")

        logger.info("Converting tokenized reviews to tf dataset")
        tf_inf = tokenized_revs.to_tf_dataset(
            columns=["attention_mask", "input_ids"],
            label_cols=["labels"],
            # Do not shuffle at inference time, so predictions stay in request order
            shuffle=False,
            batch_size=len_df,
            collate_fn=self.data_collator
        )
        logger.info(f"TF dataset created: {tf_inf}")

        return tf_inf

    def remove_punctuation(self, text):
        punctuation_free = "".join([i for i in text if i not in string.punctuation])
        return punctuation_free

    def remove_stopwords(self, text):
        text = ' '.join([word for word in text.split() if word not in stopwords])
        return text

    def lemmatizer(self, text):
        lemm_text = ' '.join([self.wordnet_lemmatizer.lemmatize(word) for word in text.split()])
        return lemm_text

    def tokenize(self, ds):
        return self.tokenizer(ds["review"], padding="max_length", truncation=True)

    def process_output(self, preds):
        logger.info("Processing model predictions")
        rating_preds = []
        for i in preds["logits"]:
            rating_preds.append(np.argmax(i, axis=0))

        logger.info("Create output array for predictions")
        rating_preds = np.array(rating_preds)

        return rating_preds

    def process_whole(self, text):
        tf_inf = self.preprocess_text(text, feature_names=None)
        logger.info("Predictions ready to be made")
        preds = self.model.predict(tf_inf)
        logger.info(f"Prediction type: {type(preds)}")
        logger.info(f"Predictions: {preds}")
        preds_proc = self.process_output(preds)
        logger.info(f"Processed predictions: {preds_proc}, Processed predictions type: {type(preds_proc)}")

        return preds_proc

    def predict(self, text, names=None, meta=None):
        try:
            # Lazily load the model on the first request only
            if not self.ready:
                self.load_model()
                logger.info("Model successfully loaded")
                self.ready = True

            return self.process_whole(text)

        except Exception as ex:
            # Log the full traceback, then re-raise so the failure is not silently swallowed
            logger.exception(f"Failed during predict: {ex}")
            raise

```
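The class above loads the model lazily, on the first call to `predict`, rather than in `__init__`. Stripped of everything else, the pattern looks roughly like the sketch below. `LazyModel`, its dummy model and the `load_calls` counter are all hypothetical and exist only to show that loading happens once.

```python
class LazyModel:
    """Toy stand-in for a custom model that loads its artifact on first use."""

    def __init__(self, model_path):
        self.model_path = model_path
        self.model = None
        self.ready = False
        self.load_calls = 0  # only here to make the behaviour observable

    def load_model(self):
        self.load_calls += 1
        # Dummy "model": always predicts label 8, whatever the input
        self.model = lambda reviews: [8 for _ in reviews]

    def predict(self, X, names=None, meta=None):
        if not self.ready:
            self.load_model()
            self.ready = True
        return self.model(X)


toy_model = LazyModel(model_path="nlp-ratings/model/1/")
first = toy_model.predict(["great product"])
second = toy_model.predict(["awful", "fine"])
print(first, second, toy_model.load_calls)
```

The trade-off of this design is that the first request is slow, since it pays for the download and load, while every later request reuses the model held in memory.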

### Testing Locally

To check that `ReviewRatings.py` works correctly, we can use the `seldon_core` Python package to run our model locally and test the endpoint.

```
seldon-core-microservice ReviewRatings --service-type MODEL \
                                        --parameters='[{ 
                                                        "name": "model_path",
                                                        "value": "nlp-ratings/model/1/",
                                                        "type": "STRING"
                                                        }]'
```

This endpoint can then be tested by posting cURL commands to the local endpoint: 

```
curl -H 'Content-Type: application/json' -d '{"data": {"ndarray": ["I love the product"]}}' http://localhost:9000/api/v1.0/predictions
```
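The same request can be built from Python using only the standard library. This sketch assumes the local endpoint shown in the cURL command; `build_prediction_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_prediction_request(reviews, url="http://localhost:9000/api/v1.0/predictions"):
    """Build (but do not send) a POST request in the Seldon ndarray format."""
    payload = {"data": {"ndarray": list(reviews)}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(["I love the product"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running local microservice:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```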

Seldon then provides two ways to build your reusable container image:

1. Using ```s2i``` technology
2. Writing a ```Dockerfile``` if you need more control.

In this case, we will create a ```Dockerfile```:

```
FROM python:3.8-slim
WORKDIR /app

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Seldon runs as user 8888 thus change file permissions as such
RUN mkdir /.cache
RUN chown -R 8888 /.cache
RUN chown -R 8888 /app

# Define environment variables
ENV MODEL_NAME ReviewRatings
ENV PERSISTENCE 0
ENV SERVICE_TYPE MODEL

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
```

Note that the container needs certain environment variables in order to start the Seldon microservice correctly. In this case there are only 3:
* `MODEL_NAME`: The model name matches the name of the Python file and class which is created. 
* `SERVICE_TYPE`: Seldon allows you to create many different components each specialised for a different purpose e.g. `TRANSFORMER` for performing pre or post-processing steps. 
* `PERSISTENCE`: In some cases you would like to save the state of your deployments to Redis, e.g. when scaling up multi-armed bandits.

We also have our ```requirements.txt``` file, which contains a list of Python packages which the deployment requires to run:

```
datasets == 2.2.2
numpy == 1.21.6
pandas == 1.3.5
tensorflow == 2.7.3
transformers == 4.20.0
seldon_core
google-cloud-storage
nltk
```

In this case we will use my pre-built container image for speed and simplicity, which we can now deploy using the SDK. 

### Deploying the model using the SDK

You can now deploy your model to the dedicated Seldon Deploy cluster which we have configured for this workshop. To do so you will interact with the Seldon Deploy SDK and deploy your model using that.

First, set up the configuration and authentication required to access the cluster. Make sure to set the `SD_IP` variable to the IP of the cluster you are using.

In [None]:
SD_IP = "XXXXXXXX"

config = Configuration()
config.host = f"http://{SD_IP}/seldon-deploy/api/v1alpha1"
config.oidc_client_id = "sd-api"
config.oidc_server = f"http://{SD_IP}/auth/realms/deploy-realm"
config.oidc_client_secret = "sd-api-secret"
config.auth_method = "client_credentials"

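def auth():
    # Authenticate against the OIDC server and return an authenticated API client
    authenticator = OIDCAuthenticator(config)
    config.id_token = authenticator.authenticate()
    api_client = ApiClient(configuration=config, authenticator=authenticator)
    return api_client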

Now that you have configured the IP and set up your authentication function, you can describe the deployment you would like to create.

You will need to fill in the `YOUR_NAME` variable. This MUST be lower case as it will be used for the `DEPLOYMENT_NAME` variable later on.

The `CONTAINER_NAME` and `MODEL_PATH` variables have been prefilled for you as we are using a pretrained model and a pre-built container image for the sake of saving time in the workshop.

The rest of the `mldeployment` description has been completed for you.


In [None]:
# MUST BE ALL LOWERCASE WITH NO UNDERSCORES
YOUR_NAME = "XXXXXXXX"  # placeholder: replace with your own name

DEPLOYMENT_NAME = f"{YOUR_NAME}-prr"
CONTAINER_NAME = "tomfarrand/review-ratings:0.3"

NAMESPACE = "seldon-gitops"

CPU_REQUESTS = "1"
MEMORY_REQUESTS = "2Gi"

CPU_LIMITS = "1"
MEMORY_LIMITS = "2Gi"

MODEL_PATH = "nlp-ratings/model/1/"
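Because the deployment name ends up as a Kubernetes resource name, it is worth checking it before submitting anything. The sketch below assumes the name has to be a valid DNS label (lowercase letters, digits and hyphens); `is_valid_deployment_name` is a hypothetical helper and not part of the Seldon Deploy SDK.

```python
import re

# DNS label rules: lowercase letters, digits and '-', starting and ending
# with an alphanumeric character, at most 63 characters long
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_deployment_name(name):
    """Return True if the name looks like a valid DNS label (assumed requirement)."""
    return len(name) <= 63 and DNS_LABEL.fullmatch(name) is not None


for candidate in ["alice-prr", "Alice-prr", "alice_prr"]:
    print(candidate, is_valid_deployment_name(candidate))
```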

In [None]:
mldeployment = {
    "kind": "SeldonDeployment",
    "metadata": {
        "name": DEPLOYMENT_NAME,
        "namespace": NAMESPACE,
        "labels": {
            "fluentd": "true"
        }
    },
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "spec": {
        "name": DEPLOYMENT_NAME,
        "annotations": {
            "seldon.io/engine-seldon-log-messages-externally": "true"
        },
        "protocol": "seldon",
        "predictors": [
            {
                "componentSpecs": [
                    {
                        "spec": {
                            "containers": [
                                {
                                    "name": f"{DEPLOYMENT_NAME}-container",
                                    "image": CONTAINER_NAME,
                                    "resources": {
                                        "requests": {
                                            "cpu": CPU_REQUESTS,
                                            "memory": MEMORY_REQUESTS
                                        },
                                        "limits": {
                                            "cpu": CPU_LIMITS,
                                            "memory": MEMORY_LIMITS
                                        }
                                    }
                                }
                            ]
                        }
                    }
                ],
                "name": "default",
                "replicas": 1,
                "traffic": 100,
                "graph": {
                    "name": f"{DEPLOYMENT_NAME}-container",
                    "parameters": [
                        {
                            "name":"model_path",
                            "value":MODEL_PATH,
                            "type":"STRING"
                        }
                    ],
                    "children": [],
                    "logger": {
                        "mode": "all"
                    }
                }
            }
        ]
    },
    "status": {}
}
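In the description above, the `name` of the `graph` node is the same as the `name` of the container in `componentSpecs`; this is how the graph node is tied to the custom container. A small sanity check along those lines is sketched below. The helper is hypothetical, it only inspects the root graph node of each predictor, and the toy deployment is a trimmed-down stand-in for `mldeployment`.

```python
def graph_names_match_containers(deployment):
    """Check that each predictor's root graph node names a declared container."""
    for predictor in deployment["spec"]["predictors"]:
        containers = {
            container["name"]
            for component in predictor.get("componentSpecs", [])
            for container in component["spec"]["containers"]
        }
        if predictor["graph"]["name"] not in containers:
            return False
    return True


# Trimmed-down toy deployment with the same shape as `mldeployment`
toy_deployment = {
    "spec": {
        "predictors": [
            {
                "componentSpecs": [
                    {"spec": {"containers": [{"name": "demo-prr-container"}]}}
                ],
                "graph": {"name": "demo-prr-container", "children": []},
            }
        ]
    }
}

print(graph_names_match_containers(toy_deployment))
```

In the notebook, the same check could be run as `graph_names_match_containers(mldeployment)` before creating the deployment.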
      

You can now invoke the SeldonDeploymentsApi and create a new Seldon Deployment.

Time for you to get your hands dirty. You will use the Seldon Deploy SDK to create a new Seldon deployment. You can find the reference documentation [here](https://github.com/SeldonIO/seldon-deploy-sdk/blob/master/python/README.md).

In [None]:
deployment_api = SeldonDeploymentsApi(auth())
deployment_api.create_seldon_deployment(namespace=NAMESPACE, mldeployment=mldeployment)

You can access the Seldon Deploy cluster and view your freshly created deployment here:

Again, remember to replace the XXXXX with your cluster IP.


* URL: http://XXXXX/seldon-deploy/
* Username: admin@seldon.io
* Password: 12341234

## Adding metadata and prediction schema

Seldon Deploy has a model catalog where all deployed models are automatically registered. The model catalog can store custom metadata as well as prediction schemas for your models.

Metadata promotes lineage from across different machine learning systems, aids knowledge transfer between teams, and allows for faster deployment. Meanwhile, prediction schemas allow Seldon Deploy to automatically profile tabular data into histograms, allowing for filtering on features to explore trends.

To help construct prediction schemas effectively, Seldon provides the ML Prediction Schema project.

In [None]:
prediction_schema = {
  "requests": [
    {
      "name": "Review",
      "type": "TEXT",
      "dataType": "STRING",
      "nCategories": "0",
      "categoryMap": {},
      "schema": [],
      "shape": []
    }
  ],
  "responses": [
    {
      "name": "Rating",
      "type": "CATEGORICAL",
      "dataType": "INT",
      "nCategories": "9",
      "categoryMap": {
        "0": "1.0",
        "1": "1.5",
        "2": "2.0",
        "3": "2.5",
        "4": "3.0",
        "5": "3.5",
        "6": "4.0",
        "7": "4.5",
        "8": "5.0"
      },
      "schema": [],
      "shape": []
    }
  ]
}
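Since the schema is written by hand, it is easy for `nCategories` and `categoryMap` to drift apart. The sketch below checks one categorical entry for internal consistency. The helper and the rule that keys run from 0 to `nCategories - 1` are assumptions that simply mirror the schema above; they are not taken from the ML Prediction Schema specification.

```python
def check_categorical_feature(feature):
    """Return a list of problems found in one CATEGORICAL schema entry."""
    problems = []
    n_categories = int(feature["nCategories"])
    category_map = feature["categoryMap"]
    if len(category_map) != n_categories:
        problems.append("nCategories does not match the size of categoryMap")
    expected_keys = {str(i) for i in range(n_categories)}
    if set(category_map) != expected_keys:
        problems.append("categoryMap keys are not 0..nCategories-1")
    return problems


# Same response entry as above, rebuilt so this snippet runs on its own
ratings = ["1.0", "1.5", "2.0", "2.5", "3.0", "3.5", "4.0", "4.5", "5.0"]
rating_feature = {
    "name": "Rating",
    "type": "CATEGORICAL",
    "dataType": "INT",
    "nCategories": "9",
    "categoryMap": {str(i): rating for i, rating in enumerate(ratings)},
}

print(check_categorical_feature(rating_feature))
```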

You then add the prediction schema to the wider model catalog metadata. This includes information such as the model storage location, the name, who authored the model etc. The metadata tags and metrics which can be associated with a model are freeform and can therefore be determined based upon the use case which is being developed. 

In [None]:
model_catalog_metadata = {
      "URI": CONTAINER_NAME,
      "name": f"{DEPLOYMENT_NAME}-model",
      "version": "v1.0",
      "artifactType": "CUSTOM",
      "taskType": "Product review rating classification",
      "tags": {
        "auto_created": "true",
        "author": f"{YOUR_NAME}"
      },
      "metrics": {
          "accuracy": f"{accuracy}",
          "loss": f"{loss}"
      },
      "project": "default",
      "prediction_schema": prediction_schema
    }

model_catalog_metadata

Next, using the metadata API you can add this to the model which you have just created in Seldon.

In [None]:
metadata_api = ModelMetadataServiceApi(auth())
metadata_api.model_metadata_service_update_model_metadata(model_catalog_metadata)

You can then list the metadata via the API, or view it in the UI, to confirm that it has been successfully added to the model. 

In [None]:
metadata_response = metadata_api.model_metadata_service_list_model_metadata(uri=CONTAINER_NAME)
metadata_response

## Making Predictions

Now you can have a go at sending some requests to your model using the 'Predict' tab in the UI. 

Here is an example of a positive review that we would expect to correspond to a higher rating:

```
{
    "data": {
        "ndarray": ["_product_ is excellent! I love it, it's great!"]
    }
}
```

And here is an example of a negative review that we would expect to correspond to a lower rating:

```
{
    "data": {
        "ndarray": ["_product_ was terrible, I would not use it again, it was awful!"]
    }
}
```

## A/B testing - Deploying a Canary Model

Seldon supports a number of advanced deployment patterns, including Multi-armed bandits, A/B testing, Canary deployments and Shadow deployments. 

Here, we will deploy a second Hugging Face transformer model as a Canary deployment and direct 30% of traffic to it.

Again to save time during the workshop, we will use a pre-trained model. This model is almost identical to the previously deployed model - I slightly increased the learning rate and added an extra epoch to change it slightly:



```
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=8e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)

model.fit(tf_train_set, validation_data=tf_test_set, epochs=4)
```

This is where we can take advantage of the reusable nature of the custom container images as we can use the same custom container image that we saw earlier and just pass a different value to the ```model_path``` predictor parameter.

We will deploy our Canary model using the Deployment Wizard on the UI. 

We need to set the following field values:

**'Add a Canary'** tab:

* Runtime - "Custom"
* Docker Image - `tomfarrand/review-ratings:0.3`
* Canary Traffic Percentage - 30%

**'Predictor Parameters'** tab:

Use the light green '+' to add a new predictor parameter:

- Name - `model_path`
- Value - `nlp-ratings/canary/1/`

**'Resource Limits'** tab:

- Ensure all CPU fields are set to 1.0 and all memory fields are set to 2Gi.

Now you can click through the remaining tabs and launch the updated Seldon Deployment.

Once our canary model is available, we can make predictions and then view live requests and resource monitoring on the "Dashboard" tab in the Deployment UI for both models running in production. 

We will assume that this model has been running in production for some time and it is performing better than our default model. 

We can then finally go ahead and promote the canary model to be the default model in the deployment. 


## Drift Detection

Although powerful, modern machine learning models can be sensitive. Seemingly subtle changes in a data distribution can destroy the performance of otherwise state-of-the-art models, which can be especially problematic when ML models are deployed in production. Typically, ML models are tested on held-out data in order to estimate their future performance. Crucially, this assumes that the process underlying the input data `X` and output data `Y` remains constant.

Drift can be classified into the following types:
* **Covariate drift**: Also referred to as input drift, this occurs when the distribution of the input data has shifted `P(X) != Pref(X)`, whilst `P(Y|X) = Pref(Y|X)`. This may result in the model giving unreliable predictions.

* **Prior drift**: Also referred to as label drift, this occurs when the distribution of the outputs has shifted `P(Y) != Pref(Y)`, whilst `P(X|Y) = Pref(X|Y)`. This can affect the model’s decision boundary, as well as the model’s performance metrics.

* **Concept drift**: This occurs when the process generating `Y` from `X` has changed, such that `P(Y|X) != Pref(Y|X)`. It is possible that the model might no longer give a suitable approximation of the true process.

-----------------

In this example you will use the Maximum Mean Discrepancy (MMD) method. Covariate or input drift detection relies on a distance measure between two distributions: a reference distribution and a new distribution. The MMD drift detector is no different: the mean embeddings of your features are used to generate the distributions, and then the distance between them is measured. The training data is used to calculate the reference distribution, while the new distribution comes from your inference data.
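To make the idea concrete, here is a minimal pure-Python sketch of an MMD test on toy two-dimensional data. It assumes an RBF kernel with a fixed bandwidth, uses the simple biased estimate of the squared MMD, and obtains the p-value from a permutation test. The function names and toy data are invented for this illustration, and this is not the Alibi Detect implementation, which is considerably more efficient.

```python
import math
import random


def rbf_kernel(a, b, sigma=1.0):
    """Similarity between two points: 1.0 when identical, towards 0.0 when far apart."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in x for b in x) / len(x) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in y for b in y) / len(y) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in x for b in y) / (len(x) * len(y))
    return k_xx + k_yy - 2 * k_xy


def permutation_test(x, y, n_permutations=100, seed=0):
    """Return the observed distance and the share of shuffled splits at least as large."""
    rng = random.Random(seed)
    observed = mmd2(x, y)
    pooled = x + y
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(x)], pooled[len(x):]) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations


def sample(rng, n, mean):
    """Draw n two-dimensional points from a unit-variance Gaussian centred on `mean`."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


toy_rng = random.Random(0)
toy_ref = sample(toy_rng, 30, mean=0.0)      # stands in for the reference data
toy_same = sample(toy_rng, 30, mean=0.0)     # new data from the same distribution
toy_shifted = sample(toy_rng, 30, mean=3.0)  # new data after a clear shift

dist_same, p_same = permutation_test(toy_ref, toy_same)
dist_shifted, p_shifted = permutation_test(toy_ref, toy_shifted)
print(f"same distribution:    distance={dist_same:.3f}, p-value={p_same:.2f}")
print(f"shifted distribution: distance={dist_shifted:.3f}, p-value={p_shifted:.2f}")
```

The shifted batch produces a much larger distance than the batch drawn from the reference distribution, and its p-value falls below the usual 0.05 threshold.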


We will use the first 1,000 processed reviews from the training set as our reference data. Once the pre-processing is in place, creating the drift detector itself takes only a single line of code.

In [None]:
x_ref = train["processed_review"][:1000].to_list()

Alibi Detect comes with a range of transformers out of the box, which can be readily leveraged to generate embeddings from the text that is passed to the model. In this case we make use of a DistilBERT model. 

Crucially, the pre-processing which we perform when feeding data to our drift detector does not necessarily have to match the pre-processing being used by the model. This means that the embedding method which generates the best results for the model and the drift detector can be controlled independently of one another. 

In [None]:
from alibi_detect.models.tensorflow import TransformerEmbedding

emb_type = 'hidden_state'
n_layers = 1
layers = [-_ for _ in range(1, n_layers + 1)]

embedding = TransformerEmbedding("distilbert-base-uncased", emb_type, layers)

In [None]:
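# Tokenise the first five reviews (not the first five characters of a single review);
# truncation keeps long reviews within the model's maximum sequence length
tokens = tokenizer(x_ref[:5], padding="max_length", truncation=True, return_tensors='tf')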
x_emb = embedding(tokens)
print(x_emb.shape)

The embedding returns a single 768-length vector for each product review, obtained by averaging the hidden states over the tokens. If these were passed directly to the drift detector, this would result in a massive computational task due to the sheer number of dimensions.

Therefore, the embeddings are passed through an untrained autoencoder. The autoencoder performs dimensionality reduction, reducing each 768-length vector to 32 dimensions. This allows drift to be calculated in a near-real-time setting. 

In [None]:
from alibi_detect.cd.tensorflow import UAE

enc_dim = 32
shape = (x_emb.shape[1],)

uae = UAE(input_layer=embedding, shape=shape, enc_dim=enc_dim)
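It may seem surprising that an encoder which has never been trained is useful at all. The sketch below illustrates the dimensionality reduction step with a single random linear layer in pure Python. This is a simplified stand-in and not the network that `UAE` builds; `make_untrained_encoder` and the toy vector are assumptions made for this illustration.

```python
import random


def make_untrained_encoder(input_dim, enc_dim, seed=0):
    """Build a fixed random linear map from input_dim to enc_dim dimensions."""
    weight_rng = random.Random(seed)
    scale = 1.0 / input_dim ** 0.5
    # Weights are drawn once at random and never updated, i.e. "untrained".
    weights = [
        [weight_rng.gauss(0.0, scale) for _ in range(enc_dim)]
        for _ in range(input_dim)
    ]

    def encode(vector):
        # Each output dimension is a weighted sum of all input dimensions.
        return [
            sum(value * row[j] for value, row in zip(vector, weights))
            for j in range(enc_dim)
        ]

    return encode


toy_encode = make_untrained_encoder(input_dim=768, enc_dim=32)
vector_rng = random.Random(1)
toy_vector = [vector_rng.gauss(0.0, 1.0) for _ in range(768)]  # stands in for one embedding
toy_reduced = toy_encode(toy_vector)
print(len(toy_vector), "->", len(toy_reduced))
```

Because the same fixed map is applied to both the reference data and the new data, a change in the input distribution still shows up as a change in the 32-dimensional outputs, and the detector has far fewer dimensions to compare.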

Alibi Detect then allows us to construct a preprocessing function by bringing together each of these different components into a single function which can be readily serialised. 

In [None]:
from functools import partial
from alibi_detect.cd.tensorflow import preprocess_drift

# define preprocessing function
preprocess_fn = partial(preprocess_drift, model=uae, tokenizer=tokenizer, max_len=512, batch_size=32)

We can now fit the MMD drift detector on the reference data set. 

In [None]:
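from alibi_detect.cd import MMDDrift

# Fit the detector on the reference data; drift is flagged when the p-value falls below p_val
cd = MMDDrift(x_ref, p_val=.05, preprocess_fn=preprocess_fn, input_shape=(512,))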

We can then observe whether drift is flagged on two separate batches of data: 
- `batch_0` containing the first 100 data points from the reference set. 
- `batch_1` containing a single example repeated 100 times. 

In [None]:
batch_0 = x_ref[:100]
batch_1 = [x_ref[0]] * 100

In [None]:
preds_cd = cd.predict(batch_0)
labels = ['No!', 'Yes!']
print('Drift? {}'.format(labels[preds_cd['data']['is_drift']]))
print('p-value: {}'.format(preds_cd['data']['p_val']))
print('Drift Distance: {}'.format(preds_cd['data']['distance']))

In [None]:
preds_cd = cd.predict(batch_1)
labels = ['No!', 'Yes!']
print('Drift? {}'.format(labels[preds_cd['data']['is_drift']]))
print('p-value: {}'.format(preds_cd['data']['p_val']))
print('Drift Distance: {}'.format(preds_cd['data']['distance']))
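Since the same three print statements are repeated for every batch, you may find it handy to wrap them in a small helper. The sketch below assumes only the `is_drift`, `p_val` and `distance` keys used in the cells above; `summarise_drift` is a hypothetical name, and the example dictionary contains made-up values rather than real detector output.

```python
def summarise_drift(preds, labels=("No!", "Yes!")):
    """Format the drift flag, p-value and distance from a prediction dictionary."""
    data = preds["data"]
    return (
        f"Drift? {labels[data['is_drift']]} | "
        f"p-value: {data['p_val']:.3f} | "
        f"distance: {data['distance']:.4f}"
    )


# Made-up values, purely to show the output format.
example_preds = {"data": {"is_drift": 1, "p_val": 0.0, "distance": 0.25}}
print(summarise_drift(example_preds))
```

In the notebook you would call it as `print(summarise_drift(cd.predict(batch_1)))`.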

Now we will go ahead and save our drift detector. For ease, we will save it to my public Google Storage bucket, ```kelly-seldon```, in a subfolder corresponding to the ```YOUR_NAME``` variable.

In [None]:
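from alibi_detect.utils.saving import save_detector

# Serialise the fitted detector to a local folder before copying it to the bucket
save_detector(cd, "reviews-dd")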

In [None]:
!gsutil cp -r reviews-dd gs://kelly-seldon/nlp-ratings/dd/{YOUR_NAME}/reviews-dd

## Deploy the Drift Detector

To deploy the drift detector you will use Seldon Deploy's user interface. Simply navigate to your deployment, and select the "Create" button for your drift detector.

This will bring up a form. Add your ```Detector Name```, the ```Storage URI```:

In [None]:
print(f"gs://kelly-seldon/nlp-ratings/dd/{YOUR_NAME}/reviews-dd/")

the ```Batch Size```:

* ```20```

and the `Drift Type`:

* `Batch`

The batch size configuration sets how many data points have to be sent to the endpoint before drift is calculated.
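The effect of the batch size can be pictured as a simple buffer that collects requests and only hands them to the detector once enough have arrived. The following is a minimal sketch of that behaviour, not Seldon's implementation; the `DriftBatchBuffer` class is a hypothetical name used for illustration.

```python
class DriftBatchBuffer:
    """Collects incoming data points and releases them in fixed-size batches."""

    def __init__(self, batch_size=20):
        self.batch_size = batch_size
        self._buffer = []

    def add(self, data_point):
        """Store a data point; return a full batch once enough have arrived, else None."""
        self._buffer.append(data_point)
        if len(self._buffer) < self.batch_size:
            return None
        full_batch, self._buffer = self._buffer, []
        return full_batch


request_buffer = DriftBatchBuffer(batch_size=20)
released_batches = []
for i in range(45):
    released = request_buffer.add(f"review {i}")
    if released is not None:
        # This is the point at which drift would be calculated on the batch.
        released_batches.append(released)

print(len(released_batches), "drift checks triggered after 45 requests")
```

With a batch size of 20, the first 19 requests produce no drift result at all, so expect the drift dashboard to stay empty until the twentieth data point has been sent.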

## Run a Batch Job

The simplest way to test that your drift detector and prediction schema are behaving correctly is to kick off a batch job. 

A batch of data has already been prepared and stored in a Minio bucket which Seldon can access, so all you have to do is provide the configuration outlined below. It lists all of the available parameters for educational purposes; not all of them are required. 

In [None]:
workflow = BatchJobDefinition(
    batch_data_type='data',
    batch_gateway_type='seldon',
    batch_interval=0.0,
    batch_method='predict',
    batch_payload_type='ndarray',
    batch_retries='3',
    batch_size=0,
    batch_transport_protocol='rest',
    batch_workers='3',
    input_data='minio://reviews/batch_examples.txt',
    object_store_secret_name='minio-bucket',
    output_data='minio://reviews/output_batch_examples.txt',
    pvc_size='1Gi'
)

You can then pass the above configuration to the batch job API and kick it off. 

In [None]:
batch_api = BatchJobsApi(auth())
batch_api.create_seldon_deployment_batch_job(DEPLOYMENT_NAME, NAMESPACE, workflow)

# Congratulations!

Thank you for sticking it out to the end of the workshop! 

As a recap you have done the following:

1. Explored and preprocessed a set of product review data
2. Run through the steps to train a Hugging Face TensorFlow transformer model
3. Deployed that model as a custom model using Seldon
4. Added Metadata and Prediction Schema to the model
5. Deployed a second model as a canary deployment and promoted that to the default model
6. Trained and deployed a drift detector to understand when your data changes
7. Run a batch job to see the drift detector in action

Not a bad list. Well done!