# BYOP Example

## Summary

In this notebook we will train a dummy `KMeans` model for clustering `cifar10` images, using a `VGG16` model pre-trained on ImageNet as a feature extractor. Finally, we will upload the model using the `Wallaroo SDK` and run inferences on unseen data.

> If you want to skip the model training and Python implementation steps, the model is also available in the model zoo [here](https://storage.cloud.google.com/wallaroo-model-zoo/model-auto-conversion/BYOP/vgg16_clustering.zip?authuser=0).

> 🗒️ As mentioned in the docs, `custom_model/` should contain the following:
> - all necessary model artifacts;
> - one or more Python files implementing the `Inference` and `InferenceBuilder` subclasses. The classes can have *any name* as long as they inherit from the appropriate base classes, and the Python file(s) can also be named arbitrarily;
> - a `requirements.txt` file with all the pip requirements needed to run the inference.
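
The layout described above can be checked locally before the folder is zipped. The following is a minimal sketch using only the standard library. The helper name `check_model_folder` is hypothetical and not part of the Wallaroo SDK, and it assumes the Python files sit at the top level of the folder. It only checks that files are present, not that the classes are implemented correctly.

```python
import tempfile
from pathlib import Path
from typing import List


def check_model_folder(folder: Path) -> List[str]:
    """Return a list of problems found in a model folder (empty if none)."""
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    # At least one Python file should hold the Inference/InferenceBuilder subclasses.
    if not any(folder.glob("*.py")):
        problems.append("no Python file found")
    # requirements.txt lists the pip packages needed at inference time.
    if not (folder / "requirements.txt").is_file():
        problems.append("requirements.txt is missing")
    # Every other file is treated as a model artifact (e.g. .pkl or .h5).
    artifacts = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix != ".py" and p.name != "requirements.txt"
    ]
    if not artifacts:
        problems.append("no model artifacts found")
    return problems


# Demo on a throwaway folder filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    problems_before = check_model_folder(demo)
    for name in ("custom_inference.py", "requirements.txt", "kmeans.pkl"):
        (demo / name).touch()
    problems_after = check_model_folder(demo)

print(problems_before)
print(problems_after)
```

In this notebook the folder to check would be `vgg16_clustering/`, once all files have been saved there.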

## Imports

In [2]:
import numpy as np
import pandas as pd
import json
import os
import pickle
import pyarrow as pa
import tensorflow as tf
import wallaroo

from sklearn.cluster import KMeans
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import Model
from tensorflow.keras.layers import Flatten
from wallaroo.pipeline import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework

wl = wallaroo.Client(auth_type="sso", interactive=True)

## Model training

### Load dataset

In [3]:
# Load and preprocess the CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Normalize the pixel values to be between 0 and 1
X_train = X_train / 255.0
X_test = X_test / 255.0

In [4]:
X_train.shape

(50000, 32, 32, 3)

### Train KMeans with VGG16 as feature extractor

In [5]:
pretrained_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
embedding_model = Model(inputs=pretrained_model.input, outputs=Flatten()(pretrained_model.output))

2023-06-28 16:32:29.023010: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-06-28 16:32:29.023038: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-06-28 16:32:29.023060: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (jupyter-john-2ehummel-40wallaroo-2eai): /proc/driver/nvidia/version does not exist
2023-06-28 16:32:29.028418: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [6]:
X_train_embeddings = embedding_model.predict(X_train[:100])
X_test_embeddings = embedding_model.predict(X_test[:100])

In [7]:
kmeans = KMeans(n_clusters=2, random_state=0).fit(X_train_embeddings)



### Save models

Let's first create the directory where the model artifacts will be saved:

In [8]:
os.makedirs("vgg16_clustering/", exist_ok=True)

And now save the two models:

In [9]:
with open('vgg16_clustering/kmeans.pkl', 'wb') as fp:
    pickle.dump(kmeans, fp)

In [10]:
embedding_model.save("vgg16_clustering/feature_extractor.h5")





> All needed model artifacts have now been saved under `vgg16_clustering/`.

## Extend Inference & InferenceBuilder to serve a custom inference

**Note:** First we need to make sure that `mac` is included in our current Wallaroo SDK installation.
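
One way to perform this check from the notebook is to look the module up without importing it. This is a minimal sketch using the standard library; the helper name `is_installed` is hypothetical, and `mac` is taken from the import statements in the implementation below.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# `mac` is the top-level module used by the custom inference imports.
print("mac available:", is_installed("mac"))
```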

After making sure that is the case, we can now extend `mac` to serve a custom inference with the pre-trained `VGG16` model as a feature extractor and the trained `KMeans` model for clustering as such:

```python
"""This module features an example implementation of a custom Inference and its
corresponding InferenceBuilder."""

import pathlib
import pickle
from typing import Any, Set

import tensorflow as tf
from mac.config.inference import CustomInferenceConfig
from mac.inference import Inference
from mac.inference.creation import InferenceBuilder
from mac.types import InferenceData
from sklearn.cluster import KMeans


class ImageClustering(Inference):
    """Inference class for image clustering that uses an
    ImageNet-pre-trained VGG16 model as a feature extractor
    and assigns clusters with a trained KMeans model.

    Attributes:
        - feature_extractor: The embedding model we will use
        as a feature extractor (i.e. a trained VGG16).
        - expected_model_types: A set of model instance types that are expected by this inference.
        - model: The model on which the inference is calculated.
    """

    def __init__(self, feature_extractor: tf.keras.Model):
        self.feature_extractor = feature_extractor
        super().__init__()

    @property
    def expected_model_types(self) -> Set[Any]:
        return {KMeans}

    @Inference.model.setter  # type: ignore
    def model(self, model) -> None:
        """Sets the model on which the inference is calculated.

        :param model: A model instance on which the inference is calculated.

        :raises TypeError: If the model is not an instance of expected_model_types
            (i.e. KMeans).
        """
        self._raise_error_if_model_is_wrong_type(model) # this will make sure an error will be raised if the model is of wrong type
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        """Calculates the inference on the given input data.
        This is the core function that each subclass needs to implement
        in order to calculate the inference.

        :param input_data: The input data on which the inference is calculated.
        It is of type InferenceData, meaning it comes as a dictionary of numpy
        arrays.

        :raises InferenceDataValidationError: If the input data is not valid.
        Ideally, every subclass should raise this error if the input data is not valid.

        :return: The output of the model, that is a dictionary of numpy arrays.
        """

        # input_data maps to the input_schema we have defined
        # with PyArrow, coming as a dictionary of numpy arrays
        inputs = input_data["images"]

        # Forward inputs to the models
        embeddings = self.feature_extractor(inputs)
        predictions = self.model.predict(embeddings.numpy())

        # Return predictions as dictionary of numpy arrays
        return {"predictions": predictions}


class ImageClusteringBuilder(InferenceBuilder):
    """InferenceBuilder subclass for ImageClustering that loads
    an ImageNet-pre-trained VGG16 feature extractor and a trained
    KMeans model, and creates an ImageClustering object."""

    @property
    def inference(self) -> ImageClustering:
        return ImageClustering

    def create(self, config: CustomInferenceConfig) -> ImageClustering:
        """Creates an Inference subclass and assigns a model and additionally
        needed attributes to it.

        :param config: Custom inference configuration. In particular, we're
        interested in `config.model_path` that is a pathlib.Path object
        pointing to the folder where the model artifacts are saved.
        Every artifact we need to load from this folder has to be
        relative to `config.model_path`.

        :return: A custom Inference instance.
        """
        feature_extractor = self._load_feature_extractor(
            config.model_path / "feature_extractor.h5"
        )
        inference = self.inference(feature_extractor)
        model = self._load_model(config.model_path / "kmeans.pkl")
        inference.model = model

        return inference

    def _load_feature_extractor(
        self, file_path: pathlib.Path
    ) -> tf.keras.Model:
        return tf.keras.models.load_model(file_path)

    def _load_model(self, file_path: pathlib.Path) -> KMeans:
        with open(file_path.as_posix(), "rb") as fp:
            model = pickle.load(fp)
        return model
```

> `config.model_path` points to the zipped model folder we're going to upload via `wl.upload_model()`,
> i.e. the `vgg16_clustering.zip` file we'll create in the "Zip model folder" step below.
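
Inside `create()`, artifact paths are built by joining file names onto `config.model_path` with the `/` operator from `pathlib`. The following is a minimal sketch of that operator. The directory `/tmp/unpacked_model` is a made-up stand-in, since the real value is supplied through the `CustomInferenceConfig` object.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for config.model_path.
model_path = PurePosixPath("/tmp/unpacked_model")

# The `/` operator appends a path component and returns a new path object.
kmeans_path = model_path / "kmeans.pkl"
extractor_path = model_path / "feature_extractor.h5"

print(kmeans_path)     # /tmp/unpacked_model/kmeans.pkl
print(extractor_path)  # /tmp/unpacked_model/feature_extractor.h5
```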

We can save this implementation to a Python file (e.g. `custom_inference.py`) inside the `vgg16_clustering/` folder.

### Create requirements.txt

As a last step we need to create a `requirements.txt` file and save it under `vgg16_clustering/`. The file should contain all the pip requirements needed to run the inference. It should look like this:

```txt
tensorflow==2.8.0
scikit-learn==1.2.2
```
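
The file can also be written from the notebook itself. The following is a minimal sketch using `pathlib`; the helper name `write_requirements` is hypothetical. The demo writes into a temporary directory so it is safe to run anywhere, whereas in this notebook the target folder would be `vgg16_clustering/`.

```python
import tempfile
from pathlib import Path

# Pins copied from the requirements listed above.
REQUIREMENTS = ["tensorflow==2.8.0", "scikit-learn==1.2.2"]


def write_requirements(folder: Path) -> Path:
    """Write one pinned requirement per line and return the file path."""
    target = folder / "requirements.txt"
    target.write_text("\n".join(REQUIREMENTS) + "\n")
    return target


# Demo in a throwaway directory; use Path("vgg16_clustering") in the notebook.
with tempfile.TemporaryDirectory() as tmp:
    written = write_requirements(Path(tmp))
    content = written.read_text()

print(content)
```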

> **Attention**: Please make sure to align with the framework requirements mentioned in the docs, both when training your models and in `requirements.txt`. Otherwise, the inference is not guaranteed to run successfully.
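
One way to check this alignment is to compare the pins in `requirements.txt` against the packages installed in the training environment. The following is a minimal sketch using `importlib.metadata` from the standard library. The helper names are hypothetical, and only exact `name==version` pins are handled.

```python
from importlib import metadata
from typing import Dict, Optional


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, skipping blanks, comments and other specifiers."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


pins = parse_pins("tensorflow==2.8.0\nscikit-learn==1.2.2\n")
for name, wanted in pins.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "MISMATCH"
    print(f"{name}: pinned {wanted}, installed {found} -> {status}")
```

Running this in the environment used for training shows at a glance whether the pickled `KMeans` model and the saved Keras model were produced with the pinned versions.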

### Zip model folder

Assuming we have stored the following files inside `vgg16_clustering/`:
1. `feature_extractor.h5`
2. `kmeans.pkl`
3. `custom_inference.py`
4. `requirements.txt`

as a final step we need to zip the folder via the terminal as follows:

`zip -r vgg16_clustering.zip vgg16_clustering/`
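
If the `zip` command is not available, an equivalent archive can be produced from Python. The following is a minimal sketch using `shutil.make_archive`. The helper name `zip_model_folder` is hypothetical, and the demo runs on a throwaway folder with empty placeholder files rather than on the real artifacts.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_model_folder(folder: Path) -> Path:
    """Create <folder>.zip next to the folder, keeping the folder as top-level entry."""
    folder = folder.resolve()
    archive = shutil.make_archive(
        base_name=str(folder),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder.parent),  # archive paths are relative to this directory
        base_dir=folder.name,         # only this folder is archived
    )
    return Path(archive)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "vgg16_clustering"
    model_dir.mkdir()
    for name in ("kmeans.pkl", "feature_extractor.h5",
                 "custom_inference.py", "requirements.txt"):
        (model_dir / name).touch()

    archive_path = zip_model_folder(model_dir)
    archive_name = archive_path.name
    with zipfile.ZipFile(archive_path) as zf:
        names = sorted(zf.namelist())

print(archive_name)
print(names)
```

In the notebook, calling `zip_model_folder(Path("vgg16_clustering"))` would write `vgg16_clustering.zip` into the current working directory.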

> The custom model can now be uploaded via `wl.upload_model()` and be deployed as a pipeline step.

## Configure & Upload Model

### Get Framework for the `custom` inference

Let's see what frameworks are supported via the `Framework` Enum:

In [None]:
[e.value for e in Framework]

The appropriate one for serving a `custom` inference is the following:

In [None]:
Framework.CUSTOM

### Configure PyArrow Schema

> `input_schema` and `output_schema` should match exactly the data we're expecting to retrieve and return within `ImageClustering._predict()`.

In [11]:
input_schema = pa.schema([
    pa.field('images', pa.list_(
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=3
            ),
            list_size=32
        ),
        list_size=32
    )),
])

output_schema = pa.schema([
    pa.field('predictions', pa.int64()),
])
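
Because the input schema fixes every image to 32 x 32 x 3, it can be useful to check the shape of an input locally before sending it to the pipeline. The following is a minimal pure-Python sketch. The helper `has_shape` is hypothetical and not part of PyArrow or the Wallaroo SDK.

```python
from typing import Tuple


def has_shape(data, shape: Tuple[int, ...]) -> bool:
    """Return True if `data` is a nested list/tuple with exactly the given shape."""
    if not shape:
        # Leaf level: expect a scalar, not another sequence.
        return not isinstance(data, (list, tuple))
    if not isinstance(data, (list, tuple)) or len(data) != shape[0]:
        return False
    return all(has_shape(item, shape[1:]) for item in data)


# One 32x32 image with 3 channels, filled with zeros.
image = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
print(has_shape(image, (32, 32, 3)))       # True
print(has_shape(image[:-1], (32, 32, 3)))  # False: only 31 rows
```

A NumPy array would need to be converted with `.tolist()` first, since this sketch only inspects plain lists and tuples.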

### Upload Model

In [18]:
model = wl.upload_model('vgg16-clustering', 'model-auto-conversion_BYOP_vgg16_clustering.zip', framework=Framework.CUSTOM, input_schema=input_schema, output_schema=output_schema, convert_wait=True)
model

Waiting for model conversion... It may take up to 10.0min.
Model is Pending conversion..Converting.................Ready.


{'name': 'vgg16-clustering', 'version': '819114a4-c1a4-43b4-b66b-a486e05a867f', 'file_name': 'model-auto-conversion_BYOP_vgg16_clustering.zip', 'image_path': 'proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mlflow-deploy:v2023.3.0-main-3443', 'last_update_time': datetime.datetime(2023, 6, 28, 16, 54, 38, 299848, tzinfo=tzutc())}

## Deploy Pipeline

In [13]:
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('4Gi') \
    .build()

In [14]:
pipeline_name = "vgg16-clustering-pipeline"
pipeline = wl.build_pipeline(pipeline_name)
pipeline.add_model_step(model)

pipeline.deploy(deployment_config=deployment_config)
pipeline.status()

Waiting for deployment - this will take up to 90s ............................. ok


{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.29.63',
   'name': 'engine-dfd47ffbc-gs9b5',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'vgg16-clustering-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'vgg16-clustering',
      'version': '5f3f4a0e-8921-4e36-b3af-ee32dec77314',
      'sha': 'f5f5e1ab29057ac750b7b7afefd6fb16c789b22c3291a966597a5d9846eb1c53',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.29.62',
   'name': 'engine-lb-584f54c899-7m4dz',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.4.22',
   'name': 'engine-sidekick-vgg16-clustering-46-6cb499d45b-tmfkk',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}
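
The status dictionary shown above can also be checked programmatically before running inference. The following is a minimal sketch, assuming the dictionary has the same keys as in the output above (`status`, `engines`, `engine_lbs`, `sidekicks`). The helper `is_fully_running` is hypothetical and not part of the Wallaroo SDK.

```python
from typing import Any, Dict


def is_fully_running(status: Dict[str, Any]) -> bool:
    """Return True if the pipeline and all listed components report 'Running'."""
    if status.get("status") != "Running":
        return False
    for group in ("engines", "engine_lbs", "sidekicks"):
        for component in status.get(group, []):
            if component.get("status") != "Running":
                return False
    return True


# Trimmed-down example with the same structure as the output above.
example = {
    "status": "Running",
    "engines": [{"name": "engine-dfd47ffbc-gs9b5", "status": "Running"}],
    "engine_lbs": [{"name": "engine-lb-584f54c899-7m4dz", "status": "Running"}],
    "sidekicks": [],
}
print(is_fully_running(example))  # True
```

In the notebook, the check would be `is_fully_running(pipeline.status())`.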

## Run inference

In [15]:
input_data = {
        "images": [np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)] * 2,
}
dataframe = pd.DataFrame(input_data)
dataframe

| | images |
|---|---|
| 0 | [[[0, 42, 244], [163, 88, 141], [195, 14, 131]... |
| 1 | [[[0, 42, 244], [163, 88, 141], [195, 14, 131]... |


In [16]:
%%time
pipeline.infer(dataframe, timeout=10000)


| | time | in.images | out.predictions | check_failures |
|---|---|---|---|---|
| 0 | 2023-06-28 16:37:46.068 | [0, 42, 244, 163, 88, 141, 195, 14, 131, 89, 1... | 1 | 0 |
| 1 | 2023-06-28 16:37:46.068 | [0, 42, 244, 163, 88, 141, 195, 14, 131, 89, 1... | 1 | 0 |


## Undeploy Pipelines

In [17]:
for pipeline in wl.list_pipelines():
    pipeline.undeploy()

Waiting for undeployment - this will take up to 45s ...................................... ok
Waiting for undeployment - this will take up to 45s ..................................... ok
