# Model Deployment using Numpy Linear Classifier

## 1. Introduction
In this workbook, we look at the basics of deploying a model. The model is a simple numpy linear classifier $$ \mathbf{Y} = \mathbf{W} \mathbf{X} + \mathbf{b}$$

For simplicity, we take the input $\mathbf{X}$ to be 6-dimensional ($\mathbb{R}^6$), i.e. one data point $x \in \mathbf{X}$ is a numpy array of shape $(1,6)$. The output $\mathbf{Y}$ is 3-dimensional ($\mathbb{R}^3$). The weights $\mathbf{W}$ are therefore a numpy array of shape $(3,6)$ and the bias $\mathbf{b}$ is a numpy array of shape $(3,)$.
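As a quick shape check, the following minimal sketch evaluates the classifier for one data point and for a batch. It assumes the row-vector convention used by the inference code later in this workbook, where the product is computed as $x \mathbf{W}^\top + \mathbf{b}$; the variable names and the fixed seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

W = rng.random((3, 6))  # weights: 3 outputs, 6 input features
b = rng.random(3)       # bias: one value per output, shape (3,)

x = rng.random((1, 6))  # a single data point as a row vector
y = x @ W.T + b         # (1, 6) @ (6, 3) + (3,) -> (1, 3)

X_batch = rng.random((20, 6))  # 20 data points stacked row-wise
Y_batch = X_batch @ W.T + b    # the bias broadcasts over rows -> (20, 3)

print(y.shape, Y_batch.shape)
```

Each row of `Y_batch` equals `W @ x_i + b` for the corresponding row `x_i`, so the row-wise form matches the equation above.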

We will demonstrate how to serve this numpy linear classifier from a server and how to query it.

## 2. Imports and Dependencies
The required packages are loaded next. `numpy` and `mlflow` are used throughout this tutorial. The `requests` package sends queries to the server, and `json` encodes the request payload and decodes the response.

In [1]:
#https://stackoverflow.com/questions/61615818/setting-up-mlflow-on-google-colab
!pip install mlflow --quiet
!pip install pyngrok --quiet



In [2]:
import mlflow

with mlflow.start_run(run_name="MLflow on Colab"):
  mlflow.log_metric("m1", 2.0)
  mlflow.log_param("p1", "mlflow-colab")

# run tracking UI in the background
get_ipython().system_raw("mlflow ui --port 5000 &")


# create remote tunnel using ngrok.com to allow local port access
# borrowed from https://colab.research.google.com/github/alfozan/MLflow-GBRT-demo/blob/master/MLflow-GBRT-demo.ipynb#scrollTo=4h3bKHMYUIG6

from pyngrok import ngrok

# Terminate open tunnels if exist
ngrok.kill()

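# Setting the authtoken (optional)
# Get your authtoken from https://dashboard.ngrok.com/auth
# Read the token from the environment instead of hardcoding a secret in the notebook
# (the environment variable name NGROK_AUTH_TOKEN is an assumption)
import os
NGROK_AUTH_TOKEN = os.environ.get("NGROK_AUTH_TOKEN")
if NGROK_AUTH_TOKEN:
    ngrok.set_auth_token(NGROK_AUTH_TOKEN)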

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)

MLflow Tracking UI: https://b7ba-34-85-170-199.ngrok.io


In [3]:
import os
import sys
import mlflow
import numpy as np

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

## MLflow for experiment tracking and model deployment

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles four primary functions:

- Tracking experiments to record and compare parameters and results (MLflow Tracking).
- Packaging ML code in a reusable, reproducible form in order to share it with other data scientists or transfer it to production (MLflow Projects).
- Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
- Providing a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations (MLflow Model Registry).

More information [here](https://www.mlflow.org/docs/latest/index.html#)



![image.png](https://www.mlflow.org/docs/latest/_images/scenario_4.png)

- localhost maps to the server on which the current notebook is running

- Tracking server maps to the server at environment variable `TRACKING_URL` that can be printed using `os.environ.get("TRACKING_URL")`

- Create an mlflow client that communicates with the tracking server
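The lookup of the tracking server address can be sketched without MLflow itself. The helper `resolve_tracking_uri` below is hypothetical (it is not part of MLflow) and only illustrates what happens when `TRACKING_URL` is missing: `os.environ.get` returns `None`, in which case MLflow is expected to fall back to its default local store, which matches the `file:///content/mlruns/...` locations that appear in this workbook.

```python
import os


def resolve_tracking_uri(default=None):
    """Return TRACKING_URL if it is set and non-empty, else the given default."""
    value = os.environ.get("TRACKING_URL")
    return value if value else default


uri = resolve_tracking_uri()
if uri is None:
    print("TRACKING_URL is not set; MLflow will use its default local store")
else:
    print("Tracking server:", uri)
```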

In [4]:
from mlflow import pyfunc
from mlflow.tracking import MlflowClient

# Set the tracking URI so that runs are logged to the tracking server.
# If TRACKING_URL is unset, tracking_uri is None and the default local store is used.
tracking_uri = os.environ.get("TRACKING_URL")
client = MlflowClient(tracking_uri=tracking_uri)
mlflow.set_tracking_uri(tracking_uri)

## Create an experiment in mlflow database using mlflow client

- Get the list of all the experiments (Click on **Experiments** tab on the sidebar to see the list)
- Create a new experiment named *numpy_deployment* if it doesn't exist
- Set *numpy_deployment* as the new experiment under which different **runs** are tracked

## MLflow Entity Hierarchy

- Experiment 1
    - Run 1
        - Parameters
        - Metrics
        - Artifacts
            - Folder 1
                - File 1
                - File 2
            - Folder 2 
    - Run 2
    - Run 3

- Experiment 2
- Experiment 3        

In [5]:
# Create the experiment if it does not exist yet, then make it the active one
experiment_name = "numpy_deployment"
# Note: list_experiments() was removed in MLflow 2.x; use client.search_experiments() there
experiment_names = [exp.name for exp in client.list_experiments()]
if experiment_name not in experiment_names:
    mlflow.create_experiment(experiment_name)
mlflow.set_experiment(experiment_name)


<Experiment: artifact_location='file:///content/mlruns/1', experiment_id='1', lifecycle_stage='active', name='numpy_deployment', tags={}>

## Python Class for inference

- `ModelWrapper` is derived from `mlflow.pyfunc.PythonModel` ([more info](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html))
- The `load_context()` member function loads the model. In this case, it loads a numpy file containing two arrays, **weights** and **bias**
- The `predict()` member function receives the request data as a DataFrame, converts it to a numpy array, and returns the predictions as a JSON string
- An object of this class will be saved as a pickle file in blob storage
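The loading step relies on how numpy stores a Python dict. The following minimal sketch, using placeholder arrays and a temporary directory (both illustrative), shows the round trip: `np.save` wraps the dict in a 0-d object array, so `np.load` needs `allow_pickle=True`, and the dict has to be unwrapped afterwards.

```python
import os
import tempfile

import numpy as np

params = {"weights": np.ones((3, 6)), "bias": np.zeros(3)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "np_model")
    np.save(path, params)  # np.save appends ".npy" to the file name
    # The dict is stored as a pickled 0-d object array
    loaded = np.load(path + ".npy", allow_pickle=True)

restored = loaded.item()  # unwrap the 0-d array back into the dict
print(loaded.shape, sorted(restored))
```

Calling `.tolist()` on the 0-d array also returns the wrapped dict, which is the variant used in `load_context()`.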

In [6]:
# Model wrapper that loads the numpy weights and serves predictions
class ModelWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import numpy as np
        # The .npy file holds a dict wrapped in a 0-d object array; .item() unwraps it
        self.model = np.load(context.artifacts['model_path'], allow_pickle=True).item()
        print("Model initialized")

    def predict(self, context, model_input):
        import numpy as np
        import json
        # The client sends a JSON list with a text/csv content type, so the whole
        # payload arrives as the DataFrame's column names; re-join and parse them
        json_txt = ", ".join(model_input.columns)
        data_list = json.loads(json_txt)
        inputs = np.array(data_list)
        weights, bias = self.model['weights'], self.model['bias']
        if inputs.ndim == 2:
            print('batch inference')
            # (n, 6) @ (6, 3) + (3,) -> (n, 3)
            predictions = (np.matmul(inputs, weights.T) + bias).tolist()
        elif inputs.ndim == 1:
            print('single inference')
            # (6,) @ (6, 3) + (3,) -> (3,); a matrix product, not an elementwise one
            predictions = (np.matmul(inputs, weights.T) + bias).tolist()
        else:
            raise ValueError('invalid input shape')
        return json.dumps(predictions)
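The way `predict` recovers the request from `model_input.columns` can be illustrated without a server. The sketch below uses Python's `csv` module as a stand-in for the server-side CSV parsing, which is an assumption about how a `text/csv` body with a single line is handled: the line is split at every comma and the pieces become the header, i.e. the column names.

```python
import csv
import json

payload = json.dumps([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # one line of JSON text

# Parsing the line as CSV splits it at every comma; a CSV reader with a
# header row would treat these fields as the column names.
columns = next(csv.reader([payload]))
print(columns)

# Joining the fields with commas rebuilds valid JSON (extra spaces are harmless).
recovered = json.loads(", ".join(columns))
print(recovered)
```

This works because the JSON text contains no quotes or line breaks. It is a workaround rather than a general protocol, and a parser that renames duplicate column names could break it when two input values happen to be identical.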

## Register a model using mlflow

- Log user-defined parameters in a remote database through a remote server
- Create a model_wrapper object using ModelWrapper() class in the above cell
- Create a default conda environment that needs to be installed on the Docker container that serves the REST API
- Save the model object as a pickle file and conda environment as artifacts (files) in S3 or Blob Storage

In [7]:
# instantiate the python inference model wrapper for the server
model_wrapper = ModelWrapper()


# define the model weights randomly
np_weights = np.random.rand(3,6)
np_bias = np.random.rand(3)

# checkpointing and logging the model in mlflow
artifact_path = './np_model'
np.save(artifact_path, {'weights':np_weights, 'bias':np_bias})
model_artifacts = {"model_path" : artifact_path+'.npy'}

# Conda environment: the model is a plain pyfunc model, so use the pyfunc default
env = mlflow.pyfunc.get_default_conda_env()
with mlflow.start_run():
    mlflow.log_param("features",6)
    mlflow.log_param("labels",3)
    mlflow.pyfunc.log_model("np_model", python_model=model_wrapper, artifacts=model_artifacts, conda_env=env)
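Serving the model later requires the location of the logged artifacts. The sketch below composes that location for a local file store; the layout is inferred from the paths that appear in this workbook, and `local_model_uri` is a hypothetical helper, not an MLflow function. The run ID can be obtained by binding the run, e.g. `with mlflow.start_run() as run:` and reading `run.info.run_id`.

```python
def local_model_uri(artifact_location: str, run_id: str, artifact_path: str) -> str:
    """Compose the URI of a logged model inside a local file store."""
    return f"{artifact_location.rstrip('/')}/{run_id}/artifacts/{artifact_path}"


uri = local_model_uri(
    "file:///content/mlruns/1",          # artifact_location of the experiment
    "8a16c03605de48c08050cc64726d190f",  # run ID of the logging run
    "np_model",                          # first argument passed to log_model
)
print(uri)
```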

## 5. Use the Endpoint and Query from the server

There are two ways to query the server. The first uses the `requests` library and the second uses the `curl` shell command. The model server must be running before the query is sent.
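The `curl` route is not demonstrated in this workbook. As a hedged sketch, the snippet below builds a `curl` command that should be equivalent to the `requests` call, assuming the same URL, content type, and JSON payload; the sample data point is illustrative. The printed command can be pasted into a shell.

```python
import json
import shlex

url = "http://localhost:5002/invocations"
payload = json.dumps([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]])  # one 6-dimensional point

# shlex.join quotes each argument so the command is safe to paste into a shell
curl_command = shlex.join([
    "curl", "-X", "POST", url,
    "-H", "Content-Type: text/csv",
    "-d", payload,
])
print(curl_command)
```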

In [16]:
import requests
import json

################################################################################
# *** SET MODEL URL HERE BEFORE RUNNING THIS CELL ***
# The model server must already be running (see the `mlflow models serve` cells).
# Example: http://127.0.0.1:5000/invocations
url = "http://localhost:5002/invocations"
################################################################################

if not url:
    raise ValueError('Model URL not set! Please read instructions on how to deploy model, set the correct URL, and try again.')

# The JSON payload is sent as text/csv on purpose: ModelWrapper.predict
# rebuilds the JSON from the parsed CSV header
headers = {"Content-Type": "text/csv"}

def query_model(np_array):
    """POST a nested list to the server and print the decoded predictions."""
    response = requests.post(url, data=json.dumps(np_array), headers=headers)
    if response.status_code == 200:
        # predict returns a JSON string, so the response body is decoded twice
        output = np.array(json.loads(response.json())).astype(np.float32)
        print(output)
    else:
        print(response.status_code)
        print("REST API deployment is in progress -- please try again in a few minutes!")

# First case, run inference on single data point
query_model(np.random.rand(1, 6).tolist())

# Second case, run inference on multiple data points
query_model(np.random.rand(20, 6).tolist())


[[1.2642828 1.417612  1.5717524]]
[[1.7700912 1.3319434 2.044264 ]
 [2.5932946 1.8004078 3.0677977]
 [2.0653286 1.4503199 2.342546 ]
 [2.2049885 1.5345415 2.477913 ]
 [1.7142023 1.5396183 2.2291229]
 [2.2388797 1.4262313 2.5884798]
 [1.8614157 1.5010685 2.4475756]
 [1.7703766 1.569534  1.9948692]
 [1.5760703 1.6345834 2.1014514]
 [1.8516906 1.8806502 2.3388996]
 [1.3438103 1.2168763 1.6201899]
 [2.468498  1.741846  2.9562566]
 [2.3695126 1.6911329 2.5876129]
 [1.5991931 1.5471226 2.0437183]
 [2.4479113 1.6151091 2.8826597]
 [2.2748516 1.5989865 2.7707183]
 [1.502824  1.2764392 1.7962253]
 [1.848351  1.5082681 2.2302144]
 [2.0217803 1.6401912 2.5365727]
 [2.2394376 1.6383314 2.465779 ]]
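The client decodes the response twice (`json.loads(response.json())`). The following sketch reproduces why with the standard library only, under the assumption that the server JSON-serializes whatever `predict` returns; since `predict` already returns a JSON string, the body ends up encoded twice. The sample values are illustrative.

```python
import json

predictions = [[1.26, 1.41, 1.57]]

returned_by_predict = json.dumps(predictions)    # predict() returns a str
response_body = json.dumps(returned_by_predict)  # the str is serialized once more

first_pass = json.loads(response_body)  # what response.json() yields: still a str
second_pass = json.loads(first_pass)    # the actual nested list

print(type(first_pass).__name__, second_pass)
```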


In [14]:
!mlflow models serve -m file:///content/mlruns/1/8a16c03605de48c08050cc64726d190f/artifacts/np_model --port 5002 --no-conda

2022/04/09 05:04:29 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2022/04/09 05:04:29 INFO mlflow.pyfunc.backend: === Running command 'gunicorn --timeout=60 -b 127.0.0.1:5002 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2022-04-09 05:04:30 +0000] [351] [INFO] Starting gunicorn 20.1.0
[2022-04-09 05:04:30 +0000] [351] [INFO] Listening at: http://127.0.0.1:5002 (351)
[2022-04-09 05:04:30 +0000] [351] [INFO] Using worker: sync
[2022-04-09 05:04:30 +0000] [354] [INFO] Booting worker with pid: 354
Model initialized

[2022-04-09 05:04:40 +0000] [351] [INFO] Handling signal: int
Aborted!
[2022-04-09 05:04:41 +0000] [354] [INFO] Worker exiting (pid: 354)
[2022-04-09 05:04:41 +0000] [351] [INFO] Shutting down: Master


In [15]:
get_ipython().system_raw("mlflow models serve -m file:///content/mlruns/1/8a16c03605de48c08050cc64726d190f/artifacts/np_model --port 5002 --no-conda &") # run the model server in the background
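Because the server starts in the background, a query sent immediately afterwards may fail. One possible way to wait for it is to poll the port until it accepts TCP connections. The helper `wait_for_port` below is a hypothetical sketch using only the standard library; it checks that something is listening on the port, not that the model itself has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if wait_for_port("127.0.0.1", 5002, timeout=2.0):
    print("Server is accepting connections")
else:
    print("Server is not reachable yet")
```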
