# Deploying end-to-end machine learning workflows with HPE Ezmeral ML Ops - Lab 4

## Delivering the trained model to production

### **Lab workflow**

After you have trained your ML model and saved it to a file in the central project repository, it is time to move the model to production by creating a service layer to serve the model. You do that by creating a secure RESTful API service endpoint that client applications can access to use the model and make predictions. 

Delivering an ML model to production is a two-step process. We will cover both of these steps in this lab.

1. You will first register the trained model in the HPE Ezmeral ML Ops model registry by creating a ConfigMap resource in the Kubernetes cluster. The ConfigMap object stores metadata about the trained model that will be used to make predictions. It contains information such as the model name, a description, the version, the path to the trained model file (XGB.pickle.dat), and the scoring path. The scoring path points to a Python script (XGB_Scoring.py) that loads the saved model file from the central project repository and generates predictions from new data.

2. You will then deploy an ML Ops deployment engine application cluster as a component of the ML pipeline and attach the registered model to that cluster. The deployment engine cluster loads information about the registered model from the ConfigMap object. The deployment engine exposes a set of microservices behind a secure RESTful API that allows clients to consume the registered model and obtain predictions on new input data.

**Definitions:**

- *Model registry:* The trained model to be used is identified and characterized in HPE Ezmeral ML Ops by a Kubernetes ConfigMap object. The integrated model registry enables version tracking and seamless updates to models in production.

- *Model predictions:* The trained model you registered is deployed to a target ML Ops _"deployment engine"_ application cluster environment, which serves predictions and answers prediction queries.


### **1- Initialize the environment**

Let's first define the environment variables needed to execute this part of the lab.

In [None]:
#
# environment variables
#
studentId=$(grep hpecp-user $HOME/.kube/config | cut -d= -f2)

gateway_host="{{ HPEECPGWNAME }}"
Internet_access="{{ JPHOSTEXT }}"

DeploymentEngineApp="cr-cluster-mlops-endpoint-wrapper.yaml" # the Deployment engine KD App manifest you will deploy to query your model for answers 
PipelineConfigMap="mlops-pipeline-configmap.yaml" # ConfigMap manifest used to register the trained model version 1 
TrainingModel="mlops-model-${studentId}"

echo "Your studentId is: $studentId"
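
Before moving on, it can help to fail early if one of these values is missing. The helper below is a minimal sketch and not part of the lab: the function name `check_lab_env` is hypothetical, and it assumes the manifest files are reachable from the current working directory.

```shell
# Hypothetical helper (not part of the lab): verify that the student ID is set
# and that every manifest file passed as an argument exists.
check_lab_env() {
  local student_id="$1"
  shift
  if [ -z "$student_id" ]; then
    echo "studentId is empty: check the hpecp-user entry in your kubeconfig" >&2
    return 1
  fi
  local manifest
  for manifest in "$@"; do
    if [ ! -f "$manifest" ]; then
      echo "manifest not found: $manifest" >&2
      return 1
    fi
  done
  echo "environment OK for ${student_id}"
}
```

With the variables defined above, a call could look like `check_lab_env "$studentId" "$PipelineConfigMap" "$DeploymentEngineApp"`.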

### **2- Register your trained model**

You will need to register the trained model in the Kubernetes cluster model registry by creating a ConfigMap resource. The ML Ops deployment engine application will load the model information from the registry. The ConfigMap object stores metadata about the trained model to be used to make predictions. It contains information such as:
* the model name,
* a label: **kubedirector.hpe.com/cmType: "model"**,
* a description,
* a version number (for example, 1 for the first version of the model),
* the full path to the serialized trained model file (XGB.pickle.dat),
* the full path to the scoring (prediction) script (XGB_Scoring.py), which the deployment engine uses to load (deserialize) the model and make predictions from new data. This process is also known as **_scoring_**, hence the name of the Python script file.
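
To make the list above concrete, the sketch below writes a ConfigMap manifest of this general shape to a temporary file. It is an illustration under assumptions, not the lab's actual manifest: the keys under `data` (`name`, `description`, `model-version`, `path`, `scoring-path`), the object name `mlops-model-student1` and the `/example/...` directories are placeholders chosen here. Only the label and the two file names come from the text above.

```shell
# Illustrative only: the keys under "data" and the /example/... directories are
# assumptions; compare them with the real manifest used in this lab.
sketch_manifest="$(mktemp)"
cat > "$sketch_manifest" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: mlops-model-student1                  # hypothetical student ID
  labels:
    kubedirector.hpe.com/cmType: "model"      # marks the ConfigMap as a model registration
data:
  name: "mlops-model-student1"
  description: "Example model, first version"
  model-version: "1"
  path: "/example/models/XGB.pickle.dat"        # serialized trained model (placeholder directory)
  scoring-path: "/example/code/XGB_Scoring.py"  # scoring script (placeholder directory)
EOF

# Print the label line to confirm the file was written.
grep 'cmType' "$sketch_manifest"
```

If `kubectl` is configured, `kubectl apply --dry-run=client -f "$sketch_manifest"` is one way to check such a file without creating the object in the cluster.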

#### Create the ConfigMap resource using a YAML manifest file:
The `kubectl apply -f ManifestAppFile` command is used to create the ConfigMap resource in the Kubernetes cluster. The manifest is a YAML file that describes the registry information for the trained model.

In [None]:
cat $PipelineConfigMap

In [None]:
kubectl apply -f $PipelineConfigMap

In [None]:
kubectl get configmap $TrainingModel

### **3- Deploying your model to a deployment engine environment for model serving**

#### Create the manifest file and deploy an instance of the _deployment-engine_ ML Ops application:
You will now deploy an instance of the _**deployment-engine**_ ML Ops application. The deployment engine cluster environment is used to stand up a secure RESTful API service that will allow client applications to consume the registered model and draw predictions. 

>**Note:** _The deployment-engine ML Ops application supports common open source ML toolkits and mainstream frameworks such as scikit-learn, TensorFlow, PyTorch, XGBoost, pandas, NumPy and Flask for model serving._

Like any other containerized application deployment on Kubernetes, the `kubectl apply -f ManifestAppFile` command is used to deploy an instance of this ML Ops application. 

> **Note:** _The manifest file includes the **Connections** stanza, which attaches your model from the model registry (that is, the ConfigMap object) to the deployment engine cluster. The deployment engine cluster will load information about the registered model from the ConfigMap object into a JSON file (**/etc/guestconfig/configmeta.json**) within the deployment engine cluster containers._
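
For orientation, a manifest with a Connections stanza might look roughly like the sketch below, which writes an example to a temporary file. Treat it as an assumption-laden illustration rather than the lab's manifest: the `apiVersion`, the `connections.configmaps` layout, the role entries and the `student1` names are written from memory of typical KubeDirector manifests, and resource requests are left out. The manifest actually used in this lab is the one displayed by the `cat` cell that follows.

```shell
# Illustrative only: field layout and names are assumptions, and the sketch is
# incomplete (no resource requests), so it is not meant to be applied as-is.
sketch_cluster="$(mktemp)"
cat > "$sketch_cluster" <<'EOF'
apiVersion: "kubedirector.hpe.com/v1beta1"
kind: "KubeDirectorCluster"
metadata:
  name: "mlops-inference-server-student1"   # hypothetical student ID
spec:
  app: "deployment-engine"
  connections:
    configmaps:
      - "mlops-model-student1"              # name of the model ConfigMap to attach
  roles:
    - id: "RESTServer"
      members: 1
    - id: "LoadBalancer"
      members: 1
EOF

# Show the stanza that links the cluster to the registered model.
grep -A 2 'connections:' "$sketch_cluster"
```

The point of the sketch is the link between the two objects: the name listed under `configmaps` has to match the name of the ConfigMap created in the previous step.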

In [None]:
cat $DeploymentEngineApp

In [None]:
kubectl apply -f $DeploymentEngineApp

After a few seconds, you should get the response message: *kubedirectorcluster/Your-instance-name created*.  

### **4- Inspect the deployed ML Ops application instance** 
The deployment engine ML Ops application will be represented in the Kubernetes cluster by a custom resource of type **KubeDirectorCluster (kdcluster)**, with the name that was indicated inside the YAML file used to create it. 

In [None]:
clusterName="mlops-inference-server-${studentId}"
kubectl get kdcluster $clusterName

After creating the instance of the deployment-engine application, you can use the `kubectl describe kdcluster` command below to observe its status and any events logged against it.

The virtual cluster status indicates its overall "state" (top-level property of the status object). It should have a value of **"configured"**. 

> **Note:** _The first time a virtual cluster of a given ML Ops application type is created, it may take several minutes to reach its **"configured"** state, as the relevant Docker image must be downloaded and imported._ 

**>Run the `kubectl describe` command below and scroll down to the `Events` section to check the overall state of your virtual application cluster.**

**>Repeat the command below every minute or so until the virtual cluster is in the "configured" state.**

In [None]:
kubectl describe kdcluster $clusterName
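
Instead of rerunning the command by hand, the check can be scripted. The function below is a minimal sketch: the name `wait_for_kdcluster` and its default retry values are hypothetical, and it assumes that `kubectl` is configured for the lab cluster and that the state is readable at `.status.state`, in line with the description of the status object above.

```shell
# Sketch: poll the kdcluster until its top-level status state is "configured".
# Arguments: cluster name, maximum number of checks, seconds between checks.
wait_for_kdcluster() {
  local name="$1"
  local attempts="${2:-15}"
  local delay="${3:-60}"
  local i=1
  local state
  while [ "$i" -le "$attempts" ]; do
    # An empty result (resource not ready or kubectl error) is reported as "unknown".
    state=$(kubectl get kdcluster "$name" -o jsonpath='{.status.state}' 2>/dev/null || true)
    echo "check ${i}/${attempts}: state=${state:-unknown}"
    if [ "$state" = "configured" ]; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "kdcluster ${name} did not reach the configured state" >&2
  return 1
}
```

A call such as `wait_for_kdcluster "$clusterName"` would then check once a minute for up to 15 checks. `kubectl describe` remains the better tool when you need to read the logged events.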

You can use the `kubectl get pod,service,statefulset` command with a selector on the **kubedirector.hpe.com/kdcluster=YourClusterApplicationName** label to list the standard Kubernetes resources that compose the application virtual cluster:

In [None]:
kubectl get pod,service,statefulset -l kubedirector.hpe.com/kdcluster=$clusterName

Your instance of the application virtual cluster is made up of a **StatefulSet**, a **Pod** (a cluster node) and a **NodePort Service** per service role member (LoadBalancer, RESTServer), plus a **headless service** for the application cluster.

* The ClusterIP service is the headless service that a Kubernetes StatefulSet requires. It gives the Pods a stable network identity (that is, Pod hostnames persist when Pods are rescheduled).
* The NodePort services expose the LoadBalancer and RESTServer application services with token-based authorization outside the Kubernetes cluster. 
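
To see which service is which, the service type and node ports can be printed side by side. The function below is a sketch under assumptions: the name `list_kdcluster_services` is hypothetical, and it assumes `kubectl` is configured for the lab cluster. It relies only on the label selector already used above and on standard Service fields.

```shell
# Sketch: print name, type and node port(s) of each Service that belongs to
# the virtual cluster, one Service per line, separated by tabs.
list_kdcluster_services() {
  local name="$1"
  kubectl get service \
    -l "kubedirector.hpe.com/kdcluster=${name}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.type}{"\t"}{.spec.ports[*].nodePort}{"\n"}{end}'
}
```

Calling `list_kdcluster_services "$clusterName"` should list the NodePort services with their port numbers, while the headless service appears with type ClusterIP and an empty port column.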

Now, follow the instructions in Lab 5 to serve prediction queries.

* [Lab 5 Model Serving](5-WKSHP-MLOps-K8s-Model-Serving.ipynb)

## Summary

In this lab, you learned how you can deliver a trained model to production and make it available for answering prediction queries. You first registered the trained model in the Kubernetes cluster with relevant model information in a ConfigMap resource. You then deployed the registered model to a target deployment engine environment that exposes a RESTful API service endpoint to serve predictions.