# Deploying end-to-end machine learning workflows with HPE Ezmeral ML Ops - Lab 5
## Model serving

### **Lab workflow:**
You have developed and trained a model from a dataset, registered the trained model, and deployed it in a deployment engine ML Ops application cluster. Now it is time to make predictions (for example, _how long will my taxi ride take?_) from new input data (that is, new data points).
The ML Ops deployment engine cluster you just deployed is a set of microservices that serves online predictions through a secure RESTful API. It exposes network service endpoints, a **LoadBalancer and a scalable RESTServer**, with token-based authorization. Application clients can use these endpoints to send REST API queries to your model and get predictions back.

In this lab:

1. You will use kubectl commands in the context of your tenant user account to get the LoadBalancer network service endpoint and the authentication token of the deployment engine application cluster you have just deployed.

2. You will then use **_cURL_** as an HTTP client to make REST API queries to your model through the API service exposed by the ML Ops deployment engine cluster.


**Definitions:**

- *Model predictions:* The trained model you registered is deployed to a target _"deployment engine"_ ML Ops application virtual cluster in the Kubernetes cluster, where it serves predictions by answering prediction queries.

- *Scoring:* Scoring is the process of generating predicted values from new input data. The scoring script you specified in the model registry for your trained model loads the trained model, reads the input data received through the deployment engine REST API endpoint, and makes predictions.

### **1- Initialize the environment**

Let's first define the environment variables needed to execute this part of the lab.

```bash
#
# environment variables
#

studentId=$(grep hpecp-user "$HOME/.kube/config" | cut -d= -f2)

gateway_host="{{ HPEECPGWNAME }}"
Internet_access="{{ JPHOSTEXT }}"

sc_secret="sc-secret-mlops-students.yaml" # Secret object holding the GitHub VCS information; the notebook uses it to connect to the VCS.
JupyterNotebookApp="cr-cluster-mlops-jupyter-notebook.yaml" # Jupyter Notebook ML Ops app manifest you deployed to build your model
DeploymentEngineApp="cr-cluster-mlops-endpoint-wrapper.yaml" # Deployment engine KD App manifest you deployed to query your model
PipelineConfigMap="mlops-pipeline-configmap.yaml" # ConfigMap manifest used to register version 1 of the trained model
#
clusterName="mlops-inference-server-${studentId}"
#
# Model registry information
#
TrainingModel="mlops-model-${studentId}"
modelVersion="1"
#
echo "Your studentId is: ${studentId}"
```
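As a rough illustration of how the cell above derives its names, the sketch below replays the `grep hpecp-user ... | cut -d= -f2` parsing on a made-up kubeconfig line. The `--hpecp-user=student7` line format and the `student7` value are assumptions, not taken from a real tenant kubeconfig. Note that `cut -d= -f2` keeps the text between the first and second `=` on the matching line.

```bash
# Hypothetical kubeconfig line; the real line format depends on your tenant setup
kubeconfig_line='      - --hpecp-user=student7'

# Same parsing as the lab cell: keep the field between the first and second "="
studentId=$(printf '%s\n' "$kubeconfig_line" | grep hpecp-user | cut -d= -f2)

# Names derived from the student ID, as in the lab cell
clusterName="mlops-inference-server-${studentId}"
TrainingModel="mlops-model-${studentId}"

echo "studentId:     $studentId"
echo "clusterName:   $clusterName"
echo "TrainingModel: $TrainingModel"
```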

### **2- Serving queries through the Load Balancer service of the deployment engine cluster**

#### Get the service endpoint and the authentication token of the Load Balancer of the deployment engine cluster:
To report on the services of a specific virtual cluster, use **kubectl describe service** with a label selector. The selector **kubedirector.hpe.com/kdcluster=YourClusterApplicationName,kubedirector.hpe.com/role=LoadBalancer** matches only the services that carry both labels, that is, the LoadBalancer service of your deployment engine cluster.

```bash
#
# Getting the Model Serving access point from the haproxy service of the LoadBalancer (role: LoadBalancer, internal port: 32700):
#
LoadBalancerURL=$(kubectl describe service -l kubedirector.hpe.com/kdcluster=${clusterName},kubedirector.hpe.com/role=LoadBalancer | grep gateway/32700 | awk '{print $2}')
LoadBalancerPort=$(echo "$LoadBalancerURL" | cut -d':' -f 2) # extract the gateway re-mapped port value
LoadBalancer_endpoint="https://$gateway_host:$LoadBalancerPort"
echo "The Model Serving LoadBalancer service endpoint re-mapped port is: $LoadBalancerPort"
echo "Your Model Serving LoadBalancer service endpoint is: $LoadBalancer_endpoint"
#echo "The LoadBalancer service endpoint URL is: "https://$Internet_access:$RESTServerPort
#
# Getting the auth-token:
#
LoadBalancerAuthToken=$(kubectl describe service -l kubedirector.hpe.com/kdcluster=${clusterName},kubedirector.hpe.com/role=LoadBalancer | grep kd-auth-token | awk '{print $2}' | tr -d '\r')
echo "The Model Serving Load Balancer service authentication token is: $LoadBalancerAuthToken"
```
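The sketch below replays the parsing steps of the cell above on a made-up excerpt of `kubectl describe service` output, so you can see what each `grep`/`awk`/`cut`/`tr` stage keeps. The annotation layout, the host name `gw.example.com`, the port `10012`, and the token `0f1e2d3c` are all hypothetical. Keep in mind that `awk '{print $2}'` returns the annotation value only when the annotation sits on a continuation line; on the first `Annotations:` line, `$2` would be the annotation key instead.

```bash
# Hypothetical excerpt of `kubectl describe service` output (all values are made up).
# The \r at the end of the token line mimics the carriage return the lab removes with tr.
describe_output=$(printf '%s\n%s\n%s\r\n' \
  'Annotations:   description: demo' \
  '               hpecp-internal-gateway/32700: gw.example.com:10012' \
  '               kd-auth-token: 0f1e2d3c')

gateway_host="gw.example.com"   # hypothetical gateway host name

# Second whitespace-separated field of the gateway line is "host:port"
LoadBalancerURL=$(printf '%s\n' "$describe_output" | grep gateway/32700 | awk '{print $2}')
LoadBalancerPort=$(echo "$LoadBalancerURL" | cut -d':' -f 2)
LoadBalancer_endpoint="https://$gateway_host:$LoadBalancerPort"

# Second field of the token line, with the stray carriage return stripped
LoadBalancerAuthToken=$(printf '%s\n' "$describe_output" | grep kd-auth-token | awk '{print $2}' | tr -d '\r')

echo "Port:     $LoadBalancerPort"
echo "Endpoint: $LoadBalancer_endpoint"
echo "Token:    $LoadBalancerAuthToken"
```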

### **3- Making predictions on new data**
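To make a prediction, send an authenticated `POST` REST API call, with the authentication token in the `X-Auth-Token` header, to the prediction service endpoint (that is, the deployment engine REST API network service endpoint). The REST API call has the following form:

`https://<gateway_host>:<LoadBalancerPort>/<registeredModel>/<modelVersion>/predict`

where `<registeredModel>` is the registered model name (`$TrainingModel`) and `<modelVersion>` is its version (`$modelVersion`).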
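A minimal sketch of how the lab variables compose that URL, using hypothetical placeholder values instead of the ones read from your cluster:

```bash
# Hypothetical placeholder values; in the lab they come from the cells above
LoadBalancer_endpoint="https://gw.example.com:10012"
TrainingModel="mlops-model-student7"
modelVersion="1"

# <LoadBalancer endpoint>/<registered model name>/<model version>/predict
predict_url="${LoadBalancer_endpoint}/${TrainingModel}/${modelVersion}/predict"
echo "$predict_url"
```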

The query below predicts how long a New York City taxi ride will take, given the following attributes:
* pickup location: West 23rd Street
* dropoff location: Centre Market Place
* on a weekday
* at 9:00 AM
* in February
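One way to turn the ride attributes above into the `scoring_args` payload is sketched below, assuming you want to generate the `month_1` to `month_6` one-hot flags from a month number rather than typing them by hand. The coordinates, distance, and `work` value are copied from the lab query; the loop and the variable names (`month`, `month_fields`, `coords`, `payload`) are illustrative, not part of the lab.

```bash
# Ride attributes from the list above
weekday=1    # the ride occurs Mon-Fri
hour=9       # 09:00
month=2      # February
work=0       # value used in the lab query

# One-hot encode the month into month_1 .. month_6 (only these six features exist)
month_fields=""
i=1
while [ "$i" -le 6 ]; do
  if [ "$i" -eq "$month" ]; then flag=1; else flag=0; fi
  month_fields="${month_fields}, \"month_${i}\": ${flag}"
  i=$((i + 1))
done

# Coordinates and distance copied from the lab query
coords='"start_latitude": 40.57689727, "start_longitude": -73.99047356, "end_latitude": 40.72058154, "end_longitude": -73.99740673, "distance": 8'

# Assemble the request body
payload="{\"use_scoring\": true, \"scoring_args\": {\"work\": ${work}, ${coords}, \"weekday\": ${weekday}, \"hour\": ${hour}${month_fields}}}"
echo "$payload"
```

The resulting `$payload` string could then be passed to cURL with `--data-raw "$payload"` in place of the inline JSON document.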

>Note: _It may take a few seconds for the prediction service to generate the prediction and send the result of the REST API call back to the cURL HTTP application client._

```bash
# Send the new ride data to the model's predict endpoint and print the "output" field of the response
curl --location -k -s --request POST "${LoadBalancer_endpoint}/${TrainingModel}/${modelVersion}/predict" \
--header "X-Auth-Token: ${LoadBalancerAuthToken}" \
--header 'Content-Type: application/json' \
--data-raw '{
    "use_scoring": true,
    "scoring_args": {
        "work": 0,
        "start_latitude": 40.57689727,
        "start_longitude": -73.99047356,
        "end_latitude": 40.72058154,
        "end_longitude": -73.99740673,
        "distance": 8,
        "weekday": 1,
        "hour": 9,
        "month_1": 0,
        "month_2": 1,
        "month_3": 0,
        "month_4": 0,
        "month_5": 0,
        "month_6": 0
    }
}' | python -m json.tool | grep output | cut -d'\' -f 1
```
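The last stage of the cURL command post-processes the response: `python -m json.tool` pretty-prints the JSON, `grep output` keeps the line holding the `output` key, and `cut -d'\' -f 1` drops everything from the first backslash onward (for example, an escaped newline at the end of the value, assuming the scoring script returns one). The sketch below replays the `grep`/`cut` stages on a hypothetical, already pretty-printed response; the real response fields and values depend on your scoring script.

```bash
# Hypothetical pretty-printed response; real field names and values depend on your scoring script.
# Inside single quotes, \n stays a literal backslash followed by "n", as json.tool would print it.
response='{
    "output": "[1234.5]\n",
    "status": "ok"
}'

# Keep the "output" line, then drop everything from the first backslash onward
result=$(printf '%s\n' "$response" | grep output | cut -d'\' -f 1)
echo "$result"
```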


The new input data must have the same columns (features) that were used to train the model, minus the outcome column.
> Field descriptions:
> * `work`: boolean flag for work hours (1 if the ride occurs Mon-Fri between 8 AM and 5 PM, 0 otherwise)
> * `start_latitude`: pickup location latitude
> * `start_longitude`: pickup location longitude
> * `end_latitude`: dropoff location latitude
> * `end_longitude`: dropoff location longitude
> * `distance`: trip distance in miles
> * `weekday`: boolean flag for weekdays (1 if the ride occurs Mon-Fri, 0 otherwise)
> * `hour`: hour of the day (0 to 23)
> * `month_1`: boolean flag for January (1 if the ride is in January, 0 otherwise)
> * `month_2`: boolean flag for February (1 if the ride is in February, 0 otherwise)
> * `month_3`: boolean flag for March (1 if the ride is in March, 0 otherwise)
> * `month_4`: boolean flag for April (1 if the ride is in April, 0 otherwise)
> * `month_5`: boolean flag for May (1 if the ride is in May, 0 otherwise)
> * `month_6`: boolean flag for June (1 if the ride is in June, 0 otherwise)
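Before sending a query, you might check that the feature values respect the field descriptions above. The `check_features` helper below is a hypothetical sketch that covers a subset of the fields (`work`, `weekday`, `hour`, and the month number that drives the one-hot flags) and returns a non-zero status when a value is out of range. Because only `month_1` to `month_6` exist, a ride after June cannot be encoded.

```bash
# Hypothetical helper: check_features WORK WEEKDAY HOUR MONTH
# Returns 0 when the values match the field descriptions, 1 otherwise.
check_features() {
  cf_work=$1
  cf_weekday=$2
  cf_hour=$3
  cf_month=$4

  # work and weekday are 0/1 flags
  for cf_flag in "$cf_work" "$cf_weekday"; do
    [ "$cf_flag" = 0 ] || [ "$cf_flag" = 1 ] || return 1
  done

  # hour of day must be 0 to 23 (non-numeric input also fails)
  [ "$cf_hour" -ge 0 ] 2>/dev/null && [ "$cf_hour" -le 23 ] || return 1

  # only month_1 (January) to month_6 (June) exist as features
  [ "$cf_month" -ge 1 ] 2>/dev/null && [ "$cf_month" -le 6 ] || return 1

  return 0
}

# Values from the lab query: work=0, weekday=1, 9 AM, February
if check_features 0 1 9 2; then echo "features OK"; else echo "invalid features"; fi
```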

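### **4- Clean up**

Delete the resources created during the labs: the source control secret, the deployment engine cluster, the pipeline ConfigMap, and the Jupyter Notebook cluster.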

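```bash
kubectl delete -f "$sc_secret"            # GitHub VCS secret
kubectl delete -f "$DeploymentEngineApp"  # deployment engine (model serving) cluster
kubectl delete -f "$PipelineConfigMap"    # ConfigMap used to register the trained model
kubectl delete -f "$JupyterNotebookApp"   # Jupyter Notebook cluster
```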

## Summary

In this lab, you learned how to make prediction queries, using REST API calls, to a target deployment engine ML Ops application virtual cluster that serves your model.

* [Conclusion](6-Conclusion.ipynb)