# Introduction

This tutorial introduces some basic methods for deploying a machine learning model to production so that clients and other parts of a project can access it. Deploying code to production is a large part of data science, and there are many ways to do it, each with different trade-offs. The most common choice for companies today is a cloud computing service. Amazon Web Services, Google Cloud, IBM Watson Machine Learning, and many other providers offer such services to organizations ranging from small businesses and non-profits to industry giants such as Walmart, not counting their own heavy internal use.

## Tutorial content

In this tutorial, we will demonstrate how to deploy a locally trained model to the Watson Machine Learning service using [scikit-learn](https://scikit-learn.org/stable/modules/classes.html) and [ibm-watson-machine-learning](https://pypi.python.org/pypi/ibm-watson-machine-learning). We will also introduce commands that explore the data, train a model from a pre-defined scikit-learn estimator, regularize the model, deploy it, score and update it, and finally re-deploy it.

We will use the hand-written digits sample dataset that ships with scikit-learn to train a model that recognizes the digits 0-9. Note that this sample is not MNIST itself: `load_digits` returns a much smaller set of 1,797 images of 8x8 pixels, taken from the UCI Optical Recognition of Handwritten Digits dataset. Many open-source libraries provide hand-written digit data, and MNIST is one of the most widely used datasets of this kind. It contains 60,000 hand-written digits for training a machine learning model and 10,000 for testing. MNIST was released in 1998 and has been a standard benchmark for classification models ever since.

Here's an outline of what we will cover in the tutorial:
- Build and train a model.
- Save a pipeline as a model.
- Deploy the model.
- Test the deployed model.

# Installing the libraries

Before diving into machine learning, we need to install the following libraries and create an account on [IBM Cloud](https://cloud.ibm.com/registration). We also need to create a Watson Machine Learning instance and obtain an API key. However, since this tutorial focuses on the deployment part of the process, we will use a pre-obtained account, API key, and deployment space.

Run the following block to install the IBM Watson Machine Learning Python SDK and import the other necessary libraries.

In [33]:
!pip install ibm-watson-machine-learning
import sklearn
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
import json
import time
import pandas as pd
from ibm_watson_machine_learning import APIClient
from ibm_watson_machine_learning.helpers import DataConnection, S3Location



# Setup

Now that the necessary packages are installed, we need to connect to and authenticate with the Watson Machine Learning service on IBM Cloud Pak for Data. As stated above, this tutorial uses an account I pre-registered with IBM, together with its API key and other credentials. Please do not distribute this information, as doing so would negatively affect the student who created the tutorial.

We supply this information to the client as credentials.

In [34]:
wml_credentials = {
    "apikey": "xSqONJn72levvEh-YuFFrQ2LigfQBkIwHkGpwJSvS7ZL",
    "url": 'https://us-south.ml.cloud.ibm.com'
}

client = APIClient(wml_credentials)
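
Hard-coding an API key in a notebook makes it easy to leak. The following is a minimal sketch of one alternative, assuming the key has been exported in an environment variable. The variable name `WML_API_KEY` and the helper `load_wml_credentials` are hypothetical names chosen for this example and are not part of the SDK.

```python
import os

DEFAULT_URL = "https://us-south.ml.cloud.ibm.com"  # regional endpoint used in this tutorial


def load_wml_credentials(env=os.environ, url=DEFAULT_URL):
    """Build the credentials dict from an environment mapping.

    WML_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = env.get("WML_API_KEY")
    if not api_key:
        raise RuntimeError("Set the WML_API_KEY environment variable first")
    return {"apikey": api_key, "url": url}


# Demonstration with a stand-in mapping instead of the real environment.
example = load_wml_credentials({"WML_API_KEY": "my-secret-key"})
print(example["url"])
```

The returned dictionary has the same two keys as `wml_credentials` above, so it could be passed to `APIClient` in the same way.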

We also need a deployment space to store and manage our assets. More specifically, a deployment space contains deployable assets, deployments, deployment jobs, associated input and output data, and the associated environments. You can use spaces to deploy models and manage your deployments, and you can access a space through either the user interface or the REST APIs.

Again, we are using a pre-established deployment space for the purpose of this tutorial. First we need to get its ID.

In [35]:
client.spaces.list(limit=10)

------------------------------------  --------------------  ------------------------
ID                                    NAME                  CREATED
849ff240-61ae-4d5a-98c4-af28ad2ed5dd  PDS-Deployment-Space  2022-04-06T20:11:22.562Z
------------------------------------  --------------------  ------------------------


We will set our client to use this deployment space by passing the ID we obtained above.

In [36]:
client.set.default_space("849ff240-61ae-4d5a-98c4-af28ad2ed5dd")

'SUCCESS'
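
Copying the ID by hand from the listing works for a single space. As a rough sketch, the ID could also be looked up by name. This assumes that `client.spaces.get_details()` returns a dictionary with a `resources` list whose items carry `entity.name` and `metadata.id`; that response shape is an assumption and is not shown in this tutorial. The helper `find_space_id` is a hypothetical name.

```python
def find_space_id(spaces_details, name):
    """Return the ID of the first space whose name matches, or None."""
    for space in spaces_details.get("resources", []):
        if space.get("entity", {}).get("name") == name:
            return space.get("metadata", {}).get("id")
    return None


# Stand-in for the dictionary assumed to come back from client.spaces.get_details().
sample_details = {
    "resources": [
        {
            "entity": {"name": "PDS-Deployment-Space"},
            "metadata": {"id": "849ff240-61ae-4d5a-98c4-af28ad2ed5dd"},
        }
    ]
}

space_id = find_space_id(sample_details, "PDS-Deployment-Space")
print(space_id)
```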

# Explore dataset 

As a first step, we will load the data from the scikit-learn sample dataset and perform some basic exploratory data analysis.

In [37]:
digits = datasets.load_digits()

Let's take a look at the first sample in the dataset, using `data` and `target`. Each row of `data` holds 64 pixel intensities, so we reshape it back into an 8x8 image.

In [38]:
digits.data[0].reshape((8, 8))


array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

In [39]:
digits.target[0]

0

We would also like to know the size of the dataset.

In [40]:
samples_count = len(digits.images)
samples_count

1797
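
Beyond the total count, it can help to check the array shapes and how evenly the ten digit classes are represented. The following is a small sketch that reloads the dataset so it runs on its own; `class_counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each row of `data` is a flattened 8x8 image; `images` keeps the 2-D form.
print(digits.data.shape)    # (n_samples, 64)
print(digits.images.shape)  # (n_samples, 8, 8)

# Count how many samples belong to each digit class 0-9.
class_counts = np.bincount(digits.target)
for digit, count in enumerate(class_counts):
    print(f"digit {digit}: {count} samples")
```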

# Create scikit-learn model

## Prepare datasets

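First we need to split our data into three sets: training, validation, and test. We set aside the last 10% of the dataset as the test set, then split the remaining 90% into 80% training and 20% validation. Overall this gives 72% training, 18% validation, and 10% test data, which is where the 0.72 and 0.9 cut-offs in the code come from.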

In [41]:
# Samples are sliced in order: first 72% train, next 18% validation, last 10% test.
train_data = digits.data[: int(0.72*samples_count)]
train_labels = digits.target[: int(0.72*samples_count)]
vali_data = digits.data[int(0.72*samples_count): int(0.9*samples_count)]
vali_labels = digits.target[int(0.72*samples_count): int(0.9*samples_count)]
test_data = digits.data[int(0.9*samples_count): ]
test_labels = digits.target[int(0.9*samples_count): ]
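
To make the cut-offs concrete, the sketch below works out how many samples each slice receives for the 1,797 samples in this dataset. The helper `split_sizes` is a hypothetical name; it repeats the same `int(...)` arithmetic as the cell above.

```python
def split_sizes(n_samples, train_frac=0.72, train_vali_frac=0.9):
    """Return (train, validation, test) sizes for the slicing scheme above."""
    train_end = int(train_frac * n_samples)
    vali_end = int(train_vali_frac * n_samples)
    return train_end, vali_end - train_end, n_samples - vali_end


sizes = split_sizes(1797)
print(sizes)  # (1293, 324, 180)
```

The validation size of 324 agrees with the total support reported when the model is validated later in the tutorial.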

## Training model

We will train our model using scikit-learn. A typical pipeline has two stages:
- a transformer that cleans and preprocesses the data;
- an estimator that is fit on the transformed data.

For the estimator we will use scikit-learn's pre-defined Gaussian Naive Bayes classifier. The standard scaler shown in class examples is a common choice for the transformer. Note that the cell below fits the estimator directly on the raw pixel values, without a scaling step.

In [45]:
model = GaussianNB()
model.fit(train_data, train_labels)

GaussianNB()
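
The text above describes a transformer-plus-estimator pipeline, while the cell fits `GaussianNB` on its own. A minimal sketch of the pipeline variant, assuming `StandardScaler` as the transformer, could look like the following. The step names `scaler` and `classifier` are arbitrary labels chosen here, and the data is reloaded so the block runs on its own.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
n = len(digits.images)
train_data = digits.data[: int(0.72 * n)]
train_labels = digits.target[: int(0.72 * n)]
vali_data = digits.data[int(0.72 * n): int(0.9 * n)]

# Step 1 standardizes each pixel feature; step 2 is the same classifier as above.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", GaussianNB()),
])
pipeline.fit(train_data, train_labels)

# The fitted pipeline exposes the same predict() interface as a bare estimator.
vali_predictions = pipeline.predict(vali_data)
print(vali_predictions[:10])
```

Because scaling changes the inputs the classifier sees, the scores of this variant may differ from those of the bare estimator used in the rest of the tutorial.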

## Validate model quality

We will evaluate our model's quality on the validation set. The metrics we use for evaluation also come from scikit-learn. Its `metrics` module provides a classification report with several important measures of the model's performance: precision, recall, f1-score (the harmonic mean of precision and recall), support (the number of actual occurrences of each class in the dataset), accuracy, macro avg (the unweighted mean of a metric across labels), and weighted avg (the mean across labels weighted by support).

In [46]:
predicted = model.predict(vali_data)
print("Evaluation report: \n\n%s" % metrics.classification_report(vali_labels, predicted))

Evaluation report: 

              precision    recall  f1-score   support

           0       0.94      0.97      0.96        34
           1       0.69      0.75      0.72        32
           2       1.00      0.78      0.88        32
           3       0.96      0.82      0.89        33
           4       0.97      0.90      0.93        31
           5       0.91      0.94      0.93        33
           6       0.97      0.97      0.97        34
           7       0.71      0.84      0.77        32
           8       0.72      0.77      0.74        30
           9       0.74      0.76      0.75        33

    accuracy                           0.85       324
   macro avg       0.86      0.85      0.85       324
weighted avg       0.86      0.85      0.85       324
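
To see how the summary columns relate to the per-class rows, the sketch below recomputes two numbers from the rounded values printed in the report: the f1-score for digit 1 and the macro-averaged f1-score. Since the inputs are already rounded to two decimals, this is an approximation of what scikit-learn computes internally. The helper `f1_score_from` is a hypothetical name.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall for digit 1, copied from the report above.
f1_digit_1 = f1_score_from(0.69, 0.75)

# Per-class f1-scores copied from the report above (digits 0-9).
f1_per_class = [0.96, 0.72, 0.88, 0.89, 0.93, 0.93, 0.97, 0.77, 0.74, 0.75]
macro_f1 = sum(f1_per_class) / len(f1_per_class)

print(round(f1_digit_1, 2))  # 0.72
print(round(macro_f1, 2))    # 0.85
```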



We could also tune our model based on this report for better accuracy or for specific trade-offs. However, model tuning is outside the scope of this tutorial.

# Deploying model 

Now that we have trained our model locally, we will store it in the Watson Machine Learning repository by using the IBM Watson Machine Learning SDK.

## Persist the local model

First we publish our model to the Watson Machine Learning repository in the cloud. `store_model` takes the model, metadata (which gives more context to the stored asset), and the training data and labels.

In [47]:
software_spec_uid = client.software_specifications.get_id_by_name("default_py3.8")

metadata = {
    client.repository.ModelMetaNames.NAME: 'Scikit model',
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_0.23',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid
}

published_model = client.repository.store_model(
    model=model,
    meta_props=metadata,
    training_data=train_data,
    training_target=train_labels)



With the model uploaded to the cloud, we can retrieve details about it from the repository. If you scroll through the JSON output, you will find the metadata we defined.

In [48]:
published_model_id = client.repository.get_model_id(published_model)
model_details = client.repository.get_details(published_model_id)
print(json.dumps(model_details, indent=2))

{
  "entity": {
    "hybrid_pipeline_software_specs": [],
    "label_column": "l1",
    "software_spec": {
      "id": "ab9e1b80-f2ce-592c-a7d2-4f2344f77194",
      "name": "default_py3.8"
    },
    "training_data_references": [
      {
        "connection": {
          "access_key_id": "not_applicable",
          "endpoint_url": "not_applicable",
          "secret_access_key": "not_applicable"
        },
        "id": "1",
        "location": {},
        "schema": {
          "fields": [
            {
              "name": "f0",
              "type": "float"
            },
            {
              "name": "f1",
              "type": "float"
            },
            {
              "name": "f2",
              "type": "float"
            },
            {
              "name": "f3",
              "type": "float"
            },
            {
              "name": "f4",
              "type": "float"
            },
            {
              "name": "f5",
              "type": "float

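Rather than scrolling through the JSON, specific values can be pulled out of the details dictionary. The sketch below collects the feature names recorded in the training schema. It assumes the nesting visible in the output above (`entity`, `training_data_references`, `schema`, `fields`); the helper `training_field_names` and the trimmed `sample_model_details` are illustrative stand-ins, not SDK objects.

```python
def training_field_names(model_details):
    """Collect the feature column names recorded in the model's training schema."""
    names = []
    for reference in model_details["entity"].get("training_data_references", []):
        for field in reference.get("schema", {}).get("fields", []):
            names.append(field["name"])
    return names


# Trimmed stand-in with the same nesting as the output above.
sample_model_details = {
    "entity": {
        "label_column": "l1",
        "training_data_references": [
            {"id": "1", "schema": {"fields": [
                {"name": "f0", "type": "float"},
                {"name": "f1", "type": "float"},
            ]}}
        ],
    }
}

print(training_field_names(sample_model_details))
```
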
Now let's list all the models stored in the space.

In [49]:
models_details = client.repository.list_models()

------------------------------------  ------------  ------------------------  -----------------
ID                                    NAME          CREATED                   TYPE
750f5f3e-cec3-4305-b191-aebd47b0e664  Scikit model  2022-04-07T00:50:13.002Z  scikit-learn_0.23
10c602d9-693e-47f6-b921-862d0cd927dc  Scikit model  2022-04-06T23:21:11.002Z  scikit-learn_0.23
4278f933-e53b-4036-bf79-d71bca9d0b4c  Scikit model  2022-04-06T23:13:13.002Z  scikit-learn_0.23
------------------------------------  ------------  ------------------------  -----------------


## Deploy and score

Finally, we are ready to deploy our model. In this section we will create an online deployment and score a new data record using the IBM Watson Machine Learning SDK.

We will start by creating an online deployment for the published model.

In [50]:
metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Deployment of scikit model",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(published_model_id, meta_props=metadata)



#######################################################################################

Synchronous deployment creation for uid: '750f5f3e-cec3-4305-b191-aebd47b0e664' started

#######################################################################################


initializing
Note: Software specification default_py3.8 is deprecated. Use runtime-22.1-py3.9 software specification instead. For details, see https://dataplatform.cloud.ibm.com/docs/content/wsj/wmls/wmls-deploy-python-types.html?context=cpdaas.

ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='73f183b2-42fe-4b0e-b118-ae225f1c1e52'
------------------------------------------------------------------------------------------------




Here we read the deployment ID from the `created_deployment` object returned by the call above. Later cells use this ID to fetch the deployment details and to score new data.

In [51]:
deployment_uid = client.deployments.get_uid(created_deployment)

Now we print the online scoring endpoint. 

In [55]:
endpoint = client.deployments.get_scoring_href(created_deployment)
print(endpoint)

https://us-south.ml.cloud.ibm.com/ml/v4/deployments/73f183b2-42fe-4b0e-b118-ae225f1c1e52/predictions
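
Other parts of a project can call this endpoint over REST without the Python SDK. The sketch below only assembles the pieces of such a request and sends nothing. It rests on several assumptions: the caller already holds an IBM Cloud IAM bearer token, the service expects a `version` query parameter (the date used here is a placeholder), and the body uses the same `input_data` layout as the SDK payload in this tutorial. `build_scoring_request` is a hypothetical helper.

```python
import json


def build_scoring_request(endpoint, bearer_token, rows, version="2020-09-01"):
    """Assemble URL, headers and JSON body for a scoring call (nothing is sent)."""
    url = f"{endpoint}?version={version}"  # version date is an assumed placeholder
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input_data": [{"values": rows}]})
    return url, headers, body


url, headers, body = build_scoring_request(
    "https://us-south.ml.cloud.ibm.com/ml/v4/deployments/"
    "73f183b2-42fe-4b0e-b118-ae225f1c1e52/predictions",
    "<IAM access token>",   # placeholder, not a real token
    [[0.0] * 64],           # one all-zero 8x8 image as a dummy row
)
print(url)
```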


We can also list all existing deployments and get deployment details.

In [56]:
client.deployments.list()

------------------------------------  --------------------------  -----  ------------------------
GUID                                  NAME                        STATE  CREATED
73f183b2-42fe-4b0e-b118-ae225f1c1e52  Deployment of scikit model  ready  2022-04-07T00:50:36.300Z
------------------------------------  --------------------------  -----  ------------------------


In [57]:
client.deployments.get_details(deployment_uid)

Note: Software specification default_py3.8 is deprecated. Use runtime-22.1-py3.9 software specification instead. For details, see https://dataplatform.cloud.ibm.com/docs/content/wsj/wmls/wmls-deploy-python-types.html?context=cpdaas.


{'entity': {'asset': {'id': '750f5f3e-cec3-4305-b191-aebd47b0e664'},
  'custom': {},
  'deployed_asset_type': 'model',
  'hardware_spec': {'id': 'e7ed1d6c-2e89-42d7-aed5-863b972c1d2b',
   'name': 'S',
   'num_nodes': 1},
  'name': 'Deployment of scikit model',
  'online': {},
  'space_id': '849ff240-61ae-4d5a-98c4-af28ad2ed5dd',
  'status': {'online_url': {'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/73f183b2-42fe-4b0e-b118-ae225f1c1e52/predictions'},
   'serving_urls': ['https://us-south.ml.cloud.ibm.com/ml/v4/deployments/73f183b2-42fe-4b0e-b118-ae225f1c1e52/predictions'],
   'state': 'ready'}},
 'metadata': {'created_at': '2022-04-07T00:50:36.300Z',
  'id': '73f183b2-42fe-4b0e-b118-ae225f1c1e52',
  'modified_at': '2022-04-07T00:50:36.300Z',
  'name': 'Deployment of scikit model',
  'owner': 'IBMid-663003G1H5',
  'space_id': '849ff240-61ae-4d5a-98c4-af28ad2ed5dd'},
    'message': 'Software specification default_py3.8 is deprecated. Use runtime-22.1-py3.9 software specif

We can use the `score` method to run a test prediction against the deployed model. Here we send the first two samples of the test set.

In [58]:
test_0 = list(test_data[0])
test_1 = list(test_data[1])
payload = {"input_data": [{"values": [test_0, test_1]}]}
predictions = client.deployments.score(deployment_uid, payload)
predictions

{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [[1,
     [0.0,
      0.9999801928897586,
      1.747353420552863e-24,
      1.5290467883620225e-25,
      2.0314705060312653e-20,
      1.6740980396227345e-05,
      2.7399648472432136e-20,
      2.146632436324533e-31,
      3.066129844528332e-06,
      7.749851268684119e-26]],
    [9,
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0804329725294796e-43, 0.0, 0.0, 1.0]]]}]}
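
The raw response nests each prediction next to its full probability vector. The sketch below flattens it into `(label, confidence)` pairs, taking the confidence as the largest probability in each row. The `sample_response` dictionary is a stand-in with the same layout as the output above but with rounded probabilities, and `summarize_predictions` is a hypothetical helper.

```python
def summarize_predictions(response):
    """Return (predicted_label, highest_probability) for every scored row."""
    results = []
    for block in response["predictions"]:
        label_idx = block["fields"].index("prediction")
        proba_idx = block["fields"].index("probability")
        for row in block["values"]:
            results.append((row[label_idx], max(row[proba_idx])))
    return results


# Stand-in with the same layout as the output above; probabilities are rounded.
sample_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [
            [1, [0.0, 0.99998, 0.0, 0.0, 0.0, 0.00002, 0.0, 0.0, 0.0, 0.0]],
            [9, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]],
        ],
    }]
}

summary = summarize_predictions(sample_response)
print(summary)
```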

# Clean up

To clean up, run the following cell, which deletes every deployment in the current deployment space.

In [59]:
deployments_details = client.deployments.get_details()
for deployment in deployments_details['resources']:
    client.deployments.delete(deployment['metadata']['id'])
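
The loop above removes every deployment in the space. In a shared space it may be safer to delete only the deployments created by this tutorial. The sketch below selects IDs by deployment name, relying on the `metadata` fields `id` and `name` seen in the deployment details earlier. The second entry in `sample_deployments` is made up for illustration, and `deployment_ids_named` is a hypothetical helper.

```python
def deployment_ids_named(deployments_details, name):
    """Return the IDs of deployments whose metadata name matches exactly."""
    return [
        deployment["metadata"]["id"]
        for deployment in deployments_details.get("resources", [])
        if deployment["metadata"].get("name") == name
    ]


# Stand-in for client.deployments.get_details(); the second entry is made up.
sample_deployments = {
    "resources": [
        {"metadata": {"id": "73f183b2-42fe-4b0e-b118-ae225f1c1e52",
                      "name": "Deployment of scikit model"}},
        {"metadata": {"id": "hypothetical-other-id",
                      "name": "Some other deployment"}},
    ]
}

to_delete = deployment_ids_named(sample_deployments, "Deployment of scikit model")
print(to_delete)
```

Each returned ID could then be passed to `client.deployments.delete`, as in the loop above.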

Now we can check our deployment list again to make sure we deleted all our deployed instances. 

In [60]:
client.deployments.list(limit=10)

----  ----  -----  -------
GUID  NAME  STATE  CREATED
----  ----  -----  -------
