<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Use scikit-learn to predict hand-written digits</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
   <tr style="border: none">
       <th style="border: none"><img src="https://github.com/pmservice/wml-sample-models/raw/master/scikit-learn/hand-written-digits-recognition/images/numbers_banner-04.png" width="600" alt="Icon"> </th>
   </tr>
</table>

This notebook contains the steps and code to work with the [watson-machine-learning-client](https://pypi.python.org/pypi/watson-machine-learning-client) library, which is available in the PyPI repository. It introduces commands for getting data and for basic data exploration, pipeline creation, model training and evaluation, model persistence to the Watson Machine Learning (WML) repository, model deployment, and scoring.

Some familiarity with Python is helpful. This notebook uses Python 3.5, scikit-learn and the watson-machine-learning-client package.

You will use the sample data set of hand-written digit images, **sklearn.datasets.load_digits**, which is available in scikit-learn, to recognize hand-written digits.

## Learning goals

In this notebook, you will learn how to:

-  Load a sample data set from ``scikit-learn``.
-  Explore data.
-  Prepare data for training and evaluation.
-  Create a scikit-learn machine learning pipeline.
-  Train and evaluate a model.
-  Store a model in the Watson Machine Learning (WML) repository.
-  Deploy a model for online scoring using the client library.
-  Score sample records using the client library.


## Contents

1.	[Set up the environment](#setup)
2.	[Load and explore data](#load)
3.	[Create a scikit-learn model](#model)
4.	[Work with the WML repository](#persistence)
5.	[Deploy and score data in the IBM Cloud](#scoring)
6.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a [Watson Machine Learning (WML) Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) instance (a free plan is offered and information about how to create the instance is [here](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html))

- Configure your local Python environment:
  + Python 3.5
  + scikit-learn 0.19.1
  + watson-machine-learning-client

**Tip**: Run the cell below to install libraries from <a href="https://pypi.python.org/pypi" target="_blank" rel="noopener noreferrer">PyPI</a>.

In [1]:
!rm -rf $PIP_BUILD/watson-machine-learning-client

In [None]:
!pip install watson-machine-learning-client --upgrade

<a id="load"></a>
## 2. Load and explore data

In this section you will load the data from scikit-learn sample data sets and then perform a basic exploration.

In [3]:
# Load the data.
import sklearn
from sklearn import datasets

digits = datasets.load_digits()

The sample data set consists of 8x8 pixel images of hand-written digits.

Display the data and label of the first digit using the **data** and **target** attributes.

In [4]:
print(digits.data[0].reshape((8, 8)))

[[  0.   0.   5.  13.   9.   1.   0.   0.]
 [  0.   0.  13.  15.  10.  15.   5.   0.]
 [  0.   3.  15.   2.   0.  11.   8.   0.]
 [  0.   4.  12.   0.   0.   8.   8.   0.]
 [  0.   5.   8.   0.   0.   9.   8.   0.]
 [  0.   4.  11.   0.   1.  12.   7.   0.]
 [  0.   2.  14.   5.  10.  12.   0.   0.]
 [  0.   0.   6.  13.  10.   0.   0.   0.]]


In [5]:
digits.target[0]

0
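
The relationship between the 8x8 images and the rows of **data** can be checked directly. The following is a minimal sketch, assuming only that scikit-learn and NumPy are installed; the variable names `flattened`, `rows_match` and `class_counts` are introduced here for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each 8x8 image flattens to one 64-element row of digits.data.
n_samples = len(digits.images)
flattened = digits.images.reshape((n_samples, -1))
rows_match = np.array_equal(flattened, digits.data)

# Count how many examples exist for each digit label (0 to 9).
class_counts = np.bincount(digits.target)

print("images shape:", digits.images.shape)
print("data shape  :", digits.data.shape)
print("data rows are flattened images:", rows_match)
print("samples per digit:", class_counts)
```

This is why `digits.data[0].reshape((8, 8))` above recovers the pixel grid of the first image.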

In the next step, count the data examples.

In [6]:
# Calculate the number of samples.
samples_count = len(digits.images)

print("Number of samples: " + str(samples_count))

Number of samples: 1797


<a id="model"></a>
## 3. Create a scikit-learn model

In this section you will learn how to:
- [Prepare the data](#prep)
- [Create the scikit-learn machine learning pipeline](#pipe)
- [Train a model](#train)

### 3.1 Prepare the data<a id="prep"></a>

In this subsection you will split your data into three data sets:
- Train data set
- Test data set
- Score data set

In [7]:
# Split the data into data sets and display the number of records for each data set.
train_data = digits.data[: int(0.7*samples_count)]
train_labels = digits.target[: int(0.7*samples_count)]

test_data = digits.data[int(0.7*samples_count): int(0.9*samples_count)]
test_labels = digits.target[int(0.7*samples_count): int(0.9*samples_count)]

score_data = digits.data[int(0.9*samples_count): ]

print("Number of training records: " + str(len(train_data)))
print("Number of testing records : " + str(len(test_data)))
print("Number of scoring records : " + str(len(score_data)))

Number of training records: 1257
Number of testing records : 360
Number of scoring records : 180


Your data has been successfully split into three data sets: 

-  The train data set, which is the largest group, is used for training.
-  The test data set is used to evaluate the model and check how well it generalizes to unseen data.
-  The score data set will be used for scoring in the IBM Cloud.
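
The record counts printed above follow from the slice boundaries. The sketch below reproduces that arithmetic in plain Python; `split_bounds` is a hypothetical helper written for this illustration, not a function of scikit-learn or the WML client.

```python
def split_bounds(n_samples, train_end_frac=0.7, test_end_frac=0.9):
    """Return the slice boundaries (train_end, test_end) for a sequential split."""
    train_end = int(train_end_frac * n_samples)
    test_end = int(test_end_frac * n_samples)
    return train_end, test_end


n_samples = 1797
train_end, test_end = split_bounds(n_samples)

# Sizes of the train, test and score slices, in that order.
sizes = (train_end, test_end - train_end, n_samples - test_end)
print(sizes)  # (1257, 360, 180)
```

Note that this split is sequential: the records are taken in their original order and are not shuffled.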

### 3.2 Create the scikit-learn machine learning pipeline<a id="pipe"></a>

In this section you will create a scikit-learn machine learning pipeline and then train the model.

First, import the scikit-learn machine learning packages that are needed in the subsequent steps.

In [8]:
# Import scikit-learn packages
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn import svm, metrics

Standardize the features by removing the mean and scaling to unit variance.

In [9]:
scaler = preprocessing.StandardScaler()
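
To make "removing the mean and scaling to unit variance" concrete, the sketch below compares `StandardScaler` with the same computation written out in NumPy. The small matrix `X` is made-up illustration data. The first column is constant, similar to the always-empty border pixels in the digit images; `StandardScaler` leaves such columns at zero instead of dividing by zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: three samples, three features (the first one is constant).
X = np.array([[0.0, 1.0, 10.0],
              [0.0, 3.0, 20.0],
              [0.0, 5.0, 30.0]])

scaled = StandardScaler().fit_transform(X)

# The same transformation by hand: z = (x - mean) / std, per column.
mean = X.mean(axis=0)
std = X.std(axis=0)      # population standard deviation (ddof=0)
std[std == 0.0] = 1.0    # constant columns are left unscaled
manual = (X - mean) / std

print(np.allclose(scaled, manual))
```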

Next, define the estimator you want to use for classification. The following example uses a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel.

In [10]:
clf = svm.SVC(kernel='rbf')

Build the pipeline. This pipeline consists of a transformer and an estimator.

In [11]:
pipeline = Pipeline([('scaler', scaler), ('svc', clf)])

### 3.3 Train the model<a id="train"></a>

Now, you can use the **pipeline** and **train data** you defined previously to train your SVM model.

In [12]:
model = pipeline.fit(train_data, train_labels)
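
Fitting the pipeline is equivalent to fitting the scaler on the training data, transforming the data, and then fitting the classifier on the result. The following self-contained sketch illustrates this under the same 70/20/10 split as above; the names `X_train`, `X_test`, `manual_scaler` and `manual_clf` are introduced only for this example.

```python
import numpy as np
from sklearn import datasets, preprocessing, svm
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
X_train, y_train = digits.data[:1257], digits.target[:1257]
X_test = digits.data[1257:1617]

# Pipeline version: scaling and classification in one object.
pipeline = Pipeline([('scaler', preprocessing.StandardScaler()),
                     ('svc', svm.SVC(kernel='rbf'))])
pipeline.fit(X_train, y_train)

# Manual version: the same two steps written out explicitly.
manual_scaler = preprocessing.StandardScaler().fit(X_train)
manual_clf = svm.SVC(kernel='rbf').fit(manual_scaler.transform(X_train), y_train)
manual_predictions = manual_clf.predict(manual_scaler.transform(X_test))

same_predictions = np.array_equal(pipeline.predict(X_test), manual_predictions)
print(same_predictions)
```

The pipeline form has the practical advantage that the scaler fitted on the training data is applied automatically at prediction time, so a single object can be stored and deployed.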

Use **test data** to generate an evaluation report to check your **model quality**.

In [13]:
# Evaluate your model.
predicted = model.predict(test_data)

print("Evaluation report: \n\n%s" % metrics.classification_report(test_labels, predicted))

Evaluation report: 

             precision    recall  f1-score   support

          0       1.00      0.97      0.99        37
          1       0.97      0.97      0.97        34
          2       1.00      0.97      0.99        36
          3       1.00      0.94      0.97        35
          4       0.78      0.97      0.87        37
          5       0.97      0.97      0.97        38
          6       0.97      0.86      0.91        36
          7       0.92      0.97      0.94        35
          8       0.91      0.89      0.90        35
          9       0.97      0.92      0.94        37

avg / total       0.95      0.94      0.95       360
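
The precision, recall and f1-score columns can be reproduced by counting true positives, false positives and false negatives for each class. The sketch below does this in plain Python for a single label; `per_class_scores` is a hypothetical helper and the label lists are made-up illustration data, not output of the model above.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: two 4s are found, one 4 is missed, two other digits are called 4.
y_true = [4, 4, 4, 9, 9, 6]
y_pred = [4, 4, 9, 9, 4, 4]

precision, recall, f1 = per_class_scores(y_true, y_pred, label=4)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.5 0.67 0.57
```

The same formula explains the row for digit 4 in the report: with precision 0.78 and recall 0.97, the f1-score is 2 × 0.78 × 0.97 / (0.78 + 0.97) ≈ 0.87.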



**Note:** You can tune your model to achieve better accuracy. To keep this example simple, the tuning step is omitted.

<a id="persistence"></a>
## 4. Work with the WML repository

In this section you will learn how to use the common Python client to manage your model in the WML repository.

- [Work with your WML instance](#work)
- [Save the model to the WML repository](#save)
- [Load a model from the WML repository](#load)
- [Delete a model from the WML repository](#delete)

**Tip**: You can find more information about the watson-machine-learning-client [here](https://wml-api-pyclient.mybluemix.net).

### 4.1 Work with your WML instance<a id="work"></a>

First, you must import client libraries.

In [14]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Authenticate to the Watson Machine Learning service on IBM Cloud.

**Tip**: Authentication information (your credentials) can be found in the [Service Credentials](https://console.bluemix.net/docs/services/service_credentials.html#service_credentials) tab of the service instance that you created on IBM Cloud. <BR>If you cannot see the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

**Action**: Enter your Watson Machine Learning service instance credentials here.


In [15]:
wml_credentials = {
  "username": "****",
  "password": "****",
  "instance_id": "****",
  "url": "https://ibm-watson-ml.mybluemix.net"
}
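
If you prefer not to type credentials into a notebook cell, one option is to read them from environment variables. This is a minimal sketch, not part of the WML client: the function `load_wml_credentials` and the variable names `WML_USERNAME`, `WML_PASSWORD`, `WML_INSTANCE_ID` and `WML_URL` are assumptions made for this example.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_VARS = {
    "username": "WML_USERNAME",
    "password": "WML_PASSWORD",
    "instance_id": "WML_INSTANCE_ID",
}
DEFAULT_URL = "https://ibm-watson-ml.mybluemix.net"


def load_wml_credentials(env=None):
    """Build the credentials dictionary from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    credentials = {key: env[var] for key, var in ENV_VARS.items()}
    credentials["url"] = env.get("WML_URL", DEFAULT_URL)
    return credentials


# Demonstration with a stand-in mapping instead of the real environment.
example_env = {
    "WML_USERNAME": "****",
    "WML_PASSWORD": "****",
    "WML_INSTANCE_ID": "****",
}
wml_credentials = load_wml_credentials(example_env)
print(sorted(wml_credentials))
```

The resulting dictionary has the same keys as the cell above and can be passed to the client in the same way.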

#### Create the API client. 

In [16]:
client = WatsonMachineLearningAPIClient(wml_credentials)

#### Get instance details.

In [17]:
import json

instance_details = client.service_instance.get_details()

### 4.2 Save the model to the WML repository<a id="save"></a>

Define the model name in `meta_props` and store the model together with the training data and labels.

In [18]:
published_model = client.repository.store_model(model=model, meta_props={'name': 'Digits prediction model'},
                                                training_data=train_data, training_target=train_labels)

#### Get information about a specific model in the WML repository.

In [19]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)

print(json.dumps(model_details, indent=2))

{
  "metadata": {
    "url": "https://ibm-watson-ml.mybluemix.net/v3/wml_instances/d0755448-6eb4-425a-b35b-479be91ff2d5/published_models/4d2a4e7d-f89e-4829-bd43-0de0632f0c0b",
    "guid": "4d2a4e7d-f89e-4829-bd43-0de0632f0c0b",
    "created_at": "2018-05-28T18:46:55.938Z",
    "modified_at": "2018-05-28T18:46:55.997Z"
  },
  "entity": {
    "model_type": "scikit-learn-0.19",
    "input_data_schema": {
      "features": {
        "fields": [
          {
            "type": "float",
            "name": "f0"
          },
          {
            "type": "float",
            "name": "f1"
          },
          {
            "type": "float",
            "name": "f2"
          },
          {
            "type": "float",
            "name": "f3"
          },
          {
            "type": "float",
            "name": "f4"
          },
          {
            "type": "float",
            "name": "f5"
          },
          {
            "type": "float",
            "name": "f6"
          },
  

#### Get information about all of the models in the WML repository.

In [20]:
models_details = client.repository.list_models()

------------------------------------  -----------------------  ------------------------  -----------------
GUID                                  NAME                     CREATED                   FRAMEWORK
4d2a4e7d-f89e-4829-bd43-0de0632f0c0b  Digits prediction model  2018-05-28T18:46:55.938Z  scikit-learn-0.19
------------------------------------  -----------------------  ------------------------  -----------------
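
The model details are returned as a nested dictionary, so individual values can be read with ordinary key lookups. The sketch below uses a hand-trimmed copy of the output shown above (only two of the feature fields are kept); `summarize_model` is a hypothetical helper introduced for this example.

```python
# Trimmed copy of the details printed above; only two feature fields are kept.
model_details = {
    "metadata": {
        "guid": "4d2a4e7d-f89e-4829-bd43-0de0632f0c0b",
        "created_at": "2018-05-28T18:46:55.938Z",
    },
    "entity": {
        "model_type": "scikit-learn-0.19",
        "input_data_schema": {
            "features": {
                "fields": [
                    {"type": "float", "name": "f0"},
                    {"type": "float", "name": "f1"},
                ]
            }
        },
    },
}


def summarize_model(details):
    """Pick the identifier, framework and feature names out of the details."""
    fields = details["entity"]["input_data_schema"]["features"]["fields"]
    return {
        "guid": details["metadata"]["guid"],
        "model_type": details["entity"]["model_type"],
        "feature_names": [field["name"] for field in fields],
    }


summary = summarize_model(model_details)
print(summary["model_type"], summary["feature_names"])
```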


### 4.3 Load a model from the WML repository<a id="load"></a>

In this subsection you will learn how to load a saved model from a specific WML instance.

In [21]:
loaded_model = client.repository.load(published_model_uid)

Make test predictions to check that the model has been loaded correctly.

In [22]:
test_predictions = loaded_model.predict(test_data[:10])

In [23]:
print(test_predictions)

[4 0 5 3 6 9 6 4 7 5]


As you can see, you are able to make predictions, which means that the model has loaded correctly. You have now learned how to save the model to the WML repository and load it back.
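
A stricter check than "predictions are produced" is to confirm that a restored model predicts exactly what the original model predicts. The sketch below shows that idea locally, using a `pickle` round trip as a stand-in for the WML repository (an assumption made so that the example runs without a service instance); with the client you would compare `loaded_model.predict(...)` against `model.predict(...)` in the same way.

```python
import pickle

import numpy as np
from sklearn import datasets, preprocessing, svm
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
X_train, y_train = digits.data[:1257], digits.target[:1257]
X_check = digits.data[1257:1267]  # the first ten test records

model = Pipeline([('scaler', preprocessing.StandardScaler()),
                  ('svc', svm.SVC(kernel='rbf'))]).fit(X_train, y_train)

# Stand-in for "store, then load": serialize and deserialize in memory.
restored_model = pickle.loads(pickle.dumps(model))

round_trip_ok = np.array_equal(model.predict(X_check),
                               restored_model.predict(X_check))
print(round_trip_ok)
```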

### 4.4 Delete a model from the WML repository<a id="delete"></a>

The code in the following cell deletes a published model from the WML repository. The code is commented out at this stage because you still need the model for deployment.

In [None]:
# client.repository.delete(published_model_uid)

<a id="scoring"></a>
## 5. Deploy and score data in the IBM Cloud

In this section you will learn how to use the WML client to create an online deployment and to score new data records.

- [Create the online deployment for the published model](#create)
- [Get deployments](#getdeploy)
- [Score data](#score)
- [Delete the deployment](#deldeploy)
- [Delete the model](#delmodel)


### 5.1 Create the online deployment for the published model<a id="create"></a>

In [24]:
created_deployment = client.deployments.create(published_model_uid, "Deployment of scikit model")



#######################################################################################

Synchronous deployment creation for uid: '9312d59f-616c-4e87-a371-06184ba9e901' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='9312d59f-616c-4e87-a371-06184ba9e901'
------------------------------------------------------------------------------------------------




**Note**: Here you use the scoring URL saved in the `created_deployment` object. The next section shows you how to retrieve the deployment URL from the WML instance.

Now you can define and print an online scoring endpoint. 

In [25]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)

print(scoring_endpoint)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/d0755448-6eb4-425a-b35b-479be91ff2d5/published_models/4d2a4e7d-f89e-4829-bd43-0de0632f0c0b/deployments/9312d59f-616c-4e87-a371-06184ba9e901/online


### 5.2 Get deployments<a id="getdeploy"></a>

In [26]:
deployments = client.deployments.get_details()

You can get the deployment_url by parsing the deployment details for the last deployed model.

In [27]:
deployment_url = client.deployments.get_url(created_deployment)

print(deployment_url)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/d0755448-6eb4-425a-b35b-479be91ff2d5/published_models/4d2a4e7d-f89e-4829-bd43-0de0632f0c0b/deployments/9312d59f-616c-4e87-a371-06184ba9e901


### 5.3 Score data<a id="score"></a>

Use the following method to run a test scoring request against the deployed model.

**Action**: Prepare the scoring payload with the records to score.

In [28]:
scoring_payload = {"values": [list(score_data[0]), list(score_data[1])]}
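
The payload is a dictionary with a single `values` key that holds one list of 64 feature values per record. The sketch below builds such a payload for any number of records and checks its shape first; `build_scoring_payload` is a hypothetical helper, and the zero-filled `score_data` array is a stand-in for the real score data set.

```python
import json

import numpy as np


def build_scoring_payload(records, n_features=64):
    """Return a {"values": [...]} payload after checking the record shape."""
    records = np.asarray(records, dtype=float)
    if records.ndim != 2 or records.shape[1] != n_features:
        raise ValueError("Expected a 2-D array with %d features per record" % n_features)
    # tolist() converts NumPy values to plain Python floats.
    return {"values": records.tolist()}


score_data = np.zeros((3, 64))  # stand-in for the real score data set
scoring_payload = build_scoring_payload(score_data[:2])

# The payload must be JSON serializable to be sent to the scoring endpoint.
body = json.dumps(scoring_payload)
print(len(scoring_payload["values"]), len(scoring_payload["values"][0]))  # 2 64
```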

Use the ``client.deployments.score()`` method to run the scoring.

In [29]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)

In [30]:
print(json.dumps(predictions, indent=2))

{
  "fields": [
    "prediction"
  ],
  "values": [
    [
      5
    ],
    [
      2
    ]
  ]
}
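
The response pairs a list of `fields` with one row of `values` per scored record. The following sketch turns that structure into a flat list of predicted digits; `extract_predictions` is a hypothetical helper, and the `response` dictionary is copied from the output above.

```python
def extract_predictions(response, field="prediction"):
    """Return the values of one field, one entry per scored record."""
    column = response["fields"].index(field)
    return [row[column] for row in response["values"]]


# Response copied from the scoring output above.
response = {"fields": ["prediction"], "values": [[5], [2]]}

predicted_digits = extract_predictions(response)
print(predicted_digits)  # [5, 2]
```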


### 5.4 Delete the deployment<a id="deldeploy"></a>

Use the following method to delete the deployment.

In [31]:
client.deployments.delete(client.deployments.get_uid(created_deployment))

You can check that your deployment was deleted by generating a list of your saved deployments:

In [32]:
client.deployments.list()

----  ----  ----  -----  -------  ---------
GUID  NAME  TYPE  STATE  CREATED  FRAMEWORK
----  ----  ----  -----  -------  ---------


### 5.5 Delete the model<a id="delmodel"></a>

In [33]:
client.repository.delete(published_model_uid)

You can check that your model was deleted by generating a list of your saved models:

In [34]:
client.repository.list_models()

----  ----  -------  ---------
GUID  NAME  CREATED  FRAMEWORK
----  ----  -------  ---------


<a id="summary"></a>
## 6. Summary and next steps     

You successfully completed this notebook! 
 
You learned how to use scikit-learn machine learning as well as Watson Machine Learning for model creation and deployment. 

Check out our [Online Documentation](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html) for more samples, tutorials, documentation, how-tos, and blog posts. 

### Author

**Wojciech Sobala** is a Data Scientist at IBM developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2017, 2018 IBM. This notebook and its source code are released under the terms of the MIT License.
