<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Deploy and score an AutoAI model using the V4 Python client</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>

This notebook will take you through the steps of using the Watson Machine Learning V4 Python client to deploy and score an AutoAI model made with the <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/50fa9246181026cd7ae2a5bc7ea444e6" target="_blank" rel="noopener noreferrer">Bank Marketing</a> data set.

**<a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/autoai-overview.html" target="_blank" rel="noopener noreferrer">AutoAI</a>** is a tool in Watson Studio that automatically analyzes an uploaded data set and builds model pipelines from it. AutoAI uses algorithms to infer patterns from your data, prepare the data, and transform it, all with minimal user input. The AutoAI process consists of the following steps:
- Data preprocessing
- Automated model selection
- Automated feature engineering
- Hyperparameter optimization

Once the tool is finished building the model pipelines, it will list all the models on a leaderboard, ranked by a metric of the user's choosing such as accuracy, precision, or ROC-AUC (based on the prediction type).

You will use the Bank Marketing data set available on the <a href="https://archive.ics.uci.edu/ml/datasets/Bank+Marketing" target="_blank" rel="noopener noreferrer">UCI Machine Learning Repository</a>. This data deals with a Portuguese banking institution's marketing campaign, which used phone calls to determine whether customers would subscribe to a term deposit.

The notebook uses Python 3.6 and the Watson Machine Learning V4 Python client to manage the deployments.

## 1. Setting up


**Action:** Before you proceed with the steps in this notebook, please ensure that you've completed the following:
1. Get a <a href="https://cloud.ibm.com/catalog/services/machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create an instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/wml-setup.html" target="_blank" rel="noopener noreferrer">here</a>).
2. Download the small <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/50fa9246181026cd7ae2a5bc7ea444e6" target="_blank" rel="noopener noreferrer">Bank Marketing</a> data set from the Watson Studio Community.
3. Load the .csv file as a Data Asset in your project by clicking **Add to Project** on your project page.
4. In the next step, you will <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/autoai-build.html" target="_blank" rel="noopener noreferrer">build an AutoAI model</a>.

## 2. Build an AutoAI model


1. In your Watson Studio project, click **Add to project**. Then, click **AutoAI Experiment**.
2. Make sure you have **From Sample** selected. Specify a name and description for your experiment and click **Create**.
3. To add the training data, choose the Bank Marketing .csv file from your project. You'll now see an option to select the column to predict. 
4. Select the column named `y`. Based on analyzing a subset of the data set, AutoAI will choose a default model type: binary classification. 
5. Click **Run Experiment** to begin the model pipeline creation.
6. When the pipeline generation process completes, you can view the leading model candidates and <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/autoai-view-results.html" target="_blank" rel="noopener noreferrer">evaluate</a> them before saving a pipeline as a model. The default ranking metric for binary classification models is the area under the ROC curve, so with Ranking based on `ROC AUC`, select the top pipeline. Click **Save as model**. 

## 3. Deploy the model

### Load the data

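1. If you have not already done so, download the <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/50fa9246181026cd7ae2a5bc7ea444e6" target="_blank" rel="noopener noreferrer">Bank Marketing</a> data set from the Watson Studio Community.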
2. Load the .csv file into your notebook. Click the Data icon on the notebook action bar. Drop the file into the box or browse to select the file. The file is loaded to your object storage and appears in the Data Assets section of the project.
3. To load the data into a DataFrame, click in the next code cell and select **Insert to code > Insert Pandas DataFrame** under the file name.
4. Rename `df_data_x` to `df_data_1`.
5. Run the cell.

In [1]:
# Load Bank Marketing data as dataframe
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
client_cf30deff925f4df28741f6b0a9ebdaab = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='***',
    ibm_auth_endpoint="***",
    config=Config(signature_version='oauth'),
    endpoint_url='h***')

body = client_cf30deff925f4df28741f6b0a9ebdaab.get_object(Bucket='watsonstudioexamplenotebooks-donotdelete-pr-5atise9lcgbnvl',Key='bank-marketing-small.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
1,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
2,35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
3,30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown,no
4,59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown,no


### V4 Python client

In this section, you will learn how to use the V4 Python client to manage your model in the WML repository.

**Tip**: Authentication information (your credentials) can be found in the <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">Service Credentials</a> tab of the service instance that you created on the IBM Cloud. <BR>If you cannot find the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information.

In [2]:
wml_credentials = {
  'url': 'https://ibm-watson-ml.mybluemix.net',
  'apikey': '****',
  'username': '****',
  'password': '****',
  'instance_id': '****'
}
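
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the dictionary could instead be assembled from environment variables. The variable names (`WML_URL`, `WML_APIKEY`, `WML_INSTANCE_ID`) and the helper `load_wml_credentials` are hypothetical and not part of the client. The sketch covers only three of the keys shown above, and `username` and `password` could be added the same way.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_NAMES = {
    'url': 'WML_URL',
    'apikey': 'WML_APIKEY',
    'instance_id': 'WML_INSTANCE_ID',
}


def load_wml_credentials(env=None):
    """Build a credentials dict from environment variables, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: {}'.format(', '.join(missing)))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

With the variables set, `wml_credentials = load_wml_credentials()` would produce a dictionary with the same keys as the cell above, without the secrets appearing in the notebook.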

**Tip:** You can find more information about the watson-machine-learning-client <a href="https://wml-api-pyclient-dev-v4.mybluemix.net/" target="_blank" rel="noopener noreferrer">here</a>.

In [4]:
!rm -rf $PIP_BUILD/watson-machine-learning-client-v4

In [None]:
!pip install watson-machine-learning-client-v4

First, import the `WatsonMachineLearningAPIClient` class.

In [6]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Instantiate the `WatsonMachineLearningAPIClient` object.

In [7]:
client = WatsonMachineLearningAPIClient(wml_credentials)

You can see the list of models in the WML Repository by running the following cell.

In [8]:
client.repository.list_models()

------------------------------------  -------------------------------------------------------  ------------------------  -----------------
GUID                                  NAME                                                     CREATED                   TYPE
53b90fb3-0d12-4d66-96c8-b4e4cf96a0ae  Customer churn Spark model                               2019-07-03T15:47:49.778Z  mllib_2.3
ad24c140-f97a-49f2-b02c-f8ce44a58c27  Custom ARIMA estimator for sklearn pipeline              2019-07-03T01:04:34.000Z  scikit-learn_0.19
5f1cfce9-01fe-466d-a9a6-0acf8f9d3352  Diet                                                     2019-06-21T10:16:55.901Z  do-docplex_12.9
abcb8b9f-ea49-4274-b4c5-3b1aec281f0f  Diet                                                     2019-06-21T10:13:35.444Z  do-docplex_12.9
0c83a866-fa91-44b4-ba53-d2ce231fdf48  bank marketing - P3 GradientBoostingClassifierEstimator  2019-06-06T18:07:33.500Z  wml-hybrid_0.1
471b39cc-9c3b-4ff9-a8ea-0297efe0ca5d  Boston house pric

**Action:** Find the AutoAI model you saved from the list of models above. Copy the GUID and paste it in the cell below.

In [9]:
# Note: Enter the saved AutoAI model's GUID here
model_uid = '0c83a866-fa91-44b4-ba53-d2ce231fdf48'
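
A GUID that was copied incompletely is an easy mistake to make here. As a small optional check, the sketch below uses the standard-library `uuid` module to confirm that the pasted string has the canonical hyphenated form seen in the listing above. The helper name `is_valid_guid` is hypothetical. The check only validates the format and does not confirm that the model exists in the repository.

```python
import uuid


def is_valid_guid(value):
    """Return True if value is a canonical hyphenated UUID string."""
    try:
        # Round-tripping through uuid.UUID rejects truncated or malformed strings.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False
```

For example, `is_valid_guid(model_uid)` could be asserted before the deployment is created.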

You'll now use the `model_uid` to create the deployment.

### Deploy

Now, you can create a deployment of your AutoAI model using the V4 Python client.

In [10]:
# Deployment metadata.
deploy_meta = {
    client.deployments.ConfigurationMetaNames.NAME: "AutoAI Deployment - bank marketing",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

In [11]:
# Create the deployment.
deploy_details = client.deployments.create(model_uid, meta_props=deploy_meta)



#######################################################################################

Synchronous deployment creation for uid: '0c83a866-fa91-44b4-ba53-d2ce231fdf48' started

#######################################################################################


initializing


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='a166c7f3-be1e-463a-962e-55512310d6af'
------------------------------------------------------------------------------------------------




Get the list of all the deployments.

In [12]:
client.deployments.list()

------------------------------------  -------------------------------------------------------  ------------  ------------------------  -------------
GUID                                  NAME                                                     STATE         CREATED                   ARTIFACT_TYPE
a166c7f3-be1e-463a-962e-55512310d6af  AutoAI Deployment - bank marketing                       initializing  2019-07-08T18:00:55+0000  model
7985e484-4192-400e-82e5-3756d4600668  ARIMA model python function deployment                   ready         2019-07-03T20:33:40.817Z  function
6aedb5b7-638a-4388-ab0d-45fecb3b7081  Customer Churn Prediction                                ready         2019-07-03T15:51:34.166Z  model
cfcd5f9e-5b07-4bea-b57d-304c12254add  sklearn_pipeline_arima                                   ready         2019-07-03T01:06:29.302Z  model
a217b1be-5a29-4f69-bcc1-dc1287b5c30a  AutoAI Deployment - bank marketing                       ready         2019-06-24T18:15:32+0000  

The model has been successfully deployed.

In [13]:
# Deployment UID.
deployment_uid = client.deployments.get_uid(deploy_details)
print('Deployment uid = {}'.format(deployment_uid))

Deployment uid = a166c7f3-be1e-463a-962e-55512310d6af
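
In the deployment listing above, the new deployment is still in the `initializing` state, while older ones are `ready`. If you want to make sure a deployment is ready before scoring, a generic polling loop is one option. The sketch below is an illustration rather than a client feature: `wait_until_ready` and its `get_state` callable are hypothetical, and treating `'failed'` as a terminal state is an assumption, since only `initializing` and `ready` appear in this notebook.

```python
import time


def wait_until_ready(get_state, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until it returns 'ready'; raise on 'failed' or on timeout."""
    waited = 0.0
    while True:
        state = get_state()
        if state == 'ready':
            return state
        if state == 'failed':  # assumed terminal state, not shown in this notebook
            raise RuntimeError('Deployment failed')
        if waited >= timeout:
            raise TimeoutError(
                'Deployment still {!r} after {} seconds'.format(state, timeout))
        sleep(interval)
        waited += interval
```

A `get_state` callable might wrap `client.deployments.get_details(deployment_uid)` and extract the state from the returned dictionary. The location of that field is not shown in this notebook, so inspect the returned details before relying on a particular key path.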


## 4. Score the model

In this section, you will learn how to run a test scoring request against the deployed model using the V4 Python client.

First, prepare the scoring payload with the records to score.

In [14]:
# Prepare scoring payload.
from pprint import pprint
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        'fields': ['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
                   'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
                   'previous', 'poutcome', 'y'],
        'values': [[30, 'unemployed', 'married', 'primary', 'no', 1787, 'no', 'no', 'cellular', 19, 'oct', 79, 1, -1, 0, 'unknown', 'no'],
                   [20, 'student', 'single', 'secondary', 'no', 502, 'no', 'no', 'cellular', 30, 'apr', 261, 1, -1, 0, 'unknown', 'yes']]
    }]
}
pprint(job_payload)

{'scoring_input_data': [{'fields': ['age',
                                    'job',
                                    'marital',
                                    'education',
                                    'default',
                                    'balance',
                                    'housing',
                                    'loan',
                                    'contact',
                                    'day',
                                    'month',
                                    'duration',
                                    'campaign',
                                    'pdays',
                                    'previous',
                                    'poutcome',
                                    'y'],
                         'values': [[30,
                                     'unemployed',
                                     'married',
                                     'primary',
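
The payload above was written by hand, which makes it easy to put a value in the wrong column. As a hedged sketch, the same `fields`/`values` structure could be built from a list of dictionaries so that the column order is always taken from one field list. The helper `build_scoring_payload` is hypothetical. Its default key is the `'scoring_input_data'` string printed above, and in the notebook you could pass `client.deployments.ScoringMetaNames.INPUT_DATA` as `input_key` instead.

```python
def build_scoring_payload(records, fields, input_key='scoring_input_data'):
    """Turn a list of dict records into a fields/values payload with a fixed column order."""
    values = []
    for i, record in enumerate(records):
        missing = [f for f in fields if f not in record]
        if missing:
            raise ValueError('Record {} is missing fields: {}'.format(i, missing))
        # Extra keys in a record (such as an index column) are ignored.
        values.append([record[f] for f in fields])
    return {input_key: [{'fields': list(fields), 'values': values}]}
```

Rows of a DataFrame can be converted to such records with `df_data_1.to_dict(orient='records')`.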
                               

Finally, call `client.deployments.score` with the deployment UID and the payload to run the scoring.

In [15]:
# Perform prediction and display the result.
job_details = client.deployments.score(deployment_uid, job_payload)
pprint(job_details)

{'predictions': [{'fields': ['prediction', 'probability'],
                  'values': [[0.0, [0.9732180907567131, 0.02678190924328699]],
                             [1.0,
                              [0.24594307548171945, 0.7540569245182805]]]}]}


You can see that the first client is predicted **not to subscribe** to a term deposit, with a probability of 0.97. The second client is predicted **to subscribe** to a term deposit, with a probability of 0.75.
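
The reading above can also be done programmatically. The sketch below walks the response structure printed by the scoring cell and returns, for each record, the predicted class and the largest class probability. The helper name `summarize_predictions` is hypothetical, and `sample_response` simply repeats the output shown above so that the example runs on its own.

```python
def summarize_predictions(response):
    """Return (predicted_class, confidence) pairs from a fields/values scoring response."""
    results = []
    for block in response['predictions']:
        fields = block['fields']
        pred_idx = fields.index('prediction')
        prob_idx = fields.index('probability')
        for row in block['values']:
            # Confidence is taken as the largest class probability in the row.
            results.append((row[pred_idx], max(row[prob_idx])))
    return results


# Copy of the scoring output shown above.
sample_response = {
    'predictions': [{
        'fields': ['prediction', 'probability'],
        'values': [[0.0, [0.9732180907567131, 0.02678190924328699]],
                   [1.0, [0.24594307548171945, 0.7540569245182805]]],
    }]
}

for predicted, confidence in summarize_predictions(sample_response):
    print('prediction={} confidence={:.2f}'.format(predicted, confidence))
```

In the notebook, `summarize_predictions(job_details)` would apply the same logic to the live response.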

## 5. Summary and next steps

You have successfully completed this notebook! 

You learned how to deploy and score an AutoAI model using the V4 Python client.

Check out our <a href="https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html" target="_blank" rel="noopener noreferrer">Online Documentation</a> for more samples, tutorials, documentation, how-tos, and blog posts.

### Citation

Moro, S., Cortez, P., & Rita, P. (2014). A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, 22-31.

### Authors

**Ananya Kaushik** is a Data Scientist at IBM.

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.
