# Batch Scoring on IBM Cloud Pak for Data (CP4D)

This notebook creates and runs a batch scoring job against a model that was previously created and deployed to the Watson Machine Learning (WML) instance on Cloud Pak for Data (CP4D).

## 1.0 Install required packages


This notebook uses a couple of Python packages. First we make sure the Watson Machine Learning client v3 is removed (it's not installed by default), and then we install or upgrade the v4 version of the client (this package is installed by default on CP4D).
- WML Client: https://wml-api-pyclient-dev-v4.mybluemix.net/#repository

In [1]:
!pip uninstall watson-machine-learning-client -y
!pip install --user watson-machine-learning-client-v4 --upgrade | tail -n 1

Successfully installed ibm-cos-sdk-2.6.0 ibm-cos-sdk-core-2.6.0 ibm-cos-sdk-s3transfer-2.6.0 watson-machine-learning-client-v4-1.0.95


In [2]:
import json
from watson_machine_learning_client import WatsonMachineLearningAPIClient

## 2.0 Create Batch Deployment Job

### 2.1 Instantiate Watson Machine Learning Client

To interact with the local Watson Machine Learning instance, we will be using the Python SDK. 

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

<font color=red>Replace the `username` and `password` values of `*****` with your Cloud Pak for Data `username` and `password`. The value for `url` should match the `url` for your Cloud Pak for Data cluster.</font>

In [3]:
# Be sure to update these credentials before running the cell.
wml_credentials = {
                   "url": "https://zen-cpd-zen.omid-cp4d-v5-2bef1f4b4097001da9502000c44fc2b2-0001.us-south.containers.appdomain.cloud",
                   "username": "demouser2",
                   "password" : "********",
                   "instance_id": "wml_local",
                   "version" : "2.5.0"
                  }

wml_client = WatsonMachineLearningAPIClient(wml_credentials)
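
The cell above hard-codes the credentials in the notebook. As a hedged alternative, the sketch below builds the same dictionary from environment variables so that the password never appears in the notebook source. The variable names `CP4D_URL`, `CP4D_USERNAME`, and `CP4D_PASSWORD` and the helper `load_wml_credentials` are hypothetical; the WML client does not define or require them.

```python
import os

# Hypothetical environment variable names; nothing in the WML client requires them.
ENV_KEYS = {
    "url": "CP4D_URL",
    "username": "CP4D_USERNAME",
    "password": "CP4D_PASSWORD",
}


def load_wml_credentials(env=os.environ):
    """Build the credentials dict from environment variables, failing early if any is missing."""
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    credentials = {key: env[var] for key, var in ENV_KEYS.items()}
    # These two values are copied from the credentials cell in this notebook.
    credentials["instance_id"] = "wml_local"
    credentials["version"] = "2.5.0"
    return credentials


# A plain dict stands in for os.environ so the sketch runs anywhere.
example_env = {
    "CP4D_URL": "https://cpd.example.com",
    "CP4D_USERNAME": "demo",
    "CP4D_PASSWORD": "not-a-real-password",
}
example_credentials = load_wml_credentials(example_env)
print(sorted(example_credentials))
```

In the notebook itself, the result of `load_wml_credentials()` could be passed to `WatsonMachineLearningAPIClient` in place of the literal dictionary.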

In [4]:
wml_client.spaces.list()

------------------------------------  -----------------------  ------------------------
GUID                                  NAME                     CREATED
451c80cc-d796-4317-97cc-6bce33caa88b  WorkshopDeploymentSpace  2020-05-11T21:12:05.241Z
aa3cc600-86f2-46d4-8ea5-ebd737b31926  ChurnDeployment          2020-05-05T22:58:28.916Z
------------------------------------  -----------------------  ------------------------


### 2.2 Find Deployment Space

We will try to find the `GUID` for the deployment space you want to use and set it as the default space for the client.

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

- Update with the name of the deployment space where you have created the batch deployment.

In [5]:
# Be sure to update the name of the space with the one you want to use.
DEPLOYMENT_SPACE_NAME = 'WorkshopDeploymentSpace'

In [6]:
all_spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in all_spaces:
    if space['entity']['name'] == DEPLOYMENT_SPACE_NAME:
        space_id = space["metadata"]["guid"]
        print("\nDeployment Space GUID: ", space_id)

if space_id is None:
    print("WARNING: Your space does not exist. Create a deployment space before proceeding.")
    # We could programmatically create the space instead.
    #space_id = wml_client.spaces.store(meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: DEPLOYMENT_SPACE_NAME})["metadata"]["guid"]


Deployment Space GUID:  451c80cc-d796-4317-97cc-6bce33caa88b
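
The lookup loop above keeps whichever match it sees last, which only matters if two spaces share a name (whether names are unique is not stated here, so treat that as an assumption). The sketch below collects every match and fails loudly when the result is ambiguous or empty. The helper `find_space_guids` and the sample data are hypothetical; the sample only mimics the `metadata`/`entity` keys used by the loop.

```python
def find_space_guids(spaces, name):
    """Return the GUIDs of all spaces whose entity name equals `name`."""
    return [
        space["metadata"]["guid"]
        for space in spaces
        if space["entity"]["name"] == name
    ]


# Hand-written sample shaped like wml_client.spaces.get_details()['resources'].
sample_spaces = [
    {"metadata": {"guid": "guid-1"}, "entity": {"name": "WorkshopDeploymentSpace"}},
    {"metadata": {"guid": "guid-2"}, "entity": {"name": "ChurnDeployment"}},
]

matches = find_space_guids(sample_spaces, "WorkshopDeploymentSpace")
if len(matches) != 1:
    raise ValueError("Expected exactly one space, found {}".format(len(matches)))
space_guid = matches[0]
print(space_guid)
```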


In [7]:
# Now set the default space to the GUID for your deployment space. If this is successful, you will see a 'SUCCESS' message.
wml_client.set.default_space(space_id)

'SUCCESS'

In [8]:
# These are the models and deployments we currently have in our deployment space.
wml_client.repository.list_models()
wml_client.deployments.list()

------------------------------------  ---------------------------------------------------------------------  ------------------------  --------------
GUID                                  NAME                                                                   CREATED                   TYPE
4ca96645-6381-47ae-8c3a-53f233d232c2  CreditRiskAutoAIExperimentv1 - P4 GradientBoostingClassifierEstimator  2020-05-20T14:23:40.002Z  wml-hybrid_0.1
5f0dbd06-d915-42c6-bbe5-1aeef8196552  CreditRiskSpark05192020v1                                              2020-05-19T14:43:19.002Z  mllib_2.3
------------------------------------  ---------------------------------------------------------------------  ------------------------  --------------
------------------------------------  ---------------------------  -----  ------------------------  -------------
GUID                                  NAME                         STATE  CREATED                   ARTIFACT_TYPE
49c0575f-1460-429e-b326-d9a81c7d82e3 

### 2.3 Find Batch Deployment

We will try to find the batch deployment which was created.

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

- Update the variable with the name of the batch deployment you created previously.

In [9]:
DEPLOYMENT_NAME = 'CreditRiskBatchDeploymentv1'

In [10]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
deployment_details = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        deployment_details = deployment
        #print(json.dumps(deployment_details, indent=3))
        break

print("Deployment id: {}".format(deployment_uid))
wml_client.deployments.get_details(deployment_uid)

Deployment id: 49c0575f-1460-429e-b326-d9a81c7d82e3


{'metadata': {'parent': {'href': ''},
  'guid': '49c0575f-1460-429e-b326-d9a81c7d82e3',
  'modified_at': '',
  'created_at': '2020-05-20T14:42:45.513Z',
  'href': '/v4/deployments/49c0575f-1460-429e-b326-d9a81c7d82e3'},
 'entity': {'name': 'CreditRiskBatchDeploymentv1',
  'custom': {},
  'description': '',
  'compute': {'name': 'XS', 'nodes': 1},
  'batch': {'schedule': {}},
  'space': {'href': '/v4/spaces/451c80cc-d796-4317-97cc-6bce33caa88b'},
  'status': {'state': 'ready'},
  'asset': {'href': '/v4/models/5f0dbd06-d915-42c6-bbe5-1aeef8196552?space_id=451c80cc-d796-4317-97cc-6bce33caa88b'},
  'auto_redeploy': False}}
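
Before submitting a job, it can help to confirm that the deployment found is actually a batch deployment and that it is ready. The sketch below reads both facts from the details dictionary. It assumes that the presence of a `batch` key under `entity` marks a batch deployment, which is inferred from the output above rather than from the client documentation; `describe_deployment` is a hypothetical helper.

```python
def describe_deployment(details):
    """Return (is_batch, state) for a deployment details dict."""
    entity = details.get("entity", {})
    # Assumption: a 'batch' key under 'entity' identifies a batch deployment.
    is_batch = "batch" in entity
    state = entity.get("status", {}).get("state")
    return is_batch, state


# Trimmed copy of the deployment details printed above.
sample_details = {
    "metadata": {"guid": "49c0575f-1460-429e-b326-d9a81c7d82e3"},
    "entity": {
        "name": "CreditRiskBatchDeploymentv1",
        "batch": {"schedule": {}},
        "status": {"state": "ready"},
    },
}

is_batch, state = describe_deployment(sample_details)
if not is_batch or state != "ready":
    raise RuntimeError("Deployment is not a ready batch deployment")
print(is_batch, state)
```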

### 2.4 Get Batch Test Data

We will load some data to run the batch predictions.

In [11]:
import pandas as pd

from project_lib import Project
project = Project.access()

batch_set = pd.read_csv(project.get_file('German-Credit-Risk-SmallBatchSet.csv'))
# Drop the customer identifier if present; it is not a model input.
batch_set = batch_set.drop('CUSTOMERID', axis=1, errors='ignore')
batch_set.head()

Unnamed: 0,CHECKINGSTATUS,LOANDURATION,CREDITHISTORY,LOANPURPOSE,LOANAMOUNT,EXISTINGSAVINGS,EMPLOYMENTDURATION,INSTALLMENTPERCENT,SEX,OTHERSONLOAN,CURRENTRESIDENCEDURATION,OWNSPROPERTY,AGE,INSTALLMENTPLANS,HOUSING,EXISTINGCREDITSCOUNT,JOB,DEPENDENTS,TELEPHONE,FOREIGNWORKER
0,0_to_200,4,all_credits_paid_back,car_new,250,100_to_500,less_1,2,female,none,3,real_estate,26,bank,rent,1,unskilled,1,none,yes
1,0_to_200,14,credits_paid_to_date,car_new,3148,less_100,1_to_4,3,male,none,3,car_other,41,none,own,2,skilled,1,none,yes
2,greater_200,19,credits_paid_to_date,radio_tv,5351,100_to_500,greater_7,4,male,none,3,savings_insurance,49,none,own,2,skilled,1,yes,yes
3,greater_200,34,outstanding_credit,other,5790,500_to_1000,greater_7,5,male,none,4,car_other,44,stores,own,2,unskilled,1,yes,yes
4,less_0,4,all_credits_paid_back,car_new,250,100_to_500,less_1,1,female,none,1,real_estate,21,bank,rent,1,unskilled,1,none,yes


### 2.5 Create Job

We can now use the information about the deployment and the test data to create a new job against our batch deployment. We submit the data as an inline payload and ask for the results (i.e., the predictions) to be stored in a CSV file.

In [12]:
import time
timestr = time.strftime("%Y%m%d_%H%M%S")
job_payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
        'fields': batch_set.columns.values.tolist(),
        'values': batch_set.values.tolist()
    }],
    wml_client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
            "type": "data_asset",
            "connection": {},
            "location": {
                "name": "batchres_{}_{}.csv".format(timestr,deployment_uid),
                "description": "results"
            }
    }
}

job = wml_client.deployments.create_job(deployment_id=deployment_uid, meta_props=job_payload)
job_uid = wml_client.deployments.get_job_uid(job)

print('Job uid = {}'.format(job_uid))

Job uid = 7f8dd9e4-0909-4358-bdc5-51652152bd48
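
The inline input in the payload is a list holding one dictionary with `fields` (column names) and `values` (one list per row). As a rough sketch, the same structure can be built and sanity-checked with plain Python before it is submitted. The helper `build_input_data` is hypothetical, and the two columns used here are a small excerpt of the sample data.

```python
import json


def build_input_data(fields, rows):
    """Return the [{'fields': ..., 'values': ...}] structure used for inline scoring input."""
    for index, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError(
                "Row {} has {} values but {} fields were given".format(index, len(row), len(fields))
            )
    return [{"fields": list(fields), "values": [list(row) for row in rows]}]


# Two columns and two rows taken from the sample data shown earlier.
fields = ["CHECKINGSTATUS", "LOANDURATION"]
rows = [["0_to_200", 4], ["greater_200", 19]]

input_data = build_input_data(fields, rows)
# json.dumps raises TypeError if any value is not JSON-serializable.
print(json.dumps(input_data))
```

Checking row lengths up front catches a mismatched column list locally instead of after the job has been queued.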


In [13]:
wml_client.deployments.list_jobs()

------------------------------------  ---------  ------------------------  ------------------------------------
JOB-UID                               STATE      CREATED                   DEPLOYMENT-ID
7f8dd9e4-0909-4358-bdc5-51652152bd48  queued     2020-05-20T18:59:01.742Z  49c0575f-1460-429e-b326-d9a81c7d82e3
9bda0c09-d8e4-416d-bb71-48250e90bbc3  canceled   2020-05-20T17:05:52.944Z  0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2
42f387d2-8f71-4e06-9023-b23573e19511  completed  2020-05-08T15:49:18.084Z  0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2
------------------------------------  ---------  ------------------------  ------------------------------------


## 3.0 Monitor Batch Job Status

The batch job is an async operation. We can use the identifier to track its progress. Below we will just poll until the job completes (or fails).

In [14]:
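import time

def poll_async_job(client, job_uid, interval=5):
    """Poll the job status until it reaches a terminal state, then return the job details."""
    while True:
        job_status = client.deployments.get_job_status(job_uid)
        print(job_status)
        state = job_status['state']
        # Stop on success, cancellation, or any failure state; otherwise the loop would never end.
        if state in ('completed', 'canceled') or 'fail' in state:
            return client.deployments.get_job_details(job_uid)
        time.sleep(interval)

job_details = poll_async_job(wml_client, job_uid)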

{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'queued', 'running_at': '', 'completed_at': ''}
{'state': 'que
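
The polling cell in this section waits indefinitely, as the long run of `queued` lines suggests. A minimal sketch of a bounded variant follows. It takes the status lookup as a callable so it can be exercised without a cluster; `poll_until_done`, its parameters, and the fake status sequence are all hypothetical, and the default timeout of 600 seconds is an arbitrary choice.

```python
import time

# 'canceled' appears in the job listings in this notebook; treating it as terminal is an assumption.
TERMINAL_STATES = ("completed", "canceled")


def poll_until_done(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until a terminal state is seen or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_status()["state"]
        if state in TERMINAL_STATES or "fail" in state:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Job still '{}' after {} seconds".format(state, timeout))
        sleep(interval)


# Stand-in for client.deployments.get_job_status(job_uid), so the sketch runs offline.
fake_states = iter(["queued", "running", "completed"])
final_state = poll_until_done(lambda: {"state": next(fake_states)}, sleep=lambda _: None)
print(final_state)
```

Against a real client, the callable could be `lambda: wml_client.deployments.get_job_status(job_uid)`.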

In [15]:
wml_client.deployments.list_jobs()

------------------------------------  ---------  ------------------------  ------------------------------------
JOB-UID                               STATE      CREATED                   DEPLOYMENT-ID
7f8dd9e4-0909-4358-bdc5-51652152bd48  completed  2020-05-20T18:59:01.742Z  49c0575f-1460-429e-b326-d9a81c7d82e3
9bda0c09-d8e4-416d-bb71-48250e90bbc3  canceled   2020-05-20T17:05:52.944Z  0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2
42f387d2-8f71-4e06-9023-b23573e19511  completed  2020-05-08T15:49:18.084Z  0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2
------------------------------------  ---------  ------------------------  ------------------------------------


### 3.1 Check Results

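With the job complete, we can inspect the job details. Because the job was created with an output data reference, the predictions themselves are written to the CSV file named in the job payload.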

In [16]:
wml_client.deployments.get_job_details()

{'resources': [{'metadata': {'guid': '42f387d2-8f71-4e06-9023-b23573e19511',
    'href': '/v4/deployment_jobs/42f387d2-8f71-4e06-9023-b23573e19511',
    'created_at': '2020-05-08T15:49:18.084Z',
    'parent': {'href': ''}},
   'entity': {'deployment': {'href': '/v4/deployments/0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2'},
    'scoring': {'input_data': [{'fields': ['gender',
        'SeniorCitizen',
        'Partner',
        'Dependents',
        'tenure',
        'PhoneService',
        'MultipleLines',
        'InternetService',
        'OnlineSecurity',
        'OnlineBackup',
        'DeviceProtection',
        'TechSupport',
        'StreamingTV',
        'StreamingMovies',
        'Contract',
        'PaperlessBilling',
        'PaymentMethod',
        'MonthlyCharges',
        'TotalCharges'],
       'values': [['Female',
         1,
         'Yes',
         'Yes',
         23,
         'Yes',
         'No',
         'DSL',
         'Yes',
         'Yes',
         'Yes',
         'Yes
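
Calling `get_job_details()` without an argument returns every job in the space, including jobs from other deployments, as the output above shows. The sketch below narrows that result to the jobs of a single deployment by matching the deployment `href`. The helper `jobs_for_deployment` is hypothetical, and the sample dictionary is hand-written and trimmed to the keys the helper reads.

```python
def jobs_for_deployment(all_jobs, deployment_uid):
    """Return the job entries whose deployment href ends with the given deployment id."""
    suffix = "/" + deployment_uid
    return [
        job
        for job in all_jobs.get("resources", [])
        if job["entity"]["deployment"]["href"].endswith(suffix)
    ]


# Hand-written sample shaped like the job details output, trimmed to the keys used here.
sample_jobs = {
    "resources": [
        {
            "metadata": {"guid": "42f387d2-8f71-4e06-9023-b23573e19511"},
            "entity": {"deployment": {"href": "/v4/deployments/0b37cf8a-ee3a-43cb-8797-05b38c8ef2a2"}},
        },
        {
            "metadata": {"guid": "7f8dd9e4-0909-4358-bdc5-51652152bd48"},
            "entity": {"deployment": {"href": "/v4/deployments/49c0575f-1460-429e-b326-d9a81c7d82e3"}},
        },
    ]
}

matching = jobs_for_deployment(sample_jobs, "49c0575f-1460-429e-b326-d9a81c7d82e3")
print([job["metadata"]["guid"] for job in matching])
```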

In [17]:
print(json.dumps(job_details, indent=2))

{
  "metadata": {
    "guid": "7f8dd9e4-0909-4358-bdc5-51652152bd48",
    "href": "/v4/deployment_jobs/7f8dd9e4-0909-4358-bdc5-51652152bd48",
    "created_at": "2020-05-20T18:59:01.742Z",
    "parent": {
      "href": ""
    }
  },
  "entity": {
    "deployment": {
      "href": "/v4/deployments/49c0575f-1460-429e-b326-d9a81c7d82e3"
    },
    "scoring": {
      "input_data": [
        {
          "fields": [
            "CHECKINGSTATUS",
            "LOANDURATION",
            "CREDITHISTORY",
            "LOANPURPOSE",
            "LOANAMOUNT",
            "EXISTINGSAVINGS",
            "EMPLOYMENTDURATION",
            "INSTALLMENTPERCENT",
            "SEX",
            "OTHERSONLOAN",
            "CURRENTRESIDENCEDURATION",
            "OWNSPROPERTY",
            "AGE",
            "INSTALLMENTPLANS",
            "HOUSING",
            "EXISTINGCREDITSCOUNT",
            "JOB",
            "DEPENDENTS",
            "TELEPHONE",
            "FOREIGNWORKER"
          ],
          "v

## Congratulations, you have created and submitted a job for batch scoring!