# Batch Scoring on IBM Cloud Pak for Data (CP4D)

This notebook creates and runs a batch scoring job against a model that has already been created and deployed to the Watson Machine Learning (WML) instance on Cloud Pak for Data (CP4D).

## 1.0 Install required packages


The main Python package this notebook needs is the Watson Machine Learning (WML) client library, which lets the notebook interact with the WML service. The cell below installs or upgrades it. The package may already be installed by default on CP4D.

- WML Client: http://ibm-wml-api-pyclient.mybluemix.net/
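If you want to confirm whether the client is already present before installing, a standard-library check is enough. This is a minimal sketch; `is_package_installed` is a hypothetical helper, not part of the WML client. Note that the import name (`ibm_watson_machine_learning`) differs from the pip name (`ibm-watson-machine-learning`).

```python
import importlib.util


def is_package_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found by this kernel's importer."""
    return importlib.util.find_spec(module_name) is not None


# The WML client is imported as ibm_watson_machine_learning (pip name: ibm-watson-machine-learning).
if is_package_installed("ibm_watson_machine_learning"):
    print("ibm_watson_machine_learning is already available")
else:
    print("ibm_watson_machine_learning not found; run the pip install cell")
```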

In [23]:
import warnings
warnings.filterwarnings('ignore')

In [24]:
!pip install --user ibm-watson-machine-learning --upgrade | tail -n 1



In [25]:
import json
from ibm_watson_machine_learning import APIClient

## 2.0 Create Batch Deployment Job

### 2.1 Instantiate Watson Machine Learning Client

To interact with the local Watson Machine Learning instance, we use the WML Python client (SDK).

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

<font color='red'>Replace the `username` and `password` values of `************` with your Cloud Pak for Data `username` and `password`. The value for `url` should match the URL of your Cloud Pak for Data cluster, which you can copy from the browser address bar (be sure to include the `https://`).</font> The credentials should look something like this (these are example values, not the ones you will use):

```
wml_credentials = {
    "url": "https://zen.clusterid.us-south.containers.appdomain.cloud",
    "username": "cp4duser",
    "password": "cp4dpass",
    "instance_id": "openshift",
    "version": "4.0"
}
```
#### NOTE: Make sure that there is no trailing forward slash `/` in the `url`
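Because a trailing `/` in `url` is easy to miss, you could normalize the URL and check the credentials dictionary before creating the client. This is a minimal sketch that assumes the username/password style of credentials used in this notebook; `normalize_cp4d_url`, `check_credentials`, and the `REQUIRED_KEYS` list are hypothetical helpers, not part of the WML client.

```python
from urllib.parse import urlparse

# Assumed set of keys for username/password authentication, matching this notebook.
REQUIRED_KEYS = ("url", "username", "password", "instance_id", "version")


def normalize_cp4d_url(url: str) -> str:
    """Strip whitespace and trailing '/' characters, and require an https:// URL."""
    url = url.strip().rstrip("/")
    if urlparse(url).scheme != "https":
        raise ValueError(f"Expected an https:// URL, got: {url!r}")
    return url


def check_credentials(creds: dict) -> dict:
    """Return a copy of creds with a normalized URL; fail fast if a value is missing."""
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError(f"Missing credential values: {missing}")
    checked = dict(creds)
    checked["url"] = normalize_cp4d_url(checked["url"])
    return checked


# Example values only, in the same shape as the credentials shown above.
example = {
    "url": "https://zen.clusterid.us-south.containers.appdomain.cloud/",
    "username": "cp4duser",
    "password": "cp4dpass",
    "instance_id": "openshift",
    "version": "4.0",
}
print(check_credentials(example)["url"])
```

With the real client, you could then pass `check_credentials(wml_credentials)` to `APIClient` instead of the raw dictionary.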

In [26]:
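# Be sure to update these credentials before running the cell.
import os

# Inside a CP4D notebook runtime, the cluster URL is available in the
# RUNTIME_ENV_APSX_URL environment variable, so we read it from there.
location = os.environ['RUNTIME_ENV_APSX_URL']
print(location)

wml_credentials = {
    "url": location,
    "username": "************",   # your CP4D username
    "password": "************",   # your CP4D password
    "instance_id": "openshift",
    "version": "4.0"
}

wml_client = APIClient(wml_credentials)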

https://internal-nginx-svc:12443
Python 3.7 and 3.8 frameworks are deprecated and will be removed in a future release. Use Python 3.9 framework instead.


In [27]:
wml_client.spaces.list()

Note: 'limit' is not provided. Only first 50 records will be displayed if the number of records exceed 50
------------------------------------  -------------------------------------------------------------------  ------------------------
ID                                    NAME                                                                 CREATED
7ff7a9a9-92a2-4c8c-b804-95f5d679b7c1  openscale-express-path-preprod-00000000-0000-0000-0000-000000000000  2022-03-28T10:58:29.610Z
3cdf33f2-8203-47d4-8bf5-3e9e778309e2  openscale-express-path-00000000-0000-0000-0000-000000000000          2022-03-28T10:58:12.112Z
42667a87-7277-420b-9cd2-1f59ba49cfba  Credit_Risk_Deployment                                               2022-03-28T09:50:42.117Z
------------------------------------  -------------------------------------------------------------------  ------------------------


### 2.2 Find Deployment Space

Next, we look up the `ID` of the deployment space you want to use and set it as the client's default space.

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

- Update the variable with the name of the deployment space where you created the batch deployment.
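The space lookup is a linear search over `get_details()['resources']`. Factored into a reusable function, it might look like the sketch below. `find_id_by_name` is a hypothetical helper, and the sample records are trimmed to the two fields the search reads (the names and IDs come from the space listing above). Since the deployment details use the same `entity.name` / `metadata.id` layout, the helper should also fit the lookup in section 2.3, though that is an assumption worth checking against your client version.

```python
def find_id_by_name(resources, name):
    """Return metadata.id of the first resource whose entity.name equals name, else None.

    `resources` is assumed to look like get_details()['resources']: a list of dicts
    with an 'entity' dict holding 'name' and a 'metadata' dict holding 'id'.
    """
    matches = [r["metadata"]["id"] for r in resources if r["entity"].get("name") == name]
    if len(matches) > 1:
        # Names are not guaranteed to be unique, so warn instead of silently picking one.
        print(f"WARNING: {len(matches)} resources are named {name!r}; using the first one.")
    return matches[0] if matches else None


# Sample records shaped like the spaces listed above, trimmed to the fields used.
sample_spaces = [
    {"entity": {"name": "openscale-express-path-00000000-0000-0000-0000-000000000000"},
     "metadata": {"id": "3cdf33f2-8203-47d4-8bf5-3e9e778309e2"}},
    {"entity": {"name": "Credit_Risk_Deployment"},
     "metadata": {"id": "42667a87-7277-420b-9cd2-1f59ba49cfba"}},
]
print(find_id_by_name(sample_spaces, "Credit_Risk_Deployment"))
```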

In [28]:
# Be sure to update the name of the space with the one you want to use.
DEPLOYMENT_SPACE_NAME = 'Credit_Risk_Deployment'

In [29]:
all_spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in all_spaces:
    if space['entity']['name'] == DEPLOYMENT_SPACE_NAME:
        space_id = space["metadata"]["id"]
        print("\nDeployment Space ID: ", space_id)
        break

if space_id is None:
    print("WARNING: Your space does not exist. Create a deployment space before proceeding.")
    # We could create the space programmatically instead:
    #space_id = wml_client.spaces.store(meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: DEPLOYMENT_SPACE_NAME})["metadata"]["id"]


Deployment Space ID:  42667a87-7277-420b-9cd2-1f59ba49cfba


In [30]:
# Now set the default space to the ID for your deployment space. If this is successful, you will see a 'SUCCESS' message.
wml_client.set.default_space(space_id)

'SUCCESS'

In [31]:
# These are the models and deployments we currently have in our deployment space.
wml_client.repository.list_models()
wml_client.deployments.list()

------------------------------------  --------------------  ------------------------  ---------
ID                                    NAME                  CREATED                   TYPE
392a3176-c0fd-4c7a-9ade-16ea5272550c  Credit_Risk_Model_JN  2022-03-28T09:58:09.002Z  mllib_3.0
------------------------------------  --------------------  ------------------------  ---------
------------------------------------  ------------------  -----  ------------------------
GUID                                  NAME                STATE  CREATED
5b88a8bf-9dcd-4797-bfb7-24e3b8298177  CreditRisk_batch    ready  2022-03-28T10:22:46.068Z
7f3a7083-0bd1-4af8-bc1c-bf4f7528307f  Credit_Risk_Online  ready  2022-03-28T10:03:23.642Z
------------------------------------  ------------------  -----  ------------------------


### 2.3 Find Batch Deployment

Next, we look up the batch deployment that was created earlier.

<font color=red>**<< UPDATE THE VARIABLES BELOW >>**</font>

- Update the variable with the name of the batch deployment you created previously.
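Finding a deployment by name does not guarantee that it is a batch deployment; the online deployment `Credit_Risk_Online` in this space, for example, is not meant for batch jobs. A small check can catch that early. This sketch assumes the WML v4 details layout, in which batch deployments carry a `batch` key under `entity` (as in the `CreditRisk_batch` details printed by the lookup cell) and online deployments carry an `online` key instead; `is_batch_deployment` and `is_ready` are hypothetical helpers.

```python
def is_batch_deployment(details: dict) -> bool:
    """True if the deployment details contain a 'batch' section under 'entity' (assumed layout)."""
    return "batch" in details.get("entity", {})


def is_ready(details: dict) -> bool:
    """True if entity.status.state is 'ready'."""
    return details.get("entity", {}).get("status", {}).get("state") == "ready"


# Trimmed copy of the CreditRisk_batch details printed in this notebook.
batch_details = {"entity": {"batch": {}, "name": "CreditRisk_batch", "status": {"state": "ready"}}}
# Hypothetical trimmed shape for an online deployment.
online_details = {"entity": {"online": {}, "name": "Credit_Risk_Online", "status": {"state": "ready"}}}

print(is_batch_deployment(batch_details))   # True
print(is_batch_deployment(online_details))  # False
```

With the real client, `is_batch_deployment(deployment_details)` could be checked right after the lookup loop.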

In [32]:
DEPLOYMENT_NAME = 'CreditRisk_batch'

In [33]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
deployment_details = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['id']
        deployment_details = deployment
        break

# Without this guard, get_details(None) below would return every deployment in the space.
if deployment_uid is None:
    raise ValueError("No deployment named '{}' was found in this space.".format(DEPLOYMENT_NAME))

print("Deployment id: {}".format(deployment_uid))
wml_client.deployments.get_details(deployment_uid)

Deployment id: 5b88a8bf-9dcd-4797-bfb7-24e3b8298177


{'entity': {'asset': {'id': '392a3176-c0fd-4c7a-9ade-16ea5272550c'},
  'batch': {},
  'custom': {},
  'deployed_asset_type': 'model',
  'hardware_spec': {'id': 'f3ebac7d-0a75-410c-8b48-a931428cc4c5',
   'name': 'XS',
   'num_nodes': 1},
  'name': 'CreditRisk_batch',
  'space_id': '42667a87-7277-420b-9cd2-1f59ba49cfba',
  'status': {'state': 'ready'}},
 'metadata': {'created_at': '2022-03-28T10:22:46.068Z',
  'id': '5b88a8bf-9dcd-4797-bfb7-24e3b8298177',
  'modified_at': '2022-03-28T10:22:46.068Z',
  'name': 'CreditRisk_batch',
  'owner': '1000330999',
  'space_id': '42667a87-7277-420b-9cd2-1f59ba49cfba'}}

In [34]:
os.getcwd()

'/userfs/assets/notebooks'

### 2.4 Get Batch Test Data

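Next, we load a small sample data set to score with the batch deployment. The `CustomerID` column, if present, is dropped because it is an identifier rather than one of the model's input fields.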
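Section 2.5 sends this data as an inline payload: a list of column names (`fields`) plus one list of values per row (`values`). To show that structure without pandas, here is a minimal sketch assuming a CSV file with a header row. `csv_to_inline_payload` is a hypothetical helper, and the sample `CustomerID` values are made up; the other cells are taken from the first and third rows of the preview below.

```python
import csv
import io


def csv_to_inline_payload(csv_text, drop_columns=("CustomerID",)):
    """Build a {'fields': [...], 'values': [[...], ...]} dict from CSV text.

    Plain integer cells are converted to int so numeric features are not sent
    as strings; everything else (e.g. '0_to_200') stays a string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [f for f in reader.fieldnames if f not in drop_columns]

    def convert(cell):
        digits = cell[1:] if cell.startswith("-") else cell
        return int(cell) if digits.isdecimal() else cell

    values = [[convert(row[f]) for f in fields] for row in reader]
    return {"fields": fields, "values": values}


# Hypothetical CSV excerpt; CustomerID is included only to show it being dropped.
sample_csv = "CustomerID,CheckingStatus,LoanDuration\n1,0_to_200,4\n2,greater_200,19\n"
print(csv_to_inline_payload(sample_csv))
```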

In [35]:
import pandas as pd

# Load the sample batch file and drop the identifier column if it is present.
batch_set = pd.read_csv('/userfs/assets/data_asset/German-Credit-Risk-SmallBatchSet.csv')
batch_set = batch_set.drop('CustomerID', axis=1, errors='ignore')
batch_set.head()

| | CheckingStatus | LoanDuration | CreditHistory | LoanPurpose | LoanAmount | ExistingSavings | EmploymentDuration | InstallmentPercent | Sex | OthersOnLoan | CurrentResidenceDuration | OwnsProperty | Age | InstallmentPlans | Housing | ExistingCreditsCount | Job | Dependents | Telephone | ForeignWorker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_to_200 | 4 | all_credits_paid_back | car_new | 250 | 100_to_500 | less_1 | 2 | female | none | 3 | real_estate | 26 | bank | rent | 1 | unskilled | 1 | none | yes |
| 1 | 0_to_200 | 14 | credits_paid_to_date | car_new | 3148 | less_100 | 1_to_4 | 3 | male | none | 3 | car_other | 41 | none | own | 2 | skilled | 1 | none | yes |
| 2 | greater_200 | 19 | credits_paid_to_date | radio_tv | 5351 | 100_to_500 | greater_7 | 4 | male | none | 3 | savings_insurance | 49 | none | own | 2 | skilled | 1 | yes | yes |
| 3 | greater_200 | 34 | outstanding_credit | other | 5790 | 500_to_1000 | greater_7 | 5 | male | none | 4 | car_other | 44 | stores | own | 2 | unskilled | 1 | yes | yes |
| 4 | less_0 | 4 | all_credits_paid_back | car_new | 250 | 100_to_500 | less_1 | 1 | female | none | 1 | real_estate | 21 | bank | rent | 1 | unskilled | 1 | none | yes |


### 2.5 Create Job

We can now use the deployment ID and the test data to create a new job against our batch deployment. We submit the data as an inline payload and have the results (i.e., the predictions) written to a CSV file stored as a data asset in the deployment space.
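The payload passed to `create_job` is an ordinary dictionary. Written out with plain keys, it has roughly the shape below. This sketch assumes that `ScoringMetaNames.INPUT_DATA` and `ScoringMetaNames.OUTPUT_DATA_REFERENCE` resolve to `input_data` and `output_data_reference` (the job details in section 3.1 do show `input_data` under `scoring`). `build_job_payload` is a hypothetical helper, and the timestamp in the example call is illustrative.

```python
import time


def build_job_payload(fields, values, deployment_id, timestr=None):
    """Return a batch-job payload with inline input and a CSV data-asset output.

    The literal keys 'input_data' and 'output_data_reference' are assumed to be
    what the client's ScoringMetaNames constants stand for.
    """
    timestr = timestr or time.strftime("%Y%m%d_%H%M%S")
    return {
        "input_data": [{"fields": list(fields), "values": [list(v) for v in values]}],
        "output_data_reference": {
            "type": "data_asset",
            "connection": {},
            "location": {
                # Timestamp plus deployment id keep output names from colliding across runs.
                "name": f"batchres_{timestr}_{deployment_id}.csv",
                "description": "results",
            },
        },
    }


payload = build_job_payload(
    ["CheckingStatus", "LoanDuration"],
    [["0_to_200", 4]],
    "5b88a8bf-9dcd-4797-bfb7-24e3b8298177",
    timestr="20220328_000000",  # illustrative timestamp
)
print(payload["output_data_reference"]["location"]["name"])
```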

In [36]:
import time

# Timestamp used to give each job's output file a unique name.
timestr = time.strftime("%Y%m%d_%H%M%S")
job_payload = {
    # Inline input: the column names plus one list of values per row.
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
        'fields': batch_set.columns.values.tolist(),
        'values': batch_set.values.tolist()
    }],
    # Write the predictions to a CSV data asset in the deployment space.
    wml_client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {
            "name": "batchres_{}_{}.csv".format(timestr, deployment_uid),
            "description": "results"
        }
    }
}

# Submit the job; it is queued and runs asynchronously.
job = wml_client.deployments.create_job(deployment_id=deployment_uid, meta_props=job_payload)
job_uid = wml_client.deployments.get_job_uid(job)

print('Job uid = {}'.format(job_uid))

Job uid = 8ba5684c-8bfe-4adb-9c55-878f016c3b76


In [37]:
wml_client.deployments.list_jobs()

------------------------------------  ---------  ------------------------  ------------------------------------
JOB-UID                               STATE      CREATED                   DEPLOYMENT-ID
8ba5684c-8bfe-4adb-9c55-878f016c3b76  queued     2022-03-28T18:07:20.782Z  5b88a8bf-9dcd-4797-bfb7-24e3b8298177
23a65196-107d-4572-9372-a26f65549130  completed  2022-03-28T10:34:01.412Z  5b88a8bf-9dcd-4797-bfb7-24e3b8298177
------------------------------------  ---------  ------------------------  ------------------------------------


## 3.0 Monitor Batch Job Status

Batch jobs run asynchronously. We can use the job ID to track progress: below, we poll the job status every 5 seconds until the job completes, fails, or is canceled.
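The polling loop in the next cell has no upper bound on how long it waits. A variant with a timeout, and with the status source passed in as a function so it can be exercised offline, might look like this. It is a sketch: `poll_until_done` is a hypothetical helper, and the set of terminal states (`completed`, `failed`, `canceled`) is an assumption about WML job states.

```python
import time

# Assumed terminal job states.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def poll_until_done(get_status, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until its 'state' is terminal or `timeout` seconds pass.

    Returns the last status dict; raises TimeoutError if the job is still running.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status.get('state')!r} after {timeout} s")
        sleep(interval)


# Offline demo: a fake status source that reports 'queued' twice, then 'completed'.
fake_states = iter(["queued", "queued", "completed"])
result = poll_until_done(lambda: {"state": next(fake_states)}, sleep=lambda s: None)
print(result)  # {'state': 'completed'}
```

With the client from this notebook, the call could look like `poll_until_done(lambda: wml_client.deployments.get_job_status(job_uid))`.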

In [19]:
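import time

def poll_async_job(client, job_uid, interval=5):
    """Poll the job status until it reaches a terminal state, then return the job details."""
    while True:
        job_status = client.deployments.get_job_status(job_uid)
        print(job_status)
        state = job_status['state']
        # Stop on success, failure, or cancellation so the loop cannot spin forever.
        if state == 'completed' or 'fail' in state or 'cancel' in state:
            return client.deployments.get_job_details(job_uid)
        time.sleep(interval)

job_details = poll_async_job(wml_client, job_uid)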

{'completed_at': '', 'running_at': '', 'state': 'queued'}
{'completed_at': '', 'running_at': '', 'state': 'queued'}
{'completed_at': '', 'running_at': '', 'state': 'queued'}
{'completed_at': '', 'running_at': '', 'state': 'queued'}
{'completed_at': '2022-03-28T10:34:41.000Z', 'running_at': '2022-03-28T10:34:39.000Z', 'state': 'completed'}


In [20]:
wml_client.deployments.list_jobs()

------------------------------------  ---------  ------------------------  ------------------------------------
JOB-UID                               STATE      CREATED                   DEPLOYMENT-ID
23a65196-107d-4572-9372-a26f65549130  completed  2022-03-28T10:34:01.412Z  5b88a8bf-9dcd-4797-bfb7-24e3b8298177
------------------------------------  ---------  ------------------------  ------------------------------------


### 3.1 Check Results

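With the job complete, we can inspect its details. Because the job writes its output to a data asset, the predictions themselves are stored in the CSV file named in the job's output data reference rather than returned inline.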
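Since the predictions go to a data asset, the part of the job details you usually need is the output file name. A hedged sketch: `output_asset_name` is a hypothetical helper, the path `entity` / `scoring` / `output_data_reference` / `location` / `name` is an assumption that mirrors the payload submitted in section 2.5, and the sample dictionary is illustrative rather than real job output.

```python
def output_asset_name(job_details: dict):
    """Return the output file name recorded in a single job's details, or None if absent."""
    ref = (job_details.get("entity", {})
                      .get("scoring", {})
                      .get("output_data_reference", {}))
    return ref.get("location", {}).get("name")


# Illustrative single-job details, trimmed to the assumed path.
sample_job = {
    "entity": {
        "scoring": {
            "output_data_reference": {
                "type": "data_asset",
                "location": {"name": "batchres_20220328_000000_5b88a8bf-9dcd-4797-bfb7-24e3b8298177.csv"},
            }
        }
    }
}
print(output_asset_name(sample_job))
```

For example, you could call `output_asset_name(job_details)` on the dictionary returned by `poll_async_job`, then look for that file among the space's data assets (e.g. with `wml_client.data_assets.list()`).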

In [21]:
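# With no argument, this returns the details of every job in the space;
# pass a job id (e.g. job_uid) to get the details of a single job.
wml_client.deployments.get_job_details()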

{'resources': [{'entity': {'deployment': {'id': '5b88a8bf-9dcd-4797-bfb7-24e3b8298177'},
    'platform_job': {'job_id': '6e1d7556-e00a-43fe-83b2-4efdf4ae3b1a',
     'run_id': '24792940-95ad-478d-a7b5-9849d01251e9'},
    'scoring': {'input_data': [{'fields': ['CheckingStatus',
        'LoanDuration',
        'CreditHistory',
        'LoanPurpose',
        'LoanAmount',
        'ExistingSavings',
        'EmploymentDuration',
        'InstallmentPercent',
        'Sex',
        'OthersOnLoan',
        'CurrentResidenceDuration',
        'OwnsProperty',
        'Age',
        'InstallmentPlans',
        'Housing',
        'ExistingCreditsCount',
        'Job',
        'Dependents',
        'Telephone',
        'ForeignWorker'],
       'values': [['0_to_200',
         4,
         'all_credits_paid_back',
         'car_new',
         250,
         '100_to_500',
         'less_1',
         2,
         'female',
         'none',
         3,
         'real_estate',
         26,
         'bank'

In [22]:
print(json.dumps(job_details, indent=2))

{
  "entity": {
    "deployment": {
      "id": "5b88a8bf-9dcd-4797-bfb7-24e3b8298177"
    },
    "platform_job": {
      "job_id": "6e1d7556-e00a-43fe-83b2-4efdf4ae3b1a",
      "run_id": "24792940-95ad-478d-a7b5-9849d01251e9"
    },
    "scoring": {
      "input_data": [
        {
          "fields": [
            "CheckingStatus",
            "LoanDuration",
            "CreditHistory",
            "LoanPurpose",
            "LoanAmount",
            "ExistingSavings",
            "EmploymentDuration",
            "InstallmentPercent",
            "Sex",
            "OthersOnLoan",
            "CurrentResidenceDuration",
            "OwnsProperty",
            "Age",
            "InstallmentPlans",
            "Housing",
            "ExistingCreditsCount",
            "Job",
            "Dependents",
            "Telephone",
            "ForeignWorker"
          ],
          "values": [
            [
              "0_to_200",
              4,
              "all_credits_paid_back",
  

## Congratulations, you have created and submitted a job for batch scoring!