<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook trains, creates, and deploys a Credit Risk model. It then configures OpenScale to monitor drift in data and accuracy, and injects sample payloads so that the results can be viewed in the OpenScale Insights dashboard.

### Contents

- [1. Setup](#setup)
- [2. Model building and deployment](#model)
- [3. OpenScale configuration](#openscale)
- [4. Generate drift model](#driftmodel)
- [5. Submit payload](#payload)
- [6. Enable drift monitoring](#monitor)
- [7. Run drift monitor](#driftrun)

# 1.0 Setup <a name="setup"></a>

## 1.1 Package installation

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [45]:
!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade ibm-wos-utils | tail -n 1
!pip install scikit-learn==0.20.2 | tail -n 1

ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: 'INSTALLER'
    Uninstalling ibm-ai-openscale-2.1.21:

### Action: restart the kernel!
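
After the restart, it can be worth confirming which package versions are actually installed, since the `pip` output above reports an error. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical; the distribution names are taken from the install commands above.

```python
from importlib import metadata

def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

# Distribution names as used in the pip commands above.
for name in ("watson-machine-learning-client-V4", "ibm-ai-openscale",
             "ibm-wos-utils", "scikit-learn"):
    print(name, installed_version(name))
```

A `None` result means the package is not visible to the current kernel, which would suggest the install did not complete.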

## 1.2 Configure credentials

- WOS_CREDENTIALS (ICP)
- WML_CREDENTIALS (ICP)

The `url` value for `WOS_CREDENTIALS` is the URL of the CP4D cluster, for example `https://zen-cpd-zen.apps.com`.
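
If you would rather not hardcode credentials in the notebook, one option is to read them from environment variables. This is a sketch under assumptions: the variable names `CP4D_URL`, `CP4D_USERNAME`, and `CP4D_PASSWORD` and the helper `build_credentials` are hypothetical and not part of any IBM library.

```python
import os

def build_credentials(env=None):
    """Build a credentials dict from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    required = ("CP4D_URL", "CP4D_USERNAME", "CP4D_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        # Strip a trailing slash so the cluster URL has a consistent form.
        "url": env["CP4D_URL"].rstrip("/"),
        "username": env["CP4D_USERNAME"],
        "password": env["CP4D_PASSWORD"],
    }
```

In the notebook, `WOS_CREDENTIALS = build_credentials()` would then take the place of the literal dictionary.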

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
WOS_CREDENTIALS = {
    "url": "https://zen1-cpd-zen1.aida-cpd3-dal10-b3c32x128-f2c6cdc6801be85fd188b09d006f13e3-0001.us-south.containers.appdomain.cloud",
    "username": "scottda-noadmin",
    "password": "<redacted>"
}

In [6]:
WML_CREDENTIALS = WOS_CREDENTIALS.copy()
WML_CREDENTIALS['instance_id']='openshift'
WML_CREDENTIALS['version']='3.0.0'

In [1]:
%store -r MODEL_NAME
%store -r DEPLOYMENT_NAME
%store -r DEFAULT_SPACE

no stored variable or alias MODEL_NAME
no stored variable or alias DEPLOYMENT_NAME
no stored variable or alias DEFAULT_SPACE


In [7]:
MODEL_NAME = "sda-model-8-3-2020"
DEPLOYMENT_NAME = "scottda-model-deployment-8-3-2020"
DEFAULT_SPACE = "3f2c0a53-e1bb-4e09-957d-d7b84c16a56f"

# 2.0 Load the training data

In [7]:
!rm -rf german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/scottdangelo/credit-risk-workshop-cpd/addData/data/openscale/german_credit_data_biased_training.csv

--2020-08-14 15:43:22--  https://raw.githubusercontent.com/scottdangelo/credit-risk-workshop-cpd/addData/data/openscale/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv’


2020-08-14 15:43:22 (31.6 MB/s) - ‘german_credit_data_biased_training.csv’ saved [689622/689622]



In [8]:
import pandas as pd
#!rm -rf german_credit_data_biased_training.csv
#!wget https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/data/german_credit_data_biased_training.csv  -O german_credit_data_biased_training.csv
    
!ls -lh german_credit_data_biased_training.csv

data_df = pd.read_csv('german_credit_data_biased_training.csv', sep=",", header=0)


-rw-r-----. 1 wsuser watsonstudio 674K Aug 14 15:43 german_credit_data_biased_training.csv


In [9]:
data_df.head()

| | CheckingStatus | LoanDuration | CreditHistory | LoanPurpose | LoanAmount | ExistingSavings | EmploymentDuration | InstallmentPercent | Sex | OthersOnLoan | ... | OwnsProperty | Age | InstallmentPlans | Housing | ExistingCreditsCount | Job | Dependents | Telephone | ForeignWorker | Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_to_200 | 31 | credits_paid_to_date | other | 1889 | 100_to_500 | less_1 | 3 | female | none | ... | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| 1 | less_0 | 18 | credits_paid_to_date | car_new | 462 | less_100 | 1_to_4 | 2 | female | none | ... | savings_insurance | 37 | stores | own | 2 | skilled | 1 | none | yes | No Risk |
| 2 | less_0 | 15 | prior_payments_delayed | furniture | 250 | less_100 | 1_to_4 | 2 | male | none | ... | real_estate | 28 | none | own | 2 | skilled | 1 | yes | no | No Risk |
| 3 | 0_to_200 | 28 | credits_paid_to_date | retraining | 3693 | less_100 | greater_7 | 3 | male | none | ... | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| 4 | no_checking | 28 | prior_payments_delayed | education | 6235 | 500_to_1000 | greater_7 | 3 | male | none | ... | unknown | 57 | none | own | 2 | skilled | 1 | none | yes | Risk |


# 3.0 Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [10]:
from ibm_ai_openscale import APIClient4ICP
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

In [11]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import json

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

In [12]:
ai_client = APIClient4ICP(WOS_CREDENTIALS)
ai_client.version

'2.1.21'

In [13]:
subscription = None

# Look up the existing OpenScale subscription whose asset name matches MODEL_NAME
for sub in ai_client.data_mart.subscriptions.get_uids():
    if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
        print("Found existing subscription.")
        subscription = ai_client.data_mart.subscriptions.get(sub)
        break
if subscription is None:
    print("No subscription found. Please run openscale-initial-setup.ipynb to configure.")

Found existing subscription.


### Set Deployment UID

In [14]:
wml_client.set.default_space(DEFAULT_SPACE)

'SUCCESS'

In [15]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    print(deployment['entity']['name'])
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break
        
print(deployment_uid)

WOS-INTERNAL-e11cebdc-d53f-4812-b34f-8ce67d9c31a8
scottda-batch-deployment-8-3-2020
scottda-model-deployment-8-3-2020
d435943c-d802-4e3b-b34c-e0f42fa4ee52
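
The lookup above can also be wrapped in a small helper that is testable without a live cluster. This is a sketch only: `find_deployment_guid` is a hypothetical name, and the mocked dictionary merely mirrors the `resources` / `entity` / `metadata` keys the cell above reads. The first guid in the mock is made up.

```python
def find_deployment_guid(deployments, name):
    """Return the guid of the first deployment whose entity name matches, else None."""
    for deployment in deployments.get("resources", []):
        if deployment["entity"]["name"] == name:
            return deployment["metadata"]["guid"]
    return None

# Mocked details using deployment names from the output above.
sample = {"resources": [
    {"entity": {"name": "scottda-batch-deployment-8-3-2020"},
     "metadata": {"guid": "hypothetical-guid"}},
    {"entity": {"name": "scottda-model-deployment-8-3-2020"},
     "metadata": {"guid": "d435943c-d802-4e3b-b34c-e0f42fa4ee52"}},
]}
print(find_deployment_guid(sample, "scottda-model-deployment-8-3-2020"))
```

Returning `None` for an unknown name makes it straightforward to stop early instead of passing an empty deployment UID to later cells.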


# 4.0 Generate drift model <a name="driftmodel"></a>


Drift monitoring for a WML model requires a trained drift detection model, which must be uploaded manually. You can train, create, and download one using the code below. The complete code is available [here](https://github.com/IBM-Watson/aios-data-distribution/blob/master/training_statistics_notebook.ipynb) (see the section on drift detection model generation).

In [16]:
training_data_info = {
    "class_label": 'Risk',
    "feature_columns": ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    "categorical_columns": ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
}
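
Before training, it may help to check that this dictionary is consistent with the data. The sketch below is an illustration, not part of `ibm_wos_utils`; the helper name `validate_training_info` is hypothetical. It checks that every feature exists in the data, that categorical columns are a subset of the features, and that the label is present but not listed as a feature.

```python
def validate_training_info(info, available_columns):
    """Return a list of problems found in a training_data_info-style dict."""
    problems = []
    features = info["feature_columns"]
    available = set(available_columns)
    for col in features:
        if col not in available:
            problems.append("feature column missing from data: " + col)
    for col in info["categorical_columns"]:
        if col not in features:
            problems.append("categorical column is not a feature: " + col)
    label = info["class_label"]
    if label not in available:
        problems.append("label column missing from data: " + label)
    if label in features:
        problems.append("label column listed as a feature: " + label)
    return problems
```

In this notebook the call would be `validate_training_info(training_data_info, data_df.columns)`; an empty list means no problems were found.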


In [17]:
#Set model_type. Acceptable values are:["binary","multiclass","regression"]
model_type = "binary"
#model_type = "multiclass"
#model_type = "regression"

In [18]:
import numpy as np

def score(training_data_frame):
    # Score the training data against the deployed WML model.
    # Uses the wml_client and deployment_uid defined earlier in this notebook.

    # The label column and the prediction column must have the same data type
    # and the same set of unique class labels.
    prediction_column_name = "predictedLabel"
    probability_column_name = "probability"

    feature_columns = list(training_data_frame.columns)
    training_data_rows = training_data_frame[feature_columns].values.tolist()

    payload_scoring = {
        wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
            "fields": feature_columns,
            "values": training_data_rows
        }]
    }

    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    score_predictions = scoring_response.get('predictions')[0]
    response_fields = list(score_predictions.get('fields'))

    # list.index raises ValueError rather than returning -1, so test membership first
    if probability_column_name not in response_fields or prediction_column_name not in response_fields:
        raise Exception("Missing prediction/probability column in the scoring response")

    prob_col_index = response_fields.index(probability_column_name)
    predict_col_index = response_fields.index(prediction_column_name)

    probability_array = np.array([value[prob_col_index] for value in score_predictions.get('values')])
    prediction_vector = np.array([value[predict_col_index] for value in score_predictions.get('values')])

    return probability_array, prediction_vector
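
The column extraction inside `score` can be tried offline against a mocked response. This is a sketch under assumptions: `extract_columns` is a hypothetical helper, and the mock only imitates the `fields` / `values` layout that `score` reads from the scoring response; the numbers are illustrative, not real model output.

```python
def extract_columns(predictions, column_names):
    """Return {name: [values...]} for the requested columns of one predictions entry."""
    fields = list(predictions["fields"])
    missing = [name for name in column_names if name not in fields]
    if missing:
        raise KeyError("Missing columns in scoring response: " + ", ".join(missing))
    return {
        name: [row[fields.index(name)] for row in predictions["values"]]
        for name in column_names
    }

# Mocked predictions entry with the two columns the scoring function depends on.
mock = {
    "fields": ["probability", "prediction", "predictedLabel"],
    "values": [[[0.8, 0.2], 0.0, "No Risk"], [[0.3, 0.7], 1.0, "Risk"]],
}
cols = extract_columns(mock, ["probability", "predictedLabel"])
print(cols["predictedLabel"])
```

Checking membership before indexing gives a single readable error when a deployment returns differently named output columns.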

In [21]:
# Generate the drift detection model
from ibm_wos_utils.drift.drift_trainer import DriftTrainer

drift_detection_input = {
    "feature_columns": training_data_info.get('feature_columns'),
    "categorical_columns": training_data_info.get('categorical_columns'),
    "label_column": training_data_info.get('class_label'),
    "problem_type": model_type
}

drift_trainer = DriftTrainer(data_df, drift_detection_input)
if model_type != "regression":
    # Note: batch_size can be customized to suit the size of the training data
    drift_trainer.generate_drift_detection_model(score, batch_size=data_df.shape[0])

# Note: two-column constraints are not computed beyond two_column_learner_limit (default 200);
# adjust the value as required
drift_trainer.learn_constraints(two_column_learner_limit=200)
drift_trainer.create_archive()

Scoring training dataframe...: 100%|██████████| 4000/4000 [00:01<00:00, 2504.10rows/s]
Optimising Drift Detection Model...: 100%|██████████| 40/40 [00:49<00:00,  1.36s/models]
Scoring training dataframe...: 100%|██████████| 1000/1000 [00:00<00:00, 1211.60rows/s]
Computing feature stats...: 100%|██████████| 20/20 [00:00<00:00, 221.54features/s]
Learning single feature constraints...: 100%|██████████| 21/21 [00:00<00:00, 1400.68constraints/s]
Learning two feature constraints...: 100%|██████████| 209/209 [00:03<00:00, 67.57constraints/s] 
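
Once the archive has been written, you may want to confirm that it exists and is readable before uploading it. The sketch below uses only the standard-library `tarfile` module; `list_archive_members` is a hypothetical helper, and no assumption is made about what the archive contains.

```python
import tarfile

def list_archive_members(path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return [(member.name, member.size)
                for member in archive.getmembers() if member.isfile()]
```

In this notebook the call would be `list_archive_members("drift_detection_model.tar.gz")`, the default file name used by the download cell that follows.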


In [22]:
# Generate a download link for the drift detection model
from IPython.display import HTML
import base64

def create_download_link_for_ddm(title="Download Drift detection model", filename="drift_detection_model.tar.gz"):
    # Embed the archive in the link as a base64-encoded data URI
    with open(filename, 'rb') as file:
        ddm = file.read()
    b64 = base64.b64encode(ddm)
    payload = b64.decode()

    html = '<a download="{filename}" href="data:text/json;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)

create_download_link_for_ddm()

In [26]:
!rm -rf drift_detection_model.tar.gz
!wget -O drift_detection_model.tar.gz https://github.com/IBM/cpd-intelligent-loan-agent-assets/blob/master/models/drift_detection_model.tar.gz?raw=true

--2020-08-14 14:56:26--  https://github.com/IBM/cpd-intelligent-loan-agent-assets/blob/master/models/drift_detection_model.tar.gz?raw=true
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/IBM/cpd-intelligent-loan-agent-assets/raw/master/models/drift_detection_model.tar.gz [following]
--2020-08-14 14:56:26--  https://github.com/IBM/cpd-intelligent-loan-agent-assets/raw/master/models/drift_detection_model.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/models/drift_detection_model.tar.gz [following]
--2020-08-14 14:56:26--  https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/models/drift_detection_model.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com

# 5.0 Submit payload <a name="payload"></a>

### Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a scoring request to the model before we configure the OpenScale monitors. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture the data going into and coming out of the model.
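
Because the payload log schema is derived from this first request, it can be useful to check that every row has exactly one value per field before sending it. A minimal sketch follows; `build_scoring_payload` is a hypothetical helper, and the default key `"input_data"` is an assumption about the value of `wml_client.deployments.ScoringMetaNames.INPUT_DATA`, so pass that constant explicitly when in doubt.

```python
def build_scoring_payload(fields, values, input_key="input_data"):
    """Validate row widths and wrap fields/values in a scoring payload dict."""
    for i, row in enumerate(values):
        if len(row) != len(fields):
            raise ValueError(
                "row %d has %d values, expected %d" % (i, len(row), len(fields)))
    return {input_key: [{"fields": list(fields), "values": [list(r) for r in values]}]}

# Tiny illustrative payload; the real request uses all 20 feature columns.
payload_example = build_scoring_payload(["LoanDuration", "Age"], [[13, 46], [24, 36]])
print(payload_example)
```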

In [23]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)



print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])

Single record scoring result: 
 fields: ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'LoanPurpose_IX', 'ExistingSavings_IX', 'EmploymentDuration_IX', 'Sex_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'InstallmentPlans_IX', 'Housing_IX', 'Job_IX', 'Telephone_IX', 'ForeignWorker_IX', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, [20, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19], [1.0, 1.0, 1.0, 13.0, 

# 6.0 Enable drift monitoring <a name="monitor"></a>

In [25]:
subscription.drift_monitoring.enable(threshold=0.05, min_records=10, model_path="./drift_detection_model.tar.gz")

{'config_status': {'state': 'finished'},
 'data_drift_enabled': True,
 'drift_threshold': 0.05,
 'is_schedule_enabled': True,
 'min_samples': 10,
 'model_drift_enabled': True,
 'next_scheduled_run_timestamp': '2020-08-14T16:46:46.066992Z',
 'schedule_repeat_interval': 3,
 'schedule_repeat_type': 'hour'}
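
The response above shows the next scheduled run and a repeat interval of 3 hours. As a rough sketch, the run after that can be estimated with the standard library. This assumes that runs repeat at a fixed interval from the reported timestamp, which the output suggests but does not guarantee; `following_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def following_run(config):
    """Estimate the run after the next scheduled one from an enable() response dict."""
    if config["schedule_repeat_type"] != "hour":
        raise ValueError("only hourly schedules are handled in this sketch")
    next_run = datetime.strptime(
        config["next_scheduled_run_timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return next_run + timedelta(hours=config["schedule_repeat_interval"])

# Values copied from the response above.
config = {
    "next_scheduled_run_timestamp": "2020-08-14T16:46:46.066992Z",
    "schedule_repeat_interval": 3,
    "schedule_repeat_type": "hour",
}
print(following_run(config).isoformat())
```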

# 7.0 Run drift monitor on demand <a name="driftrun"></a>

In [26]:
!rm -f german_credit_feed.json
!wget https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/data/german_credit_feed.json

--2020-08-14 16:44:59--  https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/data/german_credit_feed.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3076548 (2.9M) [text/plain]
Saving to: ‘german_credit_feed.json’


2020-08-14 16:44:59 (60.1 MB/s) - ‘german_credit_feed.json’ saved [3076548/3076548]



In [27]:
import random

with open('german_credit_feed.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(10):
    current = random.choice(scoring_data['values'])
    # Set Age (column index 12) to 100 in every row to increase the drift shown on the dashboard
    current[12] = 100
    values.append(current)
payload_scoring = {"fields": fields, "values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)
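
The cell above overrides the age by position. A variant that looks the column up by name and copies each sampled row, so the loaded data is left unchanged, could look like the sketch below. `sample_with_override` is a hypothetical helper, and it assumes the feed file lists `Age` among its fields.

```python
import random

def sample_with_override(fields, rows, column, value, n, seed=None):
    """Pick n random rows and return copies with one column overridden."""
    index = fields.index(column)  # raises ValueError if the column is absent
    rng = random.Random(seed)
    sampled = []
    for _ in range(n):
        row = list(rng.choice(rows))  # copy so the source rows are not modified
        row[index] = value
        sampled.append(row)
    return sampled
```

In this notebook the call would be `values = sample_with_override(fields, scoring_data['values'], "Age", 100, 10)`.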

In [29]:
drift_run_details = subscription.drift_monitoring.run(background_mode=False)




 Waiting for end of drift monitoring run  

RUNNING..........
COMPLETED

---------------------------
 Successfully finished run 
---------------------------


In [30]:
subscription.drift_monitoring.get_table_content()


| | ts | id | measurement_id | value | lower limit | upper limit | tags | binding_id | subscription_id | deployment_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-08-14 16:49:16.712972+00:00 | data_drift_magnitude | 2e957090-f629-47fb-96f1-6085607a3fe0 | 0.055006 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 1 | 2020-08-14 16:49:16.712972+00:00 | drift_magnitude | 2e957090-f629-47fb-96f1-6085607a3fe0 | 0.029436 | | 0.05 | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 2 | 2020-08-14 16:49:16.712972+00:00 | predicted_accuracy | 2e957090-f629-47fb-96f1-6085607a3fe0 | 0.776564 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 3 | 2020-08-14 14:58:42.298582+00:00 | data_drift_magnitude | d0da9fa6-2eff-4097-9c52-05f0d9b05468 | 0.055201 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 4 | 2020-08-14 14:58:42.298582+00:00 | drift_magnitude | d0da9fa6-2eff-4097-9c52-05f0d9b05468 | 0.021861 | | 0.05 | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 5 | 2020-08-14 14:58:42.298582+00:00 | predicted_accuracy | d0da9fa6-2eff-4097-9c52-05f0d9b05468 | 0.784139 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 6 | 2020-08-14 14:58:16.264871+00:00 | data_drift_magnitude | abd587e2-7bf8-482d-930a-8bf4bc56ef7b | 0.055201 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 7 | 2020-08-14 14:58:16.264871+00:00 | drift_magnitude | abd587e2-7bf8-482d-930a-8bf4bc56ef7b | 0.021861 | | 0.05 | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
| 8 | 2020-08-14 14:58:16.264871+00:00 | predicted_accuracy | abd587e2-7bf8-482d-930a-8bf4bc56ef7b | 0.784139 | | | | 3f65b02c-ad55-4e51-bdf3-5bb87275cd04 | f978ce05-e4f6-42b0-81b2-be06884cd358 | d435943c-d802-4e3b-b34c-e0f42fa4ee52 |
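
To read the table programmatically, each measurement can be compared with its upper limit where one is defined. The sketch below works on plain dictionaries rather than the object returned by `get_table_content()`; the helper `threshold_violations` and the key names are hypothetical, and the numbers are copied from the most recent run in the table.

```python
def threshold_violations(measurements):
    """Return measurements whose value exceeds a defined upper limit."""
    return [m for m in measurements
            if m.get("upper_limit") is not None and m["value"] > m["upper_limit"]]

# Values from the latest run (2020-08-14 16:49:16) in the table above.
latest = [
    {"id": "data_drift_magnitude", "value": 0.055006, "upper_limit": None},
    {"id": "drift_magnitude", "value": 0.029436, "upper_limit": 0.05},
    {"id": "predicted_accuracy", "value": 0.776564, "upper_limit": None},
]
print(threshold_violations(latest))
```

For this run the list is empty: `drift_magnitude` is 0.029436, below the configured limit of 0.05, and the other two metrics have no limit set.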


## Congratulations!

You have finished running all the cells within the notebook for IBM Watson OpenScale. You can now view the OpenScale dashboard by going to the CP4D `Home` page, and clicking `Services`. Choose the `OpenScale` tile and click the menu to `Open`. Click on the tile for the model you've created to see fairness, accuracy, and performance monitors. Click on the timeseries graph to get detailed information on transactions during a specific time window.

OpenScale shows model performance over time. You have two options to keep data flowing to your OpenScale graphs:
  * Download, configure and schedule the [model feed notebook](https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_scoring_feed.ipynb). This notebook can be set up with your WML credentials, and scheduled to provide a consistent flow of scoring requests to your model, which will appear in your OpenScale monitors.
  * Re-run this notebook. Running this notebook from the beginning will delete and re-create the model and deployment, and re-create the historical data. Please note that the payload and measurement logs for the previous deployment will continue to be stored in your datamart, and can be deleted if necessary.