<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook works with a Credit Risk model that has already been deployed to Watson Machine Learning. It generates a drift detection model, configures OpenScale to monitor drift in data and accuracy, and injects sample payloads so the results can be viewed in the OpenScale Insights dashboard.

### Contents

- [1. Setup](#setup)
- [2. Load the training data](#model)
- [3. OpenScale configuration](#openscale)
- [4. Generate drift model](#driftmodel)
- [5. Submit payload](#payload)
- [6. Enable drift monitoring](#monitor)
- [7. Run drift monitor](#driftrun)

# 1.0 Setup <a name="setup"></a>

## 1.1 Package installation

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade ibm-wos-utils | tail -n 1
!pip install scikit-learn==0.20.2 | tail -n 1

### Action: restart the kernel!

## 1.2 Configure credentials

- WOS_CREDENTIALS
- WML_CREDENTIALS

The `url` for `WOS_CREDENTIALS` is the URL of the CP4D cluster, for example `https://zen-cpd-zen.apps.com`.

In [None]:
import warnings
warnings.filterwarnings('ignore')

<font color='red'>Replace the `************` placeholders for `username` and `password` with your Cloud Pak for Data `username` and `password`. The value for `url` should match the `url` of your Cloud Pak for Data cluster, which you can copy from the browser address bar (be sure to include the `https://`).</font> The credentials should look something like this (these are example values, not the ones you will use):

```
wml_credentials = {
    "url": "https://zen.clusterid.us-south.containers.appdomain.cloud",
    "username": "cp4duser",
    "password": "cp4dpass",
    "instance_id": "wml_local",
    "version": "3.0.0"
}
```
#### NOTE: Make sure that there is no trailing forward slash `/` in the `url`
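A small check along these lines can catch a trailing slash or a missing scheme before the credentials are used. This is only a sketch: the helper name `normalize_cluster_url` is hypothetical and is not part of any IBM client library.

```python
def normalize_cluster_url(url):
    """Return the URL without trailing slashes; reject URLs that lack https://."""
    cleaned = url.strip().rstrip("/")
    if not cleaned.startswith("https://"):
        raise ValueError("url must start with 'https://': %r" % url)
    return cleaned


# Example value taken from the sample credentials, with a stray slash appended.
print(normalize_cluster_url("https://zen.clusterid.us-south.containers.appdomain.cloud/"))
```

The cleaned value could then be assigned to `WOS_CREDENTIALS["url"]` before the clients are created.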

In [None]:
WOS_CREDENTIALS = {
    "url": "************",
    "username": "************",
    "password": "************"
}

In [None]:
WML_CREDENTIALS = WOS_CREDENTIALS.copy()
WML_CREDENTIALS['instance_id']='openshift'
WML_CREDENTIALS['version']='3.0.0'

In [None]:
%store -r MODEL_NAME
%store -r DEPLOYMENT_NAME
%store -r DEFAULT_SPACE

# 2.0 Load the training data <a name="model"></a>

In [None]:
!rm -rf german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/scottdangelo/credit-risk-workshop-cpd/addData/data/openscale/german_credit_data_biased_training.csv

In [None]:
import pandas as pd

data_df = pd.read_csv('german_credit_data_biased_training.csv', sep=",", header=0)

In [None]:
data_df.head()

# 3.0 Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [None]:
from ibm_ai_openscale import APIClient4ICP
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import json

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

In [None]:
ai_client = APIClient4ICP(WOS_CREDENTIALS)
ai_client.version

In [None]:
# Look up the existing OpenScale subscription for the deployed model by name.
subscription = None

for sub in ai_client.data_mart.subscriptions.get_uids():
    sub_details = ai_client.data_mart.subscriptions.get_details(sub)
    if sub_details['entity']['asset']['name'] == MODEL_NAME:
        print("Found existing subscription.")
        subscription = ai_client.data_mart.subscriptions.get(sub)

if subscription is None:
    print("No subscription found. Please run openscale-initial-setup.ipynb to configure.")

### Set Deployment UID

In [None]:
wml_client.set.default_space(DEFAULT_SPACE)

In [None]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    print(deployment['entity']['name'])
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break
        
print(deployment_uid)
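The lookup above can be wrapped in a function that fails loudly when no deployment matches, which avoids passing `None` to later scoring calls. The sketch below runs on a hypothetical response dictionary shaped like the one the loop iterates over; the sample names and guids are made up for illustration.

```python
def find_deployment_guid(deployments_details, deployment_name):
    """Return the guid of the first deployment whose name matches, else None."""
    for deployment in deployments_details.get("resources", []):
        if deployment["entity"]["name"] == deployment_name:
            return deployment["metadata"]["guid"]
    return None


# Hypothetical response; real values come from wml_client.deployments.get_details().
sample_details = {
    "resources": [
        {"entity": {"name": "other-deployment"}, "metadata": {"guid": "guid-1"}},
        {"entity": {"name": "credit-risk-deployment"}, "metadata": {"guid": "guid-2"}},
    ]
}

guid = find_deployment_guid(sample_details, "credit-risk-deployment")
if guid is None:
    raise RuntimeError("Deployment not found; check DEPLOYMENT_NAME and DEFAULT_SPACE")
print(guid)
```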

# 4.0 Generate drift model <a name="driftmodel"></a>


For a WML deployment, the drift monitor requires a trained drift detection model to be uploaded manually. You can train, create and download a drift detection model using the code below. The entire code can be found in the [training_statistics_notebook](https://github.com/IBM-Watson/aios-data-distribution/blob/master/training_statistics_notebook.ipynb) (see the section on drift detection model generation).

In [None]:
training_data_info = {
    "class_label": "Risk",
    "feature_columns": ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    "categorical_columns": ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
}
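Before training, it can help to confirm that the column lists are consistent with each other and with the training data. The helper below is a hypothetical sketch (it is not part of `ibm-wos-utils`) and is shown on a shortened, made-up column set.

```python
def check_training_data_info(info, available_columns):
    """Return a list of problems found in a training_data_info-style dictionary."""
    problems = []
    available = set(available_columns)
    features = info["feature_columns"]
    if info["class_label"] not in available:
        problems.append("label column missing: " + info["class_label"])
    if info["class_label"] in features:
        problems.append("label column listed as a feature")
    for column in features:
        if column not in available:
            problems.append("feature column missing: " + column)
    for column in info["categorical_columns"]:
        if column not in features:
            problems.append("categorical column is not a feature: " + column)
    return problems


# Shortened example; "Housing" is deliberately absent from the feature list.
example_info = {
    "class_label": "Risk",
    "feature_columns": ["CheckingStatus", "LoanDuration", "Age"],
    "categorical_columns": ["CheckingStatus", "Housing"],
}
example_columns = ["CheckingStatus", "LoanDuration", "Age", "Risk"]
print(check_training_data_info(example_info, example_columns))
```

In the notebook, the same check could be called as `check_training_data_info(training_data_info, data_df.columns)`; an empty list means no problems were found.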


In [None]:
#Set model_type. Acceptable values are:["binary","multiclass","regression"]
model_type = "binary"
#model_type = "multiclass"
#model_type = "regression"

In [None]:
import numpy as np

def score(training_data_frame):
    # The label column and the prediction column must have the same data type,
    # and both must contain the same set of unique class labels.
    prediction_column_name = "predictedLabel"
    probability_column_name = "probability"

    feature_columns = list(training_data_frame.columns)
    training_data_rows = training_data_frame[feature_columns].values.tolist()

    payload_scoring = {
        wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
            "fields": feature_columns,
            "values": training_data_rows
        }]
    }

    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    score_predictions = scoring_response.get('predictions')[0]

    # list.index raises ValueError for a missing name, so check membership first.
    response_fields = list(score_predictions.get('fields'))
    if probability_column_name not in response_fields or prediction_column_name not in response_fields:
        raise Exception("Missing prediction/probability column in the scoring response")
    prob_col_index = response_fields.index(probability_column_name)
    predict_col_index = response_fields.index(prediction_column_name)

    probability_array = np.array([value[prob_col_index] for value in score_predictions.get('values')])
    prediction_vector = np.array([value[predict_col_index] for value in score_predictions.get('values')])

    return probability_array, prediction_vector
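The column lookup inside `score` can be illustrated on its own. `list.index` raises `ValueError` for a missing name rather than returning a negative number, so this sketch checks membership first. The response block is hypothetical and only mirrors the `fields`/`values` layout used above.

```python
def extract_columns(prediction, column_names):
    """Return {name: [values, ...]} for the requested columns of one prediction block."""
    fields = list(prediction["fields"])
    missing = [name for name in column_names if name not in fields]
    if missing:
        raise KeyError("Missing columns in scoring response: %s" % missing)
    indexes = {name: fields.index(name) for name in column_names}
    return {name: [row[i] for row in prediction["values"]] for name, i in indexes.items()}


# Hypothetical prediction block with two scored rows.
sample_prediction = {
    "fields": ["predictedLabel", "probability"],
    "values": [
        ["No Risk", [0.8, 0.2]],
        ["Risk", [0.3, 0.7]],
    ],
}

columns = extract_columns(sample_prediction, ["probability", "predictedLabel"])
print(columns["predictedLabel"])
print(columns["probability"])
```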

In [None]:
#Generate drift detection model
from ibm_wos_utils.drift.drift_trainer import DriftTrainer

drift_detection_input = {
    "feature_columns": training_data_info.get('feature_columns'),
    "categorical_columns": training_data_info.get('categorical_columns'),
    "label_column": training_data_info.get('class_label'),
    "problem_type": model_type
}

drift_trainer = DriftTrainer(data_df, drift_detection_input)
if model_type != "regression":
    #Note: batch_size can be customized by user as per the training data size
    drift_trainer.generate_drift_detection_model(score, batch_size=data_df.shape[0])

#Note: Two column constraints are not computed beyond two_column_learner_limit (default set to 200)
#User can adjust the value depending on the requirement
drift_trainer.learn_constraints(two_column_learner_limit=200)
drift_trainer.create_archive()

In [None]:
#Generate a download link for drift detection model
from IPython.display import HTML
import base64

def create_download_link_for_ddm(title="Download Drift detection model", filename="drift_detection_model.tar.gz"):
    # Read the archive and embed it in the link as a base64 data URI.
    with open(filename, 'rb') as file:
        ddm = file.read()
    b64 = base64.b64encode(ddm)
    payload = b64.decode()

    html = '<a download="{filename}" href="data:application/gzip;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)

create_download_link_for_ddm()
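The download link works by embedding the whole archive in the page as a base64 data URI. The round trip can be sketched in isolation; the function name `to_data_uri` and the default MIME type are assumptions made for this illustration.

```python
import base64


def to_data_uri(raw_bytes, mime_type="application/gzip"):
    """Encode raw bytes as a base64 data URI suitable for an href attribute."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


# Stand-in bytes; in the notebook these would be the contents of the .tar.gz file.
original = b"example archive bytes"
uri = to_data_uri(original)

# Decoding the part after the comma recovers the original bytes.
header, encoded_part = uri.split(",", 1)
assert base64.b64decode(encoded_part) == original
print(header)
```

Because base64 expands data by roughly a third, this approach is best suited to small archives.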

# 5.0 Submit payload <a name="payload"></a>

### Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model. 

In [None]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)



print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])
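Since OpenScale derives the payload log schema from this first request, it is worth checking that every row has exactly one value per field before scoring. The helper below is a hypothetical sketch, shown on a shortened three-field example.

```python
def check_payload(fields, values):
    """Raise ValueError if any row's length differs from the number of fields."""
    for position, row in enumerate(values):
        if len(row) != len(fields):
            raise ValueError(
                "row %d has %d values, expected %d" % (position, len(row), len(fields))
            )
    return {"fields": list(fields), "values": [list(row) for row in values]}


# Shortened example using three of the credit risk fields.
example_fields = ["CheckingStatus", "LoanDuration", "Age"]
example_values = [
    ["no_checking", 13, 46],
    ["0_to_200", 26, 38],
]
example_payload = check_payload(example_fields, example_values)
print(len(example_payload["values"]), "rows checked")
```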

# 6.0 Enable drift monitoring <a name="monitor"></a>

In [None]:
subscription.drift_monitoring.enable(threshold=0.05, min_records=10, model_path="./drift_detection_model.tar.gz")

# 7.0 Run drift monitor on demand <a name="driftrun"></a>

In [None]:
!rm -f german_credit_feed.json
!wget https://raw.githubusercontent.com/IBM/cpd-intelligent-loan-agent-assets/master/data/german_credit_feed.json

In [None]:
import random

with open('german_credit_feed.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(10):
    # Copy the sampled row so the data loaded from the feed file is not modified.
    current = list(random.choice(scoring_data['values']))
    # Set the age (column index 12) of every row to 100 to increase drift values on the dashboard.
    current[12] = 100
    values.append(current)

payload_scoring = {"fields": fields, "values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)
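The sampling step can also be factored into a small helper that works on copies, so the rows loaded from the feed file are never modified in place. The function name, the seed parameter and the three-column sample rows are assumptions for this sketch.

```python
import random


def sample_with_override(rows, count, column_index, new_value, seed=None):
    """Pick `count` rows at random (with replacement) and overwrite one column in copies."""
    rng = random.Random(seed)
    sampled = []
    for _ in range(count):
        row = list(rng.choice(rows))  # copy so the source rows stay untouched
        row[column_index] = new_value
        sampled.append(row)
    return sampled


# Made-up rows; the last column stands in for Age.
source_rows = [["no_checking", 13, 46], ["0_to_200", 26, 38]]
drifted = sample_with_override(source_rows, count=5, column_index=2, new_value=100, seed=0)
print(drifted)
print(source_rows)
```

Passing a fixed `seed` makes the injected payload reproducible across runs, which can make changes in the drift measurements easier to compare.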

In [None]:
drift_run_details = subscription.drift_monitoring.run(background_mode=False)

In [None]:
subscription.drift_monitoring.get_table_content()


## Congratulations!

You have finished running all the cells within the notebook for IBM Watson OpenScale. You can now view the OpenScale dashboard by going to the CP4D `Home` page and clicking `Services`. Choose the `OpenScale` tile and click the menu to `Open`. Click on the tile for the model you've created to see fairness, accuracy, and performance monitors. Click on the time series graph to get detailed information on transactions during a specific time window.

OpenScale shows model performance over time. You have two options to keep data flowing to your OpenScale graphs:
  * Download, configure and schedule the [model feed notebook](https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_scoring_feed.ipynb). This notebook can be set up with your WML credentials, and scheduled to provide a consistent flow of scoring requests to your model, which will appear in your OpenScale monitors.
  * Re-run this notebook. Running this notebook from the beginning will delete and re-create the model and deployment, and re-create the historical data. Please note that the payload and measurement logs for the previous deployment will continue to be stored in your datamart, and can be deleted if necessary.