<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run with a **Python 3.7** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.7 in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning 
  * DB2

  
The notebook will train, create and deploy a German Credit Risk model, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

### Contents

- [Setup](#setup)
- [Model building and deployment](#model)
- [OpenScale configuration](#openscale)
- [Quality monitor and feedback logging](#quality)
- [Fairness, drift monitoring and explanations](#fairness)
- [Custom monitors and metrics](#custom)
- [Historical data](#historical)

# 1.0 Setup <a name="setup"></a>

## 1.1 Package installation

> Note: Some packages that are installed are dependencies for other packages. The versions are pinned to prevent warnings or errors.

In [60]:
import warnings
warnings.filterwarnings('ignore')

In [61]:
!pip install --upgrade pyspark==2.4 --no-cache | tail -n 1

!pip install --upgrade pandas==0.25.3 --no-cache | tail -n 1
!pip install --upgrade requests==2.23 --no-cache | tail -n 1
!pip install numpy==1.16.4 --no-cache | tail -n 1
!pip install scipy==1.5.0 --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1
!pip install ibm-cloud-sdk-core --no-cache | tail -n 1

!pip install --upgrade ibm-watson-machine-learning==1.0.53 --user | tail -n 1
!pip install --upgrade ibm-watson-openscale==3.0.4 --no-cache | tail -n 1



### Action: restart the kernel!
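
After the restart, it can be worth confirming that the pinned versions are the ones actually installed. The sketch below is one possible check; `find_version_mismatches`, `_installed_version` and the `lookup` parameter are illustrative names and not part of any IBM package, and the pins are copied from the install cell above.

```python
# Illustrative helper (not part of any IBM SDK): compare installed package
# versions against the pins used in the install cell.
def _installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_version_mismatches(pins, lookup=_installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Prefix match, so a pin of '2.4' also accepts '2.4.x'.
        if found is None or not found.startswith(expected):
            mismatches[name] = (expected, found)
    return mismatches


PINS = {
    "pyspark": "2.4",
    "pandas": "0.25.3",
    "ibm-watson-machine-learning": "1.0.53",
    "ibm-watson-openscale": "3.0.4",
}

for package, (expected, found) in find_version_mismatches(PINS).items():
    print("{}: expected {}, found {}".format(package, expected, found))
```

An empty result means every pinned package was found with a matching version prefix.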

## 1.2 Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D or Cloud Object Storage (COS))
- SCHEMA_NAME

<font color='red'>Replace the `username` and `password` values of `************` with your Cloud Pak for Data `username` and `password`. The value for `url` should match the `url` for your Cloud Pak for Data cluster, which you can get from the browser address bar (be sure to include the 'https://').</font> The credentials should look something like this (these are example values, not the ones you will use):

`
WOS_CREDENTIALS = {
                   "url": "https://zen.clusterid.us-south.containers.appdomain.cloud",
                   "username": "cp4duser",
                   "password" : "cp4dpass"
                  }
`
#### NOTE: Make sure that there is no trailing forward slash `/` in the `url`
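
Both URL requirements (the `https://` prefix and no trailing slash) are easy to check before the credentials are used. The following is a minimal sketch; `check_cluster_url` is a hypothetical helper, not part of the OpenScale or WML clients.

```python
from urllib.parse import urlparse


def check_cluster_url(url):
    """Raise ValueError if the URL lacks the https:// scheme or ends with '/'."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("url must start with 'https://' and include a host: {!r}".format(url))
    if url.endswith("/"):
        raise ValueError("url must not end with a trailing '/': {!r}".format(url))
    return url


# Example value copied from the text above (not a real cluster).
check_cluster_url("https://zen.clusterid.us-south.containers.appdomain.cloud")
```

Calling it on `WOS_CREDENTIALS["url"]` and `WML_CREDENTIALS["url"]` right after they are defined would surface a malformed value early.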

In [62]:
WOS_CREDENTIALS = {
    "url": "******",
    "username": "******",
    "password": "******"
}

In [64]:
WML_CREDENTIALS = {
                   "url": "https://******",
                   "username": "******",
                   "password" : "******",
                   "instance_id": "wml_local",
                   "version" : "3.5"
                  }

In [66]:
#IBM DB2 database connection format example
DATABASE_CREDENTIALS = {
    "hostname":"******",
    "username":"******",
    "password":"******",
    "database":"******",
    "port":"******",
    "ssl":False
    #"sslmode":"verify-full",
    #"certificate_base64":"***"
}

### Action: put created schema name below.

In [68]:
SCHEMA_NAME = 'AIOSFASTPATHICP'

Provide a custom name to be concatenated to the model name, deployment name and OpenScale monitor name. A sample value for CUSTOM_NAME could be ```CUSTOM_NAME = 'SAMAYA_OPENSCALE_3.0'```

In [None]:
CUSTOM_NAME = '******'

In [71]:
MODEL_NAME = CUSTOM_NAME + "_MODEL"
DEPLOYMENT_NAME = CUSTOM_NAME + "_DEPLOYMENT"
MONITOR_NAME = CUSTOM_NAME + "_MONITOR"

# 2.0 Model building and deployment <a name="model"></a>

In this section you will learn how to train a Spark MLlib model and then deploy it as a web service using the Watson Machine Learning service.

## 2.1 Load the training data from GitHub

In [72]:
!rm -f german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/german_credit_data_biased_training.csv


--2021-03-25 15:40:25--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv’


2021-03-25 15:40:26 (25.6 MB/s) - ‘german_credit_data_biased_training.csv’ saved [689622/689622]



In [73]:
from pyspark.sql import SparkSession
import pandas as pd
import json

spark = SparkSession.builder.getOrCreate()
pd_data = pd.read_csv("german_credit_data_biased_training.csv", sep=",", header=0)
df_data = spark.read.csv(path="german_credit_data_biased_training.csv", sep=",", header=True, inferSchema=True)
df_data.head()

Row(CheckingStatus='0_to_200', LoanDuration=31, CreditHistory='credits_paid_to_date', LoanPurpose='other', LoanAmount=1889, ExistingSavings='100_to_500', EmploymentDuration='less_1', InstallmentPercent=3, Sex='female', OthersOnLoan='none', CurrentResidenceDuration=3, OwnsProperty='savings_insurance', Age=32, InstallmentPlans='none', Housing='own', ExistingCreditsCount=1, Job='skilled', Dependents=1, Telephone='none', ForeignWorker='yes', Risk='No Risk')

## 2.2 Explore data

In [74]:
df_data.printSchema()

root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)



In [75]:
print("Number of records: " + str(df_data.count()))

Number of records: 5000


## 2.3 Create a model

In [76]:
spark_df = df_data
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)

# NOTE: these assignments override the CUSTOM_NAME-based names set in section 1.2
MODEL_NAME = "Spark German Risk Model - Final"
DEPLOYMENT_NAME = "Spark German Risk Deployment - Final"

print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

spark_df.printSchema()

Number of records for training: 4016
Number of records for evaluation: 984
root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = 

The code below creates a Random Forest Classifier with Spark, setting up string indexers for the categorical features and the label column. Finally, this notebook creates a pipeline including the indexers and the model, and does an initial Area Under ROC evaluation of the model.
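
To make the role of the string indexers concrete: by default, Spark's `StringIndexer` assigns index 0 to the most frequent value, 1 to the next most frequent, and so on. The plain-Python sketch below imitates that ordering on a toy column. `fit_string_index` is a hypothetical helper for illustration, not a Spark API, and its alphabetical tie-breaking is an assumption that may differ from Spark's behavior.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an index, most frequent value first."""
    counts = Counter(values)
    # Sort by descending count; break ties alphabetically for determinism.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


risk_column = ["No Risk", "Risk", "No Risk", "No Risk", "Risk"]
mapping = fit_string_index(risk_column)
print(mapping)  # {'No Risk': 0.0, 'Risk': 1.0}
```

This is why a numeric prediction of `1.0` corresponds to the less frequent label in the training data.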

In [77]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline, Model
from pyspark.ml.feature import SQLTransformer

features = [x for x in spark_df.columns if x != 'Risk']
categorical_features = ['CheckingStatus', 'CreditHistory', 'LoanPurpose', 'ExistingSavings', 'EmploymentDuration', 'Sex', 'OthersOnLoan', 'OwnsProperty', 'InstallmentPlans', 'Housing', 'Job', 'Telephone', 'ForeignWorker']
categorical_num_features = [x + '_IX' for x in categorical_features]
si_list = [StringIndexer(inputCol=x, outputCol=y) for x, y in zip(categorical_features, categorical_num_features)]
va_features = VectorAssembler(inputCols=categorical_num_features + [x for x in features if x not in categorical_features], outputCol="features")

In [78]:
si_label = StringIndexer(inputCol="Risk", outputCol="label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_label.labels)

In [79]:
from pyspark.ml.classification import RandomForestClassifier

classifier = RandomForestClassifier(featuresCol="features")
pipeline = Pipeline(stages= si_list + [si_label, va_features, classifier, label_converter])

model = pipeline.fit(train_data)

**Note**: If you want to filter features from the model output, replace `*` with the names of the features to be retained in the `SQLTransformer` statement.

In [80]:
predictions = model.transform(test_data)
evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderROC')
area_under_curve = evaluatorDT.evaluate(predictions)

evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderPR')
area_under_PR = evaluatorDT.evaluate(predictions)
#default evaluation is areaUnderROC
print("areaUnderROC = %g" % area_under_curve, "areaUnderPR = %g" % area_under_PR)

areaUnderROC = 0.714249 areaUnderPR = 0.655821


In [81]:
# extra code: evaluate more metrics by exporting them into pandas and numpy
from sklearn.metrics import classification_report
y_pred = predictions.toPandas()['prediction']
y_pred = ['Risk' if pred == 1.0 else 'No Risk' for pred in y_pred]
y_test = test_data.toPandas()['Risk']
# classification_report sorts string labels alphabetically ('No Risk' < 'Risk'),
# so target_names must be given in that order
print(classification_report(y_test, y_pred, target_names=['No Risk', 'Risk']))

              precision    recall  f1-score   support

     No Risk       0.79      0.92      0.85       657
        Risk       0.76      0.51      0.61       327

    accuracy                           0.78       984
   macro avg       0.78      0.71      0.73       984
weighted avg       0.78      0.78      0.77       984
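
Because the evaluator above is given the hard 0/1 `prediction` column rather than a continuous score, the ROC curve has a single operating point, and the area under it reduces to the mean of the two per-class recalls. The sketch below works through that arithmetic with the rounded recall values from the report (0.92 and 0.51); the function name `auc_single_point` is illustrative.

```python
def auc_single_point(tpr, fpr):
    """Area under the piecewise-linear ROC through (0,0), (fpr,tpr), (1,1)."""
    left = 0.5 * fpr * tpr                   # triangle from (0,0) to (fpr,tpr)
    right = 0.5 * (1.0 - fpr) * (tpr + 1.0)  # trapezoid from (fpr,tpr) to (1,1)
    return left + right


# Rounded recalls from the classification report above.
recall_a, recall_b = 0.92, 0.51

# Treat one class as positive: TPR is its recall, FPR is 1 minus the other recall.
auc = auc_single_point(tpr=recall_b, fpr=1.0 - recall_a)
print(round(auc, 3))  # 0.715
```

The result is close to the reported `areaUnderROC` of 0.714249; the small gap comes from using rounded recalls.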



## 2.4 Save training data to Cloud Object Storage

### 2.4.1 Cloud object storage details
In the next cells, you will need to paste in your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, please visit the getting started with COS tutorial. You can find the COS_API_KEY_ID and COS_RESOURCE_CRN values under Service Credentials in the menu of your COS instance. The COS service credentials must be created with the Role parameter set to Writer. Later, the training data file will be uploaded to a bucket in your instance and used as the training data reference in the subscription.
The COS_ENDPOINT value can be found in the Endpoint field of the menu.

In [82]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

In [83]:
COS_API_KEY_ID = "******"
COS_RESOURCE_CRN = "crn:*******::" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "https://******" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints

In [85]:
BUCKET_NAME = "*******" #example: "credit-risk-training-data"

In [87]:
training_data_file_name="german_credit_data_biased_training.csv"

In [88]:
import ibm_boto3
from ibm_botocore.client import Config, ClientError

cos_client = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_RESOURCE_CRN,
    ibm_auth_endpoint="https://iam.bluemix.net/oidc/token",
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

In [89]:
with open(training_data_file_name, "rb") as file_data:
    cos_client.Object(BUCKET_NAME, training_data_file_name).upload_fileobj(
        Fileobj=file_data
    )

## 2.5 Publish the model

In this section, the notebook uses Watson Machine Learning to save the model (including the pipeline) to the WML instance. Previous versions of the model are removed so that the notebook can be run again, resetting all data for another demo.

In [90]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.0.44'

### 2.5.1 Set default space

To deploy a model, you need a deployment space. You can list the existing spaces with `wml_client.spaces.list()`, or create a new one from the CP4D menu in the top-left corner: Deployments --> New Deployment Space. Once you know which space you want to deploy to, pass its GUID to `wml_client.set.default_space()` below.
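
If you prefer to look a space up by name rather than copying its GUID by hand, the sketch below shows one way. It assumes the list has the shape returned by `wml_client.spaces.get_details()['resources']`, as printed further down in this notebook; `find_space_id` is a hypothetical helper, and the sample entries are abbreviated from the listing output.

```python
def find_space_id(spaces, name):
    """Return the id of the first space whose entity name matches, else None."""
    for space in spaces:
        if space["entity"]["name"] == name:
            return space["metadata"]["id"]
    return None


# Abbreviated sample shaped like wml_client.spaces.get_details()['resources'].
sample_spaces = [
    {"entity": {"name": "scottda-deployment-space"},
     "metadata": {"id": "2569df16-56fc-4a20-bdd6-433dcbb47c77"}},
    {"entity": {"name": "sda-deployment-space"},
     "metadata": {"id": "ac4e404c-3ed0-4d8a-9283-7951944d0921"}},
]

print(find_space_id(sample_spaces, "sda-deployment-space"))
```

Returning `None` for an unknown name lets the caller decide whether to create the space, which is the pattern the notebook uses a few cells later.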


In [91]:
wml_client.spaces.list()

Note: 'limit' is not provided. Only first 50 records will be displayed if the number of records exceed 50
------------------------------------  --------------------------------  ------------------------
ID                                    NAME                              CREATED
392e71b9-8119-4f5d-ba67-306804a759fb  test-data-links_deployment_space  2021-03-25T15:35:39.860Z
2bb31cb0-728a-4145-a520-67bc64eafe92  tutorial-space                    2021-03-08T17:52:57.499Z
ac4e404c-3ed0-4d8a-9283-7951944d0921  sda-deployment-space              2020-12-11T12:00:21.303Z
2569df16-56fc-4a20-bdd6-433dcbb47c77  scottda-deployment-space          2020-12-08T20:52:39.775Z
------------------------------------  --------------------------------  ------------------------


Pass the `GUID` of your deployment space, as listed above, to the `default_space` method below:

In [92]:
wml_client.set.default_space('ac4e404c-3ed0-4d8a-9283-7951944d0921')

'SUCCESS'

In [93]:

space_name = CUSTOM_NAME + "_deployment_space"
# create the space and set it as default
space_meta_data = {
       wml_client.spaces.ConfigurationMetaNames.NAME : space_name,
        wml_client.spaces.ConfigurationMetaNames.DESCRIPTION : CUSTOM_NAME +' tutorial_space'
}
spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        space_id = space["metadata"]["id"]
if space_id is None:
    space_id = wml_client.spaces.store(meta_props=space_meta_data)["metadata"]["id"]

print(space_id)
wml_client.set.default_space(space_id)


Space has been created. However some background setup activities might still be on-going. Check for 'status' field in the response. It has to show 'active' before space can be used. If its not 'active', you can monitor the state with a call to spaces.get_details(space_id)
bb897eb1-41d0-4de3-bcc9-9ed4eb01d756


'SUCCESS'

In [94]:
spaces = wml_client.spaces.get_details()['resources']

In [95]:
spaces

[{'entity': {'compute': [{'crn': 'crn:v1:cpd:private:pm-20:private:a/cpduser:99999999-9999-9999-9999-999999999999::',
     'guid': '99999999-9999-9999-9999-999999999999',
     'name': 'Watson Machine Learning',
     'type': 'machine_learning'}],
   'description': '',
   'name': 'scottda-deployment-space',
   'scope': {'bss_account_id': 'cpdaccount'},
   'status': {'state': 'active'}},
  'metadata': {'created_at': '2020-12-08T20:52:39.775Z',
   'creator_id': '1000331001',
   'id': '2569df16-56fc-4a20-bdd6-433dcbb47c77',
   'updated_at': '2020-12-08T20:52:44.899Z',
   'url': '/v2/spaces/2569df16-56fc-4a20-bdd6-433dcbb47c77'}},
 {'entity': {'compute': [{'crn': 'crn:v1:cpd:private:pm-20:private:a/cpduser:99999999-9999-9999-9999-999999999999::',
     'guid': '99999999-9999-9999-9999-999999999999',
     'name': 'Watson Machine Learning',
     'type': 'machine_learning'}],
   'description': 'Deployments for Health Care assets.',
   'name': 'sda-deployment-space',
   'scope': {'bss_account_id'

### 2.5.2 Remove existing model and deployment

In [96]:
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == DEPLOYMENT_NAME:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

--  ----  -------  ----
ID  NAME  CREATED  TYPE
--  ----  -------  ----


#### 2.5.2.1 Add training data reference either from DB2 on CP4D or Cloud Object Storage

In [97]:
# COS training data reference example format

training_data_references = [
                {
                    "id": "Credit Risk",
                    "type": "s3",
                    "connection": {
                        "access_key_id": COS_API_KEY_ID,
                        "endpoint_url": COS_ENDPOINT,
                        "resource_instance_id":COS_RESOURCE_CRN
                    },
                    "location": {
                        "bucket": BUCKET_NAME,
                        "path": training_data_file_name,
                    }
                }
            ]

In [98]:
software_spec_uid = wml_client.software_specifications.get_id_by_name("spark-mllib_2.4")
print("Software Specification ID: {}".format(software_spec_uid))
model_props = {
        wml_client._models.ConfigurationMetaNames.NAME:"{}".format(MODEL_NAME),
        #wml_client._models.ConfigurationMetaNames.SPACE_UID: space_id,
        wml_client._models.ConfigurationMetaNames.TYPE: "mllib_2.4",
        wml_client._models.ConfigurationMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
        wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: training_data_references,
        wml_client._models.ConfigurationMetaNames.LABEL_FIELD: "Risk",
    }

Software Specification ID: 390d21f8-e58b-4fac-9c55-d7ceda621326


In [99]:
print("Storing model ...")
published_model_details = wml_client.repository.store_model(
    model=model, 
    meta_props=model_props, 
    training_data=train_data, 
    pipeline=pipeline)

model_uid = wml_client.repository.get_model_uid(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

Storing model ...
Done
Model ID: 007fba82-8912-4059-a358-850c4c8a37fc


In [100]:
wml_client.repository.list_models()

------------------------------------  -------------------------------  ------------------------  ---------
ID                                    NAME                             CREATED                   TYPE
007fba82-8912-4059-a358-850c4c8a37fc  Spark German Risk Model - Final  2021-03-25T15:41:00.002Z  mllib_2.4
------------------------------------  -------------------------------  ------------------------  ---------


## 2.6 Deploy the model

The next section of the notebook deploys the model as a RESTful web service in Watson Machine Learning. The deployed model will have a scoring URL you can use to send data to the model for predictions.

In [101]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(DEPLOYMENT_NAME),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid=wml_client.deployments.get_uid(deployment_details)

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))



#######################################################################################

Synchronous deployment creation for uid: '007fba82-8912-4059-a358-850c4c8a37fc' started

#######################################################################################


initializing..
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='6f074e16-84e8-4188-bd84-2627fd8680b5'
------------------------------------------------------------------------------------------------


Scoring URL:https://zen-cpd-zen.jrtorres-cpd3-os45-v2-2bef1f4b4097001da9502000c44fc2b2-0000.us-south.containers.appdomain.cloud/ml/v4/deployments/6f074e16-84e8-4188-bd84-2627fd8680b5/predictions
Model id: 007fba82-8912-4059-a358-850c4c8a37fc
Deployment id: 6f074e16-84e8-4188-bd84-2627fd8680b5


## 2.7 Sample scoring

In [102]:
fields = ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose", "LoanAmount", "ExistingSavings",
                  "EmploymentDuration", "InstallmentPercent", "Sex", "OthersOnLoan", "CurrentResidenceDuration",
                  "OwnsProperty", "Age", "InstallmentPlans", "Housing", "ExistingCreditsCount", "Job", "Dependents",
                  "Telephone", "ForeignWorker"]
values = [
            ["no_checking", 13, "credits_paid_to_date", "car_new", 1343, "100_to_500", "1_to_4", 2, "female", "none", 3,
             "savings_insurance", 46, "none", "own", 2, "skilled", 1, "none", "yes"],
            ["no_checking", 24, "prior_payments_delayed", "furniture", 4567, "500_to_1000", "1_to_4", 4, "male", "none",
             4, "savings_insurance", 36, "none", "free", 2, "management_self-employed", 1, "none", "yes"],
        ]

scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
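
The payload pairs one `fields` list with rows of `values` in the same column order. A small guard like the one below can catch rows with a missing or extra value before they reach the scoring endpoint. `build_scoring_payload` is a hypothetical helper, not part of the WML client, and the field list is shortened for illustration.

```python
def build_scoring_payload(fields, rows):
    """Build an {"input_data": [...]} payload, checking each row's width."""
    for i, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError(
                "row {} has {} values but {} fields are declared".format(
                    i, len(row), len(fields)
                )
            )
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


payload = build_scoring_payload(
    ["LoanDuration", "LoanAmount", "Age"],  # shortened field list for illustration
    [[13, 1343, 46], [24, 4567, 36]],
)
print(payload)
```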

In [103]:
scoring_response = wml_client.deployments.score(deployment_uid, scoring_payload)
scoring_response

{'predictions': [{'fields': ['CheckingStatus',
    'LoanDuration',
    'CreditHistory',
    'LoanPurpose',
    'LoanAmount',
    'ExistingSavings',
    'EmploymentDuration',
    'InstallmentPercent',
    'Sex',
    'OthersOnLoan',
    'CurrentResidenceDuration',
    'OwnsProperty',
    'Age',
    'InstallmentPlans',
    'Housing',
    'ExistingCreditsCount',
    'Job',
    'Dependents',
    'Telephone',
    'ForeignWorker',
    'CheckingStatus_IX',
    'CreditHistory_IX',
    'LoanPurpose_IX',
    'ExistingSavings_IX',
    'EmploymentDuration_IX',
    'Sex_IX',
    'OthersOnLoan_IX',
    'OwnsProperty_IX',
    'InstallmentPlans_IX',
    'Housing_IX',
    'Job_IX',
    'Telephone_IX',
    'ForeignWorker_IX',
    'label',
    'features',
    'rawPrediction',
    'probability',
    'prediction',
    'predictedLabel'],
   'values': [['no_checking',
     13,
     'credits_paid_to_date',
     'car_new',
     1343,
     '100_to_500',
     '1_to_4',
     2,
     'female',
     'none',
     3,


# 3.0 Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [104]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

In [105]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],service_instance_id='00000000-0000-0000-0000-1614796138825286', authenticator=authenticator)
wos_client.version

'3.0.1'

## 3.1 Create datamart

### 3.1.1 Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If database credentials were not supplied above, the notebook will use the free, internal lite database. If database credentials were supplied, the datamart will be created there unless there is an existing datamart and the KEEP_MY_INTERNAL_POSTGRES variable is set to True. If an OpenScale datamart exists in Db2 or PostgreSQL, the existing datamart will be used and no data will be overwritten.

Prior instances of the German Credit model will be removed from OpenScale monitoring.
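
The branching described above can be summarized as a small pure function. This is only a sketch of the decision, with hypothetical names (`choose_datamart_action` and its return strings) and a made-up hostname; the actual calls to `wos_client.data_marts` follow in the next cells.

```python
def choose_datamart_action(existing_data_mart_ids, database_credentials):
    """Mirror the notebook's branching: reuse, create external, or create internal."""
    if existing_data_mart_ids:
        return "use_existing"
    if database_credentials is not None:
        return "create_external"
    return "create_internal"


# An existing datamart always wins, regardless of credentials.
print(choose_datamart_action(["00000000-0000-0000-0000-1614796138825286"], None))
# No datamart yet, credentials supplied (hostname is a placeholder).
print(choose_datamart_action([], {"hostname": "db2.example.com"}))
# No datamart and no credentials.
print(choose_datamart_action([], None))
```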

In [106]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATHICP-00000000-0000-0000-0000-1614796138825286,Data Mart created by OpenScale ExpressPath,False,active,2021-03-03 18:41:43.808000+00:00,00000000-0000-0000-0000-1614796138825286


In [107]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DATABASE_CREDENTIALS is not None:
        if SCHEMA_NAME is None: 
            print("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.DB2,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DATABASE_CREDENTIALS['hostname'],
                        username=DATABASE_CREDENTIALS['username'],
                        password=DATABASE_CREDENTIALS['password'],
                        db=DATABASE_CREDENTIALS['database'],
                        port=DATABASE_CREDENTIALS['port'],
                        ssl=DATABASE_CREDENTIALS['ssl'],
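                        # optional keys: .get() returns None when they are left commented out in DATABASE_CREDENTIALS
                        sslmode=DATABASE_CREDENTIALS.get('sslmode'),
                        certificate_base64=DATABASE_CREDENTIALS.get('certificate_base64')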
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))

Using existing datamart 00000000-0000-0000-0000-1614796138825286


## 3.2 Remove existing service providers connected to the WML instance
Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating service providers for the WML instance used in this tutorial, the following code deletes any existing service provider(s) with the same name and then adds a new one.

In [108]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning V2"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook."

In [109]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

Deleted existing service_provider for WML instance: 0e61f1b0-7073-4947-9598-253cda21a8b9
Deleted existing service_provider for WML instance: b6c20f90-a4f1-4c11-98a4-915bd2387ec8


## 3.3 Add service provider
Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model.

**Note:** You can bind more than one engine instance if needed by calling the `wos_client.service_providers.add` method. You can then refer to a particular service provider using its `service_provider_id`.

In [110]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = space_id,
        operational_space_id = "production",
        credentials=WMLCredentialsCP4D(
            url=WML_CREDENTIALS["url"],
            username=WML_CREDENTIALS["username"],
            password=WML_CREDENTIALS["password"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider 476fc46b-aca1-40b0-9468-96747988afcb 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [111]:
wos_client.service_providers.show()

0,1,2,3,4,5
99999999-9999-9999-9999-999999999999,active,Watson Machine Learning V2,watson_machine_learning,2021-03-25 15:41:31.718000+00:00,476fc46b-aca1-40b0-9468-96747988afcb
99999999-9999-9999-9999-999999999999,active,WOS ExpressPath WML pre_production binding,watson_machine_learning,2021-03-03 18:42:06.659000+00:00,2b189084-6cbf-402f-bff9-e138c80582bc
99999999-9999-9999-9999-999999999999,active,WOS ExpressPath WML production binding,watson_machine_learning,2021-03-03 18:42:00.851000+00:00,5f824736-86ce-48b9-9419-6a91aa5d1f8e


In [112]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_space_id = space_id).result['resources'][0]
asset_deployment_details

{'metadata': {'guid': '6f074e16-84e8-4188-bd84-2627fd8680b5',
  'created_at': '2021-03-25T15:41:10.304Z',
  'modified_at': '2021-03-25T15:41:10.304Z'},
 'entity': {'name': 'Spark German Risk Deployment - Final',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://ibm-nginx-svc.zen.svc.cluster.local/ml/v4/deployments/6f074e16-84e8-4188-bd84-2627fd8680b5/predictions'},
  'asset': {},
  'asset_properties': {}}}

In [113]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=deployment_uid,deployment_space_id=space_id)
model_asset_details_from_deployment

{'metadata': {'guid': '6f074e16-84e8-4188-bd84-2627fd8680b5',
  'created_at': '2021-03-25T15:41:10.304Z',
  'modified_at': '2021-03-25T15:41:10.304Z'},
 'entity': {'name': 'Spark German Risk Deployment - Final',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://ibm-nginx-svc.zen.svc.cluster.local/ml/v4/deployments/6f074e16-84e8-4188-bd84-2627fd8680b5/predictions'},
  'asset': {'asset_id': '007fba82-8912-4059-a358-850c4c8a37fc',
   'url': 'https://ibm-nginx-svc.zen.svc.cluster.local/ml/v4/models/007fba82-8912-4059-a358-850c4c8a37fc?space_id=bb897eb1-41d0-4de3-bcc9-9ed4eb01d756&version=2020-06-12',
   'name': 'Spark German Risk Model - Final',
   'asset_type': 'model',
   'created_at': '2021-03-25T15:41:00.248Z',
   'modified_at': '2021-03-25T15:41:06.113Z'},
  'asset_properties': {'model_type': 'mllib_2.4',
   'runtime_environment': 'spark-2.4',
   'label_column': 'Risk',
   'input_data_schema': {'type': 'struct',
    'id': '1',
    'fields': [{'name': 'CheckingStatus',
      '

## 3.4 Subscriptions

### 3.4.1 Remove existing credit risk subscriptions

This code removes previous subscriptions to the German Credit model to refresh the monitors with the new model and new data.

In [114]:
wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8
ebb1a9e0-3514-42c2-bf35-8bf5c1031378,GermanCreditRiskModelICP,00000000-0000-0000-0000-1614796138825286,85aed096-8595-4723-a21c-2b5b17f50fb0,GermanCreditRiskModelICP,5f824736-86ce-48b9-9419-6a91aa5d1f8e,active,2021-03-03 18:46:57.339000+00:00,bcc692f8-7ef4-4afd-8511-e445adbbc772
aa527349-fc7c-4d91-94ff-ba144137e3a6,GermanCreditRiskModelPreProdICP,00000000-0000-0000-0000-1614796138825286,5295cbd6-4fc6-4421-8281-b2fb488fb8ea,GermanCreditRiskModelPreProdICP,2b189084-6cbf-402f-bff9-e138c80582bc,active,2021-03-03 18:45:29.541000+00:00,3cc74421-17d2-4690-ac36-c32184320d4d
70e71357-cc60-4708-9fe1-43a638bcd5f8,GermanCreditRiskModelChallengerICP,00000000-0000-0000-0000-1614796138825286,50598d64-2a9c-495f-b8d8-7012c7b28d0f,GermanCreditRiskModelChallengerICP,2b189084-6cbf-402f-bff9-e138c80582bc,active,2021-03-03 18:42:34.833000+00:00,ca70048b-b5ac-4bb2-9053-e81aea2a4290


In [115]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_id = subscription.entity.asset.asset_id
    if sub_model_id == model_uid:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model unique identifier, and some information about the model itself.

In [116]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['entity']['scoring_endpoint']['url']
        ),
        asset_properties=AssetPropertiesRequest(
            label_column='Risk',
            probability_fields=['probability'],
            prediction_field='predictedLabel',
            feature_fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
            categorical_fields = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"],
            training_data_reference=TrainingDataReference(type='cos',
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = training_data_file_name),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
subscription_id = subscription_details.metadata.id
subscription_id

'654221b8-abb9-4369-a82a-6db10d74da51'
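
The `feature_fields` and `categorical_fields` lists passed to `AssetPropertiesRequest` are typed by hand, so a misspelled or duplicated name is easy to miss. The following is a minimal sanity check that uses only the two lists from the cell above. The helper name `check_field_lists` is hypothetical and is not part of the OpenScale client.

```python
# Hypothetical helper (not part of ibm-watson-openscale): validates the two
# hand-written field lists before they are sent to OpenScale.
feature_fields = ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose",
                  "LoanAmount", "ExistingSavings", "EmploymentDuration",
                  "InstallmentPercent", "Sex", "OthersOnLoan",
                  "CurrentResidenceDuration", "OwnsProperty", "Age",
                  "InstallmentPlans", "Housing", "ExistingCreditsCount", "Job",
                  "Dependents", "Telephone", "ForeignWorker"]
categorical_fields = ["CheckingStatus", "CreditHistory", "LoanPurpose",
                      "ExistingSavings", "EmploymentDuration", "Sex",
                      "OthersOnLoan", "OwnsProperty", "InstallmentPlans",
                      "Housing", "Job", "Telephone", "ForeignWorker"]


def check_field_lists(features, categoricals):
    """Return the non-categorical features; raise ValueError on inconsistencies."""
    duplicates = sorted({f for f in features if features.count(f) > 1})
    if duplicates:
        raise ValueError("Duplicate feature names: {}".format(duplicates))
    unknown = [c for c in categoricals if c not in features]
    if unknown:
        raise ValueError("Categorical fields missing from feature_fields: {}".format(unknown))
    # Whatever is not categorical is treated as numeric by elimination.
    return [f for f in features if f not in categoricals]


numeric_fields = check_field_lists(feature_fields, categorical_fields)
print("Numeric features:", numeric_fields)
```

Running the check before `wos_client.subscriptions.add(...)` surfaces a typo locally instead of as a schema mismatch later in the notebook.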

In [117]:
import time

time.sleep(5)
payload_data_set_id = None
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                              target_target_id=subscription_id, 
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Indexing an empty list would raise IndexError, so check before reading the first entry.
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)

Payload data set id:  02360277-bf28-48cf-b308-42c396933071


In [118]:
wos_client.data_sets.show()

data_mart_id,status,target_id,target_type,type,created_at,id
00000000-0000-0000-0000-1614796138825286,active,654221b8-abb9-4369-a82a-6db10d74da51,subscription,manual_labeling,2021-03-25 15:41:41.524000+00:00,e7f053d3-0b31-442f-92c3-54c43d8bcbea
00000000-0000-0000-0000-1614796138825286,active,654221b8-abb9-4369-a82a-6db10d74da51,subscription,payload_logging,2021-03-25 15:41:41.463000+00:00,02360277-bf28-48cf-b308-42c396933071
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,manual_labeling,2021-03-03 18:46:59.977000+00:00,70aa062d-d85d-4d02-a331-af2cbf4b2fb4
00000000-0000-0000-0000-1614796138825286,active,3cc74421-17d2-4690-ac36-c32184320d4d,subscription,manual_labeling,2021-03-03 18:45:31.253000+00:00,ec1a0e12-0573-4167-9133-266089e87440
00000000-0000-0000-0000-1614796138825286,active,ca70048b-b5ac-4bb2-9053-e81aea2a4290,subscription,manual_labeling,2021-03-03 18:42:44.681000+00:00,1718662a-4be4-4cde-88e1-ee48464b8383
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,payload_logging,2021-03-03 18:46:59.850000+00:00,b4a3eaa2-445c-4f04-ab97-92ada27dfb0d
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,feedback,2021-03-03 18:47:36.892000+00:00,4e4f020d-7256-45ed-8d0e-fadd8eda5792
00000000-0000-0000-0000-1614796138825286,active,3cc74421-17d2-4690-ac36-c32184320d4d,subscription,payload_logging,2021-03-03 18:45:31.152000+00:00,91434557-3edb-46df-a623-b0b2f42a55bb
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,training,2021-03-03 18:47:00.067000+00:00,5cd17974-558c-4355-987c-a64d34671d70
00000000-0000-0000-0000-1614796138825286,active,3cc74421-17d2-4690-ac36-c32184320d4d,subscription,feedback,2021-03-03 18:45:59.413000+00:00,9c35ef98-8b40-46b9-9bb6-26bf6680c69b


Note: First 10 records were displayed.


### 3.4.2 Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model.

In [119]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)

print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])

Single record scoring result: 
 fields: ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'LoanPurpose_IX', 'ExistingSavings_IX', 'EmploymentDuration_IX', 'Sex_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'InstallmentPlans_IX', 'Housing_IX', 'Job_IX', 'Telephone_IX', 'ForeignWorker_IX', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, [20, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19], [1.0, 1.
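
The scoring response keeps column names in `fields` and each scored row in `values`, which is hard to read once the model appends its own output columns. The sketch below pairs the two so that individual columns can be read by name. The helper `rows_as_dicts` is hypothetical, and `sample_response` is a trimmed, made-up stand-in with the same shape as `scoring_response`; its probabilities are illustrative only.

```python
def rows_as_dicts(fields, values):
    """Pair every row of values with the field names."""
    return [dict(zip(fields, row)) for row in values]


# Made-up stand-in shaped like scoring_response (only four columns kept).
sample_response = {
    "predictions": [{
        "fields": ["CheckingStatus", "LoanDuration", "probability", "predictedLabel"],
        "values": [
            ["no_checking", 13, [0.9, 0.1], "No Risk"],
            ["0_to_200", 26, [0.3, 0.7], "Risk"],
        ],
    }]
}

prediction = sample_response["predictions"][0]
records = rows_as_dicts(prediction["fields"], prediction["values"])
for record in records:
    print(record["predictedLabel"], record["probability"])
```

With the real response, `rows_as_dicts(scoring_response['predictions'][0]['fields'], scoring_response['predictions'][0]['values'])` would give one dictionary per scored record.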

## 3.5 Check whether WML payload logging worked; otherwise store payload records manually

In [120]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
#time.sleep(5)
wml_predictions = {"fields": scoring_response['predictions'][0]['fields'], "values":scoring_response['predictions'][0]['values']}
#print(wml_predictions)
#print("==========")
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=payload_scoring,
                   response=wml_predictions,
                   response_time=460
               )])
    time.sleep(15)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

Number of records in the payload logging table: 0
Payload logging did not happen, performing explicit payload logging.
Number of records in the payload logging table: 8


In [121]:
time.sleep(10)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))

Number of records in the payload logging table: 8
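
The cells above rely on fixed `time.sleep(...)` calls before re-reading the record count. An alternative is to poll until the expected count appears or a timeout expires. The sketch below is one way to do that. `wait_for_count` is a hypothetical helper, and the lambda stands in for `wos_client.data_sets.get_records_count(payload_data_set_id)` so the example runs without a cluster.

```python
import time


def wait_for_count(get_count, expected, timeout=60.0, interval=2.0):
    """Poll get_count() until it returns at least `expected` or `timeout`
    seconds have passed. Returns the last observed count."""
    deadline = time.monotonic() + timeout
    count = get_count()
    while count < expected and time.monotonic() < deadline:
        time.sleep(interval)
        count = get_count()
    return count


# Stand-in for the real record-count call: reports 0 twice, then 8.
_fake_counts = iter([0, 0, 8])
observed = wait_for_count(lambda: next(_fake_counts), expected=8,
                          timeout=5.0, interval=0.01)
print("Number of records in the payload logging table: {}".format(observed))
```

The function returns the last count it saw rather than raising on timeout, so the caller can still decide whether to fall back to explicit payload logging.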



# 4.0 Quality monitoring and feedback logging <a name="quality"></a>

## 4.1 Enable quality monitoring

The code below waits ten seconds to allow the payload logging table to be set up before it begins enabling monitors. First, it turns on the quality (accuracy) monitor and sets an alert threshold of 80%. OpenScale will show an alert on the dashboard if the model accuracy measurement (area under the ROC curve, in the case of a binary classifier) falls below this threshold.

The second parameter supplied, `min_feedback_data_size`, specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until an additional 50 feedback records have been added, via the user interface, the Python client, or the supplied feedback endpoint.
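
To make the threshold concrete, the sketch below computes the area under the ROC curve with the textbook pairwise definition and compares it with a `lower_limit`. It is an illustration under stated assumptions, not OpenScale's implementation: the function names are hypothetical and the labels and scores are made up.

```python
def area_under_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def breaches_lower_limit(value, limit):
    """A lower_limit threshold is breached when the metric falls below it."""
    return value < limit


labels = [0, 0, 1, 1]             # made-up ground truth (1 = positive class)
scores = [0.1, 0.4, 0.35, 0.8]    # made-up predicted probabilities
auc = area_under_roc(labels, scores)
print("area_under_roc:", auc, "alert:", breaches_lower_limit(auc, 0.80))
```

In this toy data three of the four positive/negative pairs are ordered correctly, so the measurement falls below a limit of 0.80 and would count as a breach.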

In [122]:
import time

time.sleep(10)
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 50
}
thresholds = [
                {
                    "metric_id": "area_under_roc",
                    "type": "lower_limit",
                    "value": .80
                }
            ]
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result




 Waiting for end of monitor instance creation 42865c28-2931-4886-9114-623337382ce3 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [123]:
quality_monitor_instance_id = quality_monitor_details.metadata.id
quality_monitor_instance_id

'42865c28-2931-4886-9114-623337382ce3'

## 4.2 Feedback logging

The code below downloads and stores enough feedback data to meet the minimum threshold so that OpenScale can calculate a new accuracy measurement. It then kicks off the accuracy monitor. The monitors run hourly, or can be initiated via the Python API, the REST API, or the graphical user interface.

In [124]:
!rm additional_feedback_data_v2.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/additional_feedback_data_v2.json


rm: cannot remove 'additional_feedback_data_v2.json': No such file or directory
--2021-03-25 15:42:30--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/additional_feedback_data_v2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50890 (50K) [text/plain]
Saving to: ‘additional_feedback_data_v2.json’


2021-03-25 15:42:30 (18.4 MB/s) - ‘additional_feedback_data_v2.json’ saved [50890/50890]



## 4.3 Get feedback logging dataset ID

In [125]:
feedback_dataset_id = None
feedback_dataset = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK, 
                                                target_target_id=subscription_id, 
                                                target_target_type=TargetTypes.SUBSCRIPTION).result
print(feedback_dataset)
# Guard against an empty list; indexing it directly would raise IndexError.
if feedback_dataset.data_sets:
    feedback_dataset_id = feedback_dataset.data_sets[0].metadata.id
if feedback_dataset_id is None:
    print("Feedback data set not found. Please check quality monitor status.")

{
  "data_sets": [
    {
      "metadata": {
        "id": "108726c0-de25-4909-810b-735e2d7ac2d0",
        "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-1614796138825286:data_set:108726c0-de25-4909-810b-735e2d7ac2d0",
        "url": "/v2/data_sets/108726c0-de25-4909-810b-735e2d7ac2d0",
        "created_at": "2021-03-25T15:42:24.205000Z",
        "created_by": "internal-service",
        "modified_at": "2021-03-25T15:42:24.623000Z",
        "modified_by": "internal-service"
      },
      "entity": {
        "data_mart_id": "00000000-0000-0000-0000-1614796138825286",
        "name": "654221b8-abb9-4369-a82a-6db10d74da51_feedback",
        "description": "654221b8-abb9-4369-a82a-6db10d74da51_feedback",
        "type": "feedback",
        "target": {
          "target_type": "subscription",
          "target_id": "654221b8-abb9-4369-a82a-6db10d74da51"
        },
        "schema_update_mode": "auto",
        "data_schema": {
          "type": "struct",
   

In [126]:
with open('additional_feedback_data_v2.json') as feedback_file:
    additional_feedback_data = json.load(feedback_file)

In [127]:
wos_client.data_sets.store_records(feedback_dataset_id, request_body=additional_feedback_data, background_mode=False)




 Waiting for end of storing records with request id: 2da8749f-ea37-4258-8161-aec3be05e5c9 




active

---------------------------------------
 Successfully finished storing records 
---------------------------------------




<ibm_cloud_sdk_core.detailed_response.DetailedResponse at 0x7eff02af6d90>

In [128]:
wos_client.data_sets.get_records_count(data_set_id=feedback_dataset_id)

98

In [129]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=quality_monitor_instance_id, background_mode=False).result




 Waiting for end of monitoring run 67464003-b951-46b1-8e35-44dce95db604 




error

-------------------------------
 Run failed with status: error 
-------------------------------


Reason: ['code: AIQFS0002E, message: Action `Score Batch` has failed with status code 500; associated message: `{2}`', "code: AIQGS0099E, message: User '1000330999' is not authorized to perform this action on space 'bb897eb1-41d0-4de3-bcc9-9ed4eb01d756'."]
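
The run above failed, and the second reason states that the user is not authorized on the deployment space. When several reasons are returned, splitting each into a code and a message makes the relevant one easier to spot. The sketch below does that for the strings exactly as they appear in the output. The `code: ..., message: ...` layout is inferred from this single run and `parse_reasons` is a hypothetical helper, so treat the pattern as an assumption.

```python
import re

# Layout assumed from the output above: "code: <CODE>, message: <text>".
REASON_PATTERN = re.compile(
    r"code:\s*(?P<code>[A-Z0-9]+),\s*message:\s*(?P<message>.*)", re.DOTALL)


def parse_reasons(reasons):
    """Split each reason string into (code, message); keep unparsed text as-is."""
    parsed = []
    for reason in reasons:
        match = REASON_PATTERN.match(reason)
        if match:
            parsed.append((match.group("code"), match.group("message")))
        else:
            parsed.append((None, reason))
    return parsed


reasons = [
    "code: AIQFS0002E, message: Action `Score Batch` has failed with status "
    "code 500; associated message: `{2}`",
    "code: AIQGS0099E, message: User '1000330999' is not authorized to perform "
    "this action on space 'bb897eb1-41d0-4de3-bcc9-9ed4eb01d756'.",
]
for code, message in parse_reasons(reasons):
    print(code, "->", message)
```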


In [130]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=quality_monitor_instance_id)

# 5.0 Fairness, drift monitoring and explanations <a name="fairness"></a>

The code below configures fairness monitoring for our model. It turns on monitoring for two features, Sex and Age. In each case, we must specify:
  * Which model feature to monitor
  * One or more **majority** groups, which are values of that feature that we expect to receive a higher percentage of favorable outcomes
  * One or more **minority** groups, which are values of that feature that we expect to receive a higher percentage of unfavorable outcomes
  * The threshold below which OpenScale displays an alert for the fairness measurement (in this case, 95%)

Additionally, we must specify which outcomes from the model are favorable and which are unfavorable. We must also provide the minimum number of records OpenScale will use to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 100 records have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data, which it reads through the training data reference supplied when the subscription was created.
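
As a worked illustration of how majority groups, minority groups and the 95% threshold interact, the sketch below computes a disparate-impact ratio: the favorable-outcome rate of the minority group divided by that of the majority group. This assumes the fairness score behaves like that ratio and is not OpenScale's implementation. All function names are hypothetical and the records are made up.

```python
def in_group(value, group):
    """True if value matches a category or falls inside a [low, high] range."""
    for member in group:
        if isinstance(member, (list, tuple)):
            low, high = member
            if low <= value <= high:
                return True
        elif value == member:
            return True
    return False


def favourable_rate(records, feature, group, favourable_class):
    """Share of records in `group` whose prediction is a favourable class."""
    selected = [r for r in records if in_group(r[feature], group)]
    if not selected:
        raise ValueError("No records found for {} group {}".format(feature, group))
    hits = sum(1 for r in selected if r["predictedLabel"] in favourable_class)
    return hits / len(selected)


def disparate_impact(records, feature, majority, minority, favourable_class):
    """Minority favourable rate divided by majority favourable rate."""
    majority_rate = favourable_rate(records, feature, majority, favourable_class)
    if majority_rate == 0:
        raise ValueError("Majority group has no favourable outcomes")
    return favourable_rate(records, feature, minority, favourable_class) / majority_rate


# Made-up scored records, for illustration only.
records = [
    {"Sex": "female", "Age": 22, "predictedLabel": "No Risk"},
    {"Sex": "male", "Age": 24, "predictedLabel": "Risk"},
    {"Sex": "female", "Age": 19, "predictedLabel": "No Risk"},
    {"Sex": "male", "Age": 25, "predictedLabel": "Risk"},
    {"Sex": "male", "Age": 40, "predictedLabel": "No Risk"},
    {"Sex": "female", "Age": 55, "predictedLabel": "No Risk"},
    {"Sex": "male", "Age": 33, "predictedLabel": "Risk"},
    {"Sex": "male", "Age": 61, "predictedLabel": "No Risk"},
]

age_score = disparate_impact(records, "Age", majority=[[26, 75]],
                             minority=[[18, 25]], favourable_class=["No Risk"])
print("Age fairness score: {:.2f}, alert: {}".format(age_score, age_score < 0.95))
```

In the made-up data, half of the 18 to 25 group and three quarters of the 26 to 75 group receive "No Risk", so the ratio is about 0.67 and falls below the 0.95 threshold.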

In [131]:
wos_client.monitor_instances.show()

data_mart_id,status,target_id,target_type,monitor_definition_id,created_at,id
00000000-0000-0000-0000-1614796138825286,active,654221b8-abb9-4369-a82a-6db10d74da51,subscription,quality,2021-03-25 15:42:23.440000+00:00,42865c28-2931-4886-9114-623337382ce3
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,fairness,2021-03-03 18:47:23.117000+00:00,d9d5f0b1-641b-400a-a16d-64338609567d
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,drift,2021-03-03 18:47:42.095000+00:00,5c4b4626-7a3c-4f83-9e3d-d6a2eb447bef
00000000-0000-0000-0000-1614796138825286,active,7530d5ea-be94-44ea-b788-841b06bb627a,instance,performance,2021-03-15 20:44:16.551000+00:00,451b423d-1844-4d5c-a4ad-4b6d0babbf19
00000000-0000-0000-0000-1614796138825286,active,6fcd8ec3-e6ff-4d83-acf2-7b1d48f19bdd,instance,performance,2021-03-12 16:48:09.432000+00:00,54235f35-004a-42f6-8451-ac1abd9fdcfd
00000000-0000-0000-0000-1614796138825286,active,b4a3eaa2-445c-4f04-ab97-92ada27dfb0d,instance,performance,2021-03-03 18:50:38.059000+00:00,c5249a88-9c98-4115-adf6-7d7aee80aef1
00000000-0000-0000-0000-1614796138825286,active,3cc74421-17d2-4690-ac36-c32184320d4d,subscription,fairness,2021-03-03 18:45:52.727000+00:00,5994153e-05af-40ff-9881-96f74981daaa
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,mrm,2021-03-03 18:47:48.164000+00:00,7e20767a-c1b5-4a49-9dd3-62cadbb39387
00000000-0000-0000-0000-1614796138825286,active,bcc692f8-7ef4-4afd-8511-e445adbbc772,subscription,quality,2021-03-03 18:47:34.931000+00:00,e04329ee-3f50-40a8-8ae7-7a4814ecd1de
00000000-0000-0000-0000-1614796138825286,active,3cc74421-17d2-4690-ac36-c32184320d4d,subscription,drift,2021-03-03 18:46:05.792000+00:00,e6ab3892-9ddf-4e7b-a580-0ef883f9aeb5


Note: First 10 records were displayed.


In [132]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id

)
parameters = {
    "features": [
        {"feature": "Sex",
         "majority": ['male'],
         "minority": ['female'],
         "threshold": 0.95
         },
        {"feature": "Age",
         "majority": [[26, 75]],
         "minority": [[18, 25]],
         "threshold": 0.95
         }
    ],
    "favourable_class": ["No Risk"],
    "unfavourable_class": ["Risk"],
    "min_records": 100
}

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters).result
fairness_monitor_instance_id =fairness_monitor_details.metadata.id
fairness_monitor_instance_id




 Waiting for end of monitor instance creation 33d3a0d3-8ecc-4eb4-af8a-3d28da686ccd 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'33d3a0d3-8ecc-4eb4-af8a-3d28da686ccd'

## 5.1 Drift configuration

In [133]:
monitor_instances = wos_client.monitor_instances.list().result.monitor_instances
for monitor_instance in monitor_instances:
    monitor_def_id=monitor_instance.entity.monitor_definition_id
    if monitor_def_id == "drift" and monitor_instance.entity.target.target_id == subscription_id:
        wos_client.monitor_instances.delete(monitor_instance.metadata.id)
        print('Deleted existing drift monitor instance with id: ', monitor_instance.metadata.id)

In [134]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id

)
parameters = {
    "min_samples": 100,
    "drift_threshold": 0.1,
    "train_drift_model": True,
    "enable_model_drift": False,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

drift_monitor_instance_id = drift_monitor_details.metadata.id
drift_monitor_instance_id




 Waiting for end of monitor instance creation 442f219f-c7aa-42f5-becb-273c6cbab62c 




preparing.
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'442f219f-c7aa-42f5-becb-273c6cbab62c'

## 5.2 Score the model again now that monitoring is configured

This next section randomly selects 200 records from the data feed and sends those records to the model for predictions. This is enough to exceed the minimum threshold for records set in the previous section, which allows OpenScale to begin calculating fairness.

In [135]:
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/german_credit_feed.json
!ls -lh german_credit_feed.json

--2021-03-25 15:43:08--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/german_credit_feed.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3076279 (2.9M) [text/plain]
Saving to: ‘german_credit_feed.json’


2021-03-25 15:43:08 (61.4 MB/s) - ‘german_credit_feed.json’ saved [3076279/3076279]

-rw-r-----. 1 wsuser watsonstudio 3.0M Mar 25 15:43 german_credit_feed.json


Score 200 randomly chosen records

In [136]:
import random

with open('german_credit_feed.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(200):
    values.append(random.choice(scoring_data['values']))
payload_scoring = {"input_data": [{"fields": fields, "values": values}]}

scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
time.sleep(5)

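# Re-read the record count after scoring: if it has not grown, automatic payload logging did not happen.
previous_records_count = pl_records_count
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
if pl_records_count == previous_records_count:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=payload_scoring,
                   response=scoring_response,
                   response_time=460
               )])
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))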

Payload logging did not happen, performing explicit payload logging.
Number of records in the payload logging table: 208


In [137]:
print('Number of records in payload table: ', wos_client.data_sets.get_records_count(data_set_id=payload_data_set_id))

Number of records in payload table:  208
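
The scoring cell in this section calls `random.choice` in a loop, which samples with replacement and gives a different batch on every run. The sketch below shows two standard-library alternatives: `random.choices` for a seeded, reproducible draw with replacement, and `random.sample` when duplicate rows are unwanted. The `scoring_data` here is a small made-up stand-in for the contents of `german_credit_feed.json`, and the seed value is arbitrary.

```python
import random

# Made-up stand-in for the feed file: 50 rows with two numeric columns.
scoring_data = {
    "fields": ["LoanDuration", "LoanAmount"],
    "values": [[months, 100 * months] for months in range(1, 51)],
}

rng = random.Random(42)  # arbitrary seed so the selection can be repeated

# With replacement: the same row may be picked more than once.
with_replacement = rng.choices(scoring_data["values"], k=200)

# Without replacement: every picked row is distinct (k cannot exceed the row count).
without_replacement = rng.sample(scoring_data["values"], k=20)

payload_scoring = {"input_data": [{"fields": scoring_data["fields"],
                                   "values": with_replacement}]}

print("rows drawn with replacement:", len(with_replacement),
      "distinct:", len({tuple(row) for row in with_replacement}))
print("rows drawn without replacement:", len(without_replacement),
      "distinct:", len({tuple(row) for row in without_replacement}))
```

Sampling with replacement is what lets the request hold 200 rows even when the source has fewer distinct rows, at the cost of repeated records in the payload table.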


## 5.3 Run fairness monitor

Kick off a fairness monitor run on current data. The monitor runs hourly, but can be manually initiated using the Python client, the REST API, or the graphical user interface.

In [138]:
time.sleep(5)
run_details = wos_client.monitor_instances.run(monitor_instance_id=fairness_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run a6351ccc-6851-467a-9cac-789e4bfbd9c6 




error

-------------------------------
 Run failed with status: error 
-------------------------------


Reason: ['code: AIQFM6004, message: An unexpected bias error occured. An error while doing the scoring for the datamart 00000000-0000-0000-0000-1614796138825286, deployment_id 6f074e16-84e8-4188-bd84-2627fd8680b5 and subscription_id 654221b8-abb9-4369-a82a-6db10d74da51. Error: (500, <Errors.AIQFM5005: \'An error while doing the scoring for the datamart %s, deployment_id %s and subscription_id %s. Error: %s\'>, \'00000000-0000-0000-0000-1614796138825286\', \'6f074e16-84e8-4188-bd84-2627fd8680b5\', \'654221b8-abb9-4369-a82a-6db10d74da51\', \'Failed to Score against ML Gateway using REST Client. Status code: 500, response: {"trace":"654221b8-abb9-4369-a82a-6db10d74da51/bias","error":{"code":"AIQGS0099E","message":"AIQGS0099E : Score request failed for deployment id = `6f074e16-84e8-4188-bd84-2627fd8680b5`, 

In [139]:
time.sleep(10)
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

## 5.4 Run drift monitor

Kick off a drift monitor run on current data. The monitor runs every hour, but can be manually initiated using the Python client or the REST API.

In [140]:
drift_run_details = wos_client.monitor_instances.run(monitor_instance_id=drift_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run 413b8d32-3c3e-41da-9342-0f226f68050d 




finished

---------------------------
 Successfully finished run 
---------------------------




In [141]:
time.sleep(5)
wos_client.monitor_instances.show_metrics(monitor_instance_id=drift_monitor_instance_id)

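ts,id,measurement_id,value,lower_limit,upper_limit,tags,monitor_definition_id,monitor_instance_id,run_id,target_type,target_id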
2021-03-25 15:43:45.227750+00:00,data_drift_magnitude,ef2b0d77-2642-43da-8aa8-2356aebf15db,0.1057692307692307,,,[],drift,442f219f-c7aa-42f5-becb-273c6cbab62c,413b8d32-3c3e-41da-9342-0f226f68050d,subscription,654221b8-abb9-4369-a82a-6db10d74da51
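
The row above can be compared with the `drift_threshold` of 0.1 configured in section 5.1. The sketch below parses the row as comma-separated text and checks the value against that threshold. Reading the second column as the metric name and the fourth as its value is inferred from the output, and treating a value above the threshold as a breach is an assumption about how OpenScale evaluates drift.

```python
import csv
import io

# The metrics row exactly as printed by show_metrics above.
row_text = ("2021-03-25 15:43:45.227750+00:00,data_drift_magnitude,"
            "ef2b0d77-2642-43da-8aa8-2356aebf15db,0.1057692307692307,,,[],drift,"
            "442f219f-c7aa-42f5-becb-273c6cbab62c,"
            "413b8d32-3c3e-41da-9342-0f226f68050d,subscription,"
            "654221b8-abb9-4369-a82a-6db10d74da51")

row = next(csv.reader(io.StringIO(row_text)))

# Column positions are assumptions based on the printed output.
metric_name = row[1]
metric_value = float(row[3])

drift_threshold = 0.1  # value from the drift monitor parameters
exceeded = metric_value > drift_threshold

print("{} = {:.2%}, threshold = {:.0%}, exceeded: {}".format(
    metric_name, metric_value, drift_threshold, exceeded))
```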


## 5.5 Configure Explainability

Finally, we provide OpenScale with the training data to enable and configure the explainability features.

In [142]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

explainability_monitor_id = explainability_details.metadata.id




 Waiting for end of monitor instance creation 658bfc0b-b797-4b2b-8ce3-c2d23681cd20 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




## 5.6 Run explanation for sample record

In [143]:
pl_records_resp = wos_client.data_sets.get_list_of_records(data_set_id=payload_data_set_id, limit=1, offset=0).result
scoring_ids = [pl_records_resp["records"][0]["entity"]["values"]["scoring_id"]]
print("Running explanations on scoring IDs: {}".format(scoring_ids))
explanation_types = ["lime", "contrastive"]
result = wos_client.monitor_instances.explanation_tasks(scoring_ids=scoring_ids, explanation_types=explanation_types).result
print(result)

Running explanations on scoring IDs: ['96c3d602-cd53-4f55-9d83-b87257cdce99-1']
{
  "metadata": {
    "explanation_task_ids": [
      "1a696ac6-b52d-49e0-8e85-be83a76c8fc7"
    ],
    "created_by": "1000331001",
    "created_at": "2021-03-25T15:44:01.181164Z"
  }
}


# 6.0 Custom monitors and metrics <a name="custom"></a>

## 6.1 Register custom monitor

In [144]:
def get_definition(monitor_name):
    monitor_definitions = wos_client.monitor_definitions.list().result.monitor_definitions
    
    for definition in monitor_definitions:
        if monitor_name == definition.entity.name:
            return definition
    
    return None

In [145]:
monitor_name = 'my model performance'
metrics = [MonitorMetricRequest(name='sensitivity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.8)]),
          MonitorMetricRequest(name='specificity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.75)])]
tags = [MonitorTagRequest(name='region', description='customer geographical region')]

existing_definition = get_definition(monitor_name)

if existing_definition is None:
    custom_monitor_details = wos_client.monitor_definitions.add(name=monitor_name, metrics=metrics, tags=tags, background_mode=False).result
else:
    custom_monitor_details = existing_definition
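
The custom monitor defines two metrics, sensitivity and specificity, each with a lower limit. The sketch below shows how those two values follow from confusion-matrix counts and how they compare with the default limits of 0.8 and 0.75 from the definition above. The counts are made up, and the function names are hypothetical rather than part of the OpenScale client.

```python
def sensitivity(true_positives, false_negatives):
    """True positive rate: share of actual positives that were predicted positive."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("No actual positives in the data")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """True negative rate: share of actual negatives that were predicted negative."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("No actual negatives in the data")
    return true_negatives / total


# Made-up confusion-matrix counts, for illustration only.
confusion = {"tp": 42, "fn": 8, "tn": 30, "fp": 20}

metric_values = {
    "sensitivity": sensitivity(confusion["tp"], confusion["fn"]),
    "specificity": specificity(confusion["tn"], confusion["fp"]),
}

# Default lower limits taken from the monitor definition in this section.
lower_limits = {"sensitivity": 0.8, "specificity": 0.75}

violations = {name: value for name, value in metric_values.items()
              if value < lower_limits[name]}
print("metrics:", metric_values)
print("below lower limit:", violations)
```

With these counts, sensitivity is 42/50 and clears its limit, while specificity is 30/50 and falls below 0.75.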

## 6.2 Show available monitor types

In [146]:
wos_client.monitor_definitions.show()

monitor_id,monitor_name,metric_names
test_no_cos_monitor,test_no_COS_MONITOR,"['sensitivity', 'specificity']"
my_model_performance,my model performance,"['sensitivity', 'specificity']"
assurance,Assurance,"['Uncertainty', 'Confidence']"
fairness,Fairness,"['Fairness value', 'Average Odds Difference metric value', 'False Discovery Rate Difference metric value', 'Error Rate Difference metric value', 'False Negative Rate Difference metric value', 'False Omission Rate Difference metric value', 'False Positive Rate Difference metric value', 'True Positive Rate Difference metric value']"
performance,Performance,['Number of records']
explainability,Explainability,[]
mrm,Model risk management monitoring,"['Tests run', 'Tests passed', 'Tests failed', 'Tests skipped', 'Fairness score', 'Quality score', 'Drift score']"
correlations,Correlations,"['Maximum positive correlation coefficient', 'Maximum negative correlation coefficient', 'Mean absolute correlation coefficient', 'Significant correlation coefficients count']"
drift,Drift,"['Drop in accuracy', 'Predicted accuracy', 'Drop in data consistency']"
quality,Quality,"['Area under ROC', 'Area under PR', 'Proportion explained variance', 'Mean absolute error', 'Mean squared error', 'R squared', 'Root of mean squared error', 'Accuracy', 'Weighted True Positive Rate (wTPR)', 'True positive rate (TPR)', 'Weighted False Positive Rate (wFPR)', 'False positive rate (FPR)', 'Weighted recall', 'Recall', 'Weighted precision', 'Precision', 'Weighted F1-Measure', 'F1-Measure', 'Logarithmic loss']"


### 6.2.1 Get monitor IDs and details

In [147]:
custom_monitor_id = custom_monitor_details.metadata.id

print(custom_monitor_id)

my_model_performance


In [148]:
custom_monitor_details = wos_client.monitor_definitions.get(monitor_definition_id=custom_monitor_id).result
print('Monitor definition details:', custom_monitor_details)

Monitor definition details: {
  "metadata": {
    "id": "my_model_performance",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-1614796138825286:monitor_definition:my_model_performance",
    "url": "/v2/monitor_definitions/my_model_performance",
    "created_at": "2021-03-12T16:47:11.603000Z",
    "created_by": "scottda"
  },
  "entity": {
    "name": "my model performance",
    "metrics": [
      {
        "name": "sensitivity",
        "thresholds": [
          {
            "type": "lower_limit",
            "default": 0.8
          }
        ],
        "expected_direction": "increasing",
        "id": "sensitivity"
      },
      {
        "name": "specificity",
        "thresholds": [
          {
            "type": "lower_limit",
            "default": 0.75
          }
        ],
        "expected_direction": "increasing",
        "id": "specificity"
      }
    ],
    "tags": [
      {
        "name": "region",
        "description": "customer

## 6.3 Enable custom monitor for subscription

In [149]:
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
    )

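# Note: this override is defined but never passed to monitor_instances.create() below,
# so the instance keeps the definition's default thresholds (sensitivity 0.8, specificity 0.75).
thresholds = [MetricThresholdOverride(metric_id='sensitivity', type=MetricThresholdTypes.LOWER_LIMIT, value=0.9)]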

custom_monitor_instance_details = wos_client.monitor_instances.create(
            data_mart_id=data_mart_id,
            background_mode=False,
            monitor_definition_id=custom_monitor_id,
            target=target
).result




 Waiting for end of monitor instance creation 72f1490e-eb2d-4ea8-8960-0b08568b954c 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




### 6.3.1 Get monitor instance id and configuration details

In [150]:
custom_monitor_instance_id = custom_monitor_instance_details.metadata.id

In [151]:
custom_monitor_instance_details = wos_client.monitor_instances.get(custom_monitor_instance_id).result
print(custom_monitor_instance_details)

{
  "metadata": {
    "id": "72f1490e-eb2d-4ea8-8960-0b08568b954c",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-1614796138825286:monitor_instance:72f1490e-eb2d-4ea8-8960-0b08568b954c",
    "url": "/v2/monitor_instances/72f1490e-eb2d-4ea8-8960-0b08568b954c",
    "created_at": "2021-03-25T15:44:06.250000Z",
    "created_by": "scottda"
  },
  "entity": {
    "data_mart_id": "00000000-0000-0000-0000-1614796138825286",
    "monitor_definition_id": "my_model_performance",
    "target": {
      "target_type": "subscription",
      "target_id": "654221b8-abb9-4369-a82a-6db10d74da51"
    },
    "thresholds": [
      {
        "metric_id": "sensitivity",
        "type": "lower_limit",
        "value": 0.8
      },
      {
        "metric_id": "specificity",
        "type": "lower_limit",
        "value": 0.75
      }
    ],
    "schedule": {
      "repeat_interval": 60,
      "repeat_unit": "minute",
      "repeat_type": "minute"
    },
    "status": {
   

## 6.4 Storing custom metrics

In [152]:
from datetime import datetime, timezone, timedelta
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import MonitorMeasurementRequest
custom_monitoring_run_id = "11122223333111abc"
measurement_request = [MonitorMeasurementRequest(timestamp=datetime.now(timezone.utc), 
                                                 metrics=[{"specificity": 0.78, "sensitivity": 0.67, "region": "us-south"}], run_id=custom_monitoring_run_id)]
print(measurement_request[0])

{
  "timestamp": "2021-03-25T15:44:11.616366Z",
  "run_id": "11122223333111abc",
  "metrics": [
    {
      "specificity": 0.78,
      "sensitivity": 0.67,
      "region": "us-south"
    }
  ]
}
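The sensitivity and specificity values above are hard-coded for demonstration. As a minimal sketch, values like these could be computed from a binary confusion matrix before being placed in the `metrics` dictionary. The helper name `confusion_metrics` and the example counts are hypothetical and not part of the OpenScale SDK.

```python
def confusion_metrics(tp, fn, tn, fp, region):
    """Return a metrics dict shaped like the one posted above.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "sensitivity": round(tp / (tp + fn), 2),
        "specificity": round(tn / (tn + fp), 2),
        "region": region,  # tag value, passed through unchanged
    }


# Hypothetical counts chosen to reproduce the values used in this notebook.
example_metrics = confusion_metrics(tp=67, fn=33, tn=78, fp=22, region="us-south")
print(example_metrics)
```

The resulting dictionary has the same keys as the one passed to `MonitorMeasurementRequest` above, so it could be substituted for the literal values.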


In [153]:
published_measurement_response = wos_client.monitor_instances.measurements.add(
    monitor_instance_id=custom_monitor_instance_id,
    monitor_measurement_request=measurement_request).result
published_measurement_id = published_measurement_response[0]["measurement_id"]
print(published_measurement_response)

[{'measurement_id': 'd0f994f7-f6e4-469a-9dd2-e89f377bd880', 'metrics': [{'region': 'us-south', 'sensitivity': 0.67, 'specificity': 0.78}], 'run_id': '11122223333111abc', 'timestamp': '2021-03-25T15:44:11.616366Z'}]
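The stored measurement printed in section 6.4.1 carries a `lower_limit` for each metric and an `issue_count`. The sketch below shows one plausible reading of that count: the number of metrics that fall below their lower limit. This is an assumption for illustration, not the documented OpenScale algorithm, and `count_lower_limit_issues` is a hypothetical helper.

```python
def count_lower_limit_issues(values, lower_limits):
    """Count metrics whose value is strictly below the configured lower limit."""
    return sum(
        1
        for metric_id, value in values.items()
        if metric_id in lower_limits and value < lower_limits[metric_id]
    )


# Values and limits taken from the measurement in this notebook.
issues = count_lower_limit_issues(
    {"sensitivity": 0.67, "specificity": 0.78},
    {"sensitivity": 0.8, "specificity": 0.75},
)
print(issues)
```

Under this reading only sensitivity (0.67 against a limit of 0.8) counts as an issue, which agrees with the `issue_count` of 1 reported for this measurement.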


### 6.4.1 List and get custom metrics

In [154]:
time.sleep(5)
published_measurement = wos_client.monitor_instances.measurements.get(monitor_instance_id=custom_monitor_instance_id, measurement_id=published_measurement_id).result
print(published_measurement)

{
  "metadata": {
    "id": "d0f994f7-f6e4-469a-9dd2-e89f377bd880",
    "crn": "",
    "url": "/v2/monitor_instances/72f1490e-eb2d-4ea8-8960-0b08568b954c/measurements/d0f994f7-f6e4-469a-9dd2-e89f377bd880",
    "created_at": "2021-03-25T15:44:12.248000Z",
    "created_by": ""
  },
  "entity": {
    "timestamp": "2021-03-25T15:44:11.616366Z",
    "run_id": "11122223333111abc",
    "values": [
      {
        "metrics": [
          {
            "id": "sensitivity",
            "value": 0.67,
            "lower_limit": 0.8
          },
          {
            "id": "specificity",
            "value": 0.78,
            "lower_limit": 0.75
          }
        ],
        "tags": [
          {
            "id": "region",
            "value": "us-south"
          }
        ]
      }
    ],
    "issue_count": 1,
    "target": {
      "target_type": "subscription",
      "target_id": "654221b8-abb9-4369-a82a-6db10d74da51"
    },
    "monitor_instance_id": "72f1490e-eb2d-4ea8-8960-0b08568b954c",


# 7.0 Historical data <a name="historical"></a>

In [155]:
historyDays = 7

## 7.1 Insert historical payloads

The next section of the notebook downloads historical data and writes it to the payload and measurement tables to simulate a production model that has been monitored and has received regular traffic for the last seven days. This historical data can be viewed in the Watson OpenScale user interface. The code uses the Python and REST APIs to write this data.
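The cells in this section all back-date measurements with the same offset pattern: hour `h` of day `d` is placed `24*d + h + 1` hours before the current time. The sketch below isolates that pattern. The helper name `hourly_timestamps` is hypothetical, and the fixed `now` is used only to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def hourly_timestamps(history_days, now=None):
    """Return one UTC timestamp per hour, newest first, covering history_days days."""
    now = now or datetime.now(timezone.utc)
    return [
        now - timedelta(hours=24 * day + hour + 1)
        for day in range(history_days)
        for hour in range(24)
    ]


stamps = hourly_timestamps(7, now=datetime(2021, 3, 25, 16, 0, tzinfo=timezone.utc))
print(len(stamps), stamps[0].isoformat(), stamps[-1].isoformat())
```

Computing `now` once and deriving every timestamp from it keeps the hourly spacing exact, whereas calling the clock inside the loop lets timestamps drift by the loop's run time.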

In [156]:
!rm history_fairness_v2.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_fairness_v2.json
!ls -lh history_fairness_v2.json

rm: cannot remove 'history_fairness_v2.json': No such file or directory
--2021-03-25 15:44:19--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_fairness_v2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-03-25 15:44:19 ERROR 404: Not Found.

ls: cannot access 'history_fairness_v2.json': No such file or directory


In [160]:
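# The credit-risk-workshop-cpd URL in the previous cell returned 404, so fetch the file from the watson-openscale-samples repository instead.
# Note: wget does not overwrite an existing file; it saves a numbered copy such as history_fairness_v2.json.1.
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/historical_data/credit_risk/history_fairness_v2.json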
!ls -lh history_fairness_v2.json


--2021-03-25 15:48:15--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/historical_data/credit_risk/history_fairness_v2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37259 (36K) [text/plain]
Saving to: ‘history_fairness_v2.json.1’


2021-03-25 15:48:15 (17.2 MB/s) - ‘history_fairness_v2.json.1’ saved [37259/37259]

-rw-r-----. 1 wsuser watsonstudio 37K Mar 25 15:47 history_fairness_v2.json
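The two download cells above show a first URL returning 404 and a second one succeeding, with `wget` writing a `.1` copy because the target file already existed. The sketch below is a pure-Python alternative that tries several URLs in order and always overwrites the destination. `download_first_available` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def download_first_available(urls, dest, opener=urllib.request.urlopen):
    """Try each URL in order, write the first successful response to dest, return its URL."""
    last_error = None
    for url in urls:
        try:
            with opener(url) as response:
                data = response.read()
        except (urllib.error.URLError, OSError) as err:
            last_error = err  # remember the failure and try the next candidate
            continue
        with open(dest, "wb") as out:  # always overwrites, so no stale '.1' copies
            out.write(data)
        return url
    raise RuntimeError(f"no URL could be downloaded: {last_error}")
```

Passing the two `history_fairness_v2.json` URLs from the cells above, in that order, with `dest='history_fairness_v2.json'` would give the loading cell a fresh copy of the file.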


In [161]:
from datetime import datetime, timedelta, timezone

with open('history_fairness_v2.json', 'r') as history_file:
    payloads = json.load(history_file)

for day in range(historyDays):
    print('Loading day', day + 1)
    daily_measurement_requests = []
    
    for hour in range(24):
        score_time = datetime.now(timezone.utc) + timedelta(hours=(-(24*day + hour + 1)))
        index = (day * 24 + hour) % len(payloads) # wrap around and reuse values if needed
 
        measurement_request = MonitorMeasurementRequest(timestamp=score_time,metrics = [payloads[index][0], payloads[index][1]])
        daily_measurement_requests.append(measurement_request)
        
        
    response = wos_client.monitor_instances.measurements.add(
                                            monitor_instance_id=fairness_monitor_instance_id,
                                            monitor_measurement_request=daily_measurement_requests).result     
print('Finished')

Loading day 1
Loading day 2
Loading day 3
Loading day 4
Loading day 5
Loading day 6
Loading day 7
Finished
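The loading loop above selects a record with `(day * 24 + hour) % len(payloads)`, so a history file with fewer than 168 entries is reused from the start. A small sketch of that wrap-around follows. The helper name `wrapped_index` and the record count of 100 are hypothetical.

```python
def wrapped_index(day, hour, n_records):
    """Map (day, hour) to a record index, wrapping when the file is shorter than the window."""
    if n_records <= 0:
        raise ValueError("history file contains no records")
    return (day * 24 + hour) % n_records


# With 100 records, day 4 hour 4 (the 101st hour) wraps back to index 0.
print(wrapped_index(4, 4, 100))
```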


## 7.2 Insert historical debias metrics

In [162]:
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_debias_v2.json
!ls -lh history_debias_v2.json

--2021-03-25 15:48:21--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_debias_v2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37259 (36K) [text/plain]
Saving to: ‘history_debias_v2.json’


2021-03-25 15:48:21 (18.8 MB/s) - ‘history_debias_v2.json’ saved [37259/37259]

-rw-r-----. 1 wsuser watsonstudio 37K Mar 25 15:48 history_debias_v2.json


In [163]:
with open('history_debias_v2.json', 'r') as history_file:
    payloads = json.load(history_file)

for day in range(historyDays):
    print('Loading day', day + 1)
    daily_measurement_requests = []
    for hour in range(24):
        score_time = datetime.now(timezone.utc) + timedelta(hours=(-(24*day + hour + 1)))
        index = (day * 24 + hour) % len(payloads) # wrap around and reuse values if needed

        measurement_request = MonitorMeasurementRequest(timestamp=score_time,metrics = [payloads[index][0], payloads[index][1]])
        
        daily_measurement_requests.append(measurement_request)
        
    response = wos_client.monitor_instances.measurements.add(
                                            monitor_instance_id=fairness_monitor_instance_id,
                                            monitor_measurement_request=daily_measurement_requests).result     

print('Finished')

Loading day 1
Loading day 2
Loading day 3
Loading day 4
Loading day 5
Loading day 6
Loading day 7
Finished


## 7.3 Insert historical quality metrics

In [164]:
measurements = [0.76, 0.78, 0.68, 0.72, 0.73, 0.77, 0.80]
for day in range(historyDays):
    quality_measurement_requests = []
    print('Loading day', day + 1)
    for hour in range(24):
        score_time = datetime.utcnow() + timedelta(hours=(-(24*day + hour + 1)))
        score_time = score_time.isoformat() + "Z"
        
        metric = {"area_under_roc": measurements[day]}
                
        measurement_request = MonitorMeasurementRequest(timestamp=score_time,metrics = [metric])
        quality_measurement_requests.append(measurement_request)
        
        
    response = wos_client.monitor_instances.measurements.add(
                                            monitor_instance_id=quality_monitor_instance_id,
                                            monitor_measurement_request=quality_measurement_requests).result    
    
print('Finished')

Loading day 1
Loading day 2
Loading day 3
Loading day 4
Loading day 5
Loading day 6
Loading day 7
Finished
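The cell above builds timestamp strings by appending `"Z"` to a naive `datetime.utcnow().isoformat()`, while the earlier cells pass timezone-aware datetimes. As a sketch, the same `Z`-suffixed string can be produced from an aware datetime, which avoids mixing the two styles. `to_utc_z` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment):
    """Format an aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).replace(tzinfo=None).isoformat() + "Z"


# 17:44 at UTC+02:00 is 15:44 UTC.
local = datetime(2021, 3, 25, 17, 44, 11, tzinfo=timezone(timedelta(hours=2)))
print(to_utc_z(local))
```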


## 7.4 Insert historical confusion matrices

In [165]:
!rm history_quality_metrics.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_quality_metrics.json
!ls -lh history_quality_metrics.json

rm: cannot remove 'history_quality_metrics.json': No such file or directory
--2021-03-25 15:48:32--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_quality_metrics.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 80099 (78K) [text/plain]
Saving to: ‘history_quality_metrics.json’


2021-03-25 15:48:32 (15.8 MB/s) - ‘history_quality_metrics.json’ saved [80099/80099]

-rw-r-----. 1 wsuser watsonstudio 79K Mar 25 15:48 history_quality_metrics.json


In [166]:
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import Source

with open('history_quality_metrics.json') as json_file:
    records = json.load(json_file)
    
for day in range(historyDays):
    index = 0
    cm_measurement_requests = []
    print('Loading day', day + 1)
    
    for hour in range(24):
        score_time = datetime.utcnow() + timedelta(hours=(-(24*day + hour + 1)))
        score_time = score_time.isoformat() + "Z"

        metric = records[index]['metrics']
        source = records[index]['sources']

        
        measurement_request = {"timestamp": score_time, "metrics": [metric], "sources": [source]}
        cm_measurement_requests.append(measurement_request)

        index+=1

    response = wos_client.monitor_instances.measurements.add(monitor_instance_id=quality_monitor_instance_id, monitor_measurement_request=cm_measurement_requests).result    

print('Finished')

Loading day 1
Loading day 2
Loading day 3
Loading day 4
Loading day 5
Loading day 6
Loading day 7
Finished
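The loop above reads `records[index]['metrics']` and `records[index]['sources']` for indexes 0 through 23 on every day, so a shorter or malformed file would fail partway through a batch. The sketch below is a pre-flight check that could run before any measurements are posted. `check_history_records` is a hypothetical helper, not an SDK function.

```python
def check_history_records(records, hours_per_day=24):
    """Raise ValueError if the records cannot supply one entry per hour of a day."""
    if len(records) < hours_per_day:
        raise ValueError(
            f"expected at least {hours_per_day} records, found {len(records)}"
        )
    for position, record in enumerate(records[:hours_per_day]):
        missing = {"metrics", "sources"} - set(record)
        if missing:
            raise ValueError(f"record {position} is missing keys: {sorted(missing)}")
    return True
```

Calling `check_history_records(records)` right after `json.load` would surface a bad download before the first request is sent.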


## 7.5 Insert historical performance metrics

In [167]:
target = Target(
        target_type=TargetTypes.INSTANCE,
        target_id=payload_data_set_id
    )


performance_monitor_instance_details = wos_client.monitor_instances.create(
            data_mart_id=data_mart_id,
            background_mode=False,
            monitor_definition_id=wos_client.monitor_definitions.MONITORS.PERFORMANCE.ID,
            target=target
).result
performance_monitor_instance_id = performance_monitor_instance_details.metadata.id




 Waiting for end of monitor instance creation df792fb5-137d-4066-bf79-14df981f9c34 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [168]:
for day in range(historyDays):
    performance_measurement_requests = []
    print('Loading day', day + 1)
    for hour in range(24):
        score_time = datetime.utcnow() + timedelta(hours=(-(24*day + hour + 1)))
        score_time = score_time.isoformat() + "Z"
        score_count = random.randint(60, 600)
        
        metric = {"record_count": score_count, "data_set_type": "scoring_payload"}
        
        measurement_request = {"timestamp": score_time, "metrics": [metric]}
        performance_measurement_requests.append(measurement_request)
        
    response = wos_client.monitor_instances.measurements.add(
                                            monitor_instance_id=performance_monitor_instance_id,
                                            monitor_measurement_request=performance_measurement_requests).result    

print('Finished')

Loading day 1
Loading day 2
Loading day 3
Loading day 4
Loading day 5
Loading day 6
Loading day 7
Finished
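The performance history above uses `random.randint(60, 600)`, so every run of the notebook produces a different traffic curve. If repeatable dashboards are wanted, a seeded generator is one option. The sketch below uses a hypothetical helper name `simulated_record_counts` and an arbitrary seed.

```python
import random


def simulated_record_counts(hours, seed, low=60, high=600):
    """Return reproducible pseudo-random hourly record counts in [low, high]."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    return [rng.randint(low, high) for _ in range(hours)]


counts = simulated_record_counts(24, seed=42)
print(min(counts) >= 60, max(counts) <= 600, len(counts))
```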


## 7.6 Insert historical drift measurements

In [169]:
!rm history_drift_measurement_*.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_0.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_1.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_2.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_3.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_4.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_5.json
!wget https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_6.json
!ls -lh history_drift_measurement_*.json

rm: cannot remove 'history_drift_measurement_*.json': No such file or directory
--2021-03-25 15:48:48--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_0.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850981 (831K) [text/plain]
Saving to: ‘history_drift_measurement_0.json’


2021-03-25 15:48:48 (33.8 MB/s) - ‘history_drift_measurement_0.json’ saved [850981/850981]

--2021-03-25 15:48:49--  https://raw.githubusercontent.com/IBM/credit-risk-workshop-cpd/master/data/openscale/history_drift_measurement_1.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.co

In [170]:
for day in range(historyDays):
    drift_measurements = []

    with open("history_drift_measurement_{}.json".format(day), 'r') as history_file:
        drift_daily_measurements = json.load(history_file)
    print('Loading day', day + 1)

    # Historical data contains 8 records per day; each represents a 3-hour drift window.
    
    for nb_window, records in enumerate(drift_daily_measurements):
        for record in records:
            window_start =  datetime.utcnow() + timedelta(hours=(-(24 * day + (nb_window+1)*3 + 1))) # first_payload_record_timestamp_in_window (oldest)
            window_end = datetime.utcnow() + timedelta(hours=(-(24 * day + nb_window*3 + 1)))# last_payload_record_timestamp_in_window (most recent)
            #modify start and end time for each record
            record['sources'][0]['data']['start'] = window_start.isoformat() + "Z"
            record['sources'][0]['data']['end'] = window_end.isoformat() + "Z"
            
            
            metric = record['metrics'][0]
            source = record['sources'][0]

            measurement_request = {"timestamp": window_start.isoformat() + "Z", "metrics": [metric], "sources": [source]}
            
            drift_measurements.append(measurement_request)
        
    response = wos_client.monitor_instances.measurements.add(
                                            monitor_instance_id=drift_monitor_instance_id,
                                            monitor_measurement_request=drift_measurements).result    

    
    print("Daily loading finished.")

Loading day 1
Daily loading finished.
Loading day 2
Daily loading finished.
Loading day 3
Daily loading finished.
Loading day 4
Daily loading finished.
Loading day 5
Daily loading finished.
Loading day 6
Daily loading finished.
Loading day 7
Daily loading finished.
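The drift loop above assigns each of the eight daily records a three-hour window counted back from the current time. The sketch below reproduces that window arithmetic from a single reference time, so consecutive windows share exact boundaries. `drift_windows` is a hypothetical helper, and the fixed `now` is only for reproducibility.

```python
from datetime import datetime, timedelta, timezone


def drift_windows(day, now, windows_per_day=8, window_hours=3):
    """Return (start, end) pairs for one day's drift windows, newest window first."""
    windows = []
    for nb_window in range(windows_per_day):
        # Same offsets as the notebook: end is nb_window*3 + 1 hours before now.
        end = now - timedelta(hours=24 * day + nb_window * window_hours + 1)
        start = end - timedelta(hours=window_hours)
        windows.append((start, end))
    return windows


now = datetime(2021, 3, 25, 16, 0, tzinfo=timezone.utc)
for start, end in drift_windows(0, now)[:2]:
    print(start.isoformat(), end.isoformat())
```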


## 7.7 Additional data to help debugging

In [171]:
print('Datamart:', data_mart_id)
print('Model:', model_uid)
print('Deployment:', deployment_uid)

Datamart: 00000000-0000-0000-0000-1614796138825286
Model: 007fba82-8912-4059-a358-850c4c8a37fc
Deployment: 6f074e16-84e8-4188-bd84-2627fd8680b5


## 7.8 Identify transactions for Explainability

Transaction IDs identified by the cells below can be copied and pasted into the Explainability tab of the OpenScale dashboard.

In [172]:
wos_client.data_sets.show_records(payload_data_set_id, limit=5)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44
,1044,96c3d602-cd53-4f55-9d83-b87257cdce99-1,0.0,3.0,1.0,2,less_100,2021-03-25T15:43:15.834Z,0.916959172667867,,"[20, [0, 1, 2, 7, 13, 14, 15, 16, 17, 18, 19], [1.0, 3.0, 3.0, 2.0, 4.0, 1044.0, 2.0, 2.0, 23.0, 1.0, 1.0]]",less_0,radio_tv,0.0,No Risk,2.0,0.0,skilled,none,0.0,3.0,none,4,23,all_credits_paid_back,0.0,2,0.0,0.0,0.0,0.0,yes,own,0.0,1,6f074e16-84e8-4188-bd84-2627fd8680b5,"[18.339183453357336, 1.660816546642661]",none,0.0,male,1,real_estate,1_to_4,"[0.916959172667867, 0.08304082733213307]"
,7702,96c3d602-cd53-4f55-9d83-b87257cdce99-10,0.0,3.0,0.0,4,greater_1000,2021-03-25T15:43:15.834Z,0.7495816914778399,,"[0.0, 0.0, 3.0, 3.0, 1.0, 0.0, 1.0, 3.0, 0.0, 0.0, 0.0, 1.0, 0.0, 43.0, 7702.0, 4.0, 4.0, 48.0, 2.0, 2.0]",no_checking,radio_tv,0.0,Risk,3.0,1.0,skilled,none,3.0,0.0,co-applicant,43,48,prior_payments_delayed,0.0,4,1.0,1.0,0.0,0.0,yes,own,0.0,2,6f074e16-84e8-4188-bd84-2627fd8680b5,"[5.0083661704432005, 14.991633829556799]",yes,1.0,male,2,unknown,4_to_7,"[0.25041830852216, 0.7495816914778399]"
,1725,96c3d602-cd53-4f55-9d83-b87257cdce99-100,1.0,2.0,0.0,2,100_to_500,2021-03-25T15:43:15.834Z,0.7473271430598172,,"[0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 23.0, 1725.0, 2.0, 2.0, 29.0, 1.0, 1.0]",no_checking,car_used,0.0,No Risk,1.0,0.0,skilled,stores,1.0,0.0,none,23,29,prior_payments_delayed,1.0,2,0.0,0.0,0.0,1.0,no,rent,1.0,1,6f074e16-84e8-4188-bd84-2627fd8680b5,"[14.946542861196344, 5.053457138803658]",none,0.0,female,1,car_other,1_to_4,"[0.7473271430598172, 0.25267285694018293]"
,6225,96c3d602-cd53-4f55-9d83-b87257cdce99-101,0.0,4.0,0.0,5,100_to_500,2021-03-25T15:43:15.834Z,0.7218430061671144,,"[0.0, 1.0, 4.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 41.0, 6225.0, 5.0, 5.0, 56.0, 2.0, 1.0]",no_checking,appliances,0.0,Risk,0.0,1.0,unskilled,stores,1.0,1.0,co-applicant,41,56,credits_paid_to_date,1.0,5,1.0,2.0,1.0,0.0,yes,own,0.0,2,6f074e16-84e8-4188-bd84-2627fd8680b5,"[5.5631398766577105, 14.436860123342289]",yes,1.0,male,1,savings_insurance,greater_7,"[0.2781569938328855, 0.7218430061671144]"
,4935,96c3d602-cd53-4f55-9d83-b87257cdce99-102,0.0,3.0,0.0,2,100_to_500,2021-03-25T15:43:15.834Z,0.7435127654360196,,"[20, [1, 2, 3, 4, 6, 13, 14, 15, 16, 17, 18, 19], [1.0, 3.0, 1.0, 3.0, 1.0, 22.0, 4935.0, 2.0, 3.0, 34.0, 1.0, 1.0]]",no_checking,radio_tv,0.0,No Risk,0.0,0.0,skilled,none,1.0,1.0,co-applicant,22,34,credits_paid_to_date,0.0,3,0.0,3.0,0.0,0.0,yes,own,0.0,1,6f074e16-84e8-4188-bd84-2627fd8680b5,"[14.870255308720393, 5.1297446912796065]",none,1.0,male,1,savings_insurance,less_1,"[0.7435127654360196, 0.2564872345639803]"
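To choose transactions for the Explainability tab programmatically instead of copying them from the table, the records can be filtered by predicted label. The sketch below works on plain dictionaries. The field names `scoring_id` and `prediction` are assumptions about the payload record layout, and `pick_transaction_ids` is a hypothetical helper.

```python
def pick_transaction_ids(records, prediction=None, limit=5):
    """Return up to `limit` scoring IDs, optionally only for one predicted label."""
    ids = []
    for record in records:
        if prediction is not None and record.get("prediction") != prediction:
            continue
        ids.append(record["scoring_id"])
        if len(ids) == limit:
            break
    return ids


# Hand-copied from the table above; field names are assumptions.
sample = [
    {"scoring_id": "96c3d602-cd53-4f55-9d83-b87257cdce99-1", "prediction": "No Risk"},
    {"scoring_id": "96c3d602-cd53-4f55-9d83-b87257cdce99-10", "prediction": "Risk"},
    {"scoring_id": "96c3d602-cd53-4f55-9d83-b87257cdce99-101", "prediction": "Risk"},
]
print(pick_transaction_ids(sample, prediction="Risk"))
```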


## Congratulations!

You have finished the hands-on lab for IBM Watson OpenScale. You can now view the OpenScale dashboard at https://url-to-your-cp4d-cluster/aiopenscale. Click the tile for the German Credit model to see the fairness, accuracy, and performance monitors. Click the time series graph to get detailed information on transactions during a specific time window.
