<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run with the **Python 3.x with Spark** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.x with Spark in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning
  * DB2
  
The notebook will train, create and deploy a German Credit Risk model, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

### Contents

- [Setup](#setup)
- [Model building and deployment](#model)
- [AI Function wrapper and deployment of model](#function)
- [OpenScale configuration using AI function](#openscale)
- [Quality monitor and feedback logging](#quality)
- [Fairness, drift monitoring and explanations](#fairness)
- [Custom monitors and metrics](#custom)
- [Payload analytics](#analytics)
- [Set up Business Application](#BKPI)
- [Historical data](#historical)
- [Business Application and Correlation monitor run](#correlation)

# Setup <a name="setup"></a>

## Package installation

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
#!rm -rf /home/spark/shared/user-libs/python3.6*

!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
#!pip install --upgrade watson-machine-learning-client-V4 | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1
!pip install pixiedust | tail -n 1

ERROR: tensorflow 1.15.2 requires opt-einsum>=2.3.2, which is not installed.
ERROR: onnx 1.5.0 requires typing-extensions>=3.6.2.1, which is not installed.
ERROR: hdijupyterutils 0.12.9 requires jupyter>=1, which is not installed.
ERROR: brunel 2.3 requires JPype1-py3, which is not installed.
ERROR: seaborn 0.10.0 has requirement matplotlib>=2.1.2, but you'll have matplotlib 2.1.0 which is incompatible.
Successfully installed numpy-1.18.2
ERROR: seaborn 0.10.0 has requirement matplotlib>=2.1.2, but you'll have matplotlib 2.1.0 which is incompatible.
Successfully installed SciPy-1.4.1


In [3]:
!pip install pyspark==2.3.0



### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D)
- SCHEMA_NAME

In [4]:
WOS_CREDENTIALS = {
    "url": "https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com",
    "username": "admin",
    "password": "password"
}

In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
WML_CREDENTIALS = WOS_CREDENTIALS.copy()
WML_CREDENTIALS['instance_id']='openshift'
WML_CREDENTIALS['version']='2.5.0'

In [7]:
DATABASE_CREDENTIALS = {
    "jdbcurl": "jdbc:db2://10.16.3.46:50001/SAMPLE",
    "hostname": "rsmar16-inf.fyre.ibm.com",
    "username": "db2inst1",
    "password": "C0wTiger",
    "port": 50000,
    "db": "SAMPLE",
#     "dsn": "***",
#     "uri": "***"
}

In [8]:
# The code was removed by Watson Studio for sharing.

### Action: put created schema name below.

In [9]:
SCHEMA_NAME = 'AIOSFASTPATHICP'
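
Hard-coding usernames and passwords in cells, as above, means they travel with every shared copy of the notebook. One alternative is to read them from environment variables. A minimal sketch, assuming variables named `WOS_URL`, `WOS_USERNAME` and `WOS_PASSWORD` (these names and the helper `credentials_from_env` are hypothetical and not part of any IBM client):

```python
import os


def credentials_from_env(prefix, keys, environ=None):
    """Build a credentials dict from variables named <PREFIX>_<KEY>."""
    environ = os.environ if environ is None else environ
    names = {key: f"{prefix}_{key.upper()}" for key in keys}
    missing = [name for name in names.values() if name not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in names.items()}


# Demo with an explicit mapping standing in for os.environ (placeholder values).
demo_env = {
    "WOS_URL": "https://cpd.example.com",
    "WOS_USERNAME": "admin",
    "WOS_PASSWORD": "not-a-real-password",
}
WOS_DEMO = credentials_from_env("WOS", ["url", "username", "password"], environ=demo_env)
print(sorted(WOS_DEMO))
```

Calling the helper without the `environ` argument reads from the real process environment, and a missing variable fails early with the list of names that need to be set.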

## Run the notebook

At this point, the notebook is ready to run. You can either run the cells one at a time, or click the **Kernel** option above and select **Restart and Run All** to run all the cells.

# Model building and deployment <a name="model"></a>

In this section you will learn how to train a Spark MLlib model and then deploy it as a web service using the Watson Machine Learning service.

## Load the training data

In [10]:
import pandas as pd
pd_data = pd.read_csv('/project_data/data_asset/training_data_bias_6.csv')
pd_data.head()

Unnamed: 0,CheckingStatus,LoanDuration,CreditHistory,LoanPurpose,LoanAmount,ExistingSavings,EmploymentDuration,InstallmentPercent,Sex,OthersOnLoan,...,OwnsProperty,Age,InstallmentPlans,Housing,ExistingCreditsCount,Job,Dependents,Telephone,ForeignWorker,Risk
0,0_to_200,31,credits_paid_to_date,other,1889,100_to_500,less_1,3,female,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
1,less_0,18,credits_paid_to_date,car_new,462,less_100,1_to_4,2,female,none,...,savings_insurance,37,stores,own,2,skilled,1,none,yes,No Risk
2,less_0,15,prior_payments_delayed,furniture,250,less_100,1_to_4,2,male,none,...,real_estate,28,none,own,2,skilled,1,yes,no,No Risk
3,0_to_200,28,credits_paid_to_date,retraining,3693,less_100,greater_7,3,male,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
4,no_checking,28,prior_payments_delayed,education,6235,500_to_1000,greater_7,3,male,none,...,unknown,57,none,own,2,skilled,1,none,yes,Risk


In [8]:
import ibm_db

In [13]:
dsn_driver = "jdbc:db2://10.16.3.46:50001/SAMPLE"
dsn_database = "SAMPLE"            # e.g. "BLUDB"
dsn_hostname = "rsmar16-inf.fyre.ibm.com" # e.g.: "awh-yp-small03.services.dal.bluemix.net"
dsn_port = "50000"                # e.g. "50000" 
dsn_protocol = "TCPIP"            # i.e. "TCPIP"
dsn_uid = "db2inst1"        # e.g. "dash104434"
dsn_pwd = "C0wTiger"

In [19]:
#Create database connection
dsn = (
    "DRIVER={0};"
    "DATABASE={1};"
    "HOSTNAME={2};"
    "PORT={3};"
    "PROTOCOL={4};"
    "UID={5};"
    "PWD={6};").format(dsn_driver, dsn_database, dsn_hostname, dsn_port, dsn_protocol, dsn_uid, dsn_pwd)

print(dsn)
try:
    conn = ibm_db.connect(dsn, "", "")
    print ("Connected!")
 
except Exception as exc:
    print("Unable to connect to database:", exc)

DRIVER=jdbc:db2://10.16.3.46:50001/SAMPLE;DATABASE=SAMPLE;HOSTNAME=rsmar16-inf.fyre.ibm.com;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=C0wTiger;
Connected!
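
The connection string printed above places a JDBC URL in the `DRIVER` field and echoes the password in clear text. Connection strings for `ibm_db` are commonly written with the driver name `{IBM DB2 ODBC DRIVER}` instead. The following is a sketch only: `build_dsn` and `mask_password` are hypothetical helpers, the host and password are placeholders, and the driver name should be checked against your own Db2 client installation.

```python
def build_dsn(database, hostname, port, uid, pwd,
              driver="{IBM DB2 ODBC DRIVER}", protocol="TCPIP"):
    """Assemble a semicolon-separated KEY=value connection string."""
    parts = [
        ("DRIVER", driver),
        ("DATABASE", database),
        ("HOSTNAME", hostname),
        ("PORT", str(port)),
        ("PROTOCOL", protocol),
        ("UID", uid),
        ("PWD", pwd),
    ]
    return "".join(f"{key}={value};" for key, value in parts)


def mask_password(dsn):
    """Return the DSN with the PWD value hidden, for safe printing."""
    masked = []
    for item in dsn.split(";"):
        if item.upper().startswith("PWD="):
            item = "PWD=****"
        masked.append(item)
    return ";".join(masked)


example_dsn = build_dsn("SAMPLE", "db2.example.com", 50000, "db2inst1", "not-a-real-password")
print(mask_password(example_dsn))
```

The unmasked `example_dsn` is what would be passed to `ibm_db.connect(dsn, "", "")`; only the masked form is printed.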


In [85]:
DATABASE_CREDENTIALS = (
    "DRIVER=jdbc:db2://10.16.3.46:50000/SAMPLE;"
    "DATABASE=SAMPLE;"
    "HOSTNAME=rsmar16-inf.fyre.ibm.com;"
    "PORT= 50000;"
    "PROTOCOL=TCPIP;"
    "UID=db2inst1;"
    "PWD=C0wTiger;")

print(DATABASE_CREDENTIALS)

try:
    conn = ibm_db.connect(DATABASE_CREDENTIALS, "", "")
    print ("Connected!")
 
except Exception as exc:
    print("Unable to connect to database:", exc)

DRIVER=jdbc:db2://10.16.3.46:50000/SAMPLE;DATABASE=SAMPLE;HOSTNAME=rsmar16-inf.fyre.ibm.com;PORT= 50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=C0wTiger;
Connected!


In [41]:
stmt = ibm_db.exec_immediate(conn, 'SELECT count(*) FROM "AIOSFASTPATHICP"."Payload_fc0c9935-5431-4b07-8f5e-3c9696804149"')
tup = ibm_db.fetch_tuple(stmt)
tup

(1,)

In [42]:
tuple_of_tuples = tuple([tuple(x) for x in pd_data.values])

In [43]:
sql = "INSERT INTO AIOSFASTPATHICP.GCR_TRAIN_DATA VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
stmt = ibm_db.prepare(conn, sql)
ibm_db.execute_many(stmt, tuple_of_tuples)

5000
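
The `INSERT` statement above types out one `?` parameter marker per table column by hand, which is easy to miscount. As a sketch, the statement can be generated from the column count, and the rows can be checked before they are sent to the database. The helper names are hypothetical; 21 is the number of columns in the training data.

```python
def build_insert_sql(schema, table, n_columns):
    """Return an INSERT statement with one '?' parameter marker per column."""
    if n_columns < 1:
        raise ValueError("n_columns must be positive")
    markers = ",".join("?" * n_columns)
    return f"INSERT INTO {schema}.{table} VALUES({markers})"


def check_row_widths(rows, n_columns):
    """Raise if any row does not supply exactly n_columns values."""
    for i, row in enumerate(rows):
        if len(row) != n_columns:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_columns}")


sql_demo = build_insert_sql("AIOSFASTPATHICP", "GCR_TRAIN_DATA", 21)
print(sql_demo)
```

Running `check_row_widths(tuple_of_tuples, 21)` before `ibm_db.execute_many` would surface a width mismatch as a readable Python error rather than a database error.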

In [246]:
# !rm german_credit_data_biased_training.csv
# !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/german_credit_data_biased_training.csv

In [12]:
from pyspark.sql import SparkSession
import json

spark = SparkSession.builder.getOrCreate()
pd_data = pd.read_csv("/project_data/data_asset/training_data_bias_6.csv", sep=",", header=0)
df_data = spark.read.csv(path="/project_data/data_asset/training_data_bias_6.csv", sep=",", header=True, inferSchema=True)
df_data.head()

Row(CheckingStatus='0_to_200', LoanDuration=31, CreditHistory='credits_paid_to_date', LoanPurpose='other', LoanAmount=1889, ExistingSavings='100_to_500', EmploymentDuration='less_1', InstallmentPercent=3, Sex='female', OthersOnLoan='none', CurrentResidenceDuration=3, OwnsProperty='savings_insurance', Age=32, InstallmentPlans='none', Housing='own', ExistingCreditsCount=1, Job='skilled', Dependents=1, Telephone='none', ForeignWorker='yes', Risk='No Risk')

## Explore data

In [12]:
df_data.printSchema()

root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)



In [13]:
print("Number of records: " + str(df_data.count()))

Number of records: 5000


## Visualize data with pixiedust

In [14]:
import pixiedust

Pixiedust database opened successfully
Table VERSION_TRACKER created successfully
Table METRICS_TRACKER created successfully

Share anonymous install statistics? (opt-out instructions)

PixieDust will record metadata on its environment the next time the package is installed or updated. The data is anonymized and aggregated to help plan for future releases, and records only the following values:

{
   "data_sent": currentDate,
   "runtime": "python",
   "application_version": currentPixiedustVersion,
   "space_id": nonIdentifyingUniqueId,
   "config": {
       "repository_id": "https://github.com/ibm-watson-data-lab/pixiedust",
       "target_runtimes": ["Data Science Experience"],
       "event_id": "web",
       "event_organizer": "dev-journeys"
   }
}
You can opt out by calling pixiedust.optOut() in a new cell.


Pixiedust runtime updated. Please restart kernel
Table SPARK_PACKAGES created successfully
Table USER_PREFERENCES created successfully
Table service_connections created successfully


In [15]:
display(df_data)

## Create a model

In [13]:
spark_df = df_data
(train_data, test_data) = spark_df.randomSplit([0.9, 0.1], 24)


print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

spark_df.printSchema()

Number of records for training: 4494
Number of records for evaluation: 506
root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)

The code below creates a gradient-boosted tree classifier (`GBTClassifier`) with Spark, setting up string indexers for the categorical features and the label column. Finally, this notebook creates a pipeline including the indexers and the model, and does an initial Area Under ROC evaluation of the model.

In [14]:
from pyspark import SparkContext, SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier,GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import StringIndexer, VectorAssembler, IndexToString
from pyspark.sql.types import StructType, DoubleType, StringType, ArrayType

In [15]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline, Model

features = [x for x in spark_df.columns if x != 'Risk']
categorical_features = ['CheckingStatus', 'CreditHistory', 'LoanPurpose', 'ExistingSavings', 'EmploymentDuration', 'Sex', 'OthersOnLoan', 'OwnsProperty', 'InstallmentPlans', 'Housing', 'Job', 'Telephone', 'ForeignWorker']
categorical_num_features = [x + '_IX' for x in categorical_features]
continuous_features = [x for x in features if x not in categorical_features]

si_list = [StringIndexer(inputCol=nm_in, outputCol=nm_out) for nm_in, nm_out in zip(categorical_features, categorical_num_features)]

In [16]:
si_Label = StringIndexer(inputCol="Risk", outputCol="label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)
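
By default, `StringIndexer` assigns index 0 to the most frequent string, 1 to the next, and so on, and `IndexToString` applies the inverse mapping using the fitted labels. The pure-Python sketch below imitates that behaviour on a toy column so the mapping is easy to inspect. It is an illustration rather than Spark's implementation, and the alphabetical tie-break is an assumption made here for determinism.

```python
from collections import Counter


def fit_string_index(values):
    """Return labels ordered by descending frequency (ties broken alphabetically here)."""
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def to_index(labels, value):
    """Map a string to its index, as a float like Spark's indexed columns."""
    return float(labels.index(value))


def to_string(labels, index):
    """Inverse mapping, analogous to IndexToString."""
    return labels[int(index)]


# Toy label column, invented for illustration.
risk_column = ["No Risk", "Risk", "No Risk", "No Risk", "Risk"]
labels_demo = fit_string_index(risk_column)
print(labels_demo, to_index(labels_demo, "Risk"), to_string(labels_demo, 0.0))
```

In the notebook, `si_Label.labels` plays the role of `labels_demo`, so inspecting it shows which class the value `1.0` in the `prediction` column stands for.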

In [17]:
va_features = VectorAssembler(inputCols=categorical_num_features + continuous_features, outputCol="features")

In [18]:
classifier=GBTClassifier(featuresCol="features")

In [19]:
pipeline = Pipeline(stages= si_list + [si_Label, va_features, classifier, label_converter])

model = pipeline.fit(train_data)

**Note**: If you want to filter features from the model output, replace `*` with the names of the features to retain in the `SQLTransformer` statement.

In [20]:
predictions = model.transform(test_data)
evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderROC')
area_under_curve = evaluatorDT.evaluate(predictions)

evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderPR')
area_under_PR = evaluatorDT.evaluate(predictions)
#default evaluation is areaUnderROC
print("areaUnderROC = %g" % area_under_curve, "areaUnderPR = %g" % area_under_PR)

areaUnderROC = 0.701143 areaUnderPR = 0.661933


In [21]:
# extra code: evaluate more metrics by exporting them into pandas and numpy
from sklearn.metrics import classification_report
y_pred = predictions.toPandas()['prediction']
y_pred = ['Risk' if pred == 1.0 else 'No Risk' for pred in y_pred]
y_test = test_data.toPandas()['Risk']
# classification_report orders string labels alphabetically ('No Risk' before 'Risk'),
# so target_names must be given in that order.
print(classification_report(y_test, y_pred, target_names=['No Risk', 'Risk']))

              precision    recall  f1-score   support

     No Risk       0.72      0.85      0.78       291
        Risk       0.73      0.55      0.63       215

    accuracy                           0.72       506
   macro avg       0.73      0.70      0.70       506
weighted avg       0.72      0.72      0.72       506
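
The evaluator cell passes the hard 0/1 `prediction` column as `rawPredictionCol`, so the ROC curve has a single interior point and its area reduces to the mean of the two per-class recalls. The recalls in the report, 0.85 and 0.55, average to about 0.70, which agrees with the `areaUnderROC = 0.701143` printed earlier. The sketch below uses invented toy data and a plain rank-based AUC (not Spark's code) to show this identity, and that scoring on continuous values generally gives a different number.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_hard = [1 if s >= 0.5 else 0 for s in y_score]

# Recall of the positive class (TPR) and of the negative class (TNR).
tpr = sum(1 for y, h in zip(y_true, y_hard) if y == 1 and h == 1) / y_true.count(1)
tnr = sum(1 for y, h in zip(y_true, y_hard) if y == 0 and h == 0) / y_true.count(0)

auc_hard = auc(y_true, y_hard)    # AUC from thresholded predictions
auc_soft = auc(y_true, y_score)   # AUC from continuous scores
print(auc_hard, (tpr + tnr) / 2, auc_soft)
```

If a threshold-independent AUC is wanted in the notebook, the evaluator's `rawPredictionCol` could point at the pipeline's `rawPrediction` column instead; the metric values would then differ from those printed above.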



## Publish the model

In this section, the notebook uses Watson Machine Learning to save the model (including the pipeline) to the WML instance. Previous versions of the model are removed so that the notebook can be run again, resetting all data for another demo.

In [22]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import json

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

# Deployment spaces are a new feature in CP4D: a model must be deployed into a deployment
# space. You can list all the spaces using the .list() function, or create a new space from
# the CP4D menu in the top left corner --> analyze --> analytics deployments -->
# New Deployment Space. Once you know which space you want to deploy in, pass the GUID of
# that space to the .set.default_space() function below.
wml_client.spaces.list()
# wml_client.set.default_space('323608a7-21d3-48c8-8c62-dcf4bfe06e17')

------------------------------------  ---------------------------  ------------------------
GUID                                  NAME                         CREATED
5f722cbf-f361-4638-9711-03f0935f7bd2  aios_test                    2020-04-09T16:16:27.570Z
df0fbe87-4bee-4010-8cbe-94215cf146ae  openscale-fast-path-preprod  2020-04-09T16:13:48.032Z
c251a6d0-c6ea-4608-9543-6415a9e954be  openscale-fast-path          2020-04-09T16:13:42.707Z
------------------------------------  ---------------------------  ------------------------


In [23]:
space_name = "aios_test"
spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        space_id = space["metadata"]["guid"]
if space_id is None:
    space_id = wml_client.spaces.store(
        meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: space_name})["metadata"]["guid"]
wml_client.set.default_space(space_id)

'SUCCESS'
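
The lookup in the cell above reads only two fields of each space record, `entity.name` and `metadata.guid`. As a sketch, the same logic can be factored into a small function and tried on invented records. `find_space_id` is a hypothetical helper, and unlike the loop above (which keeps the last match) it returns the first matching space.

```python
def find_space_id(spaces, name):
    """Return the guid of the first space whose entity name matches, else None."""
    for space in spaces:
        if space.get("entity", {}).get("name") == name:
            return space.get("metadata", {}).get("guid")
    return None


# Invented records that mimic only the two fields the lookup reads.
spaces_demo = [
    {"metadata": {"guid": "guid-a"}, "entity": {"name": "other_space"}},
    {"metadata": {"guid": "guid-b"}, "entity": {"name": "aios_test"}},
]
print(find_space_id(spaces_demo, "aios_test"))
```

A `None` result is the signal to create the space, as the cell above does with `wml_client.spaces.store`.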

### Remove existing model and deployment

In [24]:
MODEL_NAME = "German Risk Model test"
DEPLOYMENT_NAME = "German Risk Deployment - test"

In [25]:
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

------------------------------------  --------------------------  ------------------------  ---------
GUID                                  NAME                        CREATED                   TYPE
6a01cc1c-e5c9-4d69-98d1-3a6cf18c4451  German Risk Model function  2020-04-09T16:20:15.002Z  mllib_2.3
------------------------------------  --------------------------  ------------------------  ---------
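
The cleanup cell extracts the model id from the deployment's asset `href` with `split('/')[3].split('?')[0]`, which depends on the id being exactly the fourth slash-separated piece. A slightly more defensive sketch uses `urllib.parse` from the standard library. The example `href` is hypothetical, shaped the way the positional split implies, and the sketch assumes the id is the last path segment.

```python
from urllib.parse import urlsplit


def asset_id_from_href(href):
    """Return the last path segment of an asset href, ignoring any query string."""
    path = urlsplit(href).path  # drops '?...' and '#...'
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"no path segments in href: {href!r}")
    return segments[-1]


# Hypothetical href in the shape the split('/')[3] call implies.
href_demo = "/v4/models/76c52e9e-585d-46c3-a67e-ebf4f2d61ad3?space_id=example"
print(asset_id_from_href(href_demo))
```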


In [31]:
# training_data_reference = {
#     "type":"***",
#             "name": "***",
#             "connection": DATABASE_CREDENTIALS,
#             "location": {
#                 "tablename": "***",
#                 "type": "***"
#             }
#         }

In [32]:
# The code was removed by Watson Studio for sharing.

In [26]:
model = pipeline.fit(df_data)

In [27]:
wml_models = wml_client.repository.get_model_details()
model_uid = None

for model_in in wml_models['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

if model_uid is None:
    print("Storing model ...")
    metadata = {
        wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
        wml_client.repository.ModelMetaNames.TYPE: 'mllib_2.3',
        wml_client.repository.ModelMetaNames.RUNTIME_UID: 'spark-mllib_2.3',
    }

    published_model_details = wml_client.repository.store_model(model, metadata, training_data=df_data,  pipeline=pipeline)
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")

Storing model ...
Done


In [28]:
model_uid

'76c52e9e-585d-46c3-a67e-ebf4f2d61ad3'

In [29]:
wml_client.repository.list_models()

------------------------------------  --------------------------  ------------------------  ---------
GUID                                  NAME                        CREATED                   TYPE
76c52e9e-585d-46c3-a67e-ebf4f2d61ad3  German Risk Model test      2020-04-09T19:01:42.002Z  mllib_2.3
6a01cc1c-e5c9-4d69-98d1-3a6cf18c4451  German Risk Model function  2020-04-09T16:20:15.002Z  mllib_2.3
------------------------------------  --------------------------  ------------------------  ---------


## Deploy the model

The next section of the notebook deploys the model as a RESTful web service in Watson Machine Learning. The deployed model will have a scoring URL you can use to send data to the model for predictions.

In [30]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break

if deployment_uid is None:
    print("Deploying model...")
    meta_props = {
        wml_client.deployments.ConfigurationMetaNames.NAME: DEPLOYMENT_NAME,
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
    deployment = wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)
    deployment_uid = wml_client.deployments.get_uid(deployment)
    
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

Deploying model...


#######################################################################################

Synchronous deployment creation for uid: '76c52e9e-585d-46c3-a67e-ebf4f2d61ad3' started

#######################################################################################


initializing.
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='bbf74a08-38c1-42ab-9c4d-3954ade0c2f0'
------------------------------------------------------------------------------------------------


Model id: 76c52e9e-585d-46c3-a67e-ebf4f2d61ad3
Deployment id: bbf74a08-38c1-42ab-9c4d-3954ade0c2f0


In [32]:
#params for model function definition
ai_params = {"wml_credentials": WML_CREDENTIALS, 
             "deployment_uid": deployment_uid,
             "space_name":"aios_test"
            }

In [33]:
# Generate function
def score_generator(params=ai_params):
    
    from watson_machine_learning_client import WatsonMachineLearningAPIClient

    wml_credentials = {'url': 'https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com',
                         'username': 'admin',
                         'password': 'password',
                         'instance_id': 'openshift',
                         'version': '2.5.0'}
    deployment_uid = params["deployment_uid"]
    space_name=params["space_name"]

    client = WatsonMachineLearningAPIClient(wml_credentials)
    
    spaces = client.spaces.get_details()['resources']
    space_id = None
    for space in spaces:
        if space['entity']['name'] == space_name:
            space_id = space["metadata"]["guid"]
    if space_id is None:
        space_id = client.spaces.store(
            meta_props={client.spaces.ConfigurationMetaNames.NAME: space_name})["metadata"]["guid"]
    client.set.default_space(space_id)

    def score(payload):
        """AI function with model version.

        Example:
          {"fields":["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
            "values": [["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"]]}
        """
        
        scores = client.deployments.score(deployment_uid, payload)
        #print (scores)
        return scores

    return score
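
The AI function above is written as a closure: the outer function does the one-time setup (creating the client, selecting the space) and returns an inner `score` function that handles each request. The sketch below shows only that structure with a stand-in for the client. All names are hypothetical and nothing here calls Watson Machine Learning.

```python
def make_scorer(params):
    """Outer function: does the one-time setup and returns the scorer."""
    calls = {"setup": 0, "score": 0}
    calls["setup"] += 1  # stands in for creating a client and selecting a space
    deployment_uid = params["deployment_uid"]

    def score(payload):
        """Inner function: runs once per scoring request."""
        calls["score"] += 1
        # A real AI function would forward the payload to the deployment here.
        return {"deployment_uid": deployment_uid, "echo": payload, "calls": dict(calls)}

    return score


score_demo = make_scorer({"deployment_uid": "hypothetical-uid"})
first = score_demo({"values": [[1]]})
second = score_demo({"values": [[2]]})
print(second["calls"])
```

Because the inner function only sees names captured inside the outer function, anything it needs (clients, ids, credentials) has to be created or passed in there rather than taken from notebook globals.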

In [34]:
#sample scoring
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
]

In [35]:
payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}

In [36]:
score = score_generator()
scores_ai = score(payload)
print(scores_ai)

{'predictions': [{'fields': ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'LoanPurpose_IX', 'ExistingSavings_IX', 'EmploymentDuration_IX', 'Sex_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'InstallmentPlans_IX', 'Housing_IX', 'Job_IX', 'Telephone_IX', 'ForeignWorker_IX', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'], 'values': [['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, [20, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19], [1.0, 1.0, 1.0, 13.
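
The scoring response pairs one `fields` list with a list of `values` rows, which is compact but awkward to read. A small sketch that zips them into one dictionary per row makes individual columns such as `predictedLabel` easy to pick out. `rows_as_dicts` is a hypothetical helper, and the response below is invented and shortened to three of the fields shown above.

```python
def rows_as_dicts(response):
    """Turn {'predictions': [{'fields': [...], 'values': [[...]]}]} into a list of dicts."""
    rows = []
    for block in response.get("predictions", []):
        fields = block["fields"]
        for values in block["values"]:
            rows.append(dict(zip(fields, values)))
    return rows


# Invented, shortened response that keeps only three fields.
response_demo = {
    "predictions": [{
        "fields": ["LoanAmount", "prediction", "predictedLabel"],
        "values": [[1343, 0.0, "No Risk"], [6235, 1.0, "Risk"]],
    }]
}
for row in rows_as_dicts(response_demo):
    print(row["LoanAmount"], row["predictedLabel"])
```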

In [37]:
#Function store
meta_data = {
    wml_client.repository.FunctionMetaNames.NAME: 'German Credit Risk Model - AI Function2',
}

function_details = wml_client.repository.store_function(meta_props=meta_data, function=score_generator)

Using default runtime with uid: ai-function_0.1-py3.6


In [53]:
wml_client.repository.list_functions()

------------------------------------  ---------------------------------------  ------------------------  ------
GUID                                  NAME                                     CREATED                   TYPE
fc4a2b09-b942-4e57-ab3d-773bf2c885a8  German Credit Risk Model - AI Function2  2020-04-09T19:05:46.002Z  python
bf5cd36e-73a5-4926-b46f-e962ebf3b959  German Credit Risk Model - AI Function   2020-04-09T16:24:04.002Z  python
------------------------------------  ---------------------------------------  ------------------------  ------


In [55]:
function_uid

'fc4a2b09-b942-4e57-ab3d-773bf2c885a8'

In [38]:
function_uid = wml_client.repository.get_function_uid(function_details)
meta_props = {
        wml_client.deployments.ConfigurationMetaNames.NAME: 'German Credit Risk Model - AI Function Deployment2',
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
deployment_function = wml_client.deployments.create(artifact_uid=function_uid, meta_props=meta_props)
deployment_func_uid = wml_client.deployments.get_uid(deployment_function)



#######################################################################################

Synchronous deployment creation for uid: 'fc4a2b09-b942-4e57-ab3d-773bf2c885a8' started

#######################################################################################


initializing......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='7f416433-d9e5-436a-8351-fa6c6146b01f'
------------------------------------------------------------------------------------------------




In [80]:
deployment_fuid = wml_client.deployments.get_uid(deployment_function)
ai_function_scoring_endpoint = deployment_function['entity']['status']['online_url']['url']
print(ai_function_scoring_endpoint)

https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com/v4/deployments/7f416433-d9e5-436a-8351-fa6c6146b01f/predictions


In [81]:
response = wml_client.deployments.score(deployment_fuid, payload)
response

{'predictions': [{'fields': ['CheckingStatus',
    'LoanDuration',
    'CreditHistory',
    'LoanPurpose',
    'LoanAmount',
    'ExistingSavings',
    'EmploymentDuration',
    'InstallmentPercent',
    'Sex',
    'OthersOnLoan',
    'CurrentResidenceDuration',
    'OwnsProperty',
    'Age',
    'InstallmentPlans',
    'Housing',
    'ExistingCreditsCount',
    'Job',
    'Dependents',
    'Telephone',
    'ForeignWorker',
    'CheckingStatus_IX',
    'CreditHistory_IX',
    'LoanPurpose_IX',
    'ExistingSavings_IX',
    'EmploymentDuration_IX',
    'Sex_IX',
    'OthersOnLoan_IX',
    'OwnsProperty_IX',
    'InstallmentPlans_IX',
    'Housing_IX',
    'Job_IX',
    'Telephone_IX',
    'ForeignWorker_IX',
    'label',
    'features',
    'rawPrediction',
    'probability',
    'prediction',
    'predictedLabel'],
   'values': [['less_0',
     17,
     'credits_paid_to_date',
     'car_used',
     250,
     'less_100',
     '4_to_7',
     3,
     'female',
     'none',
     2,
     'r

# Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [38]:
#!pip install ibm_ai_openscale

In [82]:
from ibm_ai_openscale import APIClient4ICP
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

In [83]:
ai_client = APIClient4ICP(WOS_CREDENTIALS)
ai_client.version

'2.2.1'

## Create datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If an OpenScale datamart exists in Db2, the existing datamart will be used and no data will be overwritten.

Prior instances of the German Credit model will be removed from OpenScale monitoring.

In [43]:
try:
    data_mart_details = ai_client.data_mart.get_details()
    print('Using existing external datamart')
except:
    print('Setting up external datamart')
    ai_client.data_mart.setup(db_credentials=DATABASE_CREDENTIALS, schema=SCHEMA_NAME)

Using existing external datamart


In [44]:
data_mart_details = ai_client.data_mart.get_details()
data_mart_details

{'database_configuration': {'credentials': {'db': 'SAMPLE',
   'db_type': 'db2',
   'hostname': 'rsmar16-inf.fyre.ibm.com',
   'password': 'C0wTiger',
   'port': 50000,
   'username': 'db2inst1'},
  'database_type': 'db2',
  'location': {'schema': 'AIOSFASTPATHICP'},
  'name': 'db2'},
 'internal_database': False,
 'service_instance_crn': 'N/A',
 'status': {'state': 'active'}}

## Bind machine learning engines

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model. If this binding already exists, this code will output a warning message and use the existing binding.

In [45]:
binding_uid = ai_client.data_mart.bindings.add('WML instance', WatsonMachineLearningInstance4ICP())
if binding_uid is None:
    binding_uid = ai_client.data_mart.bindings.get_details()['service_bindings'][0]['metadata']['guid']
bindings_details = ai_client.data_mart.bindings.get_details()
ai_client.data_mart.bindings.list()

Status code: 409, body: {"errors":[{"code":"AIQCS0010W","message":"Service Binding with this id is already defined"}],"trace":"YmU4NGM3NjUtOWI2Yy00Y2Y2LWI1ZDQtMWI5ODE2M2Q0YjM0"}


0,1,2,3
964b9601-7a80-11ea-98e7-5d9886a1e08e,New provider,watson_machine_learning,2020-04-09T16:39:02.680Z
998,WML pre_production,watson_machine_learning,2020-04-09T16:14:24.295Z
999,WML production,watson_machine_learning,2020-04-09T16:14:21.028Z


In [46]:
print(binding_uid)

999


In [52]:
ai_client.data_mart.bindings.list_assets(binding_uid=binding_uid)

0,1,2,3,4,5,6
fc4a2b09-b942-4e57-ab3d-773bf2c885a8,German Credit Risk Model - AI Function2,2020-04-09T19:05:46.002Z,function,python,999,False
76c52e9e-585d-46c3-a67e-ebf4f2d61ad3,German Risk Model test,2020-04-09T19:01:42.002Z,model,mllib_2.3,999,False
bf5cd36e-73a5-4926-b46f-e962ebf3b959,German Credit Risk Model - AI Function,2020-04-09T16:24:04.002Z,function,python,999,True
2d9b3c46-a7e3-450c-b5d0-75c0c2292998,GermanCreditRiskModelICP,2020-04-09T16:21:20.002Z,model,mllib_2.3,999,True
6a01cc1c-e5c9-4d69-98d1-3a6cf18c4451,German Risk Model function,2020-04-09T16:20:15.002Z,model,mllib_2.3,999,False
e7336771-f5f3-4d81-b361-1a859e7ec55f,GermanCreditRiskModelPreProdICP,2020-04-09T16:17:40.002Z,model,mllib_2.3,999,False
acac92d8-8d73-4161-996b-6425ff75c373,GermanCreditRiskModelChallengerICP,2020-04-09T16:14:32.002Z,model,scikit-learn_0.20,999,True


## Subscriptions

### Remove existing credit risk subscriptions

This code removes previous subscriptions to the German Credit model to refresh the monitors with the new model and new data.

In [47]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
for subscription in subscriptions_uids:
    sub_name = ai_client.data_mart.subscriptions.get_details(subscription)['entity']['asset']['name']
    if sub_name == MODEL_NAME:
        ai_client.data_mart.subscriptions.delete(subscription)
        print('Deleted existing subscription for', MODEL_NAME)

This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model unique identifier, and some information about the model itself.

In [42]:
# training_data_reference= {
#   "name": "DB2 training data reference",
#   "connection": {
#     "hostname": "<db2_hostname>",
#     "password": "<db2_password>",
#     "database_name": "BLUDB",
#     "port": 50001,
#     "ssl": True,
#     "username": "<db2_username>"
#   },
#   "location": {
#     "schema_name": "<training_data_schema_name>",
#     "table_name": "<training_data_table_name>"
#   },
#   "type": "db2"
# }

In [48]:
training_data_reference= {
    "type": "cos",
    "connection": {
        "url": "https://s3.us.cloud-object-storage.appdomain.cloud",
        "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/e0b56432b1f1bd804706dc29b8a89ca1:57b5eb6e-7b5d-4b90-a8e8-3736129c9010::",
        "api_key": "917Z-0MQzVpgdqkHClXbfDchrXZ_bl7kbCxZSkynzsLP",
        "iam_url": "https://iam.ng.bluemix.net/oidc/token"
    },
    "location": {
        "bucket": "resources-donotdelete-pr-lbyypdyr2le8tz",
        "file_name": "training_data.csv",
        "firstlineheader": True,
        "file_format": 'csv'
    }
}

In [95]:
# DATABASE_CREDENTIALS = {
#     #"jdbcurl": "jdbc:db2://10.16.3.46:50000/SAMPLE",
#     "hostname": "rsmar16-inf.fyre.ibm.com",
#     "username": "db2inst1",
#     "password": "C0wTiger",
#     "port": 50000,
#     "db": "SAMPLE",
# #     "dsn": "***",
# #     "uri": "***"
# }

In [96]:
# training_data_reference = {
#     "type":"db2",
#             "name": "DB2 training data reference",
#             "connection": DATABASE_CREDENTIALS,
#             "location": {
#                 "schema_name": SCHEMA_NAME,
#                 "tablename": "GCR_TRAIN_DATA"
                
#             }
#         }
# training_data_reference

{'type': 'db2',
 'name': 'DB2 training data reference',
 'connection': {'hostname': 'rsmar16-inf.fyre.ibm.com',
  'username': 'db2inst1',
  'password': 'C0wTiger',
  'port': 50000,
  'db': 'SAMPLE'},
 'location': {'schema_name': 'AIOSFASTPATHICP', 'tablename': 'GCR_TRAIN_DATA'}}

In [56]:
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    function_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='Risk',
    prediction_column='predictedLabel',
    transaction_id_column='transaction_id',
    probability_column='probability',
    feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"],
    training_data_reference=training_data_reference
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)
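The subscription call takes two long column lists that must agree with each other. As a rough pre-flight check, the sketch below verifies that every categorical column is also a feature column and that no feature is listed twice. The helper `check_subscription_columns` is hypothetical and not part of the OpenScale client; the short lists are illustrative subsets of the columns used above.

```python
def check_subscription_columns(feature_columns, categorical_columns):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen = set()
    for column in feature_columns:
        if column in seen:
            problems.append('duplicate feature column: ' + column)
        seen.add(column)
    for column in categorical_columns:
        if column not in seen:
            problems.append('categorical column is not a feature: ' + column)
    return problems


# Illustrative subset of the German Credit Risk columns.
features = ['CheckingStatus', 'LoanDuration', 'Sex', 'Age']
categoricals = ['CheckingStatus', 'Sex']

print(check_subscription_columns(features, categoricals))
print(check_subscription_columns(features, categoricals + ['Housing']))
```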

Get subscription list

In [57]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
ai_client.data_mart.subscriptions.list()

0,1,2,3,4
71426d73-aa90-473e-ab45-2ae8d1842796,German Credit Risk Model - AI Function2,function,999,2020-04-09T19:16:42.551Z
572b37e6-7c3d-45e7-b6a9-5b0499ecab5a,German Credit Risk Model - AI Function,function,964b9601-7a80-11ea-98e7-5d9886a1e08e,2020-04-09T16:39:29.664Z
57efcac7-d089-4c74-8016-cf4b55df379a,GermanCreditRiskModelICP,model,999,2020-04-09T16:21:59.243Z
d65a62bb-7a11-4f9a-aff2-8b7a036e9e5f,GermanCreditRiskModelChallengerICP,model,998,2020-04-09T16:15:26.555Z


In [58]:
subscription_details = subscription.get_details()

### Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model. First, the code gets the model deployment's endpoint URL, and then sends a few records for predictions.

In [59]:
deployment_uid=deployment_fuid

In [60]:
credit_risk_scoring_endpoint = None
print(deployment_uid)

for deployment in wml_client.deployments.get_details()['resources']:
    if deployment_uid in deployment['metadata']['guid']:
        credit_risk_scoring_endpoint = deployment['entity']['status']['online_url']['url']

7f416433-d9e5-436a-8351-fa6c6146b01f


In [61]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
#   ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
#   ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
#   ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
#   ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
#   ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
#   ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
#   ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)

print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])

Single record scoring result: 
 fields: ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'LoanPurpose_IX', 'ExistingSavings_IX', 'EmploymentDuration_IX', 'Sex_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'InstallmentPlans_IX', 'Housing_IX', 'Job_IX', 'Telephone_IX', 'ForeignWorker_IX', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, [20, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19], [1.0, 1.
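The scoring response returns parallel `fields` and `values` lists, which are hard to read at a glance. A minimal sketch of turning such a response into one dictionary per scored row follows. The helper `predictions_as_records` is hypothetical, and the sample response is made up for illustration; only its shape (`predictions` → `fields` / `values`) is taken from the output above.

```python
def predictions_as_records(scoring_response, keep=None):
    """Convert a fields/values scoring response into a list of dicts.

    keep: optional list of field names to retain in each record.
    """
    records = []
    for block in scoring_response['predictions']:
        fields = block['fields']
        for row in block['values']:
            record = dict(zip(fields, row))
            if keep is not None:
                record = {name: record[name] for name in keep}
            records.append(record)
    return records


# Hypothetical, heavily shortened response with the same layout.
sample_response = {
    'predictions': [{
        'fields': ['LoanAmount', 'probability', 'predictedLabel'],
        'values': [[1343, [0.8, 0.2], 'No Risk']],
    }]
}

for record in predictions_as_records(sample_response,
                                     keep=['predictedLabel', 'probability']):
    print(record)
```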


# Quality monitoring and feedback logging <a name="quality"></a>

## Enable quality monitoring

The code below waits ten seconds to allow the payload logging table to be set up before it begins enabling monitors. It then turns on the quality (accuracy) monitor and sets an alert threshold of 80%. OpenScale will show an alert on the dashboard if the model's quality measurement (area under the ROC curve, in the case of a binary classifier) falls below this threshold.

The second parameter supplied, min_records, specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until an additional 40 feedback records have been added, via the user interface, the Python client, or the supplied feedback endpoint.

In [62]:
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.8, min_records=40)

## Feedback logging

The code below downloads and stores enough feedback data to meet the minimum threshold so that OpenScale can calculate a new accuracy measurement. It then kicks off the accuracy monitor. The monitors run hourly, or can be initiated via the Python API, the REST API, or the graphical user interface.

In [63]:
!rm additional_feedback_data.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/additional_feedback_data.json

--2020-04-09 19:19:12--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/additional_feedback_data.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.68.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.68.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16506 (16K) [text/plain]
Saving to: ‘additional_feedback_data.json’


2020-04-09 19:19:18 (250 KB/s) - ‘additional_feedback_data.json’ saved [16506/16506]



In [64]:
with open('additional_feedback_data.json') as feedback_file:
    additional_feedback_data = json.load(feedback_file)
subscription.feedback_logging.store(additional_feedback_data['data'])

In [77]:
feed_data = pd.read_csv('/project_data/data_asset/feedback_data_1.csv')
feed_data.shape

(49, 21)

In [78]:
subscription.feedback_logging.store(feed_data.values.tolist())

In [88]:
time.sleep(5)
subscription.feedback_logging.show_table()

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
less_0,17,credits_paid_to_date,car_used,250,less_100,4_to_7,3,female,none,2,real_estate,40,none,free,2,skilled,1,none,yes,No Risk,2020-03-24 01:37:40.583000+00:00
no_checking,27,prior_payments_delayed,radio_tv,4521,100_to_500,less_1,4,male,none,4,savings_insurance,28,none,own,1,management_self-employed,2,yes,yes,No Risk,2020-03-24 01:37:40.583000+00:00
no_checking,37,prior_payments_delayed,other,7945,500_to_1000,1_to_4,4,male,none,4,savings_insurance,39,none,own,2,management_self-employed,1,none,yes,No Risk,2020-03-24 01:37:40.583000+00:00
less_0,6,all_credits_paid_back,car_used,250,less_100,1_to_4,2,male,none,2,savings_insurance,28,stores,rent,1,skilled,1,none,yes,Risk,2020-03-24 01:37:40.583000+00:00
less_0,14,all_credits_paid_back,appliances,1431,less_100,unemployed,1,female,none,1,car_other,25,stores,own,1,skilled,1,none,yes,Risk,2020-03-24 01:37:40.583000+00:00
greater_200,5,credits_paid_to_date,car_used,250,less_100,4_to_7,3,male,none,2,savings_insurance,42,none,rent,1,skilled,1,none,yes,No Risk,2020-03-24 01:37:40.583000+00:00
0_to_200,23,credits_paid_to_date,retraining,3387,less_100,less_1,3,female,none,3,savings_insurance,28,none,own,1,skilled,1,none,yes,No Risk,2020-03-24 01:37:40.582000+00:00
no_checking,14,credits_paid_to_date,furniture,1269,500_to_1000,greater_7,2,male,none,2,savings_insurance,39,none,own,1,skilled,1,none,yes,No Risk,2020-03-24 01:37:40.582000+00:00
no_checking,36,prior_payments_delayed,appliances,9570,100_to_500,4_to_7,4,male,co-applicant,3,car_other,53,none,free,2,skilled,1,yes,yes,No Risk,2020-03-24 01:37:40.582000+00:00
less_0,16,credits_paid_to_date,car_new,1428,less_100,4_to_7,1,male,none,1,car_other,20,bank,rent,1,unemployed,1,yes,yes,No Risk,2020-03-24 01:37:40.582000+00:00


In [65]:
run_details = subscription.quality_monitoring.run(background_mode=False)




 Waiting for end of quality monitoring run 025f5778-5ec5-4092-8130-a77d867d366e 




initializing
running.......
completed

---------------------------
 Successfully finished run 
---------------------------




In [80]:
time.sleep(5)
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7,8,9
2020-03-30 18:35:00.263000+00:00,true_positive_rate,19feb419-b9af-4166-a047-1dfb567f1b0f,0.6666666666666666,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,area_under_roc,19feb419-b9af-4166-a047-1dfb567f1b0f,0.7849462365591396,0.8,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,precision,19feb419-b9af-4166-a047-1dfb567f1b0f,0.8,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,f1_measure,19feb419-b9af-4166-a047-1dfb567f1b0f,0.7272727272727272,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,accuracy,19feb419-b9af-4166-a047-1dfb567f1b0f,0.8163265306122449,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,log_loss,19feb419-b9af-4166-a047-1dfb567f1b0f,0.41767466249293,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,false_positive_rate,19feb419-b9af-4166-a047-1dfb567f1b0f,0.0967741935483871,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,area_under_pr,19feb419-b9af-4166-a047-1dfb567f1b0f,0.7278911564625851,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
2020-03-30 18:35:00.263000+00:00,recall,19feb419-b9af-4166-a047-1dfb567f1b0f,0.6666666666666666,,,model_type: original,999,7a44f319-d8bc-437e-a410-7dd8363e3ed6,5775e1ad-7ab2-4461-b502-b36a06cfe9e3
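To see how the ratios in the table relate to one another, the sketch below recomputes them from a confusion matrix. The counts (12 true positives, 3 false positives, 6 false negatives, 28 true negatives) are not reported by OpenScale; they are inferred here as one split of 49 feedback records that reproduces the precision, recall, false positive rate, F1 and accuracy shown above. Which class OpenScale treats as positive is also an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        'true_positive_rate': recall,
        'false_positive_rate': fp / (fp + tn),
        'precision': precision,
        'recall': recall,
        'f1_measure': 2 * precision * recall / (precision + recall),
        'accuracy': (tp + tn) / (tp + fp + fn + tn),
    }


# Inferred counts (assumption): 18 positives and 31 negatives, 49 in total.
metrics = binary_metrics(tp=12, fp=3, fn=6, tn=28)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Area under the ROC curve, area under the PR curve and log loss cannot be derived from these four counts, since they depend on the predicted probabilities.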


In [235]:
#import matplotlib
# %matplotlib inline

# quality_pd = subscription.quality_monitoring.get_table_content(format='pandas')
# quality_pd.plot.barh(x='id', y='value');

# Fairness, drift monitoring and explanations <a name="fairness"></a>

The code below configures fairness monitoring for our model. It turns on monitoring for two features, Sex and Age. In each case, we must specify:
  * Which model feature to monitor
  * One or more **majority** groups, which are values of that feature that we expect to receive a higher percentage of favorable outcomes
  * One or more **minority** groups, which are values of that feature that we expect to receive a higher percentage of unfavorable outcomes
  * The threshold below which OpenScale should display an alert for the fairness measurement (in this case, 80%)

Additionally, we must specify which outcomes from the model are favourable and which are unfavourable. We must also provide the minimum number of records OpenScale needs to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 100 records have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data, so we provide the dataframe containing the data.

In [66]:
subscription.fairness_monitoring.enable(
            features=[
                Feature("Sex", majority=['male'], minority=['female'], threshold=0.80),
                Feature("Age", majority=[[26,74]], minority=[[18,25]], threshold=0.80)
            ],
            favourable_classes=['No Risk'],
            unfavourable_classes=['Risk'],
            min_records=100,
            training_data=pd_data
        )
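To make the threshold concrete, here is a rough sketch of a fairness score computed as a disparate impact ratio: the favourable-outcome rate of the minority group divided by that of the majority group. This is an assumption about how the score behaves, not OpenScale's actual implementation (which, for example, also perturbs payload data). The function names and the outcome counts are hypothetical.

```python
def favourable_rate(outcomes, favourable_classes):
    """Fraction of outcomes that belong to the favourable classes."""
    favourable = sum(1 for outcome in outcomes if outcome in favourable_classes)
    return favourable / len(outcomes)


def disparate_impact(minority_outcomes, majority_outcomes, favourable_classes):
    """Ratio of minority to majority favourable-outcome rates."""
    return (favourable_rate(minority_outcomes, favourable_classes)
            / favourable_rate(majority_outcomes, favourable_classes))


# Hypothetical predictions for each group.
majority = ['No Risk'] * 80 + ['Risk'] * 20   # 80% favourable
minority = ['No Risk'] * 60 + ['Risk'] * 40   # 60% favourable

score = disparate_impact(minority, majority, favourable_classes=['No Risk'])
is_biased = score < 0.80  # same threshold as in the cell above
print(round(score, 3), is_biased)
```

Under this reading, a score of 1.0 means both groups receive favourable outcomes at the same rate, and a score below the threshold triggers an alert.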

## Drift configuration

In [67]:
subscription.drift_monitoring.enable(min_records=100, threshold=0.1)

drift_status = None
while drift_status != 'finished':
    drift_details = subscription.drift_monitoring.get_details()
    drift_status = drift_details['parameters']['config_status']['state']
    if drift_status != 'finished':
        print(datetime.utcnow().strftime('%H:%M:%S'), drift_status)
        time.sleep(30)
print(drift_status)

19:23:25 new
19:23:56 in_progress
19:24:27 in_progress
19:24:58 in_progress
19:25:30 in_progress
19:26:01 in_progress
19:26:32 in_progress
finished
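The polling loop above never gives up, so it would spin forever if the drift configuration ended in any state other than `finished`. A minimal sketch of the same loop with a timeout is shown below. `wait_for_state` is a hypothetical helper; the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def wait_for_state(get_state, target='finished', timeout=600, interval=30,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == target:
            return state
        if clock() >= deadline:
            raise TimeoutError('last observed state: {}'.format(state))
        sleep(interval)


# Simulated state sequence, mirroring the output above.
states = iter(['new', 'in_progress', 'in_progress', 'finished'])
result = wait_for_state(lambda: next(states), sleep=lambda seconds: None)
print(result)
```

In the notebook, the state getter could be `lambda: subscription.drift_monitoring.get_details()['parameters']['config_status']['state']`, which is the same lookup the cell above performs.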


## Score the model again now that monitoring is configured

This next section randomly selects 100 records (with replacement) from the scoring data and sends those records to the model for predictions. This is enough to reach the minimum record threshold set in the previous section, which allows OpenScale to begin calculating fairness.

In [89]:
# !rm german_credit_feed.json
# !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/german_credit_feed.json

Score 100 randomly chosen records

In [68]:
scoring_data = pd.read_csv('/project_data/data_asset/100_bias_drift_payload.csv',index_col=0)
scoring_data.head()

Unnamed: 0,CheckingStatus,LoanDuration,CreditHistory,LoanPurpose,LoanAmount,ExistingSavings,EmploymentDuration,InstallmentPercent,Sex,OthersOnLoan,CurrentResidenceDuration,OwnsProperty,Age,InstallmentPlans,Housing,ExistingCreditsCount,Job,Dependents,Telephone,ForeignWorker
0,no_checking,4,outstanding_credit,other,2000,unknown,4_to_7,5,male,none,4,unknown,20,stores,free,2,unskilled,2,yes,yes
1,no_checking,41,credits_paid_to_date,appliances,2901,100_to_500,4_to_7,4,female,none,3,savings_insurance,36,none,own,1,skilled,1,none,yes
2,no_checking,36,outstanding_credit,repairs,5673,unknown,greater_7,5,male,none,4,unknown,48,none,own,2,skilled,1,none,yes
3,no_checking,41,credits_paid_to_date,appliances,2901,100_to_500,4_to_7,4,female,none,3,savings_insurance,36,none,own,1,skilled,1,none,yes
4,0_to_200,4,outstanding_credit,vacation,2000,500_to_1000,4_to_7,4,male,none,3,car_other,20,none,own,2,skilled,1,none,yes


In [69]:
import random

# with open('german_credit_feed.json', 'r') as scoring_file:
#     scoring_data = json.load(scoring_file)

# fields = scoring_data['fields']
values =[]
for _ in range(100):
    values.append(random.choice(scoring_data.values.tolist()))
payload_scoring = {"fields": fields, "values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)
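Calling `random.choice` in a loop samples with replacement, so the same row can be sent more than once, and the selection changes on every run. The sketch below shows both sampling modes behind one hypothetical helper, `sample_rows`, with an optional seed for reproducible payloads. The rows are placeholder data.

```python
import random


def sample_rows(rows, k, seed=None, replace=True):
    """Pick k rows, with or without replacement, optionally seeded."""
    rng = random.Random(seed)
    if replace:
        # Equivalent to calling rng.choice(rows) k times.
        return rng.choices(rows, k=k)
    # Without replacement; requires k <= len(rows).
    return rng.sample(rows, k)


# Placeholder rows standing in for scoring_data.values.tolist().
rows = [['row', i] for i in range(10)]

with_repeats = sample_rows(rows, 100, seed=42)
distinct = sample_rows(rows, 5, seed=42, replace=False)
print(len(with_repeats), len(distinct))
```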

In [181]:
# biased_100_payload=pd.DataFrame(values, columns=fields)
# biased_100_payload.shape

(100, 20)

In [182]:
# import base64
# import pandas as pd
# from IPython.display import HTML

# def create_download_link( df, title = "Download CSV file", filename = "data.csv"):
#     csv = biased_100_payload.to_csv()
#     b64 = base64.b64encode(csv.encode())
#     payload = b64.decode()
#     html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
#     html = html.format(payload=payload,title=title,filename=filename)
#     return HTML(html)

# df = pd.DataFrame(data = [[1,2],[3,4]], columns=['Col 1', 'Col 2'])
# create_download_link(df)

## Run fairness monitor

Kick off a fairness monitor run on current data. The monitor runs hourly, but can be manually initiated using the Python client, the REST API, or the graphical user interface.

In [70]:
time.sleep(5)

run_details = subscription.fairness_monitoring.run(background_mode=False)




 Counting bias for deployment_uid=7f416433-d9e5-436a-8351-fa6c6146b01f 




RUNNING.......
FINISHED

---------------------------
 Successfully finished run 
---------------------------




In [71]:
time.sleep(5)

subscription.fairness_monitoring.show_table()

0,1,2,3,4,5,6,7,8,9,10
2020-04-09 19:28:52.081359+00:00,Sex,female,True,0.694,34.0,999,71426d73-aa90-473e-ab45-2ae8d1842796,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f,
2020-04-09 19:28:52.081359+00:00,Age,"[18, 25]",False,0.86,37.0,999,71426d73-aa90-473e-ab45-2ae8d1842796,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f,


## Run drift monitor

Kick off a drift monitor run on current data. The monitor runs every hour, but can be manually initiated using the Python client or the REST API.

In [72]:
drift_run_details = subscription.drift_monitoring.run(background_mode=False)




 Waiting for end of drift monitoring run  




RUNNING.
COMPLETED

---------------------------
 Successfully finished run 
---------------------------




In [74]:
subscription.drift_monitoring.get_table_content()

Unnamed: 0,ts,id,measurement_id,value,lower limit,upper limit,tags,binding_id,subscription_id,deployment_id
0,2020-04-09 19:32:29.471206+00:00,data_drift_magnitude,5c111d13-d6f1-4d7f-9d84-e7c3f2b64649,0.326733,,,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f
1,2020-04-09 19:32:29.471206+00:00,drift_magnitude,5c111d13-d6f1-4d7f-9d84-e7c3f2b64649,0.274752,,0.1,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f
2,2020-04-09 19:32:29.471206+00:00,predicted_accuracy,5c111d13-d6f1-4d7f-9d84-e7c3f2b64649,0.496248,,,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f
3,2020-04-09 19:29:17.474719+00:00,data_drift_magnitude,62db9814-f177-4059-bf03-18691eaef9fe,0.326733,,,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f
4,2020-04-09 19:29:17.474719+00:00,drift_magnitude,62db9814-f177-4059-bf03-18691eaef9fe,0.274752,,0.1,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f
5,2020-04-09 19:29:17.474719+00:00,predicted_accuracy,62db9814-f177-4059-bf03-18691eaef9fe,0.496248,,,,999,71426d73-aa90-473e-ab45-2ae8d1842796,7f416433-d9e5-436a-8351-fa6c6146b01f


## Configure Explainability

Finally, we provide OpenScale with the training data to enable and configure the explainability features.

In [75]:
from ibm_ai_openscale.supporting_classes import *
subscription.explainability.enable(training_data=pd_data)

In [76]:
explainability_details = subscription.explainability.get_details()

## Run explanation for sample record

In [77]:
transaction_id = subscription.payload_logging.get_table_content(limit=1)['scoring_id'].values[0]

print(transaction_id)

20687ba9-acf5-4e87-ac02-89639090f0d0-1


In [78]:
explain_run = subscription.explainability.run(transaction_id=transaction_id, background_mode=False)




 Looking for explanation for 20687ba9-acf5-4e87-ac02-89639090f0d0-1 




in_progress........
finished

---------------------------
 Successfully finished run 
---------------------------




In [79]:
if explain_run is None:
    # The explanation did not finish within 180 seconds. If it is still not finished, wait a minute or so and re-run this cell.
    time.sleep(10)
    explain_table = subscription.explainability.get_table_content(format='pandas')
    # Use positional access: the filtered rows keep their original index labels, so label 0 may not exist.
    explain_result = pd.DataFrame.from_dict(explain_table[explain_table['transaction_id']==transaction_id]['explanation'].iloc[0]['entity']['predictions'][0]['explanation_features'])
else:
    explain_result = pd.DataFrame.from_dict(explain_run['entity']['predictions'][0]['explanation_features'])

explain_result.plot.barh(x='feature_name', y='weight', color='g', alpha=0.8);
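The bar chart plots features in the order they are returned. To read off the most influential features directly, one option is to rank them by absolute weight, as in the sketch below. The helper `top_features` is hypothetical and the sample weights are invented; only the `feature_name` and `weight` keys are taken from the plotting call above.

```python
def top_features(explanation_features, k=3):
    """Return the k (feature_name, weight) pairs with the largest |weight|."""
    ranked = sorted(explanation_features,
                    key=lambda feature: abs(feature['weight']),
                    reverse=True)
    return [(feature['feature_name'], feature['weight'])
            for feature in ranked[:k]]


# Invented weights, for illustration only.
sample_features = [
    {'feature_name': 'LoanDuration', 'weight': 0.05},
    {'feature_name': 'CheckingStatus', 'weight': 0.31},
    {'feature_name': 'Age', 'weight': -0.12},
    {'feature_name': 'LoanAmount', 'weight': -0.22},
]

for name, weight in top_features(sample_features):
    print(name, weight)
```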

In [171]:
!curl -k -u admin:password https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com/v1/preauth/validateAuth

{"username":"admin","role":"Admin","permissions":["administrator","can_provision","manage_catalog"],"sub":"admin","iss":"KNOXSSO","aud":"DSX","uid":"1000330999","authenticator":"default","accessToken":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6ImFkbWluIiwicm9sZSI6IkFkbWluIiwicGVybWlzc2lvbnMiOlsiYWRtaW5pc3RyYXRvciIsImNhbl9wcm92aXNpb24iLCJtYW5hZ2VfY2F0YWxvZyJdLCJzdWIiOiJhZG1pbiIsImlzcyI6IktOT1hTU08iLCJhdWQiOiJEU1giLCJ1aWQiOiIxMDAwMzMwOTk5IiwiYXV0aGVudGljYXRvciI6ImRlZmF1bHQiLCJpYXQiOjE1ODQ3MzQ4MTIsImV4cCI6MTU4NDc3ODAxMn0.oGojLI2RSiV5PLW8YPQUZ_BvRivz2cKVIm0-WlPNN4EZNF0x7olDF6CNoXN97yQODDLWPJlqLI1Wt4UgFejZroGTIX3qeKrbRb76xEDXSo311TnqCTSAok-MmsEhpn9FkZtf0swNgvp_VGJe_RS_mecfE7LXqzxuPYsSE03KKdOJXolFEtezzWI5O-U2w2neTMNF0-9D-CLiShFpoocvDIUkTPWru1Qh22ypAbAIJEKP3xopbJgiyWi8qnfTAgUL9oYj-r2pSPqxXfsiWISGQIR83xTSWvUVDYNiJN-wSdMmtF8Jkl2IR8IdTfPCmDecCD86Kmg2C-hEw3TORUkYrw","_messageCode_":"success","message":"success"}

In [186]:
!curl -k --location --request GET 'https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com/v1/data_marts/00000000-0000-0000-0000-000000000000/deployment_metrics' \
--header 'Authorization: bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6ImFkbWluIiwicm9sZSI6IkFkbWluIiwicGVybWlzc2lvbnMiOlsiYWRtaW5pc3RyYXRvciIsImNhbl9wcm92aXNpb24iLCJtYW5hZ2VfY2F0YWxvZyJdLCJzdWIiOiJhZG1pbiIsImlzcyI6IktOT1hTU08iLCJhdWQiOiJEU1giLCJ1aWQiOiIxMDAwMzMwOTk5IiwiYXV0aGVudGljYXRvciI6ImRlZmF1bHQiLCJpYXQiOjE1ODQ3MzQ4MTIsImV4cCI6MTU4NDc3ODAxMn0.oGojLI2RSiV5PLW8YPQUZ_BvRivz2cKVIm0-WlPNN4EZNF0x7olDF6CNoXN97yQODDLWPJlqLI1Wt4UgFejZroGTIX3qeKrbRb76xEDXSo311TnqCTSAok-MmsEhpn9FkZtf0swNgvp_VGJe_RS_mecfE7LXqzxuPYsSE03KKdOJXolFEtezzWI5O-U2w2neTMNF0-9D-CLiShFpoocvDIUkTPWru1Qh22ypAbAIJEKP3xopbJgiyWi8qnfTAgUL9oYj-r2pSPqxXfsiWISGQIR83xTSWvUVDYNiJN-wSdMmtF8Jkl2IR8IdTfPCmDecCD86Kmg2C-hEw3TORUkYrw' \
--header 'Accept: application/json'

{
  "deployment_metrics": [{
    "asset": {
      "asset_id": "1ac063ba-0672-4d32-a70f-40ffaa34be8e",
      "asset_type": "model",
      "created_at": "2020-03-20T15:13:32.002Z",
      "name": "Spark German Risk Model - Final",
      "url": "https://ibm-nginx-svc.namespace1.svc.cluster.local/v4/models/1ac063ba-0672-4d32-a70f-40ffaa34be8e?space_id=1b0adc24-f91d-402b-ba9f-95ec10ad3498"
    },
    "deployment": {
      "created_at": "2020-03-20T15:14:17.647Z",
      "deployment_id": "8c922ed0-be19-45eb-b102-0e49c1fbfe2b",
      "deployment_rn": "",
      "deployment_type": "online",
      "name": "Spark German Risk Deployment - Final",
      "scoring_endpoint": {
        "request_headers": {
          "Content-Type": "application/json"
        },
        "url": "https://ibm-nginx-svc.namespace1.svc.cluster.local/v4/deployments/8c922ed0-be19-45eb-b102-0e49c1fbfe2b/predictions"
      },
      "url": "https://ibm-nginx-svc.namespace1.svc.cluster.local/v4/deployments/8c92

In [180]:
subscription.fairness_monitoring.get_metrics(deployment_uid)

{'end': '2020-03-20T20:12:41.576Z',
 'metrics': [{'asset_revision': '0d786253-982f-4d6b-9547-4e1ffa0b206c',
   'timestamp': '2020-03-20T20:08:48.066294Z',
   'value': {'copy_reason': 'No new data added since last fairness computation.',
    'evaluated_at': '2020-03-20T18:12:52.137682Z',
    'is_copied': True,
    'manual_labelling_store': 'Manual_Labeling_0d786253-982f-4d6b-9547-4e1ffa0b206c',
    'metrics': [{'fairness_threshold': 0.8,
      'feature': 'Sex',
      'majority': {'total_fav_percent': 50.0,
       'total_rows_percent': 50.0,
       'values': [{'distribution': {'male': [{'count': 25,
            'is_favourable': True,
            'label': 'No Risk'},
           {'count': 37, 'is_favourable': False, 'label': 'Risk'}]},
         'fav_class_percent': 50.0,
         'payload_perturb_distribution': {'male': [{'count': 50,
            'is_favourable': True,
            'label': 'No Risk'},
           {'count': 50, 'is_favourable': False, 'label': 'Risk'}]},
         'value': 'm

In [179]:
subscription.drift_monitoring.get_metrics(deployment_uid)

[{'asset_id': '1ac063ba-0672-4d32-a70f-40ffaa34be8e',
  'binding_id': '999',
  'process': 'Drift run for subscription_0d786253-982f-4d6b-9547-4e1ffa0b206c',
  'tags': [],
  'ts': '2020-03-20T18:13:48.196898Z',
  'measurement_id': '695c6a70-d6a3-437e-8548-bf9ef2e7cfa6',
  'monitor_definition_id': 'drift',
  'subscription_id': '0d786253-982f-4d6b-9547-4e1ffa0b206c',
  'metrics': [{'id': 'data_drift_magnitude', 'value': 0.3069306930693069},
   {'upper_limit': 0.05,
    'id': 'drift_magnitude',
    'value': 0.06222772277227728},
   {'id': 'predicted_accuracy', 'value': 0.7637722772277227}],
  'deployment_id': '8c922ed0-be19-45eb-b102-0e49c1fbfe2b'},
 {'asset_id': '1ac063ba-0672-4d32-a70f-40ffaa34be8e',
  'binding_id': '999',
  'process': 'Drift run for subscription_0d786253-982f-4d6b-9547-4e1ffa0b206c',
  'tags': [],
  'ts': '2020-03-20T18:13:40.414398Z',
  'measurement_id': 'ca20259b-c093-4f9d-a7cd-dcfffd065de6',
  'monitor_definition_id': 'drift',
  'subscription_id': '0d786253-982f-4d6b

In [181]:
subscription.quality_monitoring.get_metrics(deployment_uid)

[{'asset_id': '1ac063ba-0672-4d32-a70f-40ffaa34be8e',
  'binding_id': '999',
  'tags': [{'id': 'model_type', 'value': 'original'}],
  'ts': '2020-03-20T20:07:50.444Z',
  'measurement_id': '544b1062-f546-4c89-b749-a10917025b80',
  'monitor_definition_id': 'quality',
  'subscription_id': '0d786253-982f-4d6b-9547-4e1ffa0b206c',
  'metrics': [{'id': 'true_positive_rate', 'value': 0.36363636363636365},
   {'lower_limit': 0.7, 'id': 'area_under_roc', 'value': 0.6202797202797203},
   {'id': 'precision', 'value': 0.6},
   {'id': 'f1_measure', 'value': 0.4528301886792453},
   {'id': 'accuracy', 'value': 0.7040816326530612},
   {'id': 'log_loss', 'value': 0.5627085275784918},
   {'id': 'false_positive_rate', 'value': 0.12307692307692308},
   {'id': 'area_under_pr', 'value': 0.5162337662337662},
   {'id': 'recall', 'value': 0.36363636363636365}],
  'deployment_id': '8c922ed0-be19-45eb-b102-0e49c1fbfe2b'},
 {'asset_id': '1ac063ba-0672-4d32-a70f-40ffaa34be8e',
  'binding_id': '999',
  'tags': [{'id
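The `get_metrics` outputs above carry their alert limits inline, as `lower_limit` or `upper_limit` keys next to each metric value. A small sketch of scanning such a list for breached limits follows. The helper `limit_violations` is hypothetical; the sample entries are shortened copies of the drift and quality measurements printed above.

```python
def limit_violations(measurements):
    """List (monitor, metric id, value, limit kind, limit) for breached limits."""
    violations = []
    for measurement in measurements:
        monitor = measurement.get('monitor_definition_id')
        for metric in measurement['metrics']:
            value = metric['value']
            lower = metric.get('lower_limit')
            upper = metric.get('upper_limit')
            if lower is not None and value < lower:
                violations.append((monitor, metric['id'], value, 'lower_limit', lower))
            if upper is not None and value > upper:
                violations.append((monitor, metric['id'], value, 'upper_limit', upper))
    return violations


# Shortened copies of the measurements shown above.
sample_measurements = [
    {'monitor_definition_id': 'drift',
     'metrics': [{'id': 'data_drift_magnitude', 'value': 0.3069306930693069},
                 {'upper_limit': 0.05, 'id': 'drift_magnitude',
                  'value': 0.06222772277227728},
                 {'id': 'predicted_accuracy', 'value': 0.7637722772277227}]},
    {'monitor_definition_id': 'quality',
     'metrics': [{'lower_limit': 0.7, 'id': 'area_under_roc',
                  'value': 0.6202797202797203},
                 {'id': 'accuracy', 'value': 0.7040816326530612}]},
]

for violation in limit_violations(sample_measurements):
    print(violation)
```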

# Historical data <a name="historical"></a>

In [212]:
historyDays = 7

## Insert historical payloads

The next section of the notebook downloads and writes historical data to the payload and measurement tables to simulate a production model that has been monitored and receiving regular traffic for the last seven days. This historical data can be viewed in the Watson OpenScale user interface. The code uses the Python and REST APIs to write this data.

In [213]:
!rm history_payloads_with_transaction_*.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_0.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_1.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_2.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_3.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_4.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_5.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_6.json

--2020-03-20 03:12:10--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_0.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5209237 (5.0M) [text/plain]
Saving to: ‘history_payloads_with_transaction_id_0.json’


2020-03-20 03:12:12 (7.01 MB/s) - ‘history_payloads_with_transaction_id_0.json’ saved [5209237/5209237]

--2020-03-20 03:12:12--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/fastpath/history_payloads_with_transaction_id_1.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... 

In [174]:
with open('history_payloads_with_transaction_id_1.json') as f:
    data = json.load(f)

In [175]:
import itertools

# Column names come from the first record; each record contributes one flattened row.
col = data[0]['request']['fields']
val = [list(itertools.chain(*record['request']['values'])) for record in data]

In [176]:
df=pd.DataFrame(val,columns=col)

In [177]:
df.shape

(2880, 20)

In [178]:
df['Sex'].value_counts()

male      1767
female    1113
Name: Sex, dtype: int64
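The cells above turn the `request` part of each payload record into table rows and then count the values of one column. The sketch below shows the same idea in plain Python on two made-up records that follow the `fields`/`values` layout; the values are invented for illustration and are not taken from the downloaded file.

```python
from collections import Counter

# Two made-up records in the same shape as the payload file:
# each has a 'request' with column names and a list of rows.
records = [
    {"request": {"fields": ["Sex", "Age"], "values": [["male", 31]]}},
    {"request": {"fields": ["Sex", "Age"], "values": [["female", 24], ["male", 45]]}},
]

columns = records[0]["request"]["fields"]
# Collect every row from every record, so multi-row requests are handled too.
rows = [row for record in records for row in record["request"]["values"]]

sex_index = columns.index("Sex")
sex_counts = Counter(row[sex_index] for row in rows)
print(len(rows), dict(sex_counts))
```

Iterating over rows, rather than chaining them together, keeps one list per row even when a single request carries several rows.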

In [214]:
# from ibm_ai_openscale.utils.inject_demo_data import DemoData
# import os

# historicalData = DemoData(WOS_CREDENTIALS, ai_client)
# historical_data_path=os.getcwd()

# historicalData.load_historical_scoring_payload(subscription, deployment_uid,file_path=historical_data_path, day_template="history_payloads_with_transaction_id_{}.json" )

Loading historical scoring payload...
Day 0 injection.
Daily loading finished.
Day 1 injection.
Daily loading finished.
Day 2 injection.
Daily loading finished.
Day 3 injection.
Daily loading finished.
Day 4 injection.
Daily loading finished.
Day 5 injection.
Daily loading finished.
Day 6 injection.
Daily loading finished.


In [215]:
data_mart_id = subscription.get_details()['metadata']['url'].split('/service_bindings')[0].split('marts/')[1]
print(data_mart_id)

00000000-0000-0000-0000-000000000000


In [216]:
performance_metrics_url = WOS_CREDENTIALS['url'] + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/metrics'

In [217]:
performance_metrics_url

'https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com/v1/data_marts/00000000-0000-0000-0000-000000000000/metrics'
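Both the data mart ID and the metrics URL are derived from the subscription's URL path by plain string splitting. The sketch below isolates that logic; the example path and host are hypothetical values chosen to match the shape implied by the output above, and `data_mart_parts` is an illustrative helper rather than a client API.

```python
def data_mart_parts(subscription_url, base_url):
    """Split a subscription URL path into the data mart ID and its metrics URL."""
    # Everything before '/service_bindings' is the data mart prefix.
    data_mart_path = subscription_url.split("/service_bindings")[0]
    data_mart_id = data_mart_path.split("marts/")[1]
    return data_mart_id, base_url + data_mart_path + "/metrics"

# Hypothetical values for illustration only.
example_path = (
    "/v1/data_marts/00000000-0000-0000-0000-000000000000"
    "/service_bindings/999/subscriptions/example-subscription"
)
mart_id, metrics_url = data_mart_parts(example_path, "https://openscale.example.com")
print(mart_id)
print(metrics_url)
```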

## Insert historical fairness metrics

In [218]:
!rm -f history_fairness.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_fairness.json

--2020-03-20 03:13:52--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_fairness.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1424078 (1.4M) [text/plain]
Saving to: ‘history_fairness.json’


2020-03-20 03:13:53 (2.16 MB/s) - ‘history_fairness.json’ saved [1424078/1424078]



In [179]:
with open('history_fairness.json') as f:
    test = json.load(f)
test

[{'metrics': [{'feature': 'Sex',
    'majority': {'values': [{'value': 'male',
       'distribution': {'male': [{'count': 48,
          'label': 'No Risk',
          'is_favourable': True},
         {'count': 11, 'label': 'Risk', 'is_favourable': False}]},
       'fav_class_percent': 76.0,
       'payload_perturb_distribution': {'male': [{'count': 76,
          'label': 'No Risk',
          'is_favourable': True},
         {'count': 24, 'label': 'Risk', 'is_favourable': False}]}}],
     'total_fav_percent': 76.0,
     'total_rows_percent': 50.0},
    'minority': {'values': [{'value': 'female',
       'is_biased': True,
       'distribution': {'female': [{'count': 29,
          'label': 'No Risk',
          'is_favourable': True},
         {'count': 12, 'label': 'Risk', 'is_favourable': False}]},
       'fairness_value': 0.947,
       'fav_class_percent': 72.0,
       'payload_perturb_distribution': {'female': [{'count': 72,
          'label': 'No Risk',
          'is_favourable': True}
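In the record shown above, the minority `fairness_value` of 0.947 matches the ratio of the two `fav_class_percent` values (72.0 for the minority, 76.0 for the majority). The sketch below reproduces that arithmetic. Treating this ratio as the definition of the fairness value is an assumption drawn from this one record, and `favourable_ratio` is a hypothetical helper.

```python
def favourable_ratio(minority_fav_percent, majority_fav_percent):
    """Ratio of favourable-outcome rates, minority over majority."""
    if majority_fav_percent == 0:
        raise ValueError("majority favourable percent must be non-zero")
    return minority_fav_percent / majority_fav_percent

# Percentages taken from the first record shown above.
ratio = favourable_ratio(72.0, 76.0)
print(round(ratio, 3))
```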

## Insert historical debias metrics

In [224]:
!rm -f history_debias.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_debias.json

--2020-03-20 03:17:59--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_debias.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 565971 (553K) [text/plain]
Saving to: ‘history_debias.json’


2020-03-20 03:18:00 (1.42 MB/s) - ‘history_debias.json’ saved [565971/565971]



In [180]:
with open('history_debias.json') as f:
    test = json.load(f)
test[0]

{'metrics': [{'feature': 'Sex',
   'majority': {'values': [{'value': 'male',
      'distribution': {'male': [{'count': 5,
         'label': 'Risk',
         'is_favourable': False},
        {'count': 56, 'label': 'No Risk', 'is_favourable': False}]},
      'fav_class_percent': 95.0}],
    'total_fav_percent': 95.0,
    'total_rows_percent': 50.0},
   'minority': {'values': [{'value': 'female',
      'is_biased': False,
      'distribution': {'female': [{'count': 39,
         'label': 'No Risk',
         'is_favourable': False}]},
      'fairness_value': 1.0,
      'fav_class_percent': 95.0}],
    'total_fav_percent': 95.0,
    'total_rows_percent': 50.0}},
  {'feature': 'Age',
   'majority': {'values': [{'value': [26, 75],
      'distribution': {'26': [{'count': 2,
         'label': 'No Risk',
         'is_favourable': False}],
       '28': [{'count': 5, 'label': 'No Risk', 'is_favourable': False}],
       '29': [{'count': 2, 'label': 'No Risk', 'is_favourable': False}],
       '30': [

## Insert historical quality metrics

In [229]:
iam_token = create_token()
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

measurements = [0.76, 0.78, 0.68, 0.72, 0.73, 0.77, 0.80]
for day in range(historyDays):
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.utcnow() + timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        qualityMetric = {
            'metric_type': 'quality',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': {
                'quality': measurements[day],
                'threshold': 0.7,
                'metrics': [
                    {
                        'name': 'auroc',
                        'value': measurements[day],
                        'threshold': 0.7
                    }
                ]
            }
        }

        response = requests.post(performance_metrics_url, json=[qualityMetric], headers=iam_headers, verify=False)
print('Finished')

Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Finished
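The loop above posts one quality measurement per hour, stepping back from the current time by `24*day + hour + 1` hours. The sketch below generates only the timestamps, using a fixed reference time so the result is reproducible; `hourly_timestamps` is a hypothetical helper, and no request is sent.

```python
from datetime import datetime, timedelta

def hourly_timestamps(history_days, now):
    """Return one ISO-style timestamp per hour, newest first."""
    stamps = []
    for day in range(history_days):
        for hour in range(24):
            # Same offset as the loop above: 1 hour back for day 0, hour 0.
            offset = timedelta(hours=24 * day + hour + 1)
            stamps.append((now - offset).strftime("%Y-%m-%dT%H:%M:%SZ"))
    return stamps

# A fixed reference time keeps the example reproducible.
reference = datetime(2020, 3, 20, 3, 0, 0)
stamps = hourly_timestamps(7, reference)
print(len(stamps), stamps[0], stamps[-1])
```

Seven days of hourly measurements give 168 timestamps, the oldest exactly seven days before the reference time.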


## Insert historical confusion matrices

In [230]:
!rm -f history_quality_metrics.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_quality_metrics.json

--2020-03-20 03:21:36--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_quality_metrics.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 80099 (78K) [text/plain]
Saving to: ‘history_quality_metrics.json’


2020-03-20 03:21:37 (505 KB/s) - ‘history_quality_metrics.json’ saved [80099/80099]



In [181]:
with open('history_quality_metrics.json') as f:
    test = json.load(f)
test[0]

{'metrics': {'true_positive_rate': 0.8663522012578616,
  'area_under_roc': 0.990711282022907,
  'precision': 0.9565972222222222,
  'f1_measure': 0.9092409240924092,
  'accuracy': 0.945,
  'log_loss': 0.22099052348498696,
  'false_positive_rate': 0.13364779874213836,
  'area_under_pr': 0.9789962753443889,
  'recall': 0.8663522012578616},
 'sources': {'id': 'confusion_matrix_1',
  'type': 'confusion_matrix',
  'data': {'labels': ['No Risk', 'Risk'], 'values': [[1339, 25], [85, 551]]}}}
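Several of the metrics in this record can be recomputed from the confusion matrix stored under `sources`. The sketch below assumes that rows are actual labels, columns are predicted labels, and `Risk` is the positive class; that orientation is an assumption, chosen because it reproduces the `accuracy`, `precision`, `recall` and `f1_measure` values shown above.

```python
def binary_metrics(matrix):
    """Derive metrics from a 2x2 confusion matrix.

    Assumes rows are actual labels and columns are predicted labels,
    ordered ['No Risk', 'Risk'], with 'Risk' as the positive class.
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1_measure": 2 * precision * recall / (precision + recall),
    }

metrics = binary_metrics([[1339, 25], [85, 551]])
for name, value in metrics.items():
    print(name, round(value, 4))
```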

## Insert historical manual labeling

In [234]:
manual_labeling_url = WOS_CREDENTIALS['url'] + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/manual_labelings'

In [235]:
!rm -f history_manual_labeling.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_manual_labeling.json

--2020-03-20 03:22:30--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_manual_labeling.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 947956 (926K) [text/plain]
Saving to: ‘history_manual_labeling.json’


2020-03-20 03:22:36 (1.86 MB/s) - ‘history_manual_labeling.json’ saved [947956/947956]



In [187]:
with open('history_manual_labeling.json') as f:
    test = json.load(f)
test[0]

{'CheckingStatus': 'no_checking',
 'LoanDuration': 27.0,
 'CreditHistory': 'prior_payments_delayed',
 'LoanPurpose': 'retraining',
 'LoanAmount': 3610.0,
 'ExistingSavings': 'less_100',
 'EmploymentDuration': '4_to_7',
 'InstallmentPercent': 2.0,
 'Sex': 'male',
 'OthersOnLoan': 'none',
 'CurrentResidenceDuration': 4.0,
 'OwnsProperty': 'car_other',
 'Age': 19.0,
 'InstallmentPlans': 'none',
 'Housing': 'own',
 'ExistingCreditsCount': 2.0,
 'Job': 'skilled',
 'Dependents': 1.0,
 'Telephone': 'none',
 'ForeignWorker': 'no',
 'predictedLabel': 'No Risk',
 'scoring_id': '315c7070-ed76-4571-ad38-cb7883976b42-7',
 'fastpath_history_day': 0,
 'fastpath_history_hour': 0,
 'perturbed': False}

## Insert historical drift measurements

In [237]:
!rm -f history_drift_measurement_*.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_0.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_1.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_2.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_3.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_4.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_5.json
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_6.json

--2020-03-20 03:22:48--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_0.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 850981 (831K) [text/plain]
Saving to: ‘history_drift_measurement_0.json’


2020-03-20 03:22:49 (1.74 MB/s) - ‘history_drift_measurement_0.json’ saved [850981/850981]

--2020-03-20 03:22:49--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wos/history_drift_measurement_1.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.36.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length:

In [182]:
with open('history_drift_measurement_6.json') as f:
    test = json.load(f)
test[0]

[{'monitor_definition_id': 'drift',
  'timestamp': '<timestamp>',
  'binding_id': '<binding_id>',
  'asset_id': '<asset_id>',
  'subscription_id': '<subscription_id>',
  'deployment_id': '<deployment_id>',
  'process': 'Drift run for subscription_<subscription_id>',
  'metrics': [{'drift_magnitude': 0.03700000000000003,
    'predicted_accuracy': 0.763,
    'data_drift_magnitude': 0.06111111111111111}],
  'sources': [{'id': 'drift_explain_summary_1',
    'type': 'drift_explain_summary',
    'data': {'drifted_transactions': {'clusters': [{'id': '1',
        'count': 17,
        'top_features': [{'feature': 'LoanDuration',
          'importance': 0.4022303329509668,
          'norm_importance': 24.449018156239614,
          'influence': 'small'},
         {'feature': 'LoanAmount',
          'importance': 0.4002811370297075,
          'norm_importance': 24.330538960204574,
          'influence': 'small'},
         {'feature': 'Age',
          'importance': 0.3098384595632163,
          'no

## Additional data to help debugging

In [123]:
print('Datamart:', data_mart_id)
print('Model:', model_uid)
print('Deployment:', deployment_uid)
print('Binding:', binding_uid)
# print('Scoring URL:', credit_risk_scoring_endpoint)

Datamart: 00000000-0000-0000-0000-000000000000
Model: deb87fe2-754b-44b1-8ef6-b3e374094004
Deployment: f02576ec-a345-44d4-b818-5fe802951b05
Binding: 999


## Identify transactions for Explainability

Transaction IDs identified by the cells below can be copied and pasted into the Explainability tab of the OpenScale dashboard.

In [124]:
payload_data = subscription.payload_logging.get_table_content(limit=60)
payload_data.filter(items=['scoring_id', 'predictedLabel', 'probability'])

Unnamed: 0,scoring_id,predictedLabel,probability
0,f27ed9f8-d966-40d5-adbb-96b88f1e5980-1,No Risk,"[0.7342992006506507, 0.26570079934934937]"
1,f27ed9f8-d966-40d5-adbb-96b88f1e5980-2,No Risk,"[0.9325817190008531, 0.067418280999147]"
2,f27ed9f8-d966-40d5-adbb-96b88f1e5980-3,No Risk,"[0.6627934624673603, 0.33720653753263974]"
3,f27ed9f8-d966-40d5-adbb-96b88f1e5980-4,No Risk,"[0.7186449206874034, 0.2813550793125967]"
4,f27ed9f8-d966-40d5-adbb-96b88f1e5980-5,No Risk,"[0.8447352571721282, 0.15526474282787184]"
5,f27ed9f8-d966-40d5-adbb-96b88f1e5980-6,No Risk,"[0.949735394122734, 0.050264605877265986]"
6,f27ed9f8-d966-40d5-adbb-96b88f1e5980-7,Risk,"[0.07119184454084779, 0.9288081554591523]"
7,f27ed9f8-d966-40d5-adbb-96b88f1e5980-8,No Risk,"[0.8106540974128227, 0.18934590258717732]"
8,f27ed9f8-d966-40d5-adbb-96b88f1e5980-9,No Risk,"[0.7271672744053184, 0.2728327255946816]"
9,f27ed9f8-d966-40d5-adbb-96b88f1e5980-11,No Risk,"[0.5090501922747169, 0.49094980772528307]"
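When choosing which transaction to paste into the Explainability tab, one option is to start with predictions where the model was least certain. The sketch below ranks a few rows copied from the table above by their highest class probability; this selection rule and the `least_confident` helper are illustrative suggestions, not part of the OpenScale client.

```python
# A few (scoring_id, probability) pairs copied from the table above.
rows = [
    ("f27ed9f8-d966-40d5-adbb-96b88f1e5980-2", [0.9325817190008531, 0.067418280999147]),
    ("f27ed9f8-d966-40d5-adbb-96b88f1e5980-7", [0.07119184454084779, 0.9288081554591523]),
    ("f27ed9f8-d966-40d5-adbb-96b88f1e5980-11", [0.5090501922747169, 0.49094980772528307]),
]

def least_confident(rows, count=1):
    """Return the scoring IDs whose top class probability is lowest."""
    ranked = sorted(rows, key=lambda row: max(row[1]))
    return [scoring_id for scoring_id, _ in ranked[:count]]

print(least_confident(rows))
```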



# Additional fastpath files


In [190]:
# Edit this cell for your own purpose for the file generation below

model_info = {        
    "problem_type": "BINARY_CLASSIFICATION",
    "input_data_type": "STRUCTURED",
    "label_column": "Risk",
    "categorical_columns": ['CheckingStatus', 'CreditHistory', 'LoanPurpose', 'ExistingSavings',
                        'EmploymentDuration', 'Sex', 'OthersOnLoan', 'OwnsProperty', 'InstallmentPlans',
                        'Housing', 'Job', 'Telephone', 'ForeignWorker']
 }

fairness_config = {
    "features": [
        {
            "feature": "Age",
            "majority": [
                [ 25, 74 ]
            ],
            "minority": [
                [ 18, 24 ]
            ],
            "threshold": 0.8
        },
        
       {
            "feature": "Sex",
            "majority": ['male'],
            "minority": ['female'],
            "threshold": 0.8
        }, 
    ],
    "favourable_classes": [
        'No Risk'
    ],
    "unfavourable_classes": [
        'Risk'
    ],
    "min_records": 100
}

quality_config = {
    "threshold": 0.7,
    "min_records": 40
}

# training_data_reference = {
#     "credentials" : {
#       "url":"",
#       "apikey": "",
#       "endpoints": "https://cos-service.bluemix.net/endpoints",
#       "iam_apikey_description": "",
#       "iam_apikey_name": "",
#       "iam_role_crn": "",
#       "iam_serviceid_crn": "",
#       "resource_instance_id": ""
#     },
#     "path" : "bucket/file",
#     "firstlineheader": "True"
# }

training_data_reference= {
        "credentials" : {
          "url":"https://s3.us.cloud-object-storage.appdomain.cloud",
          "apikey": "917Z-0MQzVpgdqkHClXbfDchrXZ_bl7kbCxZSkynzsLP",
          "endpoints": "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints",
          "iam_apikey_description": "Auto-generated for key 802ff53e-bbed-4bfb-80fa-e01f3d6929ee",
          "iam_apikey_name": "resource_final",
          "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
          "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/e0b56432b1f1bd804706dc29b8a89ca1::serviceid:ServiceId-74064b71-cec6-45ac-853b-f4b728d408b3",
          "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/e0b56432b1f1bd804706dc29b8a89ca1:57b5eb6e-7b5d-4b90-a8e8-3736129c9010::"
        },
        "path" : "resources-donotdelete-pr-lbyypdyr2le8tz/training_data_bias_6.csv",
        "firstlineheader": "True"
    }

### for drift (if applicable)
model_type = problem_type = "binary"
# model_type = problem_type =  "multiclass"
# model_type = problem_type = "regression"
enable_drift = True
drift_config = {
    "threshold": 0.05,
    "min_records": 100
}

In [206]:
model_details=published_model_details

In [225]:
# Create a file download link
import base64
import json
from IPython.display import HTML

def create_download_link(title="Download training data distribution JSON file", filename=None, config_json=None):
    """Return an HTML link that downloads config_json as a JSON file."""
    output_json = json.dumps(config_json, indent=2)
    b64 = base64.b64encode(output_json.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/json;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)
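The download link works by embedding the JSON document in the link itself as a base64 data URI. The sketch below shows that encoding and its inverse without any notebook dependencies; `to_data_uri` and `from_data_uri` are hypothetical helpers introduced only to illustrate the round trip.

```python
import base64
import json

def to_data_uri(config):
    """Encode a JSON-serialisable object as a base64 data URI."""
    encoded = base64.b64encode(json.dumps(config, indent=2).encode()).decode()
    return "data:text/json;base64," + encoded

def from_data_uri(uri):
    """Decode a data URI produced by to_data_uri back into an object."""
    encoded = uri.split("base64,", 1)[1]
    return json.loads(base64.b64decode(encoded).decode())

uri = to_data_uri({"threshold": 0.7, "min_records": 40})
print(from_data_uri(uri))
```

Because the whole file travels inside the link, this approach suits small configuration files better than large archives.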

In [232]:
import json
import requests
from ibm_ai_openscale import APIClient4ICP
from ibm_ai_openscale.utils import handle_response
from requests.auth import HTTPBasicAuth
    
# feedback data
# def generate_feedback_data(training_data):
#     _,feedback_data = train_test_split(training_data, test_size=0.05, shuffle=True, random_state=42)
#     feedback_data.to_csv('feedback_data.csv', index=False)

# configuration.json
def generate_configuration_file(training_data, model_info, fairness_config, quality_config, training_data_reference):
    config = {
        "asset_metadata": model_info,
        "training_data_reference": training_data_reference,
        "training_data_type": {},
        "fairness_configuration": fairness_config,
        "quality_configuration": quality_config
    }
    config["asset_metadata"]["prediction_column"] = "predictedLabel"
    config["asset_metadata"]["probability_column"] = "probability"
    feature_cols = list(training_data.columns.values)
    feature_cols.remove(model_info['label_column'])
    config["asset_metadata"]["feature_columns"] = feature_cols
    
    for col in training_data.columns.values:
        if 'int' in str(training_data[col].dtypes):
            config['training_data_type'][col] = 'int'
        elif 'float' in str(training_data[col].dtypes) or str(training_data[col].dtypes) in ['double', 'long']:
            config['training_data_type'][col] = 'float'
        else:
            config['training_data_type'][col] = 'string'
            
    if enable_drift:
        config["drift_configuration"] = drift_config

    # Serialise the configuration and expose it as a base64 download link.
    output_json = json.dumps(config, indent=2)
    b64 = base64.b64encode(output_json.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/json;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title='Download Data',filename='configuration.json')
    return HTML(html)
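The `training_data_type` section of the configuration is built by mapping each column's dtype name to one of three type labels. The sketch below isolates that mapping so it can be checked without a DataFrame; `column_type` is a hypothetical helper that mirrors the branching in `generate_configuration_file`.

```python
def column_type(dtype_name):
    """Map a dtype name to 'int', 'float', or 'string'."""
    if "int" in dtype_name:
        return "int"
    if "float" in dtype_name or dtype_name in ("double", "long"):
        return "float"
    return "string"

# dtype names as pandas would report them via str(series.dtype).
for name in ["int64", "float64", "object"]:
    print(name, "->", column_type(name))
```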


In [233]:
data = pd.read_csv('/project_data/data_asset/training_data_bias_6.csv')
generate_configuration_file(data, model_info, fairness_config, quality_config, training_data_reference)

In [234]:
# model_meta.json
def generate_model_meta_file(model_info, model_details):
    config = {}
    config['type'] = model_details['entity']['type']
    config['runtime_uid'] = model_details['entity']['runtime']['href'].split('/')[-1]
    config['training_data_references'] = model_details['entity']['training_data_references']
    # Serialise the metadata and expose it as a base64 download link.
    output_json = json.dumps(config, indent=2)
    b64 = base64.b64encode(output_json.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/json;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title='Download Data',filename='model_meta.json')
    return HTML(html)

In [235]:
generate_model_meta_file(model_info, model_details)

In [237]:
def create_token(WML_CREDENTIALS):    
    WOS_CREDENTIALS = {
        "url": WML_CREDENTIALS['url'],
        "username": WML_CREDENTIALS['username'],
        "password": WML_CREDENTIALS['password'],
    }
    ai_client = APIClient4ICP(WOS_CREDENTIALS)
    header = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json"
    }
    response = ai_client.requests_session.get(
            WOS_CREDENTIALS['url'] + '/v1/preauth/validateAuth',
            headers=header,
            auth=HTTPBasicAuth(
                WOS_CREDENTIALS['username'],
                WOS_CREDENTIALS['password']
            ),
            verify=False)
    response = handle_response(200, 'access token', response, True)
    token = response['accessToken']
    return token
        
        
# model_content.gzip
def download_model_content(WML_CREDENTIALS, model_details):
    model_uid = model_details['metadata']['href'].split('/')[3].split('?')[0]
    space_uid = model_details['metadata']['href'].split('=')[1]
    host = WML_CREDENTIALS['url']
    
    URL = "{}/v4/models/{}/content?space_id={}".format(host, model_uid, space_uid)
    print(URL)
    token = create_token(WML_CREDENTIALS)
    headers = {
        "Accept": "application/gzip",
        "Authorization": "Bearer "+token,
        "ML-Instance-ID": "999",
        "format": "native"
    }
    response = ai_client.requests_session.get(
        url=URL,
        headers=headers,
    )
    try:
        # An error response is UTF-8 text; real model content is binary gzip data.
        response.content.decode("utf-8")
        print(response.content)
        return
    except UnicodeDecodeError:
        with open('model_content.gzip','wb') as file:
            file.write(response.content)
            
download_model_content(WML_CREDENTIALS, model_details)

https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com/v4/models/1ac063ba-0672-4d32-a70f-40ffaa34be8e/content?space_id=1b0adc24-f91d-402b-ba9f-95ec10ad3498
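`download_model_content` extracts the model ID and space ID from the model's `href` by splitting on `/`, `?` and `=`. As a sketch of a more defensive alternative, the standard library's URL parser can do the same job. The `href` below is a hypothetical value in the shape implied by the URL printed above, and `parse_model_href` is an illustrative helper.

```python
from urllib.parse import urlparse, parse_qs

def parse_model_href(href):
    """Extract the model ID and space ID from a model href."""
    parsed = urlparse(href)
    model_id = parsed.path.rstrip("/").split("/")[-1]
    space_id = parse_qs(parsed.query)["space_id"][0]
    return model_id, space_id

# Hypothetical href in the shape implied by the URL printed above.
href = "/v4/models/1ac063ba-0672-4d32-a70f-40ffaa34be8e?space_id=1b0adc24-f91d-402b-ba9f-95ec10ad3498"
print(parse_model_href(href))
```

Parsing the query string by parameter name avoids depending on `space_id` being the only query parameter.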


In [238]:
!ls

additional_feedback_data.json	    history_manual_labeling.json
configuration.json		    history_payloads_with_transaction_id_0.json
drift_model.gzip		    history_payloads_with_transaction_id_1.json
history_business_payloads_week.csv  history_payloads_with_transaction_id_2.json
history_debias.json		    history_payloads_with_transaction_id_3.json
history_drift_measurement_0.json    history_payloads_with_transaction_id_4.json
history_drift_measurement_1.json    history_payloads_with_transaction_id_5.json
history_drift_measurement_2.json    history_payloads_with_transaction_id_6.json
history_drift_measurement_3.json    history_quality_metrics.json
history_drift_measurement_4.json    library_content.tar.gz
history_drift_measurement_5.json    model_content.gzip
history_drift_measurement_6.json    model_meta.json
history_fairness.json		    spark-warehouse


In [209]:
def score(training_data_frame):
    #To be filled by the user
    WML_CREDENTIALS = {
        "instance_id": "openshift",
        "url" : "https://namespace1-cpd-namespace1.apps.rsmar16.os.fyre.ibm.com",
        "username":"admin",
        "password": "password",
        "version": "2.5.0"
    }
    deployment_id = deployment_uid
    space_id = '1b0adc24-f91d-402b-ba9f-95ec10ad3498'
      
    # The label column and the prediction column must share the same data type
    # and the same set of unique class labels.
    prediction_column_name = "predictedLabel"
    probability_column_name = "probability"
        
    feature_columns = list(training_data_frame.columns)
    training_data_rows = training_data_frame[feature_columns].values.tolist()
    #print(training_data_rows)
    
    from watson_machine_learning_client import WatsonMachineLearningAPIClient
    wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)
    wml_client.set.default_space(space_id)
    
    payload_scoring = {
      wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
           "fields": feature_columns,
           "values": [x for x in training_data_rows]
      }]
    }
      
    score = wml_client.deployments.score(deployment_id, payload_scoring)
    score_predictions = score.get('predictions')[0]
      
    fields = list(score_predictions.get('fields'))
    # list.index raises ValueError rather than returning -1, so check membership first.
    if probability_column_name not in fields or prediction_column_name not in fields:
        raise Exception("Missing prediction/probability column in the scoring response")
    prob_col_index = fields.index(probability_column_name)
    predict_col_index = fields.index(prediction_column_name)
          
    import numpy as np
    probability_array = np.array([value[prob_col_index] for value in score_predictions.get('values')])
    prediction_vector = np.array([value[predict_col_index] for value in score_predictions.get('values')])
      
    return probability_array, prediction_vector

!pip install --upgrade ibm-wos-utils | tail -n 1
from ibm_wos_utils.drift.drift_trainer import DriftTrainer

#Generate drift detection model
feature_columns = list(data.columns)
feature_columns.remove(model_info['label_column'])
drift_detection_input = {
    "feature_columns":feature_columns,
    "categorical_columns":model_info['categorical_columns'],
    "label_column": model_info['label_column'],
    "problem_type": problem_type
}

drift_trainer = DriftTrainer(data,drift_detection_input)
if model_type != "regression":
    #Note: batch_size can be customized by user as per the training data size
    drift_trainer.generate_drift_detection_model(score,batch_size=pd_data.shape[0])

drift_trainer.learn_constraints()
drift_trainer.create_archive()

!mv drift_detection_model.tar.gz drift_model.gzip

Successfully installed ibm-wos-utils-1.1.5
Scoring training dataframe...: 100%|██████████| 4000/4000 [00:02<00:00, 1369.76rows/s]
Optimising Drift Detection Model...: 100%|██████████| 40/40 [01:38<00:00,  2.82s/models]
Scoring training dataframe...: 100%|██████████| 1000/1000 [00:01<00:00, 694.21rows/s]
Computing feature stats...: 100%|██████████| 20/20 [00:00<00:00, 114.10features/s]
Learning single feature constraints...: 100%|██████████| 25/25 [00:00<00:00, 528.58constraints/s]
Learning two feature constraints...: 100%|██████████| 274/274 [00:09<00:00, 28.92constraints/s]
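The `score` function passed to the drift trainer returns two parallel arrays, probabilities and predicted labels, pulled out of the scoring response by column name. The sketch below shows that extraction on a made-up response fragment in the same `fields`/`values` layout; `split_predictions` is a hypothetical helper and the sample values are invented.

```python
def split_predictions(predictions, prediction_column, probability_column):
    """Pull the prediction and probability columns out of one scoring result."""
    fields = list(predictions["fields"])
    # list.index raises ValueError for a missing name, so test membership first.
    for name in (prediction_column, probability_column):
        if name not in fields:
            raise KeyError("Missing column in scoring response: " + name)
    pred_idx = fields.index(prediction_column)
    prob_idx = fields.index(probability_column)
    labels = [row[pred_idx] for row in predictions["values"]]
    probabilities = [row[prob_idx] for row in predictions["values"]]
    return probabilities, labels

# Made-up response fragment in the fields/values layout used by `score`.
sample = {
    "fields": ["predictedLabel", "probability"],
    "values": [["No Risk", [0.73, 0.27]], ["Risk", [0.07, 0.93]]],
}
probabilities, labels = split_predictions(sample, "predictedLabel", "probability")
print(labels, probabilities)
```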


In [None]:
enable_drift=True

In [240]:
#Generate a download link for drift detection model
from IPython.display import HTML
import base64
import io

def create_download_link_for_ddm( title = "Download Drift detection model", filename = None):  
    
    # Embed the archive bytes in a base64 data URI so the browser can download it.
    if enable_drift:
        with open(filename,'rb') as file:
            ddm = file.read()
        b64 = base64.b64encode(ddm)
        payload = b64.decode()
        
        html = '<a download="{filename}" href="data:application/gzip;base64,{payload}" target="_blank">{title}</a>'
        html = html.format(payload=payload,title=title,filename=filename)
        return HTML(html)
    else:
        print("Drift Detection is not enabled. Please enable and rerun the notebook")


In [241]:
create_download_link_for_ddm(filename='model_content.gzip')


In [242]:
create_download_link_for_ddm(filename='drift_model.gzip')