<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run with the **Default Spark 3.3 & Python 3.10** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.10.x in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning 
  * DB2

  
The notebook will train, create and deploy a model, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

# Setup <a name="setup"></a>

## Package installation

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install --upgrade "pyspark==3.3.*" --no-cache | tail -n 1

#If you are running this notebook outside IBM Watson Studio, uncomment the pip statements below and run them
#!pip install --upgrade pandas==1.2.3 --no-cache | tail -n 1
#!pip install --upgrade requests==2.23 --no-cache | tail -n 1
#!pip install numpy==1.20.1 --no-cache | tail -n 1
#!pip install SciPy --no-cache | tail -n 1
#!pip install lime --no-cache | tail -n 1

!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D or Cloud Object Storage (COS))
- SCHEMA_NAME

In [2]:
#masked
WOS_CREDENTIALS = {
    "url": "Cluster host name",
    "username": "XX",
    "password": "XX"
}

In [3]:
#masked
WML_CREDENTIALS = {
    "url": "Cluster host name",
    "username": "XX",
    "password": "XX",
    "instance_id": "wml_local",
    "version": "3.5"  # If your environment is CP4D 4.0, specify "4.0" instead of "3.5"
}

In [4]:
#masked
#IBM DB2 database connection format example
DATABASE_CREDENTIALS = {
    "hostname":"9.999.999.99",
    "username":"XX",
    "password":"XX",
    "database":"SAMPLE",
    "port":"50000"
}
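The datamart setup later in this notebook reads hostname, username, password, database and port from DATABASE_CREDENTIALS, and also looks for ssl, sslmode and certificate_base64 entries that the example dictionary above does not define. A minimal sketch of a pre-flight check follows; check_db2_credentials is a hypothetical helper (not part of any IBM SDK) that fails early on missing required keys and fills the optional SSL keys with None.

```python
from typing import Dict, List

# Keys the datamart cell reads from DATABASE_CREDENTIALS (assumption based on
# this notebook); the SSL-related keys are treated as optional here.
REQUIRED_DB2_KEYS = ["hostname", "username", "password", "database", "port"]
OPTIONAL_DB2_KEYS = ["ssl", "sslmode", "certificate_base64"]


def check_db2_credentials(creds: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: return a copy of creds with optional keys set to None.

    Raises ValueError listing every required key that is missing or empty.
    """
    missing: List[str] = [key for key in REQUIRED_DB2_KEYS if not creds.get(key)]
    if missing:
        raise ValueError("DATABASE_CREDENTIALS is missing: " + ", ".join(missing))
    checked = dict(creds)  # do not mutate the caller's dictionary
    for key in OPTIONAL_DB2_KEYS:
        checked.setdefault(key, None)
    return checked
```

For example, `DATABASE_CREDENTIALS = check_db2_credentials(DATABASE_CREDENTIALS)` could be run right after the cell above; whether None is acceptable for the SSL fields depends on how your Db2 instance is configured.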

### Action: enter the name of the schema you created below.

In [5]:
SCHEMA_NAME = 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'


## Save training data to Cloud Object Storage

### Cloud Object Storage details

In the next cells, you will need to paste your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, see the getting started with COS tutorial. You can find the COS_API_KEY_ID and COS_RESOURCE_CRN values under Service Credentials in the menu of your COS instance. The COS service credentials you use must be created with the Role parameter set to Writer. The training data file will later be loaded into the bucket of your instance and used as the training data reference in the subscription. The COS_ENDPOINT value can be found in the Endpoint field of the same menu.

In [6]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

In [7]:
# masked
COS_API_KEY_ID = "***"
COS_RESOURCE_CRN = "***" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
BUCKET_NAME = "testcasebucket"
FILE_NAME = "Indirect_bias_AdultCensusdata.csv"

# Load and explore data

In [8]:
!rm -f Indirect_bias_AdultCensusdata.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/adult_census/Indirect_bias_AdultCensusdata.csv

--2024-08-06 18:54:16--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/adult_census/Indirect_bias_AdultCensusdata.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3551145 (3.4M) [text/plain]
Saving to: ‘Indirect_bias_AdultCensusdata.csv’


2024-08-06 18:54:16 (10.4 MB/s) - ‘Indirect_bias_AdultCensusdata.csv’ saved [3551145/3551145]



## Explore data

In [9]:
from pyspark.sql import SparkSession
import json

spark = SparkSession.builder.getOrCreate()
df_data = spark.read.csv(path="Indirect_bias_AdultCensusdata.csv", sep=",", header=True, inferSchema=True) 
df_data.head()

24/08/06 18:54:18 WARN Utils: Your hostname, Nelwins-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.0.103 instead (on interface en0)
24/08/06 18:54:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/08/06 18:54:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/08/06 18:54:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


Row(age=39, workclass='State-gov', fnlwgt=77516, education='Bachelors', education-num=13, Marital='Never-married', occupation='Adm-clerical', relationship='Not-in-family', race='White', sex='Male', capitalgain=2174, loss=0, hoursper=40, citizen_status='United-States', label='<=50K')

In [10]:
print("Number of records: " + str(df_data.count()))

Number of records: 32561


# Create a model

In [11]:
spark_df = df_data
# Remove protected attributes from training data
protected_attributes = ["race", "age", "sex"]
for attr in protected_attributes:
    spark_df = spark_df.drop(attr)
columns = spark_df.columns
model_name = "Adult Census Income Classifier Model"
deployment_name = "Adult Census Income Classifier Deployment"

spark_df.printSchema()

root
 |-- workclass: string (nullable = true)
 |-- fnlwgt: integer (nullable = true)
 |-- education: string (nullable = true)
 |-- education-num: integer (nullable = true)
 |-- Marital: string (nullable = true)
 |-- occupation: string (nullable = true)
 |-- relationship: string (nullable = true)
 |-- capitalgain: integer (nullable = true)
 |-- loss: integer (nullable = true)
 |-- hoursper: integer (nullable = true)
 |-- citizen_status: string (nullable = true)
 |-- label: string (nullable = true)



In [12]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml import Pipeline, Model

cat_features = ['workclass', 'education', 'Marital', 'occupation', 'relationship', 'citizen_status'] 
num_features = ["fnlwgt", "education-num", "capitalgain", "loss", "hoursper"]
stages=[]

for feature in cat_features:
    string_indexer = StringIndexer(inputCol = feature, outputCol = feature + '_IX').setHandleInvalid("keep")
    encoder = OneHotEncoder(inputCols=[string_indexer.getOutputCol()], outputCols=[feature + "classVec"])
    stages += [string_indexer, encoder]

si_Label = StringIndexer(inputCol="label", outputCol="encoded_label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)
stages.append(si_Label)
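The stages above index each categorical column and then one-hot encode it. The sketch below is a simplified pure-Python model of that behaviour, assuming Spark's defaults (StringIndexer orders labels by descending frequency with alphabetical tie-breaking; OneHotEncoder uses dropLast=True) plus the handleInvalid("keep") setting used here, which reserves one extra index for unseen values. It is not Spark's implementation, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def fit_string_indexer(values: Sequence[str]) -> Dict[str, int]:
    """Assign indices by descending frequency, ties broken alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def index_value(mapping: Dict[str, int], value: str) -> int:
    """With handleInvalid="keep", unseen values go to an extra bucket at len(mapping)."""
    return mapping.get(value, len(mapping))


def one_hot(index: int, num_categories: int) -> Tuple[int, List[int], List[float]]:
    """Sparse one-hot with dropLast=True: the last category encodes as all zeros.

    Returns (size, indices, values), like the SparseVector repr in the outputs.
    """
    size = num_categories - 1
    if index < size:
        return size, [index], [1.0]
    return size, [], []


workclass = ["Private", "Private", "State-gov", "Self-emp-inc", "Private", "State-gov"]
mapping = fit_string_indexer(workclass)
num_categories = len(mapping) + 1  # +1 for the "keep" bucket
print(mapping)
print(one_hot(index_value(mapping, "State-gov"), num_categories))
print(one_hot(index_value(mapping, "Never-worked"), num_categories))  # unseen value
```

Under these assumptions the encoded vector length equals the number of distinct training values, which is consistent with the vector sizes in the prediction output further down (for example SparseVector(9, ...) for workclass).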

In [13]:
assembler_inputs = [c + "classVec" for c in cat_features] + num_features
va_features = VectorAssembler(inputCols=assembler_inputs, outputCol="features")
stages.append(va_features)

In [14]:
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)
print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

Number of records for training: 26056
Number of records for evaluation: 6505
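The split counts above (26056 and 6505 out of 32561) are close to, but not exactly, 80% and 20%. randomSplit normalizes the weights and assigns each row to a split independently, so the sizes are only approximately proportional. The plain-Python sketch below illustrates that idea; bernoulli_split is a hypothetical helper and does not reproduce Spark's random number stream, so the same seed selects different rows than Spark would.

```python
import random
from typing import List, Sequence


def bernoulli_split(rows: Sequence[int], weights: Sequence[float], seed: int) -> List[List[int]]:
    """Assign each row to one split by drawing a uniform number and checking
    which cumulative-weight interval it falls into."""
    total = sum(weights)
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)  # upper edge of this split's interval
    splits: List[List[int]] = [[] for _ in weights]
    rng = random.Random(seed)
    for row in rows:
        u = rng.random()
        for i, upper in enumerate(bounds):
            # The last split also catches any floating-point shortfall in acc
            if u < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


train, test = bernoulli_split(range(32561), [0.8, 0.2], seed=24)
print(len(train), len(test))
```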


In [15]:
train_data.columns

['workclass',
 'fnlwgt',
 'education',
 'education-num',
 'Marital',
 'occupation',
 'relationship',
 'capitalgain',
 'loss',
 'hoursper',
 'citizen_status',
 'label']

In [16]:
from pyspark.ml.classification import GBTClassifier, DecisionTreeClassifier, RandomForestClassifier
classifier = RandomForestClassifier(labelCol="encoded_label", featuresCol="features")
stages.append(classifier)
stages.append(label_converter)
pipeline = Pipeline(stages=stages)
model = pipeline.fit(train_data)

In [17]:
predictions = model.transform(test_data)
predictions.printSchema()
predictions.head()

root
 |-- workclass: string (nullable = true)
 |-- fnlwgt: integer (nullable = true)
 |-- education: string (nullable = true)
 |-- education-num: integer (nullable = true)
 |-- Marital: string (nullable = true)
 |-- occupation: string (nullable = true)
 |-- relationship: string (nullable = true)
 |-- capitalgain: integer (nullable = true)
 |-- loss: integer (nullable = true)
 |-- hoursper: integer (nullable = true)
 |-- citizen_status: string (nullable = true)
 |-- label: string (nullable = true)
 |-- workclass_IX: double (nullable = false)
 |-- workclassclassVec: vector (nullable = true)
 |-- education_IX: double (nullable = false)
 |-- educationclassVec: vector (nullable = true)
 |-- Marital_IX: double (nullable = false)
 |-- MaritalclassVec: vector (nullable = true)
 |-- occupation_IX: double (nullable = false)
 |-- occupationclassVec: vector (nullable = true)
 |-- relationship_IX: double (nullable = false)
 |-- relationshipclassVec: vector (nullable = true)
 |-- citizen_status_IX: 

24/08/06 18:54:27 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.


Row(workclass='?', fnlwgt=23780, education='Masters', education-num=14, Marital='Married-spouse-absent', occupation='?', relationship='Other-relative', capitalgain=0, loss=0, hoursper=40, citizen_status='United-States', label='<=50K', workclass_IX=3.0, workclassclassVec=SparseVector(9, {3: 1.0}), education_IX=3.0, educationclassVec=SparseVector(16, {3: 1.0}), Marital_IX=5.0, MaritalclassVec=SparseVector(7, {5: 1.0}), occupation_IX=7.0, occupationclassVec=SparseVector(15, {7: 1.0}), relationship_IX=5.0, relationshipclassVec=SparseVector(6, {5: 1.0}), citizen_status_IX=0.0, citizen_statusclassVec=SparseVector(42, {0: 1.0}), encoded_label=0.0, features=SparseVector(100, {3: 1.0, 12: 1.0, 30: 1.0, 39: 1.0, 52: 1.0, 53: 1.0, 95: 23780.0, 96: 14.0, 99: 40.0}), rawPrediction=DenseVector([15.3339, 4.6661]), probability=DenseVector([0.7667, 0.2333]), prediction=0.0, predictedLabel='<=50K')

In [18]:
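from pyspark.ml.evaluation import BinaryClassificationEvaluator
# BinaryClassificationEvaluator computes the area under the ROC curve, not accuracy
evaluatorDT = BinaryClassificationEvaluator(labelCol="encoded_label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
area_under_roc = evaluatorDT.evaluate(predictions)

print("Area under ROC = %g" % area_under_roc)

Area under ROC = 0.891208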
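BinaryClassificationEvaluator reports the area under the ROC curve by default (metricName="areaUnderROC"), not accuracy. AUC measures how well the scores rank positive records above negative ones, while accuracy counts correct predictions at a single threshold. The sketch below computes both in plain Python, using the pairwise (Mann-Whitney) definition of AUC, to show how the two numbers can differ on the same scores; the function names are illustrative.

```python
from typing import Sequence


def area_under_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Share of records whose thresholded score matches the label."""
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return correct / len(labels)


scores = [0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 1]
print(area_under_roc(scores, labels), accuracy(scores, labels))
```

In the example every positive is scored above every negative, so the AUC is 1.0, yet a fixed 0.5 threshold labels all four records positive and the accuracy is only 0.5.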


# Save and deploy the model

In [19]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

24/08/06 18:54:34 WARN GarbageCollectionMetrics: To enable non-built-in garbage collector(s) List(G1 Concurrent GC), users should configure it(them) to spark.eventLog.gcMetrics.youngGenerationGarbageCollectors or spark.eventLog.gcMetrics.oldGenerationGarbageCollectors


'1.0.360'

In [20]:
wml_client.spaces.list(limit=10)


| | ID | NAME | CREATED |
|---|---|---|---|
| 0 | 7e5a8be6-9103-4c22-9c43-b66f3d8364de | poojitha_notebooks_space | 2024-06-29T13:53:46.645Z |
| 1 | 16ccd855-46bd-43ed-8219-5f00ac565d08 | shreya-space | 2024-06-26T04:29:17.302Z |
| 2 | bc3b9797-c509-4fb4-a424-f67b1e2ed4be | QUALITY_WMLV4_PREPROD | 2024-06-23T12:23:04.790Z |
| 3 | e396e187-2977-47b4-ade3-1539f9f10adc | QUALITY_WMLV4_PROD | 2024-06-23T12:22:54.422Z |
| 4 | 40c4d032-0339-4da6-bfec-4bdb096c9650 | shreya | 2024-06-20T10:54:20.088Z |
| 5 | 088c142e-f35e-4e48-a30c-ad55a6edeecc | notebooks 5.0 | 2024-06-13T04:42:07.336Z |
| 6 | b9b3d3b4-6e26-4e16-807d-e8bf5e7d6984 | MRM_WMLV4_PREPROD | 2024-06-12T15:49:26.571Z |
| 7 | d22e2b6b-917c-4427-a40c-1a439352a742 | MRM_WMLV4_PROD | 2024-06-12T15:49:16.185Z |
| 8 | ce15e0f6-be30-4349-af47-35ae15983bf1 | openscale-express-path-preprod-00000000-0000-0... | 2024-06-04T05:18:51.988Z |
| 9 | 6264dc0e-087a-4dea-bcbc-6bd872b510fb | openscale-express-path-00000000-0000-0000-0000... | 2024-06-04T05:18:30.811Z |


## Find the space you want to associate with the model created and deployed in this notebook, and specify its ID in the next cell

In [21]:
WML_SPACE_ID='***' # use space id here
wml_client.set.default_space(WML_SPACE_ID)

'SUCCESS'

In [22]:
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == deployment_name:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

Deleting deployment id 1bc4d5d7-26c6-4d9e-8cc8-c128e8bf1983
Deleting model id 62b958dd-8bd9-42b5-9a63-33a58ea0a26c
------------------------------------  -------------------------------  ------------------------  ---------  ----------  ----------------
ID                                    NAME                             CREATED                   TYPE       SPEC_STATE  SPEC_REPLACEMENT
773f34d7-920b-4cb1-a0b6-043d71d3a8ec  Spark German Risk Model - Final  2024-08-06T12:55:37.002Z  mllib_3.3  deprecated  spark-mllib_3.4
------------------------------------  -------------------------------  ------------------------  ---------  ----------  ----------------




In [23]:
datasource_type = wml_client.connections.get_datasource_type_uid_by_name('bluemixcloudobjectstorage')
conn_meta_props= {
    wml_client.connections.ConfigurationMetaNames.NAME: "Connection My COS ",
    wml_client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: datasource_type,
    wml_client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection to my COS",
    wml_client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': BUCKET_NAME,
        'api_key': COS_API_KEY_ID,
        'resource_instance_id': COS_RESOURCE_CRN,
        'iam_url': IAM_URL,
        'url': COS_ENDPOINT
    }
}

conn_details = wml_client.connections.create(meta_props=conn_meta_props)
connection_id = wml_client.connections.get_uid(conn_details)

training_data_references = [
    {
        "id": "Adult Census Income",
        "type": "connection_asset",
        "connection": {
            "id": connection_id,
            "href": "/v2/connections/" + connection_id + "?space_id=" + WML_SPACE_ID

        },
        "location": {
            "bucket": BUCKET_NAME,
            "file_name": FILE_NAME
        }
    }    
]

Creating connections...
SUCCESS


In [24]:
software_spec_uid = wml_client.software_specifications.get_id_by_name("spark-mllib_3.3")
print("Software Specification ID: {}".format(software_spec_uid))
model_props = {
    wml_client.repository.ModelMetaNames.NAME: model_name,
    wml_client.repository.ModelMetaNames.TYPE: "mllib_3.3",
    wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
    wml_client.repository.ModelMetaNames.LABEL_FIELD: "label",
}

Software Specification ID: d11f2434-4fc7-58b7-8a62-755da64fdaf8


In [25]:
print("Storing model ...")
published_model_details = wml_client.repository.store_model(
    model=model, 
    meta_props=model_props, 
    training_data=train_data, 
    pipeline=pipeline)

model_uid = wml_client.repository.get_model_id(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

Storing model ...


                                                                                

Done
Model ID: eed8b225-19e5-4460-9b6f-7271dc1e3ff2


In [26]:
published_model_details

{'entity': {'hybrid_pipeline_software_specs': [],
  'label_column': 'label',
  'pipeline': {'id': '5174fc07-5a11-4d7e-ab19-8228b0c2c0ef'},
  'schemas': {'input': [{'fields': [{'metadata': {},
       'name': 'workclass',
       'nullable': True,
       'type': 'string'},
      {'metadata': {}, 'name': 'fnlwgt', 'nullable': True, 'type': 'integer'},
      {'metadata': {},
       'name': 'education',
       'nullable': True,
       'type': 'string'},
      {'metadata': {},
       'name': 'education-num',
       'nullable': True,
       'type': 'integer'},
      {'metadata': {}, 'name': 'Marital', 'nullable': True, 'type': 'string'},
      {'metadata': {},
       'name': 'occupation',
       'nullable': True,
       'type': 'string'},
      {'metadata': {},
       'name': 'relationship',
       'nullable': True,
       'type': 'string'},
      {'metadata': {},
       'name': 'capitalgain',
       'nullable': True,
       'type': 'integer'},
      {'metadata': {}, 'name': 'loss', 'nullable'

## Create a model deployment

In [27]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(deployment_name),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid=wml_client.deployments.get_id(deployment_details)

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))



#######################################################################################

Synchronous deployment creation for uid: 'eed8b225-19e5-4460-9b6f-7271dc1e3ff2' started

#######################################################################################


initializing
Note: Software specification spark-mllib_3.3 is deprecated. Use spark-mllib_3.4 software specification instead when saving a spark model. For details, see https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_latest/wsj/wmls/wmls-deploy-python-types.html.
.
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='038b8c68-6615-4c0d-8d46-15783d6ab9c8'
------------------------------------------------------------------------------------------------


Scoring URL:https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/ml/v4/deployments/038b8c68-6615-4c0d-8d46-15783d6ab9c8/predictions
Model 

# Construct the scoring payload

In [28]:
import pandas as pd

df = pd.read_csv("Indirect_bias_AdultCensusdata.csv")
df.head()

| | age | workclass | fnlwgt | education | education-num | Marital | occupation | relationship | race | sex | capitalgain | loss | hoursper | citizen_status | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


## Remove the sensitive attributes

In [29]:
cols_to_remove = ['label']
cols_to_remove.extend(protected_attributes)
cols_to_remove

['label', 'race', 'age', 'sex']

## Create the metadata DataFrame capturing the sensitive attributes

In [30]:
meta_df = df[protected_attributes].copy()
meta_fields = meta_df.columns.tolist()
meta_values = meta_df[meta_fields].values.tolist()

## Construct the scoring payload comprising the meta fields

In [31]:
def get_scoring_payload(no_of_records_to_score = 1):
    # Protected attributes travel in the "meta" section instead of the model inputs
    meta_payload = {
        "fields": meta_fields,
        "values": meta_values[:no_of_records_to_score]
    }

    # Drop the label and protected attributes without mutating the global df
    feature_df = df.drop(columns=[col for col in cols_to_remove if col in df.columns])
    fields = feature_df.columns.tolist()
    values = feature_df.head(no_of_records_to_score).values.tolist()

    payload_scoring = {"input_data": [{"fields": fields, "values": values, "meta": meta_payload}]}
    return payload_scoring
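The payload above is built from a pandas DataFrame. The same structure can be sketched with plain dictionaries to make the idea explicit: the model only receives the feature fields, while the protected attributes travel in the separate meta section so that OpenScale can use them for indirect bias monitoring. build_scoring_payload is a hypothetical illustration, not a library function.

```python
from typing import Any, Dict, Sequence


def build_scoring_payload(records: Sequence[Dict[str, Any]],
                          feature_fields: Sequence[str],
                          meta_fields: Sequence[str]) -> Dict[str, Any]:
    """Split each record into model features and protected 'meta' attributes.

    Any key not listed in feature_fields or meta_fields (e.g. the label) is dropped.
    """
    values = [[rec[f] for f in feature_fields] for rec in records]
    meta_values = [[rec[f] for f in meta_fields] for rec in records]
    return {
        "input_data": [{
            "fields": list(feature_fields),
            "values": values,
            "meta": {"fields": list(meta_fields), "values": meta_values},
        }]
    }


record = {"workclass": "State-gov", "fnlwgt": 77516,
          "race": "White", "age": 39, "sex": "Male", "label": "<=50K"}
payload = build_scoring_payload([record], ["workclass", "fnlwgt"], ["race", "age", "sex"])
print(payload)
```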

## Method to perform scoring

In [32]:
def sample_scoring(no_of_records_to_score = 1):
    payload_scoring = get_scoring_payload(no_of_records_to_score)
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    # Print the output fields and the values of the first scored record
    print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])
    print(json.dumps(scoring_response, indent=None))
    return payload_scoring, scoring_response
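The scoring response carries each prediction block as parallel fields and values lists, as the output further down shows, so reading a result by position is fragile. A small plain-Python sketch (predictions_as_records is a hypothetical helper) that zips the two lists and keeps only the columns of interest:

```python
from typing import Any, Dict, List, Sequence


def predictions_as_records(scoring_response: Dict[str, Any],
                           keep: Sequence[str] = ("predictedLabel", "probability")) -> List[Dict[str, Any]]:
    """Turn the first prediction block into one dict per scored row."""
    block = scoring_response["predictions"][0]
    fields = block["fields"]
    rows = []
    for row in block["values"]:
        record = dict(zip(fields, row))  # field name -> value for this row
        rows.append({name: record[name] for name in keep if name in record})
    return rows


toy_response = {"predictions": [{
    "fields": ["prediction", "probability", "predictedLabel"],
    "values": [[0.0, [0.856, 0.144], "<=50K"]],
}]}
print(predictions_as_records(toy_response))
```

For example, predictions_as_records(scoring_response) applied to the response returned by sample_scoring would yield one dictionary per scored row, independent of the column order the deployment uses.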

In [33]:
import time
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

# Note: wos_client and payload_data_set_id are created in later cells
def payload_logging(no_of_records_to_score = 1):
    records_list = []
    payload_scoring = get_scoring_payload(no_of_records_to_score)

    # Score the deployment; OpenScale normally logs the payload automatically
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))
    if pl_records_count == 0:
        print("Payload logging did not happen, performing explicit payload logging.")

        # Fall back to manual payload logging if automatic logging did not work
        score_input = payload_scoring['input_data'][0]
        score_response = scoring_response['predictions'][0]
        pl_record = PayloadRecord(request=score_input, response=score_response, response_time=460)
        records_list.append(pl_record)
        wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=records_list)

        time.sleep(5)
        pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
        print("Number of records in the payload logging table: {}".format(pl_records_count))
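payload_logging waits a fixed five seconds before checking the payload table. A sketch of a polling alternative follows, assuming only that the record count can be read through a zero-argument callable. wait_for_record_count is a hypothetical helper, not an OpenScale API, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_record_count(get_count: Callable[[], int],
                          minimum: int = 1,
                          timeout_s: float = 30.0,
                          interval_s: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_count() until it reaches `minimum` or the timeout expires.

    Returns the last observed count so the caller can decide whether to fall
    back to explicit payload logging.
    """
    waited = 0.0
    count = get_count()
    while count < minimum and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        count = get_count()
    return count
```

Inside payload_logging it could replace the first time.sleep(5) call, e.g. `wait_for_record_count(lambda: wos_client.data_sets.get_records_count(payload_data_set_id))`, with the explicit PayloadRecord fallback running only if the returned count is still 0. The sleep parameter exists so the loop can be exercised without real waiting.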

## Score the model and print the scoring response

In [34]:
sample_scoring(no_of_records_to_score = 1)

Single record scoring result: 
 fields: ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status', 'workclass_IX', 'workclassclassVec', 'education_IX', 'educationclassVec', 'Marital_IX', 'MaritalclassVec', 'occupation_IX', 'occupationclassVec', 'relationship_IX', 'relationshipclassVec', 'citizen_status_IX', 'citizen_statusclassVec', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States', 4.0, [9, [4], [1.0]], 2.0, [16, [2], [1.0]], 1.0, [7, [1], [1.0]], 3.0, [15, [3], [1.0]], 1.0, [6, [1], [1.0]], 0.0, [42, [0], [1.0]], [100, [4, 11, 26, 35, 48, 53, 95, 96, 97, 99], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 77516.0, 13.0, 2174.0, 40.0]], [17.120135477812976, 2.879864522187024], [0.8560067738906488, 0.1439932261093512], 0.0, '<=50K']
{"predictions": [{"fields": ["

({'input_data': [{'fields': ['workclass',
     'fnlwgt',
     'education',
     'education-num',
     'Marital',
     'occupation',
     'relationship',
     'capitalgain',
     'loss',
     'hoursper',
     'citizen_status'],
    'values': [['State-gov',
      77516,
      'Bachelors',
      13,
      'Never-married',
      'Adm-clerical',
      'Not-in-family',
      2174,
      0,
      40,
      'United-States']],
    'meta': {'fields': ['race', 'age', 'sex'],
     'values': [['White', 39, 'Male']]}}]},
 {'predictions': [{'fields': ['workclass',
     'fnlwgt',
     'education',
     'education-num',
     'Marital',
     'occupation',
     'relationship',
     'capitalgain',
     'loss',
     'hoursper',
     'citizen_status',
     'workclass_IX',
     'workclassclassVec',
     'education_IX',
     'educationclassVec',
     'Marital_IX',
     'MaritalclassVec',
     'occupation_IX',
     'occupationclassVec',
     'relationship_IX',
     'relationshipclassVec',
     'citizen_status_

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [35]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [36]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator)
wos_client.version

'3.0.39'

## Create datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If an OpenScale datamart already exists (in Db2 or PostgreSQL), the notebook reuses it and no data is overwritten. Otherwise, if database credentials were supplied, a new datamart is created in that Db2 database under SCHEMA_NAME; if not, OpenScale's internal database is used.

Prior instances of the model will be removed from OpenScale monitoring.
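The datamart cell below chooses between reusing an existing datamart, creating one in Db2, or falling back to the internal database. The same decision can be sketched as a pure function (choose_datamart_action is hypothetical, for illustration only), which makes the order of the checks explicit; unlike a print-and-continue approach, it raises when Db2 credentials are given without a schema name.

```python
from typing import Optional, Sequence, Tuple


def choose_datamart_action(existing_ids: Sequence[str],
                           db_credentials: Optional[dict],
                           schema_name: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return ("reuse", datamart_id), ("external_db2", schema_name) or ("internal", None)."""
    if existing_ids:
        # An existing datamart always wins, so nothing is overwritten
        return "reuse", existing_ids[0]
    if db_credentials is not None:
        if not schema_name:
            raise ValueError("SCHEMA_NAME must be set to create a Db2 datamart")
        return "external_db2", schema_name
    return "internal", None
```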

In [37]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000,Data Mart created by OpenScale ExpressPath,False,active,2024-06-04 05:19:03.698000+00:00,00000000-0000-0000-0000-000000000000


In [38]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DATABASE_CREDENTIALS is not None:
        if SCHEMA_NAME is None:
            raise ValueError("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                    database_type=DatabaseType.DB2,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DATABASE_CREDENTIALS['hostname'],
                        username=DATABASE_CREDENTIALS['username'],
                        password=DATABASE_CREDENTIALS['password'],
                        db=DATABASE_CREDENTIALS['database'],
                        port=int(DATABASE_CREDENTIALS['port']),
                        # SSL settings are optional; the example credentials above omit them
                        ssl=DATABASE_CREDENTIALS.get('ssl'),
                        sslmode=DATABASE_CREDENTIALS.get('sslmode'),
                        certificate_base64=DATABASE_CREDENTIALS.get('certificate_base64')
                    ),
                    location=LocationSchemaName(
                        schema_name=SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                internal_database=True).result

    data_mart_id = added_data_mart_result.metadata.id

else:
    data_mart_id = data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))

Using existing datamart 00000000-0000-0000-0000-000000000000


In [39]:
data_mart_details = wos_client.data_marts.get(data_mart_id).result
data_mart_details.to_dict()

{'metadata': {'id': '00000000-0000-0000-0000-000000000000',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:data_mart:00000000-0000-0000-0000-000000000000',
  'url': '/v2/data_marts/00000000-0000-0000-0000-000000000000',
  'created_at': '2024-06-04T05:19:03.698000Z',
  'created_by': 'cpadmin',
  'modified_at': '2024-06-04T05:19:11.402000Z',
  'modified_by': 'cpadmin'},
 'entity': {'name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000',
  'description': 'Data Mart created by OpenScale ExpressPath',
  'service_instance_crn': 'N/A',
  'internal_database': False,
  'database_configuration': {'database_type': 'db2',
   'credentials': {'secret_id': '3bd1e5c0-6987-4719-a9f8-7dc94bfac74d'},
   'location': {'schema_name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'}},
  'status': {'state': 'active'}}}

In [40]:
wos_client.service_providers.show()

0,1,2,3,4,5
,active,IAE7,custom_machine_learning,2024-07-02 09:08:06.273000+00:00,184e73a2-7fd8-4f3f-b994-bb648f6eb8ec
,active,IAE6,custom_machine_learning,2024-07-02 09:04:22.643000+00:00,644befcd-6d36-4f4d-a30a-cd51a28b63fe
,active,WML_IAE5,custom_machine_learning,2024-07-02 08:43:08.723000+00:00,e986a0d7-8187-4fed-ab2a-c614a9683cae
,active,WML_IAE4,custom_machine_learning,2024-07-02 07:00:41.816000+00:00,d90c6bf2-49c6-4179-9876-8b85b0247d95
99999999-9999-9999-9999-999999999999,active,Image Multiclass Watson Machine Learning V2_test,watson_machine_learning,2024-07-01 17:17:14.696000+00:00,a7ca157a-de07-457a-8c4c-b1a2e998699c
,active,WML_remote_spark_jdbc,custom_machine_learning,2024-07-01 09:40:35.190000+00:00,26b2af15-e396-4295-81d2-6912ab93912b
00000000-0000-0000-0000-000000000000,active,WML_IAE3,watson_machine_learning,2024-06-30 11:22:00.507000+00:00,d7ad1164-a387-462d-b495-dd25d6241404
,active,OpenScale Headless Service Provider,custom_machine_learning,2024-06-29 03:21:50.385000+00:00,e67332c4-0f88-4a8f-9c41-396d28c07448
,active,IAE3,custom_machine_learning,2024-06-28 05:11:53.191000+00:00,8ac485c9-66c8-4a9c-9786-57d2940f26e8
,active,WML_IAE2,custom_machine_learning,2024-06-27 13:36:27.009000+00:00,8b65752d-0dc9-4715-a2a2-96ee19fb7ded


Note: First 10 records were displayed.


## Remove existing service providers connected to the WML instance

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating duplicate service providers for this WML instance, the following code deletes any existing service provider with the same name and then adds a new one.

In [41]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning - Indirect Bias Demo"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase Indirect Bias functionality."

In [42]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture the payload data going into and out of the model.
Note: You can bind more than one engine instance if needed by calling the `wos_client.service_providers.add` method. You can then refer to a particular service provider by its `service_provider_id`.

In [43]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = WML_SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCP4D(),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider c3601bc6-22c9-45a1-b016-c29cf0bf64b4 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [44]:
print(wos_client.service_providers.get(service_provider_id).result)

{
  "metadata": {
    "id": "c3601bc6-22c9-45a1-b016-c29cf0bf64b4",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:service_provider:c3601bc6-22c9-45a1-b016-c29cf0bf64b4",
    "url": "/v2/service_providers/c3601bc6-22c9-45a1-b016-c29cf0bf64b4",
    "created_at": "2024-08-06T13:25:37.046000Z",
    "created_by": "cpadmin"
  },
  "entity": {
    "name": "Watson Machine Learning - Indirect Bias Demo-2",
    "service_type": "watson_machine_learning",
    "instance_id": "99999999-9999-9999-9999-999999999999",
    "credentials": {
      "secret_id": "b5bc1437-c72a-4efc-b06c-8e59934ed321"
    },
    "operational_space_id": "production",
    "deployment_space_id": "e8b87647-a0e9-4932-920f-1e8f1d1f383d",
    "status": {
      "state": "active"
    }
  }
}


In [46]:
asset_deployment_details = wos_client.service_providers.list_assets(
    data_mart_id=data_mart_id,
    service_provider_id=service_provider_id,
    deployment_id=deployment_uid,
    deployment_space_id=WML_SPACE_ID
).result['resources'][0]
asset_deployment_details

{'metadata': {'guid': '038b8c68-6615-4c0d-8d46-15783d6ab9c8',
  'created_at': '2024-08-06T13:25:16.409Z',
  'modified_at': '2024-08-06T13:25:16.409Z'},
 'entity': {'name': 'Adult Census Income Classifier Deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://internal-nginx-svc:12443/ml/v4/deployments/038b8c68-6615-4c0d-8d46-15783d6ab9c8/predictions'},
  'asset': {},
  'asset_properties': {}}}

In [47]:
model_asset_details_from_deployment = wos_client.service_providers.get_deployment_asset(
    data_mart_id=data_mart_id,
    service_provider_id=service_provider_id,
    deployment_id=deployment_uid,
    deployment_space_id=WML_SPACE_ID
)
#model_asset_details_from_deployment

## Subscriptions

The following cell lists the existing subscriptions. The cell after it removes any previous subscription to this model, so that the monitors are refreshed with the new model and new data.

In [48]:
wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8,9
438ca544-9bd1-48c2-8e8d-3de4ef4ca79b,model,WML_IAE4,00000000-0000-0000-0000-000000000000,78a0af9e-1014-4fb1-b22a-5e11f4fd70e7,WML_IAE4,d90c6bf2-49c6-4179-9876-8b85b0247d95,active,2024-07-02 07:03:31.504000+00:00,e34b9b87-b6e1-4c53-b92e-cb80dea042be
592b902d-3dc9-4e56-8bcb-86cbf1a6d8a9,model,gcr - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,2b976af0-e4ab-4859-af7d-2f2287d864ad,gcr model,4d2f2fb2-6b64-4d58-8f13-257166e468e9,active,2024-07-17 07:11:44.727000+00:00,e2df4ec7-6c75-416f-a444-8d21389f7513
e3ac9fc3-bccf-4a4e-b37b-490bfb93dd81,model,GCR AutoAI - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,6399e6e8-df5a-4370-9af4-34b2f2e76bc6,GCR Auto AI,a7ca157a-de07-457a-8c4c-b1a2e998699c,active,2024-07-03 10:37:20.487000+00:00,ce36911c-75f6-4f99-b4d2-b71e5d55a802
592b902d-3dc9-4e56-8bcb-86cbf1a6d8a9,model,gcr - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,2b976af0-e4ab-4859-af7d-2f2287d864ad,gcr model,4d2f2fb2-6b64-4d58-8f13-257166e468e9,active,2024-07-02 15:33:17.454000+00:00,e96278a6-7190-48f3-b8bd-945fa48cfe50
327d8aea-ecfc-4990-9bb9-601a1695094d,model,GCR AutoAI - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,755c3e75-24b5-4839-8a8f-3f85c07a40c9,GCR demo,a7ca157a-de07-457a-8c4c-b1a2e998699c,active,2024-07-02 12:03:14.224000+00:00,a0f86241-8bfc-4322-895a-2597512e1653
b21904ef-7478-4dae-b93f-c120e95c9200,model,My SDK Batch Subscription-db2,00000000-0000-0000-0000-000000000000,a10a121b-2394-4fe7-9a2d-f0520abd212c,My SDK Batch Subscription-db2,644befcd-6d36-4f4d-a30a-cd51a28b63fe,active,2024-07-02 09:04:38.883000+00:00,70c9c394-2530-4c0f-b0b2-6b81e44612d0
7bf1d492-3275-405d-9b61-da4e2075e746,model,My SDK Batch Subscription-db2,00000000-0000-0000-0000-000000000000,a302d0d4-3f38-4ded-8688-3e4b2e9bb55c,My SDK Batch Subscription-db2,e986a0d7-8187-4fed-ab2a-c614a9683cae,active,2024-07-02 08:43:25.526000+00:00,f4178567-4f9a-4b0e-b255-2babdb8b5f38
405fe789-e294-42ce-9989-774d484205c3,model,WML_remote_spark_jdbc,00000000-0000-0000-0000-000000000000,ea8d064d-2eb9-48a0-97af-f385dd358843,WML_remote_spark_jdbc,26b2af15-e396-4295-81d2-6912ab93912b,active,2024-07-01 09:41:24.463000+00:00,3046b546-e422-4737-9123-a26e025d3cef
d1869f46-b9a0-4e99-bfbe-491074a4e40d,model,MNIST Model,00000000-0000-0000-0000-000000000000,211b2ebc-ec74-4090-a526-e896d5ed6166,MNIST Model deployment,a7ca157a-de07-457a-8c4c-b1a2e998699c,active,2024-07-01 17:17:44.331000+00:00,1b8ebb3c-e69a-4cc1-9a0a-57424546cab3
592b902d-3dc9-4e56-8bcb-86cbf1a6d8a9,model,gcr - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,2b976af0-e4ab-4859-af7d-2f2287d864ad,gcr model,4d2f2fb2-6b64-4d58-8f13-257166e468e9,active,2024-06-30 12:02:35.155000+00:00,7585a078-c1b6-4a77-9ab3-87be9fb6026c


Note: First 10 records were displayed.


## Remove the existing subscription

In [49]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_id = subscription.entity.asset.asset_id
    if sub_model_id == model_uid:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

The following cells create the model subscription in OpenScale using the Python client API. Note that we need to provide the model's unique identifier and some information about the model itself, such as its feature columns.

In [50]:
feature_columns = cat_features + num_features
feature_columns

['workclass',
 'education',
 'Marital',
 'occupation',
 'relationship',
 'citizen_status',
 'fnlwgt',
 'education-num',
 'capitalgain',
 'loss',
 'hoursper']
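Before the subscription is created, it can be worth sanity-checking the column lists passed in `asset_properties`. The sketch below is a hypothetical helper (`check_asset_properties` is not part of the OpenScale client) that assumes every categorical field must also be a feature field, and that the label and prediction columns must not appear among the features. It works on plain lists, so it runs without an OpenScale connection.

```python
def check_asset_properties(feature_fields, categorical_fields, label_column, prediction_field):
    """Hypothetical pre-flight check for the column lists given to AssetPropertiesRequest."""
    problems = []
    # Every categorical field should also be listed as a feature field
    missing = [c for c in categorical_fields if c not in feature_fields]
    if missing:
        problems.append("categorical fields not in feature fields: {}".format(missing))
    # Output columns must not leak into the feature list
    for col in (label_column, prediction_field):
        if col in feature_fields:
            problems.append("'{}' should not be a feature field".format(col))
    # Duplicated feature names usually indicate a list-concatenation mistake
    duplicates = sorted({f for f in feature_fields if feature_fields.count(f) > 1})
    if duplicates:
        problems.append("duplicate feature fields: {}".format(duplicates))
    return problems


# Column lists as printed in the cell above (separate names so notebook variables are not overwritten)
example_categorical = ['workclass', 'education', 'Marital', 'occupation', 'relationship', 'citizen_status']
example_numeric = ['fnlwgt', 'education-num', 'capitalgain', 'loss', 'hoursper']
example_features = example_categorical + example_numeric

print(check_asset_properties(example_features, example_categorical, "label", "predictedLabel"))
```

An empty list means no problems were found; each problem is reported as a readable string.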

In [51]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['entity']['scoring_endpoint']['url']
        ),
        asset_properties=AssetPropertiesRequest(
            label_column="label",
            probability_fields=["probability"],
            prediction_field="predictedLabel",
            feature_fields = feature_columns,
            categorical_fields = cat_features,
            training_data_reference=TrainingDataReference(type="cos",
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = FILE_NAME),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
subscription_id = subscription_details.metadata.id
print('subscription_id: ' + subscription_id)

subscription_id: c274e76c-6bb9-43ce-ba64-127a3db95ede


In [52]:
import time

time.sleep(5)
payload_data_set_id = None
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Indexing [0] on an empty list would raise IndexError, so check before reading the id
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)

Payload data set id: 4b2d90a6-b7b2-49b8-9fa8-ee01e03d7d13


In [53]:
wos_client.data_sets.show()

0,1,2,3,4,5,6
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,payload_logging_error,2024-08-06 13:26:11.281000+00:00,857c4ab8-1dc5-455f-8603-5de875cfba52
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,model_health,2024-08-06 13:26:11.262000+00:00,16f42ee3-0e97-4e29-9056-25fe654c96d0
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,manual_labeling,2024-08-06 13:26:11.076000+00:00,0650fc95-c136-4c31-883c-2dd5e3307d7e
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,payload_logging,2024-08-06 13:26:10.877000+00:00,4b2d90a6-b7b2-49b8-9fa8-ee01e03d7d13
00000000-0000-0000-0000-000000000000,active,e34b9b87-b6e1-4c53-b92e-cb80dea042be,subscription,payload_logging,2024-07-02 07:03:32.634000+00:00,0063481a-26eb-4934-85a7-4ebc71a2fea0
00000000-0000-0000-0000-000000000000,active,e96278a6-7190-48f3-b8bd-945fa48cfe50,subscription,model_health,2024-07-02 15:33:20.908000+00:00,0189e173-0d7f-41da-baaa-20f022f362f8
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,model_health,2024-06-14 03:16:37.811000+00:00,e56a3698-37d4-49cd-87fd-34b9be83b343
00000000-0000-0000-0000-000000000000,active,e2df4ec7-6c75-416f-a444-8d21389f7513,subscription,payload_logging,2024-07-17 07:11:46.741000+00:00,9d6c26da-0e8a-4ec1-a077-30595e96797c
00000000-0000-0000-0000-000000000000,active,e2df4ec7-6c75-416f-a444-8d21389f7513,subscription,explanations,2024-07-17 12:14:38.155000+00:00,884d6811-c36a-4756-bc7b-45c7f0220a44
00000000-0000-0000-0000-000000000000,active,e2df4ec7-6c75-416f-a444-8d21389f7513,subscription,payload_logging_error,2024-07-17 07:11:47.156000+00:00,41ef8863-2f81-42d4-97e2-814a4cf31a2f


Note: First 10 records were displayed.


In [54]:
wos_client.subscriptions.get(subscription_id).result.to_dict()

{'metadata': {'id': 'c274e76c-6bb9-43ce-ba64-127a3db95ede',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:subscription:c274e76c-6bb9-43ce-ba64-127a3db95ede',
  'url': '/v2/subscriptions/c274e76c-6bb9-43ce-ba64-127a3db95ede',
  'created_at': '2024-08-06T13:26:09.549000Z',
  'created_by': 'cpadmin',
  'modified_at': '2024-08-06T13:26:11.503000Z',
  'modified_by': 'cpadmin'},
 'entity': {'data_mart_id': '00000000-0000-0000-0000-000000000000',
  'service_provider_id': 'c3601bc6-22c9-45a1-b016-c29cf0bf64b4',
  'asset': {'asset_id': 'eed8b225-19e5-4460-9b6f-7271dc1e3ff2',
   'url': 'https://internal-nginx-svc:12443/ml/v4/models/eed8b225-19e5-4460-9b6f-7271dc1e3ff2?space_id=e8b87647-a0e9-4932-920f-1e8f1d1f383d&version=2020-06-12',
   'name': 'Adult Census Income Classifier Model',
   'asset_type': 'model',
   'problem_type': 'binary',
   'input_data_type': 'structured'},
  'asset_properties': {'training_data_reference': {'secret_id': 'f4dd80ca-

# Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model.

In [55]:
payload_logging(no_of_records_to_score = 1000)

Number of records in the payload logging table: 1000


In [56]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

Number of records in the payload logging table: 1000


## Fairness configuration

The code below configures fairness monitoring for our model. It turns on monitoring for two features, sex and age. For each feature, we must specify:

- Which feature to monitor.
- One or more majority groups: values of that feature that we expect to receive a higher percentage of favorable outcomes.
- One or more minority groups: values of that feature that we expect to receive a higher percentage of unfavorable outcomes.
- The threshold below which OpenScale should display an alert for the fairness measurement (in this case, 80%).

Additionally, we must specify which model outcomes are favorable and which are unfavorable. We must also provide the minimum number of records OpenScale uses to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 1000 records (`min_records`) have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data, which it reads through the training data reference supplied when the subscription was created.
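As a rough illustration of the number the monitor reports, the sketch below computes a disparate-impact style fairness value: the favorable-outcome rate of the minority (monitored) group divided by that of the majority (reference) group, times 100. It assumes this is how OpenScale's `fairness_value` is defined and does not reproduce any additional processing the service may apply. Group definitions mirror the `parameters` used in the cell below: categorical values for `sex`, inclusive `[low, high]` ranges for `age`. The helper names and the records are made up for illustration.

```python
def in_group(value, group):
    """A group is either a categorical value or an inclusive [low, high] numeric range."""
    if isinstance(group, list):
        low, high = group
        return low <= value <= high
    return value == group


def fairness_value(records, feature, majority, minority, favourable):
    """Favorable rate of the minority group / favorable rate of the majority group, as a percentage."""
    def favourable_rate(groups):
        outcomes = [r["prediction"] in favourable
                    for r in records
                    if any(in_group(r[feature], g) for g in groups)]
        return sum(outcomes) / len(outcomes) if outcomes else None

    minority_rate = favourable_rate(minority)
    majority_rate = favourable_rate(majority)
    if minority_rate is None or not majority_rate:
        # Undefined when a group is empty or the reference group has no favorable outcomes
        return None
    return round(100 * minority_rate / majority_rate, 3)


# Hypothetical scored records (not taken from the notebook's data)
records = [
    {"sex": "Male", "age": 45, "prediction": ">50K"},
    {"sex": "Male", "age": 52, "prediction": ">50K"},
    {"sex": "Male", "age": 35, "prediction": "<=50K"},
    {"sex": "Male", "age": 60, "prediction": ">50K"},
    {"sex": "Female", "age": 25, "prediction": "<=50K"},
    {"sex": "Female", "age": 29, "prediction": ">50K"},
    {"sex": "Female", "age": 44, "prediction": "<=50K"},
    {"sex": "Female", "age": 20, "prediction": "<=50K"},
]

sex_score = fairness_value(records, "sex", ["Male"], ["Female"], [">50K"])
age_score = fairness_value(records, "age", [[41, 75]], [[18, 33]], [">50K"])
print(sex_score, age_score)  # → 33.333 44.444
```

With an 80% lower limit, both made-up scores would trigger an alert. Note that age 35 falls in neither range, so that record counts toward neither group.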

### Create Fairness Monitor Instance

In [57]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "features": [
        {
            "feature": "sex",
            "majority": ["Male"],
            "minority": ["Female"]
        },
        {
            "feature": "age",
            "majority": [[41,75]],
            "minority": [[18,33]]
        }
    ],
    "favourable_class": [">50K"],
    "unfavourable_class": ["<=50K"],
    "min_records": 1000
}
thresholds = [
    {
        "metric_id": "fairness_value",
        "specific_values": [
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "sex",
                        "key": "feature"
                    }
                ],
                "value": 80
            },
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "age",
                        "key": "feature"
                    }
                ],
                "value": 80
            }
        ],
        "type": "lower_limit",
        "value": 80
    }
]
fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result
fairness_monitor_instance_id = fairness_monitor_details.metadata.id




 Waiting for end of monitor instance creation 841a81b1-b4cf-4e98-87de-1d374d43df63 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




### Get Fairness Monitor Instance

In [58]:
wos_client.monitor_instances.show()

0,1,2,3,4,5,6
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,fairness,2024-08-06 13:26:36.564000+00:00,841a81b1-b4cf-4e98-87de-1d374d43df63
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,model_health,2024-08-06 13:26:10.720000+00:00,0f6bfa05-d2f6-4286-8714-827d711d22aa
00000000-0000-0000-0000-000000000000,active,c274e76c-6bb9-43ce-ba64-127a3db95ede,subscription,performance,2024-08-06 13:26:11.762000+00:00,da29a599-b38d-4f70-a562-43e371fc4e87
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,fairness,2024-06-14 03:25:06.580000+00:00,a7160110-6133-4635-8534-6478e96e1fbf
00000000-0000-0000-0000-000000000000,active,9acf1d51-2958-44f8-ade9-d6d7663784ef,subscription,model_health,2024-06-20 13:24:56.801000+00:00,309f5342-a9bd-4bbb-ab22-a109c8deb35c
00000000-0000-0000-0000-000000000000,active,cb83c8bf-9e3b-42bf-9447-c4fb2706b56e,subscription,model_health,2024-06-29 03:22:11.907000+00:00,a1e95409-191f-4382-9128-77fe7f9349db
00000000-0000-0000-0000-000000000000,active,cc0a7062-56ef-4feb-91aa-33764217ad77,subscription,model_health,2024-06-20 14:21:30.390000+00:00,7dcdf7ac-7cba-4a07-914f-b6a58b499d6a
00000000-0000-0000-0000-000000000000,active,1b8ebb3c-e69a-4cc1-9a0a-57424546cab3,subscription,model_health,2024-07-01 17:17:45.382000+00:00,1d0924c1-7f1a-419c-b2bc-bff1f9a5d5d0
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,mrm,2024-06-14 03:16:37.493000+00:00,87fce793-9360-4782-8b02-1f83bd4b8b2f
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,model_health,2024-06-14 03:16:37.532000+00:00,bf46efe7-3db2-4627-92db-e4119cb9bbe0


Note: First 10 records were displayed.
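The `thresholds` list passed to `monitor_instances.create` earlier repeats the same block for every monitored feature. A hypothetical helper (`build_fairness_thresholds`, not part of the client) can generate the identical structure from a list of feature names, which keeps the features in `parameters` and in `thresholds` from drifting apart. It assumes one shared lower limit for all features, as in this notebook.

```python
def build_fairness_thresholds(features, lower_limit=80):
    """Build a 'fairness_value' lower-limit threshold with one feature-specific entry per feature."""
    return [{
        "metric_id": "fairness_value",
        "specific_values": [
            {
                # Tag each entry with the feature it applies to
                "applies_to": [{"type": "tag", "value": feature, "key": "feature"}],
                "value": lower_limit,
            }
            for feature in features
        ],
        "type": "lower_limit",
        "value": lower_limit,
    }]


generated = build_fairness_thresholds(["sex", "age"])
print(generated[0]["specific_values"][1]["applies_to"][0]["value"])  # → age
```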


### Get run details
For a production subscription, the initial monitoring run is triggered internally. The following cell checks its status.

In [59]:
runs = wos_client.monitor_instances.list_runs(fairness_monitor_instance_id, limit=1).result.to_dict()
fairness_monitoring_run_id = runs["runs"][0]["metadata"]["id"]
run_status = None
while(run_status not in ["finished", "error"]):
    run_details = wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()
    run_status = run_details["entity"]["status"]["state"]
    print('run_status: ', run_status)
    if run_status in ["finished", "error"]:
        break
    time.sleep(10)

run_status:  finished
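The loop above polls until the run reaches `finished` or `error`, but it would spin forever if the run got stuck. A minimal sketch of the same polling with a timeout; `wait_for_run` is a hypothetical helper, and `get_state` is any zero-argument callable returning the current state string.

```python
import time


def wait_for_run(get_state, terminal_states=("finished", "error"), interval=10, timeout=600):
    """Poll get_state() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        print("run_status:", state)
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("run still '{}' after {} seconds".format(state, timeout))
        time.sleep(interval)


# In the notebook, get_state could wrap the call used above:
# wait_for_run(lambda: wos_client.monitor_instances.get_run_details(
#     fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()["entity"]["status"]["state"])

# Demo with a fake sequence of states
fake_states = iter(["queued", "running", "finished"])
final_state = wait_for_run(lambda: next(fake_states), interval=0)
```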


### Fairness run output

In [60]:
wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()

{'metadata': {'id': '6c53aa46-439a-4008-bbfc-165132dff3ff',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:run:6c53aa46-439a-4008-bbfc-165132dff3ff',
  'url': '/v2/monitor_instances/841a81b1-b4cf-4e98-87de-1d374d43df63/runs/6c53aa46-439a-4008-bbfc-165132dff3ff',
  'created_at': '2024-08-06T13:26:37.927000Z',
  'created_by': 'internal-service'},
 'entity': {'triggered_by': 'user',
  'parameters': {'is_group_bias_completed': True,
   'measurement_id': '0ac72f74-aa61-4412-84fd-718b5e746fa6',
   'total_records_processed': 1000},
  'status': {'state': 'finished',
   'queued_at': '2024-08-06T13:26:37.919000Z',
   'started_at': '2024-08-06T13:26:38.779000Z',
   'updated_at': '2024-08-06T13:26:40.985000Z',
   'completed_at': '2024-08-06T13:26:40.933000Z',
   'message': 'bias run is successful.',
   'operators': []}}}

In [61]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-08-06 13:26:40.688183+00:00,fairness_value,0ac72f74-aa61-4412-84fd-718b5e746fa6,18.377,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,6c53aa46-439a-4008-bbfc-165132dff3ff,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede
2024-08-06 13:26:40.688183+00:00,fairness_value,0ac72f74-aa61-4412-84fd-718b5e746fa6,38.577,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,6c53aa46-439a-4008-bbfc-165132dff3ff,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede


In [62]:
FAIRNESS_DASHBOARD_URL = WOS_CREDENTIALS["url"] + "/aiopenscale/insights/{0}/fairness/age?features=fairnessv2,indirect_bias,v2transaction".format(deployment_uid)

In [63]:
from IPython.display import Markdown as md
md("#### Link to IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

#### Link to IBM Watson OpenScale Fairness Dashboard: https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/aiopenscale/insights/038b8c68-6615-4c0d-8d46-15783d6ab9c8/fairness/age?features=fairnessv2,indirect_bias,v2transaction

### Run on-demand Fairness
To perform an on-demand fairness check, we first need to score a fresh set of data with meta-fields, so that these fields can be used for indirect bias checking. The next two cells score the data and verify that the records have reached the payload logging table.

In [64]:
payload_logging(no_of_records_to_score = 1000)

Number of records in the payload logging table: 2000


In [65]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

Number of records in the payload logging table: 2000
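Note that the `pl_records_count == 0` check cannot tell whether the new batch arrived, because the table already held 1000 records from the earlier scoring. A sketch that instead waits for the count to grow by the expected number of records; `wait_for_new_records` is a hypothetical helper, and `get_count` stands for a callable such as `lambda: wos_client.data_sets.get_records_count(payload_data_set_id)`. It polls because payload logging appears to be asynchronous (the notebook sleeps before counting).

```python
import time


def wait_for_new_records(get_count, previous_count, expected_new, interval=5, timeout=120):
    """Poll get_count() until at least expected_new records were added on top of previous_count."""
    target = previous_count + expected_new
    deadline = time.monotonic() + timeout
    count = get_count()
    while count < target:
        if time.monotonic() >= deadline:
            raise TimeoutError("only {} of {} expected records after {} s".format(count, target, timeout))
        time.sleep(interval)
        count = get_count()
    return count


# In the notebook, capture the count before scoring:
# before = wos_client.data_sets.get_records_count(payload_data_set_id)
# payload_logging(no_of_records_to_score=1000)
# wait_for_new_records(lambda: wos_client.data_sets.get_records_count(payload_data_set_id), before, 1000)

# Demo with a fake counter standing in for get_records_count
fake_counts = iter([1000, 1600, 2000])
new_total = wait_for_new_records(lambda: next(fake_counts), previous_count=1000, expected_new=1000, interval=0)
print(new_total)  # → 2000
```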


### Trigger fairness monitoring run

In [66]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=fairness_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run 1e2008e1-3cf1-4876-ae20-a10401ba8ae1 




finished

---------------------------
 Successfully finished run 
---------------------------




### Check the fairness metrics

In [67]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-08-06 13:27:03.983373+00:00,fairness_value,5826be95-e692-4c13-ab6e-274f08b33563,18.377,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,1e2008e1-3cf1-4876-ae20-a10401ba8ae1,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede
2024-08-06 13:27:03.983373+00:00,fairness_value,5826be95-e692-4c13-ab6e-274f08b33563,38.577,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,1e2008e1-3cf1-4876-ae20-a10401ba8ae1,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede
2024-08-06 13:26:40.688183+00:00,fairness_value,0ac72f74-aa61-4412-84fd-718b5e746fa6,18.377,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,6c53aa46-439a-4008-bbfc-165132dff3ff,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede
2024-08-06 13:26:40.688183+00:00,fairness_value,0ac72f74-aa61-4412-84fd-718b5e746fa6,38.577,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,841a81b1-b4cf-4e98-87de-1d374d43df63,6c53aa46-439a-4008-bbfc-165132dff3ff,subscription,c274e76c-6bb9-43ce-ba64-127a3db95ede
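Each row above carries the fairness value, the threshold (80.0) and tags identifying the feature and the monitored group. The small sketch below flags every group that falls below the lower limit. It operates on values copied by hand from the latest run shown above, not on the client's response objects, whose exact structure is not shown here; `violations` is a hypothetical helper.

```python
def violations(metric_rows, lower_limit=80.0):
    """Return (feature, feature_value, value) for each fairness_value row below the lower limit."""
    flagged = []
    for value, tags in metric_rows:
        # Tags look like 'feature:sex' or 'feature_value:Female'
        tag_map = dict(tag.split(":", 1) for tag in tags)
        if value < lower_limit:
            flagged.append((tag_map.get("feature"), tag_map.get("feature_value"), value))
    return flagged


# Values and tags copied from the latest run shown above
latest_run = [
    (18.377, ['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']),
    (38.577, ['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']),
]
for feature, group, value in violations(latest_run):
    print("{} = {}: fairness value {} is below 80".format(feature, group, value))
```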


In [68]:
from IPython.display import Markdown as md
md("#### To view the latest evaluation of the fairness check, please visit IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

#### To view the latest evaluation of the fairness check, please visit IBM Watson OpenScale Fairness Dashboard: https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/aiopenscale/insights/038b8c68-6615-4c0d-8d46-15783d6ab9c8/fairness/age?features=fairnessv2,indirect_bias,v2transaction

# Active debiasing

In [69]:
no_of_records_to_score = 200
payload_scoring, scoring_response = sample_scoring(no_of_records_to_score)

Single record scoring result: 
 fields: ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status', 'workclass_IX', 'workclassclassVec', 'education_IX', 'educationclassVec', 'Marital_IX', 'MaritalclassVec', 'occupation_IX', 'occupationclassVec', 'relationship_IX', 'relationshipclassVec', 'citizen_status_IX', 'citizen_statusclassVec', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States', 4.0, [9, [4], [1.0]], 2.0, [16, [2], [1.0]], 1.0, [7, [1], [1.0]], 3.0, [15, [3], [1.0]], 1.0, [6, [1], [1.0]], 0.0, [42, [0], [1.0]], [100, [4, 11, 26, 35, 48, 53, 95, 96, 97, 99], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 77516.0, 13.0, 2174.0, 40.0]], [17.120135477812976, 2.879864522187024], [0.8560067738906488, 0.1439932261093512], 0.0, '<=50K']
{"predictions": [{"fields": ["

### List the original model predictions

In [70]:
# for i in range(no_of_records_to_score):
#     print(scoring_response['predictions'][0]['values'][i][-1:][0])

## Get the token for calling the OpenScale API

In [71]:
import requests
import urllib3
from http import HTTPStatus

def get_iamtoken(url, username, password):
    fqdn = urllib3.util.parse_url(url).netloc
    domain = '.'.join(fqdn.split('.')[1:])
    token_url = 'https://cp-console.{}/idprovider/v1/auth/identitytoken'.format(domain)
    data = {
        'grant_type': 'password',
        'username': username,
        'password': password,
        'scope': 'openid'
    }
    return requests.post(token_url, data, verify=False)

def get_accesstoken(url, username, iamtoken):
    url = '{}/v1/preauth/validateAuth'.format(url)
    headers = {
        'Content-type': 'application/json',
        'username': username,
        'iam-token': iamtoken
    }
    return requests.get(url, headers=headers, verify=False)

def generate_access_token():
    url = WOS_CREDENTIALS['url']
    username = WOS_CREDENTIALS['username']
    password = WOS_CREDENTIALS['password']
    response = get_iamtoken(url, username, password)
    # The identity-token service is not available when iamintegration=false,
    # so fall back to the older validateAuth flow with basic authentication
    if response.status_code == HTTPStatus.SERVICE_UNAVAILABLE:
        url = '{}/v1/preauth/validateAuth'.format(url)
        headers = {'Content-type': 'application/json'}
        return requests.get(url, headers=headers, auth=(username, password), verify=False).json()['accessToken']

    # Otherwise exchange the IAM access token for a platform access token
    return get_accesstoken(url, username, response.json()['access_token']).json()['accessToken']

access_token = generate_access_token()
print(access_token)

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6ImM1NG4yLWRFSU9NS0ZsNHUwZFZyaW5VcE1EazdRSFdPV2h2Y19oQnpleW8ifQ.eyJ1c2VybmFtZSI6ImNwYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJtYW5hZ2VfY2F0YWxvZyIsImFkZF9jYXRhbG9nX2Fzc2V0c190b19wcm9qZWN0cyIsImFkbWluaXN0cmF0b3IiLCJjYW5fcHJvdmlzaW9uIiwibW9uaXRvcl9wbGF0Zm9ybSIsImNvbmZpZ3VyZV9wbGF0Zm9ybSIsInZpZXdfcGxhdGZvcm1faGVhbHRoIiwiY29uZmlndXJlX2F1dGgiLCJtYW5hZ2VfdXNlcnMiLCJtYW5hZ2VfZ3JvdXBzIiwibWFuYWdlX3NlcnZpY2VfaW5zdGFuY2VzIiwibWFuYWdlX3ZhdWx0c19hbmRfc2VjcmV0cyIsInNoYXJlX3NlY3JldHMiLCJhZGRfdmF1bHRzIiwibWFuYWdlX2xvY2F0aW9ucyIsIm1hbmFnZV9kYXRhX3BsYW5lcyIsInZpZXdfbG9jYXRpb25zIiwidmlld19kYXRhX3BsYW5lcyIsImNyZWF0ZV9wcm9qZWN0IiwiY3JlYXRlX3NwYWNlIl0sImdyb3VwcyI6WzEwMDAwXSwic3ViIjoiY3BhZG1pbiIsImlzcyI6IktOT1hTU08iLCJhdWQiOiJEU1giLCJ1aWQiOiIxMDAwMzMxMDAxIiwiYXV0aGVudGljYXRvciI6ImV4dGVybmFsIiwiaWFtIjp7ImFjY2Vzc1Rva2VuIjoiZTFkMTNkMDVmMDIxMGZiZTJmNWM3NjE5ZDIwODQyYmZiOWNlYmQzNTdhZmFjYzUxZjBlMWYwNTdlNmUwYmFiNDA1Nzg5ZmUwMDY1NjYwODMzY2ZmM2ZlMTQ3MDI1OGRiYWZjMTg5O
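`get_iamtoken` assumes the Cloud Pak console route (`cp-console.`) lives on the cluster's domain, that is, the WOS host name with its first label removed. The sketch below isolates that derivation so the URL can be checked before any request is sent. It uses the standard library's `urllib.parse` in place of `urllib3`, and `identity_token_url` is a hypothetical helper; the example host is the one printed elsewhere in this notebook. If your console route differs, this assumption does not hold.

```python
from urllib.parse import urlparse


def identity_token_url(cluster_url):
    """Mirror get_iamtoken(): drop the first host label and prefix 'cp-console.'."""
    fqdn = urlparse(cluster_url).netloc
    domain = '.'.join(fqdn.split('.')[1:])
    return 'https://cp-console.{}/idprovider/v1/auth/identitytoken'.format(domain)


print(identity_token_url("https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com"))
# → https://cp-console.apps.wos415nfs2672.cp.fyre.ibm.com/idprovider/v1/auth/identitytoken
```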

In [72]:
import json

DEBIASING_PREDICTIONS_URL = WOS_CREDENTIALS['url'] + "/openscale/{0}/v2/subscriptions/{1}/predictions".format(data_mart_id, subscription_id)
print(DEBIASING_PREDICTIONS_URL)

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer {}".format(access_token)
}

# Send the same input records that were scored against the WML deployment
debiased_scoring_payload = payload_scoring['input_data'][0]
print('\n>>>>>>>>>>>>>>>\n')
print(debiased_scoring_payload)
print('\n>>>>>>>>>>>>>>>\n')

response = requests.post(DEBIASING_PREDICTIONS_URL, data=json.dumps(debiased_scoring_payload), headers=headers, verify=False)
# Fail early with the HTTP error instead of a KeyError when reading the response later
response.raise_for_status()

https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/openscale/00000000-0000-0000-0000-000000000000/v2/subscriptions/c274e76c-6bb9-43ce-ba64-127a3db95ede/predictions

>>>>>>>>>>>>>>>

{'fields': ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status'], 'values': [['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States'], ['Self-emp-not-inc', 83311, 'Bachelors', 13, 'Married-civ-spouse', 'Exec-managerial', 'Husband', 0, 0, 13, 'United-States'], ['Private', 215646, 'HS-grad', 9, 'Divorced', 'Handlers-cleaners', 'Not-in-family', 0, 0, 40, 'United-States'], ['Private', 234721, '11th', 7, 'Married-civ-spouse', 'Handlers-cleaners', 'Husband', 0, 0, 40, 'United-States'], ['Private', 338409, 'Bachelors', 13, 'Married-civ-spouse', 'Prof-specialty', 'Wife', 0, 0, 40, 'Cuba'], ['Private', 284582, 'Masters', 14, 'Married-civ-spouse', 'Exec-ma
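The predictions endpoint receives the same `fields`/`values` layout used for WML scoring: one list of column names, and one list of rows with each row ordered like `fields`. A sketch of a hypothetical helper, `to_scoring_payload`, that builds that layout from a list of per-record dicts, which can be easier to assemble by hand:

```python
import json


def to_scoring_payload(records, fields):
    """Convert a list of dicts into {'fields': [...], 'values': [[...], ...]} with a fixed column order."""
    return {"fields": list(fields), "values": [[record[f] for f in fields] for record in records]}


example_fields = ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation',
                  'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status']
# First record of the payload printed above
example_record = {'workclass': 'State-gov', 'fnlwgt': 77516, 'education': 'Bachelors', 'education-num': 13,
                  'Marital': 'Never-married', 'occupation': 'Adm-clerical', 'relationship': 'Not-in-family',
                  'capitalgain': 2174, 'loss': 0, 'hoursper': 40, 'citizen_status': 'United-States'}
payload = to_scoring_payload([example_record], example_fields)
print(json.dumps(payload))
```

A missing column raises `KeyError`, which surfaces schema mistakes before the request is sent.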

## List the predictions where the debiased prediction differs from the original model prediction

In [73]:
# Parse the JSON response once instead of on every loop iteration
debiasing_result = response.json()
predictedLabel_index = debiasing_result['fields'].index('predictedLabel')
debiased_prediction_index = debiasing_result['fields'].index('debiased_prediction')

for scored_record in debiasing_result['values'][:no_of_records_to_score]:
    predictedLabel = scored_record[predictedLabel_index]
    debiased_prediction = scored_record[debiased_prediction_index]
    # Show only the records whose prediction was changed by active debiasing
    if predictedLabel != debiased_prediction:
        print('==========')
        print(scored_record)
        print('predictedLabel:' + str(predictedLabel) + ', debiased_prediction:' + str(debiased_prediction))
        print('==========')
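Printing every differing record works for 200 rows; for a quick overview, the changes can also be tallied by (original, debiased) label pair. The sketch below assumes only the `fields`/`values` layout and the `predictedLabel` and `debiased_prediction` column names used above. `summarize_debiasing` is a hypothetical helper, and the example response is made up.

```python
from collections import Counter


def summarize_debiasing(response_json, limit=None):
    """Count (original, debiased) label pairs where the debiased prediction differs."""
    fields = response_json['fields']
    original_idx = fields.index('predictedLabel')
    debiased_idx = fields.index('debiased_prediction')
    rows = response_json['values'][:limit]
    return Counter((row[original_idx], row[debiased_idx])
                   for row in rows
                   if row[original_idx] != row[debiased_idx])


# Hypothetical response shaped like the one returned above (columns trimmed for brevity)
example_response = {
    'fields': ['hoursper', 'predictedLabel', 'debiased_prediction'],
    'values': [
        [40, '<=50K', '<=50K'],
        [13, '<=50K', '>50K'],
        [50, '>50K', '>50K'],
        [45, '<=50K', '>50K'],
    ],
}
print(summarize_debiasing(example_response))  # → Counter({('<=50K', '>50K'): 2})
```

In the notebook, `summarize_debiasing(response.json(), no_of_records_to_score)` would apply it to the real response.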

## Additional data to help debugging

In [74]:
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))
print("OpenScale Datamart id: {}".format(data_mart_id))
print("OpenScale Subscription id: {}".format(subscription_id))
print("OpenScale Fairness Monitor Instance id: {}".format(fairness_monitor_instance_id))
print("OpenScale Fairness Monitoring Run id: {}".format(fairness_monitoring_run_id))

Model id: eed8b225-19e5-4460-9b6f-7271dc1e3ff2
Deployment id: 038b8c68-6615-4c0d-8d46-15783d6ab9c8
OpenScale Datamart id: 00000000-0000-0000-0000-000000000000
OpenScale Subscription id: c274e76c-6bb9-43ce-ba64-127a3db95ede
OpenScale Fairness Monitor Instance id: 841a81b1-b4cf-4e98-87de-1d374d43df63
OpenScale Fairness Monitoring Run id: 6c53aa46-439a-4008-bbfc-165132dff3ff


## Conclusion

As part of this notebook, we performed the following tasks:

- Created and trained an income classification model. We made sure to remove the sensitive attributes (age, sex and race) while training the model.
- Identified a space to be associated with the model and its deployment.
- Deployed the model to the space and scored it with additional meta fields.
- Configured OpenScale and subscribed the deployment.
- Configured fairness on the meta fields (sensitive attributes) age and sex.
- Ran the fairness monitor.
- Noticed that indirect bias exists against the age and sex attributes (fairness values of 38.577 and 18.377, both below the 80 threshold), as can be visualized in the OpenScale dashboard.
- Ran an on-demand evaluation of the fairness monitor as well.
- Called the active debiasing API, also known as the OpenScale predictions API, and noticed that, among the scored records, there are some for which the debiased prediction differs from the original prediction.
- The above step shows that OpenScale is able to debias the model predictions even with respect to the meta (sensitive) attributes.

That's all for now. Thank you!