<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run with a **Python 3.7** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.7 in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning 
  * DB2

  
The notebook will train, create and deploy a model, configure OpenScale to monitor that deployment for fairness on protected attributes, and call the active debiasing API. The results can be viewed in the OpenScale Insights dashboard.

# Setup <a name="setup"></a>

## Package installation

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install --upgrade pyspark==2.4 --no-cache | tail -n 1

!pip install --upgrade pandas==0.25.3 --no-cache | tail -n 1
!pip install --upgrade requests==2.23 --no-cache | tail -n 1
!pip install numpy==1.16.4 --no-cache | tail -n 1
!pip install SciPy --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1
!pip install ibm-cloud-sdk-core --no-cache | tail -n 1

!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D or Cloud Object Storage (COS))
- SCHEMA_NAME

In [None]:
#masked
WOS_CREDENTIALS = {
    "url": "Cluster host name",
    "username": "XX",
    "password": "XX",
    "version": "3.5"
}

In [None]:
#masked
WML_CREDENTIALS = {
                   "url": "Cluster host name",
                   "username": "XX",
                   "password" : "XX",
                   "instance_id": "wml_local",
                   "version" : "3.5"
                  }

In [None]:
#masked
#IBM DB2 database connection format example
DATABASE_CREDENTIALS = {
    "hostname":"9.999.999.99",
    "username":"XX",
    "password":"XX",
    "database":"SAMPLE",
    "port":"50000"
}

### Action: put created schema name below.

In [None]:
SCHEMA_NAME = 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'

## Save training data to Cloud Object Storage

### Cloud object storage details

In the next cells, you will need to paste credentials for Cloud Object Storage (COS). If you haven't worked with COS yet, please visit the getting started with COS tutorial. You can find the COS_API_KEY_ID and COS_RESOURCE_CRN values under Service Credentials in the menu of your COS instance. The COS service credentials must be created with the Role parameter set to Writer. The training data file is later loaded into a bucket of your instance and used as the training data reference in the subscription. The COS_ENDPOINT value can be found in the Endpoint field of the menu.

In [None]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

In [None]:
# masked
COS_API_KEY_ID = "***"
COS_RESOURCE_CRN = "***" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
BUCKET_NAME = "testcasebucket"
FILE_NAME = "Indirect_bias_AdultCensusdata.csv"

# Load and explore data

In [None]:
!rm -f Indirect_bias_AdultCensusdata.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/adult_census/Indirect_bias_AdultCensusdata.csv
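
The section above collects the Cloud Object Storage settings, but no cell in this notebook performs the upload itself. A minimal sketch of that step follows, assuming the `ibm-cos-sdk` package (imported as `ibm_boto3`) is installed, which the package list above does not include. The helper names `build_cos_client_kwargs` and `upload_training_data` are hypothetical, and the values passed at the bottom are placeholders.

```python
def build_cos_client_kwargs(api_key_id, resource_crn, endpoint, iam_url):
    """Collect the keyword arguments for ibm_boto3.client (hypothetical helper)."""
    return {
        "service_name": "s3",
        "ibm_api_key_id": api_key_id,
        "ibm_service_instance_id": resource_crn,
        "ibm_auth_endpoint": iam_url,
        "endpoint_url": endpoint,
    }


def upload_training_data(client_kwargs, bucket_name, file_name):
    """Upload a local file to the bucket, using the file name as the object key."""
    # Imported lazily so this cell can be defined even where ibm-cos-sdk is absent.
    import ibm_boto3
    from ibm_botocore.client import Config

    cos_client = ibm_boto3.client(config=Config(signature_version="oauth"), **client_kwargs)
    cos_client.upload_file(Filename=file_name, Bucket=bucket_name, Key=file_name)


# Placeholder values for illustration only; in the notebook, pass
# COS_API_KEY_ID, COS_RESOURCE_CRN, COS_ENDPOINT and IAM_URL instead.
example_kwargs = build_cos_client_kwargs(
    "***",
    "***",
    "https://s3.us.cloud-object-storage.appdomain.cloud",
    "https://iam.ng.bluemix.net/oidc/token",
)
print(sorted(example_kwargs))
```

In the notebook, the call would be `upload_training_data(build_cos_client_kwargs(COS_API_KEY_ID, COS_RESOURCE_CRN, COS_ENDPOINT, IAM_URL), BUCKET_NAME, FILE_NAME)`. The bucket is assumed to exist already.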

## Explore data

In [None]:
from pyspark.sql import SparkSession
import json

spark = SparkSession.builder.getOrCreate()
df_data = spark.read.csv(path="Indirect_bias_AdultCensusdata.csv", sep=",", header=True, inferSchema=True) 
df_data.head()

In [None]:
print("Number of records: " + str(df_data.count()))

# Create a model

In [None]:
# spark_df = sqlCtx.createDataFrame(df_data)
spark_df = df_data
# Remove protected attributes from training data
protected_attributes = ["race", "age", "sex"]
for attr in protected_attributes:
    spark_df = spark_df.drop(attr)
columns = spark_df.columns
model_name = "Adult Census Income Classifier Model"
deployment_name = "Adult Census Income Classifier Deployment"

spark_df.printSchema()

In [None]:
from pyspark.ml.feature import OneHotEncoderEstimator, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml import Pipeline, Model

cat_features = ['workclass', 'education', 'Marital', 'occupation', 'relationship', 'citizen_status'] 
num_features = ["fnlwgt", "education-num", "capitalgain", "loss", "hoursper"]
stages=[]

for feature in cat_features:
    string_indexer = StringIndexer(inputCol = feature, outputCol = feature + '_IX').setHandleInvalid("keep")
    encoder = OneHotEncoderEstimator(inputCols=[string_indexer.getOutputCol()], outputCols=[feature + "classVec"])
    stages += [string_indexer, encoder]

si_Label = StringIndexer(inputCol="label", outputCol="encoded_label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)
stages.append(si_Label)

In [None]:
assembler_inputs = [c + "classVec" for c in cat_features] + num_features
va_features = VectorAssembler(inputCols=assembler_inputs, outputCol="features")
stages.append(va_features)

In [None]:
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)
print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

In [None]:
train_data.columns

In [None]:
from pyspark.ml.classification import GBTClassifier, DecisionTreeClassifier, RandomForestClassifier
classifier = RandomForestClassifier(labelCol="encoded_label", featuresCol="features")
stages.append(classifier)
stages.append(label_converter)
pipeline = Pipeline(stages=stages)
model = pipeline.fit(train_data)

In [None]:
predictions = model.transform(test_data)
predictions.printSchema()
predictions.head()

In [None]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator
# BinaryClassificationEvaluator reports the area under the ROC curve by default, not accuracy
evaluatorDT = BinaryClassificationEvaluator(labelCol="encoded_label", rawPredictionCol="rawPrediction")
area_under_roc = evaluatorDT.evaluate(predictions)

print("Area under ROC = %g" % area_under_roc)

# Save and deploy the model

In [None]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

In [None]:
wml_client.spaces.list(limit=10)

## Find the space to associate with the model created and deployed in this notebook, and specify its ID in the next cell

In [None]:
WML_SPACE_ID='***' # use space id here
wml_client.set.default_space(WML_SPACE_ID)

In [None]:
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == deployment_name:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

In [None]:
training_data_references = [
                {
                    "id": "product line",
                    "type": "s3",
                    "connection": {
                        "access_key_id": COS_API_KEY_ID,
                        "endpoint_url": COS_ENDPOINT,
                        "resource_instance_id":COS_RESOURCE_CRN
                    },
                    "location": {
                        "bucket": BUCKET_NAME,
                        "path": FILE_NAME,
                    }
                }
            ]

In [None]:
software_spec_uid = wml_client.software_specifications.get_id_by_name("spark-mllib_2.4")
print("Software Specification ID: {}".format(software_spec_uid))
model_props = {
        wml_client._models.ConfigurationMetaNames.NAME:"{}".format(model_name),
        wml_client._models.ConfigurationMetaNames.SPACE_UID: WML_SPACE_ID,
        wml_client._models.ConfigurationMetaNames.TYPE: "mllib_2.4",
        wml_client._models.ConfigurationMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
        wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: training_data_references,
        wml_client._models.ConfigurationMetaNames.LABEL_FIELD: "label",
    }

In [None]:
print("Storing model ...")
published_model_details = wml_client.repository.store_model(
    model=model, 
    meta_props=model_props, 
    training_data=train_data, 
    pipeline=pipeline)

model_uid = wml_client.repository.get_model_uid(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

In [None]:
published_model_details

## Create a model deployment

In [None]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(deployment_name),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid=wml_client.deployments.get_uid(deployment_details)

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

# Construct the scoring payload

In [None]:
import pandas as pd

df = pd.read_csv("Indirect_bias_AdultCensusdata.csv")
df.head()

## Remove the sensitive attributes

In [None]:
cols_to_remove = ['label']
cols_to_remove.extend(protected_attributes)
cols_to_remove

## Create the meta data frame capturing the sensitive data

In [None]:
meta_df = df[protected_attributes].copy()
meta_fields = meta_df.columns.tolist()
meta_values = meta_df[meta_fields].values.tolist()

## Construct the scoring payload comprising the meta fields

In [None]:
def get_scoring_payload(no_of_records_to_score = 1):
    meta_payload = {
        "fields": meta_fields,
        "values": meta_values[:no_of_records_to_score]
    }

    for col in cols_to_remove:
        if col in df.columns:
            del df[col] 

    fields = df.columns.tolist()
    values = df[fields].values.tolist()

    payload_scoring = {"input_data": [{"fields": fields, "values": values[:no_of_records_to_score],"meta": meta_payload}]}  
    return payload_scoring
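
To make the shape of this payload concrete, here is a sketch that builds the same structure from two made-up records, without pandas. The function name `build_payload` and all record values are illustrative assumptions, not taken from the dataset.

```python
import json


def build_payload(fields, rows, meta_fields, meta_rows, n=1):
    """Build a scoring payload whose meta section carries the protected attributes."""
    return {
        "input_data": [{
            "fields": fields,
            "values": rows[:n],
            # Row i of the meta values describes row i of the scored values.
            "meta": {"fields": meta_fields, "values": meta_rows[:n]},
        }]
    }


# Two made-up records, trimmed to three feature columns for brevity.
fields = ["workclass", "education", "hoursper"]
rows = [["Private", "Bachelors", 40], ["State-gov", "HS-grad", 35]]
meta_fields = ["race", "age", "sex"]
meta_rows = [["White", 39, "Male"], ["Black", 28, "Female"]]

payload = build_payload(fields, rows, meta_fields, meta_rows, n=2)
print(json.dumps(payload, indent=2))
```

The model never sees the `meta` columns as features; they travel alongside each scored row so that the fairness monitor can evaluate the protected attributes.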

## Method to perform scoring

In [None]:
def sample_scoring(no_of_records_to_score = 1):
    records_list=[]
    payload_scoring = get_scoring_payload(no_of_records_to_score)
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])
    print(json.dumps(scoring_response, indent=None))
    return payload_scoring, scoring_response

In [None]:
import time
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
def payload_logging(no_of_records_to_score = 1):
    records_list=[]
    payload_scoring = get_scoring_payload(no_of_records_to_score)
    
    
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))
    if pl_records_count == 0:
        print("Payload logging did not happen, performing explicit payload logging.")
    
        #manual PL logging if automated logging does not work
        score_input=payload_scoring['input_data'][0]
        score_response=scoring_response['predictions'][0]
        pl_record = PayloadRecord(request=score_input, response=score_response, response_time=int(460))
        records_list.append(pl_record)
        wos_client.data_sets.store_records(data_set_id = payload_data_set_id, request_body=records_list)
        
        
        time.sleep(5)
        pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
        print("Number of records in the payload logging table: {}".format(pl_records_count))

## Score the model and print the scoring response

In [None]:
sample_scoring(no_of_records_to_score = 1)

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [None]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [None]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator)
wos_client.version

## Create datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If database credentials were not supplied above, the notebook will use the free, internal lite database. If database credentials were supplied, the datamart will be created there unless there is an existing datamart and the KEEP_MY_INTERNAL_POSTGRES variable is set to True. If an OpenScale datamart exists in Db2 or PostgreSQL, the existing datamart will be used and no data will be overwritten.

Prior instances of the model will be removed from OpenScale monitoring.

In [None]:
wos_client.data_marts.show()

In [None]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DATABASE_CREDENTIALS is not None:
        if SCHEMA_NAME is None: 
            print("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.DB2,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DATABASE_CREDENTIALS['hostname'],
                        username=DATABASE_CREDENTIALS['username'],
                        password=DATABASE_CREDENTIALS['password'],
                        db=DATABASE_CREDENTIALS['database'],
                        port=DATABASE_CREDENTIALS['port'],
                        # The SSL keys are optional and absent from the example credentials above
                        ssl=DATABASE_CREDENTIALS.get('ssl'),
                        sslmode=DATABASE_CREDENTIALS.get('sslmode'),
                        certificate_base64=DATABASE_CREDENTIALS.get('certificate_base64')
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))

In [None]:
data_mart_details = wos_client.data_marts.list().result.data_marts[0]
data_mart_details.to_dict()

In [None]:
wos_client.service_providers.show()

## Remove existing service providers connected to the WML instance

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating duplicates for the WML instance used in this tutorial, the following code deletes any existing service provider with the same name before a new one is added.

In [None]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning - Indirect Bias Demo"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase Indirect Bias functionality."

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model.
Note: You can bind more than one engine instance if needed by calling the wos_client.service_providers.add method again. You can then refer to a particular service provider by its service_provider_id.

In [None]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = WML_SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCP4D(
            url=WML_CREDENTIALS["url"],
            username=WML_CREDENTIALS["username"],
            password=WML_CREDENTIALS["password"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id

In [None]:
print(wos_client.service_providers.get(service_provider_id).result)

In [None]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_id=deployment_uid, deployment_space_id = WML_SPACE_ID).result['resources'][0]
asset_deployment_details

In [None]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=deployment_uid,deployment_space_id=WML_SPACE_ID)
#model_asset_details_from_deployment

## Subscriptions

Remove existing subscriptions for this model

This code removes previous subscriptions to the model to refresh the monitors with the new model and new data.

In [None]:
wos_client.subscriptions.show()

## Remove the existing subscription

In [None]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_id = subscription.entity.asset.asset_id
    if sub_model_id == model_uid:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model unique identifier, and some information about the model itself.

In [None]:
feature_columns = cat_features + num_features
feature_columns

In [None]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['entity']['scoring_endpoint']['url']
        ),
        asset_properties=AssetPropertiesRequest(
            label_column="label",
            probability_fields=["probability"],
            prediction_field="predictedLabel",
            feature_fields = feature_columns,
            categorical_fields = cat_features,
            training_data_reference=TrainingDataReference(type="cos",
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = FILE_NAME),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
subscription_id = subscription_details.metadata.id
print('subscription_id: ' + subscription_id)

In [None]:
import time

time.sleep(5)
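payload_data_set_id = None
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                              target_target_id=subscription_id, 
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Guard against an empty list so the check below can actually report a missing data set
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id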
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)

In [None]:
wos_client.data_sets.show()

In [None]:
wos_client.subscriptions.get(subscription_id).result.to_dict()

# Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model.

In [None]:
payload_logging(no_of_records_to_score = 1000)

In [None]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

## Fairness configuration

The code below configures fairness monitoring for our model. It turns on monitoring for two features, sex and age. In each case, we must specify:

- Which feature to monitor (here, a protected attribute sent as a meta field)
- One or more majority groups, which are values of that feature that we expect to receive a higher percentage of favourable outcomes
- One or more minority groups, which are values of that feature that we expect to receive a higher percentage of unfavourable outcomes
- The threshold below which OpenScale should display an alert (in this case, 80%)

Additionally, we must specify which outcomes from the model are favourable and which are unfavourable. We must also provide the number of records OpenScale will use to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 1000 records have been added (the min_records value below). Finally, to calculate fairness, OpenScale must perform some calculations on the training data, which it reads through the training data reference given in the subscription.
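
As a worked illustration of how such a score relates to the 80% threshold, the sketch below computes a disparate impact ratio: the favourable-outcome rate of the minority group divided by that of the majority group. It assumes this definition, treats the age range boundaries as inclusive, and uses made-up predictions. OpenScale's own calculation involves further steps (such as data perturbation) that are not reproduced here.

```python
def favourable_rate(outcomes, favourable=">50K"):
    """Fraction of outcomes equal to the favourable class."""
    return sum(1 for outcome in outcomes if outcome == favourable) / len(outcomes)


def disparate_impact(minority_outcomes, majority_outcomes, favourable=">50K"):
    """Minority favourable rate over majority favourable rate, as a percentage."""
    minority_rate = favourable_rate(minority_outcomes, favourable)
    majority_rate = favourable_rate(majority_outcomes, favourable)
    return 100.0 * minority_rate / majority_rate


def in_ranges(value, ranges):
    """True if value falls in any [low, high] range (inclusive bounds assumed)."""
    return any(low <= value <= high for low, high in ranges)


# Made-up (age, predicted label) pairs, for illustration only.
records = [(25, ">50K"), (30, "<=50K"), (22, "<=50K"), (33, "<=50K"),
           (45, ">50K"), (50, ">50K"), (60, "<=50K"), (41, ">50K")]

minority = [label for age, label in records if in_ranges(age, [[18, 33]])]
majority = [label for age, label in records if in_ranges(age, [[41, 75]])]

score = disparate_impact(minority, majority)
print("Fairness score: {:.1f}%".format(score))
print("Below threshold" if score < 80 else "Within threshold")
```

Here one of four minority records and three of four majority records receive the favourable outcome, so the ratio is 0.25 / 0.75, well below the 80% lower limit.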

### Create Fairness Monitor Instance

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "features": [
        {
            "feature": "sex",
            "majority": ["Male"],
            "minority": ["Female"]
        },
        {
            "feature": "age",
            "majority": [[41,75]],
            "minority": [[18,33]]
        }
    ],
    "favourable_class": [">50K"],
    "unfavourable_class": ["<=50K"],
    "min_records": 1000
}
thresholds = [
    {
        "metric_id": "fairness_value",
        "specific_values": [
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "sex",
                        "key": "feature"
                    }
                ],
                "value": 80
            },
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "age",
                        "key": "feature"
                    }
                ],
                "value": 80
            }
        ],
        "type": "lower_limit",
        "value": 80
    }
]
fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result
fairness_monitor_instance_id =fairness_monitor_details.metadata.id

### Get Fairness Monitor Instance

In [None]:
wos_client.monitor_instances.show()

### Get run details
For a production subscription, the initial monitoring run is triggered internally. The cell below checks its status.

In [None]:
runs = wos_client.monitor_instances.list_runs(fairness_monitor_instance_id, limit=1).result.to_dict()
fairness_monitoring_run_id = runs["runs"][0]["metadata"]["id"]
run_status = None
while(run_status not in ["finished", "error"]):
    run_details = wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()
    run_status = run_details["entity"]["status"]["state"]
    print('run_status: ', run_status)
    if run_status in ["finished", "error"]:
        break
    time.sleep(10)
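
The loop above polls indefinitely, so the cell never returns if a run fails to reach a terminal state. A sketch of the same polling with a timeout follows. The `poll_until` helper is hypothetical, and a stand-in status function replaces the OpenScale call so the example runs on its own.

```python
import time


def poll_until(get_state, terminal_states=("finished", "error"), timeout=600, interval=10):
    """Call get_state() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Run still in state {!r} after {} seconds".format(state, timeout))
        time.sleep(interval)


# Stand-in for the OpenScale call: reports "running" twice, then "finished".
fake_states = iter(["running", "running", "finished"])
final_state = poll_until(lambda: next(fake_states), timeout=5, interval=0.01)
print("run_status:", final_state)
```

In the notebook, `get_state` could wrap the `get_run_details(...)` call shown above and return `run_details["entity"]["status"]["state"]`.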

### Fairness run output

In [None]:
wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

In [None]:
FAIRNESS_DASHBOARD_URL = WOS_CREDENTIALS["url"] + "/aiopenscale/insights/{0}/fairness/age?features=fairnessv2,indirect_bias,v2transaction".format(deployment_uid)

In [None]:
from IPython.display import Markdown as md
md("#### Link to IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

### Run on-demand Fairness
To perform an on-demand fairness check, we first need to score a fresh set of data with meta fields, so that they can be used for indirect bias checking. The next two cells score the records and verify that they have reached the payload logging table.

In [None]:
payload_logging(no_of_records_to_score = 1000)

In [None]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

### Trigger fairness monitoring run

In [None]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=fairness_monitor_instance_id, background_mode=False)

### Check for its status

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

In [None]:
from IPython.display import Markdown as md
md("#### To view the latest evaluation of the fairness check, please visit IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

# Active debiasing

In [None]:
no_of_records_to_score = 200
payload_scoring, scoring_response = sample_scoring(no_of_records_to_score)

### List the original model predictions

In [None]:
# for i in range(no_of_records_to_score):
#     print(scoring_response['predictions'][0]['values'][i][-1:][0])

## Get the token for calling OpenScale API

In [None]:
import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

token_url = WOS_CREDENTIALS['url'] + '/v1/preauth/validateAuth'
headers = {}
headers["Accept"] = "application/json"
auth = HTTPBasicAuth(WOS_CREDENTIALS['username'], WOS_CREDENTIALS['password'])
response = requests.get(token_url, headers=headers, auth=auth, verify=False)
json_data = response.json()
access_token = json_data['accessToken']
access_token

In [None]:
DEBIASING_PREDICTIONS_URL = WOS_CREDENTIALS['url'] + "/openscale/{0}/v2/subscriptions/{1}/predictions".format(data_mart_id,subscription_id)
print(DEBIASING_PREDICTIONS_URL)

headers = {}
headers["Content-Type"] = "application/json"
headers["Accept"] = "application/json"
headers["Authorization"] = "Bearer {}".format(access_token)

debiased_scoring_payload = payload_scoring['input_data'][0]
print('\n>>>>>>>>>>>>>>>\n')
print(debiased_scoring_payload)
print('\n>>>>>>>>>>>>>>>\n')

response = requests.post(DEBIASING_PREDICTIONS_URL, data=json.dumps(debiased_scoring_payload), headers=headers, verify=False)

## Listing those predictions whose original model prediction is different from the debiased prediction

In [None]:
predictedLabel_index = response.json()['fields'].index('predictedLabel')
debiased_prediction_index = response.json()['fields'].index('debiased_prediction')

for j in range(no_of_records_to_score):
    scored_record = response.json()['values'][j]
    predictedLabel = scored_record[predictedLabel_index]
    debiased_prediction = scored_record[debiased_prediction_index]
    if predictedLabel != debiased_prediction:
        print('==========')
        print(scored_record)
        print('predictedLabel:' + str(predictedLabel) + ', debiased_prediction:' + str(debiased_prediction))
        print('==========')
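
The comparison above can also be written as a small function that parses the response once and iterates over the rows actually returned, which avoids an IndexError if fewer rows come back than were requested. This is a sketch: `changed_predictions` is a hypothetical helper, and the sample response is made up to mirror the two columns the cell above relies on.

```python
def changed_predictions(fields, values, original="predictedLabel", debiased="debiased_prediction"):
    """Return rows whose debiased prediction differs from the original model prediction."""
    original_index = fields.index(original)
    debiased_index = fields.index(debiased)
    return [row for row in values if row[original_index] != row[debiased_index]]


# Made-up response body, for illustration only.
sample_response = {
    "fields": ["hoursper", "predictedLabel", "debiased_prediction"],
    "values": [
        [40, "<=50K", "<=50K"],
        [35, "<=50K", ">50K"],
        [50, ">50K", ">50K"],
    ],
}

changed = changed_predictions(sample_response["fields"], sample_response["values"])
print(len(changed), "of", len(sample_response["values"]), "predictions changed")
for row in changed:
    print(row)
```

With the real response, the call would be `body = response.json()` followed by `changed_predictions(body['fields'], body['values'])`.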

## Additional data to help debugging

In [None]:
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))
print("OpenScale Datamart id: {}".format(data_mart_id))
print("OpenScale Subscription id: {}".format(subscription_id))
print("OpenScale Fairness Monitor Instance id: {}".format(fairness_monitor_instance_id))
print("OpenScale Fairness Monitoring Run id: {}".format(fairness_monitoring_run_id))

## Conclusion

As part of this notebook we did the following tasks:

- Created and trained an income classification model. We made sure to remove the sensitive attributes (age, sex and race) before training the model.
- Identified a space to be associated with the model and its deployment.
- Deployed the model to the space and scored it with additional meta fields.
- Configured OpenScale and subscribed the deployment.
- Configured fairness on the meta fields (sensitive attributes) age and sex.
- Ran the fairness monitor.
- Noticed that indirect bias exists against the age attribute, as can be seen in the OpenScale dashboard.
- Did an on-demand evaluation of the fairness monitor as well.
- Called the active debias API (also known as the OpenScale predictions API) and observed that, among the scored records, some have a debiased prediction that differs from the original prediction.
- The step above shows that OpenScale can debias the model's predictions even with respect to the meta (sensitive) attributes.

That's all for now. Thank You!