<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

This notebook should be run in the **Default Spark 3.4 & Python 3.9** or **Python 3.10** runtime environment. 

If you are viewing this in Watson Studio and do not see Python 3.9.x or 3.10.x in the upper right corner of your screen, please update the runtime now.

It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning 
  * DB2
  
The notebook trains and deploys a German Credit Risk model and configures OpenScale to monitor that deployment. It also triggers a model evaluation and retrieves the published facts.

**Note**: The AI Factsheets add-on must be installed on the CPD cluster for the facts to be published and retrieved successfully.

### Contents

- [Setup](#setup)
- [Model Building and Deployment](#model)
- [Configure OpenScale](#openscale)
- [Monitor Configurations](#monitor)
- [MRM Evaluation](#mrm)

# 1. Setup <a name="setup"></a>

## 1.1 Package installation

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Install WML and WOS SDKs

!pip install --upgrade ibm-watson-machine-learning | tail -n 1
!pip install --upgrade ibm-watson-openscale==1.0.359 --no-cache | tail -n 1

In [None]:
# Install pyspark if runtime environment doesn't include Spark

!pip install --upgrade pyspark==3.3.0 --no-cache | tail -n 1

**Note** - Restart the kernel now to use the updated libraries.

## 1.2 Configure credentials

Provide OpenScale, Watson Machine Learning, and DB2 service credentials.

In [2]:
cpd_url = "***"
cpd_username = "***"
cpd_password = "***"

cpd_url = cpd_url.rstrip("/")

WOS_CREDENTIALS = {
    "url": cpd_url,
    "username": cpd_username,
    "password": cpd_password
}

WML_CREDENTIALS = {
    "url": "***",
    "password": "***",
    "username": "***",
    "instance_id": "***",
    "version": "4.7"
}

DB2_CREDENTIALS = {
    "hostname": "***",
    "username": "***",
    "password": "***",
    "database_name": "***",
    "port": 50000,
    "ssl": True
}

# Location details of German Credit Risk training data
TRAINING_DATA_SCHEMA_NAME = "***"
TRAINING_DATA_TABLE_NAME = "***"

# Location details of German Credit Risk evaluation data 
EVALUATION_DATA_SCHEMA_NAME = "***"
EVALUATION_DATA_TABLE_NAME = TEST_DATA_SET_NAME = "***"

# 2. Model Building and Deployment <a name="model"></a>

In this section, you train a Spark MLlib model and then deploy it as a web service using the Watson Machine Learning service.

## 2.1 Load the training and test data from GitHub <a name="model"></a>

In [3]:
# -f: don't fail if the file was never downloaded before
!rm -f german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/Cloud%20Pak%20for%20Data/WML/assets/data/credit_risk/german_credit_data_biased_training.csv
!rm -f credit_risk_test_data.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/Cloud%20Pak%20for%20Data/WML/assets/data/credit_risk/credit_risk_test_data.csv

--2024-09-03 11:21:34--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/Cloud%20Pak%20for%20Data/WML/assets/data/credit_risk/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv’


2024-09-03 11:21:34 (36.5 MB/s) - ‘german_credit_data_biased_training.csv’ saved [689622/689622]



In [4]:
from pyspark.sql import SparkSession
import pandas as pd
import json
import time
spark = SparkSession.builder.getOrCreate()
df_data = spark.read.csv(path="german_credit_data_biased_training.csv", sep=",", header=True, inferSchema=True)
df_data.head()

Row(CheckingStatus='0_to_200', LoanDuration=31, CreditHistory='credits_paid_to_date', LoanPurpose='other', LoanAmount=1889, ExistingSavings='100_to_500', EmploymentDuration='less_1', InstallmentPercent=3, Sex='female', OthersOnLoan='none', CurrentResidenceDuration=3, OwnsProperty='savings_insurance', Age=32, InstallmentPlans='none', Housing='own', ExistingCreditsCount=1, Job='skilled', Dependents=1, Telephone='none', ForeignWorker='yes', Risk='No Risk')

## 2.2 Explore data

In [5]:
df_data.printSchema()

root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)



In [6]:
print("Number of records: " + str(df_data.count()))

Number of records: 5000


### 2.2.1 Storing training and test data in the database
> #### If the training and test data are already stored in the database, skip the next 3 cells.
> ##### Creating a connection to the database

In [7]:
import pandas as pd
import ibm_db

# Creating Connection to Database
dsn_hostname = DB2_CREDENTIALS['hostname']
dsn_uid = DB2_CREDENTIALS["username"]
dsn_pwd = DB2_CREDENTIALS["password"]
dsn_database = DB2_CREDENTIALS["database_name"]
dsn_port = DB2_CREDENTIALS["port"]
dsn_security = "SSL"
conn = None
dsn = (
    "DATABASE={0};"
    "HOSTNAME={1};"
    "PORT={2};"
    "UID={3};"
    "PWD={4};"
    "SECURITY={5};").format(dsn_database, dsn_hostname, dsn_port, dsn_uid, dsn_pwd, dsn_security)
try:
    conn = ibm_db.connect(dsn, "", "")
    print("Connected to database: ", dsn_database, "as user: ", dsn_uid, "on host: ", dsn_hostname)
except Exception as e:
    print("Unable to connect: ", ibm_db.conn_errormsg())

Connected to database:  bludb as user:  zqj21624 on host:  1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud
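The cell above always sets `SECURITY=SSL`, even though `DB2_CREDENTIALS` carries an `"ssl"` flag. A minimal sketch of building the same connection string while honoring that flag; `build_db2_dsn` is a hypothetical helper, and it assumes the credential keys used in this notebook.

```python
def build_db2_dsn(creds):
    """Build an ibm_db connection string from a DB2_CREDENTIALS-style dict.

    Assumes the keys used in this notebook: hostname, username, password,
    database_name, port, and an optional boolean "ssl".
    """
    parts = {
        "DATABASE": creds["database_name"],
        "HOSTNAME": creds["hostname"],
        "PORT": creds["port"],
        "UID": creds["username"],
        "PWD": creds["password"],
    }
    # Request an SSL connection only when the credentials ask for it.
    if creds.get("ssl", False):
        parts["SECURITY"] = "SSL"
    return "".join(f"{key}={value};" for key, value in parts.items())


example_creds = {
    "hostname": "db.example.com",   # hypothetical values
    "username": "user",
    "password": "secret",
    "database_name": "bludb",
    "port": 50000,
    "ssl": True,
}
print(build_db2_dsn(example_creds))
```

In the notebook this would be used as `conn = ibm_db.connect(build_db2_dsn(DB2_CREDENTIALS), "", "")`.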


### Defining helpers to load CSV data into the database

In [8]:
def create_table(df, table, schema):
    # Extracting column based on csv features
    columns = []
    for column_name, dtype in df.dtypes.items():
        if pd.api.types.is_integer_dtype(dtype):
            columns.append(f'"{column_name}" INT')
        elif pd.api.types.is_float_dtype(dtype):
            columns.append(f'"{column_name}" FLOAT')
        elif pd.api.types.is_bool_dtype(dtype):
            columns.append(f'"{column_name}" BOOLEAN')
        elif pd.api.types.is_datetime64_any_dtype(dtype):
            columns.append(f'"{column_name}" TIMESTAMP')
        else:
            columns.append(f'"{column_name}" VARCHAR(255)')
    # Deleting table if already exist.
    try:
        drop_table = f"DROP TABLE {schema}.{table}"
        ibm_db.exec_immediate(conn, drop_table)
        print("Table Deleted, Creating new table")
    except Exception as e:
        print("Creating new table")
    # Creating new table
    create_table_sql = f"CREATE TABLE {schema}.{table} ({', '.join(columns)})"
    try:
        ibm_db.exec_immediate(conn, create_table_sql)
        print(f"Table '{table}' created successfully.")
    except Exception as e:
        print("Table creation failed. Please try again.", e)

def insert_column(df, table, values, schema=None):
    # create_table() always creates schema.table, so qualify the INSERT the same way;
    # otherwise the rows go to the connection's current schema instead.
    qualified_table = f"{schema}.{table}" if schema else table
    print("Inserting rows into table")
    try:
        columns = ', '.join(map(lambda col: f'"{col}"', df.columns))
        # All rows are sent in a single multi-row INSERT statement
        insert_sql = f"INSERT INTO {qualified_table} ({columns}) VALUES {values}"
        ibm_db.exec_immediate(conn, insert_sql)
        ibm_db.commit(conn)
        print("Data inserted successfully.")
    except Exception as e:
        print("Data insertion failed.", e)


def extract_values_from_csv(df):
    # Render each row as a SQL value tuple, e.g. ('0_to_200', 31, ...), and join them
    print("Extracting values")
    values = []
    count = 0
    for index, row in df.iterrows():
        sd = str(tuple(row))
        values.append(sd)
        count += 1
    final_value = ",".join(values)
    print(f"Extracted {count} rows from csv file.")
    return final_value

### Inserting training and test data into the database

In [9]:
# Inserting training_data to database.
print("..............Inserting training data.............")
csv_file_path = 'german_credit_data_biased_training.csv'
data = pd.read_csv(csv_file_path)
table_name = TRAINING_DATA_TABLE_NAME
create_table(df=data, table=table_name, schema=TRAINING_DATA_SCHEMA_NAME)
value = extract_values_from_csv(df=data)
insert_column(table=table_name, df=data, values=value, schema=TRAINING_DATA_SCHEMA_NAME)

# Inserting test_data to database
print("..............Inserting test data.............")
csv_file_path = 'credit_risk_test_data.csv'
data = pd.read_csv(csv_file_path)
table_name = EVALUATION_DATA_TABLE_NAME
create_table(df=data, table=table_name, schema=EVALUATION_DATA_SCHEMA_NAME)
value = extract_values_from_csv(df=data)
insert_column(table=table_name, df=data, values=value, schema=EVALUATION_DATA_SCHEMA_NAME)

print("..............Closing connection.............")
ibm_db.close(conn)
print("..............Connection closed..............")

..............Inserting training data.............
Creating new table
Table 'TRAINING_DATA' created successfully.
Extracting values
Extracted 5000 rows from csv file.
Inserting rows into table
Data inserted successfully.
..............Inserting test data.............
Table Deleted, Creating new table
Table 'TESTING_DATA' created successfully.
Extracting values
Extracted 1000 rows from csv file.
Inserting rows into table
Data inserted successfully.
..............Closing connection.............
..............Connection closed..............
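The helpers above inline every row into one `INSERT ... VALUES (...), (...)` statement via `str(tuple(row))`. That works for this dataset, but a string containing an apostrophe is rendered with double quotes (which SQL reads as an identifier), and a float `NaN` is rendered as the bare word `nan`. Binding values through parameter markers avoids the quoting problem and lets you map missing values to `None` (SQL `NULL`) explicitly. `ibm_db` offers `ibm_db.prepare` and `ibm_db.execute_many` for this (check the ibm_db documentation for the exact argument types); the sketch below shows the same pattern with Python's built-in `sqlite3` so it runs without a Db2 instance. The table and rows are hypothetical.

```python
import math
import sqlite3


def to_sql_value(value):
    # Map float NaN (pandas' usual missing marker) to None, which binds as NULL.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def insert_rows(conn, table, columns, rows):
    """Insert rows using parameter markers instead of inlined literals."""
    quoted_cols = ", ".join(f'"{c}"' for c in columns)
    markers = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({quoted_cols}) VALUES ({markers})"
    # Each tuple is bound separately, so quotes inside values need no escaping.
    conn.executemany(sql, [tuple(to_sql_value(v) for v in row) for row in rows])
    conn.commit()


# Hypothetical in-memory table standing in for the Db2 training table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute('CREATE TABLE credit ("LoanPurpose" TEXT, "LoanAmount" INTEGER, "Risk" TEXT)')
rows = [
    ("car_new", 1343, "No Risk"),
    ("owner's choice", 900, "Risk"),        # would break the str(tuple(row)) approach
    ("furniture", float("nan"), "No Risk"),  # stored as NULL
]
insert_rows(demo_conn, "credit", ["LoanPurpose", "LoanAmount", "Risk"], rows)
print(demo_conn.execute("SELECT COUNT(*) FROM credit").fetchone()[0])
```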


## 2.3 Create a model

In [10]:
spark_df = df_data
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)

print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

spark_df.printSchema()

Number of records for training: 4005
Number of records for evaluation: 995
root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)
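`randomSplit([0.8, 0.2], 24)` samples rows rather than cutting the data at an exact position, which is why 5000 rows split into 4005 and 995 instead of 4000 and 1000. The sketch below illustrates the idea with independent per-row draws in plain Python; Spark's actual implementation samples per partition, so the exact rows (and counts) will differ. Like Spark, it normalizes weights that don't sum to 1. `random_split` is a hypothetical helper.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one split with probability proportional to its weight."""
    total = sum(weights)
    # Cumulative upper bounds, e.g. [0.8, 1.0] for weights [0.8, 0.2].
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()
        for i, bound in enumerate(bounds):
            if u < bound:
                splits[i].append(row)
                break
        else:
            # Guard against floating-point rounding in the last bound.
            splits[-1].append(row)
    return splits


train, test = random_split(range(5000), [0.8, 0.2], seed=24)
print(len(train), len(test))  # close to, but usually not exactly, 4000 and 1000
```

Reusing the same seed reproduces the same split, which is why the notebook passes a fixed seed.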

The code below creates a **Random Forest Classifier** with Spark and sets up string indexers for the categorical features and the label column. It then combines the indexers, the feature vector assembler, the model, and a label converter into a single pipeline, and performs an initial **Area Under ROC** evaluation of the model.

In [11]:
from pyspark.ml.feature import StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline

features = [x for x in spark_df.columns if x != 'Risk']
categorical_features = ['CheckingStatus', 'CreditHistory', 'LoanPurpose', 'ExistingSavings', 'EmploymentDuration', 'Sex', 'OthersOnLoan', 'OwnsProperty', 'InstallmentPlans', 'Housing', 'Job', 'Telephone', 'ForeignWorker']
categorical_num_features = [x + '_IX' for x in categorical_features]
si_list = [StringIndexer(inputCol=x, outputCol=y) for x, y in zip(categorical_features, categorical_num_features)]
va_features = VectorAssembler(inputCols=categorical_num_features + [x for x in features if x not in categorical_features], outputCol="features")
si_label = StringIndexer(inputCol="Risk", outputCol="label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_label.labels)
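By default (`stringOrderType="frequencyDesc"`), `StringIndexer` assigns index 0.0 to the most frequent category, 1.0 to the next, and so on; Spark's documentation states that ties under frequency ordering are further sorted alphabetically. `IndexToString` reverses the mapping using the fitted `labels` array, which is why `label_converter` receives `si_label.labels`. The plain-Python sketch below mimics that behavior for illustration only; it is not Spark's implementation, and the function names are hypothetical.

```python
from collections import Counter


def fit_string_indexer(values):
    """Return labels ordered like StringIndexer's default 'frequencyDesc'."""
    counts = Counter(values)
    # Most frequent first; equal counts are ordered alphabetically.
    return sorted(counts, key=lambda label: (-counts[label], label))


def transform(values, labels):
    index = {label: float(i) for i, label in enumerate(labels)}
    # Like the default handleInvalid="error", an unseen label raises (KeyError here).
    return [index[v] for v in values]


def index_to_string(indices, labels):
    # Inverse mapping, as IndexToString does with its `labels` parameter.
    return [labels[int(i)] for i in indices]


risk = ["No Risk", "Risk", "No Risk", "No Risk", "Risk"]
labels = fit_string_indexer(risk)
encoded = transform(risk, labels)
print(labels)   # ['No Risk', 'Risk']
print(encoded)  # [0.0, 1.0, 0.0, 0.0, 1.0]
print(index_to_string(encoded, labels))
```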

In [12]:
from pyspark.ml.classification import RandomForestClassifier

classifier = RandomForestClassifier(featuresCol="features")
pipeline = Pipeline(stages= si_list + [si_label, va_features, classifier, label_converter])

model = pipeline.fit(train_data)

In [13]:
predictions = model.transform(test_data)
evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderROC')
area_under_curve = evaluatorDT.evaluate(predictions)

evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction",  metricName='areaUnderPR')
area_under_PR = evaluatorDT.evaluate(predictions)

# Default evaluation is areaUnderROC
print("areaUnderROC = %g" % area_under_curve, "areaUnderPR = %g" % area_under_PR)

areaUnderROC = 0.727776 areaUnderPR = 0.685996
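`BinaryClassificationEvaluator` accepts either a double column (a 0/1 prediction or a probability of label 1) or a length-2 vector of scores as `rawPredictionCol`. The cell above passes `"prediction"`, the hard 0/1 label, so the ROC curve has a single operating point and areaUnderROC reduces to (TPR + TNR) / 2. The evaluator's default column, `rawPrediction`, ranks the model's continuous scores instead and will generally give a different value. The pure-Python sketch below computes ROC AUC as the probability that a random positive outranks a random negative (ties count one half), on hypothetical labels and probabilities.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.55, 0.3, 0.1]     # hypothetical P(label = 1)
hard = [1 if p >= 0.5 else 0 for p in probs]  # thresholded, like "prediction"

print(roc_auc(labels, probs))  # 8/9
print(roc_auc(labels, hard))   # 6/9, equal to (TPR + TNR) / 2 = (2/3 + 2/3) / 2
```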


## 2.4 Create and Save the Model

Save and deploy the German Credit Risk model into the WML instance that is designated as **Pre-Production**.

In [14]:
PRE_PROD_MODEL_NAME = "GCR Model"
PRE_PROD_DEPLOYMENT_NAME = "GCR Model Deployment"
SPACE_ID = "*********"

In [15]:
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
print(wml_client.version)
wml_client.set.default_space(SPACE_ID)

1.0.355


'SUCCESS'

### 2.4.1 Cleaning up existing models and deployments

In [16]:
from ibm_watson_machine_learning.wml_client_error import WMLClientError

deployments_list = wml_client.deployments.get_details()
models_to_delete = []
deployments_deleted = []
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    dep_model_name = wml_client.repository.get_details(model_id)["metadata"]["name"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == PRE_PROD_DEPLOYMENT_NAME or dep_model_name == PRE_PROD_MODEL_NAME:
        deployments_deleted.append(deployment_id)
        if model_id not in models_to_delete:
            models_to_delete.append(model_id)

for deployment_id in deployments_deleted:
    try:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
    except WMLClientError as wce:
        if "deployment_does_not_exist" in wce.error_msg:
            # Shadow deployment
            pass
        else:
            raise wce

for model_id in models_to_delete:
    print("Deleting model id", model_id)
    wml_client.repository.delete(model_id)
wml_client.repository.list_models()

------------------------------------  ---------  ------------------------  ---------  ----------  ----------------
ID                                    NAME       CREATED                   TYPE       SPEC_STATE  SPEC_REPLACEMENT
d080e2d9-b305-4ebc-8642-22573f01409d  GCR Model  2024-09-03T08:30:39.002Z  mllib_3.3  deprecated  spark-mllib_3.4
------------------------------------  ---------  ------------------------  ---------  ----------  ----------------
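The cleanup cell discovers models only through their deployments, so a model named `PRE_PROD_MODEL_NAME` that no deployment references is never deleted; this is consistent with the listing above, which still shows `GCR Model` (`d080e2d9-b305-4ebc-8642-22573f01409d`) after cleanup. A minimal sketch of selection logic that also catches such models, written against simplified, hypothetical record shapes (`id`, `name`, `model_id`) rather than the raw WML responses:

```python
def select_for_cleanup(deployments, models, deployment_name, model_name):
    """Return (deployment_ids, model_ids) to delete.

    deployments: dicts with hypothetical keys "id", "name", "model_id".
    models: dicts with hypothetical keys "id", "name".
    """
    names_by_model = {m["id"]: m["name"] for m in models}
    deployment_ids = []
    model_ids = []
    for dep in deployments:
        dep_model_name = names_by_model.get(dep["model_id"])
        if dep["name"] == deployment_name or dep_model_name == model_name:
            deployment_ids.append(dep["id"])
            if dep["model_id"] not in model_ids:
                model_ids.append(dep["model_id"])
    # Also catch models with the target name that no deployment references.
    for m in models:
        if m["name"] == model_name and m["id"] not in model_ids:
            model_ids.append(m["id"])
    return deployment_ids, model_ids


deployments = [
    {"id": "d1", "name": "GCR Model Deployment", "model_id": "m1"},
    {"id": "d2", "name": "Other deployment", "model_id": "m2"},
]
models = [
    {"id": "m1", "name": "GCR Model"},
    {"id": "m2", "name": "Other model"},
    {"id": "m3", "name": "GCR Model"},  # not deployed anywhere
]
print(select_for_cleanup(deployments, models, "GCR Model Deployment", "GCR Model"))
```

In the notebook, the model records could be built from `wml_client.repository.get_model_details()["resources"]` (`metadata.id`, `metadata.name`), assuming the installed client returns all models in the space when no ID is passed.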




### 2.4.2 Save the Model

In [17]:
model_props = {
    wml_client.repository.ModelMetaNames.NAME: PRE_PROD_MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: 'mllib_3.3',
    wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: wml_client.software_specifications.get_id_by_name("spark-mllib_3.3")
}

published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, 
                                                        training_data=train_data, pipeline=pipeline)
model_uid = wml_client.repository.get_model_id(published_model_details)
print("Model UID:" + model_uid)

Model UID:03655de9-b000-4f6a-b54b-2f1ef29a8598


## 2.5 Deploy the Model

The next section of the notebook deploys the model as a RESTful web service in Watson Machine Learning. The deployed model will have a scoring URL you can use to send data to the model for predictions.

In [18]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(PRE_PROD_DEPLOYMENT_NAME),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)

scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid = wml_client.deployments.get_uid(deployment_details)

print("Scoring URL: {}".format(scoring_url))
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))



#######################################################################################

Synchronous deployment creation for uid: '03655de9-b000-4f6a-b54b-2f1ef29a8598' started

#######################################################################################


initializing
Note: Software specification spark-mllib_3.3 is deprecated. Use spark-mllib_3.4 software specification instead when saving a spark model. For details, see https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_latest/wsj/wmls/wmls-deploy-python-types.html.
..........
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='a781b922-0a05-4523-b355-bec0377f2646'
------------------------------------------------------------------------------------------------


Scoring URL: https://cpd-cpd-instance.apps.wxg416nfs2822.cp.fyre.ibm.com/ml/v4/deployments/a781b922-0a05-4523-b355-bec0377f2646/predicti

### 2.5.1 Sample scoring

In [19]:
fields = ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose", "LoanAmount", "ExistingSavings",
                  "EmploymentDuration", "InstallmentPercent", "Sex", "OthersOnLoan", "CurrentResidenceDuration",
                  "OwnsProperty", "Age", "InstallmentPlans", "Housing", "ExistingCreditsCount", "Job", "Dependents",
                  "Telephone", "ForeignWorker"]
values = [
            ["no_checking", 13, "credits_paid_to_date", "car_new", 1343, "100_to_500", "1_to_4", 2, "female", "none", 3,
             "savings_insurance", 46, "none", "own", 2, "skilled", 1, "none", "yes"],
            ["no_checking", 24, "prior_payments_delayed", "furniture", 4567, "500_to_1000", "1_to_4", 4, "male", "none",
             4, "savings_insurance", 36, "none", "free", 2, "management_self-employed", 1, "none", "yes"],
        ]

scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
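The scoring payload pairs one `fields` list with positional `values` rows, so every row must follow the field order exactly; a swapped value is scored silently rather than rejected. A minimal sketch, assuming you prefer to write records as dicts: the hypothetical helper `build_scoring_payload` produces the same `{"input_data": [{"fields": ..., "values": ...}]}` structure used above and rejects records with missing fields.

```python
def build_scoring_payload(fields, records):
    """Build an 'input_data' scoring payload from dicts keyed by field name."""
    values = []
    for i, record in enumerate(records):
        missing = [f for f in fields if f not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        # Order each row exactly as the 'fields' list.
        values.append([record[f] for f in fields])
    return {"input_data": [{"fields": list(fields), "values": values}]}


# Shortened, hypothetical field list; the notebook would pass all 20 fields.
demo_fields = ["CheckingStatus", "LoanDuration", "Age"]
demo_records = [{"Age": 46, "CheckingStatus": "no_checking", "LoanDuration": 13}]
print(build_scoring_payload(demo_fields, demo_records))
```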

In [20]:
scoring_response = wml_client.deployments.score(deployment_uid, scoring_payload)
scoring_response

{'predictions': [{'fields': ['CheckingStatus',
    'LoanDuration',
    'CreditHistory',
    'LoanPurpose',
    'LoanAmount',
    'ExistingSavings',
    'EmploymentDuration',
    'InstallmentPercent',
    'Sex',
    'OthersOnLoan',
    'CurrentResidenceDuration',
    'OwnsProperty',
    'Age',
    'InstallmentPlans',
    'Housing',
    'ExistingCreditsCount',
    'Job',
    'Dependents',
    'Telephone',
    'ForeignWorker',
    'CheckingStatus_IX',
    'CreditHistory_IX',
    'LoanPurpose_IX',
    'ExistingSavings_IX',
    'EmploymentDuration_IX',
    'Sex_IX',
    'OthersOnLoan_IX',
    'OwnsProperty_IX',
    'InstallmentPlans_IX',
    'Housing_IX',
    'Job_IX',
    'Telephone_IX',
    'ForeignWorker_IX',
    'label',
    'features',
    'rawPrediction',
    'probability',
    'prediction',
    'predictedLabel'],
   'values': [['no_checking',
     13,
     'credits_paid_to_date',
     'car_new',
     1343,
     '100_to_500',
     '1_to_4',
     2,
     'female',
     'none',
     3,
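Each block in the response lists its `fields` once, and every row in `values` follows that order, so looking columns up by name is more robust than hard-coding positions. A minimal sketch with a hypothetical helper `extract_predictions` and a shortened, made-up response that mirrors the layout above:

```python
def extract_predictions(response, columns=("predictedLabel", "probability")):
    """Pull selected columns out of a scoring response by field name."""
    results = []
    for block in response["predictions"]:
        positions = [block["fields"].index(c) for c in columns]
        for row in block["values"]:
            results.append({c: row[p] for c, p in zip(columns, positions)})
    return results


# Hypothetical, heavily shortened response with the same structure as above.
sample_response = {
    "predictions": [{
        "fields": ["CheckingStatus", "probability", "prediction", "predictedLabel"],
        "values": [
            ["no_checking", [0.8, 0.2], 0.0, "No Risk"],
            ["no_checking", [0.3, 0.7], 1.0, "Risk"],
        ],
    }]
}
for item in extract_predictions(sample_response):
    print(item["predictedLabel"], item["probability"])
```

Against the real output, `extract_predictions(scoring_response)` would return one dict per scored row.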


# 3. Configure OpenScale <a name="openscale"></a>

The following cells import the necessary libraries and set up a Python OpenScale client.

In [21]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

### 3.1 Initialize the APIClient

In [22]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient as WOSAPIClient

SERVICE_INSTANCE_ID = DATA_MART_ID = "00000000-0000-0000-0000-000000000000"
wos_client = WOSAPIClient(
    authenticator=CloudPakForDataAuthenticator(
        url=cpd_url,
        username=cpd_username,
        password=cpd_password,
        disable_ssl_verification=True
    ),
    service_url=cpd_url,
    service_instance_id=SERVICE_INSTANCE_ID
)

print(wos_client.version)

3.0.36


In [23]:
# Listing service providers

wos_client.service_providers.show()

0,1,2,3,4,5
99999999-9999-9999-9999-999999999999,active,Venktesh_ml,watson_machine_learning,2024-09-03 06:37:51.447000+00:00,bd6a7cf2-0083-41e9-b6de-e6375a0aa45b
99999999-9999-9999-9999-999999999999,active,PROJECT_kp,watson_machine_learning,2024-08-29 14:01:24.021000+00:00,b8a8853d-0c69-4036-8668-0e4b6ac3fec6
99999999-9999-9999-9999-999999999999,active,ForShyamUse,watson_machine_learning,2024-08-21 08:11:28.250000+00:00,f93c584f-35bd-486f-a40a-e7f8c211b724
,active,batch_subscription_notebook,custom_machine_learning,2024-08-20 12:50:52.921000+00:00,68923d3b-e72f-4750-83fe-201a162ee031
,active,CUSTOM_APIKEY_CLOUD_WITHOUTAPI_PREPROD,custom_machine_learning,2024-08-20 12:00:19.739000+00:00,1314bd04-6dae-4f50-9a6a-e0c564e3b2df
,active,CUSTOM_APIKEY_CLOUD_WITHOUTAPI,custom_machine_learning,2024-08-20 12:00:19.568000+00:00,19470576-e1e7-480d-b36f-a138aecdf279
4ca7ec1c-6b35-48ef-a45c-0434c6985058,active,MRM_WMLV4_CLOUD_PREPROD,watson_machine_learning,2024-08-20 12:00:19.337000+00:00,eec6ca29-5365-4abd-96d3-a4c359db00c5
4ca7ec1c-6b35-48ef-a45c-0434c6985058,active,MRM_WMLV4_CLOUD_PROD,watson_machine_learning,2024-08-20 12:00:15.776000+00:00,8a7e2984-4a00-4d85-9940-e7f601b8973b
,active,CUSTOM_HLS_PREPROD,custom_machine_learning,2024-08-20 12:00:10.857000+00:00,1b3f7bad-3867-4bf8-a64c-f9e3a8b3df86
,active,CUSTOM_BATCH_PROD,custom_machine_learning,2024-08-20 12:00:10.695000+00:00,67a22c04-8572-465f-a5e1-e3f2ea1d0767


Note: First 10 records were displayed.


In [24]:
# Copy the ID of your service provider from the `id` column in the output of the cell above

SERVICE_PROVIDER_ID = "bd6a7cf2-0083-41e9-b6de-e6375a0aa45b"

### 3.2 Add Subscription

In [25]:
# List existing subscriptions

wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8
03655de9-b000-4f6a-b54b-2f1ef29a8598,GCR Model,00000000-0000-0000-0000-000000000000,a781b922-0a05-4523-b355-bec0377f2646,GCR Model Deployment,bd6a7cf2-0083-41e9-b6de-e6375a0aa45b,active,2024-09-03 10:35:16.177000+00:00,9e6a663a-edbf-479b-9b3a-e2d882bb2558
6c692383-4065-48e5-a21d-540648b4b2fa,Drift-ScikitGolfMulti_default_py3.10,00000000-0000-0000-0000-000000000000,70f8c361-acb9-4eac-a98f-3a475daa154b,dep_ScikitGolfMulti_default_py3.10,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-09-02 05:35:00.011000+00:00,b3b32e48-f2d2-4f3b-a46f-6951071f395a
5e87945c-6075-498b-8cb7-2c7b8b564a51,new golf model - P9 Snap Decision Tree Classifier - Model,00000000-0000-0000-0000-000000000000,8d5e9a72-9040-4045-ad52-3aea108bcd4c,new golf model,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-08-23 10:18:20.656000+00:00,9925ef6b-29db-4c48-bbd0-ad8e47b4a0b4
b043fb38-e53c-4ac5-a462-9013a2dfdd8c,house price regression again - P5 Snap Boosting Machine Regressor - Model,00000000-0000-0000-0000-000000000000,1abcc8a5-8c5f-48ed-8562-d7cb10ce2b51,house price regression again - P5 Snap Boosting Machine Regressor - Model,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-08-22 08:20:18.049000+00:00,6251098f-3190-43f3-85ad-dc9124174a6d
10f198c7-ded1-4ef5-a439-fb9553cb56f2,MNIST Model,00000000-0000-0000-0000-000000000000,501ac5ac-a2da-492e-b66f-463a5a122415,MNIST Model deployment,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-08-21 09:22:54.789000+00:00,65d229c3-3744-47c5-9555-48112310e8d9
962f754d-370b-4cdc-a29b-ab68b9cc00e1,text classifier - P1 Snap Boosting Machine Classifier - Model,00000000-0000-0000-0000-000000000000,ec82fccd-2512-4e1a-85ed-845e867ee3c4,text classifier,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-08-21 08:32:25.193000+00:00,12c86fe3-9bf0-4bd2-b041-dd15395b97f9
4f55e65a-a2bc-48ab-b21f-64e40d0bbb2c,GCR_model - P2 LGBM Classifier - Model,00000000-0000-0000-0000-000000000000,88f959f9-9c3d-4a55-bca4-64b759da4d0c,GCR_model,f93c584f-35bd-486f-a40a-e7f8c211b724,active,2024-08-21 08:11:37.687000+00:00,c189ad5e-4301-4b2c-8a17-1a674d1e236f
8fafe58b-5c9a-44ce-b672-195c1f082565,boston_sales_regression - P10 LGBM Regressor - Model,00000000-0000-0000-0000-000000000000,9f707899-d417-4ec3-969f-c76426406c97,bostonsales_regression_online,68923d3b-e72f-4750-83fe-201a162ee031,active,2024-08-21 03:32:29.544000+00:00,8864335e-02fc-4658-beb9-0fcac1443342
2093ba9b-9fbd-4747-ba94-ced5053ea2b9,regression type - P4 Ridge - Model,00000000-0000-0000-0000-000000000000,84acf458-b88f-495c-a3bf-49123c59348f,online_subscription_regression_type,68923d3b-e72f-4750-83fe-201a162ee031,active,2024-08-20 14:10:35.105000+00:00,d491644e-86d5-41f1-bf91-c1f31c71686d
aab8c94d-433e-46bd-b32d-f87144afbd11,batch_subscription_regression_model,00000000-0000-0000-0000-000000000000,7360e7d2-2a76-45d9-a802-90289263417d,batch_subscription_regression_model,68923d3b-e72f-4750-83fe-201a162ee031,active,2024-08-20 12:51:27.757000+00:00,a47223ca-8f81-4f4b-85c6-945b55804226


Note: First 10 records were displayed.


In [26]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_id = subscription.entity.asset.asset_id
    if sub_model_id == model_uid:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

Deleted existing subscription for model 03655de9-b000-4f6a-b54b-2f1ef29a8598


In [27]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=DATA_MART_ID, 
    service_provider_id=SERVICE_PROVIDER_ID, deployment_id = deployment_uid, deployment_space_id=SPACE_ID).result['resources'][0]
asset_deployment_details

{'metadata': {'guid': 'a781b922-0a05-4523-b355-bec0377f2646',
  'created_at': '2024-09-03T10:14:18.306Z',
  'modified_at': '2024-09-03T10:14:18.306Z'},
 'entity': {'name': 'GCR Model Deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://internal-nginx-svc:12443/ml/v4/deployments/a781b922-0a05-4523-b355-bec0377f2646/predictions'},
  'asset': {},
  'asset_properties': {}}}

In [28]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=DATA_MART_ID,
    service_provider_id=SERVICE_PROVIDER_ID, deployment_id=deployment_uid, deployment_space_id=SPACE_ID)
model_asset_details_from_deployment

{'metadata': {'guid': 'a781b922-0a05-4523-b355-bec0377f2646',
  'created_at': '2024-09-03T10:14:18.306Z',
  'modified_at': '2024-09-03T10:14:18.306Z'},
 'entity': {'name': 'GCR Model Deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://internal-nginx-svc:12443/ml/v4/deployments/a781b922-0a05-4523-b355-bec0377f2646/predictions'},
  'asset': {'asset_id': '03655de9-b000-4f6a-b54b-2f1ef29a8598',
   'url': 'https://internal-nginx-svc:12443/ml/v4/models/03655de9-b000-4f6a-b54b-2f1ef29a8598?space_id=be4c13b3-80fe-483c-a33b-68ddda21ae3e&version=2020-06-12',
   'name': 'GCR Model',
   'asset_type': 'model',
   'created_at': '2024-09-03T10:13:57.890Z',
   'modified_at': '2024-09-03T10:14:17.748Z'},
  'asset_properties': {'model_type': 'mllib_3.3',
   'runtime_environment': 'spark-3.3.0',
   'label_column': 'Risk',
   'input_data_schema': {'type': 'struct',
    'id': '1',
    'fields': [{'name': 'CheckingStatus',
      'type': 'string',
      'nullable': True,
      'metadata':

In [29]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=DATA_MART_ID,
        background_mode=False,
        service_provider_id=SERVICE_PROVIDER_ID,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['entity']['scoring_endpoint']['url']
        ),
        asset_properties=AssetPropertiesRequest(
            label_column='Risk',
            probability_fields=['probability'],
            prediction_field='predictedLabel',
            feature_fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
            categorical_fields = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"],
            training_data_reference=TrainingDataReference(
                type="db2",
                location=DB2TrainingDataReferenceLocation(
                    table_name=TRAINING_DATA_TABLE_NAME,
                    schema_name=TRAINING_DATA_SCHEMA_NAME
                ),
                connection=DB2TrainingDataReferenceConnection.from_dict(DB2_CREDENTIALS)
            ),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result

subscription_id = subscription_details.metadata.id
subscription_id




 Waiting for end of adding subscription 126b84d3-3b9d-4ef8-8a96-201ba9606ee0 




active

-------------------------------------------
 Successfully finished adding subscription 
-------------------------------------------




'126b84d3-3b9d-4ef8-8a96-201ba9606ee0'
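The subscription repeats the feature list, the categorical subset, and the label column by hand, so a typo in any of them only surfaces later in monitor configuration. A minimal sketch of a pre-flight check; `check_subscription_fields` is a hypothetical helper, and the demo uses shortened lists.

```python
def check_subscription_fields(feature_fields, categorical_fields, schema_fields, label_column):
    """Return a list of human-readable problems found in the field lists."""
    problems = []
    features = set(feature_fields)
    schema = set(schema_fields)
    for name in categorical_fields:
        if name not in features:
            problems.append(f"categorical field {name!r} is not a feature field")
    for name in feature_fields:
        if name not in schema:
            problems.append(f"feature field {name!r} is not in the training schema")
    if label_column in features:
        problems.append(f"label column {label_column!r} must not be a feature")
    if label_column not in schema:
        problems.append(f"label column {label_column!r} is not in the training schema")
    return problems


problems = check_subscription_fields(
    feature_fields=["CheckingStatus", "LoanDuration", "Sex"],
    categorical_fields=["CheckingStatus", "Sex", "Housing"],
    schema_fields=["CheckingStatus", "LoanDuration", "Sex", "Risk"],
    label_column="Risk",
)
for problem in problems:
    print(problem)
```

In the notebook, `schema_fields` could come from `model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"]["fields"]` (taking each entry's `"name"`), assuming that schema has the same struct layout as the `input_data_schema` shown earlier.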

In [30]:
# Check Payload Logging table status

import time

time.sleep(5)
payload_data_set_id = None
data_sets = wos_client.data_sets.list(
    type=DataSetTypes.PAYLOAD_LOGGING, 
    target_target_id=subscription_id, 
    target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets

# Indexing [0] on an empty list would raise before the check below could run
if data_sets:
    payload_data_set_id = data_sets[0].metadata.id

if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)

Payload data set id:  4e2b4898-d40f-4caa-8c24-4b6cfafaa728
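The cell above waits a fixed 5 seconds before looking up the payload logging data set. If the data set takes longer to appear, a small polling loop with a timeout is more reliable than a longer fixed sleep. A minimal sketch; `wait_for` is a hypothetical helper, and `sleep`/`clock` are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_for(fetch, timeout=60.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a truthy result or `timeout` seconds pass.

    Returns the first truthy result, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


# Demo with a fake fetch that succeeds on the third call.
attempts = []

def fake_fetch():
    attempts.append(1)
    return ["payload-data-set"] if len(attempts) >= 3 else []

print(wait_for(fake_fetch, timeout=10, interval=0, sleep=lambda s: None))

# In the notebook (assumption, not run here):
# data_sets = wait_for(lambda: wos_client.data_sets.list(
#     type=DataSetTypes.PAYLOAD_LOGGING,
#     target_target_id=subscription_id,
#     target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets)
# payload_data_set_id = data_sets[0].metadata.id if data_sets else None
```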


In [31]:
# List all Subscription Data Sets

wos_client.data_sets.show()

| data_mart_id | status | target_id | target_type | type | created_at | id |
|---|---|---|---|---|---|---|
| 00000000-0000-0000-0000-000000000000 | active | 126b84d3-3b9d-4ef8-8a96-201ba9606ee0 | subscription | payload_logging_error | 2024-09-03 11:25:17.890000+00:00 | ac37d611-4365-4a63-8f05-13f5ffc0069b |
| 00000000-0000-0000-0000-000000000000 | active | 126b84d3-3b9d-4ef8-8a96-201ba9606ee0 | subscription | manual_labeling | 2024-09-03 11:25:17.679000+00:00 | 737338b0-cc01-4f24-9d98-082afd1e19f7 |
| 00000000-0000-0000-0000-000000000000 | active | 126b84d3-3b9d-4ef8-8a96-201ba9606ee0 | subscription | payload_logging | 2024-09-03 11:25:17.401000+00:00 | 4e2b4898-d40f-4caa-8c24-4b6cfafaa728 |
| 00000000-0000-0000-0000-000000000000 | active | c189ad5e-4301-4b2c-8a17-1a674d1e236f | subscription | model_health | 2024-08-21 08:11:40.878000+00:00 | cc714034-8b40-4f41-8057-98744d1ca971 |
| 00000000-0000-0000-0000-000000000000 | active | b3b32e48-f2d2-4f3b-a46f-6951071f395a | subscription | model_health | 2024-09-02 05:35:03.247000+00:00 | bb8ed13e-8cd0-4562-a684-fb6af1a34afc |
| 00000000-0000-0000-0000-000000000000 | active | b3b32e48-f2d2-4f3b-a46f-6951071f395a | subscription | payload_logging_error | 2024-09-02 05:35:01.617000+00:00 | 36f944fd-03a7-4a41-a0c0-e582b9cc286b |
| 00000000-0000-0000-0000-000000000000 | active | b3b32e48-f2d2-4f3b-a46f-6951071f395a | subscription | training | 2024-09-02 05:35:01.504000+00:00 | dfb9f441-2745-4ad0-9b56-f109eaf8260a |
| 00000000-0000-0000-0000-000000000000 | active | b3b32e48-f2d2-4f3b-a46f-6951071f395a | subscription | manual_labeling | 2024-09-02 05:35:01.358000+00:00 | b070ce8d-6214-46e0-8bfb-56adc69f31d0 |
| 00000000-0000-0000-0000-000000000000 | active | b3b32e48-f2d2-4f3b-a46f-6951071f395a | subscription | payload_logging | 2024-09-02 05:35:01.188000+00:00 | 948f0f0f-e0ac-4c8b-a230-0cc5629de337 |
| 00000000-0000-0000-0000-000000000000 | active | c189ad5e-4301-4b2c-8a17-1a674d1e236f | subscription | payload_logging | 2024-08-21 08:11:39.165000+00:00 | aa7af68d-94fc-4d1f-836c-adec2405b068 |


Note: First 10 records were displayed.


### 3.3 Score the model before Monitor Configurations

Now that the WML service has been bound and the subscription has been created, we need to send a scoring request to the model before configuring the monitors. This lets OpenScale create a payload logging table in the data mart with the correct schema, so it can capture the data going into and coming out of the model.
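Each row in `values` must line up position by position with the 20 names in `fields`, and a missing or extra column is an easy mistake to make by hand. A minimal sketch of a hypothetical `build_scoring_payload` helper (not part of the WML SDK) that checks row lengths before wrapping them in the same `fields`/`values` layout used in the scoring cell, assuming `wml_client.deployments.ScoringMetaNames.INPUT_DATA` resolves to the key `"input_data"`:

```python
from typing import Any, Dict, List, Sequence


def build_scoring_payload(fields: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap column names and row values in a fields/values scoring payload.

    Hypothetical helper: raises ValueError if any row does not supply exactly
    one value per field.
    """
    for i, row in enumerate(rows):
        # Every row must have one value per field, in the same order as `fields`
        if len(row) != len(fields):
            raise ValueError("row {} has {} values, expected {}".format(i, len(row), len(fields)))
    # Assumption: "input_data" is the value of ScoringMetaNames.INPUT_DATA
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


example = build_scoring_payload(["Age", "Sex"], [[46, "female"], [36, "male"]])
print(example["input_data"][0]["values"][1])  # [36, 'male']
```

Passing the notebook's own `fields` and `values` lists through such a check before calling `wml_client.deployments.score` surfaces a misaligned row locally, rather than as a scoring error from the deployment.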

In [32]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [payload_scoring]
}
scoring_response = wml_client.deployments.score(deployment_uid, payload)

print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])

Single record scoring result: 
 fields: ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'LoanPurpose_IX', 'ExistingSavings_IX', 'EmploymentDuration_IX', 'Sex_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'InstallmentPlans_IX', 'Housing_IX', 'Job_IX', 'Telephone_IX', 'ForeignWorker_IX', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, [20, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19], [1.0, 1.

In [33]:
# Check whether WML payload logging worked; else manually store payload records

import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=payload_scoring,
                   response={"fields": scoring_response['predictions'][0]['fields'], "values":scoring_response['predictions'][0]['values']},
                   response_time=460
               )])
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

Number of records in the payload logging table: 8


# 4. Monitor Configurations <a name="monitor"></a>

### 4.1 Quality Monitor

The cell below waits ten seconds to allow the payload logging table to be set up before it begins enabling monitors. It then turns on the quality monitor with an alert threshold of 0.8, set as a lower limit on the area under the ROC curve (`area_under_roc`).

The `min_feedback_data_size` parameter specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The optional `max_rows_per_evaluation` parameter, which is set only when `max_records` is not `None`, specifies the maximum number of records for which quality metrics can be evaluated. The quality monitor runs hourly, but the quality reading in the dashboard will not change until an additional 50 feedback records have been added via the user interface, the Python client, or the supplied feedback logging endpoint.
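The `area_under_roc` threshold is easier to reason about with a concrete computation. A minimal sketch in plain Python, independent of OpenScale, that computes ROC AUC as the probability that a randomly chosen positive record is scored above a randomly chosen negative one (the Mann-Whitney formulation, ties counted as one half), then applies the same 0.8 lower limit and 50-record feedback minimum. `quality_alert` is a hypothetical helper, not an OpenScale API, and OpenScale's own computation may differ in detail:

```python
from typing import Optional, Sequence


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def quality_alert(labels: Sequence[int], scores: Sequence[float],
                  min_feedback_data_size: int = 50,
                  lower_limit: float = 0.8) -> Optional[bool]:
    """None if there is too little feedback to measure, else True when AUC < lower_limit."""
    if len(labels) < min_feedback_data_size:
        return None
    return area_under_roc(labels, scores) < lower_limit


print(area_under_roc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

The pairwise loop is quadratic in the number of records, which is fine for an illustration on a few hundred feedback rows; a rank-based formulation scales better.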

In [34]:
import time

time.sleep(10)
max_records = None
#Update the max_records value when you want to consider it during quality metrics evaluation
#max_records = 80

target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)

parameters =  dict()
parameters["min_feedback_data_size"] = 50
if max_records is not None:
    parameters["max_rows_per_evaluation"] = max_records

thresholds = [
    {
        "metric_id": "area_under_roc",
        "type": "lower_limit",
        "value": 0.8
    }
]

quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result




 Waiting for end of monitor instance creation 721cf5fd-4fb9-45ac-b7b1-e134cc36d668 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




### 4.2 Fairness Monitor

The code below configures fairness monitoring for our model. It turns on monitoring for two features, `Sex` and `Age`. For each feature, we must specify:
  * Which model feature to monitor.
  * One or more **majority** groups: values of that feature that we expect to receive a higher percentage of favorable outcomes. For a numeric feature such as `Age`, groups are given as ranges, for example `[26, 75]`.
  * One or more **minority** groups: values of that feature that we expect to receive a higher percentage of unfavorable outcomes.
  * The threshold below which OpenScale should raise an alert for the fairness measurement (in this case, 95%).

Additionally, we must specify which model outcomes are favorable (`favourable_class`) and which are unfavorable (`unfavourable_class`), and the minimum number of records (`min_records`) OpenScale needs before it calculates the fairness score.
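The fairness measurement compares how often each group receives a favorable outcome. A minimal sketch, assuming the score behaves like the disparate impact ratio (favorable rate of the minority group divided by favorable rate of the majority group) and that numeric groups such as `[18, 25]` are inclusive ranges. The helper names are hypothetical and this is not OpenScale's implementation; the `predictedLabel` key mirrors the column name in the scoring output above:

```python
from typing import Any, Dict, Optional, Sequence

Record = Dict[str, Any]


def in_group(value: Any, groups: Sequence[Any]) -> bool:
    """True if value equals a categorical group or lies in an inclusive [low, high] range."""
    for group in groups:
        if isinstance(group, (list, tuple)) and len(group) == 2:
            low, high = group
            if low <= value <= high:
                return True
        elif value == group:
            return True
    return False


def favorable_rate(records: Sequence[Record], feature: str, groups: Sequence[Any],
                   label_key: str, favourable_class: Sequence[str]) -> Optional[float]:
    """Share of records in the given groups whose label is a favorable class."""
    members = [r for r in records if in_group(r[feature], groups)]
    if not members:
        return None  # no records for this group: rate is undefined
    favorable = sum(1 for r in members if r[label_key] in favourable_class)
    return favorable / len(members)


def disparate_impact(records: Sequence[Record], feature: str,
                     majority: Sequence[Any], minority: Sequence[Any],
                     label_key: str = "predictedLabel",
                     favourable_class: Sequence[str] = ("No Risk",)) -> Optional[float]:
    """Minority favorable rate divided by majority favorable rate (None if undefined)."""
    minority_rate = favorable_rate(records, feature, minority, label_key, favourable_class)
    majority_rate = favorable_rate(records, feature, majority, label_key, favourable_class)
    if minority_rate is None or not majority_rate:
        return None
    return minority_rate / majority_rate


sample = [
    {"Sex": "male", "predictedLabel": "No Risk"},
    {"Sex": "male", "predictedLabel": "No Risk"},
    {"Sex": "male", "predictedLabel": "Risk"},
    {"Sex": "male", "predictedLabel": "No Risk"},
    {"Sex": "female", "predictedLabel": "No Risk"},
    {"Sex": "female", "predictedLabel": "Risk"},
]
di = disparate_impact(sample, "Sex", majority=["male"], minority=["female"])
print(round(di, 3), di < 0.95)  # 0.667 True
```

In this toy sample, males receive "No Risk" 75% of the time and females 50% of the time, so the ratio is about 0.667, below the 0.95 threshold configured in the cell below.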

In [35]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)

parameters = {
    "features": [
        {"feature": "Sex",
         "majority": ['male'],
         "minority": ['female'],
         "threshold": 0.95
         },
        {"feature": "Age",
         "majority": [[26, 75]],
         "minority": [[18, 25]],
         "threshold": 0.95
         }
    ],
    "favourable_class": ["No Risk"],
    "unfavourable_class": ["Risk"],
    "min_records": 50
}

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters).result

fairness_monitor_instance_id = fairness_monitor_details.metadata.id
fairness_monitor_instance_id




 Waiting for end of monitor instance creation 832c4cf9-b262-4526-9bea-56b22271fe77 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'832c4cf9-b262-4526-9bea-56b22271fe77'

### 4.3 Drift Monitor

In [36]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)

parameters = {
    "min_samples": 50,
    "drift_threshold": 0.1,
    "train_drift_model": True,
    "enable_model_drift": True,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

drift_monitor_instance_id = drift_monitor_details.metadata.id
drift_monitor_instance_id




 Waiting for end of monitor instance creation 96fd0556-48ff-49d7-9304-2de7d614b96d 




preparing...............
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'96fd0556-48ff-49d7-9304-2de7d614b96d'

### 4.4 Explainability

In [37]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

explainability_monitor_id = explainability_details.metadata.id




 Waiting for end of monitor instance creation e5cca546-b5a4-4ac3-b57d-2aad33aba3b6 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [38]:
# Run Sample Explanation

pl_records_resp = wos_client.data_sets.get_list_of_records(data_set_id=payload_data_set_id, limit=1, offset=0).result
scoring_ids = [pl_records_resp["records"][0]["entity"]["values"]["scoring_id"]]

print("Running explanations on scoring IDs: {}".format(scoring_ids))
explanation_types = ["lime", "contrastive"]
result = wos_client.monitor_instances.explanation_tasks(scoring_ids=scoring_ids, explanation_types=explanation_types, subscription_id=subscription_id).result

print(result)

Running explanations on scoring IDs: ['MRM_c8826f59-dd83-4f7b-8853-f05390db349e-5-1']
{
  "metadata": {
    "explanation_task_ids": [
      "48b376bf-366b-4c54-8172-916dc5883b32"
    ],
    "created_by": "1000331001",
    "created_at": "2024-09-03T13:58:05.524195Z"
  }
}


### 4.5 MRM Monitor

In [39]:
# Configuring MRM

target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "enabled": True
}
mrm_details = wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    background_mode=False,
    monitor_definition_id="mrm",
    target=target,
    parameters=parameters
).result

MRM_MONITOR_INSTANCE_ID = mrm_details.metadata.id
MRM_MONITOR_INSTANCE_ID




 Waiting for end of monitor instance creation c5b75aa5-f86d-4a52-b723-fdabd96bc331 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'c5b75aa5-f86d-4a52-b723-fdabd96bc331'

In [40]:
wos_client.monitor_definitions.show()

| id | name | metrics |
|---|---|---|
| custom_monitor_def_t5 | custom_monitor_def_t5 | ['specificityt5m1', 'sensitivityt5m2', 'specificityt5m3', 'sensitivityt5m4', 'specificityt5m5', 'sensitivityt5m6', 'specificityt5m7', 'sensitivityt5m8', 'specificityt5m9', 'sensitivityt5m0'] |
| custom_monitor_def_t4 | custom_monitor_def_t4 | ['specificityt4m1'] |
| custom_monitor_def_t3 | custom_monitor_def_t3 | ['sensitivityt3m2', 'specificityt3m1'] |
| custom_monitor_def_t2 | custom_monitor_def_t2 | ['sensitivityt2m2', 'specificityt2m1'] |
| custom_monitor_def_t1 | custom_monitor_def_t1 | ['sensitivityt1m2', 'specificityt1m1'] |
| fairness | Fairness | ['Disparate impact', 'Average odds difference', 'False discovery rate difference', 'Error rate difference', 'False negative rate difference', 'False omission rate difference', 'False positive rate difference', 'True positive rate difference', 'Average absolute odds difference', 'Statistical parity difference', 'Impact score'] |
| model_health | Model health | ['Total scoring requests', 'Total records', 'Average records', 'Median records', 'Maximum records', 'Minimum records', 'Total payload size', 'Average payload size', 'Median payload size', 'Minimum payload size', 'Maximum payload size', 'Average API throughput', 'Minimum API throughput', 'Maximum API throughput', 'Median API throughput', 'Average API latency', 'Minimum API latency', 'Maximum API latency', 'Median API latency', 'Average record throughput', 'Minimum record throughput', 'Maximum record throughput', 'Median record throughput', 'Average record latency', 'Minimum record latency', 'Maximum record latency', 'Median record latency', 'Users', 'Errors', 'Data errors', 'System errors', 'Total input token count', 'Average input token count', 'Median input token count', 'Maximum input token count', 'Minimum input token count', 'Total output token count', 'Average output token count', 'Median output token count', 'Maximum output token count', 'Minimum output token count'] |
| performance | Performance | ['Number of records'] |
| data_health | Data Health | ['Absence Count', 'Empty Strings', 'Data Type Mismatch', 'Class Confusion', 'Duplicate Rows', 'Unique Columns'] |
| explainability | Explainability | ['Global explanation stability'] |


Note: First 10 records were displayed.


## 5. MRM Evaluation <a name="mrm"></a>

In [41]:
import time
import requests
import json

time.sleep(20)

payload = json.dumps({
  "type": "db2",
  "connection": DB2_CREDENTIALS,
  "location": {
    "schema_name": EVALUATION_DATA_SCHEMA_NAME,
    "table_name": EVALUATION_DATA_TABLE_NAME
    }
})

file_name = "test_data.json"

with open(file_name, 'w') as file:
    file.write(payload)

response = wos_client.monitor_instances.mrm.evaluate_risk(
                monitor_instance_id = MRM_MONITOR_INSTANCE_ID,
                test_data_set_name = TEST_DATA_SET_NAME,
                test_data_path = file_name,
                content_type = "application/json")

print(response.result._to_dict())

{'evaluation_id': '2a24bf79-1936-4bc0-8fd7-02b9b42baaaf', 'evaluation_date': '2024-09-03T13:58:44.284000Z', 'publish_metrics': 'false', 'evaluation_tests': ['drift', 'fairness', 'quality', 'explainability', 'drift_v2'], 'evaluation_start_time': '2024-09-03T13:58:47.294869Z', 'status': {'state': 'UPLOAD_IN_PROGRESS'}}


### 5.1 Checking MRM evaluation progress

In [42]:
import time

def get_risk_evaluations():
    # Fetch the current MRM risk evaluation for this monitor instance as a plain dict
    response = wos_client.monitor_instances.mrm.get_risk_evaluation(
                    monitor_instance_id = MRM_MONITOR_INSTANCE_ID).result

    return response._to_dict()

risk_evaluations_resp = get_risk_evaluations()

# Wait for the test data upload to finish
while risk_evaluations_resp["entity"]["status"]["state"] == "UPLOAD_IN_PROGRESS":
    print("UPLOAD_IN_PROGRESS")
    time.sleep(50)
    risk_evaluations_resp = get_risk_evaluations()

# Wait for the evaluation itself to reach a terminal state
while risk_evaluations_resp["entity"]["status"]["state"] not in ["finished", "error"]:
    print("EVALUATION_IN_PROGRESS")
    time.sleep(50)
    risk_evaluations_resp = get_risk_evaluations()

mrm_evaluation_state = risk_evaluations_resp["entity"]["status"]["state"]
if mrm_evaluation_state == "finished":
    print("EVALUATION_COMPLETED")
else:
    # risk_evaluations_resp is already a dict, so read the status directly (a dict has no .json())
    print("MRM evaluation failed with state {}, error {}".format(mrm_evaluation_state, risk_evaluations_resp["entity"]["status"]))

UPLOAD_IN_PROGRESS
EVALUATION_IN_PROGRESS
EVALUATION_IN_PROGRESS
EVALUATION_COMPLETED
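Both loops above poll with no upper bound, so an evaluation that never reaches `finished` or `error` would block the notebook indefinitely. A minimal sketch of a bounded alternative; `wait_for_state` is a hypothetical helper (not part of the OpenScale SDK), and the one-hour default timeout is an arbitrary assumption:

```python
import time
from typing import Callable, Iterable


def wait_for_state(get_state: Callable[[], str],
                   terminal_states: Iterable[str] = ("finished", "error"),
                   interval_s: float = 50.0,
                   timeout_s: float = 3600.0) -> str:
    """Poll get_state() until it returns a terminal state or timeout_s elapses.

    Returns the terminal state; raises TimeoutError if the deadline passes first.
    """
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_s
    state = get_state()
    while state not in terminal:
        if time.monotonic() >= deadline:
            raise TimeoutError("still in state {!r} after {} s".format(state, timeout_s))
        print(state)
        time.sleep(interval_s)
        state = get_state()
    return state


# Usage in this notebook (commented out so the sketch runs on its own):
# state = wait_for_state(lambda: get_risk_evaluations()["entity"]["status"]["state"])
```

Because `UPLOAD_IN_PROGRESS` is not a terminal state, a single call covers both the upload and the evaluation phases that the original cell handles with two loops.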


### 5.2 Get Published Fact

In [43]:
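import time, requests, json

def get_published_fact():
    # Query the AI Factsheets system facts for this model and deployment in the deployment space
    url = "{}/v1/aigov/model_inventory/models/{}/system_facts?space_id={}&deployment_id={}".format(
        cpd_url, model_uid, SPACE_ID, deployment_uid)

    headers = {
        'Authorization': 'Bearer {}'.format(wos_client.authenticator.token_manager.get_token()),
        'Accept': 'application/json'
    }

    # verify=False skips TLS certificate checks (common on clusters with self-signed certificates)
    response = requests.request("GET", url, headers=headers, verify=False)
    # Fail loudly on HTTP errors instead of printing an error body as if it were the fact
    response.raise_for_status()
    return response

# Wait for a minute before fetching the published fact
time.sleep(60)

published_fact = get_published_fact()
print(json.dumps(published_fact.json(), indent=4))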

{
    "name": "GCR Model",
    "description": "",
    "asset_type": "wml_model",
    "created_at": "2024-09-03T10:13:57Z",
    "owner_id": "1000331001",
    "asset_id": "03655de9-b000-4f6a-b54b-2f1ef29a8598",
    "creator_id": "1000331001",
    "asset_details": {
        "id": "03655de9-b000-4f6a-b54b-2f1ef29a8598",
        "name": "GCR Model",
        "description": "",
        "created": "2024-09-03T10:13:57Z",
        "created_by": "1000331001",
        "last_modified": "2024-09-03T14:00:56Z",
        "asset_type": "wml_model"
    },
    "space_details": {
        "space_id": "be4c13b3-80fe-483c-a33b-68ddda21ae3e",
        "phase_name": "Undefined",
        "name": "Venktesh_preProd",
        "description": "",
        "space_type": "pre-production",
        "created_by": "1000331001",
        "created_at": "2024-09-03T05:18:58.922Z"
    },
    "model_information": {
        "input_type": "structured",
        "algorithm": "",
        "prediction_type": "binary",
        "sof
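The published fact is a nested JSON document. A minimal sketch of a hypothetical `summarize_fact` helper that pulls a few fields out of it, assuming the key layout visible in the (truncated) output above (`name`, `asset_id`, `space_details`, `model_information`); `.get` is used throughout so a missing section yields `None` instead of a `KeyError`:

```python
from typing import Any, Dict


def summarize_fact(fact: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few fields out of a system_facts response, tolerating missing keys."""
    space = fact.get("space_details") or {}
    model_info = fact.get("model_information") or {}
    return {
        "model_name": fact.get("name"),
        "asset_id": fact.get("asset_id"),
        "space_name": space.get("name"),
        "space_type": space.get("space_type"),
        "prediction_type": model_info.get("prediction_type"),
    }


# Usage in this notebook (commented out so the sketch runs on its own):
# print(summarize_fact(published_fact.json()))
```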

Congratulations! You have reached the end of the demo notebook. Thanks for trying it out :)

### Authors
Developed by [Harshit Sharma](mailto:harshit2@in.ibm.com), Staff Software Engineer, Watson OpenScale