<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run in a Watson Studio project, using the **Default Python 3.8** runtime environment. It requires service credentials for the following Cloud services:
  * Watson OpenScale 
  * Watson Machine Learning 
  * Cloud Object Storage
  
If you have a paid Cloud account, you may also provision a **Db2 Warehouse** service to take full advantage of integration with Watson Studio and continuous learning services.

The notebook will train, create, and deploy a German Credit Risk model, then configure OpenScale to monitor that deployment.

### Contents

- [Setup](#setup)
- [Model building and deployment](#model)
- [OpenScale configuration](#openscale)
- [Create training data JSON](#quality)
- [Fairness and explanations](#fairness)

# Setup <a name="setup"></a>

## Package installation
**NOTE** scikit-learn 0.20.2 is required for drift detection model training. If you are training a drift detection model in a notebook, make sure scikit-learn 0.20.2 is installed. Your main model can use any scikit-learn version supported by Watson Machine Learning. 

In [None]:
!pip install --upgrade ibm-watson-machine-learning | tail -n 1
!pip install --upgrade ibm-watson-openscale | tail -n 1
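
After the kernel restart, it can be useful to confirm which package versions are actually visible to the runtime. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of either IBM SDK.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above, plus scikit-learn.
for name in ("scikit-learn", "ibm-watson-machine-learning", "ibm-watson-openscale"):
    print(name, installed_version(name))
```

A value of `None` means the package was not found in the current environment.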

### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DB_CREDENTIALS (DB2 on CP4D)
- SCHEMA_NAME
- WML_SPACE_ID

In [1]:
WOS_CREDENTIALS = {
    "url": "***",
    "username": "***",
    "password": "***"
}

### WML credentials example with username and password

In [2]:
WML_CREDENTIALS = {
                   "url": WOS_CREDENTIALS["url"],
                   "username": WOS_CREDENTIALS["username"],
                   "password" : WOS_CREDENTIALS["password"],
                   "instance_id": "wml_local",
                   "version" : "4.0" #If your env is CP4D 4.0 then specify "4.0" instead of "3.5"
                  }

In [3]:
DB_CREDENTIALS = {
    "hostname":"***",
    "username":"***",
    "password":"***",
    "database":"***",
    "port":50000, #provide your actual DB2 port number (as integer value)
    "ssl":"***",
    "sslmode":"***",
    "certificate_base64":"***"}



### Action: Specify created schema name below.

In [4]:
SCHEMA_NAME = 'AIOSFASTPATHICP'

In the next cells, you will need to paste your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, please visit the getting started with COS tutorial. You can find the `COS_API_KEY_ID` and `COS_RESOURCE_CRN` values under **_Service Credentials_** in the menu of your COS instance. The COS service credentials must be created with the _Role_ parameter set to Writer. Later, the training data file will be uploaded to a bucket in your instance and used as the training reference in the subscription.  
The `COS_ENDPOINT` value can be found in the **_Endpoint_** field of the menu.

In [5]:
COS_API_KEY_ID = "***"
COS_RESOURCE_CRN = "***"
COS_ENDPOINT = "***" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
COS_IAM_AUTH_ENDPOINT = "https://iam.ng.bluemix.net/oidc/token" 

In [6]:
BUCKET_NAME = "" #example: "credit-risk-training-data"
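
Before running all cells, a quick check that no placeholder value is left behind can save a failed run. The sketch below is an illustration only: `unfilled_keys` is a hypothetical helper, and it assumes that `"***"` and the empty string are the only placeholders used in this notebook.

```python
def unfilled_keys(settings, placeholders=("***", "")):
    """Return the sorted names whose value is still a placeholder string."""
    return sorted(
        key
        for key, value in settings.items()
        if isinstance(value, str) and value.strip() in placeholders
    )


# Hypothetical example values, not real credentials.
example = {"url": "***", "username": "admin", "port": 50000, "bucket": ""}
print(unfilled_keys(example))  # ['bucket', 'url']
```

In the notebook, the same function could be applied to `WOS_CREDENTIALS`, `DB_CREDENTIALS`, or a dictionary such as `{"BUCKET_NAME": BUCKET_NAME}`.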

## Run the notebook

At this point, the notebook is ready to run. You can either run the cells one at a time, or click the **Kernel** option above and select **Restart and Run All** to run all the cells.

# Model building and deployment <a name="model"></a>

In this section you will learn how to train a scikit-learn model and then deploy it as a web service using the Watson Machine Learning service.

## Load the training data from GitHub

In [7]:
!rm -f german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/Cloud%20Pak%20for%20Data/WML/assets/data/credit_risk/german_credit_data_biased_training.csv    

--2024-08-07 14:12:04--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/Cloud%20Pak%20for%20Data/WML/assets/data/credit_risk/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv’


2024-08-07 14:12:04 (6.42 MB/s) - ‘german_credit_data_biased_training.csv’ saved [689622/689622]



In [8]:
import numpy as np
import pandas as pd

training_data_file_name = "german_credit_data_biased_training.csv"
data_df = pd.read_csv(training_data_file_name)

## Explore data

In [9]:
data_df.head()

Unnamed: 0,CheckingStatus,LoanDuration,CreditHistory,LoanPurpose,LoanAmount,ExistingSavings,EmploymentDuration,InstallmentPercent,Sex,OthersOnLoan,...,OwnsProperty,Age,InstallmentPlans,Housing,ExistingCreditsCount,Job,Dependents,Telephone,ForeignWorker,Risk
0,0_to_200,31,credits_paid_to_date,other,1889,100_to_500,less_1,3,female,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
1,less_0,18,credits_paid_to_date,car_new,462,less_100,1_to_4,2,female,none,...,savings_insurance,37,stores,own,2,skilled,1,none,yes,No Risk
2,less_0,15,prior_payments_delayed,furniture,250,less_100,1_to_4,2,male,none,...,real_estate,28,none,own,2,skilled,1,yes,no,No Risk
3,0_to_200,28,credits_paid_to_date,retraining,3693,less_100,greater_7,3,male,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
4,no_checking,28,prior_payments_delayed,education,6235,500_to_1000,greater_7,3,male,none,...,unknown,57,none,own,2,skilled,1,none,yes,Risk


In [10]:
print('Columns: ', list(data_df.columns))
print('Number of columns: ', len(data_df.columns))

Columns:  ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'Risk']
Number of columns:  21


As you can see, the data contains twenty-one fields. The `Risk` field is the one you would like to predict using feedback data.

In [11]:
print('Number of records: ', data_df.Risk.count())

Number of records:  5000


In [12]:
target_count = data_df.groupby('Risk')['Risk'].count()
target_count

Risk
No Risk    3330
Risk       1670
Name: Risk, dtype: int64
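
The counts above show that the two classes are not balanced. As a small worked example, the shares can be computed directly from the printed counts (in the notebook itself, `data_df.Risk.value_counts(normalize=True)` should give the same proportions):

```python
# Class counts copied from the output above.
counts = {"No Risk": 3330, "Risk": 1670}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
# No Risk: 66.6%
# Risk: 33.4%
```

Roughly two thirds of the records are `No Risk`, which is worth keeping in mind when reading the evaluation score later on.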

## Save training data to Cloud Object Storage

In [13]:
import ibm_boto3
from ibm_botocore.client import Config, ClientError

cos_client = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_RESOURCE_CRN,
    #ibm_auth_endpoint=COS_IAM_AUTH_ENDPOINT,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

In [14]:
with open(training_data_file_name, "rb") as file_data:
    cos_client.Object(BUCKET_NAME, training_data_file_name).upload_fileobj(
        Fileobj=file_data
    )

## Create a model
In this section you will learn how to:

- Prepare data for training a model
- Create machine learning pipeline
- Train a model

In [15]:
MODEL_NAME = "Scikit German Risk Model - for training data - zLinux"
DEPLOYMENT_NAME = "Scikit German Risk Model - for training data - zLinux"

### You will start with importing required libraries

In [16]:
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.decomposition import TruncatedSVD
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

### Splitting the data into train and test

In [17]:
train_data, test_data = train_test_split(data_df, test_size=0.2)

### Preparing the pipeline

In [18]:
features_idx = np.s_[0:-1]
all_records_idx = np.s_[:]
first_record_idx = np.s_[0]

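In this step you will one-hot encode the string-valued feature columns and define a linear classifier, then combine both into a single pipeline. The target column `Risk` is passed to the classifier as string labels, so predictions come back as `Risk` or `No Risk` without further decoding.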

In [19]:
string_fields = [type(fld) is str for fld in train_data.iloc[first_record_idx, features_idx]]
ct = ColumnTransformer([("ohe", OneHotEncoder(), list(np.array(train_data.columns)[features_idx][string_fields]))])
clf_linear = SGDClassifier(loss='log', penalty='l2', max_iter=1000, tol=1e-5)

pipeline_linear = Pipeline([('ct', ct), ('clf_linear', clf_linear)])
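
To make the two steps in the cell above more concrete, here is a plain-Python sketch of what they do conceptually: picking the string-valued fields of a record, and one-hot encoding a categorical column. The helper names are hypothetical, and this is an illustration of the idea, not the scikit-learn implementation. Note also that `ColumnTransformer` drops columns it is not told about by default (`remainder='drop'`), so with this setup the numeric columns do not appear to reach the classifier.

```python
def string_columns(record):
    """Names of the fields whose value is a str (same test as `type(fld) is str`)."""
    return [name for name, value in record.items() if type(value) is str]


def one_hot(values):
    """Encode each value as a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, encoded


record = {"CheckingStatus": "0_to_200", "LoanDuration": 31, "Sex": "female"}
print(string_columns(record))  # ['CheckingStatus', 'Sex']

categories, encoded = one_hot(["female", "male", "female"])
print(categories)  # ['female', 'male']
print(encoded)     # [[1, 0], [0, 1], [1, 0]]
```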

### Train a model

In [20]:
risk_model = pipeline_linear.fit(train_data.drop('Risk', axis=1), train_data.Risk)



### Evaluate the model

In [21]:
from sklearn.metrics import roc_auc_score

predictions = risk_model.predict(test_data.drop('Risk', axis=1))
indexed_preds = [0 if prediction=='No Risk' else 1 for prediction in predictions]

real_observations = test_data.Risk.replace('Risk', 1)
real_observations = real_observations.replace('No Risk', 0).values

auc = roc_auc_score(real_observations, indexed_preds)
print(auc)

0.7265272515496768
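
Because the cell above passes hard 0/1 predictions (not probabilities) to `roc_auc_score`, the ROC curve has a single operating point, and the resulting area reduces to the mean of the true positive rate and the true negative rate. The sketch below works through that arithmetic on a small made-up example; the function name and the data are hypothetical.

```python
def auc_from_hard_labels(y_true, y_pred):
    """AUC of a single-threshold classifier: (TPR + TNR) / 2."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = true_pos / positives
    tnr = true_neg / negatives
    return (tpr + tnr) / 2


# Made-up labels: 3 positives (2 found), 5 negatives (4 found).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(auc_from_hard_labels(y_true, y_pred), 4))  # 0.7333
```

Scoring with predicted probabilities instead of hard labels would usually give a more informative AUC, since it evaluates every threshold rather than one.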


## Publish the model

In this section, the notebook uses the supplied Watson Machine Learning credentials to save the model (including the pipeline) to the WML instance. Previous versions of the model are removed so that the notebook can be run again, resetting all data for another demo.

In [22]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.0.360'

### Listing all the available spaces

In [23]:
wml_client.spaces.list(limit=10)


Unnamed: 0,ID,NAME,CREATED
0,7e5a8be6-9103-4c22-9c43-b66f3d8364de,poojitha_notebooks_space,2024-06-29T13:53:46.645Z
1,16ccd855-46bd-43ed-8219-5f00ac565d08,shreya-space,2024-06-26T04:29:17.302Z
2,bc3b9797-c509-4fb4-a424-f67b1e2ed4be,QUALITY_WMLV4_PREPROD,2024-06-23T12:23:04.790Z
3,e396e187-2977-47b4-ade3-1539f9f10adc,QUALITY_WMLV4_PROD,2024-06-23T12:22:54.422Z
4,40c4d032-0339-4da6-bfec-4bdb096c9650,shreya,2024-06-20T10:54:20.088Z
5,088c142e-f35e-4e48-a30c-ad55a6edeecc,notebooks 5.0,2024-06-13T04:42:07.336Z
6,b9b3d3b4-6e26-4e16-807d-e8bf5e7d6984,MRM_WMLV4_PREPROD,2024-06-12T15:49:26.571Z
7,d22e2b6b-917c-4427-a40c-1a439352a742,MRM_WMLV4_PROD,2024-06-12T15:49:16.185Z
8,ce15e0f6-be30-4349-af47-35ae15983bf1,openscale-express-path-preprod-00000000-0000-0...,2024-06-04T05:18:51.988Z
9,6264dc0e-087a-4dea-bcbc-6bd872b510fb,openscale-express-path-00000000-0000-0000-0000...,2024-06-04T05:18:30.811Z


In [24]:
WML_SPACE_ID='***' # use space id here
wml_client.set.default_space(WML_SPACE_ID)

'SUCCESS'

### Remove existing model and deployment

In [25]:
import time
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == DEPLOYMENT_NAME:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        time.sleep(5)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
        time.sleep(5)

wml_client.repository.list_models()



Unnamed: 0,ID,NAME,CREATED,TYPE,SPEC_STATE,SPEC_REPLACEMENT
0,47c30bd1-0a05-4d9f-bbbe-8764689c2890,carInsurance_regression,2024-06-23T13:35:51.002Z,mllib_3.4,supported,
1,ad29754f-11e9-42ef-93ca-4f144ae2640e,MRM drug-selection,2024-06-23T13:35:40.002Z,mllib_3.4,supported,
2,864f1d5e-679b-45a7-aa7d-92e46de075cb,E2E_Scikit_BostonHousePrices_RegressionV4,2024-06-23T13:35:29.002Z,scikit-learn_1.1,supported,
3,d2425cc1-bd1b-49af-937c-a4df1ad4420c,E2E_Native_XGBoost_BostonRegressionV4,2024-06-23T13:35:17.002Z,xgboost_1.6,supported,
4,b05755c3-b0f6-4807-b36f-7c9800083dec,E2E_Native_XGBoost_HeartDiseaseBinaryV4,2024-06-23T13:35:06.002Z,xgboost_1.6,supported,
5,ccb4ea7d-f530-4d88-935e-9a2bd1b0fe7a,E2E_XGBoost_IrisMultiClassV4,2024-06-23T13:34:55.002Z,scikit-learn_1.1,supported,
6,5dba048e-0de9-4f36-9bbe-06486fd27d89,HousePrice_Auto_AI_New - P4 LGBM Regressor - M...,2024-06-23T13:34:10.002Z,wml-hybrid_0.1,supported,
7,119dd11a-e02c-4c15-921b-19ccc1bba769,AIOS Keras Multiclass StackOverflow V4,2024-06-23T13:33:21.002Z,tensorflow_2.12,supported,
8,ae94bb71-cd4d-4c11-b8c1-d37bc1769dd8,AIOS Keras Binary Dogs vs Cats model V4,2024-06-23T13:31:19.002Z,tensorflow_2.12,supported,
9,24598d22-45e7-4fe9-bccb-c4e02ee83366,MRM AIOS Spark German Risk Model - Final,2024-06-23T13:31:01.002Z,mllib_3.4,supported,


In [26]:
training_data_references = [
                {
                    "id": "product line",
                    "type": "s3",
                    "connection": {
                        "access_key_id": COS_API_KEY_ID,
                        "endpoint_url": COS_ENDPOINT,
                        "resource_instance_id":COS_RESOURCE_CRN
                    },
                    "location": {
                        "bucket": BUCKET_NAME,
                        "path": training_data_file_name,
                    }
                }
            ]

In [27]:

software_spec_uid = wml_client.software_specifications.get_id_by_name("runtime-23.1-py3.10")
print("Software Specification ID: {}".format(software_spec_uid))

model_props = {
        # Use the public ModelMetaNames accessor rather than the private _models attribute
        wml_client.repository.ModelMetaNames.NAME: "{}".format(MODEL_NAME),
        wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_1.1",
        wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
        wml_client.repository.ModelMetaNames.LABEL_FIELD: "Risk",
    }

Software Specification ID: 336b29df-e0e1-5e7d-b6a5-f6ab722625b2


In [28]:
print("Storing model ...")

published_model_details = wml_client.repository.store_model(model=risk_model, meta_props=model_props, training_data=data_df.drop(["Risk"], axis=1), training_target=data_df.Risk)
model_uid = wml_client.repository.get_model_id(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

Storing model ...
Done
Model ID: df904b8a-2884-4318-82df-35cb430f8587


In [29]:
model_details = wml_client.repository.get_details(model_uid)
print(json.dumps(model_details, indent=2))

{
  "entity": {
    "hybrid_pipeline_software_specs": [],
    "label_column": "Risk",
    "schemas": {
      "input": [
        {
          "fields": [
            {
              "name": "CheckingStatus",
              "type": "object"
            },
            {
              "name": "LoanDuration",
              "type": "int64"
            },
            {
              "name": "CreditHistory",
              "type": "object"
            },
            {
              "name": "LoanPurpose",
              "type": "object"
            },
            {
              "name": "LoanAmount",
              "type": "int64"
            },
            {
              "name": "ExistingSavings",
              "type": "object"
            },
            {
              "name": "EmploymentDuration",
              "type": "object"
            },
            {
              "name": "InstallmentPercent",
              "type": "int64"
            },
            {
              "name": "Sex",
         

## Deploy the model

The next section of the notebook deploys the model as a RESTful web service in Watson Machine Learning. The deployed model will have a scoring URL you can use to send data to the model for predictions.

In [30]:
print("Deploying model...")
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(DEPLOYMENT_NAME),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid = wml_client.deployments.get_id(deployment_details)  # get_uid is the older name for this call

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

Deploying model...


#######################################################################################

Synchronous deployment creation for uid: 'df904b8a-2884-4318-82df-35cb430f8587' started

#######################################################################################


initializing
Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead.

ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='17e4097d-c4b5-41fe-bba5-3e12af95041e'
------------------------------------------------------------------------------------------------


Scoring URL:https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/ml/v4/deployments/17e4097d-c4b5-41fe-bba5-3e12af95041e/predictions
Model id: df904b8a-2884-4318-82df-35cb430f8587
Deployment id: 17e4097d-c4b5-41fe-bba5-3e12af95041e


## Score the model

In [31]:
fields = ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose", "LoanAmount", "ExistingSavings",
                  "EmploymentDuration", "InstallmentPercent", "Sex", "OthersOnLoan", "CurrentResidenceDuration",
                  "OwnsProperty", "Age", "InstallmentPlans", "Housing", "ExistingCreditsCount", "Job", "Dependents",
                  "Telephone", "ForeignWorker"]
values = [
            ["no_checking", 13, "credits_paid_to_date", "car_new", 1343, "100_to_500", "1_to_4", 2, "female", "none", 3,
             "savings_insurance", 46, "none", "own", 2, "skilled", 1, "none", "yes"],
            ["no_checking", 24, "prior_payments_delayed", "furniture", 4567, "500_to_1000", "1_to_4", 4, "male", "none",
             4, "savings_insurance", 36, "none", "free", 2, "management_self-employed", 1, "none", "yes"],
        ]

scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
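
The scoring payload pairs a list of field names with rows of values in the same order. If your records are dictionaries rather than ordered lists, a small helper can build the same structure and catch missing fields early. This is a sketch only; `build_scoring_payload` is a hypothetical helper, not part of the WML client, and the demo uses just three of the twenty fields.

```python
def build_scoring_payload(records, fields):
    """Convert dict records into the {"input_data": [{"fields", "values"}]} layout."""
    values = []
    for index, record in enumerate(records):
        missing = [field for field in fields if field not in record]
        if missing:
            raise KeyError(f"record {index} is missing fields: {missing}")
        # Order the values to match the order of `fields`.
        values.append([record[field] for field in fields])
    return {"input_data": [{"fields": list(fields), "values": values}]}


demo_fields = ["CheckingStatus", "LoanDuration", "Sex"]
demo_records = [
    {"Sex": "female", "CheckingStatus": "no_checking", "LoanDuration": 13},
    {"Sex": "male", "CheckingStatus": "no_checking", "LoanDuration": 24},
]
payload = build_scoring_payload(demo_records, demo_fields)
print(payload["input_data"][0]["values"])
# [['no_checking', 13, 'female'], ['no_checking', 24, 'male']]
```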

In [32]:
predictions = wml_client.deployments.score(deployment_uid, scoring_payload)
predictions

{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [['Risk', [0.4784697281865681, 0.5215302718134319]],
    ['No Risk', [0.5958891012877514, 0.40411089871224865]]]}]}
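
The response mirrors the request layout: a list of field names plus one row per scored record. The sketch below flattens the response shown above into `(label, confidence)` pairs. It assumes that the highest value in the probability list belongs to the predicted class, which matches the sample output but is not confirmed by the source; `summarize` is a hypothetical helper.

```python
# Response copied from the output above.
response = {
    "predictions": [
        {
            "fields": ["prediction", "probability"],
            "values": [
                ["Risk", [0.4784697281865681, 0.5215302718134319]],
                ["No Risk", [0.5958891012877514, 0.40411089871224865]],
            ],
        }
    ]
}


def summarize(scoring_response):
    """Return (predicted label, highest class probability) for every scored row."""
    rows = []
    for block in scoring_response["predictions"]:
        prediction_idx = block["fields"].index("prediction")
        probability_idx = block["fields"].index("probability")
        for row in block["values"]:
            rows.append((row[prediction_idx], max(row[probability_idx])))
    return rows


print(summarize(response))
# [('Risk', 0.5215302718134319), ('No Risk', 0.5958891012877514)]
```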

# Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [33]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )
#Create client for the default instance id 00000000-0000-0000-0000-000000000000
wos_client = APIClient(service_url=WOS_CREDENTIALS['url'], authenticator=authenticator)
wos_client.version

'3.0.40'

## Create schema and datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If database credentials were supplied, the datamart will be created there unless there is an existing datamart. If an OpenScale datamart exists, the existing datamart will be used and no data will be overwritten.

Prior instances of the German Credit model will be removed from OpenScale monitoring.

In [34]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000,Data Mart created by OpenScale ExpressPath,False,active,2024-06-04 05:19:03.698000+00:00,00000000-0000-0000-0000-000000000000


In [35]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DB_CREDENTIALS is not None:
        if SCHEMA_NAME is None: 
            # Stop here instead of continuing with a missing schema name
            raise ValueError("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.DB2,  # DB_CREDENTIALS above describe Db2 on CP4D
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DB_CREDENTIALS['hostname'],
                        username=DB_CREDENTIALS['username'],
                        password=DB_CREDENTIALS['password'],
                        db=DB_CREDENTIALS['database'],
                        port=DB_CREDENTIALS['port'],
                        ssl=True,
                        sslmode=DB_CREDENTIALS['sslmode'],
                        certificate_base64=DB_CREDENTIALS['certificate_base64']
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))

Using existing datamart 00000000-0000-0000-0000-000000000000
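
The branching in the cell above can be summarized as a small decision function: reuse an existing datamart if there is one, otherwise create an external datamart when database credentials are supplied, otherwise fall back to the internal database. The sketch below only models that decision; `datamart_action` is a hypothetical helper and makes no OpenScale calls.

```python
def datamart_action(existing_ids, db_credentials, schema_name):
    """Return (action, detail) describing which datamart path would be taken."""
    if existing_ids:
        # An existing datamart is reused and nothing is overwritten.
        return ("reuse", existing_ids[0])
    if db_credentials is not None:
        if not schema_name:
            raise ValueError("SCHEMA_NAME is required for an external datamart")
        return ("external", schema_name)
    return ("internal", None)


print(datamart_action(["00000000-0000-0000-0000-000000000000"], None, None))
# ('reuse', '00000000-0000-0000-0000-000000000000')
print(datamart_action([], {"hostname": "example"}, "AIOSFASTPATHICP"))
# ('external', 'AIOSFASTPATHICP')
print(datamart_action([], None, None))
# ('internal', None)
```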


### Remove the existing service provider connected to the WML instance

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating service providers for the WML instance used in this tutorial, the following code deletes any existing service provider(s) with the same name and then adds a new one. 

In [36]:
SERVICE_PROVIDER_NAME = "WML - for training data"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook."

In [37]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model.

**Note:** You can bind more than one engine instance if needed by calling the `wos_client.service_providers.add` method. You can then refer to a particular service provider using its `service_provider_id`.

In [38]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = WML_SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCP4D(),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider 1b7f6643-f6ac-4754-9f39-e8dbe5b09232 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [39]:
# Generate training stats
# Setup info for generating training stats for GCR Model
from ibm_watson_openscale.utils.training_stats import TrainingStats
service_configuration_support = {
    "enable_fairness": True,
    "enable_explainability": True,
    "enable_drift": True
}

training_data_info = {
    "class_label": "Risk",
    "feature_columns": ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose", "LoanAmount", "ExistingSavings", "EmploymentDuration", "InstallmentPercent", "Sex", "OthersOnLoan", "CurrentResidenceDuration", "OwnsProperty", "Age", "InstallmentPlans", "Housing", "ExistingCreditsCount", "Job", "Dependents", "Telephone", "ForeignWorker"],
    "categorical_columns": ['Age', 'CheckingStatus', 'CreditHistory', 'CurrentResidenceDuration', 'Dependents', 'EmploymentDuration', 'ExistingCreditsCount', 'ExistingSavings', 'ForeignWorker', 'Housing', 'InstallmentPercent', 'InstallmentPlans', 'Job', 'LoanAmount', 'LoanDuration', 'LoanPurpose', 'OthersOnLoan', 'OwnsProperty', 'Sex', 'Telephone']
}

fairness_attributes = [{
   "feature": "Sex", 
   "majority": [
       "male"
   ],
   "minority": [
       "female"
   ],
   "threshold": 0.8
}]

model_type = "binary"
parameters = {
    "favourable_class" :  [ "No Risk" ],
    "unfavourable_class": [ "Risk" ]
}
min_records = 10

# Generate Training stats
enable_explainability = service_configuration_support.get('enable_explainability')
enable_fairness = service_configuration_support.get('enable_fairness')
data_df = pd.read_csv('german_credit_data_biased_training.csv')
training_data_stats = None
if enable_explainability or enable_fairness:
    fairness_inputs = None
    if enable_fairness:
        fairness_inputs = {
                "fairness_attributes": fairness_attributes,
                "min_records" : min_records,
                "favourable_class" :  parameters["favourable_class"],
                "unfavourable_class": parameters["unfavourable_class"]
            }

    input_parameters = {
        "probability_column": "probability",
        "prediction_column": "prediction",
        "label_column": training_data_info["class_label"],
        "feature_columns": training_data_info["feature_columns"],
        "categorical_columns": training_data_info["categorical_columns"],
        "fairness_inputs": fairness_inputs,  
        "problem_type" : model_type  
    }

    training_stats = TrainingStats(data_df,input_parameters, explain=enable_explainability, fairness=enable_fairness, drop_na=True)
    training_data_stats = training_stats.get_training_statistics()
    training_data_stats["notebook_version"] = 5.0 

print(training_data_stats)

{'common_configuration': {'problem_type': 'binary', 'label_column': 'Risk', 'prediction': 'prediction', 'probability': ['probability'], 'input_data_schema': {'type': 'struct', 'fields': [{'name': 'CheckingStatus', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'LoanDuration', 'type': 'long', 'nullable': True, 'metadata': {}}, {'name': 'CreditHistory', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'LoanPurpose', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'LoanAmount', 'type': 'long', 'nullable': True, 'metadata': {}}, {'name': 'ExistingSavings', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'EmploymentDuration', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'InstallmentPercent', 'type': 'long', 'nullable': True, 'metadata': {}}, {'name': 'Sex', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'OthersOnLoan', 'type': 'string', 'nullable': True, 'metadata': {}}, {'name': 'CurrentResidenceDuratio
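
The fairness configuration in the cell above names a monitored feature (`Sex`), a majority and a minority group, the favourable outcome (`No Risk`), and a threshold of 0.8. One common way to read such a threshold is as a disparate impact ratio: the favourable-outcome rate of the minority group divided by that of the majority group. The sketch below computes this ratio on made-up rows; it is an illustration under that assumption, and the statistics OpenScale actually computes may differ.

```python
def favourable_rate(rows, feature, group_values, label_key, favourable):
    """Share of rows in the given group whose label is a favourable outcome."""
    group = [row for row in rows if row[feature] in group_values]
    if not group:
        raise ValueError(f"no rows found for {feature} in {group_values}")
    hits = sum(1 for row in group if row[label_key] in favourable)
    return hits / len(group)


def disparate_impact(rows, attribute, label_key, favourable):
    """Minority favourable rate divided by majority favourable rate."""
    minority = favourable_rate(rows, attribute["feature"], attribute["minority"], label_key, favourable)
    majority = favourable_rate(rows, attribute["feature"], attribute["majority"], label_key, favourable)
    return minority / majority


# Hypothetical toy data: 4 of 5 male and 3 of 5 female applicants are "No Risk".
toy_rows = (
    [{"Sex": "male", "Risk": "No Risk"}] * 4
    + [{"Sex": "male", "Risk": "Risk"}]
    + [{"Sex": "female", "Risk": "No Risk"}] * 3
    + [{"Sex": "female", "Risk": "Risk"}] * 2
)
attribute = {"feature": "Sex", "majority": ["male"], "minority": ["female"], "threshold": 0.8}

ratio = disparate_impact(toy_rows, attribute, label_key="Risk", favourable=["No Risk"])
print(f"disparate impact: {ratio:.2f}")                    # disparate impact: 0.75
print("below threshold:", ratio < attribute["threshold"])  # below threshold: True
```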

In [40]:

prediction_column = "prediction"
probability_columns = ['probability']
predicted_target_column = "prediction"
#predicted_target_column = "***"
subscription_details = wos_client.subscriptions.add(data_mart_id,
    service_provider_id,
    asset = None,
    deployment = None,
    training_data_stats=training_data_stats,
    deployment_id = deployment_uid,
    deployment_space_id = WML_SPACE_ID,
    prediction_field = prediction_column,
    #predicted_target_field = predicted_target_column,
    probability_fields = probability_columns,background_mode = False).result

subscription_id = subscription_details.metadata.id
print(subscription_details)
print("Subscription id {}".format(subscription_id))




 Waiting for end of adding subscription bcbbcc1f-7e1c-4746-8055-128ce8d805ff 




active

-------------------------------------------
 Successfully finished adding subscription 
-------------------------------------------


{
  "metadata": {
    "id": "bcbbcc1f-7e1c-4746-8055-128ce8d805ff",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:subscription:bcbbcc1f-7e1c-4746-8055-128ce8d805ff",
    "url": "/v2/subscriptions/bcbbcc1f-7e1c-4746-8055-128ce8d805ff",
    "created_at": "2024-08-07T08:51:02.441000Z",
    "created_by": "cpadmin",
    "modified_at": "2024-08-07T08:51:04.820000Z",
    "modified_by": "cpadmin"
  },
  "entity": {
    "data_mart_id": "00000000-0000-0000-0000-000000000000",
    "service_provider_id": "1b7f6643-f6ac-4754-9f39-e8dbe5b09232",
    "asset": {
      "asset_id": "df904b8a-2884-4318-82df-35cb430f8587",
      "url": "https://internal-nginx-svc:12443/ml/v4/models/df904b8a-2884-4318-82df-35cb430f8587?space_id=e396e

In [41]:
#Score payload
from IPython.utils import io
with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/credit_risk/german_credit_feed.json -O german_credit_feed.json
!ls -lh german_credit_feed.json

import random, time
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                              target_target_id=subscription_id, 
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Guard against an empty list so that the "not found" branch below is reachable
payload_data_set_id = payload_data_sets[0].metadata.id if payload_data_sets else None
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)
    

with open('german_credit_feed.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(100):
    values.append(random.choice(scoring_data['values']))
payload_scoring = {"input_data": [{"fields": fields, "values": values}]}

scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
time.sleep(10)

#wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
#                   scoring_id=str(uuid.uuid4()),
#                   request=payload_scoring,
#                   response=scoring_response,
#                   response_time=460
#               )],background_mode=False)
#time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))



-rw-r--r--  1 nelwin  staff   2.9M Aug  7 14:21 german_credit_feed.json
Payload data set id:  16494c6f-c675-4da5-824f-330a3ab23cb8
Number of records in the payload logging table: 100
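
The cell above waits a fixed 10 seconds before counting payload records. If payload logging takes longer on your cluster, polling until the records appear may be more reliable. The following is a generic sketch using only the standard library; `wait_for` is a hypothetical helper, and the demo uses a fake counter in place of the OpenScale call.

```python
import time


def wait_for(condition, timeout=60.0, interval=2.0):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Demo: a fake record counter that reports 100 records on its third call.
calls = {"n": 0}


def fake_record_count():
    calls["n"] += 1
    return 100 if calls["n"] >= 3 else 0


print(wait_for(fake_record_count, timeout=5, interval=0.01))  # 100
```

In the notebook, the condition could be something like `lambda: wos_client.data_sets.get_records_count(payload_data_set_id) >= 100`.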


In [42]:
#Create monitors
# It will create Bias and explain monitors based on what is enabled
print("Creating monitor instances...")
response = wos_client.monitor_instances.create(monitor_definition_id = None, 
                        target = None, data_mart_id = data_mart_id, training_data_stats=training_data_stats, 
                        subscription_id=subscription_id,background_mode=False, parameters={})
print(response)

Creating monitor instances...
Creating fairness monitor



 Waiting for end of monitor instance creation a4d97b96-32a0-43eb-96f2-df364c2293b8 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------


Creating Explain monitor



 Waiting for end of monitor instance creation 67c3459a-c9dc-4fa8-bf60-93622e330eec 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------


{'fairness': {'metadata': {'id': 'a4d97b96-32a0-43eb-96f2-df364c2293b8', 'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:monitor_instance:a4d97b96-32a0-43eb-96f2-df364c2293b8', 'url': '/v2/monitor_instances/a4d97b96-32a0-43eb-96f2-df364c2293b8', 'created_at': '2024-08-07T08:51:59.579000Z', 'created_by': 'cpadmin', 'modified_at': '2024-08-07T08:52:00.714000Z', 'modified_by': 'internal-service'}, 'entity': {'data_mart_id