<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

This notebook should be run in a Watson Studio project, using the Default Python 3.7.x runtime environment. If you are viewing this in Watson Studio and do not see Python 3.7.x in the upper right corner of your screen, please update the runtime now.

## Prerequisites
To run this notebook, you need the following accounts and service instances.

- IBMid and IBM Cloud instance
- two (2) instances of IBM Watson Machine Learning
- instance of IBM Watson OpenScale

## Provision services and configure credentials

If you have not already done so, provision an instance of IBM Watson OpenScale and two instances of IBM Watson Machine Learning using the Cloud catalog.


Your Cloud API key can be generated by going to the Users section of the Cloud console. From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below.

NOTE: You can also get the OpenScale API key using the IBM Cloud CLI.

How to install the IBM Cloud (Bluemix) CLI: [Instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

 
**Connection to WML**
    
Authenticate the Watson Machine Learning service on IBM Cloud. You need to provide the platform API key and the instance location.

You can use the IBM Cloud CLI to retrieve both values.

Generate an API key as follows:
```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```
From the output, get the value of `api_key`.

Location of your WML instance can be retrieved in the following way:
```
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instances
ibmcloud resource service-instance WML_INSTANCE_NAME
ibmcloud resource service-instance COS_INSTANCE_NAME
```
From the output, get the value of `location`.

The output also includes the CRN (ID) and Name (name) of the service instance, which can be used in the next steps.

You can also get a service-specific API key by going to the Service IDs section of the Cloud console. From that page, click Create, then copy the created key and paste it below.

In [None]:
######################################################################################
# Paste your IBM Cloud API key, WML instance name, and WML CRN in the following
# fields, then run this cell.
######################################################################################
CLOUD_API_KEY = "***"
WML_INSTANCE_NAME="***"
WML_CRN="***"

In [None]:
COS_API_KEY_ID = "***"
COS_RESOURCE_CRN = "***" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "***" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
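
A CRN pasted into the wrong variable (for example, the WML CRN in `COS_RESOURCE_CRN`) is easy to miss. The following is a minimal sketch, not part of the original notebook, that splits a CRN into its colon-separated segments so the service name can be checked by eye. The helper name `parse_crn` and the segment labels are assumptions chosen for illustration; the sample value is the example CRN from the comment above.

```python
from typing import Dict

# Segment labels for a ten-part CRN; the label names are illustrative.
CRN_SEGMENTS = [
    "prefix", "version", "cname", "ctype", "service_name",
    "location", "scope", "service_instance", "resource_type", "resource",
]

def parse_crn(crn: str) -> Dict[str, str]:
    """Split a CRN string into its ten colon-separated segments."""
    parts = crn.split(":")
    if len(parts) != len(CRN_SEGMENTS) or parts[0] != "crn":
        raise ValueError("Not a well-formed CRN: {!r}".format(crn))
    return dict(zip(CRN_SEGMENTS, parts))

# Example CRN taken from the COS_RESOURCE_CRN comment in this notebook.
example_crn = (
    "crn:v1:bluemix:public:cloud-object-storage:global:"
    "a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
)
parsed = parse_crn(example_crn)
print(parsed["service_name"])      # cloud-object-storage
print(parsed["service_instance"])  # d6f04d83-6c4f-4a62-a165-696756d63903
```

For `COS_RESOURCE_CRN`, the `service_name` segment should read `cloud-object-storage`; a different value suggests the wrong CRN was copied.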

In [None]:
BUCKET_NAME = "***" #example: "credit-risk-training-data"
training_data_file_name="german_credit_data_biased_training.csv"

In [None]:
WML_CREDENTIALS = {
                   "url": "https://us-south.ml.cloud.ibm.com",
                   "apikey": CLOUD_API_KEY
}

In [None]:
DB_CREDENTIALS=None
#DB_CREDENTIALS= {"hostname":"","username":"","password":"","database":"","port":"","ssl":True,"sslmode":"","certificate_base64":""}

In [None]:
KEEP_MY_INTERNAL_POSTGRES = True

In [None]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"
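
Several of the cells above ship with `"***"` placeholders, and a forgotten one usually surfaces much later as an authentication error. As a hedged sketch (the helper `find_unset` is hypothetical and not part of the notebook), the check below reports which settings are still empty or set to the placeholder. The sample values are made up; in the notebook you would pass the real variables instead.

```python
def find_unset(settings):
    """Return the names whose value is None, empty, or still the '***' placeholder."""
    return sorted(
        name
        for name, value in settings.items()
        if value is None or str(value).strip() in ("", "***")
    )

# Illustrative values only, standing in for the notebook's variables.
sample_settings = {
    "CLOUD_API_KEY": "abc123",
    "WML_CRN": "***",
    "BUCKET_NAME": "",
}
missing = find_unset(sample_settings)
print(missing)  # ['BUCKET_NAME', 'WML_CRN']
```

`DB_CREDENTIALS` is intentionally `None` when the internal database is used, so it should be left out of such a check.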

## Package installation
The following open source packages must be installed into this notebook instance so that they are available to use during processing.

In [None]:
!pip install pyspark==2.4.0 --no-cache | tail -n 1

In [None]:
!rm -rf /home/spark/shared/user-libs/python3.7*

!pip install --upgrade pandas==1.2.3 --no-cache | tail -n 1
!pip install --upgrade requests==2.23 --no-cache | tail -n 1
!pip install --upgrade numpy==1.20.3 --user --no-cache | tail -n 1
!pip install SciPy --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1

!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

In [None]:
import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Load the training data from GitHub
So you don't have to manually generate training data, we've provided a sample and placed it in a publicly available GitHub repo.

In [None]:
!rm -f german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/credit_risk/german_credit_data_biased_training.csv

In [None]:
import pandas as pd
pd_data = pd.read_csv("german_credit_data_biased_training.csv", sep=",", header=0)

In [None]:
import ibm_boto3
from ibm_botocore.client import Config, ClientError

cos_client = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_RESOURCE_CRN,
    ibm_auth_endpoint="https://iam.bluemix.net/oidc/token",
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

In [None]:
with open(training_data_file_name, "rb") as file_data:
    cos_client.Object(BUCKET_NAME, training_data_file_name).upload_fileobj(
        Fileobj=file_data
    )

## Deploy the Spark Credit Risk Model to Watson Machine Learning

The following cell deploys the Spark version of the German Credit Risk Model to the specified Machine Learning instance in the specified deployment space. You'll notice that this version of the German Credit Risk model has an auc-roc score around 71%.

In [None]:
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

In [None]:
wml_client.spaces.list(limit=10)

In [None]:
space_name ="pre-prod-space"
spaces = wml_client.spaces.get_details()['resources']
preprod_space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        preprod_space_id = space["metadata"]["id"]
if preprod_space_id is None:
    preprod_space_id = wml_client.spaces.store(
        meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: space_name,
                   wml_client.spaces.ConfigurationMetaNames.STORAGE: {"resource_crn":COS_RESOURCE_CRN},
                   wml_client.spaces.ConfigurationMetaNames.COMPUTE: {"name": WML_INSTANCE_NAME,
                                            "crn": WML_CRN}})["metadata"]["id"]
wml_client.set.default_space(preprod_space_id)
print(preprod_space_id)
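
The cell above follows a find-or-create pattern: look for a space by name, and only create one if none is found. The lookup half can be sketched on its own as a pure function over the same response shape the cell reads (`entity.name` and `metadata.id`). The function name `find_space_id` and the sample data are assumptions for illustration, and unlike the loop above, this version returns the first match rather than the last.

```python
def find_space_id(spaces, name):
    """Return the ID of the first space whose name matches, or None if absent."""
    for space in spaces:
        if space["entity"]["name"] == name:
            return space["metadata"]["id"]
    return None

# Made-up records shaped like the 'resources' list read in the cell above.
sample_spaces = [
    {"metadata": {"id": "id-1"}, "entity": {"name": "prod-space"}},
    {"metadata": {"id": "id-2"}, "entity": {"name": "pre-prod-space"}},
]
found = find_space_id(sample_spaces, "pre-prod-space")
not_found = find_space_id(sample_spaces, "staging-space")
print(found)      # id-2
print(not_found)  # None
```

A `None` result is the signal to fall through to the creation branch.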

In [None]:
def deploy_credit_risk_spark_model(wml_credentials, model_name, deployment_name,space_id):

    import numpy 
    numpy.version.version

    import pandas as pd
    import json

    from pyspark import SparkContext, SQLContext
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier,GBTClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler, IndexToString
    from pyspark.sql.types import StructType, DoubleType, StringType, ArrayType

    from pyspark.sql import SparkSession
    from pyspark import SparkFiles

    spark = SparkSession.builder.getOrCreate()
    pd_data = pd.read_csv("german_credit_data_biased_training.csv", sep=",", header=0)
    spark_df = spark.read.csv(path="german_credit_data_biased_training.csv", sep=",", header=True, inferSchema=True)
    spark_df.head()

    (train_data, test_data) = spark_df.randomSplit([0.9, 0.1], 24)
    print("Number of records for training: " + str(train_data.count()))
    print("Number of records for evaluation: " + str(test_data.count()))

    si_CheckingStatus = StringIndexer(inputCol='CheckingStatus', outputCol='CheckingStatus_IX')
    si_CreditHistory = StringIndexer(inputCol='CreditHistory', outputCol='CreditHistory_IX')
    si_LoanPurpose = StringIndexer(inputCol='LoanPurpose', outputCol='LoanPurpose_IX')
    si_ExistingSavings = StringIndexer(inputCol='ExistingSavings', outputCol='ExistingSavings_IX')
    si_EmploymentDuration = StringIndexer(inputCol='EmploymentDuration', outputCol='EmploymentDuration_IX')
    si_Sex = StringIndexer(inputCol='Sex', outputCol='Sex_IX')
    si_OthersOnLoan = StringIndexer(inputCol='OthersOnLoan', outputCol='OthersOnLoan_IX')
    si_OwnsProperty = StringIndexer(inputCol='OwnsProperty', outputCol='OwnsProperty_IX')
    si_InstallmentPlans = StringIndexer(inputCol='InstallmentPlans', outputCol='InstallmentPlans_IX')
    si_Housing = StringIndexer(inputCol='Housing', outputCol='Housing_IX')
    si_Job = StringIndexer(inputCol='Job', outputCol='Job_IX')
    si_Telephone = StringIndexer(inputCol='Telephone', outputCol='Telephone_IX')
    si_ForeignWorker = StringIndexer(inputCol='ForeignWorker', outputCol='ForeignWorker_IX')
    si_Label = StringIndexer(inputCol="Risk", outputCol="label").fit(spark_df)
    label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)

    va_features = VectorAssembler(
    inputCols=["CheckingStatus_IX", "CreditHistory_IX", "LoanPurpose_IX", "ExistingSavings_IX",
               "EmploymentDuration_IX", "Sex_IX", "OthersOnLoan_IX", "OwnsProperty_IX", "InstallmentPlans_IX",
               "Housing_IX", "Job_IX", "Telephone_IX", "ForeignWorker_IX", "LoanDuration", "LoanAmount",
               "InstallmentPercent", "CurrentResidenceDuration", "LoanDuration", "Age", "ExistingCreditsCount",
               "Dependents"], outputCol="features")

    classifier=GBTClassifier(featuresCol="features")

    pipeline = Pipeline(
    stages=[si_CheckingStatus, si_CreditHistory, si_EmploymentDuration, si_ExistingSavings, si_ForeignWorker,
            si_Housing, si_InstallmentPlans, si_Job, si_LoanPurpose, si_OthersOnLoan,
            si_OwnsProperty, si_Sex, si_Telephone, si_Label, va_features, classifier, label_converter])

    model = pipeline.fit(train_data)
    predictions = model.transform(test_data)
    evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction")
    auc = evaluator.evaluate(predictions)

    # BinaryClassificationEvaluator reports areaUnderROC by default, not accuracy
    print("Area under ROC = %g" % auc)

    from ibm_watson_machine_learning import APIClient

    # Use the credentials passed to this function rather than the global variable
    wml_client = APIClient(wml_credentials)
    wml_client.version
    wml_client.set.default_space(space_id)
    

    # Remove existing model and deployment
    MODEL_NAME=model_name
    DEPLOYMENT_NAME=deployment_name

    deployments_list = wml_client.deployments.get_details()
    for deployment in deployments_list["resources"]:
        model_id = deployment["entity"]["asset"]["id"]
        deployment_id = deployment["metadata"]["id"]
        if deployment["metadata"]["name"] == DEPLOYMENT_NAME:
            print("Deleting deployment id", deployment_id)
            wml_client.deployments.delete(deployment_id)
            print("Deleting model id", model_id)
            wml_client.repository.delete(model_id)
    wml_client.repository.list_models()
    
    training_data_reference = [
                    {
                        "id": "credit risk",
                        "type": "s3",
                        "connection": {
                            "access_key_id": COS_API_KEY_ID,
                            "endpoint_url": COS_ENDPOINT,
                            "resource_instance_id":COS_RESOURCE_CRN
                        },
                        "location": {
                            "bucket": BUCKET_NAME,
                            "path": training_data_file_name,
                        }
                    }
                ]

    # Save Model
    software_spec_uid = wml_client.software_specifications.get_id_by_name("spark-mllib_2.4")
    print("Software Specification ID: {}".format(software_spec_uid))
    model_props = {
            wml_client._models.ConfigurationMetaNames.NAME:"{}".format(MODEL_NAME),
            #wml_client._models.ConfigurationMetaNames.SPACE_UID: space_id,
            wml_client._models.ConfigurationMetaNames.TYPE: "mllib_2.4",
            wml_client._models.ConfigurationMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
            wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: training_data_reference,
            wml_client._models.ConfigurationMetaNames.LABEL_FIELD: "Risk",
        }

    print("Storing model ...")
    published_model_details = wml_client.repository.store_model(
        model=model, 
        meta_props=model_props, 
        training_data=train_data, 
        pipeline=pipeline)

    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")
    print("Model ID: {}".format(model_uid))


    # Deploy model
    deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(DEPLOYMENT_NAME),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
    )
    scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
    deployment_uid=wml_client.deployments.get_uid(deployment_details)

    print("Scoring URL:" + scoring_url)
    print("Model id: {}".format(model_uid))
    print("Deployment id: {}".format(deployment_uid))

    fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
    values = [
      ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
      ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
      ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
      ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
      ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
      ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
      ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
      ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
    ]

    scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
    #print(scoring_payload)

    scoring_response = wml_client.deployments.score(deployment_uid, scoring_payload)
    print(scoring_response)
    
    return model_uid, deployment_uid, scoring_url
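
The "Remove existing model and deployment" step in the function above deletes every deployment whose name matches, along with the model it points to. A minimal sketch of the selection logic, separated from the delete calls so it can be inspected first, might look like the following. The function name `select_cleanup_targets` and the sample records are hypothetical; the dictionary keys are the ones the function above reads.

```python
def select_cleanup_targets(deployments_details, deployment_name):
    """Return (deployment_id, model_id) pairs for deployments with the given name."""
    targets = []
    for deployment in deployments_details.get("resources", []):
        if deployment["metadata"]["name"] == deployment_name:
            targets.append(
                (deployment["metadata"]["id"], deployment["entity"]["asset"]["id"])
            )
    return targets

# Made-up details shaped like the dictionary the function above iterates over.
sample_details = {
    "resources": [
        {"metadata": {"id": "dep-1", "name": "German Credit Risk Model - PreProd"},
         "entity": {"asset": {"id": "model-1"}}},
        {"metadata": {"id": "dep-2", "name": "Some other deployment"},
         "entity": {"asset": {"id": "model-2"}}},
    ]
}
targets = select_cleanup_targets(sample_details, "German Credit Risk Model - PreProd")
print(targets)  # [('dep-1', 'model-1')]
```

Printing the targets before deleting is a cheap way to confirm that only the tutorial's own deployments are affected.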


## Deploy the Scikit-Learn Credit Risk Model to Watson Machine Learning

The following cell deploys the Scikit-learn version of the German Credit Risk Model to the specified Machine Learning instance in the specified deployment space. This version of the German Credit Risk model has an auc-roc score around 85% and will be called the "Challenger."

In [None]:
def deploy_credit_risk_scikit_model(wml_credentials, model_name, deployment_name,space_id):

    import pandas as pd
    import json
    import sys
    import numpy
    import sklearn
    import sklearn.ensemble
    numpy.set_printoptions(threshold=sys.maxsize)
    from sklearn.utils.multiclass import type_of_target
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler, OrdinalEncoder
    from sklearn.compose import ColumnTransformer
    from sklearn.model_selection import cross_validate
    from sklearn.metrics import get_scorer
    from sklearn.metrics import classification_report

    data_df=pd.read_csv ("german_credit_data_biased_training.csv")

    data_df.head()

    target_label_name = "Risk"
    feature_cols= data_df.drop(columns=[target_label_name])
    label= data_df[target_label_name]

    # Set model evaluation properties
    optimization_metric = 'roc_auc'
    random_state = 33
    cv_num_folds = 3
    holdout_fraction = 0.1

    if type_of_target(label.values) in ['multiclass', 'binary']:
        X_train, X_holdout, y_train, y_holdout = train_test_split(feature_cols, label, test_size=holdout_fraction, random_state=random_state, stratify=label.values)
    else:
        X_train, X_holdout, y_train, y_holdout = train_test_split(feature_cols, label, test_size=holdout_fraction, random_state=random_state)

    # Data preprocessing transformer generation

    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('OrdinalEncoder', OrdinalEncoder(categories='auto',dtype=numpy.float64 ))])

    numeric_features = feature_cols.select_dtypes(include=['int64', 'float64']).columns
    categorical_features = feature_cols.select_dtypes(include=['object']).columns

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])

    # Initiate model and create pipeline
    model=sklearn.ensemble.GradientBoostingClassifier()
    gbt_pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', model)])
    model_gbt=gbt_pipeline.fit(X_train, y_train)

    y_pred = model_gbt.predict(X_holdout)


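    # Evaluate model performance on the holdout data
    scorer = get_scorer(optimization_metric)
    print("Holdout {} = {:.4f}".format(optimization_metric, scorer(model_gbt, X_holdout, y_holdout)))

    # Cross-validation with cv_num_folds folds
    cv_results = cross_validate(model_gbt, X_train, y_train, cv=cv_num_folds, scoring={optimization_metric: scorer})
    print("Mean cross-validation {} = {:.4f}".format(optimization_metric, numpy.mean(cv_results['test_' + optimization_metric])))

    # classification_report expects (y_true, y_pred)
    print(classification_report(y_holdout, y_pred))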


    # Initiate WML
    from ibm_watson_machine_learning import APIClient

    # Use the credentials passed to this function rather than the global variable
    wml_client = APIClient(wml_credentials)
    wml_client.version
    wml_client.set.default_space(space_id)
    

    # Remove existing model and deployment
    MODEL_NAME=model_name
    DEPLOYMENT_NAME=deployment_name

    deployments_list = wml_client.deployments.get_details()
    for deployment in deployments_list["resources"]:
        model_id = deployment["entity"]["asset"]["id"]
        deployment_id = deployment["metadata"]["id"]
        if deployment["metadata"]["name"] == DEPLOYMENT_NAME:
            print("Deleting deployment id", deployment_id)
            wml_client.deployments.delete(deployment_id)
            print("Deleting model id", model_id)
            wml_client.repository.delete(model_id)
    wml_client.repository.list_models()
    
    # Store Model
    #Note if there is specification related exception or specification ID is None then use "default_py3.8" instead of "default_py3.7_opence"
    software_spec_uid = wml_client.software_specifications.get_id_by_name("default_py3.7_opence")
    print("Software Specification ID: {}".format(software_spec_uid))
    
    training_data_reference = [
                    {
                        "id": "credit risk",
                        "type": "s3",
                        "connection": {
                            "access_key_id": COS_API_KEY_ID,
                            "endpoint_url": COS_ENDPOINT,
                            "resource_instance_id":COS_RESOURCE_CRN
                        },
                        "location": {
                            "bucket": BUCKET_NAME,
                            "path": training_data_file_name,
                        }
                    }
                ]


    model_props = {
        wml_client._models.ConfigurationMetaNames.NAME:"{}".format(MODEL_NAME),
        #wml_client._models.ConfigurationMetaNames.SPACE_UID: space_id,
        wml_client._models.ConfigurationMetaNames.TYPE: "scikit-learn_0.23",
        wml_client._models.ConfigurationMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
        wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: training_data_reference,
        wml_client._models.ConfigurationMetaNames.LABEL_FIELD: "Risk",
    }
    
    print("Storing model ...")

    published_model_details = wml_client.repository.store_model(model=model_gbt, meta_props=model_props, training_data=feature_cols, training_target=label)
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")
    print("Model ID: {}".format(model_uid))



    # Deploy model
    print("Deploying model...")
    deployment_details = wml_client.deployments.create(
        model_uid, 
        meta_props={
            wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(DEPLOYMENT_NAME),
            wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
        }
    )
    scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
    deployment_uid=wml_client.deployments.get_uid(deployment_details)

    print("Scoring URL:" + scoring_url)
    print("Model id: {}".format(model_uid))
    print("Deployment id: {}".format(deployment_uid))

    # Sample scoring
    fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
    values = [
      ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
      ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
      ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
      ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
      ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
      ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
      ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
      ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
    ]

    payload_scoring = {"input_data": [{"fields": fields, "values": values}]}
    #print(payload_scoring)

    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    print(scoring_response)

    return model_uid, deployment_uid, scoring_url
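
Both deployment functions score the model with a payload of the form `{"input_data": [{"fields": [...], "values": [[...], ...]}]}`, where each row in `values` must follow the order of `fields`. When records are available as dictionaries, that ordering can be enforced mechanically. The helper below is a sketch under that assumption; `build_scoring_payload` is a hypothetical name, and the two-field sample is shortened for readability.

```python
def build_scoring_payload(records, fields):
    """Build a scoring payload whose rows follow the order of `fields`."""
    # A KeyError here means a record is missing one of the expected fields.
    values = [[record[field] for field in fields] for record in records]
    return {"input_data": [{"fields": list(fields), "values": values}]}

# Shortened example: two of the twenty feature columns used in this notebook.
sample_fields = ["CheckingStatus", "LoanDuration"]
sample_records = [
    {"LoanDuration": 13, "CheckingStatus": "no_checking"},
    {"LoanDuration": 26, "CheckingStatus": "0_to_200"},
]
payload = build_scoring_payload(sample_records, sample_fields)
print(payload["input_data"][0]["values"])
# [['no_checking', 13], ['0_to_200', 26]]
```

Building rows from `fields` avoids silently misaligned columns when the key order of a record differs from the order the model was trained on.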


# Deploy the models

The following cells will deploy both the PreProd and Challenger models into the WML instance that is designated as Pre-Production.

In [None]:
PRE_PROD_MODEL_NAME="German Credit Risk Model - PreProd"
PRE_PROD_DEPLOYMENT_NAME="German Credit Risk Model - PreProd"

PRE_PROD_CHALLENGER_MODEL_NAME="German Credit Risk Model - Challenger"
PRE_PROD_CHALLENGER_DEPLOYMENT_NAME="German Credit Risk Model - Challenger"

In [None]:
pre_prod_model_uid, pre_prod_deployment_uid, pre_prod_scoring_url = deploy_credit_risk_spark_model(WML_CREDENTIALS, PRE_PROD_MODEL_NAME, PRE_PROD_DEPLOYMENT_NAME,preprod_space_id)

In [None]:
challenger_model_uid, challenger_deployment_uid, challenger_scoring_url = deploy_credit_risk_scikit_model(WML_CREDENTIALS, PRE_PROD_CHALLENGER_MODEL_NAME, PRE_PROD_CHALLENGER_DEPLOYMENT_NAME,preprod_space_id)

# Configure OpenScale 
The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator,BearerTokenAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = IAMAuthenticator(apikey=CLOUD_API_KEY)
#authenticator = BearerTokenAuthenticator(bearer_token=IAM_TOKEN) ## uncomment if using IAM token
wos_client = APIClient(authenticator=authenticator)
wos_client.version

## Create schema and datamart

### Set up datamart
Watson OpenScale uses a database to store payload logs and calculated metrics. If database credentials were not supplied above, the notebook will use the free, internal lite database. If database credentials were supplied, the datamart will be created there unless there is an existing datamart and the KEEP_MY_INTERNAL_POSTGRES variable is set to True. If an OpenScale datamart exists in Db2 or PostgreSQL, the existing datamart will be used and no data will be overwritten.

Prior instances of the German Credit model will be removed from OpenScale monitoring.

In [None]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DB_CREDENTIALS is not None:
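        # SCHEMA_NAME is not set earlier in this notebook; define it before using an external database
        if globals().get("SCHEMA_NAME") is None:
            raise ValueError("Please specify the SCHEMA_NAME and rerun the cell")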

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.POSTGRESQL,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DB_CREDENTIALS['hostname'],
                        username=DB_CREDENTIALS['username'],
                        password=DB_CREDENTIALS['password'],
                        db=DB_CREDENTIALS['database'],
                        port=DB_CREDENTIALS['port'],
                        ssl=True,
                        sslmode=DB_CREDENTIALS['sslmode'],
                        certificate_base64=DB_CREDENTIALS['certificate_base64']
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))
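
The branching in the cell above reduces to three outcomes: reuse an existing datamart, create one in an external database, or create one in the internal database. As a rough sketch of that decision only (the function name `choose_data_mart_action` and its return labels are made up for illustration, and no OpenScale calls are involved):

```python
def choose_data_mart_action(existing_count, db_credentials):
    """Mirror the branching of the datamart cell as a pure decision function."""
    if existing_count > 0:
        return "use_existing"      # an existing datamart always wins
    if db_credentials is not None:
        return "create_external"   # external database credentials were supplied
    return "create_internal"       # fall back to the internal lite database

print(choose_data_mart_action(1, None))                # use_existing
print(choose_data_mart_action(0, {"hostname": "db"}))  # create_external
print(choose_data_mart_action(0, None))                # create_internal
```

Note that the cell as written checks only whether a datamart exists and whether `DB_CREDENTIALS` is set; it does not read `KEEP_MY_INTERNAL_POSTGRES`.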

## Bind WML machine learning instance as Pre-Prod

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model. If a binding with the name set in `SERVICE_PROVIDER_NAME` already exists, this code will delete that binding and create a new one.

In [None]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning pre-prod openpage"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook."

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

In [None]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = preprod_space_id, # use pre-prod space ID
        operational_space_id = "pre_production",
        credentials=WMLCredentialsCloud(
            apikey=CLOUD_API_KEY,      ## use `apikey=IAM_TOKEN` if using IAM_TOKEN to initiate client
            url=WML_CREDENTIALS["url"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id
service_provider_id

In [None]:
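deployment_assets = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_space_id = preprod_space_id).result['resources']
# Pick the PreProd deployment by ID rather than by its position in the list
asset_deployment_details = next(asset for asset in deployment_assets if asset['metadata']['guid'] == pre_prod_deployment_uid)
asset_deployment_details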

In [None]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=pre_prod_deployment_uid,deployment_space_id=preprod_space_id)
model_asset_details_from_deployment

## Generate an IAM token

The following function generates an IAM access token used to interact with the Watson OpenScale APIs.

In [None]:
def generate_access_token():
    headers={}
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    headers["Accept"] = "application/json"
    auth = HTTPBasicAuth("bx", "bx")
    data = {
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": CLOUD_API_KEY
    }
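    response = requests.post(IAM_URL, data=data, headers=headers, auth=auth)
    response.raise_for_status()  # fail early if the API key is rejected or the IAM endpoint is unreachable
    json_data = response.json()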
    iam_access_token = json_data['access_token']
    return iam_access_token
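
IAM access tokens expire, so calling `generate_access_token()` once and reusing the result for a long session can eventually fail, while calling it before every request is wasteful. One possible middle ground is a small cache that refreshes the token shortly before it expires. The sketch below is an assumption-laden illustration: the class `TokenCache` is hypothetical, and the default lifetime of 3600 seconds is an assumed value that should be checked against the token response rather than taken as given. A stub stands in for the real fetch function so the example runs on its own.

```python
import time

class TokenCache:
    """Reuse a token until it is close to expiry, then fetch a new one."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=300,
                 clock=time.time):
        self._fetch = fetch_token          # callable returning a token string
        self._lifetime = lifetime_seconds  # assumed token lifetime
        self._margin = margin_seconds      # refresh this long before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime
        return self._token

# Stub fetcher for demonstration; in the notebook this would be generate_access_token.
fetch_count = {"n": 0}

def stub_fetch():
    fetch_count["n"] += 1
    return "token-{}".format(fetch_count["n"])

demo_cache = TokenCache(stub_fetch)
first = demo_cache.get()
second = demo_cache.get()  # served from the cache, no second fetch
print(first, second, fetch_count["n"])  # token-1 token-1 1
```

In the notebook, this would be used as `cache = TokenCache(generate_access_token)` followed by `cache.get()` wherever a token is needed.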

## Subscriptions
### Remove existing PreProd and Challenger credit risk subscriptions
This code removes previous subscriptions with name `German Credit Risk Model - PreProd` and `German Credit Risk Model - Challenger` to refresh the monitors with the new model and new data.

In [None]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_name = subscription.entity.asset.name
    if sub_model_name == PRE_PROD_MODEL_NAME or sub_model_name == PRE_PROD_CHALLENGER_MODEL_NAME :
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', subscription.entity.asset.asset_id)

In [None]:
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import ScoringEndpointRequest

In [None]:
pre_prod_subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['metadata']['url'],
            scoring_endpoint=ScoringEndpointRequest(url=pre_prod_scoring_url) # score model without shadow deployment
        ),
        asset_properties=AssetPropertiesRequest(
            label_column='Risk',
            probability_fields=['probability'],
            prediction_field='predictedLabel',
            feature_fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
            categorical_fields = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"],
            training_data_reference=TrainingDataReference(type='cos',
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = training_data_file_name),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
pre_prod_subscription_id = pre_prod_subscription_details.metadata.id
pre_prod_subscription_id

## Subscribe challenger model

In [None]:
challenger_asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_space_id = preprod_space_id).result['resources'][0]
challenger_asset_deployment_details

In [None]:
challenger_model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=challenger_deployment_uid,deployment_space_id=preprod_space_id)
challenger_model_asset_details_from_deployment

In [None]:
challenger_subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=challenger_model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=challenger_model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=challenger_model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=challenger_asset_deployment_details['metadata']['guid'],
            name=challenger_asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=challenger_asset_deployment_details['metadata']['url'],
            scoring_endpoint=ScoringEndpointRequest(url=challenger_scoring_url) # score model without shadow deployment
            
        ),
        asset_properties=AssetPropertiesRequest(
            label_column='Risk',
            probability_fields=['probability'],
            prediction_field='prediction',
            feature_fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
            categorical_fields = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"],
            training_data_reference=TrainingDataReference(type='cos',
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = training_data_file_name),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(challenger_model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
challenger_subscription_id = challenger_subscription_details.metadata.id
challenger_subscription_id

### Score the model so we can configure monitors
Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model. First, the code gets the model deployment's endpoint URL, and then sends a few records for predictions.

In [None]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"input_data": [{"fields": fields, "values": values}]}
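
The payload above pairs one `fields` list with a list of rows, so every row must have exactly as many values as there are fields. The following is a small sketch of a helper that checks this before scoring; `build_scoring_payload` is a hypothetical name, not part of the WML client.

```python
from typing import Any, Dict, Sequence


def build_scoring_payload(fields: Sequence[str], rows: Sequence[Sequence[Any]]) -> Dict[str, Any]:
    """Return a payload in the {"input_data": [{"fields": ..., "values": ...}]} shape used above."""
    for index, row in enumerate(rows):
        # A row with a missing or extra value would silently misalign the features.
        if len(row) != len(fields):
            raise ValueError(
                "Row {} has {} values but {} fields were declared".format(index, len(row), len(fields))
            )
    return {"input_data": [{"fields": list(fields), "values": [list(row) for row in rows]}]}
```

Calling `build_scoring_payload(fields, values)` with the lists defined in the cell above would produce the same structure as `payload_scoring`.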


In [None]:
import time

time.sleep(5)
preprod_payload_data_set_id = None
# Look at the list first: indexing [0] on an empty result would raise IndexError before the check below.
preprod_payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                                target_target_id=pre_prod_subscription_id, 
                                                target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if preprod_payload_data_sets:
    preprod_payload_data_set_id = preprod_payload_data_sets[0].metadata.id
if preprod_payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", preprod_payload_data_set_id)

In [None]:
scoring_response = wml_client.deployments.score(pre_prod_deployment_uid, payload_scoring)

print("Single record scoring result:", "\n fields:", scoring_response["predictions"][0]["fields"], "\n values: ", scoring_response["predictions"][0]["values"][0])

In [None]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(preprod_payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=preprod_payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=payload_scoring,
                   response=scoring_response,
                   response_time=460
               )])
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(preprod_payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

In [None]:
import time

time.sleep(5)
challenger_payload_data_set_id = None
# Look at the list first: indexing [0] on an empty result would raise IndexError before the check below.
challenger_payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                                target_target_id=challenger_subscription_id, 
                                                target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if challenger_payload_data_sets:
    challenger_payload_data_set_id = challenger_payload_data_sets[0].metadata.id
if challenger_payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", challenger_payload_data_set_id)

In [None]:
scoring_response = wml_client.deployments.score(challenger_deployment_uid, payload_scoring)

print("Single record scoring result:", "\n fields:", scoring_response["predictions"][0]["fields"], "\n values: ", scoring_response["predictions"][0]["values"][0])

In [None]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(challenger_payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=challenger_payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=payload_scoring,
                   response=scoring_response,
                   response_time=460
               )])
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(challenger_payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

# Quality monitoring

## Enable quality monitoring
The code below waits ten seconds to allow the payload logging table to be set up before it begins enabling monitors. First, it turns on the quality (accuracy) monitor and sets an alert threshold of 80%. OpenScale will show an alert on the dashboard if the model accuracy measurement (area under the curve, in the case of a binary classifier) falls below this threshold.

The parameter supplied, `min_feedback_data_size`, specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until an additional 50 feedback records have been added, via the user interface, the Python client, or the supplied feedback endpoint.
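
To make the "area under the curve" measurement concrete, here is a minimal sketch that computes the area under the ROC curve for a binary classifier directly from labels and scores. It assumes the curve in question is the ROC curve and is only an illustration of the metric, not OpenScale's own implementation.

```python
from typing import Sequence


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute the area under the ROC curve")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ranked correctly.
print(area_under_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

A value of 0.75 here would fall below an 80% alert threshold, which is the kind of reading that raises an alert on the dashboard.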

In [None]:
import time

time.sleep(10)
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=pre_prod_subscription_id
)
parameters = {
    "min_feedback_data_size": 50
}
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters
).result

preprod_quality_monitor_instance_id = quality_monitor_details.metadata.id
preprod_quality_monitor_instance_id

In [None]:
import time

time.sleep(10)
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=challenger_subscription_id
)
parameters = {
    "min_feedback_data_size": 50
}
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters
).result
challenger_quality_monitor_instance_id = quality_monitor_details.metadata.id
challenger_quality_monitor_instance_id

# Fairness, drift monitoring and explanations 

## Fairness configuration
The code below configures fairness monitoring for our model. It turns on monitoring for two features, Sex and Age. In each case, we must specify:

- Which model feature to monitor
- One or more majority groups, which are values of that feature that we expect to receive a higher percentage of favorable outcomes
- One or more minority groups, which are values of that feature that we expect to receive a higher percentage of unfavorable outcomes
- The threshold below which OpenScale displays an alert for the fairness measurement (in this case, 95%)

Additionally, we must specify which outcomes from the model are favourable outcomes, and which are unfavourable. We must also provide the number of records OpenScale will use to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 100 records have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data, so it uses the training data referenced in the subscription.
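
As a rough illustration of how a fairness score relates to these settings, the sketch below computes a disparate impact ratio: the rate of favourable outcomes for the minority group divided by the rate for the majority group. This is one common fairness measure, and treating it as comparable to the score OpenScale reports is an assumption. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def favourable_rate(records: List[Tuple[str, str]], groups: Sequence[str],
                    favourable: Sequence[str]) -> float:
    """Share of records belonging to `groups` whose outcome is in `favourable`."""
    outcomes = [outcome for group, outcome in records if group in groups]
    if not outcomes:
        raise ValueError("No records found for groups {}".format(list(groups)))
    return sum(1 for outcome in outcomes if outcome in favourable) / len(outcomes)


def disparate_impact(records: Iterable[Tuple[str, str]], majority: Sequence[str],
                     minority: Sequence[str], favourable: Sequence[str]) -> float:
    """Minority favourable rate divided by majority favourable rate."""
    records = list(records)
    majority_rate = favourable_rate(records, majority, favourable)
    if majority_rate == 0:
        raise ValueError("Majority group has no favourable outcomes; the ratio is undefined")
    return favourable_rate(records, minority, favourable) / majority_rate
```

For example, if 80% of `male` records and 60% of `female` records receive `No Risk`, the ratio is about 0.75, which is below the 0.95 threshold configured in the cells that follow.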

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=pre_prod_subscription_id

)
parameters = {
    "features": [
        {"feature": "Sex",
         "majority": ['male'],
         "minority": ['female'],
         "threshold": 0.95
         },
        {"feature": "Age",
         "majority": [[26, 75]],
         "minority": [[18, 25]],
         "threshold": 0.95
         }
    ],
    "favourable_class": ["No Risk"],
    "unfavourable_class": ["Risk"],
    "min_records": 100
}

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters).result
preprod_fairness_monitor_instance_id =fairness_monitor_details.metadata.id
preprod_fairness_monitor_instance_id

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=challenger_subscription_id

)
parameters = {
    "features": [
        {"feature": "Sex",
         "majority": ['male'],
         "minority": ['female'],
         "threshold": 0.95
         },
        {"feature": "Age",
         "majority": [[26, 75]],
         "minority": [[18, 25]],
         "threshold": 0.95
         }
    ],
    "favourable_class": ["No Risk"],
    "unfavourable_class": ["Risk"],
    "min_records": 100
}

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters).result
challenger_fairness_monitor_instance_id =fairness_monitor_details.metadata.id
challenger_fairness_monitor_instance_id

## Drift configuration

Enable drift monitoring for both subscriptions, with a drift threshold of 10% and a minimum sample size of 100 records.

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=pre_prod_subscription_id

)
parameters = {
    "min_samples": 100,
    "drift_threshold": 0.1,
    "train_drift_model": True,
    "enable_model_drift": False,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

preprod_drift_monitor_instance_id = drift_monitor_details.metadata.id
preprod_drift_monitor_instance_id

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=challenger_subscription_id

)
parameters = {
    "min_samples": 100,
    "drift_threshold": 0.1,
    "train_drift_model": True,
    "enable_model_drift": False,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

challenger_preprod_drift_monitor_instance_id = drift_monitor_details.metadata.id
challenger_preprod_drift_monitor_instance_id

## Configure Explainability
Finally, we provide OpenScale with the training data to enable and configure the explainability features.

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=pre_prod_subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

preprod_explainability_monitor_id = explainability_details.metadata.id
preprod_explainability_monitor_id

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=challenger_subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

challenger_explainability_monitor_id = explainability_details.metadata.id
challenger_explainability_monitor_id

## Enable model risk management (MRM) 

We enable the MRM configuration for both subscriptions.

In [None]:
WOS_GUID = '***'  # use the OpenScale instance GUID here

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

payload = {
  "data_mart_id": WOS_GUID,
  "monitor_definition_id": "mrm",
  "target": {
    "target_id": pre_prod_subscription_id,
    "target_type": "subscription"
  },
  "parameters": {
  },
  "managed_by": "user"
}

MONITOR_INSTANCES_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances".format(WOS_GUID)

response = requests.post(MONITOR_INSTANCES_URL, json=payload, headers=headers)
json_data = response.json()
print(json_data)
if "metadata" in json_data and "id" in json_data["metadata"]:
    pre_prod_mrm_instance_id = json_data["metadata"]["id"]

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

payload = {
  "data_mart_id": WOS_GUID,
  "monitor_definition_id": "mrm",
  "target": {
    "target_id": challenger_subscription_id,
    "target_type": "subscription"
  },
  "parameters": {
  },
  "managed_by": "user"
}

MONITOR_INSTANCES_URL ="https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances".format(WOS_GUID)

response = requests.post(MONITOR_INSTANCES_URL, json=payload, headers=headers)
json_data = response.json()
print(json_data)
if "metadata" in json_data and "id" in json_data["metadata"]:
    challenger_mrm_instance_id = json_data["metadata"]["id"]
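
The two cells above differ only in the subscription ID. As a sketch, the request body and headers could be built by small helpers so that both subscriptions share one code path. The names `build_mrm_monitor_payload` and `bearer_headers` are hypothetical; the fields are the ones used in the cells above.

```python
from typing import Any, Dict


def build_mrm_monitor_payload(data_mart_id: str, subscription_id: str) -> Dict[str, Any]:
    """Request body for creating an MRM monitor instance for one subscription."""
    return {
        "data_mart_id": data_mart_id,
        "monitor_definition_id": "mrm",
        "target": {"target_id": subscription_id, "target_type": "subscription"},
        "parameters": {},
        "managed_by": "user",
    }


def bearer_headers(token: str, content_type: str = "application/json") -> Dict[str, str]:
    """Headers carrying an IAM bearer token."""
    return {"Content-Type": content_type, "Authorization": "Bearer {}".format(token)}
```

Each subscription would then need only `requests.post(MONITOR_INSTANCES_URL, json=build_mrm_monitor_payload(WOS_GUID, subscription_id), headers=bearer_headers(generate_access_token()))`.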

## Create test data sets from the training data 

In [None]:
test_data_1 = pd_data[1:201]
test_data_1.to_csv("german_credit_risk_test_data_1.csv", encoding="utf-8", index=False)
test_data_2 = pd_data[201:401]
test_data_2.to_csv("german_credit_risk_test_data_2.csv", encoding="utf-8", index=False)
test_data_3 = pd_data[401:601]
test_data_3.to_csv("german_credit_risk_test_data_3.csv", encoding="utf-8", index=False)
test_data_4 = pd_data[601:801]
test_data_4.to_csv("german_credit_risk_test_data_4.csv", encoding="utf-8", index=False)
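
The four slices above follow a regular pattern: 200 rows each, starting at row 1. A minimal sketch of generating the same boundaries in a loop is shown below; `chunk_bounds` is a hypothetical helper, and the commented lines show how it might be combined with `pd_data`.

```python
from typing import List, Tuple


def chunk_bounds(start: int, size: int, count: int) -> List[Tuple[int, int]]:
    """Return (start, stop) slice bounds for `count` consecutive chunks of `size` rows."""
    return [(start + i * size, start + (i + 1) * size) for i in range(count)]


print(chunk_bounds(1, 200, 4))  # [(1, 201), (201, 401), (401, 601), (601, 801)]

# Possible use with the DataFrame from this notebook:
# for number, (lo, hi) in enumerate(chunk_bounds(1, 200, 4), start=1):
#     pd_data[lo:hi].to_csv("german_credit_risk_test_data_{}.csv".format(number),
#                           encoding="utf-8", index=False)
```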

## Function to upload, evaluate and check the status of the evaluation

This function uploads the test data CSV and triggers the risk evaluation. It then polls the evaluation status at a fixed interval, for a bounded number of attempts, until the evaluation finishes.

In [None]:
def upload_and_evaluate(file_name, mrm_instance_id):
    
    print("Running upload and evaluate for {}".format(file_name))
    import json
    import time
    from datetime import datetime

    status = None
    monitoring_run_id = None
    GET_UPLOAD_AND_EVALUATION_STATUS_RETRIES = 32
    GET_UPLOAD_AND_EVALUATION_STATUS_INTERVAL = 10
    
    if file_name is not None:
        
        headers = {}
        headers["Content-Type"] = "text/csv"
        headers["Authorization"] = "Bearer {}".format(generate_access_token())
        
        POST_EVALUATIONS_URL ="https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitoring_services/mrm/monitor_instances/{1}/risk_evaluations?test_data_set_name={2}".format(WOS_GUID, mrm_instance_id, file_name)

        # Read the CSV as raw bytes so it can be sent directly as the request body.
        with open(file_name, "rb") as file:
            csv_bytes = file.read()

        response = requests.post(POST_EVALUATIONS_URL, data=csv_bytes, headers=headers)
        if not response.ok:
            print("Upload and evaluate for {0} failed with {1}: {2}".format(file_name, response.status_code, response.reason))
            return status, monitoring_run_id
        
        headers = {}
        headers["Content-Type"] = "application/json"
        headers["Authorization"] = "Bearer {}".format(generate_access_token())

        GET_EVALUATIONS_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitoring_services/mrm/monitor_instances/{1}/risk_evaluations".format(WOS_GUID, mrm_instance_id)
        
        for i in range(GET_UPLOAD_AND_EVALUATION_STATUS_RETRIES):
        
            response = requests.get(GET_EVALUATIONS_URL, headers=headers)
            if not response.ok:
                print("Getting status of upload and evaluate for {0} failed with {1}: {2}".format(file_name, response.status_code, response.reason))
                return status, monitoring_run_id

            response = json.loads(response.text)
            if "metadata" in response and "id" in response["metadata"]:
                monitoring_run_id = response["metadata"]["id"]
            if "entity" in response and "status" in response["entity"]:
                status = response["entity"]["status"]["state"]
            
            if status is not None:
                print(datetime.utcnow().strftime('%H:%M:%S'), status.lower())
                if status.lower() in ["finished", "completed"]:
                    break
                elif "error" in status.lower():
                    print(response)
                    break

            time.sleep(GET_UPLOAD_AND_EVALUATION_STATUS_INTERVAL)

    return status, monitoring_run_id
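
The status loop in `upload_and_evaluate` is an instance of bounded polling: check a state, stop on a terminal value, otherwise wait and retry a fixed number of times. The sketch below isolates that pattern so it can be tried without calling OpenScale. `poll_until` and its parameters are hypothetical names; the defaults mirror the 32 retries and 10-second interval used above.

```python
import time
from typing import Callable, Optional, Sequence


def poll_until(get_state: Callable[[], Optional[str]],
               done_states: Sequence[str] = ("finished", "completed"),
               retries: int = 32,
               interval: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call get_state up to `retries` times and return the last state seen."""
    state = None
    for attempt in range(retries):
        state = get_state()
        if state is not None:
            lowered = state.lower()
            # Stop on success or on any state that reports an error.
            if lowered in done_states or "error" in lowered:
                return state
        # No need to wait after the final attempt.
        if attempt < retries - 1:
            sleep(interval)
    return state
```

Passing `sleep` as a parameter keeps the helper testable: a test can supply a function that records the waits instead of actually pausing.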

## Perform Risk Evaluations

We now perform evaluations of the smaller data sets against both the PreProd and Challenger subscriptions.

In [None]:
upload_and_evaluate("german_credit_risk_test_data_1.csv", pre_prod_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_2.csv", pre_prod_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_3.csv", pre_prod_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_4.csv", pre_prod_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_1.csv", challenger_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_2.csv", challenger_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_3.csv", challenger_mrm_instance_id)

In [None]:
upload_and_evaluate("german_credit_risk_test_data_4.csv", challenger_mrm_instance_id)

## Explore the Model Risk Management UI

Here is a quick recap of what we have done so far.

1. We've deployed two Credit Risk models to a WML instance that is designated as Pre-Production
2. We've created subscriptions for these two model deployments in OpenScale
3. We've configured all monitors supported by OpenScale for these subscriptions
4. We've performed a few risk evaluations against both subscriptions with the same set of test data

Now, please explore the Model Risk Management UI to visualize the results, compare the performance of the models, and download the evaluation report as a PDF. For more information, refer to the Beta Guide section "Work in Watson OpenScale."

Link to OpenScale : https://aiopenscale.cloud.ibm.com/aiopenscale/insights?mrm=true

# Promote pre-production model to production 

After you have reviewed the evaluation results of PreProd versus Challenger and decided to promote the PreProd model to Production, the first step is to deploy the model into a WML instance that is designated as the Production instance.

## Deploy model to production WML instance 

In [None]:
PROD_MODEL_NAME="German Credit Risk Model - Prod"
PROD_DEPLOYMENT_NAME="German Credit Risk Model - Prod"

In [None]:
space_name ="prod-space"
spaces = wml_client.spaces.get_details()['resources']
prod_space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        prod_space_id = space["metadata"]["id"]
if prod_space_id is None:
    prod_space_id = wml_client.spaces.store(
        meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: space_name,
                   wml_client.spaces.ConfigurationMetaNames.STORAGE: {"resource_crn":COS_CRN},
                   wml_client.spaces.ConfigurationMetaNames.COMPUTE: {"name": WML_INSTANCE_NAME,
                                            "crn": WML_CRN}})["metadata"]["id"]
wml_client.set.default_space(prod_space_id)
print(prod_space_id)

In [None]:
prod_model_uid, prod_deployment_uid, prod_scoring_url = deploy_credit_risk_spark_model(WML_CREDENTIALS, PROD_MODEL_NAME, PROD_DEPLOYMENT_NAME,prod_space_id)

## Bind WML machine learning instance as Prod

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model. If a binding with the name set in `SERVICE_PROVIDER_NAME` already exists, this code deletes that binding and creates a new one.

In [None]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning prod openpage"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook."

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

In [None]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = prod_space_id, # use prod space ID
        operational_space_id = "production",
        credentials=WMLCredentialsCloud(
            apikey=CLOUD_API_KEY,      ## use `apikey=IAM_TOKEN` if using IAM_TOKEN to initiate client
            url=WML_CREDENTIALS["url"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id
service_provider_id

In [None]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_space_id = prod_space_id).result['resources'][0]
asset_deployment_details

# Import configuration settings from pre-prod model

MRM provides an important feature that lets you copy the configuration settings of your pre-production subscription to the production subscription. To try this out:

1. Navigate to Model Monitors view in Insights dashboard of OpenScale
2. Click on Add to dashboard
3. Select the production model deployment from WML production machine learning provider and click on Configure
4. In Selections saved dialog, click on Configure monitors
5. Click on Import settings
6. In the Import configuration settings dialog, choose the `German Credit Risk Model - PreProd` as the subscription from which you want to import the settings and click Configure
7. In the Replace existing settings? dialog, click on Import

All the configuration settings are now copied into the production subscription.


<b>Note: The next set of cells should be executed only after finishing the import settings from the OpenScale dashboard</b>

## Score the production model so that we can trigger monitors

Now that the production subscription has been configured by copying the configuration, schedules are created for each of the monitors. 
Quality, Fairness and MRM run hourly. Drift runs once every three hours.

For this demo, we will trigger the monitors on demand so that we can see the model summary dashboard without having to wait the entire hour. 
To do that, let's first push some records into the Payload Logging table.

In [None]:
df = pd_data.sample(n=400)
df = df.drop(['Risk'], axis=1)
fields = df.columns.tolist()
values = df.values.tolist()
payload_scoring = {"input_data": [{"fields": fields, "values": values}]}

In [None]:
scoring_response = wml_client.deployments.score(prod_deployment_uid, payload_scoring)
print("Single record scoring result:", "\n fields:", scoring_response["predictions"][0]["fields"], "\n values: ", scoring_response["predictions"][0]["values"][0])

In [None]:
wos_client.subscriptions.show()

In [None]:
# Replace this ID with the production subscription ID listed in the output of the previous cell.
prod_subscription = wos_client.subscriptions.get('5949851d-8746-4bfa-b3f0-9a27c4cef98b').result.to_dict()
prod_subscription_id=prod_subscription['metadata']['id']

In [None]:
import time

time.sleep(5)
prod_payload_data_set_id = None
# Look at the list first: indexing [0] on an empty result would raise IndexError before the check below.
prod_payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                                target_target_id=prod_subscription_id, 
                                                target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if prod_payload_data_sets:
    prod_payload_data_set_id = prod_payload_data_sets[0].metadata.id
if prod_payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", prod_payload_data_set_id)

## Fetch all monitor instances

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

MONITOR_INSTANCES_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances?target.target_id={1}&target.target_type=subscription".format(WOS_GUID, prod_subscription_id)
print(MONITOR_INSTANCES_URL)

response = requests.get(MONITOR_INSTANCES_URL, headers=headers)
monitor_instances = response.json()["monitor_instances"]

drift_monitor_instance_id = None
quality_monitor_instance_id = None
fairness_monitor_instance_id = None
mrm_monitor_instance_id = None

if monitor_instances is not None:
    for monitor_instance in monitor_instances:
        if "entity" in monitor_instance and "monitor_definition_id" in monitor_instance["entity"]:
            monitor_name = monitor_instance["entity"]["monitor_definition_id"]
            if "metadata" in monitor_instance and "id" in monitor_instance["metadata"]:
                # Avoid shadowing the built-in id() function.
                instance_id = monitor_instance["metadata"]["id"]
                if monitor_name == "drift":
                    drift_monitor_instance_id = instance_id
                elif monitor_name == "fairness":
                    fairness_monitor_instance_id = instance_id
                elif monitor_name == "quality":
                    quality_monitor_instance_id = instance_id
                elif monitor_name == "mrm":
                    mrm_monitor_instance_id = instance_id
                    
print("Quality monitor instance id - {0}".format(quality_monitor_instance_id))
print("Fairness monitor instance id - {0}".format(fairness_monitor_instance_id))
print("Drift monitor instance id - {0}".format(drift_monitor_instance_id))
print("MRM monitor instance id - {0}".format(mrm_monitor_instance_id))
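
The `if`/`elif` chain above can also be written as a dictionary lookup. The following is a minimal sketch, not part of the original notebook: `collect_monitor_instance_ids` and the sample data are hypothetical, and the response shape is assumed to match the fields read above (`entity.monitor_definition_id` and `metadata.id`).

```python
def collect_monitor_instance_ids(monitor_instances,
                                 wanted=("quality", "fairness", "drift", "mrm")):
    """Map each wanted monitor definition id to its monitor instance id (or None)."""
    ids = {name: None for name in wanted}
    for instance in monitor_instances or []:
        name = instance.get("entity", {}).get("monitor_definition_id")
        instance_id = instance.get("metadata", {}).get("id")
        # Keep only the monitors we asked for, and only when an id is present.
        if name in ids and instance_id is not None:
            ids[name] = instance_id
    return ids


# Illustrative data only; real ids come from the monitor_instances response.
sample_instances = [
    {"metadata": {"id": "q-1"}, "entity": {"monitor_definition_id": "quality"}},
    {"metadata": {"id": "d-1"}, "entity": {"monitor_definition_id": "drift"}},
    {"metadata": {}, "entity": {"monitor_definition_id": "fairness"}},
]
monitor_ids = collect_monitor_instance_ids(sample_instances)
print(monitor_ids)
```

Monitors that are missing from the response, or that carry no id, stay `None`, which matches the defaults set before the loop in the cell above.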

## Function to get the monitoring run details

In [None]:
def get_monitoring_run_details(monitor_instance_id, monitoring_run_id):
    
    headers = {}
    headers["Content-Type"] = "application/json"
    headers["Authorization"] = "Bearer {}".format(generate_access_token())
    
    MONITORING_RUNS_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances/{1}/runs/{2}".format(WOS_GUID, monitor_instance_id, monitoring_run_id)
    response = requests.get(MONITORING_RUNS_URL, headers=headers, verify=False)
    return response.json()
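
The runs URL is assembled inline in several cells of this section. As a rough sketch, it could be built in one place instead. The helper name `monitor_runs_url` and the placeholder ids are assumptions made for illustration; only the base URL and path layout are taken from the cells in this notebook.

```python
# Base URL copied from the cells in this notebook.
OPENSCALE_BASE_URL = "https://api.aiopenscale.cloud.ibm.com/openscale"


def monitor_runs_url(service_instance_id, monitor_instance_id, run_id=None):
    """Return the runs URL for a monitor instance, or for one run if run_id is given."""
    url = "{0}/{1}/v2/monitor_instances/{2}/runs".format(
        OPENSCALE_BASE_URL, service_instance_id, monitor_instance_id)
    if run_id is not None:
        url = "{0}/{1}".format(url, run_id)
    return url


# Placeholder ids, for illustration only.
print(monitor_runs_url("my-guid", "my-monitor"))
print(monitor_runs_url("my-guid", "my-monitor", "my-run"))
```

In the notebook, the first argument would be `WOS_GUID` and the second one of the monitor instance ids fetched earlier.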

## Run on-demand Quality

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

if quality_monitor_instance_id is not None:
    MONITOR_RUN_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances/{1}/runs".format(WOS_GUID, quality_monitor_instance_id)
    payload = {
        "triggered_by": "user"
    }
    print("Triggering Quality computation with {}".format(MONITOR_RUN_URL))
    response = requests.post(MONITOR_RUN_URL, json=payload, headers=headers, verify=False)
    json_data = response.json()
    print()
    print(json_data)
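    print()
    if "metadata" in json_data and "id" in json_data["metadata"]:
        quality_monitoring_run_id = json_data["metadata"]["id"]
        print("Done triggering Quality computation")
    else:
        # Without a run id the polling cell below cannot work.
        print("Quality run was not triggered; check the response above")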

In [None]:
from datetime import datetime

quality_run_status = None
while quality_run_status != 'finished':
    monitoring_run_details = get_monitoring_run_details(quality_monitor_instance_id, quality_monitoring_run_id)
    quality_run_status = monitoring_run_details["entity"]["status"]["state"]
    if quality_run_status == "error":
        print(monitoring_run_details)
        break
    if quality_run_status != 'finished':
        print(datetime.utcnow().strftime('%H:%M:%S'), quality_run_status)
        time.sleep(10)
print(quality_run_status)

## Run on-demand Drift

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

if drift_monitor_instance_id is not None:
    MONITOR_RUN_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances/{1}/runs".format(WOS_GUID, drift_monitor_instance_id)
    payload = {
        "triggered_by": "user"
    }
    print("Triggering Drift computation with {}".format(MONITOR_RUN_URL))
    response = requests.post(MONITOR_RUN_URL, json=payload, headers=headers, verify=False)
    json_data = response.json()
    print()
    print(json_data)
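    print()
    if "metadata" in json_data and "id" in json_data["metadata"]:
        drift_monitoring_run_id = json_data["metadata"]["id"]
        print("Done triggering Drift computation")
    else:
        # Without a run id the polling cell below cannot work.
        print("Drift run was not triggered; check the response above")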

In [None]:
from datetime import datetime

drift_run_status = None
while drift_run_status != 'finished':
    monitoring_run_details = get_monitoring_run_details(drift_monitor_instance_id, drift_monitoring_run_id)
    drift_run_status = monitoring_run_details["entity"]["status"]["state"]
    if drift_run_status == "error":
        print(monitoring_run_details)
        break
    if drift_run_status != 'finished':
        print(datetime.utcnow().strftime('%H:%M:%S'), drift_run_status)
        time.sleep(10)
print(drift_run_status)

## Run on-demand Fairness

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

if fairness_monitor_instance_id is not None:
    MONITOR_RUN_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances/{1}/runs".format(WOS_GUID, fairness_monitor_instance_id)
    payload = {
        "triggered_by": "user"
    }
    print("Triggering Fairness computation with {}".format(MONITOR_RUN_URL))
    response = requests.post(MONITOR_RUN_URL, json=payload, headers=headers, verify=False)
    json_data = response.json()
    print()
    print(json_data)
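    print()
    if "metadata" in json_data and "id" in json_data["metadata"]:
        fairness_monitor_run_id = json_data["metadata"]["id"]
        print("Done triggering Fairness computation")
    else:
        # Without a run id the polling cell below cannot work.
        print("Fairness run was not triggered; check the response above")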

In [None]:
from datetime import datetime

fairness_run_status = None
while fairness_run_status != 'finished':
    monitoring_run_details = get_monitoring_run_details(fairness_monitor_instance_id, fairness_monitor_run_id)
    fairness_run_status = monitoring_run_details["entity"]["status"]["state"]
    if fairness_run_status == "error":
        print(monitoring_run_details)
        break
    if fairness_run_status != 'finished':
        print(datetime.utcnow().strftime('%H:%M:%S'), fairness_run_status)
        time.sleep(10)
print(fairness_run_status)

## Run on-demand MRM

In [None]:
headers = {}
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer {}".format(generate_access_token())

if mrm_monitor_instance_id is not None:
    MONITOR_RUN_URL = "https://api.aiopenscale.cloud.ibm.com/openscale/{0}/v2/monitor_instances/{1}/runs".format(WOS_GUID, mrm_monitor_instance_id)
    payload = {
        "triggered_by": "user"
    }
    print("Triggering MRM computation with {}".format(MONITOR_RUN_URL))
    response = requests.post(MONITOR_RUN_URL, json=payload, headers=headers, verify=False)
    json_data = response.json()
    print()
    print(json_data)
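    print()
    if "metadata" in json_data and "id" in json_data["metadata"]:
        mrm_monitoring_run_id = json_data["metadata"]["id"]
        print("Done triggering MRM computation")
    else:
        # Without a run id the polling cell below cannot work.
        print("MRM run was not triggered; check the response above")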

In [None]:
from datetime import datetime

mrm_run_status = None
while mrm_run_status != 'finished':
    monitoring_run_details = get_monitoring_run_details(mrm_monitor_instance_id, mrm_monitoring_run_id)
    mrm_run_status = monitoring_run_details["entity"]["status"]["state"]
    if mrm_run_status == "error":
        print(monitoring_run_details)
        break
    if mrm_run_status != 'finished':
        print(datetime.utcnow().strftime('%H:%M:%S'), mrm_run_status)
        time.sleep(10)
print(mrm_run_status)
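
The four polling loops in this section share one pattern: fetch the run details, read `entity.status.state`, and sleep until the state is `finished` or `error`. A minimal sketch of that pattern as a reusable helper is shown below. The name `wait_for_run`, the timeout behaviour, and the fake fetch function are assumptions added for illustration; the original loops have no timeout.

```python
import time


def wait_for_run(fetch_details, poll_seconds=10, timeout_seconds=600, sleep=time.sleep):
    """Poll fetch_details() until the run reaches a final state or the timeout passes.

    fetch_details is any zero-argument callable returning a run-details dict
    shaped like {"entity": {"status": {"state": ...}}}.
    Returns "finished", "error", or "timeout".
    """
    waited = 0
    while True:
        state = fetch_details()["entity"]["status"]["state"]
        if state in ("finished", "error"):
            return state
        if waited >= timeout_seconds:
            return "timeout"
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for get_monitoring_run_details, so the sketch runs without the service.
fake_states = iter(["running", "running", "finished"])


def fake_fetch():
    return {"entity": {"status": {"state": next(fake_states)}}}


final_state = wait_for_run(fake_fetch, poll_seconds=0)
print(final_state)
```

In the notebook, the callable could be `lambda: get_monitoring_run_details(mrm_monitor_instance_id, mrm_monitoring_run_id)`, and likewise for the other monitors.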

## Refresh the model summary of the production subscription in the OpenScale dashboard

This brings us to the end of this demo exercise. Thank you for trying it out.