# IBM Watson OpenScale Lab instructions

This notebook should be run in a Watson Studio project, using the **Python 3.6 with Spark** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.6 with Spark in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following Cloud services:
  * IBM Watson OpenScale
  * Watson Machine Learning
  
If you have a paid Cloud account, you may also provision a **Databases for PostgreSQL** or **Db2 Warehouse** service to take full advantage of integration with Watson Studio and continuous learning services. If you choose not to provision this paid service, you can use the free internal PostgreSQL storage with OpenScale, but will not be able to configure continuous learning for your model.

The notebook will train, create, and deploy a model that predicts bad yields, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

# Business Scenario

National Chemical Company (NCC) operates two plants in South Louisiana. Both plants produce the same chemical, use the same suppliers, and are managed by the same people. The only difference between them is age: Plant A is brand new and opened less than a year ago, while Plant B is more than 20 years old.

Bad yields cost NCC millions of dollars each year. A bad yield means that the final product produced by the factory does not meet quality standards. Ideally, NCC would like to predict a bad yield a few days in advance, drill down to see why it is predicted, and make corrections. To accomplish this, NCC turned to IBM and Watson.

In this notebook, you will construct a machine learning model and deploy that model to a Watson Machine Learning Service.  This model is based on historical data and predicts the probability of a bad yield.  

After the model is built and deployed, the notebook configures Watson OpenScale. Watson OpenScale will allow NCC to monitor the accuracy of the model over time and understand the key factors that cause bad yields. Knowing why a bad yield will occur allows the plant operators to make adjustments and prevent it from occurring.

Another key issue for NCC is bad sensor readings in Plant B, the older plant. NCC is concerned that these readings will lead to invalid predictions. Given that Plants A and B should have a similar percentage of predicted bad yields, we will use the bias and fairness detection monitors inside Watson OpenScale to correct bias that may occur because of the older sensors in Plant B.
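
To make the fairness concern concrete, the comparison comes down to the share of favorable (GOOD) predictions for each plant. The sketch below is a minimal illustration in plain Python. It assumes predictions are available as dictionaries with a `PLANT_A` flag and a `prediction` label; the helper names and sample records are hypothetical, and this is not necessarily how OpenScale computes its own fairness score.

```python
def favorable_rate(records, plant_a_value, favorable="GOOD"):
    """Share of records for one plant whose prediction is the favorable label."""
    group = [r for r in records if r["PLANT_A"] == plant_a_value]
    if not group:
        raise ValueError("no records for PLANT_A == {}".format(plant_a_value))
    hits = sum(1 for r in group if r["prediction"] == favorable)
    return hits / len(group)


def disparate_impact(records, monitored=0, reference=1):
    """Ratio of favorable rates: monitored group (Plant B) over reference group (Plant A)."""
    return favorable_rate(records, monitored) / favorable_rate(records, reference)


# Hypothetical predictions: Plant A (PLANT_A=1) gets 3 of 4 GOOD, Plant B (PLANT_A=0) gets 2 of 4.
sample = (
    [{"PLANT_A": 1, "prediction": "GOOD"}] * 3
    + [{"PLANT_A": 1, "prediction": "BAD"}]
    + [{"PLANT_A": 0, "prediction": "GOOD"}] * 2
    + [{"PLANT_A": 0, "prediction": "BAD"}] * 2
)
ratio = disparate_impact(sample)
print("Disparate impact (Plant B vs. Plant A): {:.2f}".format(ratio))
```

A ratio close to 1 means both plants receive favorable predictions at a similar rate; a ratio well below 1 suggests the older plant is being treated differently.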






# Package installation

In [1]:
!rm -rf $PIP_BUILD
!pip install psycopg2-binary | tail -n 1
!pip install --upgrade watson-machine-learning-client --no-cache | tail -n 1
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade lime --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1

Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20190524175432-0000
KERNEL_ID = 5678fab0-9dd3-4a3b-a05e-1a54230b4981
Successfully installed psycopg2-binary-2.8.2
tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
spyder 3.3.3 requires pyqt5<=5.12; python_version >= "3", which is not installed.
ibm-cos-sdk-core 2.4.4 has requirement urllib3<1.25,>=1.20, but you'll have urllib3 1.25.3 which is incompatible.
botocore 1.12.82 has requirement urllib3<1.25,>=1.20, but you'll have urllib3 1.25.3 which is incompatible.
Successfully installed certifi-2019.3.9 chardet-3.0.4 docutils-0.14 ibm-cos-sdk-2.4.4 ibm-cos-sdk-core-2.4.4 ibm-cos-sdk-s3transfer-2.4.4 idna-2.8 jmespath-0.9.4 lomond-0.3.3 numpy-1.16.3 pandas-0.24.2 python-dateutil-2.8.0 pytz-2019.1 requests-2.22.0 six-1.12.0 tabulate-0.8.3 tqdm-4.32.1 urllib3-1.25.3 watson-machine-learning-client-1.0.365
tensorflow 1.13.1 requires ten

#### Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/ai-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

In [26]:
CLOUD_API_KEY = "PFHEY2Po9KcARJqfUtnv75yvAHtawFzURnMWorJedhK7"
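
Pasting a key directly into a cell is convenient for a lab, but the key then travels with every copy of the notebook. As an alternative, the sketch below reads the key from an environment variable. The helper name `load_api_key` and the variable name `DEMO_CLOUD_API_KEY` are assumptions made for this illustration and are not part of the lab.

```python
import os


def load_api_key(env_var="CLOUD_API_KEY"):
    """Return the API key stored in the given environment variable."""
    value = os.environ.get(env_var)
    if not value:
        raise ValueError("Set the {} environment variable first".format(env_var))
    return value


# Demo only: set a placeholder so the example runs.
# In practice, export the real key outside the notebook.
os.environ.setdefault("DEMO_CLOUD_API_KEY", "placeholder-key")
demo_key = load_api_key("DEMO_CLOUD_API_KEY")
print("Loaded key of length", len(demo_key))
```

With a helper like this, the cell above could read `CLOUD_API_KEY = load_api_key()` instead of containing the key itself.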

Next you will need credentials for Watson Machine Learning. If you already have a WML instance, you may use credentials for it. To provision a new Lite instance of WML, use the [Cloud catalog](https://cloud.ibm.com/catalog/services/machine-learning), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your WML credentials into the cell below.

In [27]:
WML_CREDENTIALS ={
  "apikey": "uoUoBHKgrz-OPFGBmLNhQgy_BlHNPXCqyJYaR-5tGNGt",
  "iam_apikey_description": "Auto-generated for key 9e155d75-e70d-40c7-96ee-dba3fe9a030d",
  "iam_apikey_name": "Service credentials-1",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/97deeb0b7e78431438a00a04f20580b7::serviceid:ServiceId-755ca7bd-4d01-44b9-912d-7c05f9d81a11",
  "instance_id": "67c0aed8-ed70-4029-a923-a585e83408c4",
  "password": "b7a11f66-129c-44fc-a963-958e9e4f6452",
  "url": "https://us-south.ml.cloud.ibm.com",
  "username": "9e155d75-e70d-40c7-96ee-dba3fe9a030d"
}

This lab can use Databases for PostgreSQL, Db2 Warehouse, or a free internal version of PostgreSQL to create a datamart for OpenScale.

If you have previously configured OpenScale, it will use your existing datamart, and not interfere with any models you are currently monitoring. Do not update the cell below.

If you do not have a paid Cloud account or would prefer not to provision this paid service, you may use the free internal PostgreSQL service with OpenScale. Do not update the cell below.

To provision a new instance of Db2 Warehouse, locate [Db2 Warehouse in the Cloud catalog](https://cloud.ibm.com/catalog/services/db2-warehouse), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your Db2 Warehouse credentials into the cell below.

To provision a new instance of Databases for PostgreSQL, locate [Databases for PostgreSQL in the Cloud catalog](https://cloud.ibm.com/catalog/services/databases-for-postgresql), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your Databases for PostgreSQL credentials into the cell below.

In [28]:
DB_CREDENTIALS = None

__If you previously configured OpenScale to use the free internal version of PostgreSQL, you can switch to a new datamart using a paid database service.__ If you would like to delete the internal PostgreSQL configuration and create a new one using service credentials supplied in the cell above, set the __KEEP_MY_INTERNAL_POSTGRES__ variable below to __False__. In this case, the notebook will remove your existing internal PostgreSQL datamart and create a new one with the supplied credentials. __*NO DATA MIGRATION WILL OCCUR.*__

In [29]:
KEEP_MY_INTERNAL_POSTGRES = True

# Run the notebook

At this point, the notebook is ready to run. You can either run the cells one at a time, or click the **Kernel** option above and select **Restart and Run All** to run all the cells.

# Load and explore data

## Load the training data from GitHub

In [30]:
!rm -f df_training.csv
!wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_training.csv

--2019-05-24 17:59:34--  https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 514802 (503K) [text/plain]
Saving to: 'df_training.csv'


2019-05-24 17:59:35 (23.3 MB/s) - 'df_training.csv' saved [514802/514802]



In [31]:
from pyspark.sql import SparkSession
import pandas as pd
import json

spark = SparkSession.builder.getOrCreate()
pd_data = pd.read_csv("df_training.csv", sep=",", header=0)
#df_data = spark.read.csv(path="df_training.csv", sep=",", header=True, inferSchema=True)
#df_data.head()

In [32]:
pd_data.columns

Index(['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN',
       'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A',
       'BAD_YIELD'],
      dtype='object')

In [33]:
pd_data.shape
pd_data.head()

Unnamed: 0,WS_001_FLOW_MEAN,WS_001_FLOW_MIN,WS_001_FLOW_MAX,WS_001_CONC_MEAN,WS_001_CONC_MIN,WS_001_CONC_MAX,DMW_FLOW_MEAN,DMW_FLOW_MIN,DMW_FLOW_MAX,ALK_FLOW_MEAN,...,WS_002_FLOW_MIN,WS_002_FLOW_MAX,WS_002_CONC_MEAN,WS_002_CONC_MIN,WS_002_CONC_MAX,RPM_MEAN,RPM_MIN,RPM_MAX,PLANT_A,BAD_YIELD
0,91.544345,91.677855,91.488572,88.603772,88.634486,88.584482,88.597187,88.598325,88.595852,91.467674,...,91.535971,91.470367,91.879905,91.982337,91.770328,5043.982357,5043.563746,5044.453746,1,GOOD
1,91.636583,91.677855,91.57625,88.603805,88.63305,88.583608,88.597187,88.598325,88.595852,91.467674,...,91.553737,91.445614,91.434144,91.430998,91.429932,5043.041871,5042.263746,5043.713746,0,GOOD
2,91.221818,91.333725,91.137861,88.603715,88.631168,88.588621,88.597187,88.598325,88.595852,91.467674,...,91.552534,91.457397,91.179878,91.210463,91.089537,5044.362843,5044.023746,5044.803746,1,GOOD
3,91.519911,91.591823,91.400894,88.603665,88.634511,88.581391,88.597187,88.598325,88.595852,91.467674,...,91.546751,91.453738,91.542207,91.651534,91.543397,5031.524232,5026.753746,5035.233746,1,GOOD
4,87.107759,87.290196,87.017004,92.930565,92.951904,92.915805,92.93055,92.934958,92.91834,86.453545,...,86.690369,86.477336,86.730213,86.910021,86.777862,5043.626663,5040.483746,5044.743746,0,BAD


In [34]:
df_data = spark.createDataFrame(pd_data)
df_data.head()

Row(WS_001_FLOW_MEAN=91.544345, WS_001_FLOW_MIN=91.677855, WS_001_FLOW_MAX=91.488572, WS_001_CONC_MEAN=88.60377199999998, WS_001_CONC_MIN=88.634486, WS_001_CONC_MAX=88.584482, DMW_FLOW_MEAN=88.597187, DMW_FLOW_MIN=88.598325, DMW_FLOW_MAX=88.59585200000002, ALK_FLOW_MEAN=91.467674, ALK_FLOW_MIN=91.52794, ALK_FLOW_MAX=91.439629, WS_002_FLOW_MEAN=91.489663, WS_002_FLOW_MIN=91.535971, WS_002_FLOW_MAX=91.470367, WS_002_CONC_MEAN=91.879905, WS_002_CONC_MIN=91.982337, WS_002_CONC_MAX=91.770328, RPM_MEAN=5043.982357, RPM_MIN=5043.563746, RPM_MAX=5044.453746, PLANT_A=1, BAD_YIELD='GOOD')

## Explore data

In [35]:
df_data.printSchema()

root
 |-- WS_001_FLOW_MEAN: double (nullable = true)
 |-- WS_001_FLOW_MIN: double (nullable = true)
 |-- WS_001_FLOW_MAX: double (nullable = true)
 |-- WS_001_CONC_MEAN: double (nullable = true)
 |-- WS_001_CONC_MIN: double (nullable = true)
 |-- WS_001_CONC_MAX: double (nullable = true)
 |-- DMW_FLOW_MEAN: double (nullable = true)
 |-- DMW_FLOW_MIN: double (nullable = true)
 |-- DMW_FLOW_MAX: double (nullable = true)
 |-- ALK_FLOW_MEAN: double (nullable = true)
 |-- ALK_FLOW_MIN: double (nullable = true)
 |-- ALK_FLOW_MAX: double (nullable = true)
 |-- WS_002_FLOW_MEAN: double (nullable = true)
 |-- WS_002_FLOW_MIN: double (nullable = true)
 |-- WS_002_FLOW_MAX: double (nullable = true)
 |-- WS_002_CONC_MEAN: double (nullable = true)
 |-- WS_002_CONC_MIN: double (nullable = true)
 |-- WS_002_CONC_MAX: double (nullable = true)
 |-- RPM_MEAN: double (nullable = true)
 |-- RPM_MIN: double (nullable = true)
 |-- RPM_MAX: double (nullable = true)
 |-- PLANT_A: long (nullable = true)
 |-- B

In [36]:
print("Number of records: " + str(df_data.count()))

Number of records: 2000


# Create a model

In [37]:
spark_df = df_data
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)

MODEL_NAME = "Yield Model"
DEPLOYMENT_NAME = "Yield Model"

print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

spark_df.printSchema()

Number of records for training: 1612
Number of records for evaluation: 388
root
 |-- WS_001_FLOW_MEAN: double (nullable = true)
 |-- WS_001_FLOW_MIN: double (nullable = true)
 |-- WS_001_FLOW_MAX: double (nullable = true)
 |-- WS_001_CONC_MEAN: double (nullable = true)
 |-- WS_001_CONC_MIN: double (nullable = true)
 |-- WS_001_CONC_MAX: double (nullable = true)
 |-- DMW_FLOW_MEAN: double (nullable = true)
 |-- DMW_FLOW_MIN: double (nullable = true)
 |-- DMW_FLOW_MAX: double (nullable = true)
 |-- ALK_FLOW_MEAN: double (nullable = true)
 |-- ALK_FLOW_MIN: double (nullable = true)
 |-- ALK_FLOW_MAX: double (nullable = true)
 |-- WS_002_FLOW_MEAN: double (nullable = true)
 |-- WS_002_FLOW_MIN: double (nullable = true)
 |-- WS_002_FLOW_MAX: double (nullable = true)
 |-- WS_002_CONC_MEAN: double (nullable = true)
 |-- WS_002_CONC_MIN: double (nullable = true)
 |-- WS_002_CONC_MAX: double (nullable = true)
 |-- RPM_MEAN: double (nullable = true)
 |-- RPM_MIN: double (nullable = true)
 |-- RP
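
The split above yields 1612 and 388 records rather than exactly 1600 and 400. This is expected: `randomSplit` assigns each row to a split at random using the weights as probabilities, so the resulting sizes are only approximately 80/20. The plain-Python sketch below illustrates the idea with a hypothetical helper; it does not reproduce Spark's sampler or its exact counts.

```python
import random


def bernoulli_split(n_rows, train_fraction=0.8, seed=24):
    """Assign each row to train with probability train_fraction, otherwise to test."""
    rng = random.Random(seed)
    train = 0
    test = 0
    for _ in range(n_rows):
        if rng.random() < train_fraction:
            train += 1
        else:
            test += 1
    return train, test


train_count, test_count = bernoulli_split(2000)
print("train:", train_count, "test:", test_count)
```

Fixing the seed makes the split repeatable, which is why the notebook passes `24` as the second argument.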

In [38]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline, Model
from pyspark.ml import linalg



In [39]:
si_Label = StringIndexer(inputCol="BAD_YIELD", outputCol="label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)
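
`StringIndexer` turns the `BAD_YIELD` strings into numeric labels. By default it orders labels by descending frequency, so the most common value (here GOOD) receives index 0.0, and `IndexToString` later maps predictions back to the original strings. The sketch below mimics that ordering in plain Python; the helper name is hypothetical, and the alphabetical tie-break is an assumption made for this illustration.

```python
from collections import Counter


def frequency_index(values):
    """Map each distinct value to a float index, most frequent value first."""
    counts = Counter(values)
    # Sort by descending count; break ties alphabetically (an assumption for this sketch).
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


label_index = frequency_index(["GOOD", "GOOD", "BAD", "GOOD", "BAD"])
print(label_index)
```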

In [40]:
pd_data.columns

Index(['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN',
       'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A',
       'BAD_YIELD'],
      dtype='object')

In [41]:
va_features = VectorAssembler(inputCols=['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN',
       'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A'], outputCol="features")

In [42]:
from pyspark.ml.classification import RandomForestClassifier
classifier = RandomForestClassifier(featuresCol="features")

pipeline = Pipeline(stages=[ si_Label, va_features, classifier, label_converter])
model = pipeline.fit(train_data)

In [43]:
predictions = model.transform(test_data)
evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction")
area_under_curve = evaluatorDT.evaluate(predictions)

#default evaluation is areaUnderROC
print("areaUnderROC = %g" % area_under_curve)

areaUnderROC = 0.873134
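
Note that the evaluator above is given the hard `prediction` column (0.0 or 1.0) as its `rawPredictionCol`, so the reported value is based on class labels rather than on the model's scores. The plain-Python sketch below, using made-up labels and scores, shows how the two can differ. It computes the area under the ROC curve as the fraction of positive/negative pairs ranked correctly, counting ties as half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth and model scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.45, 0.2, 0.1]
hard_predictions = [1.0 if s >= 0.5 else 0.0 for s in scores]

auc_from_scores = auc(labels, scores)
auc_from_hard = auc(labels, hard_predictions)
print("AUC from scores:", round(auc_from_scores, 4))
print("AUC from hard predictions:", round(auc_from_hard, 4))
```

Using scores preserves the ranking information within each predicted class, which thresholded labels discard.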


# Save and deploy the model

from scipy import sparse
from scipy import linalg

In [44]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient 
import json 



wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

### Remove existing model and deployment

In [45]:
model_deployment_ids = wml_client.deployments.get_uids()
for deployment_id in model_deployment_ids:
    deployment = wml_client.deployments.get_details(deployment_id)
    model_id = deployment['entity']['deployable_asset']['guid']
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

----  ----  -------  ---------
GUID  NAME  CREATED  FRAMEWORK
----  ----  -------  ---------


In [46]:
model_props = {
    wml_client.repository.ModelMetaNames.NAME: "{}".format(MODEL_NAME),
    wml_client.repository.ModelMetaNames.EVALUATION_METHOD: "binary",
    wml_client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
           "name": "areaUnderROC",
           "value": area_under_curve,
           "threshold": 0.85
        }
    ]
}

In [47]:
wml_models = wml_client.repository.get_details()
model_uid = None
for model_in in wml_models['models']['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

if model_uid is None:
    print("Storing model ...")

    published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")

Storing model ...




Done


In [48]:
model_uid

'02051e13-bc01-424a-b0be-199543b56d67'

In [49]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break

if deployment_uid is None:
    print("Deploying model...")

    deployment = wml_client.deployments.create(artifact_uid=model_uid, name=DEPLOYMENT_NAME, asynchronous=False)
    deployment_uid = wml_client.deployments.get_uid(deployment)
    
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

Deploying model...


#######################################################################################

Synchronous deployment creation for uid: '02051e13-bc01-424a-b0be-199543b56d67' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='5b3a14df-8381-400a-8814-0715dbcfe295'
------------------------------------------------------------------------------------------------


Model id: 02051e13-bc01-424a-b0be-199543b56d67
Deployment id: 5b3a14df-8381-400a-8814-0715dbcfe295


# Configure OpenScale

In [50]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

### Get AI OpenScale GUID

In [51]:
import requests

AIOS_GUID = None
token_data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'response_type': 'cloud_iam',
    'apikey': CLOUD_API_KEY
}

response = requests.post('https://iam.bluemix.net/identity/token', data=token_data)
iam_token = response.json()['access_token']
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

resources = json.loads(requests.get('https://resource-controller.cloud.ibm.com/v2/resource_instances', headers=iam_headers).text)['resources']
for resource in resources:
    if "aiopenscale" in resource['id'].lower():
        AIOS_GUID = resource['guid']
        
AIOS_CREDENTIALS = {
    "instance_guid": AIOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if AIOS_GUID is None:
    print('AI OpenScale GUID NOT FOUND')
else:
    print(AIOS_GUID)

d5d03772-abc7-4c24-bd2a-36a7083778a0
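
The lookup in the cell above scans every resource instance and keeps the GUID of the last one whose `id` contains `aiopenscale`. The sketch below isolates that search in a small function that returns the first match instead and tolerates missing keys. The function name and the sample records are hypothetical and are not part of the Resource Controller API.

```python
def find_instance_guid(resources, marker="aiopenscale"):
    """Return the GUID of the first resource whose id contains marker, else None."""
    for resource in resources:
        if marker in resource.get("id", "").lower():
            return resource.get("guid")
    return None


# Hypothetical resource list shaped like the 'resources' array used above.
sample_resources = [
    {"id": "crn:v1:example:public:pm-20:us-south:a/abc:111::", "guid": "111"},
    {"id": "crn:v1:example:public:aiopenscale:us-south:a/abc:222::", "guid": "222"},
]
found_guid = find_instance_guid(sample_resources)
print(found_guid)
```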


## Create schema and datamart

In [52]:
import time  # imported explicitly; time.sleep() is used here and in later cells

ai_client = APIClient(aios_credentials=AIOS_CREDENTIALS)
ai_client.version
time.sleep(20)

### Set up datamart

In [53]:
try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        if KEEP_MY_INTERNAL_POSTGRES:
            print('Using existing internal datamart.')
        else:
            if DB_CREDENTIALS is None:
                print('No postgres credentials supplied. Using existing internal datamart')
            else:
                print('Switching to external datamart')
                ai_client.data_mart.delete(force=True)
                ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
    else:
        print('Using existing external datamart')
except Exception:  # no datamart could be retrieved, so set one up
    if DB_CREDENTIALS is None:
        print('Setting up internal datamart')
        ai_client.data_mart.setup(internal_db=True)
    else:
        print('Setting up external datamart')
        ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
    

Setting up internal datamart
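
The branching in the setup cell can be hard to follow at a glance. The sketch below restates the same decisions as a pure function so each case can be checked without calling OpenScale. The function name and the returned action strings are hypothetical; `existing_details` stands for the result of `get_details()`, or `None` when that call fails.

```python
def choose_datamart_action(existing_details, keep_internal, db_credentials):
    """Mirror the setup cell's branching and return the action it would take."""
    if existing_details is None:
        # No datamart yet: create one, internal unless credentials were supplied.
        return "setup_internal" if db_credentials is None else "setup_external"
    if not existing_details.get("internal_database"):
        return "use_existing_external"
    if keep_internal or db_credentials is None:
        return "use_existing_internal"
    # Internal datamart exists, user opted out of it, and credentials are available.
    return "replace_with_external"


print(choose_datamart_action(None, True, None))
print(choose_datamart_action({"internal_database": True}, False, {"uri": "example"}))
```

Only the last case deletes an existing datamart, which is why the warning about data migration applies solely when `KEEP_MY_INTERNAL_POSTGRES` is `False` and credentials are supplied.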


In [54]:
data_mart_details = ai_client.data_mart.get_details()
data_mart_details

{'internal_database': True,
 'service_instance_crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/97deeb0b7e78431438a00a04f20580b7:d5d03772-abc7-4c24-bd2a-36a7083778a0::',
 'status': {'state': 'active'},
 'database_configuration': {},
 'internal_database_pool': 'icd-psql'}

## Bind machine learning engines

In [55]:
binding_uid = ai_client.data_mart.bindings.add('WML instance', WatsonMachineLearningInstance(WML_CREDENTIALS))
if binding_uid is None:
    binding_uid = ai_client.data_mart.bindings.get_details()['service_bindings'][0]['metadata']['guid']
bindings_details = ai_client.data_mart.bindings.get_details()
ai_client.data_mart.bindings.list()

0,1,2,3
67c0aed8-ed70-4029-a923-a585e83408c4,WML instance,watson_machine_learning,2019-05-24T18:00:36.061Z


In [56]:
print(binding_uid)

67c0aed8-ed70-4029-a923-a585e83408c4


In [57]:
ai_client.data_mart.bindings.list_assets()

0,1,2,3,4,5,6
02051e13-bc01-424a-b0be-199543b56d67,Yield Model,2019-05-24T17:59:59.182Z,model,mllib-2.3,67c0aed8-ed70-4029-a923-a585e83408c4,False


## Subscriptions

### Remove existing yield model subscriptions

In [58]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
for subscription in subscriptions_uids:
    sub_name = ai_client.data_mart.subscriptions.get_details(subscription)['entity']['asset']['name']
    if sub_name == MODEL_NAME:
        ai_client.data_mart.subscriptions.delete(subscription)
        print('Deleted existing subscription for', MODEL_NAME)

In [59]:
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='BAD_YIELD',
    prediction_column='predictedLabel',
    probability_column='probability',
    feature_columns = ['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN',
       'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX','PLANT_A'],
    categorical_columns = ['PLANT_A']
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)

Get subscription list

In [60]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
ai_client.data_mart.subscriptions.list()

0,1,2,3,4
02051e13-bc01-424a-b0be-199543b56d67,Yield Model,model,67c0aed8-ed70-4029-a923-a585e83408c4,2019-05-24T18:00:38.045Z


In [61]:

subscription.get_details()

{'metadata': {'guid': '02051e13-bc01-424a-b0be-199543b56d67',
  'url': '/v1/data_marts/d5d03772-abc7-4c24-bd2a-36a7083778a0/service_bindings/67c0aed8-ed70-4029-a923-a585e83408c4/subscriptions/02051e13-bc01-424a-b0be-199543b56d67',
  'created_at': '2019-05-24T18:00:38.045Z',
  'modified_at': '2019-05-24T18:00:38.836Z'},
 'entity': {'service_binding_id': '67c0aed8-ed70-4029-a923-a585e83408c4',
  'asset_properties': {'runtime_environment': 'spark-2.3',
   'predicted_target_field': 'predictedLabel',
   'training_data_schema': {'type': 'struct',
    'fields': [{'name': 'WS_001_FLOW_MEAN',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_FLOW_MIN',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_FLOW_MAX',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_CONC_MEAN',
      'type': 'dou

### Score the model so we can configure monitors

In [62]:
propensity_to_buy_scoring_endpoint = None
print(deployment_uid)

for deployment in wml_client.deployments.get_details()['resources']:
    if deployment_uid == deployment['metadata']['guid']:
        propensity_to_buy_scoring_endpoint = deployment['entity']['scoring_url']
        
print(propensity_to_buy_scoring_endpoint)

5b3a14df-8381-400a-8814-0715dbcfe295
https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/deployments/5b3a14df-8381-400a-8814-0715dbcfe295/online


In [63]:
fields = ['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN',
       'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A']
values = [[87.118754,87.290196,100.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,83.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,84.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
          [87.118754,87.290196,87.017004,92.930604,92.953476,92.904793,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496803,
           86.684725,86.461847,86.697635,86.910021,86.664397,5042.769927,5037.723746,5044.663746,0],
         ]
payload_scoring = {"fields": fields,"values": values}
scoring_response = wml_client.deployments.score(propensity_to_buy_scoring_endpoint, payload_scoring)

print(scoring_response)

{'fields': ['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX', 'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX', 'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN', 'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN', 'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN', 'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A', 'BAD_YIELD', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'], 'values': [[87.118754, 87.290196, 100.017004, 92.930604, 92.953476, 92.904793, 92.93055, 92.934958, 92.91834, 86.453545, 86.63842, 86.425383, 86.496803, 86.684725, 86.461847, 83.697635, 86.910021, 86.664397, 5042.769927, 5037.723746, 5044.663746, 0, 'GOOD', 0.0, [87.118754, 87.290196, 100.017004, 92.930604, 92.953476, 92.904793, 92.93055, 92.934958, 92.91834, 86.453545, 86.63842, 86.425383, 86.496803, 86.684725, 86.461847, 83.697635, 86.910021, 86.664397, 5042.769927, 5037.723746, 5044.663746, 0.0], [0.016949
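
The scoring response pairs a `fields` list with one row of `values` per scored record, so individual columns are found by position. The sketch below pulls out `predictedLabel` and `probability` for each row. The helper name and the abbreviated sample response are hypothetical; only the `fields`/`values` layout is taken from the output above.

```python
def extract_predictions(scoring_response, columns=("predictedLabel", "probability")):
    """Return one dict per scored row containing only the requested columns."""
    fields = scoring_response["fields"]
    positions = [fields.index(column) for column in columns]
    return [
        {column: row[pos] for column, pos in zip(columns, positions)}
        for row in scoring_response["values"]
    ]


# Abbreviated, made-up response with the same layout as the real one.
sample_response = {
    "fields": ["PLANT_A", "probability", "prediction", "predictedLabel"],
    "values": [
        [0, [0.2, 0.8], 1.0, "BAD"],
        [1, [0.9, 0.1], 0.0, "GOOD"],
    ],
}
rows = extract_predictions(sample_response)
for row in rows:
    print(row["predictedLabel"], row["probability"])
```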

## Quality and feedback monitoring

### Enable quality monitoring

Wait 20 seconds to allow the payload logging table to be set up before we begin enabling monitors.

In [64]:
time.sleep(20)
subscription.quality_monitoring.enable(threshold=0.7, min_records=100)

### Feedback logging

In [65]:
!rm -f df_feedback.json
!wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_feedback.json

--2019-05-24 18:01:02--  https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_feedback.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 223977 (219K) [text/plain]
Saving to: 'df_feedback.json'


2019-05-24 18:01:02 (23.3 MB/s) - 'df_feedback.json' saved [223977/223977]



In [66]:
with open('df_feedback.json') as feedback_file:
    df_feedback = json.load(feedback_file)
subscription.feedback_logging.store(df_feedback)

In [67]:
subscription.feedback_logging.show_table()

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
87.255584,87.376229,87.280037,92.930513,92.95901,92.896258,92.93055,92.934958,92.91834,87.957784,88.105276,87.929657,87.920131,88.074473,87.909034,87.646367,87.902431,87.572118,5041.57333,5040.013746,5043.673746,0,GOOD,2019-05-24 18:01:03.180319+00:00
87.311782,87.462261,87.280037,92.683112,92.715594,92.651412,92.682929,92.68715,92.67134,87.957784,88.105276,87.929657,87.920571,88.088079,87.884347,87.882358,88.122966,87.912514,5045.122913,5044.773746,5045.493746,1,GOOD,2019-05-24 18:01:03.180319+00:00
91.588937,91.591823,91.57625,88.6034,88.632419,88.586145,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.48318,91.550868,91.444996,91.484997,91.430998,91.429932,4991.47076,4983.533746,4998.283746,0,GOOD,2019-05-24 18:01:03.180319+00:00
91.596878,91.591823,91.57625,88.60385,88.631453,88.582832,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.483025,91.547537,91.452218,91.504862,91.541266,91.429932,5040.781524,5036.263746,5043.683746,0,GOOD,2019-05-24 18:01:03.180319+00:00
91.548621,91.677855,91.488572,88.603593,88.634177,88.582352,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.485054,91.548786,91.465616,91.586704,91.651534,91.543397,5034.505621,5030.773746,5038.013746,0,GOOD,2019-05-24 18:01:03.180319+00:00
87.141967,87.290196,87.104682,92.930718,92.954417,92.911555,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.495959,86.672409,86.471459,86.6881,86.910021,86.664397,5043.687288,5043.153746,5044.203746,0,BAD,2019-05-24 18:01:03.180319+00:00
91.521744,91.591823,91.488572,88.603462,88.631106,88.586576,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.480428,91.551562,91.448132,91.571607,91.651534,91.543397,5043.381385,5042.843746,5043.993746,0,GOOD,2019-05-24 18:01:03.180319+00:00
91.419732,91.50579,91.400894,88.603947,88.630995,88.586305,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.485244,91.548277,91.447467,91.3086,91.320731,91.316467,5042.460413,5040.983746,5043.293746,0,GOOD,2019-05-24 18:01:03.180319+00:00
91.240143,91.333725,91.225539,88.603573,88.630524,88.585369,88.597187,88.598325,88.595852,91.467674,91.52794,91.439629,91.482578,91.553644,91.444331,91.446063,91.430998,91.429932,5045.124996,5044.903746,5045.493746,0,GOOD,2019-05-24 18:01:03.180319+00:00
87.135247,87.290196,87.104682,92.930309,92.955185,92.907417,92.93055,92.934958,92.91834,86.453545,86.63842,86.425383,86.496779,86.681255,86.471264,86.719089,86.910021,86.664397,5042.353191,5039.043746,5044.093746,0,BAD,2019-05-24 18:01:03.180319+00:00


In [68]:
# Trigger a quality monitor run and keep its ID for polling.
run_details = subscription.quality_monitoring.run()
status = run_details['status']
run_id = run_details['id']  # named run_id to avoid shadowing the built-in id()
print(run_id)

print("Run status: {}".format(status))

start_time = time.time()
elapsed_time = 0

# Poll every 10 seconds until the run completes or 60 seconds have elapsed.
while status != 'completed' and elapsed_time < 60:
    time.sleep(10)
    run_details = subscription.quality_monitoring.get_run_details(run_uid=run_id)
    status = run_details['status']
    elapsed_time = time.time() - start_time
    print("Run status: {}".format(status))

dc26ee13-2600-45e0-b864-35b5f01e3443
Run status: initializing
Run status: completed


In [69]:
subscription.quality_monitoring.get_run_details()

{'evaluations': [{'stages': [{'name': 'Prerequisite Check',
     'completed_at': '2019-05-24T18:01:07.991Z',
     'started_at': '2019-05-24T18:01:07.720Z',
     'id': 1,
     'properties': {'training_columns': ['WS_001_FLOW_MEAN',
       'WS_001_FLOW_MIN',
       'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN',
       'WS_001_CONC_MIN',
       'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN',
       'DMW_FLOW_MIN',
       'DMW_FLOW_MAX',
       'ALK_FLOW_MEAN',
       'ALK_FLOW_MIN',
       'ALK_FLOW_MAX',
       'WS_002_FLOW_MEAN',
       'WS_002_FLOW_MIN',
       'WS_002_FLOW_MAX',
       'WS_002_CONC_MEAN',
       'WS_002_CONC_MIN',
       'WS_002_CONC_MAX',
       'RPM_MEAN',
       'RPM_MIN',
       'RPM_MAX',
       'PLANT_A',
       'BAD_YIELD'],
      'input_columns': ['WS_001_FLOW_MEAN',
       'WS_001_FLOW_MIN',
       'WS_001_FLOW_MAX',
       'WS_001_CONC_MEAN',
       'WS_001_CONC_MIN',
       'WS_001_CONC_MAX',
       'DMW_FLOW_MEAN',
       'DMW_FLOW_MIN',
       'DMW_FLOW_MAX',
 

In [70]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7,8,9
2019-05-24 18:01:07.720000+00:00,true_positive_rate,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.8762376237623762,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,area_under_roc,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.937492245465148,0.7,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,precision,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.99438202247191,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,f1_measure,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.9315789473684212,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,accuracy,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.974,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,log_loss,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.0662841688780246,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,false_positive_rate,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.0012531328320802,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,area_under_pr,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.9453484814773612,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,recall,60ddba29-b2bd-40f4-b7ca-6955a7263209,0.8762376237623762,,,model_type: original,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
2019-05-24 18:01:07.720000+00:00,true_positive_rate,b6e351ce-6c89-4a95-bce1-e643c9ac33f7,0.8762376237623762,,,model_type: recommended,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295
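
The quality metrics in the table above are related to one another through the confusion matrix. As a worked check, the reported precision, recall, F1, accuracy, and false positive rate are all reproduced by a single set of counts: 177 true positives, 1 false positive, 25 false negatives, and 797 true negatives (1,000 records in total). These counts are inferred from the reported values and are not part of the OpenScale output; which class is treated as positive is not stated in the table.

```python
import math

# Inferred counts (an assumption derived from the reported metrics).
tp, fp, fn, tn = 177, 1, 25, 797

computed = {
    'precision': tp / (tp + fp),
    'recall': tp / (tp + fn),
    'accuracy': (tp + tn) / (tp + fp + fn + tn),
    'false_positive_rate': fp / (fp + tn),
}
# F1 is the harmonic mean of precision and recall.
computed['f1_measure'] = (2 * computed['precision'] * computed['recall']
                          / (computed['precision'] + computed['recall']))

# Values copied from the show_table() output above.
reported = {
    'precision': 0.99438202247191,
    'recall': 0.8762376237623762,
    'accuracy': 0.974,
    'false_positive_rate': 0.0012531328320802,
    'f1_measure': 0.9315789473684212,
}

for name, value in reported.items():
    matches = math.isclose(computed[name], value, rel_tol=1e-9)
    print('{:20s} computed={:.6f} reported={:.6f} match={}'.format(
        name, computed[name], value, matches))
```

Note that `true_positive_rate` and `recall` are the same quantity, which is why both rows carry the same value.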


subscription.quality_monitoring._get_data_from_rest_api()

In [71]:
ai_client.data_mart.get_deployment_metrics()

{'deployment_metrics': [{'subscription': {'subscription_id': '02051e13-bc01-424a-b0be-199543b56d67',
    'url': '/v1/data_marts/d5d03772-abc7-4c24-bd2a-36a7083778a0/service_bindings/67c0aed8-ed70-4029-a923-a585e83408c4/subscriptions/02051e13-bc01-424a-b0be-199543b56d67'},
   'asset': {'name': 'Yield Model',
    'asset_id': '02051e13-bc01-424a-b0be-199543b56d67',
    'url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/published_models/02051e13-bc01-424a-b0be-199543b56d67',
    'asset_type': 'model',
    'created_at': '2019-05-24T17:59:59.182Z'},
   'deployment': {'name': 'Yield Model',
    'url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/deployments/5b3a14df-8381-400a-8814-0715dbcfe295',
    'deployment_type': 'online',
    'scoring_endpoint': {'url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/deployments/5b3a14df-8381-400a-8814-0715dbcfe295/online',
 

## Fairness monitoring

In [72]:
subscription.fairness_monitoring.enable(
            features=[
                Feature("PLANT_A", majority=[[1,1]], minority=[[0,0]], threshold=0.95)
            ],
            favourable_classes=['1'],
            unfavourable_classes=['0'],
            min_records=1000,
            training_data=pd_data
        )
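
The `threshold=0.95` argument is easiest to read as a bound on a ratio of favourable-outcome rates between the minority group (`PLANT_A == 0`) and the majority group (`PLANT_A == 1`). The sketch below illustrates that kind of ratio (often called disparate impact) on a handful of made-up records. It is an assumption about what the monitor measures, not OpenScale's implementation, and `disparate_impact` is a hypothetical helper.

```python
def disparate_impact(records, minority_value, majority_value, favourable):
    """Ratio of favourable-outcome rates: minority group over majority group.

    `records` is an iterable of (group_value, outcome) pairs.
    """
    def favourable_rate(group_value):
        outcomes = [outcome for group, outcome in records if group == group_value]
        if not outcomes:
            raise ValueError('no records for group {!r}'.format(group_value))
        return sum(1 for outcome in outcomes if outcome == favourable) / len(outcomes)

    return favourable_rate(minority_value) / favourable_rate(majority_value)


# Made-up (PLANT_A, outcome) pairs, for illustration only.
toy_records = [
    (0, 'GOOD'), (0, 'GOOD'), (0, 'GOOD'), (0, 'BAD'),   # minority: 3 of 4 favourable
    (1, 'GOOD'), (1, 'GOOD'), (1, 'GOOD'), (1, 'GOOD'),  # majority: 4 of 4 favourable
]

ratio = disparate_impact(toy_records, minority_value=0, majority_value=1,
                         favourable='GOOD')
print(ratio, 'below threshold' if ratio < 0.95 else 'within threshold')
```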

## Score the model again now that monitoring is configured

In [73]:
!rm df_payload_biased-a.json
!wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_payload_biased-a.json

rm: cannot remove 'df_payload_biased-a.json': No such file or directory
--2019-05-24 18:02:20--  https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_payload_biased-a.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 434642 (424K) [text/plain]
Saving to: 'df_payload_biased-a.json'


2019-05-24 18:02:20 (20.2 MB/s) - 'df_payload_biased-a.json' saved [434642/434642]



Score 1,000 records chosen at random (with replacement) from the downloaded payload file.

In [74]:
import random

with open('df_payload_biased-a.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(1000):
    values.append(random.choice(scoring_data['values']))
payload_scoring = {"fields": fields, "values": values}

scoring_response = wml_client.deployments.score(propensity_to_buy_scoring_endpoint, payload_scoring)
print(scoring_response)

{'fields': ['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX', 'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX', 'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN', 'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN', 'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN', 'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A', 'BAD_YIELD', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'], 'values': [[87.247643, 87.462261, 87.19236, 92.682978, 92.715619, 92.652348, 92.682929, 92.68715, 92.67134, 87.957784, 88.105276, 87.929657, 87.919598, 88.088574, 87.881083, 87.885537, 88.122966, 87.912514, 5044.131732, 5043.393746, 5044.613746, 1, 'GOOD', 0.0, [87.247643, 87.462261, 87.19236, 92.682978, 92.715619, 92.652348, 92.682929, 92.68715, 92.67134, 87.957784, 88.105276, 87.929657, 87.919598, 88.088574, 87.881083, 87.885537, 88.122966, 87.912514, 5044.131732, 5043.393746, 5044.613746, 1.0], [15.5003116
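
The same sampling step is repeated later in the notebook, so it can be factored into a small function. The sketch below assumes the payload file has the `fields`/`values` layout used above; `build_scoring_payload` and its `seed` parameter are hypothetical names added for illustration. Passing a seed makes the sample reproducible, which helps when comparing runs.

```python
import random


def build_scoring_payload(scoring_data, n_records, seed=None):
    """Sample n_records rows with replacement and wrap them in a scoring payload."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    values = rng.choices(scoring_data['values'], k=n_records)
    return {'fields': scoring_data['fields'], 'values': values}
```

With the file loaded as `scoring_data`, `build_scoring_payload(scoring_data, 1000)` would produce the same kind of payload as the loop in the cell above.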

In [75]:
subscription.get_details()

{'metadata': {'guid': '02051e13-bc01-424a-b0be-199543b56d67',
  'url': '/v1/data_marts/d5d03772-abc7-4c24-bd2a-36a7083778a0/service_bindings/67c0aed8-ed70-4029-a923-a585e83408c4/subscriptions/02051e13-bc01-424a-b0be-199543b56d67',
  'created_at': '2019-05-24T18:00:38.045Z',
  'modified_at': '2019-05-24T18:01:10.263Z'},
 'entity': {'service_binding_id': '67c0aed8-ed70-4029-a923-a585e83408c4',
  'asset_properties': {'runtime_environment': 'spark-2.3',
   'predicted_target_field': 'predictedLabel',
   'training_data_schema': {'type': 'struct',
    'fields': [{'name': 'WS_001_FLOW_MEAN',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_FLOW_MIN',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_FLOW_MAX',
      'type': 'double',
      'nullable': True,
      'metadata': {'modeling_role': 'feature'}},
     {'name': 'WS_001_CONC_MEAN',
      'type': 'dou

# Insert historical payloads

In [76]:
!rm payload_history*.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_1.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_2.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_3.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_4.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_5.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_6.json
!wget https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_7.json

rm: cannot remove 'payload_history*.json': No such file or directory
--2019-05-24 18:02:22--  https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_1.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2588835 (2.5M) [text/plain]
Saving to: 'payload_history_1.json'


2019-05-24 18:02:22 (42.8 MB/s) - 'payload_history_1.json' saved [2588835/2588835]

--2019-05-24 18:02:23--  https://raw.githubusercontent.com/shadgriffin/propensitytobuylab/master/payload_history_2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2588676 (2.5M) [text/plain]
Saving to: 'payload_history_2.json'



In [77]:
historyDays = 7
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
import datetime
import time

In [78]:
data_mart_id = subscription.get_details()['metadata']['url'].split('/service_bindings')[0].split('marts/')[1]
print(data_mart_id)

d5d03772-abc7-4c24-bd2a-36a7083778a0


In [79]:
performance_metrics_url = 'https://api.aiopenscale.cloud.ibm.com' + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/metrics'
print(performance_metrics_url)

https://api.aiopenscale.cloud.ibm.com/v1/data_marts/d5d03772-abc7-4c24-bd2a-36a7083778a0/metrics
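
The chained `split` calls above work for the URL shape shown, but raise an unhelpful `IndexError` if the URL ever looks different. A slightly more defensive sketch uses a regular expression and fails with a clear message; `data_mart_id_from_url` is a hypothetical helper, not part of the OpenScale client.

```python
import re


def data_mart_id_from_url(url):
    """Return the path segment that follows '/data_marts/' in a subscription URL."""
    match = re.search(r'/data_marts/([^/]+)', url)
    if match is None:
        raise ValueError('no data mart ID found in URL: {!r}'.format(url))
    return match.group(1)


# URL copied from the subscription details shown earlier in this notebook.
example_url = ('/v1/data_marts/d5d03772-abc7-4c24-bd2a-36a7083778a0'
               '/service_bindings/67c0aed8-ed70-4029-a923-a585e83408c4'
               '/subscriptions/02051e13-bc01-424a-b0be-199543b56d67')
print(data_mart_id_from_url(example_url))
```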


## Insert historical fairness metrics

In [80]:
!rm fairness_records.json
!wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/fairness_records.json
import random

rm: cannot remove 'fairness_records.json': No such file or directory
--2019-05-24 18:02:29--  https://raw.githubusercontent.com/shadgriffin/oglabworking/master/fairness_records.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11173 (11K) [text/plain]
Saving to: 'fairness_records.json'


2019-05-24 18:02:29 (18.1 MB/s) - 'fairness_records.json' saved [11173/11173]



In [81]:

token_data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'response_type': 'cloud_iam',
    'apikey': AIOS_CREDENTIALS['apikey']
}

response = requests.post('https://iam.bluemix.net/identity/token', data=token_data)
iam_token = response.json()['access_token']
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

with open('fairness_records.json', 'r') as history_file:
    payloads = json.load(history_file)

for day in range(historyDays):
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        
        # One fairness record per hour, with a randomly chosen historical payload.
        fairnessMetric = {
            'metric_type': 'fairness',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': random.choice(payloads)
        }

        response = requests.post(performance_metrics_url, json=[fairnessMetric], headers=iam_headers)
print('Finished')

Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Finished
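
Each of the three history loops computes the same series of timestamps: one per hour, counting backward from the current time, for `historyDays` days. The sketch below isolates that calculation so it can be inspected on its own; `hourly_timestamps` is a hypothetical helper, and the fixed `now` in the example is only there to make the output reproducible.

```python
import datetime


def hourly_timestamps(days, now=None):
    """Return ISO-style UTC timestamps, one per hour, going back `days` days from `now`."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return [
        (now - datetime.timedelta(hours=24 * day + hour + 1)).strftime('%Y-%m-%dT%H:%M:%SZ')
        for day in range(days)
        for hour in range(24)
    ]


stamps = hourly_timestamps(7, now=datetime.datetime(2019, 5, 24, 18, 2, 29))
print(len(stamps), stamps[0], stamps[-1])
```

Seven days of history therefore produce 168 records per metric type, the newest one hour old and the oldest exactly seven days old.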


## Insert historical quality metrics

In [82]:
token_data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'response_type': 'cloud_iam',
    'apikey': AIOS_CREDENTIALS['apikey']
}

response = requests.post('https://iam.bluemix.net/identity/token', data=token_data)
iam_token = response.json()['access_token']
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

measurements = [0.84, 0.81, 0.68, 0.72, 0.80, 0.84, 0.83]
for day in range(historyDays):
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        
        qualityMetric = {
            'metric_type': 'quality',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': {
                'quality': measurements[day],
                'threshold': 0.75,
                'metrics': [
                    {
                        'name': 'auroc',
                        'value': measurements[day],
                        'threshold': 0.75
                    }
                ]
            }
        }

        response = requests.post(performance_metrics_url, json=[qualityMetric], headers=iam_headers)
print('Finished')

Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Finished
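
To see which days of the injected history should appear as quality violations in the dashboard, compare `measurements` against the threshold stored with each record. The loop index counts days backward from the time the cell runs, so "day 3" here means 48 to 72 hours ago. This is a small check on the values above, not an OpenScale API call.

```python
measurements = [0.84, 0.81, 0.68, 0.72, 0.80, 0.84, 0.83]
threshold = 0.75

# Day numbers (1 = most recent 24 hours) whose value falls below the threshold.
below = [day + 1 for day, value in enumerate(measurements) if value < threshold]
print(below)  # → [3, 4]
```

The quality monitor itself was enabled earlier with `threshold=0.7`; against that value only day 3 (0.68) would fall short.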


## Insert historical performance metrics

In [83]:
token_data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'response_type': 'cloud_iam',
    'apikey': AIOS_CREDENTIALS['apikey']
}

response = requests.post('https://iam.bluemix.net/identity/token', data=token_data)
iam_token = response.json()['access_token']
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

for day in range(historyDays):
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        score_count = random.randint(600, 6000)
        score_resp = random.uniform(600, 3000)

        performanceMetric = {
            'metric_type': 'performance',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': {
                'response_time': score_resp,
                'records': score_count
            }
        }

        response = requests.post(performance_metrics_url, json=[performanceMetric], headers=iam_headers)
print('Finished')

Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Finished
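
The history loops assign `response` but never inspect it, so a rejected request (for example, an expired token) would still end with `Finished`. A minimal sketch of a checked POST follows. `post_metric` is a hypothetical helper; the `post` argument is expected to behave like `requests.post` and return an object with `status_code` and `text` attributes.

```python
def post_metric(post, url, metric, headers):
    """POST one metric record and raise if the service does not return a 2xx status."""
    response = post(url, json=[metric], headers=headers)
    if not 200 <= response.status_code < 300:
        raise RuntimeError('metric POST failed with status {}: {}'.format(
            response.status_code, response.text))
    return response
```

Inside the loops, the call would look like `post_metric(requests.post, performance_metrics_url, performanceMetric, iam_headers)`.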


## Configure Explainability

In [84]:
pd_data.head()

Unnamed: 0,WS_001_FLOW_MEAN,WS_001_FLOW_MIN,WS_001_FLOW_MAX,WS_001_CONC_MEAN,WS_001_CONC_MIN,WS_001_CONC_MAX,DMW_FLOW_MEAN,DMW_FLOW_MIN,DMW_FLOW_MAX,ALK_FLOW_MEAN,...,WS_002_FLOW_MIN,WS_002_FLOW_MAX,WS_002_CONC_MEAN,WS_002_CONC_MIN,WS_002_CONC_MAX,RPM_MEAN,RPM_MIN,RPM_MAX,PLANT_A,BAD_YIELD
0,91.544345,91.677855,91.488572,88.603772,88.634486,88.584482,88.597187,88.598325,88.595852,91.467674,...,91.535971,91.470367,91.879905,91.982337,91.770328,5043.982357,5043.563746,5044.453746,1,GOOD
1,91.636583,91.677855,91.57625,88.603805,88.63305,88.583608,88.597187,88.598325,88.595852,91.467674,...,91.553737,91.445614,91.434144,91.430998,91.429932,5043.041871,5042.263746,5043.713746,0,GOOD
2,91.221818,91.333725,91.137861,88.603715,88.631168,88.588621,88.597187,88.598325,88.595852,91.467674,...,91.552534,91.457397,91.179878,91.210463,91.089537,5044.362843,5044.023746,5044.803746,1,GOOD
3,91.519911,91.591823,91.400894,88.603665,88.634511,88.581391,88.597187,88.598325,88.595852,91.467674,...,91.546751,91.453738,91.542207,91.651534,91.543397,5031.524232,5026.753746,5035.233746,1,GOOD
4,87.107759,87.290196,87.017004,92.930565,92.951904,92.915805,92.93055,92.934958,92.91834,86.453545,...,86.690369,86.477336,86.730213,86.910021,86.777862,5043.626663,5040.483746,5044.743746,0,BAD


In [85]:
from ibm_ai_openscale.supporting_classes import *
subscription.explainability.enable(training_data=pd_data)

In [86]:
subscription.explainability.get_details()

{'enabled': True,
 'parameters': {'training_statistics': {'mins': {'12': 86.492056,
    '8': 88.59585200000002,
    '19': 3733.123746,
    '4': 85.792671,
    '15': 85.58363,
    '11': 86.425383,
    '9': 86.45354499999998,
    '13': 86.222711,
    '16': 82.030261,
    '5': 88.206787,
    '10': 86.63842,
    '6': 88.49487099999997,
    '1': 86.085741,
    '17': 85.756676,
    '14': 86.444491,
    '0': 86.611007,
    '20': 4531.633746,
    '2': 86.578615,
    '18': 4383.473746,
    '7': 88.35051700000002,
    '3': 87.744149},
   'categorical_columns': ['PLANT_A'],
   'feature_values': {'12': [3, 2, 0, 1],
    '8': [0, 1, 2],
    '19': [2, 1, 3, 0],
    '4': [2, 1, 0, 3],
    '15': [3, 2, 1, 0],
    '11': [1, 0],
    '9': [1, 0],
    '13': [1, 3, 2, 0],
    '16': [3, 2, 1, 0],
    '5': [1, 2, 0, 3],
    '10': [1, 0],
    '21': [1, 0],
    '6': [0, 1, 2],
    '1': [3, 1, 2, 0],
    '17': [3, 2, 1, 0],
    '14': [3, 2, 0, 1],
    '0': [3, 1, 2, 0],
    '20': [2, 1, 0, 3],
    '2': [2, 3, 1

## Run fairness monitor

Kick off a fairness monitor run on current data. Depending on how fast the monitor runs, the table may not contain the most recent results.

In [87]:
run_details = subscription.fairness_monitoring.run()

In [88]:
subscription.fairness_monitoring.show_table()

0,1,2,3,4,5,6,7,8,9,10
2019-05-24 17:02:29+00:00,PLANT_A,"[1, 1]",False,1.12,29.0,67c0aed8-ed70-4029-a923-a585e83408c4,02051e13-bc01-424a-b0be-199543b56d67,02051e13-bc01-424a-b0be-199543b56d67,5b3a14df-8381-400a-8814-0715dbcfe295,


## Additional data to help debugging

In [89]:
#print('Datamart:', data_mart_id)
print('Model:', model_uid)
print('Deployment:', deployment_uid)
print('Binding:', binding_uid)
print('Scoring URL:', propensity_to_buy_scoring_endpoint)

Model: 02051e13-bc01-424a-b0be-199543b56d67
Deployment: 5b3a14df-8381-400a-8814-0715dbcfe295
Binding: 67c0aed8-ed70-4029-a923-a585e83408c4
Scoring URL: https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/deployments/5b3a14df-8381-400a-8814-0715dbcfe295/online


## Identify transactions for Explainability

Transaction IDs identified by the cells below can be copied and pasted into the Explainability tab of the OpenScale dashboard.

In [90]:
import json, random

DEPLOYMENT_NAME = "Yield Model"
MIN_RECORDS = 1000
MAX_RECORDS = 1000

In [91]:
!rm df_payload_biased-a.json
!wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_payload_biased-a.json

--2019-05-24 18:03:33--  https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_payload_biased-a.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 434642 (424K) [text/plain]
Saving to: 'df_payload_biased-a.json'


2019-05-24 18:03:33 (24.7 MB/s) - 'df_payload_biased-a.json' saved [434642/434642]



In [92]:
wml_deployments = wml_client.deployments.get_details()
scoring_url = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        scoring_url = deployment['entity']['scoring_url']
        break
    
print("Scoring URL: {}".format(scoring_url))

Scoring URL: https://us-south.ml.cloud.ibm.com/v3/wml_instances/67c0aed8-ed70-4029-a923-a585e83408c4/deployments/5b3a14df-8381-400a-8814-0715dbcfe295/online
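
If no deployment matches `DEPLOYMENT_NAME`, the loop above leaves `scoring_url` as `None` and the failure only surfaces later, when scoring is attempted. A sketch of the same lookup that fails immediately is shown below. It assumes the `resources` / `entity` / `scoring_url` layout used in the cell above; `find_scoring_url` is a hypothetical helper.

```python
def find_scoring_url(deployments, name):
    """Return the scoring URL of the first deployment whose name matches."""
    for deployment in deployments['resources']:
        entity = deployment['entity']
        if entity['name'] == name:
            return entity['scoring_url']
    raise LookupError('no deployment named {!r}'.format(name))
```

With the client from this notebook, the call would be `find_scoring_url(wml_client.deployments.get_details(), DEPLOYMENT_NAME)`.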


In [93]:
try:
    # Reuse the payload file if it has already been downloaded.
    with open('df_payload_biased-a.json', 'r') as scoring_file:
        scoring_data = json.load(scoring_file)
    print('file found')

except FileNotFoundError:
    # The file is missing, so download it and load it.
    !wget https://raw.githubusercontent.com/shadgriffin/oglabworking/master/df_payload_biased-a.json
    with open('df_payload_biased-a.json', 'r') as scoring_file:
        scoring_data = json.load(scoring_file)
    print('file downloaded')


file found


In [94]:
fields = scoring_data['fields']
values = []
for _ in range(0, random.randint(MIN_RECORDS, MAX_RECORDS)):
    values.append(random.choice(scoring_data['values']))
payload_scoring = {"fields": fields, "values": values}

scoring_response = wml_client.deployments.score(scoring_url, payload_scoring)
print(scoring_response)

{'fields': ['WS_001_FLOW_MEAN', 'WS_001_FLOW_MIN', 'WS_001_FLOW_MAX', 'WS_001_CONC_MEAN', 'WS_001_CONC_MIN', 'WS_001_CONC_MAX', 'DMW_FLOW_MEAN', 'DMW_FLOW_MIN', 'DMW_FLOW_MAX', 'ALK_FLOW_MEAN', 'ALK_FLOW_MIN', 'ALK_FLOW_MAX', 'WS_002_FLOW_MEAN', 'WS_002_FLOW_MIN', 'WS_002_FLOW_MAX', 'WS_002_CONC_MEAN', 'WS_002_CONC_MIN', 'WS_002_CONC_MAX', 'RPM_MEAN', 'RPM_MIN', 'RPM_MAX', 'PLANT_A', 'BAD_YIELD', 'label', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'], 'values': [[91.658802, 91.677855, 91.663928, 88.60336, 88.612994, 88.583731, 88.597187, 88.598325, 88.595852, 91.467674, 91.52794, 91.439629, 91.482232, 91.553228, 91.451315, 91.487608, 91.541266, 91.429932, 5036.490889, 4993.713746, 5041.473746, 0, 'GOOD', 0.0, [91.658802, 91.677855, 91.663928, 88.60336, 88.612994, 88.583731, 88.597187, 88.598325, 88.595852, 91.467674, 91.52794, 91.439629, 91.482232, 91.553228, 91.451315, 91.487608, 91.541266, 91.429932, 5036.490889, 4993.713746, 5041.473746, 0.0], [19.99710

In [95]:
time.sleep(10)
payload_data = subscription.payload_logging.get_table_content(limit=60)
payload_data.filter(items=['scoring_id', 'predictedLabel', 'probability','PLANT_A'])

Unnamed: 0,scoring_id,predictedLabel,probability,PLANT_A
0,11334ad50e23be1d838d97e314818ab3-10,GOOD,"[0.9998553574320189, 0.00014464256798116931]",0
1,11334ad50e23be1d838d97e314818ab3-100,BAD,"[0.000847457627118644, 0.9991525423728813]",1
2,11334ad50e23be1d838d97e314818ab3-1000,GOOD,"[0.983555306349154, 0.01644469365084597]",0
3,11334ad50e23be1d838d97e314818ab3-101,GOOD,"[0.859960342768419, 0.1400396572315809]",0
4,11334ad50e23be1d838d97e314818ab3-102,GOOD,"[0.9998553574320189, 0.00014464256798116931]",1
5,11334ad50e23be1d838d97e314818ab3-103,BAD,"[0.0, 1.0]",1
6,11334ad50e23be1d838d97e314818ab3-104,BAD,"[0.3721928206477677, 0.6278071793522323]",1
7,11334ad50e23be1d838d97e314818ab3-105,GOOD,"[0.9998553574320189, 0.00014464256798116931]",1
8,11334ad50e23be1d838d97e314818ab3-106,GOOD,"[0.7476545442739654, 0.2523454557260346]",0
9,11334ad50e23be1d838d97e314818ab3-107,GOOD,"[0.7720034054815956, 0.22799659451840446]",1
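
Transactions predicted `BAD` with a probability close to 0.5 are often the most informative ones to explain. The sketch below picks those from a list of row dictionaries, which could be obtained with `payload_data.to_dict('records')`. It assumes each `probability` value is a list of two floats, as displayed above; `least_confident` is a hypothetical helper, and the sample rows are copied from the table.

```python
def least_confident(records, label, limit=3):
    """Return scoring IDs with the given predicted label, least confident first."""
    matching = [row for row in records if row['predictedLabel'] == label]
    # The winning class has the largest probability, so a smaller maximum means less confidence.
    matching.sort(key=lambda row: max(row['probability']))
    return [row['scoring_id'] for row in matching[:limit]]


# Three BAD rows and one GOOD row from the output above.
rows = [
    {'scoring_id': '11334ad50e23be1d838d97e314818ab3-100', 'predictedLabel': 'BAD',
     'probability': [0.000847457627118644, 0.9991525423728813]},
    {'scoring_id': '11334ad50e23be1d838d97e314818ab3-103', 'predictedLabel': 'BAD',
     'probability': [0.0, 1.0]},
    {'scoring_id': '11334ad50e23be1d838d97e314818ab3-104', 'predictedLabel': 'BAD',
     'probability': [0.3721928206477677, 0.6278071793522323]},
    {'scoring_id': '11334ad50e23be1d838d97e314818ab3-106', 'predictedLabel': 'GOOD',
     'probability': [0.7476545442739654, 0.2523454557260346]},
]
print(least_confident(rows, 'BAD'))
```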


## Congratulations!

You have finished the hands-on lab for IBM Watson OpenScale. You can now view the [OpenScale Dashboard](https://aiopenscale.cloud.ibm.com/). Click the tile for the Yield Model deployment to see the fairness, quality, and performance monitors. Click the time-series graph to get detailed information on transactions during a specific time window.

