<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Custom Machine Learning engine

This notebook shows how to log payload data for a model deployed on a custom model serving engine, using the Watson OpenScale Python SDK.

Contents
- [1. Setup](#setup)
- [2. Binding machine learning engine](#binding)
- [3. Subscriptions](#subscription)
- [4. Scoring and payload logging](#scoring)
- [5. Feedback logging](#feedback)
- [6. Data Mart](#datamart)

<a id="setup"></a>
## 1. Setup

### 1.0 Sample custom machine learning engine

A sample machine learning engine, based on a Docker image, is available together with deployment instructions [here](https://github.com/IBM/monitor-custom-ml-engine-with-watson-openscale).

**NOTE:** If you use a different CUSTOM machine learning engine, it must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

### 1.1 Installation and authentication

In [None]:
!pip install ibm-ai-openscale==2.1.1 --no-cache | tail -n 1

Import and initiate.

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.supporting_classes import PayloadRecord
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *

#### ACTION: Get Watson OpenScale `instance_guid` and `apikey`

You will use an instance of [Watson OpenScale](https://console.bluemix.net/catalog/services/ai-openscale).

To install the IBM Cloud (Bluemix) command-line interface, follow these [instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use).

To create an API key with the IBM Cloud CLI, run the commands below. The output contains an `API Key` value, which is used as `apikey` below.

```bash
ibmcloud login --sso
ibmcloud iam api-key-create 'my_key'
```

How to get your Watson OpenScale instance GUID:

- If your resource group is different than `default`, switch to the resource group containing your Watson OpenScale instance
```bash
ibmcloud target -g <myResourceGroup>
```
- Get the details of the instance. The output contains the GUID used as `instance_guid` below.
```bash
ibmcloud resource service-instance <Watson-OpenScale-instance_name>
```

#### Let's define some constants required to set up the data mart:

- WATSON_OS_CREDENTIALS
- POSTGRES_CREDENTIALS
- SCHEMA_NAME

In [None]:
WATSON_OS_CREDENTIALS = {
  "url": "https://api.aiopenscale.cloud.ibm.com",
  "instance_guid": "****",
  "apikey": "****"
}
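
To avoid pasting secrets into the notebook, a dictionary of the same shape could be assembled from environment variables. The sketch below is one possible way to do that. The helper `credentials_from_env` and the variable names `OPENSCALE_URL`, `OPENSCALE_INSTANCE_GUID` and `OPENSCALE_APIKEY` are hypothetical and not part of the SDK.

```python
import os

DEFAULT_URL = "https://api.aiopenscale.cloud.ibm.com"


def credentials_from_env(env=None):
    """Build a WATSON_OS_CREDENTIALS-style dict from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    required = ("OPENSCALE_INSTANCE_GUID", "OPENSCALE_APIKEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env.get("OPENSCALE_URL", DEFAULT_URL),
        "instance_guid": env["OPENSCALE_INSTANCE_GUID"],
        "apikey": env["OPENSCALE_APIKEY"],
    }


# Demonstration with a plain dict standing in for os.environ.
demo_env = {
    "OPENSCALE_INSTANCE_GUID": "guid-placeholder",
    "OPENSCALE_APIKEY": "key-placeholder",
}
demo_credentials = credentials_from_env(demo_env)
print(sorted(demo_credentials))
```

Calling `credentials_from_env()` without arguments reads from the real environment and raises a `KeyError` if either required variable is unset.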

You will also use an instance of [Databases for PostgreSQL DB](https://console.bluemix.net/catalog/services/databases-for-postgresql).

In [None]:
POSTGRES_CREDENTIALS = {
  "connection": {
    "cli": {
      "arguments": [
        [
          "host=****.databases.appdomain.cloud port=31173 dbname=ibmclouddb user=**** sslmode=verify-full"
        ]
      ],
      "bin": "psql",
      "certificate": {
        "certificate_base64": "****",
        "name": "****"
      },
      "composed": [
        "PGPASSWORD=**** PGSSLROOTCERT=**** psql 'host=****.databases.appdomain.cloud port=31173 dbname=ibmclouddb user=ibm_cloud_*** sslmode=verify-full'"
      ],
      "environment": {
        "PGPASSWORD": "****",
        "PGSSLROOTCERT": "****"
      },
      "type": "cli"
    },
    "postgres": {
      "authentication": {
        "method": "direct",
        "password": "****",
        "username": "ibm_cloud_****"
      },
      "certificate": {
        "certificate_base64": "****",
        "name": "****"
      },
      "composed": [
        "postgres://ibm_cloud_***:***.****.databases.appdomain.cloud:31173/ibmclouddb?sslmode=verify-full"
      ],
      "database": "ibmclouddb",
      "hosts": [
        {
          "hostname": "****.databases.appdomain.cloud",
          "port": 31173,
          "protocol": "postgres"
        }
      ],
      "path": "/ibmclouddb",
      "query_options": {
        "sslmode": "verify-full"
      },
      "scheme": "postgres",
      "type": "uri"
    }
  },
  "instance_administration_api": {
    "deployment_id": "crn:v1:bluemix:public:databases-for-postgresql:us-south:a/****::",
    "instance_id": "crn:v1:bluemix:public:databases-for-postgresql:us-south:a/****::",
    "root": "https://api.******.databases.cloud.ibm.com/v4/ibm"
  }
}
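
Before using a credentials dictionary of this size, it can help to confirm that the expected keys are present without printing the password. The sketch below pulls out only the non-secret connection details. The helper `summarize_postgres_credentials` and the placeholder values are hypothetical; the key layout mirrors the dictionary above.

```python
def summarize_postgres_credentials(credentials):
    """Return the non-secret connection details from a credentials dict."""
    postgres = credentials["connection"]["postgres"]
    first_host = postgres["hosts"][0]
    return {
        "hostname": first_host["hostname"],
        "port": first_host["port"],
        "database": postgres["database"],
        "username": postgres["authentication"]["username"],
        "sslmode": postgres["query_options"]["sslmode"],
    }


# Trimmed-down stand-in with placeholder values, mirroring the layout above.
sample_credentials = {
    "connection": {
        "postgres": {
            "authentication": {
                "method": "direct",
                "password": "placeholder-password",
                "username": "placeholder-user",
            },
            "database": "ibmclouddb",
            "hosts": [
                {"hostname": "example.invalid", "port": 31173, "protocol": "postgres"}
            ],
            "query_options": {"sslmode": "verify-full"},
        }
    }
}

summary = summarize_postgres_credentials(sample_credentials)
print(summary)
```

A `KeyError` or `IndexError` from this helper points at a section that is missing from the pasted credentials.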

In [None]:
SCHEMA_NAME = 'data_mart_for_custom'

Create schema for data mart.

In [None]:
create_postgres_schema(postgres_credentials=POSTGRES_CREDENTIALS, schema_name=SCHEMA_NAME)

In [None]:
client = APIClient(WATSON_OS_CREDENTIALS)

In [None]:
client.version

### 1.2 DataMart setup

>NOTE: If you have already created a data_mart and need to delete it, uncomment and run the cell below:

In [None]:
#client.data_mart.delete()

In [None]:
client.data_mart.setup(db_credentials=POSTGRES_CREDENTIALS, schema=SCHEMA_NAME)

In [None]:
data_mart_details = client.data_mart.get_details()

<a id="binding"></a>
## 2. Bind machine learning engines

### 2.1 Bind `CUSTOM` machine learning engine

**NOTE:** The CUSTOM machine learning engine must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

Credentials support the following fields:
- `url` - hostname and port (required) in the form of "http://123.45.67.890:12345"
- `username` - part of BasicAuth (optional)
- `password` - part of BasicAuth (optional)

In [None]:
CUSTOM_ENGINE_CREDENTIALS = {
    "url": "http://***:***"
}
# OR if you have BasicAuth use:
'''
CUSTOM_ENGINE_CREDENTIALS = {
    "url": "***",
    "username": "***",
    "password": "***"
}
'''
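
The two credential variants above can also be produced by one small function, which avoids keeping a commented-out alternative in the cell. This is a minimal sketch: `build_custom_engine_credentials` is a hypothetical helper, the `localhost` URL is a placeholder, and the rule that `username` and `password` must be supplied together is an assumption rather than something the specification is known to require.

```python
from urllib.parse import urlparse


def build_custom_engine_credentials(url, username=None, password=None):
    """Return a credentials dict with `url` and optional BasicAuth fields."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("url must look like http://host:port")
    # Assumption: BasicAuth only makes sense when both parts are given.
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")
    credentials = {"url": url}
    if username is not None:
        credentials["username"] = username
        credentials["password"] = password
    return credentials


# Placeholder values for illustration only.
without_auth = build_custom_engine_credentials("http://localhost:5000")
with_auth = build_custom_engine_credentials("http://localhost:5000", "user", "secret")
print(without_auth)
print(sorted(with_auth))
```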

In [None]:
binding_uid = client.data_mart.bindings.add('My custom engine', CustomMachineLearningInstance(CUSTOM_ENGINE_CREDENTIALS))

In [None]:
bindings_details = client.data_mart.bindings.get_details()

In [None]:
client.data_mart.bindings.list()

<a id="subscription"></a>
## 3. Subscriptions

### 3.1 Add subscriptions

To list the available deployments, call `client.data_mart.bindings.list_assets()`.

In [None]:
subscription = client.data_mart.subscriptions.add(
    CustomMachineLearningAsset(source_uid='action', 
                               binding_uid=binding_uid, 
                               label_column='label',
                               prediction_column='predictedActionLabel'))

#### Get subscriptions list

In [None]:
subscriptions = client.data_mart.subscriptions.get_details()

In [None]:
subscriptions_uids = client.data_mart.subscriptions.get_uids()
print(subscriptions_uids)

#### List subscriptions

In [None]:
client.data_mart.subscriptions.list()

<a id="scoring"></a>
## 4. Scoring and payload logging

### 4.1 Score the action model

In [None]:
import requests
import time


request_data = {
    'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status',
               'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction'],
    'values': [[3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes',
                'The car should have been brought to us instead of us trying to find it in the lot.',
                'Product: Information', 0]]
}

header = {'Content-Type': 'application/json'}
scoring_url = subscription.get_details()['entity']['deployments'][0]['scoring_endpoint']['url']

start_time = time.time()
response = requests.post(scoring_url, json=request_data, headers=header)
response_time = int((time.time() - start_time)*1000)

response_data = response.json()
print('Response: ' + str(response_data))
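
The request above uses a `fields`/`values` layout: one list of column names, and one inner list per row with values in the same order. When the input data is held as dictionaries, a small conversion step is needed. The sketch below shows one way to do it; `to_scoring_payload` is a hypothetical helper and the sample rows are shortened for illustration.

```python
def to_scoring_payload(rows, fields):
    """Convert a list of dicts into the {'fields': [...], 'values': [[...]]} layout."""
    # Each inner list follows the order of `fields`, not the dict's own key order.
    values = [[row[name] for name in fields] for row in rows]
    return {"fields": list(fields), "values": values}


rows = [
    {"Gender": "Male", "ID": 3785, "Age": 17},
    {"ID": 3786, "Age": 42, "Gender": "Female"},
]
payload = to_scoring_payload(rows, ["ID", "Gender", "Age"])
print(payload)
```

A row that lacks one of the requested fields raises a `KeyError`, which surfaces incomplete input before it reaches the scoring endpoint.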

### 4.2 Store the request and response in payload logging table

#### Using Python SDK

**Hint:** You can embed payload logging code into your custom deployment so it is logged automatically each time you score the model.

In [None]:
records_list = [PayloadRecord(request=request_data, response=response_data, response_time=response_time), 
                PayloadRecord(request=request_data, response=response_data, response_time=response_time)]

for i in range(1, 10):
    records_list.append(PayloadRecord(request=request_data, response=response_data, response_time=response_time))

subscription.payload_logging.store(records=records_list)
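
The hint about embedding payload logging into the deployment can be sketched as a wrapper that scores, measures the response time, and hands the result to a logging callback. Everything in the block is a stand-in: `score_and_record`, `fake_score` and `fake_log` are hypothetical names, and no call to the scoring endpoint or to Watson OpenScale is made.

```python
import time


def score_and_record(score_fn, request_data, log_fn):
    """Call score_fn, measure latency in milliseconds, and pass the result to log_fn."""
    start = time.perf_counter()
    response_data = score_fn(request_data)
    response_time_ms = int((time.perf_counter() - start) * 1000)
    log_fn(request_data, response_data, response_time_ms)
    return response_data


# Stand-ins for a real model call and for payload logging (both hypothetical).
logged = []


def fake_score(request):
    return {"fields": ["prediction"], "values": [["ok"] for _ in request["values"]]}


def fake_log(request, response, response_time_ms):
    logged.append({"request": request, "response": response,
                   "response_time": response_time_ms})


result = score_and_record(fake_score, {"fields": ["x"], "values": [[1], [2]]}, fake_log)
print(result, len(logged))
```

In this notebook's setting, `log_fn` could wrap `subscription.payload_logging.store(records=[PayloadRecord(...)])`, so that every scoring call is logged without a separate step.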

#### Using REST API

Get the token first.

In [None]:
token_endpoint = "https://iam.bluemix.net/identity/token"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
}

data = {
    "grant_type":"urn:ibm:params:oauth:grant-type:apikey",
    "apikey":WATSON_OS_CREDENTIALS["apikey"]
}

req = requests.post(token_endpoint, data=data, headers=headers)
token = req.json()['access_token']

Store the payload.

In [None]:
import requests, uuid

PAYLOAD_STORING_HREF_PATTERN = '{}/v1/data_marts/{}/scoring_payloads'
# The Watson OpenScale instance GUID serves as the data mart ID.
endpoint = PAYLOAD_STORING_HREF_PATTERN.format(WATSON_OS_CREDENTIALS['url'], WATSON_OS_CREDENTIALS['instance_guid'])

payload = [{
    'binding_id': binding_uid, 
    'deployment_id': subscription.get_details()['entity']['deployments'][0]['deployment_id'], 
    'subscription_id': subscription.uid, 
    'scoring_id': str(uuid.uuid4()), 
    'response': response_data,
    'request': request_data
}]


headers = {"Authorization": "Bearer " + token}
      
req_response = requests.post(endpoint, json=payload, headers = headers)

print("Request OK: " + str(req_response.ok))
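
When several scoring calls are stored through the REST API at once, each record needs its own `scoring_id`. The sketch below builds such a batch using the same keys as the payload above. `build_payload_records` is a hypothetical helper and the identifier strings are placeholders; nothing is sent over the network.

```python
import uuid


def build_payload_records(binding_id, deployment_id, subscription_id, exchanges):
    """Build one payload record per (request, response) pair, each with a fresh scoring_id."""
    return [
        {
            "binding_id": binding_id,
            "deployment_id": deployment_id,
            "subscription_id": subscription_id,
            "scoring_id": str(uuid.uuid4()),
            "request": request,
            "response": response,
        }
        for request, response in exchanges
    ]


# Placeholder identifiers and data for illustration only.
exchanges = [
    ({"fields": ["x"], "values": [[1]]}, {"fields": ["y"], "values": [["a"]]}),
    ({"fields": ["x"], "values": [[2]]}, {"fields": ["y"], "values": [["b"]]}),
]
records = build_payload_records("binding-placeholder", "deployment-placeholder",
                                "subscription-placeholder", exchanges)
print(len(records))
```

The resulting list could be passed as the `json` argument of the `requests.post` call shown above.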

<a id="feedback"></a>
## 5. Feedback logging & quality (accuracy) monitoring

### Enable quality monitoring

You need to provide the monitoring `threshold` and `min_records` (the minimum number of feedback records).

In [None]:
subscription.quality_monitoring.enable(threshold=0.7, min_records=10)

### Feedback records logging

Feedback records are used to evaluate your model. The predicted values are compared to real values (feedback records).
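
To make the comparison concrete, the sketch below computes plain accuracy over a set of feedback labels and checks it against the `threshold` and `min_records` values used above. It only illustrates the idea and is not necessarily how Watson OpenScale computes its quality metrics; `accuracy`, `meets_threshold` and the label `"Other"` are made up for this example.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that equal the corresponding feedback label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def meets_threshold(predicted, actual, threshold=0.7, min_records=10):
    """Return None when there are too few records, else whether accuracy >= threshold."""
    if len(actual) < min_records:
        return None
    return accuracy(predicted, actual) >= threshold


actual = ["On-demand pickup location"] * 10
predicted = ["On-demand pickup location"] * 8 + ["Other"] * 2  # "Other" is a made-up label

print(accuracy(predicted, actual))         # 0.8
print(meets_threshold(predicted, actual))  # True
```

With fewer than 10 records, `meets_threshold` returns `None`, mirroring the idea that no evaluation takes place until enough feedback has been collected.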

You can check the schema of the feedback table using the method below.

In [None]:
subscription.feedback_logging.print_table_schema()

Feedback records can be sent to the feedback table using the code below.

In [None]:
fields = ['ID', 'Gender', 'Status','Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction', 'label']

records = [
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'],
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location']]

for i in range(1,10):
    records.append([3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'])

subscription.feedback_logging.store(feedback_data=records, fields=fields)

### Run quality monitoring on demand

By default, quality monitoring runs on an hourly schedule. You can also trigger it on demand using the code below.

In [None]:
run_details = subscription.quality_monitoring.run()

Since monitoring runs in the background, you can use the method below to check the status of the job.

In [None]:
status = run_details['status']
run_id = run_details['id']  # named run_id to avoid shadowing the built-in id()

print("Run status: {}".format(status))

start_time = time.time()
elapsed_time = 0

# Poll every 10 seconds until the run completes or 60 seconds have passed.
while status != 'completed' and elapsed_time < 60:
    time.sleep(10)
    run_details = subscription.quality_monitoring.get_run_details(run_uid=run_id)
    status = run_details['status']
    elapsed_time = time.time() - start_time
    print("Run status: {}".format(status))
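
The polling pattern above can be factored into a reusable helper with an explicit timeout. The sketch below is one possible shape. `wait_for_status` is a hypothetical helper, and treating `"failed"` as a terminal state is an assumption, as are the intermediate status names in the demonstration; only `"completed"` comes from the loop above.

```python
import time


def wait_for_status(get_status, done_states=("completed", "failed"),
                    timeout=60.0, interval=10.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout expires.

    "failed" as a terminal state is an assumption made for this sketch.
    """
    deadline = clock() + timeout
    status = get_status()
    while status not in done_states and clock() < deadline:
        sleep(interval)
        status = get_status()
    return status


# Demonstration with canned statuses and a no-op sleep, so nothing actually waits.
canned = iter(["initializing", "running", "completed"])  # illustrative status names
final_status = wait_for_status(lambda: next(canned), sleep=lambda seconds: None)
print(final_status)
```

In the notebook, `get_status` could be a lambda that calls `subscription.quality_monitoring.get_run_details(run_uid=...)` and returns its `'status'` entry. Injecting `sleep` and `clock` keeps the helper testable without real delays.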

### Show the quality metrics

In [None]:
subscription.quality_monitoring.show_table()

Get all calculated metrics.

In [None]:
subscription.quality_monitoring.get_metrics(deployment_uid='action')

<a id="datamart"></a>
## 6. Get the logged data

### 6.1 Payload logging

#### Print schema of payload_logging table

In [None]:
subscription.payload_logging.print_table_schema()

#### Describe the table (calculate basic statistics)

In [None]:
subscription.payload_logging.describe_table()

#### Return the table content as pandas dataframe

In [None]:
pandas_df = subscription.payload_logging.get_table_content(format='pandas')

### 6.2 Feedback logging

Check the schema of table.

In [None]:
subscription.feedback_logging.print_table_schema()

Preview table content.

In [None]:
subscription.feedback_logging.show_table()

Describe table (calculate basic statistics).

In [None]:
subscription.feedback_logging.describe_table()

Get table content.

In [None]:
feedback_pd = subscription.feedback_logging.get_table_content(format='pandas')

### 6.3 Quality metrics table

In [None]:
subscription.quality_monitoring.print_table_schema()

In [None]:
subscription.quality_monitoring.show_table()

### 6.4 Performance metrics table

In [None]:
subscription.performance_monitoring.print_table_schema()

In [None]:
subscription.performance_monitoring.show_table()

### 6.5 Data Mart measurement facts table

In [None]:
client.data_mart.get_deployment_metrics()

---

### Authors
Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.