<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Custom Machine Learning engine

This notebook shows how to log the payload for a model deployed on a custom model serving engine using the Watson OpenScale Python SDK.

Contents
- [1. Setup](#setup)
- [2. Binding machine learning engine](#binding)
- [3. Subscriptions](#subscription)
- [4. Scoring and payload logging](#scoring)
- [5. Feedback logging](#feedback)
- [6. Data Mart](#datamart)

<a id="setup"></a>
## 1. Setup

### 1.0 Sample custom machine learning engine

A sample machine learning engine based on a Docker image, along with deployment instructions, can be found [here](https://github.com/IBM/monitor-custom-ml-engine-with-watson-openscale).

**NOTE:** If you use a different CUSTOM machine learning engine, it must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

### 1.1 Installation and authentication

In [33]:
!pip install ibm-ai-openscale==1.0.429 --no-cache | tail -n 1



Import the required modules.

In [34]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.supporting_classes import PayloadRecord
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *

#### ACTION: Get Watson OpenScale `instance_guid` and `apikey`

You will use an instance of [Watson OpenScale](https://console.bluemix.net/catalog/services/ai-openscale).

How to install the IBM Cloud (Bluemix) CLI: [instruction](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to get an API key using the IBM Cloud CLI:

- The second command outputs an `API Key`, which is used as `apikey` in the credentials below

```bash
ibmcloud login --sso
ibmcloud iam api-key-create 'my_key'
```

How to get your Watson OpenScale instance GUID:

- If your resource group is different from `default`, switch to the resource group containing your Watson OpenScale instance:
```bash
ibmcloud target -g <myResourceGroup>
```
- Get the details of the instance. The output contains the GUID used as `instance_guid` below:
```bash
ibmcloud resource service-instance <Watson-OpenScale-instance_name>
```

#### Let's define the constants required to set up the data mart:

- WATSON_OS_CREDENTIALS
- POSTGRES_CREDENTIALS
- SCHEMA_NAME

In [35]:
WATSON_OS_CREDENTIALS = {
  "url": "https://api.aiopenscale.cloud.ibm.com",
  "instance_guid": "****",
  "apikey": "****"
}
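Pasting the API key into the notebook makes it easy to share it by accident. A minimal alternative, assuming you export the values as environment variables named `WOS_INSTANCE_GUID`, `WOS_APIKEY` and, optionally, `WOS_URL` (hypothetical names chosen for this example), builds the same dictionary:

```python
import os


def load_wos_credentials(env=None):
    """Build the Watson OpenScale credentials dict from environment variables.

    WOS_INSTANCE_GUID, WOS_APIKEY and WOS_URL are hypothetical variable names
    used only for this example; WOS_URL falls back to the public endpoint.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("WOS_INSTANCE_GUID", "WOS_APIKEY") if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "url": env.get("WOS_URL", "https://api.aiopenscale.cloud.ibm.com"),
        "instance_guid": env["WOS_INSTANCE_GUID"],
        "apikey": env["WOS_APIKEY"],
    }
```

With the variables set, `WATSON_OS_CREDENTIALS = load_wos_credentials()` replaces the hard-coded dictionary.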

You will use an instance of [Compose for PostgreSQL DB](https://console.bluemix.net/catalog/services/compose-for-postgresql).

In [36]:
POSTGRES_CREDENTIALS = {
  "db_type": "postgresql",
  "uri_cli_1": "****",
  "maps": [],
  "instance_administration_api": {
    "instance_id": "****",
    "root": "****",
    "deployment_id": "****"
  },
  "name": "****",
  "uri_cli": "****",
  "uri_direct_1": "****",
  "ca_certificate_base64": "****",
  "deployment_id": "****",
  "uri": "****"
}

In [37]:
SCHEMA_NAME = 'data_mart_for_custom'

Create the schema for the data mart.

In [38]:
create_postgres_schema(postgres_credentials=POSTGRES_CREDENTIALS, schema_name=SCHEMA_NAME)

In [39]:
client = APIClient(WATSON_OS_CREDENTIALS)

In [40]:
client.version

'1.0.375'

### 1.2 Data mart setup

>NOTE: If you have already created a data mart and need to delete it, uncomment and run the cell below:

In [45]:
#client.data_mart.delete()

In [46]:
client.data_mart.setup(db_credentials=POSTGRES_CREDENTIALS, schema=SCHEMA_NAME)

In [47]:
data_mart_details = client.data_mart.get_details()

<a id="binding"></a>
## 2. Bind machine learning engines

### 2.1 Bind `CUSTOM` machine learning engine
**NOTE:** CUSTOM machine learning engine must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

The credentials support the following fields:
- `url` - hostname and port (required) in the form of "http://123.45.67.89:12345"
- `username` - part of BasicAuth (optional)
- `password` - part of BasicAuth (optional)
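Because `url` must include both the host and the port, a small check before binding can catch a malformed value early. The hypothetical helper `make_custom_engine_credentials` below builds the `CUSTOM_ENGINE_CREDENTIALS` dictionary and rejects a URL without an `http`/`https` scheme or an explicit port, as well as a username given without a password (or the reverse). It is an illustration only and is not part of the `ibm_ai_openscale` SDK.

```python
from urllib.parse import urlparse


def make_custom_engine_credentials(url, username=None, password=None):
    """Build custom engine credentials after checking the url has a scheme, host and port.

    Hypothetical helper for illustration; it is not part of the ibm_ai_openscale SDK.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname or parsed.port is None:
        raise ValueError("url must look like http://<host>:<port>, got %r" % url)
    # BasicAuth is optional, but username and password only make sense together
    if (username is None) != (password is None):
        raise ValueError("BasicAuth needs both username and password")
    credentials = {"url": url}
    if username is not None:
        credentials["username"] = username
        credentials["password"] = password
    return credentials
```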

In [114]:
CUSTOM_ENGINE_CREDENTIALS = {
    "url": "***"
}
# OR if you have BasicAuth use:
'''
CUSTOM_ENGINE_CREDENTIALS = {
    "url": "***",
    "username": "***",
    "password": "***"
}
'''

In [115]:
binding_uid = client.data_mart.bindings.add('My custom engine', CustomMachineLearningInstance(CUSTOM_ENGINE_CREDENTIALS))

In [116]:
bindings_details = client.data_mart.bindings.get_details()

In [117]:
client.data_mart.bindings.list()

0,1,2,3
f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,My custom engine,custom_machine_learning,2019-01-15T21:20:56.930Z


<a id="subscription"></a>
## 3. Subscriptions

### 3.1 Add subscriptions

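List the available deployments with `client.data_mart.bindings.list_assets()`, then add a subscription for the `action` deployment. The `prediction_column` names the field in the scoring response that holds the predicted label (here `predictedActionLabel`).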

In [118]:
subscription = client.data_mart.subscriptions.add(
    CustomMachineLearningAsset(source_uid='action', 
                               binding_uid=binding_uid, 
                               prediction_column='predictedActionLabel'))

#### Get the subscription details

In [119]:
subscriptions = client.data_mart.subscriptions.get_details()

In [122]:
subscriptions_uids = client.data_mart.subscriptions.get_uids()
print(subscriptions_uids)

['action']


#### List subscriptions

In [123]:
client.data_mart.subscriptions.list()

0,1,2,3,4
action,area and action prediction,model,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,2019-01-15T21:21:14.723Z


<a id="scoring"></a>
## 4. Scoring and payload logging

### 4.1 Score the action model

In [148]:
import requests
import time


request_data = {
    'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status',
               'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction'],
    'values': [[3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes',
                'The car should have been brought to us instead of us trying to find it in the lot.',
                'Product: Information', 0]]
}

header = {'Content-Type': 'application/json'}
scoring_url = subscription.get_details()['entity']['deployments'][0]['scoring_endpoint']['url']

start_time = time.time()
response = requests.post(scoring_url, json=request_data, headers=header)
response_time = int((time.time() - start_time)*1000)

response_data = response.json()
print('Response: ' + str(response_data))

Response: {'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction', 'words', 'hash', 'area_features', 'area_label', 'rawPrediction_area', 'probability_area', 'prediction_area', 'predictedAreaLabel', 'gender_ix', 'customer_status_ix', 'status_ix', 'owner_ix', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedActionLabel'], 'labels': ['NA', 'Free Upgrade', 'On-demand pickup location', 'Voucher', 'Premium features'], 'values': [[3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, ['the', 'car', 'should', 'have', 'been', 'brought', 'to', 'us', 'instead', 'of', 'us', 'trying', 'to', 'find', 'it', 'in', 'the', 'lot.'], [262144.0, [9639.0, 21872.0, 74079.0, 86175.0, 91878.0, 99585.0, 103838.0, 175817.0, 205044.0, 218965.0, 222453.0, 227152.0, 227431.0, 229772.0, 253475.0], [1.0, 2.0, 1.0,
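The response uses the same `fields`/`values` layout as the request, so a given column is found by its position in `fields`. The sketch below pulls out one column, for example the `predictedActionLabel` field that the subscription uses as its prediction column. `get_column` is a hypothetical helper, and `sample_response` is a trimmed-down stand-in for the full response printed above.

```python
def get_column(response, column):
    """Return the values of one named column from a fields/values scoring response."""
    index = response["fields"].index(column)  # raises ValueError if the column is absent
    return [row[index] for row in response["values"]]


# Trimmed-down response with the same shape as the real one above
sample_response = {
    "fields": ["ID", "Gender", "predictedActionLabel"],
    "values": [[3785, "Male", "On-demand pickup location"]],
}
print(get_column(sample_response, "predictedActionLabel"))  # → ['On-demand pickup location']
```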

### 4.2 Store the request and response in the payload logging table

#### Using Python SDK

**Hint:** You can embed the payload logging code in your custom deployment so that payloads are logged automatically each time the model is scored.

In [125]:
records_list = [PayloadRecord(request=request_data, response=response_data, response_time=response_time), 
                PayloadRecord(request=request_data, response=response_data, response_time=response_time)]

for i in range(1, 10):
    records_list.append(PayloadRecord(request=request_data, response=response_data, response_time=response_time))

subscription.payload_logging.store(records=records_list)
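Following the hint above, one way to keep scoring and logging together is to wrap the scoring call so that the request, response and response time are captured in one place. This is a minimal sketch: `score_and_record` is a hypothetical helper, and `score_fn` stands for whatever function calls your engine.

```python
import time


def score_and_record(score_fn, request, records):
    """Call score_fn(request), time it in milliseconds, and append a payload entry.

    Hypothetical helper: score_fn wraps the call to your engine, and each entry
    uses the same keyword names as PayloadRecord (request, response, response_time).
    """
    start = time.perf_counter()
    response = score_fn(request)
    response_time = int((time.perf_counter() - start) * 1000)
    records.append({"request": request, "response": response, "response_time": response_time})
    return response
```

For example, `score_and_record(lambda req: requests.post(scoring_url, json=req, headers=header).json(), request_data, pending)` collects entries in a list `pending`, which can then be logged with `subscription.payload_logging.store(records=[PayloadRecord(**r) for r in pending])`.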

#### Using REST API

First, get an IAM access token.

In [126]:
token_endpoint = "https://iam.bluemix.net/identity/token"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
}

data = {
    "grant_type":"urn:ibm:params:oauth:grant-type:apikey",
    "apikey":WATSON_OS_CREDENTIALS["apikey"]
}

req = requests.post(token_endpoint, data=data, headers=headers)
token = req.json()['access_token']
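IAM access tokens expire, so a long-running notebook or service eventually needs a fresh one. A minimal caching sketch follows. It assumes the token response includes an `expires_in` lifetime in seconds (as standard OAuth 2.0 token responses do), and `IamTokenCache` is a hypothetical class, not part of the SDK.

```python
import time


class IamTokenCache:
    """Cache an IAM access token and fetch a new one shortly before it expires.

    fetch_fn must return the parsed JSON of the token endpoint, assumed here
    to contain 'access_token' and 'expires_in' (lifetime in seconds).
    """

    def __init__(self, fetch_fn, margin_seconds=60, clock=time.time):
        self._fetch = fetch_fn
        self._margin = margin_seconds  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            data = self._fetch()
            self._token = data["access_token"]
            self._expires_at = self._clock() + float(data["expires_in"])
        return self._token
```

Usage: `token_cache = IamTokenCache(lambda: requests.post(token_endpoint, data=data, headers=headers).json())`, then call `token_cache.get()` wherever a `token` is needed.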

Store the payload.

In [127]:
import requests, uuid

PAYLOAD_STORING_HREF_PATTERN = '{}/v1/data_marts/{}/scoring_payloads'
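# WATSON_OS_CREDENTIALS has no 'data_mart_id' key; the data mart ID is the OpenScale instance GUID
endpoint = PAYLOAD_STORING_HREF_PATTERN.format(WATSON_OS_CREDENTIALS['url'], WATSON_OS_CREDENTIALS['instance_guid'])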

payload = [{
    'binding_id': binding_uid, 
    'deployment_id': subscription.get_details()['entity']['deployments'][0]['deployment_id'], 
    'subscription_id': subscription.uid, 
    'scoring_id': str(uuid.uuid4()), 
    'response': response_data,
    'request': request_data
}]


headers = {"Authorization": "Bearer " + token}
req_response = requests.post(endpoint, json=payload, headers=headers)

print("Request OK: " + str(req_response.ok))

Request OK: True
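The REST call above sends a list containing a single entry. To log several scorings at once, each entry needs its own `scoring_id`. The sketch below builds such a list with the same keys as the entry above; `build_payload_entries` is a hypothetical helper, and sending more than one entry per request is an assumption based on the endpoint accepting a list.

```python
import uuid


def build_payload_entries(binding_id, deployment_id, subscription_id, pairs):
    """Build one REST payload entry per (request, response) pair.

    Each entry gets a fresh scoring_id so the logged records stay distinct.
    """
    return [
        {
            "binding_id": binding_id,
            "deployment_id": deployment_id,
            "subscription_id": subscription_id,
            "scoring_id": str(uuid.uuid4()),
            "request": request,
            "response": response,
        }
        for request, response in pairs
    ]
```

For example, `build_payload_entries(binding_uid, subscription.get_details()['entity']['deployments'][0]['deployment_id'], subscription.uid, [(request_data, response_data)] * 3)` produces three entries that can be posted to `endpoint` in one request.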


<a id="feedback"></a>
## 5. Feedback logging & quality (accuracy) monitoring

### Enable quality monitoring

You need to provide the monitoring `threshold` (the minimum acceptable quality score) and `min_records` (the minimum number of feedback records required before quality is evaluated).

In [128]:
subscription.quality_monitoring.enable(threshold=0.7, min_records=10)
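The interaction between the two parameters can be sketched as follows. This illustration assumes that quality is compared with `threshold` only once at least `min_records` feedback records exist; `evaluate_quality` is a hypothetical helper, not the service's evaluation code.

```python
def evaluate_quality(quality, feedback_count, threshold=0.7, min_records=10):
    """Illustrate how threshold and min_records interact (an assumption, not OpenScale's code).

    Returns None when there is too little feedback to evaluate, True when the
    quality score meets the threshold, and False when it falls below it.
    """
    if feedback_count < min_records:
        return None
    return quality >= threshold
```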

### Feedback records logging

Feedback records are used to evaluate your model: the predicted values are compared with the actual (ground-truth) values stored in the feedback records.
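As a rough illustration of that comparison, the sketch below computes plain accuracy, the share of records whose predicted label matches the actual label. `accuracy` is a hypothetical helper using labels from this model's label set; OpenScale also reports weighted precision, recall and related metrics, which this sketch does not reproduce.

```python
def accuracy(predicted, actual):
    """Fraction of feedback records whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one feedback record")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


print(accuracy(["Voucher", "Free Upgrade", "Voucher", "NA"],
               ["Voucher", "Voucher", "Voucher", "NA"]))  # → 0.75
```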

You can check the schema of the feedback table with the following method.

In [129]:
subscription.feedback_logging.print_table_schema()

0,1,2
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True
Car_Owner,string,True
Customer_Service,string,True
Business_Area,string,True
Satisfaction,integer,True


Feedback records can be sent to the feedback table using the code below.

In [130]:
fields = ['ID', 'Gender', 'Status','Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction', 'label']

records = [
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'],
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location']]

for i in range(1,10):
    records.append([3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'])

subscription.feedback_logging.store(feedback_data=records, fields=fields)
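Each feedback row must supply one value per name in `fields`, in the same order. A quick pre-flight check like the hypothetical `check_feedback_records` below catches a missing or extra value before the data is sent. For the cell above it returns 11, the number of rows built (2 initial rows plus 9 from the loop).

```python
def check_feedback_records(records, fields):
    """Raise ValueError if any record does not have exactly one value per field name.

    Returns the number of records checked.
    """
    for position, record in enumerate(records):
        if len(record) != len(fields):
            raise ValueError(
                "record %d has %d values but %d fields were given"
                % (position, len(record), len(fields)))
    return len(records)
```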

### Run quality monitoring on demand

By default, quality monitoring runs on an hourly schedule. You can also trigger it on demand using the code below.

In [131]:
run_details = subscription.quality_monitoring.run()

Because the monitoring job runs in the background, you can poll its status as follows.

In [132]:
status = run_details['status']
run_uid = run_details['id']  # named run_uid to avoid shadowing the built-in id()

print("Run status: {}".format(status))

start_time = time.time()
elapsed_time = 0

# Poll every 10 seconds, for at most 60 seconds, until the run completes
while status != 'completed' and elapsed_time < 60:
    time.sleep(10)
    run_details = subscription.quality_monitoring.get_run_details(run_uid=run_uid)
    status = run_details['status']
    elapsed_time = time.time() - start_time
    print("Run status: {}".format(status))

Run status: running
Run status: completed
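The polling loop above can be factored into a reusable helper. `wait_for_status` is a hypothetical function: it takes any zero-argument callable that returns the current status, and the `sleep` and `clock` parameters exist only so that the loop can be tested without real waiting.

```python
import time


def wait_for_status(get_status, done=("completed",), timeout=60, interval=10,
                    sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns a value in done or timeout seconds pass.

    Returns the last status seen, so the caller can tell a finished run from a timeout.
    """
    start = clock()
    status = get_status()
    while status not in done and clock() - start < timeout:
        sleep(interval)
        status = get_status()
    return status
```

Usage: `wait_for_status(lambda: subscription.quality_monitoring.get_run_details(run_uid=RUN_ID)['status'])`, where `RUN_ID` stands for the `'id'` value returned by `subscription.quality_monitoring.run()`. A result other than `'completed'` means the timeout was reached.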


### Show the quality metrics

In [134]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2019-01-15 21:22:56.600000+00:00,1.0,0.7,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,Accuracy_evaluation_ffcbe77d-c557-47d9-9b02-ebd76550d6c7,


Get all calculated metrics.

In [135]:
subscription.quality_monitoring.get_metrics(deployment_uid='action')

{'end': '2019-01-15T21:29:33.618903Z',
 'metrics': [{'process': 'Accuracy_evaluation_ffcbe77d-c557-47d9-9b02-ebd76550d6c7',
   'timestamp': '2019-01-15T21:22:56.600Z',
   'value': {'metrics': [{'name': 'weightedTruePositiveRate', 'value': 1.0},
     {'name': 'accuracy', 'value': 1.0},
     {'name': 'weightedFMeasure', 'value': 1.0},
     {'name': 'weightedRecall', 'value': 1.0},
     {'name': 'weightedFalsePositiveRate', 'value': None},
     {'name': 'weightedPrecision', 'value': 1.0}],
    'quality': 1.0,
    'threshold': 0.7}}],
 'start': '2019-01-15T20:21:14.723Z'}
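The nested structure returned by `get_metrics()` can be flattened into a simple name-to-value mapping. The sketch below assumes the layout shown in the output above and picks the most recent evaluation by comparing the ISO 8601 `timestamp` strings, which sort chronologically only while they share the same format and time zone. `latest_metric_values` is a hypothetical helper.

```python
def latest_metric_values(metrics_response):
    """Map metric name to value for the most recent evaluation in a get_metrics() result.

    Assumes a 'metrics' list whose items carry a 'timestamp' string and a nested
    value['metrics'] list of {'name': ..., 'value': ...} pairs.
    """
    evaluations = metrics_response.get("metrics", [])
    if not evaluations:
        return {}
    # ISO 8601 strings in one consistent format compare in chronological order
    latest = max(evaluations, key=lambda item: item["timestamp"])
    return {m["name"]: m["value"] for m in latest["value"]["metrics"]}
```

For the output above, `latest_metric_values(subscription.quality_monitoring.get_metrics(deployment_uid='action'))['accuracy']` gives `1.0`.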

<a id="datamart"></a>
## 6. Get the logged data

### 6.1 Payload logging

#### Print the schema of the payload logging table

In [136]:
subscription.payload_logging.print_table_schema()

0,1,2
scoring_id,string,False
scoring_timestamp,timestamp,False
deployment_id,string,False
asset_revision,string,True
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True


#### Show (preview) the table

In [137]:
subscription.payload_logging.describe_table()

           ID  Children   Age  Satisfaction  area_label  prediction_area  \
count    36.0      36.0  36.0          36.0        36.0             36.0   
mean   3785.0       1.0  17.0           0.0         7.0              1.0   
std       0.0       0.0   0.0           0.0         0.0              0.0   
min    3785.0       1.0  17.0           0.0         7.0              1.0   
25%    3785.0       1.0  17.0           0.0         7.0              1.0   
50%    3785.0       1.0  17.0           0.0         7.0              1.0   
75%    3785.0       1.0  17.0           0.0         7.0              1.0   
max    3785.0       1.0  17.0           0.0         7.0              1.0   

       gender_ix  customer_status_ix  status_ix  owner_ix  prediction  
count       36.0                36.0       36.0      36.0        36.0  
mean         0.0                 1.0        1.0       1.0         2.0  
std          0.0                 0.0        0.0       0.0         0.0  
min          0.0           

#### Return the table content as a pandas DataFrame

In [138]:
pandas_df = subscription.payload_logging.get_table_content(format='pandas')

### 6.2 Feedback logging

Check the schema of the table.

In [139]:
subscription.feedback_logging.print_table_schema()

0,1,2
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True
Car_Owner,string,True
Customer_Service,string,True
Business_Area,string,True
Satisfaction,integer,True


Preview the table content.

In [140]:
subscription.feedback_logging.show_table()

0,1,2,3,4,5,6,7,8,9,10,11
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2019-01-15 19:30:55.199000+00:00


Describe the table (calculate basic statistics).

In [141]:
subscription.feedback_logging.describe_table()

           ID  Children   Age  Satisfaction
count    33.0      33.0  33.0          33.0
mean   3785.0       1.0  17.0           0.0
std       0.0       0.0   0.0           0.0
min    3785.0       1.0  17.0           0.0
25%    3785.0       1.0  17.0           0.0
50%    3785.0       1.0  17.0           0.0
75%    3785.0       1.0  17.0           0.0
max    3785.0       1.0  17.0           0.0


Get the table content as a pandas DataFrame.

In [142]:
feedback_pd = subscription.feedback_logging.get_table_content(format='pandas')

### 6.3 Quality metrics table

In [143]:
subscription.quality_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
quality,float,False
quality_threshold,float,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [144]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2019-01-15 21:22:56.600000+00:00,1.0,0.7,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,Accuracy_evaluation_ffcbe77d-c557-47d9-9b02-ebd76550d6c7,


### 6.4 Performance metrics table

In [145]:
subscription.performance_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
scoring_time,float,False
scoring_records,object,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [146]:
subscription.performance_monitoring.show_table()

0,1,2,3,4,5,6,7
2019-01-15 21:22:26.682446+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682467+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682549+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682564+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682501+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682389+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682533+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682517+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682484+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
2019-01-15 21:22:26.682581+00:00,2679.0,1,f1f35824-8d9b-4ad7-a14c-6498d5f1e0a3,action,action,,
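To summarize the performance table rather than scan it row by row, a small helper can compute the count, mean and maximum of the `scoring_time` column. `summarize_scoring_times` is hypothetical, and treating `scoring_time` as milliseconds (matching how `response_time` was measured earlier) is an assumption. For the ten rows shown above it reports a count of 10 and a mean and maximum of 2679.0.

```python
from statistics import mean


def summarize_scoring_times(scoring_times):
    """Return the count, mean and max of a list of scoring_time values."""
    if not scoring_times:
        raise ValueError("no performance records")
    return {"count": len(scoring_times), "mean": mean(scoring_times), "max": max(scoring_times)}
```

Pass it the values of the `scoring_time` column, for example copied from the table above or taken from a pandas DataFrame of that table.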


### 6.5 Data Mart measurement facts table

In [147]:
client.data_mart.get_deployment_metrics()

{'deployment_metrics': [{'asset': {'asset_id': 'action',
    'asset_type': 'model',
    'created_at': '2016-12-01T10:11:12Z',
    'name': 'area and action prediction',
    'url': 'http://169.60.16.73:31520/v1/deployments/action/online'},
   'deployment': {'created_at': '2016-12-01T10:11:12Z',
    'deployment_id': 'action',
    'deployment_rn': '',
    'deployment_type': 'online',
    'name': 'action deployment',
    'scoring_endpoint': {'request_headers': {'Content-Type': 'application/json'},
     'url': 'http://169.60.16.73:31520/v1/deployments/action/online'},
    'url': ''},
   'metrics': [{'issues': 0,
     'metric_type': 'performance',
     'timestamp': '2019-01-15T19:30:20.342121Z',
     'value': {'records': 1, 'response_time': 794.0}},
    {'issues': 0,
     'metric_type': 'performance',
     'timestamp': '2019-01-15T19:47:35.886607Z',
     'value': {'records': 1, 'response_time': 794.0}},
    {'issues': 0,
     'metric_type': 'performance',
     'timestamp': '2019-01-15T21:22:2

---

### Authors
Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.