# IBM Business Automation Workflow recommendation service with IBM Business Automation Insights and Watson Machine Learning

Artificial intelligence can be combined with business process management in many ways. For example, AI can help transform unstructured data into data that a process can work with, through techniques such as visual or text recognition. Assistants and bots provide a better user experience, and several IBM Watson services can help achieve those goals. A business process can also capture a lot of business data. This notebook demonstrates how to take advantage of this data and apply machine-learning techniques to optimize processes. If, for every decision that must be made as part of a business process, you can get a recommendation based on the decisions that were made in the past in similar situations, your processes are greatly enhanced. 

## The recommendation service scenario

The scenario here is the following: imagine an insurance company which has set up a Workflow process to approve or reject insurance claims. Some of those insurance claims are simple because, typically, the amount is small and the customer's history and claim circumstances are straightforward. Such claims can be approved automatically or, at least, follow a fast approval path. Some claims are more complex and therefore their approval path includes more steps. Let us further assume that the approval decision or the decision on which path to follow is a human task and this task is captured as a task of a Workflow process. Then it becomes interesting to consider whether a machine-learning algorithm can help figure out which decision to take, based on past decisions.
This scenario can be adapted to any kind of human decision process. In the insurance claim example, the decision consists in approving or rejecting a claim, which amounts to a kind of yes-or-no decision. Such decisions can translate into a binary classification machine-learning problem. However, if the decision consists in dispatching a process into many other subprocesses, the scenario becomes a multiclass classification problem, which Business Automation Insights can also handle.

This recommendation service uses IBM Business Automation Workflow to build the claim approval process, IBM Business Automation Insights to store the business process data, and IBM Watson for artificial intelligence, in particular Watson Studio for building the machine-learning model and Watson Machine Learning for deploying that model. 


## Overview of the solution

The following diagram outlines how the different elements and cloud services are used together to build the expected service.

![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/overview.png)


Everything starts with the business process itself, which runs in <b>IBM Business Automation Workflow</b>. As the process runs, its business data, which in this scenario contains information about the insured person and the claim, is captured by the <b>Business Automation Insights (BAI)</b> service, which stores all the process operational data, and in particular the claim data, in HDFS. The role of BAI is to capture and store this data so that processes can be monitored and, as the name indicates, to provide insights into them. BAI can render various dashboards, for example to monitor process efficiency. In this insurance claim scenario, you are more interested in the business data that is associated with activities and processes than in the operational data. <br>
Once the data is captured in <b>HDFS</b>, it can be used to train a machine-learning model. After the model is trained with existing claim data and approval decisions, it should be able to provide recommendations on whether to approve or reject new claims.<br>
The trained model needs to be deployed, which is the role of the <b>IBM Watson Machine Learning</b> service. This service stores the machine-learning model and provides a scoring endpoint.
Finally, the scoring endpoint can be invoked by the Workflow business-management process and the result transformed into a recommendation within the process user interface.


## Learning goals

In this notebook, you will learn:

- How to load time series data, in IBM Business Automation Insights, from a specific tracking point in the Workflow process
- How to explore the format of the data and read it
- How to create an Apache® Spark machine learning pipeline, which will be the recommendation model
- How to train and evaluate the model
- How to persist a pipeline and model in the Watson Machine Learning repository
- How to deploy a model for online scoring by using the Watson Machine Learning API
- How to score sample data by using the Watson Machine Learning API
- How to set up the scoring to create a recommendation service in a Workflow Coach



## Setting up the solution

To illustrate how to combine all the technologies together, the notebook comes with a business process definition that you can download from here: [Download the BPM Process](https://github.com/Tissandier/bpm-recommendation/raw/master/process/Claim_Approval_Sample.twx)

To be able to run the solution that is presented in this notebook, make sure the following elements are installed:

- IBM Business Automation Workflow

- IBM Business Automation Insights

Business Automation Insights must be installed and connected to an HDFS data lake.

- IBM Watson Machine Learning service (https://console.bluemix.net/catalog/services/machine-learning) on IBM Cloud 
You can install a free tier.

Once you have installed the various elements, ensure that you have:
- Credentials for your Workflow instance
- URI for the HDFS used by Business Automation Insights
- Watson Machine Learning credentials

## Tracking data in Business Automation Insights
Download and import the process definition from the Workflow Center.
<br>
![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/importprocess.png)
Then open the 'Claim Approval Sample' process application in Process Designer. As you explore the process application, you see one 'Claim approval' process, which has been defined as a single user task.
<br>
![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/process.png)
<br>
<br>
For this process, four classes of business data have been created: claim, customer, vehicle, and recommendation. The 'claim' business data represents the data of the insurance claim. It will reference a customer and a vehicle. The 'recommendation' object will contain information from the recommendation service that's being built. This object will be examined later.<br>
Note that this example is not intended to reflect a real claim approval system, which is notably more complex.
The claim contains information on the vehicle: the 'make', the 'type' and 'model', and the 'year' of the vehicle, information about the customer, in particular the 'creditScore' property, which represents the customer's insurance score, and information about the claim itself such as the estimated amount, the assessment that was made, and the assessor. The example uses only some of this information. <br>
<br>
Since this is not a real process, we initialize the claim object with some random data.
<br>
![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/claimdata.png)
<br>
The main task in this process is to approve or reject an insurance claim and thus to decide (based on the claim data) whether to set the 'approved' attribute of the claim to 'true' or to 'false'. 

After the approval decision is made, that is, when the approval task is finished, this piece of information is stored in Business Automation Insights so that it can be fed to the machine-learning model. For this purpose, a 'tracking point' is introduced after the approval task. 
A tracking point in a process is a moment when the current status and data are sent to Business Automation Insights. 
Each tracking point can store the appropriate data. This example stores the claim data that the machine-learning algorithm is to learn from. Of course, the decision value of the 'approved' property of the claim is stored, too.
<br>
![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/tracking.png)
<br>
Each tracking point stores the information that was specified when its tracking group was created. The tracking group is in effect a model of the data that needs to be stored in BAI.<br>
The tracking point definition specifies the tracking group and the mapping from the claim data to the tracking group data.
<br>
Also note the name of the tracking group: IBMBPMRSTraining_Claims, which is necessary to find the data in the next step.  
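
As a rough illustration of that mapping, the sketch below flattens a nested claim object into tracking group fields. It assumes that each tracked field ends up under a `name.type` key, which is how the columns of the timeseries data are named later in this notebook (for example `approved.string`). The `claim` dictionary, the `TRACKING_GROUP_FIELDS` table, and the `to_tracked_fields` helper are hypothetical and are not part of the Workflow API.

```python
# Hypothetical claim object, nested the way the business data is described above.
claim = {
    "approved": "false",
    "approvedAmount": 3168,
    "estimateAmount": 3168,
    "customer": {"creditScore": 375},
    "vehicle": {"make": "Peugeot", "type": "car", "year": 2008},
}

# Assumed tracking group definition:
# tracked field name -> (path inside the claim, type suffix).
TRACKING_GROUP_FIELDS = {
    "approved": (("approved",), "string"),
    "approvedAmount": (("approvedAmount",), "integer"),
    "estimateAmount": (("estimateAmount",), "integer"),
    "creditScore": (("customer", "creditScore"), "integer"),
    "vehicleMake": (("vehicle", "make"), "string"),
}


def to_tracked_fields(claim, fields=TRACKING_GROUP_FIELDS):
    """Flatten a nested claim into 'name.type' keys, one per tracked field."""
    tracked = {}
    for name, (path, type_suffix) in fields.items():
        value = claim
        for key in path:  # walk down the nested dictionaries
            value = value[key]
        tracked[f"{name}.{type_suffix}"] = value
    return tracked


tracked_fields = to_tracked_fields(claim)
print(tracked_fields)
```

In the real process, this mapping is configured graphically in the tracking point definition; the code only shows the shape of the result.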


## Creating some data to train the system
At this point, it is necessary to create some data to train the system. You can continue the exercise even with little data, but you have to run the process from Process Portal 10 to 20 times.<br>
As you run the process, you can see the coach making some recommendations for you. Because no recommendation service has been created yet, those recommendations are fake.<br>
However, you should still follow those recommendations when you create the initial data because, by doing so, you create an initial data set from which the machine-learning model is easy to build. 

## The format of the Business Automation Insights data
After the process has run several times, events are stored in Business Automation Insights. BAI stores many different types of events but in this scenario, the events that are registered when a process reaches a tracking point are stored as 'bpmn-timeseries' tracking data.
Every time a process goes through the tracking point, a record is added to HDFS in the form of JSON data.<br>
In this scenario, the timeseries data is partitioned by the following elements:
- The identifier and version number of the Workflow business process application
- The tracking group identifier 

Thus, HDFS file names start with the following path:<br>
<br>
[hdfs root]/ibm-bai/bpmn-timeseries/[processAppId]/[processAppVersionId]/tracking/[trackingGroupId]
<br>
<br>
Remember, the tracking group name is IBMBPMRSTraining_Claims. To find the data, you query the various IDs from the Workflow system.
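
The path layout above can be sketched as a small helper. This is only an illustration: the `tracking_data_path` function name and the `hdfs://namenode.example.com` root are placeholders, and the `*` default for the version is an assumption that relies on HDFS wildcard matching.

```python
def tracking_data_path(hdfs_root, process_app_id, tracking_group_id,
                       process_app_version_id="*"):
    """Build the HDFS path of the tracking data for one tracking group.

    A '*' for the version matches every version of the process application.
    """
    return "/".join([
        hdfs_root.rstrip("/"),
        "ibm-bai",
        "bpmn-timeseries",
        process_app_id,
        process_app_version_id,
        "tracking",
        tracking_group_id,
    ])


path = tracking_data_path(
    "hdfs://namenode.example.com",            # placeholder HDFS root
    "638d314f-12db-43c3-9051-89f3ce992393",   # process application ID
    "f1cf87ab-29ae-4b54-901a-6601b4539132",   # tracking group ID
)
print(path)
```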


## How to find an application id and version, and the tracking group id

In this example, when the process is imported into the Workflow instance, the process application ID and version, and the tracking group ID, do not change. Therefore, predefined IDs could be used to run the example, but the demo shows how you can retrieve all the IDs by using the IBM Workflow REST API.<br>
<br>You may skip this part and go directly to the next chapter.
<br>You can also refer to https://www.ibm.com/support/knowledgecenter/SS8JB4_18.0.0/com.ibm.wbpm.ref.doc/rest/bpmrest/rest_bpm_wle.htm to get more details on the REST API that is used below.<br>

Here is some Python code to set up the REST API URL. All you have to do is modify this code to specify the correct host name and credentials to access the process server REST API. Make sure you <strong>change the host and credentials to your Workflow credentials</strong>. Your Workflow system might not be accessible from this notebook. In this case, copy-paste this code and run it in a system that can access your Workflow environment.


In [None]:
import urllib3, requests, json
bpmusername='admin'
bpmpassword='admin'
bpmrestapiurl = 'https://localhost:9443/rest/bpm/wle/v1'

# Build the HTTP Basic authentication header; TLS verification is disabled on each requests call below.
headers = urllib3.util.make_headers(basic_auth='{username}:{password}'.format(username=bpmusername, password=bpmpassword))
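
For readers who want to see what such a header contains, here is a minimal standard-library sketch of HTTP Basic authentication. The `basic_auth_header` helper is hypothetical and is shown only to illustrate the encoding; the notebook itself relies on `urllib3.util.make_headers`.

```python
import base64


def basic_auth_header(username, password):
    """Return an HTTP Basic authentication header for the given credentials."""
    raw = f"{username}:{password}".encode("latin-1")
    token = base64.b64encode(raw).decode("ascii")
    return {"authorization": f"Basic {token}"}


example_headers = basic_auth_header("admin", "admin")
print(example_headers)
```

Because the credentials are only Base64-encoded, not encrypted, they should always travel over HTTPS.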


Now you retrieve the process application ID and version number by using the 'processApps' REST API. The code below searches for the 'Claim Approval Sample' application and assumes that only one version or snapshot is installed.

In [None]:
url = bpmrestapiurl + '/processApps'
response = requests.get(url, headers=headers, verify=False)

[processApp] = [x for x in json.loads(response.text).get('data').get('processAppsList') if x.get('name') == 'Claim Approval Sample']
processAppId = processApp.get('ID')

# Note that the first 5 characters of the process application ID are removed below
# because the REST API returns the ID with the 5-character prefix '2066.'.
# This prefix marks the identifier as a process application ID, but you won't need it later.

print("the process application id: " + processAppId[5:])
snapshot = processApp.get('installedSnapshots')[0]
processAppVersionId = snapshot.get('ID')
print("the process application version id: " + processAppVersionId)
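
The list-unpacking idiom used above (`[processApp] = [...]`) raises a bare `ValueError` when no application, or more than one, matches the name. The sketch below shows one possible way to get a clearer message. The `find_single` helper and the trimmed `process_apps` list are hypothetical stand-ins for the REST response, not part of the Workflow API.

```python
def find_single(items, name):
    """Return the only item whose 'name' equals name, or raise a descriptive error."""
    matches = [item for item in items if item.get("name") == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one item named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical, heavily trimmed stand-in for the 'processAppsList' entries.
process_apps = [
    {"name": "Claim Approval Sample",
     "ID": "2066.638d314f-12db-43c3-9051-89f3ce992393"},
    {"name": "Some Other App",
     "ID": "2066.00000000-0000-0000-0000-000000000000"},
]

process_app = find_single(process_apps, "Claim Approval Sample")
print(process_app["ID"])
```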

You can now retrieve the tracking group ID. For this purpose, you call the Workflow 'assets' API with the identifiers that have just been retrieved (the cell below passes the process application ID). The assets are filtered so that only tracking group definitions are returned.


In [None]:
url = bpmrestapiurl + '/assets'
response = requests.get(url, headers=headers, verify=False, params={'processAppId': processAppId, 'filter': 'type=TrackingGroup' })

[trackingGroupId] = [x.get('poId') for x in json.loads(response.text).get('data').get('TrackingGroup') if x.get('name') == 'IBMBPMRSTraining_Claims']


# Note that the first 3 characters of the tracking group ID are removed below
# because the REST API returns the ID with the 3-character prefix '14.'.
# This prefix marks the identifier as a tracking group ID, but you won't need it later.

print('The tracking group id : ' + trackingGroupId[3:])
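
Slicing by a fixed number of characters silently returns a wrong value if the prefix ever differs. A slightly more defensive variant could look like the following sketch; `strip_type_prefix` is a hypothetical helper, and the prefixes are the ones mentioned in the comments above.

```python
def strip_type_prefix(identifier, prefix):
    """Remove a known type prefix such as '2066.' or '14.' from a Workflow identifier."""
    if not identifier.startswith(prefix):
        raise ValueError(f"{identifier!r} does not start with {prefix!r}")
    return identifier[len(prefix):]


app_id = strip_type_prefix("2066.638d314f-12db-43c3-9051-89f3ce992393", "2066.")
group_id = strip_type_prefix("14.f1cf87ab-29ae-4b54-901a-6601b4539132", "14.")
print(app_id)
print(group_id)
```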


Now you know the processApp id and version, and the tracking group id, so that you can query data.


## Using Spark SQL to read Business Automation Insights data
Business Automation Insights stores data in HDFS. As described above, the events coming from the Workflow instance are stored in JSON files. To read the data with Spark SQL, this example uses IBM Cloud Object Storage as the HDFS server. First, a connection is created to IBM Cloud Object Storage. At this point, you need to <strong>specify the HDFS URL below</strong>.

In [10]:
from pyspark.sql import  SparkSession

hdfs_root = 'hdfs://erupt1.fyre.ibm.com'

processAppId = '638d314f-12db-43c3-9051-89f3ce992393'
processAppVersionId = '2064.4310cecf-969e-48ce-9ac3-00e73de5dfb9'
trackingGroupId = 'f1cf87ab-29ae-4b54-901a-6601b4539132'

spark = SparkSession.builder.getOrCreate()
spark.conf.set("dfs.client.use.datanode.hostname", "true")

#try:
timeseries = spark.read.json(hdfs_root + "/tmp/sample_training_data.json")
  # timeseries = spark.read.json(hdfs_root + "/ibm-bai/bpmn-timeseries/" + processAppId + '/' + '*' + '/tracking/' + trackingGroupId + '/*/*')

timeseries.createOrReplaceTempView("timeseries")
timeseries.show()
print(timeseries.count())
timeseries.printSchema()
#except:
  #print('Exception while reading data, please ensure data was created in BAI')

+--------------------+--------------------+--------------+------------+--------------------+-----------+--------------------+--------------------+------+--------------------+---------+-----------+-------------+--------------------+----------------------+------------------------------+---------------------------+--------------------+----------+--------------------+--------------------+--------------------+--------------------+----------------------+--------------------+-----------------+---------------------------+----------------------+------------+-------+
|          activityId|  activityInstanceId|  activityName|activityType|   activityVersionId|bpmCellName|         bpmSystemId|                  id|offset|            parentId|partition|performerId|performerName|processApplicationId|processApplicationName|processApplicationSnapshotName|processApplicationVersionId|   processInstanceId|sequenceId|           timestamp|       trackedFields|     trackingGroupId|   trackingGroupName|trackin

4999
root
 |-- activityId: string (nullable = true)
 |-- activityInstanceId: string (nullable = true)
 |-- activityName: string (nullable = true)
 |-- activityType: string (nullable = true)
 |-- activityVersionId: string (nullable = true)
 |-- bpmCellName: string (nullable = true)
 |-- bpmSystemId: string (nullable = true)
 |-- id: string (nullable = true)
 |-- offset: long (nullable = true)
 |-- parentId: string (nullable = true)
 |-- partition: long (nullable = true)
 |-- performerId: string (nullable = true)
 |-- performerName: string (nullable = true)
 |-- processApplicationId: string (nullable = true)
 |-- processApplicationName: string (nullable = true)
 |-- processApplicationSnapshotName: string (nullable = true)
 |-- processApplicationVersionId: string (nullable = true)
 |-- processInstanceId: string (nullable = true)
 |-- sequenceId: long (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- trackedFields: struct (nullable = true)
 |    |-- approved.string: string (n

Note that the various IDs are specified in the path that is passed to `spark.read.json`. This HDFS path can also contain wildcards: the * character matches any directory or file name in the path.

In [11]:
businessdata = spark.sql("SELECT trackedFields.* from timeseries")
businessdata.show()
businessdata.printSchema()

+---------------+----------------------+-------------------+------------------------+----------------------+--------------------------------+------------------+-------------------+------------------+-------------------+
|approved.string|approvedAmount.integer|creditScore.integer|duration.dayTimeDuration|estimateAmount.integer|trackingPointOccurrenceTime.date|vehicleMake.string|vehicleModel.string|vehicleType.string|vehicleYear.integer|
+---------------+----------------------+-------------------+------------------------+----------------------+--------------------------------+------------------+-------------------+------------------+-------------------+
|          false|                  3168|                375|            P13DT11H9M8S|                  3168|            2018-12-3T20:45:3...|           Peugeot|               Golf|               car|               2008|
|          false|                   509|                846|            P13DT11H9M8S|                   559|            

## Create an Apache® Spark machine-learning model

Watson Machine Learning supports a growing number of IBM or open-source machine-learning and deep-learning packages. This example uses Spark ML, and in particular the Random Forest Classifier algorithm. The following cells show how to prepare the data, create an Apache® Spark machine-learning pipeline, and train the model.

In [12]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import Pipeline, Model, PipelineModel

### Adaptation of data

The following code renames the columns to remove the type suffix from their names.<br>
Then, the StringIndexer estimator transforms the 'approved' column, which is a column of type 'string' containing only 'true' or 'false' values, into a numeric column with '0' and '1' values so that the classifier can process it.<br>

The VectorAssembler class creates a new features column, which contains the features from which to build the model.<br>
The IndexToString transformer converts the prediction of the model, which is a '0' or '1' value, back into a "true" or "false" string.
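
To make the role of these three stages more concrete, the following plain-Python sketch mimics them on a few rows. It is not Spark code: the helper names are hypothetical, and it assumes the default StringIndexer behavior of assigning index 0 to the most frequent label (ties are broken alphabetically here, which is an assumption of this sketch).

```python
from collections import Counter

# A few rows shaped like the renamed business data (values are illustrative).
rows = [
    {"approvedAmount": 3168, "creditScore": 375, "estimateAmount": 3168, "approved": "false"},
    {"approvedAmount": 509, "creditScore": 846, "estimateAmount": 559, "approved": "false"},
    {"approvedAmount": 28, "creditScore": 943, "estimateAmount": 28, "approved": "true"},
]
feature_names = ["approvedAmount", "creditScore", "estimateAmount"]


def fit_label_index(values):
    """Mimic StringIndexer: most frequent label first (ties broken alphabetically)."""
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def assemble(row, names):
    """Mimic VectorAssembler: collect the feature columns into one list of floats."""
    return [float(row[name]) for name in names]


labels = fit_label_index(row["approved"] for row in rows)
indexed = [float(labels.index(row["approved"])) for row in rows]
vectors = [assemble(row, feature_names) for row in rows]

# Mimic IndexToString: map a numeric prediction back to its string label.
predicted_label = labels[int(0.0)]

print(labels, indexed, vectors[0], predicted_label)
```

Here 'false' is the most frequent value, so it is mapped to 0.0 and 'true' to 1.0.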

In [13]:
businessdata = businessdata.withColumnRenamed("approved.string", "approved")
businessdata = businessdata.withColumnRenamed("creditScore.integer", "creditScore")
businessdata = businessdata.withColumnRenamed("estimateAmount.integer", "estimateAmount")
businessdata = businessdata.withColumnRenamed("approvedAmount.integer", "approvedAmount")

features = ["approvedAmount", "creditScore", "estimateAmount"]
approvalColumn = "approved"


approvalIndexer = StringIndexer(inputCol='approved', outputCol="label").fit(businessdata)

assembler = VectorAssembler(inputCols=features, outputCol="features")

labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=approvalIndexer.labels)

###  Creating the model
The model is built from the RandomForestClassifier algorithm.

In [14]:
rf = RandomForestClassifier(labelCol="label", featuresCol="features")

In the cell below, the data is split into training data and test data, the prediction model is trained and then tested, and finally the accuracy of the model is displayed.

In [15]:
businessdata = businessdata[features+['approved']]
splitted_data = businessdata.randomSplit([0.8, 0.20], 24)
train_data = splitted_data[0]
test_data = splitted_data[1]

pipeline = Pipeline(stages=[approvalIndexer, assembler, rf, labelConverter])

model = pipeline.fit(train_data)

predictions = model.transform(test_data)
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)

print("Accuracy = %g" % accuracy)
print("Test Error = %g" % (1.0 - accuracy))
predictions.show()

train_data.printSchema()

Accuracy = 0.932503
Test Error = 0.0674974
+--------------+-----------+--------------+--------+-----+-------------------+--------------------+--------------------+----------+--------------+
|approvedAmount|creditScore|estimateAmount|approved|label|           features|       rawPrediction|         probability|prediction|predictedLabel|
+--------------+-----------+--------------+--------+-----+-------------------+--------------------+--------------------+----------+--------------+
|             1|        150|             1|   false|  0.0|    [1.0,150.0,1.0]|[18.7605857200891...|[0.93802928600445...|       0.0|         false|
|             8|        377|             8|   false|  0.0|    [8.0,377.0,8.0]|[18.7615840522061...|[0.93807920261030...|       0.0|         false|
|            28|        943|            28|    true|  1.0|  [28.0,943.0,28.0]|[11.4182834088006...|[0.57091417044003...|       0.0|         false|
|            29|        349|            29|   false|  0.0|  [29.0,349.0,29.
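
The accuracy metric reported above is simply the fraction of test rows whose prediction equals the label, and the test error is its complement. As a small plain-Python sketch (the `accuracy_score` helper is hypothetical and only uses the first three complete rows of the table, so the numbers differ from the full evaluation):

```python
def accuracy_score(pairs):
    """Fraction of (label, prediction) pairs where the prediction equals the label."""
    pairs = list(pairs)
    correct = sum(1 for label, prediction in pairs if label == prediction)
    return correct / len(pairs)


# (label, prediction) taken from the first three complete rows shown above.
sample = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
sample_accuracy = accuracy_score(sample)

print("Accuracy = %g" % sample_accuracy)
print("Test Error = %g" % (1.0 - sample_accuracy))
```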

## Storing the model in Watson ML
Watson Machine Learning is used here to store the resulting model. After the model is stored, Watson Machine Learning makes it possible to create an HTTP scoring endpoint, which is then used as the recommendation service.
The code below stores the created model and pipeline in Watson Machine Learning. Note that you need to <b>specify the authentication information from your instance of Watson Machine Learning service</b> in the code below.


In [16]:
!rm -rf $PIP_BUILD/watson-machine-learning-client
!pip install --upgrade watson-machine-learning-client==1.0.260

Collecting watson-machine-learning-client==1.0.260
  Using cached https://files.pythonhosted.org/packages/92/86/498c67053917adb32fe4c8d2f057c54aed3722d4ccaa8b4f3290049e7830/watson_machine_learning_client-1.0.260.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/ld/cc_krfy1255d2fzt5ys3shcr0000gn/T/pip-install-MCp1Qn/watson-machine-learning-client/setup.py", line 30, in <module>
        with open(os.path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    TypeError: 'encoding' is an invalid keyword argument for this function
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/ld/cc_krfy1255d2fzt5ys3shcr0000gn/T/pip-install-MCp1Qn/watson-machine-learning-client/


In [17]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

# Authenticate to Watson Machine Learning service on Bluemix.

wml_credentials = {
  "instance_id": "279bb6f6-b9d1-411b-8ad9-4a25c221ad63",
  "password": "85773bde-09f1-45f8-a616-b3ce15fc3258",
  "url": "https://us-south.ml.cloud.ibm.com",
  "username": "d83981dd-af9a-4920-9771-d2104666af6c"
}

# The url, username, password, and instance_id values can be found on the Service Credentials tab of the service instance created in Bluemix.

client = WatsonMachineLearningAPIClient(wml_credentials)

We can now save the model and the training data. 

In [18]:
db2_service_credentials = {
    'driver': 'com.ibm.db2.jcc.DB2Driver',
    'jdbcurl': 'jdbc:db2://dashdb-entry-yp-dal09-09.services.dal.bluemix.net:50000/BLUDB',
    'user': 'dash11358',
    'password': 'Z2_rC5ZR_mwb'
}

In [19]:
db2_credentials = {
    'jdbcurl': db2_service_credentials['jdbcurl'],
    'user': db2_service_credentials['user'],
    'password': db2_service_credentials['password']
}

In [20]:
training_data_reference = {
 "name": "Recommendation feedback",
 "connection": db2_service_credentials,
 "source": {
  "tablename": "CLAIM_APPROVAL_DATA_UPDATED",
  "type": "dashdb"
 }
}

In [21]:
model_props = {
    client.repository.ModelMetaNames.NAME: "Claim Approval Recommendation Model 3",
    client.repository.ModelMetaNames.TRAINING_DATA_REFERENCE: training_data_reference,
    client.repository.ModelMetaNames.EVALUATION_METHOD: "multiclass",
    client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
           "name": "accuracy",
           "value": accuracy,
           "threshold": 0.8
        }
    ]
}

In [22]:

published_model_details = client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)




Here is the list of the models that are currently stored in the machine learning instance:

In [23]:
client.repository.list_models()

------------------------------------  ---------------------------------------  ------------------------  --------------
GUID                                  NAME                                     CREATED                   FRAMEWORK
a1947b1e-e09d-4d1b-be91-8837b700369a  Claim Approval Recommendation Model 3    2018-09-06T16:19:45.834Z  mllib-2.1
6f0ba804-dc5c-4d5e-88d3-7a0d119d9b68  Claim Approval Recommendation Model 2    2018-08-31T11:43:26.603Z  mllib-2.1
30f7e240-5a2b-41c1-a891-929a964206a8  Claim Approval Recommendation Model      2018-08-31T10:08:22.860Z  mllib-2.1
8c02674c-a339-44e2-bc45-463d5d1fec0b  CARS4U - Satisfaction Prediction Model   2018-08-10T12:34:57.639Z  tensorflow-1.5
c682fc9e-b93b-4c8c-b5ac-c307397da6d2  CARS4U - Business Area Prediction Model  2018-08-09T09:39:17.142Z  mllib-2.1
fab0a18d-45cf-4452-8337-79ea4f5b34e7  CARS4U - Action Recommendation Model     2018-07-30T07:26:58.607Z  mllib-2.1
8eb2c5ac-cb38-48eb-872d-f77bf833c8d0  CARS4U - Business Area Predictio

In [27]:
spark_credentials= {
  "tenant_id": "sb11-da49692ff2933d-344619ea60c0",
  "tenant_id_full": "3fa7a537-a3ca-40af-9b11-da49692ff293_955ddcf4-fcaa-4754-873d-344619ea60c0",
  "cluster_master_url": "https://spark.bluemix.net",
  "tenant_secret": "0fb0f81c-9036-47fb-95c0-d7740fb89c66",
  "instance_id": "3fa7a537-a3ca-40af-9b11-da49692ff293",
  "plan": "ibm.SparkService.PayGoPersonal"
}

In [28]:
feedback_data_reference = {
 "name": "Recommendation feedback",
 "connection": db2_service_credentials,
 "source": {
  "tablename": "CLAIM_APPROVAL_FEEDBACK_DATA",
  "type": "dashdb"
 }
}

In [30]:
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

a1947b1e-e09d-4d1b-be91-8837b700369a


In [31]:
system_config = {
    client.learning_system.ConfigurationMetaNames.FEEDBACK_DATA_REFERENCE: feedback_data_reference,
    client.learning_system.ConfigurationMetaNames.MIN_FEEDBACK_DATA_SIZE: 10,
    client.learning_system.ConfigurationMetaNames.SPARK_REFERENCE: spark_credentials,
    client.learning_system.ConfigurationMetaNames.AUTO_RETRAIN: "conditionally",
    client.learning_system.ConfigurationMetaNames.AUTO_REDEPLOY: "always"
}

client.learning_system.setup(model_uid=model_uid, meta_props=system_config)

Status code: 400, body: {"trace":"4d215c5d4f1446bc6be5313838c4269e","errors":[{"code":"deserialization_error","message":"Incorrect input: (Source format is invalid for dashDb connection)"}]}


ApiRequestFailure: Failure during creating learning system. (PUT https://us-south.ml.cloud.ibm.com/v3/wml_instances/279bb6f6-b9d1-411b-8ad9-4a25c221ad63/published_models/a1947b1e-e09d-4d1b-be91-8837b700369a/learning_configuration)
Status code: 400, body: {"trace":"4d215c5d4f1446bc6be5313838c4269e","errors":[{"code":"deserialization_error","message":"Incorrect input: (Source format is invalid for dashDb connection)"}]}

<a id="scoring"></a>
## Deploying the model

Now that the model is stored, we need to deploy it in a runtime environment. We start by retrieving the model uid:

In [13]:
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

30f7e240-5a2b-41c1-a891-929a964206a8


Below we list the existing deployments. A free tier in Watson Machine Learning allows no more than five deployments.

In [14]:
print(client.deployments.get_uids())

['040f0329-ecf5-42b1-8314-250c5c585dd6']


We use the deployments client API to create a new deployment for our model:

In [15]:
deployment_details = client.deployments.create(asset_uid=model_uid, name='Recommendation Prediction Model')



#######################################################################################

Synchronous deployment creation for uid: '30f7e240-5a2b-41c1-a891-929a964206a8' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='2f9dad5c-28f5-45e4-b80d-31d9ac51d7e7'
------------------------------------------------------------------------------------------------




The deployment details include the URL that allows us to score against the published model.

In [16]:
recommendation_url = client.deployments.get_scoring_url(deployment_details)
print(recommendation_url)

https://us-south.ml.cloud.ibm.com/v3/wml_instances/279bb6f6-b9d1-411b-8ad9-4a25c221ad63/deployments/2f9dad5c-28f5-45e4-b80d-31d9ac51d7e7/online


<a id="payload_logging"></a>
## Payload logging

In this section we configure payload logging for online scoring.


We have to get the `deployment_uid` of the model deployed in IBM Cloud.

In [17]:
deployment_uid = client.deployments.get_uid(deployment_details)

We need to provide the configuration of the database to which the scoring payload will be logged.

In [18]:
# @hidden_cell
postgres_connection = {
  'database':'compose',
  'password':"""JTUJPXLXDMBBGUGV""",
  'port':'51921',
  'host':'sl-us-south-1-portal.31.dblayer.com',
  'username':'admin'
}

**Tip:** You can use the Data panel to insert PostgreSQL connection credentials.

In [19]:
payload_data_reference = {
    "type": "postgresql",
    "location": {
        "tablename": "public.claim_approval_recommendations_payload"
    },
    "connection": {
            "uri": "postgres://{username}:{password}@{host}:{port}/{database}".format(**postgres_connection)
        }
}
print(payload_data_reference)

{'connection': {'uri': 'postgres://admin:JTUJPXLXDMBBGUGV@sl-us-south-1-portal.31.dblayer.com:51921/compose'}, 'location': {'tablename': 'public.claim_approval_recommendations_payload'}, 'type': 'postgresql'}
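
Building the connection URI with plain string formatting works for the credentials above, but it produces an invalid URI if the user name or password contains reserved characters such as `@`, `:` or `/`. A possible variant that percent-encodes them is sketched below; the `postgres_uri` helper and all the values are placeholders, not real credentials.

```python
from urllib.parse import quote, unquote, urlsplit


def postgres_uri(username, password, host, port, database):
    """Build a postgres:// URI, percent-encoding the user name and password."""
    return "postgres://{user}:{pwd}@{host}:{port}/{db}".format(
        user=quote(username, safe=""),
        pwd=quote(password, safe=""),
        host=host, port=port, db=database,
    )


# Placeholder values; the password deliberately contains reserved characters.
uri = postgres_uri("admin", "p@ss:w/rd", "db.example.com", "5432", "compose")
print(uri)

# Parsing the URI back recovers the host, the port and the original password.
parts = urlsplit(uri)
print(parts.hostname, parts.port, unquote(parts.password))
```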


In [20]:
payload_metadata = {client.deployments.PayloadLoggingMetaNames.PAYLOAD_DATA_REFERENCE: payload_data_reference}

Now we are ready to set up payload logging for the deployed model.

In [21]:
config_details = client.deployments.setup_payload_logging(deployment_uid, meta_props=payload_metadata)
print(config_details)

{'dynamic_schema_update': False, 'payload_store': {'connection': {'host': 'sl-us-south-1-portal.31.dblayer.com', 'db': 'compose', 'uri': 'postgres://admin:JTUJPXLXDMBBGUGV@sl-us-south-1-portal.31.dblayer.com:51921/compose'}, 'location': {'tablename': 'public.claim_approval_recommendations_payload'}, 'type': 'postgresql'}, 'output_data_schema': {'fields': [{'metadata': {}, 'type': 'long', 'name': 'approvedAmount', 'nullable': True}, {'metadata': {}, 'type': 'long', 'name': 'creditScore', 'nullable': True}, {'metadata': {}, 'type': 'long', 'name': 'estimateAmount', 'nullable': True}, {'metadata': {'modeling_role': 'prediction'}, 'type': 'double', 'name': 'prediction', 'nullable': True}, {'metadata': {'modeling_role': 'prediction-probability'}, 'type': 'double', 'name': 'prediction_probability', 'nullable': True}, {'metadata': {'modeling_role': 'probability'}, 'type': {'containsNull': True, 'elementType': 'double', 'type': 'array'}, 'name': 'probability', 'nullable': True}], 'type': 'stru

### Testing the recommendation URL

You can test the scoring URL with some data to see how it works. The next step consists in using this URL from within the Workflow process.

In [22]:
import json
recommendation_data = {"fields": ["approvedAmount", "creditScore", "estimateAmount"],"values": [[200, 50, 200]]}

scoring_response = client.deployments.score(recommendation_url, recommendation_data)

print(json.dumps(scoring_response, indent=3))

{
   "fields": [
      "approvedAmount",
      "creditScore",
      "estimateAmount",
      "approved",
      "label",
      "features",
      "rawPrediction",
      "probability",
      "prediction",
      "predictedLabel"
   ],
   "values": [
      [
         200,
         50,
         200,
         "false",
         0.0,
         [
            200.0,
            50.0,
            200.0
         ],
         [
            18.34769136743643,
            1.652308632563573
         ],
         [
            0.9173845683718213,
            0.08261543162817864
         ],
         0.0,
         "false"
      ]
   ]
}
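The scoring response pairs a `fields` list with one row per scored record in `values`. As a small sketch, assuming that shape, each row can be turned into a dictionary keyed by field name, which makes individual columns easier to inspect. The `rows_as_dicts` helper is hypothetical, and `sample_response` is a trimmed copy of the response shown above.

```python
def rows_as_dicts(scoring_response):
    """Convert a fields/values scoring response into a list of dictionaries."""
    fields = scoring_response["fields"]
    return [dict(zip(fields, row)) for row in scoring_response["values"]]


# Trimmed copy of the scoring response printed above.
sample_response = {
    "fields": ["approvedAmount", "creditScore", "estimateAmount",
               "probability", "prediction", "predictedLabel"],
    "values": [[200, 50, 200,
                [0.9173845683718213, 0.08261543162817864], 0.0, "false"]],
}

for row in rows_as_dicts(sample_response):
    # Show the predicted label with the highest class probability.
    print(row["predictedLabel"], max(row["probability"]))
```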


### Defining a Python Function

We define a Python function to be deployed in IBM Cloud.

Put all the parameters that the Python function requires in a dictionary.


In [27]:
ai_params = {"wml_credentials": wml_credentials, 
             "recommendation_url": recommendation_url}

In [77]:
def recommendation_generator(params=ai_params):
    
    from watson_machine_learning_client import WatsonMachineLearningAPIClient

    wml_credentials = params["wml_credentials"]
    recommendation_url = params["recommendation_url"]

    client = WatsonMachineLearningAPIClient(wml_credentials)

    def recommend(payload):
        """Score the payload with the deployed model and return recommendations.

        Example payload:
          {"fields": ["approvedAmount", "creditScore", "estimateAmount"],
           "values": [[2624, 20, 2800]]}
        """

        result = []
        scores_area = client.deployments.score(recommendation_url, payload)
        # Locate the columns of interest in the scoring response.
        predicted_label_index = scores_area['fields'].index("predictedLabel")
        probability_index = scores_area['fields'].index("probability")
        for row in scores_area['values']:
            predicted_label = row[predicted_label_index]
            # Report the highest class probability as a whole percentage.
            probability = round(max(row[probability_index]) * 100)
            result.append({'recommendation': predicted_label, 'probability': probability})

        return result

    return recommend

Local test of the Python function

In [78]:
sample_payload = {"fields": ["approvedAmount", "creditScore", "estimateAmount"],
                  "values": [[2624, 20, 2800]]}     

In [79]:
recommend = recommendation_generator()
recommendations_ai = recommend(sample_payload)
print(recommendations_ai)

[{'recommendation': 'false', 'probability': 95}]


### Storing the Python function in the repository

In this section we store the AI function in the Watson Machine Learning repository.

In [81]:
runtime_meta = {
            client.runtime_specs.ConfigurationMetaNames.NAME: "Basic runtime specification",
            client.runtime_specs.ConfigurationMetaNames.DESCRIPTION: "Runtime for Python function",
            client.runtime_specs.ConfigurationMetaNames.PLATFORM: {
               "name": "python",
               "version": "3.5"
             }}


In [82]:
runtime_details = client.runtime_specs.create(runtime_meta)
runtime_url = client.runtime_specs.get_url(runtime_details)
print(runtime_url)

https://us-south.ml.cloud.ibm.com/v4/runtimes/91f30840-7b9e-4533-b2b9-f3918f63403c


In [84]:
meta_data = {
    client.repository.FunctionMetaNames.NAME: 'Claim Approval - Recommendation - Python Function',
    client.repository.FunctionMetaNames.RUNTIME_URL: runtime_url
}

function_details = client.repository.store_function(meta_props=meta_data, function=recommendation_generator)

Recognized generator function.


In [85]:
ai_function_uid = client.repository.get_function_uid(function_details)
print(ai_function_uid)

90b4de5a-0dc4-45b1-9a05-cc3e845fb2d1


### Deploying the Python Function

In this section we deploy the AI function in IBM Cloud and test the created deployment with the sample payload.

In [87]:


function_deployment_details = client.deployments.create(ai_function_uid, "Claim Approval - Recommendation - Python Function deployment")



#######################################################################################

Synchronous deployment creation for uid: '90b4de5a-0dc4-45b1-9a05-cc3e845fb2d1' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS.
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='508ab6ab-acc6-4311-88ba-4c0ad5ea6b3f'
------------------------------------------------------------------------------------------------




### AI function online deployment test

In [88]:
recommendation_url = client.deployments.get_scoring_url(function_deployment_details)

recommendation_results = client.deployments.score(recommendation_url, sample_payload)
print(recommendation_results)



[{'recommendation': 'false', 'probability': 95}]


## Invoking the recommendation Python Function from the Workflow process


To display a recommendation for a decision on a claim within the Workflow process itself, invoke the Recommendation service from a Workflow service. 
<br>
If you go back to the installed process application, you can see a service flow called 'Invoke Watson ML Service Flow'. This is the service that calls the recommendation REST endpoint.
<br>
![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/invocationscript.png)
<br>
As soon as you specify your Watson Machine Learning credentials and the recommendation URL in this script, the Workflow process displays recommendations from the Spark machine-learning model.
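The request body that is sent to the recommendation endpoint follows the same `fields` and `values` shape as the sample payload used in the tests. The service flow script itself runs inside Workflow, so the following Python sketch only illustrates how such a body can be assembled with a JSON serializer. The `claim` dictionary, its keys, and the `build_recommendation_payload` helper are assumptions made for illustration.

```python
import json

# Input fields expected by the model, in the order used throughout the notebook.
FIELDS = ["approvedAmount", "creditScore", "estimateAmount"]


def build_recommendation_payload(claim):
    """Build a fields/values payload from a dictionary of claim data.

    Hypothetical helper: `claim` is assumed to hold one value per model field.
    """
    missing = [name for name in FIELDS if name not in claim]
    if missing:
        raise ValueError("claim is missing fields: {}".format(", ".join(missing)))
    return {"fields": list(FIELDS), "values": [[claim[name] for name in FIELDS]]}


# Illustrative claim data matching the sample payload.
claim = {"approvedAmount": 2624, "creditScore": 20, "estimateAmount": 2800}
body = json.dumps(build_recommendation_payload(claim))
print(body)
```

Serializing a dictionary avoids the quoting mistakes that are easy to make when a JSON string is concatenated by hand.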

<br>
The result of the recommendation service is displayed in the process UI (the coach) after the service has been called. In the picture below, you see that the coach contains two different parts, one for 'I recommend' and another for 'I do not recommend'. The visibility of each part depends on the result of the recommendation service.

![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/coach.png)
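The visibility rule can be sketched in Python as follows. The sketch assumes the result format returned by the deployed function, a list such as `[{'recommendation': 'false', 'probability': 95}]`, and assumes that the label `true` stands for an approval. The `visible_section` helper is hypothetical; the actual rule is configured in the coach.

```python
def visible_section(recommendations):
    """Return the name of the coach part to display for the first recommendation.

    Assumption: the label 'true' means that approving the claim is recommended.
    """
    first = recommendations[0]
    if str(first["recommendation"]).lower() == "true":
        return "I recommend"
    return "I do not recommend"


print(visible_section([{"recommendation": "false", "probability": 95}]))
```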

The system is now ready to return recommendations about the insurance claim.

![](https://raw.githubusercontent.com/Tissandier/bpm-recommendation/master/images/reco.png)


In [3]:
client.training.get_frameworks()

{'frameworks': [{'name': 'spark', 'version': '2.0.1'},
  {'name': 'spark', 'version': '2.1'},
  {'name': 'spark', 'version': '2.3'},
  {'name': 'scikit-learn', 'version': '0.17'},
  {'name': 'tensorflow',
   'runtimes': [{'name': 'python', 'version': '3.5'}],
   'version': '1.5'},
  {'name': 'pytorch',
   'runtimes': [{'name': 'python', 'version': '3.6'}],
   'version': '0.3'},
  {'name': 'caffe',
   'runtimes': [{'name': 'python', 'version': '3.5'}],
   'version': '1.0'},
  {'name': 'tensorflow-horovod',
   'runtimes': [{'name': 'python', 'version': '3.5'}],
   'version': '1.5'},
  {'name': 'tensorflow-ddl',
   'runtimes': [{'name': 'python', 'version': '3.5'}],
   'version': '1.5'}]}

### Conclusion

This notebook is designed to help you understand how to create a recommendation service for your Workflow process with Watson Machine Learning and Business Automation Insights. You are encouraged to explore more capabilities of Watson Studio and Watson Machine Learning, in particular the ability to retrain the model when more data becomes available.


Author: Emmanuel Tissandier is a Senior Technical Staff Member and architect in the Business Automation team in the IBM France Lab.