# ML-BPMN Getting Started with scikit-learn, PMML and Camunda

*... a tutorial for students at the FHNW, written by [Andreas Martin, PhD](https://andreasmartin.ch).*

|[![deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://deepnote.com/launch?url=https%3A%2F%2Fgithub.com%2FAI4BP%2Fainotes%2Fblob%2Fmain%2Fipynb%2Fexpense-authorization-process%2Fexpense-authorization.ipynb)|[![Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4BP/ainotes/blob/main/ipynb/expense-authorization-process/expense-authorization.ipynb)|[![Gitpod](https://img.shields.io/badge/Gitpod-Run%20in%20VS%20Code-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/AI4BP/ainotes/)|[![GitHub.dev](https://img.shields.io/badge/github.dev-Open%20in%20VS%20Code-908a85?logo=github)](https://github.dev/AI4BP/ainotes/blob/main/ipynb/expense-authorization-process/expense-authorization.ipynb)|
|-|-|-|-|

This short tutorial provides a straightforward introduction to machine learning using the widely used Python library **scikit-learn** (aka sklearn).

> Trivia: The name *SciKit* is derived from its original intention being a SciPy Toolkit. SciPy is another Python library for scientific computing.

Sklearn is very popular for classical machine learning methods: it is well documented and has a large developer community, and besides the official documentation there are plenty of other good resources on the toolkit available on the web.

> Sklearn is intended for **classical ML** and **not for Deep Learning**, although a Multi-layer Perceptron (MLP), for example, can be trained. Since Sklearn does **not support GPUs**, it is not suitable for large-scale workloads that depend on GPU acceleration.

## Data and Use Case
This tutorial uses historical data from an expense reporting and audit process, the **expense authorization process** depicted in Fig a:

![](https://github.com/AI4BP/ainotes/raw/main/sklearn-getting-started/ipynb/images/expense-authorization-sklearn.png)

**Fig a**: expense authorization process

The (possibly synthetic) data records expense reports that humans approved or rejected based on the expense **category**, **urgency**, **target price** and actual **price** paid.

> This use case was inspired by an example by Donato Marrazzo (Red Hat, Inc.), who published the data of an **expense approval process** used in this tutorial in a GitHub repository [[1]](https://github.com/dmarrazzo/rhdm-dmn-pmml-order), together with a series of articles ([[2]](https://developers.redhat.com/blog/2021/01/14/knowledge-meets-machine-learning-for-smarter-decisions-part-1#conclusion) and [[3]](https://developers.redhat.com/blog/2021/01/22/knowledge-meets-machine-learning-for-smarter-decisions-part-2#conclusion)).

### 🚧 Main Task
The task in this tutorial is to train an ML model step by step and then export it as a PMML file that implements the `Approve expense order (PMML)` activity depicted in Fig a.

## 0. Initialization Configuration
The following code initializes the configuration: it sets the location of the data (`url_data`) and of the BPMN/DMN models (`url_modelling`), using a local folder if one exists and the GitHub repository otherwise.

```python
import os

# Raw-file base URL of the GitHub repository (fallback if no local copy exists)
url_github = "https://raw.githubusercontent.com/AI4BP/ainotes/main"
project_folder = "sklearn-getting-started"
# Parent folder of the notebook's working directory
working_dir = os.path.normpath(os.path.join(os.getcwd(), ".."))

# Use the local data folder if it exists, otherwise fall back to GitHub
url_data = "data"
url_data = f"{working_dir}/{url_data}" if os.path.exists(f"{working_dir}/{url_data}") else f"{url_github}/{project_folder}/{url_data}"
print(url_data)

# Same for the folder containing the BPMN/DMN models
url_modelling = "modelling"
url_modelling = f"{working_dir}/{url_modelling}" if os.path.exists(f"{working_dir}/{url_modelling}") else f"{url_github}/{project_folder}/{url_modelling}"
print(url_modelling)
```
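Both locations are resolved with the same pattern: prefer a local folder next to the notebook, otherwise fall back to the raw GitHub URL. A minimal sketch of that pattern as a reusable helper; `resolve_location` is a hypothetical name and not part of the original notebook.

```python
import os


def resolve_location(name, working_dir, url_github, project_folder):
    """Return the local folder path if it exists, otherwise the raw GitHub URL.

    Hypothetical helper mirroring the url_data / url_modelling logic.
    """
    local_path = os.path.join(working_dir, name)
    if os.path.exists(local_path):
        return local_path
    # Remote fallback: URLs always use forward slashes
    return f"{url_github}/{project_folder}/{name}"


url_github = "https://raw.githubusercontent.com/AI4BP/ainotes/main"
project_folder = "sklearn-getting-started"
working_dir = os.path.normpath(os.path.join(os.getcwd(), ".."))

url_data = resolve_location("data", working_dir, url_github, project_folder)
url_modelling = resolve_location("modelling", working_dir, url_github, project_folder)
print(url_data)
print(url_modelling)
```

Keeping the fallback in one function means a new resource folder needs only one extra call instead of another copy of the conditional expression.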

## 13. Decision Task and PMML Connector
With the configuration insights from the PMML API, you can now configure the decision task in the BPMN model using the Camunda Modeler. The Camunda instance used in class has been extended with a dedicated Camunda Connector for PMML, which executes a PMML model, takes its input data from workflow variables and writes its output data (predictions) back to workflow variables. Fig 13 shows the configuration required in the `expense-authorization-sklearn-init.bpmn` process ([download BPMN model - here](https://ghcdn.rawgit.org/AI4BP/ainotes/main/ipynb/expense-authorization-process/modelling/expense-authorization-sklearn-init.bpmn)).

![](https://github.com/AI4BP/ainotes/raw/main/sklearn-getting-started/ipynb/images/expense-authorization-camunda-pmml.png)

**Fig 13**: PMML-connector configuration in Camunda Modeler
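To decide which workflow variables to map to the connector's inputs and outputs, it helps to list the fields declared in the PMML file itself. A minimal sketch using only the Python standard library; it assumes a standard PMML document with a `DataDictionary` and a `MiningSchema`, and the helper name `pmml_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Drop the XML namespace: '{http://www.dmg.org/PMML-4_4}DataField' -> 'DataField'."""
    return tag.rsplit("}", 1)[-1]


def pmml_fields(pmml_source):
    """Return the data types and usage types of the fields declared in a PMML document."""
    root = ET.fromstring(pmml_source)
    data_types, usage_types = {}, {}
    for elem in root.iter():
        name = _local_name(elem.tag)
        if name == "DataField":
            data_types[elem.get("name")] = elem.get("dataType")
        elif name == "MiningField":
            # PMML treats a missing usageType as "active", i.e. a model input;
            # keep the first occurrence in case nested models repeat a field
            usage_types.setdefault(elem.get("name"), elem.get("usageType", "active"))
    return data_types, usage_types
```

Applied to the bytes read from the exported file (`pmml_file_name`), the fields with usage type `active` are the inputs to fill from workflow variables, and the `target` field (`predicted` in older PMML versions) is the prediction to map back.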

## 14. DMN Variable Mapping and Deployment
In step 2, we mapped the text values to numerical values. Since the workflow works with strings, the process has to apply the same mapping before the PMML model is called; Fig 14a shows this mapping in DMN.

![](https://github.com/AI4BP/ainotes/raw/main/sklearn-getting-started/ipynb/images/automatic-approval-mapping-urgency.png)

**Fig 14a**: variable mapping in DMN
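Conceptually, the DMN model only has to reproduce in the process the same text-to-number encoding that was applied in step 2. A minimal Python sketch of that idea; the labels, codes and output field names below are hypothetical placeholders, and the authoritative values are the ones used in step 2 and in `automatic-approval-mapping.dmn`.

```python
# Hypothetical encodings: replace them with the exact mapping used in step 2.
URGENCY_CODES = {"low": 0, "medium": 1, "high": 2}
CATEGORY_CODES = {"electronics": 0, "food": 1, "travel": 2}


def encode_expense(category, urgency, target_price, price):
    """Translate workflow string values into the numeric inputs the model expects."""
    try:
        return {
            "category": CATEGORY_CODES[category],
            "urgency": URGENCY_CODES[urgency],
            "targetPrice": float(target_price),
            "price": float(price),
        }
    except KeyError as err:
        # An unmapped label would lead to a wrong prediction, so fail loudly
        raise ValueError(f"no numeric code for value {err}") from err
```

The key point is that training and execution must share one encoding: if the process maps `high` to a different code than the training data did, the model still returns a prediction, just a meaningless one.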

Then download the `automatic-approval-mapping.dmn` ([download DMN model - here](https://ghcdn.rawgit.org/AI4BP/ainotes/main/ipynb/expense-authorization-process/modelling/automatic-approval-mapping.dmn)).

Finally, upload the entire package, consisting of the BPMN, the DMN and the PMML file, to the server as shown in Fig 14b, and start the process.

![](https://raw.githubusercontent.com/AI4BP/ainotes/main/sklearn-getting-started/ipynb/images/expense-authorization-camunda-pmml-deploy.png)

**Fig 14b**: deployment of the BPMN, DMN and PMML files with the Camunda Modeler

### 🔀 Alternative Way
Instead of using the Camunda Modeler to configure the decision task in the BPMN model, you can deploy the entire package, including the pre-configured `expense-authorization-sklearn.bpmn` file, by executing the following code.

> You may need to change the `tenant-id` first.

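```python
import os

import requests

# `url_modelling` is set in step 0; `pmml_file_name` and `camunda_eninge_rest`
# (the URL the deployment is posted to) are defined in earlier steps.


def load_resource(location):
    """Return a file's content as bytes, from a local path or from a URL."""
    if os.path.exists(location):
        # url_modelling may point to a local folder, which requests cannot fetch
        with open(location, "rb") as file:
            return file.read()
    response = requests.get(location)
    response.raise_for_status()
    return response.content


bpmn_file_name = "expense-authorization-sklearn.bpmn"
dmn_file_name = "automatic-approval-mapping.dmn"

request_data = {
    "tenant-id": "showcase",  # please change the tenant-id
}

with open(pmml_file_name, "rb") as pmml_file:
    # Multipart upload; requests uses each key (or the PMML file's own name) as file name
    files = {
        pmml_file_name: pmml_file,
        bpmn_file_name: load_resource(f"{url_modelling}/{bpmn_file_name}"),
        dmn_file_name: load_resource(f"{url_modelling}/{dmn_file_name}"),
    }
    response = requests.post(camunda_eninge_rest, files=files, data=request_data)

response.raise_for_status()
deployment_id = response.json()["id"]
print(deployment_id)
```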
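To confirm that the engine picked up both the BPMN process and the DMN decision, the JSON returned by the deployment request can be inspected. A minimal sketch, assuming the Camunda 7 REST response format in which `deployedProcessDefinitions` and `deployedDecisionDefinitions` are objects keyed by definition id (either may be `null` or absent); `summarize_deployment` is a hypothetical helper.

```python
def summarize_deployment(deployment):
    """Map each kind of deployed definition to the sorted list of its keys."""
    summary = {}
    for field in ("deployedProcessDefinitions", "deployedDecisionDefinitions"):
        # A missing or null entry means nothing of this kind was deployed
        definitions = deployment.get(field) or {}
        summary[field] = sorted(d["key"] for d in definitions.values())
    return summary
```

Calling `print(summarize_deployment(response.json()))` after the deployment shows the process and decision keys. An empty list usually means the engine did not recognize a resource, for example because its file name lacks the `.bpmn` or `.dmn` suffix.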

## 15. Conclusion
🎉 Finally, we can instantiate and run the process from the Camunda Platform task list.
