<table align="center">

  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/a-rebmann/nlp4bpa/blob/main/nlp4bpa_tutorial_2023.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>

# NLP for BPA - Hands-on Exercises

## Outline

### 1. Event log and Process Model Analysis
    1.1 Importing and analyzing an event log with GPTs and pm4py
    1.2 Custom task: action and object extraction from activity labels using LLMs
    1.3 Potential use cases of the custom task

### 2. Analyzing textual process descriptions
    Imperative model extraction from text
    
### 3. Future of NLP for BPA

#### API key

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"  # replace with your own key
os.environ["OPENAI_API_ORG"] = "<your OpenAI organization ID>"  # optional
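As an alternative to pasting the key into the cell above, it can be requested interactively so that it never ends up in the saved notebook. A minimal sketch using Python's standard `getpass` module; the helper name `ensure_api_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_api_key(var="OPENAI_API_KEY"):
    """Return the value of an environment variable, prompting (hidden input) only if it is unset."""
    if not os.environ.get(var):
        os.environ[var] = getpass(f"{var}: ")
    return os.environ[var]
```

Calling `ensure_api_key()` once at the start of the notebook is enough; later cells can keep reading `os.environ["OPENAI_API_KEY"]`.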

#### Required installs

In [2]:
# Required installs

!pip install -q pm4py==2.7.3
!pip install -q spacy
!pip install -q spacy-transformers
!pip install -q "openai<1.0"  # the code below uses the pre-1.0 openai.ChatCompletion interface
!python -m spacy download en_core_web_sm
!pip install -q spacy-llm

Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m15.8 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')




### 1.1 Importing and analyzing an event log with GPTs and pm4py

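In this part, we use OpenAI's GPT models ([GPT-3.5 and GPT-4](https://openai.com)) and [pm4py](https://pm4py.fit.fraunhofer.de) to analyze a real-life event log: the Domestic Declarations log of the BPI Challenge 2020, which records travel expense declarations.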

<small>
Alessandro Berti, Daniel Schuster, and Wil M. P. van der Aalst: Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study. In: BPM 2023 Workshops.
</small>

#### Importing an event log

In [3]:
# Importing an example event log

import pm4py
travel_event_log = pm4py.read_xes("content/DomesticDeclarations.xes")

parsing log, completed traces ::   0%|          | 0/10500 [00:00<?, ?it/s]

#### Describing the process captured in an event log

In [5]:
ans_desc = pm4py.openai.describe_process(travel_event_log, openai_model="gpt-3.5-turbo", api_key=os.environ["OPENAI_API_KEY"])
print(ans_desc)

"This process describes a series of actions or steps within an employee expense report or declaration approval workflow. It seems to occur in a corporate or administrative environment, where employees submit declarations of their expenses, which then go through a series of reviews and approvals by multiple parties, such as the supervisor, the administration, and the budget owner.\n\nThe process begins with an employee submitting a declaration, which can then follow different paths of approval or rejection by different parties. \n\n- Most frequently, the declaration is approved by the Administration and then finally approved by the Supervisor, after which the payment is requested and handled. \n- Another common route sees the declaration approved by the Administration, then the Budget Owner and finally the Supervisor.\n- Occasionally, the submission from the employee goes directly to final approval by the Supervisor.\n- The process also accounts for situations where the declaration is r
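As the variant lines in the anomaly-detection output of the next cell suggest, pm4py does not paste raw events into the prompt but a textual abstraction of the log; Berti et al. discuss abstractions such as the directly-follows graph and the process variants. A minimal pure-Python sketch of a directly-follows abstraction, assuming a toy log given as a dict from case ID to time-ordered `(activity, timestamp)` events, and performance measured as the mean time in seconds between the two activities. The helper `abstract_dfg` and the toy data are hypothetical, and this is not pm4py's implementation.

```python
from collections import defaultdict
from datetime import datetime


def abstract_dfg(log):
    """Render directly-follows pairs as 'A -> B ( frequency = n  performance = s )' lines, most frequent first."""
    frequency = defaultdict(int)
    total_seconds = defaultdict(float)
    for events in log.values():
        # Pair each event with its direct successor in the same case
        for (a, t1), (b, t2) in zip(events, events[1:]):
            frequency[(a, b)] += 1
            total_seconds[(a, b)] += (t2 - t1).total_seconds()
    lines = []
    for (a, b), n in sorted(frequency.items(), key=lambda item: -item[1]):
        mean = total_seconds[(a, b)] / n  # mean time between the two activities, in seconds
        lines.append(f"{a} -> {b} ( frequency = {n}  performance = {mean} )")
    return "\n".join(lines)


# Hypothetical toy log: case ID -> time-ordered (activity, timestamp) events
toy_log = {
    "c1": [("Declaration SUBMITTED by EMPLOYEE", datetime(2023, 1, 1, 9)),
           ("Declaration APPROVED by ADMINISTRATION", datetime(2023, 1, 2, 9)),
           ("Request Payment", datetime(2023, 1, 3, 9))],
    "c2": [("Declaration SUBMITTED by EMPLOYEE", datetime(2023, 1, 1, 10)),
           ("Declaration APPROVED by ADMINISTRATION", datetime(2023, 1, 1, 22)),
           ("Request Payment", datetime(2023, 1, 2, 10))],
}
print(abstract_dfg(toy_log))
```

A compact text like this keeps the prompt size independent of the number of cases, which is what makes a log with 10,500 traces usable as LLM input.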

#### Checking for potentially undesired behavior

In [6]:
ans_ad = pm4py.openai.anomaly_detection(travel_event_log, openai_model="gpt-4", api_key=os.environ["OPENAI_API_KEY"])
print(ans_ad)

1. Most Serious Anomaly: "Declaration SUBMITTED by EMPLOYEE -> Declaration REJECTED by ADMINISTRATION -> Declaration REJECTED by EMPLOYEE -> Declaration SUBMITTED by EMPLOYEE -> Declaration REJECTED by ADMINISTRATION -> Declaration REJECTED by EMPLOYEE -> Declaration SUBMITTED by EMPLOYEE -> Declaration REJECTED by ADMINISTRATION -> Declaration REJECTED by EMPLOYEE -> Declaration SUBMITTED by EMPLOYEE -> Declaration APPROVED by ADMINISTRATION -> Declaration FINAL_APPROVED by SUPERVISOR -> Request Payment -> Payment Handled ( frequency = 4  performance = 3502608.25 )" This process variant presents the greatest anomaly as it has a significant amount of repeated iterations for the rejection and submission of the declaration. Although it only happens 4 times, it implies a great deal of rework which results in a high performance cost and potentially causing a delay in the final completion of the process.

2. Second Most Serious Anomaly: "Declaration SUBMITTED by EMPLOYEE -> Declaration REJE
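The most serious anomaly reported above is a rework loop. Such LLM findings can be cross-checked deterministically: the sketch below counts the activities that occur more than once in the reported variant and converts its performance value into days, assuming that performance is the mean throughput time in seconds. The helper name `repeated_activities` is hypothetical.

```python
from collections import Counter


def repeated_activities(variant):
    """Return the activities that occur more than once in an 'A -> B -> C' variant string."""
    counts = Counter(step.strip() for step in variant.split("->"))
    return {activity: n for activity, n in counts.items() if n > 1}


# The variant from the GPT-4 answer above, without its frequency/performance suffix
loop = ["Declaration SUBMITTED by EMPLOYEE",
        "Declaration REJECTED by ADMINISTRATION",
        "Declaration REJECTED by EMPLOYEE"]
tail = ["Declaration SUBMITTED by EMPLOYEE", "Declaration APPROVED by ADMINISTRATION",
        "Declaration FINAL_APPROVED by SUPERVISOR", "Request Payment", "Payment Handled"]
variant = " -> ".join(loop * 3 + tail)
print(repeated_activities(variant))

# Assumption: performance is a mean throughput time in seconds
performance_days = 3502608.25 / 86400
print(f"{performance_days:.1f} days")
```

Under that assumption, the four affected cases took about 40.5 days on average, with the declaration submitted four times.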

### 1.2 Implementing a custom task using GPTs
In this part, we focus on an NLP task in the context of business process analysis.
We show how such a task can be implemented with LLMs by prompting alone, without any fine-tuning.

As an example, we extract business objects, and the actions applied to them, from activity or event labels. 
The automated analysis of such labels has been actively researched, using both traditional NLP techniques and transformer-based models. It enables many downstream pre-processing tasks, such as cleaning and standardizing activity labels and automatically determining the type of activity that is performed.
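Without fine-tuning, the model has to learn the task from the prompt itself, typically from instructions plus a few worked examples (few-shot prompting). The prompt below embeds its examples directly in one string; an alternative is to pass them as separate chat turns. A minimal sketch of that alternative: the `system`/`user`/`assistant` roles follow OpenAI's chat message format, while the helper `build_few_shot_messages` and the instruction text are hypothetical.

```python
def build_few_shot_messages(instruction, examples, label):
    """Assemble a chat message list: instruction, worked examples, then the label to tag."""
    messages = [{"role": "system", "content": instruction}]
    for example_label, example_answer in examples:
        # Each example is a prior user turn followed by the desired assistant answer
        messages.append({"role": "user", "content": example_label})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": label})
    return messages


# Hypothetical examples, mirroring the ones embedded in the prompt below
examples = [
    ("create purchase order", "[('create', 'purchase order')]"),
    ("purchase order creation", "[('create', 'purchase order')]"),
    ("purchase order created", "[('create', 'purchase order')]"),
]
messages = build_few_shot_messages(
    "Return a list of (action, object) pairs for the given activity label.",
    examples,
    "notification letter creation and approval",
)
```

The resulting list could be passed as the `messages` argument of the chat completion call; showing answers as assistant turns tends to make the expected output format more explicit than describing it in prose.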

In [21]:
task_prompt = """You are an expert activity label tagger system. 
Your task is to accept activity labels such as 'create purchase order' as input and provide a list of pairs, where each pair consists of an action and the object it is applied on. For 'create purchase order', you would return [('create', 'purchase order')]. If actions are not provided as verbs, change them into verbs. For 'purchase order creation' you would hence return ('create', 'purchase order') as well. Also turn past tense actions into present tense ones, i.e. 'purchase order created' becomes ('create', 'purchase order') too. If there is additional information, e.g., about who is performing the action or about an IT system that is involved, discard that. 

Do not put any other text in your answer, only a (possibly empty) list of pairs with nothing before or after. In each pair the action comes first, followed by the object (if any).
If the activity label does not contain any actions, return an empty list, i.e., []

Here is the activity label that shall be tagged.
Text:
"""

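import openai  # pre-1.0 interface (openai.ChatCompletion)

def extract_object_action_pairs_from_label(label, model="gpt-3.5-turbo"):
    """Ask the LLM for the (action, object) pairs in an activity label; returns the raw answer string."""
    openai.api_key = os.environ["OPENAI_API_KEY"]
    # Append the lower-cased label, enclosed in single quotes, to the task prompt
    label_prompt = task_prompt + "'" + label.lower() + "'"
    messages = [{"role": "user", "content": label_prompt}]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]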
    
extract_object_action_pairs_from_label("notification letter creation and approval")

"[('create', 'notification letter'), ('approve', 'notification letter')]"
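The function returns the model's answer as a string, so downstream code has to parse it. A minimal sketch using Python's `ast.literal_eval`, which only evaluates literals and therefore never executes code contained in the answer; answers that are not a list of tuples (for example, when the model adds explanatory text despite the instructions) are mapped to an empty list. The helper name `parse_pairs` is hypothetical.

```python
import ast


def parse_pairs(answer):
    """Turn the model's string answer into a list of (action, object) tuples; [] if it cannot be parsed."""
    try:
        value = ast.literal_eval(answer.strip())
    except (ValueError, SyntaxError, TypeError):
        return []
    # Accept a single bare tuple as well as a list of tuples
    if isinstance(value, tuple):
        value = [value]
    if not isinstance(value, list):
        return []
    # Keep (action,) or (action, object) tuples only
    return [tuple(str(part) for part in pair) for pair in value
            if isinstance(pair, tuple) and len(pair) in (1, 2)]


print(parse_pairs("[('create', 'notification letter'), ('approve', 'notification letter')]"))
```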

### 1.3 Potential use cases of the custom task

Next, we use our custom task to preprocess the event log: we collect the unique event labels and apply the function to each of them.

In [22]:
unique_labels = set(travel_event_log["concept:name"].unique())
unique_labels

{'Declaration APPROVED by ADMINISTRATION',
 'Declaration APPROVED by BUDGET OWNER',
 'Declaration APPROVED by PRE_APPROVER',
 'Declaration FINAL_APPROVED by SUPERVISOR',
 'Declaration FOR_APPROVAL by ADMINISTRATION',
 'Declaration FOR_APPROVAL by PRE_APPROVER',
 'Declaration FOR_APPROVAL by SUPERVISOR',
 'Declaration REJECTED by ADMINISTRATION',
 'Declaration REJECTED by BUDGET OWNER',
 'Declaration REJECTED by EMPLOYEE',
 'Declaration REJECTED by MISSING',
 'Declaration REJECTED by PRE_APPROVER',
 'Declaration REJECTED by SUPERVISOR',
 'Declaration SAVED by EMPLOYEE',
 'Declaration SUBMITTED by EMPLOYEE',
 'Payment Handled',
 'Request Payment'}
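Most labels in this log follow the pattern `<object> <STATE> by <ROLE>`. A minimal regex sketch, assuming exactly that pattern, splits off the role that the prompt asks the model to discard; such a deterministic split can serve as a sanity check on the LLM answers (the `('declare', 'for_approval')` answer in the next cell's output suggests, for instance, that the `FOR_APPROVAL` state is hard for the model). Labels outside the pattern, such as `Payment Handled`, yield `None`. The helper name `split_label` is hypothetical.

```python
import re

# Assumed pattern: one object word, an upper-case state (may contain '_'), "by", and the role
LABEL_PATTERN = re.compile(r"^(?P<object>\w+) (?P<state>[A-Z_]+) by (?P<role>.+)$")


def split_label(label):
    """Split e.g. 'Declaration APPROVED by BUDGET OWNER' into object, state and role; None if no match."""
    match = LABEL_PATTERN.match(label)
    return match.groupdict() if match else None


print(split_label("Declaration APPROVED by BUDGET OWNER"))
print(split_label("Payment Handled"))
```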

In [23]:
for label in unique_labels:
    print(extract_object_action_pairs_from_label(label))

[('reject', 'declaration')]
[('approve', 'declaration')]
[('handle', 'payment')]
[('approve', 'declaration')]
[('save', 'declaration')]
[('approve', 'declaration')]
[]
[('declare', 'for_approval')]
[('reject', 'declaration')]
[('reject', 'declaration')]
[('request', 'payment')]
[('reject', 'declaration')]
[('approve', 'declaration')]
[('approve', 'declaration')]
[('submit', 'declaration')]
[('approve', 'declaration')]
[('reject', 'declaration')]
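One use case is label abstraction: replacing each event label with its extracted action and object merges labels that differ only in the performing role. A minimal sketch, assuming the answers have been collected into a dict from label to parsed pairs; the three entries below are hand-written for illustration and are not the model's answers above. Labels without any pair are kept unchanged and flagged for manual review. Both helper names are hypothetical.

```python
from collections import defaultdict


def abstract_labels(pairs_by_label):
    """Map each label to 'action object' text; labels without pairs are kept and flagged."""
    abstraction, needs_review = {}, []
    for label, pairs in pairs_by_label.items():
        if pairs:
            abstraction[label] = " and ".join(" ".join(pair) for pair in pairs)
        else:
            abstraction[label] = label
            needs_review.append(label)
    return abstraction, needs_review


def merged_groups(abstraction):
    """Group original labels by abstracted label, keeping only groups that merge several labels."""
    groups = defaultdict(list)
    for label, abstract in abstraction.items():
        groups[abstract].append(label)
    return {abstract: labels for abstract, labels in groups.items() if len(labels) > 1}


# Hand-written, illustrative input (not the model's answers above)
pairs_by_label = {
    "Declaration APPROVED by ADMINISTRATION": [("approve", "declaration")],
    "Declaration APPROVED by BUDGET OWNER": [("approve", "declaration")],
    "Declaration FOR_APPROVAL by SUPERVISOR": [],
}
abstraction, needs_review = abstract_labels(pairs_by_label)
print(merged_groups(abstraction))
print(needs_review)
```

On the pm4py DataFrame, such a mapping could be applied with pandas' `Series.map`, e.g. `travel_event_log["concept:name"].map(abstraction)`. Whether merging roles is desirable depends on the analysis: it simplifies the process model but hides who approved a declaration.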


## 2. Analyzing textual process descriptions

<b>Exercise 3.1</b>: Once a loan application has been approved by the loan provider, an acceptance pack is prepared and sent to the customer. The acceptance pack includes a repayment schedule which the customer needs to agree upon by sending the signed documents back to the loan provider. The latter then verifies the repayment agreement: if the applicant disagreed with the repayment schedule, the loan provider cancels the application; if the applicant agreed, the loan provider approves the application. In either case, the process completes with the loan provider notifying the applicant of the application status.

<b>Exercise 3.2</b>: A loan application is approved if it passes two checks: (i) the applicant’s loan risk assessment, done automatically by a system, and (ii) the appraisal of the property for which the loan has been asked, carried out by a property appraiser. The risk assessment requires a credit history check on the applicant, which is performed by a financial officer. Once both the loan risk assessment and the property appraisal have been performed, a loan officer can assess the applicant’s eligibility. If the applicant is not eligible, the application is rejected, otherwise the acceptance pack is prepared and sent to the applicant.

<b>Exercise 3.3</b>: A loan application may be coupled with a home insurance which is offered at discounted prices. The applicants may express their interest in a home insurance plan at the time of submitting their loan application to the loan provider. Based on this information, if the loan application is approved, the loan provider may either only send an acceptance pack to the applicant, or also send a home insurance quote. The process then continues with the verification of the repayment agreement.



In [26]:
exercise_3_1 = "Once a loan application has been approved by the loan provider, an acceptance pack is prepared and sent to the customer. The acceptance pack includes a repayment schedule which the customer needs to agree upon by sending the signed documents back to the loan provider. The latter then verifies the repayment agreement: if the applicant disagreed with the repayment schedule, the loan provider cancels the application; if the applicant agreed, the loan provider approves the application. In either case, the process completes with the loan provider notifying the applicant of the application status."
exercise_3_2 = "A loan application is approved if it passes two checks: (i) the applicant’s loan risk assessment, done automatically by a system, and (ii) the appraisal of the property for which the loan has been asked, carried out by a property appraiser. The risk assessment requires a credit history check on the applicant, which is performed by a financial officer. Once both the loan risk assessment and the property appraisal have been performed, a loan officer can assess the applicant’s eligibility. If the applicant is not eligible, the application is rejected, otherwise the acceptance pack is prepared and sent to the applicant."
exercise_3_3 = "A loan application may be coupled with a home insurance which is offered at discounted prices. The applicants may express their interest in a home insurance plan at the time of submitting their loan application to the loan provider. Based on this information, if the loan application is approved, the loan provider may either only send an acceptance pack to the applicant, or also send a home insurance quote. The process then continues with the verification of the repayment agreement."


In [34]:
text_to_intermediate_prompt = """
Create a BPMN process model for the process description that I’ll give to you. 
Use the following notation: tasks as words (e.g., Receive Order); arcs as arrows (->).
XOR, OR, and AND splits as XOR, OR, and AND. 
In case of XOR, provide the condition for each outgoing branch using this notation: XOR -> (condition) task. 
Provide a mapping of actors to activities, if applicable, using the form: actor: [activity1, ...]
"""

text_to_bpmn_prompt = "Convert the given description into a BPMN diagram and provide the XML code for it. Provide only the code and nothing else."
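`text_to_intermediate_prompt` asks for a lightweight textual notation instead of XML (the function below uses `text_to_bpmn_prompt`). A sketch of how an answer in that notation could be turned into data, assuming one arrow chain per line, conditions written as `(condition) task` after an `XOR`, and actor lines of the form `actor: [activity1, ...]`. The helper `parse_sketch` and the sample text are hypothetical; the sample is hand-written, not model output.

```python
import re

ACTOR_LINE = re.compile(r"([^:\[\]]+):\s*\[(.*)\]")
CONDITIONAL_NODE = re.compile(r"\s*\((.*?)\)\s*(.+?)\s*")


def parse_sketch(text):
    """Parse 'A -> B -> XOR -> (cond) C' lines into edges and 'actor: [a, b]' lines into a mapping."""
    edges, actors = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        actor = ACTOR_LINE.fullmatch(line)
        if actor:
            activities = [a.strip() for a in actor.group(2).split(",") if a.strip()]
            actors[actor.group(1).strip()] = activities
            continue
        # Split a chain into nodes; a node may carry a branch condition "(cond) task"
        nodes = []
        for part in line.split("->"):
            cond = CONDITIONAL_NODE.fullmatch(part)
            nodes.append((cond.group(2), cond.group(1)) if cond else (part.strip(), None))
        # Consecutive nodes become edges: (source, target, condition of the target branch)
        for (source, _), (target, condition) in zip(nodes, nodes[1:]):
            edges.append((source, target, condition))
    return edges, actors


sample_sketch = """Prepare Acceptance Pack -> Send Acceptance Pack -> Verify Repayment Agreement -> XOR -> (applicant disagreed) Cancel Application
XOR -> (applicant agreed) Approve Application
Loan Provider: [Prepare Acceptance Pack, Cancel Application]"""
edges, actors = parse_sketch(sample_sketch)
print(edges)
print(actors)
```

This sketch treats every `XOR` as the same node; a real parser would also have to give each gateway its own identity.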

In [36]:
def text_to_model(description, model="gpt-3.5-turbo"):
    prompt = text_to_bpmn_prompt + "\n Here is the description:\n" + description
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]

ans_sketch = text_to_model(exercise_3_1)
print(ans_sketch)

<definitions id="definitions"
   targetNamespace="http://www.omg.org/spec/BPMN/20100524/MODEL"
   xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
   xmlns:semantic="http://www.omg.org/spec/BPMN/20100524/MODEL"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <process id="loanApplicationProcess" isExecutable="true">
    <startEvent id="startEvent" name="Loan Application Submitted"/>
    <parallelGateway id="parallelGateway1" name="Parallel Gateway 1"/>
    <sequenceFlow id="sequenceFlow1" sourceRef="startEvent" targetRef="task1"/>
    <task id="task1" name="Loan Application Approval"/>
    <exclusiveGateway id="exclusiveGateway1" name="Decision Gateway 1"/>
    <sequenceFlow id="sequenceFlow2" sourceRef="task1" targetRef="exclusiveGateway1"/>
    <task id="task2" name="Prepare Acceptance Pack"/>
    <sequenceFlow id="sequenceFlow3" sourceRef="exclusiveGateway1" targetRef="task2">
      <conditionExpression xsi:type="tFormalExpression" language="http://www.w3.org/1999/XPat
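LLM-generated BPMN XML is not guaranteed to be well-formed or internally consistent, and the answer may contain text around the XML. A minimal check with Python's standard `xml.etree.ElementTree`, assuming the BPMN 2.0 model namespace used in the output above: it keeps the span from the first `<` to the last `>`, parses it, and reports sequence flows whose `sourceRef`/`targetRef` point to missing IDs, as well as events, tasks, and gateways that no flow touches. This is not a full BPMN schema validation; the helper name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"
FLOW_NODES = ("startEvent", "endEvent", "task", "exclusiveGateway", "parallelGateway", "inclusiveGateway")


def check_bpmn_flows(answer):
    """Return a list of problems found in the sequence flows of an LLM-generated BPMN answer."""
    xml_text = answer[answer.find("<"): answer.rfind(">") + 1]  # drop text around the XML
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is not well-formed
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    problems, connected = [], set()
    for flow in root.iter(BPMN_NS + "sequenceFlow"):
        for attr in ("sourceRef", "targetRef"):
            ref = flow.get(attr)
            connected.add(ref)
            if ref not in ids:
                problems.append(f"{flow.get('id')}: {attr} '{ref}' does not exist")
    for tag in FLOW_NODES:
        for node in root.iter(BPMN_NS + tag):
            if node.get("id") not in connected:
                problems.append(f"{node.get('id')}: not connected to any sequence flow")
    return problems


# Hypothetical answer with a dangling reference and an unused gateway
sample = """Here is the model:
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" id="d">
  <process id="p">
    <startEvent id="start"/>
    <parallelGateway id="gw"/>
    <task id="t1" name="Prepare Acceptance Pack"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="end"/>
  </process>
</definitions>"""
print(check_bpmn_flows(sample))
```

In the notebook, the generated model would be checked with `check_bpmn_flows(ans_sketch)`.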

<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_1.png" alt="Exercise 3.1" />

<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_2.png" alt="Exercise 3.2" />

<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_3.png" alt="Exercise 3.3" />