<table align="center">

  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/a-rebmann/nlp4bpa/blob/main/nlp4bpa_tutorial_2023.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>

# NLP for BPA - Hands-on Exercises

## Outline

### 1. Event log and Process Model Analysis
    1.1 Importing and analyzing an event log with GPTs and pm4py
    1.2 Custom task: action and object extraction from activity labels using LLMs
    1.3 Potential use cases of the custom task

### 2. Analyzing textual process descriptions
    Imperative model extraction from text
    
### 3. Future of NLP for BPA

#### API key

In [None]:
import os
os.environ["OPENAI_API_KEY"] = ""

#### Required installs

In [None]:
# Required installs

!pip install -q pm4py==2.7.3
!pip install -q spacy
!pip install -q spacy-transformers
# The cells below use the pre-1.0 openai interface (openai.ChatCompletion), so pin the version
!pip install -q "openai<1"
!python -m spacy download en_core_web_sm
!python -m pip install spacy-llm

### 1.1 Importing and analyzing an event log with GPTs and pm4py

In this part, we will use [OpenAI's GPT models](https://openai.com) and [pm4py](https://pm4py.fit.fraunhofer.de) to analyze a real-life event log.

<small>
Alessandro Berti, Daniel Schuster, and Wil M. P. van der Aalst: Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study. In: BPM 2023 Workshops.
</small>

#### Importing an event log

In [None]:
# Importing an example event log

import pm4py
travel_event_log = pm4py.read_xes("content/PermitLog.xes")

#### Describing the process captured in an event log

In [None]:
ans_desc = pm4py.openai.describe_process(travel_event_log, openai_model="gpt-3.5-turbo", api_key=os.environ["OPENAI_API_KEY"])
print(ans_desc)

This result is based on the following query, which abstracts the log to a directly-follows graph that is in turn described textually.

In [None]:
from pm4py.algo.querying.openai import log_to_dfg_descr
d_query = log_to_dfg_descr.apply(travel_event_log, parameters={})
d_query += "can you provide a description of the process?"
print(d_query)
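
To make the directly-follows abstraction concrete, the following is a minimal sketch that computes such a graph from a tiny hand-made log and renders it as text. It is only an illustration: the toy traces, the helper names `directly_follows` and `describe_dfg`, and the output wording are assumptions and do not reproduce pm4py's internal format.

```python
from collections import Counter

# Toy log (hypothetical data): each trace is the ordered list of activities of one case.
toy_log = [
    ["submit permit", "approve permit", "start trip", "end trip"],
    ["submit permit", "reject permit", "submit permit", "approve permit"],
    ["submit permit", "approve permit", "start trip", "end trip"],
]

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def describe_dfg(dfg):
    """Render the graph as one line per edge, most frequent edges first."""
    lines = [f"{a} -> {b} (frequency = {n})" for (a, b), n in dfg.most_common()]
    return "\n".join(lines)

toy_dfg = directly_follows(toy_log)
print(describe_dfg(toy_dfg))
```

A textual rendering like this is compact enough to fit into a prompt, which is the reason for abstracting the log before querying the model.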

#### Checking for potentially undesired behavior

In [None]:
ans_ad = pm4py.openai.anomaly_detection(travel_event_log, openai_model="gpt-3.5-turbo", api_key=os.environ["OPENAI_API_KEY"])
print(ans_ad)

This result is obtained by abstracting the event log to trace variants:

In [None]:
from pm4py.algo.querying.openai import log_to_variants_descr
a_query = log_to_variants_descr.apply(travel_event_log, parameters={})
a_query += "what are the main anomalies? An anomaly involves a strange ordering of the activities, or a significant amount of rework. Please only data and process specific considerations, not general considerations. Please sort the anomalies based on their seriousness."
print(a_query)
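
As a rough illustration of what a trace-variant abstraction contains, the sketch below counts how many cases share the same activity sequence in a small hand-made log. The toy traces and the helper names `trace_variants` and `describe_variants` are assumptions for this example; pm4py's own textual format may differ.

```python
from collections import Counter

# Toy log (hypothetical data): one ordered list of activities per case.
toy_traces = [
    ["submit permit", "approve permit", "start trip", "end trip"],
    ["submit permit", "reject permit", "submit permit", "approve permit"],
    ["submit permit", "approve permit", "start trip", "end trip"],
]

def trace_variants(traces):
    """Count how many cases follow exactly the same activity sequence."""
    return Counter(tuple(trace) for trace in traces)

def describe_variants(variants):
    """Render one line per variant, most frequent variants first."""
    lines = [f"{' -> '.join(v)} (frequency = {n})" for v, n in variants.most_common()]
    return "\n".join(lines)

toy_variants = trace_variants(toy_traces)
print(describe_variants(toy_variants))
```

Unlike the directly-follows graph, variants keep the full ordering of each case, which makes rework (such as the repeated submission in the second trace) visible to the model.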

### 1.2 Implementing a custom task using GPTs
In this part, we will focus on an NLP task in the context of business process analysis.
We will show how such a task can be implemented using LLMs without any fine-tuning.

As an example, we will focus on extracting business objects, and the actions applied to them, from activity or event labels.
The automated analysis of such labels based on traditional NLP techniques and based on transformers has been actively researched. It enables many downstream pre-processing tasks, such as the cleaning/standardization of activity labels and the automated assessment of the type of activity that is performed.

In [None]:
task_prompt = """You are an expert activity label tagger system. 
Your task is to accept activity labels such as 'create purchase order' as input and provide a list of pairs, where each pair consists of an action and the object it is applied on. 
For 'create purchase order', you would return [('create', 'purchase order')]. 
If actions are not provided as verbs, change them into verbs. 
For 'purchase order creation' you would hence return ('create', 'purchase order') as well and for 'purchase order for checking' you would return ('check', 'purchase order'). 
Also turn past tense actions into present tense ones, i.e. 'purchase order created' becomes ('create', 'purchase order') too. 
If there is additional information, e.g., about who is performing the action or about an IT system that is involved, discard that. 

Do not put any other text in your answer, only a (possibly empty) list of pairs with nothing before or after. In each pair the action comes first, followed by the object (if any).
If the activity label does not contain any actions, return an empty list, i.e., [].

Here is the activity label that shall be tagged.
Text:
"""

def extract_object_action_pairs_from_label(label, model="gpt-3.5-turbo"):
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    # Append the lower-cased label, wrapped in single quotes, to the task prompt
    label_prompt = task_prompt + "'" + label.lower() + "'"
    messages = [{"role": "user", "content": label_prompt}]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    # The answer is returned as plain text, e.g. "[('create', 'purchase order')]"
    return response["choices"][0]["message"]["content"]
    
extract_object_action_pairs_from_label("notification letter creation and approval")
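
Because the model replies with plain text, the answer has to be parsed before it can be used as a list of pairs. The sketch below shows one way to do this defensively with `ast.literal_eval`; the helper name `parse_pairs` and the decision to return an empty list for malformed answers are assumptions made for this example.

```python
import ast

def parse_pairs(answer):
    """Turn the model's textual answer into a list of (action, object) tuples.

    Returns an empty list if the answer is not a well-formed Python list literal.
    """
    try:
        parsed = ast.literal_eval(answer.strip())
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(parsed, list):
        return []
    pairs = []
    for item in parsed:
        # Keep only two-element tuples/lists whose entries are both strings.
        if (isinstance(item, (tuple, list)) and len(item) == 2
                and all(isinstance(x, str) for x in item)):
            pairs.append((item[0].strip(), item[1].strip()))
    return pairs

print(parse_pairs("[('create', 'notification letter'), ('approve', 'notification letter')]"))
print(parse_pairs("Sure! Here are the pairs you asked for."))
```

Using `ast.literal_eval` rather than `eval` means that only Python literals are accepted, so an unexpected answer cannot execute code.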

### 1.3 Potential use cases of the custom task

Next, we use the implementation of our custom task to preprocess our event log.

Let's apply the function to the event labels of our event log.

In [None]:
unique_labels = set(travel_event_log["concept:name"].unique())
unique_labels

In [None]:
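import ast

label_mapping = {}
for label in unique_labels:
    processed_label = extract_object_action_pairs_from_label(label)
    print(processed_label)
    # The answer is a string such as "[('create', 'purchase order')]", so parse it before joining
    pairs = ast.literal_eval(processed_label)
    label_mapping[label] = ", ".join(" ".join(pair) for pair in pairs)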
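
One possible use of the extracted pairs is to standardize labels and to group actions by the business object they affect. The sketch below works on hard-coded extraction results so that it runs without an API call; the labels, the pairs, and the helper names `standardize` and `actions_per_object` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical extraction results: raw label -> list of (action, object) pairs.
extracted = {
    "Permit SUBMITTED by EMPLOYEE": [("submit", "permit")],
    "Permit APPROVED by SUPERVISOR": [("approve", "permit")],
    "Start trip": [("start", "trip")],
    "Payment Handled": [("handle", "payment")],
}

def standardize(pairs):
    """Build a cleaned label of the form 'action object' from the pairs."""
    return ", ".join(f"{action} {obj}".strip() for action, obj in pairs)

def actions_per_object(extraction):
    """Group the actions that are applied to each business object."""
    grouped = defaultdict(set)
    for pairs in extraction.values():
        for action, obj in pairs:
            grouped[obj].add(action)
    return {obj: sorted(actions) for obj, actions in grouped.items()}

standardized = {label: standardize(pairs) for label, pairs in extracted.items()}
print(standardized)
print(actions_per_object(extracted))
```

A mapping like `standardized` could then be used to rename the values of the `concept:name` column, while the grouping gives a quick overview of which objects the process handles.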

## 2. Analyzing textual process descriptions

<b>Exercise 3.1</b>: Once a loan application has been approved by the loan provider, an acceptance pack is prepared and sent to the customer. The acceptance pack includes a repayment schedule which the customer needs to agree upon by sending the signed documents back to the loan provider. The latter then verifies the repayment agreement: if the applicant disagreed with the repayment schedule, the loan provider cancels the application; if the applicant agreed, the loan provider approves the application. In either case, the process completes with the loan provider notifying the applicant of the application status.

<b>Exercise 3.2</b>: A loan application is approved if it passes two checks: (i) the applicant’s loan risk assessment, done automatically by a system, and (ii) the appraisal of the property for which the loan has been asked, carried out by a property appraiser. The risk assessment requires a credit history check on the applicant, which is performed by a financial officer. Once both the loan risk assessment and the property appraisal have been performed, a loan officer can assess the applicant’s eligibility. If the applicant is not eligible, the application is rejected, otherwise the acceptance pack is prepared and sent to the applicant.

<b>Exercise 3.3</b>: A loan application may be coupled with a home insurance which is offered at discounted prices. The applicants may express their interest in a home insurance plan at the time of submitting their loan application to the loan provider. Based on this information, if the loan application is approved, the loan provider may either only send an acceptance pack to the applicant, or also send a home insurance quote. The process then continues with the verification of the repayment agreement.



In [None]:
exercise_3_1 = "Once a loan application has been approved by the loan provider, an acceptance pack is prepared and sent to the customer. The acceptance pack includes a repayment schedule which the customer needs to agree upon by sending the signed documents back to the loan provider. The latter then verifies the repayment agreement: if the applicant disagreed with the repayment schedule, the loan provider cancels the application; if the applicant agreed, the loan provider approves the application. In either case, the process completes with the loan provider notifying the applicant of the application status."
exercise_3_2 = "A loan application is approved if it passes two checks: (i) the applicant’s loan risk assessment, done automatically by a system, and (ii) the appraisal of the property for which the loan has been asked, carried out by a property appraiser. The risk assessment requires a credit history check on the applicant, which is performed by a financial officer. Once both the loan risk assessment and the property appraisal have been performed, a loan officer can assess the applicant’s eligibility. If the applicant is not eligible, the application is rejected, otherwise the acceptance pack is prepared and sent to the applicant."
exercise_3_3 = "A loan application may be coupled with a home insurance which is offered at discounted prices. The applicants may express their interest in a home insurance plan at the time of submitting their loan application to the loan provider. Based on this information, if the loan application is approved, the loan provider may either only send an acceptance pack to the applicant, or also send a home insurance quote. The process then continues with the verification of the repayment agreement."


In [None]:
text_to_intermediate_prompt = """
Create a BPMN process model for the process description that I’ll give to you. Do not consider tasks of external parties.
Use the following notation for control-flow constructs in your output:
1. Tasks, i.e., the basic construct, represent tasks as words in a verb-object style, e.g., receive order, when possible.
2. Nested constructs:
2.1 Sequences denoted as ->(construct1, construct2, ...), which means that construct1 is followed by construct2 and construct2 is followed by ...
2.2 XOR construct as XOR(construct1, construct2, ...), in case of XOR, provide me with the condition of using its elements using this notation: XOR([condition]construct1, [condition]construct2,...).
2.3 OR construct OR(construct1, construct2, ...)
2.4 AND construct AND(construct1, construct2, ...) 
Do not include any line breaks or textual explanation in your output and stick to the provided notation.
"""

text_to_bpmn_prompt = "Convert the given description into a BPMN diagram and provide the XML code for it. Provide only the code and nothing else."
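
An answer in the intermediate notation can be turned into a nested structure for further processing. The following is a minimal sketch of such a parser, assuming that task names contain no commas or parentheses and that conditions contain no closing bracket. The function names, the tuple-based tree representation, and the example string (a hypothetical model answer) are all assumptions made for illustration.

```python
OPERATORS = ("->", "XOR", "AND", "OR")

def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def _parse(text, pos):
    pos = _skip_ws(text, pos)
    condition = None
    if pos < len(text) and text[pos] == "[":
        # Optional [condition] prefix, as used for XOR branches.
        end = text.index("]", pos)
        condition = text[pos + 1:end].strip()
        pos = _skip_ws(text, end + 1)
    for op in OPERATORS:
        if text.startswith(op + "(", pos):
            pos += len(op) + 1
            children = []
            while True:
                child, pos = _parse(text, pos)
                children.append(child)
                pos = _skip_ws(text, pos)
                if pos >= len(text):
                    raise ValueError("missing closing parenthesis")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
            node = (op, children)
            break
    else:
        # A task: everything up to the next comma or closing parenthesis.
        end = pos
        while end < len(text) and text[end] not in ",)":
            end += 1
        node = text[pos:end].strip()
        pos = end
    if condition is not None:
        node = ("IF", condition, node)
    return node, pos

def parse_construct(text):
    """Parse the notation into a task string, (operator, children) or ('IF', condition, node)."""
    node, pos = _parse(text, 0)
    if text[pos:].strip():
        raise ValueError(f"unexpected trailing text: {text[pos:]!r}")
    return node

def collect_tasks(node):
    """List all task names in the order in which they appear."""
    if isinstance(node, str):
        return [node]
    if node[0] == "IF":
        return collect_tasks(node[2])
    return [task for child in node[1] for task in collect_tasks(child)]

# Hypothetical model answer in the intermediate notation.
example = ("->(prepare acceptance pack, send acceptance pack, verify repayment agreement, "
           "XOR([applicant disagreed]cancel application, [applicant agreed]approve application), "
           "notify applicant)")
tree = parse_construct(example)
print(tree)
print(collect_tasks(tree))
```

Raising `ValueError` on malformed input makes it easy to detect answers that do not stick to the notation, for instance when the model adds an explanation despite the instruction.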

In [None]:
def text_to_model(description, model="gpt-3.5-turbo", xml=False):
    # Parenthesize the choice of instruction so that the description is appended in both cases
    prompt = (text_to_bpmn_prompt if xml else text_to_intermediate_prompt) + "\n Here is the description:\n" + description
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response["choices"][0]["message"]["content"]

This is the model provided by a human expert:
<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_1.png" alt="Exercise 3.1" />

This is what GPT-X comes up with:

In [None]:
ans_sketch = text_to_model(exercise_3_1)
print(ans_sketch)

This is the model provided by a human expert:
<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_2.png" alt="Exercise 3.2" />

This is what GPT-X comes up with:

In [None]:
ans_sketch = text_to_model(exercise_3_2)
print(ans_sketch)

This is the model provided by a human expert:
<img src="https://raw.githubusercontent.com/a-rebmann/nlp4bpa/main/content/exercise_3_3.png" alt="Exercise 3.3" />

This is what GPT-X comes up with:

In [None]:
ans_sketch = text_to_model(exercise_3_3)
print(ans_sketch)