# Setup

Most of the packages that you might want to use are already imported here.

In [7]:
import os
import pandas as pd
import numpy as np

In [2]:
import pm4py

## Log Util
# Log conversion
from pm4py.objects.conversion.log import converter as log_converter
# Read log (e.g., if you cannot use the simplified interface because you do not have ipywidgets installed)
from pm4py.objects.log.importer.xes import importer as xes_importer

## Conformance Checking
# Token-based replay (e.g., if you cannot use the simplified interface because you do not have ipywidgets installed)
from pm4py.algo.evaluation.replay_fitness import evaluator as replay_fitness_evaluator
from pm4py.algo.conformance.tokenreplay import algorithm as token_replay
# Standard alignments (e.g., if you cannot use the simplified interface because you do not have ipywidgets installed)
from pm4py.algo.conformance.alignments import algorithm as alignments
# Decomposed alignments (e.g., if you cannot use the simplified interface because you do not have ipywidgets installed)
from pm4py.algo.conformance.alignments.decomposed import algorithm as decomp_alignments

## Petri Nets
from pm4py.objects.petri_net.obj import PetriNet, Marking
from pm4py.objects.petri_net.utils import petri_utils



# Hints
**Have a look at the simplified interface of PM4Py $\rightarrow$ Click on the 'See Simplified Interface' buttons on the documentation page**

A good way to convert a DataFrame so that it works nicely in PM4Py:
```python
df = pm4py.format_dataframe(df, case_id=case_id_column, activity_key=activity_column, timestamp_key=timestamp_column)
```

To create a DFG, you might want to use the following procedure:
```python
dfg, start_activities, end_activities = pm4py.discover_directly_follows_graph(log)
pm4py.view_dfg(dfg, start_activities, end_activities)
```
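To make concrete what a DFG contains, the following minimal sketch counts directly-follows pairs by hand. It works on plain lists of activity names rather than on a PM4Py log; the helper `directly_follows` and the toy traces are hypothetical and are not taken from the vaccination log.

```python
from collections import Counter


def directly_follows(traces):
    """Count directly-follows pairs, start activities, and end activities."""
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue  # an empty trace contributes nothing
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Every pair of neighbouring events is one directly-follows observation.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg, starts, ends


# Hypothetical toy traces (assumption, for illustration only).
toy_traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
dfg, starts, ends = directly_follows(toy_traces)
```

The three return values have the same roles as the three values unpacked from `pm4py.discover_directly_follows_graph` above.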

If you do not have ipywidgets installed, you'll probably encounter the ```IProgress not found``` error.
In that case, it should help to deactivate the progress bar by passing a dictionary with the following key:
```python
{'show_progress_bar': False}
```
However, you currently cannot use the simplified interface in that case. Instead, you have to use the base functions.
For example, to import a log without a progress bar, you can use
```python
log = xes_importer.apply('log.xes',parameters={'show_progress_bar':False})
```
If you have ipywidgets installed
```python
log = pm4py.read_xes(file_path)
```
can be used.

## Question 1: Process Overview - Connecting the Dots

**1c. A powerful way to reveal additional information using a Dotted Chart is to enhance the event log by additional attributes. To do so, one can use, for example, Python or PM4Py. According to Stella, age is an important factor in the vaccination process. In the Python notebook, append an additional binary event attribute that is set to 1 if and only if the age of the corresponding patient is greater than or equal to 80 (0 otherwise). Using ProM, re-create the Dotted Chart from Part b using the new attribute for coloring. Briefly describe what you observe.**

In [14]:
# All the necessary imports are taken from the above code.

df = pd.read_csv('data/log_vaccination.csv', sep=',')
df['age_factor'] = np.where(df['Age'] >= 80, 1, 0) 
df.to_csv('data/log_vaccination_enhanced.csv', index=False)

## Question 2: Process Discovery Using Inductive Miner

2a. Load the log log_vaccination.csv and apply IM. Describe the behavior defined by the resulting model. Based on the model, explain which cuts (cut type and partitions) have been made by IM.

2b. Next, you and Stella investigate the directly-follows graph of the log. A long time ago, during her studies, Stella already attended the lecture BPI (even though she did not do well in the exam). Now she is wondering about all the beautiful “possibilities” for sequence cuts. Would the following cuts be feasible? Explain your answer. (In your report, please refer to the numbering below.)
1. Partition 1: Enter into System; Partition 2: Insurance Check Private, Insurance Check Statutory
2. Partition 1: Request appointment; Partition 2: Appointment granted
3. Partition 1: Checkout; Partition 2: Send Invoide, Send Vaccination Certificate
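One way to reason about such candidates is to test the pairwise condition that a sequence cut imposes on the DFG: every activity in the earlier partition must reach every activity in the later one, and nothing in the later partition may reach back. The sketch below assumes this common formulation. The helpers `reachable` and `satisfies_sequence_condition` and the toy edges are hypothetical. It only checks the two groups it is given, whereas a full cut must also partition all activities of the log.

```python
def reachable(dfg_edges, source):
    """Return all activities reachable from `source` via one or more DFG edges."""
    successors = {}
    for a, b in dfg_edges:
        successors.setdefault(a, set()).add(b)
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def satisfies_sequence_condition(dfg_edges, first, second):
    """Check the pairwise sequence condition between two activity groups."""
    for a in first:
        reach_a = reachable(dfg_edges, a)
        for b in second:
            # b must be reachable from a, and a must not be reachable from b.
            if b not in reach_a or a in reachable(dfg_edges, b):
                return False
    return True


# Hypothetical toy DFG (assumption): a -> b, b <-> c, b -> d.
toy_edges = {("a", "b"), ("b", "c"), ("c", "b"), ("b", "d")}
ok_a_then_bc = satisfies_sequence_condition(toy_edges, {"a"}, {"b", "c"})
ok_b_then_c = satisfies_sequence_condition(toy_edges, {"b"}, {"c"})
```

In the toy graph, `b` and `c` lie on a loop, so they cannot be separated by a sequence cut, while `a` can be placed before both.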

2c. Investigating the DFG, Stella recognizes the Special Priority Appointment granted activity which has been introduced during the design of the system. Later, it has been decided that this activity should not be contained in the final release; however, nobody patched the prototype. As traces containing Special Priority Appointment granted are rather infrequent, we postpone the analysis and decide to focus on the frequent behavior by filtering out these traces. Implement this filtering step. Afterwards, apply IM on the remaining traces and describe the resulting model (i.e., filter out traces that contain this activity). Explain how the DFG changed and why this allows for the IM to find different cuts.
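The filtering step removes whole traces, not just the offending events. A minimal sketch of that idea on plain Python data follows, assuming each case is represented as a list of activity names keyed by a case id. The helper `drop_cases_containing` and the two toy cases are hypothetical. On the real log the same logic would be applied to the DataFrame or the PM4Py log object.

```python
def drop_cases_containing(cases, activity):
    """Keep only the cases whose trace never contains `activity`."""
    return {cid: trace for cid, trace in cases.items() if activity not in trace}


# Hypothetical toy cases (assumption); only the activity names come from the exercise.
toy_cases = {
    "c1": ["Request appointment", "Appointment granted"],
    "c2": ["Request appointment", "Special Priority Appointment granted"],
}
filtered = drop_cases_containing(toy_cases, "Special Priority Appointment granted")
```

Dropping only the matching events instead would leave the rest of those traces in the log, which is a different filter from the one the exercise asks for.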

2d. According to Stella there are two types of vaccines. One requires a single vaccination while the other one requires two vaccinations. Verify this claim (doing so, you may assume that the log precisely contains the events that happened in “real life”). Rename the vaccination facility-related activities related to the second vaccination (e.g., by appending enumeration). Apply IM to the resulting log and discuss the result.
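Renaming by enumeration can be sketched as counting, per trace, how often each selected activity has already occurred. The sketch below assumes a trace is a list of activity names. The helper `enumerate_repeats` and the toy labels are hypothetical and much shorter than the real activity names.

```python
from collections import Counter


def enumerate_repeats(trace, activities):
    """Append an occurrence number to every repeat of the given activities."""
    seen = Counter()
    renamed = []
    for act in trace:
        if act in activities:
            seen[act] += 1
            # The first occurrence keeps its name; later ones get a suffix.
            renamed.append(act if seen[act] == 1 else f"{act} {seen[act]}")
        else:
            renamed.append(act)
    return renamed


# Hypothetical toy trace (assumption, for illustration only).
toy_trace = ["Register", "Vaccinate", "Checkout", "Register", "Vaccinate", "Checkout"]
result = enumerate_repeats(toy_trace, {"Register", "Vaccinate"})
```

Activities outside the selected set, such as `Checkout` in the toy trace, keep their original label on every occurrence.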

2e. The model that you obtained so far (hopefully) looks quite good; however, Stella notices that it still allows for impossible behavior. Before a patient can participate in the risk talk and submit the declaration of consent, they have to register at the facility. Passing the security at the entry without registering is extremely unlikely/impossible. Besides, it is impossible that a vaccine is sent to the vaccination cabin before it has been prepared. As an attendee of an advanced process mining course is always eager to understand what happened, you decide to investigate further. Stella thinks that it might be useful to project the log onto the activities Prepare vaccine, Send vaccine to cabine, Register at vaccination facillity, and Submit Declaration, related to the second vaccination, to investigate their relationship. Luckily, this has already been done and you can simply load the log log_vaccination_e.xes. Using this log, mine a process tree with IM.

- i. (4 points) Can you explain why IM discovers impossible concurrency?
- ii. (2 points) In a general setting, is this method of projecting the initial log onto all activities contained in a subtree always applicable, or can it be problematic: will IM, applied to the projected log, always return the same sub-process tree?
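The mechanics of projection can be illustrated on plain lists: events outside the chosen activity set are dropped, so activities that were separated in the original trace become direct neighbours. The sketch below only shows this effect on the directly-follows relation and is not meant as an answer to the questions above. The helpers `project` and `df_pairs` and the toy trace are hypothetical.

```python
def project(traces, keep):
    """Keep only the events whose activity is in `keep`."""
    return [[a for a in trace if a in keep] for trace in traces]


def df_pairs(traces):
    """Return the set of directly-follows pairs observed in the traces."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


# Hypothetical toy log (assumption): "x" always sits between "a" and "b".
toy_log = [["a", "x", "b"]]
projected = project(toy_log, {"a", "b"})

pairs_before = df_pairs(toy_log)
pairs_after = df_pairs(projected)
```

Here `("a", "b")` appears only after projection, so the projected log has a directly-follows relation that the original log did not contain.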