# Applied Process Mining Module

This notebook is part of an Applied Process Mining module. The collection of notebooks is a *living document* and subject to change. 

# Hands-On 1 - 'Event Logs and Process Visualization' (Python / PM4Py)

## Setup

<img src="https://pm4py.fit.fraunhofer.de/static/assets/images/pm4py-site-logo-padded.png" alt="PM4Py" style="width: 200px;"/>

In this notebook, we are using the [PM4Py library](https://pm4py.fit.fraunhofer.de/) in combination with several standard Python data science libraries:

* [pandas](https://pandas.pydata.org/)
* [plotnine](https://plotnine.readthedocs.io/en/stable/)

In [None]:
## Uncomment and run the commands below to install the dependencies
# %pip install pandas
# %pip install matplotlib
# %pip install plotnine
# %pip install pm4py

In [None]:
import pandas as pd
import pm4py
import plotnine
from plotnine import ggplot, geom_point, aes, theme_bw, coord_flip, scale_y_discrete, theme, element_text, geom_bin2d

## Assignment

In this hands-on session, you are going to explore a real-life dataset and apply what was presented in the lecture about event logs and basic process mining visualizations.
The objective is to explore your dataset as an event log, keeping in mind the process mining visualizations covered in the lecture.

* Analyse basic properties of the process (business process or other process) that has generated it.
    * What are possible case notions, i.e., which attribute or attributes can serve as the case identifier?
    * What are the activities? Are all activities on the same abstraction level?
    * Can activities or actions be derived from other (non-activity) data?
* Discover a map of the process (or a sub-process) behind it.
    * Are there multiple processes that can be discovered?
    * What is the effect of taking a subset of the data (by incident type, …)? 

You may use this notebook to conduct the analysis.
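
As a starting point for the process-map question above, here is a minimal sketch of how a directly-follows graph can be counted from an event log using only the Python standard library. The toy events, the case identifiers, and the helper name `directly_follows` are made up for illustration and are not taken from the dataset; integer timestamps stand in for real ones.

```python
from collections import Counter, defaultdict

# Toy event log (made-up data): one tuple per event as (case id, activity, timestamp).
events = [
    ("IM001", "Open", 1),
    ("IM001", "Assignment", 2),
    ("IM001", "Closed", 3),
    ("IM002", "Open", 1),
    ("IM002", "Assignment", 2),
    ("IM002", "Reassignment", 3),
    ("IM002", "Closed", 4),
]


def directly_follows(events):
    """Count how often one activity is directly followed by another within a case."""
    # Group the activities per case, ordered by timestamp within each case.
    traces = defaultdict(list)
    for case_id, activity, _timestamp in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    # Count each pair of consecutive activities over all traces.
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


dfg = directly_follows(events)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```

Restricting `events` to a subset of cases before calling `directly_follows` is one simple way to see how the resulting map changes with the chosen subset.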

## Dataset

The proposed real-life dataset to investigate is the *BPI Challenge 2014* dataset. The dataset was captured from the ITIL process of Rabobank Group ICT and was the subject of the annual BPI Challenge in 2014. Here is more information on the dataset and download links to the data files:

* [Overview](https://www.win.tue.nl/bpi/doku.php?id=2014:challenge)
* [Dataset](http://dx.doi.org/10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35)
* [Quick Reference](https://www.win.tue.nl/bpi/lib/exe/fetch.php?media=2014:quick_reference_bpi_challenge_2014.pdf)

The BPI Challenge 2014 website linked above also hosts several reports that describe and analyze the dataset in detail. We suggest exploring the dataset on your own first, before reading the reports.

## Data Loading

To simplify the data loading task, here are the initial steps:

In [None]:
# some warnings are expected here
interaction_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24031670", sep=';')
incident_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24031637", sep=';')
activity_log_incidents = pd.read_csv("https://data.4tu.nl/ndownloader/files/24060575", sep=';')
change_data = pd.read_csv("https://data.4tu.nl/ndownloader/files/24073421", sep=';')

In [None]:
interaction_data.head()

In [None]:
incident_data.head()

In [None]:
activity_log_incidents.head()

In [None]:
change_data.head()

## Event Log

Have a look at the excellent `PM4Py` documentation: https://pm4py.fit.fraunhofer.de/documentation#importing
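
PM4Py works with DataFrames whose case identifier, activity, and timestamp columns follow the XES naming convention (`case:concept:name`, `concept:name`, `time:timestamp`). The following sketch shows that preparation step with plain pandas on a toy table. The column names `ticket_id`, `step`, and `when` are hypothetical placeholders, so you will have to choose the matching columns of the real data yourself. PM4Py also provides a helper for this step, `pm4py.format_dataframe`; check the documentation linked above for its current signature.

```python
import pandas as pd

# Toy data with hypothetical column names; the real files use different ones.
raw = pd.DataFrame({
    "ticket_id": ["T1", "T2", "T1", "T2"],
    "step": ["Open", "Open", "Closed", "Closed"],
    "when": [
        "2014-01-02 10:00:00",
        "2014-01-01 09:00:00",
        "2014-01-03 12:30:00",
        "2014-01-01 17:45:00",
    ],
})

# Rename the chosen columns to the XES attribute names that PM4Py uses by default.
event_log = raw.rename(columns={
    "ticket_id": "case:concept:name",
    "step": "concept:name",
    "when": "time:timestamp",
})

# Parse the timestamps and order the events within each case.
event_log["time:timestamp"] = pd.to_datetime(
    event_log["time:timestamp"], format="%Y-%m-%d %H:%M:%S"
)
event_log = event_log.sort_values(
    ["case:concept:name", "time:timestamp"]
).reset_index(drop=True)

print(event_log)
```

Which column serves as the case identifier is a modelling decision: a different choice of case notion yields a different event log from the same data.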