# Process Mining
Please make sure your notebook fulfills the following requirements:
- Commented and structured code (missing comments or structure will be penalized in the style points)
- Questions separated by markdown headers
- Top-to-bottom runnable cells to reproduce your results
- Code that runs in the bundled conda environment
- Code that runs when placed in the same folder as all of the provided files and delivers the same outputs as those you submit in the notebook and discuss in your report
- DO NOT CLEAR THE OUTPUT of the notebooks you are submitting!

Also, please do not set any other random states in this notebook. The random seeds in the following cell ensure that the results are reproducible.

In [None]:
# Display
from IPython.display import display
# Data handling
import pandas as pd
# Process mining
import pm4py
# Plotting
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# Numerics and randomness
import numpy as np
import random

In [None]:
# please do not change or delete this cell
random.seed(42)
np.random.seed(42)

## Event Log Exploration
You have received event data from a financing organization's process of offering loans to customers whose loan applications were accepted. Every entry in the event log describes a state change of an offer, i.e., every entry refers to: 1) the loan application of a customer (case), 2) the state the offer was set to (activity), and 3) a timestamp. Some additional attributes are provided for every offer, including the offer ID.

In the provided data, every case describes the process of reaching an agreement with a customer whose loan application was accepted. Hence, a case can include the lifecycles of one or more offers; however, at most one offer is accepted per case.
Use Python and PM4Py (https://processintelligence.solutions/static/api/2.7.11/) to explore the event log.

In [None]:
# Load the event log


### Activities and Events
Please do not change the activities in the event log; leave them exactly as they appear in the data, even if you suspect that multiple activity names refer to the same actual activity.

In [None]:
# number of events and plot showing the frequency of each activity
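As a minimal sketch of the counting step, assume the log has been loaded into a pandas DataFrame that uses PM4Py's default column names (`case:concept:name` for the case, `concept:name` for the activity). The six events and the activity names below are hypothetical toy data, not the provided log.

```python
import pandas as pd

# Hypothetical toy log (not the provided data); one row per event
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2", "c3"],
    "concept:name": ["Created", "Sent", "Accepted", "Created", "Cancelled", "Created"],
})

# Every row is one event, so the number of events is the number of rows
n_events = len(log)

# Absolute frequency per activity, most frequent first;
# activity_counts.plot(kind="bar") would turn this into a frequency plot
activity_counts = log["concept:name"].value_counts()

print(n_events)
print(activity_counts)
```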

### Cases and Offers

In [None]:
# number of cases, offers and mean offers per case
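A possible way to get these three numbers with pandas, assuming the offer ID is stored in a column called `OfferID` (a hypothetical name; check the attribute name in the provided log) and that offer IDs are unique across the whole log. The toy log is made up.

```python
import pandas as pd

# Hypothetical toy log; "OfferID" is an assumed column name
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c1", "c2", "c2"],
    "OfferID": ["o1", "o1", "o2", "o2", "o3", "o3"],
})

n_cases = log["case:concept:name"].nunique()
n_offers = log["OfferID"].nunique()

# Distinct offers per case, averaged over all cases
offers_per_case = log.groupby("case:concept:name")["OfferID"].nunique()
mean_offers_per_case = offers_per_case.mean()

print(n_cases, n_offers, mean_offers_per_case)
```

The per-case average does not depend on the global-uniqueness assumption; only `n_offers` does.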

## Process Exploration

### Start and End Activities

In [None]:
# start and end activities of the process
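Start and end activities are the first and last activity of each case once its events are ordered by timestamp. Below is a minimal pandas sketch on a hypothetical toy log whose rows are deliberately out of order; PM4Py's `pm4py.get_start_activities` and `pm4py.get_end_activities` return the same information as activity-to-count dictionaries.

```python
import pandas as pd

# Hypothetical toy log; the rows of case c1 are not in time order
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2"],
    "concept:name": ["Accepted", "Created", "Created", "Cancelled"],
    "time:timestamp": pd.to_datetime([
        "2024-01-02 10:00", "2024-01-01 09:00",
        "2024-01-01 11:00", "2024-01-03 08:00",
    ]),
})

# Order events chronologically within each case
log = log.sort_values(["case:concept:name", "time:timestamp"])
activities = log.groupby("case:concept:name")["concept:name"]

# Count how many cases start / end with each activity
start_activities = activities.first().value_counts()
end_activities = activities.last().value_counts()

print(start_activities.to_dict())
print(end_activities.to_dict())
```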

In [None]:
# plot of start and end activities

### Full Process Model

In [None]:
# process model of the full log

#### Model Language
(Sensible and Nonsensical Traces)

In [None]:
# optional: use this cell to check something; no code is required for this part
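If you do want to check candidate traces, one domain rule from the data description is that a case accepts at most one offer. The sketch below flags traces that break this rule; the label `O_Accepted` and the traces are hypothetical, so substitute the actual activity name(s) from the log.

```python
# Hypothetical label of the activity that accepts an offer
ACCEPTED = "O_Accepted"

def accepts_more_than_one_offer(trace):
    """Return True if the trace accepts more than one offer,
    which the process description rules out for a single case."""
    return trace.count(ACCEPTED) > 1

# Made-up traces, e.g. read off a discovered process model
candidate_traces = [
    ("O_Created", "O_Sent", "O_Accepted"),
    ("O_Created", "O_Accepted", "O_Created", "O_Accepted"),
]
for trace in candidate_traces:
    print(trace, "nonsensical" if accepts_more_than_one_offer(trace) else "ok")
```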

### Variants

In [None]:
# plot showing percentage of cases covered by number of variants

Minimum number of variants needed to cover 85% of cases

Percentage of cases covered by the two most frequent variants

Number of variants representing fewer than 10 cases
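All three numbers follow from the variant frequencies sorted in descending order. The sketch below uses only the standard library and assumes the traces have already been extracted as one tuple of activity names per case; the trace counts are hypothetical, and "cover 85%" is read as "cover at least 85%".

```python
from collections import Counter
from itertools import accumulate

# Hypothetical traces, one tuple of activities per case (20 cases)
traces = ([("A", "B")] * 12 + [("A", "C")] * 4
          + [("A", "B", "C")] * 3 + [("A",)] * 1)

# Number of cases per variant, most frequent variant first
counts = sorted(Counter(traces).values(), reverse=True)
n_cases = sum(counts)

# coverage[k - 1] = share of cases covered by the k most frequent variants;
# plotting coverage against k gives the coverage curve
coverage = [covered / n_cases for covered in accumulate(counts)]

min_variants_85 = next(k for k, share in enumerate(coverage, start=1) if share >= 0.85)
top2_share = coverage[1]
variants_below_10_cases = sum(1 for c in counts if c < 10)

print(min_variants_85, top2_share, variants_below_10_cases)
```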

## Process Models

### Five Most Frequent Variants

In [None]:
# event log with only 5 most frequent variants
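A minimal sketch of a top-k variant filter in pandas, assuming PM4Py's default column names; the helper name `filter_top_k_variants` and the toy log are hypothetical. The same helper would also work for the 35 most frequent variants in the performance analysis. PM4Py also provides `pm4py.filter_variants_top_k`; check the linked API reference for its parameters and for how it handles ties.

```python
from collections import Counter

import pandas as pd

def filter_top_k_variants(df, k, case_col="case:concept:name",
                          act_col="concept:name", ts_col="time:timestamp"):
    """Keep only the events of cases whose variant is among the k most frequent.

    Ties at the k-th position are broken by Counter.most_common, i.e. in the
    order in which the tied variants are first encountered."""
    df = df.sort_values([case_col, ts_col])
    # Variant of a case = ordered tuple of its activities
    variant_of_case = df.groupby(case_col)[act_col].apply(tuple)
    top_variants = {v for v, _ in Counter(variant_of_case).most_common(k)}
    keep = [case for case, v in variant_of_case.items() if v in top_variants]
    return df[df[case_col].isin(keep)]

# Hypothetical toy log: variants (A, B) x2, (A, C) x1, (A,) x1
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3", "c3", "c4"],
    "concept:name": ["A", "B", "A", "B", "A", "C", "A"],
    "time:timestamp": pd.date_range("2024-01-01", periods=7, freq="D"),
})

top1 = filter_top_k_variants(log, k=1)
print(sorted(top1["case:concept:name"].unique()))
```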

In [None]:
# process model of the five most frequent variants

In [None]:
# fitness of process model

Model fitness per case
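If you compute fitness with token-based replay, the fitness of a single trace is commonly defined from its token counts (missing $m$, consumed $c$, remaining $r$, produced $p$) as $f = \tfrac{1}{2}\left(1 - \tfrac{m}{c}\right) + \tfrac{1}{2}\left(1 - \tfrac{r}{p}\right)$. To our knowledge, `pm4py.conformance_diagnostics_token_based_replay` returns one result per trace that includes this value under the key `trace_fitness`; please verify this against the linked API reference. The function below only illustrates the formula on made-up counts.

```python
def token_replay_fitness(missing, consumed, remaining, produced):
    """Trace fitness from token-based replay counts:
    0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)."""
    if consumed <= 0 or produced <= 0:
        raise ValueError("consumed and produced must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Made-up counts: a perfectly fitting trace, and one with a single missing
# and a single remaining token out of six consumed / produced tokens
print(token_replay_fitness(0, 5, 0, 5))   # 1.0
print(token_replay_fitness(1, 6, 1, 6))   # about 0.833 (5/6)
```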

In [None]:
# histogram of model fitness
bins = np.arange(0, 1.02, 0.02)  # bin edges 0.00, 0.02, ..., 1.00 (50 bins)

# add your code here
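With the bin edges defined in the cell above, `np.histogram` (or `plt.hist` with the same `bins`) gives the distribution of per-case fitness. Below is a sketch on hypothetical fitness values. NumPy's last bin is closed on both sides, so cases with fitness exactly 1.0 land in the bin [0.98, 1.0] instead of being dropped.

```python
import numpy as np

# Same edges as in the notebook cell: 0.00, 0.02, ..., 1.00, i.e. 50 bins of width 0.02
bins = np.arange(0, 1.02, 0.02)

# Hypothetical per-case fitness values
fitness = np.array([1.0, 1.0, 0.95, 0.83, 0.5])

counts, edges = np.histogram(fitness, bins=bins)
print(len(counts), counts.sum(), counts[-1])   # 50 5 2
```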

## Performance Analysis (Throughput Times)

- Only consider the 35 most frequent variants
- Throughput time of a case: the case duration, i.e., the time between its first and last event
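Below is a minimal sketch of computing the throughput time per case as the difference between its latest and earliest timestamp, assuming PM4Py's default column names and a hypothetical toy log. PM4Py's `pm4py.get_all_case_durations` is a built-in alternative that, as far as we know, returns the durations in seconds.

```python
import pandas as pd

# Hypothetical toy log; row order within a case does not matter here
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "time:timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-03 09:00", "2024-01-02 09:00",
        "2024-01-05 12:00", "2024-01-05 18:00",
    ]),
})

timestamps = log.groupby("case:concept:name")["time:timestamp"]
# Throughput time = last event minus first event of each case
throughput = timestamps.max() - timestamps.min()
throughput_days = throughput.dt.total_seconds() / 86400  # 86400 s per day

print(throughput_days.to_dict())   # {'c1': 2.0, 'c2': 0.25}
```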


In [None]:
# event log with 35 most frequent variants

In [None]:
# determine required information per variant (mean throughput time, number of offers)
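One way to collect the per-variant information is to summarize each case first (variant, throughput time, number of distinct offers) and then aggregate per variant. This is a sketch on a hypothetical toy log. `OfferID` is an assumed column name, variants are encoded as activity names joined by ` -> ` for readability, and the number of offers is averaged per variant in case it is not constant within a variant.

```python
import pandas as pd

# Hypothetical toy log; "OfferID" is an assumed column name
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"],
    "concept:name": ["Created", "Accepted", "Created", "Accepted",
                     "Created", "Cancelled", "Created", "Accepted"],
    "OfferID": ["o1", "o1", "o2", "o2", "o3", "o3", "o4", "o4"],
    "time:timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-01", "2024-01-05",
        "2024-01-01", "2024-01-02", "2024-01-04", "2024-01-10",
    ]),
})

# Chronological order within each case, so variants are built correctly
log = log.sort_values(["case:concept:name", "time:timestamp"])
by_case = log.groupby("case:concept:name")

# One row per case
per_case = pd.DataFrame({
    "variant": by_case["concept:name"].apply(" -> ".join),
    "throughput_days": (by_case["time:timestamp"].max()
                        - by_case["time:timestamp"].min()).dt.total_seconds() / 86400,
    "n_offers": by_case["OfferID"].nunique(),
})

# One row per variant, most frequent variant first
per_variant = (
    per_case.groupby("variant")
    .agg(n_cases=("n_offers", "count"),
         mean_throughput_days=("throughput_days", "mean"),
         mean_offers=("n_offers", "mean"))
    .sort_values("n_cases", ascending=False)
)
print(per_variant)
```

`per_variant["mean_throughput_days"].plot(kind="bar")` would then give a bar chart of the mean throughput time per variant.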

In [None]:
# bar chart of the mean throughput time per variant

Throughput time and number of offers

In [None]:
# box plot of throughput time vs. number of offers
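For the box plot, the per-case throughput times only need to be grouped by the number of offers in the case. Below is a sketch on hypothetical per-case values; in the notebook these would come from the log filtered to the 35 most frequent variants.

```python
import pandas as pd

# Hypothetical per-case results: throughput time in days, distinct offers per case
per_case = pd.DataFrame({
    "throughput_days": [2.0, 4.0, 9.0, 12.0, 3.0],
    "n_offers": [1, 1, 2, 2, 1],
})

# One list of throughput times per offer count, in ascending offer count
offer_counts = sorted(per_case["n_offers"].unique().tolist())
groups = [per_case.loc[per_case["n_offers"] == n, "throughput_days"].tolist()
          for n in offer_counts]
print(offer_counts, groups)

# Drawing the plot, e.g. with pandas:
# per_case.boxplot(column="throughput_days", by="n_offers")
```

Passing `groups` to `plt.boxplot` gives the same picture with Matplotlib; the x tick labels then need to be set to `offer_counts` by hand.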