In [1]:
import pm4py
pm4py.__version__

'1.2.13'

# Performance Analysis

> Which parts of the process have the biggest influence on the total case duration?

## Data loading

Import the preprocessed event log and the filtered Petri net from Q1.

In [26]:
import os
import pandas as pd
from pm4py.objects.log.importer.xes import factory as xes_import_factory
from pm4py.objects.petri.importer import factory as pnml_importer

PROJ_ROOT = os.path.abspath(os.path.pardir)

# import the renamed event log from Q1
event_log = xes_import_factory.apply(os.path.join(PROJ_ROOT, "data", "processed_log.xes"))

# import the filtered Petri net obtained in Q1
pnml_path = os.path.join(PROJ_ROOT, 'results', 'Q1', 'filtered_petri.pnml')
net, initial_marking, final_marking = pnml_importer.apply(pnml_path)

---

# a)
> Provide and briefly describe results of your performance analysis. Remember to also consider your current results which may give you a good entry point for a deeper analysis.

We start our exploration of the process performance by evaluating single activities. First, we compute the mean duration of every activity; the results are shown below.
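
The `@@duration` column is already present in the CSV export of the log. If a log only recorded a start and a completion timestamp per activity instance, a comparable per-activity mean duration could be derived roughly as in the following minimal sketch. The column names `case_id`, `activity`, `start` and `end` and the toy rows are hypothetical; durations come out in seconds.

```python
import pandas as pd

# Toy event table with hypothetical column names: one row per activity
# instance, holding its start and completion timestamp.
events = pd.DataFrame({
    "case_id": ["c1", "c1", "c2", "c2"],
    "activity": ["Register", "Initial Exam", "Register", "Initial Exam"],
    "start": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 08:20",
                             "2020-01-02 09:00", "2020-01-02 09:15"]),
    "end": pd.to_datetime(["2020-01-01 08:10", "2020-01-01 08:35",
                           "2020-01-02 09:10", "2020-01-02 09:31"]),
})

# Duration of every activity instance in seconds.
events["duration"] = (events["end"] - events["start"]).dt.total_seconds()

# Mean duration per activity, longest first (same shape as the table below).
mean_durations = (
    events.groupby("activity")["duration"].mean().sort_values(ascending=False)
)
print(mean_durations)
```

Here _Register_ takes 600 s in both cases and _Initial Exam_ takes 900 s and 960 s, so the sketch reports means of 600 s and 930 s.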

In [38]:
df_log = pd.read_csv(os.path.join(PROJ_ROOT, "data", "log.csv"))

# mean duration per activity, longest first
df_log.groupby(["Activity"])[["@@duration"]].mean().sort_values(by=["@@duration"], ascending=False)

| Activity | Mean `@@duration` |
|---|---|
| Prescripe Special Medication | 3605.705128 |
| Test III | 1802.110398 |
| Treatment A2 | 1797.421006 |
| Treatment A1 | 1200.385571 |
| Initial Exam | 902.963092 |
| Treatment B | 899.318141 |
| Register Facility | 601.992352 |
| Treatment A3 | 601.773109 |
| Referral | 600.379962 |
| Register | 600.366481 |


We observe that _Prescripe Special Medication_ has by far the largest average duration, about twice that of the next activity, _Test III_. In general, the activities related to treatments take the longest.

Administrative tasks such as _Register_ and _Register Facility_ also take a noticeable amount of time (about 600 on average), while examination decisions and treatment checks have the shortest durations.

Next, we investigate whether the duration of an activity varies strongly between cases. A small standard deviation suggests that the activity requires roughly the same amount of time in every case.

In [69]:
# standard deviation of the duration per activity, five largest shown
df_log.groupby(["Activity"])[["@@duration"]].std().sort_values(by=["@@duration"], ascending=False).head()


| Activity | Std. dev. of `@@duration` |
|---|---|
| Prescripe Special Medication | 150.46578 |
| Test III | 118.784907 |
| Treatment A2 | 116.362698 |
| Initial Exam | 91.817365 |
| Treatment A1 | 89.089785 |


The standard deviations are small compared to the mean durations: even the five largest amount to only about 4% (_Prescripe Special Medication_) to 10% (_Initial Exam_) of the respective mean. Hence, the time a specific activity requires hardly varies between cases.
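
To make "small" concrete, the standard deviation can be expressed relative to the mean, i.e. as the coefficient of variation. The following quick check does this for the five activities in the standard-deviation table, using the means and standard deviations printed above; it is an illustrative calculation only.

```python
# Mean and standard deviation of @@duration, copied from the two tables above
# (only the five activities listed in the standard-deviation table).
stats = {
    "Prescripe Special Medication": (3605.705128, 150.46578),
    "Test III": (1802.110398, 118.784907),
    "Treatment A2": (1797.421006, 116.362698),
    "Initial Exam": (902.963092, 91.817365),
    "Treatment A1": (1200.385571, 89.089785),
}

# Coefficient of variation: standard deviation relative to the mean.
cv = {activity: std / mean for activity, (mean, std) in stats.items()}

# Print from most to least variable.
for activity, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{activity:<30} {value:.1%}")
```

_Initial Exam_ comes out as the most variable activity at roughly 10%, while _Prescripe Special Medication_, despite the largest absolute standard deviation, is the most stable at roughly 4%.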

In [24]:
from pm4py.visualization.petrinet import factory as pn_visualizer

# annotate the filtered Petri net with performance measures computed on the event log
# (the visualizer's default parameters are used)
perf_net_vis = pn_visualizer.apply(net, initial_marking, final_marking=final_marking,
                                   variant=pn_visualizer.PERFORMANCE_DECORATION, log=event_log)

figures_dir = os.path.join(PROJ_ROOT, 'report', 'figures')
os.makedirs(figures_dir, exist_ok=True)

# enlarge the places: replace the node width of 0.75 with 1 in the Graphviz source
old_attr = 'node [fixedsize=true shape=circle width=0.75]'
new_attr = 'node [fixedsize=true shape=circle width=1]'
perf_net_vis.body = [line.replace(old_attr, new_attr) for line in perf_net_vis.body]

perf_net_vis.render(os.path.join(figures_dir, 'q3_perf_petrinet'),
                    format='pdf',
                    view=True)

'/Users/Tom/Documents/Uni/4. Semester M/Advanced Process Mining/Assignments/Assignment 1/APM-A1/report/figures/q3_perf_petrinet.pdf'

We annotated the Petri net obtained in Q1 from the preprocessed log with a performance metric: the mean time between two consecutive events.
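
pm4py obtains these annotations by replaying the log on the Petri net. A simplified way to get a similar picture directly from the log, without replay, is to average the time between directly-following events for each pair of activities. The minimal sketch below does this for a small hypothetical log; it computes a directly-follows view rather than pm4py's replay-based decoration, so its numbers need not match the annotated net.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Toy log with hypothetical content: case id -> list of (activity, timestamp),
# already ordered by timestamp.
log = {
    "c1": [("Register", datetime(2020, 1, 1, 8, 0)),
           ("Initial Exam", datetime(2020, 1, 1, 8, 30)),
           ("Treatment A1", datetime(2020, 1, 1, 10, 0))],
    "c2": [("Register", datetime(2020, 1, 2, 9, 0)),
           ("Initial Exam", datetime(2020, 1, 2, 9, 50))],
}

# Collect the time (in seconds) between every pair of consecutive events.
gaps = defaultdict(list)
for trace in log.values():
    for (src, t_src), (dst, t_dst) in zip(trace, trace[1:]):
        gaps[(src, dst)].append((t_dst - t_src).total_seconds())

# Mean time per directly-follows pair, largest first.
mean_gaps = {pair: mean(values) for pair, values in gaps.items()}
for (src, dst), seconds in sorted(mean_gaps.items(), key=lambda kv: -kv[1]):
    print(f"{src} -> {dst}: {seconds / 60:.1f} min")
```

Pairs with a large mean gap are natural bottleneck candidates; in the toy log, _Initial Exam -> Treatment A1_ (90 min) dominates _Register -> Initial Exam_ (40 min on average).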

In [72]:
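# optional: keep only cases whose total duration lies between 1 day (86400 s) and 10 days (864000 s)
#from pm4py.algo.filtering.log.cases import case_filter

#filtered_log = case_filter.filter_on_case_performance(event_log, 86400, 864000)
#len(filtered_log)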
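
The commented-out cell uses pm4py's `case_filter.filter_on_case_performance` to keep cases whose total duration lies between 86400 s (1 day) and 864000 s (10 days). The underlying idea can be sketched with the standard library as follows; the helper `filter_on_case_duration` and the toy log are hypothetical, and the bounds are treated as inclusive here, which may differ in detail from pm4py.

```python
from datetime import datetime

# Toy log with hypothetical content: case id -> event timestamps.
log = {
    "c1": [datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 12, 0)],  # 4 hours
    "c2": [datetime(2020, 1, 2, 9, 0), datetime(2020, 1, 5, 9, 0)],   # 3 days
    "c3": [datetime(2020, 1, 3, 9, 0), datetime(2020, 1, 20, 9, 0)],  # 17 days
}


def filter_on_case_duration(log, min_seconds, max_seconds):
    """Keep cases whose first-to-last-event duration lies in [min_seconds, max_seconds]."""
    kept = {}
    for case_id, timestamps in log.items():
        duration = (max(timestamps) - min(timestamps)).total_seconds()
        if min_seconds <= duration <= max_seconds:
            kept[case_id] = timestamps
    return kept


# Same bounds as the commented-out call: 1 day to 10 days.
filtered = filter_on_case_duration(log, 86400, 864000)
print(sorted(filtered))
```

Only `c2` survives: `c1` finishes within a day and `c3` takes longer than ten days.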

---

# b)
> Discuss insights obtained from your analysis, for example identify bottlenecks, and discuss their impact.