# PM4Py

PM4Py is an open-source Python library built by the Fraunhofer Institute for Applied Information Technology to support process mining.


To read more about it, please refer to [this article](https://analyticsindiamag.com/guide-to-pm4py-python-framework-for-process-mining-algorithms/).

## Installation

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install -U pm4py --user -q

In [None]:
# Restart the kernel so that the freshly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## Data Loading

This library supports tabular input such as CSV with the help of pandas, but the recommended format for event logs is XES (eXtensible Event Stream). XES is an XML-based, hierarchical, tag-based log storage format standardized by the IEEE (IEEE 1849).
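
To make the hierarchy concrete, here is a minimal sketch that parses a tiny hand-written XES-like snippet with Python's standard library. The case name, activity names, and timestamps are made up for illustration, and the snippet omits the XML namespace and extension declarations that real XES files usually carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XES content: a log holds traces, a trace holds events.
XES_SNIPPET = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case_1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2013-01-01T09:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="approve transfer"/>
      <date key="time:timestamp" value="2013-01-01T09:05:00+00:00"/>
    </event>
  </trace>
</log>"""

root = ET.fromstring(XES_SNIPPET)
rows = []
for trace in root.findall("trace"):
    # The trace-level concept:name attribute identifies the case.
    case_id = trace.find("string[@key='concept:name']").get("value")
    for event in trace.findall("event"):
        # Each child tag of an event is one typed key/value attribute.
        attrs = {child.get("key"): child.get("value") for child in event}
        rows.append((case_id, attrs["concept:name"], attrs["time:timestamp"]))

for row in rows:
    print(row)
```

Each printed tuple corresponds to one event, which is also how the log looks once it is flattened into a table.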

Let’s load some bank transaction logs stored in XES format. The data comes from this [website](https://www.cs.upc.edu/~taymouri/dataset.html).

In [None]:
from pm4py.objects.log.importer.xes import importer as xes_importer
log = xes_importer.apply('https://gitlab.com/AnalyticsIndiaMagazine/practicedatasets/-/raw/main/PM4Py/banktransfer(2000-all-noise).xes')

If we prefer to analyse the data with pandas, we can convert the imported log to a DataFrame as follows.


In [None]:
import pandas as pd
from pm4py.objects.conversion.log import converter as log_converter
df = log_converter.apply(log, variant=log_converter.Variants.TO_DATA_FRAME)
# df.to_csv('banktransfer')
df 
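
The converted table holds one row per event. As a rough sketch of how such rows regroup into traces, the example below uses plain dictionaries in place of DataFrame rows. It assumes the standard XES column names `case:concept:name`, `concept:name`, and `time:timestamp`; the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical event rows; the keys mirror the standard XES attribute names.
rows = [
    {"case:concept:name": "c1", "concept:name": "register", "time:timestamp": "2013-01-01T09:00:00"},
    {"case:concept:name": "c2", "concept:name": "register", "time:timestamp": "2013-01-01T09:02:00"},
    {"case:concept:name": "c1", "concept:name": "approve", "time:timestamp": "2013-01-01T09:05:00"},
]

def group_into_traces(rows):
    """Return {case id: [activity, ...]} with activities in time order."""
    traces = defaultdict(list)
    # ISO 8601 strings in the same format sort chronologically as plain text.
    for row in sorted(rows, key=lambda r: r["time:timestamp"]):
        traces[row["case:concept:name"]].append(row["concept:name"])
    return dict(traces)

traces = group_into_traces(rows)
print(traces)
```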

We can see that the three most important attributes (case ID, timestamp, and event name) are present. Let us reduce the number of rows by limiting the number of traces, which can be done with PM4Py’s own suite of filtering functions.

In [None]:
from pm4py.algo.filtering.log.timestamp import timestamp_filter
filtered_log = timestamp_filter.filter_traces_contained(log, "2013-01-01 00:00:00", "2020-01-01 23:59:59") 
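
To illustrate what this kind of filter does, here is a small pure-Python sketch. It assumes that a trace counts as "contained" when both its first and its last event lie inside the interval; `keep_contained_traces` and the sample data are hypothetical and not part of PM4Py.

```python
from datetime import datetime

def keep_contained_traces(traces, start, end):
    """Keep only traces whose events all fall inside [start, end]."""
    kept = {}
    for case_id, timestamps in traces.items():
        if timestamps and start <= min(timestamps) and max(timestamps) <= end:
            kept[case_id] = timestamps
    return kept

# Hypothetical traces, each reduced to the timestamps of its events.
sample = {
    "c1": [datetime(2013, 1, 5), datetime(2013, 2, 1)],     # fully inside
    "c2": [datetime(2012, 12, 31), datetime(2013, 1, 2)],   # starts too early
    "c3": [datetime(2019, 12, 30), datetime(2020, 1, 3)],   # ends too late
}

kept = keep_contained_traces(
    sample, datetime(2013, 1, 1), datetime(2020, 1, 1, 23, 59, 59)
)
print(sorted(kept))
```

A trace that only partly overlaps the interval is dropped as a whole rather than truncated, so the remaining traces stay complete.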

## Model Discovery

PM4Py supports three formalisms for representing process models: Petri nets (place/transition nets), directly-follows graphs, and process trees. We will confine ourselves to Petri nets in this article. A Petri net is a directed bipartite graph of places and transitions; tokens held in places determine which transitions may fire, and the initial and final markings describe where the tokens sit at the start and at the end of the process.

Petri nets can be obtained using several different mining algorithms. We will use one of them, the Alpha Miner.

In [None]:
from pm4py.algo.discovery.alpha import algorithm as alpha_miner
net, initial_marking, final_marking = alpha_miner.apply(filtered_log) 
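
The Alpha Miner builds its net from ordering relations between activities. The sketch below covers only that first step, not the full algorithm: it collects the directly-follows pairs and derives the causal relation (a is directly followed by b, but never the other way round). The function names and sample traces are hypothetical.

```python
def directly_follows(traces):
    """Return all pairs (a, b) where b occurs immediately after a in some trace."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs

def causal_relations(traces):
    """Return pairs (a, b) where a is directly followed by b but never b by a."""
    follows = directly_follows(traces)
    return {(a, b) for (a, b) in follows if (b, a) not in follows}

# Hypothetical log: b and c occur in both orders, so they look parallel.
sample_traces = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
]

causal = causal_relations(sample_traces)
print(sorted(causal))
```

Because `b` and `c` follow each other in both directions, neither pair is causal; the full algorithm treats such activities as concurrent and places them on parallel branches.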

## Visualizing a Petri Net

In [None]:
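try:  # newer PM4Py releases name this module petri_net
    from pm4py.visualization.petri_net import visualizer as pn_visualizer
except ImportError:  # older releases use petrinet
    from pm4py.visualization.petrinet import visualizer as pn_visualizer
gviz = pn_visualizer.apply(net, initial_marking, final_marking)
pn_visualizer.view(gviz)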

## Conformance Checking

The following example performs conformance checking. We generate a model from a part of the log and then validate the entire log against it.

In [None]:
from pm4py.algo.conformance.tokenreplay import algorithm as token_based_replay
from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics
from pm4py.algo.discovery.inductive import algorithm as inductive_miner
from pm4py.algo.filtering.log.auto_filter.auto_filter import apply_auto_filter

# Generate the model using only a part of the log
filtered_log = apply_auto_filter(log)
net, initial_marking, final_marking = inductive_miner.apply(filtered_log)

# Check the entire log for conformance with the model.
# DISABLE_VARIANTS replays every trace individually; ENABLE_PLTR_FITNESS
# additionally returns per-place and per-transition diagnostics.
tbr_params = token_based_replay.Variants.TOKEN_REPLAY.value.Parameters
parameters_tbr = {
    tbr_params.DISABLE_VARIANTS: True,
    tbr_params.ENABLE_PLTR_FITNESS: True,
}
replayed_traces, place_fitness, trans_fitness, unwanted_activities = token_based_replay.apply(
    log, net, initial_marking, final_marking, parameters=parameters_tbr
)
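
For intuition about what token-based replay measures, here is a simplified, self-contained sketch on a toy net. It is not PM4Py's implementation: the net encoding and all names are hypothetical, and it assumes every activity in the trace has exactly one matching transition. Fitness follows the usual formula 0.5 * (1 - missing/consumed) + 0.5 * (1 - remaining/produced).

```python
from collections import Counter

def replay_fitness(trace, net, initial, final):
    """Replay one trace on a net given as {activity: (input places, output places)}."""
    marking = Counter(initial)
    produced = sum(initial.values())  # tokens of the initial marking count as produced
    consumed = missing = 0

    def consume(place):
        nonlocal consumed, missing
        if marking[place] == 0:
            missing += 1          # token had to be inserted artificially
            marking[place] += 1
        marking[place] -= 1
        consumed += 1

    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            consume(place)
        for place in outputs:
            marking[place] += 1
            produced += 1

    # The final marking is consumed at the end of the replay.
    for place, count in final.items():
        for _ in range(count):
            consume(place)

    remaining = sum(marking.values())  # tokens left behind in the net
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Hypothetical sequential net: start -a-> p1 -b-> p2 -c-> end
toy_net = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["p2"]),
    "c": (["p2"], ["end"]),
}
initial = {"start": 1}
final = {"end": 1}

fit_ok = replay_fitness(["a", "b", "c"], toy_net, initial, final)
fit_skip = replay_fitness(["a", "c"], toy_net, initial, final)
print(fit_ok, fit_skip)
```

Skipping `b` forces one missing token in `p2` and leaves one token behind in `p1`, which is why the second trace scores lower than the conforming one.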

In [None]:
# Display diagnostics information
act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
for act in act_diagnostics:
    print(act, act_diagnostics[act]) 