# Managing Event Logs with Declare4Py

This tutorial will go through the steps necessary to import and manage an event log.

## The `D4PyEventLog` class

The `Declare4Py.D4PyEventLog.D4PyEventLog` class manages `.xes` event logs. It provides utilities for importing an event log, retrieving useful information from it, exporting it in `.xes` format, converting it to a pandas DataFrame, and computing the frequent itemsets in the log.

We now instantiate a `D4PyEventLog`. Note that the name of the case id attribute is required.

In [22]:
import sys
import os
import pathlib
from pandas import DataFrame

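# Resolve the absolute path of the sibling "src" folder and add its parent
# (the repository root) to sys.path so that `src.Declare4Py` can be imported.
SCRIPT_DIR = pathlib.Path("..", "src").resolve()
sys.path.append(os.path.dirname(SCRIPT_DIR))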

from src.Declare4Py.D4PyEventLog import D4PyEventLog

event_log: D4PyEventLog = D4PyEventLog(case_name="case:concept:name")

The next step is to parse the log with the `parse_xes_log` function. Logs can be passed in either `.xes` or `.xes.gz` format.

In [23]:
log_path = os.path.join("..", "tests", "Sepsis Cases.xes.gz")

# Parse the XES log into the event log object
event_log.parse_xes_log(log_path)

parsing log, completed traces ::   0%|          | 0/1050 [00:00<?, ?it/s]
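
To give a feel for what parsing a (possibly gzipped) XES file involves, here is a minimal standard-library sketch. It is not how Declare4Py reads logs: the toy XES document, the `read_toy_xes` helper, and the restriction to `<string>` attributes are all assumptions made for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# A tiny, made-up XES document (not taken from the Sepsis log).
TOY_XES = """<?xml version="1.0" encoding="UTF-8"?>
<log>
  <trace>
    <string key="concept:name" value="A"/>
    <event>
      <string key="concept:name" value="ER Registration"/>
      <string key="org:group" value="A"/>
    </event>
    <event>
      <string key="concept:name" value="ER Triage"/>
      <string key="org:group" value="C"/>
    </event>
  </trace>
</log>
"""


def read_toy_xes(raw: bytes, compressed: bool) -> list:
    """Hypothetical helper: parse XES bytes into a list of trace dictionaries."""
    if compressed:
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    traces = []
    for trace_el in root.findall("trace"):
        # Direct <string> children of <trace> are trace-level attributes.
        attributes = {el.get("key"): el.get("value") for el in trace_el.findall("string")}
        # Each <event> holds its own key/value attributes.
        events = [
            {el.get("key"): el.get("value") for el in event_el.findall("string")}
            for event_el in trace_el.findall("event")
        ]
        traces.append({"attributes": attributes, "events": events})
    return traces


plain = TOY_XES.encode("utf-8")
toy_log = read_toy_xes(gzip.compress(plain), compressed=True)
print(len(toy_log), [event["concept:name"] for event in toy_log[0]["events"]])
```

The only difference between the two formats in this sketch is the decompression step, which is why the same call can accept both.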

Once the log has been successfully parsed, basic information becomes available, such as the log itself, its length, the case name, the concept name, and the timestamp name.

In [3]:
# Print the parsed log
print("This is the log:")
print(event_log.get_log())
print("--------------------------------------")

# Print the number of cases in the log
print("Number of cases:")
print(event_log.get_length())
print("--------------------------------------")

# Print the name of the case id attribute
print("Case name:")
print(event_log.get_case_name())
print("--------------------------------------")

# Print the name of the activity (concept name) attribute
print("Concept name:")
print(event_log.get_concept_name())
print("--------------------------------------")

# Print the name of the timestamp attribute
print("Timestamp name:")
print(event_log.get_timestamp_name())

This is the log:
[{'attributes': {'concept:name': 'A'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': True, 'SIRSCritTachypnea': True, 'Hypotensie': True, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 85, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': datetime.datetime(2014, 10, 22, 11, 15, 41, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200))), 'DiagnosticUrinaryCulture': True, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'A', 'Hypoxie': False, 'DiagnosticUrinarySediment': True, 'DiagnosticECG': True}, '..', {'org:group': 'E', 'lifecycle:transition': 'complete', 'concept:name': 'Release A', 'time:timestamp': datetime.datetime(

### The `get_trace` function

The `get_trace` function returns a trace given a numeric index.

In [6]:
event_log.get_trace(3)

{'attributes': {'concept:name': 'D'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': False, 'SIRSCritTachypnea': True, 'Hypotensie': False, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 70, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': datetime.datetime(2014, 7, 10, 11, 52, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200))), 'DiagnosticUrinaryCulture': False, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'D', 'Hypoxie': False, 'DiagnosticUrinarySediment': False, 'DiagnosticECG': True}, '..', {'org:group': '?', 'lifecycle:transition': 'complete', 'concept:name': 'Return ER', 'time:timestamp': datetime.datetime(2014, 7, 28, 17, 36
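
The printed trace has two parts: trace-level `attributes` and an ordered list of `events`. The sketch below navigates a toy trace with that shape using plain dictionaries. The values are invented, and the object actually returned by `get_trace` may expose a different access interface, so treat this only as a reading aid for the output above.

```python
# Toy trace mirroring the printed structure; the values are invented.
toy_trace = {
    "attributes": {"concept:name": "D"},
    "events": [
        {"concept:name": "ER Registration", "org:group": "A", "Age": 70},
        {"concept:name": "ER Triage", "org:group": "C"},
        {"concept:name": "Return ER", "org:group": "?"},
    ],
}

# The case identifier lives among the trace-level attributes.
case_id = toy_trace["attributes"]["concept:name"]

# The activity sequence is read from the events, in order.
activities = [event["concept:name"] for event in toy_trace["events"]]

# Event attributes are optional, so use .get() for keys such as 'Age'.
ages = [event.get("Age") for event in toy_trace["events"]]

print(case_id, activities, ages)
```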

### The `get_event_attribute_values` function

The `get_event_attribute_values` function returns all the values of an attribute that occur in an event log, along with their number of occurrences.

In [18]:
# Print the activity values that occur in the log along with their number of occurrences
print("Activity names:")
print(event_log.get_event_attribute_values(event_log.get_concept_name()))
print("--------------------------------------")

# Print the resource values that occur in the log along with their number of occurrences
print("Resources names:")
print(event_log.get_event_attribute_values('org:group'))

Activity names:
{'ER Registration': 1050, 'Leucocytes': 3383, 'CRP': 3262, 'LacticAcid': 1466, 'ER Triage': 1053, 'ER Sepsis Triage': 1049, 'IV Liquid': 753, 'IV Antibiotics': 823, 'Admission NC': 1182, 'Release A': 671, 'Return ER': 294, 'Admission IC': 117, 'Release B': 56, 'Release C': 25, 'Release D': 24, 'Release E': 6}
--------------------------------------
Resources names:
{'A': 3462, 'B': 8111, 'C': 1053, 'D': 47, 'E': 782, 'F': 216, 'G': 148, 'H': 55, '?': 294, 'I': 126, 'J': 26, 'K': 18, 'L': 213, 'M': 84, 'N': 46, 'O': 186, 'P': 59, 'Q': 63, 'R': 57, 'S': 33, 'T': 35, 'U': 18, 'V': 25, 'W': 55, 'X': 1, 'Y': 1}
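
Conceptually, these dictionaries are event-level counts: every event that carries the attribute contributes one occurrence of its value. A minimal sketch on a toy log follows. The `count_attribute_values` helper and the data are hypothetical, and skipping events that lack the attribute is an assumption rather than documented library behaviour.

```python
from collections import Counter

# Toy log: each trace is a list of event dictionaries (values invented).
toy_log = [
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "CRP", "org:group": "B"}],
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "CRP", "org:group": "B"},
     {"concept:name": "CRP"}],  # event without 'org:group'
]


def count_attribute_values(log, attribute):
    """Hypothetical helper: count how often each value of `attribute` occurs."""
    counts = Counter()
    for trace in log:
        for event in trace:
            if attribute in event:  # assumption: events lacking the attribute are skipped
                counts[event[attribute]] += 1
    return dict(counts)


activity_counts = count_attribute_values(toy_log, "concept:name")
resource_counts = count_attribute_values(toy_log, "org:group")
print(activity_counts)
print(resource_counts)
```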


### The `get_start_activities` function

The `get_start_activities` function returns all the activities that start the traces in the log. The function returns a dictionary where each starting activity is paired with the number of traces that start with that activity.

In [5]:
event_log.get_start_activities()

{'ER Registration': 995,
 'IV Liquid': 14,
 'ER Triage': 6,
 'CRP': 10,
 'ER Sepsis Triage': 7,
 'Leucocytes': 18}

### The `get_end_activities` function

The `get_end_activities` function returns all the activities that end the traces in the log. The function returns a dictionary where each ending activity is paired with the number of traces that end with that activity.

In [6]:
event_log.get_end_activities()

{'Release A': 393,
 'Return ER': 291,
 'IV Antibiotics': 87,
 'Release B': 55,
 'ER Sepsis Triage': 49,
 'Leucocytes': 44,
 'IV Liquid': 12,
 'Release C': 19,
 'CRP': 41,
 'LacticAcid': 24,
 'Release D': 14,
 'Admission NC': 14,
 'Release E': 5,
 'ER Triage': 2}
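
Unlike attribute value counts, start and end activities are counted once per trace. The sketch below shows that idea on a toy log of activity sequences. The helper names and the data are hypothetical, and this is not the library's implementation.

```python
from collections import Counter

# Toy log: each trace is reduced to its ordered list of activity names.
toy_traces = [
    ["ER Registration", "ER Triage", "Release A"],
    ["ER Registration", "CRP", "Release A"],
    ["CRP", "ER Registration", "Return ER"],
]


def count_start_activities(traces):
    """Hypothetical helper: count traces by their first activity."""
    return dict(Counter(trace[0] for trace in traces if trace))


def count_end_activities(traces):
    """Hypothetical helper: count traces by their last activity."""
    return dict(Counter(trace[-1] for trace in traces if trace))


starts = count_start_activities(toy_traces)
ends = count_end_activities(toy_traces)
print(starts)
print(ends)
```

In both dictionaries the counts add up to the number of non-empty traces, which is a quick sanity check when reading the Sepsis results above.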

### The `attribute_log_projection` function

A log is a complex data structure that can be explored along several dimensions. The `attribute_log_projection` function projects the cases in the log onto a given attribute. A projection is a list (the log) of lists (the single cases), where each inner list contains the values of that attribute for the events of the case.

In [None]:
# Activity projection
for idx, trace in enumerate(event_log.attribute_log_projection(event_log.get_concept_name())):
    print(f"{idx}- {trace}")
print("--------------------------------------")

# Resource projection
for idx, trace in enumerate(event_log.attribute_log_projection("org:group")):
    print(f"{idx}- {trace}")

### The `get_variants` function

The `get_variants` function returns all the variants of an event log as a dictionary: each key is a string encoding the variant, and each value is a list of all the traces that follow that variant. The following snippet prints the variants as strings.

In [10]:
for idx, variant in enumerate(event_log.get_variants().keys()):
    print(f"{idx}- {variant}")

0- ER Registration,Leucocytes,CRP,LacticAcid,ER Triage,ER Sepsis Triage,IV Liquid,IV Antibiotics,Admission NC,CRP,Leucocytes,Leucocytes,CRP,Leucocytes,CRP,CRP,Leucocytes,Leucocytes,CRP,CRP,Leucocytes,Release A
1- ER Registration,ER Triage,CRP,LacticAcid,Leucocytes,ER Sepsis Triage,IV Liquid,IV Antibiotics,Admission NC,CRP,CRP,Release A
2- ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,IV Liquid,IV Antibiotics,Admission NC,Admission NC,Leucocytes,CRP,Leucocytes,CRP,Release A
3- ER Registration,ER Triage,ER Sepsis Triage,CRP,LacticAcid,Leucocytes,IV Liquid,IV Antibiotics,Admission NC,Leucocytes,CRP,Release A,Return ER
4- ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,CRP,Leucocytes,LacticAcid,IV Antibiotics
5- ER Registration,ER Triage,ER Sepsis Triage,Leucocytes,CRP,LacticAcid,IV Antibiotics,IV Liquid,Admission NC,Release A
6- ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,IV Antibiotics,LacticAcid,CRP,Leucocytes,Admission NC,Leucocytes,CRP,Release A
7- ER Registr
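
A variant is a distinct sequence of activities, so grouping traces by their comma-joined activity names reproduces the key format seen above. The sketch below does this for a toy log. The `group_variants` helper and the data are hypothetical, and the library may build its keys differently internally.

```python
from collections import defaultdict

# Toy log: each trace is reduced to its ordered list of activity names.
toy_traces = [
    ["ER Registration", "ER Triage", "CRP"],
    ["ER Registration", "CRP"],
    ["ER Registration", "ER Triage", "CRP"],
]


def group_variants(traces):
    """Hypothetical helper: map each comma-joined activity sequence to its traces."""
    variants = defaultdict(list)
    for trace in traces:
        variants[",".join(trace)].append(trace)
    return dict(variants)


variants = group_variants(toy_traces)

# Rank variants by how many traces follow them, most frequent first.
ranking = sorted(variants, key=lambda key: len(variants[key]), reverse=True)
for key in ranking:
    print(len(variants[key]), key)
```

Because each value is the list of matching traces, its length gives the variant frequency.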

### The `to_dataframe` function

The event log can be converted to a pandas DataFrame with the `to_dataframe` function.

In [20]:
df_log: DataFrame = event_log.to_dataframe()
df_log.head()

|  | InfectionSuspected | org:group | DiagnosticBlood | DisfuncOrg | SIRSCritTachypnea | Hypotensie | SIRSCritHeartRate | Infusion | DiagnosticArtAstrup | concept:name | ... | DiagnosticLacticAcid | lifecycle:transition | Diagnose | Hypoxie | DiagnosticUrinarySediment | DiagnosticECG | case:concept:name | Leucocytes | CRP | LacticAcid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | A | True | True | True | True | True | True | True | ER Registration | ... | True | complete | A | False | True | True | A |  |  |  |
| 1 |  | B |  |  |  |  |  |  |  | Leucocytes | ... |  | complete |  |  |  |  | A | 9.6 |  |  |
| 2 |  | B |  |  |  |  |  |  |  | CRP | ... |  | complete |  |  |  |  | A |  | 21.0 |  |
| 3 |  | B |  |  |  |  |  |  |  | LacticAcid | ... |  | complete |  |  |  |  | A |  |  | 2.2 |
| 4 |  | C |  |  |  |  |  |  |  | ER Triage | ... |  | complete |  |  |  |  | A |  |  |  |
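
The table has one row per event, with the case identifier repeated in a `case:concept:name` column and empty cells wherever an event lacks an attribute. The standard-library sketch below shows that flattening on a toy log. The `flatten` helper and the data are hypothetical, and it only mimics the shape of the result, not the library's conversion code.

```python
# Toy log with the attributes/events layout (values invented).
toy_log = [
    {
        "attributes": {"concept:name": "A"},
        "events": [
            {"concept:name": "ER Registration", "org:group": "A"},
            {"concept:name": "Leucocytes", "org:group": "B", "Leucocytes": 9.6},
        ],
    },
]


def flatten(log):
    """Hypothetical helper: one row per event, trace attributes prefixed with 'case:'."""
    rows = []
    for trace in log:
        case_columns = {f"case:{key}": value for key, value in trace["attributes"].items()}
        for event in trace["events"]:
            rows.append({**event, **case_columns})
    return rows


rows = flatten(toy_log)

# The column set is the union of all keys; rows missing a key have no value there.
columns = sorted({key for row in rows for key in row})
print(columns)
for row in rows:
    print([row.get(column) for column in columns])
```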


### The `save_xes` function

The event log can be saved in `.xes` format with the `save_xes` function.

In [21]:
event_log.save_xes("saved_log.xes")

exporting log, completed traces ::   0%|          | 0/1050 [00:00<?, ?it/s]

### The `compute_frequent_itemsets` function

The `D4PyEventLog` class supports computing the frequent itemsets of attributes in the log. The `compute_frequent_itemsets` function takes as input the `min_support` of the itemsets, the name of the case id attribute (`case_id_col`), a list with the names of the attributes for which itemsets should be discovered (`categorical_attributes`), the `algorithm` used for the computation (`fpgrowth` or `apriori`), and `len_itemset`, the maximum length of the itemsets (default `None`).

In [19]:
frequent_itemsets = event_log.compute_frequent_itemsets(
    min_support=0.8,
    case_id_col=event_log.get_case_name(),
    categorical_attributes=['concept:name'],
    algorithm='fpgrowth',
    len_itemset=3,
)
frequent_itemsets



|  | support | itemsets | length |
|---|---|---|---|
| 0 | 1.0 | (concept:name_ER Triage) | 1 |
| 1 | 1.0 | (concept:name_ER Registration) | 1 |
| 2 | 0.999048 | (concept:name_ER Sepsis Triage) | 1 |
| 3 | 0.96381 | (concept:name_Leucocytes) | 1 |
| 4 | 0.959048 | (concept:name_CRP) | 1 |
| 5 | 0.819048 | (concept:name_LacticAcid) | 1 |
| 6 | 1.0 | (concept:name_ER Registration, concept:name_ER... | 2 |
| 7 | 0.999048 | (concept:name_ER Registration, concept:name_ER... | 2 |
| 8 | 0.999048 | (concept:name_ER Sepsis Triage, concept:name_E... | 2 |
| 9 | 0.999048 | (concept:name_ER Registration, concept:name_ER... | 3 |
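
The `support` column can be read as the fraction of cases in which all items of an itemset occur together. The sketch below computes it by brute-force enumeration on a toy set of cases. It deliberately uses neither `fpgrowth` nor `apriori`, and the `brute_force_itemsets` helper and the data are hypothetical.

```python
from itertools import combinations

# Toy data: for each case, the set of activities that occur in it (invented).
toy_cases = {
    "A": {"ER Registration", "ER Triage", "CRP"},
    "B": {"ER Registration", "ER Triage"},
    "C": {"ER Registration", "ER Triage", "CRP", "LacticAcid"},
    "D": {"ER Registration", "CRP"},
}


def brute_force_itemsets(cases, min_support, max_len):
    """Hypothetical helper: enumerate every itemset up to max_len and keep the frequent ones."""
    items = sorted(set().union(*cases.values()))
    frequent = {}
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            # Support = share of cases that contain every item of the itemset.
            hits = sum(set(itemset) <= present for present in cases.values())
            support = hits / len(cases)
            if support >= min_support:
                frequent[itemset] = support
    return frequent


itemsets = brute_force_itemsets(toy_cases, min_support=0.75, max_len=2)
for itemset, support in itemsets.items():
    print(support, itemset, len(itemset))
```

Exhaustive enumeration grows exponentially with the number of items, which is why dedicated algorithms such as the two offered by `compute_frequent_itemsets` are used on real logs.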
