# Managing Event Logs

This tutorial walks through the steps needed to import and manage an event log.

## The `D4PyEventLog` class

The `Declare4Py.D4PyEventLog.D4PyEventLog` class manages `.xes` event logs. It provides utility methods for importing an event log, retrieving useful information from it, exporting it in `.xes` format, converting it to a Pandas dataframe, and computing the frequent itemsets of activities or other attributes.

The following cell instantiates a `D4PyEventLog`. Note that the name of the case id attribute is required.

In [1]:
import os
from Declare4Py.D4PyEventLog import D4PyEventLog

event_log: D4PyEventLog = D4PyEventLog(case_name="case:concept:name")

The next step is to parse the log with the `parse_xes_log` method. Logs can be passed in either the `.xes` or the `.xes.gz` format.

In [6]:
log_path = os.path.join("../../../", "tests", "test_logs", "Sepsis Cases.xes.gz")

# Parse an XES log into an EventLog
event_log.parse_xes_log(log_path)
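
To make the file format more concrete, the following sketch reads a tiny, hypothetical XES document with the Python standard library only. It is not how `parse_xes_log` is implemented; it merely illustrates the log / trace / event nesting of an XES file and the fact that a `.xes.gz` file is the same XML compressed with gzip. The toy log omits the XES XML namespace for brevity, and all case ids and activity labels are made up.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical two-trace log, written without the XES XML namespace for brevity.
XES_TEXT = """<log>
  <trace>
    <string key="concept:name" value="case_1"/>
    <event><string key="concept:name" value="register"/></event>
    <event><string key="concept:name" value="triage"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case_2"/>
    <event><string key="concept:name" value="register"/></event>
  </trace>
</log>"""

# Round-trip through gzip in memory to mimic the content of a `.xes.gz` file.
compressed = gzip.compress(XES_TEXT.encode("utf-8"))
root = ET.fromstring(gzip.decompress(compressed))

cases = {}
for trace in root.findall("trace"):
    # The trace-level concept:name attribute identifies the case.
    case_id = trace.find("string[@key='concept:name']").get("value")
    # Each event carries its own concept:name, the activity label.
    cases[case_id] = [
        event.find("string[@key='concept:name']").get("value")
        for event in trace.findall("event")
    ]

print(cases)
```

In this sketch the case id is read from the trace-level `concept:name` attribute, while each event has its own `concept:name` holding the activity label.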

Once the event log has been parsed successfully, basic information is available, such as the log itself, its length, the name of the case id attribute, the name of the activity (concept name) attribute and the name of the timestamp attribute.

In [7]:
# Print the parsed log
print("This is the log:")
print(event_log.get_log())
print("--------------------------------------")

# Print the number of cases in the log
print("Number of cases:")
print(event_log.get_length())
print("--------------------------------------")

# Print the name of the case id attribute
print("Case name:")
print(event_log.get_case_name())
print("--------------------------------------")

# Print the name of the activity (concept name) attribute
print("Concept name:")
print(event_log.get_concept_name())
print("--------------------------------------")

# Print the name of the timestamp attribute
print("Timestamp name:")
print(event_log.get_timestamp_name())

### The `get_trace` method

The `get_trace` method returns a trace given a numeric index.

In [8]:
event_log.get_trace(3)

### The `get_event_attribute_values` method

The `get_event_attribute_values` method returns all the values of an attribute that occur in an event log, along with their number of occurrences.

In [9]:
# Print the set of activity values that are in the log along with their number of occurrences
print("Activity names:")
print(event_log.get_event_attribute_values(event_log.get_concept_name()))
print("--------------------------------------")

# Print the set of resource values that are in the log along with their number of occurrences
print("Resource names:")
print(event_log.get_event_attribute_values('org:group'))
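
The counting described above can be sketched in plain Python. This is an illustration on a hypothetical toy log (a list of traces, each a list of event dictionaries), not the library's implementation; the helper name `attribute_value_counts` and the data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of events, each event a dict of attributes.
toy_log = [
    [{"concept:name": "a", "org:group": "X"}, {"concept:name": "b", "org:group": "Y"}],
    [{"concept:name": "a", "org:group": "X"}, {"concept:name": "c", "org:group": "X"}],
]


def attribute_value_counts(log, attribute):
    """Count how often each value of `attribute` occurs across all events."""
    counts = Counter(
        event[attribute]
        for trace in log
        for event in trace
        if attribute in event  # skip events that do not carry the attribute
    )
    return dict(counts)


activity_counts = attribute_value_counts(toy_log, "concept:name")
group_counts = attribute_value_counts(toy_log, "org:group")
print(activity_counts)
print(group_counts)
```

The same helper works for any event attribute, which mirrors how the method above is called once with the concept name and once with `org:group`.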

### The `get_start_activities` method

The `get_start_activities` method returns all the activities that start the traces in the log. The method returns a dictionary where each starting activity is paired with the number of traces that start with that activity.

In [10]:
event_log.get_start_activities()

### The `get_end_activities` method

The `get_end_activities` method returns all the activities that end the traces in the log. The method returns a dictionary where each ending activity is paired with the number of traces that end with that activity.

In [11]:
event_log.get_end_activities()
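
The dictionaries returned for start and end activities can be pictured with a minimal sketch that counts the first and last activity of each trace. The traces below are hypothetical and the two helper functions are illustrative names, not part of Declare4Py.

```python
from collections import Counter

# Hypothetical traces, each given as its sequence of activity labels.
toy_traces = [
    ["a", "b", "d"],
    ["a", "c", "d"],
    ["b", "c", "e"],
]


def start_activities(traces):
    """Map each activity to the number of traces that start with it."""
    return dict(Counter(trace[0] for trace in traces if trace))


def end_activities(traces):
    """Map each activity to the number of traces that end with it."""
    return dict(Counter(trace[-1] for trace in traces if trace))


starts = start_activities(toy_traces)
ends = end_activities(toy_traces)
print(starts)
print(ends)
```

Empty traces are skipped in this sketch; how the library treats them is not covered here.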

### The `attribute_log_projection` method

A log is a complex data structure that can be explored along several dimensions. The `attribute_log_projection` method projects the cases in the log onto the given attribute. A projection is a list (the log) of lists (the single cases), each containing the values that the attribute takes in the events of that case.

In [12]:
# Activity projection
for idx, trace in enumerate(event_log.attribute_log_projection(event_log.get_concept_name())):
    print(f"{idx}- {trace}")
print("--------------------------------------")

# Resource projection
for idx, trace in enumerate(event_log.attribute_log_projection("org:group")):
    print(f"{idx}- {trace}")
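
As a rough picture of the list-of-lists structure described above, the following sketch projects a hypothetical toy log onto one attribute. It assumes every event carries the requested attribute; the helper name `project` is made up for the example and is not the library's code.

```python
# Hypothetical toy log: each trace is a list of events, each event a dict of attributes.
toy_log = [
    [{"concept:name": "a", "org:group": "X"}, {"concept:name": "b", "org:group": "Y"}],
    [{"concept:name": "a", "org:group": "X"}],
]


def project(log, attribute):
    """Return one list per trace holding the attribute value of each event, in order."""
    return [[event[attribute] for event in trace] for trace in log]


activity_projection = project(toy_log, "concept:name")
group_projection = project(toy_log, "org:group")
print(activity_projection)
print(group_projection)
```

The outer list keeps one entry per case and the inner lists preserve the event order, so the projection has the same shape as the log itself.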

### The `get_variants` method

The `get_variants` method returns all the variants of an event log. It returns a dictionary where each key is a string expressing a variant and the corresponding value is a list containing all the traces that follow that variant. The following snippet of code prints the variants as strings.

In [13]:
for idx, variant in enumerate(event_log.get_variants().keys()):
    print(f"{idx}- {variant}")
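
A variant is a distinct sequence of activities, so grouping traces by variant amounts to grouping them by that sequence. The sketch below shows the idea on hypothetical data. Two simplifications are assumptions of the example rather than facts about the library: the key is built by joining activity labels with commas, and the values hold case ids instead of full traces.

```python
from collections import defaultdict

# Hypothetical cases, each mapped to its sequence of activity labels.
toy_cases = {
    "case_1": ["a", "b", "c"],
    "case_2": ["a", "c"],
    "case_3": ["a", "b", "c"],
}


def group_variants(cases):
    """Group case ids by the string form of their activity sequence."""
    variants = defaultdict(list)
    for case_id, activities in cases.items():
        variants[",".join(activities)].append(case_id)
    return dict(variants)


variants = group_variants(toy_cases)
for idx, (variant, case_ids) in enumerate(variants.items()):
    print(f"{idx}- {variant}: {len(case_ids)} case(s)")
```

The number of keys is the number of variants, and the length of each list tells how many cases share that behaviour.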

### The `to_dataframe` method

The event log can be converted to a Pandas dataframe with the `to_dataframe` method.

In [14]:
event_log.to_dataframe()
event_log.get_log().head()
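
In a tabular representation, an event log is typically stored as one row per event, with an extra column holding the case id. The sketch below builds such rows from a hypothetical nested log using plain dictionaries, so it runs without Pandas. The helper name `flatten` and the default column name are assumptions of the example, not the library's conversion routine.

```python
# Hypothetical nested log: case id mapped to the list of its events.
toy_cases = {
    "case_1": [{"concept:name": "a"}, {"concept:name": "b"}],
    "case_2": [{"concept:name": "a"}],
}


def flatten(cases, case_col="case:concept:name"):
    """Return one flat row per event, tagged with the id of the case it belongs to."""
    rows = []
    for case_id, events in cases.items():
        for event in events:
            rows.append({case_col: case_id, **event})
    return rows


rows = flatten(toy_cases)
for row in rows:
    print(row)
```

A list of dictionaries in this shape can be passed directly to the `pandas.DataFrame` constructor if a real dataframe is needed.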

### The `to_eventlog` method

The event log can be converted to an `EventLog` with the `to_eventlog` method.

In [15]:
event_log.to_eventlog()
event_log.get_log()

### The `save_xes` method

The event log can be saved in `.xes` format with the `save_xes` method.

In [16]:
event_log.save_xes("saved_log.xes")

### The `compute_frequent_itemsets` method

The `D4PyEventLog` class supports computing the frequent itemsets of attributes in the log. The `compute_frequent_itemsets` method takes as input the `min_support` of the itemsets, the name of the case id attribute (`case_id_col`), a list with the names of the attributes for which to discover itemsets (`categorical_attributes`), the `algorithm` used for the computation (either `fpgrowth` or `apriori`) and `len_itemset`, the maximum length of the itemsets (default `None`).

In [17]:
frequent_itemsets = event_log.compute_frequent_itemsets(
    min_support=0.8,
    case_id_col=event_log.get_case_name(),
    categorical_attributes=['concept:name'],
    algorithm='fpgrowth',
    len_itemset=3,
)
frequent_itemsets
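
To clarify what `min_support` and `len_itemset` mean, here is a minimal sketch that enumerates itemsets by brute force on hypothetical traces. It is neither `fpgrowth` nor `apriori`, and it is not the library's code. It assumes that each case is treated as one transaction (the set of values occurring in it) and that the support of an itemset is the fraction of cases containing all of its items.

```python
from itertools import combinations

# Hypothetical traces, each given as its sequence of activity labels.
toy_traces = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a", "b", "c"],
    ["b", "c"],
]


def frequent_itemsets(traces, min_support, len_itemset=None):
    """Return {itemset: support} for every itemset with support >= min_support."""
    # Assumption: one transaction per case, ignoring order and repetitions.
    transactions = [frozenset(trace) for trace in traces]
    items = sorted(set().union(*transactions))
    max_len = len_itemset if len_itemset is not None else len(items)
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            hits = sum(1 for t in transactions if t.issuperset(candidate))
            support = hits / len(transactions)
            if support >= min_support:
                result[candidate] = support
    return result


pairs_and_singles = frequent_itemsets(toy_traces, min_support=0.6, len_itemset=2)
for itemset, support in pairs_and_singles.items():
    print(itemset, support)
```

Raising `min_support` shrinks the result, while `len_itemset` caps the size of the itemsets that are considered. Brute-force enumeration grows exponentially with the number of distinct items, which is why dedicated algorithms are used on real logs.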