# Example 3

The focus of this example is converting synthetic IT ticket data into a process architecture.

**Use Case**: Convert semi-structured events into a system process diagram

## Workflow

- Parse historical logs into PM4PY format
- Run a process mining algorithm
- Identify anomalies
- Token-based replay for simulation?

In [1]:
import pandas as pd
import pm4py

In [2]:
"""Parse CSV log into PM4PY format."""

dataframe = pd.read_csv("./synthetic_data.csv")
dataframe = pm4py.format_dataframe(
    dataframe, case_id="ticket_id", activity_key="activity", timestamp_key="timestamp"
)
event_log = pm4py.convert_to_event_log(dataframe)

dataframe.head()

| | event_id | timestamp | ticket_id | activity | assigned_to | priority | category | department | notes | case:concept:name | concept:name | time:timestamp | @@index | @@case_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024-01-15 08:42:13+00:00 | TKT-20240115-001 | Ticket Created | Unassigned | Medium | Software | Finance | Excel crashing when opening large files | TKT-20240115-001 | Ticket Created | 2024-01-15 08:42:13+00:00 | 0 | 0 |
| 1 | 2 | 2024-01-15 08:55:28+00:00 | TKT-20240115-001 | Ticket Assigned | Pat Anderson | Medium | Software | Finance | Assigned to L1 support | TKT-20240115-001 | Ticket Assigned | 2024-01-15 08:55:28+00:00 | 1 | 0 |
| 2 | 3 | 2024-01-15 09:10:45+00:00 | TKT-20240115-001 | Work Started | Pat Anderson | Medium | Software | Finance | Checking Office version and updates | TKT-20240115-001 | Work Started | 2024-01-15 09:10:45+00:00 | 2 | 0 |
| 3 | 4 | 2024-01-15 10:22:30+00:00 | TKT-20240115-001 | Ticket Resolved | Pat Anderson | Medium | Software | Finance | Increased memory allocation in Excel settings | TKT-20240115-001 | Ticket Resolved | 2024-01-15 10:22:30+00:00 | 3 | 0 |
| 4 | 5 | 2024-01-15 10:35:12+00:00 | TKT-20240115-001 | Ticket Closed | Pat Anderson | Medium | Software | Finance | User confirmed issue resolved | TKT-20240115-001 | Ticket Closed | 2024-01-15 10:35:12+00:00 | 4 | 0 |


In [3]:
"""Discover a performance directly-follows graph (DFG) from the event log."""

from IPython.display import HTML, display


# Discover performance DFG - returns tuple (dfg_dict, start_activities, end_activities)
performance_dfg, start_activities, end_activities = pm4py.discover_performance_dfg(
    event_log
)

In [4]:
# Visualize with performance (time) annotations
output_file = "./example_3_dfg_performance.svg"
pm4py.save_vis_performance_dfg(
    performance_dfg,
    start_activities,
    end_activities,
    output_file,
    rankdir="TB",  # Graphviz top-to-bottom layout ("TD" is not a valid Graphviz rankdir)
)
print(f"Performance DFG saved to: {output_file}")

# Display
with open(output_file, "r") as f:
    svg_content = f.read()
    display(HTML(svg_content))

Performance DFG saved to: ./example_3_dfg_performance.svg
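
To make the idea behind a performance DFG concrete, here is a minimal plain-Python sketch. It is not PM4PY's implementation: it only groups events by case, walks each trace in timestamp order, and averages the elapsed seconds for every directly-follows pair. The mini-log and the function name `performance_dfg_sketch` are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical mini-log: (case id, activity, timestamp). Values are illustrative only.
events = [
    ("T1", "Ticket Created", "2024-01-15 08:00:00"),
    ("T1", "Ticket Assigned", "2024-01-15 08:10:00"),
    ("T1", "Ticket Closed", "2024-01-15 09:10:00"),
    ("T2", "Ticket Created", "2024-01-15 10:00:00"),
    ("T2", "Ticket Assigned", "2024-01-15 10:30:00"),
    ("T2", "Ticket Closed", "2024-01-15 10:50:00"),
]


def performance_dfg_sketch(rows):
    """Return {(source, target): mean seconds between consecutive events}."""
    # Group events by case as (timestamp, activity) pairs
    cases = defaultdict(list)
    for case_id, activity, ts in rows:
        cases[case_id].append((datetime.fromisoformat(ts), activity))

    # Collect the elapsed time for every directly-follows pair
    durations = defaultdict(list)
    for trace in cases.values():
        trace.sort()  # order each trace by timestamp
        for (t0, a0), (t1, a1) in zip(trace, trace[1:]):
            durations[(a0, a1)].append((t1 - t0).total_seconds())

    return {edge: mean(values) for edge, values in durations.items()}


dfg = performance_dfg_sketch(events)
for (source, target), seconds in dfg.items():
    print(f"{source} -> {target}: {seconds:.0f} s")
```

Each edge in the rendered SVG carries this kind of aggregated duration between two activities; the sketch computes only the mean.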


# Cycle Time and Waiting Time Analysis

Reference: [PM4PY statistics examples](https://processintelligence.solutions/pm4py/examples/statistics)

PM4PY can calculate cycle time and lead time metrics for each event:

- **@@approx_bh_partial_cycle_time**: Incremental cycle time associated with the event (the cycle time of the last event is the cycle time of the instance)
- **@@approx_bh_partial_lead_time**: Incremental lead time associated with the event
- **@@approx_bh_overall_wasted_time**: Difference between partial lead time and partial cycle time values
- **@@approx_bh_this_wasted_time**: Wasted time for the activity defined by the 'interval' event
- **@approx_bh_ratio_cycle_lead_time**: Measures the incremental Flow Rate (between 0 and 1)
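
The relationship between these attributes can be sketched in a few lines. This follows the definitions listed above (wasted time is lead time minus cycle time, and the flow rate is the share of lead time spent in actual work). Treating a zero lead time as a flow rate of 1.0 is an assumption made here for illustration, and `flow_metrics` is a hypothetical helper, not a PM4PY function.

```python
def flow_metrics(lead_time_s, cycle_time_s):
    """Derive wasted time and flow rate from lead and cycle times (in seconds)."""
    wasted = lead_time_s - cycle_time_s
    # Assumption: a zero lead time is reported as a flow rate of 1.0
    ratio = cycle_time_s / lead_time_s if lead_time_s > 0 else 1.0
    return wasted, ratio


print(flow_metrics(7200.0, 1800.0))  # (5400.0, 0.25)
print(flow_metrics(6779.0, 0.0))     # (6779.0, 0.0)
print(flow_metrics(0.0, 0.0))        # (0.0, 1.0)
```

When the cycle time is zero, the whole lead time counts as wasted time and the flow rate drops to 0.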

In [None]:
from pm4py.objects.log.util import interval_lifecycle

# Enrich the event log with cycle time and lead time metrics
enriched_log = interval_lifecycle.assign_lead_cycle_time(event_log)

# Keep only the last event of each case (the closure event)
enriched_log_df = pd.DataFrame([item[-1] for item in enriched_log])

# Display sample of enriched data
enriched_log_df.head()

| | @@approx_bh_overall_wasted_time | @@approx_bh_partial_cycle_time | @@approx_bh_partial_lead_time | @@approx_bh_this_wasted_time | @@case_index | @@duration | @@index | @approx_bh_ratio_cycle_lead_time | activity | assigned_to | category | concept:name | department | event_id | notes | priority | start_timestamp | ticket_id | time:timestamp | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6779.0 | 0.0 | 6779.0 | 762.0 | 0 | 0.0 | 4 | 0.0 | Ticket Closed | Pat Anderson | Software | Ticket Closed | Finance | 5 | User confirmed issue resolved | Medium | 2024-01-15 10:35:12+00:00 | TKT-20240115-001 | 2024-01-15 10:35:12+00:00 | 2024-01-15 10:35:12+00:00 |
| 1 | 135289.0 | 0.0 | 135289.0 | 912.0 | 1 | 0.0 | 17 | 0.0 | Ticket Closed | Taylor Chen | Network | Ticket Closed | Engineering | 18 | Team confirmed VPN working properly | High | 2024-01-18 16:50:22+00:00 | TKT-20240115-002 | 2024-01-18 16:50:22+00:00 | 2024-01-18 16:50:22+00:00 |
| 2 | 4660.0 | 0.0 | 4660.0 | 275.0 | 2 | 0.0 | 22 | 0.0 | Ticket Closed | Sam Williams | Access | Ticket Closed | HR | 23 | User able to login successfully | Low | 2024-01-15 12:40:20+00:00 | TKT-20240115-003 | 2024-01-15 12:40:20+00:00 | 2024-01-15 12:40:20+00:00 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 0.0 | 30 | 1.0 | Ticket Closed | Casey Brown | Software | Ticket Closed | Operations | 31 | Post-incident review scheduled | Critical | 2024-01-16 06:20:33+00:00 | TKT-20240116-001 | 2024-01-16 06:20:33+00:00 | 2024-01-16 06:20:33+00:00 |
| 4 | 128965.0 | 0.0 | 128965.0 | 288.0 | 4 | 0.0 | 38 | 0.0 | Ticket Closed | Chris Johnson | Hardware | Ticket Closed | Sales | 39 | User confirmed all keys working | Medium | 2024-01-19 15:20:10+00:00 | TKT-20240116-002 | 2024-01-19 15:20:10+00:00 | 2024-01-19 15:20:10+00:00 |
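
In this sample, `@@duration` is 0 and `start_timestamp` equals `time:timestamp` for every row, which suggests each event is treated as instantaneous. That would explain why the partial cycle time is 0 and the whole lead time is reported as wasted time. As a cross-check, the elapsed wall-clock time per ticket can be computed directly from the raw timestamps. The sketch below is plain Python with a hypothetical helper, `elapsed_seconds_per_ticket`, and uses the first and last timestamps of ticket `TKT-20240115-001` from the parsed log shown earlier. It measures wall-clock time only, so it may differ from the `approx_bh` values if those are restricted to business hours (an assumption based on the `bh` prefix).

```python
from datetime import datetime

# First and last event of the first ticket, copied from the parsed log sample
rows = [
    {"ticket_id": "TKT-20240115-001", "timestamp": "2024-01-15 08:42:13+00:00"},
    {"ticket_id": "TKT-20240115-001", "timestamp": "2024-01-15 10:35:12+00:00"},
]


def elapsed_seconds_per_ticket(events):
    """Return {ticket_id: seconds between the first and last event of the ticket}."""
    first, last = {}, {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        tid = event["ticket_id"]
        first[tid] = min(first.get(tid, ts), ts)
        last[tid] = max(last.get(tid, ts), ts)
    return {tid: (last[tid] - first[tid]).total_seconds() for tid in first}


elapsed = elapsed_seconds_per_ticket(rows)
print(elapsed)  # {'TKT-20240115-001': 6779.0}
```

For this ticket the wall-clock result matches the reported partial lead time of 6779 seconds.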


## Aggregate Metrics by Priority Level

Analyze how ticket priority affects cycle time and efficiency.

In [32]:
# Aggregate by priority - only use the last event of each case (total ticket cycle time)
priority_metrics = (
    enriched_log_df.groupby("priority")
    .agg(
        {
            "@@approx_bh_partial_cycle_time": [
                "mean",
                "median",
                "std",
                "min",
                "max",
                "count",
            ],
            "@@approx_bh_overall_wasted_time": ["mean", "median", "std"],
            "@approx_bh_ratio_cycle_lead_time": ["mean", "median"],
        }
    )
    .round(2)
)

priority_metrics

| priority | @@approx_bh_partial_cycle_time (mean) | @@approx_bh_partial_cycle_time (median) | @@approx_bh_partial_cycle_time (std) | @@approx_bh_partial_cycle_time (min) | @@approx_bh_partial_cycle_time (max) | @@approx_bh_partial_cycle_time (count) | @@approx_bh_overall_wasted_time (mean) | @@approx_bh_overall_wasted_time (median) | @@approx_bh_overall_wasted_time (std) | @approx_bh_ratio_cycle_lead_time (mean) | @approx_bh_ratio_cycle_lead_time (median) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Critical | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6 | 15935.83 | 14852.5 | 12079.13 | 0.17 | 0.0 |
| High | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12 | 55624.83 | 25287.5 | 60892.53 | 0.0 | 0.0 |
| Low | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 16 | 22621.25 | 5102.5 | 51395.27 | 0.0 | 0.0 |
| Medium | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 18 | 49966.33 | 21302.5 | 53858.33 | 0.0 | 0.0 |


## Aggregate Metrics by Category

Compare cycle times across different ticket categories (Hardware, Software, Network, etc.).

In [None]:
category_metrics = (
    enriched_log_df.groupby("category")
    .agg(
        {
            "@@approx_bh_partial_cycle_time": [
                "mean",
                "median",
                "std",
                "min",
                "max",
                "count",
            ],
            "@@approx_bh_overall_wasted_time": ["mean", "median", "std"],
            "@approx_bh_ratio_cycle_lead_time": ["mean", "median"],
        }
    )
    .round(2)
)

category_metrics

| category | @@approx_bh_partial_cycle_time (mean) | @@approx_bh_partial_cycle_time (median) | @@approx_bh_partial_cycle_time (std) | @@approx_bh_partial_cycle_time (min) | @@approx_bh_partial_cycle_time (max) | @@approx_bh_partial_cycle_time (count) | @@approx_bh_overall_wasted_time (mean) | @@approx_bh_overall_wasted_time (median) | @@approx_bh_overall_wasted_time (std) | @approx_bh_ratio_cycle_lead_time (mean) | @approx_bh_ratio_cycle_lead_time (median) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Access | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8 | 45883.75 | 11540.0 | 56857.73 | 0.0 | 0.0 |
| Hardware | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 55385.56 | 33185.0 | 57542.61 | 0.0 | 0.0 |
| Network | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8 | 36236.75 | 15885.0 | 45298.48 | 0.0 | 0.0 |
| Performance | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6 | 46058.67 | 27177.5 | 57210.25 | 0.0 | 0.0 |
| Security | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5 | 16702.4 | 13062.0 | 11962.45 | 0.0 | 0.0 |
| Software | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 16 | 31821.81 | 6974.5 | 61389.6 | 0.06 | 0.0 |


## Aggregate Metrics by Assigned Technician

Analyze technician performance and workload distribution.

In [38]:
tech_metrics = (
    enriched_log_df.groupby("assigned_to")
    .agg(
        {
            "@@approx_bh_overall_wasted_time": ["mean", "median"],
            "@approx_bh_ratio_cycle_lead_time": ["mean"],
        }
    )
    .round(2)
)

tech_metrics

| assigned_to | @@approx_bh_overall_wasted_time (mean) | @@approx_bh_overall_wasted_time (median) | @approx_bh_ratio_cycle_lead_time (mean) |
|---|---|---|---|
| Alex Martinez | 59325.5 | 27390.0 | 0.0 |
| Casey Brown | 44236.17 | 29370.0 | 0.17 |
| Chris Johnson | 46278.33 | 5675.0 | 0.0 |
| Jordan Lee | 23525.0 | 23130.0 | 0.0 |
| Morgan Davis | 53597.43 | 18300.0 | 0.0 |
| Pat Anderson | 32016.12 | 5857.5 | 0.0 |
| Riley Thompson | 13186.67 | 10815.0 | 0.0 |
| Sam Williams | 5022.86 | 4825.0 | 0.0 |
| Taylor Chen | 50066.8 | 19680.0 | 0.0 |


## Overall Summary Statistics

Key metrics across all tickets.

In [41]:
# Overall summary statistics
overall_stats = {
    "Total Tickets": len(enriched_log_df),
    "Avg Cycle Time (hours)": round(
        enriched_log_df["@@approx_bh_partial_cycle_time"].mean() / 3600, 2
    ),
    "Median Cycle Time (hours)": round(
        enriched_log_df["@@approx_bh_partial_cycle_time"].median() / 3600, 2
    ),
    "Avg Wasted Time (hours)": round(
        enriched_log_df["@@approx_bh_overall_wasted_time"].mean() / 3600, 2
    ),
    "Avg Flow Rate": round(
        enriched_log_df["@approx_bh_ratio_cycle_lead_time"].mean(), 4
    ),
}

print("Overall Summary Statistics:")
print("=" * 80)
for key, value in overall_stats.items():
    print(f"{key:30s}: {value}")

Overall Summary Statistics:
Total Tickets                 : 52
Avg Cycle Time (hours)        : 0.0
Median Cycle Time (hours)     : 0.0
Avg Wasted Time (hours)       : 10.81
Avg Flow Rate                 : 0.0192
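
One way to read the average flow rate is as a simple fraction. The arithmetic below assumes that exactly one of the 52 tickets has a flow rate of 1.0 and every other ticket has 0.0. That assumption is not confirmed by the notebook, but it is consistent with the zero cycle times and with the group means reported earlier (0.17 for the 6 Critical tickets, 0.06 for the 16 Software tickets).

```python
# Assumption: exactly one ticket has flow rate 1.0; all other tickets have 0.0
tickets_with_ratio_one = 1

overall = round(tickets_with_ratio_one / 52, 4)   # all 52 tickets
critical = round(tickets_with_ratio_one / 6, 2)   # 6 Critical tickets
software = round(tickets_with_ratio_one / 16, 2)  # 16 Software tickets

print(overall, critical, software)  # 0.0192 0.17 0.06
```

Under that reading, the non-zero flow rate comes from a single ticket with zero lead time rather than from measured working time.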
