## ***Generating the Synthetic Eventlogs by Incorporating Overlapping Executions***
#### *In this notebook, we introduce overlaps between pairs of consecutive work items executed by the same resource. To this end, we use the preprocessed event logs generated in Step_1 of the experiment.*  
#### *As an example, we use the Purchasing event log (preprocessed file `Purchasing_NMT.xes`) and incorporate overlaps with a multitasking level ratio of 0.9.*

In [None]:
# import the libraries used in this notebook
import numpy as np
import pandas as pd
import pm4py
from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.objects.log.exporter.xes import exporter as xes_exporter

In [None]:
log = xes_importer.apply('Purchasing_NMT.xes') 
log
# Preprocessed logs without multitasking (NMT) from Step_1:
# BPIC2012_NMT.xes
# BPIC2017_NMT.xes
# EDlog_NMT.xes
# Production_NMT.xes
# Purchasing_NMT.xes

parsing log, completed traces ::   0%|          | 0/608 [00:00<?, ?it/s]

[{'attributes': {'concept:name': '1'}, 'events': [{'concept:name': 'Create Purchase Requisition', 'start:timestamp': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'time:timestamp': datetime.datetime(2011, 1, 1, 0, 37, tzinfo=datetime.timezone.utc), 'org:resource': 'Kim Passa', 'duration_seconds': 2220.0, 'duration_minutes': 37.0}, '..', {'concept:name': 'Pay invoice', 'start:timestamp': datetime.datetime(2011, 1, 4, 15, 22, tzinfo=datetime.timezone.utc), 'time:timestamp': datetime.datetime(2011, 1, 4, 15, 31, tzinfo=datetime.timezone.utc), 'org:resource': 'Pedro Alvares', 'duration_seconds': 540.0, 'duration_minutes': 9.0}]}, '....', {'attributes': {'concept:name': '1949'}, 'events': [{'concept:name': 'Create Purchase Requisition', 'start:timestamp': datetime.datetime(2011, 10, 13, 6, 27, tzinfo=datetime.timezone.utc), 'time:timestamp': datetime.datetime(2011, 10, 13, 7, 21, tzinfo=datetime.timezone.utc), 'org:resource': 'Tesca Lobes', 'duration_seconds': 3240.0, '

In [None]:
# Convert the event log into a pandas DataFrame (one row per work item)
df = pm4py.convert_to_dataframe(log)
df.head(20)

| | concept:name | start:timestamp | time:timestamp | org:resource | duration_seconds | duration_minutes | case:concept:name |
|---|---|---|---|---|---|---|---|
| 0 | Create Purchase Requisition | 2011-01-01 00:00:00+00:00 | 2011-01-01 00:37:00+00:00 | Kim Passa | 2220.0 | 37.0 | 1 |
| 1 | Create Request for Quotation Requester | 2011-01-01 05:37:00+00:00 | 2011-01-01 05:45:00+00:00 | Kim Passa | 480.0 | 8.0 | 1 |
| 2 | Analyze Request for Quotation | 2011-01-01 06:41:00+00:00 | 2011-01-01 06:55:00+00:00 | Karel de Groot | 840.0 | 14.0 | 1 |
| 3 | Send Request for Quotation to Supplier | 2011-01-01 11:43:00+00:00 | 2011-01-01 12:09:00+00:00 | Karel de Groot | 1560.0 | 26.0 | 1 |
| 4 | Create Quotation comparison Map | 2011-01-01 12:32:00+00:00 | 2011-01-01 16:03:00+00:00 | Magdalena Predutta | 12660.0 | 211.0 | 1 |
| 5 | Analyze Quotation comparison Map | 2011-01-01 22:44:00+00:00 | 2011-01-01 23:13:00+00:00 | Immanuel Karagianni | 1740.0 | 29.0 | 1 |
| 6 | Choose best option | 2011-01-01 23:13:00+00:00 | 2011-01-01 23:13:00+00:00 | Tesca Lobes | 0.0 | 0.0 | 1 |
| 7 | Settle conditions with supplier | 2011-01-02 01:22:00+00:00 | 2011-01-02 09:20:00+00:00 | Francois de Perrier | 28680.0 | 478.0 | 1 |
| 8 | Create Purchase Order | 2011-01-02 09:58:00+00:00 | 2011-01-02 10:10:00+00:00 | Karel de Groot | 720.0 | 12.0 | 1 |
| 9 | Confirm Purchase Order | 2011-01-02 14:09:00+00:00 | 2011-01-02 14:43:00+00:00 | Sean Manney | 2040.0 | 34.0 | 1 |


In [None]:
df_sorted = df.sort_values(['org:resource', 'start:timestamp']).copy()

# timestamps of each resource's previous work item
df_sorted['prev_end'] = df_sorted.groupby('org:resource')['time:timestamp'].shift(1)
df_sorted['prev_start'] = df_sorted.groupby('org:resource')['start:timestamp'].shift(1)

# idle duration (minutes) between the previous work item's end and the current start
df_sorted['idle_duration_min'] = ((df_sorted['start:timestamp'] - df_sorted['prev_end']).dt.total_seconds() / 60)

# the first work item of each resource has no predecessor
df_sorted['is_first'] = df_sorted['prev_end'].isna()

# transition type: first, back_to_back (zero gap), idle_gap (positive gap), other (negative gap, i.e. already overlapping)
df_sorted['transition_type'] = np.where(
    df_sorted['is_first'], 'first',
    np.where(df_sorted['idle_duration_min'] == 0, 'back_to_back', 
    np.where(df_sorted['idle_duration_min'] > 0, 'idle_gap', 'other'))
)

# keep only idle_gap rows (positive idle)
idle_valid = df_sorted[df_sorted['transition_type'] == 'idle_gap']

overall_summary = pd.DataFrame({
    'Total Resources': [df['org:resource'].nunique()],
    'Total Work Items': [len(df)],
    # note: this counts every row, including each resource's 'first' work item
    'Total Transitions': [df_sorted['transition_type'].notna().sum()],
    'Back-to-Back Count': [(df_sorted['transition_type'] == 'back_to_back').sum()],
    'Idle Gap Transitions': [len(idle_valid)],
    'Avg Idle Duration': [idle_valid['idle_duration_min'].mean()],      # minutes
    'Median Idle Duration': [idle_valid['idle_duration_min'].median()]  # minutes
})

overall_summary['% Back-to-Back'] = (overall_summary['Back-to-Back Count'] / overall_summary['Total Transitions'] * 100).round(2)

overall_summary['% With Idle'] = (overall_summary['Idle Gap Transitions'] / overall_summary['Total Transitions'] * 100).round(2)

overall_summary

| | Total Resources | Total Work Items | Total Transitions | Back-to-Back Count | Idle Gap Transitions | Avg Idle Duration | Median Idle Duration | % Back-to-Back | % With Idle |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 27 | 9119 | 9119 | 2416 | 6648 | 1485.637335 | 578.5 | 26.49 | 72.9 |
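In the summary above, `Total Transitions` counts every row of `df_sorted`, including the first work item of each resource (type `first`), which has no predecessor; this is why it equals `Total Work Items` (9119). With 27 resources, only 9092 rows are actual transitions, while both percentages are computed over all 9119 rows. The sketch below is a plain-Python (no pandas) version of the same per-resource classification that counts only real transitions. The function name `idle_gap_summary` and the toy work items are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def idle_gap_summary(work_items):
    """Classify transitions between consecutive work items of each resource.

    work_items: iterable of (resource, start, end) tuples with datetime values.
    The first work item of each resource has no predecessor, so it is not
    counted as a transition.
    """
    by_resource = defaultdict(list)
    for resource, start, end in work_items:
        by_resource[resource].append((start, end))

    counts = {"back_to_back": 0, "idle_gap": 0, "other": 0}
    idle_minutes = []
    for items in by_resource.values():
        items.sort()  # order each resource's work items by start time
        for (_, prev_end), (start, _) in zip(items, items[1:]):
            gap = (start - prev_end).total_seconds() / 60.0
            if gap == 0:
                counts["back_to_back"] += 1
            elif gap > 0:
                counts["idle_gap"] += 1
                idle_minutes.append(gap)
            else:
                counts["other"] += 1  # negative gap: the two work items already overlap

    transitions = sum(counts.values())
    return {
        "transitions": transitions,
        **counts,
        "pct_back_to_back": round(100 * counts["back_to_back"] / transitions, 2) if transitions else 0.0,
        "median_idle_minutes": median(idle_minutes) if idle_minutes else None,
    }


# toy work items (invented for illustration)
items = [
    ("Kim Passa", datetime(2011, 1, 1, 0, 0), datetime(2011, 1, 1, 0, 37)),
    ("Kim Passa", datetime(2011, 1, 1, 0, 37), datetime(2011, 1, 1, 0, 45)),
    ("Kim Passa", datetime(2011, 1, 1, 1, 45), datetime(2011, 1, 1, 2, 0)),
    ("Karel de Groot", datetime(2011, 1, 1, 6, 41), datetime(2011, 1, 1, 6, 55)),
]
summary = idle_gap_summary(items)
```

Here Kim Passa contributes one back-to-back transition and one 60-minute idle gap, while Karel de Groot's single work item contributes no transition at all.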


In [None]:
# Recompute each work item's duration (in minutes) from its start and end timestamps
df['duration_minutes'] = (df['time:timestamp'] - df['start:timestamp']).dt.total_seconds() / 60.0

### ***Main Part for Introducing Overlapping Executions***
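*To introduce overlaps among the work items, we define specific criteria: a pair is eligible when the two work items are consecutive for the same resource, both have a non-zero duration, and the idle gap between them lies between 0 and `idle_tolerance` minutes. To generate the synthetic log, the user sets the parameters below accordingly.* 

***The variables used in this step are described as follows:***
- `multitask_ratio` : represents the multitasking level ratio. In the code below, it is applied to the eligible pairs: a ratio of 0.9 randomly selects 90% of them for overlapping. Because candidate pairs can share a work item, the share of work items that end up overlapped is lower.
- `idle_tolerance`  : the maximum idle gap (in minutes) allowed between two consecutive work items that are not executed back to back. We use the ***Median Idle Duration*** computed above (578.5 minutes for the Purchasing log).
- `no_multitask_activities` : specifies the activities that must not be overlapped (use `None` for all logs except the ED_Log). In our experiment, the ED_Log contains some activity instances that must be executed individually.  
- `random_seed` : seed for NumPy's random generator, which makes the pair selection and the overlap sizes reproducible.
- `debug` : set to True to print diagnostic messages for each gap removal and overlap attempt.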
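The eligibility criteria described above can be sketched without pandas as follows. This is a simplified, hypothetical re-implementation of Steps 1 and 2 of `introduce_multitasking` (the helper names `select_eligible_pairs` and `wi`, and the toy data, are ours). It uses Python's `random.Random` instead of `np.random.choice`, so it will not reproduce the notebook's selection even with the same seed.

```python
import random
from datetime import datetime


def select_eligible_pairs(work_items, multitask_ratio=0.9, idle_tolerance=578.5,
                          no_multitask_activities=None, seed=42):
    """Hypothetical helper mirroring Steps 1 and 2 of introduce_multitasking.

    work_items: list of dicts with keys 'resource', 'activity', 'start', 'end'.
    Returns (eligible, selected) lists of (index_A, index_B, idle_minutes) tuples.
    """
    excluded = set(no_multitask_activities or [])
    # consecutive work items per resource, ordered by start time
    order = sorted(range(len(work_items)),
                   key=lambda i: (work_items[i]["resource"], work_items[i]["start"]))
    eligible = []
    for a, b in zip(order, order[1:]):
        A, B = work_items[a], work_items[b]
        if A["resource"] != B["resource"]:
            continue  # pairs never cross resources
        if A["activity"] in excluded or B["activity"] in excluded:
            continue  # activities that must be executed individually
        if A["start"] == A["end"] or B["start"] == B["end"]:
            continue  # zero-duration work items are not overlapped
        idle = (B["start"] - A["end"]).total_seconds() / 60.0
        if 0 <= idle <= idle_tolerance:
            eligible.append((a, b, idle))
    n_select = int(len(eligible) * multitask_ratio)  # the ratio applies to eligible pairs
    selected = random.Random(seed).sample(eligible, n_select)
    return eligible, selected


def wi(resource, activity, start_hm, end_hm):
    """Build a toy work item on 2011-01-01 from (hour, minute) tuples."""
    day = datetime(2011, 1, 1)
    return {"resource": resource, "activity": activity,
            "start": day.replace(hour=start_hm[0], minute=start_hm[1]),
            "end": day.replace(hour=end_hm[0], minute=end_hm[1])}


items = [
    wi("R1", "X", (8, 0), (8, 10)),
    wi("R1", "Y", (8, 10), (8, 30)),   # back to back with the previous item
    wi("R1", "Z", (9, 0), (9, 20)),    # 30-minute idle gap
    wi("R1", "Z", (20, 0), (20, 20)),  # 640-minute gap, above the tolerance
    wi("R2", "X", (8, 0), (8, 5)),
]
eligible, selected = select_eligible_pairs(items)
```

Note that `int()` truncates: with a single eligible pair and `multitask_ratio=0.9`, no pair is selected.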

In [None]:
# MAIN PART TO INTRODUCE THE OVERLAP

def introduce_multitasking(
    df,
    multitask_ratio=0.9,           # multitasking level ratio (fraction of eligible pairs to select)
    idle_tolerance=578.5,          # maximum idle gap in minutes (Median Idle Duration computed above)
    random_seed=42,
    no_multitask_activities=None,  # None for all logs; for ED_log: ['AUTONOMOUSLY TRIAGE', 'AMBULANCE TRIAGE', 'OBSERVATION', 'X-RAY REPORT', 'MRI REPORT']
    debug=False):                  # set to True to print diagnostic messages

    np.random.seed(random_seed)
    df = df.copy().sort_values(['org:resource', 'start:timestamp']).reset_index(drop=True)

    # keep the original (ground-truth) timestamps
    df['old_start:timestamp'] = df['start:timestamp']
    df['old_end:timestamp'] = df['time:timestamp']
    df['overlap_flag'] = False

    excluded = set(no_multitask_activities or [])  # activities that must be executed individually
    used_indices = set()
    all_pairs = []

    # Step 1: collect eligible pairs of consecutive work items of the same resource
    for res, group in df.groupby('org:resource'):
        indices = group.index.to_list()
        for i in range(len(indices) - 1):
            idx_A = indices[i]
            idx_B = indices[i + 1]

            # skip pairs where either activity must not be overlapped
            if df.loc[idx_A, 'concept:name'] in excluded or df.loc[idx_B, 'concept:name'] in excluded:
                continue

            eA = df.loc[idx_A, 'time:timestamp']
            sB = df.loc[idx_B, 'start:timestamp']
            dA = df.loc[idx_A, 'duration_minutes']
            dB = df.loc[idx_B, 'duration_minutes']

            # skip zero-duration work items
            if dA == 0 or dB == 0:
                continue

            # idle duration in minutes
            idle_duration = (sB - eA).total_seconds() / 60.0

            # eligible if back to back (gap of 0) or the idle gap is within the tolerance
            if 0 <= idle_duration <= idle_tolerance:
                all_pairs.append((idx_A, idx_B, idle_duration))

    # Step 2: randomly select a subset (multitask_ratio) of the eligible pairs
    total_pairs = len(all_pairs)
    n_select = int(total_pairs * multitask_ratio)

    if n_select == 0:
        if debug:
            print("No eligible pairs for overlap.")
        return df

    selected_indices = np.random.choice(range(total_pairs), size=n_select, replace=False)
    selected_pairs = [all_pairs[i] for i in selected_indices]

    # Step 3: apply the overlaps pair by pair
    for idx_A, idx_B, idle_duration in selected_pairs:
        # skip the pair if either work item already belongs to an applied pair
        if idx_A in used_indices or idx_B in used_indices:
            continue

        sA = df.loc[idx_A, 'start:timestamp']
        eA = df.loc[idx_A, 'time:timestamp']
        sB = df.loc[idx_B, 'start:timestamp']
        eB = df.loc[idx_B, 'time:timestamp']
        dA = float(df.loc[idx_A, 'duration_minutes'])
        dB = float(df.loc[idx_B, 'duration_minutes'])

        # Remove the idle gap: B starts when A ends and keeps its duration; write it back to df
        if idle_duration > 0:
            sB_new = eA
            eB_new = sB_new + pd.to_timedelta(dB, unit='m')

            # write gap-removed timestamps back to dataframe so feasibility uses them
            df.loc[idx_B, 'start:timestamp'] = sB_new
            df.loc[idx_B, 'time:timestamp'] = eB_new

            #update local variables to the new values
            sB = sB_new
            eB = eB_new

            # mark the gap as removed (so the extension does not include idle time)
            idle_duration = 0

            if debug:
                print(f"[gap-removed] idx_A={idx_A}, idx_B={idx_B}, sB_old={df.loc[idx_B,'old_start:timestamp']}, sB_new={sB}")

        # Overlap attempts
        max_attempts = 20
        success = False

        for attempt in range(max_attempts):
            p = np.random.uniform(0.3, 0.7)
            q = 1 - p
            small_w = min(p, q)
            large_w = max(p, q)

            # del_A (extension of A's end) is a share of dB, and del_B (advance of B's start) is a share of dA.
            # The shorter of the two durations is scaled by the larger weight, so the overlap stays noticeable.
            if dA >= dB:
                del_A = large_w * dB
                del_B = small_w * dA
            else:
                del_A = small_w * dB
                del_B = large_w * dA

            #extended durations
            extension_A = del_A
            extension_B = del_B

            new_end_A = eA + pd.to_timedelta(extension_A, unit='m')
            new_start_B = sB - pd.to_timedelta(extension_B, unit='m')

            # FEASIBILITY_CHECK (remain inside [sA, eB])
            #using the updated sB/eB (after gap removal) to ensure new_end_A and new_start_B fall in the window
            if (new_start_B >= sA) and (new_end_A <= eB):
                success = True
                if debug:
                    print(f"[attempt success] idx_A={idx_A}, idx_B={idx_B}, attempt={attempt+1}, del_A={del_A:.2f}, del_B={del_B:.2f}")
                break
            else:
                # debug: report why the attempt failed
                if debug:
                    reason = []
                    if new_start_B < sA:
                        reason.append(f"new_start_B ({new_start_B}) < sA ({sA})")
                    if new_end_A > eB:
                        reason.append(f"new_end_A ({new_end_A}) > eB ({eB})")
                    if reason:
                        print(f"[attempt fail] idx_A={idx_A}, idx_B={idx_B}, attempt={attempt+1}, reason={' & '.join(reason)}")

        #If found a feasible overlap, apply the changes
        if success:
            # for A: extend its end
            df.loc[idx_A, 'time:timestamp'] = new_end_A
            # for B: move its start earlier
            df.loc[idx_B, 'start:timestamp'] = new_start_B

            #update the durations
            df.loc[idx_A, 'duration_minutes'] = (new_end_A - sA).total_seconds() / 60.0
            df.loc[idx_B, 'duration_minutes'] = (eB - new_start_B).total_seconds() / 60.0

            # flags and used indices
            df.loc[idx_A, 'overlap_flag'] = True
            df.loc[idx_B, 'overlap_flag'] = True
            used_indices.update([idx_A, idx_B])
        else:
            # not successful after attempts 
            if debug:
                print(f"[no feasible overlap] idx_A={idx_A}, idx_B={idx_B}")

    # Summary (candidate pairs may share a work item, so fewer pairs than selected can be applied)
    print(f"Total Eligible pairs(each pair has 2WI): {total_pairs}")
    print(f"Selected overlapping pairs(each pair has 2WI): {len(selected_pairs)}")
    print(f"From the selected overlapping pairs, the no. of work items that successfully used and involved in overlapping are: {len(used_indices)}")
    print(f"Idle tolerance used: {idle_tolerance} minutes")

    return df

out = introduce_multitasking(df)   
out.head(20)

Total Eligible pairs(each pair has 2WI): 4381
Selected overlapping pairs(each pair has 2WI): 3942
From the selected overlapping pairs, the no. of work items that successfully used and involved in overlapping are: 4608
Idle tolerance used: 578.5 minutes


| | concept:name | start:timestamp | time:timestamp | org:resource | duration_seconds | duration_minutes | case:concept:name | old_start:timestamp | old_end:timestamp | overlap_flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Create Request for Quotation Requester | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:31:15.890822556+00:00 | Alberto Duport | 600.0 | 15.264847 | 2 | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:26:00+00:00 | True |
| 1 | Create Request for Quotation Requester | 2011-01-01 08:20:02.992940430+00:00 | 2011-01-01 08:39:00+00:00 | Alberto Duport | 780.0 | 18.950118 | 6 | 2011-01-01 17:32:00+00:00 | 2011-01-01 17:45:00+00:00 | True |
| 2 | Create Purchase Requisition | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | Alberto Duport | 600.0 | 10.0 | 12 | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | False |
| 3 | Create Request for Quotation Requester | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | Alberto Duport | 600.0 | 10.0 | 17 | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | False |
| 4 | Create Purchase Requisition | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | Alberto Duport | 3240.0 | 54.0 | 27 | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | False |
| 5 | Create Purchase Requisition | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | Alberto Duport | 2640.0 | 44.0 | 30 | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | False |
| 6 | Create Purchase Requisition | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:44:52.434444630+00:00 | Alberto Duport | 1920.0 | 43.873907 | 41 | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:33:00+00:00 | True |
| 7 | Analyze Quotation comparison Map | 2011-01-05 10:23:21.053072250+00:00 | 2011-01-05 10:50:00+00:00 | Alberto Duport | 1020.0 | 26.649115 | 27 | 2011-01-05 12:08:00+00:00 | 2011-01-05 12:25:00+00:00 | True |
| 8 | Choose best option | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | Alberto Duport | 0.0 | 0.0 | 27 | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | False |
| 9 | Amend Request for Quotation Requester | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | Alberto Duport | 840.0 | 14.0 | 30 | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | False |
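The following sketch applies the gap-removal and overlap rules of Step 3 to a single pair, using the original timestamps of rows 6 and 7 in the table above (`Create Purchase Requisition`, 10:01 to 10:33, and `Analyze Quotation comparison Map`, 12:08 to 12:25). The notebook draws the weight `p` from `np.random.uniform(0.3, 0.7)`; here it is fixed at 0.6 (an assumption for illustration), so the resulting timestamps differ from the table, although B's end time (10:50) matches because it depends only on the gap removal. The function `overlap_pair` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime, timedelta


def overlap_pair(sA, eA, sB, eB, p):
    """Apply the notebook's gap-removal and overlap rules to one pair for a fixed weight p.

    In the notebook, p is drawn from np.random.uniform(0.3, 0.7); here it is a parameter.
    """
    dA = (eA - sA).total_seconds() / 60.0
    dB = (eB - sB).total_seconds() / 60.0
    if sB > eA:
        # remove the idle gap: B starts when A ends and keeps its duration
        sB, eB = eA, eA + timedelta(minutes=dB)
    small_w, large_w = min(p, 1 - p), max(p, 1 - p)
    # the shorter of the two durations is scaled by the larger weight
    if dA >= dB:
        del_A, del_B = large_w * dB, small_w * dA
    else:
        del_A, del_B = small_w * dB, large_w * dA
    new_end_A = eA + timedelta(minutes=del_A)
    new_start_B = sB - timedelta(minutes=del_B)
    # feasibility: both new boundaries stay inside [sA, eB]
    feasible = sA <= new_start_B and new_end_A <= eB
    return {"A": (sA, new_end_A), "B": (new_start_B, eB), "feasible": feasible}


# rows 6 and 7 of the table above (original timestamps), with a fixed p = 0.6
res = overlap_pair(datetime(2011, 1, 5, 10, 1), datetime(2011, 1, 5, 10, 33),
                   datetime(2011, 1, 5, 12, 8), datetime(2011, 1, 5, 12, 25), p=0.6)
```

With p = 0.6, A is extended by 0.6 × 17 = 10.2 minutes (to 10:43:12) and B starts 0.4 × 32 = 12.8 minutes before A's original end (at 10:20:12). Since both weights lie in [0.3, 0.7], `del_A` is always smaller than `dB` and `del_B` smaller than `dA`; once the gap is removed, the feasibility check should therefore pass on the first attempt, and the 20-attempt loop in the notebook acts as a safeguard.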


In [None]:
print(len(out[out['overlap_flag']==True]))
print(len(out[out['overlap_flag']==False]))

4608
4511
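The selected pairs and the flagged work items differ because consecutive candidate pairs can share a work item, and Step 3 skips a pair once either of its work items has been used. From the counts above, 4608 flagged work items correspond to 2304 applied pairs: about 52.6% of the 4381 eligible pairs and about 50.5% of all 9119 work items, compared with the requested `multitask_ratio` of 0.9. The hypothetical helper below computes these figures; with the notebook's output it could be called as `achieved_multitasking(out['overlap_flag'], 4381)`.

```python
def achieved_multitasking(overlap_flags, n_eligible_pairs):
    """Summarize how much overlap was actually applied (hypothetical helper).

    overlap_flags: iterable of booleans, e.g. the overlap_flag column.
    Each applied pair flags exactly two distinct work items.
    """
    flags = [bool(f) for f in overlap_flags]
    n_flagged = sum(flags)
    applied_pairs = n_flagged // 2
    return {
        "share_of_work_items": n_flagged / len(flags),
        "applied_pairs": applied_pairs,
        "share_of_eligible_pairs": applied_pairs / n_eligible_pairs,
    }


# counts reported above for the Purchasing log
stats = achieved_multitasking([True] * 4608 + [False] * 4511, n_eligible_pairs=4381)
```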


In [None]:
# Ground-truth (GT) duration in minutes, computed from the original timestamps
out['old_duration_minutes'] = (out['old_end:timestamp'] - out['old_start:timestamp']).dt.total_seconds() / 60.0

In [None]:
# Truncate the new timestamps to whole seconds (dropping the sub-second part) and recompute the durations
out['time:timestamp'] = out['time:timestamp'].dt.floor('s')
out['start:timestamp'] = out['start:timestamp'].dt.floor('s')

out['duration_minutes'] = (out['time:timestamp'] - out['start:timestamp']).dt.total_seconds() / 60.0
out.head(60)

| | concept:name | start:timestamp | time:timestamp | org:resource | duration_seconds | duration_minutes | case:concept:name | old_start:timestamp | old_end:timestamp | overlap_flag | old_duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Create Request for Quotation Requester | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:31:15+00:00 | Alberto Duport | 600.0 | 15.25 | 2 | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:26:00+00:00 | True | 10.0 |
| 1 | Create Request for Quotation Requester | 2011-01-01 08:20:02+00:00 | 2011-01-01 08:39:00+00:00 | Alberto Duport | 780.0 | 18.966667 | 6 | 2011-01-01 17:32:00+00:00 | 2011-01-01 17:45:00+00:00 | True | 13.0 |
| 2 | Create Purchase Requisition | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | Alberto Duport | 600.0 | 10.0 | 12 | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | False | 10.0 |
| 3 | Create Request for Quotation Requester | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | Alberto Duport | 600.0 | 10.0 | 17 | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | False | 10.0 |
| 4 | Create Purchase Requisition | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | Alberto Duport | 3240.0 | 54.0 | 27 | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | False | 54.0 |
| 5 | Create Purchase Requisition | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | Alberto Duport | 2640.0 | 44.0 | 30 | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | False | 44.0 |
| 6 | Create Purchase Requisition | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:44:52+00:00 | Alberto Duport | 1920.0 | 43.866667 | 41 | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:33:00+00:00 | True | 32.0 |
| 7 | Analyze Quotation comparison Map | 2011-01-05 10:23:21+00:00 | 2011-01-05 10:50:00+00:00 | Alberto Duport | 1020.0 | 26.65 | 27 | 2011-01-05 12:08:00+00:00 | 2011-01-05 12:25:00+00:00 | True | 17.0 |
| 8 | Choose best option | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | Alberto Duport | 0.0 | 0.0 | 27 | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | False | 0.0 |
| 9 | Amend Request for Quotation Requester | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | Alberto Duport | 840.0 | 14.0 | 30 | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | False | 14.0 |


***The final generated synthetic log contains the following attributes:***
- `case:concept:name` : case ID.
- `concept:name` : activity.
- `org:resource` : resource.
- `start:timestamp` : new start time of the work item after introducing overlapping executions.
- `time:timestamp` : new end time of the work item after introducing overlapping executions.
- `duration_minutes` : new duration of the work item (`time:timestamp` - `start:timestamp`). In subsequent steps of the experiment, we refer to it as ***MT (Multitasking)***.
- `old_start:timestamp` : start time of the work item before overlaps were introduced (no multitasking).
- `old_end:timestamp` : end time of the work item before overlaps were introduced (no multitasking).
- `old_duration_minutes` : duration of the work item before overlaps were introduced. We treat it as the ***Ground Truth (GT)*** in subsequent experiment steps.
- `overlap_flag` : `True` if the work item was overlapped with another work item of the same resource, `False` otherwise.
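A quick consistency check on the final log can be sketched as below. The checked properties are our reading of `introduce_multitasking`, not guarantees stated by the authors: an overlapped work item should never become shorter than its ground truth (MT ≥ GT), a non-overlapped work item should keep its original timestamps, and overlaps come in pairs, so the number of flagged work items should be even. The function `validate_synthetic_log` and the helpers `row` and `t` are hypothetical; the function accepts a list of row dictionaries such as `out.to_dict('records')`, and the sample rows reuse rows 6 to 8 of the tables above.

```python
from datetime import datetime


def validate_synthetic_log(rows):
    """Check properties we expect from introduce_multitasking (our reading of the code).

    rows: list of dicts with the attributes listed above.
    Returns a list of (row_index, message) tuples; an empty list means no problem was found.
    """
    problems = []
    n_flagged = 0
    for i, r in enumerate(rows):
        if r["overlap_flag"]:
            n_flagged += 1
            if r["duration_minutes"] < r["old_duration_minutes"]:
                problems.append((i, "overlapped work item became shorter (MT < GT)"))
        elif (r["start:timestamp"], r["time:timestamp"]) != (r["old_start:timestamp"], r["old_end:timestamp"]):
            problems.append((i, "non-overlapped work item was moved"))
    if n_flagged % 2:
        problems.append((None, "odd number of overlapped work items"))
    return problems


def t(h, m, s=0):
    """Timestamp on 2011-01-05 (the day of rows 6 to 8)."""
    return datetime(2011, 1, 5, h, m, s)


def row(start, end, old_start, old_end, flag):
    """Build one row with MT and GT durations derived from the timestamps."""
    return {"start:timestamp": start, "time:timestamp": end,
            "old_start:timestamp": old_start, "old_end:timestamp": old_end,
            "duration_minutes": (end - start).total_seconds() / 60.0,
            "old_duration_minutes": (old_end - old_start).total_seconds() / 60.0,
            "overlap_flag": flag}


rows = [
    row(t(10, 1), t(10, 44, 52), t(10, 1), t(10, 33), True),
    row(t(10, 23, 21), t(10, 50), t(12, 8), t(12, 25), True),
    row(t(12, 25), t(12, 25), t(12, 25), t(12, 25), False),
]
problems = validate_synthetic_log(rows)
```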


In [None]:
# Keep only the attributes of the final synthetic log, in this order
out = out[['case:concept:name', 'concept:name', 'org:resource', 'start:timestamp', 'time:timestamp', 'duration_minutes',
           'old_start:timestamp', 'old_end:timestamp', 'old_duration_minutes', 'overlap_flag']]
out.head(60)

| | case:concept:name | concept:name | org:resource | start:timestamp | time:timestamp | duration_minutes | old_start:timestamp | old_end:timestamp | old_duration_minutes | overlap_flag |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Create Request for Quotation Requester | Alberto Duport | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:31:15+00:00 | 15.25 | 2011-01-01 08:16:00+00:00 | 2011-01-01 08:26:00+00:00 | 10.0 | True |
| 1 | 6 | Create Request for Quotation Requester | Alberto Duport | 2011-01-01 08:20:02+00:00 | 2011-01-01 08:39:00+00:00 | 18.966667 | 2011-01-01 17:32:00+00:00 | 2011-01-01 17:45:00+00:00 | 13.0 | True |
| 2 | 12 | Create Purchase Requisition | Alberto Duport | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | 10.0 | 2011-01-02 05:31:00+00:00 | 2011-01-02 05:41:00+00:00 | 10.0 | False |
| 3 | 17 | Create Request for Quotation Requester | Alberto Duport | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | 10.0 | 2011-01-03 04:02:00+00:00 | 2011-01-03 04:12:00+00:00 | 10.0 | False |
| 4 | 27 | Create Purchase Requisition | Alberto Duport | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | 54.0 | 2011-01-03 17:22:00+00:00 | 2011-01-03 18:16:00+00:00 | 54.0 | False |
| 5 | 30 | Create Purchase Requisition | Alberto Duport | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | 44.0 | 2011-01-04 12:33:00+00:00 | 2011-01-04 13:17:00+00:00 | 44.0 | False |
| 6 | 41 | Create Purchase Requisition | Alberto Duport | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:44:52+00:00 | 43.866667 | 2011-01-05 10:01:00+00:00 | 2011-01-05 10:33:00+00:00 | 32.0 | True |
| 7 | 27 | Analyze Quotation comparison Map | Alberto Duport | 2011-01-05 10:23:21+00:00 | 2011-01-05 10:50:00+00:00 | 26.65 | 2011-01-05 12:08:00+00:00 | 2011-01-05 12:25:00+00:00 | 17.0 | True |
| 8 | 27 | Choose best option | Alberto Duport | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | 0.0 | 2011-01-05 12:25:00+00:00 | 2011-01-05 12:25:00+00:00 | 0.0 | False |
| 9 | 30 | Amend Request for Quotation Requester | Alberto Duport | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | 14.0 | 2011-01-05 19:14:00+00:00 | 2011-01-05 19:28:00+00:00 | 14.0 | False |


In [None]:
from pm4py.objects.log.exporter.xes import exporter as xes_exporter
output_path = 'MR_0.9.xes'

xes_exporter.apply(out, output_path)
print(f"Multitasking execution log saved as {output_path}")

exporting log, completed traces ::   0%|          | 0/608 [00:00<?, ?it/s]

Multitasking execution log saved as MR_0.9.xes
