# Negative Weighted Events

In this notebook, we adapt the negative-events-based measure proposed by van den Broucke et al. (2014) in the paper "Determining Process Model Precision and Generalization with Weighted Artificial Negative Events" (doi: 10.1109/TKDE.2013.130). <br>
The measure focuses on negative events: a negative event records that an activity was prevented from taking place at a given point in a process execution. Since such events are rarely recorded in reality, we induce them into the log artificially. <br>
Concretely, at each position in a trace we induce every activity that was not executed at that position. For simplicity, the authors assume that all activities other than the current event itself are inserted at each position. Afterwards, we check whether the model allows each of these induced events to fire at that position. If an event can be fired, we increase the counter for allowed generalizations $AG$ by one; if not, we increase the counter for disallowed generalizations $DG$ by one. <br>
The final measure is calculated as follows: <br>
For event log $L$ and process model $M$,
$$ Generalization (L,M) = AG\,/\, (AG+DG).$$
Note that this is not the complete logic of the paper, which additionally introduces a scoring mechanism in a further step. That mechanism is needed because inducing negative events requires assuming that the event log is complete, which is normally not true in reality, as an event log only represents a subset of the possible behavior. The scoring mechanism loosens this assumption by requiring completeness only on a small window before the execution of the current activity.
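The unweighted counting described above can be illustrated on a toy example. The following is a minimal sketch, not the implementation used in this notebook: it assumes a flat log given as lists of activity names and replaces the object-centric Petri net by a hypothetical dictionary `allowed_next` that maps the previous activity (`None` at the start of a trace) to the set of activities the model permits next. The function name `toy_generalization` is likewise made up for illustration.

```python
def toy_generalization(log, allowed_next):
    """Count allowed (AG) and disallowed (DG) induced negative events.

    log          -- list of traces, each a list of activity names
    allowed_next -- hypothetical stand-in for the model: maps the previous
                    activity (None at trace start) to the activities that
                    may fire next
    """
    activities = {act for trace in log for act in trace}
    ag = dg = 0
    for trace in log:
        prev = None  # None marks the start of the trace
        for actual in trace:
            # Induce every activity except the one that actually occurred.
            for candidate in activities - {actual}:
                if candidate in allowed_next.get(prev, set()):
                    ag += 1  # the model would allow this induced event
                else:
                    dg += 1  # the model forbids this induced event
            prev = actual
    total = ag + dg
    score = ag / total if total else float("nan")
    return ag, dg, score


log = [["a", "b", "c"], ["a", "c", "b"]]

# A restrictive toy model: start with "a", then "b" and "c" in either order.
strict_model = {None: {"a"}, "a": {"b", "c"}, "b": {"c"}, "c": {"b"}}

# A flower-like toy model: every activity may fire at any time.
everything = {"a", "b", "c"}
flower_model = {key: everything for key in (None, "a", "b", "c")}

print(toy_generalization(log, strict_model))
print(toy_generalization(log, flower_model))
```

In this toy setting the restrictive model allows only 2 of the 12 induced events, while the flower-like model allows all of them, which mirrors why a flower model is expected to score close to 1 under this measure.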

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [21]:
from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
from ocpa.algo.discovery.ocpn import algorithm as ocpn_discovery_factory
from src.utils import get_happy_path_log, create_flower_model, generate_variant_model
from ocpa.objects.log.importer.csv import factory as ocel_import_factory_csv
import pickle
import multiprocessing
import numpy as np
from tqdm import tqdm
from models.negative_events_measure_parallel import process_group_without, negative_events_without_weighting_parallel

# O2C Log

### Standard Petri Net

As a first step, we load the OCEL log into the notebook and discover the object-centric Petri net.

In [3]:
filename = "../src/data/jsonocel/order_process.jsonocel"
ocel = ocel_import_factory.apply(filename)
ocpn = ocpn_discovery_factory.apply(ocel, parameters={"debug": False})

In [5]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 46/46 [00:00<?, ?it/s]
100%|██████████| 48/48 [00:06<00:00,  6.96it/s]


0.5063
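The aggregation step at the end of the cell above can also be written as a small helper. This is a sketch under the assumption that each worker result is a sequence whose first two entries are the per-group $AG$ and $DG$ counts, as the indexing `result[0]` and `result[1]` suggests; the name `aggregate_generalization` is hypothetical and not part of the project code. It additionally guards against the case where no negative events were induced at all, in which the ratio is undefined.

```python
import math


def aggregate_generalization(results):
    """Combine per-group (AG, DG) counts into one generalization score.

    results -- iterable of sequences whose first two entries are the
               allowed (AG) and disallowed (DG) counts of one group
               (an assumption about the worker's return value)
    """
    results = list(results)
    final_ag = sum(result[0] for result in results)
    final_dg = sum(result[1] for result in results)
    total = final_ag + final_dg
    if total == 0:
        # No induced negative events: AG / (AG + DG) is undefined.
        return math.nan
    return final_ag / total


# Usage with made-up counts for two groups:
print(round(aggregate_generalization([(3, 1), (1, 3)]), 4))
```

Because the counts are summed before dividing, groups with many induced events weigh more than small groups, which matches the formula $AG / (AG + DG)$ computed over the whole log.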


### Happy Path Petri Net

In [6]:
happy_path__ocel = get_happy_path_log(filename)

In [7]:
happy_path_ocpn = ocpn_discovery_factory.apply(happy_path__ocel, parameters={"debug": False})

In [8]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, happy_path_ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 26/26 [00:00<?, ?it/s]
100%|██████████| 48/48 [00:08<00:00,  5.87it/s]


0.3073


### Flower Model Petri Net

In [9]:
ots = ["order","item","delivery"]

In [10]:
flower_ocpn = create_flower_model(filename,ots)

In [11]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, flower_ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 32/32 [00:00<00:00, 32040.52it/s]
100%|██████████| 48/48 [00:07<00:00,  6.10it/s]


0.965


# DS4 Log

### Standard Petri Net

As a first step, we load the OCEL log into the notebook and discover the object-centric Petri net.

In [12]:
filename = "../src/data/jsonocel/DS4.jsonocel"
ocel = ocel_import_factory.apply(filename)
ocpn = ocpn_discovery_factory.apply(ocel, parameters={"debug": False})

In [13]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 364/364 [00:00<00:00, 131591.68it/s]
100%|██████████| 14507/14507 [03:22<00:00, 71.46it/s] 


0.3604


### Happy Path Petri Net

In [14]:
happy_path__ocel = get_happy_path_log(filename)

In [15]:
happy_path_ocpn = ocpn_discovery_factory.apply(happy_path__ocel, parameters={"debug": False})

In [16]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, happy_path_ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 70/70 [00:00<00:00, 35027.59it/s]
100%|██████████| 14507/14507 [03:38<00:00, 66.45it/s] 


0.0842


### Flower Model Petri Net

In [17]:
ots = ["Payment application","Control summary","Entitlement application","Geo parcel document","Inspection","Reference alignment"]

In [18]:
flower_ocpn = create_flower_model(filename,ots)

In [19]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel, flower_ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(5)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs: 100%|██████████| 162/162 [00:00<00:00, 23157.16it/s]
100%|██████████| 14507/14507 [03:51<00:00, 62.78it/s] 


0.6885


### Variant Model Petri Net

We import the previously generated variant log for the measure computation, together with the variant model, which was generated from the original log.

In [22]:
filename_variant = "../src/data/csv/DS4_variant_log.csv" 
object_types = ["Payment application","Control summary","Entitlement application","Geo parcel document","Inspection","Reference alignment"]
parameters = {"obj_names": object_types,
              "val_names": [],
              "act_name": "event_activity",
              "time_name": "event_timestamp",
              "sep": ","}
ocel_variant = ocel_import_factory_csv.apply(file_path=filename_variant, parameters=parameters)

In [25]:
with open("../src/data/csv/DS4_variant_ocpn.pickle", "rb") as file:
    variant_ocpn = pickle.load(file)

In [27]:
if __name__ == '__main__':
    
    #generate the variables needed for the parallel processing
    grouped_df, filtered_preceding_events_full, filtered_preceding_events, filtered_succeeding_activities_updated, events, silent_transitions = negative_events_without_weighting_parallel(ocel_variant, variant_ocpn)

    DG = 0  # Disallowed Generalization initialisation
    AG = 0  # Allowed Generalization initialisation

    # Create a multiprocessing Pool
    pool = multiprocessing.Pool(7)

    # Prepare the arguments for parallel processing
    args = [(group_key, df_group, filtered_preceding_events_full, filtered_preceding_events,
             filtered_succeeding_activities_updated, events, silent_transitions, AG, DG)
            for group_key, df_group in grouped_df]

    
    # Apply the parallel processing to each group with additional variables
    results = []
    with tqdm(total=len(grouped_df)) as pbar:
        for result in pool.imap_unordered(process_group_without, args):
            results.append(result)
            pbar.update(1)

    # Calculate the final sums of AG and DG
    final_AG = sum([result[0] for result in results])
    final_DG = sum([result[1] for result in results])

    # Close the multiprocessing Pool and join the processes
    pool.close()
    pool.join()

    # calculate the generalization based on the paper
    generalization = final_AG / (final_AG + final_DG)
    print(np.round(generalization,4))

Check the arcs:   1%|          | 5242/695752 [07:50<17:13:36, 11.13it/s]


KeyboardInterrupt: 