# Tutorial 2: Conformance Checking (Order-to-Cash)

This tutorial demonstrates how to perform conformance checking to compare a process model with an event log for an order-to-cash (O2C) process.

**Reproducibility Note**: This notebook uses deterministic data and algorithms, ensuring identical results on every run.

## 1. Import necessary libraries

In [None]:
import pandas as pd
from erp_processminer.eventlog.serialization import dataframe_to_log
from erp_processminer.discovery.heuristics_miner import discover_petri_net_with_heuristics
from erp_processminer.conformance.token_replay import calculate_conformance
from erp_processminer.visualization.graphs import visualize_petri_net

## 2. Create a Sample Event Log

For this tutorial, we define a small event log directly in the notebook. It represents a simple order-to-cash process with three cases: two follow the standard flow and one deviates because the order is cancelled.

In [None]:
log_data = [
    # Conforming trace: standard O2C flow
    ['C-01', 'Create Order', '2023-02-01'],
    ['C-01', 'Confirm Order', '2023-02-02'],
    ['C-01', 'Create Shipment', '2023-02-03'],
    ['C-01', 'Ship Goods', '2023-02-04'],
    ['C-01', 'Send Invoice', '2023-02-05'],
    ['C-01', 'Receive Payment', '2023-02-10'],

    # Deviant trace: order cancellation (not in expected model)
    ['C-02', 'Create Order', '2023-02-02'],
    ['C-02', 'Confirm Order', '2023-02-03'],
    ['C-02', 'Cancel Order', '2023-02-04'],

    # Conforming trace: standard O2C flow
    ['C-03', 'Create Order', '2023-02-05'],
    ['C-03', 'Confirm Order', '2023-02-06'],
    ['C-03', 'Create Shipment', '2023-02-07'],
    ['C-03', 'Ship Goods', '2023-02-08'],
    ['C-03', 'Send Invoice', '2023-02-09'],
    ['C-03', 'Receive Payment', '2023-02-15'],
]

log_df = pd.DataFrame(log_data, columns=['case_id', 'activity', 'timestamp'])
event_log = dataframe_to_log(log_df)

print(f"Event log created with {len(event_log.traces)} traces.")
print("\nTraces:")
for trace in event_log.traces:
    activities = [e.activity for e in trace.events]
    print(f"  {trace.case_id}: {' -> '.join(activities)}")
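As a quick sanity check before discovery, the same kind of rows can be grouped into trace variants with pandas alone. This is a minimal sketch that does not use `erp_processminer`; the shortened `rows` list and the names `variants` and `variant_counts` are illustrative and not part of the library.

```python
import pandas as pd

# Illustrative subset of the tutorial log: one standard case and the cancelled case.
rows = [
    ['C-01', 'Create Order', '2023-02-01'],
    ['C-01', 'Confirm Order', '2023-02-02'],
    ['C-01', 'Create Shipment', '2023-02-03'],
    ['C-01', 'Ship Goods', '2023-02-04'],
    ['C-01', 'Send Invoice', '2023-02-05'],
    ['C-01', 'Receive Payment', '2023-02-10'],
    ['C-02', 'Create Order', '2023-02-02'],
    ['C-02', 'Confirm Order', '2023-02-03'],
    ['C-02', 'Cancel Order', '2023-02-04'],
]

df = pd.DataFrame(rows, columns=['case_id', 'activity', 'timestamp'])
# Parse timestamps so that ordering is chronological rather than lexical.
df['timestamp'] = pd.to_datetime(df['timestamp'])

# One string per case: its activities in chronological order.
variants = (
    df.sort_values(['case_id', 'timestamp'])
      .groupby('case_id')['activity']
      .agg(' -> '.join)
)

# How many cases share each activity sequence.
variant_counts = variants.value_counts()

print(variants.to_string())
print(variant_counts.to_string())
```

Seeing the variants side by side makes it easy to confirm which cases are meant to be deviant before any model is involved.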

## 3. Discover a Process Model

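We will use the Heuristics Miner to discover a Petri net from the event log. We set the dependency threshold to 0.7, a relatively high value, so that weakly supported relations between activities are filtered out and the resulting model stays clean.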

In [None]:
petri_net = discover_petri_net_with_heuristics(event_log, dependency_thresh=0.7)

print(f"Discovered Petri net with {len(petri_net.places)} places and {len(petri_net.transitions)} transitions")
print("\nTransitions (activities):")
for t in sorted(petri_net.transitions, key=lambda x: x.name):
    print(f"  - {t.label or t.name}")

# Visualize the discovered model
try:
    g = visualize_petri_net(petri_net, output_file='o2c_petri_net')
    print("\nPetri net visualization saved to 'o2c_petri_net.png'")
    display(g)
except Exception as e:
    # Any failure lands here, not only a missing Graphviz installation
    print(f"\nVisualization skipped (Graphviz may be unavailable): {e}")
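To make the dependency threshold more concrete, the following sketch computes the classic Heuristics Miner dependency measure, `(|a>b| - |b>a|) / (|a>b| + |b>a| + 1)`, for the three traces of this tutorial. Whether `discover_petri_net_with_heuristics` uses exactly this formula is an assumption; the function `dependency_measures` is a hypothetical helper written for illustration and is not part of `erp_processminer`.

```python
from collections import Counter

def dependency_measures(traces):
    """Return {(a, b): dependency} for every directly-follows pair in the traces.

    Uses the textbook Heuristics Miner measure; a library may differ in details.
    """
    follows = Counter()
    for activities in traces:
        for a, b in zip(activities, activities[1:]):
            follows[(a, b)] += 1

    deps = {}
    for (a, b), ab in follows.items():
        if a == b:
            # Length-one loops use a separate formula in the classic definition.
            deps[(a, b)] = ab / (ab + 1)
        else:
            ba = follows.get((b, a), 0)
            deps[(a, b)] = (ab - ba) / (ab + ba + 1)
    return deps

standard = ['Create Order', 'Confirm Order', 'Create Shipment',
            'Ship Goods', 'Send Invoice', 'Receive Payment']
cancelled = ['Create Order', 'Confirm Order', 'Cancel Order']

deps = dependency_measures([standard, cancelled, standard])
for (a, b), value in sorted(deps.items(), key=lambda item: -item[1]):
    print(f"{a} -> {b}: {value:.3f}")
```

Under this measure, `Create Order -> Confirm Order` (seen three times) scores 0.75, each standard-flow step seen twice scores about 0.667, and `Confirm Order -> Cancel Order` (seen once) scores 0.5. On a log this small, a threshold of 0.7 is therefore stricter than it may look, so it is worth inspecting the printed transitions to see which relations the library actually kept.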

## 4. Perform Conformance Checking

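Now we can use token-based replay to check how well the log conforms to the discovered model. A fitness of 1.0 means a trace replays on the model without missing or remaining tokens; lower values indicate deviations. The average fitness summarizes this over all traces.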

In [None]:
avg_fitness, trace_results = calculate_conformance(event_log, petri_net)

print("=== Conformance Results ===")
print(f"\nAverage Fitness: {avg_fitness:.3f}")

print("\nPer-trace fitness:")
for i, res in enumerate(trace_results):
    trace = event_log.traces[i]
    status = "CONFORMING" if res['fitness'] > 0.99 else "DEVIANT"
    print(f"  {trace.case_id}: Fitness = {res['fitness']:.3f} [{status}]")
    print(f"    Missing tokens: {res['missing_tokens']}, Remaining tokens: {res['remaining_tokens']}")
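For intuition about how missing and remaining tokens turn into a fitness score, here is a minimal sketch of the standard token-replay fitness formula, `0.5 * (1 - missing/consumed) + 0.5 * (1 - remaining/produced)`. It is an assumption that `calculate_conformance` follows this definition; the function `token_replay_fitness` and the token counts in the example are hypothetical and chosen only for illustration.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Standard token-replay fitness for one trace.

    produced/consumed: tokens produced and consumed during replay
    missing: tokens that had to be added artificially to fire a transition
    remaining: tokens left in the net after the replay finished
    """
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Illustrative counts for a six-step sequential trace:
# 7 tokens produced (initial token + one per transition),
# 7 tokens consumed (one per transition + the final token).
perfect = token_replay_fitness(produced=7, consumed=7, missing=0, remaining=0)
deviant = token_replay_fitness(produced=7, consumed=7, missing=1, remaining=1)

print(f"Perfect replay: {perfect:.3f}")
print(f"One missing and one remaining token: {deviant:.3f}")
```

The formula penalizes both kinds of problems equally: missing tokens indicate that the trace did something the model did not allow at that point, while remaining tokens indicate that the trace stopped before the model expected it to.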

## 5. Deviation Statistics

Let's compute detailed deviation statistics to understand where conformance issues occur.

In [None]:
# Compute deviation statistics
total_traces = len(trace_results)
conforming_traces = sum(1 for r in trace_results if r['fitness'] > 0.99)
deviant_traces = total_traces - conforming_traces

print("=== Deviation Statistics ===")
print(f"\nTotal traces: {total_traces}")
print(f"Conforming traces: {conforming_traces} ({conforming_traces/total_traces*100:.1f}%)")
print(f"Deviant traces: {deviant_traces} ({deviant_traces/total_traces*100:.1f}%)")

# Identify activities causing deviations
print("\n=== Activities in Deviant Traces ===")
for i, res in enumerate(trace_results):
    if res['fitness'] <= 0.99:  # complement of the "> 0.99" conforming check
        trace = event_log.traces[i]
        activities = [e.activity for e in trace.events]
        print(f"\n{trace.case_id} (fitness: {res['fitness']:.3f}):")
        print(f"  Activities: {' -> '.join(activities)}")
        
        # Identify potentially unexpected activities
        expected_activities = {'Create Order', 'Confirm Order', 'Create Shipment', 
                              'Ship Goods', 'Send Invoice', 'Receive Payment'}
        unexpected = set(activities) - expected_activities
        if unexpected:
            print(f"  Unexpected activities: {unexpected}")
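The set difference above only reports activities that are unknown to the expected flow. A deviant trace can also skip expected activities or leave the flow early, so the following sketch reports all three aspects. It does not use `erp_processminer`; `EXPECTED_FLOW` and `describe_deviation` are hypothetical names introduced for illustration, and the reference sequence is the standard flow used in this tutorial.

```python
EXPECTED_FLOW = ['Create Order', 'Confirm Order', 'Create Shipment',
                 'Ship Goods', 'Send Invoice', 'Receive Payment']

def describe_deviation(activities, expected=EXPECTED_FLOW):
    """Compare one trace against a reference sequence of activities."""
    unexpected = [a for a in activities if a not in expected]
    missing = [a for a in expected if a not in activities]

    # Index of the first position where the trace differs from the reference.
    first_divergence = next(
        (i for i, (a, e) in enumerate(zip(activities, expected)) if a != e),
        None,
    )
    # Same prefix but different length: the divergence is where the shorter one ends.
    if first_divergence is None and len(activities) != len(expected):
        first_divergence = min(len(activities), len(expected))

    return {
        'unexpected': unexpected,
        'missing': missing,
        'first_divergence': first_divergence,
    }

report = describe_deviation(['Create Order', 'Confirm Order', 'Cancel Order'])
print(f"Unexpected activities: {report['unexpected']}")
print(f"Missing activities: {report['missing']}")
print(f"First divergence at position: {report['first_divergence']}")
```

For case C-02 this shows that the trace not only adds 'Cancel Order' but also never reaches shipment, invoicing, or payment, which gives an auditor more context than the fitness value alone.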

In [None]:
# Fitness distribution summary
print("=== Fitness Distribution ===")
fitness_values = [r['fitness'] for r in trace_results]

print(f"\nMin fitness: {min(fitness_values):.3f}")
print(f"Max fitness: {max(fitness_values):.3f}")
print(f"Average fitness: {sum(fitness_values)/len(fitness_values):.3f}")

# Token statistics
total_missing = sum(r['missing_tokens'] for r in trace_results)
total_remaining = sum(r['remaining_tokens'] for r in trace_results)
print(f"\nTotal missing tokens: {total_missing}")
print(f"Total remaining tokens: {total_remaining}")

## 6. Summary

In this tutorial, we demonstrated conformance checking on an O2C process:

1. **Event Log Creation**: Created a sample O2C event log with both conforming and deviant traces
2. **Process Discovery**: Used Heuristics Miner to discover a Petri net model
3. **Conformance Checking**: Applied token-based replay to compute fitness scores
4. **Deviation Statistics**: Analyzed which traces deviate and why

**Key Findings**:
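- Trace C-02 is expected to have lower fitness than C-01 and C-03: it contains the 'Cancel Order' activity and ends without shipment, invoicing, or payment
- 'Cancel Order' occurs in only one of the three traces, so it is not well represented in the discovered model
- Conformance checking surfaces such deviations, which makes it useful for compliance auditing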

This workflow is **fully reproducible**: running the notebook again will produce identical results.