# 03.1 Querying - Conversion of the Extended to the Standard OCEL format

This notebook demonstrates how to convert an extended OCEL to the standard format. We demonstrate backward compatibility with existing PM4Py methods.

We argue that existing object-centric process mining techniques remain applicable because the proposed extended OCEL format is backward compatible with the established one. Sensor data must be preserved for multi-dimensionality, but it may be omitted from each temporary OCEL profile. An OCEL profile is a specific perspective on the single ground truth OCEL that includes a subset of the behavior event types and objects. Creating an OCEL profile can be viewed as a drill-down approach, in which we go from person-centric health data to a behavior-specific view of a personal health behavior. Therefore, for modeling interpretable behavior models, the sensor dimension is represented by the recognized behavior events or their derived attributes. Note that OCEL profiles are queried from the single ground truth OCEL without extracting new data. Creating a profile therefore causes no actual data loss, because the full data remains preserved in the single ground truth OCEL.
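To make the idea of a profile concrete, the following is a minimal sketch of querying a profile from an in-memory log. It is not part of the project code: the function `make_profile`, the toy log, and the keys `objects`, `id`, `type`, `relationships`, `objectId`, and `qualifier` are assumptions made for illustration. Only `behaviorEvents` and `behaviorEventType` are taken from the notebook itself.

```python
def make_profile(log, event_types, object_types):
    """Return a filtered view of `log`; the ground truth is never modified."""
    # Keep only objects of the requested types.
    objects = [o for o in log.get("objects", []) if o["type"] in object_types]
    kept_ids = {o["id"] for o in objects}

    events = []
    for event in log.get("behaviorEvents", []):
        if event["behaviorEventType"] not in event_types:
            continue
        # Keep only links that point to objects inside the profile.
        links = [r for r in event.get("relationships", []) if r["objectId"] in kept_ids]
        events.append({**event, "relationships": links})

    return {"behaviorEvents": events, "objects": objects}


# Toy ground truth log (hypothetical identifiers and type names).
ground_truth = {
    "behaviorEvents": [
        {"id": "b1", "behaviorEventType": "physical_activity_bout",
         "relationships": [{"objectId": "player_1", "qualifier": "performed_by"},
                           {"objectId": "day_1", "qualifier": "occurred_on"}]},
        {"id": "b2", "behaviorEventType": "notification",
         "relationships": [{"objectId": "player_1", "qualifier": "received_by"}]},
    ],
    "objects": [
        {"id": "player_1", "type": "player"},
        {"id": "day_1", "type": "day"},
    ],
}

profile = make_profile(ground_truth, {"physical_activity_bout"}, {"player"})
print([e["id"] for e in profile["behaviorEvents"]])
print(len(ground_truth["behaviorEvents"]))
```

Because the profile is built as a new dictionary, the ground truth log keeps all of its events and links, which is the sense in which creating a profile loses no data.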

## Setup

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import sys
sys.path.append('..')

from src.extended_ocel.covert_to_ocel import ExtendedOCELToStandardOCELCoverter
from src.extended_ocel.read_json import read_json

## Load Extended OCED from JSON file

In [4]:
extended_ocel_data_file = "../data/transformed/player_107631_oced_data_time_bouts_notifications_stress_location_linked_bouts_reports_2.json"
extended_ocel_data = read_json(extended_ocel_data_file)

## Validate Extended OCEL Format

In [None]:
from src.extended_ocel.validation import apply
# Validate the loaded file against the extended OCEL schema
is_valid, errors = apply(extended_ocel_data_file, "../schema/extended-OCEL.json")
is_valid

In [None]:
errors

## Convert from extended to standard OCEL

In [5]:
converter = ExtendedOCELToStandardOCELCoverter(extended_ocel_data)
ocel_data = converter.convert()
converter.save_to_file("../data/ocel/standard_ocel_data_linked3.jsonocel")

## Validate Standard OCEL Format

In [None]:
from pm4py.objects.ocel.validation import jsonocel
# Validate the file written by the converter in the previous step
validation_result = jsonocel.apply("../data/ocel/standard_ocel_data_linked3.jsonocel", "../schema/OCEL-2.0-Standard.json")
print(validation_result)

## Visualization of Data Loss

### Event Type Distribution

In [None]:
# Create visualizations
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Original event types
sensor_types = pd.Series([e['sensorEventType'] for e in converter.extended_ocel_data['sensorEvents']]).value_counts()
behavior_types = pd.Series([e['behaviorEventType'] for e in converter.extended_ocel_data['behaviorEvents']]).value_counts()

# Plot sensor events
bars1 = ax1.bar(sensor_types.index, sensor_types.values, label='Sensor Events', alpha=0.6)
# Add count numbers on top of sensor event bars
for bar in bars1:
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}',
             ha='center', va='bottom')

# Plot behavior events
bars2 = ax1.bar(behavior_types.index, behavior_types.values, label='Behavior Events', alpha=0.6)
# Add count numbers on top of behavior event bars
for bar in bars2:
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}',
             ha='center', va='bottom')

ax1.set_title('OCEL-mHealth Event Types')
ax1.legend()
ax1.tick_params(axis='x', rotation=45)

# Converted event types
ocel_types = pd.Series([e['type'] for e in ocel_data['events']]).value_counts()
bars3 = ax2.bar(ocel_types.index, ocel_types.values)
# Add count numbers on top of OCEL event bars
for bar in bars3:
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}',
             ha='center', va='bottom')

ax2.set_title('OCEL Event Types')
ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
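
The same comparison can also be checked numerically instead of visually. The sketch below uses small toy dictionaries in place of `converter.extended_ocel_data` and `ocel_data`; the event type names and the helper `type_counts` are made up for illustration, and only the keys `sensorEvents`, `sensorEventType`, `behaviorEvents`, `behaviorEventType`, `events`, and `type` come from the cell above.

```python
from collections import Counter


def type_counts(events, key):
    """Count events per type, reading the type from the given key."""
    return Counter(event[key] for event in events)


# Toy stand-ins for the extended log and the converted standard log.
extended = {
    "sensorEvents": [
        {"sensorEventType": "heartrate"},
        {"sensorEventType": "heartrate"},
        {"sensorEventType": "location"},
    ],
    "behaviorEvents": [
        {"behaviorEventType": "bout"},
        {"behaviorEventType": "bout"},
        {"behaviorEventType": "notification"},
    ],
}
standard = {
    "events": [{"type": "bout"}, {"type": "bout"}, {"type": "notification"}],
}

sensor_counts = type_counts(extended["sensorEvents"], "sensorEventType")
behavior_counts = type_counts(extended["behaviorEvents"], "behaviorEventType")
standard_counts = type_counts(standard["events"], "type")

# Event types that appear only on the sensor side of the extended log.
dropped_types = set(sensor_counts) - set(standard_counts)
# True when every behavior event type survives with the same count.
behavior_preserved = behavior_counts == standard_counts

print(sorted(dropped_types), behavior_preserved)
```

Applied to the real data, a `behavior_preserved` value of `False` would point to behavior events that were lost or renamed during conversion.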

## Key Takeaways

1. **Data Loss** (in the converted standard OCEL only; the ground truth OCEL is unchanged):
   - All sensor events and their relationships, including those to behavior events, are dropped
   - Numeric attribute values are converted to strings

2. **Preserved Data**:
   - All behavior events are maintained
   - All objects and their attributes are preserved
   - Object relationships are maintained
   - Temporal information is preserved

3. **Implications**:
   - OCEL-mHealth provides richer data for sensor-based analysis
   - OCEL compatibility ensures existing process mining tools can be used
   - The conversion is lossless for behavior events and objects
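
The takeaways above can be illustrated on a toy log. The following is only a sketch under stated assumptions and does not reproduce the implementation of `ExtendedOCELToStandardOCELCoverter`: the function `to_standard`, the sample data, and the fields `id`, `time`, `attributes`, `relationships`, `objectId`, `qualifier`, and in particular the hypothetical `sensorEventId` link are invented for illustration.

```python
def to_standard(extended):
    """Toy conversion: keep behavior events and objects, drop sensor data."""
    events = []
    for event in extended.get("behaviorEvents", []):
        events.append({
            "id": event["id"],
            "type": event["behaviorEventType"],
            "time": event["time"],  # temporal information is kept as-is
            # Attribute values are serialized as strings.
            "attributes": [
                {"name": a["name"], "value": str(a["value"])}
                for a in event.get("attributes", [])
            ],
            # Only links to objects are kept; links to sensor events
            # (hypothetical "sensorEventId" key) have no counterpart.
            "relationships": [
                r for r in event.get("relationships", []) if "objectId" in r
            ],
        })
    # Sensor events are simply not copied to the result.
    return {"events": events, "objects": list(extended.get("objects", []))}


extended = {
    "sensorEvents": [{"id": "s1", "sensorEventType": "heartrate"}],
    "behaviorEvents": [{
        "id": "b1",
        "behaviorEventType": "bout",
        "time": "2021-05-01T10:00:00Z",
        "attributes": [{"name": "stress_level", "value": 3.5}],
        "relationships": [
            {"objectId": "player_1", "qualifier": "performed_by"},
            {"sensorEventId": "s1", "qualifier": "derived_from"},
        ],
    }],
    "objects": [{"id": "player_1", "type": "player"}],
}

standard = to_standard(extended)
print(standard["events"][0]["attributes"])
print(standard["events"][0]["relationships"])
```

In this sketch the behavior event keeps its identifier, type, timestamp, and object link, while the sensor event and the link to it disappear and the numeric attribute value becomes a string.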