# 03.0 Querying - Selecting a sample of Extended-OCEL

This notebook demonstrates how to select a representative sample from an extended-OCEL file. The sample serves as a minimal example and makes the data used in this project easier to understand.

The method selects:
- The definition of each `sensorEventType`
- The definition of each `behaviorEventType`
- The definition of each `objectType`
- One `sensorEvent` of each `sensorEventType`
- One `behaviorEvent` of each `behaviorEventType`
- One `object` of each `objectType`

This creates a minimal but complete structural example of your extended-OCEL data.
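The selection rule above can be sketched in a few lines. This is an illustration only, not the implementation in `src.extended_ocel.select_sample`: the function name `sketch_select_sample` is hypothetical, and it assumes that each type definition has a `name` field and that each event or object refers to its type through a `type` field.

```python
# Hypothetical pairing of each type array with its instance array.
PAIRS = [
    ("sensorEventTypes", "sensorEvents"),
    ("behaviorEventTypes", "behaviorEvents"),
    ("objectTypes", "objects"),
]


def sketch_select_sample(data, type_key="type"):
    """Keep every type definition and the first instance of each type.

    Assumption: instances reference their type by name under `type_key`.
    """
    sample = {}
    for types_key, items_key in PAIRS:
        type_defs = data.get(types_key, [])
        sample[types_key] = list(type_defs)

        # Remember only the first instance seen for each type name.
        first_by_type = {}
        for item in data.get(items_key, []):
            first_by_type.setdefault(item.get(type_key), item)

        # Emit instances in the same order as the type definitions.
        sample[items_key] = [
            first_by_type[t["name"]]
            for t in type_defs
            if t["name"] in first_by_type
        ]
    return sample


# Toy input, made up for illustration.
toy = {
    "sensorEventTypes": [{"name": "accelerometer"}, {"name": "activity_type"}],
    "sensorEvents": [
        {"id": "se1", "type": "accelerometer"},
        {"id": "se2", "type": "accelerometer"},
        {"id": "se3", "type": "activity_type"},
    ],
    "behaviorEventTypes": [{"name": "notification"}],
    "behaviorEvents": [
        {"id": "be1", "type": "notification"},
        {"id": "be2", "type": "notification"},
    ],
    "objectTypes": [{"name": "player"}],
    "objects": [{"id": "o1", "type": "player"}, {"id": "o2", "type": "player"}],
}

sample = sketch_select_sample(toy)
print([e["id"] for e in sample["sensorEvents"]])  # ['se1', 'se3']
```

Taking the first instance per type keeps the sketch deterministic; the real function may choose instances differently.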

<b>In the schema folder, you can find an `extended_OCEL-minimal_sample.json` file.</b>

## Setup

In [1]:
import sys
sys.path.append("..")

from src.extended_ocel.select_sample import select_sample, get_sample_statistics, compare_sizes

## Create sample data

In [8]:
sample_data = select_sample(
    input_file="../data/transformed/player_107631_oced_data_time_bouts_notifications_stress_location_linked_bouts_reports_2.json",
    output_file="../schema/minimal_sample.json"
)

## Sample data

In [12]:
import json

# Load JSON file
with open("../schema/minimal_sample.json", 'r') as f:
    data = json.load(f)

# Pretty print with indentation
print(json.dumps(data, indent=2))

{
  "sensorEventTypes": [
    {
      "name": "accelerometer",
      "attributes": [
        {
          "name": "x",
          "type": "number"
        },
        {
          "name": "y",
          "type": "number"
        },
        {
          "name": "z",
          "type": "number"
        },
        {
          "name": "activity_id",
          "type": "string"
        }
      ]
    },
    {
      "name": "activity_type",
      "attributes": [
        {
          "name": "type",
          "type": "string"
        },
        {
          "name": "speed",
          "type": "number"
        },
        {
          "name": "steps",
          "type": "number"
        },
        {
          "name": "walks",
          "type": "number"
        },
        {
          "name": "runs",
          "type": "number"
        },
        {
          "name": "freq",
          "type": "number"
        },
        {
          "name": "distance",
          "type": "number"
        },
        {
          "na

## Display statistics

In [9]:
import pandas as pd
stats_df = pd.DataFrame(list(get_sample_statistics(sample_data).items()), 
                       columns=['Array', 'Count'])
display(stats_df)

|   | Array              | Count |
|---|--------------------|-------|
| 0 | sensorEventTypes   | 4     |
| 1 | behaviorEventTypes | 4     |
| 2 | objectTypes        | 8     |
| 3 | sensorEvents       | 4     |
| 4 | behaviorEvents     | 4     |
| 5 | objects            | 8     |
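The counts above pair up (4 and 4, 4 and 4, 8 and 8), which is what one instance per type would produce. A quick sanity check along those lines might look like the sketch below. The helper names `count_top_level_arrays` and `has_one_instance_per_type` are hypothetical and not part of `src.extended_ocel`; the sketch only assumes that the sample is a dict whose top-level values are lists.

```python
# Hypothetical pairing of each type array with its instance array.
PAIRS = [
    ("sensorEventTypes", "sensorEvents"),
    ("behaviorEventTypes", "behaviorEvents"),
    ("objectTypes", "objects"),
]


def count_top_level_arrays(data):
    """Return the length of every top-level list in the sample dict."""
    return {key: len(value) for key, value in data.items() if isinstance(value, list)}


def has_one_instance_per_type(data):
    """Check that each type array is as long as its instance array.

    Equal lengths are a necessary condition only; they do not prove
    that every instance belongs to a distinct type.
    """
    counts = count_top_level_arrays(data)
    return all(counts.get(t, 0) == counts.get(i, 0) for t, i in PAIRS)


# Toy sample, made up for illustration.
toy_sample = {
    "sensorEventTypes": [{"name": "accelerometer"}],
    "sensorEvents": [{"id": "se1", "type": "accelerometer"}],
    "behaviorEventTypes": [{"name": "notification"}],
    "behaviorEvents": [{"id": "be1", "type": "notification"}],
    "objectTypes": [{"name": "player"}],
    "objects": [{"id": "o1", "type": "player"}],
}

counts = count_top_level_arrays(toy_sample)
print(counts)
print(has_one_instance_per_type(toy_sample))  # True
```

In the notebook, the same check could be run on `sample_data` or on the `data` dict loaded from `../schema/minimal_sample.json`.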
