### Benchmark Segments

The segments below are used to filter the FIFA World Cup 2022 final from the StatsBomb Open World Cup Event Data.

Note: This notebook requires a separate installation, because certain dependencies and versions of `kloppy` and `socceraction` conflict.

Note 2: We load the data in SPADL format, but this does _not_ include computing the actual (AtomicVAEP) values. These can be found in [benchmarks.csv](benchmark.csv).

In [None]:
!pip install -r ../requirements-events.txt

### Event Dataset

We convert the events to SPADL. If you want to use Kloppy directly, make sure to include only the relevant events. Note: the `action_id` values listed under `remove_action_ids` are internal to SPADL.

In [2]:
from datetime import timedelta

# [STATSBOMB EVENT DATA] We have slightly different event time sec
EVENT_SEGMENTS = [
    {
        "id": "1",
        "timestamp": timedelta(minutes=16, seconds=8, milliseconds=400),
        "end_timestamp": timedelta(minutes=16, seconds=16, microseconds=0),
        "period_id": 1,
        "team_id": "779",
    },
    {
        "id": "2",
        "timestamp": timedelta(minutes=12, seconds=48, microseconds=1155),
        "end_timestamp": timedelta(minutes=13, seconds=18, microseconds=50000),
        "period_id": 1,
        "team_id": "771",
    },
    {
        "id": "3",
        "timestamp": timedelta(minutes=20, seconds=39),
        "end_timestamp": timedelta(minutes=20, seconds=48, microseconds=500_000),
        "period_id": 1,
        "team_id": "779",
    },
    {
        "id": "4",
        "timestamp": timedelta(minutes=34, seconds=0),
        "end_timestamp": timedelta(minutes=34, seconds=17, microseconds=0),
        "period_id": 1,
        "team_id": "779",
    },
    {
        "id": "5",
        "timestamp": timedelta(minutes=13, seconds=15),
        "end_timestamp": timedelta(minutes=13, seconds=21),
        "period_id": 2,
        "team_id": "779",
        "remove_action_ids": [1306],
    },
    {
        "id": "6",
        "timestamp": timedelta(minutes=16, seconds=35, microseconds=500_000),
        "end_timestamp": timedelta(minutes=16, seconds=54),
        "period_id": 2,
        "team_id": "771",
    },
    {
        "id": "7",
        "timestamp": timedelta(minutes=17, seconds=15, microseconds=900_000),
        "end_timestamp": timedelta(minutes=17, seconds=25, microseconds=500_000),
        "period_id": 2,
        "team_id": "779",
    },
    {
        "id": "8",
        "timestamp": timedelta(minutes=17, seconds=55, microseconds=500_000),
        "end_timestamp": timedelta(minutes=18, seconds=5, microseconds=500_000),
        "period_id": 2,
        "team_id": "771",
    },
    {
        "id": "9",
        "timestamp": timedelta(minutes=33, seconds=6, microseconds=0),
        "end_timestamp": timedelta(minutes=33, seconds=13, microseconds=500_000),
        "period_id": 2,
        "team_id": "771",
    },
    {
        "id": "10",
        "timestamp": timedelta(minutes=35, seconds=45, microseconds=500_000),
        "end_timestamp": timedelta(minutes=35, seconds=60, microseconds=600_000),
        "period_id": 2,
        "team_id": "771",
        "remove_action_ids": [1777],
    },
]
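
The segment bounds are `timedelta` objects that are later turned into seconds with `total_seconds()`. The sketch below, using only the standard library, shows how a few of the bounds defined above normalise. `to_period_seconds` is a hypothetical helper name introduced for illustration, and reading the value as "seconds since the start of the period" is an assumption based on how the bounds are compared with `time_seconds` further down.

```python
from datetime import timedelta


def to_period_seconds(ts: timedelta) -> float:
    """Hypothetical helper: express a segment bound as seconds (float)."""
    return ts.total_seconds()


# Segment "1" start: milliseconds=400 is the same as microseconds=400_000.
start_1 = timedelta(minutes=16, seconds=8, milliseconds=400)
assert start_1 == timedelta(minutes=16, seconds=8, microseconds=400_000)

# Segment "10" end: seconds=60 is valid and rolls over into the next minute.
end_10 = timedelta(minutes=35, seconds=60, microseconds=600_000)
assert end_10 == timedelta(minutes=36, microseconds=600_000)

# Segment "2" start: microseconds=1155 is roughly 1.2 milliseconds.
start_2 = timedelta(minutes=12, seconds=48, microseconds=1155)

print(to_period_seconds(start_1))   # 968.4
print(to_period_seconds(end_10))    # 2160.6
print(to_period_seconds(start_2))   # 768.001155
```

Because `timedelta` normalises its arguments, mixing `milliseconds`, `microseconds` and an overflowing `seconds` value in the segment list is harmless, although it makes the bounds slightly harder to compare by eye.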

In [3]:
from kloppy import statsbomb

# match id of the World Cup Final 2022
match_id = 3869685
event_dataset = statsbomb.load_open_data(
    match_id=match_id,
)


You are about to use StatsBomb public data.
By using this data, you are agreeing to the user agreement. 
The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf



In [4]:
from socceraction.spadl.kloppy import convert_to_actions
import socceraction.atomic.spadl as atomicspadl

spadl_actions = convert_to_actions(event_dataset)
atomic = atomicspadl.convert_to_atomic(spadl_actions)
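
Each segment selects atomic actions by four criteria: a time window, the period, the team, and an optional list of action ids to drop. As a dependency-free sketch of that selection logic, the snippet below applies the same predicate to a few made-up rows. The `in_segment` function and all rows in `actions` are hypothetical and only illustrate the rules; they are not taken from the match data. Both time bounds are treated as inclusive, which should match the default behaviour of polars `is_between` used in the next cell.

```python
from datetime import timedelta


def in_segment(action: dict, segment: dict) -> bool:
    """Hypothetical predicate mirroring the four segment criteria."""
    start = segment["timestamp"].total_seconds()
    end = segment["end_timestamp"].total_seconds()
    return (
        start <= action["time_seconds"] <= end  # inclusive time window
        and action["period_id"] == segment["period_id"]
        and action["team_id"] == segment["team_id"]
        and action["action_id"] not in segment.get("remove_action_ids", [])
    )


# Segment "5" as defined in EVENT_SEGMENTS.
segment = {
    "id": "5",
    "timestamp": timedelta(minutes=13, seconds=15),
    "end_timestamp": timedelta(minutes=13, seconds=21),
    "period_id": 2,
    "team_id": "779",
    "remove_action_ids": [1306],
}

# Made-up rows for illustration only (not real match data).
actions = [
    {"action_id": 1304, "time_seconds": 795.0, "period_id": 2, "team_id": "779"},
    {"action_id": 1305, "time_seconds": 797.2, "period_id": 2, "team_id": "771"},
    {"action_id": 1306, "time_seconds": 798.0, "period_id": 2, "team_id": "779"},
    {"action_id": 1307, "time_seconds": 801.0, "period_id": 2, "team_id": "779"},
    {"action_id": 1308, "time_seconds": 796.0, "period_id": 1, "team_id": "779"},
    {"action_id": 1309, "time_seconds": 801.5, "period_id": 2, "team_id": "779"},
]

kept = [a["action_id"] for a in actions if in_segment(a, segment)]
print(kept)  # [1304, 1307]
```

Note that `team_id` is compared as a string here, as in the segment definitions, so an integer `779` would not match `"779"` in this plain-Python sketch.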

In [6]:
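import polars as pl

# Convert the pandas DataFrame to polars once, instead of on every iteration.
atomic_pl = pl.from_dataframe(atomic)

for segment in EVENT_SEGMENTS:
    # Keep the actions inside the segment's time window, for the matching
    # period and team, minus any explicitly removed action ids.
    events = atomic_pl.filter(
        pl.col("time_seconds").is_between(
            segment["timestamp"].total_seconds(),
            segment["end_timestamp"].total_seconds(),
        )
        & (pl.col("period_id") == segment["period_id"])
        & (pl.col("team_id") == segment["team_id"])
        & (~pl.col("action_id").is_in(segment.get("remove_action_ids", [])))
    )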