## 🌀 unravel BigDataBowl data!

First run `pip install unravelsports` if you haven't already!

-------

In [None]:
%pip install unravelsports --quiet

This basic guide shows the differences between loading and converting the BigDataBowl data using the `AmericanFootballGraphConverter` compared to the `SoccerGraphConverter`. For the remaining functionalities please reference the [Quick-Start Guide](0_quick_start_guide.ipynb) or [Section 5](1_kloppy_gnn_train.ipynb#5-creating-a-custom-graph-dataset) and beyond of the Walkthrough.

If anything is unclear, please read through [the Soccer example](1_kloppy_gnn_train.ipynb) in its entirety.

-------

### BigDataBowl data

Before you can get started you have to download the BigDataBowl files from [Kaggle](https://www.kaggle.com/competitions/nfl-big-data-bowl-2025/data) after agreeing to the terms and conditions of the BigDataBowl. 

Please note that development of this implementation started prior to the BigDataBowl 2025 data release and was done using BigDataBowl 2023 data. This implementation supports the 2025 dataset, but the 2025 tracking data files have roughly 7x as many frames (about 7,100,000 compared to 1,100,000), because of the BEFORE_SNAP and AFTER_SNAP frame types. It is highly advised to walk through this notebook with 2023 data first, because it will run much faster (roughly 2 minutes compared to 15 minutes for one tracking data csv on an M1 MacBook). After you've tested it, feel free to progress with the 2025 dataset.


-------

### American Football as Graphs

Although a lot of the functionality is similar to the Soccer implementation, the American Football implementation has some clear differences. These stem from the different file structure, the lack of a [Kloppy](https://github.com/PySport/kloppy)-like library for American Football data, and the different requirements of the two sports. Most importantly, the American Football implementation uses Polars under the hood!

### Contents:

- [**1. BigDataBowlDataset**](#1-bigdatabowldataset)
- [**2. AmericanFootballGraphConverter**](#2-americanfootballgraphconverter)
- [**3. Spektral Graphs**](#3-spektral-graphs)

ℹ️ [**Graphs FAQ**](graphs_faq.md)

-------


### 1. BigDataBowlDataset

- The `AmericanFootballGraphConverter` expects a Polars DataFrame. You can load this with the `BigDataBowlDataset` class.
- The `BigDataBowlDataset` class takes 3 parameters: the paths (str) to the tracking, players and plays files, respectively.
- The `BigDataBowlDataset` standardizes every play so that the attack runs from left to right, and it adjusts orientation (`o`) and direction (`dir`) accordingly. It also joins `plays.csv` and `players.csv` to the tracking data, to get the team in possession from the former and the player positions (QB identifier), weight and height from the latter. Finally, it converts weight to kilograms and height to centimeters.
- We add `graph_id`s to the `BigDataBowlDataset` by calling `.add_graph_ids()`. We don't have to use dummy graph ids as we do for ⚽ because the data comes with `playId`s. It's recommended to assign graph identifiers at least by `gameId` and `playId` (the default behaviour). 
- We add dummy labels by calling `.add_dummy_labels()`. It's still up to the user to define the _actual_ labels to train on. `add_dummy_labels` adds a column `"label"` to the Polars DataFrame. To add your own labels, simply remove the `.add_dummy_labels()` call and add your own label column.
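
The left-to-right standardization described above can be pictured with a small sketch. This is not the library's implementation: it assumes coordinates centered on the middle of the field and angles in radians, and `wrap_angle` and `mirror_play` are hypothetical helper names used only for illustration.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def mirror_play(x: float, y: float, angle: float, play_direction: str):
    """Rotate a play by 180 degrees when the offense moves to the left.

    Assumption: (0, 0) is the center of the field, so a 180 degree rotation
    is a sign flip of both coordinates plus a half turn of the angle.
    """
    if play_direction == "left":
        return -x, -y, wrap_angle(angle + math.pi)
    return x, y, wrap_angle(angle)


# Hypothetical values, not taken from the dataset
flipped = mirror_play(x=10.0, y=-5.0, angle=math.pi / 2, play_direction="left")
unchanged = mirror_play(x=10.0, y=-5.0, angle=math.pi / 2, play_direction="right")
```

Under these assumptions a play moving left ends up pointing the same way as a play moving right, which is why orientation and direction have to be adjusted together with the coordinates.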

In [1]:
from unravel.american_football import BigDataBowlDataset, AmericanFootballGraphConverter

bdb = BigDataBowlDataset(
    tracking_file_path=".data/nfl-big-data-bowl-2023/week1.csv",
    players_file_path=".data/nfl-big-data-bowl-2023/players.csv",
    plays_file_path=".data/nfl-big-data-bowl-2023/plays.csv",
)
bdb.load()
bdb.add_graph_ids(by=["gameId", "playId"], column_name="graph_id")
bdb.add_dummy_labels(by=["gameId", "playId", "frameId"], column_name="label")

gameId,playId,nflId,frameId,time,jerseyNumber,team,playDirection,x,y,s,a,dis,o,dir,event,officialPosition,height_cm,weight_kg,possessionTeam,graph_id,label
i64,i64,f64,i64,datetime[μs],i64,str,str,f64,f64,f64,f64,f64,f64,f64,str,str,f64,f64,str,str,i64
2021090900,97,25511.0,1,2021-09-10 00:26:31.100,12,"""TB""","""right""",-22.23,-2.43,0.29,0.3,0.03,2.882586,1.483355,"""None""","""QB""",193.04,102.0582,"""TB""","""2021090900-97""",1
2021090900,97,25511.0,2,2021-09-10 00:26:31.200,12,"""TB""","""right""",-22.22,-2.43,0.23,0.11,0.02,2.8681,1.620887,"""None""","""QB""",193.04,102.0582,"""TB""","""2021090900-97""",1
2021090900,97,25511.0,3,2021-09-10 00:26:31.300,12,"""TB""","""right""",-22.22,-2.41,0.16,0.1,0.01,2.796716,1.196423,"""None""","""QB""",193.04,102.0582,"""TB""","""2021090900-97""",1
2021090900,97,25511.0,4,2021-09-10 00:26:31.400,12,"""TB""","""right""",-22.27,-2.4,0.15,0.24,0.06,2.655169,-1.102175,"""None""","""QB""",193.04,102.0582,"""TB""","""2021090900-97""",0
2021090900,97,25511.0,5,2021-09-10 00:26:31.500,12,"""TB""","""right""",-22.31,-2.39,0.25,0.18,0.04,2.588847,-1.264491,"""None""","""QB""",193.04,102.0582,"""TB""","""2021090900-97""",1
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
2021091300,4845,-9999.9,30,2021-09-14 03:54:20.600,,"""football""","""left""",7.22,1.42,3.58,1.95,0.37,,,"""pass_forward""",,,,"""LV""","""2021091300-4845""",1
2021091300,4845,-9999.9,31,2021-09-14 03:54:20.700,,"""football""","""left""",9.69,0.19,17.16,0.25,2.77,,,"""None""",,,,"""LV""","""2021091300-4845""",0
2021091300,4845,-9999.9,32,2021-09-14 03:54:20.800,,"""football""","""left""",11.34,-0.34,17.1,1.05,1.73,,,"""None""",,,,"""LV""","""2021091300-4845""",1
2021091300,4845,-9999.9,33,2021-09-14 03:54:20.900,,"""football""","""left""",12.96,-0.88,16.98,1.67,1.71,,,"""None""",,,,"""LV""","""2021091300-4845""",0
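
The `height_cm` and `weight_kg` values in the output above can be sanity-checked by hand. The sketch below assumes that the raw players file stores height as a `feet-inches` string and weight in pounds; the helper names and the example inputs (`"6-4"`, `225`) are illustrative assumptions, not the library's code.

```python
INCH_TO_CM = 2.54      # exact, by definition
LB_TO_KG = 0.453592    # assumed conversion factor (rounded)


def height_to_cm(height: str) -> float:
    """Convert a 'feet-inches' string such as '6-4' to centimeters."""
    feet, inches = height.split("-")
    total_inches = int(feet) * 12 + int(inches)
    return round(total_inches * INCH_TO_CM, 2)


def weight_to_kg(pounds: float) -> float:
    """Convert a weight in pounds to kilograms."""
    return round(pounds * LB_TO_KG, 4)


# Assumed raw values for a 76 inch, 225 pound player
height_cm = height_to_cm("6-4")
weight_kg = weight_to_kg(225)
```

With these inputs the results match the `193.04` and `102.0582` shown for the first rows of the output.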


### 2. AmericanFootballGraphConverter

ℹ️ For more information on:
- What a Graph is, check out [Graph FAQ Section A](graphs_faq.md)
- What features each Graph has, check out [Graph FAQ Section C](graphs_faq.md)

#### Parameters
- To learn what parameters we can pass to the `AmericanFootballGraphConverter`, check out [Graph FAQ Section B](graphs_faq.md)
- We pass `dataset` as a `BigDataBowlDataset` object.
- The `AmericanFootballGraphConverter` also takes optional `label_col` and `graph_id_col` parameters. These default to "label" and "graph_id" respectively and only need to be specified when they have been changed in the steps prior.
- Although we convert weight and height to kilograms and centimeters, the coordinate system, speed and acceleration remain in the provided formats. Hence, `max_*_speed` and `max_*_acceleration` are in yards/second and yards/second^2 respectively.
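
If you are used to thinking in metric units (for example from the Soccer implementation), a small conversion helper makes it easier to pick values for the `max_*` parameters. This is a minimal sketch; the function names are hypothetical and not part of `unravelsports`.

```python
YARD_IN_METERS = 0.9144  # exact, by definition of the international yard


def yards_to_meters(value: float) -> float:
    """Convert yards (or yards/second, yards/second^2) to the metric equivalent."""
    return value * YARD_IN_METERS


def meters_to_yards(value: float) -> float:
    """Convert meters (or m/s, m/s^2) to the equivalent in yards."""
    return value / YARD_IN_METERS


# The speed limits used in the converter call of this notebook, expressed in m/s
max_player_speed_ms = yards_to_meters(8.0)
max_ball_speed_ms = yards_to_meters(28.0)

# A hypothetical 10 m/s threshold expressed in yards/second
threshold_yards = meters_to_yards(10.0)
```

The same factor applies to accelerations, because only the length unit changes.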

In [2]:
converter = AmericanFootballGraphConverter(
    dataset=bdb,
    label_col="label",
    graph_id_col="graph_id",
    max_player_speed=8.0,
    max_ball_speed=28.0,
    max_player_acceleration=10.0,
    max_ball_acceleration=10.0,
    self_loop_ball=True,
    adjacency_matrix_connect_type="ball",
    adjacency_matrix_type="split_by_team",
    label_type="binary",
    defending_team_node_value=0.0,
    attacking_non_qb_node_value=0.1,
    random_seed=False,
    pad=False,
    verbose=False,
    chunk_size=10_000,
)
spektral_graphs = converter.to_spektral_graphs()

### 3. Spektral Graphs

The `converter` allows for transformation in 3 different ways:
1. `converter.to_graph_frames()` returns a list of dictionary objects, one dict per frame. Each dictionary has the keys `x`, `e`, `a`, `y` and `graph_id`
2. `converter.to_spektral_graphs()` returns a list of Spektral `Graph` objects that can be loaded directly into `CustomSpektralDataset(graphs=converter.to_spektral_graphs())`
3. `converter.to_pickle(file_path=file_path)` stores the converted frames into a `pickle` file. You can load all pickle files directly with `CustomSpektralDataset(pickle_folder=pickle_folder)`
- For a comprehensive list of American Football node and edge features please reference [Graph FAQ Section C](graphs_faq.md)
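
Since every frame dictionary carries a `graph_id`, the frames can be grouped per play, for instance to check how many frames each play contributes. The sketch below uses toy stand-in dictionaries with made-up placeholder values instead of real converter output, and `group_by_graph_id` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_graph_id(frames):
    """Group frame dictionaries by their 'graph_id' key."""
    grouped = defaultdict(list)
    for frame in frames:
        grouped[frame["graph_id"]].append(frame)
    return dict(grouped)


# Toy stand-ins for the dictionaries described above (placeholder values only)
toy_frames = [
    {"x": [[0.1]], "a": [[1]], "e": [[0.5]], "y": [1], "graph_id": "2021090900-97"},
    {"x": [[0.2]], "a": [[1]], "e": [[0.4]], "y": [0], "graph_id": "2021090900-97"},
    {"x": [[0.3]], "a": [[1]], "e": [[0.6]], "y": [1], "graph_id": "2021091300-4845"},
]

frames_per_play = {
    graph_id: len(frames)
    for graph_id, frames in group_by_graph_id(toy_frames).items()
}
```

Grouping by `graph_id` is also a convenient starting point if you want to keep all frames of one play together when creating your own data splits.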

You should now have enough information to continue recreating [Section 5](1_kloppy_gnn_train.ipynb#5-creating-a-custom-graph-dataset) and beyond of the Walkthrough with American Football data!

Warning: As mentioned, the data files for the 2025 BigDataBowl are very large, which means the converted (pickle) files will be even larger.