## 🌀 Quick Start Guide: It's all starting to unravel!

### ‼️ NEW PYTORCH VERSION

In this example we'll run through all the basic features the `unravelsports` package offers for converting a `kloppy` dataset of soccer tracking data into graphs for training binary classification graph neural networks using PyTorch Geometric and PyTorch Lightning.

This guide will go through the following steps:

- [**1. Process Data**](#1-process-data). We'll show how to load a `kloppy` dataset and convert each individual frame into a single graph. All necessary steps (like setting the correct coordinate system and left-right normalization) are done under the hood of the converter.
- [**1.1 Split Data**](#11-split-data). We split the dataset into training, test and validation sets.
- [**2. Initialize Model**](#2-initialize-model). We initialize the built-in binary classification model as presented in [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn).
- [**3. Train Model**](#3-train-model). We train the initialized model on the training set created in step [1.1 Split Data](#11-split-data).
- [**4. Evaluate Model Performance**](#4-evaluate-model-performance). We calculate model performance using the metrics defined in the model.
- [**5. Predict**](#5-predict). Finally, we apply the trained model to unseen data.
- [**6. Save & Load Model**](#6-save--load-model). Learn how to save and reload your trained models.

<br>
<i>Before we get started, it is important to note that the <b>unravelsports</b> library does not have built-in functionality to create binary labels; these need to be supplied by the user. In this example we use the <b>add_dummy_labels()</b> functionality that comes with the package. It creates a single binary label for each frame by randomly assigning it a 0 or 1.

When supplying your own labels, they need to be in the form of a dictionary (more information on this can be found in the [in-depth Walkthrough](1_kloppy_gnn_train.ipynb)).</i>
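
To make the idea of random per-frame labels concrete, here is a minimal sketch in plain Python. It assumes a flat `{frame_id: label}` mapping and uses a hypothetical helper `make_dummy_labels`; the exact dictionary layout the package expects is described in the walkthrough and may differ from this toy version.

```python
import random


def make_dummy_labels(frame_ids, seed=None):
    """Assign a random 0/1 label to every frame id (hypothetical helper)."""
    rng = random.Random(seed)
    return {frame_id: rng.randint(0, 1) for frame_id in frame_ids}


# Five made-up frame ids; real ids come from the tracking data.
labels = make_dummy_labels(range(10000, 10005), seed=0)
print(labels)
```

Because the labels carry no signal, any model trained on them should end up close to chance level.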



-----


The first step is to install `unravelsports` and the PyTorch packages used in this guide, if you haven't already!


In [None]:
!pip install unravelsports torch torch-geometric pytorch-lightning torchmetrics --quiet

### 1. Process Data

1. Load [Kloppy](https://github.com/PySport/kloppy) dataset. 
    See [in-depth Tutorial](1_kloppy_gnn_train.ipynb) on how to process multiple match files, and to see an overview of all possible settings.
2. Convert to Graph format using `SoccerGraphConverter`
3. Create dataset for easy processing with PyTorch Geometric

In [1]:
from unravel.soccer import SoccerGraphConverter, KloppyPolarsDataset
from unravel.utils import GraphDataset

from kloppy import sportec

# Load Kloppy dataset
kloppy_dataset = sportec.load_open_tracking_data(only_alive=True, limit=500)
kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=kloppy_dataset,
)
kloppy_polars_dataset.add_dummy_labels()
kloppy_polars_dataset.add_graph_ids(by=["frame_id"])

# Initialize the Graph Converter with dataset
# Here we use the default settings
converter = SoccerGraphConverter(dataset=kloppy_polars_dataset)

# Compute the graphs and add them to the GraphDataset
pyg_graphs = converter.to_pytorch_graphs()
dataset = GraphDataset(graphs=pyg_graphs, format="pyg")
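
Conceptually, `add_graph_ids(by=["frame_id"])` means that all tracking rows sharing a frame id end up in the same graph. The sketch below illustrates that grouping in plain Python with made-up rows and a hypothetical helper `group_rows_by_frame`; it is not the converter's implementation, which also builds edges and node features.

```python
from collections import defaultdict

# Hypothetical tracking rows: (frame_id, object_id, x, y)
rows = [
    (10000, "home_1", -10.0, 5.0),
    (10000, "away_1", 12.5, -3.0),
    (10000, "ball", 0.0, 0.0),
    (10001, "home_1", -9.8, 5.1),
    (10001, "away_1", 12.4, -2.9),
    (10001, "ball", 0.3, 0.1),
]


def group_rows_by_frame(rows):
    """Collect the nodes (players and ball) that belong to each frame."""
    graphs = defaultdict(list)
    for frame_id, object_id, x, y in rows:
        graphs[frame_id].append((object_id, x, y))
    return dict(graphs)


nodes_per_graph = group_rows_by_frame(rows)
print({frame_id: len(nodes) for frame_id, nodes in nodes_per_graph.items()})
# {10000: 3, 10001: 3}
```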



#### 1.1 Split Data

Split the dataset with the built-in `split_test_train_validation` method.

In [2]:
train, test, val = dataset.split_test_train_validation(
    split_train=4, split_test=1, split_validation=1, random_seed=43
)
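
To see what a 4:1:1 split means in terms of sample counts, here is a small sketch in plain Python. The helper `split_by_ratio` is hypothetical and only mimics the idea of a seeded, ratio-based split; the library's own method may shuffle and round differently. We assume 500 graphs, matching `limit=500` above.

```python
import random


def split_by_ratio(items, ratios, seed):
    """Shuffle items with a fixed seed and cut them into train/test/validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_test = len(shuffled) * ratios[1] // total

    train_part = shuffled[:n_train]
    test_part = shuffled[n_train:n_train + n_test]
    val_part = shuffled[n_train + n_test:]  # remainder goes to validation
    return train_part, test_part, val_part


train_ids, test_ids, val_ids = split_by_ratio(range(500), (4, 1, 1), seed=43)
print(len(train_ids), len(test_ids), len(val_ids))  # 333 83 84
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing models.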

### 2. Initialize Model

1. Initialize the `PyGLightningCrystalGraphClassifier` with PyTorch Lightning.
2. Set up callbacks for model checkpointing and early stopping.
3. Initialize the trainer.

Note: The model settings are chosen to reflect the model used in [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn)

In [5]:
from unravel.classifiers import PyGLightningCrystalGraphClassifier
import pytorch_lightning as pyl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

# Initialize the Lightning model
lit_model = PyGLightningCrystalGraphClassifier(
    n_layers=3, channels=128, drop_out=0.5, n_out=1
)

# Set up callbacks
checkpoint_callback = ModelCheckpoint(
    dirpath="models/",
    filename="best-model-{epoch:02d}-{val_auc:.2f}",
    save_top_k=1,
    monitor="val_auc",
    mode="max",
)

early_stop_callback = EarlyStopping(monitor="val_loss", patience=5, mode="min")

# Initialize trainer
trainer = pyl.Trainer(
    max_epochs=10,
    accelerator="auto",  # Automatically uses GPU if available
    callbacks=[checkpoint_callback, early_stop_callback],
)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
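
The `EarlyStopping` callback above watches `val_loss` and stops training once it has not improved for `patience=5` consecutive validation checks. The sketch below reproduces that rule in plain Python with a hypothetical helper `early_stop_epoch` and made-up loss values; it is a simplification that ignores options such as `min_delta`.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # new best value resets the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value occurs at epoch 1.
losses = [0.70, 0.69, 0.695, 0.70, 0.70, 0.71, 0.72, 0.73]
print(early_stop_epoch(losses, patience=5))  # 6
```

Note that the checkpoint callback monitors `val_auc` while early stopping monitors `val_loss`, so the saved "best" model and the stopping point are decided by different metrics.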


### 3. Train Model

1. Create PyTorch Geometric `DataLoader` for training and validation sets.
2. Train the model using PyTorch Lightning's `trainer.fit()`.

In [6]:
from torch_geometric.loader import DataLoader

batch_size = 32

# Create data loaders
loader_tr = DataLoader(train, batch_size=batch_size, shuffle=True)
loader_va = DataLoader(val, batch_size=batch_size, shuffle=False)

# Train the model
trainer.fit(lit_model, loader_tr, loader_va)


  | Name      | Type                      | Params | Mode  | FLOPs
------------------------------------------------------------------------
0 | model     | PyGCrystalGraphClassifier | 328 K  | train | 0    
1 | criterion | BCELoss                   | 0      | train | 0    
2 | train_auc | BinaryAUROC               | 0      | train | 0    
3 | train_acc | BinaryAccuracy            | 0      | train | 0    
4 | val_auc   | BinaryAUROC               | 0      | train | 0    
5 | val_acc   | BinaryAccuracy            | 0      | train | 0    
6 | test_auc  | BinaryAUROC               | 0      | train | 0    
7 | test_acc  | BinaryAccuracy            | 0      | train | 0    
------------------------------------------------------------------------
328 K     Trainable params
0         Non-trainable params
328 K     Total params
1.315     Total estimated model params size (MB)
27        Modules in train mode
0         Modules in eval mode
0         Total Flops


Epoch 8: 100%|██████████| 11/11 [00:00<00:00, 42.73it/s, v_num=1, val_loss=0.695, val_auc=0.446, val_acc=0.429, train_loss=0.702, train_auc=0.503, train_acc=0.492]
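
The `11/11` in the progress bar is the number of batches per epoch. As a quick check, the sketch below computes it from the dataset size and `batch_size`. The training set size of 333 is an assumption based on a 4:1:1 split of 500 frames; the 83 test samples would likewise give 3 batches.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


batch_size = 32
print(batches_per_epoch(333, batch_size))  # 11
print(batches_per_epoch(83, batch_size))   # 3
```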


### 4. Evaluate Model Performance

1. Create a PyTorch Geometric `DataLoader` for the test set.
2. Use `trainer.test()` to evaluate. This automatically uses the metrics defined in the Lightning module.

Note: Performance is close to chance level (AUC ≈ 0.5) because we're using random labels, very few epochs and a small dataset.

In [7]:
loader_te = DataLoader(test, batch_size=batch_size, shuffle=False)

# Test and get metrics
test_results = trainer.test(lit_model, loader_te)
print(test_results)
# Output: a list with one dict of metrics per test dataloader, e.g.
# [{'test_loss': 0.692, 'test_auc': 0.521, 'test_acc': 0.530}]

/Users/jbekkers/PycharmProjects/unravelsports/.venv311-test/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:434: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.


Testing DataLoader 0: 100%|██████████| 3/3 [00:00<00:00, 10.24it/s]
────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────
        test_acc            0.5301204919815063
        test_auc            0.5209789872169495
        test_loss           0.6924476623535156
────────────────────────────────────────────────────────────

/Users/jbekkers/PycharmProjects/unravelsports/.venv311-test/lib/python3.11/site-packages/pytorch_lightning/utilities/data.py:79: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 437. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
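
A test loss of roughly 0.692 is what we would expect from a model that has learned nothing: binary cross-entropy for a constant prediction of 0.5 equals ln(2) ≈ 0.693, whatever the labels are. The sketch below verifies this with a hand-written `binary_cross_entropy` helper (hypothetical, written for illustration rather than taken from the library) and a few made-up labels.

```python
import math


def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [0, 1, 0, 1]
uninformed = binary_cross_entropy(labels, [0.5] * len(labels))
print(round(uninformed, 4))  # 0.6931
```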


### 5. Predict

1. Use unseen data to predict on. In this example we're using the test dataset.
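2. We re-create `loader_te` with `num_workers=15` and `persistent_workers=True`, as suggested by the warning in the previous step. A `DataLoader` can be iterated more than once, so re-creating it is optional.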
3. Predictions come as a list of tensors (one per batch), so we concatenate them.

In [10]:
import torch

loader_te = DataLoader(
    test, batch_size=batch_size, shuffle=False, num_workers=15, persistent_workers=True
)

# Get predictions
predictions = trainer.predict(lit_model, loader_te)

# predictions is a list of tensors (one per batch)
# Concatenate to get all predictions
all_predictions = torch.cat(predictions).cpu().numpy()

print(f"Predictions shape: {all_predictions.shape}")
print(f"First 10 predictions: {all_predictions[:10]}")

Predicting DataLoader 0: 100%|██████████| 3/3 [00:00<00:00, 73.37it/s]
Predictions shape: (83,)
First 10 predictions: [0.49448484 0.49671462 0.49610677 0.4947461  0.49599648 0.4933405
 0.4959804  0.495771   0.49324304 0.49329212]
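
The values above look like probabilities for the positive class (the model summary lists `BCELoss` as the criterion), so turning them into class labels requires a threshold. The sketch below applies a threshold of 0.5 to the ten values printed above; the helper `to_labels` and the choice of 0.5 are assumptions for illustration.

```python
# The first ten predictions printed above.
first_predictions = [
    0.49448484, 0.49671462, 0.49610677, 0.4947461, 0.49599648,
    0.4933405, 0.4959804, 0.495771, 0.49324304, 0.49329212,
]


def to_labels(probabilities, threshold=0.5):
    """Map each probability to 1 if it reaches the threshold, else 0."""
    return [int(p >= threshold) for p in probabilities]


predicted_labels = to_labels(first_predictions)
print(predicted_labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

All ten values sit just below 0.5, which is consistent with a model trained on random labels.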


### 6. Save & Load Model

PyTorch Lightning offers several ways to save and load models.

#### Saving the Model

In [11]:
# Method 1: Using ModelCheckpoint callback (already done during training)
# The best model is automatically saved

# Method 2: Manual save
model_path = "models/my-graph-classifier.ckpt"
trainer.save_checkpoint(model_path)

# Method 3: Save just the model weights (not trainer state)
torch.save(lit_model.state_dict(), "models/my-model-weights.pth")

`weights_only` was not set, defaulting to `False`.
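
For Method 1, it helps to know what the file written by `ModelCheckpoint` is called. To our understanding, Lightning by default prefixes each formatted value in the `filename` template with its metric name. The sketch below mimics that expansion in plain Python with a hypothetical helper `format_checkpoint_name` and example metric values; it is not Lightning's own code, and the path reported by `checkpoint_callback.best_model_path` after training is the authoritative one.

```python
import re


def format_checkpoint_name(template, metrics):
    """Expand '{name:fmt}' into 'name=<value>' and append the .ckpt suffix."""
    # Turn "{epoch:02d}" into "epoch={epoch:02d}" before formatting.
    expanded = re.sub(r"\{(\w+)", r"\1={\1", template)
    return expanded.format(**metrics) + ".ckpt"


name = format_checkpoint_name(
    "best-model-{epoch:02d}-{val_auc:.2f}",
    {"epoch": 8, "val_auc": 0.446},  # example values, not real results
)
print(name)  # best-model-epoch=08-val_auc=0.45.ckpt
```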


#### Loading a Saved Model

In [13]:
# Method 1: Load from Lightning checkpoint (Recommended)
loaded_model = PyGLightningCrystalGraphClassifier.load_from_checkpoint(
    "models/my-graph-classifier.ckpt"
)

# Create new trainer for loaded model
new_trainer = pyl.Trainer(accelerator="auto")

# Make predictions
loader_te = DataLoader(
    test, batch_size=32, shuffle=False, num_workers=15, persistent_workers=True
)
predictions = new_trainer.predict(loaded_model, loader_te)
all_predictions = torch.cat(predictions).cpu().numpy()

print(f"Loaded model predictions shape: {all_predictions.shape}")

💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores


Predicting DataLoader 0: 100%|██████████| 3/3 [00:00<00:00, 27.66it/s]
Loaded model predictions shape: (83,)


In [14]:
# Method 2: Load just weights (requires you to create model first)
loaded_model = PyGLightningCrystalGraphClassifier(
    n_layers=3, channels=128, drop_out=0.5, n_out=1
)
loaded_model.load_state_dict(torch.load("models/my-model-weights.pth"))
loaded_model.eval()

# Initialize lazy layers before using
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loaded_model.to(device)
sample_batch = next(iter(loader_te))
sample_batch = sample_batch.to(device)
with torch.no_grad():
    _ = loaded_model(
        sample_batch.x,
        sample_batch.edge_index,
        sample_batch.edge_attr,
        sample_batch.batch,
    )