In [1]:
#| default_exp model.agent_imputer

In [None]:
#| hide
%load_ext autoreload
%autoreload 2
from IPython.core.debugger import set_trace

In [None]:
#| export

import torch
from torch import nn
import model_architecture.Time_LSTM_Module as TimeLSTM
import model_architecture.GNN_Module as GCN
import pytorch_lightning as pl
import itertools

# Agent Imputer Model

<img src="../img/model_schema.png" />
<figcaption>Fig.1 Agent Imputer model architecture.</figcaption>


$N$ = 22 agents. $B$ = batch size. $L$ = sequence length. $I$ = number of features. Hidden layer sizes: $H_{1}$ = 100, $H_{2}$ = 50, $H_{3}$ = 64, $H_{4}$ = 32.

## PROBLEM FORMULATION

We consider a group of $N$ agents, represented as $A = [\mathbf{a}_1, \ldots, \mathbf{a}_N]$. In football there are typically 22 players on the field, so $N=22$. The time-series data is a sequence of $T$ events denoted as $E = [\mathbf{e}_1, \ldots, \mathbf{e}_T]$. The series is non-uniform: each element $e_t \in E$ corresponds to an on-ball action, such as a pass, dribble, or shot, so the gap between consecutive time steps varies.

For each event $e_t \in E$, we have a set of observations denoted as $\Phi_t = [\boldsymbol{\phi}_{t,1}, \ldots, \boldsymbol{\phi}_{t,N}]$, where $\boldsymbol{\phi}_{t,n}$ is the observation of agent $n$ at time step $t$. Thus, we obtain a complete set of observations over time denoted as $\Phi = [\Phi_1, \ldots, \Phi_T]$. However, in our configuration, we only know one value in each $\Phi_t \in \Phi$, meaning that there are $N-1$ missing values for each $\Phi_t$, and a total of $T(N-1)$ missing values in the entire dataset. This is because in football we only observe the on-ball agent at each time step, resulting in only one known observation.

As the observed agent changes over time, we use a binary mask denoted as $M$, where $M_{n,t} = 1$ if agent $n$ is observed at time step $t$, and 0 otherwise. This binary matrix captures all information about which observations are known and which are unknown across the time series.

In this configuration, the objective of the imputation model is to predict values for the unknown observations. Specifically, for each $e_t \in E$, the model predicts a value $\hat{\boldsymbol{\phi}}_{t,n}$ for every $n \in [1, \ldots, N]$, resulting in a complete set of predicted observations denoted as $\hat{\Phi} = [\hat{\Phi}_1, \ldots, \hat{\Phi}_T]$, where $\hat{\Phi}_t = [\hat{\boldsymbol{\phi}}_{t,1}, \ldots, \hat{\boldsymbol{\phi}}_{t,N}]$.

In the context of football, the predicted observations $\hat{\Phi}$ are the estimated locations of all players (both on-ball and off-ball) for every event $e_t \in E$. The known observations are derived from the location of the events that occur. For an event $e_t \in E$ with an on-ball agent $\mathbf{a}_n \in A$, the assigned observation $\boldsymbol{\phi}_{t,n}$ is $\mathbf{e}_{t,x,y}$, where $\mathbf{e}_{t,x,y}$ is the $x,y$ position at which $e_t$ occurred.

To summarise, the set of known observations $\Phi$ (containing missing data) is a $(T \times N \times 2)$ tensor, and the set of imputed observations $\hat{\Phi}$ is a tensor of the same shape with no missing entries.
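As a minimal sketch of this formulation, the snippet below builds the mask $M$ and the sparse observation tensor $\Phi$ for a toy sequence. The sizes (`T = 4`, `N = 3`), the on-ball agent indices, and the event coordinates are made-up illustrative values, and marking missing entries with NaN is our own assumption rather than the encoding used by the data pipeline.

```python
import torch

# Toy sizes (assumptions for illustration): 4 events, 3 agents.
T, N = 4, 3

# Hypothetical on-ball agent index and (x, y) location for each event.
on_ball = torch.tensor([0, 2, 2, 1])
event_xy = torch.tensor([[10.0, 20.0], [35.0, 40.0], [50.0, 30.0], [70.0, 25.0]])

# Mask M with M[n, t] = 1 if agent n is observed at time step t, else 0.
M = torch.zeros(N, T, dtype=torch.long)
M[on_ball, torch.arange(T)] = 1

# Observation tensor Phi of shape (T, N, 2); unknown entries are marked with NaN.
phi = torch.full((T, N, 2), float("nan"))
phi[torch.arange(T), on_ball] = event_xy

# Each event leaves N - 1 agents unobserved, so T * (N - 1) agent positions are missing.
missing = int(torch.isnan(phi[..., 0]).sum())
print(M)
print(missing)
```

Each column of `M` contains exactly one 1, which mirrors the fact that only the on-ball agent is observed at every event.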

## Model components

### 1. Time-Aware LSTM

The LSTM component is a crucial part of our approach: each agent's data is passed separately into a shared bidirectional LSTM, as shown in Fig. 1.
The input data is divided into $N$ per-agent segments of size $(B \times L \times I)$, allowing the LSTM to learn the temporal relationship between the engineered features and agent location.
Sharing the LSTM across all agents helps to overcome the sparsity of agent observations by learning common movement patterns for agents with similar roles.
To handle the irregular time intervals between time steps, we use a [Time-Aware LSTM](https://dl.acm.org/doi/10.1145/3097983.3097997). This architecture adjusts the cell memory so that the contribution of previous (or, in the backward direction, future) actions in the sequence is discounted according to their time difference from the current event. The LSTM has a single hidden layer of size $H_{1}$ = 100.

The source code for the **Time-Aware** LSTM can be accessed in the `Time_LSTM.py` file, located in the `model/model architecture` directory.
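To make the memory adjustment concrete, here is a minimal sketch of the time-based discount described in the Time-Aware LSTM paper linked above: the previous cell state is split into a short-term and a long-term part, and only the short-term part is scaled by a decreasing function of the elapsed time. The class name `MemoryDiscount` is hypothetical, the decay $g(\Delta t) = 1 / \log(e + \Delta t)$ is one of the choices suggested in that paper, and the `TimeLSTM` module used in this notebook may differ in its details.

```python
import math

import torch
from torch import nn


class MemoryDiscount(nn.Module):
    """Hypothetical helper: discount the short-term part of a cell state by elapsed time."""

    def __init__(self, hidden_size):
        super().__init__()
        # Learned decomposition of the cell state into a short-term component.
        self.decomp = nn.Linear(hidden_size, hidden_size)

    def forward(self, c_prev, delta_t):
        # c_prev: (B, H) previous cell state; delta_t: (B,) time since the previous event.
        c_short = torch.tanh(self.decomp(c_prev))
        c_long = c_prev - c_short
        # Decay equals 1 when delta_t = 0 and shrinks as the gap grows.
        decay = 1.0 / torch.log(math.e + delta_t)
        return c_long + c_short * decay.unsqueeze(-1)


torch.manual_seed(0)
disc = MemoryDiscount(hidden_size=100)
c_prev = torch.randn(8, 100)
same_time = disc(c_prev, torch.zeros(8))        # no gap: memory is left unchanged
later = disc(c_prev, torch.full((8,), 100.0))   # long gap: short-term memory is damped
```

With a zero gap the adjusted memory equals the original cell state, so the layer reduces to a standard LSTM update on uniformly sampled data.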

In [None]:
#| export


class seq_lstm(nn.Module):
    def __init__(
        self, input_size=66, hidden_layer_size=100, output_size=50, batch_size=128
    ):
        super().__init__()

        self.hidden_layer_size = hidden_layer_size
        self.lstm = TimeLSTM.TimeLSTM(input_size, hidden_layer_size, bidirectional=True)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        self.batch_size = batch_size
        self.relu = nn.ReLU()

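    def forward(self, input_seq, ts):
        # input_seq: features of a single agent; ts: time information for the same steps
        lstm_out = self.lstm(input_seq, ts)
        # Keep only the last time step and project it to the agent embedding
        outs = self.linear(lstm_out[:, -1, :])
        outs = self.relu(outs)
        return outs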

### 2. GNN

The LSTM component of the network deals with the temporal behavior of each agent. However, to fully understand the impact of inter-agent relationships within the multi-agent system (MAS), we need to incorporate these relationships into the model. To achieve this, we create a fully connected graph where each node represents an agent and the node features are the temporally aggregated agent representations. With this graph structure, we employ a Graph Neural Network (GNN) to allow information sharing between all agents. The GNN comprises two message passing layers with feature sizes of $H_{3}$ = 64 and $H_{4}$ = 32, respectively, and utilizes the SAGEConv operator. The GNN updates the node features by aggregating information about the agent neighborhoods (i.e., agent interactions) using a mean aggregation scheme.

The source code for the **GNN** can be accessed in the `GNN.py` file, located in the `model/model architecture` directory.
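The mean aggregation scheme can be illustrated without a graph library. The sketch below is a plain PyTorch approximation of a SAGE-style update on a fully connected graph without self-loops, where each node combines its own features with the mean of all other nodes' features. The names `mean_neighbour_aggregate` and `SageStyleLayer` are hypothetical, and the actual `GCN` module relies on the SAGEConv operator rather than this code.

```python
import torch
from torch import nn


def mean_neighbour_aggregate(x):
    """For node features x of shape (N, F), return the mean of all *other* nodes per node."""
    n = x.shape[0]
    total = x.sum(dim=0, keepdim=True)
    return (total - x) / (n - 1)


class SageStyleLayer(nn.Module):
    """Hypothetical layer: W_self * x_i + W_neigh * mean of the neighbours of i."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.lin_self = nn.Linear(in_features, out_features)
        self.lin_neigh = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x):
        return self.lin_self(x) + self.lin_neigh(mean_neighbour_aggregate(x))


# Tiny check of the aggregation: three nodes with scalar features 1, 2, 3.
toy = torch.tensor([[1.0], [2.0], [3.0]])
toy_mean = mean_neighbour_aggregate(toy)  # [[2.5], [2.0], [1.5]]

# Two message passing layers with the feature sizes used in the text (50 -> 64 -> 32).
agents = torch.randn(22, 50)
layer1, layer2 = SageStyleLayer(50, 64), SageStyleLayer(64, 32)
out = layer2(torch.relu(layer1(agents)))
print(out.shape)  # torch.Size([22, 32])
```

Because the graph is fully connected, every agent's update depends on all 21 other agents after a single layer; the second layer refines these representations rather than extending the receptive field.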

### Loss function

The loss function measures the average Euclidean distance between the predicted positions and their corresponding ground truth positions.

$$
\mathcal{L} = \frac{1}{BN}\sum_{b=1}^{B}\sum_{n=1}^{N}\sqrt{\sum_{d \in \{x, y\}}(y_{b,n,d}-\hat{y}_{b,n,d})^2}
$$

where $B$ is the batch size, $N$ is the number of agents, $\hat{y}_{b,n,d}$ is coordinate $d$ of the predicted position of agent $n$ in sample $b$, and $y_{b,n,d}$ is the corresponding coordinate of the true position.

In [None]:
#| export


def eucl_loss(output, target):
    loss = (output - target).pow(2).sum(2).sqrt().mean()
    return loss
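A small worked example may help to check the behaviour of `eucl_loss`. The function is repeated here so that the snippet runs on its own, and the toy tensors are made-up values with shape (batch, agents, 2).

```python
import torch


def eucl_loss(output, target):
    # Euclidean distance over the last (coordinate) axis, averaged over batch and agents.
    return (output - target).pow(2).sum(2).sqrt().mean()


# One sample, two agents: the first prediction is off by (3, 4), the second is exact.
pred = torch.tensor([[[3.0, 4.0], [1.0, 1.0]]])
target = torch.tensor([[[0.0, 0.0], [1.0, 1.0]]])

loss = eucl_loss(pred, target)
print(loss.item())  # 2.5, the mean of the distances 5 and 0
```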

## Agent Imputer Lightning Module

This is a PyTorch Lightning implementation of the model. It combines the shared LSTM component for temporal modeling with the graph component (`GCN`) for modeling the interactions between agents in the multi-agent system. The graph component operates on a fully connected graph with the temporally aggregated agent representations as node features, and consists of two message passing layers using the SAGEConv operator. The module defines a training step, a validation step, and a predict step. The optimizer is AdamW with a learning rate of 0.002.

In [None]:
#| export


class AgentImputerLightning(pl.LightningModule):
    def __init__(
        self, input_size=16, hidden_layer_size=100, output_size=2, batch_size=128
    ):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstms = seq_lstm(input_size, hidden_layer_size)
        self.gcn = GCN.GCN(50)
        self.learning_rate = 0.002

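        # Create the edge list of a fully connected graph without self-loops.
        # Registering it as a non-persistent buffer moves it to the module's device
        # without adding it to the checkpoint.
        t1 = list(range(22))
        list_edges = list(itertools.product(t1, t1))
        list_edges = [list(ele) for ele in list_edges if ele[0] != ele[1]]
        self.register_buffer(
            "edges", torch.tensor(list_edges).t().contiguous(), persistent=False
        )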

    def forward(self, input_list, ts_list, edges):
        outputs = torch.cat(
            [self.lstms(x, ts_l) for x, ts_l in zip(input_list, ts_list)], dim=1
        )
        outputs = outputs.reshape(outputs.shape[0], 22, 50)
        gcn_outputs = self.gcn(outputs, edges)
        return gcn_outputs

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate)
        return optimizer

    def training_step(self, train_batch, batch_index):
        x, y, t = train_batch
        input_list = [x.float()[:, i, :, :] for i in range(0, 22)]
        ts_list = [t.float()[:, i, :] for i in range(0, 22)]
        y_pred = self.forward(input_list, ts_list, self.edges)
        loss = eucl_loss(y_pred, y.float())
        self.log(
            "training loss",
            loss,
            on_step=False,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )
        return loss

    def validation_step(self, val_batch, batch_index):
        x, y, t = val_batch
        input_list = [x.float()[:, i, :, :] for i in range(0, 22)]
        ts_list = [t.float()[:, i, :] for i in range(0, 22)]
        y_pred = self.forward(input_list, ts_list, self.edges)
        loss = eucl_loss(y_pred, y.float())
        self.log(
            "validation loss",
            loss,
            on_step=False,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )

    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        x, y, t = batch
        input_list = [x.float()[:, i, :, :] for i in range(0, 22)]
        ts_list = [t.float()[:, i, :] for i in range(0, 22)]
        y_pred = self.forward(input_list, ts_list, self.edges)
        return y_pred
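The batch layout implied by the step methods above can be sketched with random tensors. The batch size `B = 4` and sequence length `L = 5` are illustrative assumptions, `I = 16` follows the default `input_size`, and the shapes of `x` and `t` are inferred from the indexing in the step methods rather than taken from the data loader.

```python
import itertools

import torch

# Illustrative sizes: B and L are assumptions; N = 22 agents and I = 16 features follow the code.
B, N, L, I = 4, 22, 5, 16
x = torch.randn(B, N, L, I)   # engineered features per agent and time step
t = torch.rand(B, N, L)       # time information per agent and time step

# One (B, L, I) segment and one (B, L) time tensor per agent, fed to the shared LSTM.
input_list = [x[:, i, :, :] for i in range(N)]
ts_list = [t[:, i, :] for i in range(N)]

# Fully connected edge index without self-loops: N * (N - 1) directed edges.
pairs = [p for p in itertools.product(range(N), repeat=2) if p[0] != p[1]]
edges = torch.tensor(pairs).t().contiguous()

print(len(input_list), input_list[0].shape, ts_list[0].shape)
print(edges.shape)  # torch.Size([2, 462])
```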

In [None]:
#| hide
from nbdev import nbdev_export

nbdev_export()