# NFL Player Movement Prediction - Big Data Bowl 2026

## Solution Overview
This notebook predicts NFL player movement during pass plays by combining a temporal sequence model, a graph model of player interactions, and gradient-boosted trees on engineered features.

### Architecture
1. **Transformer Backbone (Time Dimension)**: Sequence model for tracking frames
2. **GNN Interaction Layer (Space Dimension)**: Graph neural network for player interactions
3. **Football Features Head (Tabular Brain)**: Engineered features like distances, speeds, angles
4. **Ensemble**: Blend deep models with GBDT (LightGBM/XGBoost)
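
The blending step in item 4 can be sketched as a weighted average of per-model coordinate predictions. This is a minimal illustration, not the notebook's final ensemble: `blend_predictions` and the model keys are hypothetical names, and the weights simply mirror the `transformer_weight`, `gnn_weight`, and `gbdt_weight` defaults in `Config`.

```python
import numpy as np

def blend_predictions(preds, weights):
    """Weighted average of per-model (N, 2) prediction arrays.

    `preds` and `weights` are dicts keyed by model name (hypothetical keys).
    Weights are renormalized over the models present in `preds`.
    """
    missing = set(preds) - set(weights)
    if missing:
        raise KeyError(f"No weight given for: {sorted(missing)}")
    total = sum(weights[name] for name in preds)
    if total <= 0:
        raise ValueError("Weights must sum to a positive value")
    return sum(
        (weights[name] / total) * np.asarray(p, dtype=float)
        for name, p in preds.items()
    )

# Toy example: three models, two (x, y) predictions each
preds = {
    "transformer": np.array([[50.0, 20.0], [52.0, 21.0]]),
    "gnn": np.array([[51.0, 20.0], [52.0, 22.0]]),
    "gbdt": np.array([[49.0, 20.0], [52.0, 20.0]]),
}
weights = {"transformer": 0.4, "gnn": 0.3, "gbdt": 0.3}
blended = blend_predictions(preds, weights)
```

Renormalizing over the models actually supplied means one model can be dropped (for example when `torch_geometric` is unavailable) without rescaling the remaining weights by hand.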

### Evaluation Metric
Root Mean Squared Error (RMSE) between predicted and observed (x, y) coordinates.
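
As a reference point, the metric can be computed as below. This sketch assumes squared errors are pooled over both coordinates and all rows before taking the root; the competition's exact pooling is not stated here, and `rmse_xy` is an illustrative name.

```python
import numpy as np

def rmse_xy(pred, true):
    """RMSE over (N, 2) arrays of (x, y) coordinates.

    Assumption: errors in x and y are pooled, i.e. the mean is taken
    over all 2 * N squared differences.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("pred and true must have the same shape")
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Each prediction is off by 3 yards in x and 4 yards in y
score = rmse_xy([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0], [3.0, 4.0]])
```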

In [None]:
# ============================================================================
# Section 1: Imports and Configuration
# ============================================================================

import os
import warnings
import numpy as np
import pandas as pd
import polars as pl
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Union
from dataclasses import dataclass, field
import pickle
import json

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split, KFold, GroupKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import mean_squared_error

# Deep Learning
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, OneCycleLR

# Graph Neural Networks
try:
    import torch_geometric
    from torch_geometric.nn import GCNConv, GATConv, TransformerConv
    from torch_geometric.data import Data, Batch
    HAS_TORCH_GEOMETRIC = True
except ImportError:
    HAS_TORCH_GEOMETRIC = False
    print("torch_geometric not available. GNN features will be disabled.")

# Gradient Boosting
try:
    import lightgbm as lgb
    HAS_LIGHTGBM = True
except ImportError:
    HAS_LIGHTGBM = False
    print("LightGBM not available. Will use XGBoost or skip GBDT.")

try:
    import xgboost as xgb
    HAS_XGBOOST = True
except ImportError:
    HAS_XGBOOST = False
    print("XGBoost not available.")

warnings.filterwarnings('ignore')

# Configuration
@dataclass
class Config:
    """Configuration for the NFL Movement Prediction model."""
    # Data paths (adjust based on environment)
    data_dir: str = '/kaggle/input/nfl-big-data-bowl-2026-prediction/'
    output_dir: str = './outputs/'
    
    # Model parameters
    random_seed: int = 42
    n_folds: int = 5
    
    # Transformer parameters
    d_model: int = 128
    n_heads: int = 8
    n_encoder_layers: int = 4
    dim_feedforward: int = 512
    dropout: float = 0.1
    max_seq_len: int = 100  # Maximum frames to consider
    
    # GNN parameters
    gnn_hidden_dim: int = 64
    gnn_num_layers: int = 3
    gnn_heads: int = 4
    
    # Training parameters
    batch_size: int = 32
    learning_rate: float = 1e-4
    weight_decay: float = 1e-5
    epochs: int = 50
    patience: int = 10
    
    # Feature engineering
    use_velocity_features: bool = True
    use_acceleration_features: bool = True
    use_angle_features: bool = True
    use_distance_features: bool = True
    use_separation_features: bool = True
    
    # Ensemble weights (will be tuned)
    transformer_weight: float = 0.4
    gnn_weight: float = 0.3
    gbdt_weight: float = 0.3
    
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu'

config = Config()
print(f"Using device: {config.device}")
print(f"torch_geometric available: {HAS_TORCH_GEOMETRIC}")
print(f"LightGBM available: {HAS_LIGHTGBM}")
print(f"XGBoost available: {HAS_XGBOOST}")

## Section 2: Data Loading and Exploration

The NFL tracking data contains player positions at 10 frames per second. We need to:
- Load pre-pass tracking data
- Identify the targeted receiver and ball landing location
- Predict player positions for frames while the ball is in the air
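
The split between model inputs and prediction targets can be sketched per play as follows. This assumes a release frame is known for each play (the synthetic plays table carries one as `passReleaseFrame`) and that the release frame itself counts as input; `split_at_release` is a hypothetical helper.

```python
import pandas as pd

FRAME_RATE_HZ = 10  # tracking frames per second, as stated above

def split_at_release(play_df, release_frame):
    """Split one play's tracking rows into pre-pass and in-air frames."""
    ordered = play_df.sort_values(["nflId", "frameId"])
    pre_pass = ordered[ordered["frameId"] <= release_frame]
    in_air = ordered[ordered["frameId"] > release_frame].copy()
    # Convert the frame offset to seconds using the 10 Hz sampling rate
    in_air["seconds_since_release"] = (
        in_air["frameId"] - release_frame
    ) / FRAME_RATE_HZ
    return pre_pass, in_air

# Toy play: one player, five frames, ball released at frame 3
toy_play = pd.DataFrame({
    "nflId": [7] * 5,
    "frameId": [1, 2, 3, 4, 5],
    "x": [50.0, 50.5, 51.0, 51.5, 52.0],
    "y": [26.0, 26.1, 26.2, 26.3, 26.4],
})
pre_pass, in_air = split_at_release(toy_play, release_frame=3)
```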

In [None]:
# ============================================================================
# Section 2: Data Loading and Exploration
# ============================================================================

class NFLDataLoader:
    """Handles loading and preprocessing of NFL tracking data."""
    
    def __init__(self, config: Config):
        self.config = config
        self.data_dir = Path(config.data_dir)
        
    def load_data(self) -> Dict[str, pd.DataFrame]:
        """Load all available data files."""
        data = {}
        
        # List of expected data files based on typical BDB structure
        expected_files = [
            'tracking_week_*.parquet',  # Tracking data
            'plays.csv',                 # Play-level information
            'games.csv',                 # Game information
            'players.csv',               # Player information
            'player_play.csv',           # Player-play information
        ]
        
        # Check what files exist
        if self.data_dir.exists():
            print(f"Loading data from: {self.data_dir}")
            for f in self.data_dir.iterdir():
                print(f"  Found: {f.name}")
        else:
            print(f"Data directory not found: {self.data_dir}")
            print("Creating synthetic data for demonstration...")
            data = self._create_synthetic_data()
            return data
        
        # Load tracking data (parquet files)
        tracking_files = list(self.data_dir.glob('tracking_week_*.parquet'))
        if tracking_files:
            tracking_dfs = []
            for f in sorted(tracking_files):
                df = pd.read_parquet(f)
                tracking_dfs.append(df)
                print(f"  Loaded {f.name}: {len(df)} rows")
            data['tracking'] = pd.concat(tracking_dfs, ignore_index=True)
        
        # Load CSV files
        for file_name in ['plays.csv', 'games.csv', 'players.csv', 'player_play.csv']:
            file_path = self.data_dir / file_name
            if file_path.exists():
                data[file_name.replace('.csv', '')] = pd.read_csv(file_path)
                print(f"  Loaded {file_name}: {len(data[file_name.replace('.csv', '')])} rows")
        
        return data
    
    def _create_synthetic_data(self) -> Dict[str, pd.DataFrame]:
        """Create synthetic data for testing/demonstration."""
        np.random.seed(self.config.random_seed)
        
        # Create synthetic tracking data
        n_plays = 100
        frames_per_play = 50
        players_per_play = 22
        
        tracking_data = []
        
        for game_id in range(1, 3):
            for play_id in range(1, n_plays // 2 + 1):
                # Initial positions for players
                for player_idx in range(players_per_play):
                    nfl_id = game_id * 10000 + player_idx
                    
                    # Initialize at line of scrimmage with some spread
                    x_start = 50 + np.random.randn() * 5
                    y_start = 26.65 + (player_idx - 11) * 2 + np.random.randn() * 2
                    
                    # Random velocity
                    vx = np.random.randn() * 2
                    vy = np.random.randn() * 2
                    
                    # Team assignment
                    team = 'home' if player_idx < 11 else 'away'
                    
                    for frame_id in range(1, frames_per_play + 1):
                        # Update position with some noise
                        x = x_start + vx * frame_id * 0.1 + np.random.randn() * 0.1
                        y = y_start + vy * frame_id * 0.1 + np.random.randn() * 0.1
                        
                        # Clamp to field boundaries
                        x = np.clip(x, 0, 120)
                        y = np.clip(y, 0, 53.3)
                        
                        speed = np.sqrt(vx**2 + vy**2) + np.random.randn() * 0.5
                        direction = np.degrees(np.arctan2(vy, vx)) + np.random.randn() * 5
                        orientation = direction + np.random.randn() * 10
                        
                        tracking_data.append({
                            'gameId': game_id,
                            'playId': play_id,
                            'nflId': nfl_id,
                            'frameId': frame_id,
                            'x': x,
                            'y': y,
                            's': max(0, speed),
                            'a': np.abs(np.random.randn()),
                            'dir': direction % 360,
                            'o': orientation % 360,
                            'team': team,
                            'displayName': f'Player_{nfl_id}',
                            'jerseyNumber': (player_idx % 99) + 1
                        })
        
        tracking_df = pd.DataFrame(tracking_data)
        
        # Create plays data
        plays_data = []
        for game_id in range(1, 3):
            for play_id in range(1, n_plays // 2 + 1):
                plays_data.append({
                    'gameId': game_id,
                    'playId': play_id,
                    'quarter': np.random.randint(1, 5),
                    'down': np.random.randint(1, 5),
                    'yardsToGo': np.random.randint(1, 20),
                    'absoluteYardlineNumber': np.random.randint(1, 100),
                    'passResult': np.random.choice(['C', 'I', 'IN']),
                    'targetNflId': game_id * 10000 + np.random.randint(0, 11),
                    'passEndX': 50 + np.random.randn() * 10,
                    'passEndY': 26.65 + np.random.randn() * 10,
                    'passLength': np.random.randint(5, 40),
                    'passReleaseFrame': 20
                })
        
        plays_df = pd.DataFrame(plays_data)
        
        # Create games data
        games_df = pd.DataFrame({
            'gameId': [1, 2],
            'season': [2024, 2024],
            'week': [1, 2],
            'homeTeamAbbr': ['KC', 'SF'],
            'visitorTeamAbbr': ['BUF', 'DAL']
        })
        
        # Create players data
        player_ids = tracking_df['nflId'].unique()
        players_df = pd.DataFrame({
            'nflId': player_ids,
            'position': np.random.choice(['WR', 'CB', 'S', 'LB', 'QB', 'RB', 'TE', 'DL', 'OL'], len(player_ids)),
            'height': np.random.randint(68, 80, len(player_ids)),
            'weight': np.random.randint(180, 320, len(player_ids))
        })
        
        print("Created synthetic data:")
        print(f"  Tracking: {len(tracking_df)} rows")
        print(f"  Plays: {len(plays_df)} rows")
        print(f"  Games: {len(games_df)} rows")
        print(f"  Players: {len(players_df)} rows")
        
        return {
            'tracking': tracking_df,
            'plays': plays_df,
            'games': games_df,
            'players': players_df
        }
    
    def prepare_prediction_data(
        self, 
        tracking_df: pd.DataFrame, 
        plays_df: pd.DataFrame,
        players_df: Optional[pd.DataFrame] = None
    ) -> pd.DataFrame:
        """Prepare data for prediction by merging and creating features."""
        
        # Merge tracking with plays
        df = tracking_df.merge(
            plays_df[['gameId', 'playId', 'targetNflId', 'passEndX', 'passEndY', 
                      'passReleaseFrame', 'down', 'yardsToGo', 'absoluteYardlineNumber']],
            on=['gameId', 'playId'],
            how='left'
        )
        
        # Merge with player information if available
        if players_df is not None and 'nflId' in players_df.columns:
            df = df.merge(
                players_df[['nflId', 'position', 'height', 'weight']],
                on='nflId',
                how='left'
            )
        
        # Create unique identifier
        df['id'] = (
            df['gameId'].astype(str) + '_' + 
            df['playId'].astype(str) + '_' + 
            df['nflId'].astype(str) + '_' + 
            df['frameId'].astype(str)
        )
        
        return df


# Initialize data loader and load data
data_loader = NFLDataLoader(config)
data = data_loader.load_data()

In [None]:
# Explore the data structure
if 'tracking' in data:
    print("\n=== Tracking Data ===")
    print(f"Shape: {data['tracking'].shape}")
    print(f"\nColumns: {data['tracking'].columns.tolist()}")
    print("\nSample:")
    display(data['tracking'].head())
    
if 'plays' in data:
    print("\n=== Plays Data ===")
    print(f"Shape: {data['plays'].shape}")
    print(f"\nColumns: {data['plays'].columns.tolist()}")
    print("\nSample:")
    display(data['plays'].head())

## Section 3: Feature Engineering

We create football-specific features that capture:
- **Velocity & Acceleration**: Speed, direction changes
- **Distance Features**: Distance to ball landing spot, to target, to other players
- **Angle Features**: Angle to ball, leverage angles
- **Separation Features**: Closest defender distance, separation metrics
- **Context Features**: Down, distance, field position
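
As one example from the list, the closest-defender distance for a single frame can be computed with NumPy broadcasting instead of a per-player loop. This is a sketch under the assumption that each frame has `x`, `y`, and `team` columns; `closest_opponent_distance` is an illustrative name, and players with no opponents in the frame get `inf` here rather than a sentinel value.

```python
import numpy as np
import pandas as pd

def closest_opponent_distance(frame_df):
    """Distance from each player to the nearest opponent in one frame."""
    xy = frame_df[["x", "y"]].to_numpy(dtype=float)
    team = frame_df["team"].to_numpy()
    # Pairwise differences via broadcasting: shape (N, N, 2)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only opponent pairs; same-team pairs (and self) become inf
    is_opponent = team[:, None] != team[None, :]
    dist_to_opponents = np.where(is_opponent, dist, np.inf)
    return pd.Series(
        dist_to_opponents.min(axis=1),
        index=frame_df.index,
        name="closest_opp_dist",
    )

# Toy frame: one home player and two away players
toy_frame = pd.DataFrame({
    "x": [0.0, 3.0, 0.0],
    "y": [0.0, 4.0, 1.0],
    "team": ["home", "away", "away"],
})
closest = closest_opponent_distance(toy_frame)
```

Applied per frame with `groupby(['gameId', 'playId', 'frameId'])`, this replaces the inner loop over players with a single N x N distance matrix.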

In [None]:
# ============================================================================
# Section 3: Feature Engineering
# ============================================================================

class FootballFeatureEngineer:
    """Creates football-specific features for player movement prediction."""
    
    def __init__(self, config: Config):
        self.config = config
        self.scalers = {}
        
    def compute_velocity_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute velocity-based features."""
        df = df.copy()
        
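        # Velocity components from speed and direction.
        # This assumes `dir` is in degrees counterclockwise from the +x axis,
        # as in the synthetic data generated above. If the real tracking data
        # measures `dir` clockwise from the +y axis, swap sin and cos here.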
        df['vx'] = df['s'] * np.cos(np.radians(df['dir']))
        df['vy'] = df['s'] * np.sin(np.radians(df['dir']))
        
        # Compute velocity change (acceleration components)
        df = df.sort_values(['gameId', 'playId', 'nflId', 'frameId'])
        
        for col in ['vx', 'vy', 's']:
            df[f'{col}_diff'] = df.groupby(['gameId', 'playId', 'nflId'])[col].diff().fillna(0)
        
        # Rolling average speed (momentum indicator)
        df['s_rolling_mean'] = df.groupby(['gameId', 'playId', 'nflId'])['s'].transform(
            lambda x: x.rolling(window=5, min_periods=1).mean()
        )
        
        return df
    
    def compute_distance_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute distance-based features."""
        df = df.copy()
        
        # Distance to ball landing spot
        if 'passEndX' in df.columns and 'passEndY' in df.columns:
            df['dist_to_ball_landing'] = np.sqrt(
                (df['x'] - df['passEndX'])**2 + 
                (df['y'] - df['passEndY'])**2
            )
            
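            # Rate of closure to the ball landing spot in yards/second:
            # per-frame change in distance times 10 frames/s, negated so that
            # positive values mean the player is closing.
            # diff() relies on frames being in order within each player.
            df = df.sort_values(['gameId', 'playId', 'nflId', 'frameId'])
            df['closing_speed_to_ball'] = df.groupby(
                ['gameId', 'playId', 'nflId']
            )['dist_to_ball_landing'].diff().fillna(0) * -10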
        
        # Distance to line of scrimmage (relative x position)
        if 'absoluteYardlineNumber' in df.columns:
            df['dist_from_los'] = df['x'] - df['absoluteYardlineNumber']
        
        # Distance to sidelines
        df['dist_to_near_sideline'] = np.minimum(df['y'], 53.3 - df['y'])
        
        # Distance from center of field
        df['dist_from_center'] = np.abs(df['y'] - 26.65)
        
        return df
    
    def compute_angle_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute angle-based features."""
        df = df.copy()
        
        # Angle to ball landing spot
        if 'passEndX' in df.columns and 'passEndY' in df.columns:
            df['angle_to_ball'] = np.degrees(np.arctan2(
                df['passEndY'] - df['y'],
                df['passEndX'] - df['x']
            ))
            
            # Difference between direction and angle to ball (pursuit angle)
            df['pursuit_angle'] = np.abs(
                ((df['dir'] - df['angle_to_ball'] + 180) % 360) - 180
            )
            
            # Is player facing the ball? (within 45 degrees)
            df['facing_ball'] = (df['pursuit_angle'] < 45).astype(int)
        
        # Orientation vs direction (body alignment)
        df['body_alignment'] = np.abs(
            ((df['o'] - df['dir'] + 180) % 360) - 180
        )
        
        return df
    
    def compute_separation_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute separation and interaction features between players."""
        df = df.copy()
        
        # Group by game, play, and frame
        grouped = df.groupby(['gameId', 'playId', 'frameId'])
        
        separation_features = []
        
        for (game_id, play_id, frame_id), frame_df in grouped:
            frame_features = []
            
            for idx, player in frame_df.iterrows():
                player_team = player['team']
                player_x, player_y = player['x'], player['y']
                
                # Get opponents
                opponents = frame_df[frame_df['team'] != player_team]
                teammates = frame_df[(frame_df['team'] == player_team) & (frame_df.index != idx)]
                
                # Distance to closest opponent
                if len(opponents) > 0:
                    opp_distances = np.sqrt(
                        (opponents['x'] - player_x)**2 + 
                        (opponents['y'] - player_y)**2
                    )
                    closest_opp_dist = opp_distances.min()
                    avg_opp_dist = opp_distances.mean()
                    num_opp_within_5yds = (opp_distances < 5).sum()
                else:
                    closest_opp_dist = 99
                    avg_opp_dist = 99
                    num_opp_within_5yds = 0
                
                # Distance to closest teammate
                if len(teammates) > 0:
                    team_distances = np.sqrt(
                        (teammates['x'] - player_x)**2 + 
                        (teammates['y'] - player_y)**2
                    )
                    closest_team_dist = team_distances.min()
                else:
                    closest_team_dist = 99
                
                frame_features.append({
                    'gameId': game_id,
                    'playId': play_id,
                    'frameId': frame_id,
                    'nflId': player['nflId'],
                    'closest_opp_dist': closest_opp_dist,
                    'avg_opp_dist': avg_opp_dist,
                    'num_opp_within_5yds': num_opp_within_5yds,
                    'closest_team_dist': closest_team_dist
                })
            
            separation_features.extend(frame_features)
        
        sep_df = pd.DataFrame(separation_features)
        df = df.merge(sep_df, on=['gameId', 'playId', 'frameId', 'nflId'], how='left')
        
        return df
    
    def compute_target_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute features related to the targeted receiver."""
        df = df.copy()
        
        # Flag if this player is the target
        if 'targetNflId' in df.columns:
            df['is_target'] = (df['nflId'] == df['targetNflId']).astype(int)
        else:
            df['is_target'] = 0
            
        return df
    
    def compute_context_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute game context features."""
        df = df.copy()
        
        # Time-related features
        if 'frameId' in df.columns and 'passReleaseFrame' in df.columns:
            df['frames_since_release'] = df['frameId'] - df['passReleaseFrame']
            df['frames_since_release'] = df['frames_since_release'].clip(lower=0)
        
        # Normalized field position
        df['normalized_x'] = df['x'] / 120
        df['normalized_y'] = df['y'] / 53.3
        
        return df
    
    def engineer_features(
        self, 
        df: pd.DataFrame, 
        compute_separation: bool = True
    ) -> pd.DataFrame:
        """Apply all feature engineering steps."""
        print("Engineering features...")
        
        if self.config.use_velocity_features:
            print("  Computing velocity features...")
            df = self.compute_velocity_features(df)
        
        if self.config.use_distance_features:
            print("  Computing distance features...")
            df = self.compute_distance_features(df)
        
        if self.config.use_angle_features:
            print("  Computing angle features...")
            df = self.compute_angle_features(df)
        
        if self.config.use_separation_features and compute_separation:
            print("  Computing separation features (this may take a while)...")
            df = self.compute_separation_features(df)
        
        df = self.compute_target_features(df)
        df = self.compute_context_features(df)
        
        print(f"  Feature engineering complete. Shape: {df.shape}")
        return df
    
    def get_feature_columns(self) -> List[str]:
        """Return list of engineered feature column names."""
        features = [
            # Velocity features
            'vx', 'vy', 'vx_diff', 'vy_diff', 's_diff', 's_rolling_mean',
            # Distance features
            'dist_to_ball_landing', 'closing_speed_to_ball', 'dist_from_los',
            'dist_to_near_sideline', 'dist_from_center',
            # Angle features
            'angle_to_ball', 'pursuit_angle', 'facing_ball', 'body_alignment',
            # Separation features
            'closest_opp_dist', 'avg_opp_dist', 'num_opp_within_5yds', 'closest_team_dist',
            # Target features
            'is_target',
            # Context features
            'frames_since_release', 'normalized_x', 'normalized_y',
            # Original features
            's', 'a', 'dir', 'o', 'x', 'y'
        ]
        return features


# Initialize feature engineer
feature_engineer = FootballFeatureEngineer(config)

In [None]:
# Prepare and engineer features for the training data
if 'tracking' in data and 'plays' in data:
    # Prepare prediction data
    players_df = data.get('players', None)
    prepared_df = data_loader.prepare_prediction_data(
        data['tracking'], 
        data['plays'],
        players_df
    )
    
    # Engineer features (skip separation for large datasets to save time)
    compute_sep = len(prepared_df) < 500000  # Only compute separation for smaller datasets
    featured_df = feature_engineer.engineer_features(prepared_df, compute_separation=compute_sep)
    
    print(f"\nPrepared data shape: {featured_df.shape}")
    print(f"\nFeature columns available: {[c for c in feature_engineer.get_feature_columns() if c in featured_df.columns]}")

## Section 4: Transformer Model (Time Dimension)

The Transformer model processes sequences of player tracking data over time:
- Positional encoding for temporal ordering
- Multi-head self-attention for capturing temporal dependencies
- Predicts future (x, y) positions based on past trajectory
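
The positional encoding in the first bullet is the sinusoidal scheme. The NumPy sketch below restates it outside of PyTorch so the values are easy to inspect; `sinusoidal_encoding` is an illustrative name and `d_model` is assumed to be even.

```python
import numpy as np

def sinusoidal_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of sinusoidal position codes."""
    if d_model % 2 != 0:
        raise ValueError("d_model is assumed to be even in this sketch")
    position = np.arange(max_len)[:, None]  # (max_len, 1)
    # Geometric progression of frequencies across the even dimensions
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)  # even dims: sine
    pe[:, 1::2] = np.cos(position * div_term)  # odd dims: cosine
    return pe

# Same sizes as the Config defaults: 100 frames, 128 model dimensions
pe = sinusoidal_encoding(max_len=100, d_model=128)
```

Each frame index gets a distinct vector that is added to the projected input features, which is how the encoder can tell frame order apart even though self-attention itself is order-agnostic.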

In [None]:
# ============================================================================
# Section 4: Transformer Model for Temporal Sequences
# ============================================================================

class PositionalEncoding(nn.Module):
    """Positional encoding for transformer."""
    
    def __init__(self, d_model: int, max_len: int = 5000, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
        
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # (1, max_len, d_model)
        
        self.register_buffer('pe', pe)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq_len, d_model)"""
        x = x + self.pe[:, :x.size(1), :]
        return self.dropout(x)


class PlayerMovementTransformer(nn.Module):
    """Transformer model for predicting player movement trajectories."""
    
    def __init__(
        self,
        input_dim: int,
        d_model: int = 128,
        n_heads: int = 8,
        n_encoder_layers: int = 4,
        dim_feedforward: int = 512,
        dropout: float = 0.1,
        max_seq_len: int = 100
    ):
        super().__init__()
        
        self.d_model = d_model
        
        # Input projection
        self.input_proj = nn.Sequential(
            nn.Linear(input_dim, d_model),
            nn.LayerNorm(d_model),
            nn.ReLU(),
            nn.Dropout(dropout)
        )
        
        # Positional encoding
        self.pos_encoder = PositionalEncoding(d_model, max_seq_len, dropout)
        
        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(
            encoder_layer,
            num_layers=n_encoder_layers
        )
        
        # Output heads for x and y prediction
        self.output_head = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model // 2, 2)  # Predict (x, y)
        )
        
        # Uncertainty estimation (optional)
        self.uncertainty_head = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Linear(d_model // 2, 2)  # Variance for (x, y)
        )
    
    def forward(
        self, 
        x: torch.Tensor, 
        mask: Optional[torch.Tensor] = None,
        return_uncertainty: bool = False
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """
        Forward pass.
        
        Args:
            x: (batch, seq_len, input_dim) - Input features
            mask: (batch, seq_len) - Boolean mask, True for valid (non-padded) frames
            return_uncertainty: Whether to return uncertainty estimates
            
        Returns:
            predictions: (batch, seq_len, 2) - Predicted (x, y) coordinates
            uncertainty: (batch, seq_len, 2) - Uncertainty estimates (if requested)
        """
        # Project input to model dimension
        x = self.input_proj(x)  # (batch, seq_len, d_model)
        
        # Add positional encoding
        x = self.pos_encoder(x)
        
        # Build the key padding mask if a validity mask is provided
        if mask is not None:
            # `mask` is True for valid frames, while src_key_padding_mask
            # expects True for positions to ignore, so invert it
            key_padding_mask = ~mask
        else:
            key_padding_mask = None
        
        # Transformer encoding
        encoded = self.transformer_encoder(
            x, 
            src_key_padding_mask=key_padding_mask
        )  # (batch, seq_len, d_model)
        
        # Predict coordinates
        predictions = self.output_head(encoded)  # (batch, seq_len, 2)
        
        if return_uncertainty:
            uncertainty = F.softplus(self.uncertainty_head(encoded))
            return predictions, uncertainty
        
        return predictions


class MovementDataset(Dataset):
    """Dataset for player movement prediction."""
    
    def __init__(
        self, 
        df: pd.DataFrame,
        feature_cols: List[str],
        target_cols: Optional[List[str]] = None,
        seq_len: int = 50,
        is_training: bool = True
    ):
        self.df = df
        self.feature_cols = [c for c in feature_cols if c in df.columns]
        # Avoid a mutable default argument; fall back to (x, y) targets
        self.target_cols = list(target_cols) if target_cols is not None else ['x', 'y']
        self.seq_len = seq_len
        self.is_training = is_training
        
        # Group by game, play, player
        self.groups = df.groupby(['gameId', 'playId', 'nflId'])
        self.group_keys = list(self.groups.groups.keys())
        
        # Prepare scaler
        self.scaler = StandardScaler()
        if len(self.feature_cols) > 0:
            valid_features = df[self.feature_cols].replace([np.inf, -np.inf], np.nan).dropna()
            if len(valid_features) > 0:
                self.scaler.fit(valid_features)
    
    def __len__(self):
        return len(self.group_keys)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        key = self.group_keys[idx]
        group = self.groups.get_group(key).sort_values('frameId')
        
        # Extract features
        if len(self.feature_cols) > 0:
            features = group[self.feature_cols].values
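            # Handle NaN and Inf
            features = np.nan_to_num(features, nan=0.0, posinf=0.0, neginf=0.0)
            # Scale only if the scaler was fitted in __init__ (fitting is
            # skipped when no rows are free of NaN/Inf)
            if hasattr(self.scaler, 'mean_'):
                features = self.scaler.transform(features)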
        else:
            features = group[['x', 'y', 's', 'a', 'dir', 'o']].values
        
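        # Extract targets. These are same-frame coordinates; if 'x' and 'y' are
        # also in feature_cols the model can copy them, so shift targets forward
        # when forecasting future positions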
        targets = group[self.target_cols].values
        
        # Pad or truncate sequences
        seq_len = min(len(features), self.seq_len)
        
        padded_features = np.zeros((self.seq_len, features.shape[1]))
        padded_targets = np.zeros((self.seq_len, len(self.target_cols)))
        mask = np.zeros(self.seq_len, dtype=bool)
        
        padded_features[:seq_len] = features[:seq_len]
        padded_targets[:seq_len] = targets[:seq_len]
        mask[:seq_len] = True
        
        return {
            'features': torch.FloatTensor(padded_features),
            'targets': torch.FloatTensor(padded_targets),
            'mask': torch.BoolTensor(mask),
            'game_id': key[0],
            'play_id': key[1],
            'nfl_id': key[2]
        }


print("Transformer model architecture defined.")

## Section 5: Graph Neural Network (Space Dimension)

The GNN captures player interactions within each frame:
- Players are nodes with position/velocity features
- Edges connect players based on proximity or team relationships
- Message passing aggregates information from nearby players
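
One round of proximity-based message passing can be illustrated in plain NumPy. This sketch assumes a fixed distance threshold (15 yards, the value used in the model code) and mean aggregation over neighbors including the player itself; the function names are hypothetical and no learned weights are involved.

```python
import numpy as np

def proximity_adjacency(positions, threshold=15.0):
    """Row-normalized adjacency: players within `threshold` are neighbors."""
    diff = positions[None, :, :] - positions[:, None, :]  # (N, N, 2)
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N)
    adj = (dist < threshold).astype(float)                # includes self-loops
    # Each row sums to at least 1 because of the self-loop
    return adj / adj.sum(axis=1, keepdims=True)

def mean_aggregate(features, positions, threshold=15.0):
    """Replace each player's features with the mean over its neighborhood."""
    return proximity_adjacency(positions, threshold) @ features

# Toy frame: players 0 and 1 are 10 yards apart, player 2 is isolated
positions = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]])
features = np.array([[1.0], [3.0], [10.0]])
aggregated = mean_aggregate(features, positions)
```

Players 0 and 1 end up sharing the average of their features, while the isolated player keeps its own value. A learned GNN layer adds trainable transforms and attention weights on top of this aggregation pattern.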

In [None]:
# ============================================================================
# Section 5: Graph Neural Network for Player Interactions
# ============================================================================

class PlayerInteractionGNN(nn.Module):
    """Graph Neural Network for modeling player interactions."""
    
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 64,
        output_dim: int = 2,
        num_layers: int = 3,
        heads: int = 4,
        dropout: float = 0.1
    ):
        super().__init__()
        
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        
        # Input projection
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        
        # GNN layers - use different implementations based on availability
        if HAS_TORCH_GEOMETRIC:
            # Use Graph Attention Networks. Each layer maps hidden_dim to
            # heads * (hidden_dim // heads), so hidden_dim should be divisible by heads
            self.gnn_layers = nn.ModuleList([
                GATConv(hidden_dim, hidden_dim // heads, heads=heads, dropout=dropout)
                for _ in range(num_layers)
            ])
        else:
            # Fallback: Use simple MLP-based message passing
            self.gnn_layers = nn.ModuleList()
            for _ in range(num_layers):
                self.gnn_layers.append(nn.Sequential(
                    nn.Linear(hidden_dim * 2, hidden_dim),
                    nn.ReLU(),
                    nn.Dropout(dropout),
                    nn.Linear(hidden_dim, hidden_dim)
                ))
        
        # Layer normalization
        self.layer_norms = nn.ModuleList([
            nn.LayerNorm(hidden_dim) for _ in range(num_layers)
        ])
        
        # Output head
        self.output_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim // 2, output_dim)
        )
        
        self.dropout = nn.Dropout(dropout)
    
    def build_edge_index(self, positions: torch.Tensor, threshold: float = 15.0) -> torch.Tensor:
        """
        Build edge index based on player proximity.
        
        Args:
            positions: (num_players, 2) - (x, y) coordinates
            threshold: Distance threshold for creating edges
            
        Returns:
            edge_index: (2, num_edges) - Edge connectivity
        """
        num_players = positions.size(0)
        
        # Compute pairwise distances
        diff = positions.unsqueeze(0) - positions.unsqueeze(1)  # (N, N, 2)
        distances = torch.norm(diff, dim=-1)  # (N, N)
        
        # Create edges for nearby players (and self-loops)
        mask = distances < threshold
        edge_index = mask.nonzero(as_tuple=False).t()  # (2, num_edges)
        
        return edge_index
    
    def forward_simple(
        self, 
        x: torch.Tensor, 
        positions: torch.Tensor
    ) -> torch.Tensor:
        """
        Simple forward pass without torch_geometric.
        Uses mean aggregation from neighbors.
        """
        # Project input
        h = self.input_proj(x)  # (num_players, hidden_dim)
        
        # Build a row-normalized adjacency matrix from pairwise distances.
        # Each player is its own neighbor (distance 0), so every row has at least one entry.
        diff = positions.unsqueeze(0) - positions.unsqueeze(1)
        distances = torch.norm(diff, dim=-1)
        adj = (distances < 15.0).float()  # Same threshold as build_edge_index's default
        adj = adj / (adj.sum(dim=-1, keepdim=True) + 1e-8)  # Rows sum to 1 -> mean aggregation
        
        # Message passing layers
        for i, layer in enumerate(self.gnn_layers):
            # Aggregate neighbor features
            neighbor_features = torch.matmul(adj, h)  # (N, hidden_dim)
            
            # Concatenate self and neighbor features
            combined = torch.cat([h, neighbor_features], dim=-1)  # (N, 2*hidden_dim)
            
            # Update
            h_new = layer(combined)
            h = self.layer_norms[i](h + self.dropout(h_new))
        
        # Output prediction
        output = self.output_head(h)
        return output
    
    def forward(
        self,
        x: torch.Tensor,
        positions: torch.Tensor,
        edge_index: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        """
        Forward pass.
        
        Args:
            x: (num_players, input_dim) - Node features
            positions: (num_players, 2) - Player positions for edge building
            edge_index: Optional pre-computed edge index
            
        Returns:
            predictions: (num_players, 2) - Predicted (x, y) displacements or positions
        """
        if not HAS_TORCH_GEOMETRIC:
            return self.forward_simple(x, positions)
        
        # Build edges if not provided
        if edge_index is None:
            edge_index = self.build_edge_index(positions)
        
        # Project input
        h = self.input_proj(x)
        
        # GNN layers
        for i, layer in enumerate(self.gnn_layers):
            h_new = layer(h, edge_index)
            h = self.layer_norms[i](h + self.dropout(h_new))
        
        # Output prediction
        output = self.output_head(h)
        return output


class GNNFrameProcessor:
    """Process frames through GNN for batch prediction."""
    
    def __init__(self, model: PlayerInteractionGNN, device: str = 'cpu'):
        self.model = model.to(device)
        self.device = device
    
    def process_frame(
        self, 
        frame_features: np.ndarray,
        positions: np.ndarray
    ) -> np.ndarray:
        """
        Process a single frame through the GNN.
        
        Args:
            frame_features: (num_players, feature_dim)
            positions: (num_players, 2)
            
        Returns:
            predictions: (num_players, 2)
        """
        self.model.eval()
        
        with torch.no_grad():
            x = torch.FloatTensor(frame_features).to(self.device)
            pos = torch.FloatTensor(positions).to(self.device)
            
            predictions = self.model(x, pos)
            
        return predictions.cpu().numpy()


print("GNN model architecture defined.")
print(f"Using torch_geometric: {HAS_TORCH_GEOMETRIC}")

## Section 6: GBDT Model (Tabular Features)

Gradient Boosted Decision Trees for tabular feature processing:
- LightGBM/XGBoost for fast training
- Handles engineered features effectively
- Provides complementary predictions to deep models
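
For readers who have not looked inside these libraries, the toy sketch below shows the core idea of gradient boosting for squared error: each new weak learner is fitted to the residual of the current prediction. `fit_stump` and `boost` are hypothetical helpers written for this illustration; LightGBM and XGBoost use far more elaborate trees, histogram-based splits, and regularization.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split of 1-D x for squared error on the residual."""
    best = None
    for threshold in np.unique(x)[:-1]:          # drop the max so both sides stay non-empty
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: every stump fits the current residual."""
    pred = np.full(len(y), y.mean())             # start from the constant prediction
    for _ in range(n_rounds):
        threshold, left_value, right_value = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= threshold, left_value, right_value)
    return pred

x = np.linspace(0.0, 10.0, 50)
y = np.sin(x)
baseline_rmse = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
boosted_rmse = float(np.sqrt(np.mean((y - boost(x, y)) ** 2)))
print(f"constant baseline RMSE: {baseline_rmse:.3f}, boosted RMSE: {boosted_rmse:.3f}")
```

The notebook's `GBDTPredictor` applies the same principle twice, training one regressor for the `x` coordinate and one for `y`.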

In [None]:
# ============================================================================
# Section 6: GBDT Model for Tabular Features
# ============================================================================

class GBDTPredictor:
    """GBDT model for tabular feature prediction."""
    
    def __init__(self, config: Config):
        self.config = config
        self.model_x = None
        self.model_y = None
        self.feature_cols = None
        self.scaler = StandardScaler()
        
    def prepare_features(
        self, 
        df: pd.DataFrame, 
        feature_cols: List[str]
    ) -> np.ndarray:
        """Prepare features for GBDT model."""
        self.feature_cols = [c for c in feature_cols if c in df.columns]
        
        X = df[self.feature_cols].values
        X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
        
        return X
    
    def train(
        self, 
        df: pd.DataFrame,
        feature_cols: List[str],
        target_col_x: str = 'x',
        target_col_y: str = 'y'
    ):
        """Train GBDT models for x and y prediction."""
        X = self.prepare_features(df, feature_cols)
        y_x = df[target_col_x].values
        y_y = df[target_col_y].values
        
        # Scale features
        X_scaled = self.scaler.fit_transform(X)
        
        if HAS_LIGHTGBM:
            print("Training LightGBM models...")
            
            params = {
                'objective': 'regression',
                'metric': 'rmse',
                'boosting_type': 'gbdt',
                'num_leaves': 31,
                'learning_rate': 0.05,
                'feature_fraction': 0.8,
                'bagging_fraction': 0.8,
                'bagging_freq': 5,
                'verbose': -1,
                'n_jobs': -1,
                'random_state': self.config.random_seed
            }
            
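            # Train model for X.
            # NOTE: valid_sets contains only the training data, so early stopping
            # tracks training RMSE and will rarely trigger. Pass a held-out
            # lgb.Dataset here to stop on validation error instead.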
            train_data_x = lgb.Dataset(X_scaled, label=y_x)
            self.model_x = lgb.train(
                params,
                train_data_x,
                num_boost_round=500,
                valid_sets=[train_data_x],
                callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)]
            )
            
            # Train model for Y
            train_data_y = lgb.Dataset(X_scaled, label=y_y)
            self.model_y = lgb.train(
                params,
                train_data_y,
                num_boost_round=500,
                valid_sets=[train_data_y],
                callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)]
            )
            
        elif HAS_XGBOOST:
            print("Training XGBoost models...")
            
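            # early_stopping_rounds is set on the estimator: passing it to fit()
            # was deprecated in XGBoost 1.6 and is rejected by newer releases.
            # NOTE: eval_set is the training data, so early stopping tracks
            # training error; pass a held-out set to stop on validation error.
            params = {
                'objective': 'reg:squarederror',
                'max_depth': 6,
                'learning_rate': 0.05,
                'n_estimators': 500,
                'subsample': 0.8,
                'colsample_bytree': 0.8,
                'early_stopping_rounds': 50,
                'random_state': self.config.random_seed,
                'n_jobs': -1
            }
            
            self.model_x = xgb.XGBRegressor(**params)
            self.model_x.fit(
                X_scaled, y_x,
                eval_set=[(X_scaled, y_x)],
                verbose=100
            )
            
            self.model_y = xgb.XGBRegressor(**params)
            self.model_y.fit(
                X_scaled, y_y,
                eval_set=[(X_scaled, y_y)],
                verbose=100
            )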
        else:
            print("No GBDT library available. Skipping GBDT training.")
            return
        
        print("GBDT training complete.")
    
    def predict(self, df: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray]:
        """Generate predictions using GBDT models."""
        if self.model_x is None or self.model_y is None:
            raise ValueError("Models not trained. Call train() first.")
        
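        # Fail with a clear message instead of a shape mismatch inside the scaler
        missing = [c for c in self.feature_cols if c not in df.columns]
        if missing:
            raise ValueError(f"Missing feature columns at predict time: {missing}")
        
        X = self.prepare_features(df, self.feature_cols)
        X_scaled = self.scaler.transform(X)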
        
        pred_x = self.model_x.predict(X_scaled)
        pred_y = self.model_y.predict(X_scaled)
        
        return pred_x, pred_y
    
    def get_feature_importance(self) -> pd.DataFrame:
        """Get feature importance from trained models."""
        if self.model_x is None:
            return pd.DataFrame()
        
        if HAS_LIGHTGBM:
            importance_x = self.model_x.feature_importance(importance_type='gain')
            importance_y = self.model_y.feature_importance(importance_type='gain')
        elif HAS_XGBOOST:
            importance_x = self.model_x.feature_importances_
            importance_y = self.model_y.feature_importances_
        else:
            return pd.DataFrame()
        
        importance_df = pd.DataFrame({
            'feature': self.feature_cols,
            'importance_x': importance_x,
            'importance_y': importance_y,
            'importance_avg': (importance_x + importance_y) / 2
        }).sort_values('importance_avg', ascending=False)
        
        return importance_df


print("GBDT model class defined.")
print(f"LightGBM available: {HAS_LIGHTGBM}")
print(f"XGBoost available: {HAS_XGBOOST}")

## Section 7: Ensemble Model

Combines predictions from:
1. Transformer (temporal patterns)
2. GNN (spatial interactions)
3. GBDT (tabular features)

Weights are chosen by a grid search (step 0.1, constrained to sum to 1) that minimizes RMSE on validation data.
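
A minimal sketch of such a weight search on synthetic data follows. The names `rmse_xy` and `search_weights`, and the noise levels used to fake the three models, are assumptions made for this example. Integer loop counters are used so that the three weights sum to one up to floating-point rounding.

```python
import numpy as np

def rmse_xy(true_xy: np.ndarray, pred_xy: np.ndarray) -> float:
    """RMSE over all x and y coordinates, i.e. sqrt(0.5 * (mse_x + mse_y))."""
    return float(np.sqrt(np.mean((true_xy - pred_xy) ** 2)))

def search_weights(true_xy, preds, steps=10):
    """Exhaustive search over weight triples on the simplex with resolution 1/steps."""
    names = list(preds)
    best_weights, best_score = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            blend = sum(wk * preds[name] for wk, name in zip(w, names))
            score = rmse_xy(true_xy, blend)
            if score < best_score:
                best_weights, best_score = dict(zip(names, w)), score
    return best_weights, best_score

# Synthetic stand-ins for the three component models (noise levels are arbitrary).
rng = np.random.default_rng(0)
true_xy = rng.uniform(0, 50, size=(200, 2))
preds = {
    "transformer": true_xy + rng.normal(0, 1.0, size=true_xy.shape),
    "gnn": true_xy + rng.normal(0, 2.0, size=true_xy.shape),
    "gbdt": true_xy + rng.normal(0, 1.5, size=true_xy.shape),
}
weights, score = search_weights(true_xy, preds)
print(weights, round(score, 4))
```

Because the grid includes the three corner points where a single model gets all the weight, the blended score can never be worse than the best individual model on the data used for the search.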

In [None]:
# ============================================================================
# Section 7: Ensemble Model and Training Pipeline
# ============================================================================

class EnsemblePredictor:
    """Ensemble model combining Transformer, GNN, and GBDT predictions."""
    
    def __init__(self, config: Config):
        self.config = config
        self.transformer_model = None
        self.gnn_model = None
        self.gbdt_model = None
        
        # Ensemble weights (will be optimized)
        self.weights = {
            'transformer': config.transformer_weight,
            'gnn': config.gnn_weight,
            'gbdt': config.gbdt_weight
        }
        
        self.device = config.device
        self.feature_cols = []
        self.scaler = StandardScaler()
    
    def initialize_models(self, input_dim: int):
        """Initialize all component models."""
        # Transformer model
        self.transformer_model = PlayerMovementTransformer(
            input_dim=input_dim,
            d_model=self.config.d_model,
            n_heads=self.config.n_heads,
            n_encoder_layers=self.config.n_encoder_layers,
            dim_feedforward=self.config.dim_feedforward,
            dropout=self.config.dropout,
            max_seq_len=self.config.max_seq_len
        ).to(self.device)
        
        # GNN model
        self.gnn_model = PlayerInteractionGNN(
            input_dim=input_dim,
            hidden_dim=self.config.gnn_hidden_dim,
            output_dim=2,
            num_layers=self.config.gnn_num_layers,
            heads=self.config.gnn_heads,
            dropout=self.config.dropout
        ).to(self.device)
        
        # GBDT model
        self.gbdt_model = GBDTPredictor(self.config)
        
        print(f"Models initialized:")
        print(f"  Transformer: {sum(p.numel() for p in self.transformer_model.parameters())} parameters")
        print(f"  GNN: {sum(p.numel() for p in self.gnn_model.parameters())} parameters")
    
    def train_transformer(
        self, 
        train_df: pd.DataFrame,
        val_df: pd.DataFrame,
        feature_cols: List[str]
    ):
        """Train the transformer model."""
        print("\n=== Training Transformer Model ===")
        
        self.feature_cols = [c for c in feature_cols if c in train_df.columns]
        
        # Create datasets
        train_dataset = MovementDataset(
            train_df, self.feature_cols, 
            seq_len=self.config.max_seq_len, is_training=True
        )
        val_dataset = MovementDataset(
            val_df, self.feature_cols,
            seq_len=self.config.max_seq_len, is_training=False
        )
        
        train_loader = DataLoader(
            train_dataset, 
            batch_size=self.config.batch_size,
            shuffle=True,
            num_workers=0
        )
        val_loader = DataLoader(
            val_dataset,
            batch_size=self.config.batch_size,
            shuffle=False,
            num_workers=0
        )
        
        # Optimizer and scheduler
        optimizer = AdamW(
            self.transformer_model.parameters(),
            lr=self.config.learning_rate,
            weight_decay=self.config.weight_decay
        )
        scheduler = OneCycleLR(
            optimizer,
            max_lr=self.config.learning_rate * 10,
            epochs=self.config.epochs,
            steps_per_epoch=len(train_loader)
        )
        
        # Training loop
        best_val_loss = float('inf')
        patience_counter = 0
        
        for epoch in range(self.config.epochs):
            # Training
            self.transformer_model.train()
            train_loss = 0
            
            for batch in train_loader:
                features = batch['features'].to(self.device)
                targets = batch['targets'].to(self.device)
                mask = batch['mask'].to(self.device)
                
                optimizer.zero_grad()
                predictions = self.transformer_model(features, mask)
                
                # Masked loss
                loss = F.mse_loss(
                    predictions[mask], 
                    targets[mask]
                )
                
                loss.backward()
                torch.nn.utils.clip_grad_norm_(self.transformer_model.parameters(), 1.0)
                optimizer.step()
                scheduler.step()
                
                train_loss += loss.item()
            
            train_loss /= len(train_loader)
            
            # Validation
            self.transformer_model.eval()
            val_loss = 0
            
            with torch.no_grad():
                for batch in val_loader:
                    features = batch['features'].to(self.device)
                    targets = batch['targets'].to(self.device)
                    mask = batch['mask'].to(self.device)
                    
                    predictions = self.transformer_model(features, mask)
                    loss = F.mse_loss(predictions[mask], targets[mask])
                    val_loss += loss.item()
            
            val_loss /= len(val_loader)
            val_rmse = np.sqrt(val_loss)
            
            if (epoch + 1) % 5 == 0:
                print(f"Epoch {epoch+1}/{self.config.epochs} - Train Loss: {train_loss:.4f}, Val RMSE: {val_rmse:.4f}")
            
            # Early stopping
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                patience_counter = 0
                # Save best model
                torch.save(self.transformer_model.state_dict(), 'best_transformer.pt')
            else:
                patience_counter += 1
                if patience_counter >= self.config.patience:
                    print(f"Early stopping at epoch {epoch+1}")
                    break
        
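        # Load best model (map_location keeps this working if the checkpoint
        # was written on a different device)
        self.transformer_model.load_state_dict(
            torch.load('best_transformer.pt', map_location=self.device)
        )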
        print(f"Best validation RMSE: {np.sqrt(best_val_loss):.4f}")
    
    def train_gbdt(
        self, 
        train_df: pd.DataFrame,
        feature_cols: List[str]
    ):
        """Train the GBDT model."""
        print("\n=== Training GBDT Model ===")
        self.gbdt_model.train(train_df, feature_cols)
    
    def optimize_weights(
        self, 
        val_df: pd.DataFrame,
        transformer_preds: np.ndarray,
        gnn_preds: np.ndarray,
        gbdt_preds: Tuple[np.ndarray, np.ndarray]
    ):
        """Optimize ensemble weights using validation data."""
        print("\n=== Optimizing Ensemble Weights ===")
        
        true_x = val_df['x'].values
        true_y = val_df['y'].values
        
        best_rmse = float('inf')
        best_weights = self.weights.copy()
        
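        # Grid search over the weight simplex in steps of 0.1. Integer loop
        # counters avoid the float round-off of np.arange, which could make
        # w_gbdt slightly negative and silently skip valid combinations.
        n_steps = 10
        for i in range(n_steps + 1):
            for j in range(n_steps + 1 - i):
                w_trans = i / n_steps
                w_gnn = j / n_steps
                w_gbdt = (n_steps - i - j) / n_steps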
                
                # Combine predictions
                pred_x = (
                    w_trans * transformer_preds[:, 0] +
                    w_gnn * gnn_preds[:, 0] +
                    w_gbdt * gbdt_preds[0]
                )
                pred_y = (
                    w_trans * transformer_preds[:, 1] +
                    w_gnn * gnn_preds[:, 1] +
                    w_gbdt * gbdt_preds[1]
                )
                
                # Calculate RMSE
                rmse = np.sqrt(
                    0.5 * (mean_squared_error(true_x, pred_x) + 
                           mean_squared_error(true_y, pred_y))
                )
                
                if rmse < best_rmse:
                    best_rmse = rmse
                    best_weights = {
                        'transformer': w_trans,
                        'gnn': w_gnn,
                        'gbdt': w_gbdt
                    }
        
        self.weights = best_weights
        print(f"Optimized weights: {self.weights}")
        print(f"Best validation RMSE: {best_rmse:.4f}")
    
    def predict(
        self, 
        df: pd.DataFrame,
        feature_cols: List[str]
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Generate ensemble predictions.
        
        Only components that actually produce predictions contribute, and their
        weights are renormalized so the blend stays on the coordinate scale.
        """
        predictions_x = np.zeros(len(df))
        predictions_y = np.zeros(len(df))
        total_weight = 0.0
        
        # Transformer predictions: not implemented yet. A full implementation
        # would build per-player sequences and run them through the model.
        
        # GBDT predictions
        if (self.gbdt_model is not None and self.gbdt_model.model_x is not None
                and self.weights['gbdt'] > 0):
            gbdt_x, gbdt_y = self.gbdt_model.predict(df)
            predictions_x += self.weights['gbdt'] * gbdt_x
            predictions_y += self.weights['gbdt'] * gbdt_y
            total_weight += self.weights['gbdt']
        
        # GNN predictions: not implemented yet. Would process frame-by-frame.
        
        if total_weight == 0:
            raise RuntimeError("No component model produced predictions.")
        
        return predictions_x / total_weight, predictions_y / total_weight
    
    def save(self, path: str):
        """Save ensemble model."""
        os.makedirs(path, exist_ok=True)
        
        # Save transformer
        if self.transformer_model is not None:
            torch.save(
                self.transformer_model.state_dict(),
                os.path.join(path, 'transformer.pt')
            )
        
        # Save GNN
        if self.gnn_model is not None:
            torch.save(
                self.gnn_model.state_dict(),
                os.path.join(path, 'gnn.pt')
            )
        
        # Save GBDT
        if self.gbdt_model is not None and self.gbdt_model.model_x is not None:
            with open(os.path.join(path, 'gbdt.pkl'), 'wb') as f:
                pickle.dump({
                    'model_x': self.gbdt_model.model_x,
                    'model_y': self.gbdt_model.model_y,
                    'scaler': self.gbdt_model.scaler,
                    'feature_cols': self.gbdt_model.feature_cols
                }, f)
        
        # Save config and weights
        with open(os.path.join(path, 'config.json'), 'w') as f:
            json.dump({
                'weights': self.weights,
                'feature_cols': self.feature_cols
            }, f)
        
        print(f"Model saved to {path}")
    
    def load(self, path: str):
        """Load ensemble model."""
        # Load config
        with open(os.path.join(path, 'config.json'), 'r') as f:
            config_data = json.load(f)
            self.weights = config_data['weights']
            self.feature_cols = config_data['feature_cols']
        
        # Load transformer
        transformer_path = os.path.join(path, 'transformer.pt')
        if os.path.exists(transformer_path) and self.transformer_model is not None:
            self.transformer_model.load_state_dict(
                torch.load(transformer_path, map_location=self.device)
            )
        
        # Load GNN
        gnn_path = os.path.join(path, 'gnn.pt')
        if os.path.exists(gnn_path) and self.gnn_model is not None:
            self.gnn_model.load_state_dict(
                torch.load(gnn_path, map_location=self.device)
            )
        
        # Load GBDT (create the wrapper if initialize_models() was not called)
        gbdt_path = os.path.join(path, 'gbdt.pkl')
        if os.path.exists(gbdt_path):
            if self.gbdt_model is None:
                self.gbdt_model = GBDTPredictor(self.config)
            with open(gbdt_path, 'rb') as f:
                gbdt_data = pickle.load(f)
            self.gbdt_model.model_x = gbdt_data['model_x']
            self.gbdt_model.model_y = gbdt_data['model_y']
            self.gbdt_model.scaler = gbdt_data['scaler']
            self.gbdt_model.feature_cols = gbdt_data['feature_cols']
        
        print(f"Model loaded from {path}")


print("Ensemble model class defined.")

## Section 8: Training Pipeline

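Complete training pipeline that:
1. Merges and prepares the tracking, play, and player data
2. Engineers features
3. Splits the data by play into training (80%) and validation (20%) sets
4. Trains the Transformer and GBDT models, zeroing the ensemble weight of any component that fails
5. Saves the model and reports the GBDT validation RMSE

The GNN is initialized but not trained in this pipeline, and `optimize_weights` is not called.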
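
The pipeline below builds its validation set from unique `(gameId, playId)` pairs rather than from individual rows, so every frame of a play stays on one side of the split. The NumPy sketch here illustrates that idea in isolation; `split_by_play` and the toy IDs are hypothetical and are not part of the notebook.

```python
import numpy as np

def split_by_play(game_ids, play_ids, test_size=0.2, seed=42):
    """Return boolean (train, val) row masks with no (gameId, playId) pair on both sides."""
    keys = np.stack([game_ids, play_ids], axis=1)
    unique_keys, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                 # group index of every row
    rng = np.random.default_rng(seed)
    n_val = max(1, int(round(test_size * len(unique_keys))))
    val_groups = rng.choice(len(unique_keys), size=n_val, replace=False)
    val_mask = np.isin(inverse, val_groups)
    return ~val_mask, val_mask

# Ten tracking rows that belong to five distinct plays.
game_ids = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3])
play_ids = np.array([10, 10, 11, 11, 10, 10, 12, 12, 5, 5])
train_mask, val_mask = split_by_play(game_ids, play_ids)
print("train rows:", int(train_mask.sum()), "val rows:", int(val_mask.sum()))
```

A row-level split would let frames from the same play appear in both sets, which tends to make validation scores look better than they would on unseen plays.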

In [None]:
# ============================================================================
# Section 8: Training Pipeline
# ============================================================================

def train_full_pipeline(
    data: Dict[str, pd.DataFrame],
    config: Config,
    feature_engineer: FootballFeatureEngineer
) -> EnsemblePredictor:
    """Train the full ensemble pipeline."""
    
    print("="*60)
    print("Starting Full Training Pipeline")
    print("="*60)
    
    # Prepare data
    tracking_df = data.get('tracking')
    plays_df = data.get('plays')
    players_df = data.get('players')
    
    if tracking_df is None or plays_df is None:
        raise ValueError("Missing required data (tracking or plays)")
    
    # Merge and prepare data
    data_loader = NFLDataLoader(config)
    df = data_loader.prepare_prediction_data(tracking_df, plays_df, players_df)
    
    # Engineer features
    df = feature_engineer.engineer_features(df, compute_separation=len(df) < 100000)
    
    # Get feature columns
    feature_cols = feature_engineer.get_feature_columns()
    available_features = [c for c in feature_cols if c in df.columns]
    print(f"\nUsing {len(available_features)} features: {available_features[:10]}...")
    
    # Split data by plays for validation
    unique_plays = df[['gameId', 'playId']].drop_duplicates()
    train_plays, val_plays = train_test_split(
        unique_plays, 
        test_size=0.2, 
        random_state=config.random_seed
    )
    
    train_df = df.merge(train_plays, on=['gameId', 'playId'])
    val_df = df.merge(val_plays, on=['gameId', 'playId'])
    
    print(f"\nTrain size: {len(train_df)}, Validation size: {len(val_df)}")
    
    # Initialize ensemble
    input_dim = len(available_features)
    ensemble = EnsemblePredictor(config)
    ensemble.initialize_models(input_dim)
    
    # Train Transformer (if we have sequence data)
    try:
        ensemble.train_transformer(train_df, val_df, available_features)
    except Exception as e:
        print(f"Transformer training failed: {e}")
        ensemble.weights['transformer'] = 0
    
    # Train GBDT
    try:
        ensemble.train_gbdt(train_df, available_features)
    except Exception as e:
        print(f"GBDT training failed: {e}")
        ensemble.weights['gbdt'] = 0
    
    # Save model
    ensemble.save('./model_output')
    
    # Evaluate on validation set
    print("\n=== Final Evaluation ===")
    if ensemble.gbdt_model.model_x is not None:
        pred_x, pred_y = ensemble.gbdt_model.predict(val_df)
        
        rmse = np.sqrt(
            0.5 * (mean_squared_error(val_df['x'], pred_x) + 
                   mean_squared_error(val_df['y'], pred_y))
        )
        print(f"Validation RMSE: {rmse:.4f}")
    
    return ensemble


# Train the model if data is available
if 'tracking' in data and 'plays' in data:
    try:
        ensemble = train_full_pipeline(data, config, feature_engineer)
    except Exception as e:
        print(f"Training failed: {e}")
        import traceback
        traceback.print_exc()

## Section 9: Inference Server Integration

Integration with the Kaggle evaluation API for submission.
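
When no trained model is available, the prediction function below falls back to a constant-velocity extrapolation. The sketch here shows that baseline on its own. It assumes the NFL tracking convention in which `dir` is given in degrees measured clockwise from the positive y-axis; under the usual mathematical convention (counterclockwise from the positive x-axis) the sine and cosine would swap. `constant_velocity_step` is a hypothetical helper.

```python
import numpy as np

def constant_velocity_step(x, y, speed, direction_deg, dt=0.1):
    """Advance positions by one frame, assuming constant speed and heading.

    Assumes direction_deg is measured clockwise from the +y axis.
    """
    theta = np.radians(direction_deg)
    return x + speed * np.sin(theta) * dt, y + speed * np.cos(theta) * dt

# Two players at the same spot moving at 5 yards/second:
# the first heads toward +x (dir = 90), the second toward +y (dir = 0).
x_new, y_new = constant_velocity_step(
    np.array([50.0, 50.0]), np.array([20.0, 20.0]),
    np.array([5.0, 5.0]), np.array([90.0, 0.0]),
)
print(x_new, y_new)
```

With a frame interval of 0.1 seconds, each player moves half a yard along their heading.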

In [None]:
# ============================================================================
# Section 9: Inference Server Integration
# ============================================================================

# Global model storage for inference
GLOBAL_MODEL = None
GLOBAL_FEATURE_ENGINEER = None
GLOBAL_CONFIG = None


def initialize_model():
    """Initialize model for inference."""
    global GLOBAL_MODEL, GLOBAL_FEATURE_ENGINEER, GLOBAL_CONFIG
    
    GLOBAL_CONFIG = Config()
    GLOBAL_FEATURE_ENGINEER = FootballFeatureEngineer(GLOBAL_CONFIG)
    
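    # Try to load saved model
    model_path = './model_output'
    if os.path.exists(model_path):
        try:
            GLOBAL_MODEL = EnsemblePredictor(GLOBAL_CONFIG)
            # Only the GBDT component is used at inference time. It does not
            # need input_dim, so it can be restored immediately.
            GLOBAL_MODEL.gbdt_model = GBDTPredictor(GLOBAL_CONFIG)
            GLOBAL_MODEL.load(model_path)
        except Exception as e:
            print(f"Could not load saved model ({e}). Using baseline predictor.")
            GLOBAL_MODEL = None
    else:
        print("No saved model found. Using baseline predictor.")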


def predict(test: pl.DataFrame, test_input: pl.DataFrame) -> pl.DataFrame:
    """
    Main prediction function for the inference server.
    
    Args:
        test: DataFrame with rows to predict (id column)
        test_input: DataFrame with input features (tracking data, play info)
        
    Returns:
        DataFrame with 'x' and 'y' predictions
    """
    global GLOBAL_MODEL, GLOBAL_FEATURE_ENGINEER, GLOBAL_CONFIG
    
    # Initialize on first call
    if GLOBAL_CONFIG is None:
        initialize_model()
    
    # Convert to pandas for processing
    test_pd = test.to_pandas() if isinstance(test, pl.DataFrame) else test
    test_input_pd = test_input.to_pandas() if isinstance(test_input, pl.DataFrame) else test_input
    
    n_rows = len(test_pd)
    
    try:
        # If we have tracking data in test_input
        if 'x' in test_input_pd.columns and 'y' in test_input_pd.columns:
            # Engineer features
            df = GLOBAL_FEATURE_ENGINEER.engineer_features(
                test_input_pd, 
                compute_separation=False  # Skip for speed
            )
            
            # Get feature columns
            feature_cols = GLOBAL_FEATURE_ENGINEER.get_feature_columns()
            available_features = [c for c in feature_cols if c in df.columns]
            
            # Use GBDT model if available
            if (GLOBAL_MODEL is not None and 
                GLOBAL_MODEL.gbdt_model is not None and
                GLOBAL_MODEL.gbdt_model.model_x is not None):
                
                pred_x, pred_y = GLOBAL_MODEL.gbdt_model.predict(df)
            else:
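                # Baseline: constant-velocity extrapolation from the last known state
                if 's' in df.columns and 'dir' in df.columns:
                    # Simple physics: position + velocity * dt.
                    # Assumes the NFL tracking convention: `dir` is in degrees,
                    # measured clockwise from the +y axis, so the x-component
                    # uses sin and the y-component uses cos.
                    dt = 0.1  # tracking data is sampled at 10 frames per second
                    theta = np.radians(df['dir'].values)
                    pred_x = df['x'].values + df['s'].values * np.sin(theta) * dt
                    pred_y = df['y'].values + df['s'].values * np.cos(theta) * dt
                else:
                    pred_x = df['x'].values if 'x' in df.columns else np.full(n_rows, 60.0)
                    pred_y = df['y'].values if 'y' in df.columns else np.full(n_rows, 26.65)
        else:
            # No position data - return field center (x spans 0-120, y spans 0-53.3)
            pred_x = np.full(n_rows, 60.0)
            pred_y = np.full(n_rows, 26.65)
        
        # The server expects exactly one prediction per row of `test`
        if len(pred_x) != n_rows:
            raise ValueError(f"Got {len(pred_x)} predictions for {n_rows} rows")
            
    except Exception as e:
        print(f"Prediction error: {e}")
        # Fallback predictions: field center
        pred_x = np.full(n_rows, 60.0)
        pred_y = np.full(n_rows, 26.65)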
    
    # Create prediction dataframe
    predictions = pl.DataFrame({
        'x': pred_x.tolist(),
        'y': pred_y.tolist()
    })
    
    assert len(predictions) == len(test)
    return predictions


print("Inference function defined.")

In [None]:
# ============================================================================
# Section 10: Submission Code
# ============================================================================

# This is the final submission code that connects to the Kaggle evaluation server

import os
import pandas as pd
import polars as pl

try:
    import kaggle_evaluation.nfl_inference_server
    HAS_KAGGLE_EVAL = True
except ImportError:
    HAS_KAGGLE_EVAL = False
    print("kaggle_evaluation not available. Running in development mode.")


if HAS_KAGGLE_EVAL:
    # Create inference server with our predict function
    inference_server = kaggle_evaluation.nfl_inference_server.NFLInferenceServer(predict)
    
    if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
        # Production mode - serve predictions
        inference_server.serve()
    else:
        # Development mode - run local gateway
        inference_server.run_local_gateway(
            ('/kaggle/input/nfl-big-data-bowl-2026-prediction/',)
        )
else:
    print("\n" + "="*60)
    print("Development Mode - Testing Prediction Function")
    print("="*60)
    
    # Create test data
    if 'featured_df' in dir() and featured_df is not None:
        sample = featured_df.sample(min(100, len(featured_df)))
        test_df = pl.DataFrame({'id': sample['id'].tolist()})
        test_input = pl.DataFrame(sample.to_dict(orient='list'))
        
        print(f"\nTest input shape: {test_input.shape}")
        
        # Run prediction
        predictions = predict(test_df, test_input)
        
        print(f"\nPredictions shape: {predictions.shape}")
        print(f"\nSample predictions:")
        print(predictions.head())
        
        # Calculate RMSE if we have true values
        pred_pd = predictions.to_pandas()
        true_x = sample['x'].values[:len(pred_pd)]
        true_y = sample['y'].values[:len(pred_pd)]
        
        rmse = np.sqrt(
            0.5 * (mean_squared_error(true_x, pred_pd['x']) + 
                   mean_squared_error(true_y, pred_pd['y']))
        )
        print(f"\nTest RMSE: {rmse:.4f}")
    else:
        print("No feature data available for testing.")

## Section 11: Visualization and Analysis

Visualize predictions and analyze model performance.

In [None]:
# ============================================================================
# Section 11: Visualization and Analysis
# ============================================================================

def plot_field():
    """Create a football field plot."""
    fig, ax = plt.subplots(figsize=(12, 5.33))
    
    # Field boundaries
    ax.set_xlim(0, 120)
    ax.set_ylim(0, 53.3)
    
    # Field color
    ax.set_facecolor('#2e7d32')
    
    # Yard lines
    for x in range(0, 121, 10):
        ax.axvline(x, color='white', linewidth=0.5, alpha=0.5)
    
    # Goal lines (inner edges of the end zones)
    ax.axvline(10, color='white', linewidth=2)
    ax.axvline(110, color='white', linewidth=2)
    
    # Hash marks
    for x in range(10, 111):
        ax.plot([x, x], [22.91, 23.91], color='white', linewidth=0.3)
        ax.plot([x, x], [29.39, 30.39], color='white', linewidth=0.3)
    
    return fig, ax


def visualize_play_prediction(
    df: pd.DataFrame,
    game_id: int,
    play_id: int,
    predictions: Optional[pd.DataFrame] = None
):
    """Visualize a single play with predictions."""
    play_df = df[(df['gameId'] == game_id) & (df['playId'] == play_id)]
    
    # Check for data before creating the figure so an empty one is not left open
    if len(play_df) == 0:
        print(f"No data found for game {game_id}, play {play_id}")
        return None
    
    fig, ax = plot_field()
    
    # Get unique players
    players = play_df['nflId'].unique()
    
    # Plot each player's trajectory
    colors = {'home': 'blue', 'away': 'red'}
    
    for nfl_id in players:
        player_df = play_df[play_df['nflId'] == nfl_id].sort_values('frameId')
        team = player_df['team'].iloc[0] if 'team' in player_df.columns else 'home'
        color = colors.get(team, 'gray')
        
        # Plot actual trajectory
        ax.plot(
            player_df['x'], player_df['y'], 
            '-', color=color, alpha=0.6, linewidth=1
        )
        
        # Mark start and end
        ax.scatter(
            player_df['x'].iloc[0], player_df['y'].iloc[0],
            s=50, color=color, marker='o', edgecolors='white', linewidths=1
        )
        ax.scatter(
            player_df['x'].iloc[-1], player_df['y'].iloc[-1],
            s=50, color=color, marker='s', edgecolors='white', linewidths=1
        )
    
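    # Plot ball landing location if available
    if 'passEndX' in play_df.columns and 'passEndY' in play_df.columns:
        pass_end_x = play_df['passEndX'].iloc[0]
        pass_end_y = play_df['passEndY'].iloc[0]
        ax.scatter(
            pass_end_x, pass_end_y,
            s=100, color='yellow', marker='*', edgecolors='black', linewidths=1,
            label='Ball Landing'
        )
    
    # Overlay predicted positions when supplied. Assumes `predictions` is already
    # filtered to this play and has 'x' and 'y' columns.
    if predictions is not None and {'x', 'y'}.issubset(predictions.columns):
        ax.scatter(
            predictions['x'], predictions['y'],
            s=30, color='white', marker='x', linewidths=1,
            label='Predicted'
        )
    
    ax.set_title(f'Game {game_id}, Play {play_id}')
    # Only draw a legend when at least one labeled artist exists
    handles, _ = ax.get_legend_handles_labels()
    if handles:
        ax.legend(loc='upper right')
    plt.tight_layout()
    return fig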


def plot_feature_importance(importance_df: pd.DataFrame, top_n: int = 20):
    """Plot feature importance."""
    if len(importance_df) == 0:
        print("No feature importance data available.")
        return
    
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Sort explicitly so the chart does not depend on the caller's row order
    top_features = importance_df.sort_values('importance_avg', ascending=False).head(top_n)
    
    y_pos = np.arange(len(top_features))
    ax.barh(y_pos, top_features['importance_avg'], align='center')
    ax.set_yticks(y_pos)
    ax.set_yticklabels(top_features['feature'])
    ax.invert_yaxis()
    ax.set_xlabel('Importance')
    ax.set_title(f'Top {len(top_features)} Feature Importances')
    
    plt.tight_layout()
    return fig


def plot_prediction_error_distribution(
    true_x: np.ndarray,
    true_y: np.ndarray,
    pred_x: np.ndarray,
    pred_y: np.ndarray
):
    """Plot prediction error distribution."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
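    # Coerce to float arrays so pandas inputs are subtracted by position, not by index
    error_x = np.asarray(pred_x, dtype=float) - np.asarray(true_x, dtype=float)
    error_y = np.asarray(pred_y, dtype=float) - np.asarray(true_y, dtype=float)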
    error_dist = np.sqrt(error_x**2 + error_y**2)
    
    # X error distribution
    axes[0].hist(error_x, bins=50, edgecolor='black', alpha=0.7)
    axes[0].axvline(0, color='red', linestyle='--')
    axes[0].set_xlabel('X Error (yards)')
    axes[0].set_ylabel('Count')
    axes[0].set_title(f'X Error Distribution\nMean: {error_x.mean():.2f}, Std: {error_x.std():.2f}')
    
    # Y error distribution
    axes[1].hist(error_y, bins=50, edgecolor='black', alpha=0.7)
    axes[1].axvline(0, color='red', linestyle='--')
    axes[1].set_xlabel('Y Error (yards)')
    axes[1].set_ylabel('Count')
    axes[1].set_title(f'Y Error Distribution\nMean: {error_y.mean():.2f}, Std: {error_y.std():.2f}')
    
    # Distance error distribution
    axes[2].hist(error_dist, bins=50, edgecolor='black', alpha=0.7, color='green')
    axes[2].set_xlabel('Distance Error (yards)')
    axes[2].set_ylabel('Count')
    axes[2].set_title(f'Distance Error Distribution\nMean: {error_dist.mean():.2f}, Std: {error_dist.std():.2f}')
    
    plt.tight_layout()
    return fig


print("Visualization functions defined.")

# Create sample visualizations
if 'featured_df' in dir() and featured_df is not None and len(featured_df) > 0:
    # Get a random play to visualize
    sample_play = featured_df[['gameId', 'playId']].drop_duplicates().sample(1).iloc[0]
    
    fig = visualize_play_prediction(
        featured_df, 
        sample_play['gameId'], 
        sample_play['playId']
    )
    plt.show()
    
    # Plot feature importance if GBDT model was trained
    if 'ensemble' in dir() and ensemble.gbdt_model is not None:
        importance_df = ensemble.gbdt_model.get_feature_importance()
        if len(importance_df) > 0:
            fig = plot_feature_importance(importance_df)
            plt.show()
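
The histograms produced by `plot_prediction_error_distribution` can be complemented by a short numeric summary. The sketch below computes the per-axis bias and the Euclidean error statistics from paired points. The helper name `summarize_errors` and the list-of-tuples input are assumptions made for illustration, and only the standard library is used so the example runs without the notebook's dependencies.

```python
import math
import statistics


def summarize_errors(true_xy, pred_xy):
    """Summarize prediction errors for paired (x, y) points.

    true_xy, pred_xy: equal-length lists of (x, y) tuples in yards.
    Hypothetical helper, not defined elsewhere in the notebook.
    """
    if len(true_xy) != len(pred_xy):
        raise ValueError("true_xy and pred_xy must have the same length")
    if not true_xy:
        raise ValueError("at least one point is required")

    err_x = [p[0] - t[0] for t, p in zip(true_xy, pred_xy)]
    err_y = [p[1] - t[1] for t, p in zip(true_xy, pred_xy)]
    dist = [math.hypot(ex, ey) for ex, ey in zip(err_x, err_y)]

    return {
        "bias_x": statistics.fmean(err_x),    # signed mean error along x
        "bias_y": statistics.fmean(err_y),    # signed mean error along y
        "mean_dist": statistics.fmean(dist),  # mean Euclidean error
        "max_dist": max(dist),                # worst single prediction
    }


true_points = [(0.0, 0.0), (10.0, 10.0), (20.0, 5.0)]
pred_points = [(3.0, 4.0), (10.0, 10.0), (20.0, 0.0)]
summary = summarize_errors(true_points, pred_points)
print(summary)
```

A non-zero `bias_x` or `bias_y` points to a systematic drift in one direction, which a symmetric-looking histogram can hide when the bins are wide.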

## Summary

This notebook implements an end-to-end pipeline for predicting NFL player movement on pass plays, evaluated by RMSE on (x, y) coordinates.

### Components
1. **Data Loading**: Handles real and synthetic NFL tracking data
2. **Feature Engineering**: Football-specific features including velocity, distance, angles, and separation metrics
3. **Transformer Model**: Sequence model for temporal patterns in player movement
4. **GNN Model**: Graph neural network for player interaction modeling
5. **GBDT Model**: LightGBM/XGBoost trained on the engineered tabular features
6. **Ensemble**: Weighted blend of the deep models and the GBDT
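
As an illustration of the weighted blend in the last component, the sketch below averages per-row (x, y) predictions from several models using normalized weights. The function `blend_predictions`, the model names, and the weights are hypothetical; the notebook's own ensemble class is defined in an earlier section and may differ.

```python
def blend_predictions(model_preds, weights):
    """Blend (x, y) predictions from several models with normalized weights.

    model_preds: dict mapping model name -> list of (x, y) tuples.
    weights: dict mapping model name -> non-negative float.
    Both layouts are assumptions made for this sketch.
    """
    names = list(model_preds)
    if not names:
        raise ValueError("at least one model is required")

    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_rows = len(model_preds[names[0]])
    if any(len(model_preds[name]) != n_rows for name in names):
        raise ValueError("all models must predict the same number of rows")

    blended = []
    for i in range(n_rows):
        # Weighted average per coordinate, normalized by the weight sum.
        x = sum(weights[name] * model_preds[name][i][0] for name in names) / total
        y = sum(weights[name] * model_preds[name][i][1] for name in names) / total
        blended.append((x, y))
    return blended


model_preds = {
    "transformer": [(10.0, 20.0)],
    "gnn": [(12.0, 22.0)],
    "gbdt": [(14.0, 24.0)],
}
weights = {"transformer": 2.0, "gnn": 1.0, "gbdt": 1.0}  # illustrative values only
blended = blend_predictions(model_preds, weights)
print(blended)
```

Normalizing by the weight sum means the weights only need to express relative trust in each model, so they can be tuned on validation RMSE without being constrained to sum to one.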

### Key Features
- Modular architecture for easy experimentation
- Automatic fallback when optional libraries (e.g. `torch_geometric`, LightGBM, `kaggle_evaluation`) are unavailable
- Visualization tools for analysis
- Kaggle inference server integration

### Next Steps for Improvement
1. Add grouped cross-validation (e.g. `GroupKFold` by game) so that frames from one game never appear in both training and validation folds
2. Implement more sophisticated temporal attention mechanisms
3. Add physics-based constraints to predictions
4. Experiment with different GNN architectures
5. Add player role-specific models (WR, CB, etc.)
6. Implement trajectory smoothing for more realistic predictions
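
One way to approach the physics-based constraints in item 3 is to cap how far a player can move between consecutive frames. The sketch below is a minimal illustration under stated assumptions: tracking frames are 0.1 s apart (10 frames per second), and `MAX_SPEED` is an assumed upper bound rather than a value taken from the notebook or the data. The function name `clamp_trajectory` is hypothetical.

```python
import math

FRAME_DT = 0.1    # seconds per frame, assuming 10 frames per second
MAX_SPEED = 12.0  # yards/second; assumed upper bound, tune against real data


def clamp_trajectory(start, predicted, max_speed=MAX_SPEED, dt=FRAME_DT):
    """Limit each frame-to-frame step to max_speed * dt.

    start: last observed (x, y) position.
    predicted: list of predicted (x, y) positions for the following frames.
    Returns a new list in which every step from the previous (clamped)
    position is at most max_speed * dt yards long.
    """
    max_step = max_speed * dt
    prev_x, prev_y = start
    clamped = []
    for x, y in predicted:
        dx, dy = x - prev_x, y - prev_y
        step = math.hypot(dx, dy)
        if step > max_step:
            # Keep the predicted direction but shorten the step.
            scale = max_step / step
            x, y = prev_x + dx * scale, prev_y + dy * scale
        clamped.append((x, y))
        prev_x, prev_y = x, y
    return clamped


raw = [(0.5, 0.0), (6.5, 0.0)]  # the second step jumps 6 yards in one frame
clamped = clamp_trajectory((0.0, 0.0), raw)
print(clamped)
```

Because each step is measured from the previously clamped position, a single implausible jump is spread over the following frames instead of being removed outright. A speed cap alone does not bound acceleration or keep players in bounds, so it is only a first constraint.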