# NFL Big Data Bowl 2026 - Kaggle Submission Guide

This notebook shows how to format predictions for Kaggle submission using the **new Evaluation API**.

**Key Changes from Traditional Submissions**:
- Uses `kaggle_evaluation.nfl_inference_server` API
- `predict()` is called repeatedly, once per batch served by the API
- No 5-minute time limit for model loading (only inference time matters)
- Returns only `x, y` columns (no IDs)

**Contents**:
1. Kaggle API Overview
2. Model Loading Strategy
3. Prediction Function
4. Test-Time Augmentation (TTA)
5. Complete Submission Example

In [1]:
# Imports for submission
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from pathlib import Path
import pickle
import warnings
warnings.filterwarnings('ignore')

# Polars for API (required by Kaggle)
try:
    import polars as pl
    print('Polars available for Kaggle API')
except ImportError:
    print('Polars not installed (pip install polars)')

print('Imports ready')

Polars not installed (pip install polars)
Imports ready


## 1. Kaggle API Overview

The 2026 competition uses a new inference API:

```python
def predict(test: pl.DataFrame, test_input: pl.DataFrame) -> pl.DataFrame | pd.DataFrame:
    """
    Args:
        test: DataFrame with columns (game_id, play_id, nfl_id, frame_id)
              - These are the frames you need to predict positions for
        test_input: DataFrame with historical tracking data
              - Contains all input features up to current time
    
    Returns:
        DataFrame with ONLY columns: (x, y)
        - Must match length of test DataFrame
        - Rows must be in same order as test DataFrame
    """
```

**Important Notes**:
- Models are loaded ONCE at startup (no time limit)
- `predict()` is called repeatedly for each batch
- Return only the `x, y` columns; predictions are matched to `test` by row position, so keep the same order and length
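
To make the contract concrete, here is a minimal sketch of a function that satisfies it with a naive baseline: every requested frame gets the player's last observed position. It uses pandas only. The `x`, `y` and `frame_id` columns in `test_input` are assumptions about the tracking schema, and `predict_last_position` is a hypothetical name, not part of the Kaggle API.

```python
import pandas as pd

KEYS = ["game_id", "play_id", "nfl_id"]

def predict_last_position(test: pd.DataFrame, test_input: pd.DataFrame) -> pd.DataFrame:
    """Baseline: repeat each player's last observed (x, y) for every requested frame."""
    # Last observed row per player (assumes test_input has x, y and frame_id columns)
    last = (
        test_input.sort_values("frame_id")
        .groupby(KEYS, as_index=False)
        .last()[KEYS + ["x", "y"]]
    )
    # A left merge keeps the row order and row count of `test`
    out = test[KEYS].merge(last, on=KEYS, how="left")
    return out[["x", "y"]].reset_index(drop=True)

# Tiny made-up example: two players, two observed frames each
test_input = pd.DataFrame({
    "game_id": [1, 1, 1, 1],
    "play_id": [7, 7, 7, 7],
    "nfl_id": [10, 10, 20, 20],
    "frame_id": [1, 2, 1, 2],
    "x": [30.0, 31.0, 50.0, 49.5],
    "y": [20.0, 20.5, 10.0, 10.2],
})
test = pd.DataFrame({
    "game_id": [1, 1, 1],
    "play_id": [7, 7, 7],
    "nfl_id": [20, 10, 20],
    "frame_id": [1, 1, 2],
})
demo = predict_last_position(test, test_input)
```

The result has exactly the columns `x` and `y`, one row per row of `test`, in the same order. A real model would replace the "last position" step, but the merge-and-select pattern for keeping rows aligned stays the same.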

In [2]:
# Field constants
FIELD_LENGTH = 120.0
FIELD_WIDTH = 53.3

class Config:
    """Configuration for submission"""
    # Path to pretrained models (on Kaggle)
    MODELS_DIR = Path('/kaggle/input/your-model-dataset/')
    
    # Local path for testing
    LOCAL_MODELS_DIR = Path('../models/st_transformer_demo/')
    
    WINDOW_SIZE = 10
    MAX_FUTURE_HORIZON = 94
    N_FOLDS = 5  # or 20 for production
    
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cfg = Config()
print(f'Device: {cfg.DEVICE}')

Device: cuda


## 2. Model Loading Strategy

Load models ONCE at startup (no time limit), then reuse for all predictions.

In [3]:
# Example model class (same as training)
class STTransformer(nn.Module):
    def __init__(self, input_dim, horizon=94, hidden_dim=128, n_layers=6, n_heads=8):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.pos_embed = nn.Parameter(torch.randn(1, 10, hidden_dim))
        
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads,
            dim_feedforward=hidden_dim*4, dropout=0.1,
            batch_first=True, norm_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        
        self.pool_ln = nn.LayerNorm(hidden_dim)
        self.pool_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.pool_query = nn.Parameter(torch.randn(1, 2, hidden_dim))
        
        self.head = nn.Sequential(
            nn.Linear(2*hidden_dim, 256), nn.GELU(), nn.Dropout(0.2),
            nn.Linear(256, horizon*2)
        )
        self.horizon = horizon
    
    def forward(self, x):
        B, T, _ = x.shape
        x = self.input_proj(x) + self.pos_embed[:, :T, :]
        h = self.encoder(x)
        q = self.pool_query.expand(B, -1, -1)
        ctx, _ = self.pool_attn(q, self.pool_ln(h), self.pool_ln(h))
        out = self.head(ctx.flatten(1)).view(B, self.horizon, 2)
        return torch.cumsum(out, dim=1)

print('Model class defined')

Model class defined


In [4]:
def load_ensemble_models(model_dir, model_class, n_folds=5, input_dim=16):
    """
    Load all fold models and scalers.
    Called ONCE at startup - no time limit.
    """
    models = []
    scalers = []
    
    model_dir = Path(model_dir)
    
    for fold in range(1, n_folds + 1):
        # Load model
        model_path = model_dir / f'model_fold{fold}.pt'
        scaler_path = model_dir / f'scaler_fold{fold}.pkl'
        
        # Skip the fold unless both files exist, so models[i] and scalers[i] stay paired
        if not model_path.exists() or not scaler_path.exists():
            print(f'Warning: fold {fold} skipped (model or scaler file missing)')
            continue
        
        # Instantiate and load weights
        model = model_class(input_dim)
        state = torch.load(model_path, map_location='cpu')
        model.load_state_dict(state)
        model.to(cfg.DEVICE)
        model.eval()
        models.append(model)
        
        # Load scaler (pickle: only load files you created yourself)
        with open(scaler_path, 'rb') as f:
            scalers.append(pickle.load(f))
    
    print(f'Loaded {len(models)} models and {len(scalers)} scalers')
    return models, scalers

# Example loading (uncomment to test)
# models, scalers = load_ensemble_models(cfg.LOCAL_MODELS_DIR, STTransformer)
print('Model loading function defined')

Model loading function defined
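
The "load once, reuse for every call" idea can also be expressed with a cached accessor instead of module-level flags. The sketch below uses only the standard library; `load_all_folds` is a hypothetical stand-in for an expensive loader such as `load_ensemble_models`, and the string "models" are placeholders.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive loader actually runs

def load_all_folds():
    """Stand-in for an expensive loader such as load_ensemble_models."""
    global load_calls
    load_calls += 1
    return ("fold1", "fold2", "fold3")

@lru_cache(maxsize=1)
def get_models():
    """Run the loader on the first call and return the cached result afterwards."""
    return load_all_folds()

first = get_models()   # triggers the load
second = get_models()  # served from the cache
```

Calling `get_models()` at the top of `predict()` then costs nothing after the first call, and there is no separate "loaded" flag to keep in sync.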


## 3. Prediction Function

Helper functions used by the main `predict()` function that the Kaggle API calls.

In [5]:
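def invert_to_original_direction(x_u, y_u, play_dir_right):
    """Convert unified coordinates back to original direction.

    Assumes unification mirrored left-moving plays so that every play runs right.
    """
    if play_dir_right:
        # Plays already moving right were not mirrored
        return float(x_u), float(y_u)
    # Left-moving plays were mirrored in both axes, so mirror them back
    return float(FIELD_LENGTH - x_u), float(FIELD_WIDTH - y_u)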

def predict_batch(models, scalers, sequences, device):
    """
    Run ensemble prediction on a batch of sequences.
    
    Args:
        models: List of trained models
        scalers: List of corresponding scalers
        sequences: List of (window_size, n_features) arrays
        device: torch device
    
    Returns:
        (dx, dy) predictions averaged across models
    """
    all_preds_dx = []
    all_preds_dy = []
    
    for model, scaler in zip(models, scalers):
        # Scale inputs
        X_scaled = [scaler.transform(s) for s in sequences]
        X_tensor = torch.tensor(np.stack(X_scaled).astype(np.float32)).to(device)
        
        # Predict
        with torch.no_grad():
            preds = model(X_tensor).cpu().numpy()
        
        all_preds_dx.append(preds[:, :, 0])
        all_preds_dy.append(preds[:, :, 1])
    
    # Average across models
    ens_dx = np.mean(all_preds_dx, axis=0)
    ens_dy = np.mean(all_preds_dy, axis=0)
    
    return ens_dx, ens_dy

print('Prediction functions defined')

Prediction functions defined
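
A vectorized round trip makes the direction handling easier to check. This is a sketch under the assumption that "unified coordinates" means left-moving plays are mirrored in both axes so that every play runs left to right; `to_unified` and `to_original` are hypothetical helper names.

```python
import numpy as np

FIELD_LENGTH, FIELD_WIDTH = 120.0, 53.3

def to_unified(x, y, play_dir_right):
    """Mirror left-moving plays so that every play runs left to right."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    right = np.asarray(play_dir_right, dtype=bool)
    return np.where(right, x, FIELD_LENGTH - x), np.where(right, y, FIELD_WIDTH - y)

def to_original(x_u, y_u, play_dir_right):
    """Inverse of to_unified; the mirror is its own inverse."""
    return to_unified(x_u, y_u, play_dir_right)

# Same raw position, once in a right-moving play and once in a left-moving play
x = np.array([30.0, 30.0])
y = np.array([10.0, 10.0])
right = np.array([True, False])

x_u, y_u = to_unified(x, y, right)
x_back, y_back = to_original(x_u, y_u, right)
```

Only the left-moving row changes under `to_unified`, and applying the inverse recovers the raw coordinates for both rows. A check like this is worth running once on real data before submitting, because a swapped condition produces plausible-looking but mirrored predictions.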


## 4. Test-Time Augmentation (TTA)

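Improve predictions by averaging the prediction for the original sequence with the prediction for its mirror image across the field's long axis (`y -> FIELD_WIDTH - y`), after negating the mirrored `dy`.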

In [6]:
def horizontal_flip_sequence(seq, y_idx=1, vy_idx=None, dir_idx=None):
    """
    Apply horizontal flip to a sequence for TTA.
    
    Args:
        seq: (window_size, n_features) array
        y_idx: Index of y coordinate
        vy_idx: Index of velocity_y (optional)
        dir_idx: Index of direction (optional)
    """
    flipped = seq.copy()
    
    # Flip y coordinate
    flipped[:, y_idx] = FIELD_WIDTH - flipped[:, y_idx]
    
    # Flip velocity_y
    if vy_idx is not None:
        flipped[:, vy_idx] = -flipped[:, vy_idx]
    
    # Flip direction
    if dir_idx is not None:
        flipped[:, dir_idx] = (180.0 - flipped[:, dir_idx]) % 360.0
    
    return flipped

def predict_with_tta(models, scalers, sequences, device, y_idx=1, vy_idx=None, dir_idx=None):
    """
    Predict with test-time augmentation (horizontal flip).
    Averages original and flipped predictions.
    
    Pass vy_idx / dir_idx if the features include y-velocity or direction,
    so the flipped sequence stays physically consistent.
    """
    # Original prediction
    dx_orig, dy_orig = predict_batch(models, scalers, sequences, device)
    
    # Flipped prediction
    flipped_seqs = [
        horizontal_flip_sequence(s, y_idx=y_idx, vy_idx=vy_idx, dir_idx=dir_idx)
        for s in sequences
    ]
    dx_flip, dy_flip = predict_batch(models, scalers, flipped_seqs, device)
    
    # Average; dy from the flipped input points the opposite way, so negate it
    dx_tta = (dx_orig + dx_flip) / 2
    dy_tta = (dy_orig - dy_flip) / 2
    
    return dx_tta, dy_tta

print('TTA functions defined')

TTA functions defined
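
The sign handling in the averaging step is easy to get wrong, so here is a small NumPy-only illustration. It assumes a toy feature layout `[x, y, vx, vy]` and a hypothetical constant-velocity `toy_model` with a deliberate constant bias on `dy`; neither is part of the real pipeline.

```python
import numpy as np

FIELD_WIDTH = 53.3

def flip_y(seq):
    """Mirror a (T, 4) sequence with columns [x, y, vx, vy] across the long axis."""
    out = seq.copy()
    out[:, 1] = FIELD_WIDTH - out[:, 1]
    out[:, 3] = -out[:, 3]
    return out

def toy_model(seq, horizon=3, dy_bias=0.3):
    """Constant-velocity predictor with a deliberate constant bias on dy."""
    steps = np.arange(1, horizon + 1)
    dx = seq[-1, 2] * steps
    dy = seq[-1, 3] * steps + dy_bias
    return dx, dy

seq = np.array([[10.0, 20.0, 1.0, 0.5],
                [11.0, 20.5, 1.0, 0.5]])

dx_orig, dy_orig = toy_model(seq)
dx_flip, dy_flip = toy_model(flip_y(seq))

dx_tta = (dx_orig + dx_flip) / 2
dy_tta = (dy_orig - dy_flip) / 2  # negate the mirrored dy before averaging
```

In this toy setup the bias appears with the same sign in both passes, so subtracting the mirrored `dy` cancels it and `dy_tta` equals the unbiased constant-velocity displacement. Real models are not biased this neatly, so treat this as intuition for why the flip can help, not as a guarantee.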


## 5. Complete Submission Example

Full submission code structure for Kaggle.

In [7]:
# Complete submission template
SUBMISSION_CODE = '''
import polars as pl
import pandas as pd
import numpy as np
import torch
import pickle
from pathlib import Path

# Field constants
FIELD_LENGTH, FIELD_WIDTH = 120.0, 53.3

# Global model storage (loaded once)
_models_loaded = False
_models = None
_scalers = None

def load_models_once():
    """Load models on first call (no time limit)"""
    global _models_loaded, _models, _scalers
    
    if _models_loaded:
        return
    
    print("Loading models...")
    model_dir = Path("/kaggle/input/your-model-dataset/")
    
    _models = []
    _scalers = []
    
    for fold in range(1, 6):  # 5 folds
        # Load model
        model = YourModelClass(input_dim=16)
        state = torch.load(model_dir / f"model_fold{fold}.pt", map_location="cpu")
        model.load_state_dict(state)
        model.cuda().eval()
        _models.append(model)
        
        # Load scaler
        with open(model_dir / f"scaler_fold{fold}.pkl", "rb") as f:
            _scalers.append(pickle.load(f))
    
    _models_loaded = True
    print(f"Loaded {len(_models)} models")

def predict(test: pl.DataFrame, test_input: pl.DataFrame) -> pd.DataFrame:
    """
    Main prediction function called by Kaggle API.
    
    MUST return DataFrame with ONLY columns: x, y
    MUST have same number of rows as test DataFrame
    """
    global _models, _scalers
    
    # Load models on first call
    if not _models_loaded:
        load_models_once()
    
    # Convert to pandas
    test_pd = test.to_pandas()
    test_input_pd = test_input.to_pandas()
    
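    # 1. Prepare sequences from test_input
    # prepare_sequences is a placeholder you must supply. It is assumed here to
    # return, aligned row-for-row with test_pd:
    #   sequences:    list of (window_size, n_features) arrays
    #   last_x/last_y: last observed position of each row's player
    #   frame_offset: 0-based index of each row's frame within the horizon
    sequences, last_x, last_y, frame_offset = prepare_sequences(test_input_pd, test_pd)
    
    # 2. Run ensemble prediction
    all_preds = []
    for model, scaler in zip(_models, _scalers):
        X_scaled = [scaler.transform(s) for s in sequences]
        # Cast to float32 to match the model weights
        X_tensor = torch.tensor(np.stack(X_scaled).astype(np.float32)).cuda()
        with torch.no_grad():
            preds = model(X_tensor).cpu().numpy()
        all_preds.append(preds)
    
    ens_preds = np.mean(all_preds, axis=0)  # (n_rows, horizon, 2) cumulative displacements
    
    # 3. Format output (ONLY x, y columns!)
    idx = np.arange(len(test_pd))
    t = np.clip(np.asarray(frame_offset, dtype=int), 0, ens_preds.shape[1] - 1)  # Clip to horizon
    x_pred = np.asarray(last_x) + ens_preds[idx, t, 0]
    y_pred = np.asarray(last_y) + ens_preds[idx, t, 1]
    
    # If the model works in unified coordinates, invert the direction here
    
    # Clip to field
    return pd.DataFrame({
        "x": np.clip(x_pred, 0, FIELD_LENGTH),
        "y": np.clip(y_pred, 0, FIELD_WIDTH),
    })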
'''

print('Submission template shown above')
print('\nKey points:')
print('  1. Load models ONCE at startup')
print('  2. Return ONLY x, y columns')
print('  3. Match row count of test DataFrame')
print('  4. Clip predictions to field boundaries')

Submission template shown above

Key points:
  1. Load models ONCE at startup
  2. Return ONLY x, y columns
  3. Match row count of test DataFrame
  4. Clip predictions to field boundaries
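
Step 3 of the template (turning model output into absolute positions) can be tested in isolation without any model. The sketch below assumes the output layout used by `STTransformer` above, that is, cumulative displacements of shape `(N, horizon, 2)`; `displacements_to_positions` is a hypothetical helper name, and `frame_offset` is assumed to be a 0-based index into the horizon.

```python
import numpy as np

FIELD_LENGTH, FIELD_WIDTH = 120.0, 53.3

def displacements_to_positions(last_xy, cum_disp, frame_offset):
    """
    last_xy:      (N, 2) last observed position per requested row
    cum_disp:     (N, H, 2) cumulative displacement per future step
    frame_offset: (N,) 0-based future step requested for each row
    """
    n, horizon, _ = cum_disp.shape
    # Requests beyond the horizon fall back to the last predicted step
    t = np.clip(np.asarray(frame_offset, dtype=int), 0, horizon - 1)
    xy = np.asarray(last_xy, dtype=float) + cum_disp[np.arange(n), t]
    # Keep predictions inside the field
    xy[:, 0] = np.clip(xy[:, 0], 0.0, FIELD_LENGTH)
    xy[:, 1] = np.clip(xy[:, 1], 0.0, FIELD_WIDTH)
    return xy

# Per-step deltas of (+1.0, -0.5) for 3 steps; cumsum mirrors the model's output head
step_deltas = np.tile(np.array([1.0, -0.5]), (2, 3, 1))
cum_disp = np.cumsum(step_deltas, axis=1)

last_xy = np.array([[50.0, 20.0], [119.0, 0.4]])
positions = displacements_to_positions(last_xy, cum_disp, frame_offset=[1, 5])
```

The first row picks step 1 and lands at `(52.0, 19.0)`. The second row asks for step 5, which is clipped to the last available step and then clipped to the field, giving `(120.0, 0.0)`.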


## Summary

**Kaggle Submission Checklist**:

1. **Model Loading**:
   - Load all models/scalers at startup
   - No time limit for loading phase
   - Store in global variables for reuse

2. **Prediction Function**:
   - Accept `pl.DataFrame` inputs
   - Return `pd.DataFrame` with only `x, y` columns
   - Row count must match input `test` DataFrame

3. **Best Practices**:
   - Use TTA for a ~0.005-0.010 score improvement
   - Ensemble multiple models/seeds
   - Clip predictions to field boundaries
   - Handle direction inversion if using unified coordinates
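
Most of the checklist can be turned into an automatic check that runs on the output of `predict()` during local testing. The following is a sketch, not an official validator: `check_submission_frame` is a hypothetical helper, and requiring the column order to be exactly `x, y` is a conservative assumption.

```python
import numpy as np
import pandas as pd

FIELD_LENGTH, FIELD_WIDTH = 120.0, 53.3

def check_submission_frame(pred: pd.DataFrame, test: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if list(pred.columns) != ["x", "y"]:
        problems.append(f"columns are {list(pred.columns)}, expected ['x', 'y']")
        return problems  # the remaining checks need both columns
    if len(pred) != len(test):
        problems.append(f"{len(pred)} rows returned for {len(test)} requested")
    values = pred[["x", "y"]].to_numpy(dtype=float)
    if not np.isfinite(values).all():
        problems.append("NaN or infinite values present")
    if ((pred["x"] < 0) | (pred["x"] > FIELD_LENGTH)).any():
        problems.append("x outside [0, FIELD_LENGTH]")
    if ((pred["y"] < 0) | (pred["y"] > FIELD_WIDTH)).any():
        problems.append("y outside [0, FIELD_WIDTH]")
    return problems

test = pd.DataFrame({"game_id": [1, 1], "play_id": [7, 7], "nfl_id": [10, 20], "frame_id": [1, 1]})
good = pd.DataFrame({"x": [52.0, 60.0], "y": [19.0, 30.0]})
bad = pd.DataFrame({"x": [52.0, 130.0, 1.0], "y": [19.0, float("nan"), 2.0]})

good_report = check_submission_frame(good, test)
bad_report = check_submission_frame(bad, test)
```

`good_report` is empty, while `bad_report` lists the row-count mismatch, the NaN value and the out-of-range `x`. Returning a list rather than raising makes it easy to log every problem from one local run.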

**Next**: See `08_ensemble_prediction.ipynb` for combining multiple models.