# NFL Big Data Bowl 2026 - Inference & Ensemble Guide

This notebook explains how to use the pretrained models for inference and create the winning ensemble.

**Note**: The complete inference code is in the actual Kaggle submission notebook:
- `/mnt/raid0/BigData2/ensemble_4model_SIMPLE.ipynb` (full implementation)

**Contents:**
1. Ensemble architecture overview
2. Complete inference implementation
3. Test-Time Augmentation (TTA)
4. Ensemble combination
5. Using the ensemble
6. Key insights from the ensemble
7. Pretrained model locations
8. How to adapt for your own use

## 1. Ensemble Architecture Overview

Our best submission (**0.541 Public LB**) uses a 4-model ensemble:

```python
# Ensemble weights (from ensemble_4model_SIMPLE.ipynb)
WEIGHTS = {
    'st_transformer_6l': 0.2517,    # 6-Layer ST Transformer
    'multiscale_cnn':    0.2517,    # Multiscale CNN + 2L Transformer
    'position_st':       0.2490,    # Position-Specific ST Models
    'gru_seed27':        0.2476     # GRU with Geometric Features
}

# Total: 1.0 (equal weighting approximately)
```

### Why This Ensemble Works:

1. **Architecture Diversity**:
   - Pure Transformer (ST 6L)
   - CNN + Transformer hybrid (Multiscale CNN)
   - RNN (GRU)
   - Position-specialized (Position ST)

2. **Feature Diversity**:
   - Global features (ST, CNN)
   - Position-specific features (Position ST)
   - Geometric features (GRU)

3. **Training Diversity**:
   - Different seeds
   - Different augmentations
   - Different window sizes

4. **Test-Time Augmentation**:
   - All models use horizontal flip TTA
   - Consistent +0.005-0.010 improvement

## 2. Complete Inference Implementation

### Where to Find the Code:

**Full Kaggle Submission Notebook**:
```
/mnt/raid0/BigData2/ensemble_4model_SIMPLE.ipynb
```

This notebook contains:
- ✅ Complete model loading for all 4 models
- ✅ Preprocessing and feature engineering
- ✅ Test-Time Augmentation implementation
- ✅ Ensemble weighting and combination
- ✅ Kaggle submission formatting

### Key Components:

#### Model 1: ST Transformer (6-Layer)
```python
# From notebook:
DATASET_DIR_ST = '/kaggle/input/6layer-seed700-flip-only'

# Load 20 folds
for fold in range(20):
    model = STTransformer(input_dim)
    model.load_state_dict(torch.load(f'model_fold{fold}.pt'))
    scaler = joblib.load(f'scaler_fold{fold}.pkl')
```

#### Model 2: Multiscale CNN
```python
# From notebook:
from pathlib import Path

MODELS_DIR_CNN = Path('/kaggle/input/st-multiscale-cnn-w10-20fold')

# Architecture outline (pseudocode, not runnable):
# MultiScaleCNN applies multi-scale dilated convolutions
#   conv1: kernel=3, dilation=1
#   conv2: kernel=3, dilation=2
#   conv3: kernel=3, dilation=3
# → Concatenate → 2-Layer Transformer
```
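
To make the multi-scale idea concrete, here is a minimal NumPy sketch of one 3-tap kernel applied at dilations 1, 2, and 3, with the outputs stacked along a new axis. It is an illustration only: `dilated_conv1d` and `multiscale_features` are hypothetical helpers, the kernel values are arbitrary, and the model in the notebook is a learned PyTorch module rather than this code.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Zero-padded ('same') dilated 1D cross-correlation along time.

    x: (T,) signal, kernel: (K,) with odd K. Returns an array of shape (T,).
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros(len(x))
    for j, w in enumerate(kernel):
        start = j * dilation  # tap j reads the signal shifted by j * dilation
        out += w * xp[start:start + len(x)]
    return out

def multiscale_features(x, kernel, dilations=(1, 2, 3)):
    """Apply one kernel at several dilations and stack: (len(dilations), T)."""
    return np.stack([dilated_conv1d(x, kernel, d) for d in dilations])

x = np.arange(8, dtype=float)
diff_kernel = np.array([-1.0, 0.0, 1.0])  # arbitrary example: centered difference
feats = multiscale_features(x, diff_kernel)
print(feats.shape)   # (3, 8)
print(feats[:, 3])   # [2. 4. 6.]: larger dilation spans a longer time window
```

With the same kernel size, a larger dilation looks further back and ahead in time, which is why stacking several dilations gives the downstream Transformer features at multiple time scales.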

#### Model 3: GRU (Seed 27)
```python
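# From notebook:
import pickle
from pathlib import Path

load_dir = Path('/kaggle/input/gru-w9-seed27-20fold')

# Load the route k-means and scaler used for the geometric features
# (assumes both files sit in load_dir)
with open(load_dir / 'route_kmeans.pkl', 'rb') as f:
    route_kmeans = pickle.load(f)
with open(load_dir / 'route_scaler.pkl', 'rb') as f:
    route_scaler = pickle.load(f)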

# 20-fold ensemble
for fold in range(20):
    model = JointSeqModel(input_dim, horizon=94, hidden_dim=64)
```

#### Model 4: Position-Specific ST
```python
# From notebook:
positions = {
    'wr': ['WR'],
    'te': ['TE'],
    'ball_carriers': ['QB', 'RB', 'FB']
}

# Load separate model for each position (5-fold each)
for position_name in positions:
    for fold in range(1, 6):
        model = STTransformer(input_dim)
        model.load_state_dict(torch.load(f'{position_name}/fold{fold}/model.pt'))
```
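
One way to route a player to the right position model is to invert the `positions` mapping into a lookup table. This is a sketch, not the notebook's code: `POSITION_TO_GROUP` and `group_for` are hypothetical names, and returning `None` for positions outside the three groups is an assumption, since the excerpt above does not show how such players are handled.

```python
positions = {
    'wr': ['WR'],
    'te': ['TE'],
    'ball_carriers': ['QB', 'RB', 'FB'],
}

# Invert group -> positions into position -> group for constant-time lookup.
POSITION_TO_GROUP = {
    pos: group for group, members in positions.items() for pos in members
}

def group_for(player_position, default=None):
    """Return the position-model group for a roster position, or `default`."""
    return POSITION_TO_GROUP.get(player_position, default)

print(group_for('RB'))  # ball_carriers
print(group_for('CB'))  # None (no dedicated group in this sketch)
```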

## 3. Test-Time Augmentation (TTA)

### Implementation:

```python
def horizontal_flip_dataframe(df):
    """Flip play horizontally for TTA"""
    df = df.copy()
    field_width = 53.3
    
    # Flip y-coordinate
    df['y'] = field_width - df['y']
    
    # Negate the y-components of velocity and acceleration
    for col in ['velocity_y', 'acceleration_y']:
        if col in df.columns:
            df[col] = -df[col]
    
    # Flip direction angles
    if 'dir' in df.columns:
        df['dir'] = (180 - df['dir']) % 360
    if 'o' in df.columns:
        df['o'] = (180 - df['o']) % 360
    
    return df

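def unflip_predictions(predictions):
    """Map predictions made on flipped input back to the original frame.

    Assumes a 3-D array whose last axis holds (dx, dy) displacements, so
    mirroring y is a sign change; absolute y would need field_width - y.
    """
    pred_copy = predictions.copy()
    pred_copy[:, :, 1] = -pred_copy[:, :, 1]  # Negate dy
    return pred_copy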

# Usage:
# 1. Original predictions
pred_original = model.predict(test_input)

# 2. Flipped predictions
test_input_flip = horizontal_flip_dataframe(test_input)
pred_flipped = model.predict(test_input_flip)
pred_flipped = unflip_predictions(pred_flipped)

# 3. Average both
pred_final = (pred_original + pred_flipped) / 2.0
```
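
The angle rule `(180 - angle) % 360` can be checked numerically. The sketch below assumes the tracking-data convention in which `dir` and `o` are measured in degrees clockwise from the positive y-axis (so 90° points along +x). Under that assumption, mirroring y negates the y-component of the heading, which should agree with the formula. The helper names are illustrative.

```python
import numpy as np

def heading_to_vector(angle_deg):
    """Unit vector (vx, vy) for an angle measured clockwise from +y."""
    rad = np.deg2rad(angle_deg)
    return np.sin(rad), np.cos(rad)

def vector_to_heading(vx, vy):
    """Inverse of heading_to_vector, returned in [0, 360)."""
    return np.rad2deg(np.arctan2(vx, vy)) % 360

def flip_heading_via_vector(angle_deg):
    """Mirror the heading by negating its y-component."""
    vx, vy = heading_to_vector(angle_deg)
    return vector_to_heading(vx, -vy)

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180) % 360 - 180)

for angle in [0.0, 30.0, 90.0, 200.0, 315.0]:
    via_formula = (180 - angle) % 360
    via_vector = flip_heading_via_vector(angle)
    assert angular_difference(via_formula, via_vector) < 1e-9
    # Flipping twice returns the original angle.
    assert angular_difference((180 - via_formula) % 360, angle) < 1e-9
```

If the angles in your data follow a different convention (for example counterclockwise from +x), the mirrored angle would be `(-angle) % 360` instead, so it is worth running a check like this before trusting TTA results.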

**Impact**: TTA improves the score by a consistent 0.005-0.010 on all models.

## 4. Ensemble Combination

### From `ensemble_4model_SIMPLE.ipynb`:

```python
def predict(test, test_input):
    """
    Main prediction function for ensemble
    
    Args:
        test: polars DataFrame with output template
        test_input: polars DataFrame with input tracking data
    
    Returns:
        pd.DataFrame with columns ['x', 'y']
    """
    print('Model 1: ST Transformer...')
    p1 = predict_model1_st(test, test_input)
    
    print('Model 2: CNN...')
    p2 = predict_model2_cnn(test, test_input)
    
    print('Model 3: GRU...')
    p3 = predict_model3_gru(test, test_input)
    
    print('Model 4: Position...')
    p4 = predict_model4_position(test, test_input)
    
    print('Ensemble 4 models...')
    # Keys match the WEIGHTS dict defined in Section 1
    return pd.DataFrame({
        'x': (WEIGHTS['st_transformer_6l'] * p1['x'].values +
              WEIGHTS['multiscale_cnn'] * p2['x'].values +
              WEIGHTS['gru_seed27'] * p3['x'].values +
              WEIGHTS['position_st'] * p4['x'].values),
        'y': (WEIGHTS['st_transformer_6l'] * p1['y'].values +
              WEIGHTS['multiscale_cnn'] * p2['y'].values +
              WEIGHTS['gru_seed27'] * p3['y'].values +
              WEIGHTS['position_st'] * p4['y'].values)
    })
```

### Each Model's Prediction Includes:
1. Load all folds (20 folds for most models)
2. Preprocess input data
3. Make predictions with original data
4. Make predictions with flipped data (TTA)
5. Unflip the flipped predictions and average them with the original predictions
6. Average the predictions across all folds
7. Return the fold-averaged prediction for that model
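
The steps above can be sketched with toy stand-ins for the trained folds. This is a simplified illustration under stated assumptions: inputs and outputs are arrays whose last axis holds `(dx, dy)` displacements, so a flip is a sign change on index 1, and `make_toy_model` is a hypothetical placeholder for a loaded fold model.

```python
import numpy as np

def flip_lateral(arr):
    """Negate the lateral component (last-axis index 1) of a (..., 2) array."""
    out = arr.copy()
    out[..., 1] = -out[..., 1]
    return out

def predict_folds_with_tta(fold_models, x):
    """Average original and flip-TTA predictions per fold, then across folds."""
    per_fold = []
    for model in fold_models:
        pred = model(x)
        # Predict on the flipped input, then map the output back (unflip).
        pred_flip = flip_lateral(model(flip_lateral(x)))
        per_fold.append((pred + pred_flip) / 2.0)
    return np.mean(per_fold, axis=0)

def make_toy_model(bias_x, bias_y):
    """Stand-in for a trained fold: echoes the input plus a constant bias."""
    bias = np.array([bias_x, bias_y])
    return lambda x: x + bias

x = np.array([[[1.0, 2.0], [3.0, -1.0]]])  # (1 play, 2 frames, 2 coords)
folds = [make_toy_model(0.2, 0.5), make_toy_model(0.4, -0.3)]
result = predict_folds_with_tta(folds, x)
print(result)
```

In this toy setup the constant lateral bias of each fold cancels under flip averaging, while the longitudinal bias is only averaged across folds. Real models are not this simple, but it shows where TTA can help.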

## 5. Using the Ensemble

### For Local Testing:

```python
# Copy the code from ensemble_4model_SIMPLE.ipynb
# Update the paths to your pretrained models

import polars as pl

# Load test data
test = pl.read_csv('test_output_template.csv')
test_input = pl.read_csv('test_input.csv')

# Make predictions
predictions = predict(test, test_input)

# Save submission
predictions.to_csv('submission.csv', index=False)
```

### For Kaggle Submission:

```python
# Use the inference server (from notebook)
import os

import kaggle_evaluation.nfl_inference_server

inference_server = kaggle_evaluation.nfl_inference_server.NFLInferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        ('/kaggle/input/nfl-big-data-bowl-2026-prediction/',)
    )
```

## 6. Key Insights from the Ensemble

### What Makes It Work:

1. **Diversity is Key**:
   - Different architectures (Transformer, CNN, RNN)
   - Different feature sets (global, position-specific, geometric)
   - Different training strategies (seeds, folds, augmentations)

2. **Test-Time Augmentation**:
   - Applied to ALL models
   - Horizontal flip is most effective
   - Consistent improvement across architectures

3. **Cross-Validation Averaging**:
   - 20-fold CV for most models
   - Reduces overfitting
   - More stable predictions

4. **Weighted Combination**:
   - Weights based on inverse Public LB scores
   - Nearly equal weights (0.2476 - 0.2517)
   - Suggests all models contribute equally
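
As a rough check of the weighting scheme, inverse-score weights can be recomputed from the individual scores listed in the Performance Breakdown table. This is a sketch of one plausible formula (normalized `1 / score`), not necessarily the exact computation in the notebook; `inverse_score_weights` is an illustrative name.

```python
scores = {
    'st_transformer_6l': 0.547,
    'multiscale_cnn': 0.548,
    'position_st': 0.553,
    'gru_seed27': 0.557,
}

def inverse_score_weights(scores):
    """Weight each model by 1/score (lower is better), normalized to sum to 1."""
    inverse = {name: 1.0 / s for name, s in scores.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}

weights = inverse_score_weights(scores)
for name, w in weights.items():
    print(f'{name}: {w:.4f}')
```

This gives roughly 0.2519, 0.2515, 0.2492, and 0.2474, within about 0.0002 of the weights used in the ensemble. The small gap may come from rounding in the reported scores or from a slightly different formula in the notebook.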

### Performance Breakdown:

| Model | Individual Score | Weight | Contribution |
|-------|-----------------|--------|-------------|
| ST Transformer 6L | 0.547 | 0.2517 | 25.17% |
| Multiscale CNN | 0.548 | 0.2517 | 25.17% |
| Position ST | 0.553 | 0.2490 | 24.90% |
| GRU Seed27 | 0.557 | 0.2476 | 24.76% |
| **Ensemble** | **0.541** | **1.0** | **100%** |

**Improvement**: 0.541 (ensemble) vs 0.547 (best single model), so ensembling lowers the score by 0.006 (lower is better).

## 7. Pretrained Model Locations

### Kaggle Datasets (for submissions):

```python
KAGGLE_DATASETS = {
    'st_6l': '/kaggle/input/6layer-seed700-flip-only',
    'cnn': '/kaggle/input/st-multiscale-cnn-w10-20fold',
    'gru': '/kaggle/input/gru-w9-seed27-20fold',
    'position': '/kaggle/input/nfl-bdb-2026-position-st-combined'
}
```

### Local Models:

```python
LOCAL_MODELS = {
    'st_6l': '/mnt/raid0/BigData2/kaggle_submission/6l_no_bad_play',
    'cnn': '/mnt/raid0/BigData2/models/4L_CNN_Transformer_NO_BAD_PLAY_20fold_FLIP_SPEED',
    'gru': '/mnt/raid0/BigData2/models/gru_w9_h64_flip_speed_seed27_20fold',
    'position': '/mnt/raid0/BigData2/kaggle_submission/position_st_models'
}
```

## 8. How to Adapt for Your Own Use

### Step 1: Copy the Notebook
```bash
cp /mnt/raid0/BigData2/ensemble_4model_SIMPLE.ipynb ./my_ensemble.ipynb
```

### Step 2: Update Paths
Replace Kaggle dataset paths with your local paths:

```python
# Before (Kaggle)
DATASET_DIR_ST = '/kaggle/input/6layer-seed700-flip-only'

# After (Local)
DATASET_DIR_ST = '../pretrained/6layer_st_transformer_20fold'
```
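
Instead of editing each path by hand, one option is a small helper that picks the first directory that exists, so the same notebook runs on Kaggle and locally. This is a hypothetical convenience, not part of the original notebook; `first_existing_dir` is an illustrative name.

```python
from pathlib import Path

def first_existing_dir(*candidates):
    """Return the first candidate path that is an existing directory."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    raise FileNotFoundError(f'None of these directories exist: {candidates}')

# Example usage (uncomment once at least one path exists on your machine):
# DATASET_DIR_ST = first_existing_dir(
#     '/kaggle/input/6layer-seed700-flip-only',
#     '../pretrained/6layer_st_transformer_20fold',
# )
```

Failing loudly when no candidate exists is deliberate: a wrong path otherwise tends to surface much later as a confusing error inside `torch.load`.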

### Step 3: Test Individual Models
Test each model separately before combining:

```python
# Test ST model only
p1 = predict_model1_st(test, test_input)
print(f"ST predictions shape: {p1.shape}")

# Test GRU model only
p3 = predict_model3_gru(test, test_input)
print(f"GRU predictions shape: {p3.shape}")
```
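
Beyond printing shapes, a few cheap checks can catch malformed outputs before blending. The sketch below assumes each model returns a pandas DataFrame with `x` and `y` columns and one row per row of the output template; `check_predictions` is a hypothetical helper, and the toy frame stands in for a real model output.

```python
import numpy as np
import pandas as pd

def check_predictions(pred, expected_rows, name='model'):
    """Raise ValueError if a prediction frame looks malformed."""
    missing = {'x', 'y'} - set(pred.columns)
    if missing:
        raise ValueError(f'{name}: missing columns {sorted(missing)}')
    if len(pred) != expected_rows:
        raise ValueError(f'{name}: expected {expected_rows} rows, got {len(pred)}')
    values = pred[['x', 'y']].to_numpy(dtype=float)
    if not np.isfinite(values).all():
        raise ValueError(f'{name}: predictions contain NaN or inf')

# Toy frame standing in for the output of predict_model1_st(test, test_input)
toy = pd.DataFrame({'x': [10.0, 11.5], 'y': [20.0, 21.0]})
check_predictions(toy, expected_rows=2, name='ST')
```

In the real pipeline, `expected_rows` would presumably be `len(test)`, the number of rows in the output template.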

### Step 4: Combine Models
Start with 2 models, then add more:

```python
# 2-model ensemble first (sum the weighted predictions)
ensemble_2 = 0.5 * p1['x'] + 0.5 * p3['x']

# Then add more
ensemble_4 = (
    0.25 * p1['x'] + 0.25 * p2['x'] + 
    0.25 * p3['x'] + 0.25 * p4['x']
)
```
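
When experimenting with subsets of models, a small helper that renormalizes the weights avoids hand-editing coefficients. This is a sketch under assumptions: `blend` is a hypothetical function, each prediction is a DataFrame with `x` and `y` columns in the same row order, and renormalizing over the supplied models is a design choice rather than something the notebook is known to do.

```python
import pandas as pd

def blend(predictions, weights):
    """Weighted average of prediction frames with 'x' and 'y' columns.

    predictions: dict name -> DataFrame; weights: dict name -> float.
    Weights are renormalized over the models supplied, so any subset sums to 1.
    """
    total = sum(weights[name] for name in predictions)
    blended = sum(
        (weights[name] / total) * frame[['x', 'y']].to_numpy(dtype=float)
        for name, frame in predictions.items()
    )
    return pd.DataFrame(blended, columns=['x', 'y'])

# Toy frames standing in for two models' outputs
p_a = pd.DataFrame({'x': [10.0, 20.0], 'y': [5.0, 6.0]})
p_b = pd.DataFrame({'x': [12.0, 22.0], 'y': [7.0, 8.0]})
out = blend({'a': p_a, 'b': p_b}, {'a': 0.25, 'b': 0.25, 'c': 0.5})
print(out)  # x: 11.0, 21.0; y: 6.0, 7.0
```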

## Summary

In this guide, we covered:

1. ✅ Ensemble architecture (4 models, weighted averaging)
2. ✅ Complete inference implementation (in `ensemble_4model_SIMPLE.ipynb`)
3. ✅ Test-Time Augmentation (horizontal flip)
4. ✅ Model combination strategy (weighted averaging)
5. ✅ Pretrained model locations
6. ✅ How to adapt for your own use

**Key Takeaways**:
- Ensemble of 4 diverse models achieves **0.541 Public LB** (best)
- TTA provides consistent +0.005-0.010 improvement
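- Nearly equal weights reflect the models' similar individual scores; the gain over the best single model suggests their errors are complementary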
- Complete code available in `ensemble_4model_SIMPLE.ipynb`

**Next Steps**:
- Review the full notebook: `/mnt/raid0/BigData2/ensemble_4model_SIMPLE.ipynb`
- Adapt for local inference with your pretrained models
- Experiment with different model combinations
- Try different weighting strategies