This project predicts NBA stadium attendance by modeling game intensity and combining it with contextual data like weather. A two-stage PyTorch pipeline first estimates point differential as a proxy for excitement, then predicts attendance to support scheduling, promotion, and dynamic pricing decisions for the NBA.
- Two-stage modeling pipeline: Predicts the game score margin and uses it to forecast attendance.
- Data integration: Combines box score, weather, and game metadata from the 2022–23 through 2024–25 regular seasons.
- Model serving: FastAPI-based inference with ONNX optimization.
- Online evaluation: Data drift detection and synthetic data testing for real-time robustness.
NBA game schedules and TV allocations are often fixed and rarely optimized for fan interest. This project introduces an ML-based tool that forecasts attendance using game intensity signals and contextual factors, allowing the NBA to identify underperforming matchups and optimize resource allocation, promotions, and ticket pricing.
- Model 1: Predicts point differential using rolling averages of team statistics.
- Model 2: Uses point differential and weather data to predict game attendance.
- Serving: FastAPI endpoint for real-time inference; ONNX conversion for performance.
- Evaluation: MLflow for offline tracking, Alibi Detect for online data drift detection.
Component | Metric | Value |
---|---|---|
Model 1 | RMSE / R² | Tracked via MLflow |
Model 2 | RMSE / R² | Tracked via MLflow |
Online Eval | Response time | ~1.2s avg |
Drift Detection | Alibi Detect + Gaussian noise | Detected expected changes |
Final Output | Attendance MAE | Evaluated on 2024–25 games |
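
The metrics named in the table can be computed as in the following minimal sketch. It uses only the standard definitions of MAE, RMSE, and R²; the function names and the sample attendance figures are illustrative assumptions, not project code or project results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up attendance numbers, for illustration only.
actual = [18000, 19500, 17200]
predicted = [17800, 19000, 17900]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

MAE is reported in the same unit as attendance (fans per game), which makes it the easiest of the three to interpret for the final output.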
- Data Sources:
  - nba_api (box score, attendance)
  - WeatherAPI and Open-Meteo (temperature, wind, precipitation)
- Time Range: 2022–23 to 2024–25 regular seasons
- Preprocessing:
  - Rolling 5-game averages
  - Season-wise splits to prevent leakage
  - Weather joins by date and location
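
The rolling-average and season-split steps could look roughly like the sketch below. It is written in plain Python for clarity; the row layout (`team`, `date`, `season`, `pts`) and both function names are assumptions made for illustration. The sketch also assumes that each rolling average uses only games played before the current one, which is one common way to keep the target game out of its own features.

```python
from collections import defaultdict, deque


def add_rolling_features(games, stat="pts", window=5):
    """Attach the mean of the previous `window` games per team.

    `games` is a list of dicts with 'team', 'date' (ISO string), and `stat`.
    The current game is excluded from its own average; the first game of a
    team gets None because no history exists yet.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for game in sorted(games, key=lambda g: g["date"]):
        past = history[game["team"]]
        avg = sum(past) / len(past) if past else None
        out.append({**game, f"{stat}_roll{window}": avg})
        past.append(game[stat])  # update history only after computing the feature
    return out


def split_by_season(games, train_seasons, test_seasons):
    """Split on whole seasons so no test-season game appears in training."""
    train = [g for g in games if g["season"] in train_seasons]
    test = [g for g in games if g["season"] in test_seasons]
    return train, test


games = [
    {"team": "BOS", "date": "2024-01-01", "season": "2023-24", "pts": 100},
    {"team": "BOS", "date": "2024-01-03", "season": "2023-24", "pts": 110},
    {"team": "BOS", "date": "2024-10-25", "season": "2024-25", "pts": 120},
]
rows = add_rolling_features(games)
train, test = split_by_season(rows, {"2022-23", "2023-24"}, {"2024-25"})
```

Whether the rolling history should reset at the start of each season is a design choice this sketch leaves open; here it carries over.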
- Model 1 (Score Prediction):
  - PyTorch MLP
  - Input: Team stats (rolling averages)
  - Output: Point differential
- Model 2 (Attendance Prediction):
  - PyTorch MLP
  - Input: Output of Model 1 + weather data
  - Output: Predicted attendance
- Optimizations:
  - ONNX conversion
  - Graph and quantization optimizations for Model 2
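
A minimal sketch of how the two MLPs might be chained is shown below. The layer sizes, the number of team-stat features, and the helper names (`make_mlp`, `predict_attendance`) are assumptions for illustration; the project's actual architecture and hyperparameters are not specified here. The weights are untrained, so the outputs are meaningless apart from their shapes.

```python
import torch
from torch import nn


def make_mlp(in_features, hidden=32):
    """Small fully connected regressor with a single scalar output."""
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )


N_TEAM_STATS = 10  # assumed number of rolling-average features
N_WEATHER = 3      # temperature, wind, precipitation

score_model = make_mlp(N_TEAM_STATS)          # Model 1: point differential
attendance_model = make_mlp(1 + N_WEATHER)    # Model 2: attendance


def predict_attendance(team_stats, weather):
    """Run Model 1, then feed its output plus weather into Model 2."""
    score_model.eval()
    attendance_model.eval()
    with torch.no_grad():
        point_diff = score_model(team_stats)                  # (batch, 1)
        stage2_input = torch.cat([point_diff, weather], dim=1)
        attendance = attendance_model(stage2_input)           # (batch, 1)
    return point_diff.squeeze(1), attendance.squeeze(1)


diff, attendance = predict_attendance(
    torch.randn(4, N_TEAM_STATS), torch.randn(4, N_WEATHER)
)
```

Feeding the predicted rather than the actual point differential into Model 2 keeps training and inference consistent, since the real margin is unknown before tip-off.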
- Interface: FastAPI web app
- Inputs: Game date, home team, away team
- Flow:
  - Retrieve historical team stats and weather
  - Run Model 1 → predict score diff
  - Run Model 2 → predict attendance
- Backend Support:
  - .pth and ONNX model formats
  - Model staging, fallback logic
- Limitations:
  - Requires valid NBA team acronyms and dates
  - GPU-based performance tuning was partially constrained
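
The request flow and the input requirement listed above can be sketched in a framework-independent way as follows. Everything here is hypothetical: `validate_request`, `predict`, and the four injected callables stand in for whatever the FastAPI handler actually calls, and the date format is assumed to be ISO `YYYY-MM-DD`. The team set uses the standard three-letter NBA abbreviations.

```python
from datetime import date

NBA_TEAMS = frozenset({
    "ATL", "BOS", "BKN", "CHA", "CHI", "CLE", "DAL", "DEN", "DET", "GSW",
    "HOU", "IND", "LAC", "LAL", "MEM", "MIA", "MIL", "MIN", "NOP", "NYK",
    "OKC", "ORL", "PHI", "PHX", "POR", "SAC", "SAS", "TOR", "UTA", "WAS",
})


def validate_request(game_date, home_team, away_team):
    """Reject malformed dates and unknown team abbreviations early."""
    parsed = date.fromisoformat(game_date)  # raises ValueError if malformed
    home, away = home_team.upper(), away_team.upper()
    for team in (home, away):
        if team not in NBA_TEAMS:
            raise ValueError(f"Unknown team abbreviation: {team}")
    if home == away:
        raise ValueError("Home and away teams must differ")
    return parsed, home, away


def predict(game_date, home_team, away_team, *,
            get_features, get_weather, score_model, attendance_model):
    """Validate inputs, fetch features, then run Model 1 followed by Model 2."""
    parsed, home, away = validate_request(game_date, home_team, away_team)
    stats = get_features(parsed, home, away)    # historical team stats
    weather = get_weather(parsed, home)         # weather at the home arena
    point_diff = score_model(stats)             # Model 1
    attendance = attendance_model(point_diff, weather)  # Model 2
    return {"point_diff": point_diff, "attendance": attendance}
```

Passing the data loaders and models in as arguments makes it straightforward to swap a `.pth` model for its ONNX counterpart, or to substitute stubs in tests.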
- Synthetic Data:
  - Generated using Gaussian noise and future date shifts
- Drift Detection:
  - Implemented with Alibi Detect
  - Reports confidence scores and alerts
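
To illustrate the idea behind the synthetic data and the drift check, the sketch below perturbs a feature with Gaussian noise, shifts dates into the future, and compares distributions with a hand-written two-sample Kolmogorov-Smirnov statistic. This is a stand-in written with the standard library only; it does not use the Alibi Detect API, and the function names, noise scale, and significance level are all assumptions.

```python
import math
import random
from datetime import date, timedelta


def make_synthetic(rows, sigma, shift_days, rng):
    """rows: list of (game_date, feature_value) pairs.

    Returns copies with dates moved forward and Gaussian noise added.
    """
    return [
        (d + timedelta(days=shift_days), x + rng.gauss(0.0, sigma))
        for d, x in rows
    ]


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def drift_detected(reference, live, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the large-sample critical value.

    c_alpha = 1.358 corresponds to a significance level of about 0.05.
    """
    n, m = len(reference), len(live)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > threshold


rng = random.Random(0)
rows = [(date(2025, 1, 1) + timedelta(days=i), 60.0 + i % 7) for i in range(200)]
synthetic = make_synthetic(rows, sigma=5.0, shift_days=365, rng=rng)
reference = [x for _, x in rows]
live = [x for _, x in synthetic]
print("drift detected:", drift_detected(reference, live))
```

A production detector would typically test many features at once and correct for multiple comparisons, which is part of what a dedicated library handles.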
Component | Description |
---|---|
2 × m1.medium VMs | Training and serving environments |
A100 GPUs (x2) | Model training with DDP |
Persistent Storage | Object store for data and artifacts |
Floating IPs | Model endpoint + VM communication |
- Python, PyTorch, ONNX, FastAPI
- MLflow for experiment tracking
- Alibi Detect for online evaluation
- Trovi + Chameleon Cloud (KVM@TACC) for compute
Name | Focus Area |
---|---|
Will Calandra | Model training |
Lake Wang | Model serving, monitoring |
SungJoon Moon | Data pipeline |
All Members | Planning, integration, testing |
- Full implementation of offline evaluation and load testing
- CI/CD integration for Ray-based job management
- Enhanced dashboarding for drift and performance monitoring
- Real-time feature enrichment (e.g., betting markets, injuries)