NBA Stadium Attendance Prediction Using Game Intensity Modeling

This project predicts NBA stadium attendance by modeling game intensity and combining it with contextual data like weather. A two-stage PyTorch pipeline first estimates point differential as a proxy for excitement, then predicts attendance to support scheduling, promotion, and dynamic pricing decisions for the NBA.

🧠 Key Features

Two-stage modeling pipeline: Predict game score margin and use it to forecast attendance.
Data integration: Combines box score, weather, and game metadata from the 2022–23 through 2024–25 regular seasons.
Model serving: FastAPI-based inference with ONNX optimization.
Online evaluation: Data drift detection and synthetic data testing for real-time robustness.

🏀 Motivation

NBA game schedules and TV allocations are often fixed and rarely optimized for fan interest. This project introduces an ML-based tool that forecasts attendance using game intensity signals and contextual factors, allowing the NBA to identify underperforming matchups and optimize resource allocation, promotions, and ticket pricing.

🧪 System Overview

Model 1: Predicts point differential using rolling averages of team statistics.
Model 2: Uses point differential and weather data to predict game attendance.
Serving: FastAPI endpoint for real-time inference; ONNX conversion for performance.
Evaluation: MLflow for offline experiment tracking, Alibi Detect for online data drift detection.

📊 Evaluation Results

| Component | Metric | Value |
|---|---|---|
| Model 1 | RMSE / R² | Tracked via MLflow |
| Model 2 | RMSE / R² | Tracked via MLflow |
| Online Eval | Response time | ~1.2 s average |
| Drift Detection | Alibi Detect + Gaussian noise | Detected expected changes |
| Final Output | Attendance MAE | Evaluated on 2024–25 games |
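
For reference, the metrics named in the table (RMSE, R², MAE) can be computed as in the dependency-free sketch below. The attendance figures are made up for illustration and are not results from this project.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (fans)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values, for illustration only.
actual = [18000, 19000, 17000, 20000]
predicted = [18500, 18500, 17500, 19500]

print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```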

📦 Data Pipeline

Data Sources:
- nba_api (box score, attendance)
- WeatherAPI and Open-Meteo (temperature, wind, precipitation)
Time Range: 2022–23 to 2024–25 regular seasons
Preprocessing:
- Rolling 5-game averages
- Season-wise splits to prevent leakage
- Weather joins by date and location
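
The three preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual pipeline code: the field names (`pts`, `city`, `temp_c`), the function names, and the sample rows are all assumptions.

```python
from collections import defaultdict, deque

def add_rolling_features(games, window=5):
    """Attach each team's mean points over its previous `window` games.

    `games` must be sorted by date. The average is taken BEFORE the current
    game is added to the history, so a row never sees its own result.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for g in games:
        past = history[g["team"]]
        avg = sum(past) / len(past) if past else None  # None: no prior games
        rows.append({**g, "pts_roll": avg})
        past.append(g["pts"])
    return rows

def split_by_season(rows, test_season):
    """Hold out one season; train only on seasons that precede it."""
    train = [r for r in rows if r["season"] < test_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test

def join_weather(rows, weather):
    """Left-join weather readings keyed by (date, city)."""
    return [{**r, **weather.get((r["date"], r["city"]), {})} for r in rows]

# Hypothetical sample rows, for illustration only.
games = [
    {"date": "2023-01-01", "season": "2022-23", "team": "BOS", "city": "Boston", "pts": 110},
    {"date": "2023-01-03", "season": "2022-23", "team": "BOS", "city": "Boston", "pts": 120},
    {"date": "2024-11-01", "season": "2024-25", "team": "BOS", "city": "Boston", "pts": 100},
]
weather = {("2023-01-03", "Boston"): {"temp_c": -2.0}}

rows = join_weather(add_rolling_features(games), weather)
train, test = split_by_season(rows, "2024-25")
```

Splitting on the season label rather than at random keeps every game from the held-out season out of training, which is the leakage the season-wise split is meant to prevent.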

⚙️ Model Architecture

Model 1 (Score Prediction):
- PyTorch MLP
- Input: Team stats (rolling averages)
- Output: Point differential
Model 2 (Attendance Prediction):
- PyTorch MLP
- Input: Output of Model 1 + weather data
- Output: Predicted attendance
Optimizations:
- ONNX conversion
- Graph and quantization optimizations for Model 2
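
To make the two-stage wiring concrete, the sketch below chains two small PyTorch MLPs so that Model 1's output becomes an input feature of Model 2. Layer sizes, the feature counts, and the helper names are assumptions, and the weights are untrained, so the numbers it produces are meaningless; only the shapes and data flow are illustrated.

```python
import torch
from torch import nn

def make_mlp(in_features: int, hidden: int = 64) -> nn.Sequential:
    """Small regression MLP with a single scalar output."""
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )

N_TEAM_STATS = 20  # assumed count of rolling-average features (home + away)
N_WEATHER = 3      # temperature, wind, precipitation

score_model = make_mlp(N_TEAM_STATS)        # Model 1: point differential
attendance_model = make_mlp(1 + N_WEATHER)  # Model 2: attendance

def predict_attendance(team_stats: torch.Tensor, weather: torch.Tensor) -> torch.Tensor:
    """Run Model 1, then feed its output plus weather into Model 2."""
    score_model.eval()
    attendance_model.eval()
    with torch.no_grad():
        point_diff = score_model(team_stats)                 # (batch, 1)
        stage2_in = torch.cat([point_diff, weather], dim=1)  # (batch, 1 + N_WEATHER)
        return attendance_model(stage2_in)                   # (batch, 1)

batch = 4
out = predict_attendance(torch.randn(batch, N_TEAM_STATS), torch.randn(batch, N_WEATHER))
print(out.shape)  # torch.Size([4, 1])
```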

🚀 Serving & Inference

Interface: FastAPI web app
Inputs: Game date, home team, away team
Flow:
1. Retrieve historical team stats and weather
2. Run Model 1 → predict score diff
3. Run Model 2 → predict attendance
Backend Support:
- .pth and ONNX model formats
- Model staging, fallback logic
Limitations:
- Requires valid NBA team acronyms and dates
- GPU-based performance tuning was partially constrained
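
The request flow and fallback behavior described above might look roughly like the following framework-free sketch. It is not the project's FastAPI code: the function names, the partial team list, the backend ordering, and the stand-in model callables are all assumptions made for illustration.

```python
from datetime import date

# Partial set for illustration; a real service would list all 30 teams.
VALID_TEAMS = {"BOS", "LAL", "NYK", "GSW"}

def validate_request(game_date: str, home: str, away: str) -> date:
    """Reject unknown or identical team acronyms and malformed dates."""
    for team in (home, away):
        if team not in VALID_TEAMS:
            raise ValueError(f"unknown team acronym: {team!r}")
    if home == away:
        raise ValueError("home and away teams must differ")
    return date.fromisoformat(game_date)  # raises ValueError on bad dates

def predict(game_date, home, away, fetch_features, score_models, attendance_model):
    parsed = validate_request(game_date, home, away)
    # Step 1: retrieve historical team stats and weather.
    team_stats, weather = fetch_features(parsed, home, away)
    # Step 2: try each staged backend in order, falling back on failure.
    last_error = None
    for model in score_models:
        try:
            point_diff = model(team_stats)
            break
        except RuntimeError as exc:
            last_error = exc
    else:
        raise RuntimeError("all score-model backends failed") from last_error
    # Step 3: predict attendance from the score differential and weather.
    attendance = attendance_model(point_diff, weather)
    return {"point_diff": point_diff, "attendance": attendance}

# Stand-in callables; real ones would wrap an ONNX session or a .pth model.
def broken_backend(stats):
    raise RuntimeError("session unavailable")

result = predict(
    "2025-01-15", "BOS", "LAL",
    fetch_features=lambda d, h, a: ([1.0, 2.0], {"temp_c": 3.0}),
    score_models=[broken_backend, lambda stats: 4.5],
    attendance_model=lambda diff, wx: 18000,
)
```

Validating inputs before any feature lookup keeps the limitation on team acronyms and dates explicit: a bad request fails fast with a clear error instead of surfacing later as a missing-feature problem.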

🔍 Online Evaluation

Synthetic Data:
- Generated using Gaussian noise and future date shifts
Drift Detection:
- Implemented with Alibi Detect
- Reports confidence scores and alerts
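
As a rough illustration of the idea, the sketch below perturbs records with Gaussian noise, shifts their dates forward, and flags drift with a simple z-test on the batch mean. The z-test is a deliberately simplified stand-in and is not Alibi Detect's API; the function names, record layout, and threshold are assumptions.

```python
import random
import statistics
from datetime import date, timedelta

def make_synthetic(rows, noise_std, shift_days, seed=0):
    """Copy rows with Gaussian noise on features and dates moved forward."""
    rng = random.Random(seed)  # seeded so test batches are reproducible
    return [
        {
            "date": r["date"] + timedelta(days=shift_days),
            "features": [x + rng.gauss(0.0, noise_std) for x in r["features"]],
        }
        for r in rows
    ]

def mean_shift_drift(reference, current, threshold=3.0):
    """Flag drift when the batch mean moves more than `threshold`
    standard errors away from the reference mean."""
    ref_mean = statistics.fmean(reference)
    std_err = statistics.stdev(reference) / len(current) ** 0.5
    z = abs(statistics.fmean(current) - ref_mean) / std_err
    return {"is_drift": z > threshold, "z_score": z}

# Hypothetical single-feature reference data, for illustration only.
reference = [float(i % 10) for i in range(100)]
same = mean_shift_drift(reference, reference)
shifted = mean_shift_drift(reference, [x + 5.0 for x in reference])

future = make_synthetic(
    [{"date": date(2025, 1, 1), "features": [1.0]}], noise_std=0.0, shift_days=365
)
```

A dedicated detector would typically test each feature's full distribution rather than only its mean, but the control flow is the same: compare incoming batches against a fixed reference and raise an alert when a score crosses a threshold.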

🧱 Infrastructure

| Component | Description |
|---|---|
| 2 × `m1.medium` VMs | Training and serving environments |
| 2 × A100 GPUs | Model training with DDP |
| Persistent Storage | Object store for data and artifacts |
| Floating IPs | Model endpoint + VM communication |

🧰 Tech Stack

Python, PyTorch, ONNX, FastAPI
MLflow for experiment tracking
Alibi Detect for online evaluation
Trovi + Chameleon Cloud (KVM@TACC) for compute

👥 Contributors

| Name | Focus Area |
|---|---|
| Will Calandra | Model training |
| Lake Wang | Model serving, monitoring |
| SungJoon Moon | Data pipeline |
| All Members | Planning, integration, testing |

📌 Future Improvements

Full implementation of offline evaluation and load testing
CI/CD integration for Ray-based job management
Enhanced dashboarding for drift and performance monitoring
Real-time feature enrichment (e.g., betting markets, injuries)

Lake-Wang/MLops_System_NBA_Attendance