Relay is a baseball analytics app for querying, visualizing, and comparing cached Statcast pitch-level data.
The core idea is query-first baseball analysis: ask for a pitcher, pitch type, chart, table, or comparison in plain language, then open the full Pitch Explorer or Compare workbench when you want deeper control.
- Ingests Statcast pitch-level data into a local Parquet cache.
- Queries cached data through a FastAPI backend using DuckDB.
- Provides a React + TypeScript frontend with:
  - Ask Relay natural-language query entry
  - Pitch Explorer filters, tables, heatmaps, strike-zone views, and movement charts
  - Pitcher comparison workflow with period presets, movement diff, heatmaps, and pitch-type deltas
- Keeps natural-language handling deterministic for now by translating text into safe, structured skill calls. Relay does not generate raw SQL.
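The pitch-type deltas in the comparison workflow reduce to comparing each pitch type's share of a pitcher's pitches across two periods. A minimal sketch, assuming each period has already been narrowed to a list of Statcast `pitch_type` codes (such as `FF` for a four-seam fastball or `SL` for a slider); `usage_shares` and `usage_deltas` are hypothetical names, not Relay's actual service code:

```python
from collections import Counter


def usage_shares(pitch_types: list[str]) -> dict[str, float]:
    """Fraction of all pitches thrown that were each pitch type (hypothetical helper)."""
    counts = Counter(pitch_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pt: n / total for pt, n in counts.items()}


def usage_deltas(period_a: list[str], period_b: list[str]) -> dict[str, float]:
    """Change in usage share from period A to period B, keyed by pitch type.

    A pitch type missing from one period counts as 0% usage in that period.
    """
    a, b = usage_shares(period_a), usage_shares(period_b)
    return {pt: b.get(pt, 0.0) - a.get(pt, 0.0) for pt in sorted(a.keys() | b.keys())}


early = ["FF", "FF", "SL", "CH"]
late = ["FF", "SL", "SL", "ST"]
print(usage_deltas(early, late))
# {'CH': -0.25, 'FF': -0.25, 'SL': 0.25, 'ST': 0.25}
```

Treating a missing pitch type as 0% usage means a newly added or dropped pitch still shows up as a delta instead of silently disappearing from the comparison.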
Repository layout:

```text
backend/
  app/
    api/         FastAPI route modules
    db/          DuckDB/Parquet cache helpers
    services/    pitch search, comparison, and query-parsing logic
    main.py      FastAPI app entrypoint
  scripts/       Statcast ingestion scripts and provider layer
  tests/         backend unit tests
frontend/
  src/
    components/  chart and reusable UI components
    views/       Pitch Explorer and Compare workbench views
    App.tsx      app shell, Ask Relay flow, shared state
data/            local Statcast cache and manifest, ignored by git
docs/            project documentation
```

Start the backend from the repo root:
```powershell
cd backend
python -m venv ..\.venv
..\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
uvicorn app.main:app --reload
```

The API runs at http://127.0.0.1:8000.
Useful checks:

- Health: http://127.0.0.1:8000/health
- OpenAPI docs: http://127.0.0.1:8000/docs
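For a scripted version of the health check above (for example, in a setup script), the request can be made with the Python standard library. A minimal sketch: `check_health` is a hypothetical helper, and it assumes only that `/health` answers a plain HTTP GET, not any particular response body.

```python
import urllib.error
import urllib.request


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> tuple[int, str]:
    """GET {base_url}/health and return (HTTP status code, response body).

    Connection failures (for example, the backend is not running) propagate
    as urllib.error.URLError so the caller can report them.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and body worth reporting.
        return err.code, err.read().decode("utf-8", errors="replace")
```

With `uvicorn` running, `check_health()` returns the status code and raw body; if nothing is listening on the port, `urllib.error.URLError` is raised instead.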
In another terminal, start the frontend:

```powershell
cd frontend
npm install
npm run dev
```

The Vite dev server usually runs at http://localhost:5173.
The frontend reads VITE_API_URL; if unset, it defaults to http://localhost:8000.
Relay needs a local Statcast cache before pitch searches or comparisons are useful.
Example batch ingestion:
```powershell
cd backend
..\.venv\Scripts\Activate.ps1
python scripts\ingest_statcast_batch.py `
  --start-date 2024-04-01 `
  --end-date 2026-05-21 `
  --pitcher-name "Aaron Nola" `
  --pitcher-name "Tarik Skubal" `
  --pitcher-name "Paul Skenes" `
  --pitcher-name "Nolan McLean" `
  --pitcher-name "Kyle Bradish" `
  --pitcher-name "Cade Povich" `
  --output ..\data\statcast.parquet `
  --manifest ..\data\statcast_manifest.json `
  --replace
```

By default, ingestion keeps regular-season games only. Use `--include-spring-training` or `--all-game-types` only when you explicitly want those rows.
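The default game-type policy can be pictured as a row filter on Statcast's `game_type` column, where `R` marks regular-season games and `S` marks spring training. A minimal sketch, assuming rows arrive as dictionaries; `filter_game_types` is a hypothetical function whose parameters mirror the `--include-spring-training` and `--all-game-types` flags, not the ingestion script's real code:

```python
from collections.abc import Iterable, Iterator


def filter_game_types(
    rows: Iterable[dict],
    include_spring_training: bool = False,
    all_game_types: bool = False,
) -> Iterator[dict]:
    """Yield only the rows allowed by the (hypothetical) game-type flags.

    Default: regular season ("R") only.
    include_spring_training: also keep spring training ("S").
    all_game_types: keep every row, whatever its game_type.
    """
    allowed = {"R"}
    if include_spring_training:
        allowed.add("S")
    for row in rows:
        if all_game_types or row.get("game_type") in allowed:
            yield row
```

Keeping regular season as the default means a query over a date range that spans March does not quietly mix spring-training pitches into season numbers.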
Backend tests (from the repo root):

```powershell
.\.venv\Scripts\python.exe -m unittest discover backend\tests
```

Frontend build:

```powershell
cd frontend
npm run build
```

Rebuild the cache manifest without fetching new data:

```powershell
cd backend
python scripts\ingest_statcast_batch.py `
  --start-date 2024-04-01 `
  --end-date 2026-05-21 `
  --output ..\data\statcast.parquet `
  --manifest ..\data\statcast_manifest.json `
  --index-only
```

Project documentation:

- docs/development.md: local setup, commands, and environment variables
- docs/data-ingestion.md: Statcast cache, manifest, game types, and ingestion scripts
- docs/api.md: backend endpoint overview and example requests
- docs/natural-language.md: Ask Relay skill-call contract and supported phrasing
- docs/frontend.md: frontend views, charts, and query-first UX
- docs/architecture.md: system architecture and future direction
Notes:

- `data/statcast.parquet` and `data/statcast_manifest.json` are local cache artifacts, not source-controlled app code.
- MLBAM pitcher ID is the canonical identity; names are display and search labels.
- The current natural-language layer is rule-based by design. A model-backed parser can be added later as long as it emits the same validated skill-call shape.
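A rule-based parser in this style can be sketched as deterministic pattern matching that resolves a name to an MLBAM ID and emits a small skill call, which is then validated. Everything here is hypothetical: the `pitch_search` skill name, the `SkillCall` fields, the pitch-type aliases, and the name table (its IDs are placeholders, not real MLBAM IDs). The actual contract is described in docs/natural-language.md.

```python
import re
from dataclasses import dataclass

# Placeholder lookup table; a real resolver would read MLBAM IDs from the
# cache. These IDs are made up for illustration only.
PITCHER_IDS = {"aaron nola": 100001, "tarik skubal": 100002}

# Hypothetical phrase -> Statcast pitch_type code aliases.
PITCH_ALIASES = {
    "slider": "SL", "sliders": "SL",
    "changeup": "CH", "changeups": "CH",
    "fastball": "FF", "fastballs": "FF", "four-seamers": "FF",
}

ALLOWED_SKILLS = {"pitch_search"}


@dataclass(frozen=True)
class SkillCall:
    skill: str
    pitcher_id: int  # MLBAM ID is the identity; names are only labels
    pitch_types: tuple[str, ...] = ()


def validate(call: SkillCall) -> SkillCall:
    """Reject anything outside the contract, whichever parser produced it."""
    if call.skill not in ALLOWED_SKILLS:
        raise ValueError(f"unknown skill: {call.skill!r}")
    if call.pitcher_id not in PITCHER_IDS.values():
        raise ValueError(f"unknown pitcher id: {call.pitcher_id}")
    bad = set(call.pitch_types) - set(PITCH_ALIASES.values())
    if bad:
        raise ValueError(f"unknown pitch types: {sorted(bad)}")
    return call


def parse(text: str) -> SkillCall:
    """Deterministic rule-based parse: the same text always yields the same call."""
    lowered = text.lower()
    pitcher_id = next((pid for name, pid in PITCHER_IDS.items() if name in lowered), None)
    if pitcher_id is None:
        raise ValueError("no known pitcher in query")
    words = re.findall(r"[a-z-]+", lowered)
    pitch_types = tuple(sorted({PITCH_ALIASES[w] for w in words if w in PITCH_ALIASES}))
    return validate(SkillCall("pitch_search", pitcher_id, pitch_types))


print(parse("Show Aaron Nola sliders and changeups"))
# SkillCall(skill='pitch_search', pitcher_id=100001, pitch_types=('CH', 'SL'))
```

Because `validate` checks the `SkillCall` shape rather than the input text, a model-backed parser could reuse it unchanged: whatever it emits must pass the same checks before any query runs.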