Project Pony

Project Pony is a Codex-first template for Project Thoroughbred, a US-only horse-racing prediction and ticket-generation system.

The only authoritative racing data source is The Racing API. Future work must use The Racing API MCP tools for data-contract discovery and must filter all racing data to United States records only.

Quick Start

make bootstrap
make test
make acceptance-smoke
make live-smoke
make factor-audit

PYTHONPATH=src python3 -m racing_app.cli check-config

Dependencies must be installed only inside this repository under .local/:

python3 -m venv .local/venv
.local/venv/bin/python -m pip install --upgrade pip
.local/venv/bin/pip install -e ".[dev]"

Current Scope

Root AGENTS.md is the source of truth for Codex behavior.
.agents/skills/ contains reusable Codex workflows for data gathering, US validation, point-in-time features, leakage-safe training, ELO, prediction, settlement, tuning, tickets, monitoring, and quality gates.
Guardrail tests cover the currently enforceable US-only, leakage, ELO chronology, tuning eligibility, future-race, and ticket-input invariants.
The src/racing_app/ guardrail package now covers USA endpoint scope, raw-before-transform storage, 364-day backfill windows, leakage checks, ELO chronology, WOTP cap behavior, scratch field versions, Kelly caps, and ticket-input separation.
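As an illustration of the 364-day backfill window guardrail, the sketch below splits a date range into consecutive windows of at most 364 calendar days each. The function name backfill_windows and the inclusive-end convention are assumptions made for illustration; they are not taken from src/racing_app/.

```python
from datetime import date, timedelta
from typing import List, Tuple


def backfill_windows(start: date, end: date, max_days: int = 364) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive inclusive windows of at most max_days days.

    Hypothetical helper: the real guardrail may count window length differently.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    windows = []
    cursor = start
    while cursor <= end:
        # Inclusive window: cursor plus (max_days - 1) more days.
        window_end = min(end, cursor + timedelta(days=max_days - 1))
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows
```

Under these assumptions, a full non-leap calendar year (365 days) splits into one 364-day window and one single-day window.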
Live HTTP clients pace calls through RACING_APP_API_REQUESTS_PER_SECOND with a conservative default of 4.0, below The Racing API's published default of 5 requests/second, and retry 429 responses with Retry-After or exponential backoff.
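A minimal sketch of the pacing and 429 retry behavior described above. Only the RACING_APP_API_REQUESTS_PER_SECOND variable and the 4.0 default come from this README; the helper names, the base and cap values, and the decision to handle only the numeric form of Retry-After are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_REQUESTS_PER_SECOND = 4.0  # conservative default stated in this README


def min_interval_seconds(env: Mapping[str, str] = os.environ) -> float:
    """Minimum spacing between calls implied by the configured request rate."""
    rate = float(env.get("RACING_APP_API_REQUESTS_PER_SECOND", DEFAULT_REQUESTS_PER_SECOND))
    if rate <= 0:
        raise ValueError("requests per second must be positive")
    return 1.0 / rate


def retry_delay_seconds(attempt: int, retry_after: Optional[str] = None,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retrying a 429 response.

    Honors a numeric Retry-After header when present; otherwise uses capped
    exponential backoff (base * 2**attempt). base and cap are assumed values.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back to backoff
    return min(cap, base * (2 ** attempt))
```

A client loop would sleep for min_interval_seconds() between calls and for retry_delay_seconds(attempt, header_value) after each 429 response.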
USA North America entries can be normalized and upserted into the local SQLite schema for fixture-tested race, runner, scratch, pool, change, and weather fields.
USA North America results can be normalized and upserted into local result, payoff, wager-type, fraction, scratch, and weather tables for fixture-tested settlement inputs.
USA entry refresh now also attempts official Pro racecard owner enrichment for the refreshed dates, then probes the corrected single-race Pro route for compatible rac_... IDs that still lack owner context. Timestamped owner context is stored in entry_runner_owner_snapshots without overwriting raw North America entry rows.
Owner-dependent model features use a whole-active-field completeness gate: if any active runner lacks owner context, owner recent rate, trainer-owner combo, owner switch, and owner ELO are disabled for that race, while combined ELO is recomputed without the owner component. When a missing-owner runner scratches, the gate re-evaluates on the remaining active field.
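The whole-active-field completeness gate can be sketched as follows. The Runner dataclass and the owner_features_enabled function are hypothetical names used only for illustration; the sketch assumes that a scratched runner is simply excluded from the active field.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Runner:
    name: str
    owner: Optional[str]    # None means no owner context was captured
    scratched: bool = False


def owner_features_enabled(field: Iterable[Runner]) -> bool:
    """True only when every active (non-scratched) runner has owner context."""
    active = [runner for runner in field if not runner.scratched]
    return bool(active) and all(runner.owner for runner in active)
```

When the gate returns False, the owner-dependent features listed above would be disabled for the whole race, not only for the runner that lacks owner context.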
Live .env checks on 2026-04-28 used The Racing API North America endpoints and stored USA-only local data without printing credentials. After the user-requested clean reset, the active local database contains 221 USA meets, 722 entry races, 6,243 entry runners, 124 result meets, 1,052 result races, 3,158 result runners, 7 historical result races, 261 immutable pending prediction records across 67 future race slots, 253 current recommended tickets, 825 invalidated ticket recommendations, 22,955 entity aliases, 216 entity ID maps, 2,971 raw payloads, 5,995 API call log rows, and 0 non-USA meets. Generated databases and raw payloads remain ignored local artifacts.
Entry/result ingestion now also stores raw-first entity snapshots for horses, jockeys, trainers, owners, sires, dams, and damsires when those fields appear in USA The Racing API payloads. make backfill-entity-snapshots rehydrates those dimensions from stored raw payloads without additional API calls; entity counts are ingestion coverage, not a substitute for leakage-safe profile/history reconciliation.
WOTP snapshots are built only from The Racing API fields already present in entries payloads: morning line, live odds, fractional odds, dollar odds, and runner pool values.
Prediction and ticket records now persist locally as separate immutable prediction rows and recommendation rows. Ticket requests and recommendations also persist prediction, model, field-version, and race lineage, so recommendation outputs can be traced back to the immutable prediction inputs that generated them. The active database has been backfilled so that existing ticket requests and recommendations no longer have empty lineage arrays.
The local CLI can list stored entries, create a baseline immutable prediction, and generate recommendation-only win tickets from persisted prediction rows. Example path:

.local/venv/bin/python scripts/run_cli.py list-races --date 2026-04-28 --track "Churchill Downs"
.local/venv/bin/python scripts/run_cli.py predict-race --race-id <race_id>
.local/venv/bin/python scripts/run_cli.py recommend-win-tickets --prediction-id <prediction_id> --bankroll 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py recommend-tickets --prediction-id <prediction_id> --bankroll 100 --bet-types Win,Exacta --manual-planning-override
.local/venv/bin/python scripts/run_cli.py recommend-multirace-tickets --prediction-ids <prediction_id_1>,<prediction_id_2> --bet-type "Daily Double" --bankroll 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py settle-prediction --prediction-id <prediction_id>
.local/venv/bin/python scripts/backfill_api_call_log.py
.local/venv/bin/python scripts/backfill_ticket_lineage.py
.local/venv/bin/python scripts/run_cli.py prepare-race-day --date 2026-04-28 --track "Churchill Downs" --bankroll-per-race 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py rebuild-elo
.local/venv/bin/python scripts/run_cli.py settle-eligible
.local/venv/bin/python scripts/run_cli.py lifecycle-audit
.local/venv/bin/python scripts/run_cli.py lifecycle-status --write-report
.local/venv/bin/python scripts/run_acceptance_smoke.py
.local/venv/bin/python scripts/run_live_smoke.py --date 2026-04-28 --track "Churchill Downs" --manual-planning-override
.local/venv/bin/python scripts/enrich_horse_history_us.py --race-id <race_id> --max-horses 5 --results-limit 10

The Streamlit GUI reads the local DB for date/course/race selection, runner display, prediction generation, and win-ticket recommendation generation.
The conditional-logit module now has a small trainable grouped-softmax core for supplied point-in-time feature rows, plus leakage checks for forbidden post-race feature fields.
The backtest utilities now cover chronological expanding splits and core prediction metrics. They are not yet a production walk-forward report over model-ready historical features.
The prediction workflow now writes point-in-time feature snapshots for every research/FACTOR_CANON.md factor ID per runner. Verified current support exists for a subset of current-entry, prior-result, WOTP, ELO, weather, and race-context factors, including endpoint-ledger-backed purse, claiming, class, grade, and state-bred restriction fields; unavailable or low-support factors are persisted with explicit missing reasons.
make factor-audit writes .local/reports/factor_canon_ledger.json using the latest feature snapshot per race, so stale snapshots cannot inflate coverage. The latest clean real-data run snapshotted all 137 canon factors: 113 functioning, 24 snapshotted-missing, 24 accepted non-functioning source/data dependencies, 0 unresolved non-functioning factors, and 0 release-blocking factors. The ledger now includes per-factor source_availability with MCP/API provenance, sample and missing counts, sentinel counts, date-window rules, and leakage-guard notes.
Unsupported workout, wind-speed, horse-level pace, gate-break, path/bias, and pace-scenario factors remain accepted snapshotted_missing rather than failures; opportunistic owner, apprentice, going preference, DP6A, rail/runup, and scratch-delta factors only graduate when real timestamp-safe inputs exist. ML01 and ML02 remain disabled pending chronological validation and disable-if-worse promotion evidence.
scripts/audit_factor_canon.py --all-feature-sets writes .local/reports/factor_canon_ledger_all_feature_sets.json for historical debugging.
WOTP movement factors rely on time-versioned API-only odds snapshots. WOTP02 is a diagnostics-only public-vs-fundamental overlay persisted after probabilities exist and excluded from training inputs. WOTP04 uses parsed horse_data_pools[].pool_type_name values. WOTP10 buckets API-only public probabilities. ML03, ML05, and ML06 are diagnostics-only post-prediction features for model disagreement, scratch delta, and prior-settled calibration-bin residuals.
Horse-history enrichment uses exact USA horse ID resolution, get_horse_results(region="usa"), get_horse_profile_pro, and official pedigree analysis endpoints to support prior-start form, pace/time/speed where fields exist, purse/earnings, transition-preference, switch/claim, competition-strength, DP6A, and pedigree tendency factors only from starts or source snapshots before the target date.
Trainer/jockey entity enrichment uses exact USA-sourced names, search endpoints only as ID resolvers, and downstream region="usa" analysis/results calls for PRS connection factors.
ELO rebuilds from official USA result rows into elo_ratings and elo_rating_history; prediction features only use ratings whose result snapshot timestamp is strictly earlier than the prediction timestamp. Current supported contexts are horse overall/surface/distance/going/track plus jockey, trainer, and trainer-jockey combo overall ratings.
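The strictly-earlier rule for ELO features can be sketched as a point-in-time lookup. The row shape (result snapshot timestamp, rating) and the function name rating_as_of are assumptions for illustration; they do not describe the actual elo_rating_history schema.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical row shape: (result snapshot timestamp, rating after that result).
RatingRow = Tuple[datetime, float]


def rating_as_of(history: Iterable[RatingRow], prediction_at: datetime) -> Optional[float]:
    """Return the most recent rating whose snapshot is strictly before prediction_at."""
    eligible = [row for row in history if row[0] < prediction_at]
    if not eligible:
        return None  # no leakage-safe rating exists yet
    return max(eligible, key=lambda row: row[0])[1]
```

A rating whose snapshot timestamp equals the prediction timestamp is excluded, which matches the "strictly earlier" wording above.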
make train rebuilds current ELO contexts and fits the fundamental conditional-logit model only when settled prediction/result feature targets exist, the prediction timestamp is before both race post time and the official result snapshot, and the feature snapshot passes target-result leakage checks. When those conditions are not met, training instead creates a local audit record with readiness states such as feature_snapshots_available_no_training_targets or leakage_unsafe_training_targets.
Fundamental training uses baseline_benter canon factors plus legacy post-position features; diagnostics-only, WOTP, ELO, and supplemental RF/XGBoost factors are excluded because those are separate blend/audit components. Constrained blend tuning, calibration diagnostics, and local supplemental residual-model utilities are implemented, but production supplemental factor outputs stay disabled until enough chronological validation rows exist and disable-if-worse checks pass.
The current clean real-data DB has 261 pending prediction records across repeated regeneration passes and 0 eligible prediction/result joins, and the latest training run reports feature_snapshots_available_no_training_targets. Future or pending predictions remain ineligible for training, backtesting, tuning, or calibration until official results arrive and chronology is valid.
Prediction generation automatically uses the latest persisted leakage-safe trained_from_settled_predictions conditional-logit model when one exists; otherwise it falls back to the baseline equal-probability model.
make backtest creates a local audit record and reports metrics only when prediction-before-post-time/result joins exist and their feature snapshots pass leakage checks; otherwise it reports states such as no_prior_predictions_with_results or no_leakage_safe_prior_predictions_with_results. The current real-data backtest reports no_prior_predictions_with_results and is waiting on clean prediction-before-post-time/result joins for metrics.
scripts/run_cli.py lifecycle-audit runs the post-result local lifecycle in order: settle eligible prior predictions, run training audit, and write a backtest audit. refresh_results_us.py invokes the same lifecycle after official result refresh.
scripts/run_cli.py lifecycle-status --write-report writes .local/reports/lifecycle_status.json and reports how many predictions are pending results, eligible for settlement, settled, settled-but-chronology-invalid, or chronology-ineligible.
scripts/run_acceptance_smoke.py runs a deterministic local end-to-end proof in .local/tmp/acceptance_smoke: fixture USA entries, predictions, same-race tickets, a multi-race ticket, official results, settlement, training, backtest, and .local/reports/acceptance_smoke.json.
scripts/run_live_smoke.py runs the same operational chain against real The Racing API USA entries/results and writes .local/reports/live_smoke_<date>.json.
The GUI has local controls for USA entries/results sync, training audit, and backtest audit, in addition to race selection, prediction generation, and ticket recommendations.
The CLI/GUI can generate same-race ticket plans and multi-race ticket plans from persisted prediction IDs. Multi-race plans validate distinct race legs and persist ticket legs/combinations separately from predictions.
Bulk race-day preparation can predict every stored race for a date/track and generate recommendation-only tickets per race. The clean 2026-04-28 validation run created predictions for 2026-04-29 through 2026-05-04 and recommendation-only tickets where the current risk/availability logic produced them.
The 2026-04-28 real-data validation refreshed current/tomorrow cards, pulled results back to 2026-04-18, refreshed future entries through 2026-05-04, enriched horse/trainer/jockey history where exact USA identity resolution succeeded, prepared Churchill Downs, Gulfstream Park, and Parx Racing race days, created 261 pending prediction records across 67 future race slots, left 253 current recommended tickets with 0 stale recommended tickets, and used conservative live enrichment pacing at 1.2-1.5 requests/second.
Settlement is append-only: a prediction can be settled only when an official The Racing API result exists for the same race, the prediction timestamp is earlier than both race post time and the result timestamp, and the result timestamp is after race post time.
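The chronology conditions above can be expressed as a single predicate. This is a minimal sketch covering only the timestamp checks; the function name can_settle and its parameters are hypothetical, and the append-only storage behavior is not modeled here.

```python
from datetime import datetime
from typing import Optional


def can_settle(prediction_at: datetime, post_time: datetime,
               result_at: Optional[datetime]) -> bool:
    """Check the settlement chronology described in this README.

    result_at is None when no official result exists yet for the race.
    """
    if result_at is None:
        return False
    return (
        prediction_at < post_time      # predicted before the race went off
        and prediction_at < result_at  # predicted before the result snapshot
        and result_at > post_time      # result recorded after post time
    )
```

A prediction written after post time, or a result stamped before post time, is rejected even if an official result row exists.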
Ticket-generation helpers now cover Win, Place, Show, Exacta, Quinella, Trifecta, Superfecta, Daily Double, Pick 3, Pick 4, Pick 5, and Pick 6 recommendation structures. They require API wager availability unless a manual planning override is explicitly supplied, and there is still no wager-placement code.
The baseline CLI prediction path uses equal probabilities when no trained feature weights exist. It is a persistence and workflow proof, not a production betting model.
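For reference, an equal-probability baseline amounts to assigning 1/n to each of the n active runners. The sketch below uses a hypothetical function name and input shape and is not the CLI's actual code.

```python
from typing import Dict, Sequence


def equal_probabilities(active_runner_ids: Sequence[str]) -> Dict[str, float]:
    """Assign each active runner the same win probability (1 / field size)."""
    if not active_runner_ids:
        raise ValueError("at least one active runner is required")
    if len(set(active_runner_ids)) != len(active_runner_ids):
        raise ValueError("runner ids must be unique")
    probability = 1.0 / len(active_runner_ids)
    return {runner_id: probability for runner_id in active_runner_ids}
```

Such a baseline carries no information about the runners, which is why it serves as a workflow proof and not as a betting model.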
Churchill Downs is the preferred development smoke-test target track when relevant The Racing API data is available.

Local Data

Do not commit credentialed source responses, paid raw files, generated databases, model artifacts, predictions, tickets, or tuning outputs. Use .local/ or ignored generated-data folders.

Responsible Use

This project is decision-support tooling. It must report uncertainty, preserve audit trails, and never promise profits, guaranteed outcomes, or risk-free wagering.
