Project Pony is a Codex-first template for Project Thoroughbred, a US-only horse-racing prediction and ticket-generation system.
The only authoritative racing data source is The Racing API. Future work must use The Racing API MCP tools for data-contract discovery and must filter all racing data to United States records only.
```bash
make bootstrap
make test
make acceptance-smoke
make live-smoke
make factor-audit
```

```bash
PYTHONPATH=src python3 -m racing_app.cli check-config
```

Dependencies must be installed only inside this repository under `.local/`:

```bash
python3 -m venv .local/venv
.local/venv/bin/python -m pip install --upgrade pip
.local/venv/bin/pip install -e ".[dev]"
```

- Root `AGENTS.md` is the source of truth for Codex behavior. `.agents/skills/` contains reusable Codex workflows for data gathering, US validation, point-in-time features, leakage-safe training, ELO, prediction, settlement, tuning, tickets, monitoring, and quality gates.
- Guardrail tests cover the currently enforceable US-only, leakage, ELO chronology, tuning eligibility, future-race, and ticket-input invariants.
- The `src/racing_app/guardrail` package now covers USA endpoint scope, raw-before-transform storage, 364-day backfill windows, leakage checks, ELO chronology, WOTP cap behavior, scratch field versions, Kelly caps, and ticket-input separation.
- Live HTTP clients pace calls through `RACING_APP_API_REQUESTS_PER_SECOND` with a conservative default of `4.0`, below The Racing API's published default of 5 requests/second, and retry HTTP 429 responses using `Retry-After` when present and exponential backoff otherwise.
- USA North America entries can be normalized and upserted into the local SQLite schema for fixture-tested race, runner, scratch, pool, change, and weather fields.
- USA North America results can be normalized and upserted into local result, payoff, wager-type, fraction, scratch, and weather tables for fixture-tested settlement inputs.
- USA entry refresh now also attempts official Pro racecard owner enrichment for the refreshed dates, then probes the corrected single-race Pro route for compatible `rac_...` IDs that still lack owner context. Timestamped owner context is stored in `entry_runner_owner_snapshots` without overwriting raw North America entry rows.
- Owner-dependent model features use a whole-active-field completeness gate: if any active runner lacks owner context, owner recent rate, trainer-owner combo, owner switch, and owner ELO are disabled for that race, while combined ELO is recomputed without the owner component. When a missing-owner runner scratches, the gate re-evaluates on the remaining active field.
- Live `.env` checks on 2026-04-28 used The Racing API North America endpoints and stored USA-only local data without printing credentials. After the user-requested clean reset, the active local database contains 221 USA meets, 722 entry races, 6,243 entry runners, 124 result meets, 1,052 result races, 3,158 result runners, 7 historical result races, 261 immutable pending prediction records across 67 future race slots, 253 current recommended tickets, 825 invalidated ticket recommendations, 22,955 entity aliases, 216 entity ID maps, 2,971 raw payloads, 5,995 API call log rows, and 0 non-USA meets. Generated databases and raw payloads remain ignored local artifacts.
- Entry/result ingestion now also stores raw-first entity snapshots for horses, jockeys, trainers, owners, sires, dams, and damsires when those fields appear in USA payloads from The Racing API. `make backfill-entity-snapshots` rehydrates those dimensions from stored raw payloads without additional API calls; entity counts measure ingestion coverage and are not a substitute for leakage-safe profile/history reconciliation.
- WOTP snapshots are built only from The Racing API fields already present in entries payloads: morning line, live odds, fractional odds, dollar odds, and runner pool values.
- Prediction and ticket records now persist locally as separate immutable prediction rows and recommendation rows. Ticket requests and recommendations now also persist prediction/model/field-version/race lineage so recommendation outputs can be traced back to the immutable prediction inputs that generated them; the active database has been backfilled so existing ticket requests and recommendations no longer have empty lineage arrays.
- The local CLI can list stored entries, create a baseline immutable prediction, and generate recommendation-only win tickets from persisted prediction rows. Example commands:
```bash
.local/venv/bin/python scripts/run_cli.py list-races --date 2026-04-28 --track "Churchill Downs"
.local/venv/bin/python scripts/run_cli.py predict-race --race-id <race_id>
.local/venv/bin/python scripts/run_cli.py recommend-win-tickets --prediction-id <prediction_id> --bankroll 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py recommend-tickets --prediction-id <prediction_id> --bankroll 100 --bet-types Win,Exacta --manual-planning-override
.local/venv/bin/python scripts/run_cli.py recommend-multirace-tickets --prediction-ids <prediction_id_1>,<prediction_id_2> --bet-type "Daily Double" --bankroll 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py settle-prediction --prediction-id <prediction_id>
.local/venv/bin/python scripts/backfill_api_call_log.py
.local/venv/bin/python scripts/backfill_ticket_lineage.py
.local/venv/bin/python scripts/run_cli.py prepare-race-day --date 2026-04-28 --track "Churchill Downs" --bankroll-per-race 100 --manual-planning-override
.local/venv/bin/python scripts/run_cli.py rebuild-elo
.local/venv/bin/python scripts/run_cli.py settle-eligible
.local/venv/bin/python scripts/run_cli.py lifecycle-audit
.local/venv/bin/python scripts/run_cli.py lifecycle-status --write-report
.local/venv/bin/python scripts/run_acceptance_smoke.py
.local/venv/bin/python scripts/run_live_smoke.py --date 2026-04-28 --track "Churchill Downs" --manual-planning-override
.local/venv/bin/python scripts/enrich_horse_history_us.py --race-id <race_id> --max-horses 5 --results-limit 10
```

- The Streamlit GUI reads the local DB for date/course/race selection, runner display, prediction generation, and win-ticket recommendation generation.
- The conditional-logit module now has a small trainable grouped-softmax core for supplied point-in-time feature rows, plus leakage checks for forbidden post-race feature fields.
- The backtest utilities now cover chronological expanding splits and core prediction metrics. They are not yet a production walk-forward report over model-ready historical features.
- The prediction workflow now writes point-in-time feature snapshots for every `research/FACTOR_CANON.md` factor ID per runner. Verified current support exists for a subset of current-entry, prior-result, WOTP, ELO, weather, and race-context factors, including endpoint-ledger-backed purse, claiming, class, grade, and state-bred restriction fields; unavailable or low-support factors are persisted with explicit missing reasons. `make factor-audit` writes `.local/reports/factor_canon_ledger.json` using the latest feature snapshot per race, so stale snapshots cannot inflate coverage. The latest clean real-data run snapshotted all 137 canon factors: 113 functioning, 24 snapshotted-missing, 24 accepted non-functioning source/data dependencies, 0 unresolved non-functioning factors, and 0 release-blocking factors. The ledger now includes per-factor `source_availability` with MCP/API provenance, sample and missing counts, sentinel counts, date-window rules, and leakage-guard notes. Unsupported workout, wind-speed, horse-level pace, gate-break, path/bias, and pace-scenario factors remain accepted as `snapshotted_missing` rather than treated as failures; opportunistic owner, apprentice, going preference, DP6A, rail/runup, and scratch-delta factors only graduate when real timestamp-safe inputs exist. `ML01` and `ML02` remain disabled pending chronological validation and disable-if-worse promotion evidence. `scripts/audit_factor_canon.py --all-feature-sets` writes `.local/reports/factor_canon_ledger_all_feature_sets.json` for historical debugging.
WOTP movement factors rely on time-versioned API-only odds snapshots. `WOTP02` is a diagnostics-only public-vs-fundamental overlay persisted after probabilities exist and excluded from training inputs. `WOTP04` uses parsed `horse_data_pools[].pool_type_name` values. `WOTP10` buckets API-only public probabilities. `ML03`, `ML05`, and `ML06` are diagnostics-only post-prediction features for model disagreement, scratch delta, and prior-settled calibration-bin residuals. Horse-history enrichment uses exact USA horse ID resolution, `get_horse_results(region="usa")`, `get_horse_profile_pro`, and official pedigree analysis endpoints to support prior-start form, pace/time/speed where fields exist, purse/earnings, transition-preference, switch/claim, competition-strength, DP6A, and pedigree tendency factors, drawing only on starts or source snapshots dated before the target date. Trainer/jockey entity enrichment uses exact USA-sourced names, uses search endpoints only as ID resolvers, and makes downstream `region="usa"` analysis/results calls for PRS connection factors.
- ELO rebuilds from official USA result rows into
`elo_ratings` and `elo_rating_history`; prediction features only use ratings whose result snapshot timestamp is strictly earlier than the prediction timestamp. Current supported contexts are horse overall/surface/distance/going/track ratings plus jockey, trainer, and trainer-jockey combo overall ratings. `make train` rebuilds current ELO contexts and fits the fundamental conditional-logit model only when settled prediction/result feature targets exist, the prediction timestamp is before both race post time and the official result snapshot, and the feature snapshot passes target-result leakage checks. Fundamental training uses `baseline_benter` canon factors plus legacy post-position features; diagnostics-only, WOTP, ELO, and supplemental RF/XGBoost factors are excluded because those are separate blend/audit components. Constrained blend tuning, calibration diagnostics, and local supplemental residual-model utilities are implemented, but production supplemental factor outputs stay disabled until enough chronological validation rows exist and disable-if-worse checks pass. When the `make train` preconditions are not met, training instead creates a local audit record with a readiness state such as `feature_snapshots_available_no_training_targets` or `leakage_unsafe_training_targets`. The current clean real-data DB has 261 pending prediction records across repeated regeneration passes and 0 eligible prediction/result joins, and the latest training run reports `feature_snapshots_available_no_training_targets`. Future or pending predictions remain ineligible for training, backtesting, tuning, or calibration until official results arrive and chronology is valid.
- Prediction generation automatically uses the latest persisted leakage-safe
`trained_from_settled_predictions` conditional-logit model when one exists; otherwise it falls back to the baseline equal-probability model. `make backtest` creates a local audit record and reports metrics only when prediction-before-post-time/result joins exist and their feature snapshots pass leakage checks; otherwise it reports states such as `no_prior_predictions_with_results` or `no_leakage_safe_prior_predictions_with_results`. The current real-data backtest reports `no_prior_predictions_with_results` and is waiting on clean prediction-before-post-time/result joins for metrics. `scripts/run_cli.py lifecycle-audit` runs the post-result local lifecycle in order: settle eligible prior predictions, run the training audit, and write a backtest audit. `refresh_results_us.py` invokes the same lifecycle after an official result refresh. `scripts/run_cli.py lifecycle-status --write-report` writes `.local/reports/lifecycle_status.json` and reports how many predictions are pending results, eligible for settlement, settled, settled-but-chronology-invalid, or chronology-ineligible. `scripts/run_acceptance_smoke.py` runs a deterministic local end-to-end proof in `.local/tmp/acceptance_smoke` covering fixture USA entries, predictions, same-race tickets, a multi-race ticket, official results, settlement, training, and backtest, and writes `.local/reports/acceptance_smoke.json`. `scripts/run_live_smoke.py` runs the same operational chain against real The Racing API USA entries/results and writes `.local/reports/live_smoke_<date>.json`.
- The GUI has local controls for USA entries/results sync, training audit, and backtest audit, in addition to race selection, prediction generation, and ticket recommendations.
- The CLI/GUI can generate same-race ticket plans and multi-race ticket plans from persisted prediction IDs. Multi-race plans validate distinct race legs and persist ticket legs/combinations separately from predictions.
- Bulk race-day preparation can predict every stored race for a date/track and generate recommendation-only tickets per race. The clean 2026-04-28 validation run created predictions for 2026-04-29 through 2026-05-04 and recommendation-only tickets where the current risk/availability logic produced them.
- The 2026-04-28 real-data validation refreshed current/tomorrow cards, pulled results back to 2026-04-18, refreshed future entries through 2026-05-04, enriched horse/trainer/jockey history where exact USA identity resolution succeeded, prepared Churchill Downs, Gulfstream Park, and Parx Racing race days, created 261 pending prediction records across 67 future race slots, left 253 current recommended tickets with 0 stale recommended tickets, and used conservative live enrichment pacing at 1.2-1.5 requests/second.
- Settlement is append-only: a prediction can be settled only when an official The Racing API result exists for the same race, the prediction timestamp is earlier than both race post time and the result timestamp, and the result timestamp is after race post time.
- Ticket-generation helpers now cover Win, Place, Show, Exacta, Quinella, Trifecta, Superfecta, Daily Double, Pick 3, Pick 4, Pick 5, and Pick 6 recommendation structures. They require API wager availability unless a manual planning override is explicitly supplied, and there is still no wager-placement code.
- The baseline CLI prediction path uses equal probabilities when no trained feature weights exist. It is a persistence and workflow proof, not a production betting model.
- Churchill Downs is the preferred development smoke-test target track when relevant The Racing API data is available.
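The pacing and 429-retry behavior described for the live HTTP clients can be illustrated with a minimal Python sketch. `RequestPacer`, `retry_delay`, and `call_with_retries` are hypothetical names, not the project's client API. The sketch assumes `RACING_APP_API_REQUESTS_PER_SECOND` holds a plain float, handles only the delay-in-seconds form of `Retry-After`, and takes an injectable clock so pacing can be tested without real sleeps.

```python
import os
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], object]


class RequestPacer:
    """Keeps successive calls at least 1 / requests_per_second seconds apart."""

    def __init__(
        self,
        requests_per_second: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second is None:
            # Variable name and 4.0 default come from the README; float parsing is an assumption.
            requests_per_second = float(
                os.environ.get("RACING_APP_API_REQUESTS_PER_SECOND", "4.0")
            )
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self.min_interval = 1.0 / requests_per_second
        self.clock = clock
        self.sleep = sleep
        self._next_allowed: Optional[float] = None  # no call made yet

    def wait(self) -> None:
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after a 429: Retry-After (seconds form) wins, else capped 2**attempt backoff."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date Retry-After values are not handled in this sketch
    return min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Response], pacer: RequestPacer,
                      max_attempts: int = 5) -> Tuple[int, object]:
    """Paces every attempt and retries only HTTP 429 responses."""
    for attempt in range(max_attempts):
        pacer.wait()
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            pacer.sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Jitter is deliberately omitted so the sketch stays deterministic; a production client might add it to avoid synchronized retries.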
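The guardrail package lists Kelly caps, but the README does not give the sizing formula. One common form, shown here purely as an assumption, computes full Kelly for a single win bet at fixed decimal odds, scales it by a fractional multiplier, and clips the result to a hard cap. The function names, the 0.25 multiplier, and the 5% cap are hypothetical, and the sketch ignores that pari-mutuel odds can move after a bet is placed.

```python
def capped_kelly_fraction(win_prob: float, decimal_odds: float,
                          kelly_multiplier: float = 0.25,
                          max_fraction: float = 0.05) -> float:
    """Fraction of bankroll for one win bet, clipped to [0, max_fraction].

    Full Kelly for net odds b = decimal_odds - 1 is f* = (b*p - (1 - p)) / b.
    The multiplier and cap defaults are hypothetical, not project settings.
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must exceed 1.0")
    b = decimal_odds - 1.0
    full_kelly = (b * win_prob - (1.0 - win_prob)) / b
    # Negative edge means no bet; otherwise scale down and enforce the cap.
    return min(max_fraction, max(0.0, full_kelly * kelly_multiplier))


def recommended_stake(bankroll: float, win_prob: float, decimal_odds: float) -> float:
    """Hypothetical helper: bankroll amount implied by the capped fraction."""
    return bankroll * capped_kelly_fraction(win_prob, decimal_odds)
```

For example, a 25% win probability at decimal odds of 5.0 gives full Kelly of 0.0625 and a quarter-Kelly fraction of 0.015625, well under the cap.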
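The whole-active-field owner gate described above can be expressed as a small predicate that is re-run whenever the active field changes, for example after a scratch. The feature names, the `Runner` shape, and the renormalized weighted average used for combined ELO are assumptions made for illustration; the README states only that combined ELO is recomputed without the owner component.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

# Hypothetical feature names standing in for the owner-dependent factors.
OWNER_FEATURES = frozenset(
    {"owner_recent_rate", "trainer_owner_combo", "owner_switch", "owner_elo"}
)


@dataclass(frozen=True)
class Runner:
    runner_id: str
    scratched: bool
    owner_id: Optional[str]  # None when no timestamped owner context exists


def owner_gate_open(field: Iterable[Runner]) -> bool:
    """True only if every non-scratched runner has owner context."""
    active = [r for r in field if not r.scratched]
    return bool(active) and all(r.owner_id for r in active)


def enabled_features(all_features: Set[str], field: Iterable[Runner]) -> Set[str]:
    """Drops every owner-dependent feature for the race when the gate is closed."""
    if owner_gate_open(field):
        return set(all_features)
    return set(all_features) - OWNER_FEATURES


def combined_elo(ratings: Dict[str, float], weights: Dict[str, float],
                 include_owner: bool) -> float:
    """Weighted mean of component ratings; drops 'owner' and renormalizes when gated off."""
    parts = {k: v for k, v in ratings.items() if include_owner or k != "owner"}
    total = sum(weights[k] for k in parts)
    return sum(weights[k] * v for k, v in parts.items()) / total
```

Because the gate is a pure function of the current active field, re-evaluating it after a scratch needs no extra state.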
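A grouped-softmax (conditional-logit) core treats each race as one choice set: a runner's win probability is the softmax of its linear score over the runners in that race, and training maximizes the log-likelihood of the observed winners. The sketch below uses plain gradient ascent with stdlib math; it is not the project's module, and the forbidden-field list, function names, and hyperparameters are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical post-race fields that must never appear in pre-race feature rows.
FORBIDDEN_FIELDS = frozenset({"finish_position", "final_odds", "payoff"})


def check_no_leakage(rows: Sequence[Row]) -> None:
    for row in rows:
        leaked = FORBIDDEN_FIELDS.intersection(row)
        if leaked:
            raise ValueError(f"post-race fields in features: {sorted(leaked)}")


def race_probabilities(weights: Row, rows: Sequence[Row]) -> List[float]:
    """Softmax of linear scores within one race (one group)."""
    scores = [sum(weights.get(k, 0.0) * v for k, v in row.items()) for row in rows]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_conditional_logit(races: Sequence[Tuple[Sequence[Row], int]],
                          features: Sequence[str],
                          learning_rate: float = 0.5, epochs: int = 200) -> Row:
    """Gradient ascent on the winner log-likelihood; races are (rows, winner_index)."""
    for rows, _ in races:
        check_no_leakage(rows)
    weights = {f: 0.0 for f in features}
    for _ in range(epochs):
        grad = {f: 0.0 for f in features}
        for rows, winner in races:
            probs = race_probabilities(weights, rows)
            for f in features:
                # d/dw log p_winner = x_winner - E_p[x] within the race
                expected = sum(p * row.get(f, 0.0) for p, row in zip(probs, rows))
                grad[f] += rows[winner].get(f, 0.0) - expected
        for f in features:
            weights[f] += learning_rate * grad[f] / len(races)
    return weights
```

With all weights at zero the model reduces to equal probabilities within each race, which matches the baseline fallback the README describes.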
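Chronological expanding splits, as mentioned for the backtest utilities, can be sketched by grouping on whole race days so every test block lies strictly after its training window. The function name and fold-sizing rule here are assumptions; the README does not specify how folds are sized.

```python
from datetime import date
from typing import List, Sequence, Tuple

Split = Tuple[List[date], List[date]]


def expanding_splits(race_dates: Sequence[date], n_folds: int,
                     min_train_days: int = 1) -> List[Split]:
    """Chronological expanding-window folds over whole race days.

    Each fold trains on every day before its cutoff and tests on the next block,
    so no test day is earlier than, or shared with, its training days.
    """
    days = sorted(set(race_dates))  # collapse many races per day into one day
    remaining = len(days) - min_train_days
    if n_folds < 1 or min_train_days < 1 or remaining < n_folds:
        raise ValueError("not enough distinct race days for the requested folds")
    fold_size = remaining // n_folds
    splits: List[Split] = []
    for k in range(n_folds):
        cutoff = min_train_days + k * fold_size
        # The last fold absorbs any leftover days.
        end = cutoff + fold_size if k < n_folds - 1 else len(days)
        splits.append((days[:cutoff], days[cutoff:end]))
    return splits
```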
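The rule that `make factor-audit` counts only the latest feature snapshot per race can be illustrated as follows. The snapshot dictionary shape and the factor ID `F01` are hypothetical; the point is that a stale snapshot for the same race is replaced rather than counted a second time.

```python
from datetime import datetime
from typing import Any, Dict, Iterable

# Hypothetical snapshot shape: {"race_id": str, "created_at": datetime, "factors": {factor_id: value}}
Snapshot = Dict[str, Any]


def latest_per_race(snapshots: Iterable[Snapshot]) -> Dict[str, Snapshot]:
    """Keeps only the most recently created snapshot for each race."""
    latest: Dict[str, Snapshot] = {}
    for snap in snapshots:
        if not isinstance(snap["created_at"], datetime):
            raise TypeError("created_at must be a datetime, not a string")
        current = latest.get(snap["race_id"])
        if current is None or snap["created_at"] > current["created_at"]:
            latest[snap["race_id"]] = snap
    return latest


def factor_coverage(snapshots: Iterable[Snapshot], factor_id: str) -> float:
    """Share of races whose latest snapshot has a non-missing value for factor_id."""
    latest = latest_per_race(snapshots)
    if not latest:
        return 0.0
    present = sum(1 for s in latest.values() if s["factors"].get(factor_id) is not None)
    return present / len(latest)
```

Counting every snapshot instead would report 2 of 3 for the test data below, even though the current snapshot for one of the two races lacks the factor.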
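Public probabilities of the kind `WOTP10` buckets are commonly derived from odds by inverting decimal odds and normalizing so the field sums to 1, which removes the overround created by pool takeout. This sketch assumes the odds have already been converted to decimal form; the bucket edges and function names are hypothetical, not the project's WOTP10 definition.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bucket edges; the project's WOTP10 edges are not documented here.
BUCKET_EDGES = (0.05, 0.10, 0.20, 0.35)


def public_probabilities(decimal_odds: Sequence[float]) -> List[float]:
    """Implied win probabilities from decimal odds, normalized to sum to 1."""
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must exceed 1.0")
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # exceeds 1.0 when the pool carries a takeout
    return [r / total for r in raw]


def probability_bucket(p: float, edges: Sequence[float] = BUCKET_EDGES) -> int:
    """Index of the bucket containing p (0 = below the first edge)."""
    return bisect_right(edges, p)
```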
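The strict ELO chronology rule amounts to an as-of lookup: take the most recent rating whose result snapshot timestamp is strictly earlier than the prediction timestamp. The history layout, the function name, and the 1500 default for unrated entities are assumptions for this sketch.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Tuple

# (result_snapshot_ts, rating_after_that_result), sorted by timestamp
History = List[Tuple[datetime, float]]


def rating_as_of(history: History, prediction_ts: datetime,
                 default: float = 1500.0) -> float:
    """Latest rating whose result snapshot is strictly earlier than prediction_ts."""
    timestamps = [ts for ts, _ in history]
    # bisect_left counts timestamps < prediction_ts, so an equal timestamp is excluded.
    idx = bisect_left(timestamps, prediction_ts)
    return history[idx - 1][1] if idx > 0 else default
```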
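Multi-race validation and combination counting can be sketched as below: legs must be distinct races, the leg count must match the wager type, and the number of combinations is the product of distinct selections per leg. The function name and the `(race_id, program_numbers)` leg shape are hypothetical.

```python
from math import prod
from typing import Dict, Sequence, Tuple

# Number of legs for each multi-race wager type named in the README.
LEGS_REQUIRED: Dict[str, int] = {
    "Daily Double": 2, "Pick 3": 3, "Pick 4": 4, "Pick 5": 5, "Pick 6": 6,
}


def multirace_combinations(bet_type: str,
                           legs: Sequence[Tuple[str, Sequence[int]]]) -> int:
    """Validate (race_id, program_numbers) legs and return the ticket's combination count."""
    expected = LEGS_REQUIRED.get(bet_type)
    if expected is None:
        raise ValueError(f"unsupported multi-race bet type: {bet_type}")
    if len(legs) != expected:
        raise ValueError(f"{bet_type} needs {expected} legs, got {len(legs)}")
    race_ids = [race_id for race_id, _ in legs]
    if len(set(race_ids)) != len(race_ids):
        raise ValueError("each leg must be a distinct race")
    if any(not selections for _, selections in legs):
        raise ValueError("every leg needs at least one selection")
    # Duplicate program numbers within a leg do not add combinations.
    return prod(len(set(selections)) for _, selections in legs)
```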
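The append-only settlement rule reduces to a chronology check per prediction/result pair. This sketch mirrors the conditions stated in the settlement bullet; the dataclass shapes, the function name, and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Prediction:
    race_id: str
    created_at: datetime


@dataclass(frozen=True)
class OfficialResult:
    race_id: str
    post_time: datetime  # race post time, carried on the result for this sketch
    result_ts: datetime  # official result snapshot timestamp


def settlement_check(pred: Prediction, result: Optional[OfficialResult]) -> Tuple[bool, str]:
    """Returns (eligible, reason) for settling one prediction against one result."""
    if result is None:
        return False, "pending_result"
    if result.race_id != pred.race_id:
        return False, "race_mismatch"
    if not pred.created_at < result.post_time:
        return False, "predicted_at_or_after_post_time"
    if not pred.created_at < result.result_ts:
        return False, "predicted_at_or_after_result"
    if not result.result_ts > result.post_time:
        return False, "result_not_after_post_time"
    return True, "eligible"
```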
Do not commit credentialed source responses, paid raw files, generated databases, model artifacts, predictions, tickets, or tuning outputs. Use `.local/` or ignored generated-data folders.
This project is decision-support tooling. It must report uncertainty, preserve audit trails, and never promise profits, guaranteed outcomes, or risk-free wagering.