feat(signal_returns): write calibrator-v1 context on score_performance seed + backfill #206

Merged
cipher813 merged 2 commits into `main` from `feat/seed-score-performance-canonical-context` on May 10, 2026

@cipher813
Owner

Summary

Closes the producer-side root cause behind backtester #176's evaluator P0. Two writers had diverged silently after research migration #12 (2026-05-08): `scoring/performance_tracker.py::record_new_buy_scores` wrote all 5 canonical context columns but had zero production callers; `collectors/signal_returns.py::_seed_score_performance` was the actual production writer (runs weekly in DataPhase1) and only wrote `(symbol, score_date, score, price_on_date)`. Result: every row in `score_performance` post-migration had NULL `quant_score` / `qual_score` / `conviction` / `sector_modifier` / `market_regime`.

Production audit at time of writing:

| date | rows | quant_score populated | qual_score populated |
|---|---|---|---|
| 2026-04-30 | 23 | 0 | 0 |
| 2026-05-01 | 23 | 0 | 0 |

(`investment_thesis` for the same dates had 13/15 and 50/58 populated, so the upstream LLM pipeline IS producing the values — they were just being dropped at the seed step.)
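An audit like the one above can be reproduced with a per-date count, since `COUNT(column)` skips NULLs. This is a minimal sketch against an in-memory SQLite table; the function name `audit_context_coverage`, the demo rows, and the reduced column set are illustrative assumptions, not the query that was actually run against production research.db.

```python
import sqlite3


def audit_context_coverage(conn: sqlite3.Connection) -> list[tuple]:
    """Per score_date: total rows and how many have each sub-score populated."""
    # COUNT(col) ignores NULLs, so it counts populated values directly.
    return conn.execute(
        """
        SELECT score_date,
               COUNT(*)           AS n_rows,
               COUNT(quant_score) AS quant_populated,
               COUNT(qual_score)  AS qual_populated
        FROM score_performance
        GROUP BY score_date
        ORDER BY score_date
        """
    ).fetchall()


# Hypothetical demo data (not production rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_performance ("
    "symbol TEXT, score_date TEXT, score REAL, price_on_date REAL, "
    "quant_score REAL, qual_score REAL)"
)
conn.executemany(
    "INSERT INTO score_performance VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("AAA", "2026-04-30", 71.0, 10.0, None, None),
        ("BBB", "2026-04-30", 65.0, 20.0, None, None),
        ("CCC", "2026-05-01", 80.0, 30.0, 55.0, None),
    ],
)
audit = audit_context_coverage(conn)
```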

Changes

  • `_seed_score_performance` now reads sub_scores + conviction + sector + market_regime from the same signals.json payload that drives the BUY filter (no second round-trip; single source-of-truth fetch per date).
  • `_backfill_score_context` (new) repairs legacy rows whose canonical columns are NULL. UPDATE-WHERE-NULL means re-runs converge to a no-op once every row has at least one source. Wired into `collect()` as Step 1b.
  • `_ensure_score_performance_schema` defensively mirrors research migration #12 (5 calibrator-v1 context columns). Belt-and-suspenders in case DataPhase1 ever fires against a fresh research.db before research's cold-start migrations.
  • `_extract_signal_context` (new helper) matches the extraction logic in alpha-engine-research's `scripts/backfill_calibrator_v1_context.py` so producer-side seed and legacy backfill agree on every field's source.
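The extraction and UPDATE-WHERE-NULL steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the signals.json key names (`sub_scores`, `quant`, `qual`, a top-level `market_regime`), the column types, and the helper names `extract_context` and `backfill_context` are all hypothetical.

```python
import sqlite3

CONTEXT_COLUMNS = (
    "quant_score", "qual_score", "conviction", "sector_modifier", "market_regime",
)


def extract_context(signal: dict, payload: dict) -> dict:
    """Pull the five context fields. Key names are assumed, not the real schema."""
    sub = signal.get("sub_scores") or {}
    return {
        "quant_score": sub.get("quant"),
        "qual_score": sub.get("qual"),
        "conviction": signal.get("conviction"),
        "sector_modifier": signal.get("sector_modifier"),
        "market_regime": payload.get("market_regime"),
    }


def backfill_context(conn, symbol: str, score_date: str, ctx: dict) -> int:
    """Fill only NULL columns for one row; return the number of rows updated."""
    # COALESCE keeps any value that is already present.
    set_clause = ", ".join(f"{c} = COALESCE({c}, :{c})" for c in CONTEXT_COLUMNS)
    # Touch the row only if some column is NULL and the source can fill it,
    # which is what makes a second run a no-op.
    needs_fill = " OR ".join(
        f"({c} IS NULL AND :{c} IS NOT NULL)" for c in CONTEXT_COLUMNS
    )
    cur = conn.execute(
        f"UPDATE score_performance SET {set_clause} "
        f"WHERE symbol = :symbol AND score_date = :score_date AND ({needs_fill})",
        {**ctx, "symbol": symbol, "score_date": score_date},
    )
    return cur.rowcount


# Hypothetical legacy row with NULL context, plus a made-up payload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_performance ("
    "symbol TEXT, score_date TEXT, score REAL, price_on_date REAL, "
    "quant_score REAL, qual_score REAL, conviction TEXT, "
    "sector_modifier REAL, market_regime TEXT)"
)
conn.execute(
    "INSERT INTO score_performance (symbol, score_date, score, price_on_date) "
    "VALUES ('AAA', '2026-05-01', 72.0, 10.0)"
)
payload = {
    "market_regime": "neutral",
    "signals": {
        "AAA": {
            "sub_scores": {"quant": 60.0, "qual": 80.0},
            "conviction": "rising",
            "sector_modifier": 1.05,
        }
    },
}
ctx = extract_context(payload["signals"]["AAA"], payload)
first = backfill_context(conn, "AAA", "2026-05-01", ctx)   # repairs the row
second = backfill_context(conn, "AAA", "2026-05-01", ctx)  # nothing left to fill
row = conn.execute(
    "SELECT quant_score, qual_score, conviction, sector_modifier, market_regime "
    "FROM score_performance"
).fetchone()
```

Guarding the `WHERE` clause on "NULL and fillable" rather than just "NULL" means rows with no available source do not count as updated on every run.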

Test plan

  • `pytest tests/test_signal_returns_score_context.py -q` → 11 passed (3 extraction cases, 3 seed-INSERT cases, 4 backfill cases, 1 schema-ensure idempotence case)
  • `pytest -q` (full suite) → 705 passed, 1 skipped, 0 failures
  • Saturday 2026-05-16 SF DataPhase1 emits canonical context on newly-seeded rows
  • Same SF run's backfill step closes the 186-row legacy NULL gap in production research.db
  • weight_optimizer status=ok end-to-end on that same SF run (no auto-rollback from the column-rename class)

Composes with

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 10, 2026 11:33
…e seed + backfill

Root-cause closure for the 2026-05-09 Saturday SF evaluator P0
(weight_optimizer ERROR: "None of [Index(['quant_score','qual_score'])]
are in the [columns]"; auto-rollback Sharpe -42.2% vs baseline).

Producer audit revealed two parallel writers diverged silently after
research migration #12 (2026-05-08):
  * scoring/performance_tracker.py::record_new_buy_scores writes ALL 5
    canonical context columns — but has zero production callers.
  * collectors/signal_returns.py::_seed_score_performance is the actual
    production writer (runs weekly in DataPhase1) and only wrote
    (symbol, score_date, score, price_on_date). The 5 canonical
    columns (quant_score, qual_score, conviction, sector_modifier,
    market_regime) were never populated.

Single-fact-single-writer rebuild:
  * _seed_score_performance now extracts the 5 context fields from the
    same signals.json payload that drives the BUY filter — single
    source-of-truth fetch per signals.json, no second round-trip.
  * New _backfill_score_context repairs legacy rows whose canonical
    columns are NULL. UPDATE-WHERE-NULL so re-runs are no-ops once
    every row has a source.
  * _ensure_score_performance_schema mirrors research migration #12
    defensively in case DataPhase1 ever fires against a fresh
    research.db before research's cold-start migrations run.
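
A defensive schema mirror of this kind is commonly written as an idempotent "add column if missing" pass. The sketch below assumes SQLite and guesses the column types; `ensure_context_columns` is a hypothetical name, and the real `_ensure_score_performance_schema` may differ.

```python
import sqlite3

# Column types are assumptions; the commit message only names the columns.
CONTEXT_COLUMN_TYPES = {
    "quant_score": "REAL",
    "qual_score": "REAL",
    "conviction": "TEXT",
    "sector_modifier": "REAL",
    "market_regime": "TEXT",
}


def ensure_context_columns(conn: sqlite3.Connection) -> list[str]:
    """Add any missing context column; return the names that were added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(score_performance)")}
    added = []
    for name, sql_type in CONTEXT_COLUMN_TYPES.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE score_performance ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


# Hypothetical pre-migration table with only the four legacy columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_performance ("
    "symbol TEXT, score_date TEXT, score REAL, price_on_date REAL)"
)
first_run = ensure_context_columns(conn)   # adds all five columns
second_run = ensure_context_columns(conn)  # already present, adds nothing
```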

Composes with backtester #176 (PR-day consumer-side coalesce fix). With
this PR the producer becomes authoritative; the next backtester PR can
retire the S3 round-trip in weight_optimizer.load_with_subscores.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks the producer-side contract established in the previous commit:
after seed + backfill complete, query score_performance for rows with
score_date >= 2026-05-17 (first Sat SF after this PR merges) and emit
the coverage percentage as a CloudWatch gauge:

  AlphaEngine/Data/score_performance_canonical_coverage_pct

Coverage = fraction of post-cutover rows with ALL 5 canonical context
columns populated (quant_score, qual_score, conviction,
sector_modifier, market_regime). 100% is the contract; the gauge is
always emitted (including 100.0) so alarm baselines stay continuous.
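
The coverage definition above reduces to a single aggregate query. The sketch below assumes `score_date` is stored as ISO-8601 text (so string comparison orders dates correctly) and returns `None` when there are no post-cutover rows; both choices, the function name `canonical_coverage_pct`, and the demo rows are assumptions. The CloudWatch emit itself is left out.

```python
import sqlite3

CANONICAL = (
    "quant_score", "qual_score", "conviction", "sector_modifier", "market_regime",
)
CUTOVER = "2026-05-17"  # cutover date taken from the commit message


def canonical_coverage_pct(conn: sqlite3.Connection, cutover: str = CUTOVER):
    """Percent of post-cutover rows with all five context columns populated."""
    all_set = " AND ".join(f"{c} IS NOT NULL" for c in CANONICAL)
    total, complete = conn.execute(
        f"SELECT COUNT(*), "
        f"COALESCE(SUM(CASE WHEN {all_set} THEN 1 ELSE 0 END), 0) "
        f"FROM score_performance WHERE score_date >= ?",
        (cutover,),
    ).fetchone()
    if total == 0:
        return None  # assumption: no post-cutover rows means nothing to measure
    return 100.0 * complete / total


# Hypothetical rows: one pre-cutover gap, three complete, one incomplete.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_performance ("
    "symbol TEXT, score_date TEXT, quant_score REAL, qual_score REAL, "
    "conviction TEXT, sector_modifier REAL, market_regime TEXT)"
)
conn.executemany(
    "INSERT INTO score_performance VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("OLD", "2026-05-01", None, None, None, None, None),  # before cutover
        ("AAA", "2026-05-17", 60.0, 80.0, "rising", 1.0, "neutral"),
        ("BBB", "2026-05-17", 55.0, 70.0, "stable", 0.9, "neutral"),
        ("CCC", "2026-05-24", 50.0, 65.0, "stable", 1.1, "bull"),
        ("DDD", "2026-05-24", 40.0, None, "stable", 1.0, "bull"),  # qual missing
    ],
)
coverage = canonical_coverage_pct(conn)
```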

Mirrors the chronic-gap drift detection pattern at
weekly_collector.py:_check_chronic_gap_polygon_recovery — same
best-effort emit, same observability-not-load-bearing posture. A
follow-up alpha-engine-lib transparency_inventory entry can wire this
into the substrate health alarm if desired; the metric itself is the
drift signal.

Tripwire test asserts _CANONICAL_CONTEXT_COLUMNS stays in lockstep
with the seed INSERT — adding a 6th column to one without the other
would make the drift gate blind to that field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 4c58e1f into main May 10, 2026
1 check passed
@cipher813 cipher813 deleted the feat/seed-score-performance-canonical-context branch May 10, 2026 18:44