feat(canonical-keys): migrate Wave 1 producers to YYMMDDHHMM + latest.json shape#234

Merged: cipher813 merged 1 commit into main from feat/canonical-key-shape-migration on May 13, 2026

@cipher813
Owner

Summary

PR 1 of 2 — migrates Wave 1 producers to the canonical alpha_engine_lib.eval_artifacts shape (YYMMDDHHMM run_id + latest.json sidecar). PR 2 (SubstrateReader migration in alpha-engine-research) follows.

Per architectural review: the lib's eval_artifacts module is the institutional canonical for artifact partitioning. Wave 1 originally shipped with the older {YYYY-MM-DD}.parquet shape matching legacy alpha-engine-data conventions; this PR corrects to canonical before the first Saturday SF firing writes any production data under the old shape.

What's migrated

| Module | Old | New |
|---|---|---|
| news_aggregates | {date}.parquet | {YYMMDDHHMM}_result.parquet + latest.json |
| analyst_revisions | {as_of_date}.parquet | {YYMMDDHHMM}_result.parquet + latest.json |
| analyst_snapshots/{ticker}/ | {date}.json | {YYMMDDHHMM}.json + per-ticker latest.json |
| insider_transactions | {filed_date}.parquet (one per filed_date) | {YYMMDDHHMM}_result.parquet + latest.json (one consolidated parquet per run; filed_date preserved as row column) |
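
The insider_transactions row changes the partitioning, not only the key name. The sketch below contrasts the two layouts using plain Python lists in place of parquet files. The `ticker` field, the `insider_transactions/` prefix string, and both function names are assumptions for illustration; only `filed_date` and the key shapes come from this PR.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]


def legacy_layout(rows: List[Row]) -> Dict[str, List[Row]]:
    """Old shape: one {filed_date}.parquet object per distinct filed_date."""
    out: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        out[f"insider_transactions/{row['filed_date']}.parquet"].append(row)
    return dict(out)


def canonical_layout(rows: List[Row], run_id: str) -> Dict[str, List[Row]]:
    """New shape: one consolidated object per run; filed_date stays on each row."""
    return {f"insider_transactions/{run_id}_result.parquet": list(rows)}


# Hypothetical rows; real Form 4 rows carry many more columns.
rows = [
    {"ticker": "AAA", "filed_date": "2026-05-11"},
    {"ticker": "BBB", "filed_date": "2026-05-12"},
    {"ticker": "CCC", "filed_date": "2026-05-12"},
]
print(sorted(legacy_layout(rows)))           # two keys, one per filed_date
print(list(canonical_layout(rows, "2605132016")))  # one key for the whole run
```

Because filed_date survives as a row column, a consumer that needs a single filing date can still filter the consolidated artifact after loading it.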

Backward-compat shim during transition: readers try canonical first, fall back to legacy {date}.parquet if missing. After 1 Saturday SF firing under canonical shape, shim can be removed (separate cleanup PR).
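
A minimal sketch of the read-side fallback described above, assuming an in-memory mapping stands in for the object store. The function name `resolve_artifact_key`, the `run_id` field name inside latest.json, and the example prefix are hypothetical; the actual reader code in this PR may differ.

```python
import json
from typing import Mapping, Optional


def resolve_artifact_key(
    store: Mapping[str, bytes], prefix: str, legacy_date: str
) -> Optional[str]:
    """Prefer the canonical key named by latest.json; else the legacy date key."""
    sidecar = store.get(f"{prefix}/latest.json")
    if sidecar is not None:
        # Assumed sidecar field: "run_id" pointing at the most recent run.
        run_id = json.loads(sidecar)["run_id"]
        canonical = f"{prefix}/{run_id}_result.parquet"
        if canonical in store:
            return canonical
    # Transitional shim: legacy {date}.parquet shape.
    legacy = f"{prefix}/{legacy_date}.parquet"
    return legacy if legacy in store else None


canonical_store = {
    "news_aggregates/latest.json": b'{"run_id": "2605132016"}',
    "news_aggregates/2605132016_result.parquet": b"",
}
legacy_store = {"news_aggregates/2026-05-13.parquet": b""}

print(resolve_artifact_key(canonical_store, "news_aggregates", "2026-05-13"))
print(resolve_artifact_key(legacy_store, "news_aggregates", "2026-05-13"))
```

Once the shim is removed, the legacy branch would disappear and a missing sidecar would simply mean no artifact.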

Test plan

  • 1018 passing (1 skipped). Updated tests pin the canonical shape + sidecar pattern; no behavior changes outside artifact-key shape.
  • Pre-Saturday-SF migration: no production data exists under either shape yet, so this is a clean schema migration with zero data backfill.

Composes with

🤖 Generated with Claude Code

….json shape

Per architectural review, all Wave 1 parquet/JSON producers now use
the canonical alpha_engine_lib.eval_artifacts shape:
  artifact: {prefix}/{YYMMDDHHMM}.json  (or {YYMMDDHHMM}_result.parquet)
  latest:   {prefix}/latest.json   (sidecar with run_id pointer + metadata)

Why: research runs weekly, and consumers access "the latest run" by
default. The YYMMDDHHMM run_id encodes the date directly, so a
{calendar_date}/ sub-partition is redundant (per memory id 2013).
Re-runs preserve the audit trail: each run writes under its own
run_id instead of overwriting a date-keyed artifact. Composes with
eval-judge YYMMDDHHMM artifacts for a consistent system-wide
partition shape.
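
As a rough illustration of the shape above, the sketch below builds a run_id, the artifact key, and a latest.json pointer, then recovers the calendar date from the run_id. It does not call alpha_engine_lib.eval_artifacts; every function name, the sidecar field names, and the `news_aggregates` prefix are assumptions, and UTC is used only for the example timestamp.

```python
import json
from datetime import datetime, timezone
from typing import Tuple

RUN_ID_FORMAT = "%y%m%d%H%M"  # YYMMDDHHMM


def make_run_id(now: datetime) -> str:
    """Format a timestamp as a 10-digit YYMMDDHHMM run_id."""
    return now.strftime(RUN_ID_FORMAT)


def artifact_key(prefix: str, run_id: str, parquet: bool = True) -> str:
    """Build {prefix}/{run_id}_result.parquet or {prefix}/{run_id}.json."""
    suffix = "_result.parquet" if parquet else ".json"
    return f"{prefix.rstrip('/')}/{run_id}{suffix}"


def latest_sidecar(prefix: str, run_id: str, parquet: bool = True) -> Tuple[str, str]:
    """Return (key, body) for the latest.json pointer; field names are assumed."""
    body = {"run_id": run_id, "artifact_key": artifact_key(prefix, run_id, parquet)}
    return f"{prefix.rstrip('/')}/latest.json", json.dumps(body, sort_keys=True)


def run_date(run_id: str):
    """Recover the calendar date encoded in a run_id."""
    return datetime.strptime(run_id, RUN_ID_FORMAT).date()


run_id = make_run_id(datetime(2026, 5, 13, 20, 16, tzinfo=timezone.utc))
print(run_id)                                   # 2605132016
print(artifact_key("news_aggregates", run_id))  # news_aggregates/2605132016_result.parquet
print(latest_sidecar("news_aggregates", run_id)[0])
print(run_date(run_id))                         # 2026-05-13
```

The last call shows why a separate calendar-date partition adds nothing: the date is already recoverable from the key itself.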

Migrated:
  - data/derived/news_aggregates.py — write_news_aggregates_parquet
  - data/derived/analyst_revisions.py — write_revisions_parquet
                                       + load_snapshot_time_series
                                       (lists prefix + parses body
                                       payload's snapshot_date)
  - data/snapshotter/analyst_daily.py — write_snapshot_document
                                       + read_snapshot_document (lists
                                       per-ticker prefix + filters by
                                       YYMMDD run_id prefix)
  - rag/pipelines/ingest_form4.py — write_form4_parquet
                                    + ingest_for_tickers (one
                                    consolidated parquet per run, not
                                    one per filed_date; filed_date
                                    preserved as row column)
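
The read_snapshot_document entry above mentions listing a per-ticker prefix and filtering by the YYMMDD portion of the run_id. A hedged sketch of that filter follows, operating on an already-listed set of keys. The function name, the example tickers and keys, and the choice to return the last run of the day are assumptions, not the PR's actual implementation.

```python
from datetime import date
from typing import Iterable, Optional


def snapshot_key_for_day(
    keys: Iterable[str], ticker_prefix: str, day: date
) -> Optional[str]:
    """Pick the latest {YYMMDDHHMM}.json under ticker_prefix written on `day`."""
    day_prefix = day.strftime("%y%m%d")
    base = ticker_prefix.rstrip("/") + "/"
    matches = []
    for key in keys:
        if not key.startswith(base):
            continue
        name = key[len(base):]
        if name == "latest.json" or not name.endswith(".json"):
            continue  # skip the sidecar and any non-JSON objects
        run_id = name[: -len(".json")]
        if len(run_id) == 10 and run_id.isdigit() and run_id.startswith(day_prefix):
            matches.append(key)
    # Fixed-width numeric run_ids sort lexicographically in time order.
    return max(matches) if matches else None


listed = [
    "analyst_snapshots/AAA/2605130930.json",
    "analyst_snapshots/AAA/2605132016.json",
    "analyst_snapshots/AAA/2605140930.json",
    "analyst_snapshots/AAA/latest.json",
    "analyst_snapshots/BBB/2605132100.json",
]
print(snapshot_key_for_day(listed, "analyst_snapshots/AAA", date(2026, 5, 13)))
```

In a real reader the `listed` keys would come from an object-store list call on the per-ticker prefix.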

Backward-compat shim during transition: readers try canonical
key/sidecar first, fall back to legacy {date}.parquet shape if missing.
After 1 Saturday SF firing under canonical shape, shim can be removed
(separate cleanup PR).

Removed deprecated helpers:
  - news_aggregates.s3_key_for_date
  - analyst_revisions.s3_key_for_date
  - analyst_daily.s3_key_for
  - ingest_form4.s3_key_for_filed_date

Tests updated to pin the canonical shape (35 + 16 + 20 + 16 = 87
adjusted; no behavior changes outside the artifact-key shape).

Suite: 1018 passing (1 skipped).

Pre-Saturday-SF migration: since no production data has been written
under either shape yet, this PR migrates the schema cleanly with no
data backfill.

Composes with:
  - alpha-engine-config PR #164 (cadence correction)
  - alpha-engine-data PR #233 (Gate A Saturday SF wiring)
  - alpha-engine-research PR #174 (SubstrateReader — migration follows
    in PR 2)
  - alpha_engine_lib.eval_artifacts module (v0.8.0 — institutional
    canonical for artifact partitioning)

PR 2 (alpha-engine-research SubstrateReader migration) follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 6cc248b into main May 13, 2026
1 check passed
@cipher813 cipher813 deleted the feat/canonical-key-shape-migration branch May 13, 2026 20:16