feat(canonical-keys): migrate Wave 1 producers to YYMMDDHHMM + latest.json shape #234
Merged
Per architectural review, all Wave 1 parquet/JSON producers now use
the canonical alpha_engine_lib.eval_artifacts shape:
artifact: {prefix}/{YYMMDDHHMM}.json (or {YYMMDDHHMM}_result.parquet)
latest: {prefix}/latest.json (sidecar with run_id pointer + metadata)
Why: research runs weekly, and consumers read "the latest run" by
default. The YYMMDDHHMM run_id encodes the date directly, so a
{calendar_date}/ sub-partition is redundant (per memory id 2013).
Re-runs preserve the audit trail. The shape also matches the
eval-judge YYMMDDHHMM artifacts, giving a consistent partition shape
system-wide.
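A minimal sketch of the key shape described above. The helper names (`make_run_id`, `artifact_key`, `latest_sidecar`) are hypothetical and are not the `alpha_engine_lib.eval_artifacts` API. The PR does not state a time zone for run_ids, so UTC is assumed here.

```python
import json
from datetime import datetime, timezone

# Hypothetical helpers that mirror the key shape described above; they are
# not the alpha_engine_lib.eval_artifacts API. UTC run timestamps are assumed.

def make_run_id(ts: datetime) -> str:
    """Format a timezone-aware timestamp as a YYMMDDHHMM run_id in UTC."""
    return ts.astimezone(timezone.utc).strftime("%y%m%d%H%M")

def artifact_key(prefix: str, run_id: str, parquet: bool = True) -> str:
    """Parquet artifacts use {run_id}_result.parquet; JSON artifacts use {run_id}.json."""
    name = f"{run_id}_result.parquet" if parquet else f"{run_id}.json"
    return f"{prefix}/{name}"

def latest_sidecar(prefix: str, run_id: str, **metadata) -> tuple[str, str]:
    """Build the (key, body) pair for the latest.json pointer to run_id."""
    body = {"run_id": run_id, **metadata}
    return f"{prefix}/latest.json", json.dumps(body, sort_keys=True)
```

If the artifact is written before the sidecar (an assumed ordering, not stated in the PR), a reader that follows latest.json never lands on a run whose artifact is still missing.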
Migrated:
- data/derived/news_aggregates.py: write_news_aggregates_parquet
- data/derived/analyst_revisions.py: write_revisions_parquet and
  load_snapshot_time_series (lists the prefix and parses the
  snapshot_date from each body payload)
- data/snapshotter/analyst_daily.py: write_snapshot_document and
  read_snapshot_document (lists the per-ticker prefix and filters by
  YYMMDD run_id prefix)
- rag/pipelines/ingest_form4.py: write_form4_parquet and
  ingest_for_tickers (one consolidated parquet per run instead of one
  per filed_date; filed_date is preserved as a row column)
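The prefix-listing readers above can be sketched as follows, assuming the key listing is already available as plain strings (in the real readers it would come from an S3 listing). `run_id_of`, `keys_for_day`, and `newest_run_id` are hypothetical names. This only covers key-based filtering; `load_snapshot_time_series` instead parses `snapshot_date` from the payload body.

```python
import re
from typing import Iterable, Optional

# Hypothetical helpers: the real readers obtain `keys` from an S3 listing of
# the per-ticker prefix; here the listing is passed in as plain strings.
RUN_ID = re.compile(r"(\d{10})(?:_result\.parquet|\.json)")

def run_id_of(key: str) -> Optional[str]:
    """Return the YYMMDDHHMM run_id of a canonical artifact key, else None."""
    m = RUN_ID.fullmatch(key.rsplit("/", 1)[-1])
    return m.group(1) if m else None

def keys_for_day(keys: Iterable[str], yymmdd: str) -> list[str]:
    """Canonical keys whose run_id starts with YYMMDD, oldest run first."""
    hits = [(rid, k) for k in keys if (rid := run_id_of(k)) and rid.startswith(yymmdd)]
    return [k for _, k in sorted(hits)]

def newest_run_id(keys: Iterable[str]) -> Optional[str]:
    """Zero-padded YYMMDDHHMM strings sort chronologically within one century."""
    return max((rid for k in keys if (rid := run_id_of(k))), default=None)
```

Because the run_id is fixed-width and zero-padded, plain string comparison is enough to order runs. The latest.json sidecar and legacy `{date}` keys are skipped by `run_id_of`.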
Backward-compat shim during transition: readers try canonical
key/sidecar first, fall back to legacy {date}.parquet shape if missing.
After one Saturday SF firing under the canonical shape, the shim can be
removed (separate cleanup PR).
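A rough sketch of the canonical-first, legacy-fallback lookup the shim performs. A plain mapping stands in for the bucket, and `resolve_key` is a hypothetical name. The legacy key uses the `{YYYY-MM-DD}.parquet` shape described in the summary.

```python
import json
from collections.abc import Mapping
from typing import Optional

# Hypothetical transition shim: `store` stands in for the object store
# (key -> bytes); the real readers would issue S3 reads instead.

def resolve_key(store: Mapping[str, bytes], prefix: str, legacy_date: str) -> Optional[str]:
    """Prefer the canonical artifact named by latest.json; else the legacy {date}.parquet."""
    sidecar = store.get(f"{prefix}/latest.json")
    if sidecar is not None:
        run_id = json.loads(sidecar)["run_id"]
        canonical = f"{prefix}/{run_id}_result.parquet"
        if canonical in store:
            return canonical
    legacy = f"{prefix}/{legacy_date}.parquet"
    return legacy if legacy in store else None
```

Once the shim is removed, only the canonical branch remains, and a missing sidecar becomes a plain "no data" result.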
Removed deprecated helpers:
- news_aggregates.s3_key_for_date
- analyst_revisions.s3_key_for_date
- analyst_daily.s3_key_for
- ingest_form4.s3_key_for_filed_date
Tests updated to pin the canonical shape (35 + 16 + 20 + 16 = 87 tests
adjusted; no behavior changes outside the artifact-key shape).
Suite: 1018 passing (1 skipped).
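The PR's test assertions are not shown here. As an illustration only, a shape check a test could use to pin canonical keys might look like this. The pattern and `is_canonical` are hypothetical; the run_id is additionally parsed with `strptime` so impossible dates are rejected.

```python
import re
from datetime import datetime

# Hypothetical shape check; not taken from the PR's actual test suite.
CANONICAL_KEY = re.compile(
    r"(?P<prefix>.+)/(?:(?P<run_id>\d{10})(?:_result\.parquet|\.json)|latest\.json)"
)

def is_canonical(key: str) -> bool:
    """True for {prefix}/{YYMMDDHHMM}[_result.parquet|.json] or {prefix}/latest.json."""
    m = CANONICAL_KEY.fullmatch(key)
    if m is None:
        return False
    run_id = m.group("run_id")
    if run_id is None:  # the latest.json sidecar
        return True
    try:
        datetime.strptime(run_id, "%y%m%d%H%M")  # rejects e.g. month 13
    except ValueError:
        return False
    return True
```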
Pre-Saturday-SF migration: since no production data has been written
under either shape yet, this PR migrates the schema cleanly with no
data backfill.
Composes with:
- alpha-engine-config PR #164 (cadence correction)
- alpha-engine-data PR #233 (Gate A Saturday SF wiring)
- alpha-engine-research PR #174 (SubstrateReader — migration follows
in PR 2)
- alpha_engine_lib.eval_artifacts module (v0.8.0 — institutional
canonical for artifact partitioning)
PR 2 (alpha-engine-research SubstrateReader migration) follows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PR 1 of 2: migrates Wave 1 producers to the canonical `alpha_engine_lib.eval_artifacts` shape (YYMMDDHHMM run_id + `latest.json` sidecar). PR 2 (SubstrateReader migration in alpha-engine-research) follows.

Per architectural review, the lib's `eval_artifacts` module is the institutional canonical for artifact partitioning. Wave 1 originally shipped with the older `{YYYY-MM-DD}.parquet` shape, matching legacy alpha-engine-data conventions; this PR corrects to the canonical shape before the first Saturday SF firing writes any production data under the old one.

What's migrated
| Producer | Before | After |
|---|---|---|
| `news_aggregates` | `{date}.parquet` | `{YYMMDDHHMM}_result.parquet` + `latest.json` |
| `analyst_revisions` | `{as_of_date}.parquet` | `{YYMMDDHHMM}_result.parquet` + `latest.json` |
| `analyst_snapshots/{ticker}/` | `{date}.json` | `{YYMMDDHHMM}.json` + per-ticker `latest.json` |
| `insider_transactions` | `{filed_date}.parquet` (one per filed_date) | `{YYMMDDHHMM}_result.parquet` + `latest.json` (one consolidated parquet per run; filed_date preserved as a row column) |

Backward-compat shim during transition: readers try canonical first and fall back to legacy `{date}.parquet` if missing. After one Saturday SF firing under the canonical shape, the shim can be removed (separate cleanup PR).

Test plan
Composes with
- `alpha_engine_lib.eval_artifacts` v0.8.0 (institutional canonical)

🤖 Generated with Claude Code