feat(features): per-ticker risk features (Stage 2a regime-conditioning rebuild) by cipher813 · Pull Request #202 · cipher813/alpha-engine-data

cipher813 · 2026-05-10T15:30:05Z

Summary

Stage 2a of the regime-conditioning rebuild — adds 4 institutional per-ticker risk-decomposition features to features/feature_engineer.py
Each feature varies cross-sectionally on a given date (rank-norm pipeline-compatible) and captures a distinct risk dimension that the existing 6 vol features cannot split on
Cross-repo: alpha-engine-predictor Stage 2b consumes these features via the same parallel-observation pattern Stage 1b/1c uses (parallel prod_vol_risk_aug GBM + expected_move_risk_aug parallel field)
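To illustrate why cross-sectional variation matters for a rank-norm pipeline, here is a minimal sketch that percentile-ranks a feature within each date. It assumes a long-format frame with hypothetical `date` and `ticker` columns; the function name and the exact normalization are illustrative assumptions, not the pipeline's actual code.

```python
import pandas as pd


def cross_sectional_rank(df: pd.DataFrame, feature: str) -> pd.Series:
    """Percentile-rank `feature` across tickers within each date (NaN stays NaN)."""
    return df.groupby("date")[feature].rank(pct=True)


# Toy frame: beta differs by ticker; `flat` is identical for every ticker on a date.
frame = pd.DataFrame(
    {
        "date": ["2026-05-08"] * 3 + ["2026-05-11"] * 3,
        "ticker": ["AAA", "BBB", "CCC"] * 2,
        "beta_60d": [0.5, 1.0, 1.5, 1.2, 0.8, 1.0],
        "flat": [2.0, 2.0, 2.0, 3.0, 3.0, 3.0],
    }
)
frame["beta_rank"] = cross_sectional_rank(frame, "beta_60d")
frame["flat_rank"] = cross_sectional_rank(frame, "flat")
```

A column that takes the same value for every ticker on a date (like `flat`) collapses to a single tied rank, so it carries no information after ranking, whereas a per-ticker feature such as `beta_60d` keeps its ordering.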

Plan reference

Plan doc: ~/Development/alpha-engine-docs/private/regime-conditioning-260510.md (gitignored). ROADMAP entry in alpha-engine-config #103.

New features

| Feature | What it captures | Compute |
| --- | --- | --- |
| `beta_60d` | Systematic market exposure | 60d rolling cov(stock, SPY) / var(SPY) on log-returns. NaN if SPY unavailable. |
| `idio_vol_60d` | Idiosyncratic (non-market) risk | 60d rolling std of `(stock_log_return − beta × spy_log_return) × sqrt(252)` |
| `vol_of_vol_30d` | Vol regime stability | 30d rolling std of `realized_vol_20d` |
| `max_drawdown_60d` | Recent left-tail risk | Worst peak-to-trough within trailing 60d window. Distinct from `dist_from_52w_high`, which is current depth-from-rolling-252-high |

FEATURE_CFG additions: beta_window=60, vol_of_vol_window=30, max_drawdown_window=60.
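The four definitions above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the code in features/feature_engineer.py: the function name `risk_features`, the `CFG` dict, and the definition of `realized_vol_20d` (20d rolling std of log-returns, annualized) are all assumed here.

```python
from typing import Optional

import numpy as np
import pandas as pd

# Hypothetical config mirroring the FEATURE_CFG additions named in this PR.
CFG = {"beta_window": 60, "vol_of_vol_window": 30, "max_drawdown_window": 60}


def _worst_drawdown(prices: np.ndarray) -> float:
    """Deepest peak-to-trough decline inside one window (always <= 0)."""
    running_peak = np.maximum.accumulate(prices)
    return float((prices / running_peak - 1.0).min())


def risk_features(close: pd.Series, spy_close: Optional[pd.Series]) -> pd.DataFrame:
    """Compute the four risk features for one ticker from daily closes."""
    ret = np.log(close).diff()
    out = pd.DataFrame(index=close.index)
    w = CFG["beta_window"]

    if spy_close is None:
        # Matches the PR's stated behavior: NaN when SPY is unavailable.
        out["beta_60d"] = np.nan
        out["idio_vol_60d"] = np.nan
    else:
        spy_ret = np.log(spy_close.reindex(close.index)).diff()
        beta = ret.rolling(w).cov(spy_ret) / spy_ret.rolling(w).var()
        residual = ret - beta * spy_ret
        out["beta_60d"] = beta
        out["idio_vol_60d"] = residual.rolling(w).std() * np.sqrt(252)

    # Assumed definition of realized_vol_20d: annualized 20d std of log-returns.
    realized_vol_20d = ret.rolling(20).std() * np.sqrt(252)
    out["vol_of_vol_30d"] = realized_vol_20d.rolling(CFG["vol_of_vol_window"]).std()
    out["max_drawdown_60d"] = close.rolling(CFG["max_drawdown_window"]).apply(
        _worst_drawdown, raw=True
    )
    return out
```

Because the residual uses the rolling beta at each date, `idio_vol_60d` in this sketch needs roughly two windows of history before it produces a value.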

Stage sequencing

Stage 0a: Triple-barrier label generic module (predictor #116, merged)
Stage 0b: Variant cutover gate (predictor #117, merged)
Stage 0c: Retire V2 + RegimeConditionedMeta (predictor #118, merged)
Stage 1a: Per-feature normalization substrate (predictor #119, merged)
Stage 1b: Parallel vol-with-macros training (predictor #120, merged)
Stage 1c: Wire inference parallel observation (predictor #121, awaiting merge)
Stage 2a: Risk features in feature_engineer (this PR, alpha-engine-data)
Stage 2b: Parallel prod_vol_risk_aug GBM training + inference (alpha-engine-predictor)
Stage 1d / 2d: Cutovers gated on variant_cutover_gate (≥15% relative IC lift)
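The cutover criterion is simple arithmetic, sketched below. The function names and the handling of a non-positive baseline are assumptions made for illustration; the actual variant_cutover_gate lives in alpha-engine-predictor and may differ.

```python
def relative_ic_lift(baseline_ic: float, variant_ic: float) -> float:
    """Relative lift of the variant's IC over the plain baseline's IC."""
    if baseline_ic <= 0:
        # Assumption: a relative lift is not meaningful for a non-positive baseline.
        raise ValueError("relative lift is undefined for a non-positive baseline IC")
    return (variant_ic - baseline_ic) / baseline_ic


def passes_cutover_gate(
    baseline_ic: float, variant_ic: float, min_relative_lift: float = 0.15
) -> bool:
    """True when the variant clears the >=15% relative IC lift threshold."""
    return relative_ic_lift(baseline_ic, variant_ic) >= min_relative_lift


# Illustrative numbers only: baseline IC 0.040 vs. two candidate variants.
strong_variant = passes_cutover_gate(0.040, 0.047)  # lift = 17.5%
weak_variant = passes_cutover_gate(0.040, 0.045)    # lift = 12.5%
```

With these made-up ICs, the first variant would clear the gate and the second would not.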

Operator follow-up (post-merge)

The new feature columns will appear in compute_features() output starting on the next Saturday SF firing of feature_engineer. Historical rows in ArcticDB universe library need a one-shot backfill before Stage 2b's training can consume them. Backfill command (to be confirmed against the existing universe library backfill pattern):

# Per the existing universe-library backfill discipline; exact command TBD
# alongside Stage 2b sequencing
python -m features.compute --backfill --columns beta_60d,idio_vol_60d,vol_of_vol_30d,max_drawdown_60d

The Stage 2b PR description will include the canonical backfill invocation once the predictor side is scoped.

Test plan

16 new tests covering: schema contract (FEATURES + compute_features output), beta-of-self=1, beta-of-independent-series≈0, idio_vol-of-self=0, monotone-increasing→drawdown=0, post-decline→drawdown<-0.15, NaN-when-no-SPY for beta+idio_vol, vol_of_vol nonnegative
Full data suite: 644 passed + 1 skipped (16 new + 628 unchanged)
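Two of the invariants listed above could be written as pytest-style checks along these lines. The helpers `rolling_beta` and `trailing_max_drawdown` are hypothetical stand-ins for the real feature code, and the tolerance on the independent-series beta is an assumption chosen to allow for sampling noise in a 60-sample window.

```python
import numpy as np
import pandas as pd

WINDOW = 60  # matches beta_window / max_drawdown_window in this PR


def rolling_beta(stock_ret: pd.Series, mkt_ret: pd.Series) -> pd.Series:
    """Rolling cov(stock, market) / var(market)."""
    return stock_ret.rolling(WINDOW).cov(mkt_ret) / mkt_ret.rolling(WINDOW).var()


def trailing_max_drawdown(close: pd.Series) -> pd.Series:
    """Worst peak-to-trough decline inside each trailing window."""
    def worst(prices: np.ndarray) -> float:
        return float((prices / np.maximum.accumulate(prices) - 1.0).min())

    return close.rolling(WINDOW).apply(worst, raw=True)


def test_beta_of_independent_series_is_near_zero():
    rng = np.random.default_rng(0)
    a = pd.Series(rng.normal(0, 0.01, 2000))
    b = pd.Series(rng.normal(0, 0.01, 2000))
    # Average over many windows; a single 60d beta is noisy by construction.
    assert abs(rolling_beta(a, b).dropna().mean()) < 0.15


def test_post_decline_drawdown_is_below_threshold():
    # 100 flat days, then a 20% step down that is still inside the last window.
    close = pd.Series([100.0] * 100 + [80.0] * 20)
    assert trailing_max_drawdown(close).iloc[-1] < -0.15
```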

🤖 Generated with Claude Code

…oning rebuild)

Stage 2a of the regime-conditioning rebuild (plan doc: alpha-engine-docs/private/regime-conditioning-260510.md). Adds 4 institutional risk-decomposition features to features/feature_engineer.py that capture per-ticker risk dimensions distinct from the existing volatility features. Each varies cross-sectionally on a given date (rank-norm pipeline-compatible) and gives the volatility GBM new splits the existing 6 vol features cannot make.

Per-feature definitions:

- beta_60d: 60d rolling regression slope of stock log-returns vs SPY log-returns. Systematic market exposure. NaN when spy_series is unavailable.
- idio_vol_60d: Residual vol after removing market-beta exposure. residual = stock_log_return - beta * spy_log_return; 60d rolling std × sqrt(252). Idiosyncratic risk.
- vol_of_vol_30d: 30d rolling stdev of realized_vol_20d. Stability of vol regime; high values signal vol-regime instability.
- max_drawdown_60d: Worst peak-to-trough drawdown within trailing 60d window. Distinct from dist_from_52w_high (current depth from rolling-252-high): captures the deepest drawdown that occurred during the recent 60d, even if the stock has since recovered. Always non-positive.

These four features are consumed by Stage 2b (alpha-engine-predictor), which trains a parallel macro+risk-augmented volatility GBM (prod_vol_risk_aug) alongside the plain vol GBM and the vol_macro_aug variant from Stage 1b. Each variant is independently gated by the variant_cutover_gate (≥15% relative IC lift over plain baseline) before any cutover.

FEATURE_CFG additions: beta_window=60, vol_of_vol_window=30, max_drawdown_window=60.

644 tests pass + 1 skip (16 new tests for the 4 risk features: schema contract, beta-of-self=1, beta-of-independent=0, idio_vol-of-self=0, monotone-increasing→drawdown=0, post-decline→drawdown<0, NaN-when-no-SPY).

Operator follow-up: ArcticDB universe library backfill for the 4 new columns. Saturday SF feature_engineer run will start writing them on the next firing; historical rows need a one-shot backfill script invocation against the universe library before Stage 2b training can consume them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds 63-day realized volatility (per-ticker) to the v3.2 risk feature set. Pairs with the existing realized_vol_20d to give the volatility GBM a vol-term-structure signal: a steep upward slope in the (20d, 63d) pair indicates vol expansion; a flat or inverted slope indicates a mean-reverting regime. Captures a slower vol regime than 20d. Trees can split on the relationship between short and medium-window vol naturally.

3 new tests (warmup, non-negativity, smoother-than-20d).

Stage 2a-extended scope discipline: this is the only piece of the expanded macro/risk set that lives in alpha-engine-data. The 200d breadth, VIX/VIX3M ratio, 10Y-2Y curve, and HY OAS features all live in alpha-engine-predictor's regime_predictor.build_features() (macros) or require new daily_closes.py time-series ingestion (DGS2 + HY OAS). Those are scoped as Stage 2.5 (data-side ingestion) + Stage 2c (predictor-side wiring).

19 risk feature tests pass total (16 pre-existing + 3 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
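A minimal sketch of the (20d, 63d) pair, assuming realized vol is the annualized rolling std of daily log-returns. The helper names and the explicit ratio column are illustrative assumptions; the commit only describes adding the 63d column and letting the trees learn the relationship.

```python
import numpy as np
import pandas as pd


def realized_vol(close: pd.Series, window: int) -> pd.Series:
    """Annualized rolling std of daily log-returns (assumed definition)."""
    log_ret = np.log(close).diff()
    return log_ret.rolling(window).std() * np.sqrt(252)


def vol_term_structure(close: pd.Series) -> pd.DataFrame:
    out = pd.DataFrame(
        {
            "realized_vol_20d": realized_vol(close, 20),
            "realized_vol_63d": realized_vol(close, 63),
        }
    )
    # Hypothetical helper column, for inspection only: values above 1 mean
    # the recent 20d vol exceeds the slower 63d level.
    out["vol_ratio_20d_63d"] = out["realized_vol_20d"] / out["realized_vol_63d"]
    return out
```

In this sketch the 63d column is NaN for the first 63 rows (one row is lost to the return calculation, then 63 returns are needed), which is the kind of warmup behavior the new tests are described as covering.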

…ht + EvalJudge/Rationale/Replay/CF dry_run_llm) (#263)

Closes the keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run EVERY substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos.

Per-state mechanism:

| State | Type | Mechanism (under shell_run) |
|---|---|---|
| DriftDetection | spot | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) |
| EvalJudgeSubmitFirstSaturday | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeSubmitWeekly | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgePoll | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeProcess | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| RationaleClustering | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| ReplayConcordance | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |
| Counterfactual | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |

Exact canonical dry var: $.research_dry. It is THE canonical shell-run LLM-dry signal: InitializeInput seeds it false on every run (so the absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented: research #202 / backtester #225 PR bodies specify dry_run_llm, and reusing $.research_dry keeps the absent-path guarantee automatic (no extra seeding needed; the seed already exists).

Changes:

- ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob. It now force-sets ZERO skip_*. Per-flag user overrides still win (merge order unchanged). The Choice-gated CheckSkip<State> gates are LEFT INTACT (still valid for targeted operator skips; verified by test_skip_gates_still_intact).
- DriftDetection: literal `commands` array → `commands.$` States.Array whose final entry is States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args). {} sits immediately after the script token with no literal space; preflight_args carries its leading space inside the var, so preflight_args="" reproduces the origin/main command char-for-char and " --preflight-only" yields exactly one separating space.
- 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched: it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope).

Byte-identical proof approach:

- shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich (unchanged); InitializeInput seeds preflight_args="", research_dry=false. Every spot States.Format resolves char-for-char to the frozen origin/main literal; every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire).
- The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands.json now INCLUDES DriftDetection's pre-rewire origin/main literal command (regenerated via the established generator at preflight_args=""; the existing 7 entries are unchanged). The byte-identical test asserts DriftDetection's resolved command at preflight_args="" equals that frozen baseline and carries --preflight-only (single space) under shell_run.
- CI-safe: tests read only the committed fixture (no `git show origin/main` shell-out; that was the #260 CI failure).

Tests:

- _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set.
- test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob force-sets no skip_* and none of the 16 workload skips (incl. the 5 ex-exceptions) appear.
- TestHappyPathTraversal: under shell_run nothing is skipped (skipped == set()); DriftDetection is VISITED (runs dry), not jumped past.
- Module + class docstrings updated to the rewire semantics.

JSON valid (58 top-level states, 91 incl. parallel branches). Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed.

Zero skip-exceptions remain: every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
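The leading-space trick for preflight_args can be checked with a small Python emulation of positional `{}` substitution. This is a sketch only: `states_format` is a hypothetical stand-in for the Step Functions States.Format intrinsic (it does not handle escaping), and the command string assumes the script path is `infrastructure/spot_drift_detection.sh` with no embedded space.

```python
def states_format(template: str, *args: str) -> str:
    """Emulate positional `{}` substitution for plain string arguments."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder count does not match argument count")
    # Interleave template pieces with arguments; the last piece gets no argument.
    return "".join(piece + arg for piece, arg in zip(parts, args + ("",)))


TEMPLATE = (
    "bash infrastructure/spot_drift_detection.sh{} 2>&1 "
    "| tee /var/log/drift-detection.log"
)
BASELINE = (
    "bash infrastructure/spot_drift_detection.sh 2>&1 "
    "| tee /var/log/drift-detection.log"
)

normal_run = states_format(TEMPLATE, "")                  # shell_run absent
shell_run = states_format(TEMPLATE, " --preflight-only")  # under shell_run
```

With an empty argument the result equals the baseline string character for character; with the leading-space argument the flag is separated from the script name by exactly one space, which is the property the byte-identical test is described as asserting.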

cipher813 and others added 2 commits May 10, 2026 08:29

cipher813 mentioned this pull request May 10, 2026

feat(collectors): FRED date-range history fetcher (Stage 2.5b) #204

Merged

9 tasks

cipher813 merged commit 479a5f0 into main May 10, 2026
1 check passed

cipher813 deleted the feat/risk-features-stage-2a branch May 10, 2026 18:11

This was referenced May 10, 2026

feat(collectors): BAA10Y full-history credit-regime indicator (Stage 2.5c) #205

Merged

docs(config): banner on config.yaml.example + delete stale flow-doctor.yaml.example #225

Merged

cipher813 mentioned this pull request May 18, 2026

feat(sf): rewire last 5 skip-exceptions → dry (DriftDetection preflight + EvalJudge/Rationale/Replay/CF dry_run_llm) #263

Merged

cipher813 merged 2 commits into main from feat/risk-features-stage-2a
