
refactor(iam): take ownership of shared orchestration roles #172

Merged
cipher813 merged 3 commits into main from refactor/move-shared-iam-to-data
May 6, 2026

Conversation

@cipher813
Owner

@cipher813 cipher813 commented May 6, 2026

Summary

Move the codified inline policies for alpha-engine-step-functions-role and alpha-engine-eventbridge-sfn-role from alpha-engine/infrastructure/iam/ to this repo. Their grants are derived from code that lives here (the Lambda invoke targets in the SF JSON, the EC2 instances the SF commands via SSM, and the EventBridge SFN target ARNs). Co-locating the policies with that code means a single PR can change SF behavior and the matching IAM atomically.

Also drops the last surviving inline put-role-policy block from deploy_step_function.sh (Saturday script — PR #170 dropped its EB-SFN twin, PR #151 dropped the daily script's SF role twin; this completes the trio) and scopes add-ssm-policy.sh to non-codified Lambda execution roles only.

What's added

infrastructure/iam/
├── alpha-engine-step-functions-role.json    # NEW
├── alpha-engine-eventbridge-sfn-role.json   # NEW
├── github-actions-lambda-deploy.json        # was already here
├── apply.sh                                 # was already here
├── check-drift.py                           # NEW (flat-layout variant)
└── README.md                                # NEW

.github/workflows/iam-drift-check.yml         # NEW
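
As a rough illustration of what a flat-layout drift check might do, the sketch below loads one JSON policy per role from a single directory (file name = role name, as in the tree above) and compares each against a live document after normalizing both. It is not the contents of check-drift.py: the function names `load_codified`, `normalize`, and `find_drift` are hypothetical, and fetching the live documents (for example with boto3's `get_role_policy`) is left out so the block runs offline.

```python
import json
from pathlib import Path


def load_codified(directory: Path) -> dict[str, dict]:
    """Flat layout: every *.json file is one role's policy, keyed by file stem."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(directory.glob("*.json"))}


def normalize(policy: dict) -> str:
    """Return a canonical JSON string so ordering differences do not count as drift."""

    def canon(node):
        if isinstance(node, dict):
            return {k: canon(v) for k, v in sorted(node.items())}
        if isinstance(node, list):
            # Statement, Action and Resource lists are unordered in IAM,
            # so sort elements by their serialized form.
            return sorted((canon(v) for v in node), key=lambda v: json.dumps(v, sort_keys=True))
        return node

    return json.dumps(canon(policy), sort_keys=True)


def find_drift(codified: dict[str, dict], live: dict[str, dict]) -> list[str]:
    """List role names whose live policy is missing or differs from the codified one."""
    drifted = []
    for role, doc in codified.items():
        if role not in live or normalize(doc) != normalize(live[role]):
            drifted.append(role)
    return sorted(drifted)
```

A caller would build `live` from AWS, pass both mappings to `find_drift`, and exit non-zero when the returned list is non-empty. This sketch does not treat a scalar `"Action": "x"` and a one-element list `["x"]` as equal; a real checker may want to.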

Live operations performed

OIDC trust policy on github-actions-iam-drift-check widened live to also permit repo:cipher813/alpha-engine-data:* (added 2 sub patterns alongside the existing 2 for alpha-engine). Trust policies + role creation stay out-of-band per existing convention.
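
To illustrate what the widened trust policy permits, the sketch below mimics IAM's `StringLike` wildcard matching (`*` for any run of characters, `?` for exactly one) against GitHub OIDC `sub` claims. Only `repo:cipher813/alpha-engine-data:*` comes from the description above. The `alpha-engine` pattern and the sample claims are assumed for illustration, and the exact two patterns per repo in the real policy are not shown here.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def sub_allowed(sub: str, patterns: list[str]) -> bool:
    """True if the token's sub claim matches at least one allowed pattern."""
    return any(string_like(p, sub) for p in patterns)


ALLOWED_SUBS = [
    "repo:cipher813/alpha-engine:*",       # assumed shape of an existing pattern
    "repo:cipher813/alpha-engine-data:*",  # pattern named in the PR description
]
```

Because each pattern ends in `:*` rather than `*`, the `alpha-engine` entry does not also match `alpha-engine-data` or `alpha-engine-predictor`; each repo needs its own entry, which is why the trust policy had to be widened.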

Companion / supersedes

alpha-engine #137 (companion): removes the codified directories and updates the cross-repo foreign-writer guard.
alpha-engine-data #171 (superseded): only addressed the Saturday script's inline write.

Predictor follow-up (NOT in this PR)

add-ssm-policy.sh's ROLES list drops alpha-engine-predictor-role along with alpha-engine-executor-role. Executor's alpha-engine-ssm-read is already codified in alpha-engine. Predictor's live alpha-engine-ssm-read inline grant stays live (nothing deletes it) but is no longer maintained by this script. Codifying it on the predictor side is a separate small PR — alpha-engine-predictor's infrastructure/iam/ would need a directory-per-role refactor first to support multiple inline policies on the same role.

Test plan

  • bash -n syntax-clean on both deploy scripts.
  • python3 infrastructure/iam/check-drift.py clean against live AWS for all 3 roles owned here (lambda-deploy, SF, EB-SFN).
  • Foreign-writer guard from companion PR runs clean against this branch.
  • CI runs both checks on this PR.
  • Saturday SF auto-fire (Sat 2026-05-09) confirms no regression.
  • Weekday SF auto-fire (Thu 2026-05-07) confirms no regression.

🤖 Generated with Claude Code

Step Functions execution role + EventBridge cron role move here from
alpha-engine/infrastructure/iam/. Their grants are derived from code
that lives in this repo (SF JSON Lambda invoke targets, EC2 instances
the SF SSMs, EventBridge SFN target ARNs) — co-locating the codified
IAM with the source of those grants tightens the coupling so a single
PR can change SF behavior + matching IAM atomically.

Files added:
- infrastructure/iam/alpha-engine-step-functions-role.json
- infrastructure/iam/alpha-engine-eventbridge-sfn-role.json
- infrastructure/iam/check-drift.py (flat-layout variant of
  alpha-engine's directory-per-role drift checker)
- infrastructure/iam/README.md (documents which roles this repo owns
  + the single-writer rule)
- .github/workflows/iam-drift-check.yml (PR + daily 09:30 UTC + manual)

Files updated:
- infrastructure/deploy_step_function.sh — drops the surviving inline
  put-role-policy block against alpha-engine-step-functions-role
  (PR #170 dropped the EB-SFN twin; PR #151 dropped the daily-script
  twin; this completes the trio). The script kept a stale narrower
  policy that clobbered ssm:DescribeInstanceInformation +
  ec2:StopInstances + the trading-instance SSM ARN every Saturday
  deploy. Trust policy + create-role bootstrap stay (one-time setup).
- infrastructure/add-ssm-policy.sh — drops alpha-engine-executor-role
  + alpha-engine-predictor-role from the ROLES list. Both now have
  alpha-engine-ssm-read codified in their home repos (executor:
  already; predictor: covered by separate PR codifying its existing
  live grant). Script remains the writer for non-codified Lambda
  execution roles only.
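
The single-writer rule behind these script changes can be checked mechanically. The sketch below is one hypothetical way to flag shell scripts that still call `aws iam put-role-policy` against a role whose policy is codified under infrastructure/iam/. It is not the foreign-writer guard from the companion PR; `CODIFIED_ROLES`, `find_foreign_writes`, and the regular expression are illustrative assumptions.

```python
import re

# Roles whose inline policies are owned by infrastructure/iam/ in this repo.
CODIFIED_ROLES = {
    "alpha-engine-step-functions-role",
    "alpha-engine-eventbridge-sfn-role",
}

# Capture the literal role name passed to `put-role-policy ... --role-name <name>`.
_PUT_POLICY = re.compile(
    r"put-role-policy\b[^;&|]*?--role-name[ =]+[\"']?([\w+=,.@-]+)"
)


def find_foreign_writes(script_text: str) -> list[str]:
    """Return codified roles that the script writes an inline policy to."""
    # Join backslash-continued lines so a multi-line aws command is scanned as one.
    joined = re.sub(r"\\\n\s*", " ", script_text)
    hits = {m.group(1) for m in _PUT_POLICY.finditer(joined)}
    return sorted(hits & CODIFIED_ROLES)
```

A scan like this only sees literal role names. A script that passes the role through a shell variable would slip past it, so it complements the drift check rather than replacing it.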

OIDC trust policy on github-actions-iam-drift-check widened live to
also permit repo:cipher813/alpha-engine-data so the new drift-check
workflow here can authenticate with the existing OIDC role.

Companion PRs:
- alpha-engine #137 — removes the codified directories, updates the
  cross-repo foreign-writer guard.
- alpha-engine-predictor (separate) — codifies existing live
  alpha-engine-ssm-read grant on predictor-role.

Supersedes alpha-engine-data #171 (which only addressed the Saturday
script's inline write).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit f6818da into main May 6, 2026
2 checks passed
@cipher813 cipher813 deleted the refactor/move-shared-iam-to-data branch May 6, 2026 15:03
cipher813 added a commit that referenced this pull request May 13, 2026
…; requires lib v0.15.0) (#226)

Wave 1 PR β of the institutional data-revamp arc (plan doc:
~/Development/alpha-engine-docs/private/data-revamp-260513.md).

Producer-side concrete adapter implementations + multi-source
aggregator. Pairs with alpha-engine-lib PR #46 (PR α, v0.15.0) which
defined NewsSource Protocol + NewsArticle shape.

Architectural pattern: data is the producer; lib defines the contract;
research is the consumer (will read producer outputs via S3 + RAG
retrieval in future sub-PRs, never imports adapters directly).

New modules:

  collectors/news_sources/
    polygon.py     — FREE. Uses our existing polygon_client (data
                     repo's copy) for rate-limit reuse. Normalizes
                     /v2/reference/news.
    gdelt.py       — FREE (no key). GDELT 2.0 DOC API; academic-grade
                     event-extracted news. Requires ticker→name map
                     for query building.
    yahoo_rss.py   — FREE (fallback). Pure feedparser-based; matches
                     existing collectors/alternative.py pattern but
                     normalized into NewsArticle.
    benzinga.py    — PAID stub. Raises NotImplementedError on init.
    ravenpack.py   — PAID stub.
    bloomberg.py   — PAID stub.

  collectors/news_aggregator.py
    NewsAggregator(sources, trust_weights) — fan-in across enabled
    NewsSource adapters → dedup (composite fingerprint: normalized
    title + URL path hash with querystring/fragment stripped) →
    preserve all source-provenance variants → return
    AggregatedNewsArticle records sorted by earliest_published_at desc.
    DEFAULT_TRUST_WEIGHTS: paid 0.95-1.0, polygon 0.9, gdelt 0.85,
    edgar_press 0.95, yahoo_rss 0.5.

Lib pin bumps (lockstep, both must move per the pin-lockstep test):
  requirements.txt v0.12.0 → v0.15.0
  Dockerfile       v0.12.0 → v0.15.0

What's deferred (subsequent Wave 1 sub-PRs):
  PR A.1 — NLP pipeline (Loughran-McDonald + FinBERT + spaCy NER +
           LLM event extraction). Heavier deps; separate PR.
  PR A.2 — Structured aggregates writer (S3 parquet per ticker per
           day). Joined onto research's snapshot in PR F.
  PR A.3 — RAG ingest path: news → chunked → embedded → indexed in
           pgvector alongside existing SEC filings corpus.
  PR B   — Filings substrate expansion (EDGAR full coverage:
           10-K/Q/14A/S-1/13D/G/13F/Form-4).
  PR C   — Analyst substrate (yfinance + FMP adapters + self-derived
           revisions tracking from daily snapshots).
  PR D   — Async + S3 cache + per-vendor rate limiters.
  PR E   — Wire RAG retrieval tools into research repo's
           thesis_update + sector agents.
  PR F   — Wire new substrate into research's fetch_data (supersedes
           #170's per-ticker pre-fetch).

+37 unit tests:
  - Protocol structural-subtyping for all 3 free adapters
  - Polygon: happy + transient-failure-per-ticker + schema-drift-skip
  - GDELT: happy + query building (multi-word vs single-word) +
    failure-skips-ticker + missing-name-map-fallback
  - Yahoo RSS: happy + entries-older-than-cutoff-dropped +
    no-link-skipped + fetch-failure-skips-ticker
  - Paid stubs: all 3 raise NotImplementedError on init
  - Aggregator: fan-in + URL/title dedup + canonical-title-longest +
    canonical-url-highest-trust + ticker-union + one-broken-source-
    isolated + output-sorted-desc + empty-fan-in
  - Trust weights: defaults + overrides + unknown-source-defaults-half
  - Fingerprint determinism
  - Lib shape contract pin (extra='forbid' + frozen)
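
For readers unfamiliar with the first and fifth test groups, here is a minimal sketch of structural subtyping against a Protocol and of a stub that fails on construction. `NewsSourceLike` is a hypothetical stand-in; the real NewsSource Protocol lives in alpha-engine-lib and its method names and signatures are not shown in this message, so the `fetch` signature and both classes below are assumptions.

```python
from datetime import datetime
from typing import Protocol, runtime_checkable


@runtime_checkable
class NewsSourceLike(Protocol):
    """Hypothetical stand-in for the lib's NewsSource Protocol."""

    def fetch(self, tickers: list[str], since: datetime) -> list[dict]: ...


class FreeAdapterSketch:
    """Satisfies the protocol structurally; it does not inherit from it."""

    def fetch(self, tickers: list[str], since: datetime) -> list[dict]:
        return []  # a real adapter would call its vendor API here


class PaidStubSketch:
    """Placeholder for a paid vendor: constructing it fails loudly."""

    def __init__(self) -> None:
        raise NotImplementedError("paid source is not implemented yet")

    def fetch(self, tickers: list[str], since: datetime) -> list[dict]:
        raise NotImplementedError
```

Note that a runtime `isinstance` check against a `runtime_checkable` Protocol only confirms the method exists, not that its signature matches, so signature conformance still has to come from a type checker or explicit tests.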

Suite: 848 passing.

Composes with:
  - alpha-engine-lib PR #46 (v0.15.0) — required for shapes + Protocols
  - alpha-engine-research PR #172 (CLOSED) — original mis-located
    substrate; relocated here per architectural correction

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 participant