Merged
5 changes: 4 additions & 1 deletion .github/workflows/compute-rankings.yml
@@ -46,7 +46,10 @@ jobs:
       - name: Install
         run: |
           python -m pip install --upgrade pip
-          pip install -e .
+          # Phase 4h: weekly compute imports compute/ingest/osap.py which
+          # imports the `openassetpricing` package — installed via the
+          # `factors` extra (pinned to ==0.0.2 in pyproject.toml).
+          pip install -e ".[factors]"

       - name: Compute current quarter id
         id: quarter
16 changes: 9 additions & 7 deletions CLAUDE.md
@@ -153,14 +153,16 @@ non-connector-bound work.

## Phase status

-Current schema: **`0.8.0-phase4.5f`** · Defense layer: **17**
-(7 active vetoes + 10 annotates + 5 numerical guards +
-`manipulation_index` rollup). Latest release tag:
+Current schema: **`0.9.0-phase4h`** (bumped from `0.8.0-phase4.5f` in
+PR #112). Defense layer: **17** (7 active vetoes + 10 annotates + 5
+numerical guards + `manipulation_index` rollup) — Phase 4h adds
+observability surface, no new veto. Latest release tag:
 [**`v1.2.0-phase4.5`**](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5)
-**SHIPPED 2026-05-17** at commit `6d414a9b` — **Phase 4.5 cluster
-✅ complete** (6 sub-PRs). Production verified run #51
-(`b1588b2a`, 5m14s warm-cache). Test suite: 856 offline + 17
-`@network`.
+shipped 2026-05-17 at commit `6d414a9b`. **Phase 4h in flight in PR
+#112** — OSAP signal replication (factor-exposure proxy) + PBO/DSR
+hard gate (PR #60 reuse) + rolling-12m IC observability + Path-b
+composite × OSAP blend (50/50 default, Top-5 still ranks raw
+composite per Rule 16). Test suite: 906 offline + 19 `@network`.

**Next deliverable** (pick by appetite — three tracks parallelize):
**4.5e** (Form 4 insider, ~3w → v1.3.0) · **4h/4i/4j/4k** factor
2 changes: 1 addition & 1 deletion PHASE_STATUS.md
@@ -6,7 +6,7 @@
| 1 | Universe + prices ingestion | ✅ DONE — 2026-05-08 |
| 2 | Fundamentals via SEC EDGAR | ✅ DONE — 2026-05-08 |
| 3 | Classical features + composite + **defenses** → **v1.0** | ✅ **DONE — 2026-05-14** (v1.0.0 tagged + GitHub release) |
-| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; PR 4b §3 IC-decay output deferred to Phase 5; **next: 4h / 4i / 4j / 4k factor integrations** (PBO/DSR gate ready), can run in parallel with Phase 4.5 |
+| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; **PR #112 (Phase 4h)** ships OSAP signal replication + PBO/DSR gate + Path-b 50/50 blend (schema bump `0.8.0-phase4.5f` → `0.9.0-phase4h`, no new veto — annotate-only blend, Top-5 still ranks raw composite per Rule 16, 5-commit cluster on `claude/resume-quantrank-phase-4.5-Zh0pO`); 4i/4j/4k pending; PR 4b §3 IC-decay output deferred to Phase 5 |
| **4.5** | **Earnings-manipulation defense cluster** → **v1.2** | ✅ **DONE 2026-05-17** — **tag [`v1.2.0-phase4.5`](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5) cut** at commit `6d414a9b`. 6 sub-PRs (#89/#90/#91 + #93 + #95 + #97 + #100). Active vetoes **5 → 7**; defense layer **9 → 17** (= 7 vetoes + 10 annotates). 4.5f adds `manipulation_index` (0-100 rollup) + `composite_score_adjusted` (soft penalty, max 10 pts, informational only) + `ManipulationRiskCard` UI + schema bump **`0.7.1-phase4g` → `0.8.0-phase4.5f`**. Production verified run #51 (`b1588b2a`, 5m14s warm-cache): card fires on 158/502 (31.5%); HIGH band 2 (SMCI=84 · WAT=64), MODERATE 60, LOW 96. 4.5e Form-4 insider clustering **deferred to v1.3.0** — reserved-slot weights already declared in `FLAG_WEIGHTS`. |
| 5 | ML meta-learner (Triple-Barrier + Meta-Labeling + Conformal) + SHAP | ⚪ not started |
| 6 | Sentiment v2 (FinBERT + Whisper + 8-K Lazy Prices) | ⚪ not started |
1 change: 1 addition & 0 deletions SKILL.md
@@ -304,6 +304,7 @@ Schema versions:
| `0.7.0-phase4g` | Phase 4g | **8-K Tier-2 event defenses re-enabled** (PR #79, merged 2026-05-15 on `c35c6d40`, closes [issue #14](https://github.com/dackclup/quantrank/issues/14)). Flipped `compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED = True` after the PR 3d workflow-timeout deferral (root cause cleared by PR #58 cache layers + PR 3d tenacity tightening). `non_reliance_filing` (Item 4.02 hard veto, 365d lookback, Schroeder 2024 SSRN — ~50% of 4.02 filings precede formal restatement) returns to the active layer as the **5th active veto**. `auditor_change` (Item 4.01 annotate, 730d lookback, Reg S-K Item 304, Cohen-Malloy-Nguyen 2020 type) joins the Tier-2 annotate surface. No data-schema-shape delta — only the feature-flag flip + reason-taxonomy expansion. |
| `0.7.1-phase4g` | Phase 4g | **`price_change_1d_pct` additive field** (squash-merged via PR #80, commit `1509f707`). New optional `float \| None` field on `StockSummary` + `StockDetail` — day-over-day percent change from the prior trading-day close. Computed once in `compute/main.py:_fetch_prices_one` from the last two valid yfinance closes; null for newly-IPO'd tickers (only one close available). Lets the ranking-table mobile cards render a change pill without lazy-fetching 502 per-stock history JSONs. Per `phase-4/schema-versioning/PLAN.md`: "Add a new optional field (default = None) → patch". Production metadata.version stays `0.7.0-phase4g` until next weekly compute. |
| `0.7.1-phase4g` (no schema delta) | Phase 4.5a-4.5d wave | **Earnings-manipulation defense cluster — sub-PRs 4.5a + 4.5b + 4.5c + 4.5d shipped 2026-05-16/17** (PRs #89/#90/#91 + #93 + #95 + #97). **No data-schema-shape delta** — all 9 new flag identifiers are strings appended to existing `risk_flags: list[str]` (active vetoes) + `valuation_warnings: list[str]` (annotates) arrays. Active vetoes **5 → 7**: + `beneish_manipulation_veto` (Beneish 1999, M > −1.78) + `dechow_manipulation_veto` (Dechow 2011, F > 3.0). Annotates added: `manipulation_triple_flag` (4.5a joint gate, 2 fired: SMCI · WAT), `restatement_history` (4.5b, 59 fired / 11.8% — Hennes-Leone-Miller 2008 *TAR*), `late_filing_notification` (4.5b, 2 fired: HAS · Q — Bartov-Lai-Yeung 2002 *JAR*), `rem_suspect` (4.5c, 16 fired / 3.2% — Roychowdhury 2006 *JAE* 3-proxy REM via per-sector OLS), `accruals_momentum_high` (4.5d, 50 fired / 10.0% — Sloan 1996 / Beneish 1999 Δ(TATA) > +0.05 over 3y), `loss_avoidance_pattern` (4.5d, 0 fired — Burgstahler-Dichev 1997 cohort thresholds too tight for S&P 500 large-cap universe, file as follow-up). Also closes [issue #7](https://github.com/dackclup/quantrank/issues/7) (Sloan over-firing on Financials: 21.3% → 11.7%, sector spread 7.7× → 1.4×). 2 new cache dirs (`compute/cache/edgar_amendments/` + `compute/cache/edgar_late_filings/`, 7d TTL each). Test suite **646 → 831 offline**. Reason taxonomy: 24 stable + 2 Tier-3 + 2 new vetoes + 6 new annotates = **34 stable identifiers**. |
+| **`0.9.0-phase4h`** (in flight in PR #112) | Phase 4h | **OSAP signal replication + PBO/DSR hard gate + Path-b composite × OSAP blend** (5-commit cluster on branch `claude/resume-quantrank-phase-4.5-Zh0pO`: 06bdac76 schema-foundation, b79983f6 osap_replicate proxy + 100-signal manifest, a6760d91 osap_blend Path-b, df4d9bd2 osap_validation PBO/DSR gate + rolling-12m-IC, [TBD] compute/main.py wiring + @network e2e). **Minor bump** — 6 new optional fields land simultaneously: `StockDetail.osap_signals: dict[str, float] \| None` + `StockDetail.osap_blended_score: float \| None`; `Metadata.osap_signals_used: list[str] \| None`, `Metadata.osap_excluded_signals: list[str] \| None`, `Metadata.osap_signals_ic_12m: dict[str, float] \| None`, `Metadata.osap_signals_coverage_pct: dict[str, float] \| None`. **OSAP blend stays OUTSIDE `compute_composite()`** — `PHASE3_WEIGHTS` sum-to-1.0 invariant (`compute/scoring/composite.py:43-45`) intact; Path-b formula `blended = (1 - weight) × composite_score + weight × osap_signal_aggregate`, default `weight=0.5` locked at `osap-integration/PLAN.md:168-170`. **Hard gate** = PBO ≤ 0.5 AND DSR > 0 via PR #60's `factor_passes_gates`; rolling-12m Spearman IC is observability-only (full walk-forward CV deferred to Phase 5 per `defense-infrastructure/PLAN.md:270`). **No new veto** (Top-5 still ranks raw `composite_score` per Rule 16; `osap_blended_score` is informational); defense layer stays at **17**. **Universe-gap policy** — tickers with no OSAP coverage pass `composite_score` through unchanged (no impute, distinct from pillar `neutralize_missing=True`). **NaN policy in PBO cohort** — zero-fill (not mean-fill, not dropna) preserves Bailey 2014 `n_trials = cohort_size` multiple-testing correction; sparse signals naturally lose on DSR (low Sharpe → DSR rejection). **OSAP failure is observability-only** — wrapped in try/except in `compute/main.py` so live-fetch / package failure NEVER blocks weekly production; all 6 new fields degrade to `None`. Test suite **856 → 906 offline + 18 → 19 `@network`** (commits 2-5 added 50 tests; e2e network test added in commit 5). Reason taxonomy unchanged at 34 stable identifiers. Tag `v1.1.0-phase4` (or `v1.3.0` for the 4.5e+4h combined release) deferred until 4i/4j/4k also merge. |
| **`0.8.0-phase4.5f`** | Phase 4.5f | **Manipulation Composite + soft composite penalty + UI** (PR #100 merged 2026-05-17 on commit `b1588b2a`; production verified on commit `e57f09cb`, run #51, warm-cache 5m14s). **Minor bump** because 5 new optional fields land simultaneously + new UI surface ships + tag `v1.2.0-phase4.5` coordinates with the data-version bump (semver coupling). Additive optional fields: `StockSummary.manipulation_index: float \| None`, `StockSummary.composite_score_adjusted: float \| None`, `StockDetail.manipulation_index`, `StockDetail.composite_score_adjusted`, `StockDetail.manipulation_components: dict[str, bool] \| None`. **`manipulation_index`** is a 0-100 rollup over the 4.5a-d flag set via a per-flag additive weight table in `compute/scoring/manipulation_index.py::FLAG_WEIGHTS` (active vetoes 15-20 pts · joint-gate 10 · annotates 5-8 · Tier-3 soft 3); clipped to `[0, 100]`. **`composite_score_adjusted`** applies the soft penalty `composite − 0.5 × (index / 100) × 20` (max 10-pt deduction at index = 100); the original `composite_score` field is preserved untouched per Rule 9 audit trail. **Rank source stays the raw composite per Rule 16** — the adjusted value is informational only, surfaced on the new detail-page `ManipulationRiskCard` (3-band outlined-light: emerald LOW / amber MODERATE / rose HIGH) with the in-line qualifier "Composite penalty: −X.XX pts (informational; rank uses raw composite)". Production: 158/502 (31.5%) fire the card (HIGH 2: SMCI=84 · WAT=64; MODERATE 60; LOW 96). **Phase 4.5e reserved-slot weights declared** (`INSIDER_SELL_CLUSTER_WEIGHT_RESERVED = 10`, `C_SUITE_UNUSUAL_SELL_WEIGHT_RESERVED = 5`) — the 4.5e PR uncomments 2 entries in `FLAG_WEIGHTS`, no calibration cascade. Test suite **831 → 856 offline**. Reason taxonomy: 34 stable identifiers (unchanged — `manipulation_index` is a derivation, not a new flag). Tag **`v1.2.0-phase4.5`** ready to cut. |

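The two informational score derivations described in the schema table can be sketched in a few lines. This is a minimal illustration only: the function names `path_b_blend`, `passes_hard_gate`, and `soft_penalty` are hypothetical and are not taken from the repository. The real blend lives in the `osap_blend` commit and the real gate is PR #60's `factor_passes_gates`, whose signatures are not shown in this diff. The formulas and thresholds are the ones quoted in the table rows.

```python
from __future__ import annotations


def path_b_blend(
    composite_score: float,
    osap_signal_aggregate: float | None,
    weight: float = 0.5,
) -> float:
    """Path-b blend: (1 - weight) * composite + weight * OSAP aggregate."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if osap_signal_aggregate is None:
        # Universe-gap policy from the table: a ticker without OSAP
        # coverage passes its composite through unchanged (no imputation).
        return composite_score
    return (1.0 - weight) * composite_score + weight * osap_signal_aggregate


def passes_hard_gate(pbo: float, dsr: float) -> bool:
    """Hard gate as stated in the table: PBO <= 0.5 AND DSR > 0."""
    return pbo <= 0.5 and dsr > 0.0


def soft_penalty(composite: float, manipulation_index: float) -> float:
    """Phase 4.5f soft penalty: composite - 0.5 * (index / 100) * 20."""
    return composite - 0.5 * (manipulation_index / 100.0) * 20.0
```

Both derived values are informational in the source: ranking still uses the raw `composite_score` per Rule 16, so neither function's output would feed the Top-5 ordering.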
> Phase 4+ schemas are tracked in [`WORKFLOW.md`](WORKFLOW.md) "Defense
70 changes: 69 additions & 1 deletion compute/config.py
@@ -27,7 +27,7 @@
MODELS_DIR: Path = PROJECT_ROOT / "models"

UNIVERSE: str = "SP500"
-SCHEMA_VERSION: str = "0.8.0-phase4.5f"
+SCHEMA_VERSION: str = "0.9.0-phase4h"

PRICES_PERIOD: str = "5y"
MAX_PARALLEL_FETCHES: int = 10
@@ -181,3 +181,71 @@
# more often is wasted bandwidth.
OSAP_RETURNS_CACHE: Path = CACHE_DIR / "osap" / "returns.parquet"
OSAP_RETURNS_MAX_AGE_DAYS: int = 31

+# --- Phase 4h: 100-signal manifest ---
+#
+# Theme buckets mirror the table at
+# `.claude/skills/phase-4/osap-integration/PLAN.md` L60-73
+# (Value/Quality/Momentum/Investment/Risk/EarningsNews/Trading +
+# Misc). CamelCase names follow the Chen-Zimmermann OSAP convention
+# (see github.com/OpenSourceAP/CrossSection signal docs).
+#
+# Aspirational manifest — commit 4's PBO/DSR gate
+# (`compute/validation/osap_validation.py`) will catch any signal that
+# does not resolve in the fetched OSAP returns DataFrame and log it
+# under `metadata.json::osap_excluded_signals` with reason
+# `not_found_in_osap_dataset` so the manifest can be tuned over
+# subsequent compute runs without a redeploy.
+OSAP_SIGNALS_BY_THEME: dict[str, tuple[str, ...]] = {
+    "Value": (
+        "BM", "EP", "SP", "CF", "DivYieldST", "NetEquityFinance",
+        "NetDebtFinance", "BookLeverage", "IntanBM", "IntanCFP",
+        "IntanEP", "IntanSP", "DebtIssuance", "OperatingLeverage",
+        "CompositeDebtIssuance",
+    ),  # 15
+    "Quality": (
+        "GP", "RoE", "RoA", "AssetTurnover", "AOP", "OperatingProfit",
+        "RDS", "RD", "ProfitMargin", "CashProf", "GrcapxThreeYears",
+        "AccrualsBM", "OperatingAccruals", "PctTotAcc", "Cash",
+    ),  # 15
+    "Momentum": (
+        "Mom12m", "Mom6m", "Mom36m", "Mom1m", "STreversal", "IndMom",
+        "IntMom", "EarnSupBig", "MomVol", "MomOffSeason", "MomSeason",
+        "Recomm_ShortInterest",
+    ),  # 12
+    "Investment": (
+        "AssetGrowth", "ChNNCOA", "ChNWC", "GrLTNOA", "ChInv",
+        "ShareIss1Y", "ShareIss5Y", "GrSaleToGrInv",
+    ),  # 8
+    "Risk": (
+        "MaxRet", "IdioVol3F", "IdioVolAHT", "BetaTailRisk", "Beta",
+        "BetaFP", "ReturnSkew", "ReturnSkew3F", "IndIPO",
+        "AbnormalAccruals",
+    ),  # 10
+    "EarningsNews": (
+        "SUE", "EarningsSurprise", "REV6", "RDIPO", "NumEarnIncrease",
+        "ConsRecomm", "Recomm", "EarningsForecastDisparity",
+    ),  # 8
+    "Trading": (
+        "Illiquidity", "Turnover", "Bid_Ask", "VolMkt", "VolSD",
+        "dVolCall", "Coskewness",
+    ),  # 7
+    "Misc": (
+        "Leverage", "OrgCapital", "Tax", "ChAssetTurnover", "BAR",
+        "GS", "AnnouncementReturn", "OScore", "ZScore", "CredRatDG",
+        "FailureProbability", "IRA", "FR", "BPEBM", "Activism1",
+        "Activism2", "AnalystValue", "ChForecastAccrual", "ChInvIA",
+        "AnalystRevision", "ForecastDispersion", "GrowthCapEx",
+        "MeanRankRevGrowth", "AbnormalAccrualsPercent", "ChEQ",
+    ),  # 25
+}
+
+OSAP_SIGNALS_100: tuple[str, ...] = tuple(
+    sig for theme_signals in OSAP_SIGNALS_BY_THEME.values() for sig in theme_signals
+)
+assert len(OSAP_SIGNALS_100) == 100, (
+    f"OSAP_SIGNALS_100 must have exactly 100 entries, got {len(OSAP_SIGNALS_100)}"
+)
+assert len(set(OSAP_SIGNALS_100)) == 100, (
+    "OSAP_SIGNALS_100 contains duplicate signal names"
+)
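The manifest comment says that signals which do not resolve in the fetched OSAP returns DataFrame are logged under `osap_excluded_signals` with reason `not_found_in_osap_dataset`. A minimal sketch of that split follows, under stated assumptions: the helper name `resolve_manifest` is hypothetical, matching is assumed to be exact and case-sensitive on column names, and the reason is returned in a dict purely for illustration (the schema row declares `Metadata.osap_excluded_signals` as `list[str] | None`, so the real encoding may differ). The actual logic lives in `compute/validation/osap_validation.py`, which this diff does not show.

```python
from __future__ import annotations

from collections.abc import Iterable

# Reason string quoted from the manifest comment in compute/config.py.
NOT_FOUND_REASON = "not_found_in_osap_dataset"


def resolve_manifest(
    manifest: Iterable[str],
    available_columns: Iterable[str],
) -> tuple[list[str], dict[str, str]]:
    """Split manifest names into resolved signals and excluded ones.

    Hypothetical helper: returns (used, excluded), where `excluded`
    maps each unresolved signal name to its exclusion reason.
    """
    available = set(available_columns)
    used: list[str] = []
    excluded: dict[str, str] = {}
    for signal in manifest:
        if signal in available:
            used.append(signal)  # keeps manifest order
        else:
            excluded[signal] = NOT_FOUND_REASON
    return used, excluded


# Usage with a toy manifest and toy column names.
used, excluded = resolve_manifest(
    ("BM", "Mom12m", "NotARealSignal"),
    ["BM", "Mom12m", "Size"],
)
```

Because unresolved names are recorded rather than raised as errors, an aspirational manifest can be tuned across compute runs, which is the behavior the comment describes.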