Skip to content

Phase 4i.1 — Full JKP integration (theme aggregation + 36m regression + PBO/DSR gate) #115

@dackclup

Description

@dackclup

Context

Phase 4i scout (PR #114, merged 2026-05-18) shipped the JKP factor library ingest skeleton + 6 smoke tests + access-path discovery. This issue tracks the full integration PR that consumes the scout's foundation and wires JKP factor exposures into the QuantRank composite (theme aggregation → 36-month regression → PBO/DSR gate → schema additions → compute/main.py wiring).

The scout intentionally deferred all production wiring per PR #114 §"Out of scope". This issue is that deferred work, sized at ~610 LOC per .claude/skills/phase-4/jkp-integration/PLAN.md:144-150.

Access-path foundation (already locked by scout)

The integration PR builds on the scout's compute/ingest/jkp.py module docstring §"Access-path discovery", which records the 2026-05-18 verification:

  • https://jkpfactors.s3.amazonaws.com/ is publicly readable (no auth)
  • ✅ 13-theme manifest locked in JKP_THEMES (Quality / Value / Investment / Profitability / Profit Growth / Momentum / Leverage / Trading Frictions / Skewness / Low Risk / Size / Accruals / Seasonality)
  • JKP_BUCKET_ROOT + JKP_SCOUT_ASSET_PATH constants surface the canonical access pattern
  • ✅ Tenacity policy + parquet cache pattern mirrors OSAP (compute/ingest/osap.py:52-56)
  • ❌ NO PyPI package — verified across 11 candidate names
  • bkelly-lab/jkp-data GitHub repo is WRDS-dependent (orchestrates user WRDS downloads; useless without credentials)
  • jkpfactors.com/factors/ UI is gated (HTTP 302 → /login)

🚨 PRECONDITION — license-review checkpoint MUST clear before opening this integration PR

JKP data files are CC BY-NC 4.0 per DATA_LICENSE at bkelly-lab/jkp-data (verbatim verified 2026-05-18). QuantRank's current open-source static-site educational deployment qualifies per .claude/skills/phase-4/jkp-integration/PLAN.md:33.

Phase 6+ commercial roadmap (paid tier, consulting API, anything that invalidates the "non-commercial" qualifier) would force JKP removal or commercial license purchase from Kelly Lab.

Before this issue's integration PR opens, an explicit license-review checkpoint must produce a verdict:

  • (a) OK to proceed — no Phase 6+ commercial tier is in active planning → integration PR can open
  • (b) Pause — commercial tier is in planning → halt integration PR; either skip JKP, plan for commercial license, or defer the commercial decision while accepting the JKP coupling cost
  • (c) Halt + remove — commercial tier is launched → close this issue, remove scout module + cache directory, mark JKP integration permanently out-of-scope

This precondition is enforced by the license-review-required label; remove it only after the verdict is recorded.

Scope IN (from PR #114 §"Out of scope" — 5 deferred work items)

(a) Theme aggregation from S3 public/ prefix

The S3 bucket exposes 153-factor ZIPs at public/[<country>]_[<factor>]_[<freq>]_[<weight>].zip (~1000+ keys). Integration PR must:

  • Walk the public/ prefix via ?list-type=2&prefix=public/&delimiter=/ listing
  • Map each of the 13 themes in JKP_THEMES to a slice of the 153 individual factors (theme-to-factor mapping is the canonical JKP collapse; reference table in bkelly-lab/jkp-data repo's documentation/ directory)
  • Select a single weighting convention for aggregation — likely vw_cap (value-weighted with capitalization cap) to match institutional practice
  • Aggregate per-factor long-short returns within each theme (likely equal-weighted mean across the theme's factor cluster)
  • Output the theme-aggregated long-format DataFrame matching the schema {theme, year_month, ls_return, region} per PLAN.md:62-63

(b) 36-month rolling regression — the WRDS-free per-stock proxy

JKP returns are factor-level (long-short portfolios over time), NOT stock-level signals. Integration PR computes each stock's exposure to each theme via 36-month rolling regression on stock returns vs the theme's L/S return — per PLAN.md:92-96:

JKP returns are long-short portfolio returns over time, not stock-level signal values. The blend works by computing each stock's exposure to the JKP factor (regression slope on the theme's L/S return over rolling 36 months) and using that exposure as the cross-sectional rank.

This is the JKP analog of Phase 4h's per-stock proxy mode in compute/features/osap_replicate.py. NaN guard: per PLAN.md:96 ("NaN if <24 months overlap").

(c) PBO/DSR gate — extract Phase 4h's helper to a generic module

Phase 4h shipped compute/validation/osap_validation.py with PBO ≤ 0.5 + DSR > 0 hard gate wrapping PR #60's factor_passes_gates. Integration PR should refactor this into a generic compute/validation/factor_validation.py that both OSAP and JKP (and later 4j/4k) consume. Asymmetric NaN policy (zero-fill cohort + DSR-side strip) carries forward unchanged.

Rolling-12m Spearman IC observability per accepted theme — same observability-only framing as Phase 4h.

(d) Schema bump 0.9.0-phase4h → 0.10.0-phase4i

Schema triple lockstep edit (per AGENTS.md:229-231):

  • compute/output/schemas.py — add to StockDetail:
    • jkp_theme_loadings: dict[str, float] | None = None
    • jkp_blended_score: float | None = None (if blending into composite; observability-only per SKILL.md Rule 16)
  • compute/output/schemas.py — add to Metadata:
    • jkp_themes_used: list[str] | None = None
    • jkp_themes_excluded: list[str] | None = None
    • jkp_themes_ic_12m: dict[str, float] | None = None
    • jkp_themes_coverage_pct: dict[str, float] | None = None
  • compute/config.py:30SCHEMA_VERSION: "0.9.0-phase4h""0.10.0-phase4i"
  • frontend/lib/types.ts — mirror Pydantic additions
  • frontend/lib/schema-snapshot.json — regenerate via python -m compute.output.schema_check --update-snapshot

(e) compute/main.py wiring + Top-5 rotation impact analysis

Mirror Phase 4h commit 5 (fbd1acf4) — insert JKP pipeline after the OSAP block + before Step 8 per-ticker loop:

# Phase 4i — JKP theme aggregation + 36m regression + Path-b blend
try:
    jkp_themes_raw = fetch_jkp_returns(...)
    jkp_themes_long = aggregate_themes_from_public_zips(jkp_themes_raw)
    gate_results = gate_jkp_themes(jkp_themes_long, requested_themes=config.JKP_THEMES)
    # 36-month rolling regression per stock × theme
    jkp_exposures = compute_jkp_exposures(price_returns_by_ticker, jkp_themes_long, ...)
    # Optional blend (observability only; Top-5 stays raw composite per Rule 16)
    composite_jkp_adjusted = apply_jkp_blend(composite, jkp_exposures, ...)
except Exception as e:
    logger.warning("JKP pipeline failed (observability-only); fields → None: %s", e)
    ...

Top-5 rotation impact analysis: confirm entered_top5 / exited_top5 distributions are unchanged when composite_score_osap_adjusted and jkp_blended_score are excluded from ranking (Rule 16 lock).

Triggers (open implementation PR when EITHER fires)

  1. License-review verdict = "OK to proceed" AND Phase 4h.1 (Phase 4h.1 — Full per-stock OSAP signal replication #113) is either closed or stabilized at proxy-mode operation
  2. Analyst / user feedback indicates the JKP theme exposures are needed for an in-flight analysis use case (forces the license-review and integration timeline)

Effort estimate

Sub-item LOC Days
Theme aggregation from S3 ~80 1
36-month rolling regression ~120 2
PBO/DSR gate extraction to generic helper ~100 1
Schema additions (triple lockstep) ~50 0.5
compute/main.py wiring ~50 0.5
Tests + docs + attribution ~210 1
Total ~610 ~6 days

Per .claude/skills/phase-4/jkp-integration/PLAN.md:144-150.

Sequencing relative to other Phase 4+ tracks

  • Phase 4h.1 (Phase 4h.1 — Full per-stock OSAP signal replication #113, OSAP full per-stock replication) — parallel; can ship independently
  • Phase 4j (Qlib scout) — recommended to ship Qlib scout BEFORE this 4i.1 integration so the generic factor_validation.py extraction benefits from 3 consumers (4h, 4i, 4j) not 2
  • Phase 4k (IPCA scout) — bkelly-lab/ipca repo verified to exist; lower priority than 4j

Tag v1.1.0-phase4 is gated on all of 4h/4i/4j/4k merging; this issue is the gating item for 4i specifically.

Out of scope for this issue

  • WRDS path — locked CSV-only per osap-integration/PLAN.md:165-169 precedent. WRDS replication is a different debate.
  • Global region expansion (dev / em / global) — scout's region kwarg is a forward-compat no-op; integration PR can ship us-only and add other regions in a follow-up
  • Per-pillar JKP weight tuning — locked equal-pillar mapping per PLAN.md:75-86; Phase 5 ML meta-learner re-tunes
  • Stock-level JKP signal replication — needs WRDS; permanently skipped per PLAN.md:29 + PLAN.md:172
  • Top-5 ranking cutover to JKP-blended score — observability-only this phase per Rule 16

Related

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions