❌ NO PyPI package — verified across 11 candidate names
❌ bkelly-lab/jkp-data GitHub repo is WRDS-dependent (orchestrates user WRDS downloads; useless without credentials)
❌ jkpfactors.com/factors/ UI is gated (HTTP 302 → /login)
🚨 PRECONDITION — license-review checkpoint MUST clear before opening this integration PR
JKP data files are CC BY-NC 4.0 per DATA_LICENSE at bkelly-lab/jkp-data (verbatim verified 2026-05-18). QuantRank's current open-source static-site educational deployment qualifies per .claude/skills/phase-4/jkp-integration/PLAN.md:33.
Phase 6+ commercial roadmap (paid tier, consulting API, anything that invalidates the "non-commercial" qualifier) would force JKP removal or commercial license purchase from Kelly Lab.
Before this issue's integration PR opens, an explicit license-review checkpoint must produce a verdict:
(a) OK to proceed — no Phase 6+ commercial tier is in active planning → integration PR can open
(b) Pause — commercial tier is in planning → halt integration PR; either skip JKP, plan for commercial license, or defer the commercial decision while accepting the JKP coupling cost
(c) Halt + remove — commercial tier is launched → close this issue, remove scout module + cache directory, mark JKP integration permanently out-of-scope
This precondition is enforced by the license-review-required label; remove it only after the verdict is recorded.
Scope IN (from PR #114 §"Out of scope" — 5 deferred work items)
(a) Theme aggregation from S3 public/ prefix
The S3 bucket exposes 153-factor ZIPs at public/[<country>]_[<factor>]_[<freq>]_[<weight>].zip (~1000+ keys). Integration PR must:
Walk the public/ prefix via ?list-type=2&prefix=public/&delimiter=/ listing
Map each of the 13 themes in JKP_THEMES to a slice of the 153 individual factors (theme-to-factor mapping is the canonical JKP collapse; reference table in bkelly-lab/jkp-data repo's documentation/ directory)
Select a single weighting convention for aggregation — likely vw_cap (value-weighted with capitalization cap) to match institutional practice
Aggregate per-factor long-short returns within each theme (likely equal-weighted mean across the theme's factor cluster)
Output the theme-aggregated long-format DataFrame matching the schema {theme, year_month, ls_return, region} per PLAN.md:62-63
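The steps above can be sketched in plain Python. This is a minimal illustration, not the scout's code: it assumes the square brackets in the key pattern are literal characters, that per-factor returns have already been extracted from the ZIPs into (factor, year_month, ls_return) tuples, and that theme_of is a hypothetical factor-to-theme dictionary built from the reference table. The names parse_key and aggregate_themes are invented for this example, and a production version would likely use pandas to produce the DataFrame.

```python
import re
from collections import defaultdict
from statistics import fmean

# Assumed key layout, with the square brackets taken literally from the
# pattern quoted above: public/[<country>]_[<factor>]_[<freq>]_[<weight>].zip
KEY_RE = re.compile(
    r"^public/\[(?P<country>[^\]]+)\]_\[(?P<factor>[^\]]+)\]"
    r"_\[(?P<freq>[^\]]+)\]_\[(?P<weight>[^\]]+)\]\.zip$"
)


def parse_key(key):
    """Split an S3 key into its four components, or return None if it does not match."""
    m = KEY_RE.match(key)
    return m.groupdict() if m else None


def aggregate_themes(factor_rows, theme_of, region="us"):
    """Equal-weighted mean of per-factor L/S returns within each theme and month.

    factor_rows: iterable of (factor, year_month, ls_return) tuples.
    theme_of:    hypothetical factor -> theme mapping (illustrative only).
    """
    buckets = defaultdict(list)
    for factor, year_month, ls_return in factor_rows:
        theme = theme_of.get(factor)
        if theme is not None:  # factors without a theme are skipped
            buckets[(theme, year_month)].append(ls_return)
    # One long-format record per (theme, month), matching the target schema.
    return [
        {"theme": t, "year_month": ym, "ls_return": fmean(v), "region": region}
        for (t, ym), v in sorted(buckets.items())
    ]
```

Filtering the listing by the weight component returned from parse_key (for example, keeping only vw_cap keys) is one way to enforce the single weighting convention before aggregation.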
(b) 36-month rolling regression — the WRDS-free per-stock proxy
JKP returns are factor-level (long-short portfolios over time), NOT stock-level signals. Integration PR computes each stock's exposure to each theme via 36-month rolling regression on stock returns vs the theme's L/S return — per PLAN.md:92-96:
JKP returns are long-short portfolio returns over time, not stock-level signal values. The blend works by computing each stock's exposure to the JKP factor (regression slope on the theme's L/S return over rolling 36 months) and using that exposure as the cross-sectional rank.
This is the JKP analog of Phase 4h's per-stock proxy mode in compute/features/osap_replicate.py. NaN guard: per PLAN.md:96 ("NaN if <24 months overlap").
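A minimal sketch of the rolling exposure, assuming month-aligned lists of stock returns and theme L/S returns, with missing months given as None or NaN. It uses a plain OLS slope with intercept. The function names and the list-based layout are assumptions for illustration, and the actual module may differ.

```python
import math

WINDOW = 36       # rolling window length in months
MIN_OVERLAP = 24  # fewer paired observations than this yields NaN


def ols_slope(xs, ys):
    """OLS slope of ys on xs (with intercept); NaN if xs has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return math.nan
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def rolling_exposure(stock_ret, theme_ret):
    """Slope of stock returns on the theme's L/S return over a trailing window."""
    def missing(v):
        return v is None or math.isnan(v)

    out = []
    for t in range(len(stock_ret)):
        lo = max(0, t - WINDOW + 1)
        # Keep only months where both series are observed.
        pairs = [
            (x, y)
            for x, y in zip(theme_ret[lo:t + 1], stock_ret[lo:t + 1])
            if not missing(x) and not missing(y)
        ]
        if len(pairs) < MIN_OVERLAP:
            out.append(math.nan)  # NaN guard for short overlap
        else:
            xs, ys = zip(*pairs)
            out.append(ols_slope(xs, ys))
    return out
```

The last value of each stock's series is the exposure that would feed the cross-sectional rank for that month.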
(c) PBO/DSR gate — extract Phase 4h's helper to a generic module
Phase 4h shipped compute/validation/osap_validation.py with PBO ≤ 0.5 + DSR > 0 hard gate wrapping PR #60's factor_passes_gates. Integration PR should refactor this into a generic compute/validation/factor_validation.py that both OSAP and JKP (and later 4j/4k) consume. Asymmetric NaN policy (zero-fill cohort + DSR-side strip) carries forward unchanged.
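The shape of a consumer-agnostic gate could look like the sketch below. It only encodes the two thresholds stated above. The function names are hypothetical, the PBO and DSR values are assumed to be computed elsewhere, and treating NaN as a failure is an assumption of this example rather than the asymmetric NaN policy itself.

```python
import math

PBO_MAX = 0.5  # accept only if PBO <= 0.5
DSR_MIN = 0.0  # accept only if DSR > 0


def passes_hard_gate(pbo, dsr):
    """Return True only when both statistics clear their thresholds.

    Assumption: NaN in either statistic fails the gate. The zero-fill and
    DSR-side strip described above would run before this check.
    """
    if math.isnan(pbo) or math.isnan(dsr):
        return False
    return pbo <= PBO_MAX and dsr > DSR_MIN


def split_themes(stats):
    """Partition {name: (pbo, dsr)} into sorted (used, excluded) name lists."""
    used = sorted(t for t, (p, d) in stats.items() if passes_hard_gate(p, d))
    excluded = sorted(t for t in stats if t not in used)
    return used, excluded
```

Because the function takes only names and statistics, the same call would serve OSAP factors and JKP themes alike.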
Rolling-12m Spearman IC observability per accepted theme — same observability-only framing as Phase 4h.
Top-5 rotation impact analysis: confirm entered_top5 / exited_top5 distributions are unchanged when composite_score_osap_adjusted and jkp_blended_score are excluded from ranking (Rule 16 lock).
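The check can be sketched as a comparison of top-5 membership between two snapshots. The score dictionaries and function names are hypothetical. In the real pipeline entered_top5 and exited_top5 are presumably already computed, so this only illustrates what "unchanged" would mean.

```python
def top5(scores):
    """Tickers with the five highest scores; ties broken by ticker name."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ticker for ticker, _ in ranked[:5]}


def rotation(prev_scores, curr_scores):
    """Return (entered_top5, exited_top5) between two ranking snapshots."""
    prev, curr = top5(prev_scores), top5(curr_scores)
    return curr - prev, prev - curr
```

Running rotation on the ranking score from a build with the JKP pipeline enabled and from one with it disabled should return identical sets if the Rule 16 lock holds, because neither adjusted score feeds the ranking.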
Triggers (open implementation PR when EITHER fires)
Analyst / user feedback indicates the JKP theme exposures are needed for an in-flight analysis use case (forces the license-review and integration timeline)
Effort estimate
| Sub-item | LOC | Days |
|---|---|---|
| Theme aggregation from S3 | ~80 | 1 |
| 36-month rolling regression | ~120 | 2 |
| PBO/DSR gate extraction to generic helper | ~100 | 1 |
| Schema additions (triple lockstep) | ~50 | 0.5 |
| compute/main.py wiring | ~50 | 0.5 |
| Tests + docs + attribution | ~210 | 1 |
| Total | ~610 | ~6 days |
Per .claude/skills/phase-4/jkp-integration/PLAN.md:144-150.
Phase 4j (Qlib scout) — recommended to ship Qlib scout BEFORE this 4i.1 integration so the generic factor_validation.py extraction benefits from 3 consumers (4h, 4i, 4j) not 2
Phase 4k (IPCA scout) — bkelly-lab/ipca repo verified to exist; lower priority than 4j
Tag v1.1.0-phase4 is gated on all of 4h/4i/4j/4k merging; this issue is the gating item for 4i specifically.
Out of scope for this issue
WRDS path — locked CSV-only per osap-integration/PLAN.md:165-169 precedent. WRDS replication is a different debate.
Global region expansion (dev / em / global) — scout's region kwarg is a forward-compat no-op; integration PR can ship us-only and add other regions in a follow-up
Per-pillar JKP weight tuning — locked equal-pillar mapping per PLAN.md:75-86; Phase 5 ML meta-learner re-tunes
Stock-level JKP signal replication — needs WRDS; permanently skipped per PLAN.md:29 + PLAN.md:172
Top-5 ranking cutover to JKP-blended score — observability-only this phase per Rule 16
Context
Phase 4i scout (PR #114, merged 2026-05-18) shipped the JKP factor library ingest skeleton + 6 smoke tests + access-path discovery. This issue tracks the full integration PR that consumes the scout's foundation and wires JKP factor exposures into the QuantRank composite (theme aggregation → 36-month regression → PBO/DSR gate → schema additions → compute/main.py wiring).
The scout intentionally deferred all production wiring per PR #114 §"Out of scope". This issue is that deferred work, sized at ~610 LOC per .claude/skills/phase-4/jkp-integration/PLAN.md:144-150.
Access-path foundation (already locked by scout)
The integration PR builds on the scout's compute/ingest/jkp.py module docstring §"Access-path discovery", which records the 2026-05-18 verification:
https://jkpfactors.s3.amazonaws.com/ is publicly readable (no auth)
JKP_THEMES (Quality / Value / Investment / Profitability / Profit Growth / Momentum / Leverage / Trading Frictions / Skewness / Low Risk / Size / Accruals / Seasonality)
JKP_BUCKET_ROOT + JKP_SCOUT_ASSET_PATH constants surface the canonical access pattern
compute/ingest/osap.py:52-56)
❌ NO PyPI package — verified across 11 candidate names
❌ bkelly-lab/jkp-data GitHub repo is WRDS-dependent (orchestrates user WRDS downloads; useless without credentials)
❌ jkpfactors.com/factors/ UI is gated (HTTP 302 → /login)
🚨 PRECONDITION — license-review checkpoint MUST clear before opening this integration PR
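JKP data files are CC BY-NC 4.0 per DATA_LICENSE at bkelly-lab/jkp-data (verbatim verified 2026-05-18). QuantRank's current open-source static-site educational deployment qualifies per .claude/skills/phase-4/jkp-integration/PLAN.md:33.
Phase 6+ commercial roadmap (paid tier, consulting API, anything that invalidates the "non-commercial" qualifier) would force JKP removal or commercial license purchase from Kelly Lab.
Before this issue's integration PR opens, an explicit license-review checkpoint must produce a verdict:
(a) OK to proceed — no Phase 6+ commercial tier is in active planning → integration PR can open
(b) Pause — commercial tier is in planning → halt integration PR; either skip JKP, plan for commercial license, or defer the commercial decision while accepting the JKP coupling cost
(c) Halt + remove — commercial tier is launched → close this issue, remove scout module + cache directory, mark JKP integration permanently out-of-scope
This precondition is enforced by the license-review-required label; remove it only after the verdict is recorded.
Scope IN (from PR #114 §"Out of scope" — 5 deferred work items)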
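(a) Theme aggregation from S3 public/ prefix
The S3 bucket exposes 153-factor ZIPs at public/[<country>]_[<factor>]_[<freq>]_[<weight>].zip (~1000+ keys). Integration PR must:
Walk the public/ prefix via ?list-type=2&prefix=public/&delimiter=/ listing
Map each of the 13 themes in JKP_THEMES to a slice of the 153 individual factors (theme-to-factor mapping is the canonical JKP collapse; reference table in bkelly-lab/jkp-data repo's documentation/ directory)
Select a single weighting convention for aggregation — likely vw_cap (value-weighted with capitalization cap) to match institutional practice
Aggregate per-factor long-short returns within each theme (likely equal-weighted mean across the theme's factor cluster)
Output the theme-aggregated long-format DataFrame matching the schema {theme, year_month, ls_return, region} per PLAN.md:62-63
(b) 36-month rolling regression — the WRDS-free per-stock proxy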
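JKP returns are factor-level (long-short portfolios over time), NOT stock-level signals. Integration PR computes each stock's exposure to each theme via 36-month rolling regression on stock returns vs the theme's L/S return — per PLAN.md:92-96:
JKP returns are long-short portfolio returns over time, not stock-level signal values. The blend works by computing each stock's exposure to the JKP factor (regression slope on the theme's L/S return over rolling 36 months) and using that exposure as the cross-sectional rank.
This is the JKP analog of Phase 4h's per-stock proxy mode in compute/features/osap_replicate.py. NaN guard: per PLAN.md:96 ("NaN if <24 months overlap").
(c) PBO/DSR gate — extract Phase 4h's helper to a generic module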
Phase 4h shipped compute/validation/osap_validation.py with PBO ≤ 0.5 + DSR > 0 hard gate wrapping PR #60's factor_passes_gates. Integration PR should refactor this into a generic compute/validation/factor_validation.py that both OSAP and JKP (and later 4j/4k) consume. Asymmetric NaN policy (zero-fill cohort + DSR-side strip) carries forward unchanged.
Rolling-12m Spearman IC observability per accepted theme — same observability-only framing as Phase 4h.
(d) Schema bump 0.9.0-phase4h → 0.10.0-phase4i
Schema triple lockstep edit (per AGENTS.md:229-231):
compute/output/schemas.py — add to StockDetail:
  jkp_theme_loadings: dict[str, float] | None = None
  jkp_blended_score: float | None = None (if blending into composite; observability-only per SKILL.md Rule 16)
compute/output/schemas.py — add to Metadata:
  jkp_themes_used: list[str] | None = None
  jkp_themes_excluded: list[str] | None = None
  jkp_themes_ic_12m: dict[str, float] | None = None
  jkp_themes_coverage_pct: dict[str, float] | None = None
compute/config.py:30 — SCHEMA_VERSION: "0.9.0-phase4h" → "0.10.0-phase4i"
frontend/lib/types.ts — mirror Pydantic additions
frontend/lib/schema-snapshot.json — regenerate via python -m compute.output.schema_check --update-snapshot
(e) compute/main.py wiring + Top-5 rotation impact analysis
Mirror Phase 4h commit 5 (fbd1acf4) — insert JKP pipeline after the OSAP block + before Step 8 per-ticker loop:
Top-5 rotation impact analysis: confirm entered_top5 / exited_top5 distributions are unchanged when composite_score_osap_adjusted and jkp_blended_score are excluded from ranking (Rule 16 lock).
Triggers (open implementation PR when EITHER fires)
Analyst / user feedback indicates the JKP theme exposures are needed for an in-flight analysis use case (forces the license-review and integration timeline)
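Effort estimate
| Sub-item | LOC | Days |
|---|---|---|
| Theme aggregation from S3 | ~80 | 1 |
| 36-month rolling regression | ~120 | 2 |
| PBO/DSR gate extraction to generic helper | ~100 | 1 |
| Schema additions (triple lockstep) | ~50 | 0.5 |
| compute/main.py wiring | ~50 | 0.5 |
| Tests + docs + attribution | ~210 | 1 |
| Total | ~610 | ~6 days |
Per .claude/skills/phase-4/jkp-integration/PLAN.md:144-150.
Sequencing relative to other Phase 4+ tracks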
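Phase 4j (Qlib scout) — recommended to ship Qlib scout BEFORE this 4i.1 integration so the generic factor_validation.py extraction benefits from 3 consumers (4h, 4i, 4j) not 2
Phase 4k (IPCA scout) — bkelly-lab/ipca repo verified to exist; lower priority than 4j
Tag v1.1.0-phase4 is gated on all of 4h/4i/4j/4k merging; this issue is the gating item for 4i specifically.
Out of scope for this issue
WRDS path — locked CSV-only per osap-integration/PLAN.md:165-169 precedent. WRDS replication is a different debate.
Global region expansion (dev / em / global) — scout's region kwarg is a forward-compat no-op; integration PR can ship us-only and add other regions in a follow-up
Per-pillar JKP weight tuning — locked equal-pillar mapping per PLAN.md:75-86; Phase 5 ML meta-learner re-tunes
Stock-level JKP signal replication — needs WRDS; permanently skipped per PLAN.md:29 + PLAN.md:172
Top-5 ranking cutover to JKP-blended score — observability-only this phase per Rule 16
Related
fbd1acf4)
0.9.0-phase4h (current) → 0.10.0-phase4i (this issue's target)
.claude/skills/phase-4/jkp-integration/PLAN.md
.claude/skills/phase-4/backtest-infrastructure/PLAN.md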