Skip to content

security(W2+W4): workflow-perm narrowing + log-bash secret scrub#229

Merged
dackclup merged 1 commit into
mainfrom
claude/eager-bohr-12bQi
May 24, 2026
Merged

security(W2+W4): workflow-perm narrowing + log-bash secret scrub#229
dackclup merged 1 commit into
mainfrom
claude/eager-bohr-12bQi

Conversation

@dackclup
Copy link
Copy Markdown
Owner

Summary

Closes the 2 remaining security-reviewer WARNs deferred from PR #226 (W1 + W3 shipped there; W2 + W4 bundled here as one focused security-hardening PR). Both fixes are operational hygiene with no compute / schema / production-code surface.

W2 — compute-rankings.yml workflow-perm narrowing

The workflow-level permissions: contents: write was wider than needed — only the final Commit JSON outputs step inside the compute: job is a writer. Fix:

Before After
Workflow-level default contents: write contents: read
compute: job inherits write explicitly contents: write
Future job added would inherit write inherits read unless opts up

Behavior unchanged for the compute job; only the default surface shrinks. YAML parse-verified clean.

W4 — log-bash.sh inline-credential scrub

Previously the hook appended the raw Bash command (including any inline env-var dereference like EDGAR_USER_AGENT=foo python ... or an Authorization: Bearer <tok> header) to gitignored .claude/session.log. Severity is LOW because the file is gitignored + local-only, but an accidental cat .claude/session.log during a screen-share or pasted into a gist could leak the credential.

Fix: sed pre-filter redacts the value half of common secret-prefix tokens before appending; the prefix stays so the log is still readable for "which integration was being called" debugging.

Covered prefixes:

Prefix family Pattern
GitHub ghp_ / gho_ / ghu_ / ghs_ / ghr_ / github_pat_
Anthropic sk-ant-api*
OpenAI generic sk-*
AWS AKIA* / ASIA*
Google API AIza*
Slack xox[abprs]-*
Bare bearer Bearer <tok> / Authorization: Bearer <tok>

Manual scrub test (this PR's verification):

in:  curl -H "Authorization: Bearer ghp_abcdef123...890" ...
out: curl -H "Authorization: Bearer ghp_[REDACTED]" ...

Fail-open discipline preserved — if sed errors (unlikely), the existing || true falls back to the original command, never blocks the hook.

Defense-in-depth: the gitignore remains the primary guard; this shrinks the blast radius of an accidental local leak.

Files touched

File Δ Reason
.github/workflows/compute-rankings.yml +12 / −1 W2 — workflow-perm narrowing
.claude/hooks/log-bash.sh +19 / −1 W4 — secret scrub
CLAUDE.md +33 §Phase status in-flight note
AGENTS.md +28 / −18 §Phase + version state in-flight note + PR #228 stale-header reword

Doc-only otherwise — no compute / schema / scoring / valuation / frontend / Python / TS production-code change. CLAUDE.md + AGENTS.md lockstep satisfied per §Conventions.

Test plan

  • python -c "yaml.safe_load(...)".github/workflows/compute-rankings.yml parses clean
  • Manual scrub test — Bearer ghp_abc... correctly redacts to Bearer ghp_[REDACTED]
  • git diff --stat confirms scope (4 files / +105 / −20)
  • Frontend (build) CI green (no frontend touched; expected pass)
  • Python (lint + test) CI green (no Python touched; expected pass)
  • security-reviewer sonnet spawn (optional — re-verifies both WARNs resolved)
  • Flip Draft → Ready after CI green + reviewer pass

Out of scope (still deferred)

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4


Generated by Claude Code

Closes the 2 remaining security-reviewer WARNs deferred from PR #226
(W1 + W3 shipped there; W2 + W4 bundled here as one focused
security-hardening PR). Both fixes are operational hygiene with no
compute / schema / production-code surface.

W2 — .github/workflows/compute-rankings.yml workflow-perm narrowing
---
The workflow-level `permissions: contents: write` was wider than
needed — only the final `Commit JSON outputs` step inside the
`compute:` job is a writer. Fix:

- Workflow-level default narrowed to `contents: read` (least
  privilege baseline; any future job added to this file inherits
  read unless it explicitly opts up)
- The `compute:` job explicitly opts up to `contents: write` so
  the `git commit + push` step still works

Behavior unchanged for the compute job; only the default surface
shrinks. YAML parse-verified clean.

W4 — .claude/hooks/log-bash.sh inline-credential scrub
---
Previously the hook appended the raw Bash command (including any
inline env-var dereference like `EDGAR_USER_AGENT=foo python ...`
or an `Authorization: Bearer <tok>` header) to gitignored
`.claude/session.log`. Severity is LOW because the file is
gitignored + local-only, but an accidental `cat .claude/session.log`
during a screen-share or pasted into a gist could leak the
credential.

Fix: `sed` pre-filter redacts the value half of common secret-prefix
tokens before appending; the prefix stays so the log is still
readable for "which integration was being called" debugging.

Covered prefixes:
- GitHub: ghp_ / gho_ / ghu_ / ghs_ / ghr_ / github_pat_
- Anthropic: sk-ant-api*
- OpenAI generic: sk-*
- AWS: AKIA* / ASIA*
- Google API: AIza*
- Slack: xox[abprs]-*
- Bare `Bearer <tok>` / `Authorization: Bearer <tok>` headers

Manual scrub test:
  in:  curl -H "Authorization: Bearer ghp_abcdef123...890" ...
  out: curl -H "Authorization: Bearer ghp_[REDACTED]" ...

Fail-open discipline preserved — if `sed` errors (unlikely), the
existing `|| true` falls back to the original command, never blocks
the hook.

Defense-in-depth: the gitignore remains the primary guard; this
shrinks the blast radius of an accidental local leak.

CLAUDE.md + AGENTS.md lockstep satisfied via §Phase status in-flight
notes (no new §Gotchas — these are not invariants future code
authors need to remember; they're hardening of existing surfaces).

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
@vercel
Copy link
Copy Markdown

vercel Bot commented May 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quantrank Ready Ready Preview, Comment May 24, 2026 4:02am

@dackclup dackclup marked this pull request as ready for review May 24, 2026 04:09
@dackclup dackclup merged commit dacf293 into main May 24, 2026
4 checks passed
@dackclup dackclup deleted the claude/eager-bohr-12bQi branch May 24, 2026 05:19
dackclup pushed a commit that referenced this pull request May 24, 2026
…per edgar-debugger live-XML verify

The literature-searcher proof-of-life on 2026-05-23 (session 4)
flagged a precision gap: the form4 module docstrings referenced
the colloquial <rule10b5_1> XML tag name (a label, not the actual
SEC EDGAR Ownership XML element). edgar-debugger re-audit on
2026-05-24 (session 5) fetched 3 live AAPL Form 4 XMLs from SEC
EDGAR and confirmed the actual element is <aff10b5One> at
ownershipDocument/aff10b5One — a DOCUMENT-LEVEL boolean (one per
filing, covering ALL transactions in that Form 4), NOT a
per-transaction attribute.

Updated 7 references to use the canonical name:

5 code spots:
- compute/scoring/form4_insider.py:20 (module docstring header)
- compute/scoring/form4_insider.py:82 (§Footnote resolution)
- compute/scoring/form4_insider.py:587 (inline _form4_to_transactions
  comment)
- compute/scoring/form4_signals.py:129 (module docstring)
- compute/output/schemas.py:200 (extreme_estimate_majority_count
  comment context)

2 doc spots:
- CLAUDE.md §Phase status PR #224 entry "Access-path caveat" block
  (line 1352-1359)
- AGENTS.md §Phase + version state PR #224 entry "Access-path
  caveat for cross-tool agents" block (line 947-951)

Added §Footnote resolution architectural-gap note in
compute/scoring/form4_insider.py: a filer who checks
<aff10b5One>true at the document level but does NOT include the
per-transaction footnote text (valid — the checkbox IS the formal
affirmative defense, footnotes are supplemental) will slip past
the current footnote-text fallback path and enter the
opportunistic cohort incorrectly. Deferred to a follow-up PR that
direct-parses the raw Form 4 XML for ownershipDocument/aff10b5One
(edgartools doesn't expose it, so the implementation walks the
filing text once per Ownership object). Tracked as the
architectural follow-up after PR 4-eq.

edgartools 5.31.5 verified as PyPI latest by edgar-debugger
2026-05-24 — no newer release adds a parse path for <aff10b5One>.

detect_10b5_1_plan regex set (6 patterns) confirmed complete vs
real-world footnote text — pattern "10b5-1" alone covers every
common variant including Rule 10b5-1(c) subsection cites. No
additions needed.

Docstring-only PR — no compute / scoring / valuation / behavior
change. schema_check clean (comment-only edit on schemas.py),
ruff clean, tests unchanged at 1168+. CLAUDE.md + AGENTS.md
lockstep satisfied via §Phase status in-flight notes.

Companion stale-header reword: marked PR #229 "Security WARN
cleanup" → "merged via PR #229" in CLAUDE.md + AGENTS.md.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 24, 2026
…sion §Gotcha

Closes the recurring "merge ไม่ได้ หลังแก้แล้วกลับมาเป็นอีก" pattern
surfaced 2026-05-24: PR #230 hit mergeable_state=dirty twice in a
single session — first when PR #229 (security WARN cleanup) landed
on main mid-iteration, then again when PR #232 + PR #233 (LedgerCraft
A1 + A2) landed before Mark-Ready.

ROOT CAUSE: every PR adds a "**X in flight (this PR)**" entry at
the SAME insertion line in CLAUDE.md §Phase status + AGENTS.md
§Phase + version state (just before "**Next deliverables**" /
"## Claude-Code-specific tooling"). Two parallel PRs both touch
that single line → `git merge` cannot auto-resolve two distinct
adds at the same line → `mergeable_state: dirty`. The conflict
is BENIGN (both adds are legitimate; resolution is always "keep
both in chronological order") but git can't auto-detect that.

THREE COMPANION EDITS bundled here:

(a) CLAUDE.md §Conventions — new bullet "Rebase onto origin/main
    before flipping any PR Draft → Ready". Includes the operational
    `git fetch origin main && git rebase origin/main` recipe + the
    "keep both entries in chronological order (older PR first, your
    entry second)" resolution discipline.

(b) CLAUDE.md §Gotchas — new "Parallel-PR §Phase status collision
    pattern" entry recording the recurring symptom (PR #230 ×2 on
    2026-05-24 against PR #229 + PR #232 + PR #233) + the
    local-mitigation workflow + a forward-looking structural
    follow-up note: consider moving the "in flight" sub-block to
    a side file (PHASE_STATUS_INFLIGHT.md) that's append-only-
    per-PR; on merge a housekeeping commit moves the entry into
    CLAUDE.md proper. Not yet adopted — trade-off is extra
    discipline at merge time.

(c) AGENTS.md mirror — cross-tool equivalent for Copilot / Cursor /
    Devin contributors, emphasising the rebase-before-Mark-Ready
    discipline applies to ALL agent runtimes (not Claude-Code-
    specific).

ALSO bundled as a 3rd in-flight entry in the §Phase status block
documenting THIS PR's scope expansion (Parts 1+2+3 now).

No infrastructure / CI / behavior change — pure doc discipline.
Future contributors who hit the same conflict find the §Gotchas
entry + the §Conventions recipe + AGENTS.md note and resolve in
seconds, instead of re-discovering the pattern.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 24, 2026
…per edgar-debugger live-XML verify

The literature-searcher proof-of-life on 2026-05-23 (session 4)
flagged a precision gap: the form4 module docstrings referenced
the colloquial <rule10b5_1> XML tag name (a label, not the actual
SEC EDGAR Ownership XML element). edgar-debugger re-audit on
2026-05-24 (session 5) fetched 3 live AAPL Form 4 XMLs from SEC
EDGAR and confirmed the actual element is <aff10b5One> at
ownershipDocument/aff10b5One — a DOCUMENT-LEVEL boolean (one per
filing, covering ALL transactions in that Form 4), NOT a
per-transaction attribute.

Updated 7 references to use the canonical name:

5 code spots:
- compute/scoring/form4_insider.py:20 (module docstring header)
- compute/scoring/form4_insider.py:82 (§Footnote resolution)
- compute/scoring/form4_insider.py:587 (inline _form4_to_transactions
  comment)
- compute/scoring/form4_signals.py:129 (module docstring)
- compute/output/schemas.py:200 (extreme_estimate_majority_count
  comment context)

2 doc spots:
- CLAUDE.md §Phase status PR #224 entry "Access-path caveat" block
  (line 1352-1359)
- AGENTS.md §Phase + version state PR #224 entry "Access-path
  caveat for cross-tool agents" block (line 947-951)

Added §Footnote resolution architectural-gap note in
compute/scoring/form4_insider.py: a filer who checks
<aff10b5One>true at the document level but does NOT include the
per-transaction footnote text (valid — the checkbox IS the formal
affirmative defense, footnotes are supplemental) will slip past
the current footnote-text fallback path and enter the
opportunistic cohort incorrectly. Deferred to a follow-up PR that
direct-parses the raw Form 4 XML for ownershipDocument/aff10b5One
(edgartools doesn't expose it, so the implementation walks the
filing text once per Ownership object). Tracked as the
architectural follow-up after PR 4-eq.

edgartools 5.31.5 verified as PyPI latest by edgar-debugger
2026-05-24 — no newer release adds a parse path for <aff10b5One>.

detect_10b5_1_plan regex set (6 patterns) confirmed complete vs
real-world footnote text — pattern "10b5-1" alone covers every
common variant including Rule 10b5-1(c) subsection cites. No
additions needed.

Docstring-only PR — no compute / scoring / valuation / behavior
change. schema_check clean (comment-only edit on schemas.py),
ruff clean, tests unchanged at 1168+. CLAUDE.md + AGENTS.md
lockstep satisfied via §Phase status in-flight notes.

Companion stale-header reword: marked PR #229 "Security WARN
cleanup" → "merged via PR #229" in CLAUDE.md + AGENTS.md.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 24, 2026
…sion §Gotcha

Closes the recurring "merge ไม่ได้ หลังแก้แล้วกลับมาเป็นอีก" pattern
surfaced 2026-05-24: PR #230 hit mergeable_state=dirty twice in a
single session — first when PR #229 (security WARN cleanup) landed
on main mid-iteration, then again when PR #232 + PR #233 (LedgerCraft
A1 + A2) landed before Mark-Ready.

ROOT CAUSE: every PR adds a "**X in flight (this PR)**" entry at
the SAME insertion line in CLAUDE.md §Phase status + AGENTS.md
§Phase + version state (just before "**Next deliverables**" /
"## Claude-Code-specific tooling"). Two parallel PRs both touch
that single line → `git merge` cannot auto-resolve two distinct
adds at the same line → `mergeable_state: dirty`. The conflict
is BENIGN (both adds are legitimate; resolution is always "keep
both in chronological order") but git can't auto-detect that.

THREE COMPANION EDITS bundled here:

(a) CLAUDE.md §Conventions — new bullet "Rebase onto origin/main
    before flipping any PR Draft → Ready". Includes the operational
    `git fetch origin main && git rebase origin/main` recipe + the
    "keep both entries in chronological order (older PR first, your
    entry second)" resolution discipline.

(b) CLAUDE.md §Gotchas — new "Parallel-PR §Phase status collision
    pattern" entry recording the recurring symptom (PR #230 ×2 on
    2026-05-24 against PR #229 + PR #232 + PR #233) + the
    local-mitigation workflow + a forward-looking structural
    follow-up note: consider moving the "in flight" sub-block to
    a side file (PHASE_STATUS_INFLIGHT.md) that's append-only-
    per-PR; on merge a housekeeping commit moves the entry into
    CLAUDE.md proper. Not yet adopted — trade-off is extra
    discipline at merge time.

(c) AGENTS.md mirror — cross-tool equivalent for Copilot / Cursor /
    Devin contributors, emphasising the rebase-before-Mark-Ready
    discipline applies to ALL agent runtimes (not Claude-Code-
    specific).

ALSO bundled as a 3rd in-flight entry in the §Phase status block
documenting THIS PR's scope expansion (Parts 1+2+3 now).

No infrastructure / CI / behavior change — pure doc discipline.
Future contributors who hit the same conflict find the §Gotchas
entry + the §Conventions recipe + AGENTS.md note and resolve in
seconds, instead of re-discovering the pattern.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 24, 2026
…per edgar-debugger live-XML verify

The literature-searcher proof-of-life on 2026-05-23 (session 4)
flagged a precision gap: the form4 module docstrings referenced
the colloquial <rule10b5_1> XML tag name (a label, not the actual
SEC EDGAR Ownership XML element). edgar-debugger re-audit on
2026-05-24 (session 5) fetched 3 live AAPL Form 4 XMLs from SEC
EDGAR and confirmed the actual element is <aff10b5One> at
ownershipDocument/aff10b5One — a DOCUMENT-LEVEL boolean (one per
filing, covering ALL transactions in that Form 4), NOT a
per-transaction attribute.

Updated 7 references to use the canonical name:

5 code spots:
- compute/scoring/form4_insider.py:20 (module docstring header)
- compute/scoring/form4_insider.py:82 (§Footnote resolution)
- compute/scoring/form4_insider.py:587 (inline _form4_to_transactions
  comment)
- compute/scoring/form4_signals.py:129 (module docstring)
- compute/output/schemas.py:200 (extreme_estimate_majority_count
  comment context)

2 doc spots:
- CLAUDE.md §Phase status PR #224 entry "Access-path caveat" block
  (line 1352-1359)
- AGENTS.md §Phase + version state PR #224 entry "Access-path
  caveat for cross-tool agents" block (line 947-951)

Added §Footnote resolution architectural-gap note in
compute/scoring/form4_insider.py: a filer who checks
<aff10b5One>true at the document level but does NOT include the
per-transaction footnote text (valid — the checkbox IS the formal
affirmative defense, footnotes are supplemental) will slip past
the current footnote-text fallback path and enter the
opportunistic cohort incorrectly. Deferred to a follow-up PR that
direct-parses the raw Form 4 XML for ownershipDocument/aff10b5One
(edgartools doesn't expose it, so the implementation walks the
filing text once per Ownership object). Tracked as the
architectural follow-up after PR 4-eq.

edgartools 5.31.5 verified as PyPI latest by edgar-debugger
2026-05-24 — no newer release adds a parse path for <aff10b5One>.

detect_10b5_1_plan regex set (6 patterns) confirmed complete vs
real-world footnote text — pattern "10b5-1" alone covers every
common variant including Rule 10b5-1(c) subsection cites. No
additions needed.

Docstring-only PR — no compute / scoring / valuation / behavior
change. schema_check clean (comment-only edit on schemas.py),
ruff clean, tests unchanged at 1168+. CLAUDE.md + AGENTS.md
lockstep satisfied via §Phase status in-flight notes.

Companion stale-header reword: marked PR #229 "Security WARN
cleanup" → "merged via PR #229" in CLAUDE.md + AGENTS.md.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 24, 2026
…sion §Gotcha

Closes the recurring "merge ไม่ได้ หลังแก้แล้วกลับมาเป็นอีก" pattern
surfaced 2026-05-24: PR #230 hit mergeable_state=dirty twice in a
single session — first when PR #229 (security WARN cleanup) landed
on main mid-iteration, then again when PR #232 + PR #233 (LedgerCraft
A1 + A2) landed before Mark-Ready.

ROOT CAUSE: every PR adds a "**X in flight (this PR)**" entry at
the SAME insertion line in CLAUDE.md §Phase status + AGENTS.md
§Phase + version state (just before "**Next deliverables**" /
"## Claude-Code-specific tooling"). Two parallel PRs both touch
that single line → `git merge` cannot auto-resolve two distinct
adds at the same line → `mergeable_state: dirty`. The conflict
is BENIGN (both adds are legitimate; resolution is always "keep
both in chronological order") but git can't auto-detect that.

THREE COMPANION EDITS bundled here:

(a) CLAUDE.md §Conventions — new bullet "Rebase onto origin/main
    before flipping any PR Draft → Ready". Includes the operational
    `git fetch origin main && git rebase origin/main` recipe + the
    "keep both entries in chronological order (older PR first, your
    entry second)" resolution discipline.

(b) CLAUDE.md §Gotchas — new "Parallel-PR §Phase status collision
    pattern" entry recording the recurring symptom (PR #230 ×2 on
    2026-05-24 against PR #229 + PR #232 + PR #233) + the
    local-mitigation workflow + a forward-looking structural
    follow-up note: consider moving the "in flight" sub-block to
    a side file (PHASE_STATUS_INFLIGHT.md) that's append-only-
    per-PR; on merge a housekeeping commit moves the entry into
    CLAUDE.md proper. Not yet adopted — trade-off is extra
    discipline at merge time.

(c) AGENTS.md mirror — cross-tool equivalent for Copilot / Cursor /
    Devin contributors, emphasising the rebase-before-Mark-Ready
    discipline applies to ALL agent runtimes (not Claude-Code-
    specific).

ALSO bundled as a 3rd in-flight entry in the §Phase status block
documenting THIS PR's scope expansion (Parts 1+2+3 now).

No infrastructure / CI / behavior change — pure doc discipline.
Future contributors who hit the same conflict find the §Gotchas
entry + the §Conventions recipe + AGENTS.md note and resolve in
seconds, instead of re-discovering the pattern.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup added a commit that referenced this pull request May 24, 2026
…nent 45-min simulate fix (#230)

* docs(form4): correct <rule10b5_1> → <aff10b5One> docstring precision per edgar-debugger live-XML verify

The literature-searcher proof-of-life on 2026-05-23 (session 4)
flagged a precision gap: the form4 module docstrings referenced
the colloquial <rule10b5_1> XML tag name (a label, not the actual
SEC EDGAR Ownership XML element). edgar-debugger re-audit on
2026-05-24 (session 5) fetched 3 live AAPL Form 4 XMLs from SEC
EDGAR and confirmed the actual element is <aff10b5One> at
ownershipDocument/aff10b5One — a DOCUMENT-LEVEL boolean (one per
filing, covering ALL transactions in that Form 4), NOT a
per-transaction attribute.

Updated 7 references to use the canonical name:

5 code spots:
- compute/scoring/form4_insider.py:20 (module docstring header)
- compute/scoring/form4_insider.py:82 (§Footnote resolution)
- compute/scoring/form4_insider.py:587 (inline _form4_to_transactions
  comment)
- compute/scoring/form4_signals.py:129 (module docstring)
- compute/output/schemas.py:200 (extreme_estimate_majority_count
  comment context)

2 doc spots:
- CLAUDE.md §Phase status PR #224 entry "Access-path caveat" block
  (line 1352-1359)
- AGENTS.md §Phase + version state PR #224 entry "Access-path
  caveat for cross-tool agents" block (line 947-951)

Added §Footnote resolution architectural-gap note in
compute/scoring/form4_insider.py: a filer who checks
<aff10b5One>true at the document level but does NOT include the
per-transaction footnote text (valid — the checkbox IS the formal
affirmative defense, footnotes are supplemental) will slip past
the current footnote-text fallback path and enter the
opportunistic cohort incorrectly. Deferred to a follow-up PR that
direct-parses the raw Form 4 XML for ownershipDocument/aff10b5One
(edgartools doesn't expose it, so the implementation walks the
filing text once per Ownership object). Tracked as the
architectural follow-up after PR 4-eq.

edgartools 5.31.5 verified as PyPI latest by edgar-debugger
2026-05-24 — no newer release adds a parse path for <aff10b5One>.

detect_10b5_1_plan regex set (6 patterns) confirmed complete vs
real-world footnote text — pattern "10b5-1" alone covers every
common variant including Rule 10b5-1(c) subsection cites. No
additions needed.

Docstring-only PR — no compute / scoring / valuation / behavior
change. schema_check clean (comment-only edit on schemas.py),
ruff clean, tests unchanged at 1168+. CLAUDE.md + AGENTS.md
lockstep satisfied via §Phase status in-flight notes.

Companion stale-header reword: marked PR #229 "Security WARN
cleanup" → "merged via PR #229" in CLAUDE.md + AGENTS.md.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

* ci(simulate): permanent 45-min cancellation fix — wire QR_SKIP_TIER2 + cache + path filter

The docstring fix above triggered simulate (path filter matched
compute/scoring/**) which cancelled at 45m02s on the timeout cap.
ci-triage-engineer deep-dive (session 5, 2026-05-24) identified the
recurring pattern across multiple PRs: the QR_SKIP_TIER2 kill-switch
fully wired in compute/scoring/tier2.py:158 was NEVER set in
.github/workflows/pre-merge-prod-sim.yml. Past mitigations:

- PR #165 (1y non-reliance lookback): addressed a specific
  5-year EDGAR backfill spike on one PR but left the universe-wide
  Tier-2 loop intact
- PR-form4-2 (FORM4_FETCH_SKIP=1): saved form4 budget but left
  the Tier-2 (502 tickers × 10-K text fetch + 8-K fetch) running
  unconditionally (20-35m cold-cache cost)

Recurrence tally on last 5 simulate runs:
  PR #165  19m01s  ✅ warm
  PR #204  19m37s  ✅ warm
  PR #222  16m56s  ✅ warm
  PR #224  17m08s  ✅ warm
  PR #230  45m15s  ❌ CANCELLED (cold)

Pattern is structural — when GitHub evicts the warm cache after a
7-day gap without a simulate-triggering PR, the next compute/scoring
PR hits full cold. The 4-part permanent fix:

(a) QR_SKIP_TIER2: "1" added to pre-merge-prod-sim.yml env block
    PRIMARY fix — eliminates the 20-35m cold-cache Tier-2 cost.
    Tradeoff: the non_reliance veto (enabled since PR 4g) is
    suppressed in simulate's diff output. Acceptable because
    simulate is informational-only (per workflow comment line 24);
    non_reliance veto correctness is covered by offline pytest.

(b) compute/cache/edgar_form4 added to BOTH workflows' cache paths
    (compute-rankings.yml save+restore, pre-merge-prod-sim.yml
    restore only). Future-proofs for Phase 4.5e PR 5 when form4
    weight promotion enables FORM4_FETCH_SKIP=0 on simulate.

(c) Path filter widened in pre-merge-prod-sim.yml from
    compute/scoring/** + compute/features/** only to also include:
    - compute/ingest/**     (fundamentals fetcher regressions)
    - compute/valuation/**  (RIM / DCF / ensemble regressions)
    - compute/output/schemas.py  (Pydantic schema bumps)
    - compute/main.py       (orchestrator changes)
    - pyproject.toml        (dep bumps affecting compute)

(d) compute/scoring/tier2.py:154-155 docstring corrected
    The old comment said "_EIGHT_K_DEFENSES_ENABLED=False" but
    PR 4g (2026-05-17) flipped it to True. The non_reliance veto
    IS the active veto path now; updated comment explains the
    veto suppression tradeoff in simulate context.

Expected outcomes:
- Docstring-only PR in compute/scoring/ → simulate completes in
  ~12-15m warm or ~17-20m partial-cold (well under 45m cap)
- Real scoring PR → same budget; composite-score diff unchanged
- Cache eviction (7-day gap) → worst case fundamentals cold ~10-20m
  + no Tier-2 + no form4 = under cap
- Weekly cron UNAFFECTED — QR_SKIP_TIER2 not set there; full
  run continues to populate the warm cache

Companion benefit: this PR's re-pushed simulate is the LIVE
VALIDATION of the fix — first sub-45m simulate on this branch
proves the fix works.

Files touched:
- .github/workflows/pre-merge-prod-sim.yml  (env + cache + paths)
- .github/workflows/compute-rankings.yml    (cache path only)
- compute/scoring/tier2.py                  (stale comment)
- CLAUDE.md + AGENTS.md                     (§Phase status note)

YAML parse-verified; ruff clean; no compute / schema / scoring /
valuation / Python behavior change. CLAUDE.md + AGENTS.md
lockstep satisfied via the §Phase status notes.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

* docs(workflow): rebase-discipline § + parallel-PR §Phase status collision §Gotcha

Closes the recurring "merge ไม่ได้ หลังแก้แล้วกลับมาเป็นอีก" pattern
surfaced 2026-05-24: PR #230 hit mergeable_state=dirty twice in a
single session — first when PR #229 (security WARN cleanup) landed
on main mid-iteration, then again when PR #232 + PR #233 (LedgerCraft
A1 + A2) landed before Mark-Ready.

ROOT CAUSE: every PR adds a "**X in flight (this PR)**" entry at
the SAME insertion line in CLAUDE.md §Phase status + AGENTS.md
§Phase + version state (just before "**Next deliverables**" /
"## Claude-Code-specific tooling"). Two parallel PRs both touch
that single line → `git merge` cannot auto-resolve two distinct
adds at the same line → `mergeable_state: dirty`. The conflict
is BENIGN (both adds are legitimate; resolution is always "keep
both in chronological order") but git can't auto-detect that.

THREE COMPANION EDITS bundled here:

(a) CLAUDE.md §Conventions — new bullet "Rebase onto origin/main
    before flipping any PR Draft → Ready". Includes the operational
    `git fetch origin main && git rebase origin/main` recipe + the
    "keep both entries in chronological order (older PR first, your
    entry second)" resolution discipline.

(b) CLAUDE.md §Gotchas — new "Parallel-PR §Phase status collision
    pattern" entry recording the recurring symptom (PR #230 ×2 on
    2026-05-24 against PR #229 + PR #232 + PR #233) + the
    local-mitigation workflow + a forward-looking structural
    follow-up note: consider moving the "in flight" sub-block to
    a side file (PHASE_STATUS_INFLIGHT.md) that's append-only-
    per-PR; on merge a housekeeping commit moves the entry into
    CLAUDE.md proper. Not yet adopted — trade-off is extra
    discipline at merge time.

(c) AGENTS.md mirror — cross-tool equivalent for Copilot / Cursor /
    Devin contributors, emphasising the rebase-before-Mark-Ready
    discipline applies to ALL agent runtimes (not Claude-Code-
    specific).

ALSO bundled as a 3rd in-flight entry in the §Phase status block
documenting THIS PR's scope expansion (Parts 1+2+3 now).

No infrastructure / CI / behavior change — pure doc discipline.
Future contributors who hit the same conflict find the §Gotchas
entry + the §Conventions recipe + AGENTS.md note and resolve in
seconds, instead of re-discovering the pattern.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

* ci(simulate): Part 4 — wire QR_SKIP_FUNDAMENTALS to actually close the 45m cap

Part 2's QR_SKIP_TIER2 fix was necessary but NOT sufficient. Re-pushing
the rebased branch on 2026-05-24 07:06 UTC still hit simulate cancelled
at 45m15s.

ci-triage-engineer deep-dive #2 identified the gap: compute/main.py
has THREE independent SEC EDGAR ThreadPoolExecutor loops, and Part 2
only killed ONE (Step 4b — Tier-2):

  Step 2 (compute/main.py:717) — fundamentals snapshot (502 tickers ×
    SEC companyfacts XBRL); NOT skipped by QR_SKIP_TIER2
  Step 3 (compute/main.py:785) — annual fundamentals history; same
  Step 4b (compute/main.py:972) — Tier-2 / 8-K; killed by Part 2

Cold-cache cost of Steps 2+3 alone is 25-50m per CLAUDE.md §Gotchas —
enough to fill the entire 45m simulate budget. Worse, the cache freshness
gate at compute/ingest/fundamentals.py:917 (_is_fresh()) checks the
filed_date INSIDE each cached parquet against FUNDAMENTALS_REFETCH_DAYS=45.
On a partial cache hit, any ticker with a > 45d-old most-recent filing
still triggers a live EDGAR round-trip.

Part 4 fix — QR_SKIP_FUNDAMENTALS=1 escape hatch:

(a) compute/ingest/fundamentals.py:fetch_fundamentals — env-var check
    added at the top of the function (BEFORE _require_identity()), so
    the cached snapshot is returned unconditionally when set, without
    a freshness check OR an EDGAR identity requirement. Falls through
    to live fetch if no cache exists (cold runner has nothing to read).

(b) compute/ingest/fundamentals.py:fetch_fundamentals_history — mirror
    pattern: returns the cached annual parquet without the 180d age
    check when QR_SKIP_FUNDAMENTALS=1 AND force_refresh=False. Same
    fallthrough.

(c) .github/workflows/pre-merge-prod-sim.yml — adds QR_SKIP_FUNDAMENTALS:
    "1" to the env block alongside QR_SKIP_TIER2 and FORM4_FETCH_SKIP.
    Inline comment explains the root cause + the safety reasoning.

SAFE for simulate because:
- The workflow's purpose is composite-score-DIFF detection vs main's
  COMMITTED rankings.json
- Both sides (PR branch + main) were produced from the same upstream
  fundamentals input (the cache the weekly cron wrote)
- Using that cache without re-fetch is the CORRECT input for the diff
- It's NOT a quality compromise — re-fetching live would give the PR
  branch newer data than main has, making the diff confusing rather
  than accurate

Weekly cron (compute-rankings.yml) does NOT set QR_SKIP_FUNDAMENTALS —
full live fetch still runs there and populates the warm cache for
future simulate restores.

Expected outcome: simulate completes in 8-12m on a cache-hit restore
(no live SEC fetch in any of Steps 2, 3, or 4b). Live validation:
re-push after this commit → simulate green under 25m on this PR.

Tests verified: tests/test_features/test_fundamentals.py +
tests/test_workflow_cache_coverage.py = 24 pass / 6 skipped. ruff
clean. YAML parse-verified. schema_check in sync.

CLAUDE.md §Phase status + AGENTS.md §Phase + version state both
updated with the Part 4 explanation. Future contributors editing
pre-merge-prod-sim.yml should now keep all THREE escape-hatch
env vars set together: FORM4_FETCH_SKIP + QR_SKIP_TIER2 +
QR_SKIP_FUNDAMENTALS.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

---------

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 24, 2026
…fix for parallel-PR collision) (#237)

Closes the structural follow-up tracked in PR #230's §Gotchas
"Parallel-PR §Phase status collision pattern". PR #230 itself hit
the collision 3 times in one session while iterating on the
simulate-cap fix — empirical proof that the doc-discipline
§Convention landed in PR #230 (rebase-before-Mark-Ready) is
necessary but NOT sufficient when 5+ PRs land on main in a single
hour.

Root cause (pre-2026-05-24): every PR was required to add a
"**X in flight (this PR)**" bullet to CLAUDE.md §Phase status +
AGENTS.md §Phase + version state per §Conventions "ship with every
PR" lockstep. Both blocks were append-shaped, with the new bullet
inserted at the SAME line (just before "**Next deliverables**" /
"## Claude-Code-specific tooling"). Two PRs opened in parallel
both touched that single anchor line → `mergeable_state: dirty` →
recurring "merge ไม่ได้ หลังแก้แล้วกลับมาเป็นอีก" frustration.

The conflict was BENIGN (both PRs added distinct entries at the
same insertion line; resolution was always "keep both") but
`git merge` cannot auto-detect that. PR #230 hit this 3 times:
- vs PR #229 (security WARN cleanup) mid-iteration
- vs PR #232 + #233 (LedgerCraft A1 + A2) before Mark-Ready
- vs PR #234 + #235 + #236 (LedgerCraft A3 + B1 + B2+B3+B4)
  during the simulate-fix re-push loop

Structural fix:

(1) NEW FILE — `PHASE_STATUS_INFLIGHT.md` at the repo root. Each
    PR appends its in-flight entry to the END of this file (the
    "In flight (current)" sub-section). Parallel PRs both append
    at disjoint last-lines → `git merge` auto-resolves → no more
    `mergeable_state: dirty` from this pattern.

(2) CLAUDE.md §Conventions — "ship with every PR" rule updated to
    point at PHASE_STATUS_INFLIGHT.md as the canonical destination
    for in-flight entries. Substance updates to CLAUDE.md / AGENTS.md
    sections still land directly (they're rare + cross-PR-coupled
    by nature).

(3) CLAUDE.md §Gotchas "Parallel-PR §Phase status collision pattern"
    marked as RESOLVED 2026-05-24 with a back-reference to the
    side-file adoption. The historical-symptom record stays for
    future readers' context.

(4) AGENTS.md mirror — same updates for cross-tool agents (Copilot /
    Cursor / Devin / VS Code Agent Mode). The side-file works
    regardless of which agent runtime authors the PR.

Trade-offs:

- Housekeeping workflow to drain merged entries from
  PHASE_STATUS_INFLIGHT.md into CLAUDE.md §Phase status proper
  DEFERRED — manual cleanup is fine for the first few weeks while
  the pattern proves itself. `tools/housekeep_phase_status.py`
  may land later once volume + shape are known.
- The rebase-before-Mark-Ready discipline (PR #230 §Conventions
  bullet) still applies as a backstop for OTHER conflict surfaces
  (shared code edits, workflow YAML, schema bumps) — it's no
  longer the recurring drag it used to be, but still good
  discipline.

No compute / schema / scoring / valuation / frontend / Python /
TS code change. Doc-only PR. ruff clean. schema_check clean.
CLAUDE.md + AGENTS.md lockstep satisfied via §Conventions +
§Gotchas substance updates (the "ship with every PR" rule now
points at the new file as the canonical destination, and this
PR's in-flight entry lives in PHASE_STATUS_INFLIGHT.md per the
new convention — dogfooding from commit #1).

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants