fix(verify-output): Section B Tier-2 post-PR-#79 stale expectations (closes #117) #149
Merged
…loses #117)

helper.py Section B was hard-failing on `non_reliance_filing` and `auditor_change` fires with "expected 0; flag broken?", but PR #79 (Phase 4g, 2026-05-15) re-enabled both 8-K Tier-2 defenses by flipping `compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED = True`. Non-zero fires in the normal cohort band are EXPECTED post-4g, not bugs.

Changes:
- `section_b_tier2()` now takes `metadata` as a second parameter and replaces the hard-fail-on-any with a soft-band check against the academic cohort priors that calibrated each flag:
  * going_concern_disclosure: Mayew 2015, 1-3%; WARN > 5%
  * non_reliance_filing: Schroeder 2024, rare 4.02s; WARN > 2%
  * auditor_change: Cohen-Malloy-Nguyen 2020, 1-5%; WARN > 5%
- Regression guard inverts: if `tier2_coverage_pct` ≤ 5% (proxy for `_EIGHT_K_DEFENSES_ENABLED = False` at compute time) and a flag still fires, that's the real bug. This keeps the original "feature flag must hold" contract intact without flipping it backwards on healthy runs.
- SKILL.md Section B description + Hard contract checks updated.
- CLAUDE.md + AGENTS.md lockstep update per Rule from PR #142.

Verification on current production data (commit `3da995dc`, 502 stocks): Section A-H run: 0 failures, 0 warnings (was: 2 failures pre-fix on the stale Section B expectations).
This was referenced May 20, 2026
dackclup added a commit that referenced this pull request on May 20, 2026
…ing cross-ref (#153)

Throwaway PR to dogfood the pre-merge-prod-sim workflow (PR #148 + #149). The workflow's path filter triggers on `compute/scoring/**` or `compute/features/**`, but neither prior PR touched those paths, so the sticky-comment + diff-table composition has never run end-to-end in CI.

This PR adds a 3-line docstring cross-reference in `composite.py` to epic #150 Phase 3 (pillar correlation analysis + Quality+Profitability ROE double-counting). The cross-ref documents in code where the next structural work lives, which is useful regardless of smoke-test outcome.

Composite logic is unchanged. PHASE3_WEIGHTS unchanged. sum-to-1.0 invariant lock at composite.py:43-45 unchanged. CLAUDE.md + AGENTS.md lockstep update.

Co-authored-by: Claude <noreply@anthropic.com>
This was referenced May 20, 2026
dackclup added a commit that referenced this pull request on May 20, 2026
…+ tests (#156)

Phase 1.5: `section_j_annotate_audit()` auto-tabulates the annotate surface (`valuation_warnings` list + boolean-True `tier2_events` keys) across the universe with counts + universe-pct, sorted descending. The 2026-05-20 quarterly cohort audit (issue #130 comment) discovered 10 undocumented flags by manual inventory walk; this section automates that walk so the next quarterly review (2026-08-19) reads the table off the helper. Complements Section E (risk_flags veto totals) by covering the annotate-surface complement; dual-nature flags like `non_reliance_filing` intentionally appear in both.

Phase 1.4: `tests/test_verify_helper.py` adds a 9-test regression suite covering Section A schema reporter (happy + missing-tier2-cov warn + low-fundamentals-cov fail), the Section B 4-branch Tier-2 matrix from PR #149 (tier2_enabled × within-band vs over-band vs no-fire vs has-fire-but-disabled), and the new Section J (empty + populated with descending sort). Helper loads via importlib since it lives outside any package.

Real-data smoke (502-stock S&P 500 production output): 17 distinct valuation_warnings flags + 3 tier2_events booleans tabulated, e.g. value_trap_risk 35.1%, goodwill_heavy 17.9%, auditor_change 1.8%.

Closes epic #150 Phase 1.4 + 1.5 (Phase 1.6 tracker #155 filed 2026-05-20; Phase 1 closes on merge of this PR).

Co-authored-by: Claude <noreply@anthropic.com>
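The Section J tabulation described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the helper's actual code: the function name `tabulate_annotations` and the per-stock dict shape are assumptions. Only the field names `valuation_warnings` and `tier2_events` come from the commit message. The sketch also assumes a flag present on both surfaces of one stock is counted once for that stock.

```python
from collections import Counter
from typing import List, Tuple


def tabulate_annotations(stocks: List[dict]) -> List[Tuple[str, int, float]]:
    """Count annotate-surface flags across the universe.

    Hypothetical sketch: each stock is assumed to be a dict with an optional
    `valuation_warnings` list and an optional `tier2_events` mapping.
    Returns (flag, count, universe_pct) rows sorted by count, descending.
    """
    counts: Counter = Counter()
    for stock in stocks:
        # A set ensures each flag is counted at most once per stock.
        flags = set(stock.get("valuation_warnings") or [])
        # Only keys whose value is exactly boolean True count as fired.
        events = stock.get("tier2_events") or {}
        flags.update(key for key, value in events.items() if value is True)
        counts.update(flags)

    total = len(stocks)
    # An empty universe yields no counts, so the division below never runs.
    rows = [(flag, n, round(100.0 * n / total, 1)) for flag, n in counts.items()]
    # Descending by count; ties broken alphabetically for a stable table.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Sorting ties alphabetically is a design choice made here for reproducible output; the commit message only specifies a descending sort.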
dackclup added a commit that referenced this pull request on May 20, 2026
…ield (closes #155) (#160)

Surfaces compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED into Metadata.tier2_enabled so verify-production-output/helper.py Section B branches on the explicit flag instead of inferring from tier2_coverage_pct > 5%. A future emergency-disable PR will now show up in the verifier output instead of silently masking itself.

Schema bump 0.9.2-phase4h.2 → 0.9.3-phase4h.3. Pydantic default True for back-compat with legacy snapshots; TypeScript side optional+nullable so the stale frontend/public/data/metadata.json snapshot still casts cleanly. Helper falls back to coverage-based inference when the key is absent.

3 new regression tests (writer round-trip + Section B explicit-flag-overrides-coverage matrix). Closes the last open AC item carried forward from issue #117 (PR #149 deferred). Phase 1 of epic #150 fully closed.

https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn

Co-authored-by: Claude <noreply@anthropic.com>
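The explicit-flag-with-fallback behavior described in the commit above could look roughly like the sketch below. The function name `resolve_tier2_enabled` is hypothetical, and so are two choices: treating a `None` value like an absent key, and treating a missing `tier2_coverage_pct` as 0. Only the field names and the 5% threshold come from the commit message.

```python
from typing import Any, Mapping


def resolve_tier2_enabled(metadata: Mapping[str, Any]) -> bool:
    """Decide whether the 8-K Tier-2 defenses were enabled at compute time.

    Hypothetical sketch: prefer the explicit `tier2_enabled` field and only
    fall back to the coverage proxy when that field is absent.
    """
    explicit = metadata.get("tier2_enabled")
    if explicit is not None:
        # The explicit flag wins, even when coverage suggests otherwise.
        return bool(explicit)
    # Legacy snapshots: infer from coverage (> 5% is read as "enabled").
    # Assumption: a missing coverage value is treated as 0, i.e. disabled.
    return metadata.get("tier2_coverage_pct", 0.0) > 5.0
```

Preferring the explicit field is what lets an emergency disable surface in the verifier: with high coverage and `tier2_enabled` set to `False`, this sketch returns `False` instead of inferring a healthy run.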
Summary
Closes #117.
`verify-production-output/helper.py` Section B was hard-failing every post-PR-#79 production scan on `non_reliance_filing` and `auditor_change` fires with:

"expected 0; flag broken?"

But PR #79 (Phase 4g, 2026-05-15) re-enabled both 8-K Tier-2 defenses by flipping `_EIGHT_K_DEFENSES_ENABLED = True`. Non-zero fires in the normal cohort band are EXPECTED, not bugs. This cleanup has been pending since the first 0.9.0-phase4h scan five schema versions ago.

What changed

`section_b_tier2()` rewrite

Replaces the hard-fail-on-any with soft-band checks against the academic priors each flag was calibrated against:

- going_concern_disclosure: prior 1-3% (Mayew 2015); WARN > 5%
- non_reliance_filing: prior rare 4.02s (Schroeder 2024); WARN > 2%; hard fail if `tier2_coverage_pct ≤ 5%` and fires > 0
- auditor_change: prior 1-5% (Cohen-Malloy-Nguyen 2020); WARN > 5%; hard fail if `tier2_coverage_pct ≤ 5%` and fires > 0

Inverted regression guard

The original "feature flag must hold" contract stays intact, but it now fires in the right direction. If `_EIGHT_K_DEFENSES_ENABLED` ever flips back to `False` at compute time (proxied by `tier2_coverage_pct ≤ 5%`, since the compute layer doesn't currently emit an explicit `tier2_enabled` field), then ANY non-zero fire is a real bug. If Tier-2 is healthy, the soft-band check applies.

SKILL.md

Row 47 (the section-table) + lines 87-95 (Hard contract checks) updated to describe the post-PR-#79 reality.
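The Section B contract described above (soft-band warnings plus the inverted regression guard) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `helper.py`: the function name `check_tier2_bands`, its signature, and the message strings are hypothetical. The sketch also assumes the guard applies only to the two 8-K flags, which is how this description reads. The WARN thresholds and the 5% coverage proxy are taken from this PR.

```python
from typing import Dict, List, Mapping, Tuple

# WARN thresholds in percent of the universe, as listed in this PR.
WARN_PCT: Dict[str, float] = {
    "going_concern_disclosure": 5.0,
    "non_reliance_filing": 2.0,
    "auditor_change": 5.0,
}
# Assumption: only the two 8-K flags take part in the regression guard.
GUARDED_FLAGS = ("non_reliance_filing", "auditor_change")
# tier2_coverage_pct at or below this value is read as "defenses disabled".
DISABLED_COVERAGE_PCT = 5.0


def check_tier2_bands(
    fires: Mapping[str, int],
    metadata: Mapping[str, float],
    universe_size: int,
) -> Tuple[List[str], List[str]]:
    """Return (failures, warnings) for Tier-2 fire counts (hypothetical sketch)."""
    if universe_size <= 0:
        raise ValueError("universe_size must be positive")
    failures: List[str] = []
    warnings: List[str] = []
    looks_disabled = metadata["tier2_coverage_pct"] <= DISABLED_COVERAGE_PCT

    for flag, warn_pct in WARN_PCT.items():
        count = fires.get(flag, 0)
        if looks_disabled and flag in GUARDED_FLAGS:
            # Inverted guard: a fire while the defenses look disabled is a bug.
            if count > 0:
                failures.append(f"{flag}: {count} fires while Tier-2 looks disabled")
            continue
        # Healthy run: non-zero fires are expected; warn only above the band.
        pct = 100.0 * count / universe_size
        if pct > warn_pct:
            warnings.append(f"{flag}: {pct:.1f}% exceeds WARN band of {warn_pct:.0f}%")
    return failures, warnings
```

With this shape, a healthy run with in-band fire rates returns two empty lists, an over-band rate produces a warning rather than a failure, and any fire on a guarded flag under low coverage produces a failure.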
CLAUDE.md + AGENTS.md
Lockstep update per the convention codified in PR #142. PR #148 also moved from "in flight" → "Recently merged" in CLAUDE.md.
Verification
Run on current production data (commit `3da995dc`, 502 stocks). Section A-H run: 0 failures, 0 warnings. Previously: 2 hard failures on Section B from stale expectations.

Test plan

- `ruff check .`: All checks passed
- `python .claude/skills/verify-production-output/helper.py` on current production data: 0 failures, 0 warnings

Paired work (separate, not in this PR)
The 2026-05-19 anchor for issue #130 (Process Hygiene Item #5, quarterly cohort-threshold review) is also due. I'll post the quarterly fire-rate audit as a comment on #130 immediately after this PR — that's a comment + issue-body table update, no code change.
Constraints honored
- `compute/` / `frontend/`
- `workflow_dispatch` trigger

Generated by Claude Code