Add /backdate-program and /review-program commands by hua7450 · Pull Request #84 · PolicyEngine/policyengine-claude

hua7450 · 2026-02-26T04:56:37Z

Summary

Adds two new multi-agent commands, adds in_effect and regional_in_effect patterns to the parameter and variable skills, and bumps the version to 3.11.0.

New Commands

`/review-program` — Consolidated PR Review

Replaces running /review-pr and /audit-state-tax separately: a single command runs code validation and the PDF audit in one pass.

Phase 0: Parse arguments, resolve PR number, ask posting mode
Phases 1-2: Gather PR context and acquire PDFs (always on, run in parallel)
Phase 3: Map files to topics, plan agent split (large PDF splitting: max ~40 pages per agent)
Phase 4: Parallel execution — 4 code validators + 2-5 PDF audit agents simultaneously
Phase 5: Two-stage mismatch verification:
- Step 5C: Code-path tracing. The verifier greps for the parameter's usages, traces the call chain, and checks whether the parameter is reachable in the target year, filtering out false positives from in_effect gates and deprecated branches
- Step 5D: 600 DPI visual verification, run only for mismatches that 5C marked CONFIRMED or INCONCLUSIVE
Phase 6: Consolidation (single agent merges all findings, deduplicates)
Phase 7: Post to GitHub via --body-file or display locally

Main Claude acts purely as an orchestrator: it reads only short summary files (≤30 lines) and never reads diffs, PDFs, or agent reports directly.
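One way to enforce the ≤30-line rule is a small guard in front of every file the orchestrator opens. This is a hypothetical sketch: `read_summary` and the choice to fail hard on oversized files are assumptions for illustration, not code from this PR.

```python
from __future__ import annotations

from pathlib import Path

MAX_SUMMARY_LINES = 30  # limit stated in the PR for orchestrator-readable summaries


def read_summary(path: str | Path, max_lines: int = MAX_SUMMARY_LINES) -> str:
    """Return a summary file's text, refusing anything longer than max_lines.

    Full agent reports stay on disk; only short summaries enter the orchestrator's context.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) > max_lines:
        raise ValueError(f"{path} has {len(lines)} lines; summaries must be <= {max_lines}")
    return "\n".join(lines)
```

Failing loudly, rather than truncating, makes an agent that wrote too much visible instead of silently hiding part of its summary.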

`/backdate-program` — Historical Parameter Backdating

Multi-agent workflow to add historical date entries, fix reference quality, review formula correctness, and improve test coverage.

Phase 0: Parse args, create issue + draft PR (issue-manager), inventory files
Phase 0E: Parallel PDF research pipeline (discovery → prep → dynamic research agent spawning based on page count)
Phase 1: Consolidation + regulatory checkpoint
Phase 2: Reference & formula audit (reference-validator + program-reviewer)
Phase 3: Implementation (parameter-architect + rules-engineer with embedded in_effect/regional_in_effect patterns)
Phase 4: Tests (test-creator + edge-case-generator)
Phase 5: Validation (implementation-validator + ci-fixer + quick audit)
Phase 6: Review-fix loop (max 3 rounds): runs /review-program --local --full, fixes critical issues, commits and pushes, then re-reviews
Phase 7: Finalize — changelog, push, PR description with "Needs Human Decision" section
Phase 8: Self-learning lessons mechanism:
- Layer 1: Session checklist — fix agents append issues found/fixed, subsequent rounds read it
- Layer 2: Persistent local lessons — distilled to ~/.claude/projects/.../memory/agent-lessons.md (max 50 entries)
- Layer 3: PR to plugin repo — appends to lessons/agent-lessons.md, one open PR at a time
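Layer 2's 50-entry cap could be enforced with a helper like the one below. A minimal sketch under stated assumptions: `append_lessons` is hypothetical, each lesson is assumed to be one `- ` bullet line in agent-lessons.md, exact duplicates are skipped, and pruning is assumed to drop the oldest entries first (the PR says only that the file is pruned).

```python
from __future__ import annotations

from pathlib import Path

MAX_LESSONS = 50  # cap stated in the PR


def append_lessons(path: Path, new_lessons: list[str], max_entries: int = MAX_LESSONS) -> list[str]:
    """Append generalized lessons to a markdown bullet list and prune to max_entries.

    Assumptions: one lesson per "- " line, oldest first; pruning removes the oldest.
    """
    existing: list[str] = []
    if path.exists():
        existing = [
            line[2:].strip()
            for line in path.read_text(encoding="utf-8").splitlines()
            if line.startswith("- ")
        ]
    for lesson in new_lessons:
        lesson = lesson.strip()
        if lesson and lesson not in existing:  # skip blanks and exact duplicates
            existing.append(lesson)
    kept = existing[-max_entries:] if max_entries > 0 else []
    path.write_text("".join(f"- {item}\n" for item in kept), encoding="utf-8")
    return kept
```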

Skill Updates

parameter-patterns: Added in_effect boolean (provision gating) and regional_in_effect boolean (regional variation) patterns with CT TFA production code examples
variable-patterns: Added corresponding variable-side patterns (if p.in_effect:, if p.regional_in_effect: with select()) and comparison table of all three boolean toggle approaches
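The semantics of the two toggles can be illustrated in plain Python. This is not PolicyEngine's API: `value_at`, `param`, `benefit`, and every parameter name, date, and amount below are hypothetical, and a list comprehension stands in for the vectorized `select()` call the skill describes. The point is the split between a scalar `if` for provision gating and a per-household choice for regional amounts.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date

# Hypothetical parameters: names, dates, and amounts are illustrative only.
PARAMS: dict[str, dict[date, object]] = {
    "bonus.in_effect": {date(2000, 1, 1): False, date(2022, 7, 1): True},
    "bonus.amount": {date(2022, 7, 1): 50},
    "payment.regional_in_effect": {date(2000, 1, 1): True, date(2023, 1, 1): False},
    "payment.regional_amount": {date(2000, 1, 1): {"A": 600, "B": 550}},
    "payment.statewide_amount": {date(2023, 1, 1): 575},
}


def value_at(entries: dict[date, object], when: date) -> object:
    """Return the latest entry dated on or before `when`.

    This sketch raises when no entry covers the date, instead of
    backward-extrapolating a later value.
    """
    dates = sorted(entries)
    i = bisect_right(dates, when)
    if i == 0:
        raise LookupError(f"no entry on or before {when}")
    return entries[dates[i - 1]]


def param(name: str, when: date):
    return value_at(PARAMS[name], when)


def benefit(regions: list[str], when: date) -> list[int]:
    # Provision gating (cf. `if p.in_effect:`): one scalar flag per period, so a
    # plain `if` suffices, and bonus.amount is never read before it exists.
    bonus = param("bonus.amount", when) if param("bonus.in_effect", when) else 0
    # Regional variation (cf. `if p.regional_in_effect:` with select()): the flag
    # is scalar, but the amount is chosen household by household.
    if param("payment.regional_in_effect", when):
        by_region = param("payment.regional_amount", when)
        base = [by_region[r] for r in regions]
    else:
        base = [param("payment.statewide_amount", when)] * len(regions)
    return [b + bonus for b in base]
```

With these values, `benefit(["A", "B"], date(2021, 6, 1))` returns `[600, 550]`, and the same call for `date(2023, 3, 1)` returns `[625, 625]`.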

Agent Tool Fixes

reference-validator: Added Write tool (needs to write findings to /tmp/)
program-reviewer: Added Write tool (needs to write audit reports to /tmp/)
edge-case-generator: Added Edit tool (needs to edit existing test files)

Other Changes

Registered both commands in marketplace.json (country-models + complete plugins)
Version bump to 3.11.0

Files Changed

| File | Change |
| --- | --- |
| `commands/review-program.md` | New, 860 lines |
| `commands/backdate-program.md` | New, 1101 lines |
| `skills/.../policyengine-parameter-patterns-skill/SKILL.md` | Added ~115 lines (in_effect, regional_in_effect) |
| `skills/.../policyengine-variable-patterns-skill/SKILL.md` | Added ~120 lines (variable-side patterns + comparison table) |
| `agents/reference-validator.md` | Added Write tool |
| `agents/country-models/program-reviewer.md` | Added Write tool |
| `agents/country-models/edge-case-generator.md` | Added Edit tool |
| `.claude-plugin/marketplace.json` | Registered commands, version 3.11.0 |
| `CHANGELOG.md` | 3.11.0 entry |

🤖 Generated with Claude Code

New orchestration command that coordinates multi-agent workflows to:
- Research historical regulatory sources (parallel PDF discovery, prep, extraction)
- Audit reference quality (broken URLs, generic statutes, session law migration)
- Review formula correctness (unused params, zero-sentinel anti-patterns)
- Implement parameter backdating (YAML date entries, reference fixes)
- Run built-in /review-pr and /audit-state-tax as validation phases
- Generate comprehensive tests (transition boundaries, all dimensions)

Key design decisions:
- Main Claude only orchestrates; all work delegated to agents (context protection)
- Agent-to-agent communication via SendMessage (no Main Claude relay)
- Data flows through files on disk; Main Claude reads only short summaries
- Works for any state program (TANF, SNAP, Medicaid, etc.), not just TANF
- Incorporates lessons learned from Utah and Connecticut backdating runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…age rule
- Replace 14 generic agents with 10 specialized plugin agents: document-collector, reference-validator, program-reviewer, parameter-architect, rules-engineer, test-creator, edge-case-generator, implementation-validator, ci-fixer, pr-pusher
- Only 6 general-purpose agents remain (PDF rendering, research, consolidation)
- Add agent summary table documenting why each type was chosen
- Add towncrier changelog format (`changelog.d/<branch>.<type>.md`)
- Add global PDF page number rule (`#page=XX` required on all PDF refs, except single-page PDFs)
- Integrate /review-pr and /audit-state-tax as Phase 6 built-in review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Default remains 300 DPI. Use --600dpi for scanned docs, poor-quality PDFs, or dense tables that agents struggle to read at 300 DPI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
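A sketch of how the flag might map to a render call, assuming Poppler's `pdftoppm` does the rasterizing (the PR does not name the tool). `render_command` is hypothetical; `-r`, `-f`, `-l`, and `-png` are standard `pdftoppm` options for resolution, first page, last page, and PNG output.

```python
from __future__ import annotations

DEFAULT_DPI = 300  # default stated in the PR
HIGH_DPI = 600     # used when --600dpi is passed


def render_command(pdf: str, out_prefix: str, first: int, last: int, args: list[str]) -> list[str]:
    """Build a pdftoppm argv that renders pages first..last at the DPI the flags imply.

    Assumption: pdftoppm is the renderer; the PR does not say which tool is used.
    """
    dpi = HIGH_DPI if "--600dpi" in args else DEFAULT_DPI
    return ["pdftoppm", "-r", str(dpi), "-f", str(first), "-l", str(last), "-png", pdf, out_prefix]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; doubling the DPI roughly quadruples the pixel count per page, which is why 600 DPI stays opt-in.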

Combines code validation (4 plugin agents) and PDF audit (2-5 agents) into a single command with PDF acquisition always on by default. Updates /backdate-program Phase 6 to invoke /review-program instead of two separate commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Both commands now split large PDFs across multiple parallel agents (~40 pages max per agent). Main Claude decides agent count using only the page count number from the manifest/prep agent; it never reads PDF content itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
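Because the orchestrator sees only a page count, the split reduces to arithmetic on that number. A minimal sketch that treats the "~40 pages" cap as a hard limit; `plan_pdf_agents` is a hypothetical helper, and balancing the ranges (rather than cutting fixed 40-page blocks) is a design choice made here, not something the PR specifies.

```python
import math

MAX_PAGES_PER_AGENT = 40  # assumption: the "~40 pages" cap treated as a hard limit


def plan_pdf_agents(page_count: int, max_pages: int = MAX_PAGES_PER_AGENT) -> list[tuple[int, int]]:
    """Split pages 1..page_count into contiguous, inclusive (first, last) ranges.

    Only the page count is used, so no PDF content is read. Ranges are balanced
    so that no agent gets a small leftover chunk.
    """
    if page_count <= 0:
        return []
    n_agents = math.ceil(page_count / max_pages)
    base, extra = divmod(page_count, n_agents)
    ranges = []
    first = 1
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first agents
        ranges.append((first, first + size - 1))
        first += size
    return ranges
```

For a 95-page manual this yields three agents covering pages 1-32, 33-64, and 65-95, instead of 40, 40, and 15 pages.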

- Add ORCHESTRATOR ONLY section with explicit MUST NOT / DO rules
- Phase 1: delegate diff analysis to general-purpose agent (writes context summary to disk); Main Claude only runs gh commands + saves diff to file
- Phase 3: Main Claude reads only two short summaries (context + manifest)
- Phase 5C: delegate 600 DPI mismatch verification to agents
- Phase 5D: delegate page number verification to agents
- Phase 7: use gh pr comment --body-file (no file read into context); local mode uses display-agent to present report
- Fix Explore → general-purpose for agents that need Write tool
- Fix stale task table reference in backdate-program

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
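The Phase 7 change keeps the report out of the orchestrator's context by letting `gh` read it from disk. A hypothetical sketch of the two commands involved: `gh pr comment --body-file` and `gh pr edit --body-file` are real GitHub CLI options, while the function names and report paths are illustrative.

```python
from __future__ import annotations


def post_command(pr_number: int, report_path: str, local: bool) -> list[str] | None:
    """Return the gh argv that posts the review report, or None in local mode.

    With --body-file, gh reads the report from disk, so its text never has to
    pass through the orchestrator.
    """
    if local:
        return None  # local mode: a display agent presents the report instead
    return ["gh", "pr", "comment", str(pr_number), "--body-file", report_path]


def edit_description_command(pr_number: int, body_path: str) -> list[str]:
    """Return the gh argv that replaces the PR description with a file's contents."""
    return ["gh", "pr", "edit", str(pr_number), "--body-file", body_path]
```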

Phase 6 now runs /review-program and fixes critical issues in a loop until zero critical issues remain (max 3 rounds). Round 2+ asks user before continuing. Each round: full review → fix criticals → run tests → re-review. Catches regressions from fixes and cascading issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
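The loop's control flow, sketched with the review, fix, and user prompt injected as callables so it can be tested without GitHub. `review_fix_loop` is hypothetical, and counting a "round" as one fix-and-re-review pass after an initial review is one reading of the commit message, not a confirmed detail of the command file.

```python
from __future__ import annotations

from typing import Callable

MAX_ROUNDS = 3  # cap stated in the PR


def review_fix_loop(
    review: Callable[[], int],
    fix_and_push: Callable[[], None],
    confirm: Callable[[int], bool],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[int, int]:
    """Run fix rounds until no critical issues remain, the cap is hit, or the user stops.

    review       -> runs the review and returns the number of critical issues
    fix_and_push -> fixes criticals, runs tests, then commits and pushes
                    (gh pr diff only sees code that is on the remote)
    confirm      -> asks the user whether to start round N (called for N >= 2)
    Returns (fix_rounds_run, criticals_remaining).
    """
    criticals = review()
    rounds = 0
    while criticals > 0 and rounds < max_rounds:
        if rounds >= 1 and not confirm(rounds + 1):
            break
        fix_and_push()
        rounds += 1
        criticals = review()  # re-review catches regressions introduced by the fixes
    return rounds, criticals
```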

gh pr diff fetches from GitHub remote API, so local-only commits are invisible. Each fix round now commits AND pushes so the next review round sees the updated code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Phase 0B: issue-manager finds/creates tracking issue + draft PR (runs in parallel with inventory)
- Phase 6: review-fix loop now commits + pushes between rounds (gh pr diff reads from remote, needs pushed code)
- Phase 7B: reporter writes PR description with unresolved items section for human decision-making
- Phase 7C: gh pr edit --body-file updates PR description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Group both agents under a single step with explicit "spawn in one message" instruction. Move the results collection to after both agents complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Main Claude reads only a 10-line summary with counts and the program path. The full inventory with all file paths stays on disk for agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Skills have 0% coverage for these critical patterns. Embed concrete examples from CT TFA directly into the parameter-architect (Tier A/B) and rules-engineer (Tier B/C) prompts:
- Pattern 1: in_effect boolean for provisions with a start date (parameter side + variable side with `if p.flag:`)
- Pattern 2: regional_in_effect for region-based variation (parameter side + variable side with `select()`)
- Explains when to use `if p.flag:` (scalar) vs `where()` (vectorized)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…le skills

Both skills previously only covered the flat_applies transition pattern. Now include provision gating (in_effect) and regional variation (regional_in_effect) patterns with real CT TFA production code examples and a comparison table showing when to use each of the three boolean toggle approaches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Incorporates the mismatch verification approach from PR #71 (audit-state-tax Phase 5.5). Before 600 DPI visual verification, a code-path verifier traces whether the flagged parameter is actually reachable in the target year's computation. This filters false positives from parameters gated by in_effect booleans, deprecated branches, or overriding parameters.

Phase 5 now has two-stage verification:
- Step 5C: Code-path tracing (CONFIRMED/REJECTED/INCONCLUSIVE)
- Step 5D: 600 DPI visual verification (only for CONFIRMED/INCONCLUSIVE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
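The hand-off between the two stages is a filter on the 5C verdict. A sketch with hypothetical types (`Verdict`, `Mismatch`, `needs_visual_check`); only the three verdict names come from the commit.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"
    REJECTED = "REJECTED"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class Mismatch:
    parameter: str    # hypothetical parameter path flagged by a PDF audit agent
    verdict: Verdict  # outcome of Step 5C code-path tracing


def needs_visual_check(mismatches: list[Mismatch]) -> list[Mismatch]:
    """Select the Step 5D inputs: CONFIRMED or INCONCLUSIVE mismatches only.

    REJECTED mismatches (parameter unreachable in the target year, e.g. gated off
    by in_effect or on a deprecated branch) are dropped as false positives.
    """
    return [m for m in mismatches if m.verdict is not Verdict.REJECTED]
```

Running the cheap code-path check first means the expensive 600 DPI rendering is spent only on mismatches that can actually affect the computed result.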

Three-layer learning system:
- Layer 1 (session): Fix agents in the review-fix loop append to a session checklist (/tmp/{st}-{prog}-checklist.md). Subsequent fix agents read it to avoid repeating the same mistakes within a single run.
- Layer 2 (persistent): After the workflow completes, a lesson-extractor agent generalizes session fixes into reusable rules and appends to ~/.claude/projects/.../memory/agent-lessons.md (max 50 entries, pruned).
- Layer 3 (shared): New lessons are proposed as a PR to policyengine-claude repo (lessons/agent-lessons.md). Only one open lessons PR at a time; multiple runs append to the same PR until a maintainer merges.

Implementation agents (parameter-architect, rules-engineer) now load lessons files on startup to prevent known mistakes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…m, and skill patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Added to both country-models and complete plugin command arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rogram
- reference-validator: add Write (needs to write findings to /tmp/)
- program-reviewer: add Write (needs to write audit reports to /tmp/)
- edge-case-generator: add Edit (needs to edit existing test files)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

review-program:
- Fix Global Rules numbering gap (missing rule 6)
- Fix agent count in summary (no Explore agents used)

backdate-program:
- Fix Phase 5 Quick Audit: Explore agent → general-purpose (needs Write)
- Fix Files on Disk table reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add Step 5C push-to-remote before Phase 6 review-fix loop
- Add --local-diff flag to /review-program for unpushed work
- Add --skip-pdf flag for infrastructure/refactoring PRs
- Add /tmp cleanup at start of both commands
- Make /review-program work for any PR type (scope-aware agent selection)
- Use temporary clone in Phase 8C instead of modifying plugin directory
- Add fork-based fallback when user lacks push access to plugin repo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
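How the two new flags might change what /review-program does, as a hypothetical sketch. `plan_review` and `ReviewPlan` are illustrative, and `main` as the base branch is an assumption; `gh pr diff <number>` and `git diff <base>...HEAD` are real commands. The three-dot form covers commits on HEAD since the merge base, not uncommitted edits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewPlan:
    diff_command: list[str]
    run_pdf_audit: bool


def plan_review(pr_number: int, args: list[str], base: str = "main") -> ReviewPlan:
    """Choose the diff source and whether to run the PDF audit from the flags.

    gh pr diff reads the PR from GitHub, so unpushed commits are invisible to it;
    --local-diff diffs committed changes on HEAD since its merge base with `base`.
    """
    if "--local-diff" in args:
        diff = ["git", "diff", f"{base}...HEAD"]
    else:
        diff = ["gh", "pr", "diff", str(pr_number)]
    return ReviewPlan(diff_command=diff, run_pdf_audit="--skip-pdf" not in args)
```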

When a boolean parameter (in_effect, regional_in_effect, flat_applies) changes value at date D, the gated parameters must have entries covering that date. Without this, PolicyEngine silently backward-extrapolates a later value, producing incorrect historical amounts (e.g., CT TFA FY2023 gap where regional_in_effect flipped but statewide amount.yaml started 15 months later).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
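The coverage rule can be checked mechanically: find every date where the toggle changes value, then require each gated parameter to have an entry on or before that date. A hypothetical sketch (`coverage_gaps` is not from the PR); it checks every gated parameter at every change, which may over-flag a parameter that only matters on one side of the switch. The example dates in the test are made up to mirror the 15-month gap described, not the real CT TFA dates.

```python
from __future__ import annotations

from datetime import date


def coverage_gaps(
    toggle: dict[date, bool], gated: dict[str, dict[date, object]]
) -> list[tuple[str, date]]:
    """Return (parameter, change_date) pairs where a gated parameter has no entry
    dated on or before a date at which the toggle changes value."""
    changes = []
    previous = None
    for d in sorted(toggle):
        if previous is not None and toggle[d] != previous:
            changes.append(d)  # the toggle flips on this date
        previous = toggle[d]
    gaps = []
    for name, entries in gated.items():
        first = min(entries) if entries else None  # earliest entry date
        for d in changes:
            if first is None or first > d:  # a later value would be back-extrapolated to d
                gaps.append((name, d))
    return gaps
```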

Plugin agents have the Skill tool but workflow prompts never told them to load relevant skills. Added explicit Load skills instructions to:

review-program:
- Validator 1 (program-reviewer): variable-patterns, parameter-patterns
- Validator 2 (reference-validator): parameter-patterns
- Validator 3 (implementation-validator): variable-patterns, parameter-patterns, code-style, period-patterns
- Validator 4 (edge-case-generator): testing-patterns, period-patterns

backdate-program:
- edge-case-generator (Phase 4B): testing-patterns, period-patterns
- implementation-validator (Phase 5A): variable-patterns, parameter-patterns, code-style, period-patterns
- review-fixer rules-engineer (Phase 6C): explicit skills replacing vague "Load appropriate skills"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PDF audit agents, code-path verifiers, and visual mismatch verifiers are general-purpose agents with inline prompts. Without skills loaded, they lack knowledge of PolicyEngine parameter structure, variable patterns, period handling, and boolean toggle patterns, making them more likely to produce false positives or miss real issues.
- pdf-audit-{topic}: parameter-patterns, period-patterns
- verifier-codepath-{N}: variable-patterns, parameter-patterns, period-patterns, code-style
- verifier-mismatch-{N}: parameter-patterns, period-patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PolicyEngine-US has hundreds of existing variables for common concepts (fpg, smi, tanf_fpg, is_tanf_enrolled, ssi, etc.). Agents should search the codebase before creating new non-program-specific variables.

Implementation side (backdate-program):
- parameter-architect, rules-engineer, review-fixer: "Grep the codebase before creating ANY non-program-specific variable"

Validation side (both commands):
- program-reviewer: flag reinvented variables as CRITICAL
- implementation-validator: duplicate variable detection via Grep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
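A rough Python equivalent of the "grep before creating" check, assuming variables are declared as `class <name>(Variable):` in `.py` files, as in policyengine-us. `existing_variables` and `reinvented` are hypothetical helpers; they only catch exact name matches, so a reinvented concept under a new name still needs review by a person or an agent.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches declarations such as "class tanf_fpg(Variable):" at the start of a line.
CLASS_RE = re.compile(r"^class\s+(\w+)\s*\(\s*Variable\s*\)\s*:", re.MULTILINE)


def existing_variables(root: Path) -> dict[str, Path]:
    """Map each variable class name under root to the first file that defines it."""
    found: dict[str, Path] = {}
    for path in root.rglob("*.py"):
        for name in CLASS_RE.findall(path.read_text(encoding="utf-8")):
            found.setdefault(name, path)
    return found


def reinvented(candidates: list[str], root: Path) -> list[str]:
    """Return the proposed variable names that already exist somewhere under root."""
    known = existing_variables(root)
    return [name for name in candidates if name in known]
```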

hua7450 and others added 4 commits February 25, 2026 23:56

Add --600dpi option for hard-to-read PDFs

c2d454f

hua7450 changed the title ~~Add /backdate-program command~~ Add /backdate-program and /review-program commands Feb 26, 2026

hua7450 and others added 23 commits February 26, 2026 00:31

Bump version to 3.10.0 — add /review-program and /backdate-program

8221cbb

Fix review-fix loop: commit + push between rounds

fbcac53

Clarify Phase 0B: issue-manager + inventory spawn in one message

70413b0

Split inventory into full (for agents) and summary (for orchestrator)

2b73e0f

Update 3.10.0 changelog with code-path verification, lessons mechanism, and skill patterns

dae443b

Register /backdate-program and /review-program in marketplace

8afae45

Bump version to 3.11.0 (main is already 3.10.0)

fa22762

fix

0988332

hua7450 merged commit c429185 into main Feb 26, 2026
1 check passed

hua7450 merged 27 commits into main from add-backdate-program-command
