
Add /backdate-program and /review-program commands #84

Merged: hua7450 merged 27 commits into main from add-backdate-program-command on Feb 26, 2026

@hua7450 hua7450 commented Feb 26, 2026

Summary

Adds two new multi-agent commands and updates skills with new patterns. Version bump to 3.11.0.

New Commands

/review-program — Consolidated PR Review

Replaces separate runs of /review-pr and /audit-state-tax with a single command that performs code validation and the PDF audit in one pass.

  • Phase 0: Parse arguments, resolve PR number, ask posting mode
  • Phase 1-2: Gather PR context + PDF acquisition (always on, in parallel)
  • Phase 3: Map files to topics, plan agent split (large PDF splitting: max ~40 pages per agent)
  • Phase 4: Parallel execution — 4 code validators + 2-5 PDF audit agents simultaneously
  • Phase 5: Two-stage mismatch verification:
    • Step 5C: Code-path tracing — verifier greps for parameter usage, traces call chain, checks if parameter is reachable in target year (filters false positives from in_effect gates, deprecated branches)
    • Step 5D: 600 DPI visual verification — only for CONFIRMED/INCONCLUSIVE mismatches
  • Phase 6: Consolidation (single agent merges all findings, deduplicates)
  • Phase 7: Post to GitHub via --body-file or display locally

Main Claude is a pure orchestrator — reads only short summary files (≤30 lines), never touches diffs, PDFs, or agent reports.

/backdate-program — Historical Parameter Backdating

Multi-agent workflow to add historical date entries, fix reference quality, review formula correctness, and improve test coverage.

  • Phase 0: Parse args, create issue + draft PR (issue-manager), inventory files
  • Phase 0E: Parallel PDF research pipeline (discovery → prep → dynamic research agent spawning based on page count)
  • Phase 1: Consolidation + regulatory checkpoint
  • Phase 2: Reference & formula audit (reference-validator + program-reviewer)
  • Phase 3: Implementation (parameter-architect + rules-engineer with embedded in_effect/regional_in_effect patterns)
  • Phase 4: Tests (test-creator + edge-case-generator)
  • Phase 5: Validation (implementation-validator + ci-fixer + quick audit)
  • Phase 6: Review-fix loop — runs /review-program --local --full, fixes criticals, commit+push, re-review (max 3 rounds)
  • Phase 7: Finalize — changelog, push, PR description with "Needs Human Decision" section
  • Phase 8: Self-learning lessons mechanism:
    • Layer 1: Session checklist — fix agents append issues found/fixed, subsequent rounds read it
    • Layer 2: Persistent local lessons — distilled to ~/.claude/projects/.../memory/agent-lessons.md (max 50 entries)
    • Layer 3: PR to plugin repo — appends to lessons/agent-lessons.md, one open PR at a time

Skill Updates

  • parameter-patterns: Added in_effect boolean (provision gating) and regional_in_effect boolean (regional variation) patterns with CT TFA production code examples
  • variable-patterns: Added corresponding variable-side patterns (if p.in_effect:, if p.regional_in_effect: with select()) and comparison table of all three boolean toggle approaches
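
The two boolean patterns above can be illustrated with a small sketch. It is not the CT TFA production code: `Params`, `benefit_amounts`, and every amount are hypothetical stand-ins, and plain Python lists take the place of the vectorized arrays that real variable code would pass to select().

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical stand-in for a parameter node resolved at one period."""
    in_effect: bool
    regional_in_effect: bool
    statewide_amount: float
    regional_amounts: dict[str, float]


def benefit_amounts(p: Params, regions: list[str]) -> list[float]:
    """Return one amount per household, given each household's region."""
    # Provision gating: the flag is a single scalar per period,
    # so an ordinary `if` is enough.
    if not p.in_effect:
        return [0.0] * len(regions)
    # Regional variation: the choice differs per household. Real variable
    # code would vectorize this with select(); a list comprehension
    # stands in for it here.
    if p.regional_in_effect:
        return [p.regional_amounts[r] for r in regions]
    return [p.statewide_amount] * len(regions)
```

Both flags are read as scalars because they describe the period, not the household; only the region lookup varies by household.
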

Agent Tool Fixes

  • reference-validator: Added Write tool (needs to write findings to /tmp/)
  • program-reviewer: Added Write tool (needs to write audit reports to /tmp/)
  • edge-case-generator: Added Edit tool (needs to edit existing test files)

Other Changes

  • Registered both commands in marketplace.json (country-models + complete plugins)
  • Version bump to 3.11.0

Files Changed

| File | Change |
|---|---|
| commands/review-program.md | New (860 lines) |
| commands/backdate-program.md | New (1101 lines) |
| skills/.../policyengine-parameter-patterns-skill/SKILL.md | Added ~115 lines (in_effect, regional_in_effect) |
| skills/.../policyengine-variable-patterns-skill/SKILL.md | Added ~120 lines (variable-side patterns + comparison table) |
| agents/reference-validator.md | Added Write tool |
| agents/country-models/program-reviewer.md | Added Write tool |
| agents/country-models/edge-case-generator.md | Added Edit tool |
| .claude-plugin/marketplace.json | Registered commands, version 3.11.0 |
| CHANGELOG.md | 3.11.0 entry |

🤖 Generated with Claude Code

hua7450 and others added 4 commits February 25, 2026 23:56
New orchestration command that coordinates multi-agent workflows to:
- Research historical regulatory sources (parallel PDF discovery, prep, extraction)
- Audit reference quality (broken URLs, generic statutes, session law migration)
- Review formula correctness (unused params, zero-sentinel anti-patterns)
- Implement parameter backdating (YAML date entries, reference fixes)
- Run built-in /review-pr and /audit-state-tax as validation phases
- Generate comprehensive tests (transition boundaries, all dimensions)

Key design decisions:
- Main Claude only orchestrates; all work delegated to agents (context protection)
- Agent-to-agent communication via SendMessage (no Main Claude relay)
- Data flows through files on disk; Main Claude reads only short summaries
- Works for any state program (TANF, SNAP, Medicaid, etc.), not just TANF
- Incorporates lessons learned from Utah and Connecticut backdating runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…age rule

- Replace 14 generic agents with 10 specialized plugin agents:
  document-collector, reference-validator, program-reviewer,
  parameter-architect, rules-engineer, test-creator, edge-case-generator,
  implementation-validator, ci-fixer, pr-pusher
- Only 6 general-purpose agents remain (PDF rendering, research, consolidation)
- Add agent summary table documenting why each type was chosen
- Add towncrier changelog format (changelog.d/<branch>.<type>.md)
- Add global PDF page number rule (#page=XX required on all PDF refs,
  except single-page PDFs)
- Integrate /review-pr and /audit-state-tax as Phase 6 built-in review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
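
The global PDF page number rule from this commit can be checked mechanically. The sketch below assumes references are plain URL strings and that each PDF's page count is already known; `missing_page_anchor` is a hypothetical helper, not something the plugin ships.

```python
import re

# Matches a page anchor such as "#page=14".
PAGE_ANCHOR = re.compile(r"#page=\d+")


def missing_page_anchor(url: str, page_count: int) -> bool:
    """Return True when a PDF reference lacks the required #page=XX anchor."""
    # Simplification: a reference counts as a PDF only if the part before
    # any fragment ends in ".pdf" (query strings are not handled).
    is_pdf = url.split("#", 1)[0].lower().endswith(".pdf")
    if not is_pdf or page_count == 1:
        return False  # non-PDF links and single-page PDFs are exempt
    return PAGE_ANCHOR.search(url) is None
```

For example, `missing_page_anchor("https://example.org/manual.pdf", 120)` flags the reference, while the same URL with `#page=14` appended passes.
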
Default remains 300 DPI. Use --600dpi for scanned docs, poor-quality
PDFs, or dense tables that agents struggle to read at 300 DPI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combines code validation (4 plugin agents) and PDF audit (2-5 agents) into
a single command with PDF acquisition always on by default. Updates
/backdate-program Phase 6 to invoke /review-program instead of two separate
commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hua7450 hua7450 changed the title Add /backdate-program command Add /backdate-program and /review-program commands Feb 26, 2026
hua7450 and others added 23 commits February 26, 2026 00:31
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both commands now split large PDFs across multiple parallel agents
(~40 pages max per agent). Main Claude decides agent count using only
the page count number from the manifest/prep agent — never reads PDF
content itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
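
The splitting decision in this commit needs only the page count. A minimal sketch, assuming contiguous and near-equal ranges; `plan_page_ranges` is a hypothetical name, and the sketch does not enforce the 2-5 agent range mentioned in the summary.

```python
from math import ceil


def plan_page_ranges(page_count: int, max_pages: int = 40) -> list[tuple[int, int]]:
    """Split pages 1..page_count into contiguous ranges of at most max_pages."""
    if page_count <= 0:
        return []
    agents = ceil(page_count / max_pages)
    # Spread pages evenly so no agent gets a tiny leftover range.
    base, extra = divmod(page_count, agents)
    ranges, start = [], 1
    for i in range(agents):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

A 100-page PDF yields three ranges, `(1, 34)`, `(35, 67)`, and `(68, 100)`, each of which would go to its own agent.
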
- Add ORCHESTRATOR ONLY section with explicit MUST NOT / DO rules
- Phase 1: delegate diff analysis to general-purpose agent (writes context
  summary to disk); Main Claude only runs gh commands + saves diff to file
- Phase 3: Main Claude reads only two short summaries (context + manifest)
- Phase 5C: delegate 600 DPI mismatch verification to agents
- Phase 5D: delegate page number verification to agents
- Phase 7: use gh pr comment --body-file (no file read into context);
  local mode uses display-agent to present report
- Fix Explore → general-purpose for agents that need Write tool
- Fix stale task table reference in backdate-program

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 6 now runs /review-program and fixes critical issues in a loop
until zero critical issues remain (max 3 rounds). Round 2+ asks user
before continuing. Each round: full review → fix criticals → run tests
→ re-review. Catches regressions from fixes and cascading issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
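
The control flow of the review-fix loop described in this commit might look like the sketch below. The three callbacks are hypothetical placeholders for the review run, the fix agents, and the user prompt; the real command coordinates agents rather than calling Python functions.

```python
from typing import Callable


def review_fix_loop(
    review: Callable[[], int],
    fix_criticals: Callable[[], None],
    confirm_continue: Callable[[int], bool],
    max_rounds: int = 3,
) -> int:
    """Run review rounds until no criticals remain; return the last critical count."""
    criticals = 0
    for round_number in range(1, max_rounds + 1):
        # Round 2 and later require the user's go-ahead.
        if round_number >= 2 and not confirm_continue(round_number):
            break
        criticals = review()  # full review, returns number of critical issues
        if criticals == 0:
            break
        fix_criticals()  # stands in for: fix criticals, then run tests
    return criticals
```

Returning the last critical count lets the caller decide whether unresolved items remain after the final round.
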
gh pr diff fetches from GitHub remote API, so local-only commits are
invisible. Each fix round now commits AND pushes so the next review
round sees the updated code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Phase 0B: issue-manager finds/creates tracking issue + draft PR
  (runs in parallel with inventory)
- Phase 6: review-fix loop now commits + pushes between rounds
  (gh pr diff reads from remote, needs pushed code)
- Phase 7B: reporter writes PR description with unresolved items
  section for human decision-making
- Phase 7C: gh pr edit --body-file updates PR description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Group both agents under a single step with explicit "spawn in one
message" instruction. Move the results collection to after both
agents complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main Claude reads only a 10-line summary with counts and the program
path. The full inventory with all file paths stays on disk for agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skills have 0% coverage for these critical patterns. Embed concrete
examples from CT TFA directly into the parameter-architect (Tier A/B)
and rules-engineer (Tier B/C) prompts:

- Pattern 1: in_effect boolean for provisions with a start date
  (parameter side + variable side with if p.flag:)
- Pattern 2: regional_in_effect for region-based variation
  (parameter side + variable side with select())
- Explains when to use if p.flag: (scalar) vs where() (vectorized)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…le skills

Both skills previously only covered the flat_applies transition pattern.
Now include provision gating (in_effect) and regional variation
(regional_in_effect) patterns with real CT TFA production code examples
and a comparison table showing when to use each of the three boolean
toggle approaches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Incorporates the mismatch verification approach from PR #71 (audit-state-tax
Phase 5.5). Before 600 DPI visual verification, a code-path verifier traces
whether the flagged parameter is actually reachable in the target year's
computation. This filters false positives from parameters gated by in_effect
booleans, deprecated branches, or overriding parameters.

Phase 5 now has two-stage verification:
- Step 5C: Code-path tracing (CONFIRMED/REJECTED/INCONCLUSIVE)
- Step 5D: 600 DPI visual verification (only for CONFIRMED/INCONCLUSIVE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
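
The hand-off between the two verification stages reduces to a filter on the Step 5C verdict. A minimal sketch, assuming each mismatch has a string ID; `Verdict` and `needs_visual_check` are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome of Step 5C code-path tracing."""
    CONFIRMED = "CONFIRMED"
    REJECTED = "REJECTED"
    INCONCLUSIVE = "INCONCLUSIVE"


def needs_visual_check(verdicts: dict[str, Verdict]) -> list[str]:
    """Return the mismatch IDs that proceed to Step 5D (600 DPI verification)."""
    # REJECTED means the parameter is unreachable in the target year,
    # so the mismatch is treated as a false positive and dropped.
    return [mid for mid, verdict in verdicts.items() if verdict is not Verdict.REJECTED]
```

Only REJECTED mismatches are filtered out, so an unclear trace still gets a visual check.
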
Three-layer learning system:

Layer 1 (session): Fix agents in the review-fix loop append to a session
checklist (/tmp/{st}-{prog}-checklist.md). Subsequent fix agents read it
to avoid repeating the same mistakes within a single run.

Layer 2 (persistent): After the workflow completes, a lesson-extractor
agent generalizes session fixes into reusable rules and appends to
~/.claude/projects/.../memory/agent-lessons.md (max 50 entries, pruned).

Layer 3 (shared): New lessons are proposed as a PR to policyengine-claude
repo (lessons/agent-lessons.md). Only one open lessons PR at a time —
multiple runs append to the same PR until a maintainer merges.

Implementation agents (parameter-architect, rules-engineer) now load
lessons files on startup to prevent known mistakes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
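
Layer 2's cap of 50 entries implies an append-and-prune step. The sketch below assumes one lesson per "- " bullet line and drops the oldest entries first; neither the file layout nor the pruning policy is stated in the source, and `append_lessons` is a hypothetical helper.

```python
from pathlib import Path


def append_lessons(path: Path, new_lessons: list[str], max_entries: int = 50) -> list[str]:
    """Append new lessons, skip exact duplicates, and keep the newest max_entries."""
    existing = path.read_text().splitlines() if path.exists() else []
    # Simplification: only bullet lines are kept; headings would be lost.
    entries = [line for line in existing if line.startswith("- ")]
    for lesson in new_lessons:
        bullet = f"- {lesson.strip()}"
        if bullet not in entries:
            entries.append(bullet)
    entries = entries[-max_entries:]  # assumed policy: prune oldest first
    path.write_text("\n".join(entries) + "\n")
    return entries
```

A real lesson-extractor would more likely merge or rank lessons than prune strictly by age.
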
…m, and skill patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added to both country-models and complete plugin command arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rogram

- reference-validator: add Write (needs to write findings to /tmp/)
- program-reviewer: add Write (needs to write audit reports to /tmp/)
- edge-case-generator: add Edit (needs to edit existing test files)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
review-program:
- Fix Global Rules numbering gap (missing rule 6)
- Fix agent count in summary (no Explore agents used)

backdate-program:
- Fix Phase 5 Quick Audit: Explore agent → general-purpose (needs Write)
- Fix Files on Disk table reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Step 5C push-to-remote before Phase 6 review-fix loop
- Add --local-diff flag to /review-program for unpushed work
- Add --skip-pdf flag for infrastructure/refactoring PRs
- Add /tmp cleanup at start of both commands
- Make /review-program work for any PR type (scope-aware agent selection)
- Use temporary clone in Phase 8C instead of modifying plugin directory
- Add fork-based fallback when user lacks push access to plugin repo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a boolean parameter (in_effect, regional_in_effect, flat_applies)
changes value at date D, the gated parameters must have entries covering
that date. Without this, PolicyEngine silently backward-extrapolates a
later value, producing incorrect historical amounts (e.g., CT TFA
FY2023 gap where regional_in_effect flipped but statewide amount.yaml
started 15 months later).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
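
The coverage rule in this commit can be expressed as a date comparison. The sketch assumes a parameter value applies from its entry date onward, so a flip date is covered when some gated entry starts on or before it. `uncovered_flip_dates` is a hypothetical helper, and the dates in the usage note are illustrative rather than taken from the CT TFA files.

```python
from datetime import date


def uncovered_flip_dates(flip_dates: list[date], gated_entry_dates: list[date]) -> list[date]:
    """Return boolean flip dates that precede the gated parameter's earliest entry."""
    if not gated_entry_dates:
        return list(flip_dates)
    earliest = min(gated_entry_dates)
    # A flip before the earliest entry would be served a backward-extrapolated
    # later value, which is the silent error described above.
    return [d for d in flip_dates if d < earliest]
```

With a flip on `date(2022, 7, 1)` and a gated parameter whose first entry is `date(2023, 10, 1)`, the function returns the flip date, marking a 15-month gap that needs a backdated entry.
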
Plugin agents have the Skill tool but workflow prompts never told them
to load relevant skills. Added explicit Load skills instructions to:

review-program:
- Validator 1 (program-reviewer): variable-patterns, parameter-patterns
- Validator 2 (reference-validator): parameter-patterns
- Validator 3 (implementation-validator): variable-patterns, parameter-patterns, code-style, period-patterns
- Validator 4 (edge-case-generator): testing-patterns, period-patterns

backdate-program:
- edge-case-generator (Phase 4B): testing-patterns, period-patterns
- implementation-validator (Phase 5A): variable-patterns, parameter-patterns, code-style, period-patterns
- review-fixer rules-engineer (Phase 6C): explicit skills replacing vague "Load appropriate skills"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PDF audit agents, code-path verifiers, and visual mismatch verifiers
are general-purpose agents with inline prompts. Without skills loaded,
they lack knowledge of PolicyEngine parameter structure, variable
patterns, period handling, and boolean toggle patterns — making them
more likely to produce false positives or miss real issues.

- pdf-audit-{topic}: parameter-patterns, period-patterns
- verifier-codepath-{N}: variable-patterns, parameter-patterns, period-patterns, code-style
- verifier-mismatch-{N}: parameter-patterns, period-patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PolicyEngine-US has hundreds of existing variables for common concepts
(fpg, smi, tanf_fpg, is_tanf_enrolled, ssi, etc.). Agents should
search the codebase before creating new non-program-specific variables.

Implementation side (backdate-program):
- parameter-architect, rules-engineer, review-fixer: "Grep the codebase
  before creating ANY non-program-specific variable"

Validation side (both commands):
- program-reviewer: flag reinvented variables as CRITICAL
- implementation-validator: duplicate variable detection via Grep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hua7450 hua7450 merged commit c429185 into main Feb 26, 2026
1 check passed