feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling #87
Conversation
Phase 1 (Cheap Wins):
- Add scenario_dimensions to task-decomposer test_strategy schema (happy_path, error, edge_case, security)
- Add scenario dimensions requirement to map-tdd TEST_WRITER
- Add constraint schema (max_files, max_subtasks, time_budget, scope_glob) to map-plan
- Add scope_glob validation (reject .., absolute paths, brace expansion)
- Add iteration_summary.json derivation to ralph-iteration-logger
- Add conditional git history context ({{git_history}}) to actor for debug/retry/resume
Phase 2 (Core Execution Quality):
- Add guard pattern decision table to map-efficient: monitor pass + guard fail → retry Actor (max 2 rework) → escalate
- Add exit code capture (TESTS_EXIT, LINT_EXIT) to TESTS_GATE/LINTER_GATE
- Add stuck recovery at monitor retry 3: research-agent → predictor → retry 4-5
- Add intermediate recovery paths to escalation-matrix
- Add flaky-aware noise handling to final-verifier (3x re-run, 2/3 majority, confidence -= 0.1)
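The flaky-aware noise handling in the last bullet (3x re-run, 2/3 majority, confidence -= 0.1) can be sketched as a majority vote. Names and the callable interface are illustrative, not the actual final-verifier code:

```python
def rerun_flaky_check(run_test, confidence: float) -> tuple[bool, bool, float]:
    """Re-run a failing check 3x; pass on a 2/3 majority.

    run_test is a zero-arg callable returning True on pass. Returns
    (passed, flaky_detected, adjusted_confidence).
    """
    results = [run_test() for _ in range(3)]
    passes = sum(results)
    passed = passes >= 2
    # Mixed results across re-runs mean the check is flaky.
    flaky = 0 < passes < 3
    if flaky:
        confidence = round(confidence - 0.1, 2)
    return passed, flaky, confidence
```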
Workflow gate rewrite:
- Rewrite workflow-gate.py from completed_steps (workflow_state.json) to phase-based enforcement (step_state.json)
- Fix chicken-and-egg: Actor agents can now Edit during ACTOR/APPLY/TEST_WRITER phases
- Fix parallel wave mode: check subtask_phases dict instead of disabling gate
- Add constraint enforcement (scope_glob, time_budget) as hook-level deny
- Remove max_files evidence scanning (too expensive per hook call)
- Rewrite tests for new phase-based logic (24 passed)
All checks pass: ruff clean, mypy clean, 676 passed / 1 skipped.
Pull request overview
This PR updates the MAP Framework templates to improve workflow execution quality and reliability, including phase-based workflow gating (step_state.json), constraint enforcement, and several orchestration/agent protocol enhancements (guard pattern, stuck recovery, flaky/noise handling, scenario dimensions, and iteration summaries).
Changes:
- Rewrite workflow gate enforcement to be phase-based (step_state.json) with added constraint checks (scope_glob, time_budget), plus rewritten tests.
- Add iteration_summary.json derivation to the Ralph iteration logger.
- Update MAP command/agent templates to incorporate scenario dimensions, guard/stuck recovery guidance, flaky-test handling, and conditional git history context.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| tests/test_workflow_gate.py | Rewritten tests for phase-based gate behavior, branch scoping, and constraint enforcement. |
| src/mapify_cli/templates/settings.json | Expands Claude Code permissions allowlist for editing .claude/* paths. |
| src/mapify_cli/templates/references/escalation-matrix.md | Documents stuck recovery at retry 3 and guard-pattern escalation policy. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Adds/adjusts type annotations for issue JSON defaults. |
| src/mapify_cli/templates/hooks/workflow-gate.py | Replaces step-completion gating with phase-based gating + constraints. |
| src/mapify_cli/templates/hooks/ralph-iteration-logger.py | Adds best-effort iteration_summary.json derivation. |
| src/mapify_cli/templates/commands/map-tdd.md | Adds scenario-dimension coverage requirements to TDD phase instructions. |
| src/mapify_cli/templates/commands/map-plan.md | Adds constraints schema + validation guidance for scope_glob before state init. |
| src/mapify_cli/templates/commands/map-efficient.md | Adds stuck recovery pseudocode, guard pattern decision table, and captures exit codes for gates. |
| src/mapify_cli/templates/agents/task-decomposer.md | Extends required JSON schema with test_strategy.scenario_dimensions. |
| src/mapify_cli/templates/agents/final-verifier.md | Adds flaky/noise handling protocol and confidence adjustment rules. |
| src/mapify_cli/templates/agents/actor.md | Adds conditional {{git_history}} context consumption guidance. |
| .claude/settings.json | Mirrors template settings allowlist updates for .claude/* edits/writes. |
| .claude/references/escalation-matrix.md | Mirrors escalation matrix updates. |
| .claude/hooks/workflow-gate.py | Mirrors phase-based gating + constraints logic. |
| .claude/hooks/ralph-iteration-logger.py | Mirrors iteration_summary.json derivation. |
| .claude/commands/map-tdd.md | Mirrors scenario-dimension rules addition. |
| .claude/commands/map-plan.md | Mirrors constraints schema + validation steps addition. |
| .claude/commands/map-efficient.md | Mirrors stuck recovery + guard decision documentation updates. |
| .claude/agents/task-decomposer.md | Mirrors decomposer schema extension. |
| .claude/agents/final-verifier.md | Mirrors flaky/noise handling protocol updates. |
| .claude/agents/actor.md | Mirrors conditional git history guidance. |
Comments suppressed due to low confidence (2)
.claude/hooks/workflow-gate.py:45
`extract_target_file_paths` only considers `file_path` (and `edits[].file_path`). Other hooks in this repo also accept `tool_input['path']`; if Claude Code sends `path` here, exemption and scope_glob checks won't run for that tool call. Consider supporting both keys (and the same inside `edits`) to avoid an enforcement bypass.
```python
def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from tool call payload."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []
    paths: list[str] = []
    direct = tool_input.get("file_path")
    if isinstance(direct, str) and direct.strip():
        paths.append(direct)
```
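A sketch of the fix the reviewer suggests, accepting both keys at the top level and inside `edits[]`. The payload shape is assumed from the excerpt above, not the shipped code:

```python
def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from a tool call payload, accepting both
    'file_path' and 'path' keys (and the same keys inside edits[])."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []
    paths: list[str] = []
    for key in ("file_path", "path"):
        value = tool_input.get(key)
        if isinstance(value, str) and value.strip():
            paths.append(value)
    for edit in tool_input.get("edits") or []:
        if isinstance(edit, dict):
            for key in ("file_path", "path"):
                value = edit.get(key)
                if isinstance(value, str) and value.strip():
                    paths.append(value)
    return paths
```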
src/mapify_cli/templates/hooks/workflow-gate.py:45
`extract_target_file_paths` only considers `file_path` (and `edits[].file_path`). Other hooks in this repo also accept `tool_input['path']`; if Claude Code sends `path` here, exemption and scope_glob checks won't run for that tool call. Consider supporting both keys (and the same inside `edits`) to avoid an enforcement bypass. (This mirrored copy contains the identical snippet quoted above.)
```python
for tp in target_paths:
    try:
        rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
    except ValueError:
        rel = tp
```
scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the `ValueError` case you fall back to `rel = tp`, which can still match the glob even when `Path(tp).resolve()` is outside `Path.cwd()`, allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing `rel` from the resolved path only) so scope_glob cannot be circumvented.
```diff
-for tp in target_paths:
-    try:
-        rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
-    except ValueError:
-        rel = tp
+repo_root = Path.cwd().resolve()
+for tp in target_paths:
+    resolved = Path(tp).resolve()
+    try:
+        rel_path = resolved.relative_to(repo_root)
+    except ValueError:
+        # Path resolves outside the repository root: treat as out of scope
+        return (
+            f"Constraint: scope_glob='{scope_glob}'\n"
+            f"File '{tp}' resolves outside repository root."
+        )
+    rel = str(rel_path)
```
.claude/commands/map-tdd.md
Outdated
```
8. Cover scenario dimensions from test_strategy: write tests for at minimum
   happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
   Each dimension should have at least one dedicated test or test case.
8. Test files MUST be lint-clean. Use proper imports at the top of the file
```
The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.
```diff
-8. Test files MUST be lint-clean. Use proper imports at the top of the file
+9. Test files MUST be lint-clean. Use proper imports at the top of the file
```
.claude/hooks/workflow-gate.py
Outdated
```python
for tp in target_paths:
    try:
        rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
    except ValueError:
        rel = tp
```
scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the `ValueError` case you fall back to `rel = tp`, which can still match the glob even when `Path(tp).resolve()` is outside `Path.cwd()`, allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing `rel` from the resolved path only) so scope_glob cannot be circumvented.
```diff
-for tp in target_paths:
-    try:
-        rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
-    except ValueError:
-        rel = tp
+repo_root = Path.cwd().resolve()
+for tp in target_paths:
+    resolved = Path(tp).resolve()
+    try:
+        rel_path = resolved.relative_to(repo_root)
+    except ValueError:
+        # Resolved path is outside the repository root; treat as out of scope.
+        return (
+            f"Constraint: scope_glob='{scope_glob}'\n"
+            f"File '{resolved}' is outside allowed scope."
+        )
+    rel = str(rel_path)
```
```json
"Edit(.claude/agents/*)",
"Edit(.claude/commands/*)",
"Edit(.claude/hooks/*)",
"Edit(.claude/references/*)",
"Write(.claude/agents/*)",
"Write(.claude/commands/*)",
"Write(.claude/hooks/*)",
"Write(.claude/references/*)",
```
Allowing Edit/Write on .claude/hooks/* means the agent can modify safety-critical hooks (e.g., safety-guardrails.py / workflow-gate.py) and potentially disable guardrails. Consider adding explicit deny entries for at least .claude/hooks/safety-guardrails.py (and any other security hooks), or avoid allowing .claude/hooks/* broadly. Also, the hook description below still refers to Actor+Monitor steps; it should be updated to reflect the new phase/step_state.json gating.
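One way to address this, sketched against Claude Code's `permissions` settings format. The exact deny semantics and pattern matching should be confirmed against the Claude Code settings documentation; this fragment is an assumption, not the merged fix:

```json
{
  "permissions": {
    "deny": [
      "Edit(.claude/hooks/safety-guardrails.py)",
      "Write(.claude/hooks/safety-guardrails.py)",
      "Edit(.claude/hooks/workflow-gate.py)",
      "Write(.claude/hooks/workflow-gate.py)"
    ]
  }
}
```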
```
8. Cover scenario dimensions from test_strategy: write tests for at minimum
   happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
   Each dimension should have at least one dedicated test or test case.
8. Test files MUST be lint-clean. Use proper imports at the top of the file
```
The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.
```diff
-8. Test files MUST be lint-clean. Use proper imports at the top of the file
+9. Test files MUST be lint-clean. Use proper imports at the top of the file
```
```diff
 # Run tests if available (do NOT install dependencies). Capture exit code for guard pattern.
 TESTS_EXIT=0
 if [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
-    pytest
+    pytest; TESTS_EXIT=$?
 elif [ -f "package.json" ]; then
-    npm test
+    npm test; TESTS_EXIT=$?
 elif [ -f "go.mod" ]; then
-    go test ./...
+    go test ./...; TESTS_EXIT=$?
 elif [ -f "Cargo.toml" ]; then
-    cargo test
+    cargo test; TESTS_EXIT=$?
```
The guard decision table says to pass "test/lint stderr output" to the Actor on guard failures, but the TESTS_GATE/LINTER_GATE snippets only capture exit codes. Consider also capturing the command output (e.g., `TEST_OUTPUT=$(pytest ... 2>&1); TESTS_EXIT=$?`, and similarly for lint) so the orchestrator actually has failure context to pass into the retry.
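A minimal sketch of what that capture could look like for the pytest branch. Variable names (`TEST_OUTPUT`, `TESTS_EXIT`) follow the review comment; the retry-handoff `echo` is illustrative:

```shell
# Capture both output and exit code so the orchestrator can pass
# failure context to the Actor on a guard failure.
TESTS_EXIT=0
if [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
    TEST_OUTPUT=$(pytest 2>&1); TESTS_EXIT=$?
fi
if [ "$TESTS_EXIT" -ne 0 ]; then
    echo "Tests failed (exit $TESTS_EXIT); passing output to Actor retry."
    # Trim to the tail so the retry prompt stays small.
    printf '%s\n' "${TEST_OUTPUT:-}" | tail -n 50
fi
```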
```python
lines = log_file.read_text(encoding="utf-8").strip().split("\n")
entries = []
for line in lines:
    if line.strip():
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue

total_entries_seen = len(entries)
if total_entries_seen == 0:
    return

# Truncate to last 50 if > 100 entries
dropped_count = 0
if total_entries_seen > 100:
    dropped_count = total_entries_seen - 50
    entries = entries[-50:]
```
`derive_summary` reads and parses the entire `iteration_log.jsonl` on every hook invocation. As the log grows, this becomes O(N) per tool call (and effectively O(N²) over a session), which can violate the "fast hook" expectation. Consider streaming and only keeping the last ~100 lines (and separately counting total lines) rather than `read_text().split()`.
(An identical `derive_summary` snippet and comment were left on the mirrored copy of this hook.)
```python
file_data: dict[str, list[float]] = {}
file_thrashing: Counter[str] = Counter()
for entry in entries:
    f = entry.get("file", "unknown")
```
derive_summary aggregates per-file stats but includes entries with file == "" (all non-Edit/Write tools log an empty file path). This will skew file_stats and thrashing counters toward an empty filename. Consider skipping entries without a non-empty file (and/or only including Edit/Write tools) when building per-file aggregates.
```diff
-    f = entry.get("file", "unknown")
+    f = entry.get("file")
+    # Skip entries without a concrete file path (e.g., non-Edit/Write tools)
+    if not f:
+        continue
```
```python
KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}
```
The new type annotations don’t match the actual JSON structure: KNOWN_ISSUES_DEFAULT/ACTIVE_ISSUES_DEFAULT issues is later treated as a list of dicts, not list[str]. Please adjust the annotations (e.g., list[dict[str, str]] / list[dict[str, object]]) or widen them to avoid misleading types.
```diff
-KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
-ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}
+KNOWN_ISSUES_DEFAULT: dict[str, list[dict[str, object]]] = {"issues": []}
+ACTIVE_ISSUES_DEFAULT: dict[str, object] = {"updated_at": "", "issues": []}
```
- workflow-gate: deny paths resolving outside repo root (symlink bypass)
- workflow-gate: remove broad .claude/hooks/* Edit/Write permission
- map-efficient: capture test/lint output ($TEST_OUTPUT, $LINT_OUTPUT), not just exit codes
- map-tdd: fix duplicate rule numbering (8 → 8, 9)
- ralph-iteration-logger: stream last 100 lines via deque (O(1) memory), skip empty file entries
- map_step_runner: fix type annotations to match actual JSON structure
research.md duplicated information already present in findings_<branch>.md and spec_<branch>.md. Removed from: map-plan command (preflight, discovery, checkpoint, distillation checklist), __init__.py (artifact list + template), README.md, and test assertion.
Design doc for major map-efficient pipeline rewrite: 11 phases → 2-3 per subtask. Key decisions:
- Remove: XML_PACKET, CONTEXT_SEARCH, evidence files, VERIFY_ADHERENCE, workflow_state.json, session-log.md, devlog-XXX.md
- Per-wave gates instead of per-subtask (with isolation strategy for multi-subtask interaction failures)
- Predictor only in stuck recovery (retry 3), not a regular phase
- State lifecycle: MONITOR_PASSED != done; wave advances only after gates pass
- New step_state.json schema: active_phase, workflow_status, subtask_files_changed, wave_states
- Full migration surface: ~50 files

Reviewed: 3 rounds, all P1/P2 resolved (guard isolation, state lifecycle, hook contracts, active_phase enum, workflow_status field).
Summary
Implements Phase 1 (Cheap Wins) and Phase 2 (Core Execution Quality) from the autoresearch analysis.
Phase 1: Cheap Wins
- Scenario dimensions in test_strategy: happy_path, error, edge_case, security
- Constraint schema (max_files, max_subtasks, time_budget, scope_glob) with validation
- iteration_summary.json derivation
- Conditional git history context ({{git_history}} for debug/retry/resume)

Phase 2: Core Execution Quality
- Guard pattern decision table (monitor pass + guard fail → retry Actor, max 2 rework, then escalate)
- Exit code capture (TESTS_EXIT, LINT_EXIT) in TESTS_GATE/LINTER_GATE
- Stuck recovery at monitor retry 3 (research-agent → predictor → retries 4-5)
- Flaky-aware noise handling in final-verifier (flaky_detected, confidence -= 0.1)

Workflow gate rewrite
- Migrated from workflow_state.json (completed_steps) to step_state.json (phase)
- Parallel wave mode checks the subtask_phases dict per-subtask instead of disabling the gate entirely

Test plan
- make check passes (ruff clean, mypy clean, 676 passed / 1 skipped)
- make sync-templates: all .claude/ ↔ src/mapify_cli/templates/ in sync
- /map-efficient on a test branch to verify guard pattern + stuck recovery in practice