Add clean retry quarantine #140
Merged
Pull request overview
This PR adds a “clean retry quarantine” mechanism to isolate repeated Actor/Monitor failure loops by emitting a durable retry_quarantine.json artifact, surfacing retry isolation signals via run-health reporting, and updating shipped skills/templates and docs to enforce the clean-room retry behavior.
Changes:
- Add retry isolation state tracking (`clean_retry_count`, `contaminated_retry_count`, `retry_isolation_status`) and write `retry_quarantine.json` after repeated Monitor failures.
- Extend run-health reporting/inventory and schemas to include retry quarantine and retry isolation telemetry while keeping legacy run-health validation compatible.
- Wire clean retry instructions into MAP skills/templates and update docs/learned guidance.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_map_step_runner.py | Adds coverage for retry quarantine build/validation and legacy run-health compatibility. |
| tests/test_map_orchestrator.py | Adds coverage asserting clean retry quarantine is required on repeated Monitor failures (serial + wave) and surfaced in instructions. |
| tests/test_artifact_schemas.py | Extends schema validation tests for retry quarantine and run-health additions. |
| src/mapify_cli/templates/skills/map-task/SKILL.md | Updates workflow instructions to use monitor_failed and clean retry quarantine validation. |
| src/mapify_cli/templates/skills/map-resume/SKILL.md | Adds resume guidance for clean retry quarantine flows. |
| src/mapify_cli/templates/skills/map-efficient/SKILL.md | Adds clean retry quarantine instructions to efficient workflow. |
| src/mapify_cli/templates/skills/map-debug/SKILL.md | Adds debug workflow guidance for building/using retry quarantine on repeated rejections. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Implements retry quarantine artifact build/validation and extends run-health signals/inventory. |
| src/mapify_cli/templates/map/scripts/map_orchestrator.py | Implements retry isolation tracking, writes retry quarantine artifact, and surfaces clean-retry instructions in next-step/wave/resume briefing. |
| src/mapify_cli/schemas.py | Adds JSON schema for retry quarantine and extends manifest/run-health schemas to include new fields. |
| README.md | Documents the new clean retry quarantine feature. |
| docs/USAGE.md | Documents operational behavior and validation command for clean retry quarantine. |
| docs/learned/testing-strategies.md | Adds learned checklist for testing retry isolation behavior end-to-end. |
| docs/learned/review-checks.md | Adds review checklist item ensuring clean retry is a behavior gate, not telemetry-only. |
| docs/learned/commands.md | Adds command guidance for validating/building retry quarantine. |
| docs/learned/architecture-patterns.md | Adds architecture pattern note describing quarantine after repeated failures. |
| docs/improvement-plan.md | Minor formatting change (blank line). |
| docs/improvement-loop-log.md | Logs the improvement loop entry for clean retry quarantine. |
| docs/improvement-done.md | Marks the improvement item as done and summarizes the implementation. |
| docs/ARCHITECTURE.md | Adds architectural bullet for clean retry quarantine. |
| .map/scripts/map_step_runner.py | Mirrors step runner changes for repo-local runtime scripts. |
| .map/scripts/map_orchestrator.py | Mirrors orchestrator changes for repo-local runtime scripts. |
| .claude/skills/map-task/SKILL.md | Updates Claude skill instructions to include clean retry quarantine behavior. |
| .claude/skills/map-resume/SKILL.md | Updates Claude resume instructions for clean retry quarantine. |
| .claude/skills/map-efficient/SKILL.md | Updates Claude efficient workflow instructions for clean retry quarantine. |
| .claude/skills/map-debug/SKILL.md | Updates Claude debug workflow instructions for clean retry quarantine. |
Comments suppressed due to low confidence (2)
.map/scripts/map_step_runner.py:2547
`validate_retry_quarantine()` reads the file with `path.read_text(encoding='utf-8')` but only catches `FileNotFoundError` and `(json.JSONDecodeError, OSError)`. If the file exists but contains invalid UTF-8, `read_text` raises `UnicodeDecodeError` and this will crash the command instead of returning a structured `status=error` response.
try:
    payload = json.loads(path.read_text(encoding="utf-8"))
except FileNotFoundError:
    return {
        "status": "error",
        "valid": False,
        "path": str(path),
        "errors": [f"retry quarantine not found: {path}"],
        "warnings": [],
    }
except (json.JSONDecodeError, OSError) as exc:
    return {
        "status": "error",
        "valid": False,
        "path": str(path),
        "errors": [f"cannot read retry quarantine: {exc}"],
        "warnings": [],
    }
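One way to close the gap described in this comment is to add `UnicodeDecodeError` to the second `except` clause. It is a `ValueError` subclass rather than an `OSError`, so the existing tuple does not cover it. The sketch below is a standalone illustration, not the PR's code: `load_quarantine_payload` is a hypothetical helper name, and the error shape simply mirrors the excerpt above.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_quarantine_payload(path: Path) -> tuple[Any, dict[str, Any] | None]:
    """Return (payload, None) on success or (None, error_response) on failure.

    Hypothetical helper; the real validator inlines this logic.
    """
    try:
        return json.loads(path.read_text(encoding="utf-8")), None
    except FileNotFoundError:
        message = f"retry quarantine not found: {path}"
    except (UnicodeDecodeError, json.JSONDecodeError, OSError) as exc:
        # UnicodeDecodeError covers files that exist but are not valid UTF-8.
        message = f"cannot read retry quarantine: {exc}"
    return None, {
        "status": "error",
        "valid": False,
        "path": str(path),
        "errors": [message],
        "warnings": [],
    }
```

`FileNotFoundError` stays in its own clause ahead of the tuple so that the more specific "not found" message is still produced, even though it is also an `OSError`.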
.map/scripts/map_step_runner.py:2596
`validate_retry_quarantine()` currently verifies only a subset of the entry schema (e.g., it doesn't validate `failed_attempt` type/emptiness, `rejected_assumptions`/`do_not_repeat` being arrays of strings, or `source_artifacts[*].path`/`kind` being strings). Since this validator is used as a behavior gate before clean retries and also updates `artifact_manifest.json`, it should align with `RETRY_QUARANTINE_SCHEMA` to avoid marking invalid quarantine artifacts as valid.
required_fields = {
    "subtask_id",
    "retry_count",
    "isolation_mode",
    "failed_attempt",
    "monitor_rejection_summary",
    "rejected_assumptions",
    "do_not_repeat",
    "preserved_constraints",
    "required_evidence",
    "source_artifacts",
}
for index, item in enumerate(quarantines):
    prefix = f"quarantines[{index}]"
    if not isinstance(item, Mapping):
        errors.append(f"{prefix} must be an object")
        continue
    for field_name in sorted(required_fields - set(item)):
        errors.append(f"{prefix}.{field_name} is required")
    if not isinstance(item.get("subtask_id"), str) or not item.get("subtask_id"):
        errors.append(f"{prefix}.subtask_id must be a non-empty string")
    retry_count = item.get("retry_count")
    if not isinstance(retry_count, int) or retry_count < 2:
        errors.append(f"{prefix}.retry_count must be an integer >= 2")
    if item.get("isolation_mode") != "clean_retry":
        errors.append(f"{prefix}.isolation_mode must be clean_retry")
    if not isinstance(item.get("monitor_rejection_summary"), str) or not item.get(
        "monitor_rejection_summary"
    ):
        errors.append(f"{prefix}.monitor_rejection_summary must be non-empty")
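The missing checks named in this comment could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `check_remaining_fields` and `STRING_LIST_FIELDS` are hypothetical names, and `failed_attempt` is assumed to be a non-empty string because its exact type in `RETRY_QUARANTINE_SCHEMA` is not shown in the excerpts.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Fields the schema excerpt in this review declares as arrays of strings.
STRING_LIST_FIELDS = (
    "rejected_assumptions",
    "do_not_repeat",
    "preserved_constraints",
    "required_evidence",
)


def check_remaining_fields(item: Mapping[str, Any], prefix: str) -> list[str]:
    """Hypothetical helper: type checks for fields the loop above skips."""
    errors: list[str] = []
    # Assumption: failed_attempt is a non-empty string.
    failed_attempt = item.get("failed_attempt")
    if not isinstance(failed_attempt, str) or not failed_attempt:
        errors.append(f"{prefix}.failed_attempt must be a non-empty string")
    for name in STRING_LIST_FIELDS:
        value = item.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{prefix}.{name} must be an array of strings")
    artifacts = item.get("source_artifacts")
    if not isinstance(artifacts, list):
        errors.append(f"{prefix}.source_artifacts must be an array")
        return errors
    for i, artifact in enumerate(artifacts):
        label = f"{prefix}.source_artifacts[{i}]"
        if not isinstance(artifact, Mapping):
            errors.append(f"{label} must be an object")
            continue
        for key in ("path", "kind"):
            if not isinstance(artifact.get(key), str):
                errors.append(f"{label}.{key} must be a string")
    return errors
```

The helper returns a list so its result can be appended to the validator's existing `errors` with `errors.extend(...)`. It does not enforce `additionalProperties: False` on `source_artifacts` items; full parity with the schema would need that as well.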
Comment on lines +2338 to 2347
for key in (
    "retry_count",
    "max_retries",
    "max_subtask_retry_count",
    "clean_retry_count",
    "contaminated_retry_count",
):
    value = signals.get(key)
    if key in signals and (not isinstance(value, int) or value < 0):
        errors.append(f"resiliency_signals.{key} must be a non-negative integer")
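A detail worth keeping in mind with a check like this: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is true and a value such as `True` passes as a non-negative integer. Whether that matters here depends on how the signals are produced. The sketch below shows a stricter variant; `is_non_negative_int` is a hypothetical helper, not part of the PR.

```python
def is_non_negative_int(value: object) -> bool:
    """Accept ints >= 0 but reject bools, which are an int subclass."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0
```

With such a helper, the condition in the loop could read `key in signals and not is_non_negative_int(value)`.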
Comment on lines +788 to +804
    "rejected_assumptions": {"type": "array", "items": {"type": "string"}},
    "do_not_repeat": {"type": "array", "items": {"type": "string"}},
    "preserved_constraints": {"type": "array", "items": {"type": "string"}},
    "required_evidence": {"type": "array", "items": {"type": "string"}},
    "source_artifacts": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "kind": {"type": "string"},
            },
            "required": ["path", "kind"],
            "additionalProperties": False,
        },
    },
},

    "updated_at": _utc_timestamp(),
    "quarantines": quarantines,
}
path.write_text(json.dumps(payload, indent=2, ensure_ascii=True), encoding="utf-8")
Summary
Validation