Add clean retry quarantine by azalio · Pull Request #140 · azalio/map-framework

azalio · 2026-05-20T20:40:07Z

Summary

add clean-room retry isolation after repeated Monitor failures
write and validate retry_quarantine.json, surface clean retry signals in run health, and preserve legacy run-health validation compatibility
wire clean retry instructions into shipped MAP skills/templates and close improvement-plan item 2605.08563

Validation

pytest tests/test_map_orchestrator.py::TestMonitorFailed tests/test_map_orchestrator.py::TestWaveMonitorFailed tests/test_map_step_runner.py::test_write_run_health_report_creates_report_and_manifest tests/test_map_step_runner.py::test_validate_run_health_report_accepts_legacy_without_clean_retry_fields tests/test_map_step_runner.py::test_build_retry_quarantine_writes_valid_artifact tests/test_map_step_runner.py::test_validate_retry_quarantine_rejects_missing_constraints tests/test_artifact_schemas.py::test_validate_run_health_report_schema tests/test_artifact_schemas.py::test_validate_retry_quarantine_schema tests/test_template_sync.py -v
pytest tests/test_skills.py tests/test_template_sync.py -v
PYTHONPATH=src python -m mapify_cli.skill_ir src/mapify_cli/templates/skills src/mapify_cli/templates/codex/skills --format json
generated-project clean retry smoke using mapify init, monitor_failed, validate_retry_quarantine, and write_run_health_report
make lint
pytest -m "not slow"
pytest (full suite) attempted twice; both runs exceeded the 30-minute tool timeout without a deterministic assertion failure

Copilot

Pull request overview

This PR adds a “clean retry quarantine” mechanism to isolate repeated Actor/Monitor failure loops by emitting a durable retry_quarantine.json artifact, surfacing retry isolation signals via run-health reporting, and updating shipped skills/templates and docs to enforce the clean-room retry behavior.

Changes:

Add retry isolation state tracking (clean_retry_count, contaminated_retry_count, retry_isolation_status) and write retry_quarantine.json after repeated Monitor failures.
Extend run-health reporting/inventory and schemas to include retry quarantine and retry isolation telemetry while keeping legacy run-health validation compatible.
Wire clean retry instructions into MAP skills/templates and update docs/learned guidance.
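
As a rough illustration of the first change, the sketch below tracks the three fields named above across Monitor failures. The threshold of two failures mirrors the `retry_count >= 2` rule in the validator quoted later in this review. The class name, the status strings, and the counting policy are assumptions made for this example and are not taken from the PR.

```python
from dataclasses import dataclass

CLEAN_RETRY_THRESHOLD = 2  # assumption: mirrors the validator's retry_count >= 2 rule


@dataclass
class RetryIsolationState:
    """Hypothetical per-subtask state; field names follow the PR overview."""

    retry_count: int = 0
    clean_retry_count: int = 0
    contaminated_retry_count: int = 0
    retry_isolation_status: str = "not_required"  # assumed status value

    def monitor_failed(self) -> bool:
        """Record one Monitor failure; return True if the next retry must be clean."""
        self.retry_count += 1
        if self.retry_count >= CLEAN_RETRY_THRESHOLD:
            self.retry_isolation_status = "clean_retry_required"  # assumed status value
            return True
        return False

    def record_retry(self, used_quarantine: bool) -> None:
        """Count a retry as clean only if it started from the quarantine artifact."""
        if used_quarantine:
            self.clean_retry_count += 1
        else:
            self.contaminated_retry_count += 1
```

In this sketch the first failure leaves the status untouched and the second one flips it, which is the point at which an orchestrator would be expected to write retry_quarantine.json.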

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.


File	Description
tests/test_map_step_runner.py	Adds coverage for retry quarantine build/validation and legacy run-health compatibility.
tests/test_map_orchestrator.py	Adds coverage asserting clean retry quarantine is required on repeated Monitor failures (serial + wave) and surfaced in instructions.
tests/test_artifact_schemas.py	Extends schema validation tests for retry quarantine and run-health additions.
src/mapify_cli/templates/skills/map-task/SKILL.md	Updates workflow instructions to use `monitor_failed` and clean retry quarantine validation.
src/mapify_cli/templates/skills/map-resume/SKILL.md	Adds resume guidance for clean retry quarantine flows.
src/mapify_cli/templates/skills/map-efficient/SKILL.md	Adds clean retry quarantine instructions to efficient workflow.
src/mapify_cli/templates/skills/map-debug/SKILL.md	Adds debug workflow guidance for building/using retry quarantine on repeated rejections.
src/mapify_cli/templates/map/scripts/map_step_runner.py	Implements retry quarantine artifact build/validation and extends run-health signals/inventory.
src/mapify_cli/templates/map/scripts/map_orchestrator.py	Implements retry isolation tracking, writes retry quarantine artifact, and surfaces clean-retry instructions in next-step/wave/resume briefing.
src/mapify_cli/schemas.py	Adds JSON schema for retry quarantine and extends manifest/run-health schemas to include new fields.
README.md	Documents the new clean retry quarantine feature.
docs/USAGE.md	Documents operational behavior and validation command for clean retry quarantine.
docs/learned/testing-strategies.md	Adds learned checklist for testing retry isolation behavior end-to-end.
docs/learned/review-checks.md	Adds review checklist item ensuring clean retry is a behavior gate, not telemetry-only.
docs/learned/commands.md	Adds command guidance for validating/building retry quarantine.
docs/learned/architecture-patterns.md	Adds architecture pattern note describing quarantine after repeated failures.
docs/improvement-plan.md	Minor formatting change (blank line).
docs/improvement-loop-log.md	Logs the improvement loop entry for clean retry quarantine.
docs/improvement-done.md	Marks the improvement item as done and summarizes the implementation.
docs/ARCHITECTURE.md	Adds architectural bullet for clean retry quarantine.
.map/scripts/map_step_runner.py	Mirrors step runner changes for repo-local runtime scripts.
.map/scripts/map_orchestrator.py	Mirrors orchestrator changes for repo-local runtime scripts.
.claude/skills/map-task/SKILL.md	Updates Claude skill instructions to include clean retry quarantine behavior.
.claude/skills/map-resume/SKILL.md	Updates Claude resume instructions for clean retry quarantine.
.claude/skills/map-efficient/SKILL.md	Updates Claude efficient workflow instructions for clean retry quarantine.
.claude/skills/map-debug/SKILL.md	Updates Claude debug workflow instructions for clean retry quarantine.

Comments suppressed due to low confidence (2)

.map/scripts/map_step_runner.py:2547

validate_retry_quarantine() reads the file with path.read_text(encoding='utf-8') but catches only FileNotFoundError and (json.JSONDecodeError, OSError). If the file exists but contains invalid UTF-8, read_text raises UnicodeDecodeError, which none of these handlers covers, so the command crashes instead of returning a structured status=error response.

    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {
            "status": "error",
            "valid": False,
            "path": str(path),
            "errors": [f"retry quarantine not found: {path}"],
            "warnings": [],
        }
    except (json.JSONDecodeError, OSError) as exc:
        return {
            "status": "error",
            "valid": False,
            "path": str(path),
            "errors": [f"cannot read retry quarantine: {exc}"],
            "warnings": [],
        }
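
The gap described above follows from the exception hierarchy: `UnicodeDecodeError` derives from `ValueError`, not `OSError`, so neither handler catches it. A minimal sketch of one possible fix, assuming the same error shape as the snippet above; the helper name `load_quarantine_payload` and the `"status": "ok"` wrapper are hypothetical and not taken from the PR.

```python
import json
from pathlib import Path
from typing import Any


def load_quarantine_payload(path: Path) -> dict[str, Any]:
    """Hypothetical helper: return the parsed payload or a structured error."""
    try:
        return {"status": "ok", "payload": json.loads(path.read_text(encoding="utf-8"))}
    except FileNotFoundError:
        message = f"retry quarantine not found: {path}"
    except (json.JSONDecodeError, UnicodeDecodeError, OSError) as exc:
        # UnicodeDecodeError is listed explicitly because it is not an OSError.
        message = f"cannot read retry quarantine: {exc}"
    return {
        "status": "error",
        "valid": False,
        "path": str(path),
        "errors": [message],
        "warnings": [],
    }
```

`FileNotFoundError` stays in its own clause, ahead of the broader one, so the "not found" message is preserved.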

.map/scripts/map_step_runner.py:2596

validate_retry_quarantine() currently verifies only a subset of the entry schema (e.g., it doesn't validate failed_attempt type/emptiness, rejected_assumptions/do_not_repeat being arrays of strings, or source_artifacts[*].path/kind being strings). Since this validator is used as a behavior gate before clean retries and also updates artifact_manifest.json, it should align with RETRY_QUARANTINE_SCHEMA to avoid marking invalid quarantine artifacts as valid.

    required_fields = {
        "subtask_id",
        "retry_count",
        "isolation_mode",
        "failed_attempt",
        "monitor_rejection_summary",
        "rejected_assumptions",
        "do_not_repeat",
        "preserved_constraints",
        "required_evidence",
        "source_artifacts",
    }
    for index, item in enumerate(quarantines):
        prefix = f"quarantines[{index}]"
        if not isinstance(item, Mapping):
            errors.append(f"{prefix} must be an object")
            continue
        for field_name in sorted(required_fields - set(item)):
            errors.append(f"{prefix}.{field_name} is required")
        if not isinstance(item.get("subtask_id"), str) or not item.get("subtask_id"):
            errors.append(f"{prefix}.subtask_id must be a non-empty string")
        retry_count = item.get("retry_count")
        if not isinstance(retry_count, int) or retry_count < 2:
            errors.append(f"{prefix}.retry_count must be an integer >= 2")
        if item.get("isolation_mode") != "clean_retry":
            errors.append(f"{prefix}.isolation_mode must be clean_retry")
        if not isinstance(item.get("monitor_rejection_summary"), str) or not item.get(
            "monitor_rejection_summary"
        ):
            errors.append(f"{prefix}.monitor_rejection_summary must be non-empty")
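
The array-of-strings fields that the comment lists as unchecked could be covered in the same style as the loop above. A minimal sketch, assuming the four field names that the schema declares as arrays of strings; `check_string_arrays` is a hypothetical helper, and `failed_attempt` is left out because its expected type is not shown in this review.

```python
from collections.abc import Mapping
from typing import Any

# Fields the schema excerpt in this review declares as arrays of strings.
STRING_ARRAY_FIELDS = (
    "rejected_assumptions",
    "do_not_repeat",
    "preserved_constraints",
    "required_evidence",
)


def check_string_arrays(item: Mapping[str, Any], prefix: str) -> list[str]:
    """Hypothetical helper: report fields that are not lists of strings."""
    errors: list[str] = []
    for field_name in STRING_ARRAY_FIELDS:
        if field_name not in item:
            continue  # missing fields are already reported by the required-field check
        value = item[field_name]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{prefix}.{field_name} must be an array of strings")
    return errors
```

Called as `errors.extend(check_string_arrays(item, prefix))` inside the loop, this would keep the error messages in the same `quarantines[i].field` form.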


+        for key in (
+            "retry_count",
+            "max_retries",
+            "max_subtask_retry_count",
+            "clean_retry_count",
+            "contaminated_retry_count",
+        ):
            value = signals.get(key)
            if key in signals and (not isinstance(value, int) or value < 0):
                errors.append(f"resiliency_signals.{key} must be a non-negative integer")
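
One property of the check above that may matter: in Python `bool` is a subclass of `int`, so `isinstance(True, int)` is true and a JSON `true` would pass as a non-negative integer. The sketch below shows a stricter variant purely as an illustration; `is_non_negative_int` is a hypothetical helper and not part of the PR.

```python
def is_non_negative_int(value: object) -> bool:
    """Return True for integers >= 0, excluding booleans."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# The plain isinstance check accepts booleans; the stricter helper does not.
plain_check_accepts_true = isinstance(True, int) and True >= 0
strict_check_accepts_true = is_non_negative_int(True)
```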


+        "rejected_assumptions": {"type": "array", "items": {"type": "string"}},
+        "do_not_repeat": {"type": "array", "items": {"type": "string"}},
+        "preserved_constraints": {"type": "array", "items": {"type": "string"}},
+        "required_evidence": {"type": "array", "items": {"type": "string"}},
+        "source_artifacts": {
+            "type": "array",
+            "items": {
+                "type": "object",
+                "properties": {
+                    "path": {"type": "string"},
+                    "kind": {"type": "string"},
+                },
+                "required": ["path", "kind"],
+                "additionalProperties": False,
+            },
+        },
+    },
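
Read as a plain rule, the `source_artifacts` part of the schema excerpt above requires every item to be an object with exactly the string keys `path` and `kind`. A minimal hand-rolled sketch of that rule, written without a JSON Schema library; `check_source_artifacts` is a hypothetical name and the error wording is an assumption.

```python
from typing import Any

ALLOWED_KEYS = {"path", "kind"}  # "required" plus "additionalProperties": False


def check_source_artifacts(value: Any, prefix: str) -> list[str]:
    """Hypothetical helper mirroring the source_artifacts schema fragment."""
    if not isinstance(value, list):
        return [f"{prefix}.source_artifacts must be an array"]
    errors: list[str] = []
    for index, artifact in enumerate(value):
        where = f"{prefix}.source_artifacts[{index}]"
        if not isinstance(artifact, dict):
            errors.append(f"{where} must be an object")
            continue
        for key in sorted(ALLOWED_KEYS):
            if not isinstance(artifact.get(key), str):
                errors.append(f"{where}.{key} must be a string")
        for key in sorted(set(artifact) - ALLOWED_KEYS):
            errors.append(f"{where}.{key} is not allowed")
    return errors
```

Because `additionalProperties` is `False`, an otherwise valid item with an extra key is rejected, which the last loop reproduces.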


+        "updated_at": _utc_timestamp(),
+        "quarantines": quarantines,
+    }
+    path.write_text(json.dumps(payload, indent=2, ensure_ascii=True), encoding="utf-8")
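
The effect of `ensure_ascii=True` in the write above can be seen in a small round trip: non-ASCII characters are stored as `\uXXXX` escapes, so the file contains only ASCII bytes, yet parsing it restores the original text. The payload below is invented for illustration and is not a real quarantine artifact.

```python
import json
import tempfile
from pathlib import Path

# Invented sample payload containing one non-ASCII character.
payload = {"quarantines": [{"monitor_rejection_summary": "naïve fix rejected"}]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "retry_quarantine.json"
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=True), encoding="utf-8")
    raw = path.read_bytes()
    restored = json.loads(path.read_text(encoding="utf-8"))

is_pure_ascii = raw.isascii()      # non-ASCII text was escaped on write
round_trips = restored == payload  # escapes decode back to the original string
```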


Add clean retry quarantine

10fb2d8

Copilot AI review requested due to automatic review settings May 20, 2026 20:40

Copilot started reviewing on behalf of azalio May 20, 2026 20:40

Copilot AI reviewed May 20, 2026

Harden retry quarantine validation

7165954

azalio merged commit 0fcb6b3 into main May 21, 2026
6 checks passed

azalio deleted the codex/2605-08563-clean-retry branch May 21, 2026 13:28

azalio merged 2 commits into main from codex/2605-08563-clean-retry