Add clean retry quarantine #140

Merged
azalio merged 2 commits into main from codex/2605-08563-clean-retry
May 21, 2026

Conversation

@azalio (Owner) commented on May 20, 2026

Summary

  • add clean-room retry isolation after repeated Monitor failures
  • write and validate retry_quarantine.json, surface clean retry signals in run health, and preserve legacy run-health validation compatibility
  • wire clean retry instructions into shipped MAP skills/templates and close improvement-plan item 2605.08563

Validation

  • pytest tests/test_map_orchestrator.py::TestMonitorFailed tests/test_map_orchestrator.py::TestWaveMonitorFailed tests/test_map_step_runner.py::test_write_run_health_report_creates_report_and_manifest tests/test_map_step_runner.py::test_validate_run_health_report_accepts_legacy_without_clean_retry_fields tests/test_map_step_runner.py::test_build_retry_quarantine_writes_valid_artifact tests/test_map_step_runner.py::test_validate_retry_quarantine_rejects_missing_constraints tests/test_artifact_schemas.py::test_validate_run_health_report_schema tests/test_artifact_schemas.py::test_validate_retry_quarantine_schema tests/test_template_sync.py -v
  • pytest tests/test_skills.py tests/test_template_sync.py -v
  • PYTHONPATH=src python -m mapify_cli.skill_ir src/mapify_cli/templates/skills src/mapify_cli/templates/codex/skills --format json
  • generated-project clean retry smoke using mapify init, monitor_failed, validate_retry_quarantine, and write_run_health_report
  • make lint
  • pytest -m "not slow"
  • pytest attempted twice; both runs exceeded the 30-minute tool timeout without a deterministic assertion failure

Copilot AI review requested due to automatic review settings May 20, 2026 20:40

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a “clean retry quarantine” mechanism to isolate repeated Actor/Monitor failure loops by emitting a durable retry_quarantine.json artifact, surfacing retry isolation signals via run-health reporting, and updating shipped skills/templates and docs to enforce the clean-room retry behavior.

Changes:

  • Add retry isolation state tracking (clean_retry_count, contaminated_retry_count, retry_isolation_status) and write retry_quarantine.json after repeated Monitor failures.
  • Extend run-health reporting/inventory and schemas to include retry quarantine and retry isolation telemetry while keeping legacy run-health validation compatible.
  • Wire clean retry instructions into MAP skills/templates and update docs/learned guidance.
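
As a rough illustration of the artifact described above, the sketch below assembles one quarantine entry and wraps it in a payload. The field names, the `clean_retry` isolation mode, the `retry_count >= 2` floor, and the `updated_at`/`quarantines` top-level keys come from the code excerpts quoted in this review. The helper names, the placeholder values, the string type of `failed_attempt`, and the ISO timestamp format are assumptions for illustration only, not the PR's implementation.

```python
from datetime import datetime, timezone
from typing import Any


def build_quarantine_entry(
    subtask_id: str, retry_count: int, rejection_summary: str
) -> dict[str, Any]:
    """Build one quarantine entry (hypothetical helper, not the PR's code)."""
    if retry_count < 2:
        # The quoted validator requires retry_count to be an integer >= 2.
        raise ValueError("quarantine entries need retry_count >= 2")
    return {
        "subtask_id": subtask_id,
        "retry_count": retry_count,
        "isolation_mode": "clean_retry",
        # Assumption: failed_attempt is a free-text string; its real type is not shown.
        "failed_attempt": "placeholder: description of the rejected attempt",
        "monitor_rejection_summary": rejection_summary,
        "rejected_assumptions": ["placeholder: assumption the Monitor rejected"],
        "do_not_repeat": ["placeholder: approach to avoid on the clean retry"],
        "preserved_constraints": ["placeholder: constraint that still applies"],
        "required_evidence": ["placeholder: evidence the retry must produce"],
        "source_artifacts": [{"path": "placeholder/path", "kind": "placeholder"}],
    }


def build_quarantine_payload(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap entries in the top-level shape seen in the orchestrator excerpt."""
    # Assumption: an ISO 8601 UTC timestamp stands in for _utc_timestamp().
    return {
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "quarantines": entries,
    }
```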

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_map_step_runner.py | Adds coverage for retry quarantine build/validation and legacy run-health compatibility. |
| tests/test_map_orchestrator.py | Adds coverage asserting clean retry quarantine is required on repeated Monitor failures (serial + wave) and surfaced in instructions. |
| tests/test_artifact_schemas.py | Extends schema validation tests for retry quarantine and run-health additions. |
| src/mapify_cli/templates/skills/map-task/SKILL.md | Updates workflow instructions to use monitor_failed and clean retry quarantine validation. |
| src/mapify_cli/templates/skills/map-resume/SKILL.md | Adds resume guidance for clean retry quarantine flows. |
| src/mapify_cli/templates/skills/map-efficient/SKILL.md | Adds clean retry quarantine instructions to efficient workflow. |
| src/mapify_cli/templates/skills/map-debug/SKILL.md | Adds debug workflow guidance for building/using retry quarantine on repeated rejections. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Implements retry quarantine artifact build/validation and extends run-health signals/inventory. |
| src/mapify_cli/templates/map/scripts/map_orchestrator.py | Implements retry isolation tracking, writes retry quarantine artifact, and surfaces clean-retry instructions in next-step/wave/resume briefing. |
| src/mapify_cli/schemas.py | Adds JSON schema for retry quarantine and extends manifest/run-health schemas to include new fields. |
| README.md | Documents the new clean retry quarantine feature. |
| docs/USAGE.md | Documents operational behavior and validation command for clean retry quarantine. |
| docs/learned/testing-strategies.md | Adds learned checklist for testing retry isolation behavior end-to-end. |
| docs/learned/review-checks.md | Adds review checklist item ensuring clean retry is a behavior gate, not telemetry-only. |
| docs/learned/commands.md | Adds command guidance for validating/building retry quarantine. |
| docs/learned/architecture-patterns.md | Adds architecture pattern note describing quarantine after repeated failures. |
| docs/improvement-plan.md | Minor formatting change (blank line). |
| docs/improvement-loop-log.md | Logs the improvement loop entry for clean retry quarantine. |
| docs/improvement-done.md | Marks the improvement item as done and summarizes the implementation. |
| docs/ARCHITECTURE.md | Adds architectural bullet for clean retry quarantine. |
| .map/scripts/map_step_runner.py | Mirrors step runner changes for repo-local runtime scripts. |
| .map/scripts/map_orchestrator.py | Mirrors orchestrator changes for repo-local runtime scripts. |
| .claude/skills/map-task/SKILL.md | Updates Claude skill instructions to include clean retry quarantine behavior. |
| .claude/skills/map-resume/SKILL.md | Updates Claude resume instructions for clean retry quarantine. |
| .claude/skills/map-efficient/SKILL.md | Updates Claude efficient workflow instructions for clean retry quarantine. |
| .claude/skills/map-debug/SKILL.md | Updates Claude debug workflow instructions for clean retry quarantine. |

Comments suppressed due to low confidence (2)

.map/scripts/map_step_runner.py:2547

  • validate_retry_quarantine() reads the file with path.read_text(encoding='utf-8') but only catches FileNotFoundError and (json.JSONDecodeError, OSError). If the file exists but contains invalid UTF-8, read_text raises UnicodeDecodeError and this will crash the command instead of returning a structured status=error response.
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {
            "status": "error",
            "valid": False,
            "path": str(path),
            "errors": [f"retry quarantine not found: {path}"],
            "warnings": [],
        }
    except (json.JSONDecodeError, OSError) as exc:
        return {
            "status": "error",
            "valid": False,
            "path": str(path),
            "errors": [f"cannot read retry quarantine: {exc}"],
            "warnings": [],
        }
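
One way to close the gap described in this comment is to catch `ValueError` alongside `OSError`, since both `json.JSONDecodeError` and `UnicodeDecodeError` are subclasses of `ValueError`. The following is a minimal sketch under that approach. `load_retry_quarantine`, `_error`, and the `"status": "ok"` success shape are hypothetical names chosen for illustration and are not taken from the PR.

```python
import json
from pathlib import Path
from typing import Any


def _error(path: Path, message: str) -> dict[str, Any]:
    """Structured error response in the shape quoted above."""
    return {
        "status": "error",
        "valid": False,
        "path": str(path),
        "errors": [message],
        "warnings": [],
    }


def load_retry_quarantine(path: Path) -> dict[str, Any]:
    """Read and parse the artifact, never raising on unreadable content."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        # Handled first because FileNotFoundError is itself an OSError.
        return _error(path, f"retry quarantine not found: {path}")
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return _error(path, f"cannot read retry quarantine: {exc}")
    # Assumption: the success shape is illustrative only.
    return {"status": "ok", "valid": True, "path": str(path), "payload": payload}
```

With this shape, a file containing invalid UTF-8 bytes yields the same structured `status=error` response as malformed JSON instead of an unhandled exception.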

.map/scripts/map_step_runner.py:2596

  • validate_retry_quarantine() currently verifies only a subset of the entry schema (e.g., it doesn't validate failed_attempt type/emptiness, rejected_assumptions/do_not_repeat being arrays of strings, or source_artifacts[*].path/kind being strings). Since this validator is used as a behavior gate before clean retries and also updates artifact_manifest.json, it should align with RETRY_QUARANTINE_SCHEMA to avoid marking invalid quarantine artifacts as valid.
    required_fields = {
        "subtask_id",
        "retry_count",
        "isolation_mode",
        "failed_attempt",
        "monitor_rejection_summary",
        "rejected_assumptions",
        "do_not_repeat",
        "preserved_constraints",
        "required_evidence",
        "source_artifacts",
    }
    for index, item in enumerate(quarantines):
        prefix = f"quarantines[{index}]"
        if not isinstance(item, Mapping):
            errors.append(f"{prefix} must be an object")
            continue
        for field_name in sorted(required_fields - set(item)):
            errors.append(f"{prefix}.{field_name} is required")
        if not isinstance(item.get("subtask_id"), str) or not item.get("subtask_id"):
            errors.append(f"{prefix}.subtask_id must be a non-empty string")
        retry_count = item.get("retry_count")
        if not isinstance(retry_count, int) or retry_count < 2:
            errors.append(f"{prefix}.retry_count must be an integer >= 2")
        if item.get("isolation_mode") != "clean_retry":
            errors.append(f"{prefix}.isolation_mode must be clean_retry")
        if not isinstance(item.get("monitor_rejection_summary"), str) or not item.get(
            "monitor_rejection_summary"
        ):
            errors.append(f"{prefix}.monitor_rejection_summary must be non-empty")
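
The additional checks this comment asks for could look roughly like the sketch below, which mirrors the array-of-strings and `source_artifacts` constraints from the `RETRY_QUARANTINE_SCHEMA` fragment quoted later in the thread. `check_entry_types` is a hypothetical helper, and treating `failed_attempt` as a non-empty string is an assumption, since its schema type is not shown here.

```python
from collections.abc import Mapping
from typing import Any

# Fields the quoted schema fragment declares as arrays of strings.
STRING_LIST_FIELDS = (
    "rejected_assumptions",
    "do_not_repeat",
    "preserved_constraints",
    "required_evidence",
)


def check_entry_types(item: Mapping[str, Any], prefix: str) -> list[str]:
    """Return type errors for one quarantine entry (hypothetical helper)."""
    errors: list[str] = []
    # Assumption: failed_attempt is expected to be a non-empty string.
    failed_attempt = item.get("failed_attempt")
    if not isinstance(failed_attempt, str) or not failed_attempt:
        errors.append(f"{prefix}.failed_attempt must be a non-empty string")
    for name in STRING_LIST_FIELDS:
        value = item.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{prefix}.{name} must be an array of strings")
    artifacts = item.get("source_artifacts")
    if not isinstance(artifacts, list):
        errors.append(f"{prefix}.source_artifacts must be an array")
        return errors
    for i, artifact in enumerate(artifacts):
        where = f"{prefix}.source_artifacts[{i}]"
        if not isinstance(artifact, Mapping):
            errors.append(f"{where} must be an object")
            continue
        for key in ("path", "kind"):
            if not isinstance(artifact.get(key), str):
                errors.append(f"{where}.{key} must be a string")
        # The schema sets additionalProperties to False for each artifact.
        extra = sorted(set(artifact) - {"path", "kind"})
        if extra:
            errors.append(f"{where} has unexpected keys: {', '.join(extra)}")
    return errors
```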

Comment on lines +2338 to 2347
    for key in (
        "retry_count",
        "max_retries",
        "max_subtask_retry_count",
        "clean_retry_count",
        "contaminated_retry_count",
    ):
        value = signals.get(key)
        if key in signals and (not isinstance(value, int) or value < 0):
            errors.append(f"resiliency_signals.{key} must be a non-negative integer")
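
The reviewer's remark on these lines is not preserved in this thread. Independently of it, one property of the check is worth noting: in Python `bool` is a subclass of `int`, so `True` and `False` pass `isinstance(value, int)`. The sketch below is a hypothetical stricter variant that rejects booleans. `non_negative_int_errors` is an illustrative name, and whether booleans should be rejected in these signals is an assumption.

```python
from typing import Any


def non_negative_int_errors(signals: dict[str, Any], keys: tuple[str, ...]) -> list[str]:
    """Collect errors for present keys that are not non-negative integers."""
    errors: list[str] = []
    for key in keys:
        if key not in signals:
            continue  # Absent keys stay valid, matching the quoted check.
        value = signals[key]
        # bool is an int subclass, so it is excluded explicitly here.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(f"resiliency_signals.{key} must be a non-negative integer")
    return errors
```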

Comment thread src/mapify_cli/schemas.py
Comment on lines +788 to +804
        "rejected_assumptions": {"type": "array", "items": {"type": "string"}},
        "do_not_repeat": {"type": "array", "items": {"type": "string"}},
        "preserved_constraints": {"type": "array", "items": {"type": "string"}},
        "required_evidence": {"type": "array", "items": {"type": "string"}},
        "source_artifacts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "kind": {"type": "string"},
                },
                "required": ["path", "kind"],
                "additionalProperties": False,
            },
        },
    },

Comment thread .map/scripts/map_orchestrator.py (Outdated)
        "updated_at": _utc_timestamp(),
        "quarantines": quarantines,
    }
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=True), encoding="utf-8")
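
The review remark attached to this write is not preserved in this thread, and the thread is marked outdated. As a general hardening option for artifacts that act as a behavior gate, a write like this can go through a temporary file followed by `os.replace`, so a reader never observes a half-written file. The sketch below assumes that approach. `write_json_atomically` is a hypothetical helper, not the change that was merged.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomically(path: Path, payload: dict[str, Any]) -> None:
    """Write JSON to a sibling temp file, then swap it into place."""
    text = json.dumps(payload, indent=2, ensure_ascii=True)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the leftover temp file if the write or the swap failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```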

@azalio merged commit 0fcb6b3 into main on May 21, 2026
6 checks passed
@azalio deleted the codex/2605-08563-clean-retry branch on May 21, 2026 13:28