Skip to content

Rectify: Stdout Idle Watchdog + Bounded Suppression for Stall Detection#735

Merged
Trecek merged 6 commits intointegrationfrom
impl-rectify-idle-stall-watchdog-20260411-142908
Apr 12, 2026
Merged

Rectify: Stdout Idle Watchdog + Bounded Suppression for Stall Detection#735
Trecek merged 6 commits intointegrationfrom
impl-rectify-idle-stall-watchdog-20260411-142908

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Apr 11, 2026

Summary

A headless Claude subprocess hung for 82 minutes on a silently stalled HTTPS SSE stream. The TCP connection remained ESTABLISHED with zero traffic, so the _has_active_api_connection suppression gate in _process_monitor.py kept resetting the stale-kill timer every 2-second poll cycle. Neither Channel A nor Channel B detected the stall because Channel A only checks for valid JSONL result records (not raw byte growth), and Channel B's stale detection was suppressed by the ESTABLISHED port-443 connection.

Three layers of fix provide immunity:

  1. Stdout idle watchdog (primary) — new _watch_stdout_idle coroutine in the anyio task group monitors stdout temp file byte count. Fires TerminationReason.IDLE_STALL if no growth for idle_output_timeout seconds. NOT suppressed by active API connections — orthogonal to Channel A/B.

  2. Bounded suppression (defense-in-depth) — max_suppression_seconds caps the _has_active_api_connection and _has_active_child_processes suppression gates so they cannot defer stale kills indefinitely. Counter resets on genuine JSONL activity.

  3. Subprocess timeoutstimeout= added to all unguarded subprocess.run() and asyncio subprocess calls in remote_resolver, smoke_utils, clone, and token_summary_appender.

Changed Files

New (★)

  • tests/execution/test_process_idle_watchdog.py — watchdog coroutine unit tests
  • tests/workspace/test_clone_timeouts.py — static analysis: git network commands have timeouts

Modified (●)

  • src/autoskillit/core/_type_enums.pyIDLE_STALL members on TerminationReason and CliSubtype
  • src/autoskillit/config/settings.pyidle_output_timeout and max_suppression_seconds on RunSkillConfig
  • src/autoskillit/execution/_process_race.py_watch_stdout_idle coroutine, idle_stall field on RaceAccumulator/RaceSignals, priority update in resolve_termination
  • src/autoskillit/execution/_process_monitor.py — bounded suppression logic with suppression_start tracker
  • src/autoskillit/execution/process.py — wire watchdog into task group, IDLE_STALL kill dispatch
  • src/autoskillit/execution/headless.pyIDLE_STALL routing in _build_skill_result (retriable)
  • src/autoskillit/execution/session.py — exhaustive match arms for IDLE_STALL
  • src/autoskillit/core/_type_subprocess.py — protocol updated with new params
  • src/autoskillit/execution/recording.py — runner implementations updated
  • src/autoskillit/execution/remote_resolver.py — 15s timeout via asyncio.wait_for
  • src/autoskillit/smoke_utils.pytimeout=60 on all subprocess calls
  • src/autoskillit/workspace/clone.pytimeout=300 (clone), timeout=120 (push)
  • src/autoskillit/hooks/token_summary_appender.pytimeout=30 on gh api calls

Test Plan

  • test_watch_stdout_idle_fires_on_silence — watchdog fires IDLE_STALL when stdout stops growing
  • test_watch_stdout_idle_resets_on_continuous_output — watchdog does NOT fire when output is steady
  • test_watch_stdout_idle_handles_missing_file — tolerates missing file until it appears
  • test_resolve_termination_idle_stall_priority — idle stall outranks stale in priority chain
  • test_resolve_termination_process_exit_beats_idle_stall — process exit still highest priority
  • test_resolve_termination_idle_stall_beats_stale — idle stall outranks stale
  • test_race_signals_field_count — sentinel: 7 fields on RaceSignals
  • test_build_skill_result_idle_stall_is_retriable — IDLE_STALL produces retriable SkillResult
  • test_run_managed_async_idle_stall_kills_hanging_process — end-to-end integration
  • test_stale_suppression_bounded_by_max_duration — bounded suppression fires after cap
  • test_stale_suppression_resets_on_genuine_activity — counter resets on real JSONL activity
  • test_stale_suppression_logs_warning_on_bounded_kill — warning log emitted
  • Static analysis tests: all subprocess calls in target files have timeout=
  • All 7370 tests pass, pre-commit clean, mypy clean

Implementation Plan

Plan file: .autoskillit/temp/rectify/rectify_idle_stall_watchdog_2026-04-11_140500.md

🤖 Generated with Claude Code via AutoSkillit

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested (posted as COMMENT — cannot REQUEST_CHANGES on own PR)

Comment thread src/autoskillit/execution/remote_resolver.py
stdout, _ = await proc.communicate()
io_task = asyncio.ensure_future(proc.communicate())
try:
await asyncio.wait_for(proc.wait(), timeout=15.0)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: wait_for(proc.wait()) races with concurrent proc.communicate() future. wait() does not drain stdout pipe; if subprocess fills the pipe buffer before exiting, this can deadlock. Use asyncio.wait_for(proc.communicate(), timeout=15.0) directly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The code creates io_task = asyncio.ensure_future(proc.communicate()) at line 54 BEFORE calling wait_for(proc.wait()). The communicate() future runs concurrently on the event loop and actively drains stdout, so the pipe buffer deadlock scenario cannot occur here. wait() alone would deadlock, but the concurrent communicate() task prevents it.

Comment thread src/autoskillit/execution/_process_monitor.py Outdated
input_data: str | None = None,
completion_drain_timeout: float = 5.0,
linux_tracing_config: Any | None = None,
idle_output_timeout: float | None = None,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] arch: idle_output_timeout and max_suppression_seconds are execution-tuning params (L1 concerns) added to the HeadlessExecutor protocol in L0 core/. This couples L0 to L1 operational details. Consider kwargs escape hatch or moving protocol to L1.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The SubprocessRunner protocol already contained multiple execution-tuning params at L0 before this PR: timeout (float), stale_threshold (float=1200), completion_drain_timeout (float=5.0), pty_mode (bool), linux_tracing_config (Any|None). The new idle_output_timeout and max_suppression_seconds follow the identical pre-existing pattern. This is the established design for this protocol — it serves as the structural contract mirroring run_managed_async.

Orthogonal to Channel A/B: NOT suppressed by active API connections.
Monitors raw byte count (st_size), not JSONL record structure.
"""
import time as _time
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] cohesion: import time as _time is a local import inside _watch_stdout_idle. The rest of the module uses module-level imports. Promote to module level.

await anyio.sleep(_poll_interval)
try:
current_size = stdout_path.stat().st_size
except OSError:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] defense: except OSError: continue silently swallows all OS errors on stdout_path.stat(). ENOENT is expected, but other errors (permission denied, I/O error) should be logged at DEBUG.

_phase2_poll=_phase2_poll,
_phase1_timeout=_phase1_timeout,
expected_session_id=acc.stdout_session_id,
**_monitor_kwargs, # type: ignore[arg-type]
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] defense: # type: ignore[arg-type] suppresses type error without rationale. dict[str, object] is too wide for the target signature. Consider TypedDict or add explanatory comment.

completion_drain_timeout: float = 5.0
exit_after_stop_delay_ms: int = 120000
idle_output_timeout: int = 600
max_suppression_seconds: int = 1800
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] defense: max_suppression_seconds and idle_output_timeout have no range validation. Zero or negative values would cause silent logic bugs (immediate expiry). Consider post_init assertion.

is_error=True,
exit_code=-1,
needs_retry=True,
retry_reason=RetryReason.STALE,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] defense: RetryReason.STALE reused for IDLE_STALL termination. Downstream retry telemetry cannot distinguish the two failure modes.

@Trecek Trecek added this pull request to the merge queue Apr 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 12, 2026
@Trecek Trecek added this pull request to the merge queue Apr 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 12, 2026
Trecek and others added 6 commits April 11, 2026 22:27
…ss timeouts

Addresses 82-minute hung SSE stream where ESTABLISHED TCP connection
suppressed stale-kill indefinitely. Three layers of fix:

1. Stdout idle watchdog (_watch_stdout_idle): independent coroutine in the
   anyio task group that monitors stdout temp file byte count. Fires
   IDLE_STALL if no growth for idle_output_timeout seconds. NOT suppressed
   by active API connections — orthogonal to Channel A/B.

2. Bounded suppression: max_suppression_seconds caps _has_active_api_connection
   and _has_active_child_processes suppression gates so they cannot defer
   stale kills indefinitely. Counter resets on genuine JSONL activity.

3. Subprocess timeouts: add timeout= to all unguarded subprocess.run() and
   asyncio subprocess calls in remote_resolver, smoke_utils, clone, and
   token_summary_appender.
Add io_task.cancel() in the TimeoutError handler to prevent a task
leak when proc.communicate() future is abandoned after timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split shared suppression_start into suppression_start_api and
suppression_start_child to prevent elapsed time carry-over when
sessions alternate between API connection and child process conditions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ynaconf

The contract test test_every_subconfig_field_referenced_in_from_dynaconf
requires every RunSkillConfig field to have a val() line in from_dynaconf.
These two fields were added to the dataclass but not wired into the
config builder, causing merge-queue CI failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Trecek Trecek force-pushed the impl-rectify-idle-stall-watchdog-20260411-142908 branch from deeff77 to 4584ee1 Compare April 12, 2026 05:30
@Trecek Trecek enabled auto-merge April 12, 2026 05:32
@Trecek Trecek added this pull request to the merge queue Apr 12, 2026
Merged via the queue into integration with commit 225d904 Apr 12, 2026
2 checks passed
@Trecek Trecek deleted the impl-rectify-idle-stall-watchdog-20260411-142908 branch April 12, 2026 05:39
github-merge-queue bot pushed a commit that referenced this pull request Apr 13, 2026
## Summary

`clone_repo` collapses three distinct subprocess outcomes
(`TimeoutExpired`, non-zero `returncode`, genuinely absent origin) into
a single silent fallback — `detected_url = ""` → `_resolve_clone_source`
→ `git clone <local-path>` via git's local transport. The return dict's
`remote_url` is an untagged `str`, so four downstream recipes that
splice `context.remote_url` into shell commands and `push_to_remote`
cannot distinguish a healthy clone from a silently-corrupted one. The
same `""` value causes the `if effective_url:` guard at `clone.py:333`
to skip the `#377` origin-isolation rewrite, compounding the regression.

**Bug class:** "subprocess fails → value collapses to empty string →
downstream consumers silently use the empty value → invariant violation
goes undetected." Five prior PRs (#368, #377, #424, #544, #735) each
patched an adjacent sub-path but left this collapse site untouched.

**Part A architectural immunity (three phases, sequentially layered):**

1. **Typed probe result.** Replace the one-line `_resolve_clone_source`
collapse with `_probe_origin_url` returning a `CloneSourceResolution`
dataclass carrying `reason` (`"ok" | "no_origin" | "timeout" |
"error"`), `url`, and `stderr`. The call site must switch on `reason`;
no path can collapse the four cases back into a single string.
2. **Typed return contract.** Introduce `CloneResult` — a `TypedDict`
union with a `clone_source_type: Literal["remote", "local"]`
discriminator on the success shape. mypy statically rejects any return
branch that omits the discriminator, so provenance is impossible to
lose.
3. **Unconditional origin isolation.** Extract `_ensure_origin_isolated`
and call it for every successful clone regardless of URL availability.
Removes the `if effective_url:` guard that silently skipped the `#377`
rewrite.

Together these three phases make the `git clone <local-path>` failure
mode structurally unreachable from the default strategy: the only way to
get a local-source clone is the explicit `strategy="clone_local"` branch
(`shutil.copytree`), which now also produces a typed success result with
`clone_source_type="local"` and a `file://` origin.

## Architecture Impact

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START([START])

    %% ── RECIPE-TIME VALIDATION ─────────────────────────────────────────────
    subgraph RecipeValidation ["★ RECIPE-TIME VALIDATION  (rules_clone.py)"]
        direction TB
        RuleGate["★ clone-local-strategy-with-<br/>remote-url-capture<br/>━━━━━━━━━━<br/>fired at open_kitchen / load_recipe<br/>inspects strategy + capture_list"]
        RuleErr["★ RuleFinding ERROR<br/>━━━━━━━━━━<br/>strategy='clone_local'<br/>+ remote_url capture"]
        RuleWarn["★ RuleFinding WARNING<br/>━━━━━━━━━━<br/>templated / unknown strategy<br/>+ remote_url capture"]
    end

    T_RECIPE_BLOCKED(["RECIPE_BLOCKED<br/>recipe invalid at load time"])

    %% ── CLONE ENTRY GATES ──────────────────────────────────────────────────
    subgraph CloneGates ["● CLONE ENTRY GATES  (clone_repo)"]
        direction TB
        SourceCheck{"● source_dir<br/>exists?"}
        UncommittedGate{"● detect_uncommitted_changes<br/>━━━━━━━━━━<br/>only when strategy=''<br/>never raises"}
        UnpublishedGate{"● detect_unpublished_branch<br/>━━━━━━━━━━<br/>15 s timeout — fail-open<br/>no origin → False"}
    end

    T_VAL_ERR(["ValueError<br/>source_dir missing"])
    T_UNCOMMITTED(["★ CloneGateUncommitted<br/>━━━━━━━━━━<br/>soft stop: dirty tree<br/>returns dict, no raise"])
    T_UNPUBLISHED(["★ CloneGateUnpublished<br/>━━━━━━━━━━<br/>soft stop: branch not on remote<br/>returns dict, no raise"])

    %% ── ORIGIN PROBE ───────────────────────────────────────────────────────
    subgraph OriginProbe ["★ ORIGIN PROBE  (_probe_origin_url)"]
        direction LR
        Probe["★ git remote get-url origin<br/>━━━━━━━━━━<br/>30 s timeout<br/>always runs (all strategies)"]
        ProbeOK["★ reason='ok'<br/>URL captured"]
        ProbeNoOrigin["★ reason='no_origin'<br/>no remote configured"]
        ProbeTimeout["★ reason='timeout'<br/>subprocess hung ≥ 30 s"]
        ProbeError["★ reason='error'<br/>non-zero / unexpected stderr"]
    end

    %% ── STRATEGY ROUTING ───────────────────────────────────────────────────
    subgraph StrategyRouter ["● STRATEGY ROUTING  (clone_repo)"]
        direction TB
        StratDecide{"● strategy?"}
        ProbeOKCheck{"probe.reason<br/>== 'ok'?"}
        LocalCopy["clone_local:<br/>━━━━━━━━━━<br/>shutil.copytree<br/>remote_url always ''"]
        RemoteClone["● git clone {url}<br/>━━━━━━━━━━<br/>300 s timeout<br/>--branch if set"]
    end

    T_PROBE_ERR(["RuntimeError<br/>clone_origin_probe_failed<br/>use strategy='clone_local'<br/>for local-only repos"])
    T_CLONE_FAIL(["RuntimeError<br/>git clone non-zero exit"])

    %% ── POST-CLONE ISOLATION ───────────────────────────────────────────────
    subgraph PostClone ["POST-CLONE ORIGIN ISOLATION  (_ensure_origin_isolated)"]
        direction TB
        IsolateOrigin["_ensure_origin_isolated<br/>━━━━━━━━━━<br/>origin  → file://clone_path<br/>upstream → real remote URL<br/>(if effective_url truthy)"]
        Decontaminate["decontaminate:<br/>━━━━━━━━━━<br/>git rm --cached GENERATED_FILES<br/>silent on failure (warns only)"]
    end

    T_ISOLATE_ERR(["RuntimeError<br/>origin isolation failed<br/>git remote add/set-url"])

    T_SUCCESS_LOCAL(["★ CloneSuccessResult<br/>━━━━━━━━━━<br/>clone_source_type='local'<br/>remote_url=''"])
    T_SUCCESS_REMOTE(["★ CloneSuccessResult<br/>━━━━━━━━━━<br/>clone_source_type='remote'<br/>remote_url=probed URL"])

    %% ── PUSH RESILIENCE ────────────────────────────────────────────────────
    subgraph PushPath ["● PUSH RESILIENCE  (push_to_remote)"]
        direction TB
        ProtectedCheck{"protected<br/>branch?"}
        URLResolve{"● remote_url param<br/>provided?<br/>━━━━━━━━━━<br/>fall back to source_dir<br/>origin lookup"}
        BareCheck{"classify_remote_url<br/>== 'nonbare_local'?"}
        PushExec["● git push -u upstream {branch}<br/>━━━━━━━━━━<br/>120 s timeout<br/>--force-with-lease if force=True"]
    end

    T_PUSH_PROT(["success=False<br/>error_type='protected_branch_push'"])
    T_PUSH_NOURL(["success=False<br/>neither remote_url<br/>nor source_dir"])
    T_PUSH_NONBARE(["success=False<br/>error_type='local_non_bare_remote'"])
    T_PUSH_FAIL(["success=False<br/>stderr=git output<br/>error_type if force"])
    T_PUSH_OK(["success=True"])

    %% ── HOOK FORMATTER ─────────────────────────────────────────────────────
    subgraph FmtLayer ["★ HOOK OUTPUT FORMATTER  (_fmt_clone_repo)"]
        FmtDetect{"★ payload keys?"}
        FmtWarn["★ ⚠ WARNING<br/>uncommitted_changes<br/>or unpublished_branch"]
        FmtFail["★ ✗ FAIL<br/>'error' key present"]
        FmtOK["★ ✓ OK<br/>clone succeeded"]
    end

    %% ══ CONNECTIONS ════════════════════════════════════════════════════════

    START -.->|"static analysis<br/>before any runtime call"| RuleGate
    RuleGate -->|"clone_local + remote_url capture"| RuleErr
    RuleGate -->|"templated/unknown + remote_url capture"| RuleWarn
    RuleErr --> T_RECIPE_BLOCKED
    RuleWarn --> T_RECIPE_BLOCKED
    RuleGate -->|"safe strategies / no capture"| SourceCheck

    SourceCheck -->|"missing"| T_VAL_ERR
    SourceCheck -->|"exists"| UncommittedGate
    UncommittedGate -->|"dirty (strategy='')"| T_UNCOMMITTED
    UncommittedGate -->|"clean / strategy≠''"| UnpublishedGate
    UnpublishedGate -->|"unpublished (exit 2)"| T_UNPUBLISHED
    UnpublishedGate -->|"ok / timeout / no origin"| Probe

    Probe -->|"returncode=0, url≠''"| ProbeOK
    Probe -->|"'no such remote' in stderr"| ProbeNoOrigin
    Probe -->|"≥30 s elapsed"| ProbeTimeout
    Probe -->|"other non-zero exit"| ProbeError
    ProbeOK --> StratDecide
    ProbeNoOrigin --> StratDecide
    ProbeTimeout --> StratDecide
    ProbeError --> StratDecide

    StratDecide -->|"strategy='clone_local'"| LocalCopy
    StratDecide -->|"proceed / default"| ProbeOKCheck
    ProbeOKCheck -->|"no (no_origin / timeout / error)"| T_PROBE_ERR
    ProbeOKCheck -->|"yes"| RemoteClone
    RemoteClone -->|"returncode≠0"| T_CLONE_FAIL
    RemoteClone -->|"success"| IsolateOrigin
    LocalCopy --> IsolateOrigin

    IsolateOrigin -->|"git error (not already-exists)"| T_ISOLATE_ERR
    IsolateOrigin -->|"ok"| Decontaminate
    Decontaminate -->|"clone_local path"| T_SUCCESS_LOCAL
    Decontaminate -->|"remote clone path"| T_SUCCESS_REMOTE

    T_SUCCESS_REMOTE -.->|"later in pipeline"| ProtectedCheck
    ProtectedCheck -->|"yes"| T_PUSH_PROT
    ProtectedCheck -->|"no"| URLResolve
    URLResolve -->|"neither provided"| T_PUSH_NOURL
    URLResolve -->|"resolved"| BareCheck
    BareCheck -->|"nonbare_local"| T_PUSH_NONBARE
    BareCheck -->|"network / bare_local / none"| PushExec
    PushExec -->|"returncode≠0"| T_PUSH_FAIL
    PushExec -->|"success"| T_PUSH_OK

    T_SUCCESS_LOCAL -.->|"result piped to hook"| FmtDetect
    T_SUCCESS_REMOTE -.->|"result piped to hook"| FmtDetect
    T_UNCOMMITTED -.->|"result piped to hook"| FmtDetect
    T_UNPUBLISHED -.->|"result piped to hook"| FmtDetect
    FmtDetect -->|"uncommitted_changes / unpublished_branch"| FmtWarn
    FmtDetect -->|"error key"| FmtFail
    FmtDetect -->|"neither"| FmtOK

    %% CLASS ASSIGNMENTS
    class START terminal;
    class RuleGate,RuleErr,RuleWarn newComponent;
    class T_RECIPE_BLOCKED,T_VAL_ERR,T_UNCOMMITTED,T_UNPUBLISHED,T_PROBE_ERR,T_CLONE_FAIL,T_ISOLATE_ERR terminal;
    class T_SUCCESS_LOCAL,T_SUCCESS_REMOTE,T_PUSH_OK terminal;
    class T_PUSH_PROT,T_PUSH_NOURL,T_PUSH_NONBARE,T_PUSH_FAIL terminal;
    class SourceCheck,UncommittedGate,UnpublishedGate detector;
    class Probe,ProbeOK,ProbeNoOrigin,ProbeTimeout,ProbeError newComponent;
    class StratDecide,ProbeOKCheck phase;
    class LocalCopy,RemoteClone handler;
    class IsolateOrigin,Decontaminate output;
    class ProtectedCheck,URLResolve,BareCheck detector;
    class PushExec handler;
    class FmtDetect,FmtWarn,FmtFail,FmtOK newComponent;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY %%
    CALL(["clone_repo(remote_url='', strategy=?)"])

    subgraph RuntimeGates ["RUNTIME VALIDATION GATES — workspace/clone.py ●"]
        direction TB
        SRC_CHECK["G1 · source.is_dir()<br/>━━━━━━━━━━<br/>Raises ValueError<br/>if not a directory"]
        STRAT_CHECK{"strategy<br/>== '' ?"}
        UNCOMMIT["G2 · detect_uncommitted_changes<br/>━━━━━━━━━━<br/>git status --porcelain<br/>Returns gate result — no clone created"]
        UNPUB["G3 · detect_unpublished_branch<br/>━━━━━━━━━━<br/>git ls-remote --exit-code origin<br/>Returns gate result — no clone created"]
        PROBE["G4 · _probe_origin_url ●<br/>━━━━━━━━━━<br/>Raises RuntimeError if reason != ok<br/>NO silent local-path fallback"]
    end

    subgraph GateResults ["GATE RESULT TYPES — core/_type_results.py ●"]
        direction LR
        RES_UNCOMMIT["CloneGateUncommitted<br/>━━━━━━━━━━<br/>uncommitted_changes: Literal['true']<br/>changed_files: str<br/>total_changed: str<br/>branch, source_dir<br/>— NO clone_path — NO remote_url —"]
        RES_UNPUB["CloneGateUnpublished<br/>━━━━━━━━━━<br/>unpublished_branch: Literal['true']<br/>branch, source_dir<br/>— NO clone_path — NO remote_url —"]
    end

    subgraph RemoteResolution ["remote_url RESOLUTION — workspace/clone.py ●"]
        direction TB
        INPUT_URL["Input: remote_url = '' (optional)<br/>━━━━━━━━━━<br/>Caller may supply or leave blank"]
        LOCAL_STRAT{"strategy ==<br/>'clone_local' ?"}
        USE_PROBE["effective_url = resolution.url<br/>━━━━━━━━━━<br/>Real remote (from G4 probe)"]
        USE_CALLER["effective_url = caller_url OR resolution.url<br/>━━━━━━━━━━<br/>Caller override takes priority"]
        ISOLATE["_ensure_origin_isolated<br/>━━━━━━━━━━<br/>origin  → file://clone_path (unique)<br/>upstream → effective_url"]
    end

    subgraph SuccessResult ["SUCCESS CONTRACT — core/_type_results.py ●"]
        direction TB
        CLONE_PATH["clone_path: str [INIT_ONLY]<br/>━━━━━━━━━━<br/>&lt;run_name&gt;-YYYYMMDD-HHMMSS-ffffff<br/>Unique per call — never reused"]
        SRC_DIR["source_dir: str [INIT_ONLY]<br/>━━━━━━━━━━<br/>expanduser().resolve() — fixed at entry"]
        REMOTE_URL["remote_url: str [RESOLVED · required]<br/>━━━━━━━━━━<br/>Non-Optional on CloneSuccessResult<br/>'' for clone_local · real URL for proceed"]
        SRC_TYPE["clone_source_type: Literal['remote','local'] [INIT_ONLY]<br/>━━━━━━━━━━<br/>'local' when strategy='clone_local'"]
        SRC_REASON["clone_source_reason: str [INIT_ONLY]<br/>━━━━━━━━━━<br/>'strategy_clone_local' or 'ok'"]
    end

    subgraph RecipeGate ["★ RECIPE SEMANTIC GATE — recipe/rules_clone.py ●"]
        direction TB
        RULE["clone-local-strategy-with-remote-url-capture<br/>━━━━━━━━━━<br/>Fires at recipe-load time (open_kitchen / load_recipe)<br/>Checks: python=clone_repo AND capture references result.remote_url<br/>Inspects template VALUE not capture key name"]
        STRAT_EVAL{"strategy<br/>value?"}
        PASS_RULE["PASS — strategy='' / 'proceed'<br/>━━━━━━━━━━<br/>Part A guarantees non-empty remote_url<br/>Safe to capture downstream"]
        ERROR_RULE["ERROR — strategy='clone_local' literal<br/>━━━━━━━━━━<br/>remote_url always '' for local transport<br/>Capturing it corrupts downstream push"]
        WARN_RULE["WARN — strategy=${{template}}<br/>━━━━━━━━━━<br/>Cannot determine statically<br/>May produce empty remote_url"]
    end

    subgraph Formatter ["★ DISPLAY LAYER — hooks/_fmt_status.py (new tests)"]
        FMT["_fmt_clone_repo<br/>━━━━━━━━━━<br/>All scalar fields rendered unconditionally<br/>remote_url: '' rendered as-is for local clones<br/>Sentinel key PRESENCE → WARNING mark (not value)"]
    end

    %% RUNTIME FLOW %%
    CALL --> SRC_CHECK
    SRC_CHECK -->|invalid dir| ERR_VAL(["ValueError"])
    SRC_CHECK -->|valid| STRAT_CHECK
    STRAT_CHECK -->|yes · check clean| UNCOMMIT
    UNCOMMIT -->|changes found| RES_UNCOMMIT
    UNCOMMIT -->|clean| UNPUB
    UNPUB -->|branch absent| RES_UNPUB
    UNPUB -->|published| PROBE
    STRAT_CHECK -->|no| PROBE
    PROBE -->|reason != ok| ERR_PROBE(["RuntimeError no_origin"])
    PROBE -->|reason == ok| INPUT_URL

    %% RESOLUTION FLOW %%
    INPUT_URL --> LOCAL_STRAT
    LOCAL_STRAT -->|yes — shutil.copytree| USE_CALLER
    LOCAL_STRAT -->|no — git clone| USE_PROBE
    USE_PROBE --> USE_CALLER
    USE_CALLER --> ISOLATE
    ISOLATE --> CLONE_PATH
    ISOLATE --> SRC_DIR
    ISOLATE --> REMOTE_URL
    ISOLATE --> SRC_TYPE
    ISOLATE --> SRC_REASON

    %% DISPLAY FLOW %%
    CLONE_PATH --> FMT
    SRC_DIR --> FMT
    REMOTE_URL --> FMT
    SRC_TYPE --> FMT
    SRC_REASON --> FMT
    RES_UNCOMMIT --> FMT
    RES_UNPUB --> FMT

    %% RECIPE GATE FLOW %%
    RULE --> STRAT_EVAL
    STRAT_EVAL -->|safe strategy| PASS_RULE
    STRAT_EVAL -->|clone_local literal| ERROR_RULE
    STRAT_EVAL -->|template expr| WARN_RULE

    %% CLASS ASSIGNMENTS %%
    class CALL terminal;
    class SRC_CHECK,UNCOMMIT,UNPUB,PROBE detector;
    class STRAT_CHECK,LOCAL_STRAT,STRAT_EVAL phase;
    class RES_UNCOMMIT,RES_UNPUB gap;
    class INPUT_URL,USE_PROBE,USE_CALLER,ISOLATE handler;
    class CLONE_PATH,SRC_DIR,SRC_TYPE,SRC_REASON stateNode;
    class REMOTE_URL gap;
    class RULE detector;
    class PASS_RULE output;
    class ERROR_RULE detector;
    class WARN_RULE gap;
    class FMT output;
    class ERR_VAL,ERR_PROBE terminal;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    GATED([GATED — early return])
    FAIL([ERROR — raises RuntimeError])
    COMPLETE([COMPLETE])

    subgraph Entry ["Entry & Resolution"]
        direction TB
        A["● clone_repo<br/>━━━━━━━━━━<br/>workspace/clone.py<br/>args: source_dir, run_name,<br/>branch, strategy, remote_url"]
        B["Resolve source_dir<br/>━━━━━━━━━━<br/>detect_source_dir(cwd)<br/>if source_dir empty"]
        C["Detect branch<br/>━━━━━━━━━━<br/>detect_branch(source_dir)<br/>blank if HEAD / detached"]
    end

    subgraph Gates ["Strategy Gates — bypassed when strategy='clone_local' or 'proceed'"]
        direction TB
        D{"strategy<br/>bypasses gates?"}
        E["detect_uncommitted_changes<br/>━━━━━━━━━━<br/>git status --porcelain<br/>reads: working tree"]
        F{"any uncommitted<br/>changes?"}
        G["detect_unpublished_branch<br/>━━━━━━━━━━<br/>git ls-remote origin<br/>15 s timeout — fail-open"]
        H{"branch<br/>unpublished?"}
        GU["CloneGateUncommitted<br/>━━━━━━━━━━<br/>writes: {uncommitted_changes,<br/>changed_files, total_changed}"]
        GP["CloneGateUnpublished<br/>━━━━━━━━━━<br/>writes: {unpublished_branch,<br/>branch, source_dir}"]
    end

    subgraph Dispatch ["Clone Dispatch"]
        direction TB
        K{"strategy ==<br/>'clone_local'?"}
        L["● _probe_origin_url<br/>━━━━━━━━━━<br/>git remote get-url origin<br/>30 s timeout<br/>reads: git remote config<br/>writes: CloneSourceResolution"]
        M{"resolution.reason<br/>== 'ok'?"}
        N["shutil.copytree<br/>━━━━━━━━━━<br/>full worktree copy<br/>source_type = 'local'<br/>reason = 'strategy_clone_local'"]
        O["git clone --branch<br/>━━━━━━━━━━<br/>resolution.url → clone_path<br/>300 s timeout<br/>source_type = 'remote'"]
    end

    subgraph Isolation ["● Origin Isolation Fix — #807 / #377"]
        direction TB
        P["● _ensure_origin_isolated<br/>━━━━━━━━━━<br/>UNCONDITIONAL on both paths<br/>writes: origin = file://clone_path<br/>(self-referential — breaks repo aliasing)"]
        Q{"effective_url<br/>truthy?"}
        R["_add_or_set_upstream<br/>━━━━━━━━━━<br/>writes: upstream = real remote URL<br/>preserves provenance for push_to_remote"]
        S["Decontaminate clone<br/>━━━━━━━━━━<br/>git rm --cached GENERATED_FILES<br/>git commit --no-verify<br/>os.unlink disk copies"]
    end

    subgraph Result ["● Result Assembly"]
        direction TB
        T["● CloneSuccessResult<br/>━━━━━━━━━━<br/>clone_path, source_dir<br/>remote_url (effective_url)<br/>★ clone_source_type: 'remote'|'local'<br/>★ clone_source_reason: str<br/>writes: _type_results.py"]
    end

    subgraph Downstream ["Downstream Consumers"]
        direction LR
        U["★ _fmt_clone_repo<br/>━━━━━━━━━━<br/>hooks/_fmt_status.py<br/>renders clone_source_type<br/>+ clone_source_reason<br/>as flat KV output"]
        V["● Recipe Validator<br/>━━━━━━━━━━<br/>run_semantic_rules()<br/>reads: Recipe steps + captures"]
    end

    subgraph RuleLogic ["● New Rule: clone-local-strategy-with-remote-url-capture"]
        direction TB
        W{"step = clone_repo<br/>AND captures<br/>result.remote_url?"}
        X{"strategy value?"}
        RE["★ RuleFinding ERROR<br/>━━━━━━━━━━<br/>strategy='clone_local'<br/>remote_url always ''\n→ recipe INVALID<br/>fixture: invalid.yaml"]
        RW["★ RuleFinding WARNING<br/>━━━━━━━━━━<br/>template / unknown strategy<br/>cannot resolve statically<br/>→ recipe VALID, flagged"]
        RN["No Finding<br/>━━━━━━━━━━<br/>safe/proceed/empty strategy<br/>or no remote_url capture<br/>fixture: valid.yaml"]
    end

    %% MAIN FLOW %%
    START --> A
    A --> B --> C
    C --> D

    D -->|"yes — skip gates"| K
    D -->|"no"| E
    E --> F
    F -->|"yes"| GU --> GATED
    F -->|"no"| G
    G --> H
    H -->|"yes"| GP --> GATED
    H -->|"no"| K

    K -->|"yes (clone_local)"| N
    K -->|"no"| L
    L --> M
    M -->|"no — no usable origin"| FAIL
    M -->|"yes"| O

    N --> P
    O --> P

    P --> Q
    Q -->|"yes"| R --> S
    Q -->|"no"| S
    S --> T

    T --> U
    T --> V

    V --> W
    W -->|"no"| RN
    W -->|"yes"| X
    X -->|"'clone_local'"| RE
    X -->|"template / unknown"| RW
    X -->|"'' / 'proceed'"| RN

    RE --> COMPLETE
    RW --> COMPLETE
    RN --> COMPLETE

    %% CLASS ASSIGNMENTS %%
    class START,GATED,FAIL,COMPLETE terminal;
    class A,N,O,P,R,S handler;
    class B,C,L phase;
    class D,F,H,K,M,Q,W,X stateNode;
    class GU,GP detector;
    class T output;
    class U newComponent;
    class V,RE,RW detector;
    class RN phase;
```

Closes #807

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-807-20260413-075530-622615/.autoskillit/temp/rectify/rectify_clone_repo_silent_fallback_2026-04-13_110616_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 120 | 14.5k | 477.8k | 78.4k | 1 | 5m 38s |
| rectify | 4.1k | 68.6k | 2.2M | 233.2k | 2 | 25m 4s |
| review | 2.9k | 8.5k | 226.5k | 55.4k | 1 | 6m 55s |
| dry_walkthrough | 3.2k | 45.6k | 2.7M | 137.3k | 2 | 15m 50s |
| implement | 2.4k | 49.1k | 7.0M | 146.4k | 2 | 19m 12s |
| assess | 1.2k | 25.3k | 2.6M | 77.2k | 1 | 11m 51s |
| prepare_pr | 60 | 6.7k | 212.6k | 30.7k | 1 | 1m 43s |
| run_arch_lenses | 3.2k | 31.5k | 622.1k | 118.9k | 3 | 9m 49s |
| compose_pr | 67 | 11.6k | 243.5k | 38.1k | 1 | 2m 46s |
| **Total** | 17.2k | 261.3k | 16.3M | 915.6k | | 1h 38m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant