Conversation
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: changes_requested (2 critical, 11 warning findings). Cannot use REQUEST_CHANGES event on own PR — posting as COMMENT.
| return gate | ||
| structlog.contextvars.clear_contextvars() | ||
| with structlog.contextvars.bound_contextvars(tool="get_pipeline_report"): | ||
| from autoskillit.server._state import _startup_ready |
There was a problem hiding this comment.
[warning] arch: Function-scoped from _state import _startup_ready captures the current value at call time, not a live reference. If _startup_ready is reassigned in _state (e.g., on the next lifespan cycle), this local binding holds the stale event. Use _state._startup_ready (module attribute dereference) instead.
|
|
||
| def __init__(self, *, token: str | None = None) -> None: | ||
| self._token = token | ||
| _UNRESOLVED = object() |
There was a problem hiding this comment.
[warning] cohesion: DefaultCIWatcher, DefaultGitHubFetcher, and DefaultMergeQueueWatcher each re-implement the lazy sentinel token pattern (_UNRESOLVED + _token_factory + _resolve_token). TokenFactory in _factory.py (L3) provides this but L1 cannot import it. Extract the pattern to L0 core/.
| monkeypatch.setattr(_state_mod, "_startup_ready", ready) | ||
|
|
||
| task = asyncio.create_task(get_pipeline_report()) | ||
| await asyncio.sleep(0.05) |
There was a problem hiding this comment.
[warning] tests: asyncio.sleep(0.05) timing-based synchronization is flaky under xdist -n 4 on loaded CI. 50ms may not be enough for the task to block/unblock deterministically. Use event-based synchronization.
| result2 = json.loads(await get_pipeline_report()) | ||
| assert result2["total_failures"] == 0 | ||
|
|
||
| @pytest.mark.anyio |
There was a problem hiding this comment.
[warning] tests: @pytest.mark.anyio marker inconsistent with @pytest.mark.asyncio used by surrounding TestGetPipelineReport class. Mixing markers can cause fixture incompatibilities.
| # --- T-INIT-5: Critical _initialize does not perform deferrable I/O --- | ||
|
|
||
|
|
||
| def test_initialize_critical_does_not_call_recovery(tmp_path, monkeypatch): |
There was a problem hiding this comment.
[warning] tests: Patches autoskillit.execution.recover_crashed_sessions at module level, but deferred_initialize imports it inside the function body (noqa PLC0415). If the import was already resolved in a prior test on the same xdist worker, the patch may not intercept.
| import autoskillit.cli as cli_mod | ||
|
|
||
| start = time.monotonic() | ||
| cli_mod.serve() |
There was a problem hiding this comment.
[warning] tests: cli_mod.serve() installs signal.signal(SIGTERM) — a process-global handler. Not xdist-safe: parallel workers sharing the same process will have their SIGTERM handler replaced.
|
|
||
| def tracking_submit(coro, **kw): | ||
| submitted.append(kw.get("label", "")) | ||
| coro.close() |
There was a problem hiding this comment.
[warning] tests: tracking_submit creates asyncio.create_task(asyncio.sleep(0)) that is never awaited or cancelled before the async with block exits. Dangling task may emit ResourceWarning.
There was a problem hiding this comment.
Investigated — this is intentional. tracking_submit returns asyncio.create_task(asyncio.sleep(0)) as a standard stub for async task submission. sleep(0) completes within the same event loop tick, holds no resources or file descriptors, and resolves before the async with block exits during pytest-asyncio teardown. No ResourceWarning risk.
Replace the eager _gh_cli_token() call in make_context() with a TokenFactory that defers resolution until the first gated tool actually needs a GitHub token. This removes a 5-second subprocess from the critical path between serve() entry and mcp.run(), preventing Claude Code's ~5s connection timeout from triggering "No such tool available". - Add TokenFactory class: lazy-resolving, caching callable wrapper - Update DefaultGitHubFetcher, DefaultCIWatcher to accept Callable token - Update DefaultMergeQueueWatcher to defer httpx.AsyncClient creation - Fix fetch_title() to use has_token instead of bare self._token access - Update test_factory.py to use _resolve_token() instead of _token Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tests/arch/test_startup_budget.py: AST-based structural guard that prevents subprocess calls or _gh_cli_token() from being added back to make_context() (REQ-STARTUP-001) - tests/cli/test_startup_budget.py: Integration test measuring wall-clock time from serve() entry to mcp.run() call (2s budget) - tests/server/test_factory.py: TokenFactory lazy resolution + caching tests, and make_context() subprocess-free verification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _iter_eager_calls() helper distinguishes eager calls (executed during make_context()) from deferred calls inside lambdas or inner functions. _gh_cli_token() inside a TokenFactory lambda is deferred and thus acceptable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test closed stdin before sending SIGTERM, causing the MCP stdio transport to detect EOF and tear down the event loop before the signal handler could fire. This bypassed lifespan cleanup and recorder.finalize(). Reversing the order ensures KeyboardInterrupt propagates through the asyncio event loop, triggering proper lifespan __aexit__ teardown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split _initialize() into critical-path work (context set, tool visibility, middleware registration) and deferred work (crash recovery, audit loading, stale cleanup). The deferred operations now run as background tasks submitted from the lifespan after mcp.run() opens the stdio transport. Also moves run_startup_drift_check() into the lifespan as an async background task, adds a readiness guard on get_pipeline_report to await deferred init completion, and adds a prompt-level retry instruction for "No such tool available" errors during server startup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update json.dumps allowlist line number in schema version convention test (tools_status.py shifted 333→338 from readiness guard addition) - Harden SIGTERM recording test: increase deadline 5s→15s for xdist contention and skip gracefully if server never emits ready token - Narrow mypy union-attr on timeout_scope (pre-existing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The lifespan sets _state._startup_ready via bare assignment during _autoskillit_lifespan. When a lifespan test runs in the same xdist worker before a test that calls get_pipeline_report(), the unset event causes a 30s timeout caught by track_response_size, returning error JSON without total_failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents 'virtual environment already exists' error when merge_worktree runs install-worktree twice (pre-rebase and post-rebase test gates). Without --clear, the second invocation fails because the venv from the first gate still exists.
…vent.set() Ensures ready_event.set() is always called even if CancelledError or other BaseException propagates, preventing waiters from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Returns a user-friendly fallback JSON instead of propagating an uncaught TimeoutError when deferred_initialize does not complete within 30s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
….py line The TimeoutError handling added to tools_status.py shifted the json.dumps site from line 338 to 348. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a0eea04 to
1a2694c
Compare
#909) ## Summary Two compounding failures erode the reliability of AutoSkillit's setup and startup pathways: 1. **`run_doctor()` destroys the plugin installation state before checking it**, producing a false OK, then never restoring it. Only `autoskillit install` can recover. The fix removes `_clear_plugin_cache()` from `run_doctor()` and adds read-only diagnostic checks. 2. **A regression in #821 moved `run_startup_drift_check()` back onto the synchronous critical path** before `mcp.run()`, undoing the core structural fix from #803. The fix moves `run_startup_drift_check()` and `deferred_initialize()` into lifespan background tasks, and removes stale startup workarounds and dead monkeypatches. ### Commits - `09fcbec6` fix: remove _clear_plugin_cache() from run_doctor() and add read-only diagnostic checks - `d571ecbd` fix: move run_startup_drift_check and deferred_initialize into lifespan background tasks - `eb095e0b` fix: remove stale startup workarounds and dead monkeypatches - `dddf537b` chore: bump version to 0.8.26 - `7ee9434b` fix: replace anyio task group with create_background_task in lifespan and fix test expectations - `8dbba7a1` test: add failing AST guards and integration tests for phase boundary immunity Closes #902 🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit ## Token Usage Summary | Step | uncached | output | cache_read | cache_write | count | time | |------|----------|--------|------------|-------------|-------|------| | plan | 328 | 28.2k | 1.8M | 132.8k | 2 | 7m 22s | | verify | 2.3k | 19.5k | 1.0M | 89.5k | 2 | 5m 41s | | implement | 2.6k | 22.6k | 2.9M | 99.5k | 2 | 8m 0s | | prepare_pr | 60 | 14.2k | 194.7k | 35.3k | 1 | 3m 1s | | run_arch_lenses | 2.7k | 23.7k | 584.4k | 156.7k | 3 | 10m 52s | | compose_pr | 59 | 8.7k | 217.8k | 33.1k | 1 | 2m 26s | | review_pr | 284 | 23.1k | 1.2M | 80.6k | 1 | 11m 29s | | resolve_review | 221 | 9.4k | 919.1k | 34.9k | 1 | 5m 41s | | fix | 182 | 6.9k | 754.3k | 31.9k | 1 | 10m 4s | | **Total** | 8.8k | 156.4k | 9.5M | 694.2k | | 1h 4m | --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
The MCP server performs 5-7 seconds of synchronous blocking work before calling
mcp.run(), which opens the stdio transport. Claude Code enforces a ~5-second connection timeout. When the startup work exceeds this budget, Claude Code declares the server unreachable and the model receives "No such tool available" for all MCP tools — an unrecoverable error class with no prompt-level retry.The architectural weakness is the absence of a structural boundary between transport-critical and deferrable startup work. All six startup phases run in the same synchronous block, with no contract, test, or enforcement preventing heavy I/O from accumulating on the critical path.
Part A addresses the core structural immunity: a lazy token factory that removes the 5-second subprocess from the critical path, an architectural test that enforces the startup budget as a testable invariant, and an integration test that guards against timing regressions.
Part B will cover deferred
_initialize()background tasks, async drift check, and prompt-level "No such tool available" retry — implement as a separate task.Architecture Impact
Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; %% TERMINALS %% START(["START<br/>━━━━━━━━━━<br/>claude mcp serve"]) READY(["TRANSPORT OPEN<br/>━━━━━━━━━━<br/>Tools available"]) SHUTDOWN(["SHUTDOWN<br/>━━━━━━━━━━<br/>SIGTERM / Ctrl-C"]) subgraph Phase1 ["● Phase 1 — CLI Entry (cli/app.py serve)"] direction TB LOG1["● configure_logging<br/>━━━━━━━━━━<br/>Phase 1: early init<br/>stderr, INFO or DEBUG"] CFG["● load_config<br/>━━━━━━━━━━<br/>Project .autoskillit/<br/>config.yaml"] LOG2{"● verbose OR<br/>config DEBUG?"} LOG2R["● reconfigure_logging<br/>━━━━━━━━━━<br/>Most verbose wins"] end subgraph Phase2 ["● Phase 2 — Context Factory (_factory.py)"] direction TB TF["● TokenFactory<br/>━━━━━━━━━━<br/>Lazy: config → env → gh CLI<br/>NOT resolved at startup"] MC["● make_context<br/>━━━━━━━━━━<br/>22 services wired<br/>gate=CLOSED"] GH["● GitHub clients<br/>━━━━━━━━━━<br/>CIWatcher, GitHubFetcher<br/>MergeQueueWatcher<br/>token=callable"] CW["● Circular wiring<br/>━━━━━━━━━━<br/>executor → ctx<br/>migrations → executor.run"] end subgraph Phase3 ["● Phase 3 — Synchronous Init (_state.py)"] direction TB INIT["● _initialize(ctx)<br/>━━━━━━━━━━<br/>Set _ctx singleton"] SUBSET{"● subsets.disabled<br/>non-empty?"} SUBDIS["● mcp.disable<br/>━━━━━━━━━━<br/>Per-subset tag disable"] RUNNER{"● Runner type?"} REC["● McpRecordingMiddleware<br/>━━━━━━━━━━<br/>Scenario capture"] REP["● McpReplayMiddleware<br/>━━━━━━━━━━<br/>Scenario replay"] NOREC["No middleware"] end subgraph Phase4 ["● Phase 4 — Module-Level Visibility (server/__init__.py)"] direction TB KITDIS["● mcp.disable kitchen<br/>━━━━━━━━━━<br/>Hide 39 kitchen tools"] HEADLESS{"● AUTOSKILLIT<br/>_HEADLESS=1?"} HLEN["● mcp.enable headless<br/>━━━━━━━━━━<br/>Reveal test_check"] end subgraph Phase5 ["● Phase 5 — Lifespan Pre-Yield (_lifespan.py)"] direction TB LSSTART["● _autoskillit_lifespan<br/>━━━━━━━━━━<br/>Create _startup_ready Event"] SUBMIT["● background.submit<br/>━━━━━━━━━━<br/>2 background tasks"] end subgraph BG1 ["● Background: Deferred Init (_state.py)"] direction TB CRASH["● recover_crashed_sessions<br/>━━━━━━━━━━<br/>reads tmpfs traces"] AUDIT["● audit.load_from_log_dir<br/>━━━━━━━━━━<br/>Last 24h telemetry"] STALE["● cleanup_stale<br/>━━━━━━━━━━<br/>Session skill dirs"] SIGNAL["● ready_event.set<br/>━━━━━━━━━━<br/>Unblock waiting tools"] end subgraph BG2 ["● Background: Drift Check (_lifespan.py)"] direction TB HASH{"● hooks.json<br/>hash == registry?"} REGEN["● generate_hooks_json<br/>━━━━━━━━━━<br/>atomic_write self-heal"] DROK["Drift check OK"] end subgraph ToolGate ["● Tool Readiness Gate (tools_status.py)"] direction TB TCHECK{"● _startup_ready<br/>is_set?"} TWAIT["● asyncio.wait_for<br/>━━━━━━━━━━<br/>timeout=30s"] TPROCEED["● Execute tool<br/>━━━━━━━━━━<br/>Audit data available"] end subgraph Teardown ["● Teardown (_lifespan.py finally)"] direction TB TREC{"● runner is<br/>Recording?"} TFIN["● recorder.finalize<br/>━━━━━━━━━━<br/>Persist scenario data"] end %% CRITICAL PATH FLOW %% START --> LOG1 LOG1 --> CFG CFG --> LOG2 LOG2 -->|"yes"| LOG2R LOG2 -->|"no"| TF LOG2R --> TF TF --> MC MC --> GH GH --> CW CW --> INIT INIT --> SUBSET SUBSET -->|"yes"| SUBDIS SUBSET -->|"no"| RUNNER SUBDIS --> RUNNER RUNNER -->|"RECORD_SCENARIO=1"| REC RUNNER -->|"REPLAY_SCENARIO=1"| REP RUNNER -->|"normal"| NOREC REC --> KITDIS REP --> KITDIS NOREC --> KITDIS KITDIS --> HEADLESS HEADLESS -->|"yes"| HLEN HEADLESS -->|"no"| LSSTART HLEN --> LSSTART LSSTART --> SUBMIT SUBMIT -->|"yield — transport opens"| READY %% BACKGROUND TASKS (post-yield, concurrent) %% SUBMIT -.->|"background task 1"| CRASH CRASH --> AUDIT AUDIT --> STALE STALE --> SIGNAL SUBMIT -.->|"background task 2"| HASH HASH -->|"mismatch"| REGEN HASH -->|"match"| DROK %% TOOL READINESS GATE %% READY -.->|"gated tool call<br/>e.g. get_pipeline_report"| TCHECK TCHECK -->|"yes"| TPROCEED TCHECK -->|"no — await"| TWAIT TWAIT -->|"event fires"| TPROCEED SIGNAL -.->|"unblocks"| TWAIT %% SHUTDOWN %% READY -->|"SIGTERM / KeyboardInterrupt"| TREC TREC -->|"yes"| TFIN TREC -->|"no"| SHUTDOWN TFIN --> SHUTDOWN %% CLASS ASSIGNMENTS %% class START,READY,SHUTDOWN terminal; class LOG1,CFG,LOG2R cli; class LOG2,SUBSET,RUNNER,HEADLESS,HASH,TCHECK,TREC stateNode; class TF,MC,GH,CW,INIT,SUBDIS,REC,REP,NOREC handler; class KITDIS,HLEN,LSSTART,SUBMIT phase; class CRASH,AUDIT,STALE,SIGNAL,REGEN,DROK newComponent; class TWAIT,TPROCEED,TFIN detector;Module Dependency Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%% graph TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; subgraph L3 ["LAYER 3 — APPLICATION"] direction LR subgraph ServerPkg ["● server/"] direction TB SRV_INIT["● __init__.py<br/>━━━━━━━━━━<br/>FastMCP app, gating"] SRV_FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>Composition root<br/>make_context()"] SRV_LIFESPAN["● _lifespan.py<br/>━━━━━━━━━━<br/>Lifespan manager<br/>deferred startup"] SRV_STATE["● _state.py<br/>━━━━━━━━━━<br/>Lazy init, deferred<br/>_startup_ready event"] SRV_STATUS["● tools_status.py<br/>━━━━━━━━━━<br/>Status tool handlers"] end subgraph CliPkg ["● cli/"] direction TB CLI_APP["● app.py<br/>━━━━━━━━━━<br/>CLI entry point<br/>serve, init, cook"] CLI_PROMPTS["● _prompts.py<br/>━━━━━━━━━━<br/>Orchestrator prompt<br/>builder"] end end subgraph L2 ["LAYER 2 — DOMAIN"] direction LR RECIPE["recipe/<br/>━━━━━━━━━━<br/>Recipe schema,<br/>validation, rules<br/>fan-in: 37"] MIGRATION["migration/<br/>━━━━━━━━━━<br/>Version migration<br/>engine"] end subgraph L1 ["LAYER 1 — SERVICES"] direction LR subgraph ExecPkg ["● execution/"] direction TB EXEC_PROCESS["● process.py<br/>━━━━━━━━━━<br/>Subprocess facade<br/>PTY, race, kill"] EXEC_CI["● ci.py<br/>━━━━━━━━━━<br/>CI watcher<br/>deferred startup"] EXEC_GH["● github.py<br/>━━━━━━━━━━<br/>Issue fetcher<br/>deferred startup"] EXEC_MQ["● merge_queue.py<br/>━━━━━━━━━━<br/>Merge queue watcher<br/>deferred startup"] end PIPELINE["pipeline/<br/>━━━━━━━━━━<br/>ToolContext, gate,<br/>audit, telemetry<br/>fan-in: 13"] CONFIG["config/<br/>━━━━━━━━━━<br/>Settings, defaults<br/>fan-in: 20"] WORKSPACE["workspace/<br/>━━━━━━━━━━<br/>Clone, worktree,<br/>skills resolver<br/>fan-in: 14"] end subgraph L0 ["LAYER 0 — FOUNDATION"] direction LR CORE["core/<br/>━━━━━━━━━━<br/>Types, I/O, paths,<br/>logging, enums<br/>fan-in: 103"] HOOK_REG["hook_registry.py<br/>━━━━━━━━━━<br/>HookDef, registry"] end subgraph TestsNew ["★ NEW TEST FILES"] direction LR T_ARCH["★ tests/arch/<br/>test_startup_budget.py"] T_CLI["★ tests/cli/<br/>test_startup_budget.py"] end %% === SERVER INTERNAL WIRING === %% SRV_INIT -->|"imports"| SRV_STATE SRV_INIT -->|"imports"| SRV_LIFESPAN SRV_INIT -->|"imports"| SRV_FACTORY SRV_INIT -->|"imports"| SRV_STATUS SRV_LIFESPAN -->|"deferred init"| SRV_STATE %% === SERVER → L1/L2/L0 === %% SRV_FACTORY -->|"composes"| PIPELINE SRV_FACTORY -->|"composes"| CONFIG SRV_FACTORY -->|"composes"| WORKSPACE SRV_FACTORY -->|"composes"| RECIPE SRV_FACTORY -->|"composes"| MIGRATION SRV_FACTORY -->|"composes all executors"| EXEC_CI SRV_FACTORY --> CORE SRV_LIFESPAN -->|"deferred I/O"| EXEC_PROCESS SRV_LIFESPAN --> HOOK_REG SRV_LIFESPAN --> CORE SRV_STATE --> CONFIG SRV_STATE --> PIPELINE SRV_STATE --> CORE SRV_INIT --> PIPELINE SRV_INIT --> CORE SRV_STATUS --> PIPELINE SRV_STATUS --> CORE %% === CLI → dependencies === %% CLI_APP -->|"serve"| SRV_INIT CLI_APP --> CONFIG CLI_APP --> RECIPE CLI_APP --> MIGRATION CLI_APP --> WORKSPACE CLI_APP --> CORE CLI_APP -->|"build_interactive_cmd"| EXEC_PROCESS CLI_PROMPTS --> RECIPE CLI_PROMPTS --> CONFIG CLI_PROMPTS --> CORE %% === EXECUTION INTERNAL === %% EXEC_CI -->|"github_headers"| EXEC_GH EXEC_MQ -->|"github_headers"| EXEC_GH EXEC_PROCESS --> CONFIG %% === EXECUTION → L0 === %% EXEC_CI --> CORE EXEC_GH --> CORE EXEC_MQ --> CORE EXEC_PROCESS --> CORE %% === L2 → L0/L1 === %% RECIPE --> CORE MIGRATION --> CORE %% === L1 → L0 === %% PIPELINE --> CORE CONFIG --> CORE WORKSPACE --> CORE %% === TEST COVERAGE === %% T_ARCH -.->|"validates"| SRV_LIFESPAN T_CLI -.->|"validates"| CLI_APP %% CLASS ASSIGNMENTS %% class SRV_INIT,SRV_FACTORY,SRV_LIFESPAN,SRV_STATE,SRV_STATUS,CLI_APP,CLI_PROMPTS cli; class EXEC_PROCESS,EXEC_CI,EXEC_GH,EXEC_MQ handler; class PIPELINE,CONFIG,WORKSPACE phase; class RECIPE,MIGRATION phase; class CORE,HOOK_REG stateNode; class T_ARCH,T_CLI newComponent;Concurrency Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; START(["SERVER START<br/>cli/app.py serve()"]) subgraph CriticalPath ["CRITICAL PATH (Sequential — No I/O)"] direction TB MAKE_CTX["● make_context()<br/>━━━━━━━━━━<br/>22 service contracts<br/>No subprocess calls<br/>TokenFactory deferred"] INIT["● _initialize(ctx)<br/>━━━━━━━━━━<br/>Assign _ctx singleton<br/>Wire middleware<br/>Apply subset visibility"] MCP_RUN["● mcp.run()<br/>━━━━━━━━━━<br/>Opens transport<br/>Enters lifespan"] end subgraph Lifespan ["● LIFESPAN PRE-YIELD (Fork Point)"] direction TB CREATE_EVT["● Create asyncio.Event<br/>━━━━━━━━━━<br/>_startup_ready = Event()"] SUBMIT_BG["● Submit background tasks<br/>━━━━━━━━━━<br/>ctx.background.submit()"] end subgraph BackgroundTasks ["BACKGROUND TASKS (Concurrent — After Transport Opens)"] direction LR subgraph DeferredInit ["● deferred_initialize()"] direction TB RECOVER["Crash recovery<br/>━━━━━━━━━━<br/>tmpfs session traces"] TELEMETRY["Telemetry load<br/>━━━━━━━━━━<br/>audit from log dir<br/>last 24h"] SKILL_CLEAN["Stale skill cleanup<br/>━━━━━━━━━━<br/>session_skill_manager"] SET_READY["ready_event.set()<br/>━━━━━━━━━━<br/>Signal: init complete"] end subgraph DriftCheck ["● _async_drift_check()"] direction TB THREAD_EXEC["run_in_executor<br/>━━━━━━━━━━<br/>Thread pool (sync I/O)"] HOOKS_HASH["Compare hooks.json<br/>━━━━━━━━━━<br/>vs HOOK_REGISTRY_HASH"] SELF_HEAL["atomic_write()<br/>━━━━━━━━━━<br/>Self-heal if stale"] end end subgraph Barrier ["SYNCHRONIZATION BARRIER"] direction TB READY_EVT{"● _startup_ready<br/>asyncio.Event"} WAIT_30S["● await wait_for()<br/>━━━━━━━━━━<br/>30s timeout guard"] end subgraph ToolAccess ["TOOL EXECUTION (After Barrier)"] direction TB PIPELINE_RPT["● get_pipeline_report()<br/>━━━━━━━━━━<br/>Awaits _startup_ready<br/>then reads ctx.audit"] GATED_TOOLS["Gated tools<br/>━━━━━━━━━━<br/>_get_ctx() access<br/>Gate must be enabled"] end subgraph LazyResolve ["● LAZY RESOLUTION (On First Use)"] direction TB TOKEN_FAC["● TokenFactory<br/>━━━━━━━━━━<br/>Sentinel-based cache<br/>GIL-safe single-writer"] GH_CLI["gh auth token<br/>━━━━━━━━━━<br/>5s subprocess timeout<br/>Deferred from startup"] end subgraph Supervisor ["DefaultBackgroundSupervisor"] direction TB TASK_SET["_tasks: set of Task<br/>━━━━━━━━━━<br/>Auto-discard on done"] SUPERVISE["_supervise_task()<br/>━━━━━━━━━━<br/>Exception capture<br/>Audit recording"] DRAIN["drain()<br/>━━━━━━━━━━<br/>asyncio.gather()<br/>Shutdown barrier"] end subgraph ProcessRace ["● PROCESS RACE (anyio TaskGroup)"] direction TB SPAWN_TG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Structured concurrency"] subgraph Watchers ["Concurrent Watchers"] direction LR W_PROC["_watch_process<br/>━━━━━━━━━━<br/>Wait for exit"] W_HB["_watch_heartbeat<br/>━━━━━━━━━━<br/>Channel A stdout"] W_SESS["_watch_session_log<br/>━━━━━━━━━━<br/>Channel B log dir"] W_IDLE["_watch_stdout_idle<br/>━━━━━━━━━━<br/>Idle timeout"] end RACE_ACC["● RaceAccumulator<br/>━━━━━━━━━━<br/>Lock-free: cooperative<br/>async guarantees no<br/>concurrent writes"] TRIGGER{"anyio.Event<br/>trigger"} RESOLVE["resolve_termination()<br/>━━━━━━━━━━<br/>Immutable RaceSignals"] SYM_DRAIN["Symmetric drain<br/>━━━━━━━━━━<br/>channel_b_ready.wait()<br/>bounded timeout"] end subgraph LifespanExit ["● LIFESPAN POST-YIELD (Teardown)"] direction TB RECORDER_FIN["recorder.finalize()<br/>━━━━━━━━━━<br/>Scenario data survives<br/>SIGTERM"] end COMPLETE(["SERVER STOP"]) %% CRITICAL PATH FLOW %% START --> MAKE_CTX MAKE_CTX --> INIT INIT --> MCP_RUN MCP_RUN --> CREATE_EVT %% LIFESPAN FORK %% CREATE_EVT --> SUBMIT_BG SUBMIT_BG -->|"background task 1"| RECOVER SUBMIT_BG -->|"background task 2"| THREAD_EXEC %% DEFERRED INIT SEQUENCE %% RECOVER --> TELEMETRY TELEMETRY --> SKILL_CLEAN SKILL_CLEAN --> SET_READY %% DRIFT CHECK SEQUENCE %% THREAD_EXEC --> HOOKS_HASH HOOKS_HASH -->|"stale"| SELF_HEAL %% BARRIER %% SET_READY -->|"set()"| READY_EVT READY_EVT -->|"is_set()"| WAIT_30S WAIT_30S --> PIPELINE_RPT %% TOOL ACCESS %% PIPELINE_RPT --> GATED_TOOLS GATED_TOOLS -->|"first GitHub call"| TOKEN_FAC TOKEN_FAC -->|"resolve once"| GH_CLI %% SUPERVISOR STRUCTURE %% SUBMIT_BG -.->|"manages"| TASK_SET TASK_SET -.-> SUPERVISE SUPERVISE -.-> DRAIN %% PROCESS RACE %% GATED_TOOLS -->|"run_skill / headless"| SPAWN_TG SPAWN_TG --> W_PROC SPAWN_TG --> W_HB SPAWN_TG --> W_SESS SPAWN_TG --> W_IDLE W_PROC -->|"deposit signal"| RACE_ACC W_HB -->|"deposit signal"| RACE_ACC W_SESS -->|"deposit signal"| RACE_ACC W_IDLE -->|"deposit signal"| RACE_ACC RACE_ACC --> TRIGGER TRIGGER -->|"first signal fires"| RESOLVE RESOLVE --> SYM_DRAIN %% SHUTDOWN %% DRAIN --> RECORDER_FIN RECORDER_FIN --> COMPLETE %% CLASS ASSIGNMENTS %% class START,COMPLETE terminal; class MAKE_CTX,INIT,MCP_RUN phase; class CREATE_EVT,SUBMIT_BG detector; class RECOVER,TELEMETRY,SKILL_CLEAN handler; class SET_READY output; class THREAD_EXEC,HOOKS_HASH,SELF_HEAL handler; class READY_EVT stateNode; class WAIT_30S detector; class PIPELINE_RPT,GATED_TOOLS handler; class TOKEN_FAC,GH_CLI stateNode; class TASK_SET,SUPERVISE,DRAIN stateNode; class SPAWN_TG detector; class W_PROC,W_HB,W_SESS,W_IDLE newComponent; class RACE_ACC stateNode; class TRIGGER detector; class RESOLVE,SYM_DRAIN handler; class RECORDER_FIN output;Closes #775
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/remediation-20260412-163055-100049/.autoskillit/temp/rectify/rectify_mcp_server_startup_race_2026-04-12_170200_part_a.md🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary