Skip to content

Stale-hooks check prompts infinitely — never clears after install #680

@Trecek

Description

@Trecek

Problem

Every invocation of autoskillit cook (or any non-serve CLI subcommand) prompts the user about stale hooks. Answering yes and running autoskillit install does not clear the condition — the next invocation immediately prompts again. Answering no only suppresses it for 12 hours. The prompt is inescapable.

This began after the recent merges on the integration branch (specifically commits 8e9b3c74 "Rectify: Ghost Hook Registrations Survive Git Revert" and 034362ff "Stale-Check Dismiss/Snooze State Split").

Root Cause

Three distinct defects compound to create the infinite prompt loop:

Defect 1 — _extract_cmds() collects ALL hook commands, not just autoskillit's

Location: src/autoskillit/hook_registry.py, _extract_cmds() function (lines 132-141)

The drift checker compares canonical hooks (from HOOK_REGISTRY) against deployed hooks (from ~/.claude/settings.json). But _extract_cmds() walks every event type, every entry, every hook in settings.json with zero filtering:

def _extract_cmds(hooks_dict: dict) -> set[str]:
    return {
        hook.get("command", "")
        for event_entries in hooks_dict.values()
        if isinstance(event_entries, list)
        for entry in event_entries
        for hook in entry.get("hooks", [])
        if hook.get("command", "")
    }

Any user-defined, non-autoskillit hooks (e.g., wsl-notify-send.exe "Claude Code Task Complete!" or a custom plan-mode deny echo hook) are captured. Since they will never appear in HOOK_REGISTRY, they are permanently orphaned. This guarantees drift.orphaned >= N (where N = number of user-defined hooks) on every check, making the prompt fire unconditionally.

The irony: The filter already exists. _is_autoskillit_hook_command() in cli/_hooks.py (lines 23-28) correctly identifies autoskillit-owned hooks by checking for "autoskillit" in the command string or matching known script basenames. This filter is used by _evict_stale_autoskillit_hooks() during install — but _count_hook_registry_drift() never calls it.

# cli/_hooks.py:23-28 — this filter EXISTS but is NOT used by the drift checker
def _is_autoskillit_hook_command(command: str) -> bool:
    if "autoskillit" in command:
        return True
    known_scripts = {s for h in HOOK_REGISTRY for s in h.scripts}
    return any(command.endswith(script) or f"/{script}" in command for script in known_scripts)

Defect 2 — The YES-path writes no dismiss/snooze record and has no return

Location: src/autoskillit/cli/_stale_check.py, run_stale_check() (lines 334-342)

When the user accepts the reinstall:

if answer in ("", "y", "yes"):
    with terminal_guard():
        subprocess.run(["autoskillit", "install"], check=False, env=_skip_env)
    # ← NO dismiss state written
    # ← NO return statement
else:
    state["hooks"] = {
        "dismissed_at": datetime.now(UTC).isoformat(),
        "dismissed_version": current,
    }
    _write_dismiss_state(_home, state)

After autoskillit install completes (successfully or not), the function just falls off the end of the if block. No snooze or dismiss record is written to ~/.autoskillit/update_check.json. The next CLI invocation immediately re-runs the full drift check from scratch with zero cooldown.

Compare this with the binary-version YES-path (lines 285-308 in the same file), which correctly:

  • Calls _verify_update_result() to check if the update actually took effect
  • Writes a binary_snoozed record if the version didn't change
  • Returns unconditionally after handling

The hooks YES-path has none of this post-install handling.

Defect 3 — Dev-checkout vs uv-tool pkg_root() path mismatch

Location: src/autoskillit/hook_registry.py, generate_hooks_json() (line 94); src/autoskillit/core/paths.py, pkg_root()

generate_hooks_json() bakes absolute filesystem paths into every hook command string:

def generate_hooks_json() -> dict:
    hooks_dir = pkg_root() / "hooks"
    # ...
    {"type": "command", "command": f"python3 {hooks_dir / script}"}

pkg_root() resolves via importlib.resources.files("autoskillit"), so the path depends on which Python interpreter is running:

Installation pkg_root() resolves to
Dev checkout (editable install) /home/user/projects/generic_automation_mcp/src/autoskillit/
uv tool install ~/.local/share/uv/tools/autoskillit/lib/python3.13/site-packages/autoskillit/

When the stale check runs from the dev checkout but ~/.claude/settings.json was last written by the uv-tool binary's autoskillit install, the command strings are completely disjoint. Every canonical command is "missing" and every deployed command is "orphaned." Live measurement: missing=13, orphaned=15.

Even when the stale check's YES-path runs subprocess.run(["autoskillit", "install"]), the autoskillit binary resolved from $PATH may be the uv-tool copy — so it writes uv-tool paths into settings.json. But the next stale check runs from the dev checkout's Python, generating dev-checkout canonical paths. The two sets can never match.

Complete Data Flow of the Bug

User runs: autoskillit cook
  → main() in cli/app.py (line 707-710)
    → run_stale_check() in cli/_stale_check.py
      → _count_hook_registry_drift(~/.claude/settings.json)
        → generate_hooks_json()
            builds 13 canonical cmds with active pkg_root() paths
        → _load_settings_data(~/.claude/settings.json)
            reads 15 deployed cmds (different pkg_root paths + 2 user hooks)
        → _extract_cmds(canonical) → 13 commands
        → _extract_cmds(deployed)  → 15 commands (NO autoskillit filtering)
        → orphaned = deployed - canonical = 15 (completely disjoint sets)
        → missing  = canonical - deployed = 13
      → drift.orphaned=15 > 0 → prompt fires
      → User answers Y
        → subprocess.run(["autoskillit", "install"])
          → install runs from uv-tool binary (PATH lookup)
          → writes uv-tool paths to settings.json
          → completes successfully
        → NO dismiss record written
        → NO return statement — function ends
  → Next `autoskillit cook` → same check → same result → prompt again

Why This Started After Recent Merges

  1. Commit 034362ff ("Stale-Check Dismiss/Snooze State Split", Mar 27) shrunk _DISMISS_WINDOW from 7 days to 12 hours. The user's last hooks dismiss record (2026-03-27T04:34:31) is now 12+ days expired, so it provides no protection.

  2. Commit 8e9b3c74 ("Rectify: Ghost Hook Registrations Survive Git Revert", Apr 8) made the drift check bidirectional — it now counts both missing (canonical − deployed) AND orphaned (deployed − canonical). Previously only missing was checked. The orphaned direction is where the user-defined hooks leak in.

  3. The same commit added the Write|Edit / generated_file_write_guard.py entry to HOOK_REGISTRY, adding a genuinely new canonical command that doesn't exist in the deployed settings from the previous install.

Recommendations

Fix 1 — Filter deployed commands to autoskillit-only before comparison

In _count_hook_registry_drift() (hook_registry.py:157), filter deployed_cmds through the existing _is_autoskillit_hook_command() before computing the set difference:

from autoskillit.cli._hooks import _is_autoskillit_hook_command

deployed_cmds = {cmd for cmd in _extract_cmds(deployed_data.get("hooks", {}))
                 if _is_autoskillit_hook_command(cmd)}

This eliminates false orphans from user-defined hooks. The filter function already exists and is battle-tested in the eviction path.

Note on import direction: _is_autoskillit_hook_command currently lives in cli/_hooks.py (L3) while _count_hook_registry_drift is in hook_registry.py (L0). This would be a layering violation. The function should be moved to hook_registry.py (where HOOK_REGISTRY already lives) to keep the dependency clean.

Fix 2 — Write dismiss/snooze state and return after hooks install

In _stale_check.py:334-336, after the install subprocess completes:

  1. Re-run _count_hook_registry_drift() to verify drift is now zero
  2. If zero → write a short dismiss record and return
  3. If still non-zero (path mismatch) → write a 1-hour snooze record and return with a warning message explaining the dual-install situation

This mirrors the existing binary-version YES-path pattern.

Fix 3 — Normalize command comparison to script basenames

Instead of comparing full absolute path strings, extract just the script filename from each command before comparison. The script names (quota_check.py, branch_protection_guard.py, etc.) are the stable identifiers — the path prefix varies by installation. This makes drift detection immune to pkg_root() differences.

Alternatively, compare only the basename portion of each python3 /path/to/script.py command, since the basename is the invariant across dev-checkout and uv-tool installations.

Test Gaps

No existing test covers:

  1. Non-autoskillit hooks in settings.json being excluded from drift calculation
  2. The YES-path writing a dismiss/snooze record after install
  3. Drift detection when pkg_root() differs between the checking process and the installed hooks (dual-install scenario)

Metadata

Metadata

Assignees

No one assigned

    Labels

    recipe:remediationRoute: investigate/decompose before implementationstagedImplementation staged and waiting for promotion to main

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions