fix: three dogfood-discovered bugs in user-facing install path #7
Merged
…ll path

Discovered while running the rc12 dogfood (Tier 1, fresh ubuntu:24.04 docker container). All three reproduce against current master / v0.4.0; the bats suite passed without catching them because none of the existing tests exercised the user-facing curl-pipe-bash path end-to-end on a fresh target.

1. Wrong default ORG in curl-installer: install.sh defaulted ORG=agentlinux. The actual repo is github.com/Roo4L/Agent-Linux, so the GET against https://github.com/agentlinux/agent-linux/releases/download/... returned HTTP 404. Fixed: default to ORG=Roo4L.

2. --purge regression, sudoers drop-in left behind (INST-04 / BHV-07): Phase 5.1 (ADR-012) added /etc/sudoers.d/agentlinux via 20-sudoers.sh but run_purge was never updated. After --purge, the drop-in stayed, leaving an orphaned NOPASSWD grant after the agent user was removed. Fixed: rm -f /etc/sudoers.d/agentlinux as Step 3.5 of run_purge.

3. Per-agent uninstall.sh tripped AGENTLINUX_AGENT_HOME guard on --purge: run_purge invoked uninstall.sh recipes without setting AGENTLINUX_AGENT_HOME (runner.ts normally provides it). Cosmetic warning, purge continued, but noisy. Fixed: export the var when invoking the recipes.

Bats coverage extended: 40-registry-cli.bats INST-04 @test now also asserts /etc/sudoers.d/agentlinux is gone after --purge.

Verified: full local Tier 1 dogfood in fresh ubuntu:24.04 docker container with the fixed install.sh. SHA256 verify ✓, provisioner ✓, agentlinux install claude-code ✓, claude --version = 2.1.98 ✓, claude update exits 0 with zero EACCES (2.1.98 → 2.1.126) ✓, --purge leaves opt + agent user + /home/agent + sudoers gone, Node kept ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
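Fixes 2 and 3 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual `run_purge`: the function names `purge_sudoers_dropin` and `run_agent_uninstall` and the optional `root` prefix are hypothetical, added only so the sketch can be exercised against a scratch directory instead of the real `/etc`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two run_purge fixes; names are illustrative.

# Remove the sudoers drop-in. The optional root prefix is an assumption of
# this sketch (not part of the real installer) so it can target a scratch tree.
purge_sudoers_dropin() {
  local root=${1:-}
  # rm -f succeeds even when the file is already gone, so re-runs are safe.
  rm -f -- "${root}/etc/sudoers.d/agentlinux"
}

# Invoke a per-agent uninstall recipe with AGENTLINUX_AGENT_HOME set in its
# environment, so a `${AGENTLINUX_AGENT_HOME:?}` guard in the recipe passes.
run_agent_uninstall() {
  local recipe=$1
  AGENTLINUX_AGENT_HOME="${AGENTLINUX_AGENT_HOME:-/home/agent}" bash "$recipe"
}
```

Passing the variable as a per-command assignment keeps it scoped to the recipe invocation rather than changing the caller's environment.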
Discovered by user dogfood: `agentlinux install gsd` reported success but
Claude Code showed zero /gsd-* commands. Root cause: get-shit-done-cc is a
BOOTSTRAPPER, not the GSD command set itself. npm install only puts the
bootstrapper binary on PATH; the user (or our recipe) must then run
`get-shit-done-cc --global --claude` to actually copy the GSD skill set
(~79 skills + hooks + statusline) into ~/.claude/skills/.
Our recipe was technically correct (npm install succeeded, binary on
PATH, banner matched pin) but the user-visible intent ("install GSD")
was not satisfied. AGT-04 only checked the banner; it never checked
that ~/.claude/skills/gsd-* was populated.
install.sh: add post-npm-install step `get-shit-done-cc --global --claude`.
uninstall.sh: add pre-npm-uninstall step
`get-shit-done-cc --global --claude --uninstall` for symmetric removal.
Also `hash -r` before the post-uninstall `command -v` check — bash hashes
the binary path during the bootstrapper invocation; without hash -r the
cached entry reports the now-deleted file as still-resolvable.
tests/bats/50-agents.bats AGT-04: new @test asserts
~/.claude/skills/gsd-* count >= 10 after `agentlinux install gsd`.
Closes the bats coverage gap that allowed the regression.
Verified end-to-end in fresh ubuntu:24.04 docker container:
install → 79 gsd-* skills wired ✓
remove → 0 gsd-* skills, binary gone ✓
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
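The `hash -r` point in the commit above can be reproduced with a throwaway binary. The sketch below is an illustration only: `demo-tool` and the scratch directory are hypothetical stand-ins for the bootstrapper binary, and the `rm` stands in for `npm uninstall -g`.

```shell
#!/usr/bin/env bash
# Illustration of bash command hashing; demo-tool is a hypothetical stand-in.
set -u

bindir=$(mktemp -d)
printf '#!/bin/sh\necho demo-tool\n' >"$bindir/demo-tool"
chmod +x "$bindir/demo-tool"
PATH="$bindir:$PATH"

demo-tool >/dev/null          # first run: bash caches the resolved path
rm -f "$bindir/demo-tool"     # simulate the uninstall deleting the binary

# Lookup while the cached entry may still be present.
if command -v demo-tool >/dev/null; then before=resolvable; else before=gone; fi

hash -r                       # drop every cached command path

# Lookup after the cache is cleared: PATH is searched again.
if command -v demo-tool >/dev/null; then after=resolvable; else after=gone; fi

echo "before hash -r: $before"
echo "after  hash -r: $after"
rmdir "$bindir"
```

After `hash -r` the lookup has to walk `PATH` again, so the post-uninstall `command -v` check reflects what is actually on disk.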
…i (agent CLI)

The user's intent for the "playwright" catalog entry was Microsoft's @playwright/cli (the token-efficient command-line tool for coding agents that ships a Claude Code skill), NOT the playwright test framework.

The previous recipe installed `playwright@1.59.1` (the test framework), downloaded ~281 MB of chromium browser, and called `--with-deps` to apt-install system libraries, none of which the user wanted. The user wanted: an agent-friendly CLI that surfaces /playwright skills inside Claude Code, the same shape as gsd.

Catalog: replace the `playwright` entry with `playwright-cli`:
  id               = playwright-cli
  npm_package_name = @playwright/cli
  pinned_version   = 0.1.11 (latest published, verified via npm registry)
  homepage         = https://playwright.dev/agent-cli/installation

Recipe (mirrors the gsd skill-bootstrapper pattern from the prior commit):
  install.sh
    1. npm install -g @playwright/cli@<pin>
    2. playwright-cli install --skills   # wires ~/.claude/skills/playwright-cli/
  uninstall.sh
    1. playwright-cli install --skills --uninstall (best-effort)
       + defensive `find ... -iname '*playwright*' -exec rm -rf` to handle version drift in the bootstrapper's --uninstall flag coverage
    2. npm uninstall -g @playwright/cli
    3. hash -r before the post-uninstall command -v check

Bats coverage updated: tests/bats/50-agents.bats AGT-05. Three @tests now cover playwright-cli instead of the test framework: --version matches pin, ~/.claude/skills/ has a *playwright* entry (catches recipe regressing to npm-only), and CLI-03 idempotency holds on re-install.

Verified end-to-end in fresh ubuntu:24.04 docker container:
  agentlinux install playwright-cli → ~/.claude/skills/playwright-cli/ ✓
  playwright-cli --version → 0.1.11 ✓
  agentlinux remove playwright-cli → skill gone, binary gone ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
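The "--version matches pin" check described above could look something like the sketch below. It is written as plain bash rather than as the actual bats @test, and the helper name `version_matches_pin` is hypothetical. It combines a semver-shape check with a fixed-string match on the pin, so an exit 0 with empty output does not pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if `<cmd> --version` prints a
# semver-shaped string that contains the pinned version.
version_matches_pin() {
  local cmd=$1 pin=$2 out
  # stdin from /dev/null approximates a non-TTY invocation.
  out=$("$cmd" --version 2>/dev/null </dev/null) || return 1
  # Reject exit-0-with-empty-output: require a semver-shaped token.
  printf '%s\n' "$out" | grep -Eq '[0-9]+\.[0-9]+\.[0-9]+' || return 1
  # Fixed-string match so the dots in the pin are not regex wildcards.
  printf '%s\n' "$out" | grep -qF -- "$pin"
}
```

A recipe or test would call it as `version_matches_pin playwright-cli 0.1.11`. Note that the last step is a substring match, so a stricter comparison would be needed if the tool printed extra version-like text.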
Findings consolidated from bash-engineer + security-engineer + qa-engineer + catalog-auditor + behavior-coverage-auditor.

BLOCKERS (would have failed CI):
- tests/bats/50-agents.bats AGT-05: typo `playwright-cli-cli` → `playwright-cli` on the @test name (line 251) and the `agentlinux install` invocation inside the idempotency @test (line 286). The previous string would have hit "no such agent" and made the @test vacuously red.
- tests/bats/40-registry-cli.bats CAT-01: `grep -qw 'playwright'` would NOT match `playwright-cli` because `-` is not a word boundary. Switched to `grep -qF ' playwright-cli '` against a space-padded id stream so the whole-token check is preserved without the word-boundary trap.
- tests/bats/40-registry-cli.bats CAT-04: stale spot-checks for id=="playwright" + 1.59.1. Updated to id=="playwright-cli" + 0.1.11.

IMPORTANT (would have shipped quality regressions):
- tests/bats/50-agents.bats teardown_file: orphan `remove --force playwright` → `remove --force playwright-cli` so teardown actually cleans the new id.
- tests/bats/50-agents.bats AGT-01 mode loop: parity with the claude variant. Added `grep -Eq '[0-9]+\.[0-9]+\.[0-9]+'` semver-shape check so an exit-0 with empty output (e.g. an upstream regression in --version under non-TTY stdin) would not silently pass.
- tests/bats/50-agents.bats AGT-04/AGT-05 stale-state false-positive: added defensive `rm -rf ~/.claude/skills/{gsd-*,*playwright*}` in setup_file BEFORE the per-agent installs so the skill-wired assertions fail loud when the recipe regresses to "npm install only", the regression those @tests are supposed to catch.
- plugin/catalog/agents/gsd/uninstall.sh: bootstrapper `--uninstall` is best-effort (older versions don't ship it). Added defensive `find ~/.claude/skills -maxdepth 1 -type d -name 'gsd-*' -exec rm -rf {} +` after the bootstrapper call so 79+ skill dirs don't leak when --uninstall is broken/absent. Mirrors playwright-cli's pattern.
- plugin/catalog/agents/{gsd,playwright-cli}/install.sh: bootstrapper invocations were unguarded under `set -e`. A re-install on `--force` or a partial-state recovery would abort the recipe even though npm install + version check already succeeded. Wrap each with `|| echo ... >&2` and rely on the post-bootstrapper skill-presence assertion as the real truth check. Closes idempotency hole.
- plugin/catalog/agents/playwright-cli/install.sh: tighten skill-presence match from broad `-iname '*playwright*'` to `-name 'playwright-cli*'` (-maxdepth 1, -type d). Matches the install side and avoids accidentally asserting against unrelated user skills.
- plugin/catalog/agents/playwright-cli/uninstall.sh: tighten the skill cleanup find from `-iname '*playwright*'` (could collateral-damage a hand-rolled `~/.claude/skills/playwright-notes/`) to `-name 'playwright-cli*'`. Drop `2>/dev/null` so legit rm errors surface in the installer transcript.

NITS:
- tests/bats/50-agents.bats AGT-04/05 __fail messages: dropped the leading `agent` token (copy-paste artifact); replaced `~/.claude/...` (would shellcheck-warn SC2088 inside single quotes) with literal `/home/agent/.claude/...`.
- CLI-02 stale comment: "three real agents (claude-code, gsd, playwright)" → `playwright-cli`. Diagnostic strings updated.

Verified end-to-end in fresh ubuntu:24.04 docker container:
  install gsd → 79 gsd-* skills wired ✓
  install playwright-cli → 1 playwright-cli* skill dir wired ✓
  re-install playwright-cli → "already installed; no-op" ✓
  remove gsd → 0 gsd-* skills, binary gone ✓
  remove playwright-cli → 0 playwright-cli* dirs, binary gone ✓

DEFERRED (flagged in PR comments, not blocking):
- env-via-`env` hardening at plugin/bin/agentlinux-install:250 (pre-existing pattern; matters only if a future stricter sudoers ships).
- REQUIREMENTS.md AGT-04 / AGT-05 spec text drift vs new test contract (architectural; cleaner as a separate spec-update PR, possibly introducing a cross-cutting AGT-06 for "agent install wires its skill set into Claude Code" so per-agent IDs don't duplicate the contract).
- Tag re-validation symmetry in packaging/curl-installer/install.sh:111-113 (pre-existing on master; not introduced by this PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
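The tightened skill cleanup from the uninstall recipes can be sketched as below. The helper name `remove_skill_dirs` and its arguments are hypothetical; only the `find` predicates (`-maxdepth 1 -type d -name '<prefix>*'`) come from the commit message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove only top-level skill directories whose name
# starts with the given prefix (for example playwright-cli or gsd-).
remove_skill_dirs() {
  local skills_dir=$1 prefix=$2
  # Nothing to clean if the skills directory does not exist.
  [ -d "$skills_dir" ] || return 0
  # -maxdepth 1 keeps find from descending into the dirs being removed;
  # errors from rm are deliberately not silenced.
  find "$skills_dir" -maxdepth 1 -type d -name "${prefix}*" -exec rm -rf {} +
}
```

With a prefix match, a hand-rolled `playwright-notes` directory is left alone, whereas the earlier `-iname '*playwright*'` pattern would have matched it.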
PR #7 review pass missed a runtime issue that the bats CI surfaced: `playwright-cli install --skills` calls initWorkspace() which mkdirs ./.playwright in the CURRENT directory. AgentLinux dispatches recipes from /opt/agentlinux-src/ (a read-only repo bind-mount in the Docker harness), so the bootstrapper crashes with EACCES on .playwright before it can write any skill into ~/.claude/skills/.

Local dogfood missed this because the local container had been manipulated (catalog overlays, etc.) leaving CWD writable. The CI test file (50-agents.bats) runs the recipe AFTER 40-registry-cli.bats's INST-04 --purge, the recovery installer kicks in, and the recipe dispatches with CWD anchored to /opt/agentlinux-src/.

Fix: wrap the bootstrapper invocation in a subshell with `cd "${AGENTLINUX_AGENT_HOME}"` so .playwright lands at /home/agent/.playwright (agent-owned, writable, cleaned by `userdel -r` on --purge).

Verified by reproducing the CI scenario locally:
  install (FRESH) before fix → EACCES, no skill wired ✗
  install (FRESH) after fix → 100% download, skill wired ✓
  install (POST-PURGE) after recovery + fix → skill wired ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
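A minimal sketch of the subshell-`cd` fix, combined with the `|| echo ... >&2` guard from the review pass, might look like this. The wrapper name `run_bootstrapper_in_home` is hypothetical; the real recipes may inline the subshell instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: run a bootstrapper command from inside the agent's
# home so any ./scratch directory it creates lands somewhere writable.
run_bootstrapper_in_home() {
  local home=$1
  shift
  # The subshell keeps the cd from leaking into the rest of the recipe;
  # the || branch keeps a bootstrapper failure from aborting under set -e.
  ( cd "$home" && "$@" ) || echo "warning: bootstrapper exited non-zero: $*" >&2
}
```

A recipe would call it as `run_bootstrapper_in_home "${AGENTLINUX_AGENT_HOME}" playwright-cli install --skills` and then rely on the skill-presence assertion afterwards as the real success check, since the wrapper itself never fails.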
a3eae67 to
7402fce
3 tasks
Roo4L added a commit that referenced this pull request on May 2, 2026
…ler allowlist

Two blockers surfaced by the first 26.04 PR cycle (the third, Playwright's chromium-install rejecting ubuntu26.04, is moot after master's #7 dogfood fix swapped the catalog from `playwright` (full + chromium) to `playwright-cli` (@playwright/cli, JS-only, no per-OS browser recipe)).

1. INST-02 / BHV-07 byte-stable re-run failed with "install: No such file or directory" only on the second installer pass. Diagnosed via strace: Ubuntu 26.04 ships uutils-coreutils 0.7.0 (Rust rewrite). Its `install` recursively readlink-chases /dev/stdin → /proc/self/fd/0 → "pipe:[NNN]" and ENOENTs whenever the destination already exists. The first run creates the file (succeeds); the idempotent re-run tries to overwrite the existing file (fails). GNU coreutils opens fd 0 directly and never hits this path.

   Fix: add a portable `write_file_atomic <mode> <dest>` helper to plugin/lib/idempotency.sh, with the same atomic-rename semantics as `install -m <mode> /dev/stdin <dest>` but via a same-directory tmpfile so it works on both GNU and uutils. Function-scoped RETURN trap mirrors ensure_marker_block's tmpfile cleanup pattern. Three call sites in plugin/provisioner/40-path-wiring.sh (profile.d, agentlinux.env, cron.d) migrated. The /dev/null source path is unaffected and stays.

2. INST-03 curl-installer fixture rejected 26.04 because packaging/curl-installer/install.sh has its own detect_ubuntu_version allowlist that the AGE-11 patch missed. Extended to 22.04|24.04|26.04 in lockstep with plugin/lib/distro_detect.sh; matching error message updated. Comment makes the lockstep invariant explicit.

Verified locally on the rebased branch: tests/docker/run.sh ubuntu-{22.04,24.04,26.04} all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
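The `write_file_atomic <mode> <dest>` idea can be sketched as follows. This is an illustrative re-implementation, not the helper in plugin/lib/idempotency.sh: per the commit message the real one uses a function-scoped RETURN trap for cleanup, whereas this sketch swaps in explicit cleanup on failure to stay short.

```shell
#!/usr/bin/env bash
# Illustrative sketch of write_file_atomic <mode> <dest>: read stdin, write
# a tmpfile next to <dest>, then rename it into place.
write_file_atomic() {
  local mode=$1 dest=$2 tmp
  # Same-directory tmpfile so the final mv is a rename on one filesystem.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if cat >"$tmp" && chmod "$mode" "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"   # explicit cleanup on any failure (assumption of this sketch)
  return 1
}
```

Usage mirrors the `install -m <mode> /dev/stdin <dest>` call it replaces, for example `printf 'KEY=value\n' | write_file_atomic 0644 /tmp/example.env` (the path here is only a placeholder). Because the destination is only ever touched by the rename, overwriting an existing file on a re-run takes the same path as the first run.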
Roo4L added a commit that referenced this pull request on May 2, 2026
… targets (#5)

* feat(matrix): add Ubuntu 26.04 (Resolute Raccoon) to v0.3.0 supported targets

AGE-11. Wires 26.04 LTS (released 2026-04-23, codename `resolute`) into the v0.3.0 plugin matrix end-to-end:
- tests/docker/Dockerfile.ubuntu-26.04 (mirrors 24.04 sibling)
- tests/docker/run.sh accepts ubuntu-26.04
- tests/qemu/cloud-images.txt + boot.sh codename map (resolute)
- plugin/lib/distro_detect.sh + agentlinux-install --help
- CI matrices: test.yml + nightly-qemu.yml + release.yml gates
- README, PROJECT, REQUIREMENTS, CLAUDE.md, HARNESS.md copy refresh

Empirical "installer green on 26.04" verification deferred to the next test.yml PR run + first nightly-qemu cycle.

* fix(26.04): unblock CI on Ubuntu 26.04: uutils install + curl-installer allowlist

Two blockers surfaced by the first 26.04 PR cycle (the third, Playwright's chromium-install rejecting ubuntu26.04, is moot after master's #7 dogfood fix swapped the catalog from `playwright` (full + chromium) to `playwright-cli` (@playwright/cli, JS-only, no per-OS browser recipe)).

1. INST-02 / BHV-07 byte-stable re-run failed with "install: No such file or directory" only on the second installer pass. Diagnosed via strace: Ubuntu 26.04 ships uutils-coreutils 0.7.0 (Rust rewrite). Its `install` recursively readlink-chases /dev/stdin → /proc/self/fd/0 → "pipe:[NNN]" and ENOENTs whenever the destination already exists. The first run creates the file (succeeds); the idempotent re-run tries to overwrite the existing file (fails). GNU coreutils opens fd 0 directly and never hits this path.

   Fix: add a portable `write_file_atomic <mode> <dest>` helper to plugin/lib/idempotency.sh, with the same atomic-rename semantics as `install -m <mode> /dev/stdin <dest>` but via a same-directory tmpfile so it works on both GNU and uutils. Function-scoped RETURN trap mirrors ensure_marker_block's tmpfile cleanup pattern. Three call sites in plugin/provisioner/40-path-wiring.sh (profile.d, agentlinux.env, cron.d) migrated. The /dev/null source path is unaffected and stays.

2. INST-03 curl-installer fixture rejected 26.04 because packaging/curl-installer/install.sh has its own detect_ubuntu_version allowlist that the AGE-11 patch missed. Extended to 22.04|24.04|26.04 in lockstep with plugin/lib/distro_detect.sh; matching error message updated. Comment makes the lockstep invariant explicit.

Verified locally on the rebased branch: tests/docker/run.sh ubuntu-{22.04,24.04,26.04} all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
Roo4L added a commit that referenced this pull request on May 3, 2026
Patch on top of v0.3.1 carrying the master-merged follow-ups since the first dogfood failure (AL-18):
- PR #7: three dogfood-discovered installer-path bugs (curl-installer ORG default, --purge sudoers cleanup, GSD + Playwright CLI skill bootstrap wiring, AGENTLINUX_AGENT_HOME export during purge, playwright-cli cd to writable home).
- PR #5: Ubuntu 26.04 (Resolute Raccoon) added to v0.3.0 supported targets.
- PR #11: bump GitHub Actions to Node 24-ready versions.
- PR #13: review-reminder Stop hook + ADR-010 refinement (AL-23).
- PR #14: workspace-cleanup skill.
- PR #4 / #9 / #10: CI / website-deploy fixes.

scripts/build-release.sh enforces a three-way version lock: the tag's base version (after stripping any -rc suffix) must equal both plugin/cli/package.json.version and plugin/catalog/catalog.json.version. Bumping both files to 0.3.2 so v0.3.2-rc1 (and eventually v0.3.2 final) clear the gate.

Refs: AL-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes three bugs in the v0.3.0/v0.4.0 user-facing install path discovered while running the rc12 dogfood (Tier 1, fresh `ubuntu:24.04` Docker container). All three reproduce against current master / v0.4.0; the bats suite passed without catching them because no test exercised the user-facing curl-pipe-bash path end-to-end on a fresh target.

| Bug | Fix |
| --- | --- |
| `install.sh` defaults `ORG=agentlinux` → `https://github.com/agentlinux/agent-linux/...` 404s. The v0.4.0 one-liner is broken. | `ORG=Roo4L`. Override-via-env intact. |
| `--purge` leaves `/etc/sudoers.d/agentlinux` behind: orphaned NOPASSWD grant after the agent user is removed (Phase 5.1 / ADR-012 / BHV-07 regression). | `run_purge`: `rm -f /etc/sudoers.d/agentlinux`. |
| `claude-code/uninstall.sh` trips its own `${AGENTLINUX_AGENT_HOME:?}` guard during `--purge` (cosmetic; `log_warn` caught it but output was noisy). | `AGENTLINUX_AGENT_HOME=/home/agent` when `run_purge` invokes per-agent `uninstall.sh`. |

Bats coverage extended: `tests/bats/40-registry-cli.bats` INST-04 @test now also asserts `/etc/sudoers.d/agentlinux` is gone after `--purge` so this regression cannot recur silently.

Dogfood evidence (fresh `ubuntu:24.04` Docker, Tier 1)

8 of 8 dogfood signals pass.
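The `ORG` default with its environment override, from the first fix in the summary, can be sketched as below. The function name `repo_url` is hypothetical and the sketch only builds the repository URL; the release-download path is elided in the PR text and is not reconstructed here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: default ORG to the real repo owner, but let the
# caller override it through the environment.
repo_url() {
  local org=${ORG:-Roo4L}
  printf 'https://github.com/%s/Agent-Linux\n' "$org"
}
```

Running `repo_url` with `ORG` unset yields the `Roo4L` URL, while `ORG=my-fork repo_url` points the installer at a fork without editing the script.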
Test plan
- `gitleaks` and `Detect hardcoded secrets`.
- `docker run --rm ubuntu:24.04`.
- `v0.4.1` patch release on merge, or batch with other v0.4.x work?

🤖 Generated with Claude Code