fix: three dogfood-discovered bugs in user-facing install path #7

Merged: Roo4L merged 5 commits into master from fix/curl-installer-org-default-and-purge-sudoers (May 2, 2026)

Roo4L commented May 2, 2026

Summary

Fixes three bugs in the v0.3.0/v0.4.0 user-facing install path discovered while running the rc12 dogfood (Tier 1, fresh ubuntu:24.04 Docker container). All three reproduce against current master / v0.4.0; the bats suite passed without catching them because no test exercised the user-facing curl-pipe-bash path end-to-end on a fresh target.

| # | Bug | Fix |
|---|-----|-----|
| 1 | install.sh defaults ORG=agentlinux, so https://github.com/agentlinux/agent-linux/... 404s. The v0.4.0 one-liner is broken. | Default to ORG=Roo4L. Override-via-env intact. |
| 2 | --purge leaves /etc/sudoers.d/agentlinux behind: an orphaned NOPASSWD grant after the agent user is removed (Phase 5.1 / ADR-012 / BHV-07 regression). | Add Step 3.5 to run_purge: rm -f /etc/sudoers.d/agentlinux. |
| 3 | claude-code/uninstall.sh trips its own ${AGENTLINUX_AGENT_HOME:?} guard during --purge (cosmetic; log_warn caught it but output was noisy). | Export AGENTLINUX_AGENT_HOME=/home/agent when run_purge invokes per-agent uninstall.sh. |

Bats coverage extended: tests/bats/40-registry-cli.bats INST-04 @test now also asserts /etc/sudoers.d/agentlinux is gone after --purge so this regression cannot recur silently.
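
For reviewers unfamiliar with the override-via-env pattern in bug 1, here is a minimal sketch of how a default ORG can coexist with a caller-supplied one. It is not the actual install.sh: `resolve_org`, `release_base_url`, and the `REPO` variable are hypothetical names, and the URL stops at the `releases/download` prefix because the rest of the path is not shown in this PR.

```bash
#!/usr/bin/env bash
# Illustrative sketch, not the actual install.sh.

# Use the caller's ORG when it is set and non-empty; otherwise default to Roo4L.
resolve_org() {
  printf '%s\n' "${ORG:-Roo4L}"
}

# REPO is a hypothetical variable; its default mirrors github.com/Roo4L/Agent-Linux.
release_base_url() {
  printf 'https://github.com/%s/%s/releases/download\n' \
    "$(resolve_org)" "${REPO:-Agent-Linux}"
}

release_base_url
```

With this shape, setting ORG=my-fork in the environment of the shell that runs the script would point the download at a fork without editing the script.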

Dogfood evidence (fresh ubuntu:24.04 Docker, Tier 1)

=== STEP 1: pipe install.sh through bash ===
[INFO] agentlinux-install complete (transcript: /var/log/agentlinux-install.log)

=== STEP 2: agentlinux --version === → 0.3.0
=== STEP 3: agentlinux list === → 3 agents

=== STEP 4: agentlinux install claude-code ===
claude-code: installed, reports: 2.1.98 (Claude Code)
claude-code: install complete (AGT-02b version-lock satisfied)

=== STEP 5: claude --version === → 2.1.98 (Claude Code)  ✓ AGT-02b

=== STEP 6: claude update (AGT-02 — THE bug class) ===
Current version: 2.1.98
Successfully updated from 2.1.98 to version 2.1.126
OK_NO_PERMISSION_ERRORS                                  ✓ AGT-02

=== STEP 7: --purge ===
[INFO] running uninstall.sh for claude-code
claude-code: uninstall complete
[INFO] --purge complete

=== POST-PURGE FILE-SYSTEM CHECKS ===
OK_opt_gone
OK_user_gone
OK_sudoers_gone        ← the new fix
OK_home_gone
OK_node_kept

8 of 8 dogfood signals pass.

Test plan

  • Local pre-commit (12 hooks) green including gitleaks and Detect hardcoded secrets.
  • Local Tier 1 dogfood end-to-end in fresh docker run --rm ubuntu:24.04.
  • CI test workflow green on this branch (Docker bats matrix + gitleaks + cli-unit).
  • Reviewer to confirm shape: ship as v0.4.1 patch release on merge, or batch with other v0.4.x work?

🤖 Generated with Claude Code

Roo4L and others added 5 commits May 2, 2026 13:46
…ll path

Discovered while running the rc12 dogfood (Tier 1, fresh ubuntu:24.04 docker
container). All three reproduce against current master / v0.4.0; the bats
suite passed without catching them because none of the existing tests
exercised the user-facing curl-pipe-bash path end-to-end on a fresh target.

1. Wrong default ORG in curl-installer:
   install.sh defaulted ORG=agentlinux. The actual repo is
   github.com/Roo4L/Agent-Linux, so the GET against
   https://github.com/agentlinux/agent-linux/releases/download/...
   returned HTTP 404. Fixed: default to ORG=Roo4L.

2. --purge regression: sudoers drop-in left behind (INST-04 / BHV-07):
   Phase 5.1 (ADR-012) added /etc/sudoers.d/agentlinux via 20-sudoers.sh
   but run_purge was never updated. After --purge, the drop-in stayed —
   orphaned NOPASSWD grant after the agent user was removed.
   Fixed: rm -f /etc/sudoers.d/agentlinux as Step 3.5 of run_purge.

3. Per-agent uninstall.sh tripped AGENTLINUX_AGENT_HOME guard on --purge:
   run_purge invoked uninstall.sh recipes without setting
   AGENTLINUX_AGENT_HOME (runner.ts normally provides it). Cosmetic
   warning, purge continued, but noisy. Fixed: export the var when
   invoking the recipes.
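
A rough sketch of fixes 2 and 3 above, under stated assumptions: the function names and the `root` prefix argument are hypothetical (the prefix only exists so the sketch can run against a scratch directory), and the real run_purge may be structured differently.

```bash
#!/usr/bin/env bash
# Sketch of the two purge fixes; names and the root prefix are illustrative.

# Fix 2: remove the sudoers drop-in. rm -f also succeeds when the file is
# already gone, so a repeated purge stays quiet.
purge_sudoers_dropin() {
  local root="${1:-}"
  rm -f "${root}/etc/sudoers.d/agentlinux"
}

# Fix 3: hand AGENTLINUX_AGENT_HOME to the recipe's environment so its
# ${AGENTLINUX_AGENT_HOME:?} guard is satisfied during purge.
run_agent_uninstall() {
  local recipe="$1"
  AGENTLINUX_AGENT_HOME="${AGENTLINUX_AGENT_HOME:-/home/agent}" bash "$recipe"
}
```

The variable assignment prefix scopes the export to the child bash process, so the caller's own environment is left untouched.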

Bats coverage extended: 40-registry-cli.bats INST-04 @test now also
asserts /etc/sudoers.d/agentlinux is gone after --purge.

Verified: full local Tier 1 dogfood in fresh ubuntu:24.04 docker container
with the fixed install.sh. SHA256 verify ✓, provisioner ✓,
agentlinux install claude-code ✓, claude --version = 2.1.98 ✓,
claude update exits 0 with zero EACCES (2.1.98 → 2.1.126) ✓,
--purge leaves opt + agent user + /home/agent + sudoers gone, Node kept ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered by user dogfood: `agentlinux install gsd` reported success but
Claude Code showed zero /gsd-* commands. Root cause: get-shit-done-cc is a
BOOTSTRAPPER, not the GSD command set itself. npm install only puts the
bootstrapper binary on PATH; the user (or our recipe) must then run
`get-shit-done-cc --global --claude` to actually copy the GSD skill set
(~79 skills + hooks + statusline) into ~/.claude/skills/.

Our recipe was technically correct (npm install succeeded, binary on
PATH, banner matched pin) but the user-visible intent ("install GSD")
was not satisfied. AGT-04 only checked the banner; it never checked
that ~/.claude/skills/gsd-* was populated.

install.sh: add post-npm-install step `get-shit-done-cc --global --claude`.

uninstall.sh: add pre-npm-uninstall step
`get-shit-done-cc --global --claude --uninstall` for symmetric removal.
Also `hash -r` before the post-uninstall `command -v` check — bash hashes
the binary path during the bootstrapper invocation; without hash -r the
cached entry reports the now-deleted file as still-resolvable.
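
The `hash -r` step can be illustrated in isolation. This is a small sketch with a made-up tool name (`demo-bootstrapper`) in a temporary directory; it only shows that after `hash -r` a deleted binary no longer resolves, and does not reproduce the real recipe.

```bash
#!/usr/bin/env bash
# demo-bootstrapper and the temp directory are invented for this sketch.
demo_bin="$(mktemp -d)"
PATH="$demo_bin:$PATH"

printf '#!/bin/sh\necho bootstrapped\n' > "$demo_bin/demo-bootstrapper"
chmod +x "$demo_bin/demo-bootstrapper"

demo-bootstrapper >/dev/null   # running it lets the shell remember its path
rm -f "$demo_bin/demo-bootstrapper"

hash -r                        # forget every remembered command path
if command -v demo-bootstrapper >/dev/null 2>&1; then
  still_resolvable=yes
else
  still_resolvable=no
fi
echo "still resolvable after hash -r: $still_resolvable"
rmdir "$demo_bin"
```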

tests/bats/50-agents.bats AGT-04: new @test asserts
~/.claude/skills/gsd-* count >= 10 after `agentlinux install gsd`.
Closes the bats coverage gap that allowed the regression.
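
The shape of that assertion, written as plain shell rather than as the actual bats @test, might look like the following. The helper names are hypothetical; only the `gsd-*` pattern and the threshold of 10 come from the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real check lives in tests/bats/50-agents.bats.

# Count the gsd-* directories directly under the given skills directory.
count_gsd_skills() {
  local skills_dir="$1"
  find "$skills_dir" -mindepth 1 -maxdepth 1 -type d -name 'gsd-*' | wc -l | tr -d ' '
}

# Succeed only when at least min (default 10) skill directories exist.
assert_gsd_wired() {
  local skills_dir="$1" min="${2:-10}" n
  n="$(count_gsd_skills "$skills_dir")"
  [ "$n" -ge "$min" ]
}
```

A threshold well below the observed 79 skills leaves room for upstream changes in the skill set while still failing when nothing at all is wired.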

Verified end-to-end in fresh ubuntu:24.04 docker container:
  install → 79 gsd-* skills wired ✓
  remove  → 0 gsd-* skills, binary gone ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…i (agent CLI)

The user's intent for the "playwright" catalog entry was Microsoft's
@playwright/cli — the token-efficient command-line tool for coding agents
that ships a Claude Code skill — NOT the playwright test framework.

The previous recipe installed `playwright@1.59.1` (the test framework),
downloaded ~281 MB of chromium browser, and called `--with-deps` to
apt-install system libraries — none of which the user wanted. The user
wanted: an agent-friendly CLI that surfaces /playwright skills inside
Claude Code, the same shape as gsd.

Catalog: replace the `playwright` entry with `playwright-cli`:
  id              = playwright-cli
  npm_package_name = @playwright/cli
  pinned_version   = 0.1.11   (latest published, verified via npm registry)
  homepage         = https://playwright.dev/agent-cli/installation

Recipe (mirrors the gsd skill-bootstrapper pattern from the prior commit):
  install.sh
    1. npm install -g @playwright/cli@<pin>
    2. playwright-cli install --skills        # wires ~/.claude/skills/playwright-cli/
  uninstall.sh
    1. playwright-cli install --skills --uninstall (best-effort) +
       defensive `find ... -iname '*playwright*' -exec rm -rf` to handle
       version drift in the bootstrapper's --uninstall flag coverage
    2. npm uninstall -g @playwright/cli
    3. hash -r before the post-uninstall command -v check

Bats coverage updated:
  tests/bats/50-agents.bats AGT-05 — three @tests now cover playwright-cli
  instead of the test framework: --version matches pin, ~/.claude/skills/
  has a *playwright* entry (catches recipe regressing to npm-only), and
  CLI-03 idempotency holds on re-install.

Verified end-to-end in fresh ubuntu:24.04 docker container:
  agentlinux install playwright-cli → ~/.claude/skills/playwright-cli/   ✓
  playwright-cli --version          → 0.1.11                              ✓
  agentlinux remove playwright-cli  → skill gone, binary gone             ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Findings consolidated from bash-engineer + security-engineer + qa-engineer
+ catalog-auditor + behavior-coverage-auditor.

BLOCKERS (would have failed CI):
- tests/bats/50-agents.bats AGT-05: typo `playwright-cli-cli` → `playwright-cli`
  on the @test name (line 251) and the `agentlinux install` invocation
  inside the idempotency @test (line 286). The previous string would have
  hit "no such agent" and made the @test vacuously red.
- tests/bats/40-registry-cli.bats CAT-01: `grep -qw 'playwright'` cannot tell
  `playwright` from `playwright-cli`, because `-w` treats `-` as a non-word
  character and so still matches the `playwright` prefix. Switched to
  `grep -qF ' playwright-cli '` against a space-padded id stream so the
  whole-token check is preserved without the word-boundary trap.
- tests/bats/40-registry-cli.bats CAT-04: stale spot-checks for
  id=="playwright" + 1.59.1. Updated to id=="playwright-cli" + 0.1.11.
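
The space-padded whole-token check from the CAT-01 item can be sketched as below. `has_agent_id` is a hypothetical helper, not the test's actual code. GNU grep's `-w` counts only letters, digits, and underscore as word characters, which is why hyphenated ids are awkward for it and a fixed-string match on a padded stream is the safer choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the space-padded fixed-string match.
has_agent_id() {
  local wanted="$1"; shift
  # "$*" joins the remaining ids with single spaces. Padding both ends means
  # every id is surrounded by spaces, so " id " only matches a whole token.
  printf ' %s \n' "$*" | grep -qF -- " ${wanted} "
}

has_agent_id playwright-cli claude-code gsd playwright-cli && echo "playwright-cli: present"
has_agent_id playwright claude-code gsd playwright-cli || echo "playwright: absent"
```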

IMPORTANT (would have shipped quality regressions):
- tests/bats/50-agents.bats teardown_file: orphan `remove --force playwright`
  → `remove --force playwright-cli` so teardown actually cleans the new id.
- tests/bats/50-agents.bats AGT-01 mode loop: parity with the claude
  variant — added `grep -Eq '[0-9]+\.[0-9]+\.[0-9]+'` semver-shape check so
  an exit-0 with empty output (e.g. an upstream regression in --version
  under non-TTY stdin) would not silently pass.
- tests/bats/50-agents.bats AGT-04/AGT-05 stale-state false-positive: added
  defensive `rm -rf ~/.claude/skills/{gsd-*,*playwright*}` in setup_file
  BEFORE the per-agent installs so the skill-wired assertions fail loud
  when the recipe regresses to "npm install only" — the regression those
  @tests are supposed to catch.
- plugin/catalog/agents/gsd/uninstall.sh: bootstrapper `--uninstall` is
  best-effort (older versions don't ship it). Added defensive
  `find ~/.claude/skills -maxdepth 1 -type d -name 'gsd-*' -exec rm -rf {} +`
  after the bootstrapper call so 79+ skill dirs don't leak when --uninstall
  is broken/absent. Mirrors playwright-cli's pattern.
- plugin/catalog/agents/{gsd,playwright-cli}/install.sh: bootstrapper
  invocations were unguarded under `set -e`. A re-install on `--force` or
  a partial-state recovery would abort the recipe even though npm install
  + version check already succeeded. Wrap each with `|| echo ... >&2` and
  rely on the post-bootstrapper skill-presence assertion as the real truth
  check — closes idempotency hole.
- plugin/catalog/agents/playwright-cli/install.sh: tighten skill-presence
  match from broad `-iname '*playwright*'` to `-name 'playwright-cli*'`
  (-maxdepth 1, -type d). Matches the install side and avoids accidentally
  asserting against unrelated user skills.
- plugin/catalog/agents/playwright-cli/uninstall.sh: tighten the skill
  cleanup find from `-iname '*playwright*'` (could collateral-damage a
  hand-rolled `~/.claude/skills/playwright-notes/`) to `-name
  'playwright-cli*'`. Drop `2>/dev/null` so legit rm errors surface in the
  installer transcript.
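
The guarded-bootstrapper pattern from the install.sh items above, as a minimal sketch. `wire_skills` and its arguments are hypothetical; the `find` predicates (`-maxdepth 1 -type d -name 'playwright-cli*'`) follow the commit message, with `-mindepth 1` added here so the skills directory itself is never counted.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a bootstrapper, tolerate its failure, then let
# the skill-presence check decide whether the install really worked.
wire_skills() {
  local skills_dir="$1"; shift
  # "$@" is the bootstrapper command. A non-zero exit is reported but does
  # not abort the recipe, even under `set -e`.
  "$@" || echo "bootstrapper exited non-zero; checking skills anyway" >&2

  # Truth check: at least one playwright-cli* directory must exist.
  [ -n "$(find "$skills_dir" -mindepth 1 -maxdepth 1 -type d -name 'playwright-cli*' | head -n 1)" ]
}
```

The narrow `-name` pattern means a directory such as playwright-notes neither satisfies the check nor would be touched by a cleanup using the same pattern.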

NITS:
- tests/bats/50-agents.bats AGT-04/05 __fail messages: dropped the leading
  `agent` token (copy-paste artifact); replaced `~/.claude/...` (would
  shellcheck-warn SC2088 inside single quotes) with literal
  `/home/agent/.claude/...`.
- CLI-02 stale comment: "three real agents (claude-code, gsd, playwright)"
  → `playwright-cli`. Diagnostic strings updated.

Verified end-to-end in fresh ubuntu:24.04 docker container:
  install gsd → 79 gsd-* skills wired                                   ✓
  install playwright-cli → 1 playwright-cli* skill dir wired            ✓
  re-install playwright-cli → "already installed; no-op"                ✓
  remove gsd → 0 gsd-* skills, binary gone                              ✓
  remove playwright-cli → 0 playwright-cli* dirs, binary gone           ✓

DEFERRED (flagged in PR comments, not blocking):
- env-via-`env` hardening at plugin/bin/agentlinux-install:250
  (pre-existing pattern; matters only if a future stricter sudoers ships).
- REQUIREMENTS.md AGT-04 / AGT-05 spec text drift vs new test contract
  (architectural — cleaner as a separate spec-update PR, possibly
  introducing a cross-cutting AGT-06 for "agent install wires its skill
  set into Claude Code" so per-agent IDs don't duplicate the contract).
- Tag re-validation symmetry in packaging/curl-installer/install.sh:111-113
  (pre-existing on master; not introduced by this PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #7 review pass missed a runtime issue that the bats CI surfaced:
`playwright-cli install --skills` calls initWorkspace() which mkdirs
./.playwright in the CURRENT directory. AgentLinux dispatches recipes
from /opt/agentlinux-src/ — a read-only repo bind-mount in the Docker
harness — so the bootstrapper crashes with EACCES on .playwright before
it can write any skill into ~/.claude/skills/.

Local dogfood missed this because the local container had been
manipulated (catalog overlays, etc.) leaving CWD writable. The CI test
file (50-agents.bats) runs the recipe AFTER 40-registry-cli.bats's
INST-04 --purge, the recovery installer kicks in, and the recipe
dispatches with CWD anchored to /opt/agentlinux-src/.

Fix: wrap the bootstrapper invocation in a subshell with
`cd "${AGENTLINUX_AGENT_HOME}"` so .playwright lands at
/home/agent/.playwright (agent-owned, writable, cleaned by `userdel -r`
on --purge).
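
A minimal sketch of the subshell-with-cd technique, assuming a hypothetical `run_in_dir` helper (the real recipe inlines the subshell around the bootstrapper call):

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with a different working directory
# without changing the caller's own directory.
run_in_dir() {
  local dir="$1"; shift
  # The parentheses start a subshell, so the cd never leaks to the caller.
  ( cd "$dir" && "$@" )
}

scratch="$(mktemp -d)"
run_in_dir "$scratch" mkdir .playwright   # lands in $scratch, not in $PWD
```

In the recipe the directory would be "${AGENTLINUX_AGENT_HOME}", so anything the bootstrapper creates relative to its working directory ends up under the agent's home.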

Verified by reproducing the CI scenario locally:
  install (FRESH) before fix → EACCES, no skill wired           ✗
  install (FRESH) after fix  → 100% download, skill wired        ✓
  install (POST-PURGE) after recovery + fix → skill wired         ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Roo4L force-pushed the fix/curl-installer-org-default-and-purge-sudoers branch from a3eae67 to 7402fce on May 2, 2026 13:46
Roo4L merged commit 3ebbab0 into master on May 2, 2026
10 of 11 checks passed
Roo4L deleted the fix/curl-installer-org-default-and-purge-sudoers branch on May 2, 2026 13:59
Roo4L added a commit that referenced this pull request May 2, 2026
…ler allowlist

Two blockers surfaced by the first 26.04 PR cycle (the third — Playwright's
chromium-install rejecting ubuntu26.04 — is moot after master's #7 dogfood
fix swapped the catalog from `playwright` (full + chromium) to `playwright-cli`
(@playwright/cli, JS-only, no per-OS browser recipe)).

1. INST-02 / BHV-07 byte-stable re-run failed with
   "install: No such file or directory" only on the second installer pass.
   Diagnosed via strace: Ubuntu 26.04 ships uutils-coreutils 0.7.0 (Rust
   rewrite). Its `install` recursively readlink-chases /dev/stdin →
   /proc/self/fd/0 → "pipe:[NNN]" and ENOENTs whenever the destination
   already exists. The first run creates the file (succeeds); the
   idempotent re-run tries to overwrite the existing file (fails). GNU
   coreutils opens fd 0 directly and never hits this path.

   Fix: add a portable `write_file_atomic <mode> <dest>` helper to
   plugin/lib/idempotency.sh — same atomic-rename semantics as
   `install -m <mode> /dev/stdin <dest>` but via a same-directory tmpfile
   so it works on both GNU and uutils. Function-scoped RETURN trap mirrors
   ensure_marker_block's tmpfile cleanup pattern. Three call sites in
   plugin/provisioner/40-path-wiring.sh (profile.d, agentlinux.env,
   cron.d) migrated. The /dev/null source path is unaffected and stays.

2. INST-03 curl-installer fixture rejected 26.04 because
   packaging/curl-installer/install.sh has its own detect_ubuntu_version
   allowlist that the AGE-11 patch missed. Extended to 22.04|24.04|26.04
   in lockstep with plugin/lib/distro_detect.sh; matching error message
   updated. Comment makes the lockstep invariant explicit.
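
For illustration, a helper with the `write_file_atomic <mode> <dest>` signature could look roughly like this. It is a sketch of the same-directory tmpfile technique, not the code in plugin/lib/idempotency.sh; in particular it uses explicit cleanup on failure instead of the RETURN trap mentioned above.

```bash
#!/usr/bin/env bash
# Sketch only: reads the new content from stdin and replaces dest atomically.
write_file_atomic() {
  local mode="$1" dest="$2" tmp
  # Create the tmpfile next to dest so the final mv is a rename within one
  # filesystem rather than a copy.
  tmp="$(mktemp "$(dirname -- "$dest")/.tmp.XXXXXX")" || return 1
  # Copy stdin into the tmpfile, set the mode, then rename over dest.
  if cat > "$tmp" && chmod "$mode" "$tmp" && mv -f -- "$tmp" "$dest"; then
    return 0
  fi
  rm -f -- "$tmp"   # do not leave a stray tmpfile behind on failure
  return 1
}
```

Because stdin is read with plain `cat`, the helper never has to resolve /dev/stdin as a path, so it behaves the same whether or not dest already exists.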

Verified locally on the rebased branch: tests/docker/run.sh
ubuntu-{22.04,24.04,26.04} all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Roo4L added a commit that referenced this pull request May 2, 2026
… targets (#5)

* feat(matrix): add Ubuntu 26.04 (Resolute Raccoon) to v0.3.0 supported targets

AGE-11. Wires 26.04 LTS (released 2026-04-23, codename `resolute`) into
the v0.3.0 plugin matrix end-to-end:

  - tests/docker/Dockerfile.ubuntu-26.04 (mirrors 24.04 sibling)
  - tests/docker/run.sh accepts ubuntu-26.04
  - tests/qemu/cloud-images.txt + boot.sh codename map (resolute)
  - plugin/lib/distro_detect.sh + agentlinux-install --help
  - CI matrices: test.yml + nightly-qemu.yml + release.yml gates
  - README, PROJECT, REQUIREMENTS, CLAUDE.md, HARNESS.md copy refresh

Empirical "installer green on 26.04" verification deferred to the next
test.yml PR run + first nightly-qemu cycle.

* fix(26.04): unblock CI on Ubuntu 26.04 — uutils install + curl-installer allowlist

Two blockers surfaced by the first 26.04 PR cycle (the third — Playwright's
chromium-install rejecting ubuntu26.04 — is moot after master's #7 dogfood
fix swapped the catalog from `playwright` (full + chromium) to `playwright-cli`
(@playwright/cli, JS-only, no per-OS browser recipe)).

1. INST-02 / BHV-07 byte-stable re-run failed with
   "install: No such file or directory" only on the second installer pass.
   Diagnosed via strace: Ubuntu 26.04 ships uutils-coreutils 0.7.0 (Rust
   rewrite). Its `install` recursively readlink-chases /dev/stdin →
   /proc/self/fd/0 → "pipe:[NNN]" and ENOENTs whenever the destination
   already exists. The first run creates the file (succeeds); the
   idempotent re-run tries to overwrite the existing file (fails). GNU
   coreutils opens fd 0 directly and never hits this path.

   Fix: add a portable `write_file_atomic <mode> <dest>` helper to
   plugin/lib/idempotency.sh — same atomic-rename semantics as
   `install -m <mode> /dev/stdin <dest>` but via a same-directory tmpfile
   so it works on both GNU and uutils. Function-scoped RETURN trap mirrors
   ensure_marker_block's tmpfile cleanup pattern. Three call sites in
   plugin/provisioner/40-path-wiring.sh (profile.d, agentlinux.env,
   cron.d) migrated. The /dev/null source path is unaffected and stays.

2. INST-03 curl-installer fixture rejected 26.04 because
   packaging/curl-installer/install.sh has its own detect_ubuntu_version
   allowlist that the AGE-11 patch missed. Extended to 22.04|24.04|26.04
   in lockstep with plugin/lib/distro_detect.sh; matching error message
   updated. Comment makes the lockstep invariant explicit.
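
An allowlist of this kind is typically a `case` pattern. The sketch below uses a hypothetical `is_supported_ubuntu` function; the real detect_ubuntu_version in install.sh and the matching logic in plugin/lib/distro_detect.sh may be shaped differently, and only the three version strings come from this commit.

```bash
#!/usr/bin/env bash
# Hypothetical allowlist check. If two scripts carry copies of this pattern,
# they have to be edited together to stay in lockstep.
is_supported_ubuntu() {
  case "$1" in
    22.04|24.04|26.04) return 0 ;;
    *) return 1 ;;
  esac
}

for v in 22.04 24.04 26.04 20.04; do
  if is_supported_ubuntu "$v"; then
    echo "$v: supported"
  else
    echo "$v: not supported"
  fi
done
```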

Verified locally on the rebased branch: tests/docker/run.sh
ubuntu-{22.04,24.04,26.04} all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Roo4L added a commit that referenced this pull request May 3, 2026
Patch on top of v0.3.1 carrying the master-merged follow-ups since the
first dogfood failure (AL-18):

- PR #7  — three dogfood-discovered installer-path bugs
            (curl-installer ORG default, --purge sudoers cleanup,
             GSD + Playwright CLI skill bootstrap wiring,
             AGENTLINUX_AGENT_HOME export during purge,
             playwright-cli cd to writable home).
- PR #5  — Ubuntu 26.04 (Resolute Raccoon) added to v0.3.0 supported targets.
- PR #11 — bump GitHub Actions to Node 24-ready versions.
- PR #13 — review-reminder Stop hook + ADR-010 refinement (AL-23).
- PR #14 — workspace-cleanup skill.
- PR #4 / #9 / #10 — CI / website-deploy fixes.

scripts/build-release.sh enforces a three-way version lock — the tag's
base version (after stripping any -rc suffix) must equal both
plugin/cli/package.json.version and plugin/catalog/catalog.json.version.
Bumping both files to 0.3.2 so v0.3.2-rc1 (and eventually v0.3.2 final)
clear the gate.
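
The three-way lock can be sketched as a pure string comparison. The function names are hypothetical and the two file versions are passed in as arguments, since how scripts/build-release.sh reads package.json and catalog.json is not shown here. Stripping a leading `v` from the tag is an assumption based on tag names like v0.3.2-rc1.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the version gate; not the code in build-release.sh.

# v0.3.2-rc1 -> 0.3.2 : drop a leading "v", then "-rc" and everything after.
tag_base_version() {
  local v="${1#v}"
  printf '%s\n' "${v%%-rc*}"
}

# Succeed only when the tag's base version equals both file versions.
check_version_lock() {
  local base
  base="$(tag_base_version "$1")"
  [ "$base" = "$2" ] && [ "$base" = "$3" ]
}

check_version_lock v0.3.2-rc1 0.3.2 0.3.2 && echo "gate clear"
```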

Refs: AL-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>