feat: dynamic numtide catalog (88+ agents), CI smoke tests, catalog defensive fixes #2
Merged
HashWarlock merged 140 commits into master from … on Apr 10, 2026
Complete design specification for the Pi + nixosandbox refactor covering:
- Repository structure (two-package flat layout)
- NDJSON protocol contract with truthfulness invariants
- Runtime client and crash synthesis
- Session manager, profiles, runtime bases, and reconciler
- Pi extension tools (sandbox_run, read/write/list files, session info)
- Rust runtime stub and 6 canonical protocol tests
- Migration phases and definition of done
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements Tasks 9-12: full plan validation with error/warning codes, network observer stub with would-have-blocked computation, process supervisor with atomic sequence numbers and cancel-channel SIGTERM support, and a wired main.rs pipeline (parse -> validate -> supervise -> result).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Tasks 13-18: version mismatch rejection, validation failure (RW_TARGET_NOT_ALLOWED), successful echo run with sequence checking, cancel flow observing the cancel_requested lifecycle, TS-only crash synthesis unit tests (both with and without validation state), and degraded allowlist mode verification (ALLOWLIST_NOT_ENFORCED warning, effective mode=full).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… states) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… extension tools (Phase 6-7)
Implements Tasks 19-24: hardcoded execution profiles, host-derived mount resolution with sha256 fingerprinting, UUID-based session directory management, orphan-PID reconciliation, 5 sandbox tool definitions, and the Pi extension entry point with session_start/session_shutdown lifecycle hooks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bservation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure function that converts a PlanPayload + EffectiveState into a Vec<String> of bwrap arguments, covering mounts, devices, proc, namespaces, env, cwd, and command. Includes 13 unit tests covering all argument categories.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
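As an illustration of the argument builder described above, here is a minimal sketch in Python rather than the PR's Rust. The `Plan` and `Mount` types and their field names are hypothetical stand-ins for PlanPayload + EffectiveState, and the argument ordering is an assumption; only the bwrap flags themselves (--ro-bind, --bind, --dev, --proc, --unshare-*, --setenv, --chdir) are real bubblewrap options.

```python
from dataclasses import dataclass, field


@dataclass
class Mount:
    source: str
    target: str
    writable: bool = False


@dataclass
class Plan:
    # Hypothetical stand-in for PlanPayload + EffectiveState.
    mounts: list = field(default_factory=list)
    unshare: list = field(default_factory=list)  # e.g. ["net", "pid"]
    env: dict = field(default_factory=dict)
    cwd: str = "/"
    command: list = field(default_factory=list)


def build_bwrap_args(plan):
    """Pure function: the same plan always yields the same argument list."""
    args = []
    # Mounts: writable targets use --bind, everything else --ro-bind.
    for m in plan.mounts:
        args += ["--bind" if m.writable else "--ro-bind", m.source, m.target]
    # Devices and proc.
    args += ["--dev", "/dev", "--proc", "/proc"]
    # Namespaces, e.g. "net" becomes --unshare-net.
    for ns in plan.unshare:
        args.append(f"--unshare-{ns}")
    # Environment, sorted so the output is deterministic.
    for key in sorted(plan.env):
        args += ["--setenv", key, plan.env[key]]
    # Working directory, then the command to run.
    args += ["--chdir", plan.cwd]
    args += plan.command
    return args
```

Keeping the builder free of I/O is what makes it straightforward to cover each argument category with a unit test, as the commit does.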
…D warnings
Validator now accepts BwrapAvailability to determine which namespaces can be applied, emitting NAMESPACE_DEGRADED warnings when bwrap is unavailable. Also adds env_allowlist filtering and namespacesApplied/envApplied fields to EffectiveState. main.rs is wired to detect bwrap and pass it to the validator.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
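The validator behaviour described above can be sketched roughly as follows. This is a Python illustration, not the PR's Rust code: the function name, the dict-shaped return value, and the choice to emit a single warning (rather than one per namespace) are all assumptions. Only the NAMESPACE_DEGRADED code and the namespacesApplied/envApplied field names come from the commit message.

```python
def compute_effective_state(requested_namespaces, env, env_allowlist, bwrap_available):
    """Decide which namespaces and env vars actually apply (hypothetical shape)."""
    warnings = []
    if bwrap_available:
        namespaces_applied = list(requested_namespaces)
    else:
        # Without bwrap no namespace can be applied, so report the
        # degradation instead of silently pretending isolation exists.
        namespaces_applied = []
        if requested_namespaces:
            warnings.append("NAMESPACE_DEGRADED")
    # Only variables named in the allowlist reach the sandboxed process.
    env_applied = {k: v for k, v in env.items() if k in env_allowlist}
    return {
        "namespacesApplied": namespaces_applied,
        "envApplied": env_applied,
        "warnings": warnings,
    }
```

Reporting what was applied, rather than what was requested, keeps the result honest on platforms where bwrap is missing.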
…back
Supervisor now accepts BwrapAvailability and dispatches to bwrap argv construction via plan_builder when available, or falls back to direct Command execution on platforms where bwrap is unavailable (e.g., macOS).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rser Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…wlist) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add @types/node to devDependencies (fixes CI typecheck failure)
- Add matrix-based agent smoke tests: create a sandbox with each agent (claude-code, codex, opencode, amp, droid, pi) and verify the binary launches
- Tests run in parallel via strategy.matrix with fail-fast: false
- Use NIXOSANDBOX_BWRAP_PATH to point to Nix-built bwrap 0.11.0 (system bwrap from apt was found first via PATH, lacking --pivot-root)
- Update actions/checkout v4→v6, actions/cache v4→v5, actions/setup-node v4→v6 to eliminate Node.js 20 deprecation warnings
…rrectly
Three bugs fixed:
- bwrap spawn ignored the path from detect(), always using bare "bwrap", which fails when NIXOSANDBOX_BWRAP_PATH points outside PATH
- cmd_exec called load_profile() on custom:* sessions, printing a spurious warning since those aren't real profile files
- docker ensure_image() leaked stderr to the terminal during tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
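The first bug above comes down to detecting one path and spawning another. A minimal Python sketch of the corrected flow, assuming the environment override takes precedence over a PATH lookup (the function names and the exact precedence are illustrative, not taken from the Rust source):

```python
import os
import shutil


def detect_bwrap(environ=None):
    """Return the bwrap path to use, or None if bwrap is unavailable."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: an explicit override wins over whatever is on PATH.
    override = environ.get("NIXOSANDBOX_BWRAP_PATH")
    if override:
        return override
    return shutil.which("bwrap")


def spawn_argv(bwrap_path, bwrap_args, command):
    """Build the argv to spawn, using the detected path rather than bare "bwrap"."""
    return [bwrap_path, *bwrap_args, *command]
```

Threading the detected path through to the spawn site means the binary that was probed is the binary that runs.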
--pivot-root is not a bubblewrap CLI option (it's the syscall bwrap uses internally). The correct approach is --ro-bind <rootfs> /, which tells bwrap to mount the rootfs read-only at / and internally perform the pivot_root.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lags
- Switch CI from Nix-built bwrap (non-setuid, blocked by AppArmor on Ubuntu 24.04) to apt-installed bwrap (setuid, handles namespaces)
- Add inline verification that bwrap can actually create sandboxes
- Add AppArmor sysctl fallback for containers/bubblewrap#632
- Add --die-with-parent and --new-session per bwrap docs for lifecycle safety
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Nix rootfs contains absolute symlinks pointing into /nix/store. Without binding /nix/store into the sandbox, all binaries and libraries are dangling symlinks, causing "execvp: No such file or directory" for every command.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bwrap needs /nix/store to exist as a directory inside the read-only rootfs so it can bind-mount the host Nix store there. Without it, bwrap fails with "Can't mkdir parents for /nix/store: Read-only file system" since it can't create directories on a ro-bind mount.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
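The two /nix/store fixes fit together: the mount point has to exist in the rootfs before bwrap starts, and the host store then has to be bound onto it. A rough Python sketch, assuming the store is bound read-only and that the rootfs directory is still writable at preparation time (both helper names are hypothetical):

```python
import os


def prepare_rootfs(rootfs):
    """Create the /nix/store mount point inside the rootfs ahead of time.

    bwrap cannot create it later because the rootfs is mounted read-only.
    """
    os.makedirs(os.path.join(rootfs, "nix", "store"), exist_ok=True)


def rootfs_bind_args(rootfs, host_store="/nix/store"):
    """bwrap arguments: rootfs read-only at /, host Nix store bound on top."""
    return [
        "--ro-bind", rootfs, "/",
        # Assumption: the store is bound read-only; the source does not say.
        "--ro-bind", host_store, "/nix/store",
    ]
```

Order matters here: the rootfs bind must come first so the store bind lands on a path that already exists inside it.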
The old README described a REST API server that no longer exists. Rewritten to document the current architecture: Rust CLI, Nix catalog with 25+ agents from llm-agents.nix, bubblewrap isolation, session management, built-in profiles, and custom composition via --with.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build/test commands, architecture overview, key design decisions, module responsibilities, and CI-specific notes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Guard all tool execute() args destructuring with (args ?? {}) to
handle Pi passing undefined for tools called with no arguments
- Patch TypeBox schemas to always include required[] array, which Pi
expects but TypeBox omits for all-optional parameter schemas
- Add Pi extension setup instructions to README
- Add .pi/ to gitignore (local extension wrappers are user-specific)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…llm-agents-pkgs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…LAUDE.md module descriptions
…l string
d[\"agents\"] inside a single-quoted shell heredoc passes literal backslashes to Python, causing: SyntaxError: unexpected character after line continuation character. Extract to a variable (agents = d["agents"]) instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
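The failure can be reproduced without a shell by handing Python the same characters the heredoc delivers. The snippet below is a small sketch: the two source strings are illustrative reconstructions, not the actual CI script, and `compiles` is a helper introduced here for the demonstration.

```python
# What Python receives when the heredoc is single-quoted: the backslashes
# are not consumed by the shell, so the source contains d[\"agents\"].
broken_source = 'print(len(d[\\"agents\\"]))'

# The fix from the commit message: bind the value to a variable first.
fixed_source = 'agents = d["agents"]\nprint(len(agents))'


def compiles(source):
    """Return True if the source parses, False on a SyntaxError."""
    try:
        compile(source, "<heredoc>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("broken parses:", compiles(broken_source))
    print("fixed parses:", compiles(fixed_source))
```

A backslash outside a string literal is a line-continuation character in Python, so any character other than a newline after it is rejected at parse time.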
Summary
- … nix/catalog.nix with a full llm-agents.nix passthrough: 88+ agents now exposed automatically, zero maintenance on upstream bumps
- … openclaw, hermes-agent (binary: hermes), and jules to the agent matrix; added catalog count assertion (>50) with presence check for new agents
- … query_catalog in nix.rs now filters non-derivation attrs via filterDrvs before accessing .meta.description; catalog.nix emits a builtins.trace warning when llm-agents-pkgs is unexpectedly empty
Test Plan
- … cargo test)
- nix eval .#catalog.agents returns 88+ package names
- nixosandbox catalog --json shows 88 agents, 24 tools
🤖 Generated with Claude Code
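The catalog count assertion and presence check mentioned in the summary might look roughly like the following Python sketch. The JSON shape is an assumption: it treats the output of nixosandbox catalog --json as an object whose "agents" entry is either a list of names or a mapping keyed by name. The function name and message wording are hypothetical.

```python
import json


def check_catalog(catalog_json, min_agents=50,
                  required=("openclaw", "hermes-agent", "jules")):
    """Return a list of problems found in the catalog (empty means OK)."""
    d = json.loads(catalog_json)
    agents = d["agents"]
    # Assumption: iterating "agents" yields agent names (list or dict keys).
    names = set(agents)
    problems = []
    if len(names) <= min_agents:
        problems.append(f"expected more than {min_agents} agents, got {len(names)}")
    missing = sorted(set(required) - names)
    if missing:
        problems.append("missing agents: " + ", ".join(missing))
    return problems
```

Checking a lower bound rather than an exact count keeps the assertion stable when upstream adds agents.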