fleetnode: run with heartbeat + plugin scaffolding (stack 2/3)#323
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2226f776c0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
🔐 Codex Security Review
Review SummaryOverall Risk: MEDIUM Findings[MEDIUM] Optional Plugin Bootstrap Can Take Down Heartbeat-Only Daemon
NotesNo changed hunk showed SQL interpolation, auth bypass, pool/wallet rewriting, hardcoded payout addresses, protobuf wire-format changes, or new frontend/infrastructure exposure. I attempted a targeted Generated by Codex Security Review | |
2226f77 to
3694b1f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3694b1f5d0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
- fleetnode run daemon: state lock acquisition, session-refresh-on-tick, 30s heartbeat loop. SIGINT/SIGTERM/SIGHUP shutdown. Logs to stdout for IDE-terminal visibility. - Plugin scaffolding lives on disk: resolvePluginsDir validates <exe-dir>/plugins ownership + permissions, validatePluginFiles checks each entry (rejects symlinks + group/world-writable + non-owner files). - Orphan reaper sweeps stray plugin children from prior crashes before the new agent spawns its own. - Plugin manager wrapper (pluginDiscoverer / newPluginDiscoverer) loads subprocess plugins via hashicorp/go-plugin. No control loop yet, so the manager just bootstraps and idles; reportFromDiscovered + synthesizeIdentifier ride along for the discovery layer. - stubGatewayClient in run_test.go satisfies the full gateway interface (UploadHeartbeat / ReportDiscoveredDevices / ControlStream) so PR 3 can wire the control loop without touching the test fixture. - README grows Plugins, Run, Troubleshooting sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- newPluginDiscoverer now calls manager.Shutdown with a bounded timeout on LoadPlugins failure. Aggregate-error returns can still leave earlier plugins started, and the previous no-op cleanup leaked those subprocesses when the agent exited. - reapOrphanedPlugins now consults ppid. Two agents sharing the binary and plugins layout but holding different state locks no longer kill each other's live plugin children. Reap fires only when the parent is init (ppid == 1) or no longer present in the ps snapshot. - Add orphan_reaper_test.go covering the live-parent skip, the self-pid skip, and the dead-parent reap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3694b1f to
dc50f50
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc50f5019b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
- Reject symlinked plugins dir at resolve time so the loader and orphan reaper agree on the canonical path. - Wire signal.NotifyContext above plugin loading so SIGTERM during the up-to-60s LoadPlugins window aborts cleanly. Plumb the ctx through reapOrphanedPlugins and newPluginDiscoverer. - Bound the cleanup-path manager.Shutdown to 10s instead of context.Background(), matching the partial-load error path. - Add 5s timeout on the orphan-reaper ps invocation; a hung ps no longer blocks startup while the state lock is held. - Tighten the reaper match to direct children of pluginsAbs so an operator process invoked from a subpath isn't reaped. - Snapshot state under stateMu before SaveState so the marshal doesn't race the tokenSource goroutine the control loop adds. - Rename run() parameter stderr -> logOutput to match the actual destination (os.Stdout). - Cover empty/whitespace ps output, kill-failure-continues, subdirectory-process-skip, symlinked plugins dir, and isolated group-/world-writable bits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5c0ff64ca
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR (stack 2/3) enables fleetnode run to operate as a long-running heartbeat-only daemon and introduces runtime scaffolding for the upcoming discovery control loop: plugin directory validation, orphaned plugin process reaping, and a plugin-backed discoverer wrapper.
Changes:
- Implement
fleetnode runheartbeat loop with session refresh and signal-based shutdown. - Add plugin directory resolution + safety validation (ownership/permissions, reject symlinks) and plugin discovery scaffolding.
- Add orphan plugin subprocess reaper and expand README + tests around the new behaviors.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| server/cmd/fleetnode/run.go | Heartbeat daemon loop, state-lock usage, plugin bootstrap wiring, signal handling |
| server/cmd/fleetnode/run_test.go | Stub gateway extended to satisfy upcoming gateway interface needs |
| server/cmd/fleetnode/README.md | Document plugins, run behavior, security constraints, and troubleshooting |
| server/cmd/fleetnode/plugins_dir.go | Resolve <exe-dir>/plugins and enforce safety checks before use |
| server/cmd/fleetnode/plugins_dir_unix.go | Unix ownership/permission validation for plugin dir and plugin binaries |
| server/cmd/fleetnode/plugins_dir_windows.go | Windows plugin dir existence checks (ACL validation deferred) |
| server/cmd/fleetnode/plugins_dir_test.go | Unit tests for plugin dir resolution and per-file validation (non-Windows) |
| server/cmd/fleetnode/orphan_reaper.go | Non-Windows orphan plugin process sweep using ps output + SIGKILL |
| server/cmd/fleetnode/orphan_reaper_windows.go | Windows no-op orphan reaper placeholder |
| server/cmd/fleetnode/orphan_reaper_test.go | Unit tests for orphan reaper selection logic (non-Windows) |
| server/cmd/fleetnode/control.go | Plugin discoverer wrapper + report/identifier synthesis utilities |
| server/cmd/fleetnode/control_test.go | Unit tests for report synthesis + identifier behavior |
- orphan_reaper: drop filepath.EvalSymlinks so reaper prefix matches the unresolved path the loader passes to exec.Command (resolvePluginsDir already rejects symlinked leaf; path components above can still be symlinks and were producing a loader-vs-reaper path mismatch). - orphan_reaper: re-check strings.HasPrefix(argv0, prefix) after truncating at the first space so plugin install paths containing spaces (e.g. "/opt/Proto Fleet/plugins/x") don't slice into the prefix and panic with a slice-bounds error. - orphan_reaper_windows: match the Unix signature (ctx context.Context, string, *slog.Logger) so the shared run.go caller compiles on Windows. - signals_unix / signals_windows: extract defaultSignals() to a platform-specific helper so syscall.SIGHUP (Unix-only) no longer leaks into the shared run.go file. - run.go: log "using injected discoverer" when r.discoverer is non-nil and resolvedPluginsDir is empty (test-injection path), instead of the misleading "heartbeat only" message. - plugins_dir_windows: refuse plugin loading when a plugins dir is present, until proper Windows ACL validation lands. Heartbeat-only mode (no plugins dir) remains the default; this only closes the silent-no-validation case Codex Security Review flagged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82c698016b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The previous fix re-checked HasPrefix(argv0, prefix) after truncating on the first space, which prevented the slice-bounds panic but also caused the reaper to silently skip every orphan when the install path itself contains whitespace. ps space-joins argv without quoting, so any argv0 parsing from the reconstructed command is ambiguous when the prefix contains spaces. Switch to matching against the actual installed-plugin paths from os.ReadDir(pluginsDir): e.command must equal an allowed path, or start with an allowed path followed by a space (its first argument). This sidesteps ps's argv-joining ambiguity entirely and reaps correctly on paths like "/opt/Proto Fleet/plugins/...". Test TestReapOrphans_PluginsPathWithSpaces now asserts the orphans ARE reaped (previously asserted they were skipped without panic, which is the regression Codex flagged in discussion_r3314133920). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 465bddca5c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
- plugins_dir_unix: apply ownership and writability checks to every regular file in the plugins dir, not just executables. A non-exec file owned by another user can be chmod +x'd between validation and plugin load, becoming a stealth RCE vector under the agent uid. - plugins_dir_unix: add validatePathChain to walk every ancestor of the plugins dir and require root/agent ownership + non-writable perms (with a sticky-bit exception for /tmp-style dirs). Walks both the original path and the symlink-resolved form so a swappable symlink in the chain is caught via its containing-dir's perms and the underlying target tree is caught via the resolved walk. - plugins_dir_windows: matching no-op stub so resolvePluginsDir's shared call sequence type-checks. - orphan_reaper: hard-code /bin/ps instead of letting exec resolve via PATH. A hostile $PATH could inject a shim that runs as the daemon during startup; /bin/ps is canonical on Linux and macOS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6565a6124
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Removed verbose docblocks and what-comments; kept the non-obvious why on ppid filtering, os.ReadDir matching, /bin/ps hard-coding, the non-exec ownership check, validatePathChain's two-walk strategy, the sticky-bit exception, and the cleanup ctx separation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in: - #325 RBAC swap: List/GetActive/Update curtailment RPCs now gated via RequirePermission; new entries in ProcedurePermissions. - #329 UpdateCurtailmentEvent client wiring + edit-modal UI. - #321 Curtailment start/restore action UI. - #322 / #323 fleetnode enrollment + heartbeat stack. Conflict resolution: server/internal/handlers/middleware/rpc_permissions.go. Main added CurtailmentService reads + Update + DeviceCollection + DeviceSet + ErrorQuery + FleetManagement entries; the branch added IngestCurtailmentSignal. Took the union — folded the Ingest entry into the CurtailmentService block alongside Update/reads/AdminTerminate. handler.go auto-merged cleanly; build + targeted tests pass.
Summary
PR 2 of 3. Stacked on #322. Lights up `fleetnode run` as a heartbeat-only daemon and carries the agent's runtime scaffolding (plugin-dir validator, orphan reaper, plugin manager wrapper) that PR 3's control loop consumes.
Stack
Test plan
🤖 Generated with Claude Code