
feat: add fluid connect command and TUI connect wizard #76

Merged
aspectrr merged 20 commits into main from feat/connect-command on Mar 15, 2026


@aspectrr
Owner

Summary

  • Adds fluid connect <address> CLI subcommand to connect to a fluid daemon and save config
  • Adds TUI connect wizard (/connect slash command) with SSH key auth, doctor checks, and step-by-step progress display
  • Fixes lint issues: replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf throughout connect.go

Test plan

  • Run fluid connect <daemon-address> and verify config is saved
  • In TUI, run /connect and step through the wizard
  • Verify doctor check results render correctly (pass/fail/fix commands)
  • Run make lint in fluid-cli and confirm no staticcheck violations

🤖 Generated with Claude Code

aspectrr and others added 7 commits March 13, 2026 20:24
- Add `fluid connect <address>` CLI command: tests gRPC connection, runs doctor checks, and saves daemon to config
- Add TUI /connect wizard with multi-step flow (address input, connecting, doctor checks, done)
- Add Cancel() to AgentRunner interface and ESC key support to abort running agent
- Add SetSandboxService() to hot-swap daemon connection after /connect
- Extend readonly prepare/validate/shell in both cli and daemon
- Update web docs: daemon-setup-steps, quickstart-steps, cli-reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ect.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in connect.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 14, 2026 00:43
@claude

claude Bot commented Mar 14, 2026

PR Review: feat: add fluid connect command and TUI connect wizard

Good overall PR - the feature is well-scoped and the tests are updated correctly. A few issues worth addressing before merge:


🔴 Security: --insecure defaults to true

fluid-cli/cmd/fluid/main.go

connectCmd.Flags().Bool("insecure", true, "skip TLS verification")

fluid-cli/internal/tui/connect.go

insecureInput.SetValue("true")
...
Insecure: insecure == "" || insecure == "true" || insecure == "yes" || insecure == "1",

Both the CLI flag and the TUI wizard default to insecure = true, meaning TLS verification is skipped by default. This should default to false. The current behavior trains users to ignore certificate validation, and any connection made in the default state is vulnerable to MITM attacks.
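A minimal sketch of a secure-by-default parse for the wizard's text field. `parseInsecure` is a hypothetical helper, not a function from the PR; the only point it illustrates is that an empty or unrecognized value should leave TLS verification on.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInsecure is a hypothetical helper (not from the PR). Only an explicit
// opt-in returns true; an empty or unrecognized value keeps TLS verification on.
func parseInsecure(s string) bool {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "yes", "1":
		return true
	default:
		return false
	}
}

func main() {
	for _, in := range []string{"", "true", "YES", "no"} {
		fmt.Printf("%q -> %v\n", in, parseInsecure(in))
	}
}
```

The CLI side would presumably need only the flag default changed from `true` to `false`.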


🔴 Race condition on cancelFunc

fluid-cli/internal/tui/agent.go

a.cancelFunc is written from the goroutine running Run() and read/written from the UI goroutine calling Cancel() — with no mutex or atomic operation protecting it:

// UI goroutine
func (a *FluidAgent) Cancel() {
    if a.cancelFunc != nil {  // read
        a.cancelFunc()
        a.cancelFunc = nil    // write
    }
}

// agent goroutine (inside Run's tea.Cmd)
a.cancelFunc = cancel         // write
defer func() {
    a.cancelFunc = nil        // write
}()

A sync/atomic value (storing the cancel func as a pointer) or a sync.Mutex is needed here. The Go race detector will flag this.
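The mutex variant could look roughly like the sketch below. The `agent` type, `setCancel`, and `startRun` are stand-ins invented for illustration; only the locking pattern around `cancelFunc` is the point.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// agent is a stand-in for FluidAgent; only the cancel bookkeeping is shown.
type agent struct {
	mu         sync.Mutex
	cancelFunc context.CancelFunc
}

// setCancel stores the cancel func (or clears it when passed nil) under the lock.
func (a *agent) setCancel(c context.CancelFunc) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.cancelFunc = c
}

// startRun is a hypothetical helper that mimics what Run would do at the start
// of an agent loop: create a cancellable context and register its cancel func.
func startRun(a *agent) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	a.setCancel(cancel)
	return ctx
}

// Cancel takes the cancel func under the lock, clears it, and calls it after
// releasing the lock. It reports whether a run was actually cancelled.
func (a *agent) Cancel() bool {
	a.mu.Lock()
	c := a.cancelFunc
	a.cancelFunc = nil
	a.mu.Unlock()
	if c == nil {
		return false
	}
	c()
	return true
}

func main() {
	a := &agent{}
	ctx := startRun(a)
	fmt.Println(a.Cancel(), ctx.Err())
	fmt.Println(a.Cancel())
}
```

Every read and write of `cancelFunc` now happens under `mu`, so the UI goroutine and the agent goroutine no longer race on the field.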


🟡 Double "Agent stopped." message

fluid-cli/internal/tui/model.go

When ESC is pressed, the handler immediately adds a system message and sets state to idle:

case "esc":
    if m.state == StateThinking {
        m.agentRunner.Cancel()
        m.addSystemMessage("Agent stopped.")  // #1
        m.state = StateIdle
        ...

Then, if/when the goroutine responds with AgentCancelledMsg:

case AgentCancelledMsg:
    m.addSystemMessage("Agent stopped.")  // #2

The user sees "Agent stopped." twice. One of the two paths should be removed; the AgentCancelledMsg handler is the correct place since it's the authoritative signal.


🟡 Empty SourceVM in AutoReadOnlyMsg after refactor

fluid-cli/internal/tui/agent.go

The old per-call deferred approach passed the actual source VM name:

a.sendStatus(AutoReadOnlyMsg{SourceVM: sourceVM, Enabled: true})

The new sticky approach sends an empty string:

a.sendStatus(AutoReadOnlyMsg{SourceVM: "", Enabled: true})

This results in the system message: "Auto read-only: accessing source VM " (empty). The currentSourceVM is already being set on the same code path — use it when constructing the message.


🟡 Sticky auto-read-only can get stuck

The new sticky mode transitions back only when the LLM calls a sandbox tool. But when autoReadOnly is active, a.readOnly = true causes GetReadOnlyTools() to be used, which excludes sandbox tools entirely. The LLM therefore never gets the opportunity to trigger the create_sandbox branch that would clear autoReadOnly. The agent can only escape this mode when the conversation ends. Consider keeping the previous per-call reset behavior, or adding an explicit /readonly off escape hatch.


🟡 Resource leak: svc.Close() skipped when saving

fluid-cli/cmd/fluid/main.go

defer func() {
    if skipSave {
        _ = svc.Close()
    }
}()

When skipSave = false (the default path), the gRPC connection is never explicitly closed. For a short-lived CLI command this isn't critical — the process exits immediately — but the intent is inconsistent with the skipSave branch. Unconditionally closing the connection is cleaner.


🟢 Minor: AGENTS.md formatting issue

The patch inserts a partial Makefile table header row before the "CLI Subcommands" section, resulting in a duplicate "Makefile Targets" heading in the rendered markdown. The orphaned | Target | Description | / |--------|-------------| rows above ### CLI Subcommands should be removed.


🟢 Minor: Demo script path vs. comment mismatch

scripts/haproxy-ssl-debug.sh contains the comment:

# Usage: ./demo/haproxy-ssl-debug.sh <ssh-host>

Either move the script to demo/ to match the comment, or update the comment.


✅ What's done well

  • The private key redaction (redactPrivateKeys) is a solid security improvement and is well-tested with all major PEM key types.
  • The openssl allowlist is thoughtfully scoped — read-only subcommands allowed, key-generation blocked in both shell and validator layers.
  • The connect wizard flow (StepAddress → StepConnecting → StepDoctor → StepDone) is clean and the retry-on-error path works.
  • Test coverage for prepare.go step indices is updated correctly throughout.
  • The lefthook.yaml fix for gofumpt path is a good portability improvement.

🤖 Generated with Claude Code

Contributor

Copilot AI left a comment


Pull request overview

Adds new “connect” flows (CLI subcommand + TUI wizard) to link the Fluid CLI to a running daemon, plus supporting docs/UX updates and some read-only/TLS-related enhancements.

Changes:

  • Add fluid connect <address> command to health-check a daemon, run doctor checks, and optionally persist daemon config.
  • Add a TUI /connect wizard modal for guided connection + saving config, and improve live output/tool display.
  • Expand read-only tooling for TLS diagnostics (allow openssl with restricted subcommands) and improve host preparation (journal/log group access).

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 10 comments.

| File | Description |
|------|-------------|
| web/src/routes/docs/cli-reference.tsx | Documents new CLI entry points and TUI slash commands. |
| web/src/routes/_public/index.tsx | Updates public FAQ copy (adds “What is Fluid…” explanation). |
| web/src/routes/_public.tsx | Footer layout tweaks for better wrapping/responsiveness. |
| web/src/components/docs/quickstart-steps.tsx | Adds quickstart step describing fluid connect and /connect. |
| web/src/components/docs/daemon-setup-steps.tsx | Adds daemon setup step for connecting + --no-save tip. |
| scripts/haproxy-ssl-debug.sh | Adds a remote demo script for HAProxy SSL cert/key mismatch debugging. |
| lefthook.yaml | Adjusts gofumpt invocation path for pre-commit formatting. |
| fluid-daemon/internal/readonly/validate.go | Allowlists openssl and adds restricted subcommands. |
| fluid-daemon/internal/readonly/shell.go | Blocks additional dangerous openssl subcommands in restricted shell. |
| fluid-daemon/internal/readonly/prepare.go | Adds best-effort journal/log group membership for fluid-readonly user. |
| fluid-cli/internal/tui/onboarding.go | Updates onboarding completion copy to mention /connect. |
| fluid-cli/internal/tui/model.go | Adds connect wizard mode, ESC agent-cancel, live-output header improvements, and extra tool formatting. |
| fluid-cli/internal/tui/messages.go | Adds new message types (agent cancelled, sensitive redaction, connect close). |
| fluid-cli/internal/tui/connect.go | Implements the /connect multi-step wizard modal. |
| fluid-cli/internal/tui/agent_test.go | Adds tests for PEM private-key redaction behavior. |
| fluid-cli/internal/tui/agent.go | Adds ESC cancellation support, auto read-only “sticky” transitions, private key redaction, and live output for source reads. |
| fluid-cli/internal/readonly/validate_test.go | Adds tests for openssl allow/block behavior. |
| fluid-cli/internal/readonly/validate.go | Allowlists openssl and adds restricted subcommands. |
| fluid-cli/internal/readonly/shell.go | Blocks additional dangerous openssl subcommands in restricted shell script. |
| fluid-cli/internal/readonly/prepare_test.go | Updates prepare tests to account for new journal/log group step. |
| fluid-cli/internal/readonly/prepare.go | Adds best-effort journal/log group membership for fluid-readonly user. |
| fluid-cli/cmd/fluid/main.go | Adds fluid connect cobra subcommand and implementation. |
| fluid-cli/AGENTS.md | Updates dev docs for new /connect and CLI subcommand docs. |


question: 'What is Fluid and how does it work?',
answer:
'Not unrestricted SSH access. Fluid creates a dedicated fluid-readonly user with a restricted login shell. A client-side allowlist validates every command against ~50 permitted read-only commands (cat, ls, grep, ps, journalctl, etc.) before it is even sent. Server-side, the restricted shell blocks 50+ destructive patterns - sudo, rm, mv, chmod, wget, python, bash - at the OS level. Command substitution ($(...), backticks), output redirection, and subshells are all blocked. Even if the AI constructs something creative, the shell will not execute it.',
'Fluid is an AI agent built for working on Linux servers. It uses tools you already know like ssh, login shells, and Ansible playbooks to investigate Linux servers. Fluid creates a dedicated fluid-readonly user with a restricted login shell. A client-side allowlist validates every command against ~50 permitted read-only commands (cat, ls, grep, ps, journalctl, etc.) before it is even sent. Server-side, the restricted shell blocks 50+ destructive patterns - sudo, rm, mv, chmod, wget, python, bash - at the OS level. Command substitution ($(...), backticks), output redirection, and subshells are all blocked. Even if the AI constructs something creative, the shell will not execute it. If a sandbox host is configured and a possible fix can be constructed, Fluid will create a sandbox of the server to test changes and updates. Finally, Fluid will create an ansible playbook that can be applied to production to fix the issue.',
Comment thread fluid-cli/internal/tui/connect.go Outdated
Comment on lines +241 to +248
labels := []string{" Address:", " Name: ", " Insecure:"}
for i := range connectFieldCount {
    prefix := " "
    if connectField(i) == m.focused {
        prefix = "> "
    }
    b.WriteString(fmt.Sprintf("%s%s %s\n", prefix, labels[i], m.inputs[i].View()))
}
Comment thread fluid-cli/internal/tui/agent.go Outdated
Comment on lines +82 to +149
@@ -121,17 +125,38 @@ func (a *FluidAgent) SetReadOnly(ro bool) {
    a.readOnly = ro
}

// SetSandboxService hot-swaps the sandbox service (e.g. after /connect).
func (a *FluidAgent) SetSandboxService(svc sandbox.Service) {
    if a.service != nil {
        _ = a.service.Close()
    }
    a.service = svc
}

// sendStatus sends a status message through the callback if set
func (a *FluidAgent) sendStatus(msg tea.Msg) {
    if a.statusCallback != nil {
        a.statusCallback(msg)
    }
}

// Cancel stops the currently running agent loop
func (a *FluidAgent) Cancel() {
    if a.cancelFunc != nil {
        a.cancelFunc()
        a.cancelFunc = nil
    }
}
Comment thread fluid-cli/internal/tui/agent.go Outdated
Comment on lines +566 to +583
// Sticky mode transitions: only change mode when tool category changes
switch tc.Function.Name {
case "run_source_command", "read_source_file":
    if !a.readOnly {
        a.autoReadOnly = true
        a.readOnly = true
        a.sendStatus(AutoReadOnlyMsg{SourceVM: "", Enabled: true})
    }
case "create_sandbox", "destroy_sandbox", "run_command", "start_sandbox",
    "stop_sandbox", "create_snapshot", "edit_file", "read_file",
    "create_playbook", "add_playbook_task":
    if a.autoReadOnly {
        a.autoReadOnly = false
        a.readOnly = false
        a.currentSourceVM = ""
        a.sendStatus(AutoReadOnlyMsg{SourceVM: "", Enabled: false})
    }
}
Comment thread fluid-cli/AGENTS.md Outdated
Comment on lines +146 to +162
## Makefile Targets

| Target | Description |
|--------|-------------|
### CLI Subcommands

| Command | Description |
|---------|-------------|
| `fluid` | Launch the interactive TUI agent (default) |
| `fluid connect <address>` | Connect to a fluid daemon and save config |
| `fluid mcp` | Start MCP server on stdio |
| `fluid doctor` | Check daemon setup on a host |
| `fluid source prepare <host>` | Prepare a host for read-only access |
| `fluid source list` | List configured source hosts |
| `fluid update` | Self-update to the latest release |

## Makefile Targets
Comment thread fluid-cli/internal/readonly/validate.go Outdated
Comment on lines +116 to +125
"list": true,
},
"openssl": {
"x509": true,
"verify": true,
"s_client": true,
"crl": true,
"version": true,
"ciphers": true,
},
Comment on lines +409 to +418
svc, err := sandbox.NewRemoteService(addr, cpCfg, loadedCfg.Hosts)
if err != nil {
    fmt.Printf(" %s Failed to dial: %v\n", red("[error]"), err)
    return err
}
defer func() {
    if skipSave {
        _ = svc.Close()
    }
}()
Comment on lines 706 to +716
case "esc":
    if m.state == StateThinking {
        if m.agentRunner != nil {
            m.agentRunner.Cancel()
        }
        m.addSystemMessage("Agent stopped.")
        m.state = StateIdle
        m.thinking = false
        m.updateViewportContent(false)
        m.textarea.Focus()
        return m, nil
Comment thread scripts/haproxy-ssl-debug.sh Outdated
Comment on lines +10 to +11
# ./demo/haproxy-ssl-debug.sh <ssh-host> # Setup the broken scenario
# ./demo/haproxy-ssl-debug.sh <ssh-host> --cleanup # Tear everything down
Comment on lines +117 to +122
"x509": true,
"verify": true,
"s_client": true,
"crl": true,
"version": true,
"ciphers": true,
@claude

claude Bot commented Mar 14, 2026

Code Review

Overall this is a solid PR — the connect wizard is well-structured, private key redaction is a great security improvement, and ESC-to-cancel is a welcome UX addition. A few issues worth addressing:


Bugs

Double-close in runConnect (main.go)

svc is closed by both the defer and the explicit early-return calls:

defer func() { _ = svc.Close() }()  // line ~139

if err := svc.Health(ctx); err != nil {
    _ = svc.Close()  // double close — defer will also run
    return err
}
// same pattern for GetHostInfo failure

Remove the explicit svc.Close() calls inside the error branches — the defer is sufficient.

Doctor checks not cancellable during connect wizard (connect.go)

When the user hits ESC during StepDoctor, m.inConnect becomes false but doctor.RunAll is still running in a goroutine with its own 30s timeout context. When it finishes, it sends ConnectDoctorResultMsg back to a model that's no longer in connect mode. The message is silently dropped, which is fine, but the goroutine runs to completion regardless. Consider threading a cancellable context through runDoctorChecks tied to a model-level cancel func.
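One way to thread cancellation through the doctor step is sketched below. The `check` type, `runChecks`, and `demo` are hypothetical names for illustration and do not reflect the real `doctor.RunAll` signature. The sketch stops between checks once the context is cancelled.

```go
package main

import (
	"context"
	"fmt"
)

// check is a hypothetical stand-in for a single doctor check.
type check func(ctx context.Context) error

// runChecks runs checks in order and stops as soon as ctx is cancelled.
// It returns the number of checks that ran before stopping.
func runChecks(ctx context.Context, checks []check) (int, error) {
	for i, c := range checks {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		if err := c(ctx); err != nil {
			return i, err
		}
	}
	return len(checks), nil
}

// demo simulates the user pressing ESC while the second check is running:
// that check cancels the context, so the third check never starts.
func demo() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ok := func(context.Context) error { return nil }
	stop := func(context.Context) error { cancel(); return nil }
	return runChecks(ctx, []check{ok, stop, ok})
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

In the wizard, the model would hold the `cancel` func and call it from the ESC handler. Individual checks would also need to honor `ctx` to stop mid-check.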

Race condition on readOnly/autoReadOnly fields (agent.go)

The auto-read-only logic was moved into executeTool, which sets a.readOnly and a.autoReadOnly and restores them in a defer, without a mutex. These fields are also read in Run() on the same goroutine, so this is safe under the current serial execution model, but it is fragile: any future concurrent tool execution would introduce a data race. Worth noting at minimum.


UX / Design Issues

insecure field as a text input (connect.go)

Using a free-text field for a boolean (accepting "true"/"yes"/"1") is surprising UX. A simple [x] toggle or even just a (y/N) prompt would be clearer. This is a minor point but the current approach means users have to know the magic strings.

ESC cancellation double-reset (model.go)

When ESC is pressed during StateThinking, the model immediately resets to StateIdle. Later when the goroutine finishes and AgentCancelledMsg arrives, it resets to idle again. This is harmless but results in a redundant addSystemMessage("Agent stopped.") call on a potentially stale conversation. Could check if m.state != StateIdle before processing AgentCancelledMsg.


Code Quality

Duplicate TLS system prompt injection (agent.go)

The TLS debugging guidance string is constructed identically in two places:

  1. Inside the !a.cfg.HasSandboxHosts() branch (source-only mode)
  2. In the separate if len(a.cfg.PreparedHosts()) > 0 && (a.cfg.HasSandboxHosts() && !a.readOnly) block

Consider extracting it to a constant or package-level var.

Duplicate localhost detection (main.go + connect.go)

The check host == "localhost" || host == "127.0.0.1" || host == "::1" || host == "" appears verbatim in both runConnect and runDoctorChecks. A small helper (isLocalHost(host string) bool) would eliminate the duplication.
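The suggested helper is small enough to sketch in full. Where it should live (which package) is left open here; the body simply mirrors the four cases in the duplicated check.

```go
package main

import "fmt"

// isLocalHost reports whether host refers to the local machine, using the
// same four cases as the duplicated inline check (including the empty host).
func isLocalHost(host string) bool {
	switch host {
	case "localhost", "127.0.0.1", "::1", "":
		return true
	}
	return false
}

func main() {
	fmt.Println(isLocalHost("localhost"), isLocalHost("10.0.0.5"))
}
```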

Color helpers duplicated in runConnect (main.go)

The green/red/dim ANSI helpers are defined inline in runConnect. This is fine for a CLI-only path, but if this pattern is used elsewhere consider moving to a shared location.


Security

openssl s_client on the allowlist

openssl s_client -connect host:443 is permitted in read-only mode. This can probe arbitrary internal network endpoints from the source host. Given the rest of the allowlist philosophy (read-only filesystem/process diagnostics), this seems intentional for TLS debugging, but worth a comment in validate.go explaining the rationale so future reviewers don't flag it.

Private key redaction (agent.go)

The redactPrivateKeys regex is solid and the test coverage is thorough (RSA, EC, PKCS8, mixed content, certificates-not-redacted). One edge case not covered: keys with Windows-style \r\n line endings — the (?s) flag handles multiline but \r before \n could theoretically trip the BEGIN/END marker matching if the content has CRLF. Low priority but worth a test case.
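A CRLF test case could be shaped like the sketch below. The regex in agent.go is not shown in this thread, so `privateKeyRE` and `redact` are assumed stand-ins; the real pattern and replacement text may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// privateKeyRE is an assumed pattern, not the one in agent.go. With (?s),
// "." also matches "\n", so the key body may span lines; "\r" is matched
// by "." regardless of the flag.
var privateKeyRE = regexp.MustCompile(
	`(?s)-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----`)

// redact is a hypothetical stand-in for redactPrivateKeys.
func redact(s string) string {
	return privateKeyRE.ReplaceAllString(s, "[REDACTED PRIVATE KEY]")
}

func main() {
	in := "a\r\n-----BEGIN EC PRIVATE KEY-----\r\nAAAA\r\n-----END EC PRIVATE KEY-----\r\nb"
	fmt.Printf("%q\n", redact(in))
}
```

With this assumed pattern, CRLF input is redacted like LF input. Running the same input through the real regex would confirm whether the edge case applies there.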


Tests

  • The prepare_test.go index updates are correct and thorough — good.
  • agent_test.go private key redaction tests cover the main cases well.
  • No tests for ConnectModel behavior (wizard state transitions, buildConfig, resolveAddress). These would be straightforward to add given the pure functional Bubble Tea model design.
  • runConnect in main.go has no unit tests — the core logic (upsert behavior, name fallback from hostname) could be extracted and tested independently.

Minor

  • lefthook.yaml change ($(go env GOPATH)/bin/gofumpt) is a good fix for machines where GOPATH/bin isn't in PATH.
  • haproxy-ssl-debug.sh is a useful demo script. It properly uses set -euo pipefail and handles cleanup. No issues.
  • fluid-daemon/internal/readonly/prepare.go gets the journal group addition too — good consistency, though the daemon-side prepare_test.go doesn't seem to have the corresponding index updates. Worth verifying those tests still pass.

🤖 Generated with Claude Code

@claude

claude Bot commented Mar 14, 2026

Code Review

Overall this is a solid, well-scoped PR. The connect wizard, ESC cancellation, and private-key redaction are all useful additions. Tests for redactPrivateKeys and the updated prepare_test.go steps look good. A few issues worth addressing before merging:


Bug: Service leak when ESC is pressed during a connection attempt

attemptConnect runs as an async tea.Cmd. If the user presses ESC while the dial is still in-flight, model.go sets m.inConnect = false and calls m.connectModel.GetService() — which returns nil at that moment because ConnectHealthResultMsg hasn't arrived yet. When the goroutine finishes and the message eventually arrives, m.inConnect is already false so the message is never forwarded to the connect model, and the sandbox.Service is created but never closed.

// model.go – ConnectCloseMsg handler
if svc := m.connectModel.GetService(); svc != nil {  // nil here if dial still in-flight
    _ = svc.Close()
}

One safe fix: store a pendingConnectMsg channel or simply close any service delivered via ConnectHealthResultMsg when m.inConnect is false:

case ConnectHealthResultMsg:
    if !m.inConnect {
        if msg.Service != nil {
            _ = msg.Service.Close()
        }
        return m, nil
    }
    // ... forward to connect model

Missing tests for connect.go and runConnect

CLAUDE.md requires tests for every code change. connect.go is 457 lines with non-trivial logic (resolveAddress, buildConfig, renderDoctorResults, the state-machine transitions) and has no connect_test.go. runConnect in main.go is also untested. At a minimum, unit tests for resolveAddress, buildConfig, and the doctor-skip logic for localhost would be valuable.


Code duplication: isLocalHost defined in two packages

isLocalHost appears identically in both fluid-cli/cmd/fluid/main.go and fluid-cli/internal/tui/connect.go. Since both packages already import the tui package (or vice-versa), the function should live in one place — either a small internal netutil helper or directly in the tui package, with main.go calling the one from tui.


Dead variable in /connect handler (model.go)

var cmd tea.Cmd   // always nil
if m.width > 0 && m.height > 0 {
    connectModel, _ := m.connectModel.Update(tea.WindowSizeMsg{...})
    //              ^^ cmd from Update is discarded
    m.connectModel = connectModel.(ConnectModel)
}
return m, tea.Batch(m.connectModel.Init(), cmd)  // cmd is always nil

cmd is declared but never assigned, so tea.Batch(m.connectModel.Init(), nil) is equivalent to just m.connectModel.Init(). Either capture the returned cmd from Update or drop cmd entirely:

return m, m.connectModel.Init()

Minor: a.currentSourceVM assignment left dangling in fallback path

In the refactored runSourceCommand / readSourceFile:

a.currentSourceVM = sourceVM   // set here
// ... old defer with cleanup removed

The cleanup of currentSourceVM now relies entirely on the outer defer in executeTool. That's correct, but the bare assignment in runSourceCommand (and readSourceFile) is now redundant — executeTool already sets it before calling these methods. Removing the assignment would make the invariant clearer and avoid confusion for future maintainers.


Minor: openssl s_client -connect with -starttls could probe non-HTTPS services

The s_client subcommand is allowed in validate.go (good for TLS diagnostics), but openssl s_client -connect host:25 -starttls smtp can interact with mail servers in ways that go beyond passive observation. This is a low-severity concern for the read-only shell context, but worth noting for the security model.


Nit: TLS debugging guidance comment on condition

if len(a.cfg.PreparedHosts()) > 0 && (a.cfg.HasSandboxHosts() && !a.readOnly) {
    systemPrompt += tlsDebuggingGuidance
}

The outer parentheses around a.cfg.HasSandboxHosts() && !a.readOnly don't add clarity — && already associates left-to-right. Minor but easy to simplify.


Overall the approach is clean. The ESC-cancel and private-key redaction are particularly good additions. The service leak and missing tests are the most important items to address.

@claude

claude Bot commented Mar 14, 2026

Code Review

Overall this is a solid PR with well-structured code, good test coverage, and clean separation between the CLI command and TUI wizard. A few issues worth addressing:


Bugs / Correctness

1. Race condition in SetSandboxService (agent.go:609–614)

SetSandboxService is called from the TUI Update loop (triggered by ConnectCloseMsg), but the agent goroutine may still be executing if Cancel() was not called first. Cancel() only signals cancellation — the goroutine checks ctx.Err() at discrete points and may still be mid-tool-call when the service is swapped.

Consider requiring Cancel() before SetSandboxService, or guarding with a mutex:

func (a *FluidAgent) SetSandboxService(svc sandbox.Service) {
    a.cancelMu.Lock()           // reuse existing mu, or add a dedicated one
    defer a.cancelMu.Unlock()
    if a.service != nil {
        _ = a.service.Close()
    }
    a.service = svc
}

Or document that callers must ensure no active agent run before calling this.

2. openssl req blocked for read-only inspection (validate.go:520–527)

openssl req -text -noout is a common read-only operation for inspecting CSR files, but req is not in subcommandRestrictions["openssl"], so it's blocked. The shell-level blocklist at ^openssl req also catches it. Agents won't be able to diagnose CSR issues.

Consider adding req to the allowlist with a note that openssl req -new -key ... is still caught by the shell-level blocklist, or add req + a validate-level check for dangerous flags like -new/-signkey.
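A validate-level flag check might look like this sketch. `blockedReqFlags` and `reqArgsReadOnly` are hypothetical names; `-new` and `-signkey` come from the suggestion above, and the remaining flags are illustrative assumptions rather than a vetted list.

```go
package main

import "fmt"

// blockedReqFlags is an assumed list of flags that create or write material.
// -new and -signkey are named in the review; the others are illustrative.
var blockedReqFlags = map[string]bool{
	"-new": true, "-newkey": true, "-signkey": true,
	"-x509": true, "-out": true, "-keyout": true,
}

// reqArgsReadOnly is a hypothetical check: it accepts the arguments that
// follow `openssl req` only if none of them is a blocked flag.
func reqArgsReadOnly(args []string) bool {
	for _, a := range args {
		if blockedReqFlags[a] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(reqArgsReadOnly([]string{"-in", "server.csr", "-text", "-noout"}))
	fmt.Println(reqArgsReadOnly([]string{"-new", "-key", "server.key"}))
}
```

A denylist of flags is weaker than an allowlist of known-safe ones, so the shell-level block would still be worth keeping as a second layer.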

3. ESC-cancel state reset is optimistic (model.go:917–926)

When ESC is pressed during StateThinking, m.state is immediately set to StateIdle and m.thinking = false, before the agent goroutine has actually stopped. Subsequent CommandOutputChunkMsg, ToolCompleteMsg, etc. that arrive in the narrow window between cancel signal and goroutine exit will be processed in StateIdle, which may cause visual glitches (e.g., a stray live output entry appearing after the agent is "stopped").

A simple guard: only apply those message handlers when m.state != StateIdle.


Code Quality

4. Duplicated auto-read-only logic in executeTool (agent.go ~694–732)

The identical 15-line block for setting/restoring autoReadOnly/readOnly is copied verbatim for both run_source_command and read_source_file. Extract to a small helper:

func (a *FluidAgent) withAutoReadOnly(sourceVM string, fn func() (any, error)) (any, error) {
    a.currentSourceVM = sourceVM
    wasAutoReadOnly := a.autoReadOnly
    if !a.readOnly {
        a.autoReadOnly = true
        a.readOnly = true
        a.sendStatus(AutoReadOnlyMsg{SourceVM: sourceVM, Enabled: true})
    }
    defer func() {
        a.currentSourceVM = ""
        if a.autoReadOnly && !wasAutoReadOnly {
            a.autoReadOnly = false
            a.readOnly = false
            a.sendStatus(AutoReadOnlyMsg{Enabled: false})
        }
    }()
    return fn()
}

5. CommandOutputStartMsg silently dropped when live output already active (model.go:1952–1954)

If m.showingLiveOutput is already true when CommandOutputStartMsg arrives (e.g., two read_source_file calls in quick succession), the new output is silently merged into the existing live output box with no header update. This is probably fine but worth a comment explaining the intentional behavior.


Minor / Nits

6. resolveAddress double-parses the address (connect.go:1367–1369)

After constructing addr via net.JoinHostPort, immediately calling net.SplitHostPort(addr) again as a validation step is redundant — JoinHostPort always produces a valid address. The double-parse adds no safety and can be removed.

7. Context timeout created once but used for two calls (main.go:170–184)

In runConnect, a single 10s timeout context is used for both svc.Health(ctx) and svc.GetHostInfo(ctx). If Health takes 9.9s, GetHostInfo has almost no budget. Consider giving each call its own timeout or increasing the total.
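Giving each call its own budget can be sketched as follows. `callWithTimeout` and `deadlines` are hypothetical helpers, and the closures stand in for `svc.Health` and `svc.GetHostInfo`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// callWithTimeout runs fn with a fresh deadline derived from parent, so a
// slow earlier call cannot eat into this call's budget.
func callWithTimeout(parent context.Context, d time.Duration, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()
	return fn(ctx)
}

// deadlines records the deadline seen by two consecutive calls.
func deadlines() (first, second time.Time) {
	parent := context.Background()
	_ = callWithTimeout(parent, 10*time.Second, func(ctx context.Context) error {
		first, _ = ctx.Deadline()
		time.Sleep(10 * time.Millisecond) // stand-in for a slow Health call
		return nil
	})
	_ = callWithTimeout(parent, 10*time.Second, func(ctx context.Context) error {
		second, _ = ctx.Deadline() // stand-in for GetHostInfo
		return nil
	})
	return first, second
}

func main() {
	first, second := deadlines()
	fmt.Println(second.After(first))
}
```

The second call's deadline is later than the first's because its timer starts when that call begins rather than when `runConnect` begins.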

8. for i := range int(fieldInsecure) (connect.go:1253)

range int(fieldInsecure) is valid Go 1.22+ range-over-integer syntax but int(fieldInsecure) is a redundant cast since fieldInsecure is already an untyped int constant. Minor style nit.


Test Coverage

The new netutil, connect.go wizard, and redactPrivateKeys all have solid unit tests — nice work. The test updates to prepare_test.go to account for the new journal-group step are correct and thorough.

One gap: runConnect in main.go has no unit test. Since it dials a real gRPC endpoint this is hard to test directly, but the address-normalization logic (net.SplitHostPort + net.JoinHostPort) could be extracted and tested independently, similar to how resolveAddress is tested for the TUI wizard.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 14, 2026

test

@claude

claude Bot commented Mar 14, 2026

Security Model

fluid.sh uses defense-in-depth to isolate AI agent workloads in VM sandboxes. This document describes the security architecture across the CLI, daemon, and API control plane.

Overview

Security is enforced across multiple layers:

  1. SSH Certificate Authority - short-lived certificates replace persistent credentials
  2. Principal separation - sandbox (sandbox) and read-only (fluid-readonly) access use distinct SSH principals
  3. Read-only enforcement - client-side allowlist + server-side restricted shell block destructive commands on source VMs
  4. VM isolation - QEMU microVM hypervisor isolation with copy-on-write overlays
  5. Secrets redaction - sensitive data stripped from LLM messages with deterministic tokens
  6. Human approval workflow - blocking confirmation dialogs for network access, resource limits, and source VM preparation
  7. Hash-chained audit log - tamper-evident append-only log of all agent actions
  8. Input validation - shell argument sanitization, path traversal prevention, file size limits
  9. API authentication and authorization - bcrypt passwords, session tokens, OAuth, RBAC
  10. Transport security - CORS lockdown, rate limiting, optional mTLS for gRPC
  11. Encryption at rest - AES-256-GCM for OAuth tokens and credentials
  12. Telemetry privacy - enabled by default (opt-out), anonymous, no user content collected

SSH Certificate Authority

The SSH CA signs short-lived certificates for all sandbox and source VM access. No persistent SSH keys are stored on VMs.

Key generation: Ed25519 CA key pair generated via ssh-keygen. Private key stored at configurable path (default /etc/virsh-sandbox/ssh_ca) with 0600 permissions. Public key at the same path with .pub suffix.

Certificate identity format:

user:{UserID}-vm:{VMID}-sbx:{SandboxID}-cert:{CertID}

Certificate properties:

  • Default TTL: 30 minutes
  • Maximum TTL: 60 minutes
  • Minimum TTL: 1 minute
  • Clock skew buffer: 1 minute (validity starts 1 minute before issuance)
  • Serial numbers: random 64-bit, incremented per issuance
  • Extensions: permit-pty only
  • Restrictions: no-port-forwarding, no-agent-forwarding, no-X11-forwarding
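The TTL bounds and clock-skew buffer above can be sketched as a validity-window calculation. This is an illustration only: `clampTTL` and `validity` are hypothetical names, and whether ca.go clamps out-of-range requests or rejects them is not stated here. The sketch clamps and treats a zero request as "use the default".

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the certificate properties list.
const (
	defaultTTL = 30 * time.Minute
	maxTTL     = 60 * time.Minute
	minTTL     = 1 * time.Minute
	clockSkew  = 1 * time.Minute
)

// clampTTL is a hypothetical helper: zero means "use the default", and
// anything outside [minTTL, maxTTL] is clamped (an assumption; the real CA
// might reject instead).
func clampTTL(requested time.Duration) time.Duration {
	switch {
	case requested == 0:
		return defaultTTL
	case requested < minTTL:
		return minTTL
	case requested > maxTTL:
		return maxTTL
	}
	return requested
}

// validity returns the certificate window: it starts clockSkew before
// issuance and ends one (clamped) TTL after issuance.
func validity(now time.Time, requested time.Duration) (notBefore, notAfter time.Time) {
	return now.Add(-clockSkew), now.Add(clampTTL(requested))
}

func main() {
	nb, na := validity(time.Now(), 0)
	fmt.Println(na.Sub(nb))
}
```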

Permission validation: the CA enforces that private key files have mode 0600 or 0400 (no group/world access) before signing.

Source: fluid-daemon/internal/sshca/ca.go

Sandbox Credentials

Each sandbox gets ephemeral Ed25519 key pairs, generated on demand and cached until expiry.

  • Principal: "sandbox"
  • Key directory: {keyDir}/{sandboxID}/ with 0700 permissions
  • Private keys: 0600 permissions
  • Certificates: 0644 permissions
  • Auto-refresh: credentials regenerate 30 seconds before certificate expiry
  • Thread safety: per-sandbox mutexes prevent concurrent key generation
  • Cleanup: key files and cache entries removed on sandbox destroy

Pre-flight permission checks run before every SSH connection: the runner verifies the private key file has no group/world permissions (perm & 0077 == 0) and rejects the connection otherwise.
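The permission test described above reduces to a single bit mask. A minimal sketch follows, with `keyPermsOK` as a hypothetical name; the real runner would obtain the mode from `os.Stat` on the key file.

```go
package main

import (
	"fmt"
	"io/fs"
)

// keyPermsOK reports whether mode grants no group or world access,
// i.e. perm & 0077 == 0.
func keyPermsOK(mode fs.FileMode) bool {
	return mode.Perm()&0o077 == 0
}

func main() {
	for _, m := range []fs.FileMode{0o600, 0o400, 0o640, 0o644} {
		fmt.Printf("%04o -> %v\n", m.Perm(), keyPermsOK(m))
	}
}
```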

Source: fluid-daemon/internal/sshkeys/manager.go

Source VM Read-Only Mode

Source (golden) VMs are accessible only for inspection, never modification. The CLI connects directly to source hosts via SSH for read-only operations (not through the daemon). Three enforcement layers ensure safety.

Layer 1: Client-side allowlist

ValidateCommand() parses the command into pipeline segments and checks each segment's base command against an allowlist of ~70 safe commands.

Allowed categories:

  • File inspection: cat, ls, find, head, tail, stat, file, wc, du, tree, strings, md5sum, sha256sum, readlink, realpath, basename, dirname, base64
  • Process/system info: ps, top, pgrep, systemctl, journalctl, dmesg
  • Network info: ss, netstat, ip, ifconfig, dig, nslookup, ping
  • Disk info: df, lsblk, blkid
  • Package queries: dpkg, rpm, apt, pip (restricted subcommands only)
  • System info: uname, hostname, uptime, free, lscpu, lsmod, lspci, lsusb, arch, nproc
  • User info: whoami, id, groups, who, w, last
  • Misc: env, printenv, date, which, type, echo, test
  • Pipe targets: grep, awk, sed, sort, uniq, cut, tr, xargs

Subcommand restrictions (first argument must match allowlist):

  • systemctl: status, show, list-units, is-active, is-enabled
  • dpkg: -l, --list
  • rpm: -qa, -q
  • apt: list
  • pip: list

Metacharacter blocking:

  • Command substitution: $(...) and backticks
  • Process substitution: <(...) and >(...)
  • Output redirection: > and >>
  • Newlines: \n and \r
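
A rough sketch of how the three checks above (metacharacters, base command, first-argument restriction) could fit together. It is not the actual ValidateCommand(): the allowlist is a short excerpt, the function and variable names are hypothetical, and the segment splitting is deliberately naive.

```go
package main

import (
	"fmt"
	"strings"
)

// A short excerpt of the allowlist; the real list has roughly 70 entries.
var allowed = map[string]bool{"cat": true, "ls": true, "grep": true, "ps": true, "systemctl": true}

// Allowed first arguments for commands with subcommand restrictions.
var subcommands = map[string][]string{
	"systemctl": {"status", "show", "list-units", "is-active", "is-enabled"},
}

// Blocked sequences from the list above; ">" also covers ">>" and ">(".
var blocked = []string{"$(", "`", "<(", ">", "\n", "\r"}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func validateCommand(cmd string) error {
	for _, b := range blocked {
		if strings.Contains(cmd, b) {
			return fmt.Errorf("blocked metacharacter %q", b)
		}
	}
	// Naive split on |, ; and &. It ignores quoting, so it may reject
	// harmless commands, but it never lets a chained command slip through.
	segments := strings.FieldsFunc(cmd, func(r rune) bool {
		return r == '|' || r == ';' || r == '&'
	})
	if len(segments) == 0 {
		return fmt.Errorf("empty command")
	}
	for _, seg := range segments {
		fields := strings.Fields(seg)
		if len(fields) == 0 {
			return fmt.Errorf("empty pipeline segment")
		}
		base := fields[0]
		if !allowed[base] {
			return fmt.Errorf("command %q is not allowed", base)
		}
		if subs, ok := subcommands[base]; ok {
			if len(fields) < 2 || !contains(subs, fields[1]) {
				return fmt.Errorf("%s: subcommand is not allowed", base)
			}
		}
	}
	return nil
}

func main() {
	for _, c := range []string{"cat /etc/hosts | grep localhost", "systemctl restart sshd", "ls; rm -rf /"} {
		fmt.Printf("%q -> %v\n", c, validateCommand(c))
	}
}
```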

Source: fluid-daemon/internal/readonly/validate.go

Layer 2: Server-side restricted shell

A bash script installed at /usr/local/bin/fluid-readonly-shell on source VMs acts as the login shell for the fluid-readonly user. It:

  1. Denies interactive login (requires SSH_ORIGINAL_COMMAND)
  2. Blocks command substitution, subshells, output redirection, and newlines
  3. Parses the command on pipe/semicolon/&&/|| boundaries
  4. Checks each segment against a blocklist of destructive command patterns

Blocked command categories (regex patterns on each pipeline segment):

  • Privilege escalation: sudo, su
  • File mutation: rm, mv, cp, dd, chmod, chown, chgrp
  • Process control: kill, killall, pkill, shutdown, reboot, halt, poweroff
  • User management: useradd, userdel, usermod, groupadd, groupdel, passwd
  • Disk operations: mkfs, mount, umount, fdisk, parted
  • Network tools: wget, curl, scp, rsync, ftp, sftp
  • Interpreters/shells: python, perl, ruby, node, bash, sh, zsh, dash, csh
  • Editors: vi, vim, nano, emacs
  • Build tools: make, gcc, g++, cc
  • Package installation: apt install/remove/purge, apt-get, dpkg -i/--install/--remove/--purge, rpm -i/--install/-e/--erase, yum, dnf, pip install/uninstall
  • Service mutation: systemctl start/stop/restart/reload/enable/disable/daemon/mask/unmask/edit/set
  • Firewall: iptables, ip6tables, nft
  • Write tools: sed -i, tee, install

Source: fluid-daemon/internal/readonly/shell.go

Layer 3: SSH principal separation

Source VM credentials use the "fluid-readonly" principal. The sshd on source VMs is configured with:

  • TrustedUserCAKeys /etc/ssh/fluid_ca.pub
  • AuthorizedPrincipalsFile /etc/ssh/authorized_principals/%u

Only certificates with the fluid-readonly principal are accepted for the fluid-readonly user. Sandbox certificates (principal "sandbox") cannot authenticate to source VMs.

Source VM preparation (fluid source prepare) is idempotent and performs:

  1. Install restricted shell at /usr/local/bin/fluid-readonly-shell
  2. Create fluid-readonly system user with the restricted shell as login shell
  3. Copy CA public key to /etc/ssh/fluid_ca.pub
  4. Configure sshd to trust the CA key and use per-user authorized principals
  5. Create /etc/ssh/authorized_principals/fluid-readonly containing fluid-readonly
  6. Restart sshd

Source: fluid-daemon/internal/readonly/prepare.go, fluid-daemon/internal/sshkeys/manager.go

VM Isolation

  • Hypervisor: QEMU microVMs provide hardware-level isolation between sandboxes
  • Copy-on-write overlays: sandboxes are linked clones from golden images via qcow2 overlay files, so the source disk is never modified
  • Random MAC addresses: each clone gets a random MAC in the 52:54:00 QEMU prefix via crypto/rand
  • Network isolation: per-sandbox TAP devices attached to a bridge network; optional SSH ProxyJump for isolated networks not directly reachable from the host

Source: fluid-daemon/internal/microvm/manager.go

Secrets Redaction

Both the CLI and daemon include identical redaction packages that strip sensitive data from all outgoing LLM messages and restore tokens in responses before tool execution.

Built-in detectors:

  • SSH private keys (-----BEGIN ... PRIVATE KEY-----)
  • Connection strings: PostgreSQL, MySQL, MongoDB, Redis
  • AWS access keys (AKIA...)
  • API keys (sk-, key-, Bearer)
  • IPv4 and IPv6 addresses

Token format: [REDACTED_CATEGORY_N] - deterministic per category, allowing the LLM to reference redacted values without seeing them.
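
One way to picture the token scheme: each distinct secret gets a stable token, so the LLM can refer to it consistently and the value can be restored later. The Redactor type below and the AWS_KEY category name are hypothetical; only the [REDACTED_CATEGORY_N] shape comes from the description above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// awsKeyPattern matches the AKIA-prefixed access key IDs mentioned above.
var awsKeyPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// Redactor is a hypothetical type, not the redact package's actual API.
type Redactor struct {
	tokenFor map[string]string // secret -> token
	secretOf map[string]string // token -> secret
	nextID   map[string]int    // category -> number of tokens issued
}

func NewRedactor() *Redactor {
	return &Redactor{
		tokenFor: map[string]string{},
		secretOf: map[string]string{},
		nextID:   map[string]int{},
	}
}

// Redact replaces every match with a token; a repeated secret reuses its token.
func (r *Redactor) Redact(category string, pattern *regexp.Regexp, text string) string {
	return pattern.ReplaceAllStringFunc(text, func(secret string) string {
		if tok, ok := r.tokenFor[secret]; ok {
			return tok
		}
		r.nextID[category]++
		tok := fmt.Sprintf("[REDACTED_%s_%d]", category, r.nextID[category])
		r.tokenFor[secret] = tok
		r.secretOf[tok] = secret
		return tok
	})
}

// Restore swaps tokens back to the original values before tool execution.
func (r *Redactor) Restore(text string) string {
	for tok, secret := range r.secretOf {
		text = strings.ReplaceAll(text, tok, secret)
	}
	return text
}

func main() {
	r := NewRedactor()
	out := r.Redact("AWS_KEY", awsKeyPattern, "key=AKIAABCDEFGHIJKLMNOP again=AKIAABCDEFGHIJKLMNOP")
	fmt.Println(out)
	fmt.Println(r.Restore(out))
}
```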

Configurability: custom regex patterns and allowlists can be added.

Source: fluid-cli/internal/redact/, fluid-daemon/internal/redact/

Human Approval Workflow

The TUI enforces human-in-the-loop confirmation for potentially dangerous operations via blocking dialogs. All dialogs default to "No"; Escape maps to "No".

Network access: blocking dialog before commands using curl, wget, nc, ssh, scp, rsync, and similar network tools. Default: deny.

Resource limits: warning dialog when sandbox creation exceeds available memory, CPU, or storage. Default: deny.

Source VM preparation: confirmation before running fluid source prepare. Default: deny.

Source: fluid-cli/internal/tui/confirm.go, fluid-cli/internal/tui/agent.go

Hash-Chained Audit Log

Append-only JSONL audit log at ~/.config/fluid/audit.jsonl with 0600 permissions.

Hash chain: each entry contains a SHA-256 hash computed from the previous entry's hash plus the current entry. The genesis entry uses an all-zeros hash.

Logged events:

  • Session start/end
  • User input (length only, never content)
  • LLM requests and responses
  • Tool calls with arguments, results, and duration

Integrity verification: VerifyChain() validates the entire chain and detects any tampering or insertion.
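
The chaining idea can be sketched in a few lines. This is an illustration under assumptions: the Entry fields, the exact bytes that are hashed, and the reading that the first entry's previous hash is all zeros are guesses, and verifyChain here is not the real VerifyChain().

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Entry is a hypothetical, simplified audit record.
type Entry struct {
	PrevHash string
	Payload  string
	Hash     string
}

// genesisHash is the all-zeros value used as the first entry's previous hash.
var genesisHash = strings.Repeat("0", 64)

// hashEntry computes SHA-256 over the previous hash plus the current payload.
func hashEntry(prev, payload string) string {
	sum := sha256.Sum256([]byte(prev + payload))
	return hex.EncodeToString(sum[:])
}

func appendEntry(log []Entry, payload string) []Entry {
	prev := genesisHash
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, Entry{PrevHash: prev, Payload: payload, Hash: hashEntry(prev, payload)})
}

// verifyChain recomputes every hash and checks each link to its predecessor.
func verifyChain(log []Entry) error {
	prev := genesisHash
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != hashEntry(e.PrevHash, e.Payload) {
			return fmt.Errorf("chain broken at entry %d", i)
		}
		prev = e.Hash
	}
	return nil
}

func main() {
	var log []Entry
	for _, p := range []string{"session_start", "tool_call", "session_end"} {
		log = appendEntry(log, p)
	}
	fmt.Println("intact:", verifyChain(log))
	log[1].Payload = "tampered"
	fmt.Println("after edit:", verifyChain(log))
}
```

Because each hash covers the previous one, editing or inserting an entry invalidates every later link unless the attacker rewrites the rest of the file.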

Size protection: configurable max file size; events are dropped when the limit is reached.

Source: fluid-cli/internal/audit/, fluid-daemon/internal/audit/

MCP Input Validation

All MCP tool inputs are validated before execution.

Shell argument validation: rejects empty strings, arguments over 32 KB, null bytes, and control characters.
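
A minimal sketch of these four checks. validateShellArg is a hypothetical name, and treating every Unicode control character (including tab and newline) as forbidden is an assumption about what the real validator considers a control character.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// maxArgLen is the 32 KB limit described above.
const maxArgLen = 32 * 1024

func validateShellArg(arg string) error {
	if arg == "" {
		return errors.New("argument is empty")
	}
	if len(arg) > maxArgLen {
		return fmt.Errorf("argument is %d bytes, limit is %d", len(arg), maxArgLen)
	}
	for _, r := range arg {
		if r == 0 {
			return errors.New("argument contains a null byte")
		}
		if unicode.IsControl(r) {
			return fmt.Errorf("argument contains control character %U", r)
		}
	}
	return nil
}

func main() {
	for _, a := range []string{"hello", "", "a\x00b", "a\x1b[31mb"} {
		fmt.Printf("%q -> %v\n", a, validateShellArg(a))
	}
}
```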

Shell escaping: POSIX single-quote wrapping for all shell arguments.

File path validation: paths must be absolute, must not contain .. after cleaning, and must not contain null bytes.

File size limit: 10 MB maximum for file operations.

Source: fluid-cli/internal/mcp/validate.go

Config File Security

Permission checking: warns if config files are group- or world-readable (should be 0600).

Secret detection: flags insecure permissions when API keys or tokens are present in the config.

File creation: config files are saved with 0600 permissions.

Source: fluid-cli/internal/config/config.go

Telemetry Privacy

Telemetry is enabled by default (opt-out). Disable via telemetry.enable_anonymous_usage: false in config or ENABLE_ANONYMOUS_USAGE=false env var.

  • Requires build-time API key injection; defaults to a no-op service otherwise
  • Persistent anonymous UUID at ~/.config/fluid/telemetry_id for cross-session correlation
  • $ip is set to 0.0.0.0 to prevent IP logging
  • Tracks only: tool names, message counts, OS/arch
  • Never collects: commands, file contents, IP addresses, hostnames, user input
  • Daemon redaction scope: daemon audit uses built-in detectors only; CLI custom redaction patterns (redact.custom_patterns) do not apply on the daemon side

Source: fluid-cli/internal/telemetry/, fluid-daemon/internal/telemetry/

API Authentication

Password authentication: bcrypt with cost factor 12, minimum 8-character passwords, generic error messages to prevent user enumeration.

Session tokens: 32 cryptographically random bytes; only the SHA-256 hash is stored server-side. Cookies are HttpOnly, Secure, SameSite=Strict.
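
The store-only-the-hash pattern looks roughly like this. The function names and the hex encoding are assumptions for illustration; the source only states the 32 random bytes and the SHA-256 hash.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken returns the value that would be stored server-side.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// newSessionToken returns the token for the cookie and the hash for the database.
func newSessionToken() (token, storedHash string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	token = hex.EncodeToString(raw)
	return token, hashToken(token), nil
}

// verifyToken hashes the presented token and compares it in constant time.
func verifyToken(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashToken(presented)), []byte(storedHash)) == 1
}

func main() {
	token, stored, err := newSessionToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verifyToken(token, stored))
	fmt.Println("forged:", verifyToken("not-the-token", stored))
}
```

A database leak then exposes only hashes, which cannot be replayed as session cookies.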

OAuth (GitHub, Google): CSRF state parameter uses 32 random bytes with constant-time comparison. OAuth tokens are encrypted at rest.

Host tokens: SHA-256 hashed in the database with expiry enforced at lookup time.

Source: api/internal/auth/

API Authorization (RBAC)

Three roles with numeric levels: owner (3), admin (2), member (1).

  • Per-resource membership verification on every request
  • Escalated operations (create sandbox, manage hosts) require admin or higher
  • Organization deletion: owner-only
  • Role checks use numeric comparison for consistent enforcement
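
The numeric comparison can be sketched as below. The Role type and AtLeast method are hypothetical names; only the three levels come from the text above.

```go
package main

import "fmt"

// Role levels as listed above.
type Role int

const (
	RoleMember Role = 1
	RoleAdmin  Role = 2
	RoleOwner  Role = 3
)

// AtLeast reports whether r meets or exceeds the required level.
func (r Role) AtLeast(required Role) bool {
	return r >= required
}

func main() {
	fmt.Println("member creates sandbox:", RoleMember.AtLeast(RoleAdmin))
	fmt.Println("admin creates sandbox:", RoleAdmin.AtLeast(RoleAdmin))
	fmt.Println("admin deletes org:", RoleAdmin.AtLeast(RoleOwner))
}
```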

Source: api/internal/rest/, api/internal/store/

API Transport Security

CORS: origin locked to configured frontend URL (not wildcard), credentials allowed.

Rate limiting: per-IP token bucket. Auth routes have custom limits:

  • Registration: 0.1 requests/sec, burst 5
  • Login: 0.2 requests/sec, burst 10
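
To make the rate and burst figures concrete, here is a hand-rolled token bucket. It is a sketch only: the actual ratelimit.go may use a library, and bucket, newBucket, and simulate are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucket holds up to burst tokens and refills at rate tokens per second.
type bucket struct {
	rate   float64
	burst  float64
	tokens float64
	last   time.Time
}

func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills for the elapsed time, then spends one token if available.
func (b *bucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// simulate replays requests at the given offsets (in seconds) against one bucket.
func simulate(rate float64, burst int, offsets []float64) []bool {
	start := time.Date(2026, 3, 14, 0, 0, 0, 0, time.UTC)
	b := newBucket(rate, burst, start)
	out := make([]bool, 0, len(offsets))
	for _, off := range offsets {
		at := start.Add(time.Duration(off * float64(time.Second)))
		out = append(out, b.allow(at))
	}
	return out
}

func main() {
	// Registration limits: 0.1 requests/sec, burst 5.
	fmt.Println(simulate(0.1, 5, []float64{0, 0, 0, 0, 0, 0, 11}))
}
```

With the registration limits, five back-to-back requests succeed, the sixth is refused, and one more token becomes available after roughly ten seconds.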

Proxy IP resolution: X-Forwarded-For only trusted from configured CIDR ranges.

Source: api/internal/rest/server.go, api/internal/rest/ratelimit.go

Encryption at Rest

AES-256-GCM with random nonce for OAuth tokens and Proxmox credentials.
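
A minimal sketch of AES-256-GCM with a random nonce using the Go standard library. The encrypt and decrypt names, and the choice to prepend the nonce to the ciphertext, are assumptions rather than a description of crypto.go.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext || tag.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates before returning plaintext.
func decrypt(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key only; a real key must be random and kept secret
	ct, err := encrypt(key, []byte("oauth-token"))
	if err != nil {
		panic(err)
	}
	pt, err := decrypt(key, ct)
	fmt.Println(string(pt), err)
}
```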

Sensitive fields are excluded from JSON serialization: PasswordHash, tokens, and secrets are all tagged json:"-".

Source: api/internal/crypto/crypto.go

gRPC Security (Control Plane)

Daemon-to-API: optional mTLS with client certificate and custom CA pool. Defaults to insecure for backwards compatibility.

API-to-daemon: host token authentication via stream interceptor. Optional TLS (warns if disabled).

Concurrency limiting: max 64 concurrent command handlers.

Source: api/internal/grpc/, fluid-daemon/internal/agent/

Network Isolation (Daemon)

  • Bridge name validation: ^[a-zA-Z0-9_-]+$ regex
  • Per-sandbox TAP devices: each sandbox gets a dedicated TAP device attached to the bridge
  • Random MAC addresses: QEMU OUI prefix (52:54:00) with crypto/rand for remaining octets
  • IP discovery: reads DHCP leases and ARP table; no direct guest communication required
  • Lease file path sanitization: filepath.Base() prevents path traversal
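
Two of the checks above, bridge name validation and MAC generation, can be sketched like this. The function names are hypothetical; the regex and the 52:54:00 prefix are taken from the list.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"regexp"
)

// bridgeNamePattern is the regex quoted above.
var bridgeNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

func validBridgeName(name string) bool {
	return bridgeNamePattern.MatchString(name)
}

// randomMAC keeps the 52:54:00 prefix and fills the last three octets
// from crypto/rand.
func randomMAC() (string, error) {
	buf := make([]byte, 3)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return fmt.Sprintf("52:54:00:%02x:%02x:%02x", buf[0], buf[1], buf[2]), nil
}

func main() {
	mac, err := randomMAC()
	if err != nil {
		panic(err)
	}
	fmt.Println(mac, validBridgeName("br0"), validBridgeName("br0; reboot"))
}
```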

Source: fluid-daemon/internal/network/

Sandbox Lifecycle (Janitor)

Background TTL enforcement for automatic sandbox cleanup.

  • Default TTL: 24 hours, with per-sandbox override
  • Check interval: every 1 minute
  • Cleanup: destroys expired sandboxes (VM process + storage + state)

Source: fluid-daemon/internal/janitor/

Command Execution Security

  • Shell escaping: environment variable values are single-quote escaped via shellQuote() (replaces ' with '\'')
  • Environment variable name sanitization: safeShellIdent() replaces every character outside [A-Za-z0-9_] with an underscore
  • SSH retry with backoff: transient connection failures retry up to 5 times with exponential backoff (2s initial, 30s max delay)
  • IP conflict detection: before every command execution, the service re-discovers the VM IP and validates it is not assigned to another running or starting sandbox
  • StrictHostKeyChecking disabled: ephemeral VMs have no stable host keys; trust is established via the CA certificate chain instead
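
The first two bullets can be illustrated as follows. The function names match those mentioned above, but the bodies are a reconstruction from the description, not the code in manager.go; exportLine is a hypothetical helper added for the demo.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// shellQuote wraps s in single quotes and rewrites each embedded ' as '\''
// (close the quote, emit an escaped quote, reopen the quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

var unsafeIdentChars = regexp.MustCompile(`[^A-Za-z0-9_]`)

// safeShellIdent replaces anything outside [A-Za-z0-9_] with an underscore.
func safeShellIdent(name string) string {
	return unsafeIdentChars.ReplaceAllString(name, "_")
}

// exportLine is a hypothetical helper that combines both functions.
func exportLine(name, value string) string {
	return "export " + safeShellIdent(name) + "=" + shellQuote(value)
}

func main() {
	fmt.Println(exportLine("MY-VAR;x", "it's $(whoami)"))
}
```

Inside single quotes the shell performs no expansion, so $(whoami) in the value stays literal text.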

Source: fluid-daemon/internal/microvm/manager.go

Path Traversal Prevention

VM names used in filesystem paths are sanitized via sanitizeVMName():

regex: [^A-Za-z0-9_-]  ->  replaced with underscore

This prevents ../ sequences and absolute path injection in source VM names when constructing key directories.
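
A short sketch of why this replacement is enough: once dots and slashes are gone, the joined path cannot leave the key directory. The body below is a reconstruction, and keyDirFor and the /var/lib/fluid/keys base path are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// unsafeVMChars is the regex quoted above.
var unsafeVMChars = regexp.MustCompile(`[^A-Za-z0-9_-]`)

func sanitizeVMName(name string) string {
	return unsafeVMChars.ReplaceAllString(name, "_")
}

// keyDirFor is a hypothetical helper that builds a per-VM key directory.
func keyDirFor(base, vmName string) string {
	return path.Join(base, sanitizeVMName(vmName))
}

func main() {
	fmt.Println(keyDirFor("/var/lib/fluid/keys", "../../etc/passwd"))
	fmt.Println(keyDirFor("/var/lib/fluid/keys", "web-01"))
}
```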

Source: fluid-daemon/internal/sshkeys/manager.go

File Permissions Summary

| Asset | Permission | Notes |
|---|---|---|
| CA private key | 0600 | Enforced at initialization; 0400 also accepted |
| CA public key | 0644 | Readable by sshd on VMs |
| Key directories | 0700 | Per-sandbox and per-source-VM |
| Private keys | 0600 | Validated before every SSH connection |
| Certificates | 0644 | Standard SSH certificate permissions |
| CA work directory | 0700 | Temp directory for certificate operations |
| Restricted shell | 0755 | Executable on source VMs |
| Config file | 0600 | Warns if group/world readable |
| Audit log | 0600 | Append-only JSONL |
| State DB | default | SQLite, unencrypted |

Timeouts

| Operation | Default | Notes |
|---|---|---|
| Command execution | 10 minutes | Configurable per-call |
| IP discovery | 2 minutes | Polls DHCP leases or ARP table |
| SSH readiness | 60 seconds | Exponential backoff probes after IP discovery |
| SSH connect | 15 seconds | Per-connection ConnectTimeout |
| Certificate TTL | 30 minutes | Max 60 minutes, min 1 minute |
| Credential refresh | 30 seconds before expiry | Auto-regenerates keys and certificates |
| Sandbox TTL | 24 hours | Per-sandbox override; janitor enforced |
| Janitor interval | 1 minute | Background cleanup cycle |
| OAuth state cookie | 600 seconds | CSRF state parameter lifetime |
| Rate limiter cleanup | 10 minutes | Expired per-IP bucket removal |

@claude

claude Bot commented Mar 14, 2026

Code Review

Overall this is a solid, well-structured PR with good test coverage. The wizard UX, ESC cancellation, and private key redaction are all well done. A few issues worth addressing:

Bug: openssl req is blocked in shell but allowed in validate.go

Files: fluid-cli/internal/readonly/shell.go and validate.go (same in fluid-daemon)

shell.go adds a pattern to BLOCKED_PATTERNS that blocks every openssl req invocation at the shell level, whereas validate.go allows openssl req and blocks only the -new, -signkey, and -x509 flags. The comment in validate.go says req is allowed for read-only inspection, but the shell pattern rejects all req invocations, including: openssl req -text -noout

The TestValidateCommand_Allowed test for that command passes at the Go level but the command will be rejected at runtime by the shell.

Fix: Either tighten the shell patterns to block only the specific dangerous flags, or remove req from subcommandRestrictions and rely entirely on the shell block. The two layers currently contradict each other.

Security: run_source_command output bypasses private key redaction

File: fluid-cli/internal/tui/agent.go

redactPrivateKeys() is applied to read_source_file and read_file. But an LLM could request 'cat /etc/ssl/private/server.key' via run_source_command, and that stdout would reach the LLM unredacted. The same redaction should be applied to command stdout/stderr before returning the result map.
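
A sketch of what applying the redaction to command output might look like. The regex, the placeholder text, and the names redactPEMPrivateKeys and sanitizeCommandResult are hypothetical stand-ins; the PR's own redactPrivateKeys() may behave differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// pemPrivateKey matches PEM blocks whose label ends in "PRIVATE KEY"
// (RSA, EC, OPENSSH, PKCS8, ...). Certificates do not match.
var pemPrivateKey = regexp.MustCompile(
	`(?s)-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----`)

func redactPEMPrivateKeys(s string) string {
	return pemPrivateKey.ReplaceAllString(s, "[REDACTED PRIVATE KEY]")
}

// sanitizeCommandResult redacts both streams before they reach the LLM.
func sanitizeCommandResult(stdout, stderr string) map[string]string {
	return map[string]string{
		"stdout": redactPEMPrivateKeys(stdout),
		"stderr": redactPEMPrivateKeys(stderr),
	}
}

func main() {
	out := "-----BEGIN RSA PRIVATE KEY-----\nMIIB...\n-----END RSA PRIVATE KEY-----\n"
	fmt.Print(sanitizeCommandResult(out, "")["stdout"])
}
```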

Minor: Duplicate step numbering in runConnect

File: fluid-cli/cmd/fluid/main.go

Both the Connect-and-health-check block and the health timeout block are labeled // 1. The second should be // 2.

Minor: withAutoReadOnly closure captures cmdErr redundantly

File: fluid-cli/internal/tui/agent.go

In the run_source_command branch, cmdErr is assigned inside the closure (by capture) and then again from the return value of withAutoReadOnly. Both end up with the same value so there is no bug, but using a local variable inside the closure would make the flow easier to follow.

Concern: SetSandboxService caller contract is unenforced

File: fluid-cli/internal/tui/agent.go

The doc comment says 'Must be called after Cancel() to avoid race conditions,' but there is no runtime protection if a caller skips Cancel() first. If the agent is mid-execution when SetSandboxService closes the old service, this could cause a crash. A lightweight protection would be asserting a.cancelFunc == nil at the start of SetSandboxService.
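
One possible shape for such a guard, returning an error instead of crashing. Everything here except the SetSandboxService, Cancel, and cancelFunc names is hypothetical, including the Agent struct layout and the SandboxService interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// SandboxService is a hypothetical stand-in for the real service type.
type SandboxService interface {
	Close() error
}

type fakeService struct{ closed bool }

func (f *fakeService) Close() error { f.closed = true; return nil }

// Agent is a simplified stand-in for the TUI agent.
type Agent struct {
	mu         sync.Mutex
	cancelFunc context.CancelFunc // non-nil while a run is in flight
	service    SandboxService
}

// SetSandboxService refuses to swap services while a run is active,
// turning the documented "call Cancel() first" contract into a checked one.
func (a *Agent) SetSandboxService(svc SandboxService) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.cancelFunc != nil {
		return errors.New("agent is running; call Cancel() before SetSandboxService")
	}
	if a.service != nil {
		_ = a.service.Close()
	}
	a.service = svc
	return nil
}

func main() {
	a := &Agent{service: &fakeService{}}
	_, cancel := context.WithCancel(context.Background())
	a.cancelFunc = cancel
	fmt.Println("while running:", a.SetSandboxService(&fakeService{}))
	cancel()
	a.cancelFunc = nil
	fmt.Println("after cancel:", a.SetSandboxService(&fakeService{}))
}
```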

Missing test: PrepareWithKey command count

File: fluid-cli/internal/readonly/prepare_test.go

The journal group step was added to both Prepare and PrepareWithKey, and the Prepare tests are correctly updated (10 to 11 commands). There is no command-count assertion for PrepareWithKey. Worth adding to catch future regressions. Also: fluid-daemon/internal/readonly/prepare.go gets the same change - are there daemon-side tests that need updating?

Positive highlights

  • ESC cancellation is cleanly implemented. The TUI correctly guards against in-flight messages (ToolCompleteMsg, CommandOutputChunkMsg, etc.) that arrive after state returns to idle.
  • netutil.IsLocalHost handles bracketed IPv6, loopback IPs, and empty hosts correctly with thorough tests.
  • redactPrivateKeys test suite is comprehensive: RSA, EC, PKCS8, CRLF, mixed content, multiple keys, and importantly verifies certificates are NOT redacted.
  • withAutoReadOnly cleanly eliminates the duplicated read-only state management that was copy-pasted between runSourceCommand and readSourceFile.
  • CommandOutputStartMsg pre-initializing the live output box before chunks arrive is a nice UX improvement.
  • The --no-save flag on fluid connect is a useful escape hatch for testing connectivity without modifying config.

@claude

claude Bot commented Mar 15, 2026

PR Review: feat: add fluid connect command and TUI connect wizard

Overall this is a solid, well-structured PR with good test coverage for the new netutil, ConnectModel, and redactPrivateKeys code. A few issues worth addressing before merge.

SECURITY

  1. adm group is broader than needed. The prepare.go change adds fluid-readonly to systemd-journal and adm. On many distros, adm grants read access to all of /var/log/, not just journal logs. Consider restricting to systemd-journal only, or documenting the intent explicitly.

  2. openssl s_client allows arbitrary outbound connections. openssl s_client -connect is now allowlisted and could probe internal network hosts from a source VM. Worth a comment acknowledging this accepted risk.

  3. validateOpenSSLArgs comment is misleading. The comment says dangerous openssl req operations are blocked by the shell-level blocklist in shell.go, but validateOpenSSLArgs also blocks them at the Go level. Both layers are correct, but the comment implies only shell-level exists. Should clarify both defenses are active since the Go-level validation is more reliable.

CORRECTNESS

  1. SensitiveContentRedactedMsg shows empty path for command output. In model.go the system message uses msg.Path which is empty for command stdout/stderr redaction (not file reads), producing: 'Private key detected in - redacted before sending to LLM'. Should fall back to msg.Host or use a different message format.

  2. attemptConnect uses a single context for two sequential RPC calls. In connect.go a single 10-second context is shared between Health() and GetHostInfo(). If Health takes 9 seconds, GetHostInfo only gets 1 second. Give each call its own timeout, as the CLI runConnect already does correctly.

  3. SetSandboxService panics in production code. The panic on cancelFunc != nil documents the contract but is harsh for production. Consider returning an error so callers can handle it gracefully.

CODE QUALITY

  1. Duplicate liveOutputCommand extraction logic. The block extracting command or path from m.currentToolArgs appears identically in both the CommandOutputStartMsg and CommandOutputChunkMsg handlers. Should be a private helper on Model.

  2. No test for runConnect (CLI). The TUI ConnectModel has good test coverage, but the CLI runConnect function in main.go has no tests. At minimum, the address normalization logic (appending :9091 when port is absent) should have a unit test.

  3. Silent overwrite without confirmation (CLI). fluid connect silently overwrites an existing config entry with the same name or address. A warning or --force flag would prevent accidental misconfigurations.

  4. tlsDebuggingGuidance may be appended twice to the system prompt. In agent.go, tlsDebuggingGuidance is appended in one branch when there are no sandbox hosts, and again in a second block when there are both prepared hosts and sandbox hosts. Verify these are mutually exclusive or add a guard.

  5. Duplicate openssl changes across fluid-cli and fluid-daemon. validate.go and shell.go changes are copy-pasted between both packages (existing pattern). Any future openssl subcommand change needs updating in two places.
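For the address normalization mentioned in item 2 above, a testable helper might be shaped like the sketch below. `normalizeAddress` and `defaultDaemonPort` are hypothetical names; only the 9091 default comes from this review, and the IPv6 handling is an assumption about desired behavior rather than what runConnect does today.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// defaultDaemonPort is the port appended when the user omits one.
const defaultDaemonPort = "9091"

// normalizeAddress returns addr in host:port form, appending the default
// port when none is present.
func normalizeAddress(addr string) (string, error) {
	addr = strings.TrimSpace(addr)
	if addr == "" {
		return "", fmt.Errorf("address is empty")
	}
	if host, port, err := net.SplitHostPort(addr); err == nil {
		if port == "" { // "host:" parses with an empty port
			return net.JoinHostPort(host, defaultDaemonPort), nil
		}
		return addr, nil
	}
	// No port: strip optional IPv6 brackets so JoinHostPort can re-add them.
	host := strings.TrimSuffix(strings.TrimPrefix(addr, "["), "]")
	return net.JoinHostPort(host, defaultDaemonPort), nil
}

func main() {
	for _, in := range []string{"daemon.internal", "daemon.internal:7000", "[::1]", "::1"} {
		out, err := normalizeAddress(in)
		fmt.Printf("%q -> %q (err=%v)\n", in, out, err)
	}
}
```

Pulling the logic into a pure function like this is what makes a table-driven test straightforward.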

MINOR

  • The lefthook.yaml fix using go env GOPATH for gofumpt is a good improvement.
  • haproxy-ssl-debug.sh is a useful demo script; consider a note at the top marking it as development/testing only.

Generated with Claude Code

@claude

claude Bot commented Mar 15, 2026

PR Review: feat: add fluid connect command and TUI connect wizard

Overall this is a solid, well-structured PR with good test coverage. The TUI wizard is cleanly implemented using the Bubbletea model pattern and the private key redaction is an important security feature. A few issues worth addressing.

Bug / Inconsistency: fluid-cli vs fluid-daemon group membership mismatch -- fluid-cli/internal/readonly/prepare.go adds only systemd-journal (adm omitted as overly broad per comment), but fluid-daemon/internal/readonly/prepare.go adds systemd-journal,adm. Pick one behavior consistently. The adm group provides broad read access to /var/log and is more permissive.

Race Condition: withAutoReadOnly accesses unprotected shared state -- withAutoReadOnly modifies a.currentSourceVM, a.autoReadOnly, and a.readOnly without any lock, but FluidAgent.Run executes as a goroutine (tea.Cmd). cancelMu only protects cancelFunc. If Cancel() is called while withAutoReadOnly is mid-execution, these fields could be read/written from two goroutines simultaneously.

Security Note: openssl s_client can reach arbitrary hosts -- The allowlist in validate.go permits openssl s_client -connect arbitrary-host:port. In a read-only shell this lets an agent initiate outbound TCP to any host:port. Consider whether this fits your threat model or restrict to localhost.

Minor Security: insecure flag help text -- The --insecure flag on connectCmd should include a stronger warning, e.g. appending (INSECURE: use only for local/dev daemons) so it is visible in --help output.

Missing Test Coverage: runConnect has no unit tests -- The core CLI flow in runConnect (address normalization, duplicate-host detection by name OR address, config save logic, fallback naming) is untested. The TUI model tests are thorough but this function contains meaningful logic that could use table-driven tests similar to TestResolveAddress.

Minor: Misleading comment in fluid-daemon/validate.go -- The comment for openssl req in subcommandRestrictions says only the shell-level blocklist blocks dangerous operations, omitting that validateOpenSSLArgs also blocks them at the Go level. The fluid-cli version correctly mentions both layers.

What's Good:

  • Private key redaction is well-implemented with comprehensive test coverage across key types (RSA, EC, PKCS8, CRLF).
  • netutil.IsLocalHost correctly handles all address forms, all tested.
  • Agent ESC cancellation cleanly threads through cancelMu and drops in-flight results after cancel.
  • colorFunc refactor is a clean deduplication.
  • ConnectModel tests cover tab navigation, toggle, escape, and address resolution edge cases.
  • OpenSSL validation with two-layer defense (Go-level validateOpenSSLArgs plus shell-level blocklist) is the right approach.
  • CommandOutputStartMsg cleanly pre-initializes the live output box before chunks arrive.

@claude

claude Bot commented Mar 15, 2026

PR Review: feat: add fluid connect command and TUI connect wizard

Good overall implementation. The code is well-structured, thread-safety has been improved, and test coverage is solid. A few issues worth addressing:


Bugs / Correctness

1. Potential service leak on ESC-during-connect race (medium)

In connect.go, pressing ESC during StepConnecting sends ConnectCloseMsg{Saved: false}, which causes model.go to call svc.Close() on m.connectModel.GetService(). However, attemptConnect runs as a goroutine — if it completes after the modal closes, a ConnectHealthResultMsg will still be processed by the (now-invisible) connectModel, and the new service stored in connectModel.service will never be closed. Same applies to ConnectDoctorResultMsg arriving after ESC.

A guard in model.go's Update could drop these messages when !m.inConnect, closing the service immediately:

case ConnectHealthResultMsg:
    if !m.inConnect {
        // Wizard already closed: release the late-arriving service.
        if msg.Service != nil {
            _ = msg.Service.Close()
        }
        return m, nil
    }

2. AgentCancelledMsg may not reset UI state (medium)

AgentCancelledMsg is new in messages.go and returned from agent.go's Run when cancelled via ESC. The diff doesn't show a corresponding case in model.go's Update. If it falls through unhandled, the UI might remain in a "running" state (spinner stuck, input disabled) after cancellation. Confirm AgentCancelledMsg is handled identically to AgentDoneMsg in model.go.


Code Quality

3. Duplicate upsert logic in model.go

model.go lines ~2172–2181 re-implement the name/address upsert inline. upsertSandboxHost() already exists for this exact purpose. Use it:

m.cfg.SandboxHosts = upsertSandboxHost(m.cfg.SandboxHosts, closeMsg.Config)

4. withAutoReadOnly unlocks then calls sendStatus with no re-lock

This is fine for avoiding deadlock, but the pattern of unlocking mid-function (two unlock call sites) is fragile. Consider restructuring to determine whether to send status while holding the lock, then send after release — same behavior but easier to reason about.
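One way to picture that restructuring, as a sketch only: the `agent` type below is a trimmed, hypothetical stand-in for FluidAgent, and the buffered `status` channel stands in for sendStatus.

```go
package main

import (
	"fmt"
	"sync"
)

// agent is a hypothetical, trimmed-down stand-in for FluidAgent.
type agent struct {
	mu           sync.Mutex
	readOnly     bool
	autoReadOnly bool
	status       chan string // stand-in for the TUI status callback
}

// enterAutoReadOnly mutates state under the lock, records whether a status
// message is needed, and sends it only after the single unlock.
func (a *agent) enterAutoReadOnly() {
	a.mu.Lock()
	changed := !a.readOnly
	if changed {
		a.readOnly = true
		a.autoReadOnly = true
	}
	a.mu.Unlock() // the only unlock site

	if changed {
		// Sending outside the lock avoids deadlock if the receiver calls back in.
		a.status <- "read-only mode enabled"
	}
}

func main() {
	a := &agent{status: make(chan string, 2)}
	a.enterAutoReadOnly()
	a.enterAutoReadOnly() // already read-only: no second status message
	fmt.Println("messages queued:", len(a.status))
}
```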


Security

5. openssl req restriction uses -signkey (historical flag)

validateOpenSSLArgs blocks openssl req -signkey, but -signkey is an openssl x509 option (OpenSSL 3.x keeps it there as an alias for -key); openssl req takes its signing key via -key. The shell-level blocklist covers openssl req.*-signkey, which is belt-and-suspenders, but the Go validator should also check -key in req context to be complete. Not a bypass risk since genrsa/genpkey are blocked at the subcommand level, but worth aligning.

6. Private key redaction is client-side only

The redactPrivateKeys regex catches well-formed PEM blocks but won't catch partially-base64-encoded keys, keys returned as hex, or keys within JSON/YAML. This is noted implicitly by the openssl allowlist not allowing genrsa, but worth a code comment acknowledging the scope of the protection.


Tests

7. upsertSandboxHost test assertion logic is overcomplicated

The assertion block (lines ~356–363) uses if tc.wantLen == 1 || tc.wantLen > len(tc.hosts) to decide whether to check wantName on the last element. This misses verifying the updated entry by index for the update-by-name and update-by-address cases. Consider checking the entry at a specific expected index instead:

wantIndex int  // add to test struct
// then: result[tc.wantIndex].Name != tc.wantName

Minor

  • runConnect in main.go creates a ControlPlaneConfig with DaemonAddress set, but NewRemoteService also takes addr as a separate first argument — double-check that both are used consistently and there's no mismatch if they diverge.
  • The --insecure warning in the flag description is good. Consider printing a visible runtime warning (⚠ using insecure TLS) when the flag is actually used in runConnect, so it shows in logs/output.

Overall the connect wizard UX is clean, the mutex refactor on FluidAgent is the right call, and the openssl allowlist/validator is well-thought-out. The private key redaction is a nice security improvement. Address the service leak and AgentCancelledMsg handling before merging.

@claude

claude Bot commented Mar 15, 2026

fluid.sh

Claude Code for Working on Linux Servers - with human approval.

What This Is

fluid.sh lets AI agents do infrastructure work in isolated VM sandboxes. Agent works autonomously. Human approves before production.

Project Structure

fluid-cli/        # Go CLI - Interactive TUI agent + MCP server
fluid-daemon/     # Go - Background microVM sandbox management daemon
api/              # Go - Control plane REST API + gRPC server
web/              # React - Dashboard UI for monitoring/approval
demo-server/      # Go - WebSocket demo server for interactive docs
proto/            # Protobuf definitions for gRPC services

Testing Required

Every code change needs tests. See project-specific AGENTS.md files for details.

Quick Reference

mprocs                                 # Start all services for dev
cd fluid-cli && make test              # Test CLI
cd fluid-daemon && make test           # Test daemon
cd api && make test                    # Test API
cd web && bun run build                # Build web

Project Docs

  • @fluid-cli/AGENTS.md
  • @web/AGENTS.md
  • @api/AGENTS.md
  • @fluid-daemon/AGENTS.md
  • @demo-server/AGENTS.md

@claude

claude Bot commented Mar 15, 2026

PR Review

Overall this is a well-structured PR with good test coverage. The ESC-to-cancel, thread-safety refactor, and TUI wizard are solid additions. A few things worth addressing:


Bugs / Correctness

1. AgentCancelledMsg can interrupt a newly-started run

In model.go, pressing ESC during StateThinking immediately sets m.state = StateIdle and calls Cancel(). However, Cancel() only sends a cancellation signal - the goroutine continues running asynchronously. If the user immediately submits a new message before AgentCancelledMsg arrives, the model transitions to StateThinking again. When the delayed AgentCancelledMsg from the old run then arrives, m.state != StateIdle is true (new run in progress), so it adds a spurious "Agent stopped." and resets UI state mid-run.

Consider tagging AgentCancelledMsg with a generation counter or run ID so stale cancellations can be ignored.
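A rough sketch of the run-ID idea, using hypothetical lowercase types rather than the real Model and AgentCancelledMsg; the `RunID` field and `currentRun` counter are assumptions made for illustration.

```go
package main

import "fmt"

type state int

const (
	stateIdle state = iota
	stateThinking
)

// agentCancelledMsg carries the ID of the run that was cancelled.
type agentCancelledMsg struct{ RunID uint64 }

// model is a hypothetical, trimmed-down stand-in for the TUI model.
type model struct {
	state      state
	currentRun uint64
	log        []string
}

// startRun bumps the run counter and returns the ID the agent should echo back.
func (m *model) startRun() uint64 {
	m.currentRun++
	m.state = stateThinking
	return m.currentRun
}

// handleCancelled ignores cancellations that belong to an older run.
func (m *model) handleCancelled(msg agentCancelledMsg) {
	if msg.RunID != m.currentRun {
		return // stale: a newer run has started since this one was cancelled
	}
	m.state = stateIdle
	m.log = append(m.log, "Agent stopped.")
}

func main() {
	m := &model{}
	first := m.startRun()
	m.state = stateIdle // ESC pressed: UI returns to idle immediately
	m.startRun()        // user submits a new message before the old run exits

	m.handleCancelled(agentCancelledMsg{RunID: first}) // late message from run 1
	fmt.Println("still thinking:", m.state == stateThinking, "log entries:", len(m.log))
}
```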


2. SetSandboxService race window after Cancel()

The doc comment says "Must be called after Cancel()", but Cancel() only signals - it does not wait for the goroutine to exit. After Cancel() returns, the goroutine may still be mid-tool-call holding a.service. SetSandboxService then closes and replaces that service while the goroutine may still hold a reference.

The cancelFunc != nil guard protects against concurrent starts, but not against the running goroutine's tail. Consider adding a sync.WaitGroup or done channel to Run so callers can wait for actual completion before swapping the service.


3. Double error output in runConnect

Several error paths in runConnect (main.go) both print the error and return it. For example:

fmt.Printf("  %s Health check failed: %v\n", red("[error]"), err)
return err   // Cobra will also print: "Error: health check failed: ..."

This produces duplicate output in the terminal. Either set connectCmd.SilenceErrors = true / SilenceUsage = true, or only propagate via Cobra's error mechanism without the manual fmt.Printf.


Design / Maintainability

4. redactSensitiveKeys in agent.go duplicates internal/redact

The PR adds base64PEMDetector and k8sSecretDetector to internal/redact/patterns.go (Layer 2), and also adds privateKeyRe, sensitiveBase64PEMRe, sensitiveK8sSecretRe with a standalone redactSensitiveKeys function in agent.go (Layer 1). Two implementations to maintain that already diverge slightly. Layer 1 could just call the existing Redactor from internal/redact instead of reimplementing the same patterns.


5. tls.crt should not be treated as a sensitive key

Both k8sSecretRe in agent.go and k8sSecretDetector in patterns.go include "tls.crt" in the redaction list. tls.crt is a public certificate - it contains no secret material. Redacting it actively harms TLS debugging (the agent cannot see the cert chain, SANs, expiry, etc.), which is exactly the use-case the new tlsDebuggingGuidance system prompt is trying to support. Remove tls.crt from the sensitive key list.


6. openssl s_client remote restriction is Go-only

validateOpenSSLArgs correctly blocks openssl s_client -connect remote:port, but shell.go's BLOCKED_PATTERNS array has no corresponding entry. The two-layer defense described in AGENTS.md implies the shell should be the backstop. Since validate.go runs client-side and the shell runs on the server, consider adding a shell-level pattern for s_client as a belt-and-suspenders measure.


Minor

  • liveOutputCommand field is added to Model but does not appear to be used anywhere in this diff - dead field?
  • CommandOutputStartMsg is new in messages.go but I do not see it consumed in model.go's Update switch in this diff - handled elsewhere or a placeholder?
  • The _ = svc.Close() pattern is used correctly throughout.

Positives

  • withAutoReadOnly is a clean abstraction that properly protects readOnly/autoReadOnly/currentSourceVM under the mutex - much better than the previous inline duplicate pattern.
  • colorFunc refactor removes copy-paste color helpers.
  • Test coverage is thorough: UpsertSandboxHost, IsLocalHost, resolveAddress, redactSensitiveKeys, and PrepareWithKey all have proper table-driven tests.
  • Service cleanup on ESC-cancel of the connect wizard (ConnectHealthResultMsg handled when !m.inConnect) is a nice defensive touch.
  • UpsertSandboxHost placed in config.go is the right layer for this logic.

@claude

claude Bot commented Mar 15, 2026

Code Review

Great PR overall. The /connect wizard, inline redaction, openssl allowlisting, and ESC-cancel are all well-structured and tested. A few issues worth addressing before merge.

Race condition in SetSandboxService

agent.go SetSandboxService checks cancelFunc == nil under the lock, releases the lock, waits on doneCh, then re-acquires the lock to swap the service. Between the release and re-acquire a new agent run can start (setting cancelFunc), so the "agent is not running" guarantee is violated.

The fix is to re-check cancelFunc after re-acquiring the lock before swapping the service, and return an error if a new run has started.

openssl s_client is blocked at shell level but allowed at Go level

shell.go adds the pattern "^openssl s_client" to the shell-level blocklist, blocking ALL openssl s_client invocations. But validate.go explicitly allows s_client (restricted to localhost). The two layers are contradictory: Go validation passes "openssl s_client -connect localhost:443" but the shell script will then reject it at runtime.

If the intent is defence-in-depth with both layers restricting to localhost, the shell pattern needs to be more precise. If s_client should be blocked entirely, the Go allowlist entry and its test cases should be removed to avoid confusion.

SensitiveContentRedactedMsg message text is too narrow

The Redactor catches connection strings, API keys, AWS credentials, IPs, and more -- not just private keys. The TUI message in model.go hardcodes "Private key detected" in both branches. Suggest changing to "Sensitive content detected" (or passing the detector category through the message struct) so the message stays accurate as the redactor scope grows.

Minor nits

runConnect double-prints errors: each error branch calls fmt.Printf and then returns err. Cobra RunE will print the returned error a second time. Either print and return nil, or just return fmt.Errorf without the manual print.

extractLiveOutputCommand comment placement: the doc comment appears above formatLiveOutput rather than directly above extractLiveOutputCommand, so go doc will not associate it correctly.

IPv6 bracket edge case in validateOpenSSLArgs: the s_client host extraction uses strings.LastIndex(hostPort, ":") which works for [::1]:443 but gives a wrong result for a bare [::1] without a port. Unlikely in practice, but a test case would make the boundary explicit.
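One hedged alternative to splitting on the last colon is to let `net.SplitHostPort` do the parsing and fall back to bracket stripping when no port is present. `connectHost` and `isLoopbackHost` below are hypothetical helpers, not the project's validateOpenSSLArgs or netutil.IsLocalHost.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// connectHost extracts the host part of an `openssl s_client -connect` value.
func connectHost(hostPort string) string {
	if host, _, err := net.SplitHostPort(hostPort); err == nil {
		return host
	}
	// No port present (e.g. "[::1]" or "::1"): strip optional IPv6 brackets.
	return strings.TrimSuffix(strings.TrimPrefix(hostPort, "["), "]")
}

// isLoopbackHost reports whether host is "localhost" or a loopback IP literal.
func isLoopbackHost(host string) bool {
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, in := range []string{"[::1]:443", "[::1]", "::1", "localhost:443", "example.com:443"} {
		h := connectHost(in)
		fmt.Printf("%-16s host=%-12s loopback=%v\n", in, h, isLoopbackHost(h))
	}
}
```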

What is good

  • Test coverage is comprehensive (connect wizard, redact patterns, UpsertSandboxHost, netutil, validate, prepare command count).
  • withAutoReadOnly consolidation removes duplicated defer logic cleanly.
  • AgentCancelledMsg.RunID stale-cancellation guard is a nice touch.
  • colorFunc refactor reduces duplication.
  • Proper mutex protection on all new shared fields.
  • k8sSecretDetector correctly uses regexp.QuoteMeta so the dot in tls.key is escaped.

Generated with Claude Code

@claude

claude Bot commented Mar 15, 2026

PR Review: feat: add fluid connect command and TUI connect wizard

Overall this is a solid, well-tested feature addition. Good use of the Elm architecture for the wizard, careful mutex discipline in FluidAgent, and thorough test coverage for the new UpsertSandboxHost, IsLocalHost, and ConnectModel logic. A few issues worth addressing before merging:


Bugs

1. Service leak when SetSandboxService fails after a successful connect (model.go)

if svc := m.connectModel.GetService(); svc != nil {
    if agent, ok := m.agentRunner.(*FluidAgent); ok {
        if err := agent.SetSandboxService(svc); err != nil {
            m.addSystemMessage(fmt.Sprintf("Failed to swap sandbox service: %v", err))
            // ← svc is never closed here
        }
    }
}

If SetSandboxService returns an error (e.g., a new run started while waiting), the connection established during the wizard is never closed. Add _ = svc.Close() in the error path.

2. CommandOutputDoneMsg sent without a matching start when RunCommandStreaming errors

In agent.go, the streaming run_source_command path unconditionally sends CommandOutputDoneMsg even on error:

a.sendStatus(CommandOutputDoneMsg{SandboxID: args.Host})
if cmdErr != nil {
    return nil, cmdErr
}

If the command fails before emitting any chunks, the TUI receives a Done without a preceding Start or any Chunk. The model.go CommandOutputDoneMsg handler sets m.liveOutputLines = nil and m.showingLiveOutput = false unconditionally. If live output was never started, m.liveOutputIndex is 0 and the handler would corrupt m.conversation[0]. Move the sendStatus(CommandOutputDoneMsg…) inside the success path, mirroring how the non-streaming runSourceCommand handles this.


Code Duplication

3. validateOpenSSLArgs is duplicated between fluid-cli and fluid-daemon with a subtle difference

fluid-cli/internal/readonly/validate.go and fluid-daemon/internal/readonly/validate.go both define validateOpenSSLArgs. The IPv6 bracket-stripping logic differs slightly:

  • CLI version: checks strings.HasPrefix(hostPort, "[") first
  • Daemon version: calls strings.LastIndex(hostPort, ":") first, then strips brackets

Both have the same edge-case bug for bare IPv6 addresses without brackets: for ::1 with no port, LastIndex splits at the final colon, so the extracted host is : rather than ::1. In practice openssl s_client -connect [::1]:443 is the idiomatic form so this is low risk, but having two copies means the next fix will need to happen twice too.

Consider extracting this to a shared internal/readonly package, or at minimum add a comment noting the copies must stay in sync.


Security Observations

4. k8sSecretDetector regex may redact non-secret data

The pattern ([A-Za-z0-9+/=\s]{64,}) will redact any 64+ char base64-like value in a field named private_key, secret_key, etc. — regardless of whether it's actually a key. Application configs sometimes have these field names for non-secret values (e.g., a long public identifier). Consider requiring the value to decode to something that looks like a PEM block or binary key material before redacting, similar to what base64PEMDetector does. Otherwise it may silently hide legitimate diagnostic output and confuse users.

5. openssl s_client -connect restriction has a bypass via -proxy

openssl s_client supports -proxy <host:port> to tunnel through an HTTP proxy. The current validateOpenSSLArgs only validates -connect targets. A user could use -connect localhost:443 -proxy external.host:8080 to have the daemon reach out to an external host. Consider either:

  • Blocking the -proxy flag in blockedFlags for openssl, or
  • Checking for -proxy in validateOpenSSLArgs
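Either option amounts to a flag scan along these lines. `validateSClientArgs` and `blockedSClientFlags` are hypothetical names, and stripping extra leading dashes and `=value` suffixes is a defensive assumption rather than a statement about how the real validator tokenizes arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedSClientFlags lists s_client flags that can redirect the connection.
var blockedSClientFlags = map[string]bool{
	"proxy": true,
}

// validateSClientArgs rejects any argument that names a blocked flag.
func validateSClientArgs(args []string) error {
	for _, arg := range args {
		if !strings.HasPrefix(arg, "-") {
			continue // positional value, not a flag
		}
		// Normalize defensively: "--proxy" and "-proxy=host:port" both map to "proxy".
		name := strings.TrimLeft(arg, "-")
		if i := strings.IndexByte(name, '='); i >= 0 {
			name = name[:i]
		}
		if blockedSClientFlags[name] {
			return fmt.Errorf("openssl s_client flag -%s is not allowed", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateSClientArgs([]string{"-connect", "localhost:443"}))
	fmt.Println(validateSClientArgs([]string{"-connect", "localhost:443", "-proxy", "external.host:8080"}))
}
```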

Minor

6. Missing test for SetSandboxService and Cancel/RunID interaction

The new mutex-protected cancellation logic in agent.go (Cancel, SetSandboxService, withAutoReadOnly) is behaviorally complex — it's exactly the kind of code that benefits from concurrency tests. Even a basic sequential test that calls Cancel() then SetSandboxService() would catch regressions.

7. UpsertSandboxHost can produce two entries with the same DaemonAddress

If host A has address X and host B has address Y, then upserting host A with address Y matches on name (not address) and updates host A's address to Y. Now both A and B point to Y. No test covers this. If this case should be prevented (probably yes, to avoid ambiguous routing), add a second scan for address conflicts before the name scan, or document the intended behavior explicitly.


Nits

  • runConnect in main.go uses a 5-second health timeout; ConnectModel.attemptConnect uses 10 seconds. Inconsistency is harmless but slightly confusing.
  • connect.go:StepConnecting — when in error state, the step stays as StepConnecting per comment but connErr is the signal shown. The logic is correct but the comment // stay on connecting step to show error could be clearer about when step will next change.

@claude

claude Bot commented Mar 15, 2026

Code Review

Overall this is a solid, well-structured PR with good test coverage. The fluid connect CLI command, TUI wizard, ESC cancellation, and redaction improvements are all well-thought-out. A few issues worth addressing below.


Bugs / Correctness

1. SetSandboxService blocks the Bubbletea event loop (significant)

In model.go, SetSandboxService is called synchronously from Update(). But SetSandboxService can block for up to 2 seconds waiting on doneCh. Bubbletea Update runs on the main goroutine — blocking here freezes the entire TUI. This should be dispatched as a tea.Cmd (goroutine) instead.

2. AgentRunner interface leak via type assertion

The ConnectCloseMsg handler type-asserts to *FluidAgent. This couples Model tightly to a concrete type, bypassing the AgentRunner interface. SetSandboxService should either be added to the AgentRunner interface, or the service swap should be handled through a dedicated message dispatched from outside the model.


Code Quality

3. Duplicated validateOpenSSLArgs between fluid-cli and fluid-daemon

The function is copy-pasted verbatim in both packages (the comment even says keep this s_client block in sync). Since both packages import from the same module, this could live in a shared utility. Copy-paste with a keep-in-sync note is a maintenance trap — especially for security-critical validation logic.

4. UpsertSandboxHost — in-place slice re-use is non-obvious

This is the standard Go filter-in-place idiom and is correct here (write index <= read index always holds). But since this is config-mutation code with a deduplication side-effect, a brief comment explaining why it is safe would help future readers. As written it looks like an accidental mutation of the input slice.
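For readers unfamiliar with the idiom, here is a self-contained sketch with the kind of comment being asked for. The `sandboxHost` type and `upsert` function are hypothetical; in particular, appending the new entry at the end is an assumption and may not match how UpsertSandboxHost orders its result.

```go
package main

import "fmt"

// sandboxHost is a hypothetical stand-in for the config entry type.
type sandboxHost struct {
	Name          string
	DaemonAddress string
}

// upsert drops every entry that shares h's name or address, then appends h.
//
// It filters in place: kept reuses hosts' backing array. This is safe because
// each append writes at an index no greater than the element currently being
// read, so no unread element is overwritten. The caller must use the returned
// slice and treat the input as consumed.
func upsert(hosts []sandboxHost, h sandboxHost) []sandboxHost {
	kept := hosts[:0]
	for _, e := range hosts {
		if e.Name == h.Name || e.DaemonAddress == h.DaemonAddress {
			continue // replaced by h
		}
		kept = append(kept, e)
	}
	return append(kept, h)
}

func main() {
	hosts := []sandboxHost{
		{Name: "a", DaemonAddress: "x:9091"},
		{Name: "b", DaemonAddress: "y:9091"},
	}
	// Re-pointing "a" at y:9091 removes both old entries, so no two hosts share an address.
	fmt.Println(upsert(hosts, sandboxHost{Name: "a", DaemonAddress: "y:9091"}))
}
```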

5. TestSetSandboxService_TimesOut adds 2s to every test run

The test waits for the hard-coded 2-second timeout in SetSandboxService. Consider making the timeout configurable so tests can use a short timeout.


Security

6. Doctor checks fail silently when SSH is not configured

If the user has no SSH alias or key for the host, doctor checks will silently time out or produce confusing errors. A clear early message like Doctor checks require SSH access to the host — skipping would be more user-friendly and avoid the 30-second wait.

7. --insecure flag persisted without re-confirmation

The Insecure: true field is written to config and silently reused on subsequent connects. Consider logging a warning at connection time when loading a host with Insecure: true, so users are not surprised later.


Minor / Nits

  • runConnect timeout pattern: the manual healthCancel() calls are correct but using defer healthCancel() would be cleaner and avoids accidental leaks if error paths are added later.
  • StepConnecting retry does not re-validate address: on retry via Enter, m.resolveAddress() is called but any validation error sets m.connErr = err (not m.addrErr), bypassing the dedicated error field.
  • SensitiveContentRedactedMsg spam: if a command repeatedly outputs sensitive content, the system message will appear on every tool call. Deduplicating within a single agent run would improve UX.

What is well done

  • The 4-step wizard structure in ConnectModel is clean and follows Bubbletea conventions properly.
  • ESC cancellation with runID tracking correctly prevents stale AgentCancelledMsg from corrupting a new run.
  • The withAutoReadOnly refactor is a nice cleanup over the previous duplicated defer blocks.
  • base64PEMDetector and k8sSecretDetector are well-tested with both positive and negative cases.
  • netutil.IsLocalHost correctly handles IPv6 bracket notation.
  • Test coverage is solid throughout — the command count tests and table-driven tests are particularly thorough.

Generated with Claude Code

@aspectrr aspectrr merged commit e5499e4 into main Mar 15, 2026
10 of 12 checks passed