Skip to content

feat(daemon): wire Session Manager + agent shim + RepoResolver + inbox messenger#70

Merged
harshitsinghbhandari merged 1 commit into
stagingfrom
session/aa-39
Jun 1, 2026
Merged

feat(daemon): wire Session Manager + agent shim + RepoResolver + inbox messenger#70
harshitsinghbhandari merged 1 commit into
stagingfrom
session/aa-39

Conversation

@harshitsinghbhandari
Copy link
Copy Markdown
Collaborator

Summary

PR β in our SM-revival planning. PR #62 (merged 2026-06-01) deleted the Session Manager wiring from the daemon — every call to session.New() was removed; only the integration test still constructs one. This brings SM back end-to-end: calling sm.Spawn() now launches a real Claude Code agent in a real git worktree in a real zellij session, and lifecycle nudges actually reach the agent via an inbox file.

Branches from origin/staging (= main + #65) so @yyovil's claudecode adapter is available. PR base is staging.

What's new

Four new pieces:

  • adapters/agent/portshim — bridges the richer adapters/agent.Agent interface (PR Add agent adapters and wire per-session agents into the session manager #65) onto the narrower ports.Agent the SM consumes. POSIX shell-quoting joins argv into the single string the zellij sh -lc wrapper expects.
  • adapters/workspace/gitworktree/projectresolvergitworktree.RepoResolver backed by project.Manager. Lives in its own subpackage so gitworktree stays free of the project import.
  • adapters/messenger/inboxports.AgentMessenger writing each message as <session-workspace>/.ao/inbox/<nano>_<hash>.md. Symlink-safe via os.Lstat on the .ao / .ao/inbox segments.
  • daemon/session_wiring.go — assembles claudecode → portshim, gitworktree over projectresolver, inbox messenger over the sqlite store, and the SM itself. Reuses the existing zellij runtime / project manager / lcm singletons.

Also promotes the duplicated "agentSessionId" metadata key in claudecode and codex to a single agent.MetadataKeyAgentSessionID constant so the portshim has a stable place to populate Session.Metadata for the underlying adapter.

Singleton sharing (the contract this PR enforces)

  • One zellij.Runtime services both terminal mux and SM.Spawn (two adapters would race on the same socket).
  • One *lifecycle.Manager services both the reaper and the SM (two LCMs would split agent-nudge state).
  • One project.Manager services both httpd and the gitworktree RepoResolver.
  • One ports.AgentMessenger services both the LCM (PR-driven reactions) and the SM (Send).

Out of scope (covered by follow-ups)

  • HTTP routes for SM (POST /sessions, etc.) — γ
  • ao session new CLI — γ
  • SCM poller — δ (depends on aa-38)
  • Codex agent wiring (claude-code only)
  • Zellij send-keys pane-ping (agent reads inbox on demand)

References

Test plan

  • go test -race ./... — 323/324 pass (the 1 failure is TestSessionStreamsRealZellijPane, a pre-existing host-env issue: \$TMPDIR > 103 chars exceeds zellij's IPC socket limit; also fails on origin/staging without these changes)
  • go vet ./... clean
  • gofmt -l backend/ clean
  • goimports -l clean
  • portshim tests: 15 cases (shell quoting, env, restore propagation, error fall-through)
  • projectresolver tests: 4 cases (happy / unknown / degraded / interface satisfaction)
  • inbox tests: 9 cases (happy / dir-create / two-distinct / unknown session / empty workspace path / symlinked inbox refused / empty message / filename shape / interface satisfaction)
  • daemon wiring test: SM stack constructed, shares singletons, messenger reaches store-known workspace end to end

🤖 Generated with Claude Code

…x messenger

PR #62 ("simplify session lifecycle and zellij runtime") deleted the Session
Manager wiring from the daemon — every call to session.New() was removed and
only the integration test still constructed one. This brings SM back, end to
end: calling sm.Spawn() now launches a real Claude Code agent in a real git
worktree in a real zellij session, and lifecycle nudges reach the agent via
an inbox file.

Four new pieces:

- adapters/agent/portshim: bridges the richer adapters/agent.Agent interface
  (PR #65, @yyovil) onto the narrower ports.Agent the SM consumes. POSIX
  shell-quoting joins argv into the single string the zellij `sh -lc` wrapper
  expects.
- adapters/workspace/gitworktree/projectresolver: gitworktree.RepoResolver
  backed by project.Manager. Lives in its own subpackage so gitworktree stays
  free of the project import (and the cycle that would create).
- adapters/messenger/inbox: ports.AgentMessenger writing each message as
  <session-workspace>/.ao/inbox/<nano>_<hash>.md. Symlink-safe via os.Lstat
  on the .ao/inbox segments.
- daemon/session_wiring.go: assembles claudecode → portshim, gitworktree
  over projectresolver, inbox messenger over the sqlite store, and the SM
  itself. Reuses the existing zellij runtime / project manager / lcm
  singletons rather than constructing parallel copies.

Daemon-wide singleton sharing (the change of behavior under #62 / #65 +
this PR):

- One zellij.Runtime instance services both the terminal mux and SM.Spawn.
  Two adapters would race on the same socket.
- One lifecycle.Manager instance services both the reaper (runtime liveness
  observations) and the SM (spawn/restore/kill writes). Two LCMs would split
  agent-nudge state.
- One project.Manager instance services both httpd (/api/v1/projects) and
  the gitworktree RepoResolver. Two stores would diverge on cached reads.
- One ports.AgentMessenger services both the LCM (PR-driven reactions:
  CI fail, review feedback, merge conflict) and the SM (Send).
- One *sqlite.Store services CDC, lifecycle, SM, and the inbox workspace
  lookup. Already the case; preserved.

Also promotes the duplicated "agentSessionId" metadata key literal in the
claudecode and codex adapters to a single agent.MetadataKeyAgentSessionID
constant in adapters/agent, which the portshim now uses to populate
Session.Metadata for the underlying adapter's GetRestoreCommand.

What this PR does NOT do (covered by follow-ups γ/δ):
- No HTTP routes for SM (POST /sessions, etc.) — γ
- No `ao session new` CLI — γ
- No SCM poller — δ
- No codex agent wiring (claude-code only) — later
- No zellij send-keys pane-ping — the agent reads its inbox on demand

Tests:
- portshim: 15 table-driven cases (shell quoting, env, restore propagation,
  error fall-through, safe-string short-circuit).
- projectresolver: 4 cases (interface satisfaction, happy path, unknown
  project, degraded project).
- inbox: 9 cases (interface satisfaction, write, dir create, two-distinct,
  unknown session, empty workspace path, symlinked inbox refused, empty
  message, filename shape).
- daemon/wiring_test: SM stack constructed + sharing singletons + messenger
  reaches the same store via SessionMetadata.WorkspacePath end to end.

go test -race / go vet / gofmt clean. The pre-existing
TestSessionStreamsRealZellijPane integration test fails on this host
because \$TMPDIR > 103 chars (zellij IPC socket limit) — also fails on
origin/staging without these changes.

Branched from origin/staging (= main + #65) so claudecode is available;
PR base must be staging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@harshitsinghbhandari harshitsinghbhandari merged commit 82e9111 into staging Jun 1, 2026
6 of 7 checks passed
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR re-wires the Session Manager end-to-end after it was removed in #62, adding four new components: a portshim bridging the richer agent.Agent interface to ports.Agent, a projectresolver subpackage connecting the gitworktree workspace to the project store, an inbox messenger writing lifecycle nudges as files under <workspace>/.ao/inbox/, and session_wiring.go assembling them all from shared daemon singletons.

  • portshim translates argv slices to POSIX-quoted shell strings and adapts context-bearing agent calls to the context-free port interface; it silently returns \"\" on adapter errors since ports.Agent.GetLaunchCommand has no error return, and it zeros the Permissions field in both launch and restore configs because the port interface carries no permission mode.
  • inbox messenger guards against symlinked .ao/.ao/inbox directories via os.Lstat and generates collision-resistant filenames from nanosecond timestamps plus SHA-256 prefixes.
  • daemon.go correctly constructs singletons once and threads them through both startLifecycle and buildSessionStack; the SM stack is built but intentionally unused (_ = ss) pending HTTP route work in a follow-up PR.

Confidence Score: 3/5

The wiring is structurally sound and the singleton contract is correctly enforced, but the portshim silently returns an empty launch command on adapter error, which can cause the Session Manager to record a successful spawn while no agent process is actually running.

The portshim swallows GetLaunchCommand errors and returns an empty string. The SM passes that string directly to runtime.Create without checking it, so if the claude binary is missing the daemon creates a zellij session with sh -lc empty string, marks it spawned, and the caller gets back a valid domain.Session with no agent behind it. The reaper will eventually clean up the zombie, but no error is surfaced at spawn time. The permission-mode zeroing and context.Background() propagation are known port-interface limitations that do not introduce incorrect state.

backend/internal/adapters/agent/portshim/shim.go warrants the closest review — specifically how GetLaunchCommand and GetRestoreCommand handle errors and the absence of permission-mode propagation.

Important Files Changed

Filename Overview
backend/internal/adapters/agent/portshim/shim.go New bridge from richer agent.Agent to ports.Agent; correctly implements POSIX shell quoting but silently drops adapter errors in GetLaunchCommand (returns empty string) and zeros permission modes in both launch and restore paths.
backend/internal/adapters/messenger/inbox/inbox.go New AgentMessenger writing timestamped+hashed .md files under workspace/.ao/inbox/; symlink safety via Lstat on .ao and .ao/inbox segments is sound; TOCTOU window between Lstat and MkdirAll is documented and acceptable given the workspace ownership model.
backend/internal/adapters/workspace/gitworktree/projectresolver/resolver.go New subpackage bridging project.Manager to gitworktree.RepoResolver; cleanly prevents the import cycle; correctly surfaces degraded/unknown project as errors to avoid silent empty-path worktree creation.
backend/internal/daemon/session_wiring.go New wiring file assembling the session stack; singleton contract is correctly enforced by threading shared instances; storeWorkspaceLookup correctly refuses empty workspace paths before inbox writes.
backend/internal/daemon/daemon.go Singletons (runtimeAdapter, projects, messenger) are now constructed once and shared correctly; error-path cleanup for buildSessionStack failure is consistent with the pre-existing httpd error path.
backend/internal/daemon/lifecycle_wiring.go LCM now receives the real messenger (was nil); lcm field exposed on lifecycleStack so the SM can share the same instance; straightforward change with no issues.
backend/internal/adapters/agent/agent.go Promotes the duplicated agentSessionId string literal to MetadataKeyAgentSessionID constant; clean deduplication with no logic changes.
backend/internal/adapters/agent/claudecode/claudecode.go Migrates claudeAgentSessionIDMetadataKey usages to agent.MetadataKeyAgentSessionID; no behavioral change.
backend/internal/adapters/agent/codex/codex.go codexAgentSessionIDMetadataKey now aliases agent.MetadataKeyAgentSessionID instead of duplicating the string literal; const aliasing another const is valid Go and produces the same value.
backend/internal/daemon/wiring_test.go New wiring test verifies singleton sharing via pointer identity and exercises the end-to-end Send path through the shared store; cleanup order (cancel then Stop) is correctly handled to avoid deadlock.

Sequence Diagram

sequenceDiagram
    participant D as daemon.Run
    participant LC as startLifecycle
    participant SS as buildSessionStack
    participant PM as portshim.Shim
    participant CC as claudecode.Plugin
    participant SM as session.Manager
    participant WS as gitworktree.Workspace
    participant RT as zellij.Runtime
    participant IB as inbox.Messenger
    participant ST as sqlite.Store

    D->>LC: startLifecycle(ctx, store, runtime, messenger)
    LC-->>D: "lifecycleStack{lcm, reaperDone}"

    D->>SS: buildSessionStack(cfg, store, runtime, projects, lcm, messenger)
    SS->>PM: portshim.New(claudecode.New())
    SS->>WS: gitworktree.New(projectresolver.New(projects))
    SS->>SM: "session.New(Deps{...})"
    SS-->>D: "sessionStack{sm, workspace, messenger}"

    Note over D: _ = ss (routes in follow-up PR)
    D->>RT: srv.Run(ctx)

    Note over SM,RT: On Spawn()
    SM->>PM: GetLaunchCommand(AgentConfig)
    PM->>CC: GetLaunchCommand(context.Background, LaunchConfig)
    CC-->>PM: argv or err
    PM-->>SM: shell-quoted string or empty string on err
    SM->>WS: Create(WorkspaceConfig)
    WS-->>SM: "WorkspaceInfo{Path, Branch}"
    SM->>RT: "Create(RuntimeConfig{LaunchCommand})"
    RT-->>SM: RuntimeHandle

    Note over SM,IB: On Send()
    SM->>IB: Send(ctx, sessionID, message)
    IB->>ST: WorkspacePath(ctx, id)
    ST-->>IB: workspace path
    IB->>IB: ensureRealDir(.ao, .ao/inbox)
    IB->>IB: WriteFile(nano_hash.md)
Loading

Comments Outside Diff (3)

  1. backend/internal/adapters/agent/portshim/shim.go, line 134-140 (link)

    P1 Adapter error silently becomes an empty launch command

    When s.agent.GetLaunchCommand returns an error (e.g., the claude binary is not on $PATH), the shim returns "". session.Manager.Spawn passes that empty string directly to runtime.Create without checking it, so a zellij session is created with sh -lc "". The pane exits immediately and the SM records a successful spawn. The caller gets a domain.Session object with a live ID, unaware that no agent is actually running. The error from the underlying adapter is entirely swallowed — there is no log line and no observable failure until the reaper cleans up the stale session.

  2. backend/internal/adapters/agent/portshim/shim.go, line 169-175 (link)

    P2 Permission mode is silently zeroed for both launch and restore

    launchConfigFor omits Permissions (leaving it at the zero value ""), and GetRestoreCommand likewise leaves RestoreConfig.Permissions unset. claudecode.appendPermissionFlags is called with an empty PermissionMode, so every session launched or restored via this shim gets whatever default Claude's TUI resolves from the user's ~/.claude/settings.json. A session originally spawned in bypassPermissions mode will silently revert to the configured default on restore, causing Claude to halt and prompt for approval instead of running autonomously. Neither the SM nor the caller is informed of this mode drop. The ports.AgentConfig and ports.Agent.GetRestoreCommand signatures do not carry a permissions field, so the shim cannot fix this without a port interface change — but the loss is currently invisible.

  3. backend/internal/adapters/agent/portshim/shim.go, line 119-128 (link)

    P2 context.Background() prevents cancellation propagation on all adapter calls

    The shim threads context.Background() into every call to the wrapped agent.Agent — including binary lookup in GetLaunchCommand, which may shell out to exec.LookPath. If the daemon is shutting down (context cancelled) or Spawn is cancelled mid-flight, those adapter calls continue running until they complete naturally. The comment acknowledges this as a known limitation ("context-free at its API surface"), but consider at minimum logging a warning when context.Background() is used during an already-cancelled parent call, to make debugging easier.

Reviews (1): Last reviewed commit: "feat(daemon): wire Session Manager + age..." | Re-trigger Greptile

harshitsinghbhandari added a commit that referenced this pull request Jun 1, 2026
Review fixes (PR #71):
- spawn CLI now uses a dedicated 90 s timeout (90 s > server's 60 s
  DefaultRequestTimeout) via context.WithTimeout, and stops sharing
  deps.HTTPClient — that client is sized for fast /healthz/shutdown
  probes (2 s) and was preempting the synchronous Spawn long before the
  daemon could finish provisioning a worktree + zellij pane + agent.
- Harden writeSpawnError so a *project.Error with a non-client Kind
  ("internal", "not_implemented", or anything unknown) falls through to
  the generic 500 SPAWN_FAILED envelope instead of passing the project
  error's Code/Message verbatim to the client. Adds three subtests that
  pin down the opacity contract.

Lint debt cleared (inherited from PRs #65/#70):
- Add doc comments on every exported symbol in the agent / claudecode /
  codex / adapters-registry packages (revive: exported)
- gosec G306/G301: inbox file/dir perms 0644→0600 and 0755→0750
- gosec G703 (path traversal via taint): excluded globally with the same
  rationale as G304 — adapter paths are daemon-config/worktree-derived,
  not user input
- gocritic emptyStringTest: len(strings.TrimSpace(...)) > 0 → != ""
- gocritic paramTypeCombine: combine adjacent same-type params
- errcheck: wrap deferred os.Remove(tmpName) in a closure
- prealloc: preallocate cmd slices on the resume paths
harshitsinghbhandari added a commit that referenced this pull request Jun 1, 2026
* feat(ao): `ao spawn` CLI + POST /api/v1/sessions route

* fix(ao): address PR γ review + clear inherited lint debt

Review fixes (PR #71):
- spawn CLI now uses a dedicated 90 s timeout (90 s > server's 60 s
  DefaultRequestTimeout) via context.WithTimeout, and stops sharing
  deps.HTTPClient — that client is sized for fast /healthz/shutdown
  probes (2 s) and was preempting the synchronous Spawn long before the
  daemon could finish provisioning a worktree + zellij pane + agent.
- Harden writeSpawnError so a *project.Error with a non-client Kind
  ("internal", "not_implemented", or anything unknown) falls through to
  the generic 500 SPAWN_FAILED envelope instead of passing the project
  error's Code/Message verbatim to the client. Adds three subtests that
  pin down the opacity contract.

Lint debt cleared (inherited from PRs #65/#70):
- Add doc comments on every exported symbol in the agent / claudecode /
  codex / adapters-registry packages (revive: exported)
- gosec G306/G301: inbox file/dir perms 0644→0600 and 0755→0750
- gosec G703 (path traversal via taint): excluded globally with the same
  rationale as G304 — adapter paths are daemon-config/worktree-derived,
  not user input
- gocritic emptyStringTest: len(strings.TrimSpace(...)) > 0 → != ""
- gocritic paramTypeCombine: combine adjacent same-type params
- errcheck: wrap deferred os.Remove(tmpName) in a closure
- prealloc: preallocate cmd slices on the resume paths

* fix(lint): lift loop condition in scm poller test (staticcheck QF1006)

Inherited from PR #72 merging into staging after this branch opened.
golangci-lint v2.12.2 → 0 issues.
harshitsinghbhandari added a commit that referenced this pull request Jun 1, 2026
…s wiring

Main moved structurally via PR #67 (session→session_manager rename + service
layer for read-model assembly). Staging had the live SM daemon wiring (PR #70)
but for the old session package shape. This merge adopts main's shape and
retargets staging's wiring on top.

Structural decisions:
- Keep main's httpd/api.go shape with controllers.SessionService (drop
  staging's session.Spawner).
- Keep main's rich httpd/controllers/sessions.go (list/get/spawn/restore/kill/
  send/rename/spawnOrchestrator) and its tests.
- Keep main's SCM provider/client/test code: the pagination guard, the
  merge_state_status REST field, and the application/vnd.github+json Accept
  header are intentional safety + correctness fixes. Staging's text/plain
  Accept tweak would 406 the /actions/jobs/{id}/logs endpoint.
- Delete session.Spawner: service.Session is the new controller contract and
  duck-types into controllers.SessionService.

Wiring retarget (daemon/session_wiring.go, daemon/daemon.go):
- buildSessionStack now constructs *session_manager.Manager and wraps it in
  service.NewSession(sm, store); the stack returns *service.Session instead of
  a bare *Manager.
- daemon.Run passes ss.svc into httpd.APIDeps{Sessions: ...}.

Preserved from staging:
- aa-46 default-branch fix in session_manager/manager.go::Spawn (branch
  defaults to "ao/<sessionID>" when SpawnConfig.Branch is empty) plus its two
  pinning tests (TestSpawn_DefaultsBranchPerSession_WhenUnset,
  TestSpawn_HonorsExplicitBranch).
- ao spawn CLI + POST /api/v1/sessions route, ao-here.sh helper, agent
  adapters (claudecode + portshim), inbox messenger, projectresolver,
  gitworktree daemon wiring, SCM poller, find_branch_pr, ao spawn route fix.

Import migration: internal/session → internal/session_manager updated in
daemon/session_wiring.go and adapters/agent/portshim/shim_test.go.

Verification: go build, go vet, gofmt all clean. go test -race ./... reports
406 pass / 1 fail (TestSessionStreamsRealZellijPane — pre-existing macOS
$TMPDIR > 103-char zellij IPC socket length issue, not introduced by merge).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant