Skip to content

feat(remote-mcp): Step 1 — interactive OAuth + Connection abstraction + security baseline#175

Merged
mgoldsborough merged 9 commits into
mainfrom
feat/interactive-oauth
May 7, 2026
Merged

feat(remote-mcp): Step 1 — interactive OAuth + Connection abstraction + security baseline#175
mgoldsborough merged 9 commits into
mainfrom
feat/interactive-oauth

Conversation

@mgoldsborough
Copy link
Copy Markdown
Contributor

@mgoldsborough mgoldsborough commented May 6, 2026

Step 1 of the remote MCP connections work. Lands the application-layer machinery for DCR-based OAuth-protected remote MCP servers, the per-principal Connection abstraction that Steps 2 + 3 build on, and the security plumbing every later step depends on.

Unlocks Granola and Notion (workspace-scope only — per-member scoping is Step 3).

What's in this branch

  • Connection abstraction. New per-principal Connection type with state machine (starting → pending_auth → running → crashed → dead | stopped), BundleInstance.connections: Map<string, Connection>, summary state derivation. Step 1 always populates with a single _workspace entry; the multi-entry shape is forward-compatible with member-scope (Step 3).
  • Interactive OAuth state machine. WorkspaceOAuthProvider.onInteractiveAuthRequired callback + flow registry + mcp-source UnauthorizedError retry. installRemote and the workspace-startup loop kick off start() non-blocking so the install API doesn't hang on user-interactive auth.
  • Routes. POST /v1/mcp-auth/initiate (workspace-authed; sets session-bound HttpOnly state cookie; returns {authorizationUrl}). GET /v1/mcp-auth/callback (verifies cookie matches sha256(state) before consulting the flow registry).
  • SSE event surface. connection.state_changed per principal — workspace-scope shows one event stream per bundle (principal _workspace); member-scope (Step 3) will show one per active member. Routed through a declarative SSE_ROUTES table in events.ts so workspace-scoped events filter by wsId and unrouted types stay internal.
  • Web banner. Pending-auth banner per Connection in pending state. Click Connect → POST /initiatewindow.location = authorizationUrl. Banner clears when the connection transitions to running.
  • Security primitives. CredentialStore interface + FileCredentialStore (plaintext-disk, mode 0o600 / dir 0o700, atomic writes) — introduced as the boundary that admits a SaaS-encrypted impl in a future swap. Redacted<T> wrapper consumed at the CredentialStore boundary for logging hygiene; threading it through the OAuth provider's token / DCR client_secret / PKCE verifier caches is a follow-up.

Verification

  • bun run verify:static clean (1 pre-existing warning in url-validator.ts)
  • bun run test:unit — 2390 pass, 0 fail
  • bun run test:integration — 460 pass, 6 skip, 0 fail
  • bun run smoke — 17 pass, 0 fail
  • E2E against real Granola: bun run dev:empty with workspace.json containing the Granola URL + NB_API_URL=http://localhost:27249 → bundle starts → pending_auth banner appears → Connect → OAuth flow → tokens persist to mcp-oauth/granola/{client,tokens,verifier}.json → bundle transitions to running → 6 tools registered (granola__list_meetings, granola__get_meetings, granola__get_meeting_transcript, granola__list_meeting_folders, granola__query_granola_meetings, granola__get_account_info) → nb__search finds them. Restart preserves auth (starting → running directly).

What's NOT in this PR

  • Pre-registered OAuth clients (Step 2) — needed for HubSpot, Asana, Gmail, Outlook, Zoom
  • Per-member credential scoping (Step 3) — needed for personal Granola, personal Gmail, etc.
  • Connections page UI / catalog (Track C) — workspace.json editing is the only path to add a connection in this PR

PR #166 (feat/connections-tracks) is stacked on this and will follow.

Migration / breaking changes

None. New routes, new types, new files. Existing workspaces with no remote URL bundles see zero behavior change.

Lays the groundwork for REMOTE_MCP_CONNECTIONS.md Step 1 without yet
touching the OAuth state machine itself.

- Redacted<T> wrapper for sensitive values: overrides toString, toJSON,
  and util.inspect so secrets render as [redacted] in any logger path
  (S5 in the spec). reveal() is the only sanctioned exit.
- CredentialStore interface + FileCredentialStore (plaintext-disk) impl
  for opaque secrets like OAuth client_secret. Files at
  workspaces/<wsId>/credentials/secrets/<key>, mode 0o600, atomic
  rename, key/wsId path-traversal validation. Boundary that admits a
  SaaS encrypted impl later.
- BundleState gains pending_auth. EngineEventType gains
  bundle.pending_auth. Both unwired for now; the provider/lifecycle
  hookup is the next chunk.
Per first-principles review: bundle-level state breaks under member-scoped
OAuth (Step 3) where each member has independent auth state. A's RT can
expire while B's stays valid — no single 'running' / 'pending_auth' for
the bundle. Lay down the Connection abstraction now so Step 3 is additive.

- Connection: { principalId, state, source, authorizationUrl?, lastError? }
  state machine matches BundleState. WORKSPACE_PRINCIPAL_ID = '_workspace'
  for the workspace-shared identity.
- BundleInstance.connections: Map<string, Connection>. Step 1 always uses
  one entry keyed '_workspace'; Step 3 lights up multi-entry maps for
  member-scope. BundleInstance.state becomes a derived summary —
  summarizeConnectionState applies 'any running wins' rules.
- BundleInstance.oauthScope plumbed through types. Step 1 always treats
  declared value as 'workspace' regardless.
- EngineEventType: connection.state_changed (replaces a planned
  bundle.pending_auth). Forward-compatible for Step 3 — same event shape
  per principal whether workspace or member scope.
Wires the interactive OAuth flow end-to-end on the server side:

- WorkspaceOAuthProvider: onInteractiveAuthRequired callback. When the
  headless redirect probe doesn't land on our callback, the provider
  registers (state, wsId, serverName) with oauth-flow-registry, swaps
  pendingFlow.promise to the registry promise (so awaitPendingFlow
  resolves when the HTTP callback lands), fires the callback so
  lifecycle can transition the Connection to pending_auth, then throws
  UnauthorizedError to escape the SDK auth loop. McpSource.start()'s
  existing UnauthorizedError catch path takes over from there — awaits
  the registry, calls finishAuth, retries connect.

- BundleLifecycleManager: recordConnectionStateChange owns Connection
  state transitions, BundleInstance.state summary recompute, and
  connection.state_changed SSE emission. getPendingAuthUrl is the
  /initiate route's lookup path; only returns a URL when the named
  Connection is in pending_auth (no leaks on stale state).

- installRemote: pre-registers the BundleInstance + 'starting' Connection
  BEFORE startBundleSource so the interactive callback (fired during
  source.start()) can find the instance to transition. On success ->
  connection running. On failure -> connection dead + instance removed.

- Routes:
  - POST /v1/mcp-auth/initiate (workspace-authed): looks up the captured
    auth URL via lifecycle.getPendingAuthUrl, sets nb_oauth_state cookie
    bound to /v1/mcp-auth/callback path (sha256 of state, HttpOnly,
    SameSite=Lax, Secure off-localhost, Max-Age=900), returns the URL.
    POST-only (no GET) so an <img> or prefetch can't trigger a flow.
    CSRF protection via the existing X-Workspace-Id preflight (custom
    header forces CORS preflight, blocking simple-form CSRF).
  - GET /v1/mcp-auth/callback: timing-safe verifies nb_oauth_state cookie
    sha256 matches URL state before resolving the flow registry. Without
    the cookie, returns 400 — closes the gap where a leaked state value
    alone (referrer, history, network log) could let an attacker drop
    tokens into someone else's account. Clears the cookie on success.

Step 1 limitation: install API still blocks until interactive auth
completes (provider's awaitPendingFlow blocks inside startBundleSource).
Acceptable for self-host / dev where the user is the same person; the
banner appears the moment we hit pending_auth so the user knows what to
do. Non-blocking install is a follow-up to make this production-ready
for SaaS.
…avior

Unit tests (test/unit/lifecycle-connection.test.ts):
- recordConnectionStateChange creates the connection on first call
- Emits connection.state_changed with the expected payload
- BundleInstance.state mirrors the single connection's state for
  workspace-scope (Step 1)
- authorizationUrl is cleared on transition out of pending_auth
- getPendingAuthUrl returns the URL only while pending_auth, null
  otherwise (closes leak vector if state stale)
- Returns null for unknown server / wsId / principal (no enumeration leak)
- No-op + no emit when the underlying instance is missing
- lastError populates on dead, clears on running

Integration tests (test/integration/workspace-oauth-provider.test.ts):
- Updated the two interactive-branch cases (302 redirect + 200 login
  page) to match the NEW behavior: provider throws UnauthorizedError
  (the SDK's own class), fires onInteractiveAuthRequired with the
  authorization URL, and registers the flow with oauth-flow-registry
  such that resolveWithCode resolves awaitPendingFlow's promise.
- Smoke test that InteractiveOAuthNotSupportedError is still exported
  for any consumer importing the deprecated symbol.

Total: 2209 unit tests pass, 22+5 skip integration tests pass.
Server (src/bundles/startup.ts):
- URL-bundle path now races source.start() against an early-return
  signal triggered by onInteractiveAuthRequired. The moment the provider
  determines interactive auth is needed (lifecycle records pending_auth,
  SSE event fires, banner appears), startBundleSource returns with a
  placeholder meta ('remote (pending auth)'). source.start() continues
  in the background and resolves when the user completes auth via
  /v1/mcp-auth/initiate -> AS -> /v1/mcp-auth/callback. This unblocks
  both the install API path and the workspace-runtime boot loop —
  neither hangs waiting for user interaction.
- Source registration is gated on success or pending_auth (not failure):
  remote-lifecycle test still asserts hasSource() === false after a
  failed connect.

Web (web/src/):
- types.ts: BundleState gains 'pending_auth'. New
  ConnectionStateChangedEvent in SseEventMap.
- hooks/useEvents.ts: optional onConnectionStateChanged callback on the
  workspace SSE subscription.
- api/client.ts: initiateMcpOAuth(serverName, principalId?) POSTs to
  /v1/mcp-auth/initiate, returns { authorizationUrl }. Browser stores
  the nb_oauth_state cookie set by the server.
- components/PendingAuthBanner.tsx: amber banner row per pending
  Connection. Click Connect -> initiateMcpOAuth -> window.location.assign.
  Banner self-clears when the connection's state_changed event arrives
  with a non-pending_auth state.
- App.tsx: workspace shell holds the pendingAuth Map<key,event> state,
  wires onConnectionStateChanged to add/remove rows, drops the map on
  workspace switch, mounts <PendingAuthBanner /> inside <ShellLayout>.
…tion wiring

Closes the full pending_auth → user authorization → running cycle for
URL bundles loaded at platform boot (workspace.json), not just install-
API-driven adds. Verified end-to-end against a mock OAuth+MCP server.

- pending-auth-buffer.ts: process-local buffer for interactive-OAuth
  notifications captured BEFORE BundleLifecycleManager exists. boot
  ordering is workspace-runtime -> startBundleSource (provider may fire
  onInteractiveAuthRequired) -> Runtime constructs lifecycle ->
  seedInstance. Without the buffer the pending_auth signal is dropped.
  Also adds a connection-running notification path used by background
  start() completions to drive lifecycle transitions out of pending_auth.
- workspace-runtime.ts: wires onInteractiveAuthRequired callback that
  calls setPendingAuth(buffer) keyed by (wsId, serverName).
- lifecycle.seedInstance: consumes the pending-auth buffer for URL
  bundles and either records pending_auth (with captured authUrl) or
  running. Either way emits the connection.state_changed SSE event.
- runtime.ts: registers the connection-running handler on
  pending-auth-buffer post-construction so subsequent background start()
  successes route through lifecycle.recordConnectionStateChange.
- startup.ts: calls notifyConnectionRunning(wsId, serverName) inside
  start().then so URL bundles whose interactive OAuth completes after
  boot transition the BundleInstance.connection out of pending_auth and
  fire the SSE clear-banner event.
- mcp-source.ts: transport.onclose handler is wired AFTER successful
  start() rather than at construction. A close during the start handshake
  (SDK 401 -> auth throws -> SDK closes transport) is part of the start
  flow, not a mid-session crash. Without this, the OAuth retry path
  fights HealthMonitor's own restart attempts triggered by the
  transient close.
- api/routes/mcp-auth.ts: middleware applied per-handler instead of
  via .use(*) on a sub-app. Hono's sub-app wildcard middleware
  applies to ALL routes mounted under the same prefix, which would
  otherwise gate the unauthenticated /callback on workspace headers
  the user's browser cannot set on a return-from-AS navigation.
  Adds GET /v1/connections/pending — workspace-authed snapshot the
  web client uses to populate the banner on first render
  (connection.state_changed SSE only delivers from connect time
  forward, so a client connecting after boot would otherwise miss the
  signal until reload).
- web/api/client.ts: listPendingConnections() helper.
- web/App.tsx: on workspace activation, fetches the pending snapshot
  and seeds the banner state. Subsequent transitions arrive via SSE.

E2E test scaffolding:
- test/scripts/mock-oauth-mcp-server.ts: single-process Bun.serve
  implementing OAuth 2.1 metadata, DCR (RFC 7591), authorize
  (200 HTML form forcing the interactive branch — same shape as
  Granola/Notion/HubSpot), token endpoint, and a Streamable HTTP MCP
  endpoint with one mock tool. Runs on :19999 by default.

Verified manually:
1. Add { url: 'http://localhost:19999/mcp', serverName: 'mock-granola' }
   to ws_default workspace.json + allowInsecureRemotes=true in
   nimblebrain.json.
2. Boot dev server. Bundle hits interactive OAuth, returns early,
   workspace-runtime moves on. Connection seeded as pending_auth.
3. Browser loads workspace -> snapshot fetch -> banner appears.
4. Click Connect -> POST /v1/mcp-auth/initiate (sets nb_oauth_state
   cookie) -> redirect to mock AS -> Continue button -> /callback ->
   cookie verified -> flow registry resolves -> McpSource finishAuth +
   retry -> bundle running -> notifyConnectionRunning ->
   recordConnectionStateChange(running) -> SSE -> banner clears.
The chat panel is fixed top-0 right-0 z-10. The banner was rendered inside
ShellLayout's main column, which the chat overlays — when the chat opens,
the right portion of the banner including the Connect button gets hidden
under it. Lift the banner to fixed full-viewport-width with z-50 so it
sits above both sidebar and chat. ShellLayout gets a sibling spacer
(PendingAuthBannerSpacer) that pushes its main content down by the
banner's height when the pending Map is non-empty.
…dle events

- Replace the ad-hoc allowlist in SseEventManager.emit() with a declarative
  SSE_ROUTES table keyed on EngineEventType. New event types are one-line
  additions; typos fail at compile time.
- Without a route entry, connection.state_changed was dropped at the SSE
  boundary — the pending-auth banner never auto-cleared after the user
  completed interactive OAuth.
- Add wsId to bundle.installed / crashed / recovered / dead payloads so
  those events can be filtered by workspace at the table boundary, closing
  a pre-existing cross-workspace leak (bundle events were previously
  fanned out to every workspace's clients).
- Add unit coverage for the routing table and the mcp-auth route cookie
  binding (both uncovered before).
- Correct misleading "verifier.json deleted after finishAuth" docstring.
@mgoldsborough mgoldsborough added the qa-reviewed QA review completed with no critical issues label May 6, 2026
…alId in Step 1

QA review follow-ups:

- Replace handwritten constant-time hex compare with crypto.timingSafeEqual.
  Add a SHA256-hex format guard so malformed cookies are rejected up-front
  instead of throwing on length-mismatched buffers.

- Stop reading `principalId` from /v1/mcp-auth/initiate request body. Step 1
  is workspace-scope only — the principal is always WORKSPACE_PRINCIPAL_ID.
  Step 3 will reintroduce the field with member-auth checks and reject
  user-supplied "_"-prefixed values.
@mgoldsborough mgoldsborough merged commit c0cd227 into main May 7, 2026
4 checks passed
@mgoldsborough mgoldsborough deleted the feat/interactive-oauth branch May 26, 2026 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qa-reviewed QA review completed with no critical issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant