Skip to content

fix(agent-room): enforce 1:1 invariant + admin view + migrate legacy multi-member pods#232

Merged
samxu01 merged 4 commits intomainfrom
fix/agent-room-1to1-invariant
Apr 25, 2026
Merged

fix(agent-room): enforce 1:1 invariant + admin view + migrate legacy multi-member pods#232
samxu01 merged 4 commits intomainfrom
fix/agent-room-1to1-invariant

Conversation

@lilyshen0722
Copy link
Copy Markdown
Contributor

Summary

ADR-001 §3.10 says agent-rooms are personal 1:1 DMs — one user, one agent, no third party. The §3.10 amendment that corrected the original "N humans × 1 agent office" framing was documented but never enforced in code. Live data on api-dev showed agent-room pods with 5–6 members — a privacy/UX bug where what users perceive as a private DM was joinable by other people.

Surfaced during ADR-005 Phase 2 codex smoke (the commonly pod list output included agent-rooms with 5–6 members).

What changed

File Change
backend/controllers/podController.ts joinPod rejects any third-person join on agent-room (HTTP 403). getAllPods / getPodsByType skip the membership filter when caller is a global admin.
backend/services/agentIdentityService.ts ensureAgentInPod refuses to add an agent to an agent-room when that agent isn't already a member — closes the auto-install path that was the most likely entry route for the multi-member legacy data.
backend/services/dmService.ts Comment refresh — was still documenting the rejected N×1 "office" design.
backend/scripts/migrate-agent-room-multimember.ts (NEW) Convert agent-room pods with 3+ members to type: 'chat'. Preserves messages, members, createdBy. Run with --dry to preview. Idempotent.
backend/__tests__/unit/controllers/podController.test.js 4 new tests: joinPod 403 on agent-room, chat-pod regression guard, admin bypass present, non-admin still filtered.
backend/__tests__/unit/services/agentIdentityService.ensureAgentInPod.test.js (NEW) 3 tests: refuse intruder agent, no-regression on chat pod, no-op for host agent re-running for its own DM.

Tests

PASS __tests__/unit/controllers/podController.test.js                12/12  (4 new)
PASS __tests__/unit/services/agentIdentityService.ensureAgentInPod   3/3   (all new)
PASS __tests__/unit/routes/pods.test.js                              5/5   (regression check)

20/20 across affected suites. No regression in adjacent test files.

Deploy steps after merge

  1. Roll out the backend image (standard deploy).
  2. Once running, trigger the migration on dev:
    kubectl exec -n commonly-dev <backend-pod> -- \
      npx ts-node backend/scripts/migrate-agent-room-multimember.ts --dry
    # review the offender list, then re-run without --dry
    
  3. Verify commonly pod list --instance dev shows no agent-room pod with > 2 members.

Out of scope (filed as follow-ups)

  • agent-admin type: ADR-001 §3.10 says this is legacy and slated for deprecation as LiteLLM session observability matures. This PR doesn't touch it; agent-admin can still have multiple admins (its design intent).
  • Frontend admin UI for the new "view all agent-rooms" affordance — backend now supports it; UI surface is a separate ticket.
  • A periodic invariant-check cron that warns if any agent-room ever drifts back to >2 members (defense in depth) — not added since the three write paths are now all guarded.

Test plan

  • Unit tests for joinPod reject + chat regression
  • Unit tests for ensureAgentInPod refuse + chat regression + self-noop
  • Unit tests for getPodsByType admin bypass + non-admin filter
  • Migration script idempotent + dry-run mode
  • Manual smoke after merge: log in as a non-admin, attempt to join an existing agent-room owned by another user → expect 403
  • Manual smoke after merge: log in as admin, hit /api/pods?type=agent-room → expect ALL agent-rooms returned regardless of membership
  • Run migration against dev DB, verify commonly pod list shows no agent-room with >2 members afterward

🤖 Generated with Claude Code

lilyshen0722 and others added 2 commits April 24, 2026 21:58
…multi-member pods

ADR-001 §3.10 says agent-rooms are personal 1:1 DMs (one user, one
agent, no third party). The amendment that corrected the original
"N humans × 1 agent office" framing was documented but never enforced
in code. Live data showed agent-room pods with 5–6 members on dev,
which is a privacy/UX bug: an agent's "DM" had multiple humans in it.

Changes:

1. `joinPod` (podController.ts): reject any third-person join on a
   pod whose `type === 'agent-room'` with HTTP 403. Applies to humans
   AND agents — the room is closed once the host agent + opening user
   are in. Multi-party human↔agent surfaces should use `type: 'chat'`.

2. `ensureAgentInPod` (agentIdentityService.ts): refuse to add an
   agent to an agent-room when that agent isn't already a member.
   Closes the auto-install path that was sneaking second agents into
   existing DMs. Existing-member case is unaffected (host agent
   re-running for its own DM still no-ops).

3. `getAllPods` / `getPodsByType` (podController.ts): admin-view
   bypass. Global admins skip the membership filter on agent-room +
   agent-admin queries so the moderation surface for these private
   pod types actually exists. Non-admins still see only their own.

4. Migration script `migrate-agent-room-multimember.ts`: convert
   `agent-room` pods with 3+ members to `type: 'chat'`. Preserves all
   messages, members, createdBy — only the type field changes. Run
   with `--dry` to preview. Idempotent.

5. `getOrCreateAgentRoom` (dmService.ts): comment refresh — was
   documenting the rejected N×1 "office" design that the §3.10
   amendment killed.

Tests: 4 new podController cases (joinPod reject + chat-pod regression,
admin bypass present + non-admin still filtered) + 3 ensureAgentInPod
cases (refuse, regress chat, no-op for self). All 20 affected tests
pass; no regression in adjacent suites.

Migration on dev needs a manual run after deploy:
  kubectl exec -n commonly-dev <backend-pod> -- \\
    npx ts-node backend/scripts/migrate-agent-room-multimember.ts --dry
  # review offenders, then drop --dry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…comments + getAllPods test

Three findings from the code-reviewer pass on PR #232.

1. CRITICAL — ObjectId equality bug in ensureAgentInPod (pre-existing,
   exposed by this PR). `pod.members.includes(agentUser._id)` uses JS
   `===` which compares object references, not ObjectId values. In
   production with real Mongoose docs, the host agent is always treated
   as "not a member" of its own room, the new agent-room guard fires,
   and `null` propagates to callers as "pod not found." Replace with
   `.some((m) => m.equals?.(id) ?? String(m) === String(id))` — same
   pattern already used by removeAgentFromPod a few lines below.

   The previous test masked this bug by overriding pod.members.includes
   with a hand-rolled .equals lookup. Test now uses fakeObjectId() with
   real `.equals()` semantics — no override — so it would fail under the
   broken `.includes()` and pass only with the fixed `.some(.equals)`.

2. Test gap on getAllPods admin bypass — only getPodsByType was directly
   tested. Add a parity case asserting that an admin caller hitting
   getAllPods with type=agent-room sees all agent-rooms, not just their
   own. (getAllPods has an extra .populate(parentPod) call vs.
   getPodsByType — the test mock chain mirrors that.)

3. The joinPod guard comment said "not even the creator (the agent
   itself)" can rejoin. Tighten to make explicit that `pod.createdBy`
   for agent-rooms is the host agent, not a human user, so the
   creator-bypass branch in the invite-only block is effectively dead
   for this pod type — preventing future readers from misreading the
   comment as implying a creator-bypass exists.

4. Migration script header — document the downstream consequences a
   reviewer flagged: UI tab move, privacy-filter behavior change after
   conversion, agentAutoJoinService scan visibility. These are correct
   behaviors but worth surfacing for the operator running the
   migration on prod.

21/21 tests pass on affected suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@samxu01
Copy link
Copy Markdown
Contributor

samxu01 commented Apr 25, 2026

Self-review pass via code-reviewer subagent — addressed the critical finding + two important items in fbd93c7512.

Finding Severity Resolution
pod.members.includes(agentUser._id) uses JS ===, fails for Mongoose ObjectIds — host agent treated as "not a member" of its own room, the new agent-room guard fires, and null propagates to callers as "pod not found." Pre-existing bug; this PR exposed it. Critical Replace with .some((m) => m.equals?.(id) ?? String(m) === String(id)) — same pattern as removeAgentFromPod below. Test mock that hand-rolled the override deleted; new test uses real .equals()-bearing fakes so it would fail under the broken .includes().
getAllPods admin-bypass not directly tested (only getPodsByType was) Important Added parity case asserting an admin sees all agent-rooms via getAllPods
joinPod comment said "not even the creator" without clarifying that pod.createdBy for agent-rooms is the host agent (not a human user), so the creator-bypass in the invite-only block is effectively dead Important Tightened comment to make this explicit
Migration script silently changed pod.type = 'chat' without documenting downstream effects (UI tab move, privacy filter behavior, agentAutoJoinService scan visibility) Important Header now lists each consequence so the operator sees them before running on prod

Reviewer findings deliberately deferred:

  • Q: 8+ other ensureAgentInPod callers ignore the return value — true, and now null can mean "refused" not just "pod not found." For most callers (install paths) the membership prerequisite is already enforced upstream, so the new refuse path closes a back door rather than silently breaking. Worth a follow-up audit but not blocking this PR.
  • Q: Migration UX — the post-migration "5-member chat pod" is the simplest reasonable shape. All messages preserved, members preserved, just shows up under Chat instead of Agent DMs. Now documented in the script header.
  • Q: agentAutoJoinService visibility post-migration — same point, now documented as a known consequence the operator should audit before running on prod.

Verdict: Request changes → critical + 3 important addressed. Tests now 21/21 across affected suites (was 20/20 before; +1 for the getAllPods admin parity case).

@samxu01 samxu01 merged commit 9ea8ee0 into main Apr 25, 2026
10 checks passed
@samxu01 samxu01 deleted the fix/agent-room-1to1-invariant branch April 25, 2026 22:16
samxu01 added a commit that referenced this pull request Apr 25, 2026
… chat

PR #232's migration script converted any agent-room with >2 members to
type: 'chat'. Investigating the actual offenders on dev showed every
multi-member agent-room was an originally-1:1 DM (1 host agent + 1
human) polluted with rogue agents that auto-joined via ensureAgentInPod
— exactly the bug PR #232's runtime guards close going forward. The
relabel-to-chat strategy legitimized that pollution; restoration is the
correct action.

New migration logic:

  for each agent-room with members.length > 2:
    humans = members where User.isBot === false
    case humans.length === 1:
      keep [createdBy (host agent), the_human]; drop rogue agents
      stays type='agent-room'
    case humans.length === 0:
      agent↔agent DM. Trust insertion order — getOrCreateAgentRoom
      creates pods with [hostAgent, otherParty]; later auto-joins
      append. Keep [members[0], members[1]]; drop the rest.
      stays type='agent-room'
    case humans.length >= 2:
      Was never a DM (multi-human surfaces are chat pods, not DMs).
      type → 'chat'. All members preserved.

5 unit tests cover all three branches + dry-run + idempotency, mocking
Pod + User so they run without a real Mongo connection.

Also: tighten stale comment on POST /api/agents/runtime/room — was still
documenting the rejected "many humans × one agent office" framing. Now
matches ADR-001 §3.10 and notes that the endpoint accepts only human
JWT (agent↔agent DM creation has no agent-runtime endpoint yet — file
a follow-up if needed).

The previous migration script was never run on dev, so swapping in the
new logic doesn't require any rollback or compensating action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants