fix(agent-room): rewrite migration to RESTORE 1:1 DMs, not relabel as chat #235

Merged
samxu01 merged 2 commits into main from fix/agent-room-migration-restore-dm
Apr 25, 2026

Conversation


@samxu01 samxu01 commented Apr 25, 2026

Summary

Follow-up to #232. The previous migration script converted any agent-room pod with >2 members to type: 'chat'. Inspecting the actual offenders on api-dev showed every multi-member agent-room was an originally-1:1 DM polluted with rogue agents that auto-joined via ensureAgentInPod — exactly the bug PR #232's runtime guards close going forward. Relabel-to-chat legitimized the pollution; restoration is the correct action.

Live data that informed this

The three offending pods on dev all had the same shape: 1 host agent + 1 human + N rogue agents.

| Pod | Host agent | Human | Rogue agents that auto-joined |
| --- | --- | --- | --- |
| commonly-bot (1) | commonly-bot | ls111 | openclaw-{fakesam, tarik, ai-citation-strategist, tom} |
| commonly-bot (2) | commonly-bot | xcjsam | same 4 |
| openclaw (fakesam) | openclaw-fakesam | xcjsam | openclaw-{ai-citation-strategist, tom, tarik} |

None of these were ever multi-party chats. They were 1:1 DMs with intruders. Promoting them to chat would have made the bug permanent.

New migration logic

for each agent-room with members.length > 2:
  humans = members where User.isBot === false
  case humans.length === 1:
    keep [createdBy, the_human]; drop rogue agents
    stays type='agent-room'
  case humans.length === 0:
    agent↔agent DM. Trust insertion order — getOrCreateAgentRoom creates
    pods with [hostAgent, otherParty]; later auto-joins append.
    keep [members[0], members[1]]; drop the rest.
    stays type='agent-room'
  case humans.length >= 2:
    Was never a DM (multi-human surfaces are chat pods, not DMs).
    type → 'chat'. All members preserved.

Per ADR-001 §3.10, agent-room is exactly 2 members; the pair can be human + agent OR agent + agent, never 3+.
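
The three branches above can be sketched as a pure planning function. This is only an illustration under stated assumptions, not the script's actual code: `PodLike`, `Plan`, and `planPod` are hypothetical names, member IDs are plain strings rather than ObjectIds, and `isBotById` is assumed to be a lookup built from the User rows. Edge-case guards are left out.

```typescript
// Hypothetical shapes for illustration; the real Pod/User models are not shown here.
interface PodLike {
  type: 'agent-room' | 'chat';
  createdBy: string; // host agent id (assumed to be a string here)
  members: string[]; // member ids in insertion order
}

type Plan =
  | { action: 'restore'; keepIds: string[]; dropIds: string[] }
  | { action: 'convert-to-chat' }
  | { action: 'skip'; reason: string };

function planPod(pod: PodLike, isBotById: Map<string, boolean>): Plan {
  if (pod.members.length <= 2) {
    return { action: 'skip', reason: 'already has at most 2 members' };
  }

  // Only members with a known User row and isBot === false count as humans.
  const humans = pod.members.filter((id) => isBotById.get(id) === false);

  if (humans.length >= 2) {
    // Never a DM: relabel as chat and keep every member.
    return { action: 'convert-to-chat' };
  }

  const keepIds =
    humans.length === 1
      ? [pod.createdBy, ...humans] // human + agent DM: host first, then the human
      : pod.members.slice(0, 2); // agent + agent DM: trust insertion order

  const dropIds = pod.members.filter((id) => !keepIds.includes(id));
  return { action: 'restore', keepIds, dropIds };
}

// Usage with the shape of the first offending pod (rogue list shortened).
const exampleIsBot = new Map<string, boolean>([
  ['commonly-bot', true],
  ['ls111', false],
  ['openclaw-tom', true],
]);
const examplePlan = planPod(
  {
    type: 'agent-room',
    createdBy: 'commonly-bot',
    members: ['commonly-bot', 'ls111', 'openclaw-tom'],
  },
  exampleIsBot,
);
console.log(examplePlan);
```

Keeping the planner free of database calls is what would let a `--dry` run print the same plan that a real run applies.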

Also fixed

backend/routes/agentsRuntime.ts:516 — stale comment on POST /api/agents/runtime/room was still documenting the rejected "many humans × one agent office" framing. Now matches §3.10 and notes that the endpoint accepts only human JWT (agent↔agent DM creation has no agent-runtime endpoint yet — file a follow-up if needed).

Tests

backend/__tests__/unit/scripts/migrate-agent-room-multimember.test.js: 5 tests covering all three branches, dry-run, and idempotency. Pod and User are mocked, so no real Mongo connection is needed.

PASS __tests__/unit/scripts/migrate-agent-room-multimember.test.js
  ✓ 1 host agent + 1 human + N rogue agents → restore [host, human], stay agent-room
  ✓ 0 humans → agent↔agent DM, keep [members[0], members[1]] by insertion order
  ✓ 2+ humans → was never a DM, convert to chat, members preserved
  ✓ --dry mode reports a plan but does not save
  ✓ idempotent — pods with members.length <= 2 are not even returned by the cursor query

21/21 tests pass across the migration test and the inherited podController and ensureAgentInPod suites from #232.
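
The dry-run and idempotency properties can be illustrated with an in-memory stand-in for the cursor query. Everything here is a hypothetical sketch: `RoomPod`, `selectCandidates`, and `runMigration` are invented names, and the restore step is reduced to "keep the first two members" purely to show the control flow.

```typescript
// Hypothetical in-memory model; the real script works against Mongo documents.
interface RoomPod {
  type: 'agent-room' | 'chat';
  members: string[];
}

// Stand-in for the cursor query. A Mongo filter with the same intent might be
// { type: 'agent-room', 'members.2': { $exists: true } }, which is an
// assumption for illustration and not taken from the script.
function selectCandidates(pods: RoomPod[]): RoomPod[] {
  return pods.filter((p) => p.type === 'agent-room' && p.members.length > 2);
}

// Returns how many pods were planned; mutates only when dry is false.
function runMigration(pods: RoomPod[], opts: { dry: boolean }): number {
  const candidates = selectCandidates(pods);
  if (!opts.dry) {
    for (const pod of candidates) {
      // Simplified restore step: keep the first two members.
      pod.members = pod.members.slice(0, 2);
    }
  }
  return candidates.length;
}

const demoPods: RoomPod[] = [
  { type: 'agent-room', members: ['host', 'human', 'rogue-1', 'rogue-2'] },
  { type: 'agent-room', members: ['host', 'human'] },
];

console.log(runMigration(demoPods, { dry: true })); // plans 1 pod, saves nothing
console.log(runMigration(demoPods, { dry: false })); // restores 1 pod
console.log(runMigration(demoPods, { dry: false })); // nothing left to select
```

Because the selection itself excludes pods with two or fewer members, a second run has nothing to do. The idempotency comes from the query, not from a check inside the loop.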

Rollback safety

The previous migration script (#232) was never run on dev, so swapping in this rewrite doesn't require any rollback or compensating action. The runtime guards in #232 already prevent new pollution.

Test plan

  • Unit tests for all three planner branches + dry-run + idempotency
  • Manual kubectl exec ... migrate-agent-room-multimember.ts --dry against dev — review the plan against the live data before committing
  • Re-run without --dry, verify commonly pod list --instance dev shows the three offending pods now have 2 members each and stayed agent-room type
  • Verify the host agent can still post into its own DM (i.e. both the ObjectId-equality fix from #232 and this restored membership hold)

🤖 Generated with Claude Code


samxu01 commented Apr 25, 2026

Self-review pass via code-reviewer subagent — addressed all findings in a3ef49526a.

| Finding | Severity | Resolution |
| --- | --- | --- |
| createdBy not in members would silently produce a 2-member pod with a ghost ID (no AgentInstallation, no User session matches) | Important | Added guard: skip with a warning if hostId ∉ memberIds in the human-branch. Pinned with a new test. |
| Orphan member IDs (no User row) handled correctly but not documented; the next reader would think it was an oversight | Important | Added comment block above the humans filter spelling out the orphan semantics |
| Insertion-order claim asserted "Mongoose preserves order" without citing the evidence | Question | Comment now lists all three load-bearing facts: dmService.ts:290 creation shape, .push()-only mutations, no $set:{members:[...]} reorder (verified via grep) |
| Dedup comment misattributed which branch could collide | Nit | Comment removed; the dedupe is just defensive |
| Agent↔agent test didn't assert pod.type === 'agent-room' | Nit | Added explicit assertion |
| keepIds test used order-sensitive toEqual everywhere | Nit | Kept toEqual for the ordered keepIds (order is part of the contract: first kept is host) but switched dropIds to arrayContaining since drop order is not contract |

Tests now 6/6 (was 5/5; +1 for the ghost-host edge case). All other findings from #232 were already in scope there and are addressed there. CI status incoming via the Monitor.
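
The ghost-host guard can be sketched as follows. This is a hedged illustration, not the committed code: `GuardedPod`, `GuardResult`, and `planHumanBranch` are hypothetical names, and IDs are compared as plain strings, whereas the real script would need ObjectId-aware equality.

```typescript
// Hypothetical shape for illustration only.
interface GuardedPod {
  id: string;
  createdBy: string; // host agent id
  members: string[];
}

type GuardResult =
  | { ok: true; keepIds: [string, string] }
  | { ok: false; warning: string };

// Human branch only: keep [host, human], but refuse to plan when the host
// recorded in createdBy is not actually a member of the pod.
function planHumanBranch(pod: GuardedPod, humanId: string): GuardResult {
  if (!pod.members.includes(pod.createdBy)) {
    return {
      ok: false,
      warning: `pod ${pod.id}: createdBy ${pod.createdBy} is not in members; skipping`,
    };
  }
  return { ok: true, keepIds: [pod.createdBy, humanId] };
}

const ghostResult = planHumanBranch(
  { id: 'pod-1', createdBy: 'ghost-agent', members: ['agent-a', 'human-1', 'agent-b'] },
  'human-1',
);
console.log(ghostResult);
```

Skipping rather than guessing a replacement host leaves the pod untouched, so an operator can triage it from the warning.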

samxu01 and others added 2 commits April 25, 2026 16:00
… chat

PR #232's migration script converted any agent-room with >2 members to
type: 'chat'. Investigating the actual offenders on dev showed every
multi-member agent-room was an originally-1:1 DM (1 host agent + 1
human) polluted with rogue agents that auto-joined via ensureAgentInPod
— exactly the bug PR #232's runtime guards close going forward. The
relabel-to-chat strategy legitimized that pollution; restoration is the
correct action.

New migration logic:

  for each agent-room with members.length > 2:
    humans = members where User.isBot === false
    case humans.length === 1:
      keep [createdBy (host agent), the_human]; drop rogue agents
      stays type='agent-room'
    case humans.length === 0:
      agent↔agent DM. Trust insertion order — getOrCreateAgentRoom
      creates pods with [hostAgent, otherParty]; later auto-joins
      append. Keep [members[0], members[1]]; drop the rest.
      stays type='agent-room'
    case humans.length >= 2:
      Was never a DM (multi-human surfaces are chat pods, not DMs).
      type → 'chat'. All members preserved.

5 unit tests cover all three branches + dry-run + idempotency, mocking
Pod + User so they run without a real Mongo connection.

Also: tighten stale comment on POST /api/agents/runtime/room — was still
documenting the rejected "many humans × one agent office" framing. Now
matches ADR-001 §3.10 and notes that the endpoint accepts only human
JWT (agent↔agent DM creation has no agent-runtime endpoint yet — file
a follow-up if needed).

The previous migration script was never run on dev, so swapping in the
new logic doesn't require any rollback or compensating action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two important items + several nits/questions from the code-reviewer pass.

1. Defensive guard: if `pod.createdBy` is NOT actually in `pod.members`
   (data corruption / bulk-import that bypassed the create hook), the
   human↔agent branch would silently produce a 2-member pod where
   hostId is a ghost — no AgentInstallation matches, no User session
   exists. Now skips the pod with a warning so an operator can triage.
   Test pins the behavior.

2. Orphan handling documented: a member ID with no User row returns
   undefined from isBotById.get(). Such orphans aren't counted as
   humans and aren't counted as bots. The script's behavior is now
   spelled out in a comment so a reader doesn't have to reverse-engineer
   it from the filter.

3. Insertion-order claim now cites the evidence: dmService.ts:290
   creates pods with [hostAgent, otherParty]; the only mutation paths
   use Mongoose .push() (append-only); no $set:{members:[...]} reorders.
   Verified via repo-wide grep at PR-time.

4. Misleading dedup comment removed (it conflated which branch's keepIds
   could collide; the de-dupe is just defensive).

5. Test for the agent↔agent branch now explicitly asserts pod.type
   stays 'agent-room' and uses arrayContaining for the unordered
   dropIds set (still uses toEqual for the ordered keepIds since order
   is part of the contract).

6/6 tests pass.
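
The orphan semantics from item 2 come down to a three-way classification on the result of `isBotById.get()`. A minimal sketch, assuming `isBotById` is a `Map<string, boolean>` keyed by stringified member ID; `MemberKind`, `classifyMember`, and `countKinds` are hypothetical names:

```typescript
type MemberKind = 'human' | 'bot' | 'orphan';

// A member with no User row has no entry in the map, so get() returns
// undefined. That is deliberately neither a human nor a bot.
function classifyMember(id: string, isBotById: Map<string, boolean>): MemberKind {
  const isBot = isBotById.get(id);
  if (isBot === undefined) return 'orphan';
  return isBot ? 'bot' : 'human';
}

function countKinds(
  members: string[],
  isBotById: Map<string, boolean>,
): Record<MemberKind, number> {
  const counts: Record<MemberKind, number> = { human: 0, bot: 0, orphan: 0 };
  for (const id of members) {
    counts[classifyMember(id, isBotById)] += 1;
  }
  return counts;
}

const knownUsers = new Map<string, boolean>([
  ['host-agent', true],
  ['human-1', false],
]);
console.log(countKinds(['host-agent', 'human-1', 'deleted-user'], knownUsers));
```

A strict `=== false` test in the humans filter gives the same outcome implicitly; naming the orphan case only makes the intent visible.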

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@samxu01 samxu01 force-pushed the fix/agent-room-migration-restore-dm branch from a3ef495 to 8f0dd06 April 25, 2026 23:01
@samxu01 samxu01 merged commit 5f0493c into main Apr 25, 2026
8 checks passed
@samxu01 samxu01 deleted the fix/agent-room-migration-restore-dm branch April 25, 2026 23:04