fix(wrap): resolve DM participants for correct routing (#644)
Conversation
DMs with conv_/dm_ targets were blindly passed through to every wrapped agent. Now resolves participants via the Relaycast API (with 30s cache) and only delivers if the wrapped agent is in the conversation. Also fixes case-insensitive matching for self-echo filtering and direct target routing — previously used exact HashSet::contains which could miss differently-cased names from the cloud. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
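The routing decision described in this commit can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name `should_deliver_dm` and its signature are assumptions; the real logic lives in src/wrap.rs.

```rust
// Sketch of the wrap-mode DM routing decision: conversation-ID targets
// are checked against resolved participants, direct targets against the
// wrapped agent's own name. Both checks are case-insensitive, since
// HashSet::contains with exact strings missed differently-cased names.
fn should_deliver_dm(target: &str, wrapped_agent: &str, participants: &[String]) -> bool {
    if target.starts_with("dm_") || target.starts_with("conv_") {
        // Deliver only if the wrapped agent is in the conversation.
        participants
            .iter()
            .any(|p| p.eq_ignore_ascii_case(wrapped_agent))
    } else {
        // Direct target: match against the wrapped agent's name.
        target.eq_ignore_ascii_case(wrapped_agent)
    }
}
```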
src/wrap.rs
Outdated
```rust
.unwrap_or_else(|error| {
    tracing::debug!(
        workspace_id = %workspace_id,
        conversation_id = %conversation_id,
        error = %error,
        "failed resolving DM participants in wrap mode"
    );
    vec![]
});

// …
cache.insert(cache_key, (Instant::now(), fetched.clone()));
```
🔴 Failed DM participant resolution silently drops messages (regression from old pass-through behavior)
When get_dm_participants fails (SDK error, network issue, uninitialized client), it returns Ok(vec![]) internally (relaycast_ws.rs:726-733). This empty result is then cached for 30 seconds at wrap.rs:1367. The caller at wrap.rs:1067-1078 checks is_participant against the empty list, which is always false, causing the DM to be dropped. All subsequent DMs for that conversation are also dropped for the cache TTL duration.
The old code unconditionally allowed dm_ and conv_ prefixed targets through (!mapped.target.starts_with("dm_") && !mapped.target.starts_with("conv_")), so transient failures never caused message loss. The new code has no fallback, turning any participant-resolution failure into silent message drops.
Presence events (agent.online/offline) have empty text and target, causing agents to see empty messages and respond to them — creating loops. Reactions similarly have no useful content for PTY injection. Skip both event types early in the wrap message handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract resolve_dm_participants_cached into helpers.rs, eliminating duplicate implementations in main.rs and wrap.rs. The unified function includes cache eviction (8192 cap), consistent workspace_id trimming, error-aware caching (errors not cached), and warn-level logging. Addresses: DRY violation, missing trim drift bug, unbounded cache, and cached-error suppression findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
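The extracted helper's caching behavior can be sketched as below. This is a minimal sketch under stated assumptions: the function name `resolve_cached`, the `fetch` closure signature, and the clear-on-cap eviction strategy are illustrative; only the 30-second TTL, the 8192 cap, and the "errors are not cached" rule come from the commit message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const PARTICIPANT_CACHE_TTL: Duration = Duration::from_secs(30);
const PARTICIPANT_CACHE_CAP: usize = 8192;

// Error-aware TTL cache: Ok results are cached for 30s, Err results are
// not cached (so the next DM retries immediately), and the map is
// bounded by clearing it once it exceeds the cap.
fn resolve_cached<F>(
    cache: &mut HashMap<String, (Instant, Vec<String>)>,
    key: String,
    fetch: F,
) -> Vec<String>
where
    F: FnOnce() -> Result<Vec<String>, String>,
{
    if let Some((at, cached)) = cache.get(&key) {
        if at.elapsed() < PARTICIPANT_CACHE_TTL {
            return cached.clone();
        }
    }
    match fetch() {
        Ok(fetched) => {
            if cache.len() >= PARTICIPANT_CACHE_CAP {
                cache.clear(); // crude eviction once the cap is hit
            }
            cache.insert(key, (Instant::now(), fetched.clone()));
            fetched
        }
        // Not cached: sustained failures retry on every DM (see the
        // review discussion below about API call storms).
        Err(_) => vec![],
    }
}
```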
… main.rs

- Extract agent_name_eq() and is_self_name() helpers for case-insensitive agent name comparison, replacing scattered eq_ignore_ascii_case calls
- Add DM_DROPS_TOTAL atomic counter for fail-closed DM drop metrics
- Update resolve_dm_participants_cached error path to increment counter
- Align main.rs DM routing to use shared helpers (agent_name_eq, is_self_name)
- Add unit tests for agent_name_eq, is_self_name, and cache constants

Addresses: code duplication, recurring case-sensitivity bugs, silent DM drop observability, cache eviction consistency, and test coverage gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
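The helpers and counter named in this commit could look roughly like the sketch below; the bodies are assumptions based on the commit message (the actual implementations live in the repository):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Fail-closed DM drops are counted so silent drops remain observable.
static DM_DROPS_TOTAL: AtomicU64 = AtomicU64::new(0);

// Case-insensitive agent name comparison, replacing scattered
// eq_ignore_ascii_case call sites.
fn agent_name_eq(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

// Self-echo filtering: detect messages the wrapped agent sent itself,
// even when the cloud returns a differently-cased name.
fn is_self_name(name: &str, self_name: &str) -> bool {
    agent_name_eq(name, self_name)
}

// Relaxed ordering is enough for a monotonic metrics counter; returns
// the new total for convenience in log lines.
fn record_dm_drop() -> u64 {
    DM_DROPS_TOTAL.fetch_add(1, Ordering::Relaxed) + 1
}
```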
```rust
Err(error) => {
    DM_DROPS_TOTAL.fetch_add(1, Ordering::Relaxed);
    tracing::warn!(
        workspace_id = %workspace_id,
        conversation_id = %conversation_id,
        error = %error,
        dm_drops_total = DM_DROPS_TOTAL.load(Ordering::Relaxed),
        "failed resolving DM participants — DM silently dropped"
    );
    vec![]
}
```
🟡 Error results from DM participant resolution are no longer cached, enabling API call storms under sustained failures
The old resolve_dm_participants in src/main.rs:5256-5270 unconditionally cached the result (including vec![] from unwrap_or_else on error) for the 30-second TTL. The new resolve_dm_participants_cached only caches on the Ok path (src/helpers.rs:1055) and skips caching entirely on Err (src/helpers.rs:1058-1068). This means that when the DM participants API is failing (network issue, outage, etc.), every incoming DM for the same conversation-ID will trigger a fresh API call instead of being throttled by the 30-second TTL. Under sustained failures with many DMs, this can multiply failed API requests significantly compared to the old behaviour, which capped retries at roughly one per TTL window per conversation.
Suggested change — cache the empty result on error so repeated failures for the same conversation are throttled by the 30-second TTL, as in the old behaviour:

```rust
Err(error) => {
    DM_DROPS_TOTAL.fetch_add(1, Ordering::Relaxed);
    tracing::warn!(
        workspace_id = %workspace_id,
        conversation_id = %conversation_id,
        error = %error,
        dm_drops_total = DM_DROPS_TOTAL.load(Ordering::Relaxed),
        "failed resolving DM participants — DM silently dropped"
    );
    vec![]
}
```

becomes

```rust
Err(error) => {
    DM_DROPS_TOTAL.fetch_add(1, Ordering::Relaxed);
    tracing::warn!(
        workspace_id = %workspace_id,
        conversation_id = %conversation_id,
        error = %error,
        dm_drops_total = DM_DROPS_TOTAL.load(Ordering::Relaxed),
        "failed resolving DM participants — DM silently dropped"
    );
    let empty: Vec<String> = vec![];
    cache.insert(cache_key, (Instant::now(), empty.clone()));
    empty
}
```
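A middle ground between the two behaviours is negative caching with a shorter TTL for error entries: failed lookups still throttle API retries, but the fail-closed window does not stretch to the full 30 seconds. The sketch below is an illustrative design alternative, not code from the PR; the `Entry` layout, `lookup` function, and the 5-second error TTL are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const OK_TTL: Duration = Duration::from_secs(30);
const ERR_TTL: Duration = Duration::from_secs(5); // shorter window for failures

// Negative caching: both successful and failed resolutions are cached,
// but an error entry expires sooner, bounding how long DMs are dropped.
struct Entry {
    at: Instant,
    participants: Vec<String>,
    is_error: bool,
}

fn lookup(cache: &HashMap<String, Entry>, key: &str) -> Option<Vec<String>> {
    let e = cache.get(key)?;
    let ttl = if e.is_error { ERR_TTL } else { OK_TTL };
    // Expired entries return None, signalling the caller to re-fetch.
    (e.at.elapsed() < ttl).then(|| e.participants.clone())
}
```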
Summary
- conv_/dm_ conversation-ID targets were being delivered to all wrapped agents instead of only the intended recipient
- Now resolves participants via the Relaycast API (shared helper with main.rs) and only delivers if the wrapped agent is a participant
- Case-insensitive matching replaces exact HashSet::contains, which missed differently-cased names

Root cause
PR #641 added a DM routing filter but whitelisted conv_/dm_ prefixed targets as pass-through. Since Relaycast delivers most DMs with conversation_id as the target (not the agent name), every wrapped agent received every DM.

Test plan
cargo test --all passes (516 tests green)

🤖 Generated with Claude Code