
Conversation

@tbarbugli
Member

@tbarbugli tbarbugli commented Oct 15, 2025

Summary by CodeRabbit

  • New Features

    • Unified messaging API with support for streaming content and message buffering
    • Automatic smart chunking for large messages with code block preservation
    • Participant tracking for multi-user realtime audio conversations
    • New transcription events for user and agent speech
  • Enhancements

    • Improved event streaming with unique identifiers for content tracking
    • Better message synchronization across conversation backends
    • Enhanced realtime audio handling with participant context
  • Updates

    • Dependency upgrades: getstream, cartesia, openai-realtime packages

@coderabbitai

coderabbitai bot commented Oct 15, 2025

Walkthrough

This PR refactors the conversation messaging API from a fragmented streaming interface to a unified asynchronous interface with send_message and upsert_message. It adds event-driven LLM and speech event handlers in agents, implements multi-user participant tracking for realtime audio, completely reimplements GetStream integration with automatic message chunking, and migrates audio processing from torchaudio to PyAV.

Changes

Cohort / File(s) Change Summary
Conversation API Refactoring
agents-core/vision_agents/core/agents/conversation.py
Replaced fragmented streaming API with unified send_message and upsert_message methods. Introduced ContentBuffer, MessageState, and InMemoryConversation abstractions. Added async state tracking via _message_states and _lock for buffering and backend synchronization.
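For orientation, a usage sketch of the unified API; the parameter names follow the test call sites quoted later in this review, and the InMemoryConversation constructor shape follows tests/test_conversation.py, so treat this as an illustration rather than the authoritative API:

import asyncio
from vision_agents.core.agents.conversation import InMemoryConversation

async def demo():
    conversation = InMemoryConversation("instructions", [])
    # One-shot message.
    await conversation.send_message(role="user", user_id="user123", content="Hello")
    # Streaming: ordered deltas accumulate into one assistant message...
    message_id = "item-1"  # e.g. the provider's item_id
    for i, delta in enumerate(["Hel", "lo ", "world"]):
        await conversation.upsert_message(
            message_id=message_id,
            role="assistant",
            user_id="agent",
            content=delta,
            content_index=i,   # ordered fragment index
            completed=False,   # still streaming
        )
    # ...then the completion replaces any buffered partial content.
    await conversation.upsert_message(
        message_id=message_id,
        role="assistant",
        user_id="agent",
        content="Hello world",
        completed=True,
        replace=True,
    )

asyncio.run(demo())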
Agent Event Handlers
agents-core/vision_agents/core/agents/agents.py
Added _setup_llm_events and _setup_speech_events async methods to subscribe to LLM/STT events and route them into unified conversation API. Updated join flow to initialize handlers before conversation creation. Removed legacy delta/response handling methods.
Core Event System Updates
agents-core/vision_agents/core/events/base.py, agents-core/vision_agents/core/agents/events.py
Updated event dataclasses with double-quoted type field defaults. Added user_id() method to BaseEvent, plugin_name/plugin_version fields to PluginBaseEvent, and new fields reconnect_attempt and was_clean to connection events. Added error_message property to AgentSayErrorEvent.
LLM Event System
agents-core/vision_agents/core/llm/events.py
Switched quote style to double quotes across all event type fields. Removed RealtimeTranscriptEvent and RealtimePartialTranscriptEvent. Added new RealtimeUserSpeechTranscriptionEvent and RealtimeAgentSpeechTranscriptionEvent. Enhanced RealtimeResponseEvent with is_complete and conversation_item_id fields; added fields to other realtime events for status/role/content tracking.
Generated SFU Events
agents-core/vision_agents/core/edge/sfu_events.py
Updated from_proto return type annotations to string-literal annotations across ~18 event wrapper classes. Minor control-flow adjustments and dataclass field wrapping with field(...) for event type declarations.
Gemini LLM Integration
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
Introduced UUID-based item_id tracking for streaming events. Reworked streaming loops to enumerate chunks and propagate item_id and index through all event emissions. Updated _standardize_and_emit_event signature to accept item_id and idx.
OpenAI LLM Integration
plugins/openai/vision_agents/plugins/openai/openai_llm.py
Standardized event forwarding to use explicit keyword arguments. Added item_id tracking for streaming and completion events. Enhanced tool-calling workflow with consistent message formatting and multi-round deduplication.
Realtime Audio Base
agents-core/vision_agents/core/llm/realtime.py
Added _current_participant: Optional[Participant] state. Extended simple_audio_response to accept optional participant parameter. Introduced _emit_user_speech_transcription and _emit_agent_speech_transcription methods. Removed old transcript emission methods.
Gemini Realtime
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
Extended public API with optional participant parameter to simple_audio_response. Stores participant in _current_participant. Minor refactoring of multi-line formatting and receive loop conditionals.
OpenAI Realtime
plugins/openai/vision_agents/plugins/openai/openai_realtime.py
Added per-participant mapping from conversation items to Participant objects. Tracks pending participant for audio correlation. Enhanced transcript handling with unified agent/user speech emission. Added simple_response and simple_audio_response with participant support.
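The correlation idea reduces to a small mapping; here is an illustrative standalone sketch (class and method names are hypothetical, not the plugin's actual API):

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Speaker:  # stand-in for the SDK's Participant
    user_id: str

@dataclass
class ParticipantTracker:
    """Bind realtime conversation items to whoever was speaking."""
    pending: Optional[Speaker] = None
    by_item: Dict[str, Speaker] = field(default_factory=dict)

    def on_audio_forwarded(self, speaker: Optional[Speaker]) -> None:
        self.pending = speaker  # remember the current speaker

    def on_item_created(self, item_id: str) -> None:
        if self.pending is not None:
            self.by_item[item_id] = self.pending  # correlate item -> speaker

    def speaker_for(self, item_id: str) -> Optional[Speaker]:
        return self.by_item.get(item_id)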
RTC Manager
plugins/openai/vision_agents/plugins/openai/rtc_manager.py
Updated data channel session to include input_audio_transcription (whisper-1) config. Minor formatting adjustments to multiline logging and signature styling. Changed start() to require non-None _shared_forwarder.
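For reference, roughly the payload shape this implies; input_audio_transcription with a model field follows the public OpenAI Realtime API, but the exact dict built in rtc_manager.py is an assumption here:

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"model": "whisper-1"},  # enable user STT
    },
}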
GetStream Integration
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
Complete refactor from InMemoryConversation base to Conversation base. Introduced channel-scoped automatic chunking via _smart_chunk, _split_large_block, _force_split. Replaced worker-thread model with direct async channel interactions. Added lifecycle management for chunks with _create_chunks and _update_chunks.
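An illustrative reduction of the chunking idea (the real _smart_chunk/_split_large_block handle more cases, e.g. force-splitting a single block that exceeds the limit on its own):

def smart_chunk(text: str, limit: int = 4000) -> list[str]:
    """Split on line boundaries, never inside a ``` code fence (sketch)."""
    chunks: list[str] = []
    current, in_fence = "", False
    for line in text.splitlines(keepends=True):
        starts_fence = line.lstrip().startswith("```")
        # A split is only allowed at a boundary that is outside any fence.
        if current and not in_fence and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
        if starts_fence:
            in_fence = not in_fence
    if current:
        chunks.append(current)
    return chunks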
GetStream Edge Transport
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
Replaced persistent self.channel with local channel via chat_client.channel(...). Updated StreamConversation construction to pass local channel. Introduced is_agent_track logic to skip TrackAddedEvent emission for agent-owned tracks.
Dependency Version Bumps
agents-core/pyproject.toml, plugins/cartesia/pyproject.toml, plugins/openai/pyproject.toml
Updated getstream[webrtc,telemetry] from ≥2.5.3 to ≥2.5.5; cartesia from ≥2.0.5 to ≥2.0.9; openai[realtime] from ≥2.2.0 to ≥2.5.0.
Audio Library Migration
plugins/deepgram/pyproject.toml, plugins/deepgram/tests/test_stt.py, plugins/silero/pyproject.toml, plugins/silero/tests/test_vad.py
Replaced torchaudio>=2.7.1 with torchvision>=0.20.0 in dev dependencies. Refactored audio loading from torchaudio to PyAV (av.open, frame decoding, mono conversion, resampling to 48kHz). Added float32 normalization and int16 conversion.
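A minimal sketch of the PyAV loading path described here; the tests' exact pipeline (including where the float32 normalization step happens) may differ:

import av
import numpy as np

def load_pcm_48k_mono(path: str) -> np.ndarray:
    """Decode an audio file to mono int16 PCM at 48 kHz (sketch)."""
    resampler = av.AudioResampler(format="s16", layout="mono", rate=48000)
    parts = []
    with av.open(path) as container:
        for frame in container.decode(audio=0):
            for out in resampler.resample(frame):  # may yield 0..n frames
                parts.append(out.to_ndarray().reshape(-1))
    return np.concatenate(parts).astype(np.int16)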
Comprehensive Test Suite
plugins/getstream/tests/test_message_chunking.py, plugins/getstream/tests/test_stream_conversation.py, tests/test_conversation.py
Added extensive tests for message chunking logic, boundary handling, code-block preservation, and chunk metadata tracking. Added streaming integration tests with mocked GetStream channel. Migrated conversation tests to InMemoryConversation-centric async API validation.
Event Integration Tests
plugins/gemini/tests/test_gemini_llm.py, plugins/openai/tests/test_openai_llm.py, plugins/openai/tests/test_openai_realtime.py
Added test_events methods to verify streaming event emission, chunk/completion sequences, and item_id consistency. Added transcription event tests for user/agent speech tracking and multi-user participant scenarios.
Example Updates
examples/01_simple_agent_example/simple_agent_example.py, examples/02_golf_coach_example/golf_coach_example.py
Switched LLM provider from OpenAI to Gemini. Reduced Gemini Realtime FPS from 10 to 3. Converted single-line constructs to multi-line formatting.
Test Infrastructure
tests/test_realtime_base.py
Updated FakeRealtime.simple_audio_response signature to accept optional participant parameter for consistency with realtime implementations.
Configuration & Documentation
.cursor/rules/python.mdc
Added a metadata block to the Python rules markdown (description, globs, and an alwaysApply flag) covering how to run Python or tests. Includes a note on project tooling (use uv; avoid python -m).

Sequence Diagram

sequenceDiagram
    participant Agent
    participant LLMSetup as _setup_llm_events
    participant SpeechSetup as _setup_speech_events
    participant LLMProvider
    participant STTProvider
    participant Conversation
    
    Note over Agent: join() flow initialization
    Agent->>LLMSetup: await _setup_llm_events()
    LLMSetup->>LLMProvider: subscribe to LLM events
    
    Agent->>SpeechSetup: await _setup_speech_events()
    SpeechSetup->>STTProvider: subscribe to STT/transcription events
    
    Note over Agent: After MCP connection
    Agent->>Conversation: create/initialize
    
    par LLM Event Path
        LLMProvider->>LLMSetup: emit completion/delta events
        LLMSetup->>Conversation: upsert_message(role=assistant, content_index)
    and Speech Event Path
        STTProvider->>SpeechSetup: emit transcription events
        SpeechSetup->>Conversation: upsert_message(role=user, completed=True)
    end
    
    Conversation->>Conversation: buffer/accumulate content
    Conversation->>Conversation: _sync_to_backend() [async]
sequenceDiagram
    participant Client
    participant StreamConversation
    participant GetStreamChannel
    participant Backend
    
    Note over StreamConversation: Unified upsert_message API
    Client->>StreamConversation: upsert_message(content delta, content_index)
    
    activate StreamConversation
    StreamConversation->>StreamConversation: buffer fragment in ContentBuffer
    StreamConversation->>StreamConversation: accumulate content via _apply_pending()
    
    alt First time (not in backend)
        StreamConversation->>StreamConversation: _smart_chunk() split into chunks
        StreamConversation->>GetStreamChannel: _create_chunks() send all chunks
        GetStreamChannel->>Backend: send chunk messages with metadata
    else Subsequent updates
        StreamConversation->>StreamConversation: _update_chunks() compute new chunks
        GetStreamChannel->>Backend: create/update/delete chunks as needed
    end
    
    alt On completion (completed=True)
        StreamConversation->>StreamConversation: finalize all chunks
        GetStreamChannel->>Backend: mark generating=false, finalize state
    end
    deactivate StreamConversation

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR encompasses significant refactoring across multiple interconnected systems (conversation API, agent orchestration, LLM event handling, GetStream integration, realtime audio). While many changes follow consistent patterns (event type quote updates, participant parameter threading), the core logic changes are heterogeneous and demand separate reasoning: unified message buffering/upsert semantics, chunk lifecycle management with backend synchronization, multi-user participant tracking, and event-driven LLM/speech routing. The breadth of affected files (20+) and the density of business logic in agents.py, conversation.py, stream_conversation.py, and openai_realtime.py require careful validation of cross-domain interactions.

Possibly related PRs

Suggested reviewers

  • maxkahan
  • Nash0x7E2
  • d3xvn

Poem

Fragments converge on a pin-point—
each delta, each breath now one message,
the bell jar lifted at last.
Chunks marry Backend in ceremony;
participants named, transcripts whole:
a unification bell, tolling.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 59.26%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the PR title "New conversation API" accurately reflects the primary change: the fragmented streaming API is replaced with a unified, asynchronous messaging API, centered on the new Conversation base class with send_message and upsert_message methods in agents-core/vision_agents/core/agents/conversation.py. The title is concise, specific, and clearly communicates the main change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch option-c-unified-message-api

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/openai/tests/test_openai_llm.py (1)

41-43: Use pytest_asyncio.fixture for async fixtures
Switch the decorator and add the import.

+import pytest_asyncio
@@
-    @pytest.fixture
-    async def llm(self) -> OpenAILLM:
+    @pytest_asyncio.fixture
+    async def llm(self) -> OpenAILLM:
         llm = OpenAILLM(model="gpt-4o")
         return llm
agents-core/vision_agents/core/agents/agents.py (1)

329-351: Duplicate wildcard subscriptions → double event delivery

on_wildcard is registered twice for _coordinator_ws_client, causing duplicate events.

Apply:

-            connection._connection._coordinator_ws_client.on_wildcard(
-                "*", lambda event_name, event: self.events.send(event)
-            )
...
-            connection._connection._coordinator_ws_client.on_wildcard(
-                "*",
-                lambda event_name, event: self.events.send(event),
-            )
+            connection._connection._coordinator_ws_client.on_wildcard(
+                "*", lambda event_name, event: self.events.send(event)
+            )

Keep a single registration for coordinator and the one for _ws_client.

🧹 Nitpick comments (14)
agents-core/vision_agents/core/events/base.py (2)

117-118: Field name ‘user_id’ conflicts with BaseEvent.user_id; rename or drop

This dataclass field shadows the BaseEvent.user_id accessor. It creates inconsistent semantics across events (sometimes attribute, sometimes callable/property) and confuses type-checkers.

Consider renaming to avoid the clash, or rely on BaseEvent.user_id derived from user_metadata.

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    connection_user_id: Optional[str] = None

If you keep it, make BaseEvent.user_id a property (see previous comment) to standardize access patterns. Based on learnings.


41-41: Broaden user_metadata typing to interop with local Participant wrappers

Events elsewhere pass a local Participant dataclass (not the pb2 Participant). Current narrow typing can trigger false positives in type-checkers.

-    user_metadata: Optional[Participant] = None
+    user_metadata: Optional[Any] = None

Alternatively, introduce a Protocol with a user_id: Optional[str] attribute for stronger typing. Based on learnings.

agents-core/vision_agents/core/agents/conversation.py (2)

78-79: Lock held across I/O; confirm intended serialization

Holding self._lock while awaiting _sync_to_backend serializes all message ops and can increase latency. If ordering is not strictly required, consider narrowing the lock to state mutations only, and release before backend I/O.

Would you like a draft refactor that separates state updates (under lock) from backend sync (outside lock) with an immutable payload?
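For illustration, the shape such a split could take (a self-contained sketch with hypothetical names, not the module's actual code):

import asyncio

class LockSplitSketch:
    """Mutate state under the lock; do backend I/O after releasing it."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._content = ""

    async def upsert(self, delta: str) -> None:
        async with self._lock:
            self._content += delta       # in-memory mutation only
            payload = self._content      # immutable snapshot (str) to ship
        await self._sync_to_backend(payload)  # network I/O, lock released

    async def _sync_to_backend(self, payload: str) -> None:
        await asyncio.sleep(0)           # stand-in for the real backend call

The trade-off: backend writes may interleave, which is only safe when strict ordering is not required.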


22-25: Use timezone-aware timestamp

For consistency with other UTC-aware timestamps in the codebase, prefer an aware datetime.

-        self.timestamp = datetime.datetime.now()
+        self.timestamp = datetime.datetime.now(datetime.UTC)
agents-core/vision_agents/core/edge/sfu_events.py (1)

488-494: Consistent user_id accessor matches BaseEvent (post-refactor)

This property correctly guards for missing payload fields. Once BaseEvent.user_id is a property, the override becomes conventional and you can drop type: ignore comments in similar properties.

tests/test_conversation.py (1)

45-50: Optional: avoid asserting exact TypeError message text.

Message wording can vary by Python version; it’s enough to assert that TypeError is raised.

-        with pytest.raises(TypeError) as exc_info:
-            Conversation("instructions", [])
-        assert "Can't instantiate abstract class" in str(exc_info.value)
+        with pytest.raises(TypeError):
+            Conversation("instructions", [])
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)

105-112: Add a monotonic sequence_number across the entire stream (don’t reuse idx across phases).

idx resets in follow-up loops, which can confuse event consumers. Use LLMResponseChunkEvent.sequence_number to carry a single increasing counter across both loops; keep content_index for per-phase indexing if you like.

Apply this refactor:

@@
-        for idx, chunk in enumerate(iterator):
+        seq = 0
+        for idx, chunk in enumerate(iterator):
             response_chunk: GenerateContentResponse = chunk
             final_chunk = response_chunk
-            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx)
+            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx, seq)
+            seq += 1
@@
-                for idx, chk in enumerate(follow_up_iter):
+                for idx, chk in enumerate(follow_up_iter):
                     follow_up_last = chk
-                    # TODO: unclear if this is correct (item_id and idx)
-                    self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx)
+                    self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx, seq)
+                    seq += 1
@@
-    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int) -> Optional[LLMResponseEvent[Any]]:
+    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int, seq: int) -> Optional[LLMResponseEvent[Any]]:
@@
-            self.events.send(LLMResponseChunkEvent(
+            self.events.send(LLMResponseChunkEvent(
                 plugin_name="gemini",
                 content_index=idx,
                 item_id=item_id,
-                delta=chunk.text,
+                delta=chunk.text,
+                sequence_number=seq,
             ))

Item_id propagation and completion event look correct. Nice.

Also applies to: 153-157, 214-221

agents-core/vision_agents/core/agents/agents.py (2)

709-729: Rename: hasImageProcessers → has_image_processors

Spelling and PEP8. No behavior change.

Apply:

-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
...
-        if not hasImageProcessers:
+        if not has_image_processors:
...
-                    if hasImageProcessers:
+                    if has_image_processors:

196-240: Add Google-style docstrings to new setup hooks

Document event subscriptions, responsibilities, and side effects.

Example for _setup_llm_events:

async def _setup_llm_events(self):
    """Subscribe to LLM events and sync TTS + conversation.

    Subscribes:
      - LLMResponseCompletedEvent: sends TTS (if any) and replaces assistant message.
      - LLMResponseChunkEvent: appends/streams assistant deltas.

    Side effects:
      - Writes to self.conversation if available.
      - Calls self.tts.send() for non-empty completions.
    """

As per coding guidelines

Also applies to: 241-288

plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)

126-129: Avoid double-converting tools; _get_tools_for_provider already returns provider format

Pass through directly.

-        tools_spec = self._get_tools_for_provider()
-        if tools_spec:
-            kwargs["tools"] = self._convert_tools_to_provider_format(tools_spec)  # type: ignore[arg-type]
+        tools_spec = self._get_tools_for_provider()
+        if tools_spec:
+            kwargs["tools"] = tools_spec  # already provider-formatted

79-85: Type hint: participant should be Optional[Participant]

Minor typing polish.

-        participant: Participant = None,
+        participant: Optional[Participant] = None,
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)

85-103: API error check may never trigger; SDK raises on non-2xx

response.data.message.type == "error" is unlikely; the SDK usually raises exceptions. Consider removing or logging only.

plugins/getstream/tests/test_message_chunking.py (2)

138-144: Strengthen boundary test

One char over should split.

-        # Should be split (no newlines, so splits anywhere)
-        assert len(chunks) >= 1
+        # Should be split into at least 2 chunks
+        assert len(chunks) >= 2

161-167: Clarify multiple code blocks assertion

Directly assert presence of both blocks.

-        assert full_text.count('```python') == full_text.count('```') / 2 or '```python' in full_text
-        assert 'code1()' in full_text
-        assert 'code2()' in full_text
+        assert '```python' in full_text and '```javascript' in full_text
+        assert 'code1()' in full_text
+        assert 'code2()' in full_text
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 39a821d and 102a61e.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • .cursor/rules/python.mdc (1 hunks)
  • agents-core/pyproject.toml (1 hunks)
  • agents-core/vision_agents/core/agents/agents.py (19 hunks)
  • agents-core/vision_agents/core/agents/conversation.py (2 hunks)
  • agents-core/vision_agents/core/agents/events.py (2 hunks)
  • agents-core/vision_agents/core/edge/sfu_events.py (70 hunks)
  • agents-core/vision_agents/core/events/base.py (4 hunks)
  • agents-core/vision_agents/core/llm/events.py (9 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
  • examples/02_golf_coach_example/golf_coach_example.py (2 hunks)
  • plugins/cartesia/pyproject.toml (1 hunks)
  • plugins/gemini/tests/test_gemini_llm.py (2 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (6 hunks)
  • plugins/getstream/tests/test_message_chunking.py (1 hunks)
  • plugins/getstream/tests/test_stream_conversation.py (1 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)
  • plugins/openai/tests/test_openai_llm.py (3 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py (15 hunks)
  • tests/test_conversation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
  • examples/01_simple_agent_example/simple_agent_example.py
  • plugins/gemini/tests/test_gemini_llm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • agents-core/vision_agents/core/llm/events.py
  • agents-core/vision_agents/core/edge/sfu_events.py
  • agents-core/vision_agents/core/agents/events.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/openai/tests/test_openai_llm.py
  • agents-core/vision_agents/core/agents/conversation.py
  • tests/test_conversation.py
  • agents-core/vision_agents/core/events/base.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/getstream/tests/test_stream_conversation.py
  • examples/02_golf_coach_example/golf_coach_example.py
  • plugins/getstream/tests/test_message_chunking.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_conversation.py
🧬 Code graph analysis (17)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (2)
plugins/getstream/tests/test_message_chunking.py (2)
  • conversation (15-27)
  • conversation (244-251)
agents-core/vision_agents/core/agents/conversation.py (6)
  • Conversation (67-227)
  • Message (14-24)
  • MessageState (55-64)
  • _sync_to_backend (213-223)
  • _sync_to_backend (233-237)
  • send_message (80-120)
examples/01_simple_agent_example/simple_agent_example.py (4)
examples/02_golf_coach_example/golf_coach_example.py (1)
  • start_agent (12-38)
plugins/gemini/tests/test_gemini_llm.py (1)
  • llm (33-36)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (79-107)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • simple_response (88-99)
plugins/gemini/tests/test_gemini_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)
  • GeminiLLM (21-322)
  • send_message (75-182)
plugins/gemini/vision_agents/plugins/gemini/events.py (2)
  • GeminiResponseEvent (35-38)
  • GeminiErrorEvent (14-17)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (4)
agents-core/vision_agents/core/edge/sfu_events.py (7)
  • Participant (229-270)
  • participant (1495-1500)
  • participant (1537-1542)
  • participant (1610-1615)
  • participant (2078-2083)
  • participant (2127-2132)
  • name (2162-2166)
agents-core/vision_agents/core/llm/llm.py (4)
  • LLM (35-358)
  • LLMResponseEvent (25-28)
  • _dedup_and_execute (313-345)
  • _get_tools_for_provider (86-95)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
  • error_message (103-104)
plugins/openai/vision_agents/plugins/openai/events.py (2)
  • OpenAIStreamEvent (7-11)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (50-52)
agents-core/vision_agents/core/edge/sfu_events.py (1)
agents-core/vision_agents/core/events/base.py (2)
  • user_id (43-46)
  • BaseEvent (34-46)
agents-core/vision_agents/core/agents/events.py (2)
agents-core/vision_agents/core/edge/sfu_events.py (6)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
agents-core/vision_agents/core/events/base.py (2)
  • user_id (43-46)
  • PluginBaseEvent (50-52)
agents-core/vision_agents/core/agents/agents.py (7)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (7-22)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (10-25)
  • TurnEndedEvent (29-44)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
agents-core/vision_agents/core/mcp/mcp_manager.py (1)
  • MCPManager (9-147)
agents-core/vision_agents/core/events/manager.py (2)
  • subscribe (292-353)
  • send (406-449)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (25-28)
plugins/openai/tests/test_openai_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • OpenAILLM (31-509)
plugins/openai/vision_agents/plugins/openai/events.py (2)
  • OpenAIStreamEvent (7-11)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
agents-core/vision_agents/core/agents/conversation.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • _sync_to_backend (45-64)
tests/test_conversation.py (1)
agents-core/vision_agents/core/agents/conversation.py (3)
  • Message (14-24)
  • send_message (80-120)
  • upsert_message (122-210)
agents-core/vision_agents/core/events/base.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (7)
  • Participant (229-270)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
plugins/getstream/tests/test_stream_conversation.py (3)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
plugins/getstream/tests/test_message_chunking.py (3)
  • mock_channel (217-241)
  • conversation (15-27)
  • conversation (244-251)
agents-core/vision_agents/core/agents/conversation.py (2)
  • send_message (80-120)
  • upsert_message (122-210)
examples/02_golf_coach_example/golf_coach_example.py (3)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (79-107)
agents-core/vision_agents/core/agents/agents.py (1)
  • simple_response (168-180)
tests/test_realtime_base.py (1)
  • simple_response (238-252)
plugins/getstream/tests/test_message_chunking.py (2)
plugins/getstream/tests/test_stream_conversation.py (2)
  • stream_conversation (48-57)
  • mock_channel (24-45)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (3)
  • StreamConversation (17-353)
  • _smart_chunk (206-301)
  • _split_large_block (303-332)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
examples/02_golf_coach_example/golf_coach_example.py (3)

17-17: Verify 3 FPS provides adequate golf swing analysis.

The FPS reduction from 10 to 3 significantly decreases how often the LLM processes frames. A golf swing typically completes in 1–2 seconds, so at 3 FPS, only 3–6 frames are analyzed per swing. While this conserves cost (as the comment notes), it may miss critical swing phase details needed for meaningful feedback.

Please confirm this frame rate has been tested and still delivers useful coaching insights.


19-21: LGTM.

Multi-line formatting improves readability.


33-35: LGTM.

Multi-line formatting improves readability.

examples/01_simple_agent_example/simple_agent_example.py (4)

6-6: LGTM! Import aligns with LLM provider change.

The import change from openai to gemini is consistent with the LLM initialization below.


16-18: LGTM! Improved readability.

The multiline formatting of agent_user improves code readability without changing functionality.


25-27: LGTM! Improved readability.

The multiline formatting of TurnDetection improves code readability without changing functionality.


12-12: Model identifier is correct. The canonical Gemini 2.0 Flash ID is gemini-2.0-flash; the -exp suffix denotes the experimental variant and doesn’t apply here.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

226-231: Channel scoping refactor looks good

Using a local channel and passing it to StreamConversation removes unnecessary state and aligns with the chunked backend sync design.

plugins/gemini/tests/test_gemini_llm.py (1)

82-206: Strong integration coverage for streaming/event pipeline

Good assertions on chunk presence, completion, item_id consistency, and reconstruction.

Minor thought: if any provider reorders indices, consider asserting on sorted(content_indices) against range(max+1). Not blocking.

plugins/openai/tests/test_openai_llm.py (1)

101-228: Event-stream test is thorough and aligns with the new event model.

Good coverage: chunk vs completion correlation, item_id propagation, ordering, and OpenAI stream event types.

Please confirm OpenAILLM sets content_index as None or sequential; your test tolerates both, which is fine given current mapping notes.

agents-core/vision_agents/core/agents/events.py (1)

50-52: error_message property addition looks good.

Simple, safe accessor; no concerns.

agents-core/vision_agents/core/llm/events.py (1)

108-134: LGTM on event schema updates

item_id and delta additions align with upstream emitters; no issues spotted.

plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)

146-156: Verify SDK method and payload shape
Ensure ephemeral_message_update exists on the client and that its set payload accepts the chunk metadata at top level; if the SDK requires metadata under a nested custom key, adjust the payload accordingly.

requires-python = ">=3.10"
dependencies = [
-    "getstream[webrtc,telemetry]>=2.5.3",
+    "getstream[webrtc,telemetry]>=2.5.5",

⚠️ Potential issue | 🔴 Critical

Blocker: getstream>=2.5.5 is not published yet.

PyPI’s latest public wheel is 2.5.3; requesting 2.5.5 makes installs fail with “No matching distribution found.” Please keep the floor at 2.5.3 (or whichever released version supplies the needed features) until 2.5.5 actually ships.

-    "getstream[webrtc,telemetry]>=2.5.5",
+    "getstream[webrtc,telemetry]>=2.5.3",

Based on learnings

🤖 Prompt for AI Agents
In agents-core/pyproject.toml around line 24, the dependency currently requests
getstream[webrtc,telemetry]>=2.5.5 which isn't published on PyPI; change the
constraint to >=2.5.3 (or the newest released version that provides the required
features) — e.g. getstream[webrtc,telemetry]>=2.5.3 — then update the lockfile
(poetry lock / pip-compile / equivalent) and verify the needed features exist in
that released version before finalizing.

Comment on lines +190 to +202
            # Update content
            if content_index is not None:
                # Streaming: buffer fragments in order
                state.buffer.add_fragment(content_index, content)
                message.content = state.buffer.get_accumulated()
            elif replace:
                # Replace all content
                state.buffer.clear()
                message.content = content
            else:
                # Append to existing
                message.content += content


⚠️ Potential issue | 🔴 Critical

Fix content loss when mixing streaming (content_index) and append updates

Appending without updating the ContentBuffer means the next indexed-chunk write will overwrite and drop the appended text. Route all mutations through the buffer so it stays the single source of truth.

-            elif replace:
+            elif replace:
                 # Replace all content
                 state.buffer.clear()
                 message.content = content
             else:
-                # Append to existing
-                message.content += content
+                # Append to existing via buffer to preserve order with future indexed chunks
+                next_index = state.buffer.last_index + 1
+                if next_index < 0:
+                    next_index = 0
+                state.buffer.add_fragment(next_index, content)
+                message.content = state.buffer.get_accumulated()
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/conversation.py around lines 190-202,
the current logic appends text directly to message.content when content_index is
None and replace is False, which causes appended text to be lost when later
indexed fragments arrive; instead, always mutate the ContentBuffer as the single
source of truth: for replace, clear the buffer and (if appropriate) add the
replacement content as a base fragment; for append, add the incoming content to
the buffer via the buffer's fragment/append API (or add_fragment with a
sentinel/next index), and after every branch set message.content =
state.buffer.get_accumulated() so message.content is always derived from the
buffer.

Comment on lines +12 to 13
    user_id: Optional[str] = None  # type: ignore[assignment]
    metadata: Optional[Dict[str, Any]] = None

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Do not shadow BaseEvent.user_id method; this breaks polymorphism at runtime.

Declaring a user_id field overrides the BaseEvent.user_id method. Any code calling event.user_id() will crash. Rename the field and, if needed, override the method explicitly.

Apply this diff to avoid shadowing:

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None

Then add method overrides to preserve the original API shape:

# Add inside each class
def user_id(self) -> Optional[str]:  # type: ignore[override]
    # Prefer explicit field if set, else fallback to BaseEvent behavior
    return self.actor_user_id or super().user_id()

Check for method-style calls that would be broken:

Also applies to: 26-27, 36-37, 47-48


🏁 Script executed:

#!/bin/bash
rg -nP --type=py -C2 '\.user_id\s*\('

Length of output: 825


Avoid shadowing BaseEvent.user_id method
Rename the user_id attribute in each subclass (e.g. to actor_user_id) and override user_id() to preserve the original API:

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
def user_id(self) -> Optional[str]:  # type: ignore[override]
    return self.actor_user_id or super().user_id()

This prevents runtime errors in calls to event.user_id().

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/events.py around lines 12 to 13, the
subclasses define a user_id attribute which shadows the BaseEvent.user_id
method; rename the attribute in each subclass (for example to actor_user_id) and
add an overriding user_id(self) -> Optional[str] method that returns the new
attribute or falls back to super().user_id() (use type ignore override if
needed), so calls to event.user_id() continue to work and avoid runtime errors.

Comment on lines +43 to +46
    def user_id(self) -> Optional[str]:
        if self.user_metadata is None:
            return None
        return getattr(self.user_metadata, "user_id")

🛠️ Refactor suggestion | 🟠 Major

Make user_id a property and guard for missing attribute

Unify the API shape with other event classes and avoid method/field clashes. Also prevent AttributeError if user_metadata lacks user_id.

-    def user_id(self) -> Optional[str]:
-        if self.user_metadata is None:
-            return None
-        return getattr(self.user_metadata, "user_id")
+    @property
+    def user_id(self) -> Optional[str]:
+        if self.user_metadata is None:
+            return None
+        return getattr(self.user_metadata, "user_id", None)
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/events/base.py around lines 43 to 46, change
the user_id method into a @property that returns Optional[str] and guard against
a missing attribute by using getattr(self.user_metadata, "user_id", None) (or
equivalent) instead of getattr(...) without a default; keep the existing None
check for user_metadata and update any type hints or docstrings to reflect that
user_id is a property rather than a method.

import pytest
import uuid
import asyncio
from unittest.mock import Mock, AsyncMock

⚠️ Potential issue | 🟠 Major

Replace mocks with simple fakes (tests must not use mocking).

Per test guidelines, avoid unittest.mock. Implement minimal fake Channel/Client classes with the needed async methods and state tracking instead of Mock/AsyncMock.

Apply these changes:

  • Remove the mock imports.
  • Rewrite the fixture to return fakes.
-from unittest.mock import Mock, AsyncMock
@@
-    def mock_channel(self):
-        """Create a mock Channel."""
-        channel = Mock()
-        channel.channel_type = "messaging"
-        channel.channel_id = "test-channel-123"
-        
-        # Mock the client
-        channel.client = Mock()
-        
-        # Create async mocks for client methods
-        channel.client.update_message_partial = AsyncMock(return_value=Mock())
-        channel.client.ephemeral_message_update = AsyncMock(return_value=Mock())
-        
-        # Mock send_message response
-        mock_response = Mock()
-        mock_response.data.message.id = "stream-message-123"
-        mock_response.data.message.type = "regular"
-        
-        # Create async mock for send_message
-        channel.send_message = AsyncMock(return_value=mock_response)
-        
-        return channel
+    def mock_channel(self):
+        """Create a fake Channel (no unittest.mock)."""
+        class FakeResponseMsg:
+            def __init__(self, mid: str):
+                self.id = mid
+                self.type = "regular"
+                self.custom = {}
+                self.text = ""
+        class FakeResponseData:
+            def __init__(self, mid: str):
+                self.message = FakeResponseMsg(mid)
+        class FakeResponse:
+            def __init__(self, mid: str):
+                self.data = FakeResponseData(mid)
+        class FakeClient:
+            async def update_message_partial(self, *_args, **_kwargs):
+                return object()
+            async def ephemeral_message_update(self, *_args, **_kwargs):
+                return object()
+            async def delete_message(self, *_args, **_kwargs):
+                return object()
+        class FakeChannel:
+            def __init__(self):
+                self.channel_type = "messaging"
+                self.channel_id = "test-channel-123"
+                self.client = FakeClient()
+                self._send_count = 0
+            async def send_message(self, request):
+                mid = f"stream-message-{self._send_count}"
+                self._send_count += 1
+                return FakeResponse(mid)
+        return FakeChannel()

Adjust assertions that check Mock call methods to validate behavior (e.g., maintain counters or flags on the fake).

I can provide tailored fakes and assertion updates if you confirm required behaviors.

Also applies to: 23-45

🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 7 and 23-45,
the test currently imports and uses unittest.mock.Mock/AsyncMock; replace those
with small hand-written fake classes: remove the mock imports, implement a
FakeChannel and FakeClient inside the test module that expose the async methods
used by the tests (e.g., async send, async get or similar) and maintain simple
state counters/flags (call_count, last_args, or a list of messages) so
assertions can inspect behavior; update the fixture to return instances of these
fakes instead of mocks, and replace any Mock-specific assertions (like
assert_awaited or call_args) with checks against the fake's counters/flags or
stored data to validate expected interactions.

Comment on lines +59 to +67
    @pytest.mark.asyncio
    async def test_send_message_simple(self, stream_conversation, mock_channel):
        """Test send_message convenience method."""
        await stream_conversation.send_message(
            role="user",
            user_id="user123",
            content="Hello",
        )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Remove @pytest.mark.asyncio across this file.

Async support is automatic in this repo; decorators are prohibited.

Example:

-@pytest.mark.asyncio
 async def test_send_message_simple(...):

Find all occurrences:

Also applies to: 80-84, 83-106, 107-142, 143-183, 184-207, 208-234, 236-615


🏁 Script executed:

#!/bin/bash
rg -n '@pytest\.mark\.asyncio' plugins/getstream/tests/test_stream_conversation.py

Length of output: 371


Remove all @pytest.mark.asyncio decorators. Async support is automatic; these decorators are prohibited.

🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 59 to 67,
remove the prohibited @pytest.mark.asyncio decorator(s) from the async test
function(s); leave the async def test_send_message_simple(...) signature intact
(async support is automatic), and delete any corresponding import of
pytest.mark.asyncio if it exists so no unused decorator remains.

Comment on lines +129 to +139
                    await self.channel.client.update_message_partial(
                        chunk_id,
                        user_id=message.user_id,
                        set={
                            "text": chunk_text,
                            "generating": False if is_last_chunk else False,
                            "chunk_group": message.id,
                            "chunk_index": i,
                            "total_chunks": new_chunk_count,
                        },
                    )

⚠️ Potential issue | 🟠 Major

Partial update should set custom.* fields under custom, not at the message root

Current code writes generating/chunk_* at message root; creation wrote them under custom. Align shapes.

-                    await self.channel.client.update_message_partial(
-                        chunk_id,
-                        user_id=message.user_id,
-                        set={
-                            "text": chunk_text,
-                            "generating": False if is_last_chunk else False,
-                            "chunk_group": message.id,
-                            "chunk_index": i,
-                            "total_chunks": new_chunk_count,
-                        },
-                    )
+                    await self.channel.client.update_message_partial(
+                        chunk_id,
+                        user_id=message.user_id,
+                        set={
+                            "text": chunk_text,
+                            "custom": {
+                                "generating": False,
+                                "chunk_group": message.id,
+                                "chunk_index": i,
+                                "total_chunks": new_chunk_count,
+                            },
+                        },
+                    )
🤖 Prompt for AI Agents
In plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
around lines 129 to 139, the partial update writes generating and chunk_* fields
at the message root whereas message creation stores them under the custom
object; change the update payload so "text" remains at root but the generating,
chunk_group, chunk_index and total_chunks fields are nested inside a "custom"
dict (i.e., set={"text": chunk_text, "custom": {"generating": <bool>,
"chunk_group": message.id, "chunk_index": i, "total_chunks": new_chunk_count}}),
keeping the same boolean logic for generating.

Comment on lines +197 to +203
            self.events.send(
                LLMResponseCompletedEvent(
                    item_id=llm_response.original.output[0].id,
                    original=llm_response.original,
                    text=llm_response.text,
                )
            )

⚠️ Potential issue | 🟡 Minor

Guard item_id extraction for non-streaming responses

Responses without text outputs (e.g., only tool calls) can leave output empty.

-            self.events.send(
-                LLMResponseCompletedEvent(
-                    item_id=llm_response.original.output[0].id,
-                    original=llm_response.original,
-                    text=llm_response.text,
-                )
-            )
+            self.events.send(
+                LLMResponseCompletedEvent(
+                    item_id=(
+                        llm_response.original.output[0].id
+                        if getattr(llm_response.original, "output", None)
+                        and len(llm_response.original.output) > 0
+                        else None
+                    ),
+                    original=llm_response.original,
+                    text=llm_response.text,
+                )
+            )
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_llm.py around lines 197 to
203, the code assumes llm_response.original.output[0] exists which will raise
when output is empty (e.g., non-text/tool-only responses); guard the extraction
by checking that llm_response.original.output is a non-empty list before
accessing [0] and use a safe fallback (for example None or
llm_response.original.id) for item_id when output is empty, then send the
LLMResponseCompletedEvent using that guarded item_id so the event is still
emitted without raising.

Comment on lines +80 to 84
    @pytest.mark.asyncio
    async def test_send_message(self, conversation):
        """Test send_message creates a new message."""
        message = await conversation.send_message(
            role="user",

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Remove @pytest.mark.asyncio decorators (auto async support is enabled).

Per project guidelines, drop these markers; keep tests as async def only.

Representative change:

-    @pytest.mark.asyncio
     async def test_send_message(self, conversation):

List occurrences to update:

Also applies to: 95-103, 122-131, 135-146, 179-188, 210-219, 248-260


🏁 Script executed:

#!/bin/bash
rg -n --type=py '@pytest\.mark\.asyncio' tests | sed -n '1,120p'

Length of output: 4278


Remove all @pytest.mark.asyncio decorators
Async support is automatic; retain only async def on test functions.

Representative change:

-    @pytest.mark.asyncio
     async def test_send_message(self, conversation):

To remove everywhere:

rg -l '@pytest\.mark\.asyncio' tests | xargs sed -i '/@pytest.mark.asyncio/d'
🤖 Prompt for AI Agents
In tests/test_conversation.py around lines 80 to 84, remove the
@pytest.mark.asyncio decorator above the async test function test_send_message;
keep the async def signature only. Search for and delete any occurrences of
@pytest.mark.asyncio in this file (and other tests if applicable) so tests rely
on pytest's builtin asyncio support, ensuring indentation and spacing remain
correct after removal.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)

16-20: Unify PcmData type to the core definition to avoid drift.

This module imports PcmData from RTC utils while the rest of the repo uses vision_agents.core.edge.types.PcmData. Unify to the core type; RTC layer only needs .samples/.sample_rate.

-from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import PcmData
@@
-    async def simple_audio_response(
-        self, audio: PcmData, participant: Optional[Participant] = None
-    ):
+    async def simple_audio_response(
+        self, audio: PcmData, participant: Optional[Participant] = None
+    ):

(Optional) Adjust RTCManager.send_audio_pcm typing if needed; runtime behavior remains identical.

Also applies to: 121-136

♻️ Duplicate comments (1)
agents-core/vision_agents/core/agents/agents.py (1)

215-224: Delta upserts will fan‑out messages when item_id is missing — add a stable fallback map and clean up on completion.

Several providers omit item_id on chunks/completions; your code passes None, creating a new message per delta. Maintain a per‑response key→message_id map (e.g., by (output_index, content_index)), use it for chunks, then finalize and delete on completion.

Apply:

@@
-            await self.conversation.upsert_message(
-                message_id=event.item_id,
+            # Use a stable id for this response: prefer item_id, else map by indexes
+            key = event.item_id or f"{event.output_index or 0}:{event.content_index or 0}"
+            if not hasattr(self, "_llm_stream_msg_ids"):
+                self._llm_stream_msg_ids = {}
+            msg_id = self._llm_stream_msg_ids.get(key)
+            if msg_id is None:
+                msg_id = str(uuid.uuid4())
+                self._llm_stream_msg_ids[key] = msg_id
+            await self.conversation.upsert_message(
+                message_id=msg_id,
                 role="assistant",
                 user_id=self.agent_user.id or "agent",
                 content=event.delta or "",
                 content_index=event.content_index,
                 completed=False,  # Still streaming
             )
@@
-            await self.conversation.upsert_message(
-                message_id=event.item_id,
+            key = event.item_id or f"{getattr(event, 'output_index', 0) or 0}:{getattr(event, 'content_index', 0) or 0}"
+            msg_id = getattr(self, "_llm_stream_msg_ids", {}).get(key) or event.item_id
+            await self.conversation.upsert_message(
+                message_id=msg_id,
                 role="assistant",
                 user_id=self.agent_user.id or "agent",
                 content=event.text or "",
                 completed=True,
                 replace=True,  # Replace any partial content from deltas
             )
+            # Cleanup mapping for this response
+            getattr(self, "_llm_stream_msg_ids", {}).pop(key, None)

Also applies to: 226-245

🧹 Nitpick comments (16)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)

664-665: FIXME comment indicates unresolved typing issue.

The event dictionary typing could be improved with a TypedDict or protocol.

Would you like me to generate a TypedDict definition for the OpenAI Realtime API events, or open an issue to track this?
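
If the TypedDict route is taken, a rough sketch could look like this (field names inferred from usage in this handler; the real Realtime API schema has more variants, so treat this as a starting point, not the definitive shape):

from typing import Any, TypedDict

class RealtimeEvent(TypedDict, total=False):
    """Loose shape of the event dicts handled here (sketch only)."""

    type: str                 # present on every event in practice
    session: dict[str, Any]   # session.created / session.updated payload
    item_id: str              # conversation item correlation id
    transcript: str           # transcription payloads
    response: dict[str, Any]  # response.* payloads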


711-736: Consider reducing log verbosity for frame-by-frame details.

Per-frame logging at INFO level (lines 714-716, 731-736) could be excessive in production. Consider DEBUG level for frame-by-frame details while keeping INFO for summaries.

Example adjustment:

-            logger.info(
-                f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
-            )
-            logger.info(
-                f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
-            )
+            logger.info(
+                f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
+            )
+            logger.debug(
+                f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
+            )

And:

                    # Log frame details
-                    logger.info(
-                        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
-                    )
-                    logger.info(
-                        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
-                    )
+                    logger.debug(
+                        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
+                    )
+                    logger.debug(
+                        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
+                    )
agents-core/vision_agents/core/agents/agents.py (6)

276-285: Realtime transcript upserts: replace=True with random IDs is redundant.

You already generate a new UUID per transcript. Remove replace=True to avoid confusion; reserve it for true replacements.

-                completed=True,
-                replace=True,
+                completed=True,

(and same change in the agent transcript block)

Also applies to: 295-303


684-687: Background task lacks error hook — attach callback to avoid swallowed exceptions.

-                asyncio.create_task(
-                    self.llm.simple_audio_response(pcm_data, participant)
-                )
+                task = asyncio.create_task(
+                    self.llm.simple_audio_response(pcm_data, participant)
+                )
+                task.add_done_callback(_log_task_exception)
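
The _log_task_exception callback referenced above is not defined in the diff; a minimal version (name and placement are assumptions) might be:

import asyncio
import logging

logger = logging.getLogger(__name__)

def _log_task_exception(task: asyncio.Task) -> None:
    """Done-callback that surfaces exceptions from fire-and-forget tasks."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        logger.error("Background task failed: %r", exc, exc_info=exc)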

249-261: Reduce PII in logs — truncate transcript text.

Full transcripts in INFO can leak user data. Log first N chars or switch to DEBUG.

-            self.logger.info(f"🎤 [Transcript]: {event.text}")
+            self.logger.debug("🎤 [Transcript]: %s", (event.text or "")[:120])
@@
-            self.logger.info(f"🎤 [User transcript]: {event.text}")
+            self.logger.debug("🎤 [User transcript]: %s", (event.text or "")[:120])
@@
-            self.logger.info(f"🎤 [Agent transcript]: {event.text}")
+            self.logger.debug("🎤 [Agent transcript]: %s", (event.text or "")[:120])

Also applies to: 271-303


344-351: Early events can be lost before conversation is set — buffer or await creation earlier.

Handlers bail when self.conversation is None. Either await create_conversation_coro before wiring LLM/STT or buffer upserts then drain once set.

Would you prefer I draft a small ring buffer (deque maxlen=N) to queue upserts until conversation is ready?

Also applies to: 404-406
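
One possible shape for that buffer, assuming upserts are captured as upsert_message kwargs and the attribute is created in Agent.__init__ (names are hypothetical):

from collections import deque

# in Agent.__init__:
#     self._pending_upserts: deque = deque(maxlen=256)

async def _upsert_or_buffer(self, **kwargs) -> None:
    """Queue upserts until the conversation exists, then drain in arrival order."""
    if self.conversation is None:
        self._pending_upserts.append(kwargs)
        return
    while self._pending_upserts:
        await self.conversation.upsert_message(**self._pending_upserts.popleft())
    await self.conversation.upsert_message(**kwargs)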


450-458: Graceful shutdown: await cancelled tasks to drain and suppress CancelledError.

-        for _, track in self._track_tasks.items():
-            track.cancel()
+        for _, task in self._track_tasks.items():
+            task.cancel()
+        if self._track_tasks:
+            await asyncio.gather(*self._track_tasks.values(), return_exceptions=True)
+        self._track_tasks.clear()
@@
-        if self._interval_task:
-            self._interval_task.cancel()
+        if self._interval_task:
+            self._interval_task.cancel()
+            try:
+                await self._interval_task
+            except asyncio.CancelledError:
+                pass

Also applies to: 464-471, 496-499


754-756: Typo and readability: hasImageProcessers → has_image_processors.

-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
@@
-        if not hasImageProcessers:
+        if not has_image_processors:
@@
-                    if hasImageProcessers:
+                    if has_image_processors:

Also applies to: 768-773, 789-799

tests/test_realtime_base.py (1)

88-171: Unskip and update to the unified Conversation API.

The skipped test still targets legacy methods (partial_update_message, finish_last_message). Rework it to assert against await conversation.upsert_message(...) calls.

I can provide a minimal FakeConversation.upsert_message that records calls and simulates chunking semantics.
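
A minimal sketch of that fake (chunking semantics reduced to append-vs-replace):

class FakeConversation:
    """Records upsert calls; deltas append, replace=True overwrites."""

    def __init__(self) -> None:
        self.calls: list = []
        self.messages: dict = {}

    async def upsert_message(
        self, message_id, role, user_id, content,
        completed=False, replace=False, **kwargs,
    ) -> None:
        self.calls.append(
            {"message_id": message_id, "role": role,
             "content": content, "completed": completed}
        )
        if replace or message_id not in self.messages:
            self.messages[message_id] = content
        else:
            self.messages[message_id] += content  # streaming delta append

Tests can then assert on fake.calls ordering and on the final fake.messages content.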

plugins/openai/tests/test_openai_realtime.py (1)

113-173: Event assertions look solid; consider awaiting event queue instead of fixed sleeps.

Replace await asyncio.sleep(0.1) with await realtime.events.wait(timeout=1.0) to reduce flakiness.

agents-core/vision_agents/core/llm/realtime.py (1)

185-205: Add short Google‑style docstrings to new emitters.

Improves discoverability for downstream handlers.

Example:

def _emit_user_speech_transcription(self, text: str, original=None):
    """Emit user speech transcript.

    Args:
        text: Transcript text.
        original: Provider-specific raw event.
    """
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3)

194-233: Participant map can leak if completion never arrives — add bounds/cleanup.

If conversation.item.created fires without a matching transcription, _item_to_participant retains entries.

  • Clear map on close()/disconnect.
  • Bound size (e.g., max 100) and evict oldest.
  • Add periodic cleanup (TTL) if feasible.
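
A sketch of the size bound from the second point, assuming _item_to_participant is created as an OrderedDict in __init__ (helper name is hypothetical):

from collections import OrderedDict

MAX_TRACKED_ITEMS = 100  # cap mirrors the suggestion above

def _remember_participant(self, item_id: str, participant) -> None:
    """Map an item to its participant, evicting the oldest entries past the cap."""
    self._item_to_participant[item_id] = participant
    while len(self._item_to_participant) > MAX_TRACKED_ITEMS:
        self._item_to_participant.popitem(last=False)  # drop oldest insertion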

301-304: Tool‑call logs may include sensitive args — lower to DEBUG or redact.

-            logger.info(
+            logger.debug(
                 f"Executing tool call: {tool_call['name']} with args: {tool_call['arguments_json']}"
             )
@@
-                logger.info(f"Tool call {tool_call['name']} succeeded: {response_data}")
+                logger.debug("Tool call %s succeeded", tool_call["name"])

Also applies to: 309-319


333-374: Tool response path looks correct; small polish on JSON conversion and follow‑up trigger.

Consider explicitly limiting response_str length or switching to JSON object if API supports it.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)

208-237: LGTM - Defensive conditional checks.

The expanded multi-line conditionals make None-checks explicit and improve code safety. While more verbose, this prevents potential AttributeError exceptions when accessing nested attributes.


299-302: Consider enriching LLMResponseChunkEvent with additional metadata.

The LLMResponseChunkEvent includes optional fields like item_id, output_index, and sequence_number that could provide richer context for event consumers. If server_message contains relevant identifiers or ordering information, consider populating these fields for consistency with OpenAI realtime events.
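
An enriched emission could look like this (how the identifiers are derived from server_message is an assumption; adjust to whatever Gemini actually exposes):

import uuid

item_id = str(uuid.uuid4())  # one stable id per response, minted when the stream opens
for seq, text in enumerate(text_parts):  # 'text_parts' stands in for the streamed deltas
    self.events.send(
        LLMResponseChunkEvent(
            delta=text,
            item_id=item_id,
            content_index=0,
            sequence_number=seq,
        )
    )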

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 102a61e and 6fd13c6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • agents-core/vision_agents/core/agents/agents.py (21 hunks)
  • agents-core/vision_agents/core/llm/events.py (8 hunks)
  • agents-core/vision_agents/core/llm/realtime.py (5 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (19 hunks)
  • plugins/openai/pyproject.toml (1 hunks)
  • plugins/openai/tests/test_openai_realtime.py (4 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py (13 hunks)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (23 hunks)
  • tests/test_realtime_base.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/01_simple_agent_example/simple_agent_example.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_realtime_base.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • tests/test_realtime_base.py
  • agents-core/vision_agents/core/llm/realtime.py
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/openai/tests/test_openai_realtime.py
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
  • agents-core/vision_agents/core/llm/events.py
🧬 Code graph analysis (8)
tests/test_realtime_base.py (4)
agents-core/vision_agents/core/edge/edge_transport.py (1)
  • EdgeTransport (18-59)
agents-core/vision_agents/core/edge/types.py (2)
  • User (15-18)
  • Connection (27-34)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
agents-core/vision_agents/core/agents/agents.py (1)
  • Agent (56-1045)
agents-core/vision_agents/core/llm/realtime.py (4)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • simple_audio_response (130-156)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
  • simple_audio_response (121-136)
agents-core/vision_agents/core/llm/events.py (2)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5)
  • RTCManager (253-796)
  • connect (285-308)
  • start_video_sender (477-538)
  • stop_video_sender (540-582)
  • _send_event (447-475)
agents-core/vision_agents/core/llm/realtime.py (6)
  • connect (68-68)
  • _emit_connected_event (83-97)
  • simple_audio_response (71-73)
  • _emit_agent_speech_transcription (196-204)
  • _emit_response_event (138-156)
  • _emit_user_speech_transcription (185-194)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/llm/llm.py (2)
  • _run_one_tool (226-292)
  • get_available_functions (180-182)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
agents-core/vision_agents/core/agents/agents.py (8)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseChunkEvent (90-105)
  • LLMResponseCompletedEvent (109-115)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/llm/realtime.py (3)
  • Realtime (24-220)
  • simple_audio_response (71-73)
  • _watch_video_track (75-77)
agents-core/vision_agents/core/mcp/mcp_manager.py (1)
  • MCPManager (9-147)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (7-22)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
agents-core/vision_agents/core/events/manager.py (3)
  • register_events_from_module (214-247)
  • subscribe (292-353)
  • send (406-449)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/openai/tests/test_openai_realtime.py (4)
agents-core/vision_agents/core/llm/events.py (3)
  • RealtimeAudioOutputEvent (37-45)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/agents/agents.py (2)
  • subscribe (187-199)
  • simple_response (173-185)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4)
  • connect (72-98)
  • simple_response (100-119)
  • _handle_openai_event (152-244)
  • simple_audio_response (121-136)
agents-core/vision_agents/core/edge/types.py (2)
  • PcmData (37-183)
  • resample (129-183)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • RealtimeAudioOutputEvent (37-45)
  • LLMResponseChunkEvent (90-105)
agents-core/vision_agents/core/llm/realtime.py (2)
  • _emit_user_speech_transcription (185-194)
  • _emit_agent_speech_transcription (196-204)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • start_event_consumer (104-188)
  • start (31-39)
agents-core/vision_agents/core/llm/llm.py (1)
  • _run_one_tool (226-292)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (50-52)
  • AudioFormat (23-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (19)
plugins/openai/pyproject.toml (1)

14-14: Consider tightening the version constraint to prevent unexpected major version upgrades.

The constraint >=2.5.0 permits any future major version (e.g., 3.0.0), which could introduce breaking changes. Since OpenAI's Python client follows SemVer with potential backwards-incompatible changes in minor versions, a safer constraint would be >=2.5.0,<3.0.0 to ensure stability.

Additionally, verify that the PR changes (e.g., new OpenAI realtime event types like RealtimeUserSpeechTranscriptionEvent) actually require 2.5.0 specifically, or if an earlier 2.x version would suffice. The latest openai package version is 2.5.0 (released Oct 17, 2025), but confirming the minimum required version ensures clarity for future maintainers.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (11)

45-50: LGTM: Clean type hints added.

The explicit type annotations improve clarity.


96-132: LGTM: Improved formatting and logging.

The multiline formatting and enhanced log messages improve readability.


136-149: LGTM: Formatting improvements.

The multiline parameter formatting enhances code readability.


161-228: LGTM: Enhanced logging formatting.

The multiline log statements improve readability without changing behavior.


349-362: LGTM: Session configuration enhancement.

Enabling semantic VAD and input audio transcription aligns with the PR objectives for enriched event payloads.


326-328: LGTM: Consistent multiline formatting.

The multiline formatting for async calls improves readability.

Also applies to: 457-459, 606-608, 627-630


424-475: LGTM: Logging formatting improvements.

The multiline log statements are more readable.


477-538: LGTM: Formatting and logging improvements.

The multiline formatting enhances code organization and readability.


540-582: LGTM: Spacing and logging improvements.

The additional spacing and multiline logs improve code structure.


679-681: LGTM: Logging formatting improvement.

The multiline formatting is more readable.


782-782: LGTM: Spacing improvement.

plugins/openai/tests/test_openai_realtime.py (1)

64-77: Nice resampling harness and typed PcmData construction.

This aligns with the core PcmData API and exercises realistic paths. LGTM.

agents-core/vision_agents/core/llm/realtime.py (1)

70-74: API change acknowledged: simple_audio_response(..., participant=None)

Signature aligns with downstream usage (Agent/OpenAI/Gemini). LGTM.

Please confirm all concrete implementations were updated (OpenAI, Gemini) and no third-party LLMs remain on the old signature.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)

9-24: LGTM - Improved type coverage.

The expanded imports from google.genai.types provide better type annotations throughout the module.


239-264: LGTM - Transcription events properly emitted.

The user and agent speech transcription events are correctly emitted using the base class methods. The participant context set in simple_audio_response will be included in user transcription events via user_metadata=self._current_participant in the base class implementation.


130-157: No changes needed—attribute is properly initialized in base class.

The verification confirms that _current_participant is initialized in the base class (agents-core/vision_agents/core/llm/realtime.py) as self._current_participant: Optional[Participant] = None. The assignment in simple_audio_response (line 149) safely updates an already-initialized attribute; no AttributeError risk exists.

Likely an incorrect or invalid review comment.

agents-core/vision_agents/core/llm/events.py (2)

11-87: LGTM - String literal formatting and field enrichment.

The change from single-quoted to double-quoted string literals in type field defaults is a stylistic consistency improvement. The new fields added to events (is_complete, conversation_item_id, status, role, content, is_recoverable, delta, item_id) are properly typed and provide richer event payloads for consumers.


141-156: LGTM - New transcription events well-structured.

The new RealtimeUserSpeechTranscriptionEvent and RealtimeAgentSpeechTranscriptionEvent classes follow consistent patterns with existing events. The separation between user and agent transcriptions provides clear event semantics for consumers. Docstrings are concise and conform to the Google style guide requirement.

Comment on lines 254 to 265
            user_id = event.user_id() or "user"

            await self.conversation.upsert_message(
                # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
                message_id="stt-" + user_id,
                role="user",
                user_id=user_id,
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial transcripts
                original=event,
            )

⚠️ Potential issue | 🟠 Major

STT upserts overwrite each other across utterances — use per‑utterance IDs (drop replace unless streaming).

Static message_id="stt-"+user_id with replace=True collapses all user speech into one message.

-            user_id = event.user_id() or "user"
+            user_id = event.user_id() or "user"
@@
-                # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
-                message_id="stt-" + user_id,
+                message_id=f"stt-{user_id}-{uuid.uuid4()}",
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                # replace only if you're purposely reusing the same message_id for streaming
+                replace=False,
                 original=event,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 254 to 265, the
STT upsert is using a static message_id ("stt-"+user_id) and replace=True which
causes all utterances to overwrite one another; change to generate a unique
per-utterance message_id (for example include event.id, a timestamp, or an
incrementing counter) and remove replace=True unless you intentionally want
streaming/partial-replacement semantics; ensure completed is set appropriately
per-utterance and retain original=event.

Comment on lines +113 to +118
    async def simple_response(
        self,
        text: str,
        processors: Optional[List[Processor]] = None,
        participant: Optional[Participant] = None,
    ):

⚠️ Potential issue | 🟡 Minor

Unused parameter: participant is not utilized.

The participant parameter is added to simple_response but never used or stored, unlike simple_audio_response which sets self._current_participant. If participant tracking is needed for text inputs, the parameter should be stored; otherwise, consider removing it to avoid API confusion.

Apply this diff if participant tracking is intended:

     async def simple_response(
         self,
         text: str,
         processors: Optional[List[Processor]] = None,
         participant: Optional[Participant] = None,
     ):
         """
         Simple response standardizes how to send a text instruction to this LLM.
 
         Example:
             llm.simple_response("tell me a poem about Boulder")
 
         For more advanced use cases you can use the native send_realtime_input
         """
+        self._current_participant = participant
         self.logger.info("Simple response called with text: %s", text)
         await self.send_realtime_input(text=text)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py around lines
113 to 118, the new parameter `participant` on simple_response is unused which
causes API confusion; either persist it like simple_audio_response or remove it.
If participant tracking is intended, set self._current_participant = participant
at the start of simple_response (same behavior as simple_audio_response) and
ensure any type checks/uses downstream rely on that field; otherwise delete the
participant parameter from the method signature and update any callers to avoid
passing it.

if event.get("type") == "session.created" and "session" in event:
self.session_info = event["session"]
logger.debug(f"Stored session info: {self.session_info}")
logger.error(f"Stored session info: {self.session_info}")

⚠️ Potential issue | 🟡 Minor

Incorrect log level for informational message.

This logs successful session storage at ERROR level, which will pollute error logs and trigger false alarms in monitoring.

Apply this diff:

-            logger.error(f"Stored session info: {self.session_info}")
+            logger.info(f"Stored session info: {self.session_info}")
📝 Committable suggestion


Suggested change
logger.error(f"Stored session info: {self.session_info}")
logger.info(f"Stored session info: {self.session_info}")
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 667,
the code logs a successful session storage using logger.error which is the wrong
severity; change the log level to an informational one (e.g., logger.info or
logger.debug) and update the message if needed to reflect normal operation so
successful session info is not treated as an error.

Comment on lines 89 to 90
@pytest.mark.asyncio
async def test_agent_conversation_updates_with_realtime():

⚠️ Potential issue | 🟠 Major

Remove @pytest.mark.asyncio — async support is automatic per repo guidelines.

Replace with plain async def tests.

-@pytest.mark.asyncio
 async def test_agent_conversation_updates_with_realtime():
@@
-@pytest.mark.asyncio
 async def test_simple_response_aggregates_and_returns_realtimeresponse():
@@
-@pytest.mark.asyncio
 async def test_wait_until_ready_returns_true_immediately():
@@
-@pytest.mark.asyncio
 async def test_close_emits_disconnected_event():
@@
-@pytest.mark.asyncio
 async def test_noop_video_and_playback_methods_do_not_error():

As per coding guidelines.

Also applies to: 266-266, 291-291, 297-297, 317-317, 365-365

🤖 Prompt for AI Agents
In tests/test_realtime_base.py around lines 89-90 (also at lines 266, 291, 297,
317, and 365), remove the leading @pytest.mark.asyncio decorator from each test
function so they become plain async def tests; leave the async def signature
intact and delete only the decorator lines, ensuring no other test code or
imports are changed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
plugins/deepgram/tests/test_stt.py (1)

144-189: Add docstring to fixture.

The audio_data fixture is missing a docstring. Other fixtures in this file follow the pattern of documenting their purpose.

As per coding guidelines.

Apply this diff:

 @pytest.fixture
 def audio_data(mia_mp3_path):
+    """Load and prepare the audio data for testing."""
     from scipy import signal
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (3)

234-241: Don’t stop a shared forwarder.

stop() unconditionally schedules self._forwarder.stop(), which can tear down a shared resource used by other tracks.

Apply this guard:

@@     def stop(self):
-        try:
-            if self._forwarder is not None:
-                asyncio.create_task(self._forwarder.stop())
-                self._forwarder = None
+        try:
+            if self._forwarder is not None:
+                if self._forwarder is self._shared_forwarder:
+                    logger.debug("Shared VideoForwarder detected; skipping stop()")
+                else:
+                    asyncio.create_task(self._forwarder.stop())
+                self._forwarder = None

319-326: Ensure client_secret.value exists; fail fast with a clear error.

If the API changes or returns no client_secret, you may return None and proceed with a bad token.

Apply:

@@     async def _get_session_token(self) -> str:
-                    data: dict = resp.json()
-                    secret = data.get("client_secret", {})
-                    return secret.get("value")
+                    data: dict = resp.json()
+                    secret = data.get("client_secret") or {}
+                    token = secret.get("value")
+                    if not token:
+                        raise RuntimeError("Missing client_secret.value in session.create response")
+                    return token

470-495: _video_sender may be undefined; initialize in init to avoid AttributeError.

start_video_sender uses self._video_sender but it’s not set in init. If called before _set_video_track, this can raise.

Add this in init (outside this hunk):

# in RTCManager.__init__
self._video_sender: Optional[Any] = None  # RTCRtpSender; keep Any to avoid extra import

And keep the existing None checks. Optionally, assert that connect() completed or that _set_video_track() ran.

♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)

658-658: Log level and redaction for stored session info.

Re‑flagging: logging normal session storage at ERROR is wrong; also avoid dumping full session objects (may contain sensitive fields).

Apply:

-            logger.error(f"Stored session info: {self.session_info}")
+            # Avoid logging secrets; log minimal identifiers only
+            safe_id = self.session_info.get("id") if isinstance(self.session_info, dict) else None
+            safe_model = self.session_info.get("model") if isinstance(self.session_info, dict) else None
+            logger.info("Stored session info (id=%s, model=%s)", safe_id, safe_model)
🧹 Nitpick comments (13)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

226-230: Cleaner architecture with local channel variable.

The change from class-level self.channel storage to a local variable passed to StreamConversation improves encapsulation and aligns with the updated StreamConversation constructor signature shown in the relevant code snippets.

Consider adding error handling for the channel.get_or_create() call to gracefully handle potential channel creation failures:

 async def create_conversation(self, call: Call, user, instructions):
     chat_client: ChatClient = call.client.stream.chat
     channel = chat_client.channel(self.channel_type, call.id)
-    await channel.get_or_create(
-        data=ChannelInput(created_by_id=user.id),
-    )
+    try:
+        await channel.get_or_create(
+            data=ChannelInput(created_by_id=user.id),
+        )
+    except Exception as e:
+        self.logger.error(f"Failed to create or retrieve channel: {e}")
+        raise
     self.conversation = StreamConversation(instructions, [], channel)
     return self.conversation
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (5)

100-119: Clarify the unused participant parameter.

The participant parameter is documented as ignored, yet the PR introduces per-participant handling throughout this module. If this parameter exists solely for interface parity with the base Realtime class or other providers, that's reasonable—but if there's a plan to use it for correlation (e.g., tracking which participant sent the text), please clarify or open a follow-up issue.

Do you want me to open an issue to track potential future use of the participant parameter in simple_response, or is the current design intentional?


172-193: LGTM—workaround is well-documented.

The handling of both response.audio_transcript.done and response.output_audio_transcript.done via event mutation is a reasonable workaround for the OpenAI API inconsistency. The GitHub issue reference provides good context.

Optionally, enhance the comment to explain what breaks without the workaround:

-        # code here is weird because OpenAI does something strange
-        # see issue: https://github.com/openai/openai-python/issues/2698
-        # as a workaround we copy the event and set type to response.output_audio_transcript.done so that
-        # ResponseAudioTranscriptDoneEvent.model_validate is happy
+        # OpenAI sends response.audio_transcript.done but the SDK expects response.output_audio_transcript.done.
+        # Without this workaround, Pydantic validation fails.
+        # See: https://github.com/openai/openai-python/issues/2698

194-203: Consider clearing _pending_participant after mapping.

If OpenAI doesn't emit a conversation.item.created event for some reason, _pending_participant remains set and could be incorrectly attributed to a subsequent item.

Clear the pending participant immediately after establishing the mapping:

                 if item_id and self._pending_participant:
                     self._item_to_participant[item_id] = self._pending_participant
+                    self._pending_participant = None
                     logger.debug(

Also, simplify the logging expression:

-                        f"Mapped item {item_id} to participant {self._pending_participant.user_id if self._pending_participant else 'None'}"
+                        f"Mapped item {item_id} to participant {self._item_to_participant[item_id].user_id}"

275-331: LGTM—tool call handling is well-structured.

The enhanced logging and error handling improve observability. The tool execution flow correctly delegates to _run_one_tool and handles both success and error cases.

Consider making the 30-second timeout configurable rather than hardcoded at line 306:

+            timeout = getattr(self, 'tool_call_timeout', 30)
-            tc, result, error = await self._run_one_tool(tool_call, timeout_s=30)
+            tc, result, error = await self._run_one_tool(tool_call, timeout_s=timeout)

This allows callers to adjust the timeout for tools with varying execution times.


333-376: Tool response flow is correct.

The method correctly sanitizes output, sends the tool response, and triggers a follow-up response with audio. The instructions ensure the AI provides a conversational response after receiving tool results.

The hardcoded instruction at line 370 could be made configurable:

+            follow_up_instructions = getattr(
+                self, 
+                'tool_response_instructions', 
+                "Please respond to the user with the tool results in a conversational way."
+            )
             await self.rtc._send_event(
                 {
                     "type": "response.create",
                     "response": {
                         "modalities": ["text", "audio"],
-                        "instructions": "Please respond to the user with the tool results in a conversational way.",
+                        "instructions": follow_up_instructions,
                     },
                 }
             )
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (7)

45-51: Audio input setter looks good; small micro‑opt note.

The copy via bytes(pcm_data) is safe. If perf matters, consider accepting memoryview/bytearray to avoid copies, but fine as-is.


649-653: Remove stray print; use structured logging.

print() in async handlers will spam stdout and bypass log routing.

-        print(f"_handle_event {event['type']}")
+        logger.debug("Event received: %s", event.get("type"))

690-762: Forwarding loop is unused and overly chatty at INFO.

Given the switch to StreamVideoForwardingTrack, _forward_video_frames seems unused. Either remove it or downgrade INFO logs inside the loop to DEBUG to avoid log noise.

Minimal noise reduction:

-            logger.info(
+            logger.debug(
                 f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
             )
-            logger.info(
+            logger.debug(
                 f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
             )
@@
-                    logger.info(
+                    logger.debug(
                         f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
                     )
-                    logger.info(
+                    logger.debug(
                         f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
                     )
@@
-            logger.info(
+            logger.debug(
                 f"🎥 Video forwarding task cancelled after {frame_count} frames"
             )
@@
-            logger.info(
+            logger.debug(
                 f"🎥 Video forwarding task ended. Total frames processed: {frame_count}"
             )

135-135: Minor: drop f‑string; improve message casing.

No interpolation here; also “RTCManager” for consistency.

-            logger.warning(f"rtc manager already started", stack_info=True)
+            logger.warning("RTCManager already started", stack_info=True)

125-131: Log level sanity: init messages OK; consider DEBUG for non‑actionable info.

Initialization messages with emojis are fine for dev, but might be noisy in prod. Consider DEBUG for the non‑actionable branch.


205-219: Circuit‑breaker logs: keep at ERROR, but include context for triage.

Consider adding model/session id to ease correlation when multiple sessions run.


577-585: SDP exchange timeouts look reasonable; consider retry with backoff.

Single attempt can be brittle in transient network issues; a short bounded backoff (e.g., 2 tries) can improve resilience.

Also applies to: 597-603
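
A bounded-retry wrapper along these lines would suffice (names are stand-ins for the actual HTTP call in rtc_manager.py):

import asyncio

async def _exchange_sdp_with_retry(exchange, offer, attempts: int = 2, base_delay: float = 0.5):
    """Retry a flaky SDP exchange a bounded number of times with short backoff."""
    last_exc: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            return await exchange(offer)
        except (asyncio.TimeoutError, OSError) as exc:  # transient network failures
            last_exc = exc
            if attempt + 1 < attempts:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_exc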

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6fd13c6 and 63c6352.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • agents-core/vision_agents/core/agents/agents.py (21 hunks)
  • plugins/deepgram/pyproject.toml (1 hunks)
  • plugins/deepgram/tests/test_stt.py (7 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py (13 hunks)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
  • plugins/silero/pyproject.toml (1 hunks)
  • plugins/silero/tests/test_vad.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/silero/tests/test_vad.py
  • plugins/deepgram/tests/test_stt.py
🧬 Code graph analysis (5)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5)
  • RTCManager (246-787)
  • connect (278-301)
  • start_video_sender (470-531)
  • stop_video_sender (533-575)
  • _send_event (440-468)
agents-core/vision_agents/core/llm/realtime.py (8)
  • connect (68-68)
  • _emit_connected_event (83-97)
  • simple_audio_response (71-73)
  • _emit_agent_speech_transcription (196-204)
  • _emit_response_event (138-156)
  • _emit_user_speech_transcription (185-194)
  • _watch_video_track (75-77)
  • _stop_watching_video_track (79-81)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/llm/llm.py (2)
  • _run_one_tool (226-292)
  • get_available_functions (180-182)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
agents-core/vision_agents/core/edge/sfu_events.py (14)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
  • track_type (579-583)
  • track_type (1193-1197)
  • track_type (2254-2258)
  • participant (1495-1500)
  • participant (1537-1542)
  • participant (1610-1615)
  • participant (2078-2083)
  • participant (2127-2132)
agents-core/vision_agents/core/edge/events.py (1)
  • TrackAddedEvent (17-22)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
agents-core/vision_agents/core/agents/agents.py (6)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseChunkEvent (90-105)
  • LLMResponseCompletedEvent (109-115)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLM (35-358)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4)
  • Realtime (37-463)
  • simple_response (100-119)
  • simple_audio_response (121-136)
  • _watch_video_track (266-270)
agents-core/vision_agents/core/llm/realtime.py (3)
  • Realtime (24-220)
  • simple_audio_response (71-73)
  • _watch_video_track (75-77)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
plugins/silero/tests/test_vad.py (2)
plugins/deepgram/tests/test_stt.py (3)
  • mia_mp3_path (126-128)
  • audio_data (144-189)
  • close (61-63)
agents-core/vision_agents/core/vad/events.py (1)
  • VADPartialEvent (44-55)
plugins/deepgram/tests/test_stt.py (3)
plugins/silero/tests/test_vad.py (1)
  • mia_mp3_path (23-25)
plugins/moonshine/tests/test_stt.py (1)
  • mia_mp3_path (33-35)
plugins/deepgram/vision_agents/plugins/deepgram/stt.py (1)
  • close (180-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (17)
plugins/silero/pyproject.toml (1)

39-39: Verify the rationale for swapping audio/vision dependencies.

The documented PyTorch/torchvision compatibility matrix pairs torchvision 0.20.0 with PyTorch 2.5.0, whereas this project specifies torch>=2.7.1. While the loose constraint >=0.20.0 will pull modern releases compatible with PyTorch 2.7.x, replacing torchaudio (audio processing) with torchvision (computer vision) is atypical for a Silero voice-activity-detection plugin.

Since this is a dev dependency, the risk is lower, but the purpose of this swap should be clarified—is this intentional or a mistake?

plugins/deepgram/pyproject.toml (1)

42-42: Clarify the coordinated dependency swap across plugins.

This mirrors the change in plugins/silero/pyproject.toml, yet substituting torchvision (computer vision) for audio-processing libraries seems incongruent with Deepgram's STT (speech-to-text) focus. The PR summary indicates these are "broader environment adjustments," but the functional purpose is unclear.

Please document why torchvision>=0.20.0 is now required in dev environments for audio-centric plugins, or verify that this is an unintended side effect of a larger refactoring.

plugins/deepgram/tests/test_stt.py (2)

456-459: LGTM!

The multiline decorator formatting improves readability while preserving the same functional behavior. This pattern is consistently applied across multiple test functions.


165-165: LGTM!

Explicit cleanup of the PyAV container is good practice for resource management.

plugins/silero/tests/test_vad.py (2)

47-74: LGTM: Audio conversion logic is correct.

The PyAV-based mp3 to wav conversion properly handles:

  • Stereo to mono conversion via channel averaging
  • Normalization from int16/int32 to float32 in the [-1.0, 1.0] range required by soundfile
  • Frame assembly and concatenation

This implementation mirrors the pattern in plugins/deepgram/tests/test_stt.py (lines 143-188).
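
For reference, the decode path described above reduces to roughly this (a sketch assuming planar frames; the helper name is illustrative):

import av
import numpy as np

def mp3_to_float32_mono(path: str):
    """Decode an mp3 to mono float32 in [-1.0, 1.0]; returns (samples, rate)."""
    container = av.open(path)
    stream = container.streams.audio[0]
    chunks = []
    for frame in container.decode(stream):
        samples = frame.to_ndarray()  # (channels, n) for planar layouts
        if samples.dtype == np.int16:
            samples = samples.astype(np.float32) / 32768.0  # int16 -> [-1, 1]
        elif samples.dtype == np.int32:
            samples = samples.astype(np.float32) / 2147483648.0
        else:
            samples = samples.astype(np.float32)
        if samples.ndim == 2 and samples.shape[0] > 1:
            samples = samples.mean(axis=0)  # stereo -> mono by channel averaging
        else:
            samples = samples.reshape(-1)
        chunks.append(samples)
    container.close()  # explicit cleanup, as in the tests
    return np.concatenate(chunks), stream.rate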


474-486: LGTM: Logging format refactored for brevity.

The duration calculation has been inlined from multiline blocks into single expressions. Functionality is preserved—the ternary operator still safely handles None values for event.duration_ms.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)

152-163: Agent track filtering logic is correct.

The conditional filtering to prevent emitting TrackAddedEvent for the agent's own tracks is appropriate and prevents the agent from processing its own audio/video as remote participant input.


107-107: ****

The concern about timing is not valid. While _on_track_published is registered as an event handler during __init__ (line 72), event firing and handler registration are distinct operations. Track published events only fire during an active call after join() is called, which occurs after create_user() has already set self.agent_user_id. All usage patterns in the codebase enforce this sequence: await agent.create_user()await agent.join(call). Therefore, self.agent_user_id is guaranteed to be non-None when the event handler fires.

Likely an incorrect or invalid review comment.

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (2)

121-136: Participant tracking logic is correct, pending concurrency fix.

The approach of setting _pending_participant before sending audio is sound for correlating subsequent OpenAI conversation items with the correct user. However, this depends on the thread safety improvements suggested for lines 68-70 to avoid race conditions when multiple participants send audio concurrently.
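
A minimal serialization guard, assuming a lock created in __init__ (attribute name is hypothetical):

# in __init__:
#     self._participant_lock = asyncio.Lock()

async def simple_audio_response(self, audio, participant=None):
    """Keep pending-participant bookkeeping atomic with the audio send."""
    async with self._participant_lock:
        self._pending_participant = participant
        await self.rtc.send_audio_pcm(audio)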


427-462: LGTM—tool registration flow is sound.

The method correctly retrieves available functions, converts them to OpenAI's format, and registers them with the realtime session. The logging provides good visibility, and error handling ensures registration failures don't break the connection.

agents-core/vision_agents/core/agents/agents.py (6)

4-4: LGTM—imports support the new unified conversation API.

The added imports (uuid, llm_events, RealtimeUserSpeechTranscriptionEvent, RealtimeAgentSpeechTranscriptionEvent, STTTranscriptEvent, Conversation) are all used in the new setup hooks and align with the PR's unified messaging objective.

Also applies to: 20-20, 24-25, 32-32, 39-39


305-329: LGTM—turn detection integration is clean.

The handler correctly accumulates transcripts when turn detection is enabled and triggers the LLM immediately otherwise. The logic for buffering per-user transcripts and deferring response until turn end is sound.


343-349: LGTM—deferred conversation creation ensures correct initialization order.

Creating the conversation coroutine early (Line 344) but awaiting it after MCP and edge setup (Line 404) ensures event handlers are wired before the conversation exists. This prevents missed events during initialization.

Also applies to: 403-404


448-451: LGTM—minor formatting and logging improvements.

These changes add clearer logging, improve code organization with blank lines, and pass the shared_forwarder parameter consistently. All are refinements without functional impact.

Also applies to: 719-721, 728-731, 739-744, 749-751, 758-760, 769-772, 774-777


593-624: LGTM—say method now syncs with conversation.

The addition of conversation.upsert_message (Lines 617-624) correctly records say() calls as completed assistant messages. The user_id fallback chain (Line 621) provides appropriate defaults.


817-817: LGTM—formatting and logging cleanup.

These changes remove unnecessary blank lines, improve log messages, and clean up docstrings without altering behavior.

Also applies to: 842-842, 848-865, 895-895, 1025-1027

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)

342-355: API schema verified—session.update payload is correct.

The Realtime API accepts session.update with turn_detection type "semantic_vad" and input_audio_transcription model "whisper-1". The code at lines 342–355 uses the correct field structure; no changes needed.

Comment on lines +201 to +244
    async def _setup_llm_events(self):
        @self.llm.events.subscribe
        async def on_llm_response_send_to_tts(event: LLMResponseCompletedEvent):
            # Trigger TTS directly instead of through event system
            if self.tts and event.text and event.text.strip():
                await self.tts.send(event.text)

        @self.llm.events.subscribe
        async def on_llm_response_sync_conversation(event: LLMResponseCompletedEvent):
            self.logger.info(f"🤖 [LLM response]: {event.text} {event.item_id}")

            if self.conversation is None:
                return

            # Unified API: handles both streaming and non-streaming
            await self.conversation.upsert_message(
                message_id=event.item_id,
                role="assistant",
                user_id=self.agent_user.id or "agent",
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial content from deltas
            )

        @self.llm.events.subscribe
        async def _handle_output_text_delta(event: LLMResponseChunkEvent):
            """Handle partial LLM response text deltas."""

            self.logger.info(
                f"🤖 [LLM delta response]: {event.delta} {event.item_id} {event.content_index}"
            )

            if self.conversation is None:
                return

            # Unified API: streaming delta
            await self.conversation.upsert_message(
                message_id=event.item_id,
                role="assistant",
                user_id=self.agent_user.id or "agent",
                content=event.delta or "",
                content_index=event.content_index,
                completed=False,  # Still streaming
            )

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify item_id is consistently emitted across all LLM providers.

The upsert logic assumes event.item_id is set for both deltas (Line 238) and completions (Line 217). If any provider emits None or omits item_id, each delta creates a new message rather than updating the same message, fragmenting the conversation.

The past review comment indicates this was addressed in commits 39a821d–6fd13c6, but the event type definitions still show item_id: Optional[str]. Run this verification to confirm all active providers populate item_id:


🏁 Script executed:

#!/bin/bash
# Check that all LLM provider plugins emit item_id on chunk and completion events
rg -nP --type=py -C3 'LLMResponse(ChunkEvent|CompletedEvent)\s*\(' | rg -C3 'item_id'

Length of output: 2717




🏁 Script executed:

#!/bin/bash
# Find Anthropic and xAI completion events
echo "=== Anthropic completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/anthropic/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'

echo ""
echo "=== xAI completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/xai/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'

echo ""
echo "=== OpenAI chunk events ==="
rg -nP --type=py 'LLMResponseChunkEvent' plugins/openai/ -A5 | grep -E '(item_id|LLMResponseChunkEvent)'

echo ""
echo "=== Gemini item_id variable source ==="
rg -nB10 'self.events.send\(LLMResponseChunkEvent' plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py | grep -E '(item_id\s*=|def |class )'

echo ""
echo "=== upsert_message implementation ==="
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A20 | head -40

Length of output: 3722




🏁 Script executed:

#!/bin/bash
# Get full xAI completion event context
echo "=== xAI completion events (full context) ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/xai/ -A8

echo ""
echo "=== Gemini completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/gemini/ -B2 -A8

echo ""
echo "=== Anthropic completion event full code ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/anthropic/ -B5 -A5

Length of output: 6406


[/script]


🏁 Script executed:

#!/bin/bash
# Get full upsert_message implementation to confirm UUID generation
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A50 | head -70

Length of output: 4974


Anthropic and xAI LLM providers must emit item_id in completion events.

Verification confirms the streaming consolidation in agents.py (lines 201-244) is broken for two providers:

  • Anthropic (plugins/anthropic/.../anthropic_llm.py:295): LLMResponseCompletedEvent omits item_id entirely
  • xAI (plugins/xai/.../llm.py:142, 193): LLMResponseCompletedEvent omits item_id entirely

Both emit chunks with item_id="" (empty string) but completions with no item_id parameter, causing event.item_id = None. When passed to upsert_message() at agents.py line 217, this triggers auto-generation of a new UUID (conversation.py:161), fragmenting the message instead of consolidating deltas.

Gemini and OpenAI correctly emit item_id on both chunks and completions.

Required fixes:

  • Add item_id parameter to Anthropic completion event (line 295)
  • Add item_id parameter to both xAI completion events (lines 142 and 193)
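
The fix in both providers reduces to minting one id per response and reusing it (a sketch; 'stream' stands in for each provider's delta iterator):

import uuid

item_id = str(uuid.uuid4())  # minted once, before streaming begins
parts = []
for chunk_text in stream:
    parts.append(chunk_text)
    self.events.send(LLMResponseChunkEvent(delta=chunk_text, item_id=item_id))
self.events.send(
    LLMResponseCompletedEvent(text="".join(parts), item_id=item_id)  # same id as chunks
)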

Comment on lines +246 to +264
    async def _setup_speech_events(self):
        @self.events.subscribe
        async def on_stt_transcript_event_sync_conversation(event: STTTranscriptEvent):
            self.logger.info(f"🎤 [Transcript]: {event.text}")

            if self.conversation is None:
                return

            user_id = event.user_id() or "user"

            await self.conversation.upsert_message(
                message_id=str(uuid.uuid4()),
                role="user",
                user_id=user_id,
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial transcripts
                original=event,
            )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique transcripts.

Lines 257 and 262 generate a unique message_id per STT transcript but set replace=True. Since each utterance gets a fresh UUID, there is nothing to replace—replace=True has no effect here and signals confused intent.

If transcripts are complete utterances (not streaming updates), use replace=False. If you intend to stream partial transcripts and later replace them with final text, reuse the same message_id for the partial and final versions.

Apply this diff to fix the STT transcript handler:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                replace=False,  # Each transcript is a distinct utterance
                 original=event,
             )
📝 Committable suggestion


Suggested change
             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                replace=False,  # Each transcript is a distinct utterance
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 246 to 264, the
STT transcript handler generates a new UUID for each transcript but sets
replace=True which is ineffective and misleading; change the call to
upsert_message to use replace=False for these completed, unique transcripts (or
if you intend streaming partial→final transcripts, reuse the same message_id for
partial and final and only set replace=True when writing the final replacement).

Comment on lines +267 to +283
async def on_realtime_user_speech_transcription(
    event: RealtimeUserSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [User transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="user",
        user_id=event.user_id() or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique realtime user transcripts.

Lines 276 and 281 generate a unique message_id per realtime user transcript but set replace=True. Since each transcription gets a fresh UUID, there is nothing to replace.

Apply this diff:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=event.user_id() or "",
                 content=event.text,
                 completed=True,
-                replace=True,
+                replace=False,
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 267 to 283, the
handler creates a fresh UUID for each realtime user transcript but still passes
replace=True to upsert_message, which is inconsistent; remove the replace=True
argument (or set replace=False/omit it) when calling
self.conversation.upsert_message for these completed, unique realtime
transcripts so each new message is inserted instead of attempting to replace a
non-existent prior entry.

Comment on lines +286 to +302
async def on_realtime_agent_speech_transcription(
    event: RealtimeAgentSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [Agent transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="assistant",
        user_id=self.agent_user.id or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique realtime agent transcripts.

Lines 295 and 300 generate a unique message_id per realtime agent transcript but set replace=True. This is semantically incorrect for the same reasons as the user transcript handler.

Apply this diff:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="assistant",
                 user_id=self.agent_user.id or "",
                 content=event.text,
                 completed=True,
-                replace=True,
+                replace=False,
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 286 to 302, the
realtime agent transcript handler creates a unique message_id for each event but
incorrectly passes replace=True; remove the replace=True flag (or set
replace=False/omit it) when calling conversation.upsert_message for completed
realtime agent transcripts so each transcript is stored as a distinct message
rather than attempting to replace an existing one.

ListenV1ResultsMetadata,
)
from getstream.video.rtc.track_util import PcmData
from torchvision.io.video import av

⚠️ Potential issue | 🔴 Critical

Critical: torchvision video decoding is deprecated and removed.

Importing av from torchvision.io.video will fail on torchvision 0.24.0+ as video decoding/encoding capabilities have been removed. Install and import PyAV directly instead.

Based on learnings.

Apply this diff:

-from torchvision.io.video import av
+import av

Then ensure PyAV is installed:

pip install av
🤖 Prompt for AI Agents
In plugins/deepgram/tests/test_stt.py around line 18, importing av via "from
torchvision.io.video import av" is deprecated and removed; replace that import
with a direct PyAV import ("import av") and update project test/dev dependencies
or CI to ensure the "av" package is installed (e.g., add to requirements/dev or
test extras and run pip install av) so tests can import PyAV successfully.

Comment on lines 233 to +236
            elif et == "input_audio_buffer.speech_started":
                # Validate event but don't need to store it
                InputAudioBufferSpeechStartedEvent.model_validate(event)
-               await self.output_track.flush()
+               # await self.output_track.flush()

🛠️ Refactor suggestion | 🟠 Major

Remove or document commented-out code.

Line 236 has # await self.output_track.flush() without explanation. Either remove it if obsolete, or add a comment explaining why it's disabled and whether it should be re-enabled in the future.

If the flush is intentionally disabled:

-            # await self.output_track.flush()
+            # TODO: Investigate if output_track.flush() is needed when user starts speaking
+            # Disabled temporarily due to [reason]

Otherwise, remove the line entirely.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_realtime.py around lines
233 to 236, there's a commented-out call "# await self.output_track.flush()"
with no explanation; either remove this dead code if it's obsolete, or replace
the comment with a short explanatory comment stating why flush is disabled
(e.g., causes latency/duplicate events, handled elsewhere) and whether/when it
should be re-enabled; if flush must be conditionally run, implement the small
conditional (e.g., a flag or feature toggle) and document the intended behavior
and side effects in that comment so future readers know why it's not awaited
here.

Comment on lines 133 to 144
    async def start(self) -> None:
        if self._started:
            logger.warning(f"rtc manager already started", stack_info=True)
            return

        if self._shared_forwarder is not None:
            # Use the shared forwarder
            self._forwarder = self._shared_forwarder
            logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
        else:
            # Create our own VideoForwarder with the input source track (legacy behavior)
            self._forwarder = VideoForwarder(self._source_track, max_buffer=5, fps=self._fps)  # type: ignore[arg-type]
            await self._forwarder.start()

        if self._shared_forwarder is None:
            raise RuntimeError("self._shared_forwarder is None, something is very wrong")

        self._forwarder = self._shared_forwarder
        logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
        self._started = True


⚠️ Potential issue | 🟠 Major

start() hard‑requires a shared_forwarder but the error message and upstream contract are unclear.

Right now, start() raises RuntimeError when self._shared_forwarder is None, but start_video_sender still treats shared_forwarder as “optional.” Align the contract: either require it explicitly (and validate earlier) or create/own a forwarder here.

Apply one of the following:

Option A — enforce requirement early in start_video_sender (preferred):

@@ async def start_video_sender(self, stream_video_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None) -> None:
-        Args:
-            stream_video_track: Video track to forward to OpenAI.
-            fps: Target frames per second.
-            shared_forwarder: Optional shared VideoForwarder to use instead of creating a new one.
+        Args:
+            stream_video_track: Video track to forward to OpenAI.
+            fps: Target frames per second.
+            shared_forwarder: Required. Active VideoForwarder instance to pull frames from.
@@
-        try:
+        try:
+            if shared_forwarder is None:
+                logger.error("❌ shared_forwarder is required; pass the active VideoForwarder instance")
+                raise ValueError("shared_forwarder is required for video forwarding")

Option B — keep “optional” param and create your own forwarder here, then let start() accept None and create it:

@@ class StreamVideoForwardingTrack(VideoStreamTrack):
-    def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
+    def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
         ...
         self._forwarder: Optional[VideoForwarder] = None
         self._shared_forwarder = shared_forwarder
+        self._owns_forwarder = False
@@
-        if self._shared_forwarder is None:
-            raise RuntimeError("self._shared_forwarder is None, something is very wrong")
-        self._forwarder = self._shared_forwarder
+        if self._shared_forwarder is None:
+            self._forwarder = VideoForwarder()
+            self._owns_forwarder = True
+        else:
+            self._forwarder = self._shared_forwarder

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +450 to +456
                await asyncio.wait_for(
                    self._data_channel_open_event.wait(), timeout=5.0
                )
            except asyncio.TimeoutError:
-               logger.warning("Data channel not open after timeout; dropping event")
+               logger.warning(
+                   "Data channel not open after timeout; dropping event"
+               )

⚠️ Potential issue | 🟠 Major

If the data channel isn’t open, return early; don’t attempt the send.

We warn on non‑open readyState but still call send(), causing noisy errors.

Apply:

@@     async def _send_event(self, event: dict):
-            if not self._data_channel_open_event.is_set():
+            if not self._data_channel_open_event.is_set():
                 try:
                     await asyncio.wait_for(
                         self._data_channel_open_event.wait(), timeout=5.0
                     )
                 except asyncio.TimeoutError:
                     logger.warning(
                         "Data channel not open after timeout; dropping event"
                     )
                     return
@@
-            if self.data_channel.readyState and self.data_channel.readyState != "open":
-                logger.warning(
-                    f"Data channel state is '{self.data_channel.readyState}', cannot send event"
-                )
+            if self.data_channel.readyState and self.data_channel.readyState != "open":
+                logger.warning(
+                    f"Data channel state is '{self.data_channel.readyState}', dropping event"
+                )
+                return

Also applies to: 461-462

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines
450-456 (and similarly 461-462), the code waits for the data channel open event
and logs a warning on timeout but still proceeds to call send(), causing noisy
errors; modify the flow so that after catching asyncio.TimeoutError you return
early (or skip the send) instead of continuing, and also add an explicit check
of data_channel.readyState (ensure it equals "open") before calling send() to
avoid attempts to send on a non-open channel.
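
Putting both guards together, a consolidated sketch of _send_event (the json-encoded send call at the end is an assumption; the guard logic follows the diffs above):

async def _send_event(self, event: dict) -> None:
    if not self._data_channel_open_event.is_set():
        try:
            await asyncio.wait_for(
                self._data_channel_open_event.wait(), timeout=5.0
            )
        except asyncio.TimeoutError:
            logger.warning("Data channel not open after timeout; dropping event")
            return
    if self.data_channel.readyState != "open":
        logger.warning(
            f"Data channel state is '{self.data_channel.readyState}', dropping event"
        )
        return
    self.data_channel.send(json.dumps(event))  # assumed send path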

Comment on lines +507 to +511
forwarding_track = StreamVideoForwardingTrack(
stream_video_track, fps, shared_forwarder=shared_forwarder
)
await forwarding_track.start()


⚠️ Potential issue | 🟠 Major

start_video_sender assumes shared_forwarder; align docstring + validation.

Docstring says “Optional shared VideoForwarder,” but the code now requires it (forwarding_track.start() will raise). Update the docstring and add an explicit check with a helpful error, or implement the “optional” path (create your own forwarder). See earlier comment with diffs.
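
A sketch of the first option, validating before constructing the track (the error message wording is an assumption):

if shared_forwarder is None:
    raise ValueError(
        "shared_forwarder is required for start_video_sender; "
        "pass the active VideoForwarder instance"
    )
forwarding_track = StreamVideoForwardingTrack(
    stream_video_track, fps, shared_forwarder=shared_forwarder
)
await forwarding_track.start()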

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines 507
to 511, the start_video_sender implementation assumes a shared_forwarder is
provided but the docstring says it's optional; update either the docstring to
mark shared_forwarder as required or add explicit validation and fallback
creation. Fix by either (a) changing the docstring to state shared_forwarder is
required and raise a clear ValueError if shared_forwarder is None before
creating StreamVideoForwardingTrack, or (b) implement the optional path: if
shared_forwarder is None instantiate a local VideoForwarder (with appropriate
init params), pass it into StreamVideoForwardingTrack, and ensure it is cleaned
up after use; include a helpful error message if creation fails.

import pytest
import soundfile as sf
import torchaudio
from torchvision.io.video import av

⚠️ Potential issue | 🔴 Critical

Incorrect import: PyAV should be imported directly, not from torchvision.

Importing av from torchvision.io.video will fail. Based on learnings, torchvision 0.24.0 removed deprecated video decoding/encoding capabilities, and the video_reader backend was removed. The correct approach is to import PyAV directly: import av.

Apply this diff to fix the import:

-from torchvision.io.video import av
+import av

This matches the pattern used in plugins/deepgram/tests/test_stt.py (line 125-188), where PyAV is imported directly and used for audio extraction.
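
For reference, a minimal PyAV sketch of direct audio extraction (the file path is a placeholder, not the actual test fixture):

import av

with av.open("sample.wav") as container:
    stream = container.streams.audio[0]
    for frame in container.decode(stream):
        pcm = frame.to_ndarray()  # typically shaped (channels, samples)
        # feed pcm into the component under test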

📝 Committable suggestion


Suggested change
from torchvision.io.video import av
import av
🤖 Prompt for AI Agents
In plugins/silero/tests/test_vad.py around line 9, the test incorrectly imports
PyAV via torchvision (from torchvision.io.video import av); replace that import
with a direct PyAV import (import av) so the test uses the av package directly;
update any references if necessary to match the direct import and run the tests
to confirm audio extraction works as in plugins/deepgram/tests/test_stt.py.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)

472-533: Docstring still claims shared_forwarder is "Optional" but code requires it.

Line 483 documents shared_forwarder as "Optional," but Line 512 calls forwarding_track.start(), which will raise RuntimeError if shared_forwarder is None (see Lines 138-141).

Update the docstring to mark shared_forwarder as required and add explicit validation before Line 509, or implement the optional path by creating a local VideoForwarder when None is passed. See the earlier review comment for detailed diffs.


138-145: Hard requirement on shared_forwarder conflicts with "optional" upstream parameter.

The RuntimeError on Line 139 enforces that shared_forwarder must be non-None, but __init__ (Line 100) accepts None by default and start_video_sender (Line 483) documents it as "Optional." This contract mismatch will cause runtime crashes if callers rely on the "optional" declaration.

Refer to the previous review comment suggesting either (A) validate and raise a clear ValueError early in start_video_sender, or (B) create a local forwarder when None is passed.


461-464: Data channel readyState check still missing early return.

After logging the warning that the data channel state is not "open," the code proceeds to attempt send() on Line 467, which will fail noisily.

Apply this diff to return early:

 if self.data_channel.readyState and self.data_channel.readyState != "open":
     logger.warning(
-        f"Data channel state is '{self.data_channel.readyState}', cannot send event"
+        f"Data channel state is '{self.data_channel.readyState}', dropping event"
     )
+    return

660-660: Successful session storage logged at ERROR level.

Logging successful session storage with logger.error pollutes error logs and will trigger false alarms in monitoring systems.

Apply this diff:

-        logger.error(f"Stored session info: {self.session_info}")
+        logger.info(f"Stored session info: {self.session_info}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 63c6352 and 0412f11.

📒 Files selected for processing (1)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)

96-132: LGTM—formatting improvements enhance readability.

The explicit float literals (0.0 instead of 0.) and multiline logging blocks improve code clarity without altering behavior.


147-245: LGTM—multiline logging improves readability.

The frame forwarding logic, error handling, and circuit breaker behavior are preserved; only log formatting has changed.


692-763: LGTM—multiline logging enhances readability.

The frame forwarding logic, timeout handling, and throttling remain intact; only log formatting has been improved.


344-357: The code change is correct. Input audio transcription events are intentionally handled downstream.

The script confirms that transcription events are explicitly handled in openai_realtime.py, with event handler for "conversation.item.input_audio_transcription.completed" at line 204. Multiple tests verify this functionality works correctly, and the feature is documented in the class docstring. The session update in rtc_manager.py enabling transcription is an intentional design choice with full downstream support.


    async def _handle_event(self, event: dict) -> None:
        """Minimal event handler for data channel messages."""
        print(f"_handle_event {event['type']}")

⚠️ Potential issue | 🟡 Minor

Remove debug print statement from production code.

The print call bypasses the logging framework and will clutter output. Production code should use the logger exclusively.

Apply this diff:

-    print(f"_handle_event {event['type']}")
+    logger.debug(f"Handling event: {event['type']}")
📝 Committable suggestion


Suggested change
print(f"_handle_event {event['type']}")
logger.debug(f"Handling event: {event['type']}")
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 651,
remove the debug print statement `print(f"_handle_event {event['type']}")` and
replace it with a logger call (e.g., logger.debug or logger.info) that emits the
event type via the module logger; ensure the module has a configured logger
(logging.getLogger(__name__)) and use structured/templated logging (including
event type and minimal context) instead of direct print so output goes through
the logging framework.
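
A minimal sketch of the module-level logger setup the prompt refers to (assuming none is configured in this module yet):

import logging

logger = logging.getLogger(__name__)

# inside the class:
#     async def _handle_event(self, event: dict) -> None:
#         logger.debug("Handling event: %s", event["type"])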

@tbarbugli tbarbugli merged commit 3cb339b into main Oct 20, 2025
5 checks passed
@tbarbugli tbarbugli deleted the option-c-unified-message-api branch October 20, 2025 19:22
Nash0x7E2 added a commit to Nash0x7E2/Vision-Agents that referenced this pull request Oct 28, 2025
commit ec32383
Author: Neevash Ramdial (Nash) <mail@neevash.dev>
Date:   Mon Oct 27 15:51:53 2025 -0600

    mypy clean up (GetStream#130)

commit c52fe4c
Author: Neevash Ramdial (Nash) <mail@neevash.dev>
Date:   Mon Oct 27 15:28:00 2025 -0600

    remove turn keeping from example (GetStream#129)

commit e1072e8
Merge: 5bcffa3 fea101a
Author: Yarik <43354956+yarikdevcom@users.noreply.github.com>
Date:   Mon Oct 27 14:28:05 2025 +0100

    Merge pull request GetStream#106 from tjirab/feat/20251017_gh-labeler

    feat: Github pull request labeler

commit 5bcffa3
Merge: 406673c bfe888f
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sat Oct 25 10:56:27 2025 -0600

    Merge pull request GetStream#119 from GetStream/fix-screensharing

    Fix screensharing

commit bfe888f
Merge: 8019c14 406673c
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sat Oct 25 10:56:15 2025 -0600

    Merge branch 'main' into fix-screensharing

commit 406673c
Author: Stefan Blos <stefan.blos@gmail.com>
Date:   Sat Oct 25 03:03:10 2025 +0200

    Update README (GetStream#118)

    * Changed README to LaRaes version

    * Remove arrows from table

    * Add table with people & projects to follow

    * Update images and links in README.md

commit 3316908
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Fri Oct 24 23:48:06 2025 +0200

    Simplify TTS plugin and audio utils (GetStream#123)

    - Simplified TTS plugin
    - AWS Polly TTS plugin
    - OpenAI TTS plugin
    - Improved audio utils

commit 8019c14
Author: Max Kahan <max.kahan@getstream.io>
Date:   Fri Oct 24 17:32:26 2025 +0100

    remove video forwarder lazy init

commit ca62d37
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 16:44:03 2025 +0100

    use correct codec

commit 8cf8788
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:27:18 2025 +0100

    rename variable to fix convention

commit 33fd70d
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:24:42 2025 +0100

    unsubscribe from events

commit 3692131
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:19:53 2025 +0100

    remove nonexistent type

commit c5f68fe
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:10:07 2025 +0100

    cleanup tests to fit style

commit 8b3c61a
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 13:55:08 2025 +0100

    clean up resources when track cancelled

commit d8e08cb
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 13:24:55 2025 +0100

    fix track republishing in agent

commit 0f8e116
Author: Max Kahan <max.kahan@getstream.io>
Date:   Wed Oct 22 15:37:11 2025 +0100

    add tests

commit 08e6133
Author: Max Kahan <max.kahan@getstream.io>
Date:   Wed Oct 22 15:25:37 2025 +0100

    ensure video track dimensions are an even number

commit 6a725b0
Merge: 5f001e0 5088709
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:23:58 2025 -0600

    Merge pull request GetStream#122 from GetStream/cleanup_stt

    Cleanup STT

commit 5088709
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:23:34 2025 -0600

    cleanup of stt

commit f185120
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:08:42 2025 -0600

    more cleanup

commit 05ccbfd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:51:48 2025 -0600

    cleanup

commit bb834ca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:28:53 2025 -0600

    more cleanup for stt

commit 7a3f2d2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:11:35 2025 -0600

    more test cleanup

commit ad7f4fe
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:10:57 2025 -0600

    cleanup test

commit 9e50cdd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:03:45 2025 -0600

    large cleanup

commit 5f001e0
Merge: 95a03e4 5d204f3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 12:01:52 2025 -0600

    Merge pull request GetStream#121 from GetStream/fish_stt

    [AI-201] Fish speech to text (partial)

commit 5d204f3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:48:16 2025 -0600

    remove ugly tests

commit ee9a241
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:46:19 2025 -0600

    cleanup

commit 6eb8270
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:23:00 2025 -0600

    fix 48khz support

commit 3b90548
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 10:59:08 2025 -0600

    first attempt at fish stt, doesnt entirely work just yet

commit 95a03e4
Merge: b90c9e3 b4c0da8
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Thu Oct 23 10:11:39 2025 +0200

    Merge branch 'main' of github.com:GetStream/Vision-Agents

commit b90c9e3
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Wed Oct 22 23:28:28 2025 +0200

    remove print and double event handling

commit b4c0da8
Merge: 3d06446 a426bc2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 15:08:51 2025 -0600

    Merge pull request GetStream#117 from GetStream/openrouter

    [AI-194] Openrouter

commit a426bc2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 15:03:10 2025 -0600

    skip broken test

commit ba6c027
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:50:23 2025 -0600

    almost working openrouter

commit 0b1c873
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:47:12 2025 -0600

    almost working, just no instruction following

commit ce63233
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:35:53 2025 -0600

    working memory for openai

commit 149e886
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:32:43 2025 -0600

    todo

commit e0df1f6
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:38 2025 -0600

    first pass at adding openrouter

commit 3d06446
Merge: 4eb8ef4 ef55d66
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:11 2025 -0600

    Merge branch 'main' of github.com:GetStream/Vision-Agents

commit 4eb8ef4
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:01 2025 -0600

    cleanup ai plugin instructions

commit ef55d66
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 12:54:33 2025 -0600

    Add link to stash_pomichter for spatial memory

commit 9c9737f
Merge: c954409 390c45b
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:45:09 2025 -0600

    Merge pull request GetStream#115 from GetStream/fish

    [AI-195] Fish support

commit 390c45b
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:44:37 2025 -0600

    cleannup

commit 1cc1cf1
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:42:03 2025 -0600

    happy tests

commit 8163d32
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:39:21 2025 -0600

    fix gemini rule following

commit ada3ac9
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:20:18 2025 -0600

    fish tts

commit 61a26cf
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 16:44:03 2025 -0600

    attempt at fish

commit c954409
Merge: ab27e48 c71da10
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 14:18:15 2025 -0600

    Merge pull request GetStream#104 from GetStream/bedrock

    [AI-192] - Bedrock, AWS & Nova

commit c71da10
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Tue Oct 21 22:00:25 2025 +0200

    maybe

commit b5482da
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Tue Oct 21 21:46:15 2025 +0200

    debugging

commit 9a36e45
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 13:14:58 2025 -0600

    echo environment name

commit 6893968
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:53:58 2025 -0600

    more debugging

commit c35fc47
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:45:44 2025 -0600

    add some debug info

commit 0d6d3fd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:03:13 2025 -0600

    run test fix

commit c3a31bd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:52:25 2025 -0600

    log cache hit

commit 04554ae
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:48:03 2025 -0600

    fix glob

commit 7da96db
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:33:56 2025 -0600

    mypy

commit 186053f
Merge: 4b540c9 ab27e48
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:17:17 2025 -0600

    happy tests

commit 4b540c9
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 10:20:04 2025 -0600

    happy tests

commit b05a60a
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 09:17:45 2025 -0600

    add readme

commit 71affcc
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 09:13:01 2025 -0600

    rename to aws

commit d2eeba7
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 21:32:01 2025 -0600

    ai tts instructions

commit 98a4f9d
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 16:49:00 2025 -0600

    small edits

commit ab27e48
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 20 21:42:04 2025 +0200

    Ensure user agent is initialized before joining the call (GetStream#113)

    * ensure user agent is initialized before joining the call

    * wip

commit 3cb339b
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 20 21:22:57 2025 +0200

    New conversation API (GetStream#102)

    * trying to resurrect

    * test transcription events for openai

    * more tests for openai and gemini llm

    * more tests for openai and gemini llm

    * update py-client

    * wip

    * ruff

    * wip

    * ruff

    * snap

    * another way

    * another way, a better way

    * ruff

    * ruff

    * rev

    * ruffit

    * mypy everything

    * brief

    * tests

    * openai dep bump

    * snap - broken

    * nothingfuckingworks

    * message id

    * fix test

    * ruffit

commit cb6f00a
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 13:18:03 2025 -0600

    use qwen

commit f84b2ad
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 13:02:24 2025 -0600

    fix tests

commit e61acca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 12:50:40 2025 -0600

    testing and linting

commit 5f4d353
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 12:34:14 2025 -0600

    working

commit c2a15a9
Merge: a310771 1025a42
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:40:00 2025 -0600

    Merge branch 'main' of github.com:GetStream/Vision-Agents into bedrock

commit a310771
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:39:48 2025 -0600

    wip

commit b4370f4
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:22:43 2025 -0600

    something isn't quite working

commit 2dac975
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 10:30:04 2025 -0600

    add the examples

commit 6885289
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 20:19:42 2025 -0600

    ai realtime docs

commit a0fa3cc
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 18:48:06 2025 -0600

    wip

commit b914fc3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 18:40:22 2025 -0600

    fix ai llm

commit b5b00a7
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 17:11:26 2025 -0600

    work audio input

commit ac72260
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 16:47:19 2025 -0600

    fix model id

commit 2b5863c
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 16:32:54 2025 -0600

    wip on bedrock

commit 8bb4162
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 15:22:03 2025 -0600

    next up the connect method

commit 7a21e4e
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 14:12:00 2025 -0600

    nova progress

commit 16e8ba0
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 13:16:00 2025 -0600

    docs for bedrock nova

commit 1025a42
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 21:05:45 2025 +0200

    fix: Update .env.example for Gemini Live (GetStream#108)

commit e12112d
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 11:49:07 2025 -0600

    wip

commit fea101a
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 09:25:55 2025 +0200

    workflow file update

commit bb2d74c
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 09:22:33 2025 +0200

    initial commit

commit d2853cd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 16 19:44:59 2025 -0600

    always remember pep 420

commit 30a8eca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 16 19:36:58 2025 -0600

    start of bedrock branch

commit fc032bf
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Thu Oct 16 09:17:42 2025 +0200

    Remove cli handler from examples (GetStream#101)

commit 39a821d
Author: Dan Gusev <dangusev92@gmail.com>
Date:   Tue Oct 14 12:20:41 2025 +0200

    Update Deepgram plugin to use SDK v5.0.0 (GetStream#98)

    * Update Deepgram plugin to use SDK v5.0.0

    * Merge test_realtime and test_stt and update the remaining tests

    * Make deepgram.STT.start() idempotent

    * Clean up unused import

    * Use uv as the default package manager > pip

    ---------

    Co-authored-by: Neevash Ramdial (Nash) <mail@neevash.dev>

commit 2013be5
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 13 16:57:37 2025 +0200

    ensure chat works with default types (GetStream#99)