
Conversation

@tbarbugli
Member

@tbarbugli tbarbugli commented Oct 15, 2025

Summary by CodeRabbit

  • New Features

    • Unified messaging API with support for streaming content and message buffering
    • Automatic smart chunking for large messages with code block preservation
    • Participant tracking for multi-user realtime audio conversations
    • New transcription events for user and agent speech
  • Enhancements

    • Improved event streaming with unique identifiers for content tracking
    • Better message synchronization across conversation backends
    • Enhanced realtime audio handling with participant context
  • Updates

    • Dependency upgrades: getstream, cartesia, openai-realtime packages

@coderabbitai

coderabbitai bot commented Oct 15, 2025

Walkthrough

This PR refactors the conversation messaging API from a fragmented streaming interface to a unified asynchronous interface with send_message and upsert_message. It adds event-driven LLM and speech event handlers in agents, implements multi-user participant tracking for realtime audio, completely reimplements GetStream integration with automatic message chunking, and migrates audio processing from torchaudio to PyAV.

Changes

Cohort / File(s) Change Summary
Conversation API Refactoring
agents-core/vision_agents/core/agents/conversation.py
Replaced fragmented streaming API with unified send_message and upsert_message methods. Introduced ContentBuffer, MessageState, and InMemoryConversation abstractions. Added async state tracking via _message_states and _lock for buffering and backend synchronization.
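For orientation, a usage sketch of the unified API; the parameter names follow the test call sites quoted later in this review, and the InMemoryConversation constructor shape follows tests/test_conversation.py, so treat this as an illustration rather than the authoritative API:

import asyncio
from vision_agents.core.agents.conversation import InMemoryConversation

async def demo():
    conversation = InMemoryConversation("instructions", [])
    # One-shot message.
    await conversation.send_message(role="user", user_id="user123", content="Hello")
    # Streaming: ordered deltas accumulate into one assistant message...
    message_id = "item-1"  # e.g. the provider's item_id
    for i, delta in enumerate(["Hel", "lo ", "world"]):
        await conversation.upsert_message(
            message_id=message_id,
            role="assistant",
            user_id="agent",
            content=delta,
            content_index=i,   # ordered fragment index
            completed=False,   # still streaming
        )
    # ...then the completion replaces any buffered partial content.
    await conversation.upsert_message(
        message_id=message_id,
        role="assistant",
        user_id="agent",
        content="Hello world",
        completed=True,
        replace=True,
    )

asyncio.run(demo())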
Agent Event Handlers
agents-core/vision_agents/core/agents/agents.py
Added _setup_llm_events and _setup_speech_events async methods to subscribe to LLM/STT events and route them into unified conversation API. Updated join flow to initialize handlers before conversation creation. Removed legacy delta/response handling methods.
Core Event System Updates
agents-core/vision_agents/core/events/base.py, agents-core/vision_agents/core/agents/events.py
Updated event dataclasses with double-quoted type field defaults. Added user_id() method to BaseEvent, plugin_name/plugin_version fields to PluginBaseEvent, and new fields reconnect_attempt and was_clean to connection events. Added error_message property to AgentSayErrorEvent.
LLM Event System
agents-core/vision_agents/core/llm/events.py
Switched quote style to double quotes across all event type fields. Removed RealtimeTranscriptEvent and RealtimePartialTranscriptEvent. Added new RealtimeUserSpeechTranscriptionEvent and RealtimeAgentSpeechTranscriptionEvent. Enhanced RealtimeResponseEvent with is_complete and conversation_item_id fields; added fields to other realtime events for status/role/content tracking.
Generated SFU Events
agents-core/vision_agents/core/edge/sfu_events.py
Updated from_proto return type annotations to string-literal annotations across ~18 event wrapper classes. Minor control-flow adjustments and dataclass field wrapping with field(...) for event type declarations.
Gemini LLM Integration
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
Introduced UUID-based item_id tracking for streaming events. Reworked streaming loops to enumerate chunks and propagate item_id and index through all event emissions. Updated _standardize_and_emit_event signature to accept item_id and idx.
OpenAI LLM Integration
plugins/openai/vision_agents/plugins/openai/openai_llm.py
Standardized event forwarding to use explicit keyword arguments. Added item_id tracking for streaming and completion events. Enhanced tool-calling workflow with consistent message formatting and multi-round deduplication.
Realtime Audio Base
agents-core/vision_agents/core/llm/realtime.py
Added _current_participant: Optional[Participant] state. Extended simple_audio_response to accept optional participant parameter. Introduced _emit_user_speech_transcription and _emit_agent_speech_transcription methods. Removed old transcript emission methods.
Gemini Realtime
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
Extended public API with optional participant parameter to simple_audio_response. Stores participant in _current_participant. Minor refactoring of multi-line formatting and receive loop conditionals.
OpenAI Realtime
plugins/openai/vision_agents/plugins/openai/openai_realtime.py
Added per-participant mapping from conversation items to Participant objects. Tracks pending participant for audio correlation. Enhanced transcript handling with unified agent/user speech emission. Added simple_response and simple_audio_response with participant support.
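The correlation idea reduces to a small mapping; here is an illustrative standalone sketch (class and method names are hypothetical, not the plugin's actual API):

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Speaker:  # stand-in for the SDK's Participant
    user_id: str

@dataclass
class ParticipantTracker:
    """Bind realtime conversation items to whoever was speaking."""
    pending: Optional[Speaker] = None
    by_item: Dict[str, Speaker] = field(default_factory=dict)

    def on_audio_forwarded(self, speaker: Optional[Speaker]) -> None:
        self.pending = speaker  # remember the current speaker

    def on_item_created(self, item_id: str) -> None:
        if self.pending is not None:
            self.by_item[item_id] = self.pending  # correlate item -> speaker

    def speaker_for(self, item_id: str) -> Optional[Speaker]:
        return self.by_item.get(item_id)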
RTC Manager
plugins/openai/vision_agents/plugins/openai/rtc_manager.py
Updated data channel session to include input_audio_transcription (whisper-1) config. Minor formatting adjustments to multiline logging and signature styling. Changed start() to require non-None _shared_forwarder.
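For reference, roughly the payload shape this implies; input_audio_transcription with a model field follows the public OpenAI Realtime API, but the exact dict built in rtc_manager.py is an assumption here:

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"model": "whisper-1"},  # enable user STT
    },
}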
GetStream Integration
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
Complete refactor from InMemoryConversation base to Conversation base. Introduced channel-scoped automatic chunking via _smart_chunk, _split_large_block, _force_split. Replaced worker-thread model with direct async channel interactions. Added lifecycle management for chunks with _create_chunks and _update_chunks.
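An illustrative reduction of the chunking idea (the real _smart_chunk/_split_large_block handle more cases, e.g. force-splitting a single block that exceeds the limit on its own):

def smart_chunk(text: str, limit: int = 4000) -> list[str]:
    """Split on line boundaries, never inside a ``` code fence (sketch)."""
    chunks: list[str] = []
    current, in_fence = "", False
    for line in text.splitlines(keepends=True):
        starts_fence = line.lstrip().startswith("```")
        # A split is only allowed at a boundary that is outside any fence.
        if current and not in_fence and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
        if starts_fence:
            in_fence = not in_fence
    if current:
        chunks.append(current)
    return chunks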
GetStream Edge Transport
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
Replaced persistent self.channel with local channel via chat_client.channel(...). Updated StreamConversation construction to pass local channel. Introduced is_agent_track logic to skip TrackAddedEvent emission for agent-owned tracks.
Dependency Version Bumps
agents-core/pyproject.toml, plugins/cartesia/pyproject.toml, plugins/openai/pyproject.toml
Updated getstream[webrtc,telemetry] from ≥2.5.3 to ≥2.5.5; cartesia from ≥2.0.5 to ≥2.0.9; openai[realtime] from ≥2.2.0 to ≥2.5.0.
Audio Library Migration
plugins/deepgram/pyproject.toml, plugins/deepgram/tests/test_stt.py, plugins/silero/pyproject.toml, plugins/silero/tests/test_vad.py
Replaced torchaudio>=2.7.1 with torchvision>=0.20.0 in dev dependencies. Refactored audio loading from torchaudio to PyAV (av.open, frame decoding, mono conversion, resampling to 48kHz). Added float32 normalization and int16 conversion.
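A minimal sketch of the PyAV loading path described here; the tests' exact pipeline (including where the float32 normalization step happens) may differ:

import av
import numpy as np

def load_pcm_48k_mono(path: str) -> np.ndarray:
    """Decode an audio file to mono int16 PCM at 48 kHz (sketch)."""
    resampler = av.AudioResampler(format="s16", layout="mono", rate=48000)
    parts = []
    with av.open(path) as container:
        for frame in container.decode(audio=0):
            for out in resampler.resample(frame):  # may yield 0..n frames
                parts.append(out.to_ndarray().reshape(-1))
    return np.concatenate(parts).astype(np.int16)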
Comprehensive Test Suite
plugins/getstream/tests/test_message_chunking.py, plugins/getstream/tests/test_stream_conversation.py, tests/test_conversation.py
Added extensive tests for message chunking logic, boundary handling, code-block preservation, and chunk metadata tracking. Added streaming integration tests with mocked GetStream channel. Migrated conversation tests to InMemoryConversation-centric async API validation.
Event Integration Tests
plugins/gemini/tests/test_gemini_llm.py, plugins/openai/tests/test_openai_llm.py, plugins/openai/tests/test_openai_realtime.py
Added test_events methods to verify streaming event emission, chunk/completion sequences, and item_id consistency. Added transcription event tests for user/agent speech tracking and multi-user participant scenarios.
Example Updates
examples/01_simple_agent_example/simple_agent_example.py, examples/02_golf_coach_example/golf_coach_example.py
Switched LLM provider from OpenAI to Gemini. Reduced Gemini Realtime FPS from 10 to 3. Converted single-line constructs to multi-line formatting.
Test Infrastructure
tests/test_realtime_base.py
Updated FakeRealtime.simple_audio_response signature to accept optional participant parameter for consistency with realtime implementations.
Configuration & Documentation
.cursor/rules/python.mdc
Added a metadata block to the Python rules markdown (description, globs, and an alwaysApply flag) covering how to run Python or tests. Includes a note on project tooling (use uv; avoid python -m).

Sequence Diagram

sequenceDiagram
    participant Agent
    participant LLMSetup as _setup_llm_events
    participant SpeechSetup as _setup_speech_events
    participant LLMProvider
    participant STTProvider
    participant Conversation
    
    Note over Agent: join() flow initialization
    Agent->>LLMSetup: await _setup_llm_events()
    LLMSetup->>LLMProvider: subscribe to LLM events
    
    Agent->>SpeechSetup: await _setup_speech_events()
    SpeechSetup->>STTProvider: subscribe to STT/transcription events
    
    Note over Agent: After MCP connection
    Agent->>Conversation: create/initialize
    
    par LLM Event Path
        LLMProvider->>LLMSetup: emit completion/delta events
        LLMSetup->>Conversation: upsert_message(role=assistant, content_index)
    and Speech Event Path
        STTProvider->>SpeechSetup: emit transcription events
        SpeechSetup->>Conversation: upsert_message(role=user, completed=True)
    end
    
    Conversation->>Conversation: buffer/accumulate content
    Conversation->>Conversation: _sync_to_backend() [async]
sequenceDiagram
    participant Client
    participant StreamConversation
    participant GetStreamChannel
    participant Backend
    
    Note over StreamConversation: Unified upsert_message API
    Client->>StreamConversation: upsert_message(content delta, content_index)
    
    activate StreamConversation
    StreamConversation->>StreamConversation: buffer fragment in ContentBuffer
    StreamConversation->>StreamConversation: accumulate content via _apply_pending()
    
    alt First time (not in backend)
        StreamConversation->>StreamConversation: _smart_chunk() split into chunks
        StreamConversation->>GetStreamChannel: _create_chunks() send all chunks
        GetStreamChannel->>Backend: send chunk messages with metadata
    else Subsequent updates
        StreamConversation->>StreamConversation: _update_chunks() compute new chunks
        GetStreamChannel->>Backend: create/update/delete chunks as needed
    end
    
    alt On completion (completed=True)
        StreamConversation->>StreamConversation: finalize all chunks
        GetStreamChannel->>Backend: mark generating=false, finalize state
    end
    deactivate StreamConversation

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR encompasses significant refactoring across multiple interconnected systems (conversation API, agent orchestration, LLM event handling, GetStream integration, realtime audio). While many changes follow consistent patterns (event type quote updates, participant parameter threading), the core logic changes are heterogeneous and demand separate reasoning: unified message buffering/upsert semantics, chunk lifecycle management with backend synchronization, multi-user participant tracking, and event-driven LLM/speech routing. The breadth of affected files (20+) and the density of business logic in agents.py, conversation.py, stream_conversation.py, and openai_realtime.py require careful validation of cross-domain interactions.

Possibly related PRs

Suggested reviewers

  • maxkahan
  • Nash0x7E2
  • d3xvn

Poem

Fragments converge on a pin-point—
each delta, each breath now one message,
the bell jar lifted at last.
Chunks marry Backend in ceremony;
participants named, transcripts whole:
a unification bell, tolling.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 59.26%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the PR title "New conversation API" accurately reflects the primary change: the fragmented streaming API is replaced with a unified, asynchronous messaging API, centered on the new Conversation base class with send_message and upsert_message methods in agents-core/vision_agents/core/agents/conversation.py. The title is concise, specific, and clearly communicates the main change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch option-c-unified-message-api

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/openai/tests/test_openai_llm.py (1)

41-43: Use pytest_asyncio.fixture for async fixtures
Switch the decorator and add the import.

+import pytest_asyncio
@@
-    @pytest.fixture
-    async def llm(self) -> OpenAILLM:
+    @pytest_asyncio.fixture
+    async def llm(self) -> OpenAILLM:
         llm = OpenAILLM(model="gpt-4o")
         return llm
agents-core/vision_agents/core/agents/agents.py (1)

329-351: Duplicate wildcard subscriptions → double event delivery

on_wildcard is registered twice for _coordinator_ws_client, causing duplicate events.

Apply:

-            connection._connection._coordinator_ws_client.on_wildcard(
-                "*", lambda event_name, event: self.events.send(event)
-            )
...
-            connection._connection._coordinator_ws_client.on_wildcard(
-                "*",
-                lambda event_name, event: self.events.send(event),
-            )
+            connection._connection._coordinator_ws_client.on_wildcard(
+                "*", lambda event_name, event: self.events.send(event)
+            )

Keep a single registration for coordinator and the one for _ws_client.

🧹 Nitpick comments (14)
agents-core/vision_agents/core/events/base.py (2)

117-118: Field name ‘user_id’ conflicts with BaseEvent.user_id; rename or drop

This dataclass field shadows the BaseEvent.user_id accessor. It creates inconsistent semantics across events (sometimes attribute, sometimes callable/property) and confuses type-checkers.

Consider renaming to avoid the clash, or rely on BaseEvent.user_id derived from user_metadata.

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    connection_user_id: Optional[str] = None

If you keep it, make BaseEvent.user_id a property (see previous comment) to standardize access patterns. Based on learnings.


41-41: Broaden user_metadata typing to interop with local Participant wrappers

Events elsewhere pass a local Participant dataclass (not the pb2 Participant). Current narrow typing can trigger false positives in type-checkers.

-    user_metadata: Optional[Participant] = None
+    user_metadata: Optional[Any] = None

Alternatively, introduce a Protocol with a user_id: Optional[str] attribute for stronger typing. Based on learnings.

agents-core/vision_agents/core/agents/conversation.py (2)

78-79: Lock held across I/O; confirm intended serialization

Holding self._lock while awaiting _sync_to_backend serializes all message ops and can increase latency. If ordering is not strictly required, consider narrowing the lock to state mutations only, and release before backend I/O.

Would you like a draft refactor that separates state updates (under lock) from backend sync (outside lock) with an immutable payload?
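For illustration, the shape such a split could take (a self-contained sketch with hypothetical names, not the module's actual code):

import asyncio

class LockSplitSketch:
    """Mutate state under the lock; do backend I/O after releasing it."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._content = ""

    async def upsert(self, delta: str) -> None:
        async with self._lock:
            self._content += delta       # in-memory mutation only
            payload = self._content      # immutable snapshot (str) to ship
        await self._sync_to_backend(payload)  # network I/O, lock released

    async def _sync_to_backend(self, payload: str) -> None:
        await asyncio.sleep(0)           # stand-in for the real backend call

The trade-off: backend writes may interleave, which is only safe when strict ordering is not required.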


22-25: Use timezone-aware timestamp

For consistency with other UTC-aware timestamps in the codebase, prefer an aware datetime.

-        self.timestamp = datetime.datetime.now()
+        self.timestamp = datetime.datetime.now(datetime.UTC)
agents-core/vision_agents/core/edge/sfu_events.py (1)

488-494: Consistent user_id accessor matches BaseEvent (post-refactor)

This property correctly guards for missing payload fields. Once BaseEvent.user_id is a property, the override becomes conventional and you can drop type: ignore comments in similar properties.

tests/test_conversation.py (1)

45-50: Optional: avoid asserting exact TypeError message text.

Message wording can vary by Python version; it’s enough to assert that TypeError is raised.

-        with pytest.raises(TypeError) as exc_info:
-            Conversation("instructions", [])
-        assert "Can't instantiate abstract class" in str(exc_info.value)
+        with pytest.raises(TypeError):
+            Conversation("instructions", [])
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)

105-112: Add a monotonic sequence_number across the entire stream (don’t reuse idx across phases).

idx resets in follow-up loops, which can confuse event consumers. Use LLMResponseChunkEvent.sequence_number to carry a single increasing counter across both loops; keep content_index for per-phase indexing if you like.

Apply this refactor:

@@
-        for idx, chunk in enumerate(iterator):
+        seq = 0
+        for idx, chunk in enumerate(iterator):
             response_chunk: GenerateContentResponse = chunk
             final_chunk = response_chunk
-            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx)
+            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx, seq)
+            seq += 1
@@
-                for idx, chk in enumerate(follow_up_iter):
+                for idx, chk in enumerate(follow_up_iter):
                     follow_up_last = chk
-                    # TODO: unclear if this is correct (item_id and idx)
-                    self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx)
+                    self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx, seq)
+                    seq += 1
@@
-    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int) -> Optional[LLMResponseEvent[Any]]:
+    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int, seq: int) -> Optional[LLMResponseEvent[Any]]:
@@
-            self.events.send(LLMResponseChunkEvent(
+            self.events.send(LLMResponseChunkEvent(
                 plugin_name="gemini",
                 content_index=idx,
                 item_id=item_id,
-                delta=chunk.text,
+                delta=chunk.text,
+                sequence_number=seq,
             ))

Item_id propagation and completion event look correct. Nice.

Also applies to: 153-157, 214-221

agents-core/vision_agents/core/agents/agents.py (2)

709-729: Rename: hasImageProcessers → has_image_processors

Spelling and PEP8. No behavior change.

Apply:

-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
...
-        if not hasImageProcessers:
+        if not has_image_processors:
...
-                    if hasImageProcessers:
+                    if has_image_processors:

196-240: Add Google-style docstrings to new setup hooks

Document event subscriptions, responsibilities, and side effects.

Example for _setup_llm_events:

async def _setup_llm_events(self):
    """Subscribe to LLM events and sync TTS + conversation.

    Subscribes:
      - LLMResponseCompletedEvent: sends TTS (if any) and replaces assistant message.
      - LLMResponseChunkEvent: appends/streams assistant deltas.

    Side effects:
      - Writes to self.conversation if available.
      - Calls self.tts.send() for non-empty completions.
    """

As per coding guidelines

Also applies to: 241-288

plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)

126-129: Avoid double-converting tools; _get_tools_for_provider already returns provider format

Pass through directly.

-        tools_spec = self._get_tools_for_provider()
-        if tools_spec:
-            kwargs["tools"] = self._convert_tools_to_provider_format(tools_spec)  # type: ignore[arg-type]
+        tools_spec = self._get_tools_for_provider()
+        if tools_spec:
+            kwargs["tools"] = tools_spec  # already provider-formatted

79-85: Type hint: participant should be Optional[Participant]

Minor typing polish.

-        participant: Participant = None,
+        participant: Optional[Participant] = None,
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)

85-103: API error check may never trigger; SDK raises on non-2xx

response.data.message.type == "error" is unlikely; the SDK usually raises exceptions. Consider removing or logging only.

plugins/getstream/tests/test_message_chunking.py (2)

138-144: Strengthen boundary test

One char over should split.

-        # Should be split (no newlines, so splits anywhere)
-        assert len(chunks) >= 1
+        # Should be split into at least 2 chunks
+        assert len(chunks) >= 2

161-167: Clarify multiple code blocks assertion

Directly assert presence of both blocks.

-        assert full_text.count('```python') == full_text.count('```') / 2 or '```python' in full_text
-        assert 'code1()' in full_text
-        assert 'code2()' in full_text
+        assert '```python' in full_text and '```javascript' in full_text
+        assert 'code1()' in full_text
+        assert 'code2()' in full_text
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 39a821d and 102a61e.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • .cursor/rules/python.mdc (1 hunks)
  • agents-core/pyproject.toml (1 hunks)
  • agents-core/vision_agents/core/agents/agents.py (19 hunks)
  • agents-core/vision_agents/core/agents/conversation.py (2 hunks)
  • agents-core/vision_agents/core/agents/events.py (2 hunks)
  • agents-core/vision_agents/core/edge/sfu_events.py (70 hunks)
  • agents-core/vision_agents/core/events/base.py (4 hunks)
  • agents-core/vision_agents/core/llm/events.py (9 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
  • examples/02_golf_coach_example/golf_coach_example.py (2 hunks)
  • plugins/cartesia/pyproject.toml (1 hunks)
  • plugins/gemini/tests/test_gemini_llm.py (2 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (6 hunks)
  • plugins/getstream/tests/test_message_chunking.py (1 hunks)
  • plugins/getstream/tests/test_stream_conversation.py (1 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)
  • plugins/openai/tests/test_openai_llm.py (3 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py (15 hunks)
  • tests/test_conversation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
  • examples/01_simple_agent_example/simple_agent_example.py
  • plugins/gemini/tests/test_gemini_llm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • agents-core/vision_agents/core/llm/events.py
  • agents-core/vision_agents/core/edge/sfu_events.py
  • agents-core/vision_agents/core/agents/events.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/openai/tests/test_openai_llm.py
  • agents-core/vision_agents/core/agents/conversation.py
  • tests/test_conversation.py
  • agents-core/vision_agents/core/events/base.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/getstream/tests/test_stream_conversation.py
  • examples/02_golf_coach_example/golf_coach_example.py
  • plugins/getstream/tests/test_message_chunking.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_conversation.py
🧬 Code graph analysis (17)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (2)
plugins/getstream/tests/test_message_chunking.py (2)
  • conversation (15-27)
  • conversation (244-251)
agents-core/vision_agents/core/agents/conversation.py (6)
  • Conversation (67-227)
  • Message (14-24)
  • MessageState (55-64)
  • _sync_to_backend (213-223)
  • _sync_to_backend (233-237)
  • send_message (80-120)
examples/01_simple_agent_example/simple_agent_example.py (4)
examples/02_golf_coach_example/golf_coach_example.py (1)
  • start_agent (12-38)
plugins/gemini/tests/test_gemini_llm.py (1)
  • llm (33-36)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (79-107)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • simple_response (88-99)
plugins/gemini/tests/test_gemini_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)
  • GeminiLLM (21-322)
  • send_message (75-182)
plugins/gemini/vision_agents/plugins/gemini/events.py (2)
  • GeminiResponseEvent (35-38)
  • GeminiErrorEvent (14-17)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (4)
agents-core/vision_agents/core/edge/sfu_events.py (7)
  • Participant (229-270)
  • participant (1495-1500)
  • participant (1537-1542)
  • participant (1610-1615)
  • participant (2078-2083)
  • participant (2127-2132)
  • name (2162-2166)
agents-core/vision_agents/core/llm/llm.py (4)
  • LLM (35-358)
  • LLMResponseEvent (25-28)
  • _dedup_and_execute (313-345)
  • _get_tools_for_provider (86-95)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
  • error_message (103-104)
plugins/openai/vision_agents/plugins/openai/events.py (2)
  • OpenAIStreamEvent (7-11)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (50-52)
agents-core/vision_agents/core/edge/sfu_events.py (1)
agents-core/vision_agents/core/events/base.py (2)
  • user_id (43-46)
  • BaseEvent (34-46)
agents-core/vision_agents/core/agents/events.py (2)
agents-core/vision_agents/core/edge/sfu_events.py (6)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
agents-core/vision_agents/core/events/base.py (2)
  • user_id (43-46)
  • PluginBaseEvent (50-52)
agents-core/vision_agents/core/agents/agents.py (7)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (7-22)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (10-25)
  • TurnEndedEvent (29-44)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
agents-core/vision_agents/core/mcp/mcp_manager.py (1)
  • MCPManager (9-147)
agents-core/vision_agents/core/events/manager.py (2)
  • subscribe (292-353)
  • send (406-449)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (25-28)
plugins/openai/tests/test_openai_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (108-123)
  • LLMResponseCompletedEvent (127-133)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • OpenAILLM (31-509)
plugins/openai/vision_agents/plugins/openai/events.py (2)
  • OpenAIStreamEvent (7-11)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
agents-core/vision_agents/core/agents/conversation.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • _sync_to_backend (45-64)
tests/test_conversation.py (1)
agents-core/vision_agents/core/agents/conversation.py (3)
  • Message (14-24)
  • send_message (80-120)
  • upsert_message (122-210)
agents-core/vision_agents/core/events/base.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (7)
  • Participant (229-270)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
plugins/getstream/tests/test_stream_conversation.py (3)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
plugins/getstream/tests/test_message_chunking.py (3)
  • mock_channel (217-241)
  • conversation (15-27)
  • conversation (244-251)
agents-core/vision_agents/core/agents/conversation.py (2)
  • send_message (80-120)
  • upsert_message (122-210)
examples/02_golf_coach_example/golf_coach_example.py (3)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (79-107)
agents-core/vision_agents/core/agents/agents.py (1)
  • simple_response (168-180)
tests/test_realtime_base.py (1)
  • simple_response (238-252)
plugins/getstream/tests/test_message_chunking.py (2)
plugins/getstream/tests/test_stream_conversation.py (2)
  • stream_conversation (48-57)
  • mock_channel (24-45)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (3)
  • StreamConversation (17-353)
  • _smart_chunk (206-301)
  • _split_large_block (303-332)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
examples/02_golf_coach_example/golf_coach_example.py (3)

17-17: Verify 3 FPS provides adequate golf swing analysis.

The FPS reduction from 10 to 3 significantly decreases how often the LLM processes frames. A golf swing typically completes in 1–2 seconds, so at 3 FPS, only 3–6 frames are analyzed per swing. While this conserves cost (as the comment notes), it may miss critical swing phase details needed for meaningful feedback.

Please confirm this frame rate has been tested and still delivers useful coaching insights.


19-21: LGTM.

Multi-line formatting improves readability.


33-35: LGTM.

Multi-line formatting improves readability.

examples/01_simple_agent_example/simple_agent_example.py (4)

6-6: LGTM! Import aligns with LLM provider change.

The import change from openai to gemini is consistent with the LLM initialization below.


16-18: LGTM! Improved readability.

The multiline formatting of agent_user improves code readability without changing functionality.


25-27: LGTM! Improved readability.

The multiline formatting of TurnDetection improves code readability without changing functionality.


12-12: Model identifier is correct. The canonical Gemini 2.0 Flash ID is gemini-2.0-flash; the -exp suffix denotes the experimental variant and doesn’t apply here.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

226-231: Channel scoping refactor looks good

Using a local channel and passing it to StreamConversation removes unnecessary state and aligns with the chunked backend sync design.

plugins/gemini/tests/test_gemini_llm.py (1)

82-206: Strong integration coverage for streaming/event pipeline

Good assertions on chunk presence, completion, item_id consistency, and reconstruction.

Minor thought: if any provider reorders indices, consider asserting on sorted(content_indices) against range(max+1). Not blocking.

plugins/openai/tests/test_openai_llm.py (1)

101-228: Event-stream test is thorough and aligns with the new event model.

Good coverage: chunk vs completion correlation, item_id propagation, ordering, and OpenAI stream event types.

Please confirm OpenAILLM sets content_index as None or sequential; your test tolerates both, which is fine given current mapping notes.

agents-core/vision_agents/core/agents/events.py (1)

50-52: error_message property addition looks good.

Simple, safe accessor; no concerns.

agents-core/vision_agents/core/llm/events.py (1)

108-134: LGTM on event schema updates

item_id and delta additions align with upstream emitters; no issues spotted.

plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)

146-156: Verify SDK method and payload shape
Ensure ephemeral_message_update exists on the client and that its set payload accepts the chunk metadata at top level; if the SDK requires metadata under a nested custom key, adjust the payload accordingly.

requires-python = ">=3.10"
dependencies = [
-    "getstream[webrtc,telemetry]>=2.5.3",
+    "getstream[webrtc,telemetry]>=2.5.5",

⚠️ Potential issue | 🔴 Critical

Blocker: getstream>=2.5.5 is not published yet.

PyPI’s latest public wheel is 2.5.3; requesting 2.5.5 makes installs fail with “No matching distribution found.” Please keep the floor at 2.5.3 (or whichever released version supplies the needed features) until 2.5.5 actually ships.

-    "getstream[webrtc,telemetry]>=2.5.5",
+    "getstream[webrtc,telemetry]>=2.5.3",

Based on learnings

🤖 Prompt for AI Agents
In agents-core/pyproject.toml around line 24, the dependency currently requests
getstream[webrtc,telemetry]>=2.5.5 which isn't published on PyPI; change the
constraint to >=2.5.3 (or the newest released version that provides the required
features) — e.g. getstream[webrtc,telemetry]>=2.5.3 — then update the lockfile
(poetry lock / pip-compile / equivalent) and verify the needed features exist in
that released version before finalizing.

Comment on lines +190 to +202
            # Update content
            if content_index is not None:
                # Streaming: buffer fragments in order
                state.buffer.add_fragment(content_index, content)
                message.content = state.buffer.get_accumulated()
            elif replace:
                # Replace all content
                state.buffer.clear()
                message.content = content
            else:
                # Append to existing
                message.content += content


⚠️ Potential issue | 🔴 Critical

Fix content loss when mixing streaming (content_index) and append updates

Appending without updating the ContentBuffer means the next indexed-chunk write will overwrite and drop the appended text. Route all mutations through the buffer so it stays the single source of truth.

-            elif replace:
+            elif replace:
                 # Replace all content
                 state.buffer.clear()
                 message.content = content
             else:
-                # Append to existing
-                message.content += content
+                # Append to existing via buffer to preserve order with future indexed chunks
+                next_index = state.buffer.last_index + 1
+                if next_index < 0:
+                    next_index = 0
+                state.buffer.add_fragment(next_index, content)
+                message.content = state.buffer.get_accumulated()
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/conversation.py around lines 190-202,
the current logic appends text directly to message.content when content_index is
None and replace is False, which causes appended text to be lost when later
indexed fragments arrive; instead, always mutate the ContentBuffer as the single
source of truth: for replace, clear the buffer and (if appropriate) add the
replacement content as a base fragment; for append, add the incoming content to
the buffer via the buffer's fragment/append API (or add_fragment with a
sentinel/next index), and after every branch set message.content =
state.buffer.get_accumulated() so message.content is always derived from the
buffer.

Comment on lines +12 to 13
    user_id: Optional[str] = None  # type: ignore[assignment]
    metadata: Optional[Dict[str, Any]] = None

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Do not shadow BaseEvent.user_id method; this breaks polymorphism at runtime.

Declaring a user_id field overrides the BaseEvent.user_id method. Any code calling event.user_id() will crash. Rename the field and, if needed, override the method explicitly.

Apply this diff to avoid shadowing:

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
@@
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None

Then add method overrides to preserve the original API shape:

# Add inside each class
def user_id(self) -> Optional[str]:  # type: ignore[override]
    # Prefer explicit field if set, else fallback to BaseEvent behavior
    return self.actor_user_id or super().user_id()

Check for method-style calls that would be broken:

Also applies to: 26-27, 36-37, 47-48


🏁 Script executed:

#!/bin/bash
rg -nP --type=py -C2 '\.user_id\s*\('

Length of output: 825


Avoid shadowing BaseEvent.user_id method
Rename the user_id attribute in each subclass (e.g. to actor_user_id) and override user_id() to preserve the original API:

-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None
def user_id(self) -> Optional[str]:  # type: ignore[override]
    return self.actor_user_id or super().user_id()

This prevents runtime errors in calls to event.user_id().

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/events.py around lines 12 to 13, the
subclasses define a user_id attribute which shadows the BaseEvent.user_id
method; rename the attribute in each subclass (for example to actor_user_id) and
add an overriding user_id(self) -> Optional[str] method that returns the new
attribute or falls back to super().user_id() (use type ignore override if
needed), so calls to event.user_id() continue to work and avoid runtime errors.

Comment on lines +43 to +46
    def user_id(self) -> Optional[str]:
        if self.user_metadata is None:
            return None
        return getattr(self.user_metadata, "user_id")

🛠️ Refactor suggestion | 🟠 Major

Make user_id a property and guard for missing attribute

Unify the API shape with other event classes and avoid method/field clashes. Also prevent AttributeError if user_metadata lacks user_id.

-    def user_id(self) -> Optional[str]:
-        if self.user_metadata is None:
-            return None
-        return getattr(self.user_metadata, "user_id")
+    @property
+    def user_id(self) -> Optional[str]:
+        if self.user_metadata is None:
+            return None
+        return getattr(self.user_metadata, "user_id", None)
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/events/base.py around lines 43 to 46, change
the user_id method into a @property that returns Optional[str] and guard against
a missing attribute by using getattr(self.user_metadata, "user_id", None) (or
equivalent) instead of getattr(...) without a default; keep the existing None
check for user_metadata and update any type hints or docstrings to reflect that
user_id is a property rather than a method.

import pytest
import uuid
import asyncio
from unittest.mock import Mock, AsyncMock

⚠️ Potential issue | 🟠 Major

Replace mocks with simple fakes (tests must not use mocking).

Per test guidelines, avoid unittest.mock. Implement minimal fake Channel/Client classes with the needed async methods and state tracking instead of Mock/AsyncMock.

Apply these changes:

  • Remove the mock imports.
  • Rewrite the fixture to return fakes.
-from unittest.mock import Mock, AsyncMock
@@
-    def mock_channel(self):
-        """Create a mock Channel."""
-        channel = Mock()
-        channel.channel_type = "messaging"
-        channel.channel_id = "test-channel-123"
-        
-        # Mock the client
-        channel.client = Mock()
-        
-        # Create async mocks for client methods
-        channel.client.update_message_partial = AsyncMock(return_value=Mock())
-        channel.client.ephemeral_message_update = AsyncMock(return_value=Mock())
-        
-        # Mock send_message response
-        mock_response = Mock()
-        mock_response.data.message.id = "stream-message-123"
-        mock_response.data.message.type = "regular"
-        
-        # Create async mock for send_message
-        channel.send_message = AsyncMock(return_value=mock_response)
-        
-        return channel
+    def mock_channel(self):
+        """Create a fake Channel (no unittest.mock)."""
+        class FakeResponseMsg:
+            def __init__(self, mid: str):
+                self.id = mid
+                self.type = "regular"
+                self.custom = {}
+                self.text = ""
+        class FakeResponseData:
+            def __init__(self, mid: str):
+                self.message = FakeResponseMsg(mid)
+        class FakeResponse:
+            def __init__(self, mid: str):
+                self.data = FakeResponseData(mid)
+        class FakeClient:
+            async def update_message_partial(self, *_args, **_kwargs):
+                return object()
+            async def ephemeral_message_update(self, *_args, **_kwargs):
+                return object()
+            async def delete_message(self, *_args, **_kwargs):
+                return object()
+        class FakeChannel:
+            def __init__(self):
+                self.channel_type = "messaging"
+                self.channel_id = "test-channel-123"
+                self.client = FakeClient()
+                self._send_count = 0
+            async def send_message(self, request):
+                mid = f"stream-message-{self._send_count}"
+                self._send_count += 1
+                return FakeResponse(mid)
+        return FakeChannel()

Adjust assertions that check Mock call methods to validate behavior (e.g., maintain counters or flags on the fake).

I can provide tailored fakes and assertion updates if you confirm required behaviors.

Also applies to: 23-45

🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 7 and 23-45,
the test currently imports and uses unittest.mock.Mock/AsyncMock; replace those
with small hand-written fake classes: remove the mock imports, implement a
FakeChannel and FakeClient inside the test module that expose the async methods
used by the tests (e.g., async send, async get or similar) and maintain simple
state counters/flags (call_count, last_args, or a list of messages) so
assertions can inspect behavior; update the fixture to return instances of these
fakes instead of mocks, and replace any Mock-specific assertions (like
assert_awaited or call_args) with checks against the fake's counters/flags or
stored data to validate expected interactions.

Comment on lines +59 to +67
    @pytest.mark.asyncio
    async def test_send_message_simple(self, stream_conversation, mock_channel):
        """Test send_message convenience method."""
        await stream_conversation.send_message(
            role="user",
            user_id="user123",
            content="Hello",
        )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Remove @pytest.mark.asyncio across this file.

Async support is automatic in this repo; decorators are prohibited.

Example:

-@pytest.mark.asyncio
 async def test_send_message_simple(...):

Find all occurrences:

Also applies to: 80-84, 83-106, 107-142, 143-183, 184-207, 208-234, 236-615


🏁 Script executed:

#!/bin/bash
rg -n '@pytest\.mark\.asyncio' plugins/getstream/tests/test_stream_conversation.py

Length of output: 371


Remove all @pytest.mark.asyncio decorators. Async support is automatic; these decorators are prohibited.

🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 59 to 67,
remove the prohibited @pytest.mark.asyncio decorator(s) from the async test
function(s); leave the async def test_send_message_simple(...) signature intact
(async support is automatic), and delete any corresponding import of
pytest.mark.asyncio if it exists so no unused decorator remains.

Comment on lines +129 to +139
                    await self.channel.client.update_message_partial(
                        chunk_id,
                        user_id=message.user_id,
                        set={
                            "text": chunk_text,
                            "generating": False if is_last_chunk else False,
                            "chunk_group": message.id,
                            "chunk_index": i,
                            "total_chunks": new_chunk_count,
                        },
                    )

⚠️ Potential issue | 🟠 Major

Partial update should set custom.* fields under custom, not at the message root

Current code writes generating/chunk_* at message root; creation wrote them under custom. Align shapes.

-                    await self.channel.client.update_message_partial(
-                        chunk_id,
-                        user_id=message.user_id,
-                        set={
-                            "text": chunk_text,
-                            "generating": False if is_last_chunk else False,
-                            "chunk_group": message.id,
-                            "chunk_index": i,
-                            "total_chunks": new_chunk_count,
-                        },
-                    )
+                    await self.channel.client.update_message_partial(
+                        chunk_id,
+                        user_id=message.user_id,
+                        set={
+                            "text": chunk_text,
+                            "custom": {
+                                "generating": False,
+                                "chunk_group": message.id,
+                                "chunk_index": i,
+                                "total_chunks": new_chunk_count,
+                            },
+                        },
+                    )
🤖 Prompt for AI Agents
In plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
around lines 129 to 139, the partial update writes generating and chunk_* fields
at the message root whereas message creation stores them under the custom
object; change the update payload so "text" remains at root but the generating,
chunk_group, chunk_index and total_chunks fields are nested inside a "custom"
dict (i.e., set={"text": chunk_text, "custom": {"generating": <bool>,
"chunk_group": message.id, "chunk_index": i, "total_chunks": new_chunk_count}}),
keeping the same boolean logic for generating.

Comment on lines +197 to +203
            self.events.send(
                LLMResponseCompletedEvent(
                    item_id=llm_response.original.output[0].id,
                    original=llm_response.original,
                    text=llm_response.text,
                )
            )

⚠️ Potential issue | 🟡 Minor

Guard item_id extraction for non-streaming responses

Responses without text outputs (e.g., only tool calls) can leave output empty.

-            self.events.send(
-                LLMResponseCompletedEvent(
-                    item_id=llm_response.original.output[0].id,
-                    original=llm_response.original,
-                    text=llm_response.text,
-                )
-            )
+            self.events.send(
+                LLMResponseCompletedEvent(
+                    item_id=(
+                        llm_response.original.output[0].id
+                        if getattr(llm_response.original, "output", None)
+                        and len(llm_response.original.output) > 0
+                        else None
+                    ),
+                    original=llm_response.original,
+                    text=llm_response.text,
+                )
+            )
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_llm.py around lines 197 to
203, the code assumes llm_response.original.output[0] exists which will raise
when output is empty (e.g., non-text/tool-only responses); guard the extraction
by checking that llm_response.original.output is a non-empty list before
accessing [0] and use a safe fallback (for example None or
llm_response.original.id) for item_id when output is empty, then send the
LLMResponseCompletedEvent using that guarded item_id so the event is still
emitted without raising.

Comment on lines +80 to 84
    @pytest.mark.asyncio
    async def test_send_message(self, conversation):
        """Test send_message creates a new message."""
        message = await conversation.send_message(
            role="user",

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Remove @pytest.mark.asyncio decorators (auto async support is enabled).

Per project guidelines, drop these markers; keep tests as async def only.

Representative change:

-    @pytest.mark.asyncio
     async def test_send_message(self, conversation):

List occurrences to update:

Also applies to: 95-103, 122-131, 135-146, 179-188, 210-219, 248-260


🏁 Script executed:

#!/bin/bash
rg -n --type=py '@pytest\.mark\.asyncio' tests | sed -n '1,120p'

Length of output: 4278


Remove all @pytest.mark.asyncio decorators
Async support is automatic; retain only async def on test functions.

Representative change:

-    @pytest.mark.asyncio
     async def test_send_message(self, conversation):

To remove everywhere:

rg -l '@pytest\.mark\.asyncio' tests | xargs sed -i '/@pytest.mark.asyncio/d'
🤖 Prompt for AI Agents
In tests/test_conversation.py around lines 80 to 84, remove the
@pytest.mark.asyncio decorator above the async test function test_send_message;
keep the async def signature only. Search for and delete any occurrences of
@pytest.mark.asyncio in this file (and other tests if applicable) so tests rely
on pytest's builtin asyncio support, ensuring indentation and spacing remain
correct after removal.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)

16-20: Unify PcmData type to the core definition to avoid drift.

This module imports PcmData from RTC utils while the rest of the repo uses vision_agents.core.edge.types.PcmData. Unify to the core type; RTC layer only needs .samples/.sample_rate.

-from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import PcmData
@@
-    async def simple_audio_response(
-        self, audio: PcmData, participant: Optional[Participant] = None
-    ):
+    async def simple_audio_response(
+        self, audio: PcmData, participant: Optional[Participant] = None
+    ):

(Optional) Adjust RTCManager.send_audio_pcm typing if needed; runtime behavior remains identical.

Also applies to: 121-136

♻️ Duplicate comments (1)
agents-core/vision_agents/core/agents/agents.py (1)

215-224: Delta upserts will fan‑out messages when item_id is missing — add a stable fallback map and clean up on completion.

Several providers omit item_id on chunks/completions; your code passes None, creating a new message per delta. Maintain a per‑response key→message_id map (e.g., by (output_index, content_index)), use it for chunks, then finalize and delete on completion.

Apply:

@@
-            await self.conversation.upsert_message(
-                message_id=event.item_id,
+            # Use a stable id for this response: prefer item_id, else map by indexes
+            key = event.item_id or f"{event.output_index or 0}:{event.content_index or 0}"
+            if not hasattr(self, "_llm_stream_msg_ids"):
+                self._llm_stream_msg_ids = {}
+            msg_id = self._llm_stream_msg_ids.get(key)
+            if msg_id is None:
+                msg_id = str(uuid.uuid4())
+                self._llm_stream_msg_ids[key] = msg_id
+            await self.conversation.upsert_message(
+                message_id=msg_id,
                 role="assistant",
                 user_id=self.agent_user.id or "agent",
                 content=event.delta or "",
                 content_index=event.content_index,
                 completed=False,  # Still streaming
             )
@@
-            await self.conversation.upsert_message(
-                message_id=event.item_id,
+            key = event.item_id or f"{getattr(event, 'output_index', 0) or 0}:{getattr(event, 'content_index', 0) or 0}"
+            msg_id = getattr(self, "_llm_stream_msg_ids", {}).get(key) or event.item_id
+            await self.conversation.upsert_message(
+                message_id=msg_id,
                 role="assistant",
                 user_id=self.agent_user.id or "agent",
                 content=event.text or "",
                 completed=True,
                 replace=True,  # Replace any partial content from deltas
             )
+            # Cleanup mapping for this response
+            getattr(self, "_llm_stream_msg_ids", {}).pop(key, None)

Also applies to: 226-245

🧹 Nitpick comments (16)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)

664-665: FIXME comment indicates unresolved typing issue.

The event dictionary typing could be improved with a TypedDict or protocol.

Would you like me to generate a TypedDict definition for the OpenAI Realtime API events, or open an issue to track this?
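
If the TypedDict route is taken, a rough sketch could look like this (field names inferred from usage in this handler; the real Realtime API schema has more variants, so treat this as a starting point, not the definitive shape):

from typing import Any, TypedDict

class RealtimeEvent(TypedDict, total=False):
    """Loose shape of the event dicts handled here (sketch only)."""

    type: str                 # present on every event in practice
    session: dict[str, Any]   # session.created / session.updated payload
    item_id: str              # conversation item correlation id
    transcript: str           # transcription payloads
    response: dict[str, Any]  # response.* payloads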


711-736: Consider reducing log verbosity for frame-by-frame details.

Per-frame logging at INFO level (lines 714-716, 731-736) could be excessive in production. Consider DEBUG level for frame-by-frame details while keeping INFO for summaries.

Example adjustment:

-            logger.info(
-                f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
-            )
-            logger.info(
-                f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
-            )
+            logger.info(
+                f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
+            )
+            logger.debug(
+                f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
+            )

And:

                    # Log frame details
-                    logger.info(
-                        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
-                    )
-                    logger.info(
-                        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
-                    )
+                    logger.debug(
+                        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
+                    )
+                    logger.debug(
+                        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
+                    )
agents-core/vision_agents/core/agents/agents.py (6)

276-285: Realtime transcript upserts: replace=True with random IDs is redundant.

You already generate a new UUID per transcript. Remove replace=True to avoid confusion; reserve it for true replacements.

-                completed=True,
-                replace=True,
+                completed=True,

(and same change in the agent transcript block)

Also applies to: 295-303


684-687: Background task lacks error hook — attach callback to avoid swallowed exceptions.

-                asyncio.create_task(
-                    self.llm.simple_audio_response(pcm_data, participant)
-                )
+                task = asyncio.create_task(
+                    self.llm.simple_audio_response(pcm_data, participant)
+                )
+                task.add_done_callback(_log_task_exception)
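
The _log_task_exception callback referenced above is not defined in the diff; a minimal version (name and placement are assumptions) might be:

import asyncio
import logging

logger = logging.getLogger(__name__)

def _log_task_exception(task: asyncio.Task) -> None:
    """Done-callback that surfaces exceptions from fire-and-forget tasks."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        logger.error("Background task failed: %r", exc, exc_info=exc)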

249-261: Reduce PII in logs — truncate transcript text.

Full transcripts in INFO can leak user data. Log first N chars or switch to DEBUG.

-            self.logger.info(f"🎤 [Transcript]: {event.text}")
+            self.logger.debug("🎤 [Transcript]: %s", (event.text or "")[:120])
@@
-            self.logger.info(f"🎤 [User transcript]: {event.text}")
+            self.logger.debug("🎤 [User transcript]: %s", (event.text or "")[:120])
@@
-            self.logger.info(f"🎤 [Agent transcript]: {event.text}")
+            self.logger.debug("🎤 [Agent transcript]: %s", (event.text or "")[:120])

Also applies to: 271-303


344-351: Early events can be lost before conversation is set — buffer or await creation earlier.

Handlers bail when self.conversation is None. Either await create_conversation_coro before wiring LLM/STT or buffer upserts then drain once set.

Would you prefer I draft a small ring buffer (deque maxlen=N) to queue upserts until conversation is ready?

Also applies to: 404-406
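
One possible shape for that buffer, assuming upserts are captured as upsert_message kwargs and the attribute is created in Agent.__init__ (names are hypothetical):

from collections import deque

# in Agent.__init__:
#     self._pending_upserts: deque = deque(maxlen=256)

async def _upsert_or_buffer(self, **kwargs) -> None:
    """Queue upserts until the conversation exists, then drain in arrival order."""
    if self.conversation is None:
        self._pending_upserts.append(kwargs)
        return
    while self._pending_upserts:
        await self.conversation.upsert_message(**self._pending_upserts.popleft())
    await self.conversation.upsert_message(**kwargs)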


450-458: Graceful shutdown: await cancelled tasks to drain and suppress CancelledError.

-        for _, track in self._track_tasks.items():
-            track.cancel()
+        for _, task in self._track_tasks.items():
+            task.cancel()
+        if self._track_tasks:
+            await asyncio.gather(*self._track_tasks.values(), return_exceptions=True)
+        self._track_tasks.clear()
@@
-        if self._interval_task:
-            self._interval_task.cancel()
+        if self._interval_task:
+            self._interval_task.cancel()
+            try:
+                await self._interval_task
+            except asyncio.CancelledError:
+                pass

Also applies to: 464-471, 496-499


754-756: Typo and readability: hasImageProcessers → has_image_processors.

-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
@@
-        if not hasImageProcessers:
+        if not has_image_processors:
@@
-                    if hasImageProcessers:
+                    if has_image_processors:

Also applies to: 768-773, 789-799

tests/test_realtime_base.py (1)

88-171: Unskip and update to the unified Conversation API.

The skipped test still targets legacy methods (partial_update_message, finish_last_message). Rework it to assert against await conversation.upsert_message(...) calls.

I can provide a minimal FakeConversation.upsert_message that records calls and simulates chunking semantics.
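
A minimal sketch of that fake (chunking semantics reduced to append-vs-replace):

class FakeConversation:
    """Records upsert calls; deltas append, replace=True overwrites."""

    def __init__(self) -> None:
        self.calls: list = []
        self.messages: dict = {}

    async def upsert_message(
        self, message_id, role, user_id, content,
        completed=False, replace=False, **kwargs,
    ) -> None:
        self.calls.append(
            {"message_id": message_id, "role": role,
             "content": content, "completed": completed}
        )
        if replace or message_id not in self.messages:
            self.messages[message_id] = content
        else:
            self.messages[message_id] += content  # streaming delta append

Tests can then assert on fake.calls ordering and on the final fake.messages content.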

plugins/openai/tests/test_openai_realtime.py (1)

113-173: Event assertions look solid; consider awaiting event queue instead of fixed sleeps.

Replace await asyncio.sleep(0.1) with await realtime.events.wait(timeout=1.0) to reduce flakiness.

agents-core/vision_agents/core/llm/realtime.py (1)

185-205: Add short Google‑style docstrings to new emitters.

Improves discoverability for downstream handlers.

Example:

def _emit_user_speech_transcription(self, text: str, original=None):
    """Emit user speech transcript.

    Args:
        text: Transcript text.
        original: Provider-specific raw event.
    """
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3)

194-233: Participant map can leak if completion never arrives — add bounds/cleanup.

If conversation.item.created fires without a matching transcription, _item_to_participant retains entries.

  • Clear map on close()/disconnect.
  • Bound size (e.g., max 100) and evict oldest.
  • Add periodic cleanup (TTL) if feasible.
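
A sketch of the size bound from the second point, assuming _item_to_participant is created as an OrderedDict in __init__ (helper name is hypothetical):

from collections import OrderedDict

MAX_TRACKED_ITEMS = 100  # cap mirrors the suggestion above

def _remember_participant(self, item_id: str, participant) -> None:
    """Map an item to its participant, evicting the oldest entries past the cap."""
    self._item_to_participant[item_id] = participant
    while len(self._item_to_participant) > MAX_TRACKED_ITEMS:
        self._item_to_participant.popitem(last=False)  # drop oldest insertion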

301-304: Tool‑call logs may include sensitive args — lower to DEBUG or redact.

-            logger.info(
+            logger.debug(
                 f"Executing tool call: {tool_call['name']} with args: {tool_call['arguments_json']}"
             )
@@
-                logger.info(f"Tool call {tool_call['name']} succeeded: {response_data}")
+                logger.debug("Tool call %s succeeded", tool_call["name"])

Also applies to: 309-319


333-374: Tool response path looks correct; small polish on JSON conversion and follow‑up trigger.

Consider explicitly limiting response_str length or switching to JSON object if API supports it.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)

208-237: LGTM - Defensive conditional checks.

The expanded multi-line conditionals make None-checks explicit and improve code safety. While more verbose, this prevents potential AttributeError exceptions when accessing nested attributes.


299-302: Consider enriching LLMResponseChunkEvent with additional metadata.

The LLMResponseChunkEvent includes optional fields like item_id, output_index, and sequence_number that could provide richer context for event consumers. If server_message contains relevant identifiers or ordering information, consider populating these fields for consistency with OpenAI realtime events.
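
An enriched emission could look like this (how the identifiers are derived from server_message is an assumption; adjust to whatever Gemini actually exposes):

import uuid

item_id = str(uuid.uuid4())  # one stable id per response, minted when the stream opens
for seq, text in enumerate(text_parts):  # 'text_parts' stands in for the streamed deltas
    self.events.send(
        LLMResponseChunkEvent(
            delta=text,
            item_id=item_id,
            content_index=0,
            sequence_number=seq,
        )
    )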

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 102a61e and 6fd13c6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • agents-core/vision_agents/core/agents/agents.py (21 hunks)
  • agents-core/vision_agents/core/llm/events.py (8 hunks)
  • agents-core/vision_agents/core/llm/realtime.py (5 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (19 hunks)
  • plugins/openai/pyproject.toml (1 hunks)
  • plugins/openai/tests/test_openai_realtime.py (4 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py (13 hunks)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (23 hunks)
  • tests/test_realtime_base.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/01_simple_agent_example/simple_agent_example.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_realtime_base.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • tests/test_realtime_base.py
  • agents-core/vision_agents/core/llm/realtime.py
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/openai/tests/test_openai_realtime.py
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
  • agents-core/vision_agents/core/llm/events.py
🧬 Code graph analysis (8)
tests/test_realtime_base.py (4)
agents-core/vision_agents/core/edge/edge_transport.py (1)
  • EdgeTransport (18-59)
agents-core/vision_agents/core/edge/types.py (2)
  • User (15-18)
  • Connection (27-34)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (451-465)
agents-core/vision_agents/core/agents/agents.py (1)
  • Agent (56-1045)
agents-core/vision_agents/core/llm/realtime.py (4)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • simple_audio_response (130-156)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
  • simple_audio_response (121-136)
agents-core/vision_agents/core/llm/events.py (2)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5)
  • RTCManager (253-796)
  • connect (285-308)
  • start_video_sender (477-538)
  • stop_video_sender (540-582)
  • _send_event (447-475)
agents-core/vision_agents/core/llm/realtime.py (6)
  • connect (68-68)
  • _emit_connected_event (83-97)
  • simple_audio_response (71-73)
  • _emit_agent_speech_transcription (196-204)
  • _emit_response_event (138-156)
  • _emit_user_speech_transcription (185-194)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/llm/llm.py (2)
  • _run_one_tool (226-292)
  • get_available_functions (180-182)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
agents-core/vision_agents/core/agents/agents.py (8)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseChunkEvent (90-105)
  • LLMResponseCompletedEvent (109-115)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/llm/realtime.py (3)
  • Realtime (24-220)
  • simple_audio_response (71-73)
  • _watch_video_track (75-77)
agents-core/vision_agents/core/mcp/mcp_manager.py (1)
  • MCPManager (9-147)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (7-22)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
agents-core/vision_agents/core/events/manager.py (3)
  • register_events_from_module (214-247)
  • subscribe (292-353)
  • send (406-449)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/openai/tests/test_openai_realtime.py (4)
agents-core/vision_agents/core/llm/events.py (3)
  • RealtimeAudioOutputEvent (37-45)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/agents/agents.py (2)
  • subscribe (187-199)
  • simple_response (173-185)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4)
  • connect (72-98)
  • simple_response (100-119)
  • _handle_openai_event (152-244)
  • simple_audio_response (121-136)
agents-core/vision_agents/core/edge/types.py (2)
  • PcmData (37-183)
  • resample (129-183)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • RealtimeAudioOutputEvent (37-45)
  • LLMResponseChunkEvent (90-105)
agents-core/vision_agents/core/llm/realtime.py (2)
  • _emit_user_speech_transcription (185-194)
  • _emit_agent_speech_transcription (196-204)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • start_event_consumer (104-188)
  • start (31-39)
agents-core/vision_agents/core/llm/llm.py (1)
  • _run_one_tool (226-292)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (50-52)
  • AudioFormat (23-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (19)
plugins/openai/pyproject.toml (1)

14-14: Consider tightening the version constraint to prevent unexpected major version upgrades.

The constraint >=2.5.0 permits any future major version (e.g., 3.0.0), which could introduce breaking changes. Since OpenAI's Python client follows SemVer with potential backwards-incompatible changes in minor versions, a safer constraint would be >=2.5.0,<3.0.0 to ensure stability.

Additionally, verify that the PR changes (e.g., new OpenAI realtime event types like RealtimeUserSpeechTranscriptionEvent) actually require 2.5.0 specifically, or if an earlier 2.x version would suffice. The latest openai package version is 2.5.0 (released Oct 17, 2025), but confirming the minimum required version ensures clarity for future maintainers.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (11)

45-50: LGTM: Clean type hints added.

The explicit type annotations improve clarity.


96-132: LGTM: Improved formatting and logging.

The multiline formatting and enhanced log messages improve readability.


136-149: LGTM: Formatting improvements.

The multiline parameter formatting enhances code readability.


161-228: LGTM: Enhanced logging formatting.

The multiline log statements improve readability without changing behavior.


349-362: LGTM: Session configuration enhancement.

Enabling semantic VAD and input audio transcription aligns with the PR objectives for enriched event payloads.


326-328: LGTM: Consistent multiline formatting.

The multiline formatting for async calls improves readability.

Also applies to: 457-459, 606-608, 627-630


424-475: LGTM: Logging formatting improvements.

The multiline log statements are more readable.


477-538: LGTM: Formatting and logging improvements.

The multiline formatting enhances code organization and readability.


540-582: LGTM: Spacing and logging improvements.

The additional spacing and multiline logs improve code structure.


679-681: LGTM: Logging formatting improvement.

The multiline formatting is more readable.


782-782: LGTM: Spacing improvement.

plugins/openai/tests/test_openai_realtime.py (1)

64-77: Nice resampling harness and typed PcmData construction.

This aligns with the core PcmData API and exercises realistic paths. LGTM.

agents-core/vision_agents/core/llm/realtime.py (1)

70-74: API change acknowledged: simple_audio_response(..., participant=None)

Signature aligns with downstream usage (Agent/OpenAI/Gemini). LGTM.

Please confirm all concrete implementations were updated (OpenAI, Gemini) and no third-party LLMs remain on the old signature.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)

9-24: LGTM - Improved type coverage.

The expanded imports from google.genai.types provide better type annotations throughout the module.


239-264: LGTM - Transcription events properly emitted.

The user and agent speech transcription events are correctly emitted using the base class methods. The participant context set in simple_audio_response will be included in user transcription events via user_metadata=self._current_participant in the base class implementation.


130-157: No changes needed—attribute is properly initialized in base class.

The verification confirms that _current_participant is initialized in the base class (agents-core/vision_agents/core/llm/realtime.py) as self._current_participant: Optional[Participant] = None. The assignment in simple_audio_response (line 149) safely updates an already-initialized attribute; no AttributeError risk exists.

Likely an incorrect or invalid review comment.

agents-core/vision_agents/core/llm/events.py (2)

11-87: LGTM - String literal formatting and field enrichment.

The change from single-quoted to double-quoted string literals in type field defaults is a stylistic consistency improvement. The new fields added to events (is_complete, conversation_item_id, status, role, content, is_recoverable, delta, item_id) are properly typed and provide richer event payloads for consumers.


141-156: LGTM - New transcription events well-structured.

The new RealtimeUserSpeechTranscriptionEvent and RealtimeAgentSpeechTranscriptionEvent classes follow consistent patterns with existing events. The separation between user and agent transcriptions provides clear event semantics for consumers. Docstrings are concise and conform to the Google style guide requirement.

Comment on lines 254 to 265
            user_id = event.user_id() or "user"

            await self.conversation.upsert_message(
                # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
                message_id="stt-" + user_id,
                role="user",
                user_id=user_id,
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial transcripts
                original=event,
            )

⚠️ Potential issue | 🟠 Major

STT upserts overwrite each other across utterances — use per‑utterance IDs (drop replace unless streaming).

Static message_id="stt-"+user_id with replace=True collapses all user speech into one message.

-            user_id = event.user_id() or "user"
+            user_id = event.user_id() or "user"
@@
-                # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
-                message_id="stt-" + user_id,
+                message_id=f"stt-{user_id}-{uuid.uuid4()}",
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                # replace only if you're purposely reusing the same message_id for streaming
+                replace=False,
                 original=event,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 254 to 265, the
STT upsert is using a static message_id ("stt-"+user_id) and replace=True which
causes all utterances to overwrite one another; change to generate a unique
per-utterance message_id (for example include event.id, a timestamp, or an
incrementing counter) and remove replace=True unless you intentionally want
streaming/partial-replacement semantics; ensure completed is set appropriately
per-utterance and retain original=event.

Comment on lines +113 to +118
    async def simple_response(
        self,
        text: str,
        processors: Optional[List[Processor]] = None,
        participant: Optional[Participant] = None,
    ):

⚠️ Potential issue | 🟡 Minor

Unused parameter: participant is not utilized.

The participant parameter is added to simple_response but never used or stored, unlike simple_audio_response which sets self._current_participant. If participant tracking is needed for text inputs, the parameter should be stored; otherwise, consider removing it to avoid API confusion.

Apply this diff if participant tracking is intended:

     async def simple_response(
         self,
         text: str,
         processors: Optional[List[Processor]] = None,
         participant: Optional[Participant] = None,
     ):
         """
         Simple response standardizes how to send a text instruction to this LLM.
 
         Example:
             llm.simple_response("tell me a poem about Boulder")
 
         For more advanced use cases you can use the native send_realtime_input
         """
+        self._current_participant = participant
         self.logger.info("Simple response called with text: %s", text)
         await self.send_realtime_input(text=text)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py around lines
113 to 118, the new parameter `participant` on simple_response is unused which
causes API confusion; either persist it like simple_audio_response or remove it.
If participant tracking is intended, set self._current_participant = participant
at the start of simple_response (same behavior as simple_audio_response) and
ensure any type checks/uses downstream rely on that field; otherwise delete the
participant parameter from the method signature and update any callers to avoid
passing it.

if event.get("type") == "session.created" and "session" in event:
self.session_info = event["session"]
logger.debug(f"Stored session info: {self.session_info}")
logger.error(f"Stored session info: {self.session_info}")

⚠️ Potential issue | 🟡 Minor

Incorrect log level for informational message.

This logs successful session storage at ERROR level, which will pollute error logs and trigger false alarms in monitoring.

Apply this diff:

-            logger.error(f"Stored session info: {self.session_info}")
+            logger.info(f"Stored session info: {self.session_info}")
📝 Committable suggestion


Suggested change
logger.error(f"Stored session info: {self.session_info}")
logger.info(f"Stored session info: {self.session_info}")
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 667,
the code logs a successful session storage using logger.error which is the wrong
severity; change the log level to an informational one (e.g., logger.info or
logger.debug) and update the message if needed to reflect normal operation so
successful session info is not treated as an error.

Comment on lines 89 to 90
@pytest.mark.asyncio
async def test_agent_conversation_updates_with_realtime():

⚠️ Potential issue | 🟠 Major

Remove @pytest.mark.asyncio — async support is automatic per repo guidelines.

Replace with plain async def tests.

-@pytest.mark.asyncio
 async def test_agent_conversation_updates_with_realtime():
@@
-@pytest.mark.asyncio
 async def test_simple_response_aggregates_and_returns_realtimeresponse():
@@
-@pytest.mark.asyncio
 async def test_wait_until_ready_returns_true_immediately():
@@
-@pytest.mark.asyncio
 async def test_close_emits_disconnected_event():
@@
-@pytest.mark.asyncio
 async def test_noop_video_and_playback_methods_do_not_error():

As per coding guidelines.

Also applies to: 266-266, 291-291, 297-297, 317-317, 365-365

🤖 Prompt for AI Agents
In tests/test_realtime_base.py around lines 89-90 (also at lines 266, 291, 297,
317, and 365), remove the leading @pytest.mark.asyncio decorator from each test
function so they become plain async def tests; leave the async def signature
intact and delete only the decorator lines, ensuring no other test code or
imports are changed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
plugins/deepgram/tests/test_stt.py (1)

144-189: Add docstring to fixture.

The audio_data fixture is missing a docstring. Other fixtures in this file follow the pattern of documenting their purpose.

As per coding guidelines.

Apply this diff:

 @pytest.fixture
 def audio_data(mia_mp3_path):
+    """Load and prepare the audio data for testing."""
     from scipy import signal
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (3)

234-241: Don’t stop a shared forwarder.

stop() unconditionally schedules self._forwarder.stop(), which can tear down a shared resource used by other tracks.

Apply this guard:

@@     def stop(self):
-        try:
-            if self._forwarder is not None:
-                asyncio.create_task(self._forwarder.stop())
-                self._forwarder = None
+        try:
+            if self._forwarder is not None:
+                if self._forwarder is self._shared_forwarder:
+                    logger.debug("Shared VideoForwarder detected; skipping stop()")
+                else:
+                    asyncio.create_task(self._forwarder.stop())
+                self._forwarder = None

319-326: Ensure client_secret.value exists; fail fast with a clear error.

If the API changes or returns no client_secret, you may return None and proceed with a bad token.

Apply:

@@     async def _get_session_token(self) -> str:
-                    data: dict = resp.json()
-                    secret = data.get("client_secret", {})
-                    return secret.get("value")
+                    data: dict = resp.json()
+                    secret = data.get("client_secret") or {}
+                    token = secret.get("value")
+                    if not token:
+                        raise RuntimeError("Missing client_secret.value in session.create response")
+                    return token

470-495: _video_sender may be undefined; initialize in init to avoid AttributeError.

start_video_sender uses self._video_sender but it’s not set in init. If called before _set_video_track, this can raise.

Add this in init (outside this hunk):

# in RTCManager.__init__
self._video_sender: Optional[Any] = None  # RTCRtpSender; keep Any to avoid extra import

And keep the existing None checks. Optionally, assert that connect() completed or that _set_video_track() ran.

♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)

658-658: Log level and redaction for stored session info.

Re‑flagging: logging normal session storage at ERROR is wrong; also avoid dumping full session objects (may contain sensitive fields).

Apply:

-            logger.error(f"Stored session info: {self.session_info}")
+            # Avoid logging secrets; log minimal identifiers only
+            safe_id = self.session_info.get("id") if isinstance(self.session_info, dict) else None
+            safe_model = self.session_info.get("model") if isinstance(self.session_info, dict) else None
+            logger.info("Stored session info (id=%s, model=%s)", safe_id, safe_model)
🧹 Nitpick comments (13)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

226-230: Cleaner architecture with local channel variable.

The change from class-level self.channel storage to a local variable passed to StreamConversation improves encapsulation and aligns with the updated StreamConversation constructor signature shown in the relevant code snippets.

Consider adding error handling for the channel.get_or_create() call to gracefully handle potential channel creation failures:

 async def create_conversation(self, call: Call, user, instructions):
     chat_client: ChatClient = call.client.stream.chat
     channel = chat_client.channel(self.channel_type, call.id)
-    await channel.get_or_create(
-        data=ChannelInput(created_by_id=user.id),
-    )
+    try:
+        await channel.get_or_create(
+            data=ChannelInput(created_by_id=user.id),
+        )
+    except Exception as e:
+        self.logger.error(f"Failed to create or retrieve channel: {e}")
+        raise
     self.conversation = StreamConversation(instructions, [], channel)
     return self.conversation
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (5)

100-119: Clarify the unused participant parameter.

The participant parameter is documented as ignored, yet the PR introduces per-participant handling throughout this module. If this parameter exists solely for interface parity with the base Realtime class or other providers, that's reasonable—but if there's a plan to use it for correlation (e.g., tracking which participant sent the text), please clarify or open a follow-up issue.

Do you want me to open an issue to track potential future use of the participant parameter in simple_response, or is the current design intentional?


172-193: LGTM—workaround is well-documented.

The handling of both response.audio_transcript.done and response.output_audio_transcript.done via event mutation is a reasonable workaround for the OpenAI API inconsistency. The GitHub issue reference provides good context.

Optionally, enhance the comment to explain what breaks without the workaround:

-        # code here is weird because OpenAI does something strange
-        # see issue: https://github.com/openai/openai-python/issues/2698
-        # as a workaround we copy the event and set type to response.output_audio_transcript.done so that
-        # ResponseAudioTranscriptDoneEvent.model_validate is happy
+        # OpenAI sends response.audio_transcript.done but the SDK expects response.output_audio_transcript.done.
+        # Without this workaround, Pydantic validation fails.
+        # See: https://github.com/openai/openai-python/issues/2698

194-203: Consider clearing _pending_participant after mapping.

If OpenAI doesn't emit a conversation.item.created event for some reason, _pending_participant remains set and could be incorrectly attributed to a subsequent item.

Clear the pending participant immediately after establishing the mapping:

                 if item_id and self._pending_participant:
                     self._item_to_participant[item_id] = self._pending_participant
+                    self._pending_participant = None
                     logger.debug(

Also, simplify the logging expression:

-                        f"Mapped item {item_id} to participant {self._pending_participant.user_id if self._pending_participant else 'None'}"
+                        f"Mapped item {item_id} to participant {self._item_to_participant[item_id].user_id}"

275-331: LGTM—tool call handling is well-structured.

The enhanced logging and error handling improve observability. The tool execution flow correctly delegates to _run_one_tool and handles both success and error cases.

Consider making the 30-second timeout configurable rather than hardcoded at line 306:

+            timeout = getattr(self, 'tool_call_timeout', 30)
-            tc, result, error = await self._run_one_tool(tool_call, timeout_s=30)
+            tc, result, error = await self._run_one_tool(tool_call, timeout_s=timeout)

This allows callers to adjust the timeout for tools with varying execution times.


333-376: Tool response flow is correct.

The method correctly sanitizes output, sends the tool response, and triggers a follow-up response with audio. The instructions ensure the AI provides a conversational response after receiving tool results.

The hardcoded instruction at line 370 could be made configurable:

+            follow_up_instructions = getattr(
+                self, 
+                'tool_response_instructions', 
+                "Please respond to the user with the tool results in a conversational way."
+            )
             await self.rtc._send_event(
                 {
                     "type": "response.create",
                     "response": {
                         "modalities": ["text", "audio"],
-                        "instructions": "Please respond to the user with the tool results in a conversational way.",
+                        "instructions": follow_up_instructions,
                     },
                 }
             )
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (7)

45-51: Audio input setter looks good; small micro‑opt note.

The copy via bytes(pcm_data) is safe. If perf matters, consider accepting memoryview/bytearray to avoid copies, but fine as-is.


649-653: Remove stray print; use structured logging.

print() in async handlers will spam stdout and bypass log routing.

-        print(f"_handle_event {event['type']}")
+        logger.debug("Event received: %s", event.get("type"))

690-762: Forwarding loop is unused and overly chatty at INFO.

Given the switch to StreamVideoForwardingTrack, _forward_video_frames seems unused. Either remove it or downgrade INFO logs inside the loop to DEBUG to avoid log noise.

Minimal noise reduction:

-            logger.info(
+            logger.debug(
                 f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
             )
-            logger.info(
+            logger.debug(
                 f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
             )
@@
-                    logger.info(
+                    logger.debug(
                         f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
                     )
-                    logger.info(
+                    logger.debug(
                         f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
                     )
@@
-            logger.info(
+            logger.debug(
                 f"🎥 Video forwarding task cancelled after {frame_count} frames"
             )
@@
-            logger.info(
+            logger.debug(
                 f"🎥 Video forwarding task ended. Total frames processed: {frame_count}"
             )

135-135: Minor: drop f‑string; improve message casing.

No interpolation here; also “RTCManager” for consistency.

-            logger.warning(f"rtc manager already started", stack_info=True)
+            logger.warning("RTCManager already started", stack_info=True)

125-131: Log level sanity: init messages OK; consider DEBUG for non‑actionable info.

Initialization messages with emojis are fine for dev, but might be noisy in prod. Consider DEBUG for the non‑actionable branch.


205-219: Circuit‑breaker logs: keep at ERROR, but include context for triage.

Consider adding model/session id to ease correlation when multiple sessions run.


577-585: SDP exchange timeouts look reasonable; consider retry with backoff.

Single attempt can be brittle in transient network issues; a short bounded backoff (e.g., 2 tries) can improve resilience.

Also applies to: 597-603
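
A bounded-retry wrapper along these lines would suffice (names are stand-ins for the actual HTTP call in rtc_manager.py):

import asyncio

async def _exchange_sdp_with_retry(exchange, offer, attempts: int = 2, base_delay: float = 0.5):
    """Retry a flaky SDP exchange a bounded number of times with short backoff."""
    last_exc: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            return await exchange(offer)
        except (asyncio.TimeoutError, OSError) as exc:  # transient network failures
            last_exc = exc
            if attempt + 1 < attempts:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_exc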

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6fd13c6 and 63c6352.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • agents-core/vision_agents/core/agents/agents.py (21 hunks)
  • plugins/deepgram/pyproject.toml (1 hunks)
  • plugins/deepgram/tests/test_stt.py (7 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py (13 hunks)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
  • plugins/silero/pyproject.toml (1 hunks)
  • plugins/silero/tests/test_vad.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/openai/vision_agents/plugins/openai/openai_realtime.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/silero/tests/test_vad.py
  • plugins/deepgram/tests/test_stt.py
🧬 Code graph analysis (5)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
plugins/openai/tests/test_openai_realtime.py (1)
  • realtime (20-29)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5)
  • RTCManager (246-787)
  • connect (278-301)
  • start_video_sender (470-531)
  • stop_video_sender (533-575)
  • _send_event (440-468)
agents-core/vision_agents/core/llm/realtime.py (8)
  • connect (68-68)
  • _emit_connected_event (83-97)
  • simple_audio_response (71-73)
  • _emit_agent_speech_transcription (196-204)
  • _emit_response_event (138-156)
  • _emit_user_speech_transcription (185-194)
  • _watch_video_track (75-77)
  • _stop_watching_video_track (79-81)
agents-core/vision_agents/core/edge/types.py (1)
  • PcmData (37-183)
agents-core/vision_agents/core/llm/llm.py (2)
  • _run_one_tool (226-292)
  • get_available_functions (180-182)
agents-core/vision_agents/core/llm/llm_types.py (1)
  • ToolSchema (64-67)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
agents-core/vision_agents/core/edge/sfu_events.py (14)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2071-2075)
  • user_id (2113-2117)
  • track_type (579-583)
  • track_type (1193-1197)
  • track_type (2254-2258)
  • participant (1495-1500)
  • participant (1537-1542)
  • participant (1610-1615)
  • participant (2078-2083)
  • participant (2127-2132)
agents-core/vision_agents/core/edge/events.py (1)
  • TrackAddedEvent (17-22)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
  • StreamConversation (17-353)
agents-core/vision_agents/core/agents/agents.py (6)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseChunkEvent (90-105)
  • LLMResponseCompletedEvent (109-115)
  • RealtimeUserSpeechTranscriptionEvent (142-147)
  • RealtimeAgentSpeechTranscriptionEvent (151-156)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLM (35-358)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4)
  • Realtime (37-463)
  • simple_response (100-119)
  • simple_audio_response (121-136)
  • _watch_video_track (266-270)
agents-core/vision_agents/core/llm/realtime.py (3)
  • Realtime (24-220)
  • simple_audio_response (71-73)
  • _watch_video_track (75-77)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • upsert_message (122-210)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • create_conversation (224-231)
  • create_audio_track (291-294)
plugins/silero/tests/test_vad.py (2)
plugins/deepgram/tests/test_stt.py (3)
  • mia_mp3_path (126-128)
  • audio_data (144-189)
  • close (61-63)
agents-core/vision_agents/core/vad/events.py (1)
  • VADPartialEvent (44-55)
plugins/deepgram/tests/test_stt.py (3)
plugins/silero/tests/test_vad.py (1)
  • mia_mp3_path (23-25)
plugins/moonshine/tests/test_stt.py (1)
  • mia_mp3_path (33-35)
plugins/deepgram/vision_agents/plugins/deepgram/stt.py (1)
  • close (180-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (17)
plugins/silero/pyproject.toml (1)

39-39: Verify the rationale for swapping audio/vision dependencies.

The documented PyTorch/torchvision compatibility matrix pairs torchvision 0.20.0 with PyTorch 2.5.0, whereas this project specifies torch>=2.7.1. While the loose constraint >=0.20.0 will pull modern releases compatible with PyTorch 2.7.x, replacing torchaudio (audio processing) with torchvision (computer vision) is atypical for a Silero voice-activity-detection plugin.

Since this is a dev dependency, the risk is lower, but the purpose of this swap should be clarified—is this intentional or a mistake?

plugins/deepgram/pyproject.toml (1)

42-42: Clarify the coordinated dependency swap across plugins.

This mirrors the change in plugins/silero/pyproject.toml, yet substituting torchvision (computer vision) for audio-processing libraries seems incongruent with Deepgram's STT (speech-to-text) focus. The PR summary indicates these are "broader environment adjustments," but the functional purpose is unclear.

Please document why torchvision>=0.20.0 is now required in dev environments for audio-centric plugins, or verify that this is an unintended side effect of a larger refactoring.

plugins/deepgram/tests/test_stt.py (2)

456-459: LGTM!

The multiline decorator formatting improves readability while preserving the same functional behavior. This pattern is consistently applied across multiple test functions.


165-165: LGTM!

Explicit cleanup of the PyAV container is good practice for resource management.

plugins/silero/tests/test_vad.py (2)

47-74: LGTM: Audio conversion logic is correct.

The PyAV-based mp3 to wav conversion properly handles:

  • Stereo to mono conversion via channel averaging
  • Normalization from int16/int32 to float32 in the [-1.0, 1.0] range required by soundfile
  • Frame assembly and concatenation

This implementation mirrors the pattern in plugins/deepgram/tests/test_stt.py (lines 143-188).
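
For reference, the decode path described above reduces to roughly this (a sketch assuming planar frames; the helper name is illustrative):

import av
import numpy as np

def mp3_to_float32_mono(path: str):
    """Decode an mp3 to mono float32 in [-1.0, 1.0]; returns (samples, rate)."""
    container = av.open(path)
    stream = container.streams.audio[0]
    chunks = []
    for frame in container.decode(stream):
        samples = frame.to_ndarray()  # (channels, n) for planar layouts
        if samples.dtype == np.int16:
            samples = samples.astype(np.float32) / 32768.0  # int16 -> [-1, 1]
        elif samples.dtype == np.int32:
            samples = samples.astype(np.float32) / 2147483648.0
        else:
            samples = samples.astype(np.float32)
        if samples.ndim == 2 and samples.shape[0] > 1:
            samples = samples.mean(axis=0)  # stereo -> mono by channel averaging
        else:
            samples = samples.reshape(-1)
        chunks.append(samples)
    container.close()  # explicit cleanup, as in the tests
    return np.concatenate(chunks), stream.rate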


474-486: LGTM: Logging format refactored for brevity.

The duration calculation has been inlined from multiline blocks into single expressions. Functionality is preserved—the ternary operator still safely handles None values for event.duration_ms.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)

152-163: Agent track filtering logic is correct.

The conditional filtering to prevent emitting TrackAddedEvent for the agent's own tracks is appropriate and prevents the agent from processing its own audio/video as remote participant input.


107-107: ****

The concern about timing is not valid. While _on_track_published is registered as an event handler during __init__ (line 72), event firing and handler registration are distinct operations. Track published events only fire during an active call after join() is called, which occurs after create_user() has already set self.agent_user_id. All usage patterns in the codebase enforce this sequence: await agent.create_user()await agent.join(call). Therefore, self.agent_user_id is guaranteed to be non-None when the event handler fires.

Likely an incorrect or invalid review comment.

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (2)

121-136: Participant tracking logic is correct, pending concurrency fix.

The approach of setting _pending_participant before sending audio is sound for correlating subsequent OpenAI conversation items with the correct user. However, this depends on the thread safety improvements suggested for lines 68-70 to avoid race conditions when multiple participants send audio concurrently.
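
A minimal serialization guard, assuming a lock created in __init__ (attribute name is hypothetical):

# in __init__:
#     self._participant_lock = asyncio.Lock()

async def simple_audio_response(self, audio, participant=None):
    """Keep pending-participant bookkeeping atomic with the audio send."""
    async with self._participant_lock:
        self._pending_participant = participant
        await self.rtc.send_audio_pcm(audio)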


427-462: LGTM—tool registration flow is sound.

The method correctly retrieves available functions, converts them to OpenAI's format, and registers them with the realtime session. The logging provides good visibility, and error handling ensures registration failures don't break the connection.

agents-core/vision_agents/core/agents/agents.py (6)

4-4: LGTM—imports support the new unified conversation API.

The added imports (uuid, llm_events, RealtimeUserSpeechTranscriptionEvent, RealtimeAgentSpeechTranscriptionEvent, STTTranscriptEvent, Conversation) are all used in the new setup hooks and align with the PR's unified messaging objective.

Also applies to: 20-20, 24-25, 32-32, 39-39


305-329: LGTM—turn detection integration is clean.

The handler correctly accumulates transcripts when turn detection is enabled and triggers the LLM immediately otherwise. The logic for buffering per-user transcripts and deferring response until turn end is sound.


343-349: LGTM—deferred conversation creation ensures correct initialization order.

Creating the conversation coroutine early (Line 344) but awaiting it after MCP and edge setup (Line 404) ensures event handlers are wired before the conversation exists. This prevents missed events during initialization.

Also applies to: 403-404


448-451: LGTM—minor formatting and logging improvements.

These changes add clearer logging, improve code organization with blank lines, and pass the shared_forwarder parameter consistently. All are refinements without functional impact.

Also applies to: 719-721, 728-731, 739-744, 749-751, 758-760, 769-772, 774-777


593-624: LGTM—say method now syncs with conversation.

The addition of conversation.upsert_message (Lines 617-624) correctly records say() calls as completed assistant messages. The user_id fallback chain (Line 621) provides appropriate defaults.


817-817: LGTM—formatting and logging cleanup.

These changes remove unnecessary blank lines, improve log messages, and clean up docstrings without altering behavior.

Also applies to: 842-842, 848-865, 895-895, 1025-1027

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)

342-355: API schema verified—session.update payload is correct.

The Realtime API accepts session.update with turn_detection type "semantic_vad" and input_audio_transcription model "whisper-1". The code at lines 342–355 uses the correct field structure; no changes needed.

Comment on lines +201 to +244
    async def _setup_llm_events(self):
        @self.llm.events.subscribe
        async def on_llm_response_send_to_tts(event: LLMResponseCompletedEvent):
            # Trigger TTS directly instead of through event system
            if self.tts and event.text and event.text.strip():
                await self.tts.send(event.text)

        @self.llm.events.subscribe
        async def on_llm_response_sync_conversation(event: LLMResponseCompletedEvent):
            self.logger.info(f"🤖 [LLM response]: {event.text} {event.item_id}")

            if self.conversation is None:
                return

            # Unified API: handles both streaming and non-streaming
            await self.conversation.upsert_message(
                message_id=event.item_id,
                role="assistant",
                user_id=self.agent_user.id or "agent",
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial content from deltas
            )

        @self.llm.events.subscribe
        async def _handle_output_text_delta(event: LLMResponseChunkEvent):
            """Handle partial LLM response text deltas."""

            self.logger.info(
                f"🤖 [LLM delta response]: {event.delta} {event.item_id} {event.content_index}"
            )

            if self.conversation is None:
                return

            # Unified API: streaming delta
            await self.conversation.upsert_message(
                message_id=event.item_id,
                role="assistant",
                user_id=self.agent_user.id or "agent",
                content=event.delta or "",
                content_index=event.content_index,
                completed=False,  # Still streaming
            )

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify item_id is consistently emitted across all LLM providers.

The upsert logic assumes event.item_id is set for both deltas (Line 238) and completions (Line 217). If any provider emits None or omits item_id, each delta creates a new message rather than updating the same message, fragmenting the conversation.

The past review comment indicates this was addressed in commits 39a821d–6fd13c6, but the event type definitions still show item_id: Optional[str]. Run this verification to confirm all active providers populate item_id:


🏁 Script executed:

#!/bin/bash
# Check that all LLM provider plugins emit item_id on chunk and completion events
rg -nP --type=py -C3 'LLMResponse(ChunkEvent|CompletedEvent)\s*\(' | rg -C3 'item_id'

Length of output: 2717




🏁 Script executed:

#!/bin/bash
# Find Anthropic and xAI completion events
echo "=== Anthropic completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/anthropic/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'

echo ""
echo "=== xAI completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/xai/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'

echo ""
echo "=== OpenAI chunk events ==="
rg -nP --type=py 'LLMResponseChunkEvent' plugins/openai/ -A5 | grep -E '(item_id|LLMResponseChunkEvent)'

echo ""
echo "=== Gemini item_id variable source ==="
rg -nB10 'self.events.send\(LLMResponseChunkEvent' plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py | grep -E '(item_id\s*=|def |class )'

echo ""
echo "=== upsert_message implementation ==="
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A20 | head -40

Length of output: 3722




🏁 Script executed:

#!/bin/bash
# Get full xAI completion event context
echo "=== xAI completion events (full context) ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/xai/ -A8

echo ""
echo "=== Gemini completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/gemini/ -B2 -A8

echo ""
echo "=== Anthropic completion event full code ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/anthropic/ -B5 -A5

Length of output: 6406


[/script]


🏁 Script executed:

#!/bin/bash
# Get full upsert_message implementation to confirm UUID generation
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A50 | head -70

Length of output: 4974


Anthropic and xAI LLM providers must emit item_id in completion events.

Verification confirms the streaming consolidation in agents.py (lines 201-244) is broken for two providers:

  • Anthropic (plugins/anthropic/.../anthropic_llm.py:295): LLMResponseCompletedEvent omits item_id entirely
  • xAI (plugins/xai/.../llm.py:142, 193): LLMResponseCompletedEvent omits item_id entirely

Both emit chunks with item_id="" (empty string) but completions with no item_id parameter, causing event.item_id = None. When passed to upsert_message() at agents.py line 217, this triggers auto-generation of a new UUID (conversation.py:161), fragmenting the message instead of consolidating deltas.

Gemini and OpenAI correctly emit item_id on both chunks and completions.

Required fixes:

  • Add item_id parameter to Anthropic completion event (line 295)
  • Add item_id parameter to both xAI completion events (lines 142 and 193)
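
The fix in both providers reduces to minting one id per response and reusing it (a sketch; 'stream' stands in for each provider's delta iterator):

import uuid

item_id = str(uuid.uuid4())  # minted once, before streaming begins
parts = []
for chunk_text in stream:
    parts.append(chunk_text)
    self.events.send(LLMResponseChunkEvent(delta=chunk_text, item_id=item_id))
self.events.send(
    LLMResponseCompletedEvent(text="".join(parts), item_id=item_id)  # same id as chunks
)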

Comment on lines +246 to +264
    async def _setup_speech_events(self):
        @self.events.subscribe
        async def on_stt_transcript_event_sync_conversation(event: STTTranscriptEvent):
            self.logger.info(f"🎤 [Transcript]: {event.text}")

            if self.conversation is None:
                return

            user_id = event.user_id() or "user"

            await self.conversation.upsert_message(
                message_id=str(uuid.uuid4()),
                role="user",
                user_id=user_id,
                content=event.text or "",
                completed=True,
                replace=True,  # Replace any partial transcripts
                original=event,
            )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique transcripts.

Lines 257 and 262 generate a unique message_id per STT transcript but set replace=True. Since each utterance gets a fresh UUID, there is nothing to replace—replace=True has no effect here and signals confused intent.

If transcripts are complete utterances (not streaming updates), use replace=False. If you intend to stream partial transcripts and later replace them with final text, reuse the same message_id for the partial and final versions.

Apply this diff to fix the STT transcript handler:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                replace=False,  # Each transcript is a distinct utterance
                 original=event,
             )
📝 Committable suggestion


Suggested change
             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=user_id,
                 content=event.text or "",
                 completed=True,
-                replace=True,  # Replace any partial transcripts
+                replace=False,  # Each transcript is a distinct utterance
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 246 to 264, the
STT transcript handler generates a new UUID for each transcript but sets
replace=True which is ineffective and misleading; change the call to
upsert_message to use replace=False for these completed, unique transcripts (or
if you intend streaming partial→final transcripts, reuse the same message_id for
partial and final and only set replace=True when writing the final replacement).

Comment on lines +267 to +283
async def on_realtime_user_speech_transcription(
    event: RealtimeUserSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [User transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="user",
        user_id=event.user_id() or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique realtime user transcripts.

Lines 276 and 281 generate a unique message_id per realtime user transcript but set replace=True. Since each transcription gets a fresh UUID, there is nothing to replace.

Apply this diff:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="user",
                 user_id=event.user_id() or "",
                 content=event.text,
                 completed=True,
-                replace=True,
+                replace=False,
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 267 to 283, the
handler creates a fresh UUID for each realtime user transcript but still passes
replace=True to upsert_message, which is inconsistent; remove the replace=True
argument (or set replace=False/omit it) when calling
self.conversation.upsert_message for these completed, unique realtime
transcripts so each new message is inserted instead of attempting to replace a
non-existent prior entry.

Comment on lines +286 to +302
async def on_realtime_agent_speech_transcription(
    event: RealtimeAgentSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [Agent transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="assistant",
        user_id=self.agent_user.id or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )

⚠️ Potential issue | 🟠 Major

Replace logic is inconsistent—remove replace=True for completed, unique realtime agent transcripts.

Lines 295 and 300 generate a unique message_id per realtime agent transcript but set replace=True. This is semantically incorrect for the same reasons as the user transcript handler.

Apply this diff:

             await self.conversation.upsert_message(
                 message_id=str(uuid.uuid4()),
                 role="assistant",
                 user_id=self.agent_user.id or "",
                 content=event.text,
                 completed=True,
-                replace=True,
+                replace=False,
                 original=event,
             )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 286 to 302, the
realtime agent transcript handler creates a unique message_id for each event but
incorrectly passes replace=True; remove the replace=True flag (or set
replace=False/omit it) when calling conversation.upsert_message for completed
realtime agent transcripts so each transcript is stored as a distinct message
rather than attempting to replace an existing one.

ListenV1ResultsMetadata,
)
from getstream.video.rtc.track_util import PcmData
from torchvision.io.video import av

⚠️ Potential issue | 🔴 Critical

Critical: torchvision video decoding is deprecated and removed.

Importing av from torchvision.io.video will fail on torchvision 0.24.0+ as video decoding/encoding capabilities have been removed. Install and import PyAV directly instead.

Based on learnings.

Apply this diff:

-from torchvision.io.video import av
+import av

Then ensure PyAV is installed:

pip install av
🤖 Prompt for AI Agents
In plugins/deepgram/tests/test_stt.py around line 18, importing av via "from
torchvision.io.video import av" is deprecated and removed; replace that import
with a direct PyAV import ("import av") and update project test/dev dependencies
or CI to ensure the "av" package is installed (e.g., add to requirements/dev or
test extras and run pip install av) so tests can import PyAV successfully.

Comment on lines 233 to +236
            elif et == "input_audio_buffer.speech_started":
                # Validate event but don't need to store it
                InputAudioBufferSpeechStartedEvent.model_validate(event)
-               await self.output_track.flush()
+               # await self.output_track.flush()

🛠️ Refactor suggestion | 🟠 Major

Remove or document commented-out code.

Line 236 has # await self.output_track.flush() without explanation. Either remove it if obsolete, or add a comment explaining why it's disabled and whether it should be re-enabled in the future.

If the flush is intentionally disabled:

-            # await self.output_track.flush()
+            # TODO: Investigate if output_track.flush() is needed when user starts speaking
+            # Disabled temporarily due to [reason]

Otherwise, remove the line entirely.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_realtime.py around lines
233 to 236, there's a commented-out call "# await self.output_track.flush()"
with no explanation; either remove this dead code if it's obsolete, or replace
the comment with a short explanatory comment stating why flush is disabled
(e.g., causes latency/duplicate events, handled elsewhere) and whether/when it
should be re-enabled; if flush must be conditionally run, implement the small
conditional (e.g., a flag or feature toggle) and document the intended behavior
and side effects in that comment so future readers know why it's not awaited
here.

Comment on lines 133 to 144
    async def start(self) -> None:
        if self._started:
            logger.warning(f"rtc manager already started", stack_info=True)
            return

        if self._shared_forwarder is not None:
            # Use the shared forwarder
            self._forwarder = self._shared_forwarder
            logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
        else:
            # Create our own VideoForwarder with the input source track (legacy behavior)
            self._forwarder = VideoForwarder(self._source_track, max_buffer=5, fps=self._fps)  # type: ignore[arg-type]
            await self._forwarder.start()

        if self._shared_forwarder is None:
            raise RuntimeError("self._shared_forwarder is None, something is very wrong")

        self._forwarder = self._shared_forwarder
        logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
        self._started = True


⚠️ Potential issue | 🟠 Major

start() hard‑requires a shared_forwarder but the error message and upstream contract are unclear.

Right now, start() raises RuntimeError when self._shared_forwarder is None, but start_video_sender still treats shared_forwarder as “optional.” Align the contract: either require it explicitly (and validate earlier) or create/own a forwarder here.

Apply one of the following:

Option A — enforce requirement early in start_video_sender (preferred):

@@ async def start_video_sender(self, stream_video_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None) -> None:
-        Args:
-            stream_video_track: Video track to forward to OpenAI.
-            fps: Target frames per second.
-            shared_forwarder: Optional shared VideoForwarder to use instead of creating a new one.
+        Args:
+            stream_video_track: Video track to forward to OpenAI.
+            fps: Target frames per second.
+            shared_forwarder: Required. Active VideoForwarder instance to pull frames from.
@@
-        try:
+        try:
+            if shared_forwarder is None:
+                logger.error("❌ shared_forwarder is required; pass the active VideoForwarder instance")
+                raise ValueError("shared_forwarder is required for video forwarding")

Option B — keep “optional” param and create your own forwarder here, then let start() accept None and create it:

@@ class StreamVideoForwardingTrack(VideoStreamTrack):
-    def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
+    def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
         ...
         self._forwarder: Optional[VideoForwarder] = None
         self._shared_forwarder = shared_forwarder
+        self._owns_forwarder = False
@@
-        if self._shared_forwarder is None:
-            raise RuntimeError("self._shared_forwarder is None, something is very wrong")
-        self._forwarder = self._shared_forwarder
+        if self._shared_forwarder is None:
+            self._forwarder = VideoForwarder()
+            self._owns_forwarder = True
+        else:
+            self._forwarder = self._shared_forwarder

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +450 to +456
                await asyncio.wait_for(
                    self._data_channel_open_event.wait(), timeout=5.0
                )
            except asyncio.TimeoutError:
-               logger.warning("Data channel not open after timeout; dropping event")
+               logger.warning(
+                   "Data channel not open after timeout; dropping event"
+               )

⚠️ Potential issue | 🟠 Major

If the data channel isn’t open, return early; don’t attempt the send.

We warn on non‑open readyState but still call send(), causing noisy errors.

Apply:

@@     async def _send_event(self, event: dict):
-            if not self._data_channel_open_event.is_set():
+            if not self._data_channel_open_event.is_set():
                 try:
                     await asyncio.wait_for(
                         self._data_channel_open_event.wait(), timeout=5.0
                     )
                 except asyncio.TimeoutError:
                     logger.warning(
                         "Data channel not open after timeout; dropping event"
                     )
                     return
@@
-            if self.data_channel.readyState and self.data_channel.readyState != "open":
-                logger.warning(
-                    f"Data channel state is '{self.data_channel.readyState}', cannot send event"
-                )
+            if self.data_channel.readyState and self.data_channel.readyState != "open":
+                logger.warning(
+                    f"Data channel state is '{self.data_channel.readyState}', dropping event"
+                )
+                return

Also applies to: 461-462

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines
450-456 (and similarly 461-462), the code waits for the data channel open event
and logs a warning on timeout but still proceeds to call send(), causing noisy
errors; modify the flow so that after catching asyncio.TimeoutError you return
early (or skip the send) instead of continuing, and also add an explicit check
of data_channel.readyState (ensure it equals "open") before calling send() to
avoid attempts to send on a non-open channel.
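
Putting both guards together, a consolidated sketch of _send_event (the json-encoded send call at the end is an assumption; the guard logic follows the diffs above):

async def _send_event(self, event: dict) -> None:
    if not self._data_channel_open_event.is_set():
        try:
            await asyncio.wait_for(
                self._data_channel_open_event.wait(), timeout=5.0
            )
        except asyncio.TimeoutError:
            logger.warning("Data channel not open after timeout; dropping event")
            return
    if self.data_channel.readyState != "open":
        logger.warning(
            f"Data channel state is '{self.data_channel.readyState}', dropping event"
        )
        return
    self.data_channel.send(json.dumps(event))  # assumed send path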

Comment on lines +507 to +511
forwarding_track = StreamVideoForwardingTrack(
stream_video_track, fps, shared_forwarder=shared_forwarder
)
await forwarding_track.start()


⚠️ Potential issue | 🟠 Major

start_video_sender assumes shared_forwarder; align docstring + validation.

Docstring says “Optional shared VideoForwarder,” but the code now requires it (forwarding_track.start() will raise). Update the docstring and add an explicit check with a helpful error, or implement the “optional” path (create your own forwarder). See earlier comment with diffs.
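
A sketch of the first option, validating before constructing the track (the error message wording is an assumption):

if shared_forwarder is None:
    raise ValueError(
        "shared_forwarder is required for start_video_sender; "
        "pass the active VideoForwarder instance"
    )
forwarding_track = StreamVideoForwardingTrack(
    stream_video_track, fps, shared_forwarder=shared_forwarder
)
await forwarding_track.start()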

🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines 507
to 511, the start_video_sender implementation assumes a shared_forwarder is
provided but the docstring says it's optional; update either the docstring to
mark shared_forwarder as required or add explicit validation and fallback
creation. Fix by either (a) changing the docstring to state shared_forwarder is
required and raise a clear ValueError if shared_forwarder is None before
creating StreamVideoForwardingTrack, or (b) implement the optional path: if
shared_forwarder is None instantiate a local VideoForwarder (with appropriate
init params), pass it into StreamVideoForwardingTrack, and ensure it is cleaned
up after use; include a helpful error message if creation fails.

import pytest
import soundfile as sf
import torchaudio
from torchvision.io.video import av

⚠️ Potential issue | 🔴 Critical

Incorrect import: PyAV should be imported directly, not from torchvision.

Importing av from torchvision.io.video will fail. Based on learnings, torchvision 0.24.0 removed deprecated video decoding/encoding capabilities, and the video_reader backend was removed. The correct approach is to import PyAV directly: import av.

Apply this diff to fix the import:

-from torchvision.io.video import av
+import av

This matches the pattern used in plugins/deepgram/tests/test_stt.py (line 125-188), where PyAV is imported directly and used for audio extraction.
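
For reference, a minimal PyAV sketch of direct audio extraction (the file path is a placeholder, not the actual test fixture):

import av

with av.open("sample.wav") as container:
    stream = container.streams.audio[0]
    for frame in container.decode(stream):
        pcm = frame.to_ndarray()  # typically shaped (channels, samples)
        # feed pcm into the component under test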

📝 Committable suggestion


Suggested change
from torchvision.io.video import av
import av
🤖 Prompt for AI Agents
In plugins/silero/tests/test_vad.py around line 9, the test incorrectly imports
PyAV via torchvision (from torchvision.io.video import av); replace that import
with a direct PyAV import (import av) so the test uses the av package directly;
update any references if necessary to match the direct import and run the tests
to confirm audio extraction works as in plugins/deepgram/tests/test_stt.py.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)

472-533: Docstring still claims shared_forwarder is "Optional" but code requires it.

Line 483 documents shared_forwarder as "Optional," but Line 512 calls forwarding_track.start(), which will raise RuntimeError if shared_forwarder is None (see Lines 138-141).

Update the docstring to mark shared_forwarder as required and add explicit validation before Line 509, or implement the optional path by creating a local VideoForwarder when None is passed. See the earlier review comment for detailed diffs.


138-145: Hard requirement on shared_forwarder conflicts with "optional" upstream parameter.

The RuntimeError on Line 139 enforces that shared_forwarder must be non-None, but __init__ (Line 100) accepts None by default and start_video_sender (Line 483) documents it as "Optional." This contract mismatch will cause runtime crashes if callers rely on the "optional" declaration.

Refer to the previous review comment suggesting either (A) validate and raise a clear ValueError early in start_video_sender, or (B) create a local forwarder when None is passed.


461-464: Data channel readyState check still missing early return.

After logging the warning that the data channel state is not "open," the code proceeds to attempt send() on Line 467, which will fail noisily.

Apply this diff to return early:

 if self.data_channel.readyState and self.data_channel.readyState != "open":
     logger.warning(
-        f"Data channel state is '{self.data_channel.readyState}', cannot send event"
+        f"Data channel state is '{self.data_channel.readyState}', dropping event"
     )
+    return

660-660: Successful session storage logged at ERROR level.

Logging successful session storage with logger.error pollutes error logs and will trigger false alarms in monitoring systems.

Apply this diff:

-        logger.error(f"Stored session info: {self.session_info}")
+        logger.info(f"Stored session info: {self.session_info}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 63c6352 and 0412f11.

📒 Files selected for processing (1)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)

96-132: LGTM—formatting improvements enhance readability.

The explicit float literals (0.0 instead of 0.) and multiline logging blocks improve code clarity without altering behavior.


147-245: LGTM—multiline logging improves readability.

The frame forwarding logic, error handling, and circuit breaker behavior are preserved; only log formatting has changed.


692-763: LGTM—multiline logging enhances readability.

The frame forwarding logic, timeout handling, and throttling remain intact; only log formatting has been improved.


344-357: The code change is correct. Input audio transcription events are intentionally handled downstream.

The script confirms that transcription events are explicitly handled in openai_realtime.py, with event handler for "conversation.item.input_audio_transcription.completed" at line 204. Multiple tests verify this functionality works correctly, and the feature is documented in the class docstring. The session update in rtc_manager.py enabling transcription is an intentional design choice with full downstream support.


    async def _handle_event(self, event: dict) -> None:
        """Minimal event handler for data channel messages."""
        print(f"_handle_event {event['type']}")

⚠️ Potential issue | 🟡 Minor

Remove debug print statement from production code.

The print call bypasses the logging framework and will clutter output. Production code should use the logger exclusively.

Apply this diff:

-    print(f"_handle_event {event['type']}")
+    logger.debug(f"Handling event: {event['type']}")
📝 Committable suggestion


Suggested change
print(f"_handle_event {event['type']}")
logger.debug(f"Handling event: {event['type']}")
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 651,
remove the debug print statement `print(f"_handle_event {event['type']}")` and
replace it with a logger call (e.g., logger.debug or logger.info) that emits the
event type via the module logger; ensure the module has a configured logger
(logging.getLogger(__name__)) and use structured/templated logging (including
event type and minimal context) instead of direct print so output goes through
the logging framework.
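
A minimal sketch of the module-level logger setup the prompt refers to (assuming none is configured in this module yet):

import logging

logger = logging.getLogger(__name__)

# inside the class:
#     async def _handle_event(self, event: dict) -> None:
#         logger.debug("Handling event: %s", event["type"])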

@tbarbugli tbarbugli merged commit 3cb339b into main Oct 20, 2025
5 checks passed
@tbarbugli tbarbugli deleted the option-c-unified-message-api branch October 20, 2025 19:22
Nash0x7E2 added a commit to Nash0x7E2/Vision-Agents that referenced this pull request Oct 28, 2025
commit ec32383
Author: Neevash Ramdial (Nash) <mail@neevash.dev>
Date:   Mon Oct 27 15:51:53 2025 -0600

    mypy clean up (GetStream#130)

commit c52fe4c
Author: Neevash Ramdial (Nash) <mail@neevash.dev>
Date:   Mon Oct 27 15:28:00 2025 -0600

    remove turn keeping from example (GetStream#129)

commit e1072e8
Merge: 5bcffa3 fea101a
Author: Yarik <43354956+yarikdevcom@users.noreply.github.com>
Date:   Mon Oct 27 14:28:05 2025 +0100

    Merge pull request GetStream#106 from tjirab/feat/20251017_gh-labeler

    feat: Github pull request labeler

commit 5bcffa3
Merge: 406673c bfe888f
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sat Oct 25 10:56:27 2025 -0600

    Merge pull request GetStream#119 from GetStream/fix-screensharing

    Fix screensharing

commit bfe888f
Merge: 8019c14 406673c
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sat Oct 25 10:56:15 2025 -0600

    Merge branch 'main' into fix-screensharing

commit 406673c
Author: Stefan Blos <stefan.blos@gmail.com>
Date:   Sat Oct 25 03:03:10 2025 +0200

    Update README (GetStream#118)

    * Changed README to LaRaes version

    * Remove arrows from table

    * Add table with people & projects to follow

    * Update images and links in README.md

commit 3316908
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Fri Oct 24 23:48:06 2025 +0200

    Simplify TTS plugin and audio utils (GetStream#123)

    - Simplified TTS plugin
    - AWS Polly TTS plugin
    - OpenAI TTS plugin
    - Improved audio utils

commit 8019c14
Author: Max Kahan <max.kahan@getstream.io>
Date:   Fri Oct 24 17:32:26 2025 +0100

    remove video forwarder lazy init

commit ca62d37
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 16:44:03 2025 +0100

    use correct codec

commit 8cf8788
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:27:18 2025 +0100

    rename variable to fix convention

commit 33fd70d
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:24:42 2025 +0100

    unsubscribe from events

commit 3692131
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:19:53 2025 +0100

    remove nonexistent type

commit c5f68fe
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 14:10:07 2025 +0100

    cleanup tests to fit style

commit 8b3c61a
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 13:55:08 2025 +0100

    clean up resources when track cancelled

commit d8e08cb
Author: Max Kahan <max.kahan@getstream.io>
Date:   Thu Oct 23 13:24:55 2025 +0100

    fix track republishing in agent

commit 0f8e116
Author: Max Kahan <max.kahan@getstream.io>
Date:   Wed Oct 22 15:37:11 2025 +0100

    add tests

commit 08e6133
Author: Max Kahan <max.kahan@getstream.io>
Date:   Wed Oct 22 15:25:37 2025 +0100

    ensure video track dimensions are an even number

commit 6a725b0
Merge: 5f001e0 5088709
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:23:58 2025 -0600

    Merge pull request GetStream#122 from GetStream/cleanup_stt

    Cleanup STT

commit 5088709
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:23:34 2025 -0600

    cleanup of stt

commit f185120
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 15:08:42 2025 -0600

    more cleanup

commit 05ccbfd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:51:48 2025 -0600

    cleanup

commit bb834ca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:28:53 2025 -0600

    more cleanup for stt

commit 7a3f2d2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:11:35 2025 -0600

    more test cleanup

commit ad7f4fe
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:10:57 2025 -0600

    cleanup test

commit 9e50cdd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 14:03:45 2025 -0600

    large cleanup

commit 5f001e0
Merge: 95a03e4 5d204f3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 12:01:52 2025 -0600

    Merge pull request GetStream#121 from GetStream/fish_stt

    [AI-201] Fish speech to text (partial)

commit 5d204f3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:48:16 2025 -0600

    remove ugly tests

commit ee9a241
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:46:19 2025 -0600

    cleanup

commit 6eb8270
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 11:23:00 2025 -0600

    fix 48khz support

commit 3b90548
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 23 10:59:08 2025 -0600

    first attempt at fish stt, doesnt entirely work just yet

commit 95a03e4
Merge: b90c9e3 b4c0da8
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Thu Oct 23 10:11:39 2025 +0200

    Merge branch 'main' of github.com:GetStream/Vision-Agents

commit b90c9e3
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Wed Oct 22 23:28:28 2025 +0200

    remove print and double event handling

commit b4c0da8
Merge: 3d06446 a426bc2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 15:08:51 2025 -0600

    Merge pull request GetStream#117 from GetStream/openrouter

    [AI-194] Openrouter

commit a426bc2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 15:03:10 2025 -0600

    skip broken test

commit ba6c027
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:50:23 2025 -0600

    almost working openrouter

commit 0b1c873
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:47:12 2025 -0600

    almost working, just no instruction following

commit ce63233
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 14:35:53 2025 -0600

    working memory for openai

commit 149e886
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:32:43 2025 -0600

    todo

commit e0df1f6
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:38 2025 -0600

    first pass at adding openrouter

commit 3d06446
Merge: 4eb8ef4 ef55d66
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:11 2025 -0600

    Merge branch 'main' of github.com:GetStream/Vision-Agents

commit 4eb8ef4
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 13:20:01 2025 -0600

    cleanup ai plugin instructions

commit ef55d66
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 22 12:54:33 2025 -0600

    Add link to stash_pomichter for spatial memory

commit 9c9737f
Merge: c954409 390c45b
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:45:09 2025 -0600

    Merge pull request GetStream#115 from GetStream/fish

    [AI-195] Fish support

commit 390c45b
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:44:37 2025 -0600

    cleannup

commit 1cc1cf1
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:42:03 2025 -0600

    happy tests

commit 8163d32
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:39:21 2025 -0600

    fix gemini rule following

commit ada3ac9
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 19:20:18 2025 -0600

    fish tts

commit 61a26cf
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 16:44:03 2025 -0600

    attempt at fish

commit c954409
Merge: ab27e48 c71da10
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 14:18:15 2025 -0600

    Merge pull request GetStream#104 from GetStream/bedrock

    [AI-192] - Bedrock, AWS & Nova

commit c71da10
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Tue Oct 21 22:00:25 2025 +0200

    maybe

commit b5482da
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Tue Oct 21 21:46:15 2025 +0200

    debugging

commit 9a36e45
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 13:14:58 2025 -0600

    echo environment name

commit 6893968
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:53:58 2025 -0600

    more debugging

commit c35fc47
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:45:44 2025 -0600

    add some debug info

commit 0d6d3fd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 12:03:13 2025 -0600

    run test fix

commit c3a31bd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:52:25 2025 -0600

    log cache hit

commit 04554ae
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:48:03 2025 -0600

    fix glob

commit 7da96db
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:33:56 2025 -0600

    mypy

commit 186053f
Merge: 4b540c9 ab27e48
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 11:17:17 2025 -0600

    happy tests

commit 4b540c9
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 10:20:04 2025 -0600

    happy tests

commit b05a60a
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 09:17:45 2025 -0600

    add readme

commit 71affcc
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 21 09:13:01 2025 -0600

    rename to aws

commit d2eeba7
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 21:32:01 2025 -0600

    ai tts instructions

commit 98a4f9d
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 16:49:00 2025 -0600

    small edits

commit ab27e48
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 20 21:42:04 2025 +0200

    Ensure user agent is initialized before joining the call (GetStream#113)

    * ensure user agent is initialized before joining the call

    * wip

commit 3cb339b
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 20 21:22:57 2025 +0200

    New conversation API (GetStream#102)

    * trying to resurrect

    * test transcription events for openai

    * more tests for openai and gemini llm

    * more tests for openai and gemini llm

    * update py-client

    * wip

    * ruff

    * wip

    * ruff

    * snap

    * another way

    * another way, a better way

    * ruff

    * ruff

    * rev

    * ruffit

    * mypy everything

    * brief

    * tests

    * openai dep bump

    * snap - broken

    * nothingfuckingworks

    * message id

    * fix test

    * ruffit

commit cb6f00a
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 13:18:03 2025 -0600

    use qwen

commit f84b2ad
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 13:02:24 2025 -0600

    fix tests

commit e61acca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 12:50:40 2025 -0600

    testing and linting

commit 5f4d353
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 12:34:14 2025 -0600

    working

commit c2a15a9
Merge: a310771 1025a42
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:40:00 2025 -0600

    Merge branch 'main' of github.com:GetStream/Vision-Agents into bedrock

commit a310771
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:39:48 2025 -0600

    wip

commit b4370f4
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 11:22:43 2025 -0600

    something isn't quite working

commit 2dac975
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Mon Oct 20 10:30:04 2025 -0600

    add the examples

commit 6885289
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 20:19:42 2025 -0600

    ai realtime docs

commit a0fa3cc
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 18:48:06 2025 -0600

    wip

commit b914fc3
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 18:40:22 2025 -0600

    fix ai llm

commit b5b00a7
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 17:11:26 2025 -0600

    work audio input

commit ac72260
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 16:47:19 2025 -0600

    fix model id

commit 2b5863c
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Sun Oct 19 16:32:54 2025 -0600

    wip on bedrock

commit 8bb4162
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 15:22:03 2025 -0600

    next up the connect method

commit 7a21e4e
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 14:12:00 2025 -0600

    nova progress

commit 16e8ba0
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 13:16:00 2025 -0600

    docs for bedrock nova

commit 1025a42
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 21:05:45 2025 +0200

    fix: Update .env.example for Gemini Live (GetStream#108)

commit e12112d
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Fri Oct 17 11:49:07 2025 -0600

    wip

commit fea101a
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 09:25:55 2025 +0200

    workflow file update

commit bb2d74c
Author: Bart Schuijt <schuijt.bart@gmail.com>
Date:   Fri Oct 17 09:22:33 2025 +0200

    initial commit

commit d2853cd
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 16 19:44:59 2025 -0600

    always remember pep 420

commit 30a8eca
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Thu Oct 16 19:36:58 2025 -0600

    start of bedrock branch

commit fc032bf
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Thu Oct 16 09:17:42 2025 +0200

    Remove cli handler from examples (GetStream#101)

commit 39a821d
Author: Dan Gusev <dangusev92@gmail.com>
Date:   Tue Oct 14 12:20:41 2025 +0200

    Update Deepgram plugin to use SDK v5.0.0 (GetStream#98)

    * Update Deepgram plugin to use SDK v5.0.0

    * Merge test_realtime and test_stt and update the remaining tests

    * Make deepgram.STT.start() idempotent

    * Clean up unused import

    * Use uv as the default package manager > pip

    ---------

    Co-authored-by: Neevash Ramdial (Nash) <mail@neevash.dev>

commit 2013be5
Author: Tommaso Barbugli <tbarbugli@gmail.com>
Date:   Mon Oct 13 16:57:37 2025 +0200

    ensure chat works with default types (GetStream#99)