New conversation API #102
Conversation
Walkthrough

This PR refactors the conversation messaging API from a fragmented streaming interface to a unified asynchronous interface.

Changes
Sequence Diagram

sequenceDiagram
participant Agent
participant LLMSetup as _setup_llm_events
participant SpeechSetup as _setup_speech_events
participant LLMProvider
participant STTProvider
participant Conversation
Note over Agent: join() flow initialization
Agent->>LLMSetup: await _setup_llm_events()
LLMSetup->>LLMProvider: subscribe to LLM events
Agent->>SpeechSetup: await _setup_speech_events()
SpeechSetup->>STTProvider: subscribe to STT/transcription events
Note over Agent: After MCP connection
Agent->>Conversation: create/initialize
par LLM Event Path
LLMProvider->>LLMSetup: emit completion/delta events
LLMSetup->>Conversation: upsert_message(role=assistant, content_index)
and Speech Event Path
STTProvider->>SpeechSetup: emit transcription events
SpeechSetup->>Conversation: upsert_message(role=user, completed=True)
end
Conversation->>Conversation: buffer/accumulate content
Conversation->>Conversation: _sync_to_backend() [async]
sequenceDiagram
participant Client
participant StreamConversation
participant GetStreamChannel
participant Backend
Note over StreamConversation: Unified upsert_message API
Client->>StreamConversation: upsert_message(content delta, content_index)
activate StreamConversation
StreamConversation->>StreamConversation: buffer fragment in ContentBuffer
StreamConversation->>StreamConversation: accumulate content via _apply_pending()
alt First time (not in backend)
StreamConversation->>StreamConversation: _smart_chunk() split into chunks
StreamConversation->>GetStreamChannel: _create_chunks() send all chunks
GetStreamChannel->>Backend: send chunk messages with metadata
else Subsequent updates
StreamConversation->>StreamConversation: _update_chunks() compute new chunks
GetStreamChannel->>Backend: create/update/delete chunks as needed
end
alt On completion (completed=True)
StreamConversation->>StreamConversation: finalize all chunks
GetStreamChannel->>Backend: mark generating=false, finalize state
end
deactivate StreamConversation
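For orientation, a minimal sketch of the unified API the diagrams describe. It assumes the upsert_message(...) signature quoted later in this review; the function and argument values here are illustrative, not part of the PR.

import asyncio

async def stream_reply(conversation, item_id: str, deltas: list[str]) -> None:
    # Stream deltas: each fragment is buffered by content_index and the
    # accumulated text is synced to the backend asynchronously.
    for i, delta in enumerate(deltas):
        await conversation.upsert_message(
            message_id=item_id,
            role="assistant",
            user_id="agent",
            content=delta,
            content_index=i,
            completed=False,  # still streaming
        )
    # The final upsert marks the message complete and replaces partial content.
    await conversation.upsert_message(
        message_id=item_id,
        role="assistant",
        user_id="agent",
        content="".join(deltas),
        completed=True,
        replace=True,
    )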
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR encompasses significant refactoring across multiple interconnected systems (conversation API, agent orchestration, LLM event handling, GetStream integration, realtime audio). While many changes follow consistent patterns (event type quote updates, participant parameter threading), the core logic changes are heterogeneous and demand separate reasoning: unified message buffering/upsert semantics, chunk lifecycle management with backend synchronization, multi-user participant tracking, and event-driven LLM/speech routing. The breadth of affected files (20+) and the density of business logic in agents.py, conversation.py, stream_conversation.py, and openai_realtime.py require careful validation of cross-domain interactions.

Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/openai/tests/test_openai_llm.py (1)
41-43: Use pytest_asyncio.fixture for async fixtures
Switch the decorator and add the import.

+import pytest_asyncio
@@
-    @pytest.fixture
-    async def llm(self) -> OpenAILLM:
+    @pytest_asyncio.fixture
+    async def llm(self) -> OpenAILLM:
         llm = OpenAILLM(model="gpt-4o")
         return llm

agents-core/vision_agents/core/agents/agents.py (1)
329-351: Duplicate wildcard subscriptions → double event delivery

on_wildcard is registered twice for _coordinator_ws_client, causing duplicate events.
Apply:
-        connection._connection._coordinator_ws_client.on_wildcard(
-            "*", lambda event_name, event: self.events.send(event)
-        )
...
-        connection._connection._coordinator_ws_client.on_wildcard(
-            "*",
-            lambda event_name, event: self.events.send(event),
-        )
+        connection._connection._coordinator_ws_client.on_wildcard(
+            "*", lambda event_name, event: self.events.send(event)
+        )

Keep a single registration for the coordinator and the one for _ws_client.
🧹 Nitpick comments (14)
agents-core/vision_agents/core/events/base.py (2)
117-118: Field name 'user_id' conflicts with BaseEvent.user_id; rename or drop

This dataclass field shadows the BaseEvent.user_id accessor. It creates inconsistent semantics across events (sometimes attribute, sometimes callable/property) and confuses type-checkers.
Consider renaming to avoid the clash, or rely on BaseEvent.user_id derived from user_metadata.
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    connection_user_id: Optional[str] = None

If you keep it, make BaseEvent.user_id a property (see previous comment) to standardize access patterns. Based on learnings.
41-41: Broaden user_metadata typing to interop with local Participant wrappers

Events elsewhere pass a local Participant dataclass (not the pb2 Participant). The current narrow typing can trigger false positives in type-checkers.
-    user_metadata: Optional[Participant] = None
+    user_metadata: Optional[Any] = None

Alternatively, introduce a Protocol with a user_id: Optional[str] attribute for stronger typing. Based on learnings.
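A minimal sketch of that Protocol alternative (the class name HasUserId is hypothetical):

from typing import Optional, Protocol

class HasUserId(Protocol):
    user_id: Optional[str]

# user_metadata: Optional[HasUserId] would then accept both the pb2 Participant
# and local Participant wrappers without an Any escape hatch.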
agents-core/vision_agents/core/agents/conversation.py (2)
78-79: Lock held across I/O; confirm intended serialization

Holding self._lock while awaiting _sync_to_backend serializes all message ops and can increase latency. If ordering is not strictly required, consider narrowing the lock to state mutations only, and releasing it before backend I/O.
Would you like a draft refactor that separates state updates (under lock) from backend sync (outside lock) with an immutable payload?
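For illustration, a rough sketch of that split — state mutation under the lock, backend I/O outside it. The _apply_update helper, the in-memory state, and the payload shape are assumptions for the sketch, not the PR's actual code:

import asyncio

class ConversationSketch:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._messages: dict[str, str] = {}

    def _apply_update(self, message_id: str, content: str) -> dict:
        # Mutate in-memory state only; return an immutable payload for syncing.
        self._messages[message_id] = self._messages.get(message_id, "") + content
        return {"message_id": message_id, "content": self._messages[message_id]}

    async def _sync_to_backend(self, payload: dict) -> None:
        await asyncio.sleep(0)  # placeholder for the real network call

    async def upsert_message(self, message_id: str, content: str) -> None:
        async with self._lock:                # lock covers state mutation only
            payload = self._apply_update(message_id, content)
        await self._sync_to_backend(payload)  # I/O happens outside the lock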
22-25: Use a timezone-aware timestamp

For consistency with other UTC-aware timestamps in the codebase, prefer an aware datetime.
-        self.timestamp = datetime.datetime.now()
+        self.timestamp = datetime.datetime.now(datetime.UTC)
488-494: Consistent user_id accessor matches BaseEvent (post-refactor)

This property correctly guards for missing payload fields. Once BaseEvent.user_id is a property, the override becomes conventional and you can drop the type: ignore comments in similar properties.
tests/test_conversation.py (1)
45-50: Optional: avoid asserting exact TypeError message text.

Message wording can vary by Python version. It's enough to assert that TypeError is raised.
-    with pytest.raises(TypeError) as exc_info:
-        Conversation("instructions", [])
-    assert "Can't instantiate abstract class" in str(exc_info.value)
+    with pytest.raises(TypeError):
+        Conversation("instructions", [])
105-112: Add a monotonic sequence_number across the entire stream (don't reuse idx across phases).

idx resets in follow-up loops, which can confuse event consumers. Use LLMResponseChunkEvent.sequence_number to carry a single increasing counter across both loops; keep content_index for per-phase indexing if you like.
Apply this refactor:
@@
-        for idx, chunk in enumerate(iterator):
+        seq = 0
+        for idx, chunk in enumerate(iterator):
             response_chunk: GenerateContentResponse = chunk
             final_chunk = response_chunk
-            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx)
+            self._standardize_and_emit_event(response_chunk, text_parts, item_id, idx, seq)
+            seq += 1
@@
-        for idx, chk in enumerate(follow_up_iter):
+        for idx, chk in enumerate(follow_up_iter):
             follow_up_last = chk
-            # TODO: unclear if this is correct (item_id and idx)
-            self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx)
+            self._standardize_and_emit_event(chk, follow_up_text_parts, item_id, idx, seq)
+            seq += 1
@@
-    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int) -> Optional[LLMResponseEvent[Any]]:
+    def _standardize_and_emit_event(self, chunk: GenerateContentResponse, text_parts: List[str], item_id: str, idx: int, seq: int) -> Optional[LLMResponseEvent[Any]]:
@@
         self.events.send(LLMResponseChunkEvent(
             plugin_name="gemini",
             content_index=idx,
             item_id=item_id,
             delta=chunk.text,
+            sequence_number=seq,
         ))

Item_id propagation and the completion event look correct. Nice.
Also applies to: 153-157, 214-221
agents-core/vision_agents/core/agents/agents.py (2)
709-729: Rename: hasImageProcessers → has_image_processors

Spelling and PEP8. No behavior change.
Apply:
-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
...
-        if not hasImageProcessers:
+        if not has_image_processors:
...
-        if hasImageProcessers:
+        if has_image_processors:
196-240: Add Google-style docstrings to new setup hooks

Document event subscriptions, responsibilities, and side effects.
Example for _setup_llm_events:
async def _setup_llm_events(self):
    """Subscribe to LLM events and sync TTS + conversation.

    Subscribes:
        - LLMResponseCompletedEvent: sends TTS (if any) and replaces the assistant message.
        - LLMResponseChunkEvent: appends/streams assistant deltas.

    Side effects:
        - Writes to self.conversation if available.
        - Calls self.tts.send() for non-empty completions.
    """

As per coding guidelines
Also applies to: 241-288
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
126-129: Avoid double-converting tools; _get_tools_for_provider already returns provider format

Pass the spec through directly.
-    tools_spec = self._get_tools_for_provider()
-    if tools_spec:
-        kwargs["tools"] = self._convert_tools_to_provider_format(tools_spec)  # type: ignore[arg-type]
+    tools_spec = self._get_tools_for_provider()
+    if tools_spec:
+        kwargs["tools"] = tools_spec  # already provider-formatted
79-85: Type hint: participant should be Optional[Participant]

Minor typing polish.
-    participant: Participant = None,
+    participant: Optional[Participant] = None,
85-103: API error check may never trigger; SDK raises on non-2xx

response.data.message.type == "error" is unlikely; the SDK usually raises exceptions. Consider removing the check or logging only.
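If the check is replaced, a hedged sketch of the try/except alternative (assuming the SDK raises on failure; the exact exception type would need to be confirmed against the getstream client):

import logging

logger = logging.getLogger(__name__)

async def send_chunk(channel, request):
    try:
        # Stream SDKs typically raise on non-2xx responses, so a try/except
        # is more reliable than inspecting response.data.message.type.
        return await channel.send_message(request)
    except Exception:
        logger.exception("send_message failed; chunk not synced")
        raise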
plugins/getstream/tests/test_message_chunking.py (2)
138-144: Strengthen the boundary test

One character over the limit should split.
-    # Should be split (no newlines, so splits anywhere)
-    assert len(chunks) >= 1
+    # Should be split into at least 2 chunks
+    assert len(chunks) >= 2
161-167: Clarify the multiple-code-blocks assertion

Directly assert the presence of both blocks.
-    assert full_text.count('```python') == full_text.count('```') / 2 or '```python' in full_text
-    assert 'code1()' in full_text
-    assert 'code2()' in full_text
+    assert '```python' in full_text and '```javascript' in full_text
+    assert 'code1()' in full_text
+    assert 'code2()' in full_text
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
.cursor/rules/python.mdc (1 hunks)
agents-core/pyproject.toml (1 hunks)
agents-core/vision_agents/core/agents/agents.py (19 hunks)
agents-core/vision_agents/core/agents/conversation.py (2 hunks)
agents-core/vision_agents/core/agents/events.py (2 hunks)
agents-core/vision_agents/core/edge/sfu_events.py (70 hunks)
agents-core/vision_agents/core/events/base.py (4 hunks)
agents-core/vision_agents/core/llm/events.py (9 hunks)
examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
examples/02_golf_coach_example/golf_coach_example.py (2 hunks)
plugins/cartesia/pyproject.toml (1 hunks)
plugins/gemini/tests/test_gemini_llm.py (2 hunks)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (6 hunks)
plugins/getstream/tests/test_message_chunking.py (1 hunks)
plugins/getstream/tests/test_stream_conversation.py (1 hunks)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1 hunks)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)
plugins/openai/tests/test_openai_llm.py (3 hunks)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (15 hunks)
tests/test_conversation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
examples/01_simple_agent_example/simple_agent_example.py
plugins/gemini/tests/test_gemini_llm.py
plugins/openai/vision_agents/plugins/openai/openai_llm.py
agents-core/vision_agents/core/llm/events.py
agents-core/vision_agents/core/edge/sfu_events.py
agents-core/vision_agents/core/agents/events.py
agents-core/vision_agents/core/agents/agents.py
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
plugins/openai/tests/test_openai_llm.py
agents-core/vision_agents/core/agents/conversation.py
tests/test_conversation.py
agents-core/vision_agents/core/events/base.py
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
plugins/getstream/tests/test_stream_conversation.py
examples/02_golf_coach_example/golf_coach_example.py
plugins/getstream/tests/test_message_chunking.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
tests/test_conversation.py
🧬 Code graph analysis (17)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (2)
plugins/getstream/tests/test_message_chunking.py (2)
conversation(15-27)
conversation(244-251)
agents-core/vision_agents/core/agents/conversation.py (6)
Conversation(67-227)
Message(14-24)
MessageState(55-64)
_sync_to_backend(213-223)
_sync_to_backend(233-237)
send_message(80-120)
examples/01_simple_agent_example/simple_agent_example.py (4)
examples/02_golf_coach_example/golf_coach_example.py (1)
start_agent(12-38)
plugins/gemini/tests/test_gemini_llm.py (1)
llm(33-36)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
simple_response(79-107)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
simple_response(88-99)
plugins/gemini/tests/test_gemini_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseChunkEvent(108-123)
LLMResponseCompletedEvent(127-133)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)
GeminiLLM(21-322)
send_message(75-182)
plugins/gemini/vision_agents/plugins/gemini/events.py (2)
GeminiResponseEvent(35-38)
GeminiErrorEvent(14-17)
agents-core/vision_agents/core/events/manager.py (1)
wait(451-465)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (4)
agents-core/vision_agents/core/edge/sfu_events.py (7)
Participant(229-270)
participant(1495-1500)
participant(1537-1542)
participant(1610-1615)
participant(2078-2083)
participant(2127-2132)
name(2162-2166)
agents-core/vision_agents/core/llm/llm.py (4)
LLM(35-358)
LLMResponseEvent(25-28)
_dedup_and_execute(313-345)
_get_tools_for_provider(86-95)
agents-core/vision_agents/core/llm/events.py (3)
LLMResponseChunkEvent(108-123)
LLMResponseCompletedEvent(127-133)
error_message(103-104)
plugins/openai/vision_agents/plugins/openai/events.py (2)
OpenAIStreamEvent(7-11)
LLMErrorEvent(15-19)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
PluginBaseEvent(50-52)
agents-core/vision_agents/core/edge/sfu_events.py (1)
agents-core/vision_agents/core/events/base.py (2)
user_id(43-46)
BaseEvent(34-46)
agents-core/vision_agents/core/agents/events.py (2)
agents-core/vision_agents/core/edge/sfu_events.py (6)
user_id(489-493)
user_id(856-860)
user_id(901-905)
user_id(1186-1190)
user_id(2071-2075)
user_id(2113-2117)
agents-core/vision_agents/core/events/base.py (2)
user_id(43-46)
PluginBaseEvent(50-52)
agents-core/vision_agents/core/agents/agents.py (7)
agents-core/vision_agents/core/stt/events.py (1)
STTTranscriptEvent(7-22)
agents-core/vision_agents/core/turn_detection/events.py (2)
TurnStartedEvent(10-25)
TurnEndedEvent(29-44)
agents-core/vision_agents/core/agents/conversation.py (2)
Conversation(67-227)
upsert_message(122-210)
agents-core/vision_agents/core/mcp/mcp_manager.py (1)
MCPManager(9-147)
agents-core/vision_agents/core/events/manager.py (2)
subscribe(292-353)
send(406-449)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
create_conversation(224-231)
create_audio_track(291-294)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
VideoForwarder(13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
agents-core/vision_agents/core/llm/llm.py (1)
LLMResponseEvent(25-28)
plugins/openai/tests/test_openai_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseChunkEvent(108-123)
LLMResponseCompletedEvent(127-133)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
OpenAILLM(31-509)
plugins/openai/vision_agents/plugins/openai/events.py (2)
OpenAIStreamEvent(7-11)
LLMErrorEvent(15-19)
agents-core/vision_agents/core/events/manager.py (1)
wait(451-465)
agents-core/vision_agents/core/agents/conversation.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
_sync_to_backend(45-64)
tests/test_conversation.py (1)
agents-core/vision_agents/core/agents/conversation.py (3)
Message(14-24)
send_message(80-120)
upsert_message(122-210)
agents-core/vision_agents/core/events/base.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (7)
Participant(229-270)
user_id(489-493)
user_id(856-860)
user_id(901-905)
user_id(1186-1190)
user_id(2071-2075)
user_id(2113-2117)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
StreamConversation(17-353)
plugins/getstream/tests/test_stream_conversation.py (3)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
StreamConversation(17-353)
plugins/getstream/tests/test_message_chunking.py (3)
mock_channel(217-241)
conversation(15-27)
conversation(244-251)
agents-core/vision_agents/core/agents/conversation.py (2)
send_message(80-120)
upsert_message(122-210)
examples/02_golf_coach_example/golf_coach_example.py (3)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
simple_response(79-107)
agents-core/vision_agents/core/agents/agents.py (1)
simple_response(168-180)
tests/test_realtime_base.py (1)
simple_response(238-252)
plugins/getstream/tests/test_message_chunking.py (2)
plugins/getstream/tests/test_stream_conversation.py (2)
stream_conversation(48-57)
mock_channel(24-45)
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (3)
StreamConversation(17-353)
_smart_chunk(206-301)
_split_large_block(303-332)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
examples/02_golf_coach_example/golf_coach_example.py (3)
17-17: Verify 3 FPS provides adequate golf swing analysis.

The FPS reduction from 10 to 3 significantly decreases how often the LLM processes frames. A golf swing typically completes in 1–2 seconds, so at 3 FPS, only 3–6 frames are analyzed per swing. While this conserves cost (as the comment notes), it may miss critical swing phase details needed for meaningful feedback.
Please confirm this frame rate has been tested and still delivers useful coaching insights.
19-21: LGTM. Multi-line formatting improves readability.
33-35: LGTM. Multi-line formatting improves readability.
examples/01_simple_agent_example/simple_agent_example.py (4)
6-6: LGTM! Import aligns with the LLM provider change.

The import change from openai to gemini is consistent with the LLM initialization below.
16-18: LGTM! Improved readability.

The multiline formatting of agent_user improves code readability without changing functionality.
25-27: LGTM! Improved readability.

The multiline formatting of TurnDetection improves code readability without changing functionality.
12-12: Model identifier is correct. The canonical Gemini 2.0 Flash ID is gemini-2.0-flash; the -exp suffix denotes the experimental variant and doesn't apply here.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
226-231: Channel scoping refactor looks good

Using a local channel and passing it to StreamConversation removes unnecessary state and aligns with the chunked backend sync design.
plugins/gemini/tests/test_gemini_llm.py (1)
82-206: Strong integration coverage for the streaming/event pipeline

Good assertions on chunk presence, completion, item_id consistency, and reconstruction.
Minor thought: if any provider reorders indices, consider asserting on sorted(content_indices) against range(max+1). Not blocking.
plugins/openai/tests/test_openai_llm.py (1)
101-228: Event-stream test is thorough and aligns with the new event model.

Good coverage: chunk vs completion correlation, item_id propagation, ordering, and OpenAI stream event types.
Please confirm OpenAILLM sets content_index as None or sequential; your test tolerates both, which is fine given current mapping notes.
agents-core/vision_agents/core/agents/events.py (1)
50-52: error_message property addition looks good.

Simple, safe accessor; no concerns.
agents-core/vision_agents/core/llm/events.py (1)
108-134: LGTM on the event schema updates

item_id and delta additions align with upstream emitters; no issues spotted.
plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1)
146-156: Verify SDK method and payload shape
Ensure ephemeral_message_update exists on the client and that its set payload accepts the chunk metadata at top level; if the SDK requires metadata under a nested custom key, adjust the payload accordingly.
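Illustrative only — the two payload shapes under discussion; which one the SDK expects must be verified against the getstream client:

chunk_text, i, n = "partial text", 0, 3

# Shape 1: chunk metadata at the top level of the set payload
top_level = {"text": chunk_text, "chunk_index": i, "total_chunks": n}

# Shape 2: chunk metadata nested under a "custom" key
nested = {"text": chunk_text, "custom": {"chunk_index": i, "total_chunks": n}}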
  requires-python = ">=3.10"
  dependencies = [
-     "getstream[webrtc,telemetry]>=2.5.3",
+     "getstream[webrtc,telemetry]>=2.5.5",
Blocker: getstream>=2.5.5 is not publishable yet.
PyPI’s latest public wheel is 2.5.3; requesting 2.5.5 makes installs fail with “No matching distribution found.” Please keep the floor at 2.5.3 (or whichever released version supplies the needed features) until 2.5.5 actually ships.
- "getstream[webrtc,telemetry]>=2.5.5",
+ "getstream[webrtc,telemetry]>=2.5.3",Based on learnings
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "getstream[webrtc,telemetry]>=2.5.5", | |
| "getstream[webrtc,telemetry]>=2.5.3", |
🤖 Prompt for AI Agents
In agents-core/pyproject.toml around line 24, the dependency currently requests
getstream[webrtc,telemetry]>=2.5.5 which isn't published on PyPI; change the
constraint to >=2.5.3 (or the newest released version that provides the required
features) — e.g. getstream[webrtc,telemetry]>=2.5.3 — then update the lockfile
(poetry lock / pip-compile / equivalent) and verify the needed features exist in
that released version before finalizing.
        # Update content
        if content_index is not None:
            # Streaming: buffer fragments in order
            state.buffer.add_fragment(content_index, content)
            message.content = state.buffer.get_accumulated()
        elif replace:
            # Replace all content
            state.buffer.clear()
            message.content = content
        else:
            # Append to existing
            message.content += content
Fix content loss when mixing streaming (content_index) and append updates
Appending without updating the ContentBuffer means the next indexed chunk overwrite will drop appended text. Use the buffer for all mutations to keep a single source of truth.
- elif replace:
+ elif replace:
# Replace all content
state.buffer.clear()
message.content = content
else:
- # Append to existing
- message.content += content
+ # Append to existing via buffer to preserve order with future indexed chunks
+ next_index = state.buffer.last_index + 1
+ if next_index < 0:
+ next_index = 0
+ state.buffer.add_fragment(next_index, content)
+ message.content = state.buffer.get_accumulated()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        # Update content
        if content_index is not None:
            # Streaming: buffer fragments in order
            state.buffer.add_fragment(content_index, content)
            message.content = state.buffer.get_accumulated()
        elif replace:
            # Replace all content
            state.buffer.clear()
            message.content = content
        else:
            # Append to existing via buffer to preserve order with future indexed chunks
            next_index = state.buffer.last_index + 1
            if next_index < 0:
                next_index = 0
            state.buffer.add_fragment(next_index, content)
            message.content = state.buffer.get_accumulated()
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/conversation.py around lines 190-202,
the current logic appends text directly to message.content when content_index is
None and replace is False, which causes appended text to be lost when later
indexed fragments arrive; instead, always mutate the ContentBuffer as the single
source of truth: for replace, clear the buffer and (if appropriate) add the
replacement content as a base fragment; for append, add the incoming content to
the buffer via the buffer's fragment/append API (or add_fragment with a
sentinel/next index), and after every branch set message.content =
state.buffer.get_accumulated() so message.content is always derived from the
buffer.
    user_id: Optional[str] = None  # type: ignore[assignment]
    metadata: Optional[Dict[str, Any]] = None
🧩 Analysis chain
Do not shadow BaseEvent.user_id method; this breaks polymorphism at runtime.
Declaring a user_id field overrides the BaseEvent.user_id method. Any code calling event.user_id() will crash. Rename the field and, if needed, override the method explicitly.
Apply this diff to avoid shadowing:
- user_id: Optional[str] = None # type: ignore[assignment]
+ actor_user_id: Optional[str] = None
@@
- user_id: Optional[str] = None # type: ignore[assignment]
+ actor_user_id: Optional[str] = None
@@
- user_id: Optional[str] = None # type: ignore[assignment]
+ actor_user_id: Optional[str] = None
@@
- user_id: Optional[str] = None # type: ignore[assignment]
+    actor_user_id: Optional[str] = None

Then add method overrides to preserve the original API shape:
# Add inside each class
def user_id(self) -> Optional[str]:  # type: ignore[override]
    # Prefer explicit field if set, else fall back to BaseEvent behavior
    return self.actor_user_id or super().user_id()

Check for method-style calls that would be broken:
Also applies to: 26-27, 36-37, 47-48
🏁 Script executed:
#!/bin/bash
rg -nP --type=py -C2 '\.user_id\s*\('

Length of output: 825
Avoid shadowing BaseEvent.user_id method
Rename the user_id attribute in each subclass (e.g. to actor_user_id) and override user_id() to preserve the original API:
-    user_id: Optional[str] = None  # type: ignore[assignment]
+    actor_user_id: Optional[str] = None

def user_id(self) -> Optional[str]:  # type: ignore[override]
    return self.actor_user_id or super().user_id()

This prevents runtime errors in calls to event.user_id().
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/events.py around lines 12 to 13, the
subclasses define a user_id attribute which shadows the BaseEvent.user_id
method; rename the attribute in each subclass (for example to actor_user_id) and
add an overriding user_id(self) -> Optional[str] method that returns the new
attribute or falls back to super().user_id() (use type ignore override if
needed), so calls to event.user_id() continue to work and avoid runtime errors.
    def user_id(self) -> Optional[str]:
        if self.user_metadata is None:
            return None
        return getattr(self.user_metadata, "user_id")
🛠️ Refactor suggestion | 🟠 Major
Make user_id a property and guard for missing attribute
Unify the API shape with other event classes and avoid method/field clashes. Also prevent AttributeError if user_metadata lacks user_id.
- def user_id(self) -> Optional[str]:
- if self.user_metadata is None:
- return None
- return getattr(self.user_metadata, "user_id")
+ @property
+ def user_id(self) -> Optional[str]:
+ if self.user_metadata is None:
+ return None
+ return getattr(self.user_metadata, "user_id", None)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    @property
    def user_id(self) -> Optional[str]:
        if self.user_metadata is None:
            return None
        return getattr(self.user_metadata, "user_id", None)
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/events/base.py around lines 43 to 46, change
the user_id method into a @property that returns Optional[str] and guard against
a missing attribute by using getattr(self.user_metadata, "user_id", None) (or
equivalent) instead of getattr(...) without a default; keep the existing None
check for user_metadata and update any type hints or docstrings to reflect that
user_id is a property rather than a method.
import pytest
import uuid
import asyncio
from unittest.mock import Mock, AsyncMock
Replace mocks with simple fakes (tests must not use mocking).
Per test guidelines, avoid unittest.mock. Implement minimal fake Channel/Client classes with the needed async methods and state tracking instead of Mock/AsyncMock.
Apply these changes:
- Remove the mock imports.
- Rewrite the fixture to return fakes.
-from unittest.mock import Mock, AsyncMock
@@
- def mock_channel(self):
- """Create a mock Channel."""
- channel = Mock()
- channel.channel_type = "messaging"
- channel.channel_id = "test-channel-123"
-
- # Mock the client
- channel.client = Mock()
-
- # Create async mocks for client methods
- channel.client.update_message_partial = AsyncMock(return_value=Mock())
- channel.client.ephemeral_message_update = AsyncMock(return_value=Mock())
-
- # Mock send_message response
- mock_response = Mock()
- mock_response.data.message.id = "stream-message-123"
- mock_response.data.message.type = "regular"
-
- # Create async mock for send_message
- channel.send_message = AsyncMock(return_value=mock_response)
-
- return channel
+ def mock_channel(self):
+ """Create a fake Channel (no unittest.mock)."""
+ class FakeResponseMsg:
+ def __init__(self, mid: str):
+ self.id = mid
+ self.type = "regular"
+ self.custom = {}
+ self.text = ""
+ class FakeResponseData:
+ def __init__(self, mid: str):
+ self.message = FakeResponseMsg(mid)
+ class FakeResponse:
+ def __init__(self, mid: str):
+ self.data = FakeResponseData(mid)
+ class FakeClient:
+ async def update_message_partial(self, *_args, **_kwargs):
+ return object()
+ async def ephemeral_message_update(self, *_args, **_kwargs):
+ return object()
+ async def delete_message(self, *_args, **_kwargs):
+ return object()
+ class FakeChannel:
+ def __init__(self):
+ self.channel_type = "messaging"
+ self.channel_id = "test-channel-123"
+ self.client = FakeClient()
+ self._send_count = 0
+ async def send_message(self, request):
+ mid = f"stream-message-{self._send_count}"
+ self._send_count += 1
+ return FakeResponse(mid)
+        return FakeChannel()

Adjust assertions that check Mock call methods to validate behavior (e.g., maintain counters or flags on the fake).
I can provide tailored fakes and assertion updates if you confirm required behaviors.
Also applies to: 23-45
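For example, assertions could target the fake's counters instead of Mock call APIs (the fixture and attribute names follow the FakeChannel sketched above and are illustrative):

async def test_send_creates_one_backend_message(conversation, fake_channel):
    await conversation.send_message(role="user", user_id="u1", content="hi")
    # Behavioral check against the fake's counter instead of Mock call APIs.
    assert fake_channel._send_count == 1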
🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 7 and 23-45,
the test currently imports and uses unittest.mock.Mock/AsyncMock; replace those
with small hand-written fake classes: remove the mock imports, implement a
FakeChannel and FakeClient inside the test module that expose the async methods
used by the tests (e.g., async send, async get or similar) and maintain simple
state counters/flags (call_count, last_args, or a list of messages) so
assertions can inspect behavior; update the fixture to return instances of these
fakes instead of mocks, and replace any Mock-specific assertions (like
assert_awaited or call_args) with checks against the fake's counters/flags or
stored data to validate expected interactions.
    @pytest.mark.asyncio
    async def test_send_message_simple(self, stream_conversation, mock_channel):
        """Test send_message convenience method."""
        await stream_conversation.send_message(
            role="user",
            user_id="user123",
            content="Hello",
        )
🧩 Analysis chain
Remove @pytest.mark.asyncio across this file.
Async support is automatic in this repo; decorators are prohibited.
Example:
-@pytest.mark.asyncio
 async def test_send_message_simple(...):

Find all occurrences:
Also applies to: 80-84, 83-106, 107-142, 143-183, 184-207, 208-234, 236-615
🏁 Script executed:
#!/bin/bash
rg -n '@pytest\.mark\.asyncio' plugins/getstream/tests/test_stream_conversation.py

Length of output: 371
Remove all @pytest.mark.asyncio decorators. Async support is automatic; these decorators are prohibited.
🤖 Prompt for AI Agents
In plugins/getstream/tests/test_stream_conversation.py around lines 59 to 67,
remove the prohibited @pytest.mark.asyncio decorator(s) from the async test
function(s); leave the async def test_send_message_simple(...) signature intact
(async support is automatic), and delete any corresponding import of
pytest.mark.asyncio if it exists so no unused decorator remains.
        await self.channel.client.update_message_partial(
            chunk_id,
            user_id=message.user_id,
            set={
                "text": chunk_text,
                "generating": False if is_last_chunk else False,
                "chunk_group": message.id,
                "chunk_index": i,
                "total_chunks": new_chunk_count,
            },
        )
Partial update should set custom.* fields under custom, not at root

Current code writes generating/chunk_* at the message root; creation wrote them under custom. Align the shapes.
- await self.channel.client.update_message_partial(
- chunk_id,
- user_id=message.user_id,
- set={
- "text": chunk_text,
- "generating": False if is_last_chunk else False,
- "chunk_group": message.id,
- "chunk_index": i,
- "total_chunks": new_chunk_count,
- },
- )
+ await self.channel.client.update_message_partial(
+ chunk_id,
+ user_id=message.user_id,
+ set={
+ "text": chunk_text,
+ "custom": {
+ "generating": False,
+ "chunk_group": message.id,
+ "chunk_index": i,
+ "total_chunks": new_chunk_count,
+ },
+ },
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        await self.channel.client.update_message_partial(
            chunk_id,
            user_id=message.user_id,
            set={
                "text": chunk_text,
                "custom": {
                    "generating": False,
                    "chunk_group": message.id,
                    "chunk_index": i,
                    "total_chunks": new_chunk_count,
                },
            },
        )
🤖 Prompt for AI Agents
In plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py
around lines 129 to 139, the partial update writes generating and chunk_* fields
at the message root whereas message creation stores them under the custom
object; change the update payload so "text" remains at root but the generating,
chunk_group, chunk_index and total_chunks fields are nested inside a "custom"
dict (i.e., set={"text": chunk_text, "custom": {"generating": <bool>,
"chunk_group": message.id, "chunk_index": i, "total_chunks": new_chunk_count}}),
keeping the same boolean logic for generating.
        self.events.send(
            LLMResponseCompletedEvent(
                item_id=llm_response.original.output[0].id,
                original=llm_response.original,
                text=llm_response.text,
            )
        )
Guard item_id extraction for non-streaming responses
Responses without text outputs (e.g., only tool calls) can leave output empty.
- self.events.send(
- LLMResponseCompletedEvent(
- item_id=llm_response.original.output[0].id,
- original=llm_response.original,
- text=llm_response.text,
- )
- )
+ self.events.send(
+ LLMResponseCompletedEvent(
+ item_id=(
+ llm_response.original.output[0].id
+ if getattr(llm_response.original, "output", None)
+ and len(llm_response.original.output) > 0
+ else None
+ ),
+ original=llm_response.original,
+ text=llm_response.text,
+ )
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        self.events.send(
            LLMResponseCompletedEvent(
                item_id=(
                    llm_response.original.output[0].id
                    if getattr(llm_response.original, "output", None)
                    and len(llm_response.original.output) > 0
                    else None
                ),
                original=llm_response.original,
                text=llm_response.text,
            )
        )
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_llm.py around lines 197 to
203, the code assumes llm_response.original.output[0] exists which will raise
when output is empty (e.g., non-text/tool-only responses); guard the extraction
by checking that llm_response.original.output is a non-empty list before
accessing [0] and use a safe fallback (for example None or
llm_response.original.id) for item_id when output is empty, then send the
LLMResponseCompletedEvent using that guarded item_id so the event is still
emitted without raising.
    @pytest.mark.asyncio
    async def test_send_message(self, conversation):
        """Test send_message creates a new message."""
        message = await conversation.send_message(
            role="user",
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Remove @pytest.mark.asyncio decorators (auto async support is enabled).
Per project guidelines, drop these markers; keep tests as async def only.
Representative change:
- @pytest.mark.asyncio
 async def test_send_message(self, conversation):

List occurrences to update:
Also applies to: 95-103, 122-131, 135-146, 179-188, 210-219, 248-260
🏁 Script executed:
#!/bin/bash
rg -n --type=py '@pytest\.mark\.asyncio' tests | sed -n '1,120p'

Length of output: 4278
Remove all @pytest.mark.asyncio decorators
Async support is automatic; retain only async def on test functions.
Representative change:
- @pytest.mark.asyncio
 async def test_send_message(self, conversation):

To remove everywhere:
rg -l '@pytest\.mark\.asyncio' tests | xargs sed -i '/@pytest.mark.asyncio/d'

🤖 Prompt for AI Agents
In tests/test_conversation.py around lines 80 to 84, remove the
@pytest.mark.asyncio decorator above the async test function test_send_message;
keep the async def signature only. Search for and delete any occurrences of
@pytest.mark.asyncio in this file (and other tests if applicable) so tests rely
on pytest's builtin asyncio support, ensuring indentation and spacing remain
correct after removal.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
16-20: Unify the PcmData type to the core definition to avoid drift.

This module imports PcmData from RTC utils while the rest of the repo uses vision_agents.core.edge.types.PcmData. Unify to the core type; the RTC layer only needs .samples/.sample_rate.

-from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import PcmData
@@
-    async def simple_audio_response(
-        self, audio: PcmData, participant: Optional[Participant] = None
-    ):
+    async def simple_audio_response(
+        self, audio: PcmData, participant: Optional[Participant] = None
+    ):

(Optional) Adjust RTCManager.send_audio_pcm typing if needed; runtime behavior remains identical.

Also applies to: 121-136
♻️ Duplicate comments (1)
agents-core/vision_agents/core/agents/agents.py (1)
215-224: Delta upserts will fan‑out messages when item_id is missing — add a stable fallback map and clean up on completion.

Several providers omit item_id on chunks/completions; your code passes None, creating a new message per delta. Maintain a per‑response key→message_id map (e.g., by (output_index, content_index)), use it for chunks, then finalize and delete on completion.

Apply:

@@
-                await self.conversation.upsert_message(
-                    message_id=event.item_id,
+                # Use a stable id for this response: prefer item_id, else map by indexes
+                key = event.item_id or f"{event.output_index or 0}:{event.content_index or 0}"
+                msg_id = getattr(self, "_llm_stream_msg_ids", {}).get(key)
+                if not getattr(self, "_llm_stream_msg_ids", None):
+                    self._llm_stream_msg_ids = {}
+                if msg_id is None:
+                    msg_id = str(uuid.uuid4())
+                    self._llm_stream_msg_ids[key] = msg_id
+                await self.conversation.upsert_message(
+                    message_id=msg_id,
                     role="assistant",
                     user_id=self.agent_user.id or "agent",
                     content=event.delta or "",
                     content_index=event.content_index,
                     completed=False,  # Still streaming
                 )
@@
-                await self.conversation.upsert_message(
-                    message_id=event.item_id,
+                key = event.item_id or f"{getattr(event, 'output_index', 0) or 0}:{None}"
+                msg_id = getattr(self, "_llm_stream_msg_ids", {}).get(key) or event.item_id
+                await self.conversation.upsert_message(
+                    message_id=msg_id,
                     role="assistant",
                     user_id=self.agent_user.id or "agent",
                     content=event.text or "",
                     completed=True,
                     replace=True,  # Replace any partial content from deltas
                 )
+                # Cleanup mapping for this response
+                if key in self._llm_stream_msg_ids:
+                    self._llm_stream_msg_ids.pop(key, None)

Also applies to: 226-245
🧹 Nitpick comments (16)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)
664-665: FIXME comment indicates an unresolved typing issue.

The event dictionary typing could be improved with a TypedDict or protocol.
Would you like me to generate a TypedDict definition for the OpenAI Realtime API events, or open an issue to track this?
711-736: Consider reducing log verbosity for frame-by-frame details.

Per-frame logging at INFO level (lines 714-716, 731-736) could be excessive in production. Consider DEBUG level for frame-by-frame details while keeping INFO for summaries.
Example adjustment:
-    logger.info(
-        f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
-    )
-    logger.info(
-        f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
-    )
+    logger.info(
+        f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
+    )
+    logger.debug(
+        f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
+    )

And:

     # Log frame details
-    logger.info(
-        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
-    )
-    logger.info(
-        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
-    )
+    logger.debug(
+        f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
+    )
+    logger.debug(
+        f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
+    )

agents-core/vision_agents/core/agents/agents.py (6)
276-285: Realtime transcript upserts: replace=True with random IDs is redundant.

You already generate a new UUID per transcript. Remove replace=True to avoid confusion; reserve it for true replacements.

-    completed=True,
-    replace=True,
+    completed=True,

(and make the same change in the agent transcript block)
replace=Trueto avoid confusion; reserve it for true replacements.- completed=True, - replace=True, + completed=True,(and same change in the agent transcript block)
Also applies to: 295-303
684-687: Background task lacks error hook — attach callback to avoid swallowed exceptions.- asyncio.create_task( - self.llm.simple_audio_response(pcm_data, participant) - ) + task = asyncio.create_task( + self.llm.simple_audio_response(pcm_data, participant) + ) + task.add_done_callback(_log_task_exception)
249-261: Reduce PII in logs — truncate transcript text.Full transcripts in INFO can leak user data. Log first N chars or switch to DEBUG.
- self.logger.info(f"🎤 [Transcript]: {event.text}") + self.logger.debug("🎤 [Transcript]: %s", (event.text or "")[:120]) @@ - self.logger.info(f"🎤 [User transcript]: {event.text}") + self.logger.debug("🎤 [User transcript]: %s", (event.text or "")[:120]) @@ - self.logger.info(f"🎤 [Agent transcript]: {event.text}") + self.logger.debug("🎤 [Agent transcript]: %s", (event.text or "")[:120])Also applies to: 271-303
344-351: Early events can be lost beforeconversationis set — buffer or await creation earlier.Handlers bail when
self.conversationis None. Either awaitcreate_conversation_corobefore wiring LLM/STT or buffer upserts then drain once set.Would you prefer I draft a small ring buffer (deque maxlen=N) to queue upserts until
conversationis ready?Also applies to: 404-406
450-458: Graceful shutdown: await cancelled tasks to drain and suppress CancelledError.- for _, track in self._track_tasks.items(): - track.cancel() + for _, task in self._track_tasks.items(): + task.cancel() + if self._track_tasks: + await asyncio.gather(*self._track_tasks.values(), return_exceptions=True) + self._track_tasks.clear() @@ - if self._interval_task: - self._interval_task.cancel() + if self._interval_task: + self._interval_task.cancel() + try: + await self._interval_task + except asyncio.CancelledError: + passAlso applies to: 464-471, 496-499
754-756: Typo and readability: hasImageProcessers → has_image_processors.

-        hasImageProcessers = len(self.image_processors) > 0
+        has_image_processors = len(self.image_processors) > 0
@@
-        if not hasImageProcessers:
+        if not has_image_processors:
@@
-        if hasImageProcessers:
+        if has_image_processors:
tests/test_realtime_base.py (1)
88-171: Unskip and update to the unified Conversation API.The skipped test still targets legacy methods (
partial_update_message,finish_last_message). Rework toawait conversation.upsert_message(...)assertions.I can provide a minimal
FakeConversation.upsert_messagethat records calls and simulates chunking semantics.plugins/openai/tests/test_openai_realtime.py (1)
113-173: Event assertions look solid; consider awaiting event queue instead of fixed sleeps.Replace
await asyncio.sleep(0.1)withawait realtime.events.wait(timeout=1.0)to reduce flakiness.agents-core/vision_agents/core/llm/realtime.py (1)
185-205: Add short Google‑style docstrings to new emitters.

Improves discoverability for downstream handlers. Example:

def _emit_user_speech_transcription(self, text: str, original=None):
    """Emit user speech transcript.

    Args:
        text: Transcript text.
        original: Provider-specific raw event.
    """

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3)
194-233: Participant map can leak if completion never arrives — add bounds/cleanup.

If conversation.item.created fires without a matching transcription, _item_to_participant retains entries. Mitigations (see the sketch after this list):

- Clear the map on close()/disconnect.
- Bound its size (e.g., max 100) and evict the oldest entry.
- Add periodic cleanup (TTL) if feasible.
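One possible shape for the bounded map, sketch only (the class and method names are hypothetical):

from collections import OrderedDict
from typing import Optional

class BoundedParticipantMap:
    def __init__(self, max_size: int = 100) -> None:
        self._items: OrderedDict[str, object] = OrderedDict()
        self._max_size = max_size

    def put(self, item_id: str, participant: object) -> None:
        self._items[item_id] = participant
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the oldest entry

    def pop(self, item_id: str) -> Optional[object]:
        return self._items.pop(item_id, None)

    def clear(self) -> None:  # call on close()/disconnect
        self._items.clear()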
301-304: Tool‑call logs may include sensitive args — lower to DEBUG or redact.

-        logger.info(
+        logger.debug(
             f"Executing tool call: {tool_call['name']} with args: {tool_call['arguments_json']}"
         )
@@
-        logger.info(f"Tool call {tool_call['name']} succeeded: {response_data}")
+        logger.debug("Tool call %s succeeded", tool_call["name"])
333-374: Tool response path looks correct; small polish on JSON conversion and follow‑up trigger.Consider explicitly limiting
response_strlength or switching to JSON object if API supports it.plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)
208-237: LGTM - Defensive conditional checks.The expanded multi-line conditionals make None-checks explicit and improve code safety. While more verbose, this prevents potential
AttributeErrorexceptions when accessing nested attributes.
299-302: Consider enrichingLLMResponseChunkEventwith additional metadata.The
LLMResponseChunkEventincludes optional fields likeitem_id,output_index, andsequence_numberthat could provide richer context for event consumers. Ifserver_messagecontains relevant identifiers or ordering information, consider populating these fields for consistency with OpenAI realtime events.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
agents-core/vision_agents/core/agents/agents.py(21 hunks)agents-core/vision_agents/core/llm/events.py(8 hunks)agents-core/vision_agents/core/llm/realtime.py(5 hunks)examples/01_simple_agent_example/simple_agent_example.py(2 hunks)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py(19 hunks)plugins/openai/pyproject.toml(1 hunks)plugins/openai/tests/test_openai_realtime.py(4 hunks)plugins/openai/vision_agents/plugins/openai/openai_realtime.py(13 hunks)plugins/openai/vision_agents/plugins/openai/rtc_manager.py(23 hunks)tests/test_realtime_base.py(9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/01_simple_agent_example/simple_agent_example.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Do not use mocking in tests
All tests must use pytest
Integration tests must be marked with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
tests/test_realtime_base.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
tests/test_realtime_base.pyagents-core/vision_agents/core/llm/realtime.pyplugins/openai/vision_agents/plugins/openai/openai_realtime.pyagents-core/vision_agents/core/agents/agents.pyplugins/openai/tests/test_openai_realtime.pyplugins/openai/vision_agents/plugins/openai/rtc_manager.pyplugins/gemini/vision_agents/plugins/gemini/gemini_realtime.pyagents-core/vision_agents/core/llm/events.py
🧬 Code graph analysis (8)
tests/test_realtime_base.py (4)
agents-core/vision_agents/core/edge/edge_transport.py (1)
EdgeTransport(18-59)agents-core/vision_agents/core/edge/types.py (2)
User(15-18)Connection(27-34)agents-core/vision_agents/core/events/manager.py (1)
wait(451-465)agents-core/vision_agents/core/agents/agents.py (1)
Agent(56-1045)
agents-core/vision_agents/core/llm/realtime.py (4)
agents-core/vision_agents/core/edge/types.py (1)
PcmData(37-183)plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
simple_audio_response(130-156)plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
simple_audio_response(121-136)agents-core/vision_agents/core/llm/events.py (2)
RealtimeUserSpeechTranscriptionEvent(142-147)RealtimeAgentSpeechTranscriptionEvent(151-156)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
plugins/openai/tests/test_openai_realtime.py (1)
realtime(20-29)plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5)
RTCManager(253-796)connect(285-308)start_video_sender(477-538)stop_video_sender(540-582)_send_event(447-475)agents-core/vision_agents/core/llm/realtime.py (6)
connect(68-68)_emit_connected_event(83-97)simple_audio_response(71-73)_emit_agent_speech_transcription(196-204)_emit_response_event(138-156)_emit_user_speech_transcription(185-194)agents-core/vision_agents/core/edge/types.py (1)
PcmData(37-183)agents-core/vision_agents/core/llm/llm.py (2)
_run_one_tool(226-292)get_available_functions(180-182)agents-core/vision_agents/core/llm/llm_types.py (1)
ToolSchema(64-67)
agents-core/vision_agents/core/agents/agents.py (8)
agents-core/vision_agents/core/llm/events.py (4)
LLMResponseChunkEvent(90-105)LLMResponseCompletedEvent(109-115)RealtimeUserSpeechTranscriptionEvent(142-147)RealtimeAgentSpeechTranscriptionEvent(151-156)agents-core/vision_agents/core/llm/realtime.py (3)
Realtime(24-220)simple_audio_response(71-73)_watch_video_track(75-77)agents-core/vision_agents/core/mcp/mcp_manager.py (1)
MCPManager(9-147)agents-core/vision_agents/core/stt/events.py (1)
STTTranscriptEvent(7-22)agents-core/vision_agents/core/agents/conversation.py (2)
Conversation(67-227)upsert_message(122-210)agents-core/vision_agents/core/events/manager.py (3)
register_events_from_module(214-247)subscribe(292-353)send(406-449)plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
create_conversation(224-231)create_audio_track(291-294)agents-core/vision_agents/core/utils/video_forwarder.py (1)
VideoForwarder(13-188)
plugins/openai/tests/test_openai_realtime.py (4)
agents-core/vision_agents/core/llm/events.py (3)
RealtimeAudioOutputEvent(37-45)RealtimeUserSpeechTranscriptionEvent(142-147)RealtimeAgentSpeechTranscriptionEvent(151-156)agents-core/vision_agents/core/agents/agents.py (2)
subscribe(187-199)simple_response(173-185)plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4)
connect(72-98)simple_response(100-119)_handle_openai_event(152-244)simple_audio_response(121-136)agents-core/vision_agents/core/edge/types.py (2)
PcmData(37-183)resample(129-183)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
VideoForwarder(13-188)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (5)
agents-core/vision_agents/core/llm/events.py (2)
RealtimeAudioOutputEvent(37-45)LLMResponseChunkEvent(90-105)agents-core/vision_agents/core/llm/realtime.py (2)
_emit_user_speech_transcription(185-194)_emit_agent_speech_transcription(196-204)agents-core/vision_agents/core/edge/types.py (1)
PcmData(37-183)agents-core/vision_agents/core/utils/video_forwarder.py (2)
start_event_consumer(104-188)start(31-39)agents-core/vision_agents/core/llm/llm.py (1)
_run_one_tool(226-292)
agents-core/vision_agents/core/llm/events.py (1)
agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(50-52)AudioFormat(23-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (19)
plugins/openai/pyproject.toml (1)
14-14: Consider tightening the version constraint to prevent unexpected major version upgrades.
The constraint >=2.5.0 permits any future major version (e.g., 3.0.0), which could introduce breaking changes. Because major releases of OpenAI's Python client can be backwards-incompatible, a safer constraint would be >=2.5.0,<3.0.0 to ensure stability.
Additionally, verify that the PR changes (e.g., new OpenAI realtime event types like RealtimeUserSpeechTranscriptionEvent) actually require 2.5.0 specifically, or whether an earlier 2.x version would suffice. The latest openai package version is 2.5.0 (released Oct 17, 2025), but confirming the minimum required version ensures clarity for future maintainers.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (11)
45-50: LGTM: Clean type hints added. The explicit type annotations improve clarity.
96-132: LGTM: Improved formatting and logging. The multiline formatting and enhanced log messages improve readability.
136-149: LGTM: Formatting improvements. The multiline parameter formatting enhances code readability.
161-228: LGTM: Enhanced logging formatting. The multiline log statements improve readability without changing behavior.
349-362: LGTM: Session configuration enhancement. Enabling semantic VAD and input audio transcription aligns with the PR objectives for enriched event payloads.
326-328: LGTM: Consistent multiline formatting. The multiline formatting for async calls improves readability.
Also applies to: 457-459, 606-608, 627-630
424-475: LGTM: Logging formatting improvements. The multiline log statements are more readable.
477-538: LGTM: Formatting and logging improvements. The multiline formatting enhances code organization and readability.
540-582: LGTM: Spacing and logging improvements. The additional spacing and multiline logs improve code structure.
679-681: LGTM: Logging formatting improvement. The multiline formatting is more readable.
782-782: LGTM: Spacing improvement.

plugins/openai/tests/test_openai_realtime.py (1)
64-77: Nice resampling harness and typed PcmData construction. This aligns with the core PcmData API and exercises realistic paths. LGTM.

agents-core/vision_agents/core/llm/realtime.py (1)
70-74: API change acknowledged: simple_audio_response(..., participant=None). Signature aligns with downstream usage (Agent/OpenAI/Gemini). LGTM.
Please confirm all concrete implementations were updated (OpenAI, Gemini) and no third-party LLMs remain on the old signature.

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)
9-24: LGTM - Improved type coverage. The expanded imports from google.genai.types provide better type annotations throughout the module.
239-264: LGTM - Transcription events properly emitted. The user and agent speech transcription events are correctly emitted using the base class methods. The participant context set in simple_audio_response will be included in user transcription events via user_metadata=self._current_participant in the base class implementation.
130-157: No changes needed—attribute is properly initialized in base class. The verification confirms that _current_participant is initialized in the base class (agents-core/vision_agents/core/llm/realtime.py) as self._current_participant: Optional[Participant] = None. The assignment in simple_audio_response (line 149) safely updates an already-initialized attribute; no AttributeError risk exists.
Likely an incorrect or invalid review comment.

agents-core/vision_agents/core/llm/events.py (2)
11-87: LGTM - String literal formatting and field enrichment. The change from single-quoted to double-quoted string literals in type field defaults is a stylistic consistency improvement. The new fields added to events (is_complete, conversation_item_id, status, role, content, is_recoverable, delta, item_id) are properly typed and provide richer event payloads for consumers.
141-156: LGTM - New transcription events well-structured. The new RealtimeUserSpeechTranscriptionEvent and RealtimeAgentSpeechTranscriptionEvent classes follow consistent patterns with existing events. The separation between user and agent transcriptions provides clear event semantics for consumers. Docstrings are concise and conform to the Google style guide requirement.
user_id = event.user_id() or "user"

await self.conversation.upsert_message(
    # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
    message_id="stt-" + user_id,
    role="user",
    user_id=user_id,
    content=event.text or "",
    completed=True,
    replace=True,  # Replace any partial transcripts
    original=event,
)
STT upserts overwrite each other across utterances — use per‑utterance IDs (drop replace unless streaming).
Static message_id="stt-"+user_id with replace=True collapses all user speech into one message.
- user_id = event.user_id() or "user"
+ user_id = event.user_id() or "user"
@@
- # TODO: FIX THIS, IT IS INCREDIBLY WRONG!
- message_id="stt-" + user_id,
+ message_id=f"stt-{user_id}-{uuid.uuid4()}",
role="user",
user_id=user_id,
content=event.text or "",
completed=True,
- replace=True, # Replace any partial transcripts
+ # replace only if you're purposely reusing the same message_id for streaming
+ replace=False,
    original=event,

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 254 to 265, the
STT upsert is using a static message_id ("stt-"+user_id) and replace=True which
causes all utterances to overwrite one another; change to generate a unique
per-utterance message_id (for example include event.id, a timestamp, or an
incrementing counter) and remove replace=True unless you intentionally want
streaming/partial-replacement semantics; ensure completed is set appropriately
per-utterance and retain original=event.
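For reference, here is a minimal sketch of the per-utterance pattern the diff above suggests; the handler name is illustrative, and the upsert_message signature is taken from the hunks shown:

import uuid

async def on_stt_transcript(self, event) -> None:
    """Record each finished utterance as its own message (illustrative sketch).

    A fresh message_id per utterance means nothing is overwritten,
    so replace=False is the natural pairing.
    """
    user_id = event.user_id() or "user"
    await self.conversation.upsert_message(
        message_id=f"stt-{user_id}-{uuid.uuid4()}",  # unique per utterance
        role="user",
        user_id=user_id,
        content=event.text or "",
        completed=True,
        replace=False,
        original=event,
    )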
async def simple_response(
    self,
    text: str,
    processors: Optional[List[Processor]] = None,
    participant: Optional[Participant] = None,
):
Unused parameter: participant is not utilized.
The participant parameter is added to simple_response but never used or stored, unlike simple_audio_response which sets self._current_participant. If participant tracking is needed for text inputs, the parameter should be stored; otherwise, consider removing it to avoid API confusion.
Apply this diff if participant tracking is intended:
async def simple_response(
self,
text: str,
processors: Optional[List[Processor]] = None,
participant: Optional[Participant] = None,
):
"""
Simple response standardizes how to send a text instruction to this LLM.
Example:
llm.simple_response("tell me a poem about Boulder")
For more advanced use cases you can use the native send_realtime_input
"""
+ self._current_participant = participant
self.logger.info("Simple response called with text: %s", text)
    await self.send_realtime_input(text=text)

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py around lines
113 to 118, the new parameter `participant` on simple_response is unused which
causes API confusion; either persist it like simple_audio_response or remove it.
If participant tracking is intended, set self._current_participant = participant
at the start of simple_response (same behavior as simple_audio_response) and
ensure any type checks/uses downstream rely on that field; otherwise delete the
participant parameter from the method signature and update any callers to avoid
passing it.
 if event.get("type") == "session.created" and "session" in event:
     self.session_info = event["session"]
-    logger.debug(f"Stored session info: {self.session_info}")
+    logger.error(f"Stored session info: {self.session_info}")
Incorrect log level for informational message.
This logs successful session storage at ERROR level, which will pollute error logs and trigger false alarms in monitoring.
Apply this diff:
- logger.error(f"Stored session info: {self.session_info}")
+ logger.info(f"Stored session info: {self.session_info}")

📝 Committable suggestion
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 667,
the code logs a successful session storage using logger.error which is the wrong
severity; change the log level to an informational one (e.g., logger.info or
logger.debug) and update the message if needed to reflect normal operation so
successful session info is not treated as an error.
@pytest.mark.asyncio
async def test_agent_conversation_updates_with_realtime():
Remove @pytest.mark.asyncio — async support is automatic per repo guidelines.
Replace with plain async def tests.
-@pytest.mark.asyncio
async def test_agent_conversation_updates_with_realtime():
@@
-@pytest.mark.asyncio
async def test_simple_response_aggregates_and_returns_realtimeresponse():
@@
-@pytest.mark.asyncio
async def test_wait_until_ready_returns_true_immediately():
@@
-@pytest.mark.asyncio
async def test_close_emits_disconnected_event():
@@
-@pytest.mark.asyncio
 async def test_noop_video_and_playback_methods_do_not_error():

As per coding guidelines.
Also applies to: 266-266, 291-291, 297-297, 317-317, 365-365
🤖 Prompt for AI Agents
In tests/test_realtime_base.py around lines 89-90 (also at lines 266, 291, 297,
317, and 365), remove the leading @pytest.mark.asyncio decorator from each test
function so they become plain async def tests; leave the async def signature
intact and delete only the decorator lines, ensuring no other test code or
imports are changed.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
plugins/deepgram/tests/test_stt.py (1)
144-189: Add docstring to fixture.
The audio_data fixture is missing a docstring. Other fixtures in this file follow the pattern of documenting their purpose. As per coding guidelines.
Apply this diff:

 @pytest.fixture
 def audio_data(mia_mp3_path):
+    """Load and prepare the audio data for testing."""
     from scipy import signal

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (3)
234-241: Don’t stop a shared forwarder.
stop() unconditionally schedules self._forwarder.stop(), which can tear down a shared resource used by other tracks.
Apply this guard:

@@ def stop(self):
-        try:
-            if self._forwarder is not None:
-                asyncio.create_task(self._forwarder.stop())
-                self._forwarder = None
+        try:
+            if self._forwarder is not None:
+                if self._forwarder is self._shared_forwarder:
+                    logger.debug("Shared VideoForwarder detected; skipping stop()")
+                else:
+                    asyncio.create_task(self._forwarder.stop())
+                self._forwarder = None

319-326: Ensure client_secret.value exists; fail fast with a clear error.
If the API changes or returns no client_secret, you may return None and proceed with a bad token.
Apply:

@@ async def _get_session_token(self) -> str:
-    data: dict = resp.json()
-    secret = data.get("client_secret", {})
-    return secret.get("value")
+    data: dict = resp.json()
+    secret = data.get("client_secret") or {}
+    token = secret.get("value")
+    if not token:
+        raise RuntimeError("Missing client_secret.value in session.create response")
+    return token

470-495: _video_sender may be undefined; initialize in __init__ to avoid AttributeError.
start_video_sender uses self._video_sender but it’s not set in __init__. If called before _set_video_track, this can raise.
Add this in __init__ (outside this hunk):

# in RTCManager.__init__
self._video_sender: Optional[Any] = None  # RTCRtpSender; keep Any to avoid extra import

And keep the existing None checks. Optionally, assert that connect() completed or that _set_video_track() ran.

♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
658-658: Log level and redaction for stored session info.
Re‑flagging: logging normal session storage at ERROR is wrong; also avoid dumping full session objects (may contain sensitive fields).
Apply:

- logger.error(f"Stored session info: {self.session_info}")
+ # Avoid logging secrets; log minimal identifiers only
+ safe_id = self.session_info.get("id") if isinstance(self.session_info, dict) else None
+ safe_model = self.session_info.get("model") if isinstance(self.session_info, dict) else None
+ logger.info("Stored session info (id=%s, model=%s)", safe_id, safe_model)

🧹 Nitpick comments (13)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
226-230: Cleaner architecture with local channel variable.
The change from class-level self.channel storage to a local variable passed to StreamConversation improves encapsulation and aligns with the updated StreamConversation constructor signature shown in the relevant code snippets.
Consider adding error handling for the channel.get_or_create() call to gracefully handle potential channel creation failures:

 async def create_conversation(self, call: Call, user, instructions):
     chat_client: ChatClient = call.client.stream.chat
     channel = chat_client.channel(self.channel_type, call.id)
-    await channel.get_or_create(
-        data=ChannelInput(created_by_id=user.id),
-    )
+    try:
+        await channel.get_or_create(
+            data=ChannelInput(created_by_id=user.id),
+        )
+    except Exception as e:
+        self.logger.error(f"Failed to create or retrieve channel: {e}")
+        raise
     self.conversation = StreamConversation(instructions, [], channel)
     return self.conversation

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (5)
100-119: Clarify the unused participant parameter.
The participant parameter is documented as ignored, yet the PR introduces per-participant handling throughout this module. If this parameter exists solely for interface parity with the base Realtime class or other providers, that's reasonable—but if there's a plan to use it for correlation (e.g., tracking which participant sent the text), please clarify or open a follow-up issue.
Do you want me to open an issue to track potential future use of the participant parameter in simple_response, or is the current design intentional?

172-193: LGTM—workaround is well-documented.
The handling of both response.audio_transcript.done and response.output_audio_transcript.done via event mutation is a reasonable workaround for the OpenAI API inconsistency. The GitHub issue reference provides good context.
Optionally, enhance the comment to explain what breaks without the workaround:

- # code here is weird because OpenAI does something strange
- # see issue: https://github.com/openai/openai-python/issues/2698
- # as a workaround we copy the event and set type to response.output_audio_transcript.done so that
- # ResponseAudioTranscriptDoneEvent.model_validate is happy
+ # OpenAI sends response.audio_transcript.done but the SDK expects response.output_audio_transcript.done.
+ # Without this workaround, Pydantic validation fails.
+ # See: https://github.com/openai/openai-python/issues/2698

194-203: Consider clearing _pending_participant after mapping.
If OpenAI doesn't emit a conversation.item.created event for some reason, _pending_participant remains set and could be incorrectly attributed to a subsequent item.
Clear the pending participant immediately after establishing the mapping:

 if item_id and self._pending_participant:
     self._item_to_participant[item_id] = self._pending_participant
+    self._pending_participant = None
     logger.debug(

Also, simplify the logging expression:

- f"Mapped item {item_id} to participant {self._pending_participant.user_id if self._pending_participant else 'None'}"
+ f"Mapped item {item_id} to participant {self._item_to_participant[item_id].user_id}"

275-331: LGTM—tool call handling is well-structured.
The enhanced logging and error handling improve observability. The tool execution flow correctly delegates to _run_one_tool and handles both success and error cases.
Consider making the 30-second timeout configurable rather than hardcoded at line 306:

+ timeout = getattr(self, 'tool_call_timeout', 30)
- tc, result, error = await self._run_one_tool(tool_call, timeout_s=30)
+ tc, result, error = await self._run_one_tool(tool_call, timeout_s=timeout)

This allows callers to adjust the timeout for tools with varying execution times.

333-376: Tool response flow is correct.
The method correctly sanitizes output, sends the tool response, and triggers a follow-up response with audio. The instructions ensure the AI provides a conversational response after receiving tool results.
The hardcoded instruction at line 370 could be made configurable:

+ follow_up_instructions = getattr(
+     self,
+     'tool_response_instructions',
+     "Please respond to the user with the tool results in a conversational way.",
+ )
 await self.rtc._send_event(
     {
         "type": "response.create",
         "response": {
             "modalities": ["text", "audio"],
-            "instructions": "Please respond to the user with the tool results in a conversational way.",
+            "instructions": follow_up_instructions,
         },
     }
 )

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (7)
45-51: Audio input setter looks good; small micro‑opt note. The copy via bytes(pcm_data) is safe. If perf matters, consider accepting memoryview/bytearray to avoid copies, but fine as-is.

649-653: Remove stray print; use structured logging. print() in async handlers will spam stdout and bypass log routing.

- print(f"_handle_event {event['type']}")
+ logger.debug("Event received: %s", event.get("type"))

690-762: Forwarding loop is unused and overly chatty at INFO. Given the switch to StreamVideoForwardingTrack, _forward_video_frames seems unused. Either remove it or downgrade INFO logs inside the loop to DEBUG to avoid log noise.
Minimal noise reduction:

- logger.info(
+ logger.debug(
      f"🎥 Starting video frame forwarding loop (fps={fps}, interval={interval:.3f}s)"
  )
- logger.info(
+ logger.debug(
      f"🎥 Source track: {type(source_track).__name__}, kind={getattr(source_track, 'kind', 'unknown')}"
  )
@@
- logger.info(
+ logger.debug(
      f"🎥 SUCCESS: Read frame #{frame_count} from user track!"
  )
- logger.info(
+ logger.debug(
      f"🎥 Frame details: {frame.width}x{frame.height}, format={frame.format}, pts={frame.pts}"
  )
@@
- logger.info(
+ logger.debug(
      f"🎥 Video forwarding task cancelled after {frame_count} frames"
  )
@@
- logger.info(
+ logger.debug(
      f"🎥 Video forwarding task ended. Total frames processed: {frame_count}"
  )

135-135: Minor: drop f‑string; improve message casing. No interpolation here; also “RTCManager” for consistency.

- logger.warning(f"rtc manager already started", stack_info=True)
+ logger.warning("RTCManager already started", stack_info=True)

125-131: Log level sanity: init messages OK; consider DEBUG for non‑actionable info. Initialization messages with emojis are fine for dev, but might be noisy in prod. Consider DEBUG for the non‑actionable branch.

205-219: Circuit‑breaker logs: keep at ERROR, but include context for triage. Consider adding model/session id to ease correlation when multiple sessions run.

577-585: SDP exchange timeouts look reasonable; consider retry with backoff. Single attempt can be brittle in transient network issues; a short bounded backoff (e.g., 2 tries) can improve resilience.
Also applies to: 597-603
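A small sketch of the bounded-backoff idea; post_sdp, offer, and the retry parameters are illustrative assumptions rather than the RTCManager API:

import asyncio
from typing import Optional

async def post_sdp_with_retry(post_sdp, offer, attempts: int = 2, base_delay: float = 0.5):
    """Retry a single SDP exchange a bounded number of times (illustrative).

    post_sdp is assumed to be an async callable that raises on timeout or
    transient network errors; the delay grows exponentially between attempts.
    """
    last_exc: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await post_sdp(offer)
        except (asyncio.TimeoutError, OSError) as exc:  # transient failures only
            last_exc = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("SDP exchange failed after retries") from last_exc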
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- agents-core/vision_agents/core/agents/agents.py (21 hunks)
- plugins/deepgram/pyproject.toml (1 hunks)
- plugins/deepgram/tests/test_stt.py (7 hunks)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3 hunks)
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py (13 hunks)
- plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
- plugins/silero/pyproject.toml (1 hunks)
- plugins/silero/tests/test_vad.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
plugins/openai/vision_agents/plugins/openai/openai_realtime.py
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
plugins/openai/vision_agents/plugins/openai/rtc_manager.py
agents-core/vision_agents/core/agents/agents.py
plugins/silero/tests/test_vad.py
plugins/deepgram/tests/test_stt.py
🧬 Code graph analysis (5)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (6)
- plugins/openai/tests/test_openai_realtime.py (1): realtime (20-29)
- plugins/openai/vision_agents/plugins/openai/rtc_manager.py (5): RTCManager (246-787), connect (278-301), start_video_sender (470-531), stop_video_sender (533-575), _send_event (440-468)
- agents-core/vision_agents/core/llm/realtime.py (8): connect (68-68), _emit_connected_event (83-97), simple_audio_response (71-73), _emit_agent_speech_transcription (196-204), _emit_response_event (138-156), _emit_user_speech_transcription (185-194), _watch_video_track (75-77), _stop_watching_video_track (79-81)
- agents-core/vision_agents/core/edge/types.py (1): PcmData (37-183)
- agents-core/vision_agents/core/llm/llm.py (2): _run_one_tool (226-292), get_available_functions (180-182)
- agents-core/vision_agents/core/llm/llm_types.py (1): ToolSchema (64-67)

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
- agents-core/vision_agents/core/edge/sfu_events.py (14): user_id (489-493, 856-860, 901-905, 1186-1190, 2071-2075, 2113-2117), track_type (579-583, 1193-1197, 2254-2258), participant (1495-1500, 1537-1542, 1610-1615, 2078-2083, 2127-2132)
- agents-core/vision_agents/core/edge/events.py (1): TrackAddedEvent (17-22)
- plugins/getstream/vision_agents/plugins/getstream/stream_conversation.py (1): StreamConversation (17-353)

agents-core/vision_agents/core/agents/agents.py (6)
- agents-core/vision_agents/core/llm/events.py (4): LLMResponseChunkEvent (90-105), LLMResponseCompletedEvent (109-115), RealtimeUserSpeechTranscriptionEvent (142-147), RealtimeAgentSpeechTranscriptionEvent (151-156)
- agents-core/vision_agents/core/llm/llm.py (1): LLM (35-358)
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py (4): Realtime (37-463), simple_response (100-119), simple_audio_response (121-136), _watch_video_track (266-270)
- agents-core/vision_agents/core/llm/realtime.py (3): Realtime (24-220), simple_audio_response (71-73), _watch_video_track (75-77)
- agents-core/vision_agents/core/agents/conversation.py (2): Conversation (67-227), upsert_message (122-210)
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2): create_conversation (224-231), create_audio_track (291-294)

plugins/silero/tests/test_vad.py (2)
- plugins/deepgram/tests/test_stt.py (3): mia_mp3_path (126-128), audio_data (144-189), close (61-63)
- agents-core/vision_agents/core/vad/events.py (1): VADPartialEvent (44-55)

plugins/deepgram/tests/test_stt.py (3)
- plugins/silero/tests/test_vad.py (1): mia_mp3_path (23-25)
- plugins/moonshine/tests/test_stt.py (1): mia_mp3_path (33-35)
- plugins/deepgram/vision_agents/plugins/deepgram/stt.py (1): close (180-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (17)
plugins/silero/pyproject.toml (1)
39-39: Verify the rationale for swapping audio/vision dependencies.
The documented PyTorch/torchvision pairing suggests torchvision 0.20.0 pairs with PyTorch 2.5.0, whereas this project specifies torch>=2.7.1. While the loose constraint >=0.20.0 will pull modern releases compatible with PyTorch 2.7.x, the substitution of torchaudio (audio processing) with torchvision (computer vision) is atypical for a Silero voice-activity-detection plugin.
Since this is a dev dependency, the risk is lower, but the purpose of this swap should be clarified—is this intentional or a mistake?

plugins/deepgram/pyproject.toml (1)
42-42: Clarify the coordinated dependency swap across plugins.
This mirrors the change in plugins/silero/pyproject.toml, yet substituting torchvision (computer vision) for audio-processing libraries seems incongruent with Deepgram's STT (speech-to-text) focus. The PR summary indicates these are "broader environment adjustments," but the functional purpose is unclear.
Please document why torchvision>=0.20.0 is now required in dev environments for audio-centric plugins, or verify that this is an unintended side effect of a larger refactoring.

plugins/deepgram/tests/test_stt.py (2)
456-459: LGTM! The multiline decorator formatting improves readability while preserving the same functional behavior. This pattern is consistently applied across multiple test functions.
165-165: LGTM! Explicit cleanup of the PyAV container is good practice for resource management.
plugins/silero/tests/test_vad.py (2)
47-74: LGTM: Audio conversion logic is correct. The PyAV-based mp3 to wav conversion properly handles:
- Stereo to mono conversion via channel averaging
- Normalization from int16/int32 to float32 in the [-1.0, 1.0] range required by soundfile
- Frame assembly and concatenation
This implementation mirrors the pattern in plugins/deepgram/tests/test_stt.py (lines 143-188); a condensed sketch of the same conversion follows below.
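For readers outside the test files, a self-contained sketch of that conversion, assuming PyAV is installed and the decoded frames carry planar int16/int32 PCM:

import av
import numpy as np

def mp3_to_mono_float32(path: str) -> np.ndarray:
    """Decode an mp3 to a mono float32 waveform in [-1.0, 1.0] (illustrative)."""
    frames = []
    with av.open(path) as container:
        for frame in container.decode(audio=0):
            samples = frame.to_ndarray()  # (channels, n) for planar layouts
            if samples.ndim == 2 and samples.shape[0] > 1:
                samples = samples.mean(axis=0)  # stereo -> mono via channel averaging
            samples = samples.reshape(-1)
            if samples.dtype == np.int16:
                samples = samples.astype(np.float32) / 32768.0
            elif samples.dtype == np.int32:
                samples = samples.astype(np.float32) / 2147483648.0
            frames.append(samples.astype(np.float32))
    return np.concatenate(frames) if frames else np.zeros(0, dtype=np.float32)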
474-486: LGTM: Logging format refactored for brevity. The duration calculation has been inlined from multiline blocks into single expressions. Functionality is preserved—the ternary operator still safely handles None values for event.duration_ms.

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
152-163: Agent track filtering logic is correct. The conditional filtering to prevent emitting TrackAddedEvent for the agent's own tracks is appropriate and prevents the agent from processing its own audio/video as remote participant input.

107-107: The concern about timing is not valid. While _on_track_published is registered as an event handler during __init__ (line 72), event firing and handler registration are distinct operations. Track published events only fire during an active call after join() is called, which occurs after create_user() has already set self.agent_user_id. All usage patterns in the codebase enforce this sequence: await agent.create_user() → await agent.join(call). Therefore, self.agent_user_id is guaranteed to be non-None when the event handler fires.
Likely an incorrect or invalid review comment.
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (2)
121-136: Participant tracking logic is correct, pending concurrency fix. The approach of setting _pending_participant before sending audio is sound for correlating subsequent OpenAI conversation items with the correct user. However, this depends on the thread safety improvements suggested for lines 68-70 to avoid race conditions when multiple participants send audio concurrently.

427-462: LGTM—tool registration flow is sound. The method correctly retrieves available functions, converts them to OpenAI's format, and registers them with the realtime session. The logging provides good visibility, and error handling ensures registration failures don't break the connection.
agents-core/vision_agents/core/agents/agents.py (6)
4-4: LGTM—imports support the new unified conversation API. The added imports (uuid, llm_events, RealtimeUserSpeechTranscriptionEvent, RealtimeAgentSpeechTranscriptionEvent, STTTranscriptEvent, Conversation) are all used in the new setup hooks and align with the PR's unified messaging objective.
Also applies to: 20-20, 24-25, 32-32, 39-39
305-329: LGTM—turn detection integration is clean. The handler correctly accumulates transcripts when turn detection is enabled and triggers the LLM immediately otherwise. The logic for buffering per-user transcripts and deferring response until turn end is sound; a sketch of this buffering pattern follows below.
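A minimal sketch of that buffering pattern; TurnBuffer and its method names are illustrative, not the Agent's actual attributes:

from collections import defaultdict

class TurnBuffer:
    """Accumulate per-user transcripts and flush on turn end (illustrative)."""

    def __init__(self, turn_detection: bool, trigger_llm):
        self.turn_detection = turn_detection
        self.trigger_llm = trigger_llm  # async callable(text, user_id)
        self.buffers: dict[str, list[str]] = defaultdict(list)

    async def on_transcript(self, user_id: str, text: str) -> None:
        if self.turn_detection:
            self.buffers[user_id].append(text)  # defer until the turn ends
        else:
            await self.trigger_llm(text, user_id)  # respond immediately

    async def on_turn_ended(self, user_id: str) -> None:
        parts = self.buffers.pop(user_id, [])
        if parts:
            await self.trigger_llm(" ".join(parts), user_id)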
343-349: LGTM—deferred conversation creation ensures correct initialization order. Creating the conversation coroutine early (Line 344) but awaiting it after MCP and edge setup (Line 404) ensures event handlers are wired before the conversation exists. This prevents missed events during initialization.
Also applies to: 403-404
448-451: LGTM—minor formatting and logging improvements. These changes add clearer logging, improve code organization with blank lines, and pass the shared_forwarder parameter consistently. All are refinements without functional impact.
Also applies to: 719-721, 728-731, 739-744, 749-751, 758-760, 769-772, 774-777

593-624: LGTM—say method now syncs with conversation. The addition of conversation.upsert_message (Lines 617-624) correctly records say() calls as completed assistant messages. The user_id fallback chain (Line 621) provides appropriate defaults.

817-817: LGTM—formatting and logging cleanup. These changes remove unnecessary blank lines, improve log messages, and clean up docstrings without altering behavior.
Also applies to: 842-842, 848-865, 895-895, 1025-1027
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
342-355: API schema verified—session.update payload is correct. The Realtime API accepts session.update with turn_detection type "semantic_vad" and input_audio_transcription model "whisper-1". The code at lines 342–355 uses the correct field structure; no changes needed.
async def _setup_llm_events(self):
    @self.llm.events.subscribe
    async def on_llm_response_send_to_tts(event: LLMResponseCompletedEvent):
        # Trigger TTS directly instead of through event system
        if self.tts and event.text and event.text.strip():
            await self.tts.send(event.text)

    @self.llm.events.subscribe
    async def on_llm_response_sync_conversation(event: LLMResponseCompletedEvent):
        self.logger.info(f"🤖 [LLM response]: {event.text} {event.item_id}")

        if self.conversation is None:
            return

        # Unified API: handles both streaming and non-streaming
        await self.conversation.upsert_message(
            message_id=event.item_id,
            role="assistant",
            user_id=self.agent_user.id or "agent",
            content=event.text or "",
            completed=True,
            replace=True,  # Replace any partial content from deltas
        )

    @self.llm.events.subscribe
    async def _handle_output_text_delta(event: LLMResponseChunkEvent):
        """Handle partial LLM response text deltas."""

        self.logger.info(
            f"🤖 [LLM delta response]: {event.delta} {event.item_id} {event.content_index}"
        )

        if self.conversation is None:
            return

        # Unified API: streaming delta
        await self.conversation.upsert_message(
            message_id=event.item_id,
            role="assistant",
            user_id=self.agent_user.id or "agent",
            content=event.delta or "",
            content_index=event.content_index,
            completed=False,  # Still streaming
        )
🧩 Analysis chain
Verify item_id is consistently emitted across all LLM providers.
The upsert logic assumes event.item_id is set for both deltas (Line 238) and completions (Line 217). If any provider emits None or omits item_id, each delta creates a new message rather than updating the same message, fragmenting the conversation.
The past review comment indicates this was addressed in commits 39a821d–6fd13c6, but the event type definitions still show item_id: Optional[str]. Run this verification to confirm all active providers populate item_id:
🏁 Script executed:
#!/bin/bash
# Check that all LLM provider plugins emit item_id on chunk and completion events
rg -nP --type=py -C3 'LLMResponse(ChunkEvent|CompletedEvent)\s*\(' | rg -C3 'item_id'

Length of output: 2717
🏁 Script executed:
#!/bin/bash
# Find Anthropic and xAI completion events
echo "=== Anthropic completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/anthropic/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'
echo ""
echo "=== xAI completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/xai/ -A5 | grep -E '(item_id|LLMResponseCompletedEvent)'
echo ""
echo "=== OpenAI chunk events ==="
rg -nP --type=py 'LLMResponseChunkEvent' plugins/openai/ -A5 | grep -E '(item_id|LLMResponseChunkEvent)'
echo ""
echo "=== Gemini item_id variable source ==="
rg -nB10 'self.events.send\(LLMResponseChunkEvent' plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py | grep -E '(item_id\s*=|def |class )'
echo ""
echo "=== upsert_message implementation ==="
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A20 | head -40

Length of output: 3722
🏁 Script executed:
#!/bin/bash
# Get full xAI completion event context
echo "=== xAI completion events (full context) ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/xai/ -A8
echo ""
echo "=== Gemini completion events ==="
rg -nP --type=py 'LLMResponseCompletedEvent' plugins/gemini/ -B2 -A8
echo ""
echo "=== Anthropic completion event full code ==="
rg -nP --type=py 'self.events.send\(LLMResponseCompletedEvent' plugins/anthropic/ -B5 -A5

Length of output: 6406
🏁 Script executed:
#!/bin/bash
# Get full upsert_message implementation to confirm UUID generation
rg -nP --type=py 'def upsert_message' agents-core/vision_agents/ -A50 | head -70

Length of output: 4974
Anthropic and xAI LLM providers must emit item_id in completion events.
Verification confirms the streaming consolidation in agents.py (lines 201-244) is broken for two providers:
- Anthropic (plugins/anthropic/.../anthropic_llm.py:295): LLMResponseCompletedEvent omits item_id entirely
- xAI (plugins/xai/.../llm.py:142, 193): LLMResponseCompletedEvent omits item_id entirely
Both emit chunks with item_id="" (empty string) but completions with no item_id parameter, causing event.item_id = None. When passed to upsert_message() at agents.py line 217, this triggers auto-generation of a new UUID (conversation.py:161), fragmenting the message instead of consolidating deltas.
Gemini and OpenAI correctly emit item_id on both chunks and completions.
Required fixes:
- Add the item_id parameter to the Anthropic completion event (line 295)
- Add the item_id parameter to both xAI completion events (lines 142 and 193)
A hedged sketch of the missing emit is shown below.
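In the sketch, response_id and full_text stand for whatever per-response state each provider already tracks, and other event fields are omitted; the event class and events.send come from the modules referenced above:

from vision_agents.core.llm.events import LLMResponseCompletedEvent

def emit_completion(events, full_text: str, response_id: str) -> None:
    """Send a completion event that shares its item_id with prior chunk events.

    Reusing the chunk events' id lets Conversation.upsert_message update the
    same message instead of forking a new one on completion (illustrative).
    """
    events.send(
        LLMResponseCompletedEvent(
            text=full_text,       # accumulated response text
            item_id=response_id,  # same id used on LLMResponseChunkEvent deltas
        )
    )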
async def _setup_speech_events(self):
    @self.events.subscribe
    async def on_stt_transcript_event_sync_conversation(event: STTTranscriptEvent):
        self.logger.info(f"🎤 [Transcript]: {event.text}")

        if self.conversation is None:
            return

        user_id = event.user_id() or "user"

        await self.conversation.upsert_message(
            message_id=str(uuid.uuid4()),
            role="user",
            user_id=user_id,
            content=event.text or "",
            completed=True,
            replace=True,  # Replace any partial transcripts
            original=event,
        )
Replace logic is inconsistent—remove replace=True for completed, unique transcripts.
Lines 257 and 262 generate a unique message_id per STT transcript but set replace=True. Since each utterance gets a fresh UUID, there is nothing to replace—replace=True has no effect here and signals confused intent.
If transcripts are complete utterances (not streaming updates), use replace=False. If you intend to stream partial transcripts and later replace them with final text, reuse the same message_id for the partial and final versions.
Apply this diff to fix the STT transcript handler:
await self.conversation.upsert_message(
message_id=str(uuid.uuid4()),
role="user",
user_id=user_id,
content=event.text or "",
completed=True,
- replace=True, # Replace any partial transcripts
+ replace=False, # Each transcript is a distinct utterance
original=event,
 )

📝 Committable suggestion
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 246 to 264, the
STT transcript handler generates a new UUID for each transcript but sets
replace=True which is ineffective and misleading; change the call to
upsert_message to use replace=False for these completed, unique transcripts (or
if you intend streaming partial→final transcripts, reuse the same message_id for
partial and final and only set replace=True when writing the final replacement).
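If streaming partial transcripts is ever wanted, here is a sketch of the reuse-the-id variant; partials is assumed to be an async iterable of partial strings, and the upsert_message signature follows the hunks above:

import uuid

async def stream_utterance(conversation, user_id: str, partials, final_text: str):
    """Upsert partials and the final text under one message_id (illustrative)."""
    message_id = f"stt-{user_id}-{uuid.uuid4()}"  # one id for the whole utterance
    async for partial in partials:
        await conversation.upsert_message(
            message_id=message_id,
            role="user",
            user_id=user_id,
            content=partial,
            completed=False,
            replace=True,  # overwrite the previous partial
        )
    await conversation.upsert_message(
        message_id=message_id,
        role="user",
        user_id=user_id,
        content=final_text,
        completed=True,
        replace=True,  # final text replaces the last partial
    )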
async def on_realtime_user_speech_transcription(
    event: RealtimeUserSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [User transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="user",
        user_id=event.user_id() or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )
Replace logic is inconsistent—remove replace=True for completed, unique realtime user transcripts.
Lines 276 and 281 generate a unique message_id per realtime user transcript but set replace=True. Since each transcription gets a fresh UUID, there is nothing to replace.
Apply this diff:
await self.conversation.upsert_message(
message_id=str(uuid.uuid4()),
role="user",
user_id=event.user_id() or "",
content=event.text,
completed=True,
- replace=True,
+ replace=False,
original=event,
 )

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 267 to 283, the
handler creates a fresh UUID for each realtime user transcript but still passes
replace=True to upsert_message, which is inconsistent; remove the replace=True
argument (or set replace=False/omit it) when calling
self.conversation.upsert_message for these completed, unique realtime
transcripts so each new message is inserted instead of attempting to replace a
non-existent prior entry.
async def on_realtime_agent_speech_transcription(
    event: RealtimeAgentSpeechTranscriptionEvent,
):
    self.logger.info(f"🎤 [Agent transcript]: {event.text}")

    if self.conversation is None or not event.text:
        return

    await self.conversation.upsert_message(
        message_id=str(uuid.uuid4()),
        role="assistant",
        user_id=self.agent_user.id or "",
        content=event.text,
        completed=True,
        replace=True,
        original=event,
    )
Replace logic is inconsistent—remove replace=True for completed, unique realtime agent transcripts.
Lines 295 and 300 generate a unique message_id per realtime agent transcript but set replace=True. This is semantically incorrect for the same reasons as the user transcript handler.
Apply this diff:
await self.conversation.upsert_message(
message_id=str(uuid.uuid4()),
role="assistant",
user_id=self.agent_user.id or "",
content=event.text,
completed=True,
- replace=True,
+ replace=False,
original=event,
 )

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 286 to 302, the
realtime agent transcript handler creates a unique message_id for each event but
incorrectly passes replace=True; remove the replace=True flag (or set
replace=False/omit it) when calling conversation.upsert_message for completed
realtime agent transcripts so each transcript is stored as a distinct message
rather than attempting to replace an existing one.
| ListenV1ResultsMetadata, | ||
| ) | ||
| from getstream.video.rtc.track_util import PcmData | ||
| from torchvision.io.video import av |
Critical: torchvision video decoding is deprecated and removed.
Importing av from torchvision.io.video will fail on torchvision 0.24.0+ as video decoding/encoding capabilities have been removed. Install and import PyAV directly instead.
Based on learnings.
Apply this diff:
-from torchvision.io.video import av
+import av

Then ensure PyAV is installed:
pip install av🤖 Prompt for AI Agents
In plugins/deepgram/tests/test_stt.py around line 18, importing av via "from
torchvision.io.video import av" is deprecated and removed; replace that import
with a direct PyAV import ("import av") and update project test/dev dependencies
or CI to ensure the "av" package is installed (e.g., add to requirements/dev or
test extras and run pip install av) so tests can import PyAV successfully.
 elif et == "input_audio_buffer.speech_started":
     # Validate event but don't need to store it
     InputAudioBufferSpeechStartedEvent.model_validate(event)
-    await self.output_track.flush()
+    # await self.output_track.flush()
🛠️ Refactor suggestion | 🟠 Major
Remove or document commented-out code.
Line 236 has # await self.output_track.flush() without explanation. Either remove it if obsolete, or add a comment explaining why it's disabled and whether it should be re-enabled in the future.
If the flush is intentionally disabled:
- # await self.output_track.flush()
+ # TODO: Investigate if output_track.flush() is needed when user starts speaking
+ # Disabled temporarily due to [reason]

Otherwise, remove the line entirely.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/openai_realtime.py around lines
233 to 236, there's a commented-out call "# await self.output_track.flush()"
with no explanation; either remove this dead code if it's obsolete, or replace
the comment with a short explanatory comment stating why flush is disabled
(e.g., causes latency/duplicate events, handled elsewhere) and whether/when it
should be re-enabled; if flush must be conditionally run, implement the small
conditional (e.g., a flag or feature toggle) and document the intended behavior
and side effects in that comment so future readers know why it's not awaited
here.
async def start(self) -> None:
    if self._started:
        logger.warning(f"rtc manager already started", stack_info=True)
        return

    if self._shared_forwarder is not None:
        # Use the shared forwarder
        self._forwarder = self._shared_forwarder
        logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
    else:
        # Create our own VideoForwarder with the input source track (legacy behavior)
        self._forwarder = VideoForwarder(self._source_track, max_buffer=5, fps=self._fps)  # type: ignore[arg-type]
        await self._forwarder.start()

    if self._shared_forwarder is None:
        raise RuntimeError("self._shared_forwarder is None, something is very wrong")

    self._forwarder = self._shared_forwarder
    logger.info(f"🎥 OpenAI using shared VideoForwarder at {self._fps} FPS")
    self._started = True
start() hard‑requires a shared_forwarder but the error message and upstream contract are unclear.
Right now, start() raises RuntimeError when self._shared_forwarder is None, but start_video_sender still treats shared_forwarder as “optional.” Align the contract: either require it explicitly (and validate earlier) or create/own a forwarder here.
Apply one of the following:
Option A — enforce requirement early in start_video_sender (preferred):
@@ async def start_video_sender(self, stream_video_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None) -> None:
- Args:
- stream_video_track: Video track to forward to OpenAI.
- fps: Target frames per second.
- shared_forwarder: Optional shared VideoForwarder to use instead of creating a new one.
+ Args:
+ stream_video_track: Video track to forward to OpenAI.
+ fps: Target frames per second.
+ shared_forwarder: Required. Active VideoForwarder instance to pull frames from.
@@
- try:
+ try:
+ if shared_forwarder is None:
+ logger.error("❌ shared_forwarder is required; pass the active VideoForwarder instance")
+ raise ValueError("shared_forwarder is required for video forwarding")

Option B — keep “optional” param and create your own forwarder here, then let start() accept None and create it:
@@ class StreamVideoForwardingTrack(VideoStreamTrack):
- def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
+ def __init__(self, source_track: MediaStreamTrack, fps: int = 1, shared_forwarder=None):
...
self._forwarder: Optional[VideoForwarder] = None
self._shared_forwarder = shared_forwarder
+ self._owns_forwarder = False
@@
- if self._shared_forwarder is None:
- raise RuntimeError("self._shared_forwarder is None, something is very wrong")
- self._forwarder = self._shared_forwarder
+ if self._shared_forwarder is None:
+ self._forwarder = VideoForwarder()
+ self._owns_forwarder = True
+ else:
+ self._forwarder = self._shared_forwarderCommittable suggestion skipped: line range outside the PR's diff.
     await asyncio.wait_for(
         self._data_channel_open_event.wait(), timeout=5.0
     )
 except asyncio.TimeoutError:
-    logger.warning("Data channel not open after timeout; dropping event")
+    logger.warning(
+        "Data channel not open after timeout; dropping event"
+    )
If the data channel isn’t open, return early; don’t attempt send.
We warn on non‑open readyState but still call send(), causing noisy errors.
Apply:
@@ async def _send_event(self, event: dict):
- if not self._data_channel_open_event.is_set():
+ if not self._data_channel_open_event.is_set():
try:
await asyncio.wait_for(
self._data_channel_open_event.wait(), timeout=5.0
)
except asyncio.TimeoutError:
logger.warning(
"Data channel not open after timeout; dropping event"
)
return
@@
- if self.data_channel.readyState and self.data_channel.readyState != "open":
- logger.warning(
- f"Data channel state is '{self.data_channel.readyState}', cannot send event"
- )
+ if self.data_channel.readyState and self.data_channel.readyState != "open":
+ logger.warning(
+ f"Data channel state is '{self.data_channel.readyState}', dropping event"
+ )
+ return

Also applies to: 461-462
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines
450-456 (and similarly 461-462), the code waits for the data channel open event
and logs a warning on timeout but still proceeds to call send(), causing noisy
errors; modify the flow so that after catching asyncio.TimeoutError you return
early (or skip the send) instead of continuing, and also add an explicit check
of data_channel.readyState (ensure it equals "open") before calling send() to
avoid attempts to send on a non-open channel.
forwarding_track = StreamVideoForwardingTrack(
    stream_video_track, fps, shared_forwarder=shared_forwarder
)
await forwarding_track.start()
start_video_sender assumes shared_forwarder; align docstring + validation.
Docstring says “Optional shared VideoForwarder,” but the code now requires it (forwarding_track.start() will raise). Update the docstring and add an explicit check with a helpful error, or implement the “optional” path (create your own forwarder). See earlier comment with diffs.
🤖 Prompt for AI Agents
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around lines 507
to 511, the start_video_sender implementation assumes a shared_forwarder is
provided but the docstring says it's optional; update either the docstring to
mark shared_forwarder as required or add explicit validation and fallback
creation. Fix by either (a) changing the docstring to state shared_forwarder is
required and raise a clear ValueError if shared_forwarder is None before
creating StreamVideoForwardingTrack, or (b) implement the optional path: if
shared_forwarder is None instantiate a local VideoForwarder (with appropriate
init params), pass it into StreamVideoForwardingTrack, and ensure it is cleaned
up after use; include a helpful error message if creation fails.
import pytest
import soundfile as sf
import torchaudio
from torchvision.io.video import av
Incorrect import: PyAV should be imported directly, not from torchvision.
Importing av from torchvision.io.video will fail. Based on learnings, torchvision 0.24.0 removed deprecated video decoding/encoding capabilities, and the video_reader backend was removed. The correct approach is to import PyAV directly: import av.
Apply this diff to fix the import:
-from torchvision.io.video import av
+import av

This matches the pattern used in plugins/deepgram/tests/test_stt.py (line 125-188), where PyAV is imported directly and used for audio extraction.
📝 Committable suggestion
🤖 Prompt for AI Agents
In plugins/silero/tests/test_vad.py around line 9, the test incorrectly imports
PyAV via torchvision (from torchvision.io.video import av); replace that import
with a direct PyAV import (import av) so the test uses the av package directly;
update any references if necessary to match the direct import and run the tests
to confirm audio extraction works as in plugins/deepgram/tests/test_stt.py.
Actionable comments posted: 1
♻️ Duplicate comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)
472-533: Docstring still claims shared_forwarder is "Optional" but code requires it.
Line 483 documents shared_forwarder as "Optional," but Line 512 calls forwarding_track.start(), which will raise RuntimeError if shared_forwarder is None (see Lines 138-141).
Update the docstring to mark shared_forwarder as required and add explicit validation before Line 509, or implement the optional path by creating a local VideoForwarder when None is passed. See the earlier review comment for detailed diffs.

138-145: Hard requirement on shared_forwarder conflicts with "optional" upstream parameter.
The RuntimeError on Line 139 enforces that shared_forwarder must be non-None, but __init__ (Line 100) accepts None by default and start_video_sender (Line 483) documents it as "Optional." This contract mismatch will cause runtime crashes if callers rely on the "optional" declaration.
Refer to the previous review comment suggesting either (A) validate and raise a clear ValueError early in start_video_sender, or (B) create a local forwarder when None is passed.

461-464: Data channel readyState check still missing early return.
After logging the warning that the data channel state is not "open," the code proceeds to attempt send() on Line 467, which will fail noisily.
Apply this diff to return early:

 if self.data_channel.readyState and self.data_channel.readyState != "open":
     logger.warning(
-        f"Data channel state is '{self.data_channel.readyState}', cannot send event"
+        f"Data channel state is '{self.data_channel.readyState}', dropping event"
     )
+    return

660-660: Successful session storage logged at ERROR level.
Logging successful session storage with logger.error pollutes error logs and will trigger false alarms in monitoring systems.
Apply this diff:

- logger.error(f"Stored session info: {self.session_info}")
+ logger.info(f"Stored session info: {self.session_info}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (22 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
plugins/openai/vision_agents/plugins/openai/rtc_manager.py
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (4)
96-132: LGTM: formatting improvements enhance readability.

The explicit float literals (`0.0` instead of `0.`) and multiline logging blocks improve code clarity without altering behavior.

147-245: LGTM: multiline logging improves readability.

The frame forwarding logic, error handling, and circuit breaker behavior are preserved; only log formatting has changed.

692-763: LGTM: multiline logging enhances readability.

The frame forwarding logic, timeout handling, and throttling remain intact; only log formatting has been improved.

344-357: The code change is correct. Input audio transcription events are intentionally handled downstream.

The script confirms that transcription events are explicitly handled in openai_realtime.py, with an event handler for "conversation.item.input_audio_transcription.completed" at line 204. Multiple tests verify this functionality works correctly, and the feature is documented in the class docstring. The session update in rtc_manager.py enabling transcription is an intentional design choice with full downstream support.
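To make the "handled downstream" flow concrete, here is a hedged sketch of type-based dispatch for realtime events; the handler name, dispatch table, and the "transcript" payload field are hypothetical, and only the event type string comes from the codebase:

```python
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

EventHandler = Callable[[dict], Awaitable[None]]

async def handle_transcription_completed(event: dict) -> None:
    # Reading "transcript" from the payload is an assumption for illustration
    logger.debug("User transcript: %s", event.get("transcript", ""))

# Dispatch table keyed by the realtime event's "type" field
HANDLERS: dict[str, EventHandler] = {
    "conversation.item.input_audio_transcription.completed": handle_transcription_completed,
}

async def dispatch(event: dict) -> None:
    handler = HANDLERS.get(event.get("type", ""))
    if handler is not None:
        await handler(event)
```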
```python
async def _handle_event(self, event: dict) -> None:
    """Minimal event handler for data channel messages."""
    print(f"_handle_event {event['type']}")
```
Remove debug print statement from production code.
The print call bypasses the logging framework and will clutter output. Production code should use the logger exclusively.
Apply this diff:
- print(f"_handle_event {event['type']}")
+ logger.debug(f"Handling event: {event['type']}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"_handle_event {event['type']}") | |
| logger.debug(f"Handling event: {event['type']}") |
🤖 Prompt for AI Agents
```
In plugins/openai/vision_agents/plugins/openai/rtc_manager.py around line 651,
remove the debug print statement `print(f"_handle_event {event['type']}")` and
replace it with a logger call (e.g., logger.debug or logger.info) that emits the
event type via the module logger; ensure the module has a configured logger
(logging.getLogger(__name__)) and use structured/templated logging (including
event type and minimal context) instead of direct print so output goes through
the logging framework.
```
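A minimal sketch of the fix, assuming a module-level logger as the prompt suggests; the surrounding class is elided:

```python
import logging

logger = logging.getLogger(__name__)

async def _handle_event(self, event: dict) -> None:
    """Minimal event handler for data channel messages."""
    # Lazy %-formatting defers string building until DEBUG is actually enabled
    logger.debug("Handling event: %s", event.get("type"))
```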
Commit history (most recent first):

- ec32383 (Neevash Ramdial (Nash), Mon Oct 27 2025): mypy clean up (GetStream#130)
- c52fe4c (Neevash Ramdial (Nash), Mon Oct 27 2025): remove turn keeping from example (GetStream#129)
- e1072e8 (Yarik, Mon Oct 27 2025): Merge pull request GetStream#106 from tjirab/feat/20251017_gh-labeler (feat: Github pull request labeler)
- 5bcffa3 (Thierry Schellenbach, Sat Oct 25 2025): Merge pull request GetStream#119 from GetStream/fix-screensharing (Fix screensharing)
- bfe888f (Thierry Schellenbach, Sat Oct 25 2025): Merge branch 'main' into fix-screensharing
- 406673c (Stefan Blos, Sat Oct 25 2025): Update README (GetStream#118): changed README to LaRaes version; removed arrows from table; added table with people & projects to follow; updated images and links in README.md
- 3316908 (Tommaso Barbugli, Fri Oct 24 2025): Simplify TTS plugin and audio utils (GetStream#123): simplified TTS plugin; AWS Polly TTS plugin; OpenAI TTS plugin; improved audio utils
- 8019c14 (Max Kahan, Fri Oct 24 2025): remove video forwarder lazy init
- ca62d37 (Max Kahan, Thu Oct 23 2025): use correct codec
- 8cf8788 (Max Kahan, Thu Oct 23 2025): rename variable to fix convention
- 33fd70d (Max Kahan, Thu Oct 23 2025): unsubscribe from events
- 3692131 (Max Kahan, Thu Oct 23 2025): remove nonexistent type
- c5f68fe (Max Kahan, Thu Oct 23 2025): cleanup tests to fit style
- 8b3c61a (Max Kahan, Thu Oct 23 2025): clean up resources when track cancelled
- d8e08cb (Max Kahan, Thu Oct 23 2025): fix track republishing in agent
- 0f8e116 (Max Kahan, Wed Oct 22 2025): add tests
- 08e6133 (Max Kahan, Wed Oct 22 2025): ensure video track dimensions are an even number
- 6a725b0 (Thierry Schellenbach, Thu Oct 23 2025): Merge pull request GetStream#122 from GetStream/cleanup_stt (Cleanup STT)
- 5088709 (Thierry Schellenbach, Thu Oct 23 2025): cleanup of stt
- f185120 (Thierry Schellenbach, Thu Oct 23 2025): more cleanup
- 05ccbfd (Thierry Schellenbach, Thu Oct 23 2025): cleanup
- bb834ca (Thierry Schellenbach, Thu Oct 23 2025): more cleanup for stt
- 7a3f2d2 (Thierry Schellenbach, Thu Oct 23 2025): more test cleanup
- ad7f4fe (Thierry Schellenbach, Thu Oct 23 2025): cleanup test
- 9e50cdd (Thierry Schellenbach, Thu Oct 23 2025): large cleanup
- 5f001e0 (Thierry Schellenbach, Thu Oct 23 2025): Merge pull request GetStream#121 from GetStream/fish_stt ([AI-201] Fish speech to text (partial))
- 5d204f3 (Thierry Schellenbach, Thu Oct 23 2025): remove ugly tests
- ee9a241 (Thierry Schellenbach, Thu Oct 23 2025): cleanup
- 6eb8270 (Thierry Schellenbach, Thu Oct 23 2025): fix 48khz support
- 3b90548 (Thierry Schellenbach, Thu Oct 23 2025): first attempt at fish stt, doesnt entirely work just yet
- 95a03e4 (Tommaso Barbugli, Thu Oct 23 2025): Merge branch 'main' of github.com:GetStream/Vision-Agents
- b90c9e3 (Tommaso Barbugli, Wed Oct 22 2025): remove print and double event handling
- b4c0da8 (Thierry Schellenbach, Wed Oct 22 2025): Merge pull request GetStream#117 from GetStream/openrouter ([AI-194] Openrouter)
- a426bc2 (Thierry Schellenbach, Wed Oct 22 2025): skip broken test
- ba6c027 (Thierry Schellenbach, Wed Oct 22 2025): almost working openrouter
- 0b1c873 (Thierry Schellenbach, Wed Oct 22 2025): almost working, just no instruction following
- ce63233 (Thierry Schellenbach, Wed Oct 22 2025): working memory for openai
- 149e886 (Thierry Schellenbach, Wed Oct 22 2025): todo
- e0df1f6 (Thierry Schellenbach, Wed Oct 22 2025): first pass at adding openrouter
- 3d06446 (Thierry Schellenbach, Wed Oct 22 2025): Merge branch 'main' of github.com:GetStream/Vision-Agents
- 4eb8ef4 (Thierry Schellenbach, Wed Oct 22 2025): cleanup ai plugin instructions
- ef55d66 (Thierry Schellenbach, Wed Oct 22 2025): Add link to stash_pomichter for spatial memory
- 9c9737f (Thierry Schellenbach, Tue Oct 21 2025): Merge pull request GetStream#115 from GetStream/fish ([AI-195] Fish support)
- 390c45b (Thierry Schellenbach, Tue Oct 21 2025): cleannup
- 1cc1cf1 (Thierry Schellenbach, Tue Oct 21 2025): happy tests
- 8163d32 (Thierry Schellenbach, Tue Oct 21 2025): fix gemini rule following
- ada3ac9 (Thierry Schellenbach, Tue Oct 21 2025): fish tts
- 61a26cf (Thierry Schellenbach, Tue Oct 21 2025): attempt at fish
- c954409 (Thierry Schellenbach, Tue Oct 21 2025): Merge pull request GetStream#104 from GetStream/bedrock ([AI-192] Bedrock, AWS & Nova)
- c71da10 (Tommaso Barbugli, Tue Oct 21 2025): maybe
- b5482da (Tommaso Barbugli, Tue Oct 21 2025): debugging
- 9a36e45 (Thierry Schellenbach, Tue Oct 21 2025): echo environment name
- 6893968 (Thierry Schellenbach, Tue Oct 21 2025): more debugging
- c35fc47 (Thierry Schellenbach, Tue Oct 21 2025): add some debug info
- 0d6d3fd (Thierry Schellenbach, Tue Oct 21 2025): run test fix
- c3a31bd (Thierry Schellenbach, Tue Oct 21 2025): log cache hit
- 04554ae (Thierry Schellenbach, Tue Oct 21 2025): fix glob
- 7da96db (Thierry Schellenbach, Tue Oct 21 2025): mypy
- 186053f (Thierry Schellenbach, Tue Oct 21 2025): happy tests (merge)
- 4b540c9 (Thierry Schellenbach, Tue Oct 21 2025): happy tests
- b05a60a (Thierry Schellenbach, Tue Oct 21 2025): add readme
- 71affcc (Thierry Schellenbach, Tue Oct 21 2025): rename to aws
- d2eeba7 (Thierry Schellenbach, Mon Oct 20 2025): ai tts instructions
- 98a4f9d (Thierry Schellenbach, Mon Oct 20 2025): small edits
- ab27e48 (Tommaso Barbugli, Mon Oct 20 2025): Ensure user agent is initialized before joining the call (GetStream#113): ensure user agent is initialized before joining the call; wip
- 3cb339b (Tommaso Barbugli, Mon Oct 20 2025): New conversation API (GetStream#102): trying to resurrect; test transcription events for openai; more tests for openai and gemini llm (x2); update py-client; wip; ruff; wip; ruff; snap; another way; another way, a better way; ruff; ruff; rev; ruffit; mypy everything; brief; tests; openai dep bump; snap - broken; nothingfuckingworks; message id; fix test; ruffit
- cb6f00a (Thierry Schellenbach, Mon Oct 20 2025): use qwen
- f84b2ad (Thierry Schellenbach, Mon Oct 20 2025): fix tests
- e61acca (Thierry Schellenbach, Mon Oct 20 2025): testing and linting
- 5f4d353 (Thierry Schellenbach, Mon Oct 20 2025): working
- c2a15a9 (Thierry Schellenbach, Mon Oct 20 2025): Merge branch 'main' of github.com:GetStream/Vision-Agents into bedrock
- a310771 (Thierry Schellenbach, Mon Oct 20 2025): wip
- b4370f4 (Thierry Schellenbach, Mon Oct 20 2025): something isn't quite working
- 2dac975 (Thierry Schellenbach, Mon Oct 20 2025): add the examples
- 6885289 (Thierry Schellenbach, Sun Oct 19 2025): ai realtime docs
- a0fa3cc (Thierry Schellenbach, Sun Oct 19 2025): wip
- b914fc3 (Thierry Schellenbach, Sun Oct 19 2025): fix ai llm
- b5b00a7 (Thierry Schellenbach, Sun Oct 19 2025): work audio input
- ac72260 (Thierry Schellenbach, Sun Oct 19 2025): fix model id
- 2b5863c (Thierry Schellenbach, Sun Oct 19 2025): wip on bedrock
- 8bb4162 (Thierry Schellenbach, Fri Oct 17 2025): next up the connect method
- 7a21e4e (Thierry Schellenbach, Fri Oct 17 2025): nova progress
- 16e8ba0 (Thierry Schellenbach, Fri Oct 17 2025): docs for bedrock nova
- 1025a42 (Bart Schuijt, Fri Oct 17 2025): fix: Update .env.example for Gemini Live (GetStream#108)
- e12112d (Thierry Schellenbach, Fri Oct 17 2025): wip
- fea101a (Bart Schuijt, Fri Oct 17 2025): workflow file update
- bb2d74c (Bart Schuijt, Fri Oct 17 2025): initial commit
- d2853cd (Thierry Schellenbach, Thu Oct 16 2025): always remember pep 420
- 30a8eca (Thierry Schellenbach, Thu Oct 16 2025): start of bedrock branch
- fc032bf (Tommaso Barbugli, Thu Oct 16 2025): Remove cli handler from examples (GetStream#101)
- 39a821d (Dan Gusev, Tue Oct 14 2025): Update Deepgram plugin to use SDK v5.0.0 (GetStream#98): merge test_realtime and test_stt and update the remaining tests; make deepgram.STT.start() idempotent; clean up unused import; use uv as the default package manager over pip (co-authored by Neevash Ramdial (Nash))
- 2013be5 (Tommaso Barbugli, Mon Oct 13 2025): ensure chat works with default types (GetStream#99)
Summary by CodeRabbit
New Features
Enhancements
Updates