Conversation
📝 Walkthrough

A new Mistral Voxtral STT (Speech-to-Text) plugin is introduced for the vision-agents ecosystem, featuring real-time WebSocket-based transcription with language detection and partial transcript support. The PR includes the STT implementation, documentation, an example integration combining Mistral STT with Deepgram TTS and Gemini LLM, project configurations, and integration tests.

Changes
Sequence Diagram

sequenceDiagram
participant User
participant Agent as Agent (Gemini LLM)
participant STT as Mistral STT
participant Mistral as Mistral Voxtral API
participant TTS as Deepgram TTS
User->>Agent: Audio input stream
Agent->>STT: process_audio(pcm_data, participant)
STT->>STT: Resample PCM to target rate
STT->>Mistral: Stream audio via WebSocket
Mistral-->>STT: TranscriptionStreamTextDelta (partial)
STT->>STT: Emit partial transcript
STT->>Agent: Partial transcript event
Mistral-->>STT: TranscriptionStreamDone (final)
STT->>STT: Emit final transcript
STT->>Agent: Final transcript event
Agent->>Agent: Process transcript & generate response
Agent->>TTS: generate_speech(response_text)
TTS-->>Agent: Audio response
Agent->>User: Audio output stream
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py`:
- Around line 225-228: The current bare "except Exception as e" around await
self._connection.end_audio() should be replaced with specific exception
handlers: catch network/timeouts (e.g., ConnectionError and
asyncio.TimeoutError) and the Mistral library's specific exception (e.g.,
mistralai.MistralError or the appropriate error class from mistralai.errors),
logging the exception details via logger.warning; update imports (add asyncio
and the correct mistralai error symbol) and ensure the logger.warning message
includes the caught exception object for context instead of a broad catch-all.
- Around line 242-245: The cleanup currently uses a bare "except Exception as e"
around await self._connection.close(); replace it with handling of specific
exception types and ensure asyncio.CancelledError is not swallowed: import
asyncio, await self._connection.close() inside try, then except
asyncio.CancelledError: raise, and catch only likely concrete exceptions (e.g.,
OSError, RuntimeError, ConnectionError or the specific client library exception)
and log via logger.warning(f"Error closing Mistral connection: {e}"). Reference
the await self._connection.close() call and the logger.warning call when making
the change.
- Around line 155-159: In _handle_done (handling TranscriptionStreamDone), avoid
calling event.text.strip() directly because event.text may be None; first guard
for a falsy value (e.g., text = event.text or "" / use a ternary like event.text
if event.text else "") or check if event.text is truthy before calling .strip(),
then proceed to .strip() and return early if the resulting text is empty—update
the _handle_done implementation to mirror the safe checks used in
_handle_text_delta/_handle_error.
- Around line 114-117: Replace the bare "except Exception as e:" handlers in
_receive_loop and in close() with specific exception clauses: first catch
asyncio.CancelledError and re-raise (e.g., in _receive_loop), then catch
mistralai.models.MistralError to log "Mistral API error" and emit the error
event via _emit_error_event, and finally catch httpx.RequestError (covers
ConnectError/Timeout) to log "Network error" and emit the error event; ensure
you import mistralai.models.MistralError and httpx.RequestError and use those
exception types in the except blocks for the receive loop and both places in
close(); also, in the event handling inside _receive_loop, check "if
event.text:" before calling event.text.strip() to avoid calling strip on None.
🧹 Nitpick comments (2)
plugins/mistral/example/mistral_stt_example.py (2)
42-43: Commented-out code references an unimported module.

The comment mentions `smart_turn.TurnDetection()` but `smart_turn` isn't imported. Users trying to uncomment this line will hit a `NameError`. Consider either adding the import in a comment or removing this line entirely and documenting the turn detection requirement elsewhere.

📝 Suggested fix
     llm=gemini.LLM("gemini-2.0-flash"),
-    # turn_detection=smart_turn.TurnDetection(),  # Required since Mistral STT has no turn detection
+    # Note: Mistral STT has no built-in turn detection. Consider adding:
+    #   from vision_agents.plugins import smart_turn
+    #   turn_detection=smart_turn.TurnDetection(),
 )
47-59: Example hardcodes a 5-second sleep before finishing.

The example joins a call, sleeps for 5 seconds, then immediately finishes. For a demonstration, this is rather brief and doesn't show any actual voice interaction. Consider adding a comment explaining this is just a minimal scaffolding example, or increasing the duration for a more realistic demo.
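If the goal is a slightly more realistic demo, one hedged option (purely illustrative; the agent/call wiring stays whatever the example already sets up, and `DEMO_DURATION_S` is a made-up knob, not part of the plugin):

```python
import asyncio

# NOTE: this example is minimal scaffolding and does not demo a full conversation.
DEMO_DURATION_S = 60  # long enough for a user to actually talk to the agent


async def hold_call_open() -> None:
    """Keep the joined call alive for the demo window instead of a fixed 5-second sleep."""
    try:
        await asyncio.sleep(DEMO_DURATION_S)
    except asyncio.CancelledError:
        # Let Ctrl+C (task cancellation under asyncio.run) end the demo promptly.
        raise
```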
except Exception as e:
    logger.error(f"Error in Mistral receive loop: {e}")
    if not self.closed:
        self._emit_error_event(e, context="receive_loop")
🧩 Analysis chain
🌐 Web query:
mistralai python library WebSocket exceptions realtime API
💡 Result:
The official mistralai Python SDK does not use WebSockets for “realtime”. Its streaming features are Server‑Sent Events (SSE) exposed as a Python generator (and also a context manager)—so you won’t see “WebSocket exceptions” coming from mistralai itself. [1]
What to catch in mistralai (streaming or non-streaming)
1) API/HTTP errors (non‑2xx responses): catch models.MistralError (base class). It exposes message, status_code, headers, body, etc. [2]
2) Network/transport errors: the SDK uses httpx, so catch httpx.RequestError and common subclasses like httpx.ConnectError and httpx.TimeoutException. [2]
Pattern for SSE streaming with robust exception handling
import os
import httpx
from mistralai import Mistral, models
with Mistral(api_key=os.environ["MISTRAL_API_KEY"]) as client:
    try:
        stream = client.beta.conversations.start_stream(...)
        with stream as events:  # ensures the stream is closed [1]
            for event in events:
                handle(event)
    except models.MistralError as e:  # HTTP status / response problems [2]
        print(e.status_code, e.message)
    except httpx.RequestError as e:  # network/timeout/connect errors [2]
        print(type(e).__name__, str(e))

If you are seeing WebSocket exceptions
That’s likely from a different “Realtime API” (e.g., OpenAI Realtime over WebSocket), not from the Mistral mistralai SDK—because mistralai streaming is SSE, not WebSocket. [1]
Sources:
[1] mistralai/client-python README — “Server-sent event streaming” section (generator + context manager)
[2] mistralai/client-python README — “Error Handling” section (MistralError, plus httpx network exceptions)
🏁 Script executed:
# Find the specific file and check for all except Exception instances
fd "stt.py" | grep mistral
# Check the exact lines mentioned in the review
if [ -f "plugins/mistral/vision_agents/plugins/mistral/stt.py" ]; then
echo "=== Lines 110-120 (around line 114-117) ==="
sed -n '110,120p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== Lines 150-160 (around line 157) ==="
sed -n '150,160p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== Lines 220-230 (around line 227) ==="
sed -n '220,230p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== Lines 240-250 (around line 244) ==="
sed -n '240,250p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== All 'except Exception' instances ==="
rg "except Exception" "plugins/mistral/vision_agents/plugins/mistral/stt.py" -n
fi

Repository: GetStream/Vision-Agents
Length of output: 2171
🏁 Script executed:
# Check imports to understand what exceptions might be available from mistralai
fd "stt.py" --path "*mistral*" -x head -50 {} \;Repository: GetStream/Vision-Agents
Length of output: 282
🏁 Script executed:
# Get more context around each exception block to understand what exceptions might be raised
echo "=== Full _receive_loop method (lines 100-125) ==="
sed -n '100,125p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== Full close method (lines 215-260) ==="
sed -n '215,260p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
echo -e "\n=== Imports section (first 40 lines) ==="
sed -n '1,40p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"

Repository: GetStream/Vision-Agents
Length of output: 3814
🏁 Script executed:
# Check if event.text can be None in the TranscriptionStreamDone event
# Look at how it's used elsewhere and what the event type definition might indicate
echo "=== Check _handle_done implementation and context ==="
sed -n '155,165p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"
# Check if there are type hints or documentation about TranscriptionStreamDone.text
echo -e "\n=== Search for TranscriptionStreamDone usage in codebase ==="
rg "TranscriptionStreamDone" --type py -A 3 -B 1
# Check mistralai package docs or type stubs if available
echo -e "\n=== Check for mistralai realtime module files ==="
find . -type f -name "*.py" | xargs grep -l "TranscriptionStreamDone\|RealtimeConnection" | head -10

Repository: GetStream/Vision-Agents
Length of output: 2167
🏁 Script executed:
# Check the mistralai package for what exceptions RealtimeConnection might raise
# Look at the imports and see if we can find any exception handling patterns elsewhere
echo "=== Check for exception handling patterns in other plugins ==="
rg "except.*Error" plugins/ --type py | head -20
echo -e "\n=== Check elevenlabs plugin for comparison ==="
sed -n '1,50p' "plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py" | grep -A 10 "except"
# Search for imports of exceptions in the codebase
echo -e "\n=== Search for imported exceptions ==="
rg "from.*import.*Error\|from.*import.*Exception" plugins/mistral/ --type pyRepository: GetStream/Vision-Agents
Length of output: 2112
🏁 Script executed:
# Let's check if mistralai is installed in the environment and inspect its exception types
pip show mistralai 2>/dev/null && python3 << 'EOF'
try:
    from mistralai.models import MistralError
    from mistralai.extra.realtime import RealtimeConnection
    import httpx
    print("=== Mistralai exception types ===")
    print(f"MistralError: {MistralError.__doc__}")
    print(f"\nhttpx exceptions available:")
    # List common httpx exceptions
    httpx_exceptions = [
        'ConnectError',
        'ReadError',
        'WriteError',
        'PoolTimeout',
        'TimeoutException',
        'RequestError',
        'HTTPError'
    ]
    for exc in httpx_exceptions:
        if hasattr(httpx, exc):
            print(f" - httpx.{exc}")
    # Check RealtimeConnection methods for clues about what they raise
    print(f"\nRealtimeConnection methods:")
    for method in dir(RealtimeConnection):
        if not method.startswith('_'):
            print(f" - {method}")
except ImportError as e:
    print(f"Import error: {e}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: GetStream/Vision-Agents
Length of output: 49
🏁 Script executed:
# Let's check what the actual exceptions might be by looking at the context
# The _receive_loop is iterating over connection.stream events
# Let's see if we can find what exceptions might be raised during event iteration
echo "=== Check async context for _receive_loop ==="
sed -n '90,120p' "plugins/mistral/vision_agents/plugins/mistral/stt.py"

Repository: GetStream/Vision-Agents
Length of output: 1386
Replace bare except Exception as e: with specific exception handling per coding guidelines.
The code violates the guideline requiring specific exception handling (lines 114, 227, 244). Based on the mistralai library, catch:
- `mistralai.models.MistralError` for API/HTTP errors
- `httpx.RequestError` (and subclasses like `httpx.ConnectError`, `httpx.TimeoutException`) for network/transport errors
Example for the _receive_loop() method:
except asyncio.CancelledError:
    logger.debug("Mistral receive loop cancelled")
    raise
except MistralError as e:
    logger.error(f"Mistral API error: {e}")
    if not self.closed:
        self._emit_error_event(e, context="receive_loop")
except httpx.RequestError as e:
    logger.error(f"Network error in Mistral receive loop: {e}")
    if not self.closed:
        self._emit_error_event(e, context="receive_loop")

Apply the same specificity to lines 227 and 244 in the close() method.
Additionally, line 157 calls event.text.strip() before checking for None; guard with if event.text: first.
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 114 - 117,
Replace the bare "except Exception as e:" handlers in _receive_loop and in
close() with specific exception clauses: first catch asyncio.CancelledError and
re-raise (e.g., in _receive_loop), then catch mistralai.models.MistralError to
log "Mistral API error" and emit the error event via _emit_error_event, and
finally catch httpx.RequestError (covers ConnectError/Timeout) to log "Network
error" and emit the error event; ensure you import mistralai.models.MistralError
and httpx.RequestError and use those exception types in the except blocks for
the receive loop and both places in close(); also, in the event handling inside
_receive_loop, check "if event.text:" before calling event.text.strip() to avoid
calling strip on None.
def _handle_done(self, event: TranscriptionStreamDone):
    """Handle end-of-stream event with full transcript."""
    text = event.text.strip()
    if not text:
        return
Potential AttributeError if event.text is None.
Unlike _handle_text_delta which checks if not text: return before operating on the text, and _handle_error which uses a ternary guard, this method calls .strip() directly on event.text. If the API ever returns None, this will raise an AttributeError.
🛡️ Proposed fix
 def _handle_done(self, event: TranscriptionStreamDone):
     """Handle end-of-stream event with full transcript."""
-    text = event.text.strip()
-    if not text:
+    if not event.text:
+        return
+    text = event.text.strip()
+    if not text:
         return

🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 155 - 159,
In _handle_done (handling TranscriptionStreamDone), avoid calling
event.text.strip() directly because event.text may be None; first guard for a
falsy value (e.g., text = event.text or "" / use a ternary like event.text if
event.text else "") or check if event.text is truthy before calling .strip(),
then proceed to .strip() and return early if the resulting text is empty—update
the _handle_done implementation to mirror the safe checks used in
_handle_text_delta/_handle_error.
try:
    await self._connection.end_audio()
except Exception as e:
    logger.warning(f"Error signaling end of audio: {e}")
Avoid bare except Exception as e per coding guidelines.
This catch-all may mask specific issues with signaling end-of-audio. Consider catching more specific exceptions from the mistralai library or at minimum, network-related exceptions like ConnectionError or asyncio.TimeoutError.
As per coding guidelines: "Never write except Exception as e - use specific exception handling."
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 225 - 228,
The current bare "except Exception as e" around await
self._connection.end_audio() should be replaced with specific exception
handlers: catch network/timeouts (e.g., ConnectionError and
asyncio.TimeoutError) and the Mistral library's specific exception (e.g.,
mistralai.MistralError or the appropriate error class from mistralai.errors),
logging the exception details via logger.warning; update imports (add asyncio
and the correct mistralai error symbol) and ensure the logger.warning message
includes the caught exception object for context instead of a broad catch-all.
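A stdlib-only sketch of what a narrower handler around `end_audio()` could look like; the helper name is illustrative and the mistralai-specific error class is deliberately left as a comment until it is confirmed against the installed SDK:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def end_audio_safely(connection) -> None:
    """Signal end-of-audio without hiding unexpected failures behind a broad except."""
    try:
        await connection.end_audio()
    except asyncio.CancelledError:
        raise  # never swallow task cancellation
    except (ConnectionError, OSError, asyncio.TimeoutError) as e:
        # Add the mistralai realtime client's concrete error class to this tuple
        # once it has been verified; don't guess at the import path.
        logger.warning(f"Error signaling end of audio: {e}")
```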
try:
    await self._connection.close()
except Exception as e:
    logger.warning(f"Error closing Mistral connection: {e}")
Avoid bare except Exception as e per coding guidelines.
Same issue here with connection close. Consider specific exception types for connection cleanup failures.
As per coding guidelines: "Never write except Exception as e - use specific exception handling."
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 242 - 245,
The cleanup currently uses a bare "except Exception as e" around await
self._connection.close(); replace it with handling of specific exception types
and ensure asyncio.CancelledError is not swallowed: import asyncio, await
self._connection.close() inside try, then except asyncio.CancelledError: raise,
and catch only likely concrete exceptions (e.g., OSError, RuntimeError,
ConnectionError or the specific client library exception) and log via
logger.warning(f"Error closing Mistral connection: {e}"). Reference the await
self._connection.close() call and the logger.warning call when making the
change.
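A matching sketch for the `close()` path, again with the library-specific exception left to be confirmed (the helper is illustrative, not part of the plugin):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def close_connection_safely(connection) -> None:
    """Close the realtime connection without swallowing cancellation."""
    try:
        await connection.close()
    except asyncio.CancelledError:
        raise  # cleanup must not absorb cancellation
    except (ConnectionError, OSError, RuntimeError) as e:
        # Extend with the concrete mistralai exception type once verified.
        logger.warning(f"Error closing Mistral connection: {e}")
```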
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@plugins/mistral/example/mistral_stt_example.py`:
- Around line 33-48: Update the docstrings for the public helper functions
create_agent and join_call to follow Google-style docstrings: for create_agent
describe Args (kwargs: any configurable creation parameters) and Returns (Agent:
the created Agent instance), and for join_call describe Args (agent: Agent,
call_type: str, call_id: str, **kwargs) and Returns (None), plus a short
one-line summary retained at top; keep wording concise and consistent with
existing project docstring conventions. Ensure the docstrings are placed
immediately under each function signature and include types for each parameter
and the return value.
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py`:
- Around line 119-185: Update the three handler docstrings to Google-style by
adding an Args section that documents the single parameter `event` for each
function: in _handle_text_delta describe event as a TranscriptionStreamTextDelta
containing partial text deltas and any relevant attributes used (e.g., text); in
_handle_done describe event as a TranscriptionStreamDone containing the final
transcript, language, and model; and in _handle_error describe event as a
RealtimeTranscriptionError containing the error details; keep the one-line
summary and add a short Returns/Raises only if the handler returns or raises
(these handlers do not), matching other Google-style docstrings in the codebase.
- Around line 187-217: The code sets self._current_participant in process_audio
but shares self._accumulated_text across all inputs, allowing cross-participant
mixing; update process_audio to detect participant changes and either (a)
enforce a single active participant by rejecting/ignoring audio when
self._current_participant is set to a different Participant, or (b) flush/reset
per-participant buffers by moving _accumulated_text into a per-participant map
(e.g., dict keyed by participant id) and use that map when appending
audio/transcripts; specifically modify process_audio (and related Done/flush
handlers) to check self._current_participant vs incoming participant, and on
mismatch either reset the global _accumulated_text and set the new participant
or switch to per-participant storage to avoid mixing transcripts.
- Around line 223-245: Replace the broad Exception handlers around
self._connection.end_audio() and self._connection.close() with specific
exceptions (e.g., ConnectionError, OSError, asyncio.TimeoutError and any
mistralai-specific exceptions you find) so unexpected errors surface; update the
try/except blocks in the cleanup section that reference
self._connection.end_audio and self._connection.close to catch only those
concrete exception types (and log the error) after verifying the exact
exceptions raised by RealtimeConnection in the mistralai client library or by
running tests.
🧹 Nitpick comments (1)
plugins/mistral/tests/test_mistral_stt.py (1)
18-31: Consider using a context manager or try/finally for cleanup.

If an assertion or exception occurs before `stt.close()` is reached (e.g., during `process_audio`), the WebSocket connection may remain open, potentially causing resource leaks or test interference.

♻️ Proposed fix using try/finally
 async def test_transcribe_chunked_audio(self, mia_audio_48khz_chunked, participant):
     """Test transcription with chunked audio (simulates real-time streaming)."""
     stt = mistral.STT()
     await stt.start()
     session = STTSession(stt)

-    # Send audio in chunks like real-time streaming
-    for chunk in mia_audio_48khz_chunked:
-        await stt.process_audio(chunk, participant)
-        await asyncio.sleep(
-            0.001
-        )  # Simulate real-time pacing, allow receive task to run
-
-    # Close signals end of audio and triggers final transcript
-    await stt.close()
+    try:
+        # Send audio in chunks like real-time streaming
+        for chunk in mia_audio_48khz_chunked:
+            await stt.process_audio(chunk, participant)
+            await asyncio.sleep(
+                0.001
+            )  # Simulate real-time pacing, allow receive task to run
+    finally:
+        # Close signals end of audio and triggers final transcript
+        await stt.close()
async def create_agent(**kwargs) -> Agent:
    """Create the agent with Mistral STT."""
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Mistral Agent", id="agent"),
        instructions="You're a helpful voice AI assistant. Keep replies short and conversational.",
        tts=deepgram.TTS(),  # Uses Deepgram for text-to-speech
        stt=mistral.STT(),  # Uses Mistral Voxtral for speech-to-text
        llm=gemini.LLM("gemini-2.0-flash"),
        # turn_detection=smart_turn.TurnDetection(),  # Required since Mistral STT has no turn detection
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Join the call and start the agent."""
Add Google-style Args/Returns to the public example helpers.
Both create_agent and join_call accept parameters, but their docstrings don’t include Args/Returns. Please expand them to match the project docstring standard.
✍️ Proposed docstring updates
 async def create_agent(**kwargs) -> Agent:
-    """Create the agent with Mistral STT."""
+    """Create the agent with Mistral STT.
+
+    Args:
+        **kwargs: Optional agent configuration overrides.
+
+    Returns:
+        Agent: Configured agent instance.
+    """
@@
 async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
-    """Join the call and start the agent."""
+    """Join the call and start the agent.
+
+    Args:
+        agent: The agent instance to run.
+        call_type: Stream call type.
+        call_id: Stream call identifier.
+        **kwargs: Optional join parameters.
+    """

As per coding guidelines, Docstrings should follow the Google style guide for docstrings.
🤖 Prompt for AI Agents
In `@plugins/mistral/example/mistral_stt_example.py` around lines 33 - 48, Update
the docstrings for the public helper functions create_agent and join_call to
follow Google-style docstrings: for create_agent describe Args (kwargs: any
configurable creation parameters) and Returns (Agent: the created Agent
instance), and for join_call describe Args (agent: Agent, call_type: str,
call_id: str, **kwargs) and Returns (None), plus a short one-line summary
retained at top; keep wording concise and consistent with existing project
docstring conventions. Ensure the docstrings are placed immediately under each
function signature and include types for each parameter and the return value.
def _handle_text_delta(self, event: TranscriptionStreamTextDelta):
    """Handle text delta - emit word-by-word partials, full text on complete."""
    text = event.text
    if not text:
        return

    participant = self._current_participant
    if participant is None:
        logger.warning("Received transcript but no participant set")
        return

    # Accumulate text for complete events
    self._accumulated_text += text

    processing_time_ms: Optional[float] = None
    if self._audio_start_time is not None:
        processing_time_ms = (time.perf_counter() - self._audio_start_time) * 1000

    response = TranscriptResponse(
        model_name=self.model,
        processing_time_ms=processing_time_ms,
    )

    # Emit partial with just the new word/delta (not accumulated)
    text_stripped = text.strip()
    if text_stripped:
        self._emit_partial_transcript_event(text_stripped, participant, response)

    # Check for sentence-ending punctuation - emit complete transcript
    if text.rstrip().endswith((".", "?", "!")):
        accumulated_stripped = self._accumulated_text.strip()
        if accumulated_stripped:
            self._emit_transcript_event(accumulated_stripped, participant, response)
            self._accumulated_text = ""
            self._audio_start_time = None

def _handle_done(self, event: TranscriptionStreamDone):
    """Handle end-of-stream event with full transcript."""
    text = event.text.strip()
    if not text:
        return

    participant = self._current_participant
    if participant is None:
        logger.warning("Received done event but no participant set")
        return

    response = TranscriptResponse(
        language=event.language,
        model_name=event.model,
    )

    self._emit_transcript_event(text, participant, response)
    self._accumulated_text = ""
    self._audio_start_time = None
    self._done_received.set()

def _handle_error(self, event: RealtimeTranscriptionError):
    """Handle error event."""
    error_msg = str(event.error) if event.error else "Unknown Mistral error"
    logger.error(f"Mistral transcription error: {error_msg}")

    error = Exception(error_msg)
    self._emit_error_event(
        error, context="transcription", participant=self._current_participant
    )
    self._audio_start_time = None
Add Google-style Args sections to the handler docstrings.
These handlers accept event parameters but the docstrings omit Args. Please expand them for consistency.
✍️ Docstring alignment
 def _handle_text_delta(self, event: TranscriptionStreamTextDelta):
-    """Handle text delta - emit word-by-word partials, full text on complete."""
+    """Handle text delta - emit word-by-word partials, full text on complete.
+
+    Args:
+        event: The text delta event from Mistral.
+    """
@@
 def _handle_done(self, event: TranscriptionStreamDone):
-    """Handle end-of-stream event with full transcript."""
+    """Handle end-of-stream event with full transcript.
+
+    Args:
+        event: The end-of-stream event from Mistral.
+    """
@@
 def _handle_error(self, event: RealtimeTranscriptionError):
-    """Handle error event."""
+    """Handle error event.
+
+    Args:
+        event: The error event from Mistral.
+    """

As per coding guidelines, Docstrings should follow the Google style guide for docstrings.
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 119 - 185,
Update the three handler docstrings to Google-style by adding an Args section
that documents the single parameter `event` for each function: in
_handle_text_delta describe event as a TranscriptionStreamTextDelta containing
partial text deltas and any relevant attributes used (e.g., text); in
_handle_done describe event as a TranscriptionStreamDone containing the final
transcript, language, and model; and in _handle_error describe event as a
RealtimeTranscriptionError containing the error details; keep the one-line
summary and add a short Returns/Raises only if the handler returns or raises
(these handlers do not), matching other Google-style docstrings in the codebase.
async def process_audio(
    self,
    pcm_data: PcmData,
    participant: Optional[Participant] = None,
):
    """
    Process audio data through Mistral for transcription.

    Args:
        pcm_data: The PCM audio data to process.
        participant: Optional participant metadata.
    """
    if self.closed:
        logger.warning("Mistral STT is closed, ignoring audio")
        return

    await self._connection_ready.wait()

    if self._connection is None or self._connection.is_closed:
        logger.warning("Mistral connection not available")
        return

    resampled = pcm_data.resample(self.sample_rate, 1)
    audio_bytes = resampled.samples.tobytes()

    self._current_participant = participant

    if self._audio_start_time is None:
        self._audio_start_time = time.perf_counter()

    await self._connection.send_audio(audio_bytes)
Guard against cross‑participant transcript mixing.
_current_participant and _accumulated_text are shared across all audio input. If multiple participants interleave audio before a Done event, transcripts can be attributed to the wrong speaker. Consider per‑participant state or enforce a single active participant.
🛡️ Minimal guard (prefer per‑participant state if multi‑speaker is required)
-    self._current_participant = participant
+    if (
+        self._current_participant is not None
+        and participant is not None
+        and participant != self._current_participant
+        and not self._done_received.is_set()
+    ):
+        logger.warning(
+            "Received audio for a new participant while previous transcript is active"
+        )
+        return
+
+    self._current_participant = participant

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def process_audio(
    self,
    pcm_data: PcmData,
    participant: Optional[Participant] = None,
):
    """
    Process audio data through Mistral for transcription.

    Args:
        pcm_data: The PCM audio data to process.
        participant: Optional participant metadata.
    """
    if self.closed:
        logger.warning("Mistral STT is closed, ignoring audio")
        return

    await self._connection_ready.wait()

    if self._connection is None or self._connection.is_closed:
        logger.warning("Mistral connection not available")
        return

    resampled = pcm_data.resample(self.sample_rate, 1)
    audio_bytes = resampled.samples.tobytes()

    if (
        self._current_participant is not None
        and participant is not None
        and participant != self._current_participant
        and not self._done_received.is_set()
    ):
        logger.warning(
            "Received audio for a new participant while previous transcript is active"
        )
        return

    self._current_participant = participant

    if self._audio_start_time is None:
        self._audio_start_time = time.perf_counter()

    await self._connection.send_audio(audio_bytes)
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 187 - 217,
The code sets self._current_participant in process_audio but shares
self._accumulated_text across all inputs, allowing cross-participant mixing;
update process_audio to detect participant changes and either (a) enforce a
single active participant by rejecting/ignoring audio when
self._current_participant is set to a different Participant, or (b) flush/reset
per-participant buffers by moving _accumulated_text into a per-participant map
(e.g., dict keyed by participant id) and use that map when appending
audio/transcripts; specifically modify process_audio (and related Done/flush
handlers) to check self._current_participant vs incoming participant, and on
mismatch either reset the global _accumulated_text and set the new participant
or switch to per-participant storage to avoid mixing transcripts.
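For option (b), a self-contained sketch of per-participant buffering (illustrative only; the real STT class would own this state, and keying by a `participant.id`-style string is an assumption about the Participant type):

```python
from collections import defaultdict
from typing import Dict, Optional


class PerParticipantTranscripts:
    """Accumulate transcript text per participant so interleaved speakers cannot mix."""

    def __init__(self) -> None:
        self._buffers: Dict[str, str] = defaultdict(str)

    def append(self, participant_id: str, delta: str) -> None:
        # Each speaker's deltas land in their own buffer.
        self._buffers[participant_id] += delta

    def flush(self, participant_id: str) -> Optional[str]:
        # Called from the Done / sentence-end handler; returns the finished text, if any.
        text = self._buffers.pop(participant_id, "").strip()
        return text or None
```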
# Signal end of audio to trigger Done event with full transcript
if self._connection and not self._connection.is_closed:
    try:
        await self._connection.end_audio()
    except Exception as e:
        logger.warning(f"Error signaling end of audio: {e}")

# Wait for Done event with timeout
if self._receive_task and not self._done_received.is_set():
    try:
        await asyncio.wait_for(self._done_received.wait(), timeout=5.0)
    except asyncio.TimeoutError:
        logger.debug("Timeout waiting for done event")

if self._receive_task:
    await cancel_and_wait(self._receive_task)
    self._receive_task = None

if self._connection:
    try:
        await self._connection.close()
    except Exception as e:
        logger.warning(f"Error closing Mistral connection: {e}")
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and read the specific file
find . -type f -name "stt.py" | grep mistral

Repository: GetStream/Vision-Agents
Length of output: 121
🏁 Script executed:
# Read lines 223-245 to verify the code
sed -n '223,245p' ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -n

Repository: GetStream/Vision-Agents
Length of output: 1175
🏁 Script executed:
# Look at the imports and class definition to understand the connection type
head -50 ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -n

Repository: GetStream/Vision-Agents
Length of output: 1907
🏁 Script executed:
# Search for all exception handling in the stt.py file to see patterns
grep -n "except" ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -nRepository: GetStream/Vision-Agents
Length of output: 302
🏁 Script executed:
# Look at lines around 111-114 to see what pattern is used elsewhere
sed -n '110,120p' ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -n

Repository: GetStream/Vision-Agents
Length of output: 610
🏁 Script executed:
# Search for other usages of end_audio and close methods in the codebase
rg "end_audio|\.close\(\)" ./plugins/mistral/vision_agents/plugins/mistral/ -B 2 -A 2Repository: GetStream/Vision-Agents
Length of output: 1863
🏁 Script executed:
# Look at the full close method context to understand better
sed -n '218,250p' ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -n

Repository: GetStream/Vision-Agents
Length of output: 1524
🏁 Script executed:
# Search for any documentation or type hints related to RealtimeConnection
rg "RealtimeConnection" ./plugins/mistral/vision_agents/plugins/mistral/ -B 2 -A 5Repository: GetStream/Vision-Agents
Length of output: 1627
🌐 Web query:
Mistral AI RealtimeConnection end_audio close methods exceptions Python
💡 Result:
Official Mistral AI Python SDK (mistralai)
- There is no `RealtimeConnection` class and therefore no `end_audio()` method in the official SDK (it's primarily HTTP + streaming via SSE for some endpoints). (github.com)
- "Close" is done by exiting the `with Mistral(...) as client:` context manager, which closes the underlying HTTPX client(s). (github.com)
- The SDK's documented exception types are:
  - `MistralError` (base for HTTP error responses)
  - `HTTPValidationError` (HTTP 422 validation error)
  - `ResponseValidationError` (response doesn't match expected Pydantic model)
  - network errors from `httpx`: `RequestError`, `ConnectError`, `TimeoutException` (github.com)
If you meant vLLM’s RealtimeConnection (OpenAI-compatible realtime transcription over WebSocket)
- `RealtimeConnection` manages the WebSocket lifecycle and routes events like `session.update`, `input_audio_buffer.append`, and `input_audio_buffer.commit`, but it does not document an `end_audio()` method. (docs.vllm.ai)
- During `input_audio_buffer.append`, it can raise `VLLMValidationError` (e.g., max audio size exceeded, empty audio). Decode failures are caught and turned into an error message back to the client. (docs.vllm.ai)
Citations:
- 1: https://github.com/mistralai/client-python
- 2: https://github.com/mistralai/client-python
- 3: https://github.com/mistralai/client-python?utm_source=openai
- 4: https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/connection/
- 5: https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/connection/
🏁 Script executed:
# Search for exception types that are used in this file
grep -n "RealtimeTranscriptionError\|import.*Error\|from.*error" ./plugins/mistral/vision_agents/plugins/mistral/stt.pyRepository: GetStream/Vision-Agents
Length of output: 241
🏁 Script executed:
# Look at the context around where RealtimeTranscriptionError is handled
sed -n '100,120p' ./plugins/mistral/vision_agents/plugins/mistral/stt.py | cat -n

Repository: GetStream/Vision-Agents
Length of output: 1193
🏁 Script executed:
# Search for any other exception types used with connection-related operations
grep -n "ConnectionError\|OSError\|TimeoutError\|WebSocketException" ./plugins/mistral/vision_agents/plugins/mistral/stt.pyRepository: GetStream/Vision-Agents
Length of output: 111
🏁 Script executed:
# Look for how other async connection operations are handled in the codebase
rg "send_audio|\.connect\(\)" ./plugins/mistral/vision_agents/plugins/mistral/ -B 2 -A 5Repository: GetStream/Vision-Agents
Length of output: 745
Use specific exceptions instead of catch-all Exception in cleanup code.
Lines 227 and 244 catch Exception broadly in the end_audio() and close() calls. Per coding guidelines, use specific exception types so unexpected failures remain visible.
Suggested approach: catch the exception types that RealtimeConnection.end_audio() and .close() actually raise. Common candidates for WebSocket/async operations include ConnectionError, OSError, asyncio.TimeoutError, and any Mistral-specific exceptions (check the mistralai library source or test these methods to confirm).
🤖 Prompt for AI Agents
In `@plugins/mistral/vision_agents/plugins/mistral/stt.py` around lines 223 - 245,
Replace the broad Exception handlers around self._connection.end_audio() and
self._connection.close() with specific exceptions (e.g., ConnectionError,
OSError, asyncio.TimeoutError and any mistralai-specific exceptions you find) so
unexpected errors surface; update the try/except blocks in the cleanup section
that reference self._connection.end_audio and self._connection.close to catch
only those concrete exception types (and log the error) after verifying the
exact exceptions raised by RealtimeConnection in the mistralai client library or
by running tests.
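To pin down which exception types are actually available before narrowing these handlers, a quick local check along these lines can help (the module paths mirror the ones probed in the scripts above and may need adjusting for the installed SDK version):

```python
import importlib

# Inspect the installed mistralai package for error classes the cleanup code could catch.
for module_name in ("mistralai.models", "mistralai.extra.realtime"):
    try:
        module = importlib.import_module(module_name)
        error_names = [name for name in dir(module) if name.endswith("Error")]
        print(f"{module_name}: {error_names}")
    except ImportError as exc:
        print(f"{module_name}: not importable ({exc})")
```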
Summary by CodeRabbit
New Features
Documentation
Tests