Skip to content

New warmup implementation#273

Merged
dangusev merged 12 commits intomainfrom
feature/new-loading
Dec 29, 2025
Merged

New warmup implementation#273
dangusev merged 12 commits intomainfrom
feature/new-loading

Conversation

@dangusev
Copy link
Contributor

@dangusev dangusev commented Dec 29, 2025

What's changed

  • Added Warmable and WarmupCache to load resources once and share them between instances of the same classes in runtime.
  • Updated SileroVAD wrapper to separate the model (which can be shared between calls) and the track state (which must be isolated).
  • Updated smart_turn, vogent and aws.Realtime
  • Enabled Vogent tests

New warmup() approach.

Previously, some plugins had warmup() methods defined, which were called by AgentLauncher.
However, there was no way to share the warmed-up state between different instances of the same class.
E.g., a warmed-up TurnDetection would load the model every time a new call starts.

We need the warmup to be done only once during the lifetime of the program.

This PR adds Warmable and WarmupCache to standardize how the warmup is done and shared within the same process.

How it works:

  1. AgentLauncher holds a single instance of WarmupCache, which stores the warmup data like VAD models, etc.
  2. Warmable objects define two methods:
  • on_warmup - called only once for the same WarmupCache instance for the same class. Here you load the resources and return them
  • on_warmed_up - called every time a new Agent is spawned. Here you receive the warmed up resources and set them back to your object.
  1. AgentLauncher.launch() handles the warmup process under the hood.

Example:

from vision_agents.core.warmup import Warmable, WarmupCache

# Subclass `Warmable` ABC first and implement necessary methods
class SomeWarmable(Warmable[dict]):
    def __init__(self):
        self._resource: dict | None = None

    async def on_warmup(self):
        # This method is called once 
        # Do some loading work here and return some resource (eg. a dict with data)
        resource = {}
        return resource

    def on_warmed_up(self, resource: dict) -> None:
        # Set the resource back to the instance.
        self._resource = resource

Summary by CodeRabbit

  • New Features

    • Central warmup framework for coordinated, idempotent resource loading.
    • Shared VAD session pool to improve VAD efficiency and pooling.
    • Helper to run sync or async callables transparently.
  • Refactor

    • Components adopt a unified warmup lifecycle (on_warmup/on_warmed_up); agents warm dependencies before launch.
    • Agent launcher now ensures warmup is performed prior to agent creation.
  • Tests / Docs

    • Added warmup unit tests, updated integration tests, and documentation on plugin warmup.

✏️ Tip: You can customize this high-level summary in your review settings.

- Added `Warmable` and `WarmupCache` to load resources once and share them between instances of the same classes in runtime.
- Updated existing plugins
- SileroVAD -> SileroVADSession and SileroVADSessionPool
- Updated smart_turn, vogent and aws realtime
- Enabled Vogent tests
@coderabbitai
Copy link

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Walkthrough

Centralizes warmup with a Warmable/WarmupCache framework and await_or_run helper; AgentLauncher now orchestrates component warmup before returning agents. Core base warmup hooks removed; plugins and tests migrated to Warmable on_warmup/on_warmed_up and Silero VAD session pooling.

Changes

Cohort / File(s) Summary
Warmup framework & utils
agents-core/vision_agents/core/warmup.py, agents-core/vision_agents/core/utils/utils.py
Add Warmable[T] and WarmupCache for cached, locked warmups; add await_or_run to uniformly run sync/async callables.
Agent launcher & core agent
agents-core/vision_agents/core/agents/agent_launcher.py, agents-core/vision_agents/core/agents/agents.py, agents-core/vision_agents/core/cli/cli_runner.py
AgentLauncher accepts lazy or awaitable agent creators, uses await_or_run, introduces _warmup_cache and _warmup_agent(agent) to warm components before returning; CLI now calls warmup() before launch().
Base-class warmup removals
agents-core/vision_agents/core/llm/llm.py, agents-core/vision_agents/core/stt/stt.py, agents-core/vision_agents/core/tts/tts.py, agents-core/vision_agents/core/turn_detection/turn_detection.py
Remove default async def warmup() hooks from core base classes; turn detector adds a start guard.
Silero VAD pooling
agents-core/vision_agents/core/vad/silero.py
Replace prepare_silero_vad/SileroVAD with SileroVADSessionPool and SileroVADSession; pool-based model load and per-session state.
AWS Realtime plugin & tests
plugins/aws/vision_agents/plugins/aws/aws_realtime.py, plugins/aws/tests/test_aws_realtime.py
Realtime becomes Warmable[SileroVADSessionPool]; add on_warmup/on_warmed_up, per-instance VAD session; tests use tmp_path, call warmup(), and ensure cleanup.
Fast Whisper plugin
plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py
STT becomes Warmable[Optional[WhisperModel]]; add on_warmup/on_warmed_up for deferred model loading.
Moondream plugin
plugins/moondream/.../moondream_local_processor.py, plugins/moondream/.../moondream_local_vlm.py, plugins/moondream/tests/test_moondream_local.py
Processors and VLM adopt Warmable; replace warmup/_prepare flows with on_warmup/on_warmed_up; methods now raise if model accessed before warmup.
Roboflow plugin
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
Processor adopts Warmable[RFDETR]; add on_warmup/on_warmed_up and remove prior warmup implementation.
Smart Turn plugin & tests
plugins/smart_turn/.../smart_turn_detection.py, plugins/smart_turn/tests/test_smart_turn.py
SmartTurnDetection becomes Warmable[...]; unified warmup loads VAD pool, feature extractor, ONNX session; tests refactored to use pool/session and new fixture.
Vogent plugin & tests
plugins/vogent/.../vogent_turn_detection.py, plugins/vogent/tests/test_vogent.py
VogentTurnDetection becomes Warmable[...]; add on_warmup/on_warmed_up, background processing loop, start/stop lifecycle; tests use new fixture.
Tests: Silero & Warmup
tests/test_silero_vad.py, tests/test_warmup.py
Tests updated to use SileroVADSessionPool.load().session(); add Warmable/WarmupCache tests (cache, no-cache, concurrency).
Docs
DEVELOPMENT.md
Add "Loading Resources in Plugins (warmup)" usage doc (duplicated block present).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Launcher as AgentLauncher
    participant Cache as WarmupCache
    participant Agent as Agent
    participant Comp as WarmableComponent

    rect rgb(240,248,255)
    Note over User,Launcher: Launch flow with centralized warmup
    end

    User->>Launcher: launch()
    activate Launcher
    Launcher->>Launcher: create_agent via await_or_run
    Launcher->>Agent: returns agent instance
    activate Agent
    Agent-->>Launcher: agent
    deactivate Agent

    Launcher->>Launcher: _warmup_agent(agent)
    Launcher->>Cache: cache.warmup(component)
    activate Cache

    alt resource cached
        Cache->>Comp: on_warmed_up(resource)
        Comp->>Comp: apply resource
    else not cached
        Cache->>Cache: acquire per-class lock
        Cache->>Comp: on_warmup()
        activate Comp
        Comp-->>Cache: resource (when ready)
        deactivate Comp
        Cache->>Cache: store resource
        Cache->>Comp: on_warmed_up(resource)
        Comp->>Comp: apply resource
    end
    deactivate Cache

    Launcher-->>User: warmed agent
    deactivate Launcher
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • d3xvn
  • Nash0x7E2
  • Wauplin

Poem

I am the warm hands in the machine's throat,
a slow bright hunger threading iron bone.
It learns the hush of circuits, gathers speech,
then opens like a mouth to name the room.
We feed it light until the silence blooms.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.91% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'New warmup implementation' directly and clearly describes the primary change in the changeset: a new warmup system using Warmable and WarmupCache.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

158-176: Broad exception handling violates coding guidelines.

The except Exception as e pattern should be replaced with specific exception types. For HuggingFace model loading, consider catching OSError, ValueError, or the specific HF exceptions.

🔎 Proposed fix
-        except Exception as e:
+        except (OSError, ValueError, RuntimeError) as e:
             error_msg = str(e)

As per coding guidelines: "Never write except Exception as e - use specific exception handling."

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

179-197: Broad exception handling violates coding guidelines.

Same issue as in LocalVLM. Use specific exception types.

🔎 Proposed fix
-        except Exception as e:
+        except (OSError, ValueError, RuntimeError) as e:
             error_msg = str(e)

As per coding guidelines: "Never write except Exception as e - use specific exception handling."

🧹 Nitpick comments (9)
plugins/moondream/tests/test_moondream_local.py (1)

60-60: Docstring should reference warmup() instead of load().

The test docstring mentions "load()" but the actual method called on line 65 is warmup(). For accuracy and clarity, the docstring should reference the correct method name.

🔎 Suggested fix
-        """Test that load() successfully loads the model."""
+        """Test that warmup() successfully loads the model."""
tests/test_warmup.py (1)

64-86: Good concurrent warmup test, but missing on_warmed_up call count assertions.

The test validates resource sharing and that only one instance triggers on_warmup, but it doesn't verify that on_warmed_up was called for all three instances. Consider adding:

         # Verify that on_warmup() was called only for the first instance
         assert warmable1.on_warmup_calls_count == 1
         assert warmable2.on_warmup_calls_count == 0
         assert warmable3.on_warmup_calls_count == 0
+
+        # Verify that on_warmed_up() was called for all instances
+        assert warmable1.on_warmed_up_calls_count == 1
+        assert warmable2.on_warmed_up_calls_count == 1
+        assert warmable3.on_warmed_up_calls_count == 1
plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (1)

196-206: Avoid hasattr usage; prefer direct attribute access.

Per coding guidelines, hasattr should be avoided. Since Segment and TranscriptionInfo types from faster_whisper should have consistent attributes, consider accessing them directly or using getattr with a default if truly optional.

🔎 Proposed fix
                 response = TranscriptResponse(
-                    confidence=segment.no_speech_prob
-                    if hasattr(segment, "no_speech_prob")
-                    else None,
-                    language=info.language
-                    if hasattr(info, "language")
-                    else self.language,
+                    confidence=segment.no_speech_prob,
+                    language=info.language if info.language else self.language,
                     processing_time_ms=processing_time_ms,
                     audio_duration_ms=buffer_to_process.duration_ms,
                     model_name=f"faster-whisper-{self.model_size}",
                 )
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

30-30: Missing type parameter for Warmable base class.

The Warmable generic should specify the resource type for clarity and type safety.

🔎 Proposed fix
-class LocalVLM(llm.VideoLLM, Warmable):
+class LocalVLM(llm.VideoLLM, Warmable[Any]):

Or preferably, define a proper type alias for the model type.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2)

30-30: Missing type parameter for Warmable.

Same as LocalVLM, specify the generic type parameter.

🔎 Proposed fix
-class LocalDetectionProcessor(VideoProcessorPublisher, Warmable):
+class LocalDetectionProcessor(VideoProcessorPublisher, Warmable[Any]):

137-197: Consider extracting shared model loading logic.

_load_model_sync is nearly identical to the one in LocalVLM. Consider extracting this to a shared utility in moondream_utils.py to reduce duplication.

agents-core/vision_agents/core/vad/silero.py (1)

18-67: Excellent pool-based architecture for resource sharing.

The SileroVADSessionPool cleanly separates the expensive ONNX model loading from per-track session state. The async load() factory correctly offloads blocking I/O and session creation to background threads.

Minor typo in docstring at line 62: "instace" → "instance".

🔎 Proposed fix
         Returns:
-            an instace of `SileroVADSession`
+            an instance of `SileroVADSession`
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (1)

200-204: Avoid catching broad Exception.

Per coding guidelines, use specific exception handling rather than except Exception as e. Consider catching more specific exceptions or at minimum re-raising after logging if this is a last-resort handler.

🔎 Proposed fix
             except asyncio.TimeoutError:
                 # Timeout is expected - continue loop to check shutdown
                 continue
-            except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+            except (ValueError, RuntimeError) as e:
+                logger.error("Error processing audio: %s", e, exc_info=True)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

208-212: Avoid catching broad Exception.

Per coding guidelines, use specific exception handling rather than except Exception as e. This mirrors the same issue in SmartTurnDetection.

🔎 Proposed fix
             except asyncio.TimeoutError:
                 # Timeout is expected - continue loop to check shutdown
                 continue
-            except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+            except (ValueError, RuntimeError) as e:
+                logger.error("Error processing audio: %s", e, exc_info=True)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 515fd64 and 2d39ae2.

📒 Files selected for processing (23)
  • agents-core/vision_agents/core/agents/agent_launcher.py
  • agents-core/vision_agents/core/agents/agents.py
  • agents-core/vision_agents/core/cli/cli_runner.py
  • agents-core/vision_agents/core/llm/llm.py
  • agents-core/vision_agents/core/stt/stt.py
  • agents-core/vision_agents/core/tts/tts.py
  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • agents-core/vision_agents/core/utils/utils.py
  • agents-core/vision_agents/core/vad/silero.py
  • agents-core/vision_agents/core/warmup.py
  • plugins/aws/tests/test_aws_realtime.py
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py
  • plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py
  • plugins/moondream/tests/test_moondream_local.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
  • plugins/smart_turn/tests/test_smart_turn.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/tests/test_vogent.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
  • tests/test_silero_vad.py
  • tests/test_warmup.py
💤 Files with no reviewable changes (4)
  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • agents-core/vision_agents/core/tts/tts.py
  • agents-core/vision_agents/core/llm/llm.py
  • agents-core/vision_agents/core/stt/stt.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/cli/cli_runner.py
  • agents-core/vision_agents/core/utils/utils.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/moondream/tests/test_moondream_local.py
  • plugins/aws/tests/test_aws_realtime.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • agents-core/vision_agents/core/vad/silero.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
  • agents-core/vision_agents/core/agents/agent_launcher.py
  • tests/test_warmup.py
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py
  • plugins/smart_turn/tests/test_smart_turn.py
  • agents-core/vision_agents/core/warmup.py
  • plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py
  • tests/test_silero_vad.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
  • plugins/vogent/tests/test_vogent.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • plugins/moondream/tests/test_moondream_local.py
  • plugins/aws/tests/test_aws_realtime.py
  • tests/test_warmup.py
  • plugins/smart_turn/tests/test_smart_turn.py
  • tests/test_silero_vad.py
  • plugins/vogent/tests/test_vogent.py
🧠 Learnings (1)
📚 Learning: 2025-12-10T19:35:39.238Z
Learnt from: dangusev
Repo: GetStream/Vision-Agents PR: 249
File: agents-core/vision_agents/core/agents/agents.py:1032-1042
Timestamp: 2025-12-10T19:35:39.238Z
Learning: In `agents-core/vision_agents/core/agents/agents.py`, when using `video_track_override_path`, creating a new `VideoFileTrack` for each participant (each call to `_on_track_added`) is intentional to maintain proper track lifecycle semantics tied to each participant.

Applied to files:

  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
🧬 Code graph analysis (15)
agents-core/vision_agents/core/cli/cli_runner.py (2)
agents-core/vision_agents/core/agents/agent_launcher.py (2)
  • warmup (39-54)
  • launch (56-68)
agents-core/vision_agents/core/warmup.py (2)
  • warmup (24-42)
  • warmup (96-105)
agents-core/vision_agents/core/agents/agents.py (1)
agents-core/vision_agents/core/utils/utils.py (1)
  • await_or_run (91-101)
plugins/aws/tests/test_aws_realtime.py (1)
agents-core/vision_agents/core/agents/agent_types.py (1)
  • AgentOptions (15-25)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • TurnDetector (20-82)
  • start (76-78)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (1)
  • Warmable (45-105)
agents-core/vision_agents/core/vad/silero.py (1)
agents-core/vision_agents/core/utils/utils.py (1)
  • ensure_model (45-88)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (2)
agents-core/vision_agents/core/warmup.py (5)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
tests/test_warmup.py (2)
  • on_warmup (16-20)
  • on_warmed_up (22-24)
agents-core/vision_agents/core/agents/agent_launcher.py (2)
agents-core/vision_agents/core/utils/utils.py (1)
  • await_or_run (91-101)
agents-core/vision_agents/core/warmup.py (4)
  • warmup (24-42)
  • warmup (96-105)
  • WarmupCache (13-42)
  • Warmable (45-105)
tests/test_warmup.py (1)
agents-core/vision_agents/core/warmup.py (6)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • WarmupCache (13-42)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (2)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (5)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
agents-core/vision_agents/core/warmup.py (5)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (2)
  • on_warmed_up (200-202)
  • on_warmup (197-198)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2)
  • on_warmed_up (134-135)
  • on_warmup (122-132)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
  • on_warmed_up (112-113)
  • on_warmup (99-110)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (2)
  • on_warmed_up (216-217)
  • on_warmup (212-214)
tests/test_warmup.py (2)
  • on_warmed_up (22-24)
  • on_warmup (16-20)
plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (2)
agents-core/vision_agents/core/warmup.py (5)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
agents-core/vision_agents/core/stt/stt.py (1)
  • STT (16-165)
tests/test_silero_vad.py (3)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/vad/silero.py (3)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
conftest.py (1)
  • skip_blockbuster (29-46)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (4)
agents-core/vision_agents/core/edge/types.py (2)
  • Participant (22-24)
  • stop (47-47)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • TurnDetector (20-82)
  • start (76-78)
  • stop (80-82)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (1)
  • Warmable (45-105)
plugins/vogent/tests/test_vogent.py (2)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
  • start (157-165)
  • stop (167-178)
  • process_audio (180-191)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
agents-core/vision_agents/core/warmup.py (5)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
  • on_warmup (122-132)
  • _load_model_sync (137-197)
  • on_warmed_up (134-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (53)
agents-core/vision_agents/core/cli/cli_runner.py (1)

104-108: LGTM! Warmup and launch flow correctly implemented.

The two-step warmup and launch approach aligns with the new warmup framework. The call_type and call_id parameters are appropriately used later for demo UI and joining the call, rather than during agent creation.

agents-core/vision_agents/core/warmup.py (3)

1-11: LGTM! Clean module structure.

Imports and public API exposure are appropriate for the warmup framework.


13-42: LGTM! Correct double-check locking pattern.

The concurrent-safe lazy loading implementation correctly uses per-class locking and caching. The double-check pattern prevents race conditions when multiple instances of the same class are warmed up concurrently.


45-105: LGTM! Well-designed warmup abstraction.

The Warmable generic interface provides a clean contract for resource loading with strong typing. The lifecycle separation (on_warmup for loading, on_warmed_up for applying) enables effective resource sharing via WarmupCache. The docstring example is clear and helpful.

agents-core/vision_agents/core/agents/agents.py (2)

63-63: LGTM! Correct import for unified callable handling.

The await_or_run utility import supports the refactored _apply method below.


639-639: LGTM! Cleaner unified callable handling.

Using await_or_run simplifies the _apply method and centralizes sync/async handling logic. This improves maintainability and consistency across the codebase.

agents-core/vision_agents/core/utils/utils.py (2)

3-14: LGTM! Appropriate imports and type variables.

The imports and type variables correctly support the generic await_or_run function with proper parameter forwarding and return type preservation.


91-101: LGTM! Solid utility for sync/async unification.

The implementation correctly handles both synchronous and asynchronous callables with proper type safety through ParamSpec and TypeVar. This follows established patterns for maybe-async function handling.

tests/test_silero_vad.py (3)

3-4: LGTM! Imports updated for session pool API.

The imports correctly reflect the migration from prepare_silero_vad to the new SileroVADSessionPool API.


8-8: LGTM! Decorator updated to pytest convention.

The change from @skip_blockbuster to @pytest.mark.skip_blockbuster follows pytest marking conventions more explicitly.


15-16: LGTM! Correct adoption of session pool pattern.

The two-step initialization (load pool, create session) enables efficient resource sharing while maintaining per-test state isolation. This aligns with the new warmup framework design.

Also applies to: 32-33

plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (3)

26-26: LGTM! Correct import for Warmable integration.

The import enables the class to implement the Warmable[RFDETR] interface.


53-53: LGTM! Correct Warmable integration.

The class signature properly extends both VideoProcessorPublisher and Warmable[RFDETR], with the type parameter accurately indicating the model resource type.


212-217: LGTM! Proper Warmable lifecycle implementation.

The on_warmup method correctly loads the model asynchronously in the executor and returns it for caching. The on_warmed_up method cleanly separates resource application from loading, enabling effective resource sharing via WarmupCache.

plugins/vogent/tests/test_vogent.py (4)

2-2: LGTM! Logging import for test observability.

Adding logging support helps with debugging integration test behavior.


13-21: LGTM! Well-structured fixture with proper lifecycle management.

The fixture correctly orchestrates the warmup/start/stop lifecycle and ensures cleanup via try/finally. This provides consistent test setup and teardown.


24-26: LGTM! Appropriate test markers and organization.

The class decorators correctly mark the test as an integration test that makes blocking calls, following pytest conventions for test categorization.


27-62: LGTM! Test correctly migrated to fixture-based setup.

All references to the turn detection instance are consistently updated to use the vogent_turn_detection fixture, while preserving the original test logic. The fixture ensures proper warmup and cleanup.

tests/test_warmup.py (2)

1-25: LGTM! Clean test infrastructure setup.

The DummyWarmable implementation correctly follows the Warmable protocol, tracking call counts to verify caching behavior. The use of random.random() for simulating async work is appropriate for concurrency tests.

One minor note: consider adding a type hint to on_warmed_up(self, resource: Resource) for consistency with the generic parameter in Warmable[Resource].


27-45: Well-structured caching verification test.

The test correctly validates that on_warmup is called only once while on_warmed_up is called on each warmup invocation with the same cache. The identity check (is) properly verifies resource reuse.

plugins/aws/tests/test_aws_realtime.py (2)

22-39: LGTM! Proper lifecycle management with warmup integration.

The fixture correctly uses tmp_path for isolated test model directories, calls warmup() before use, and ensures close() is called via try/finally. This aligns well with the new Warmable lifecycle.


148-163: Consistent fixture implementation for Nova2.

Both test class fixtures now follow the same pattern. The lifecycle handling mirrors TestBedrockRealtime appropriately.

plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (1)

73-92: LGTM! Correct Warmable implementation.

The on_warmup method properly loads the model in a thread pool when not provided at init, and on_warmed_up correctly guards against overwriting an already-provided model.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)

99-113: LGTM! Clean warmup lifecycle implementation.

The on_warmup correctly offloads model loading to a thread and returns the model for caching. The on_warmed_up assignment is straightforward.

Consider adding type hints for better IDE support:

async def on_warmup(self) -> Any:
def on_warmed_up(self, model: Any) -> None:

188-189: LGTM! Clear precondition check.

The ValueError with a descriptive message properly guards against using the VLM before the model is loaded.

plugins/smart_turn/tests/test_smart_turn.py (3)

13-19: LGTM! Clean fixture for SmartTurnDetection lifecycle.

The fixture properly calls warmup(), start(), and stop() in sequence. The yield-based cleanup pattern ensures proper teardown.

Consider whether this fixture should be module-scoped (@pytest.fixture(scope="module")) if SmartTurnDetection initialization is expensive, since multiple tests share the same configuration.


24-32: LGTM! Correctly uses new SileroVADSessionPool API.

The test properly loads the pool asynchronously and creates a session for prediction. The chunk-based iteration with size validation is appropriate.


34-62: LGTM! Good event-driven test pattern.

The subscription-based event capture and subsequent assertion on event order effectively validates turn detection behavior.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2)

122-135: LGTM! Proper Warmable implementation.

The implementation correctly offloads model loading to a thread and returns the model for caching by WarmupCache.


216-217: LGTM! Clear precondition enforcement.

Properly guards the process_video method against unloaded model state.

agents-core/vision_agents/core/vad/silero.py (2)

1-16: LGTM! Clean module setup with proper exports.

The __all__ export list explicitly defines the public API. Constants are well-named and documented.


69-135: LGTM! Clean session implementation with proper state management.

The SileroVADSession correctly manages per-session state (context, hidden state) while reusing the shared ONNX model. The periodic reset via _maybe_reset() prevents drift accumulation.

agents-core/vision_agents/core/agents/agent_launcher.py (4)

1-11: LGTM! Clean imports for warmup infrastructure.

The imports correctly bring in the new WarmupCache, Warmable, and await_or_run utilities.


22-37: LGTM! Proper type hints and initialization.

The flexible type hints for create_agent and join_call accommodate both sync and async factories. The WarmupCache instance is correctly scoped to the launcher for sharing across warmups.


39-54: Verify: warmup() creates an agent that is discarded.

The comment says "dry-run Agent instance" but the agent is created and its components warmed up, then discarded. If create_agent has side effects (e.g., establishing connections, allocating resources), this could cause issues.

Is this intentional? The cached resources from _warmup_cache persist, but the agent instance itself is lost.


70-108: Excellent parallel warmup orchestration.

The _warmup_agent helper elegantly:

  1. Collects warmup tasks for all Warmable components
  2. Executes them in parallel via asyncio.gather
  3. Uses the shared _warmup_cache for resource deduplication

The isinstance(component, Warmable) checks are appropriate here for duck-typing detection.

plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (6)

45-50: LGTM on the Warmable integration!

The multiple inheritance from TurnDetector and Warmable with the appropriately typed tuple resource is well-structured. The generic type parameter clearly documents what resources are managed.


121-124: Clean initialization of optional state fields.

Properly typed as | None with explicit None defaults. This aligns with the warmup lifecycle where resources are set via on_warmed_up.


125-143: Well-structured async warmup with proper non-blocking I/O.

The use of asyncio.to_thread for blocking operations (directory creation, WhisperFeatureExtractor init, ONNX session building) prevents event loop starvation. The sequential loading order is correct: VAD pool → model path → feature extractor → ONNX session.


145-154: Correct resource binding in on_warmed_up.

The tuple unpacking and session creation from pool follows the Warmable contract. Each instance gets its own stateful SileroVADSession while sharing the underlying ONNX model.


227-228: Good defensive check for uninitialized VAD session.

The explicit ValueError with a descriptive message ensures fast failure if the warmup lifecycle is violated. This is the right pattern for catching misuse early.


372-377: Consistent null checks for required resources.

The explicit checks for _whisper_extractor and _smart_turn before use provide clear error messages. This matches the pattern used for _vad_session.

plugins/aws/vision_agents/plugins/aws/aws_realtime.py (4)

87-87: Correct Warmable integration with SileroVADSessionPool.

The class signature properly inherits from both realtime.Realtime and Warmable[SileroVADSessionPool], enabling the warmup lifecycle for VAD pool management.


167-171: Clean state initialization for connection tracking and VAD session.

The new instance attributes for connected, last_connected_at, _last_audio_at, and _vad_session are properly typed and initialized to safe defaults.


200-202: LGTM on on_warmed_up implementation.

Correctly creates a per-instance VAD session from the shared pool, following the Warmable contract.


393-394: Good defensive check for VAD initialization.

The explicit ValueError when _vad_session is None provides a clear error message for lifecycle violations.

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (7)

33-35: Correct Warmable integration with typed resource tuple.

The class properly inherits from TurnDetector and Warmable with a clear 3-tuple type for the pooled VAD, Whisper model, and Vogent detector.


116-149: Well-structured async warmup pipeline.

The sequential loading of VAD pool → Whisper model → Vogent detector with proper asyncio.to_thread wrapping prevents blocking the event loop. The lambda wrapping is necessary for the thread context.


151-155: Correct resource binding pattern.

The tuple unpacking directly assigns _whisper and _vogent while creating a session from vad_pool. This follows the Warmable contract appropriately.


157-165: LGTM on start() implementation.

Launches background processing and calls parent's start. Mirrors the pattern in SmartTurnDetection.


167-178: Good cleanup in stop().

Properly cancels the background task and handles CancelledError. The shutdown event is set before cancellation.


236-237: Good defensive check for VAD session.

Consistent with the pattern used in other files, raising ValueError for uninitialized resources.


397-398: Correct error handling for uninitialized Vogent.

The explicit ValueError matches the expected pattern for warmup lifecycle violations.

Copy link
Member

@Nash0x7E2 Nash0x7E2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangusev looks good, please double check our internal docs and add something on how the warmup/warmable flow works since it's a departure from the current norm everyone would be familiar with. Other than that, LGTM

@dangusev
Copy link
Contributor Author

@dangusev looks good, please double check our internal docs and add something on how the warmup/warmable flow works since it's a departure from the current norm everyone would be familiar with. Other than that, LGTM

Good point @Nash0x7E2 .
I added a paragraph about warmup to DEVELOPMENT.md.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2d39ae2 and 8bb8ace.

📒 Files selected for processing (1)
  • DEVELOPMENT.md
🧰 Additional context used
🪛 LanguageTool
DEVELOPMENT.md

[grammar] ~509-~509: Ensure spelling is correct
Context: ...essors using YOLO models In order to standartise how these resources are loaded and to m...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Ruff
🔇 Additional comments (1)
DEVELOPMENT.md (1)

501-551: Well-documented warmup pattern with clear example.

The new section effectively explains the Warmable framework and provides a concrete, working example. The code demonstrates proper inheritance, method signatures, and lifecycle documentation. The explanation helps plugin developers understand when each hook is called and how resource sharing works.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

137-146: Comment contradicts code: compile_model value.

Line 137 comment states compile_model=False, but line 140 actually passes compile_model=True. Update the comment to match the implementation or explain why compilation is enabled despite potential edge case issues.

🔎 Proposed fix
         logger.info("Loading Vogent turn detection model")
-        # Note: compile_model=False to avoid torch.compile issues with edge cases
+        # Note: compile_model=True to improve inference performance
         vogent = await asyncio.to_thread(
             lambda: VogentDetector(
                 compile_model=True,
♻️ Duplicate comments (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (1)

156-161: Start method relies on downstream validation.

Per the past review comment, this method lacks an explicit warmup guard. However, the downstream methods _process_audio_packet (line 227-228) and _blocking_predict_turn_completed (lines 373-377) validate initialization state and raise clear ValueErrors. Given that AgentLauncher.launch() manages the warmup lifecycle (per PR context), this approach is acceptable in practice. Consider adding a docstring noting the warmup prerequisite if you prefer explicit documentation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8bb8ace and cbb396d.

📒 Files selected for processing (5)
  • DEVELOPMENT.md
  • agents-core/vision_agents/core/vad/silero.py
  • plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/vad/silero.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
  • plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py
🧠 Learnings (1)
📚 Learning: 2025-10-13T22:00:34.300Z
Learnt from: dangusev
Repo: GetStream/Vision-Agents PR: 98
File: plugins/deepgram/vision_agents/plugins/deepgram/stt.py:135-150
Timestamp: 2025-10-13T22:00:34.300Z
Learning: In the Deepgram STT plugin (plugins/deepgram/vision_agents/plugins/deepgram/stt.py), the `started()` method is designed to wait for the connection attempt to complete, not to guarantee a successful connection. It's acceptable for the connection attempt to fail, and downstream code handles the case where `self.dg_connection` is `None`. The `_connected_once` event is set in the `finally` block intentionally to signal attempt completion.

Applied to files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
🧬 Code graph analysis (3)
agents-core/vision_agents/core/vad/silero.py (1)
agents-core/vision_agents/core/utils/utils.py (1)
  • ensure_model (45-88)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (7)
agents-core/vision_agents/core/agents/agent_types.py (2)
  • AgentOptions (15-25)
  • default_agent_options (32-33)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (22-24)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (1)
  • Warmable (45-105)
plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (2)
  • on_warmup (73-88)
  • on_warmed_up (90-92)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (2)
  • on_warmup (197-198)
  • on_warmed_up (200-202)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2)
  • on_warmup (122-132)
  • on_warmed_up (134-135)
plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (3)
agents-core/vision_agents/core/warmup.py (5)
  • warmup (24-42)
  • warmup (96-105)
  • Warmable (45-105)
  • on_warmup (76-82)
  • on_warmed_up (85-94)
plugins/fast_whisper/tests/test_fast_whisper_stt.py (1)
  • stt (14-21)
tests/test_warmup.py (2)
  • on_warmup (16-20)
  • on_warmed_up (22-24)
🪛 LanguageTool
DEVELOPMENT.md

[style] ~508-~508: Consider a more concise word here.
Context: ...- Video processors using YOLO models In order to standardise how these resources are loa...

(IN_ORDER_TO_PREMIUM)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
🔇 Additional comments (21)
DEVELOPMENT.md (1)

500-551: Warmup documentation is clear and helpful.

The new section effectively explains the Warmable pattern with good motivation (Silero VAD, YOLO models) and a concrete, well-commented code example. The example correctly demonstrates generic typing (Warmable[WhisperModel]), proper method signatures (on_warmup async, on_warmed_up sync), and the lifecycle. This should help contributors understand the pattern quickly.

Minor style note: Line 509 could use "To standardise" instead of "In order to standardise" for conciseness, but the current phrasing is perfectly acceptable.

plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (9)

1-29: LGTM: Imports and constants align with warmup pattern.

The additions properly support the new Warmable lifecycle and VAD session pooling. The public constants for the ONNX model (filename and URL) follow established patterns seen in other warmup-enabled components.


45-50: LGTM: Class signature correctly implements Warmable.

The generic type parameter accurately describes the three-resource tuple (VAD pool, feature extractor, ONNX session) that will be loaded during warmup and bound during initialization. This matches the pattern established in other warmup-enabled plugins.


121-123: LGTM: State fields properly initialized.

The three resource fields correctly mirror the Warmable type parameter and follow the pattern of being None until on_warmed_up is called. The optional typing is appropriate for the lifecycle model.


125-143: LGTM: on_warmup implementation follows best practices.

Properly uses asyncio.to_thread for all blocking operations (directory creation, feature extractor initialization, ONNX session loading) to avoid blocking the event loop. Resource loading is well-structured and the return type matches the Warmable contract.


145-154: LGTM: on_warmed_up correctly binds resources.

Properly unpacks the warmed resources and creates a new VAD session from the pool (line 152). This per-instance session pattern is correct, enabling independent state management while reusing the shared ONNX model from the pool.


227-228: LGTM: Excellent defensive validation.

Clear guards with actionable error messages ensure that any attempt to process audio without proper warmup fails fast with explicit guidance ("call warmup() first"). This defensive approach catches misconfiguration early and provides developers with clear remediation steps.

Also applies to: 373-377


250-252: LGTM: VAD inference properly offloaded.

Correctly uses asyncio.to_thread for the potentially CPU-intensive VAD prediction, avoiding event loop blocking. The session is validated earlier in the method (line 227-228).


398-398: LGTM: ONNX inference uses warmed-up session.

Correctly references self._smart_turn which is initialized during warmup and validated earlier in the method (lines 376-377).


405-418: LGTM: ONNX session builder uses best practices.

Loading the model into memory (lines 409-410) before creating the session is the correct approach for multi-worker scenarios, avoiding file access contention. Session options are appropriately configured for inference workloads.

agents-core/vision_agents/core/vad/silero.py (3)

1-11: LGTM: Imports and exports properly structured.

The typing_extensions.Self import addresses Python 3.10 compatibility (per past review). The updated __all__ appropriately exposes the new pool/session pattern and the SILERO_CHUNK constant for consumers.


18-66: LGTM: Pool pattern correctly implements resource sharing.

The SileroVADSessionPool properly:

  • Loads the ONNX model once during async initialization (lines 46-50)
  • Uses asyncio.to_thread to avoid blocking the event loop during ONNX session creation
  • Provides a session() factory that creates stateful instances sharing the underlying model
  • Configures appropriate ONNX session options (single inter-op thread)

The type: ignore comments on lines 46-47 are acceptable given mypy's current limitations with asyncio.to_thread return types.


69-135: LGTM: Session refactoring maintains thread-safety.

The SileroVADSession correctly:

  • Accepts a shared ONNX model in the constructor (line 74)
  • Maintains per-instance state (_state, _context) for thread-safe concurrent usage
  • Consistently references the new field names throughout (_model, _context_size, _reset_interval_seconds)
  • Preserves the public predict_speech API for backward compatibility

This refactoring enables efficient resource sharing while maintaining independent state per session.

plugins/fast_whisper/vision_agents/plugins/fast_whisper/stt.py (1)

14-14: LGTM: Warmable integration looks correct.

The import and class signature changes properly integrate the Warmable protocol with the appropriate generic type parameter.

Also applies to: 27-27

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (7)

2-2: LGTM: Imports and Warmable integration are correct.

The new imports support the Warmable lifecycle, and the class signature properly declares the tuple of resources (VAD pool, Whisper model, Vogent detector) that will be warmed up.

Also applies to: 6-6, 8-8, 16-18, 33-35


53-55: LGTM: Type-safe model size parameter.

The Literal type annotation correctly constrains the Whisper model size parameter to valid values.


105-114: LGTM: Attributes properly initialized.

The new attributes for Whisper configuration and runtime resources are appropriately initialized to None and will be populated during the warmup lifecycle.


151-155: LGTM: Resources properly bound to instance.

The on_warmed_up() method correctly unpacks the warmed resources and obtains a session from the VAD pool for this instance.


157-178: LGTM: Lifecycle management is robust.

The start() and stop() methods properly manage the background processing task with appropriate cancellation handling and cleanup.


236-237: LGTM: Initialization checks are consistent.

All three resources (VAD, Whisper, Vogent) now have explicit initialization checks that raise ValueError with clear messages. This addresses the past review concern about silent failures and ensures consistent error handling.

Also applies to: 361-362, 397-398


366-368: LGTM: Configurable language parameter.

The transcription now uses self._whisper_language instead of a hardcoded "en", making the language setting configurable per instance.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

193-213: Use specific exception handling instead of generic Exception.

Line 211 violates the coding guideline: "Never write except Exception as e - use specific exception handling." The generic catch could mask unexpected errors. Consider catching specific exceptions like ValueError, RuntimeError, or model-specific errors that might occur during audio processing.

🔎 Proposed refactor
             except asyncio.TimeoutError:
                 # Timeout is expected - continue loop to check shutdown
                 continue
-            except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+            except (ValueError, RuntimeError, OSError) as e:
+                logger.error(f"Error processing audio: {e}", exc_info=True)
+            except Exception as e:
+                # Catch-all for truly unexpected errors, but log with full traceback
+                logger.exception(f"Unexpected error processing audio: {e}")
+                raise  # Re-raise unexpected errors

Note: Adjust the specific exception types based on what _process_audio_packet, VAD, Whisper, and Vogent can raise.

Based on coding guidelines, ...

🧹 Nitpick comments (1)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

101-102: Consider more specific type annotations.

The asyncio.Queue[Any] and Optional[asyncio.Task[Any]] use generic Any types. For better type safety, consider:

self._audio_queue: asyncio.Queue[tuple[PcmData, Participant, Optional[Conversation]]] = asyncio.Queue()
self._processing_task: Optional[asyncio.Task[None]] = None
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cbb396d and d0e3151.

📒 Files selected for processing (1)
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
🔇 Additional comments (5)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (5)

1-24: LGTM!

The imports and constants are well-organized and appropriate for the Warmable-based audio processing pipeline.


214-338: LGTM!

The audio processing pipeline is well-structured: VAD-based speech detection, intelligent segment assembly with silence tracking, and proper offloading of CPU-intensive operations to the thread pool. The initialization check at line 236 provides clear error messaging.


349-378: Excellent improvement!

The past review comment about silent failure has been properly addressed. Lines 362-364 now raise a clear ValueError when the Whisper model is uninitialized, providing consistent error handling with _predict().


380-445: LGTM!

Consistent initialization checks and proper thread-pool offloading for CPU-intensive Vogent predictions. The _get_previous_line helper cleanly extracts conversational context with appropriate text preprocessing.


157-165: Lifecycle guarantee is provided by the warmup framework.

The AgentLauncher and Warmable framework ensure that on_warmed_up() completes before start() can be called. WarmupCache.warmup() (line 42 in warmup.py) explicitly calls on_warmed_up(resource) and waits for completion before returning. AgentLauncher.launch() awaits _warmup_agent() before returning the agent instance. The defensive check at line 236 is sound practice but the error condition will not occur with correct API usage.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

136-146: A whisper of contradiction in the margins.

The comment on line 137 states "compile_model=False to avoid torch.compile issues" but the code sets compile_model=True on line 140. This discrepancy could confuse future maintainers. Either update the comment to reflect the current choice, or revert to compile_model=False if the edge cases remain a concern.

🔎 Proposed fix
         logger.info("Loading Vogent turn detection model")
-        # Note: compile_model=False to avoid torch.compile issues with edge cases
+        # Note: compile_model=True enables torch.compile for better performance
         vogent = await asyncio.to_thread(
             lambda: VogentDetector(
                 compile_model=True,
🧹 Nitpick comments (4)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)

235-236: A message from another era, still speaking of the old ways.

The error message references warmup() but the new lifecycle uses on_warmup/on_warmed_up managed by WarmupCache. Consider updating to clarify the actual mechanism (e.g., "VAD model not initialized; ensure warmup completed via AgentLauncher").

🔎 Proposed fix
         if self._vad_session is None:
-            raise ValueError("The VAD model is not initialized, call warmup() first")
+            raise ValueError("VAD model not initialized; ensure warmup completed via AgentLauncher")

210-211: The net cast too wide catches all manner of darkness.

Per coding guidelines, avoid except Exception as e. In a background loop, broad catches prevent task death, but consider catching specific expected exceptions (e.g., ValueError, RuntimeError) and letting unexpected ones propagate or at least logging them at a higher severity level.

🔎 Proposed fix
             except asyncio.TimeoutError:
                 # Timeout is expected - continue loop to check shutdown
                 continue
-            except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+            except (ValueError, RuntimeError) as e:
+                logger.error(f"Error processing audio: {e}")
+            except Exception:
+                logger.exception("Unexpected error in audio processing loop")
+                raise

Based on coding guidelines, specific exception handling is preferred.

plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (2)

226-227: The guard stands, though its words echo an older tongue.

The ValueError ensures fail-fast if warmup hasn't completed. Same observation as Vogent: the message references "warmup()" but the actual mechanism is on_warmup/on_warmed_up via AgentLauncher. Consider updating for clarity.


202-203: The net again cast too wide, swallowing all that stirs.

Per coding guidelines, avoid except Exception as e. Consider catching specific expected exceptions and letting unexpected ones propagate with proper logging. See the similar comment on Vogent for a suggested fix pattern.

Based on coding guidelines, specific exception handling is preferred.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d0e3151 and 11bcd2d.

📒 Files selected for processing (3)
  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
🧠 Learnings (1)
📚 Learning: 2025-10-13T22:00:34.300Z
Learnt from: dangusev
Repo: GetStream/Vision-Agents PR: 98
File: plugins/deepgram/vision_agents/plugins/deepgram/stt.py:135-150
Timestamp: 2025-10-13T22:00:34.300Z
Learning: In the Deepgram STT plugin (plugins/deepgram/vision_agents/plugins/deepgram/stt.py), the `started()` method is designed to wait for the connection attempt to complete, not to guarantee a successful connection. It's acceptable for the connection attempt to fail, and downstream code handles the case where `self.dg_connection` is `None`. The `_connected_once` event is set in the `finally` block intentionally to signal attempt completion.

Applied to files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
🧬 Code graph analysis (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (7)
agents-core/vision_agents/core/agents/agent_types.py (2)
  • AgentOptions (15-25)
  • default_agent_options (32-33)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (22-24)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • TurnDetector (20-82)
  • start (74-78)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnStartedEvent (11-25)
agents-core/vision_agents/core/utils/utils.py (1)
  • ensure_model (45-88)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (1)
  • Warmable (45-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (11)
agents-core/vision_agents/core/turn_detection/turn_detection.py (1)

74-78: The double-start guard emerges, a sentinel at the gate.

This addition prevents orphaned background tasks and duplicate initialization. The base class now shoulders this burden, so subclasses (Vogent, SmartTurn) can call await super().start() and trust the protection is in place. The ValueError with self in the message provides useful debugging context.

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (5)

33-35: The class declaration, now bearing the weight of warmth.

The dual inheritance from TurnDetector and Warmable[tuple[...]] correctly expresses the lifecycle contract. The tuple type parameter clearly documents what resources are loaded and shared.


151-155: Resources descend, finding their proper vessels.

The binding correctly extracts a per-instance SileroVADSession from the shared pool while directly assigning the stateless WhisperModel and VogentDetector. This pattern enables model sharing across instances without state contamination.


157-177: The lifecycle, now guarded by the ancestor's vigilance.

The call to super().start() now inherits the double-start protection from the base class, addressing the prior review concern about orphaned _processing_task. The stop() method properly signals shutdown, cancels the task, and awaits its termination.


359-377: The silent void now speaks its truth.

The ValueError on uninitialized _whisper addresses the prior review concern about silent failures. The language parameter is now properly configurable via self._whisper_language. Consider aligning the error message with the new lifecycle terminology as noted for the VAD check.


397-421: The oracle speaks, but only when prepared.

The ValueError guard ensures fail-fast behavior if warmup hasn't completed. The multimodal prediction correctly combines audio features with text context. Implementation is sound.

plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (5)

45-50: The class declaration mirrors its sibling, bearing warmth.

The Warmable type parameter clearly specifies the three resources (VAD pool, Whisper extractor, ONNX session) that are loaded and cached. Consistent with the Vogent implementation pattern.


125-143: Resources summoned from the depths, each in its proper thread.

The warmup correctly offloads blocking I/O (directory creation, model download, feature extractor init, ONNX session build) to threads. The ensure_model utility handles the async download. Resources are returned as a tuple for caching.


145-154: The binding complete, resources anchored to their vessel.

Per-instance VAD session extraction from the shared pool, with stateless resources (feature extractor, ONNX session) assigned directly. Pattern consistent with Vogent implementation.


156-159: The ancestor's guard now shields this threshold.

The call to super().start() inherits the double-start protection, addressing the prior review concern about explicit guards. The docstring requirement from the past comment is now unnecessary since the base class enforces the contract.


371-397: Twin sentinels guard the gate of prediction.

The dual ValueError guards for _whisper_extractor and _smart_turn ensure fail-fast behavior. The ONNX inference is correctly executed after validation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (1)

156-159: Add explicit guard or docstring to clarify warmup requirement.

The start() method lacks an explicit check that warmup has completed. While _process_audio_packet raises clear ValueErrors if models are uninitialized (line 225), catching this requirement earlier in start() would improve defensive programming. Either add an explicit guard (e.g., if self._vad_session is None: raise RuntimeError(...)) or add a docstring documenting that warmup() must be called before start().

🧹 Nitpick comments (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (1)

175-203: Log full traceback in background task exception handler.

The broad except Exception at line 201 contradicts the coding guideline to use specific exception handling. While a catch-all might be justified in a background task loop to prevent the task from dying, consider at minimum logging the full traceback using logger.exception() instead of logger.error() to aid debugging.

🔎 Proposed improvement
             except asyncio.TimeoutError:
                 # Timeout is expected - continue loop to check shutdown
                 continue
             except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+                logger.exception(f"Error processing audio: {e}")

Or, if specific exceptions can be anticipated (e.g., ValueError from uninitialized models), handle them specifically:

             except asyncio.TimeoutError:
                 continue
+            except ValueError as e:
+                logger.error(f"Configuration error in audio processing: {e}")
             except Exception as e:
-                logger.error(f"Error processing audio: {e}")
+                logger.exception(f"Unexpected error processing audio: {e}")

Based on coding guidelines.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 11bcd2d and f6dbdb1.

📒 Files selected for processing (1)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
🧠 Learnings (1)
📚 Learning: 2025-10-13T22:00:34.300Z
Learnt from: dangusev
Repo: GetStream/Vision-Agents PR: 98
File: plugins/deepgram/vision_agents/plugins/deepgram/stt.py:135-150
Timestamp: 2025-10-13T22:00:34.300Z
Learning: In the Deepgram STT plugin (plugins/deepgram/vision_agents/plugins/deepgram/stt.py), the `started()` method is designed to wait for the connection attempt to complete, not to guarantee a successful connection. It's acceptable for the connection attempt to fail, and downstream code handles the case where `self.dg_connection` is `None`. The `_connected_once` event is set in the `finally` block intentionally to signal attempt completion.

Applied to files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py
🧬 Code graph analysis (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (5)
agents-core/vision_agents/core/agents/agent_types.py (1)
  • AgentOptions (15-25)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (22-24)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • TurnDetector (20-82)
  • start (74-78)
agents-core/vision_agents/core/vad/silero.py (5)
  • SileroVADSession (69-135)
  • SileroVADSessionPool (18-66)
  • load (33-51)
  • session (53-66)
  • predict_speech (90-95)
agents-core/vision_agents/core/warmup.py (1)
  • Warmable (45-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (2)
plugins/smart_turn/vision_agents/plugins/smart_turn/smart_turn_detection.py (2)

125-143: LGTM! Warmup implementation follows the pattern correctly.

The on_warmup method correctly:

  • Loads shared resources (VAD pool, ONNX model) that can be cached.
  • Uses asyncio.to_thread for blocking I/O operations (WhisperFeatureExtractor initialization, ONNX session building).
  • Returns the pool (not a session), which aligns with the cache-and-share design.

403-416: LGTM! Loading ONNX model from memory prevents multi-worker file access issues.

Building the session from model_bytes (line 416) rather than a file path is a good practice for avoiding file access conflicts in multi-worker deployments. The session configuration (sequential execution, single thread, full optimization) is appropriate.

@dangusev dangusev merged commit 0cf9f70 into main Dec 29, 2025
10 checks passed
@dangusev dangusev deleted the feature/new-loading branch December 29, 2025 22:09
@coderabbitai coderabbitai bot mentioned this pull request Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants