
feat: added metrics and example #278

Merged
dangusev merged 18 commits into main from feat/metrics-impl on Jan 14, 2026

Conversation

Contributor

d3xvn commented Jan 5, 2026

Summary by CodeRabbit

  • New Features

    • Added richer LLM/VLM/video lifecycle events: request-start, per-chunk first-chunk/ttft, completion latency, token/frame counts, and inference/request IDs.
  • Observability

    • Opt-in MetricsCollector and Prometheus-first metrics API; expanded metrics for LLM, VLM, STT, TTS, realtime, turns, video, and detections; per-agent metrics exposed.
  • Documentation

    • New Prometheus example and README demonstrating metrics export and collection.
  • Tests

    • New unit tests for MetricsCollector.
  • Bug Fixes

    • Plugin timing/metadata and STT processing-time reporting corrections.



coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Adds Prometheus-first metrics, a MetricsCollector that subscribes to agent events, extends LLM/VLM event types with request/start/chunk/completion/error timing and token/frame fields, threads timing through many plugins, instruments STT/TTS/video detectors, and adds examples and tests for metrics collection.
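A minimal sketch of the opt-in wiring this walkthrough describes, assuming the import path mirrors the package layout (vision_agents.core.observability) and that an Agent has already been constructed; the Prometheus example reviewed below shows the full setup.

from vision_agents.core.observability import MetricsCollector

# agent = Agent(...)  # constructed as usual elsewhere
collector = MetricsCollector(agent)  # subscribes to the agent's events on init
# From here on, LLM/VLM/STT/TTS/turn/video events are recorded as
# OpenTelemetry metrics and aggregated into agent.metrics.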

Changes

Cohort / File(s) Summary
Core LLM & VLM Events
agents-core/vision_agents/core/llm/events.py
Added LLMRequestStartedEvent; extended LLMResponseChunkEvent (is_first_chunk, time_to_first_token_ms) and LLMResponseCompletedEvent (latency_ms, time_to_first_token_ms, input/output/total tokens, model); added VLM events (VLMInferenceStartEvent, VLMInferenceCompletedEvent, VLMErrorEvent) and LLMErrorEvent.
Observability Public API
agents-core/vision_agents/core/observability/__init__.py
Replaced tracer exposure with MetricsCollector; updated exported metrics and __all__ to include new llm_/vlm_/realtime_/stt_/tts_/video_/turn_ metric names and AgentMetrics.
Metrics Definitions
agents-core/vision_agents/core/observability/metrics.py
Switched to meter-based Prometheus wiring (PrometheusMetricReader); new histograms/counters for LLM, VLM, STT, TTS, realtime, turn, video metrics; removed tracer initialization.
Metrics Collector
agents-core/vision_agents/core/observability/collector.py
New MetricsCollector class subscribing to agent events; implements many _on_* handlers to emit counters/histograms for latency, TTFT, tokens, bytes/durations, sessions, frames, detections, and errors.
Agent-level Metrics Container
agents-core/vision_agents/core/observability/agent.py, agents-core/vision_agents/core/agents/agents.py
Added AgentMetrics dataclass and simple metric types; Agent now initializes/exposes metrics property referencing AgentMetrics.
Event Base & Exports
agents-core/vision_agents/core/events/base.py, agents-core/vision_agents/core/events/__init__.py
Added VideoProcessorDetectionEvent (model_id, inference_time_ms, detection_count) and exported it.
LLM Plugins (timing/events)
plugins/*/*/*_llm.py
e.g. plugins/openai/..., plugins/anthropic/..., plugins/aws/..., plugins/gemini/..., plugins/huggingface/..., plugins/openrouter/..., plugins/xai/...
Threaded request_start_time/first_token_time through streaming/non-streaming flows; emit LLMRequestStartedEvent; per-chunk is_first_chunk and time_to_first_token_ms; emit LLMResponseCompletedEvent with latency, TTFT, token usage; added _emit_completion_event helpers in several plugins.
VLM Plugins (timing, ids, frames)
plugins/*/*/*_vlm.py
e.g. plugins/huggingface/...vlm.py, plugins/moondream/..., plugins/openai/..._vlm.py
Emit VLMInferenceStartEvent (frames_count, inference_id), VLMInferenceCompletedEvent (text, latency, frames_processed, detections, tokens) and VLMErrorEvent; also emit LLM events for compatibility and per-chunk timing.
STT / TTS Plugins (latency)
plugins/deepgram/..., plugins/elevenlabs/..., plugins/fish/..., plugins/wizper/..., agents-core/vision_agents/core/tts/tts.py
STT plugins now record processing_time_ms and include it in TranscriptResponse; TTS core removed in-library metric writes (moved to collector).
Video / Detection Plugins
plugins/roboflow/...
Detection event base switched to VideoProcessorDetectionEvent; processors measure inference_time_ms, include plugin/model context and detection_count in emitted events.
Examples & Config
examples/03_prometheus_metrics_example/*, examples/02_golf_coach_example/*
New Prometheus metrics example (README, code, pyproject) and golf_coach_with_metrics example wiring Prometheus and attaching MetricsCollector.
Tests
tests/test_metrics_collector.py
New unit tests for MetricsCollector validating metric recording across normal/partial/error cases and provider attribute extraction.
Misc / Minor Fixes
plugins/getstream/..., various plugins/* files
Small fixes and imports (e.g., ChannelMember is_global_banned=False, plugin_name typo fixes, added time/uuid imports).

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Agent
    participant LLM
    participant VLM
    participant EventSystem
    participant MetricsCollector
    participant MeterProvider
    participant Prometheus

    User->>Agent: send request (audio/video/text)
    Agent->>LLM: forward request / create_response
    Agent->>VLM: forward frames (if vision)

    LLM->>EventSystem: emit LLMRequestStartedEvent
    VLM->>EventSystem: emit VLMInferenceStartEvent

    EventSystem->>MetricsCollector: notify start events
    MetricsCollector->>MeterProvider: record counters/histograms

    LLM->>EventSystem: emit LLMResponseChunkEvent (is_first_chunk, ttft)
    EventSystem->>MetricsCollector: notify chunk -> record ttft/interim metrics

    VLM->>EventSystem: emit VLMInferenceCompletedEvent (latency, frames, detections)
    EventSystem->>MetricsCollector: notify VLM completion -> record vlm metrics

    LLM->>EventSystem: emit LLMResponseCompletedEvent (latency, tokens, model)
    EventSystem->>MetricsCollector: notify completion -> record llm latency & token counters

    MeterProvider->>Prometheus: expose /metrics endpoint
    Prometheus->>Prometheus: scrape /metrics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • d3xvn
  • Nash0x7E2

Poem

The clock opens its black mouth at the call,
a white first token fractures into air,
latency settles like a small, precise grief,
collectors count the hush of frames and breaths,
and metrics stitch our silences into light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 70.97%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title 'feat: added metrics and example' is clear and directly describes the main changes—introducing metrics infrastructure and accompanying examples—which align with the substantial changeset across observability, events, and example files.



🧹 Recent nitpick comments
agents-core/vision_agents/core/observability/agent.py (2)

20-29: Counter type constraint may be too restrictive.

Counter.inc(value: int) only accepts integers, but some metrics (e.g., durations, latencies) are floats. In collector.py, you cast to int before calling inc() for duration metrics, which loses precision.

Consider accepting int | float for consistency with Average.update().

♻️ Proposed change
-    def inc(self, value: int) -> None:
+    def inc(self, value: int | float) -> None:
         self._total += value
 
-    def value(self) -> int:
+    def value(self) -> int | float:
         return self._total

49-50: Frozen dataclass with mutable field objects.

Using frozen=True with default_factory creates immutable field bindings but the Counter and Average instances themselves remain mutable. This is the intended behavior here (you want to update metrics without reassigning fields), but it may surprise users expecting full immutability.

Consider adding a docstring note clarifying that while field references cannot be reassigned, the metric objects they reference are intentionally mutable.
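A small, self-contained illustration of that behavior (the Counter here is a toy stand-in, not the repository's class): freezing the dataclass only prevents rebinding the field, not mutating the object it points to.

from dataclasses import dataclass, field

class Counter:
    """Toy counter standing in for the metric types discussed above."""
    def __init__(self) -> None:
        self._total = 0

    def inc(self, value: int) -> None:
        self._total += value

@dataclass(frozen=True)
class Metrics:
    requests: Counter = field(default_factory=Counter)

m = Metrics()
m.requests.inc(1)         # allowed: the Counter instance itself is mutable
# m.requests = Counter()  # would raise dataclasses.FrozenInstanceError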

tests/test_metrics_collector.py (1)

55-96: Fixture uses hasattr which violates coding guidelines.

Lines 89 and 91 use hasattr to check for metric methods, but the coding guidelines state "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python."

Consider using explicit method checks or a try/except pattern.

♻️ Proposed fix using getattr with default
     patches = []
     try:
         for metric in all_metrics:
-            if hasattr(metric, "record"):
-                patches.append(patch.object(metric, "record").start())
-            if hasattr(metric, "add"):
-                patches.append(patch.object(metric, "add").start())
+            record_method = getattr(metric, "record", None)
+            if record_method is not None:
+                patches.append(patch.object(metric, "record").start())
+            add_method = getattr(metric, "add", None)
+            if add_method is not None:
+                patches.append(patch.object(metric, "add").start())
         yield
     finally:
         for patch_ in reversed(patches):
             patch_.stop()

Note: While getattr with a default is also discouraged by the guidelines, it's slightly more explicit than hasattr. Alternatively, if the metric interfaces are known, you could check against a type or interface directly.

Based on coding guidelines.

agents-core/vision_agents/core/observability/__init__.py (1)

60-107: Well-organized __all__ with categorical grouping.

The exports are logically grouped by component (STT, TTS, LLM, Turn, Realtime, VLM, Video). Consider moving AgentMetrics near MetricsCollector at the top for consistency with the "main class" grouping.

♻️ Suggested reordering
 __all__ = [
     # Main class
     "MetricsCollector",
+    "AgentMetrics",
     # Meter
     "meter",
     # ... rest of metrics ...
-    "AgentMetrics",
 ]

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e75b550 and 57c9ea2.

📒 Files selected for processing (5)
  • agents-core/vision_agents/core/agents/agents.py
  • agents-core/vision_agents/core/observability/__init__.py
  • agents-core/vision_agents/core/observability/agent.py
  • agents-core/vision_agents/core/observability/collector.py
  • tests/test_metrics_collector.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • tests/test_metrics_collector.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • tests/test_metrics_collector.py
  • agents-core/vision_agents/core/agents/agents.py
  • agents-core/vision_agents/core/observability/collector.py
  • agents-core/vision_agents/core/observability/agent.py
  • agents-core/vision_agents/core/observability/__init__.py
🧬 Code graph analysis (2)
tests/test_metrics_collector.py (5)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseCompletedEvent (129-153)
  • ToolEndEvent (167-176)
  • VLMErrorEvent (239-251)
  • VLMInferenceCompletedEvent (213-235)
agents-core/vision_agents/core/observability/agent.py (1)
  • AgentMetrics (50-145)
agents-core/vision_agents/core/observability/collector.py (2)
  • MetricsCollector (51-529)
  • _base_attributes (517-529)
agents-core/vision_agents/core/stt/events.py (3)
  • STTErrorEvent (81-93)
  • STTTranscriptEvent (17-47)
  • TranscriptResponse (7-13)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
agents-core/vision_agents/core/observability/__init__.py (3)
agents-core/vision_agents/core/observability/agent.py (1)
  • AgentMetrics (50-145)
agents-core/vision_agents/core/observability/collector.py (1)
  • MetricsCollector (51-529)
agents-core/vision_agents/core/agents/agents.py (1)
  • metrics (1376-1377)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (18)
agents-core/vision_agents/core/observability/agent.py (2)

5-17: Well-structured metric abstraction.

The _Metric ABC with its description and value() contract provides a clean foundation. The __repr__ implementation will be helpful for debugging.


55-145: Comprehensive metric coverage.

The field definitions provide thorough coverage across STT, TTS, LLM, Turn Detection, Realtime, VLM, and Video processing. Descriptions are clear and follow a consistent naming convention (__avg for averages, __total for counters).

agents-core/vision_agents/core/agents/agents.py (3)

52-52: Import aligns with new observability module structure.

Clean import of AgentMetrics from the observability submodule.


256-256: Per-agent metrics initialization.

Good placement at the end of __init__ after all validation and setup. Each agent instance gets its own AgentMetrics for isolated metric collection.


1375-1377: Metrics property exposes telemetry cleanly.

Read-only property providing access to per-agent metrics. Follows the existing pattern of exposing internal state via properties (e.g., closed).

agents-core/vision_agents/core/observability/collector.py (6)

1-14: Clear module documentation with usage example.

The docstring provides a concise explanation of the collector's purpose and shows the opt-in usage pattern. This helps developers understand how to integrate metrics collection.


62-72: Clean initialization with subscription delegation.

The constructor properly stores the agent reference and delegates subscription setup to focused helper methods. Storing _agent_metrics directly avoids repeated property access.


236-256: Session tracking with acknowledged memory concern.

The TODO at line 241 acknowledges potential unbounded growth in _realtime_session_starts. Consider implementing the bounded cache approach suggested in past reviews to prevent memory issues in long-running agents.

Based on past review comments, the suggested approach was:

from collections import OrderedDict

# In __init__:
self._realtime_session_starts: OrderedDict[str, float] = OrderedDict()
self._max_tracked_sessions = 1000

# In _on_realtime_connected, after adding:
if len(self._realtime_session_starts) > self._max_tracked_sessions:
    self._realtime_session_starts.popitem(last=False)

258-276: Proper null safety for audio samples.

Line 262 correctly checks event.data.samples is not None before accessing nbytes, addressing the past review concern about PcmData potentially having None samples.


369-388: TTS synthesis handler with correct null checks.

Line 374 uses is not None for synthesis_time_ms, properly handling the edge case where 0.0 is a valid measurement. This addresses the past review feedback.


517-529: Simple and effective attribute extraction.

The _base_attributes helper correctly handles the optional plugin_name field and returns a clean dictionary for metric labeling.

tests/test_metrics_collector.py (5)

147-177: Thorough LLM response completed test.

Good coverage verifying both OpenTelemetry metric recording and AgentMetrics aggregate updates. The assertions validate attributes, values, and the dual-recording pattern.


179-202: Partial data handling test validates defensive coding.

Tests that missing optional fields don't cause errors and metrics aren't incorrectly recorded. The assertions for None averages and 0 totals confirm correct behavior.


227-253: STT transcript test with potential type mismatch.

Line 253 asserts agent.metrics.stt_audio_duration_ms__total.value() == 2000.0, but the collector calls inc(int(event.audio_duration_ms)), which truncates to an integer. The assertion still passes since 2000 == 2000.0 in Python, but the explicit int cast at line 352 of collector.py means fractional durations would be silently lost.


342-377: VLM inference test validates video metrics integration.

Comprehensive test covering inference count, latency, frame processing, and token tracking. The assertions confirm proper metric attribution with provider and model.


399-417: Base attributes extraction tests.

Good edge case coverage testing both present and missing plugin_name scenarios. These validate the _base_attributes helper method behavior.

agents-core/vision_agents/core/observability/__init__.py (2)

1-11: Clear module documentation with opt-in usage pattern.

The updated docstring explains the observability package purpose and demonstrates the MetricsCollector opt-in pattern. This guides developers on proper integration.


13-14: New public exports for metrics infrastructure.

AgentMetrics and MetricsCollector are now part of the public API, enabling per-agent metrics aggregation and event-driven collection.




coderabbitai bot left a comment

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 233-239: The closing parenthesis for the
self._emit_completion_event call is mis-indented which can create a syntax
error; fix by aligning the parentheses properly so the call ends with the
closing parenthesis at the same indentation level as the method call start
(ensure the args llm_response.original, llm_response.text, latency_ms=latency_ms
are inside the call and the final ')' closes self._emit_completion_event(...));
reference the self._emit_completion_event invocation and the variables
llm_response and latency_ms/request_start_time to locate and correct the
indentation.

In @plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py:
- Around line 331-332: Reset _audio_start_time whenever an utterance is aborted
or the connection closes so processing_time_ms isn't calculated from a stale
start time; specifically, add assignments setting self._audio_start_time = None
in the error and close handlers (_on_error and _on_close), and in the public
cleanup methods clear() and close(), ensuring this mirrors the existing reset
performed after a committed transcript. Locate uses of _audio_start_time in the
commit/processing path and add the same reset in those four places to guarantee
each new utterance starts timing fresh.

In @plugins/xai/vision_agents/plugins/xai/llm.py:
- Around line 450-454: The current check sets is_first = first_token_time is not
None and request_start_time is not None which stays true after the first token
and mislabels subsequent chunks; change the logic to determine "first chunk" by
checking whether first_token_time is None (and request_start_time is present)
before you set first_token_time, or introduce and pass an explicit flag like
is_first_content_chunk (true when first_token_time is None) from the caller;
compute that flag prior to mutating first_token_time, use it to compute ttft_ms
only for the real first chunk, and then set first_token_time as needed.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)

501-520: First chunk detection logic appears incorrect.

Line 504 sets is_first = first_token_time is not None and request_start_time is not None. This condition is true for every chunk after the first token is received, not just the first chunk. The is_first_chunk field should only be True for the actual first text delta.

🔎 Proposed fix
         if event.type == "response.output_text.delta":
             delta_event: ResponseTextDeltaEvent = event
             # Calculate time to first token for the first chunk
-            is_first = first_token_time is not None and request_start_time is not None
+            # first_token_time is set only on the first delta event, so check sequence_number
+            is_first = delta_event.sequence_number == 0
             ttft_ms = None
-            if is_first:
+            if is_first and first_token_time is not None and request_start_time is not None:
                 ttft_ms = (first_token_time - request_start_time) * 1000

Alternatively, track a boolean flag that's set after emitting the first chunk:

+            # Only the first delta event should have is_first_chunk=True
+            is_first = (
+                first_token_time is not None
+                and request_start_time is not None
+                and delta_event.sequence_number == 0
+            )
🧹 Nitpick comments (18)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)

254-258: Consider extracting the duplicate processing time calculation.

The processing time calculation appears in both _on_partial_transcript and _on_committed_transcript. Extracting this to a helper method would reduce duplication.

🔎 Proposed refactor
+    def _get_processing_time_ms(self) -> Optional[float]:
+        """Calculate processing time from audio start to now, in milliseconds."""
+        if self._audio_start_time is not None:
+            return (time.perf_counter() - self._audio_start_time) * 1000
+        return None
+
     def _on_partial_transcript(self, transcription_data: dict[str, Any]):
         ...
-        # Calculate processing time (time from first audio to transcript)
-        processing_time_ms: Optional[float] = None
-        if self._audio_start_time is not None:
-            processing_time_ms = (time.perf_counter() - self._audio_start_time) * 1000
+        processing_time_ms = self._get_processing_time_ms()
         
     def _on_committed_transcript(self, transcription_data: dict[str, Any]):
         ...
-        # Calculate processing time (time from first audio to transcript)
-        processing_time_ms: Optional[float] = None
-        if self._audio_start_time is not None:
-            processing_time_ms = (time.perf_counter() - self._audio_start_time) * 1000
+        processing_time_ms = self._get_processing_time_ms()

Also applies to: 307-311

examples/03_prometheus_metrics_example/pyproject.toml (1)

6-17: Consider pinning dependency versions for reproducibility.

Using "latest" for all dependencies may cause builds to break unexpectedly when upstream packages release new versions. For example projects, pinning versions demonstrates best practices and ensures reproducible environments.

🔎 Alternative approach using version constraints

Replace "latest" with specific version constraints or ranges:

 dependencies = [
-    "vision-agents",
-    "vision-agents-plugins-gemini",
-    "vision-agents-plugins-deepgram",
-    "vision-agents-plugins-elevenlabs",
-    "vision-agents-plugins-getstream",
-    "python-dotenv",
-    "opentelemetry-api",
-    "opentelemetry-sdk",
-    "opentelemetry-exporter-prometheus",
-    "prometheus-client",
+    "vision-agents>=0.1.0",
+    "vision-agents-plugins-gemini>=0.1.0",
+    "vision-agents-plugins-deepgram>=0.1.0",
+    "vision-agents-plugins-elevenlabs>=0.1.0",
+    "vision-agents-plugins-getstream>=0.1.0",
+    "python-dotenv>=1.0.0",
+    "opentelemetry-api>=1.20.0",
+    "opentelemetry-sdk>=1.20.0",
+    "opentelemetry-exporter-prometheus>=0.41b0",
+    "prometheus-client>=0.19.0",
 ]
examples/03_prometheus_metrics_example/README.md (2)

20-20: Format the bare URL for better rendering.

The URL should be enclosed in angle brackets or converted to a proper Markdown link to improve rendering across different Markdown parsers.

🔎 Proposed formatting fix
-Then open http://localhost:9464/metrics in your browser to see real-time metrics as you talk to the agent.
+Then open <http://localhost:9464/metrics> in your browser to see real-time metrics as you talk to the agent.

113-119: Specify language for the fenced code block.

The code fence at line 113 should include a language identifier (e.g., bash or shell) for proper syntax highlighting.

🔎 Proposed formatting fix
-```
+```bash
 GOOGLE_API_KEY=your_key
 DEEPGRAM_API_KEY=your_key
 ELEVENLABS_API_KEY=your_key
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

304-330: Refactor to eliminate duplicated event emission logic.

The VQA and caption branches contain nearly identical event emission code (latency calculation + three event emissions). This duplication makes maintenance harder and increases the risk of inconsistencies.

🔎 Proposed refactoring to extract common event emission

Extract the event emission logic into a helper method:

+    def _emit_completion_events(
+        self,
+        text: str,
+        inference_id: str,
+        start_time: float,
+    ) -> None:
+        """Emit completion events for VLM inference."""
+        latency_ms = (time.perf_counter() - start_time) * 1000
+
+        # Emit chunk event for TTS
+        self.events.send(LLMResponseChunkEvent(delta=text))
+
+        # Emit VLM-specific completion event with metrics
+        self.events.send(
+            VLMInferenceCompletedEvent(
+                plugin_name="moondream_local",
+                inference_id=inference_id,
+                model=self.model_name,
+                text=text,
+                latency_ms=latency_ms,
+                frames_processed=1,
+            )
+        )
+
+        # Also emit LLM completion for compatibility
+        self.events.send(
+            LLMResponseCompletedEvent(
+                plugin_name="moondream_local",
+                text=text,
+                latency_ms=latency_ms,
+                model=self.model_name,
+            )
+        )

Then use it in both branches:

                if not answer:
                    logger.warning("Moondream query returned empty answer")
                    return None

-                latency_ms = (time.perf_counter() - start_time) * 1000
-
-                # Emit chunk event for TTS
-                self.events.send(LLMResponseChunkEvent(delta=answer))
-
-                # Emit VLM-specific completion event with metrics
-                self.events.send(
-                    VLMInferenceCompletedEvent(
-                        plugin_name="moondream_local",
-                        inference_id=inference_id,
-                        model=self.model_name,
-                        text=answer,
-                        latency_ms=latency_ms,
-                        frames_processed=1,
-                    )
-                )
-
-                # Also emit LLM completion for compatibility
-                self.events.send(
-                    LLMResponseCompletedEvent(
-                        plugin_name="moondream_local",
-                        text=answer,
-                        latency_ms=latency_ms,
-                        model=self.model_name,
-                    )
-                )
+                self._emit_completion_events(answer, inference_id, start_time)

                logger.info(f"Moondream VQA response: {answer}")
                return LLMResponseEvent(original=answer, text=answer)

Apply the same pattern to the caption branch.

Also applies to: 350-376

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)

184-210: Refactor to eliminate duplicated event emission logic.

Similar to the local VLM implementation, the VQA and caption branches contain duplicated event emission code. Consider extracting this into a shared helper method to improve maintainability and reduce inconsistency risk.

The same refactoring pattern suggested for moondream_local_vlm.py applies here: extract a _emit_completion_events helper method to consolidate the three event emissions and latency calculation that are identical in both branches.

Also applies to: 224-250

tests/test_metrics_collector.py (1)

10-10: Avoid using MagicMock in tests per coding guidelines.

The coding guidelines specify "Never mock in tests; use pytest for testing." The MagicMock usage for agent on lines 82-88 violates this guideline. Consider creating a minimal real Agent instance or a purpose-built test fixture instead.

Based on coding guidelines, mocking should be avoided in favor of real instances or pytest fixtures.

Also applies to: 81-93

plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)

599-608: Streaming path missing token usage extraction.

The non-streaming converse method extracts token usage from the response (lines 316-321), but the streaming converse_stream method does not. If AWS Bedrock provides token usage in streaming responses (typically in the final event), consider extracting it for consistency.
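A hedged sketch of what that extraction could look like, assuming boto3's Converse streaming API where the final stream event carries a metadata block with usage counts; the model id and loop structure are illustrative, not the plugin's actual code.

import boto3

client = boto3.client("bedrock-runtime")
response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model id
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

input_tokens = output_tokens = None
for event in response["stream"]:
    if "contentBlockDelta" in event:
        pass  # chunk/text handling elided
    elif "metadata" in event:
        usage = event["metadata"].get("usage", {})
        input_tokens = usage.get("inputTokens")
        output_tokens = usage.get("outputTokens")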

agents-core/vision_agents/core/observability/collector.py (2)

476-488: Avoid hasattr per coding guidelines.

The coding guidelines state "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python." Consider using a try/except block or checking against a known interface.

🔎 Proposed fix
     def _base_attributes(self, event) -> dict:
         """Extract base attributes from an event.

         Args:
             event: The event to extract attributes from.

         Returns:
             Dictionary of base attributes.
         """
         attrs = {}
-        if hasattr(event, "plugin_name") and event.plugin_name:
+        plugin_name = getattr(event, "plugin_name", None)
+        if plugin_name:
-            attrs["provider"] = event.plugin_name
+            attrs["provider"] = plugin_name
         return attrs

Actually, since getattr is also discouraged, a better approach would be:

     def _base_attributes(self, event) -> dict:
         attrs = {}
+        try:
+            if event.plugin_name:
+                attrs["provider"] = event.plugin_name
+        except AttributeError:
+            pass
-        if hasattr(event, "plugin_name") and event.plugin_name:
-            attrs["provider"] = event.plugin_name
         return attrs

Based on coding guidelines, hasattr should be avoided.


450-470: Avoid getattr per coding guidelines.

Lines 455, 460, and 468 use getattr to access potentially missing attributes. Per coding guidelines, prefer normal attribute access. Consider defining a typed event class for detection events or using try/except.

🔎 Proposed fix using try/except
     def _on_detection_completed(self, event: PluginBaseEvent) -> None:
         """Handle video detection completed event."""
         attrs = self._base_attributes(event)

         # Add model info if available
-        model_id = getattr(event, "model_id", None)
-        if model_id:
-            attrs["model"] = model_id
+        try:
+            if event.model_id:
+                attrs["model"] = event.model_id
+        except AttributeError:
+            pass

         # Extract detection count from event if available
-        objects = getattr(event, "objects", [])
-        if objects:
-            metrics.video_detections.add(len(objects), attrs)
+        try:
+            if event.objects:
+                metrics.video_detections.add(len(event.objects), attrs)
+        except AttributeError:
+            pass

         # Record frame processed
         metrics.video_frames_processed.add(1, attrs)

         # Record inference latency if available
-        inference_time_ms = getattr(event, "inference_time_ms", None)
-        if inference_time_ms is not None:
-            metrics.video_processing_latency_ms.record(inference_time_ms, attrs)
+        try:
+            if event.inference_time_ms is not None:
+                metrics.video_processing_latency_ms.record(event.inference_time_ms, attrs)
+        except AttributeError:
+            pass

Based on coding guidelines, getattr should be avoided.

examples/03_prometheus_metrics_example/prometheus_metrics_example.py (1)

80-106: Consider clarifying the unused collector variable.

The collector variable on line 83 is assigned but never referenced afterward. While this is intentional (MetricsCollector subscribes to events on init), a brief comment or using _collector would clarify intent.

🔎 Proposed fix
 async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
     """Join a call with metrics collection enabled."""
-    # Attach MetricsCollector to record OpenTelemetry metrics
-    collector = MetricsCollector(agent)
+    # Attach MetricsCollector to record OpenTelemetry metrics.
+    # The collector subscribes to events on initialization - no further reference needed.
+    _collector = MetricsCollector(agent)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)

343-350: Avoid getattr per coding guidelines.

Lines 348-349 use getattr to access token counts. Prefer try/except or direct attribute access.

🔎 Proposed fix
         # Extract token usage from response if available
         input_tokens: Optional[int] = None
         output_tokens: Optional[int] = None
-        if final_chunk and hasattr(final_chunk, "usage_metadata") and final_chunk.usage_metadata:
+        if final_chunk:
-            usage = final_chunk.usage_metadata
-            input_tokens = getattr(usage, "prompt_token_count", None)
-            output_tokens = getattr(usage, "candidates_token_count", None)
+            try:
+                usage = final_chunk.usage_metadata
+                if usage:
+                    input_tokens = usage.prompt_token_count
+                    output_tokens = usage.candidates_token_count
+            except AttributeError:
+                pass

Based on coding guidelines, getattr and hasattr should be avoided.

plugins/xai/vision_agents/plugins/xai/llm.py (2)

282-297: Coding guidelines violation: getattr usage and broad exception handling.

Per coding guidelines, avoid getattr/hasattr and prefer normal attribute access. Additionally, line 296 uses bare except Exception: which should be a more specific exception type (e.g., json.JSONDecodeError).

🔎 Suggested refactor
     def _extract_tool_calls_from_response(
         self, response: Response
     ) -> List[NormalizedToolCallItem]:
         calls = []
-        tool_calls = getattr(response, "tool_calls", None) or []
+        tool_calls = response.tool_calls if response.tool_calls else []
         for tc in tool_calls:
-            func = getattr(tc, "function", None)
+            func = tc.function if tc.function else None
             if not func:
                 continue
 
-            name = getattr(func, "name", "unknown")
-            args_str = getattr(func, "arguments", "{}")
-            call_id = getattr(tc, "id", "") or getattr(tc, "call_id", "")
+            name = func.name if func.name else "unknown"
+            args_str = func.arguments if func.arguments else "{}"
+            call_id = tc.id if tc.id else (tc.call_id if tc.call_id else "")
 
             try:
                 args_obj = (
                     json.loads(args_str) if isinstance(args_str, str) else args_str
                 )
-            except Exception:
+            except json.JSONDecodeError:
                 args_obj = {}

Based on coding guidelines: avoid getattr/hasattr; use specific exception handling.


390-393: Timing metrics not propagated in tool-call follow-up streaming.

The _standardize_and_emit_chunk call here omits request_start_time and first_token_time, resulting in is_first_chunk=False and time_to_first_token_ms=None for all tool-call follow-up chunks. This is inconsistent with the main streaming path at lines 155-160.

Consider tracking timing for tool-call rounds or documenting that timing metrics only apply to the initial request.

plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (1)

156-173: Broad exception handling.

Line 156 catches all Exception types. Per coding guidelines, prefer specific exception handling. Consider catching the specific exceptions that chat.completions.create may raise (e.g., huggingface_hub.InferenceTimeoutError, HTTP-related exceptions).

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)

417-421: Coding guidelines violation: hasattr usage.

Per coding guidelines, avoid hasattr and prefer normal attribute access. Consider using try/except for attribute access or checking the type explicitly.

🔎 Suggested refactor
         # Extract token usage from response
         input_tokens: Optional[int] = None
         output_tokens: Optional[int] = None
 
-        if hasattr(response, "usage") and response.usage:
-            input_tokens = response.usage.input_tokens
-            output_tokens = response.usage.output_tokens
+        try:
+            if response.usage:
+                input_tokens = response.usage.input_tokens
+                output_tokens = response.usage.output_tokens
+        except AttributeError:
+            pass
 
-        model = response.model if hasattr(response, "model") else self.model
+        try:
+            model = response.model
+        except AttributeError:
+            model = self.model

Based on coding guidelines: avoid hasattr.


248-252: Coding guidelines: getattr/hasattr usage for SDK interop.

Lines 250-252 use getattr and hasattr to inspect streaming event deltas. While this violates the coding guidelines, it may be necessary for SDK types with varying structures. Consider type-checking or try/except patterns if these SDK types are well-defined.

Based on coding guidelines: avoid getattr/hasattr.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)

603-610: Missing metrics in tool-call follow-up response.

The LLMResponseCompletedEvent emitted during tool-call follow-up rounds lacks latency_ms, time_to_first_token_ms, and model fields, unlike the main streaming/non-streaming paths. This creates inconsistency in metrics collection.

Consider tracking timing for follow-up rounds or documenting this limitation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c966d3 and bd409c1.

⛔ Files ignored due to path filters (1)
  • examples/03_prometheus_metrics_example/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • agents-core/vision_agents/core/llm/events.py
  • agents-core/vision_agents/core/observability/__init__.py
  • agents-core/vision_agents/core/observability/collector.py
  • agents-core/vision_agents/core/observability/metrics.py
  • examples/03_prometheus_metrics_example/README.md
  • examples/03_prometheus_metrics_example/__init__.py
  • examples/03_prometheus_metrics_example/prometheus_metrics_example.py
  • examples/03_prometheus_metrics_example/pyproject.toml
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • plugins/aws/vision_agents/plugins/aws/aws_llm.py
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
  • plugins/fish/vision_agents/plugins/fish/stt.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
  • plugins/wizper/vision_agents/plugins/wizper/stt.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • tests/test_metrics_collector.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • examples/03_prometheus_metrics_example/prometheus_metrics_example.py
  • plugins/fish/vision_agents/plugins/fish/stt.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
  • tests/test_metrics_collector.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • plugins/wizper/vision_agents/plugins/wizper/stt.py
  • agents-core/vision_agents/core/observability/collector.py
  • plugins/aws/vision_agents/plugins/aws/aws_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py
  • agents-core/vision_agents/core/observability/__init__.py
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
  • agents-core/vision_agents/core/llm/events.py
  • plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • agents-core/vision_agents/core/observability/metrics.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • tests/test_metrics_collector.py
🧬 Code graph analysis (16)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
agents-core/vision_agents/core/stt/events.py (9)
  • processing_time_ms (38-39)
  • processing_time_ms (68-69)
  • TranscriptResponse (7-13)
  • confidence (30-31)
  • confidence (60-61)
  • language (34-35)
  • language (64-65)
  • model_name (46-47)
  • model_name (76-77)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (6)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseChunkEvent (100-121)
  • LLMResponseCompletedEvent (125-149)
  • VLMInferenceStartEvent (199-205)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (5)
  • VLMInferenceStartEvent (199-205)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
  • LLMResponseChunkEvent (100-121)
  • LLMResponseCompletedEvent (125-149)
tests/test_metrics_collector.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseCompletedEvent (125-149)
  • ToolEndEvent (163-172)
agents-core/vision_agents/core/stt/events.py (3)
  • STTTranscriptEvent (17-47)
  • STTErrorEvent (81-93)
  • TranscriptResponse (7-13)
agents-core/vision_agents/core/tts/events.py (2)
  • TTSSynthesisCompleteEvent (35-45)
  • TTSErrorEvent (49-62)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
agents-core/vision_agents/core/observability/collector.py (11)
  • MetricsCollector (50-488)
  • _on_llm_response_completed (179-199)
  • _on_tool_end (201-210)
  • _on_stt_transcript (314-326)
  • _on_stt_error (328-336)
  • _on_tts_synthesis_complete (342-356)
  • _on_tts_error (358-366)
  • _on_turn_ended (372-380)
  • _on_vlm_inference_completed (399-424)
  • _on_vlm_error (426-434)
  • _base_attributes (476-488)
plugins/wizper/vision_agents/plugins/wizper/stt.py (1)
agents-core/vision_agents/core/stt/events.py (5)
  • processing_time_ms (38-39)
  • processing_time_ms (68-69)
  • TranscriptResponse (7-13)
  • model_name (46-47)
  • model_name (76-77)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (6)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseChunkEvent (100-121)
  • LLMResponseCompletedEvent (125-149)
  • VLMInferenceStartEvent (199-205)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
  • _process_non_streaming_response (292-332)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/observability/__init__.py (1)
agents-core/vision_agents/core/observability/collector.py (1)
  • MetricsCollector (50-488)
plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (1)
agents-core/vision_agents/core/stt/events.py (11)
  • processing_time_ms (38-39)
  • processing_time_ms (68-69)
  • TranscriptResponse (7-13)
  • confidence (30-31)
  • confidence (60-61)
  • language (34-35)
  • language (64-65)
  • audio_duration_ms (42-43)
  • audio_duration_ms (72-73)
  • model_name (46-47)
  • model_name (76-77)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (2)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • name (23-26)
tests/test_agent_tracks.py (1)
  • name (45-46)
agents-core/vision_agents/core/llm/events.py (3)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • error_message (92-93)
agents-core/vision_agents/core/agents/events.py (1)
  • error_message (65-66)
agents-core/vision_agents/core/tts/events.py (1)
  • error_message (61-62)
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)
agents-core/vision_agents/core/llm/events.py (1)
  • LLMResponseCompletedEvent (125-149)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
  • LLMResponseChunkEvent (100-121)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
  • _emit_completion_event (427-467)
  • _standardize_and_emit_event (469-546)
plugins/xai/vision_agents/plugins/xai/llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
🪛 markdownlint-cli2 (0.18.1)
examples/03_prometheus_metrics_example/README.md

20-20: Bare URL used

(MD034, no-bare-urls)


113-113: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
🔇 Additional comments (54)
plugins/fish/vision_agents/plugins/fish/stt.py (3)

4-4: LGTM: Clean timing import.

The time module import is appropriate for the timing instrumentation added below.


88-116: Excellent timing implementation.

The use of time.perf_counter() is the correct choice for measuring elapsed time—monotonic and high-resolution. The timing scope appropriately captures the entire ASR processing pipeline (PCM conversion, request building, API call, and response processing).
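For reference, a minimal sketch of the elapsed-time pattern being praised here (the same shape the STT plugins use for processing_time_ms):

import time

start = time.perf_counter()  # monotonic, unaffected by wall-clock adjustments
# ... PCM conversion, request building, API call, response processing ...
processing_time_ms = (time.perf_counter() - start) * 1000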


122-122: The code correctly passes processing_time_ms to TranscriptResponse. The parameter is explicitly defined as a dataclass field and the timing metric is properly computed.

plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (6)

4-4: LGTM: Appropriate timing module.

The time import is correctly used with perf_counter() for monotonic performance measurement.


84-85: LGTM: Well-documented timing state.

The instance variable is properly typed and clearly documented.


218-221: LGTM: Correct processing time calculation.

The elapsed time calculation using perf_counter() is accurate and properly handles the None case. The metric correctly measures wall-clock time from first audio chunk to transcript arrival.


229-229: LGTM: Correct parameter passing.

The processing_time_ms is properly passed to TranscriptResponse, matching the expected schema from the external snippets.


251-252: LGTM: Proper timer reset at turn boundary.

Resetting _audio_start_time to None after the turn ends correctly prepares the timer for the next utterance.


122-124: No changes needed.

The Deepgram STT is designed as a single-stream implementation. The _audio_start_time shared state is appropriate because:

  1. Single WebSocket connection: One connection per STT instance (line 80)
  2. Single STT per agent: All usage patterns show one STT instance per agent, not per participant
  3. Sequential processing: Tests and examples demonstrate sequential process_audio calls, not concurrent calls from different participants
  4. Docstring note (line 100): "currently not used in streaming mode" indicates the participant parameter is metadata-only
  5. Architecture comparison: Turn detection modules (vogent, smart_turn) that handle multi-participant scenarios explicitly use queues for per-participant state; Deepgram STT intentionally does not

The latency measurement correctly captures the time from the first audio chunk of an utterance to its conclusion, which is the intended behavior for single-stream speech processing with Deepgram's built-in turn detection.

Likely an incorrect or invalid review comment.

plugins/wizper/vision_agents/plugins/wizper/stt.py (1)

84-135: Timing instrumentation looks correct.

The processing time measurement correctly captures the end-to-end latency including file upload and transcription. The TranscriptResponse metadata is properly populated with processing_time_ms and model_name.

plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)

228-287: Timing instrumentation and event enrichment look good.

The inference timing correctly captures latency around the _run_inference call, and the DetectionCompletedEvent payload now includes plugin context (plugin_name), timing (inference_time_ms), and model identification (model_id). Error handling appropriately skips metrics on failure.

plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)

257-317: Timing instrumentation and event enrichment are consistent.

The implementation mirrors the cloud processor pattern: inference timing is correctly captured, and DetectionCompletedEvent now carries plugin context, latency, and model identification. The error path appropriately bypasses metrics collection.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)

4-5: LGTM! Imports support the new metrics instrumentation.

The addition of time and uuid modules, along with the VLM event classes, correctly supports the timing and event emission functionality introduced in this PR.

Also applies to: 17-23


267-278: LGTM! Proper initialization of inference tracking.

The inference tracking setup correctly generates a unique ID, captures start time with high-resolution perf_counter(), and emits the start event before processing begins. The frames_count=1 accurately reflects that this VLM processes one frame at a time.


380-394: LGTM! Proper error handling and resource cleanup.

The exception handler correctly emits VLMErrorEvent with contextual information, and the finally block ensures the processing lock is released even when exceptions occur. This prevents deadlocks and provides proper observability for failures.

plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)

3-3: LGTM! Clean extension of detection event with metrics fields.

The addition of inference_time_ms and model_id as optional fields properly extends the detection event to support metrics collection. The docstring fix and field documentation follow conventions.

Also applies to: 18-18, 25-26, 33-34

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)

4-5: LGTM! Proper metrics instrumentation setup.

The imports and start event emission are correctly implemented. Notably, the VLMInferenceStartEvent is emitted before acquiring the processing lock (line 164), which correctly captures the total latency including any lock wait time.

Also applies to: 15-21, 151-162


257-268: LGTM! Proper error event emission.

The exception handler correctly emits VLMErrorEvent with the inference ID and context information for observability. The async with context manager (line 164) ensures the lock is released automatically.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (4)

4-5: LGTM! Comprehensive metrics instrumentation setup.

The implementation correctly:

  • Counts frames from the actual buffer size (line 144)
  • Generates a unique inference ID (line 145)
  • Emits both VLM and LLM start events (lines 148-164)
  • Initializes timing variables for first-token tracking (lines 167-168)

This provides complete observability for both VLM-specific and general LLM metrics.

Also applies to: 15-22, 143-168


176-194: LGTM! Comprehensive error event emission.

The error handling correctly emits both plugin-specific LLMErrorEvent and core VLMErrorEvent, ensuring error observability at both the LLM and VLM abstraction levels.


210-233: LGTM! Correct first-token timing implementation.

The first-token timing logic correctly:

  • Captures first_token_time only once (lines 212-213)
  • Calculates time-to-first-token only for the initial chunk (lines 216-218)
  • Includes the is_first_chunk flag and time_to_first_token_ms in the chunk event (lines 230-231)

This enables accurate measurement of streaming latency characteristics.
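
Generically, the bookkeeping used here and in the other streaming plugins reduces to the following sketch, where stream and chunk.delta stand in for the provider-specific streaming API:

import time
from typing import AsyncIterator, Optional

async def stream_with_ttft(stream: AsyncIterator) -> None:
    """Sketch: per-chunk first-token timing while iterating a provider stream."""
    request_start_time = time.perf_counter()
    first_token_time: Optional[float] = None

    async for chunk in stream:
        text = chunk.delta                      # provider-specific text accessor
        if not text:
            continue
        is_first_chunk = first_token_time is None
        if is_first_chunk:
            first_token_time = time.perf_counter()
        ttft_ms = (
            (first_token_time - request_start_time) * 1000 if is_first_chunk else None
        )
        # emit LLMResponseChunkEvent(delta=text, is_first_chunk=is_first_chunk,
        #                            time_to_first_token_ms=ttft_ms, ...)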


242-270: LGTM! Comprehensive completion event emission with accurate metrics.

The completion event handling correctly:

  • Calculates total latency from request start (line 242)
  • Computes final time-to-first-token if tokens were received (lines 243-245)
  • Emits both VLM-specific and LLM compatibility events (lines 248-270)
  • Includes frames_processed in VLM event and comprehensive timing in both

This provides complete metrics for both VLM and LLM observability layers.

plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)

3-3: LGTM! Consistent timing instrumentation across request lifecycle.

The timing infrastructure is correctly implemented:

  • Start time captured before the API call (line 170)
  • LLMRequestStartedEvent emitted with model and streaming flag (lines 161-167)
  • request_start_time parameter consistently propagated to both streaming and non-streaming processing functions (lines 187, 191, 200, 292)

This enables accurate latency measurements across all request paths.

Also applies to: 9-9, 160-171, 186-192, 200-200, 292-292


210-210: LGTM! Correct first-token timing in streaming responses.

The streaming response timing correctly:

  • Tracks first token time only once (lines 227-228)
  • Calculates time-to-first-token for the initial chunk (lines 230-233)
  • Includes timing metadata in chunk events (lines 244-245)
  • Emits completion with comprehensive timing and model info (lines 259-274)

The implementation is consistent with other plugins in this PR.

Also applies to: 226-234, 244-245, 259-274


286-313: LGTM! Correct latency tracking in non-streaming responses.

The non-streaming path correctly calculates latency from the request start time (line 295) and includes it along with the model identifier in the completion event (lines 309-310).

tests/test_metrics_collector.py (3)

96-138: LGTM!

The test comprehensively validates that all LLM response metrics (latency, TTFT, input/output tokens) are recorded with correct values and attributes when all fields are populated.


140-156: LGTM!

Good coverage of the partial data path, ensuring no metrics are recorded when optional fields are absent.


158-399: LGTM!

The remaining tests provide solid coverage for tool calls, STT/TTS events, turn detection, VLM inference, and error handling paths. The base attributes extraction tests correctly validate provider attribution.

plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)

145-156: LGTM!

Request started event emission and timing initialization are correctly placed before the API call.


312-336: LGTM!

Latency calculation and token usage extraction are correctly implemented. The conditional total_tokens computation handles the case where either input or output tokens are present.
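
In other words, the conditional amounts to something like this (a sketch, not the plugin's exact expression):

total_tokens = (
    (input_tokens or 0) + (output_tokens or 0)
    if input_tokens is not None or output_tokens is not None
    else None
)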

agents-core/vision_agents/core/observability/collector.py (2)

1-81: LGTM!

The module docstring, imports, and initialization are well-structured. The event subscription pattern is clean and maintainable.


179-220: LGTM!

LLM event handlers correctly extract attributes and record metrics with appropriate null checks.

examples/03_prometheus_metrics_example/prometheus_metrics_example.py (1)

1-55: LGTM!

The example correctly sets up OpenTelemetry with Prometheus exporter before importing vision_agents, ensuring automatic metric recording. Well-documented with clear setup instructions.
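
A minimal sketch of that ordering, assuming the opentelemetry-sdk, opentelemetry-exporter-prometheus, and prometheus-client packages listed as dependencies elsewhere in this PR (the real example may differ in module paths and details):

from prometheus_client import start_http_server
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Configure the meter provider and scrape endpoint *before* importing
# vision_agents so its meters bind to this provider.
start_http_server(9464)  # Prometheus scrapes http://localhost:9464/metrics
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))

from vision_agents.core.observability import MetricsCollector  # noqa: E402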

plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)

389-426: LGTM!

The updated _standardize_and_emit_event correctly computes and propagates timing metadata for first-chunk detection and TTFT calculation.


199-239: LGTM!

Request started event emission and timing tracking are correctly implemented for the streaming path.

plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (3)

258-266: LGTM!

Request timing is correctly captured and passed to the non-streaming response handler.


282-359: LGTM!

Streaming path correctly tracks first token time, computes TTFT per-chunk, and emits completion events with full timing metrics.


361-407: LGTM!

Non-streaming path correctly handles optional timing and extracts token usage from the response.

plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)

427-467: LGTM!

The _emit_completion_event helper is well-structured with clear documentation and correctly extracts token usage metrics from the response.


151-164: LGTM!

Request started event emission and timing initialization are correctly implemented.

plugins/xai/vision_agents/plugins/xai/llm.py (1)

130-141: LGTM!

The LLMRequestStartedEvent emission and timing initialization are correctly implemented, aligning with the observability pattern across other LLM plugins.

plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (2)

191-212: LGTM!

The first-chunk detection logic correctly uses len(text_chunks) == 0 before appending, ensuring is_first_chunk is only True for the actual first content chunk. Well implemented!
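
The ordering the comment highlights is simply (illustrative):

if content:
    is_first_chunk = len(text_chunks) == 0   # evaluated before the append
    text_chunks.append(content)
    # only this first chunk should carry is_first_chunk=True / time_to_first_token_ms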


123-135: LGTM!

The VLM instrumentation correctly captures frames_count, generates a unique inference_id for correlation, and emits both VLM-specific and LLM-compatible start events.

agents-core/vision_agents/core/observability/__init__.py (2)

1-11: LGTM!

The module docstring provides a clear, actionable usage example demonstrating the opt-in metrics collection pattern with MetricsCollector.


59-105: LGTM!

The __all__ export list is comprehensive and well-organized by category, providing a clean public API surface for the observability module.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)

468-489: LGTM!

The first-chunk detection correctly combines timing availability with len(text_parts) == 1 to ensure is_first_chunk is only True for the actual first content chunk. Good implementation!

agents-core/vision_agents/core/llm/events.py (3)

86-97: LGTM!

The LLMRequestStartedEvent is well-structured with auto-generated request_id, optional model, and streaming flag. Clear docstrings follow the established pattern.
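
For illustration, such an event typically reduces to a small dataclass along these lines (a sketch, not the actual class; only the field names come from the review):

import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RequestStartedSketch:
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    model: Optional[str] = None
    streaming: bool = False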


198-206: LGTM!

VLMInferenceStartEvent provides good correlation capability via inference_id and captures essential context (model, frames_count) for vision-language model observability.


234-247: Exception serialization consideration.

The error: Optional[Exception] field stores an Exception object directly. If events are serialized (e.g., for logging or transport), this could cause issues since Exceptions aren't JSON-serializable by default.

The error_message property already provides a string representation, which mitigates this; just keep the limitation in mind wherever events are serialized.
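
If serialization is ever needed, replacing the exception with its message first sidesteps the problem, e.g. (sketch with hypothetical names):

import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class ErrorEventSketch:
    error: Optional[Exception] = None

    @property
    def error_message(self) -> str:
        return str(self.error) if self.error else ""

event = ErrorEventSketch(error=ValueError("boom"))
payload = {**asdict(event), "error": event.error_message}  # swap Exception for str
print(json.dumps(payload))  # {"error": "boom"}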

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)

230-251: LGTM!

First-chunk detection correctly uses len(text_chunks) == 0 before appending, and timing metrics are properly calculated and propagated. Clean implementation!


310-332: LGTM!

Non-streaming path properly extracts token usage from response.usage and includes all relevant metrics in the completion event.

agents-core/vision_agents/core/observability/metrics.py (3)

1-33: LGTM!

Excellent documentation! The module docstring clearly explains that metric providers are application-controlled, provides a working Prometheus example, and documents the opt-in nature of MetricsCollector.


153-162: Duration metrics as counters - verify intent.

realtime_audio_input_duration_ms and realtime_audio_output_duration_ms are defined as counters, which means they accumulate total duration over time rather than recording individual durations. This is appropriate for tracking cumulative audio processed but won't give you per-request latency distributions.

If per-request distribution is needed, consider histograms. If cumulative tracking is the goal, the current counter approach is correct.
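
For reference, the two shapes in OpenTelemetry terms (a sketch, not the module's actual definitions):

from opentelemetry import metrics

meter = metrics.get_meter("vision_agents_sketch")

# Counter: monotonically accumulates total audio duration processed.
audio_input_duration_total = meter.create_counter(
    "realtime_audio_input_duration_ms", unit="ms"
)
audio_input_duration_total.add(250.0)

# Histogram: records each duration so per-request distributions/percentiles exist.
audio_input_duration_hist = meter.create_histogram(
    "realtime_audio_input_duration_ms_hist", unit="ms"
)
audio_input_duration_hist.record(250.0)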


42-220: LGTM!

Comprehensive and well-organized metrics covering all major components: STT, TTS, LLM, turn detection, realtime, VLM, and video processing. Consistent naming convention and appropriate metric types (histograms for latency, counters for events/tokens).

d3xvn and others added 4 commits January 8, 2026 17:08
- Fix type narrowing issues in openai, xai, anthropic LLM plugins
- Add is_global_banned param to ChannelMember for getstream SDK update
- Remove unused imports and variables
- Add noqa comments for intentional E402 in metrics examples
- Add golf_coach_with_metrics.py example for testing realtime metrics
- Remove direct metrics from TTS, use MetricsCollector exclusively
- Add VideoProcessorDetectionEvent base class for typed detection events
- Add LLMErrorEvent for non-realtime LLM errors
- Fix hasattr usage in _base_attributes to use direct attribute access
- Update roboflow plugin to inherit from VideoProcessorDetectionEvent
@d3xvn d3xvn marked this pull request as ready for review January 9, 2026 10:58
d3xvn added 3 commits January 9, 2026 12:00
- Reset _audio_start_time in ElevenLabs STT error/close handlers
- Fix is_first_chunk logic in xAI LLM to only be true for first chunk
- Format code with ruff

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)

497-507: Don’t log URLs containing auth tokens; also avoid except Exception as e.
The url includes the auth token and is logged (and re-logged on failure). Please redact the token in logs and narrow the exception handling to specific expected failures (e.g., webbrowser.Error, OSError).
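
A minimal redaction sketch using only the standard library (assumes the token travels as a "token" query parameter; adjust to the transport's actual URL shape):

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def redact_token(url: str) -> str:
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() == "token" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# logger.info("Opening browser at %s", redact_token(url))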


69-81: Remove broad except Exception as e in StreamConnection.close (guideline violation).
Either let unexpected exceptions propagate (preferred), or catch only the concrete exception types ConnectionManager.leave() can raise.

Proposed patch
 async def close(self, timeout: float = 2.0):
     try:
         await asyncio.wait_for(self._connection.leave(), timeout=timeout)
     except asyncio.TimeoutError:
         logger.warning("Connection leave timed out during close")
     except RuntimeError as e:
         if "asynchronous generator" in str(e):
             logger.debug(f"Ignoring async generator error during shutdown: {e}")
         else:
             raise
-    except Exception as e:
-        logger.error(f"Error during connection close: {e}")

246-272: Replace hasattr(...) with explicit event-type checks (guideline violation).
This is currently relying on attribute presence; make the branch depend on the union type instead (e.g., isinstance(event, sfu_events.TrackUnpublishedEvent) vs ParticipantLeftEvent).

Proposed patch (shape)
-        # Determine which tracks to remove
-        if hasattr(event.payload, "type") and event.payload is not None:
-            # TrackUnpublishedEvent - single track
-            tracks_to_remove = [event.payload.type]
-            event_desc = "Track unpublished"
-        else:
-            # ParticipantLeftEvent - all published tracks
-            tracks_to_remove = (
-                event.participant.published_tracks if event.participant else None
-            ) or []
-            event_desc = "Participant left"
+        # Determine which tracks to remove
+        if isinstance(event, sfu_events.TrackUnpublishedEvent):
+            tracks_to_remove = [event.payload.type]
+            event_desc = "Track unpublished"
+        else:
+            tracks_to_remove = (event.participant.published_tracks if event.participant else None) or []
+            event_desc = "Participant left"
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (1)

562-585: Tool call follow-up responses lack timing metrics in LLMResponseCompletedEvent.

The follow-up LLMResponseCompletedEvent (line 579) is emitted without latency_ms, time_to_first_token_ms, or model. This creates inconsistent observability data compared to the initial request path.

🔧 Suggested fix to add timing to follow-up events
+            follow_up_start_time = time.perf_counter()
+            follow_up_first_token_time: Optional[float] = None
             async for chunk in follow_up:
                 if not chunk.choices:
                     continue

                 choice = chunk.choices[0]
                 content = choice.delta.content if choice.delta else None
                 finish_reason = choice.finish_reason
                 chunk_id = chunk.id if chunk.id else chunk_id

                 if choice.delta and choice.delta.tool_calls:
                     for tc in choice.delta.tool_calls:
                         self._accumulate_tool_call_chunk(tc)

                 if content:
+                    if follow_up_first_token_time is None:
+                        follow_up_first_token_time = time.perf_counter()
                     text_chunks.append(content)
                     self.events.send(
                         LLMResponseChunkEvent(
                             plugin_name=PLUGIN_NAME,
                             content_index=None,
                             item_id=chunk_id,
                             output_index=0,
                             sequence_number=i,
                             delta=content,
                         )
                     )

                 if finish_reason:
                     if finish_reason == "tool_calls":
                         next_tool_calls = self._finalize_pending_tool_calls()

                     total_text = "".join(text_chunks)
+                    follow_up_latency = (time.perf_counter() - follow_up_start_time) * 1000
+                    follow_up_ttft = None
+                    if follow_up_first_token_time is not None:
+                        follow_up_ttft = (follow_up_first_token_time - follow_up_start_time) * 1000
                     self.events.send(
                         LLMResponseCompletedEvent(
                             plugin_name=PLUGIN_NAME,
                             original=chunk,
                             text=total_text,
                             item_id=chunk_id,
+                            latency_ms=follow_up_latency,
+                            time_to_first_token_ms=follow_up_ttft,
+                            model=self.model,
                         )
                     )
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)

501-520: is_first_chunk logic is incorrect - evaluates True for every chunk after first token.

The condition is_first = first_token_time is not None and request_start_time is not None will be True for every chunk after the first token arrives, not just the first chunk. Similarly, chunk_ttft_ms is recalculated for every chunk.

The is_first_chunk field and time_to_first_token_ms should only be set on the actual first chunk.

🐛 Proposed fix

You need to track whether the first chunk has already been emitted. One approach:

     def _standardize_and_emit_event(
         self,
         event: ResponseStreamEvent,
         request_start_time: Optional[float] = None,
         first_token_time: Optional[float] = None,
+        is_first_chunk_emitted: bool = False,
     ) -> Optional[LLMResponseEvent[OpenAIResponse]]:
         # ...
         
         if event.type == "response.output_text.delta":
             delta_event: ResponseTextDeltaEvent = event
-            # Calculate time to first token for the first chunk
-            is_first = first_token_time is not None and request_start_time is not None
+            # Only the very first chunk should have is_first_chunk=True
+            is_first = not is_first_chunk_emitted and first_token_time is not None
             chunk_ttft_ms: Optional[float] = None
-            if first_token_time is not None and request_start_time is not None:
+            if is_first and first_token_time is not None and request_start_time is not None:
                 chunk_ttft_ms = (first_token_time - request_start_time) * 1000

And track is_first_chunk_emitted in the caller.

plugins/xai/vision_agents/plugins/xai/llm.py (2)

390-393: Follow-up tool call streaming lacks timing parameters.

The call to _standardize_and_emit_chunk at line 391 doesn't pass request_start_time or first_token_time, meaning follow-up responses won't have timing metrics in their chunk events.

🔧 Suggested fix

Track timing for follow-up requests:

+            follow_up_start = time.perf_counter()
+            follow_up_first_token: Optional[float] = None
             async for response, chunk in self.xai_chat.stream():
+                if follow_up_first_token is None and chunk.content:
+                    follow_up_first_token = time.perf_counter()
                 llm_response_optional = self._standardize_and_emit_chunk(
-                    chunk, response
+                    chunk, response,
+                    request_start_time=follow_up_start,
+                    first_token_time=follow_up_first_token,
                 )

449-466: is_first_chunk logic is incorrect - evaluates True for every chunk after first token.

Same issue as in OpenAI plugin. The condition is_first = first_token_time is not None and request_start_time is not None will be True for every chunk after the first token, not just the first chunk.

🐛 Proposed fix

Track whether the first chunk event has been emitted:

     def _standardize_and_emit_chunk(
         self,
         chunk: Chunk,
         response: Response,
         request_start_time: Optional[float] = None,
         first_token_time: Optional[float] = None,
+        first_chunk_emitted: bool = False,
     ) -> Optional[LLMResponseEvent[Response]]:
         # ...
         if chunk.content:
-            is_first = first_token_time is not None and request_start_time is not None
+            is_first = not first_chunk_emitted and first_token_time is not None
             ttft_ms: Optional[float] = None
-            if first_token_time is not None and request_start_time is not None:
+            if is_first and request_start_time is not None:
                 ttft_ms = (first_token_time - request_start_time) * 1000

The caller should track and pass first_chunk_emitted state.

🤖 Fix all issues with AI agents
In @examples/02_golf_coach_example/golf_coach_with_metrics.py:
- Around line 52-54: The MetricsCollector created in join_call is currently
assigned to "_" which allows it to be garbage-collected; instead keep a strong
reference for the call lifetime (e.g., assign it to a persistent attribute on
the Agent like agent._metrics_collector or store it in a call-scoped variable
returned/managed by the call lifecycle) so the MetricsCollector instance remains
alive and its event subscriptions persist; update the join_call function to
replace "_ = MetricsCollector(agent)" with a durable assignment (referencing
MetricsCollector and join_call).
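
In code, the durable assignment the instruction describes is essentially the following sketch (either variant works; the attribute name follows the instruction):

from vision_agents.core.observability import MetricsCollector

def attach_metrics(agent) -> MetricsCollector:
    # Option A: store the collector on the agent so it lives as long as the agent.
    agent._metrics_collector = MetricsCollector(agent)
    return agent._metrics_collector

# Option B: inside join_call, bind it to a local that survives for the call:
#     metrics_collector = MetricsCollector(agent)   # instead of: _ = MetricsCollector(agent)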

In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 467-487: The first-chunk flag is being computed per-stream via
len(text_parts) which causes every new stream (tool calls/finalize) to be
considered first; add an instance-level boolean (e.g.,
self._first_chunk_emitted) to the AnthropicLLM class, initialize it False, set
it to False at the start of each new request in create_message, then in
_standardize_and_emit_event compute is_first only when self._first_chunk_emitted
is False and the candidate conditions hold (first_token_time/request_start_time
present and text_parts length == 1), and after emitting the first-chunk event
set self._first_chunk_emitted = True so subsequent streams do not mark their
first chunk as the global first.
- Around line 384-393: The streaming path currently sends an
LLMResponseCompletedEvent directly (using last_followup_stream/original,
total_text, latency_ms, ttft_ms, model) which omits token usage metrics present
on RawMessageStopEvent message_stop/usage; replace that direct
self.events.send(...) block with a call to the existing _emit_completion_event
helper so streaming responses reuse its logic to extract
input_tokens/output_tokens from RawMessageStopEvent (and safely handle missing
usage), passing the same original/last_followup_stream, total_text, latency_ms,
ttft_ms, and self.model as needed.
🧹 Nitpick comments (9)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)

3-3: Optional import may be unused.

The Optional import from typing doesn't appear to be used directly in this file's DetectionCompletedEvent (the Optional types are in the parent class). Consider removing if not needed, though it's harmless to keep.

examples/02_golf_coach_example/pyproject.toml (1)

14-18: Consider using optional-dependencies for truly optional packages.

The metrics dependencies are marked as "Optional" but are included in the main dependencies array. For better dependency management, consider moving them to a [project.optional-dependencies] section:

[project.optional-dependencies]
metrics = [
  "opentelemetry-api>=1.39.1",
  "opentelemetry-sdk>=1.39.1",
  "opentelemetry-exporter-prometheus>=0.60b1",
  "prometheus-client>=0.23.1",
]

Users could then install with pip install vision-agents-golf[metrics]. Note that opentelemetry-exporter-prometheus is still in pre-release; you may want to lock to a specific stable version once it reaches 1.0.

tests/test_metrics_collector.py (1)

40-92: Test setup uses MagicMock for agent, which conflicts with coding guidelines.

The coding guidelines state "Never mock in tests; use pytest for testing." While the MockMetric is a legitimate test double and monkeypatch is appropriate, using MagicMock for the agent (line 80) violates this guideline.

Consider creating a minimal concrete test fixture or a dedicated test stub class instead of MagicMock.

♻️ Example refactor using a stub class
+class StubAgent:
+    """Minimal agent stub for testing MetricsCollector."""
+    def __init__(self):
+        self.llm = None
+        self.stt = None
+        self.tts = None
+        self.turn_detection = None
+        self.events = None  # Could use a real EventManager if needed
+

 def _create_collector_with_mocks(self, monkeypatch):
     """Create a collector with mocked metrics."""
     # ... metric mocks ...

-    # Create a mock agent
-    agent = MagicMock()
-    agent.llm = None
-    agent.stt = None
-    agent.tts = None
-    agent.turn_detection = None
-    agent.events = MagicMock()
+    # Create a stub agent
+    agent = StubAgent()

     # Create collector but skip event subscription
     collector = object.__new__(MetricsCollector)
     collector.agent = agent
     collector._realtime_session_starts = {}

     return collector, mocks

Based on coding guidelines, mocking should be avoided.

plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)

370-373: Usage of getattr violates coding guidelines.

The coding guidelines state "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python."

♻️ Proposed fix using direct attribute access
         # Extract token usage from response if available
         input_tokens: Optional[int] = None
         output_tokens: Optional[int] = None
-        if final_chunk and hasattr(final_chunk, "usage_metadata") and final_chunk.usage_metadata:
-            usage = final_chunk.usage_metadata
-            input_tokens = getattr(usage, "prompt_token_count", None)
-            output_tokens = getattr(usage, "candidates_token_count", None)
+        if final_chunk and final_chunk.usage_metadata:
+            usage = final_chunk.usage_metadata
+            input_tokens = usage.prompt_token_count if usage.prompt_token_count else None
+            output_tokens = usage.candidates_token_count if usage.candidates_token_count else None

If the attributes might not exist on some SDK versions, wrap in try/except instead of using getattr.

Based on coding guidelines.

plugins/xai/vision_agents/plugins/xai/llm.py (1)

269-306: Extensive use of getattr violates coding guidelines.

Lines 282, 284, 288, 289, 290 use getattr for accessing response and tool call attributes. The coding guidelines state to "Avoid using getattr, hasattr, delattr and setattr."

♻️ Proposed refactor using direct attribute access with try/except
     def _extract_tool_calls_from_response(
         self, response: Response
     ) -> List[NormalizedToolCallItem]:
         calls = []
-        tool_calls = getattr(response, "tool_calls", None) or []
+        tool_calls = response.tool_calls if response.tool_calls else []
         for tc in tool_calls:
-            func = getattr(tc, "function", None)
+            func = tc.function if tc.function else None
             if not func:
                 continue

-            name = getattr(func, "name", "unknown")
-            args_str = getattr(func, "arguments", "{}")
-            call_id = getattr(tc, "id", "") or getattr(tc, "call_id", "")
+            name = func.name if func.name else "unknown"
+            args_str = func.arguments if func.arguments else "{}"
+            call_id = tc.id if tc.id else (tc.call_id if tc.call_id else "")

If attributes may not exist, wrap the entire extraction in try/except for robustness.

Based on coding guidelines.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)

144-144: Remove unused latency calculation.

This latency calculation is immediately overwritten at line 234 (after the tool-calling loop completes). The first calculation captures only the initial response time, while the second captures the full end-to-end latency including tool calls, which is the intended metric.

♻️ Proposed fix
-            latency_ms = (time.perf_counter() - request_start_time) * 1000
-
             # Extract text from Claude's response format - safely handle all text blocks

239-239: Fix indentation of closing parenthesis.

The closing parenthesis has extra indentation inconsistent with the opening call on line 235.

♻️ Proposed fix
                 latency_ms=latency_ms,
-                    )
+            )

397-437: LGTM! Centralized completion event emission with proper token extraction.

The method correctly extracts token usage from Claude responses and emits comprehensive metrics. The conditional logic for total_tokens (lines 431-433) is correct but could be slightly more readable.

♻️ Optional: Simplify total_tokens calculation
-                total_tokens=(input_tokens or 0) + (output_tokens or 0)
-                if input_tokens or output_tokens
-                else None,
+                total_tokens=(
+                    (input_tokens or 0) + (output_tokens or 0)
+                    if (input_tokens is not None or output_tokens is not None)
+                    else None
+                ),

This makes the condition more explicit about checking for None vs. truthiness.


608-611: Use specific exception handling instead of bare except Exception:.

As per coding guidelines, avoid catching Exception broadly. Since you're parsing JSON here, catch json.JSONDecodeError specifically.

♻️ Proposed fix
                 buf = "".join(pending["parts"]).strip() or "{}"
                 try:
                     args = json.loads(buf)
-                except Exception:
+                except json.JSONDecodeError:
                     args = {}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bd409c1 and fe53a52.

⛔ Files ignored due to path filters (1)
  • examples/02_golf_coach_example/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • agents-core/vision_agents/core/events/__init__.py
  • agents-core/vision_agents/core/events/base.py
  • agents-core/vision_agents/core/llm/events.py
  • agents-core/vision_agents/core/observability/collector.py
  • agents-core/vision_agents/core/tts/tts.py
  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • examples/02_golf_coach_example/pyproject.toml
  • examples/03_prometheus_metrics_example/prometheus_metrics_example.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • tests/test_metrics_collector.py
💤 Files with no reviewable changes (1)
  • agents-core/vision_agents/core/tts/tts.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/03_prometheus_metrics_example/prometheus_metrics_example.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/events/base.py
  • agents-core/vision_agents/core/observability/collector.py
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • agents-core/vision_agents/core/events/__init__.py
  • plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • tests/test_metrics_collector.py
  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • agents-core/vision_agents/core/llm/events.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • tests/test_metrics_collector.py
🧬 Code graph analysis (10)
agents-core/vision_agents/core/observability/collector.py (5)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (12)
  • LLMResponseCompletedEvent (125-149)
  • LLMErrorEvent (251-263)
  • ToolEndEvent (163-172)
  • RealtimeConnectedEvent (11-17)
  • RealtimeDisconnectedEvent (21-25)
  • RealtimeAudioInputEvent (29-33)
  • RealtimeAudioOutputEvent (37-42)
  • RealtimeResponseEvent (46-54)
  • RealtimeUserSpeechTranscriptionEvent (176-181)
  • RealtimeAgentSpeechTranscriptionEvent (185-190)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
agents-core/vision_agents/core/stt/events.py (2)
  • STTTranscriptEvent (17-47)
  • STTErrorEvent (81-93)
agents-core/vision_agents/core/tts/events.py (2)
  • TTSSynthesisCompleteEvent (35-45)
  • TTSErrorEvent (49-62)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
plugins/xai/vision_agents/plugins/xai/llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
agents-core/vision_agents/core/events/__init__.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • VideoProcessorDetectionEvent (143-156)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
  • _process_non_streaming_response (292-332)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (4)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
  • _emit_completion_event (397-436)
  • _standardize_and_emit_event (438-494)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
  • _standardize_and_emit_event (413-450)
tests/test_metrics_collector.py (5)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseCompletedEvent (125-149)
  • ToolEndEvent (163-172)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
agents-core/vision_agents/core/stt/events.py (3)
  • STTTranscriptEvent (17-47)
  • STTErrorEvent (81-93)
  • TranscriptResponse (7-13)
agents-core/vision_agents/core/tts/events.py (2)
  • TTSSynthesisCompleteEvent (35-45)
  • TTSErrorEvent (49-62)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
agents-core/vision_agents/core/observability/collector.py (11)
  • MetricsCollector (51-484)
  • _on_llm_response_completed (180-200)
  • _on_tool_end (202-211)
  • _on_stt_transcript (315-327)
  • _on_stt_error (329-337)
  • _on_tts_synthesis_complete (343-357)
  • _on_tts_error (359-367)
  • _on_turn_ended (373-381)
  • _on_vlm_inference_completed (400-425)
  • _on_vlm_error (427-435)
  • _base_attributes (472-484)
examples/02_golf_coach_example/golf_coach_with_metrics.py (2)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/observability/collector.py (1)
  • MetricsCollector (51-484)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
  • LLMResponseChunkEvent (100-121)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (4)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • error_message (92-93)
agents-core/vision_agents/core/edge/sfu_events.py (2)
  • error (935-940)
  • error (1906-1910)
agents-core/vision_agents/core/agents/events.py (1)
  • error_message (65-66)
agents-core/vision_agents/core/tts/events.py (1)
  • error_message (61-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
🔇 Additional comments (55)
plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (1)

4-4: LGTM! Timing measurement implementation is sound.

The latency tracking implementation correctly measures processing time from first audio chunk to transcript delivery:

  • Uses time.perf_counter() for accurate elapsed time measurement
  • Initializes timer on first audio chunk of each utterance (line 123 check prevents mid-utterance restarts)
  • Calculates processing time for each transcript message (partial or final) relative to utterance start
  • Properly resets timer at turn boundaries (lines 251-252) to prepare for next utterance
  • Handles None cases correctly when timing hasn't started

The logic integrates cleanly with the existing event flow and aligns with the observability surface mentioned in the PR objectives.

Also applies to: 84-85, 122-124, 218-229, 251-252
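
Conceptually, the utterance-level timing looks like this sketch (module-level state and names are illustrative; the plugin keeps this on the STT instance):

import time
from typing import Optional

_utterance_start: Optional[float] = None

def on_audio_chunk() -> None:
    global _utterance_start
    if _utterance_start is None:            # first chunk of the utterance only
        _utterance_start = time.perf_counter()

def on_transcript() -> Optional[float]:
    # processing time from first audio chunk to this transcript (partial or final)
    if _utterance_start is None:
        return None
    return (time.perf_counter() - _utterance_start) * 1000

def on_turn_ended() -> None:
    global _utterance_start
    _utterance_start = None                 # reset, ready for the next utterance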

plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)

460-476: No changes needed. The is_global_banned field is valid and supported in getstream-python 2.5.20. The Python SDK uses snake_case field names, so is_global_banned=False is the correct format and will not fail at runtime.

Likely an incorrect or invalid review comment.

agents-core/vision_agents/core/events/__init__.py (2)

9-9: LGTM!

The import of VideoProcessorDetectionEvent is correct and properly exposes the new event type for video processor metrics collection.


131-131: LGTM!

The export is correctly added to make VideoProcessorDetectionEvent part of the public API.

agents-core/vision_agents/core/events/base.py (1)

142-156: LGTM!

The VideoProcessorDetectionEvent class is well-designed with proper typing, sensible defaults, and clear documentation following the Google docstring style. The use of init=False for the type field is appropriate, and the optional fields support flexible metric collection across different video processor plugins.

plugins/roboflow/vision_agents/plugins/roboflow/events.py (2)

17-18: LGTM!

The base class change to VideoProcessorDetectionEvent properly aligns this event with the new metrics collection infrastructure, and the docstring fix improves clarity.


33-35: LGTM with a note on field override behavior.

The __post_init__ method correctly calculates detection_count from the objects list. Note that this will override any detection_count value passed to __init__, which appears intentional to ensure consistency between the objects list and the count.

examples/02_golf_coach_example/golf_coach_with_metrics.py (5)

10-23: LGTM!

The OpenTelemetry configuration is correctly placed before importing vision_agents, ensuring the metrics infrastructure is ready when the agent components are loaded. The Prometheus HTTP server setup on port 9464 follows best practices.


26-38: The noqa: E402 suppressions are justified.

While import order warnings are typically important, the requirement to configure OpenTelemetry before importing vision_agents makes these suppressions necessary and appropriate.


41-49: LGTM!

The create_agent function properly constructs an Agent with the necessary components for the golf coach example with metrics collection.


56-83: LGTM!

The logging output clearly communicates which metrics are being collected, and the call flow properly demonstrates the metrics collection in action. The agent lifecycle (create call, join, respond, finish) is correctly implemented.


86-87: LGTM!

The CLI entry point is properly configured with the AgentLauncher.

tests/test_metrics_collector.py (9)

23-35: Well-structured test double for metrics.

The MockMetric dataclass cleanly captures both record and add calls with their values and attributes. This approach provides good visibility into what metrics were recorded during tests.


94-136: Thorough test coverage for LLM response completed event.

The test verifies all four metrics are recorded with correct values and attributes. Good coverage of the full data path.


138-154: Good edge case coverage for partial data.

Testing that missing optional fields don't trigger metric recording is important for robustness.


156-181: Tool end handler test is comprehensive.

Validates both the tool call count metric and latency recording with proper attribute propagation.


183-231: STT event tests cover both success and error paths well.

Both STTTranscriptEvent and STTErrorEvent handlers are tested with appropriate attribute verification.


233-279: TTS tests verify latency, duration, and character count metrics.

Good coverage including the text length calculation for tts_characters.


281-305: Turn detection metrics test is complete.

Both duration_ms and trailing_silence_ms are verified.


307-374: VLM inference and error tests provide solid coverage.

The VLM inference test verifies inference count, latency, tokens, and frame processing metrics. Error handling test validates error type extraction.


376-397: Base attributes helper tests cover both present and missing plugin_name cases.

Simple but important edge case coverage.

plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (4)

162-172: Request timing instrumentation looks correct.

The LLMRequestStartedEvent emission and request_start_time tracking using time.perf_counter() follows the pattern established in other LLM plugins.


228-248: First token timing and chunk event emission is well implemented.

The logic correctly tracks first_token_time on first content arrival and computes time_to_first_token_ms for the first chunk only. The is_first_chunk flag is derived from len(text_chunks) == 0.


261-276: Streaming completion event includes proper timing metrics.

Latency and TTFT are correctly calculated and included in LLMResponseCompletedEvent.


297-314: Non-streaming path correctly computes and emits latency.

The latency_ms calculation and event emission are consistent with the streaming path.

agents-core/vision_agents/core/observability/collector.py (8)

1-14: Clear module docstring with usage example.

The documentation clearly explains the purpose and provides a practical usage example.


62-71: Constructor initializes state correctly and triggers event subscription.

The session tracking dictionary and subscription call are properly initialized.


248-276: Audio input/output handlers correctly calculate byte sizes.

The * 2 multiplication for 16-bit samples is appropriate for calculating audio bytes.


315-327: STT transcript handler properly accesses properties from the event.

The handler correctly uses event.model_name, event.language, event.processing_time_ms, and event.audio_duration_ms which are defined as properties on STTTranscriptEvent.


343-357: TTS synthesis handler correctly records all three metrics.

Latency, audio duration, and character count are all properly handled with null checks.


441-466: Video processor event subscription is unconditional.

Unlike other subscriptions (lines 83-86, 141-144, etc.) which check for component existence before subscribing, _subscribe_to_processor_events always subscribes to self.agent.events. This is intentional since processors emit through the agent's event system, not through a specific component.


472-484: Base attributes helper is clean and simple.

The helper correctly extracts the provider from plugin_name with appropriate null checking.


227-246: session_id attribute exists on RealtimeConnectedEvent and RealtimeDisconnectedEvent through inheritance.

The event classes inherit from PluginBaseEvent, which extends BaseEvent. BaseEvent (defined in agents-core/vision_agents/core/events/base.py at line 40) declares session_id: Optional[str] = None. Therefore, both RealtimeConnectedEvent and RealtimeDisconnectedEvent have access to the session_id attribute via inheritance. Accessing event.session_id at lines 232 and 243 is valid.

Likely an incorrect or invalid review comment.

plugins/openai/vision_agents/plugins/openai/openai_llm.py (5)

151-164: Request started event and timing initialization are correct.

The LLMRequestStartedEvent emission and timing setup follow the established pattern across plugins.


173-189: Non-streaming path correctly calculates latency and emits completion event.

The latency calculation and delegation to _emit_completion_event are properly implemented.


199-207: First token time is tracked correctly for streaming.

The check for first_token_time is None and the event type ensures we only capture the first text delta timing.


427-467: Well-factored completion event emission helper.

The _emit_completion_event method cleanly extracts token usage and emits the event with all metrics. Good separation of concerns.


523-544: Completion event timing metrics are correctly calculated.

The latency and TTFT calculations in the response.completed handler are correct.

plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (5)

223-234: Request started event and timing initialization are correctly implemented.

Follows the same pattern as other LLM plugins.


251-263: First token detection and timing propagation to standardize method are correct.

The check for chunk.text is appropriate for Gemini's response structure.


331-341: Follow-up tool call responses reuse original request timing.

The timing passed to _standardize_and_emit_event during tool call follow-ups uses the original request_start_time and first_token_time. This may be intentional to measure total elapsed time, but note that per-turn timing would require fresh start times.


413-450: Standardize and emit event correctly handles first chunk detection.

The is_first = len(text_parts) == 0 check correctly identifies the first chunk before appending, unlike some other plugins.


375-390: Completion event includes all timing and token metrics.

The final LLMResponseCompletedEvent is properly populated with latency, TTFT, tokens, and model info.

plugins/xai/vision_agents/plugins/xai/llm.py (3)

130-141: Request started event and timing initialization are correct.

Follows the established pattern across plugins.


151-162: First token tracking and timing parameter passing are correct.

The streaming path properly captures first token time and passes timing to the chunk handler.


200-216: Completion event with timing metrics is properly emitted.

Latency and TTFT are correctly calculated and included in the event.

agents-core/vision_agents/core/llm/events.py (5)

86-96: Well-defined LLMRequestStartedEvent with auto-generated request_id.

The event captures all necessary context for request tracking with appropriate defaults.


117-121: LLMResponseChunkEvent extended with first-chunk timing fields.

The new is_first_chunk and time_to_first_token_ms fields enable streaming latency observability.


124-150: LLMResponseCompletedEvent comprehensively extended with timing and token metrics.

All timing (latency_ms, time_to_first_token_ms) and token usage fields (input_tokens, output_tokens, total_tokens, model) are properly typed as Optional with clear docstrings.


198-231: VLM events mirror LLM event structure with vision-specific metrics.

VLMInferenceStartEvent and VLMInferenceCompletedEvent appropriately include frames_processed and detections for vision model observability.


234-263: VLMErrorEvent and LLMErrorEvent follow consistent error event patterns.

Both events include error, error_code, context, is_recoverable, and the error_message property, consistent with other error events in the codebase (e.g., RealtimeErrorEvent, STTErrorEvent, TTSErrorEvent).

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)

3-3: LGTM! Imports support the new metrics functionality.

The addition of time for performance timing and LLMRequestStartedEvent for request lifecycle tracking aligns with the PR's observability objectives.

Also applies to: 19-19


127-140: LGTM! Request event emission and timing initialization are correctly placed.

Emitting LLMRequestStartedEvent before the API call and initializing timing variables with perf_counter() provides accurate latency tracking.


248-252: LGTM! First token tracking is correctly implemented.

The guard condition if first_token_time is None ensures the timestamp captures the very first token received, even across multiple streaming rounds.


438-444: LGTM! Signature update properly documents timing parameters.

The optional timing parameters maintain backward compatibility while enabling first-chunk detection and TTFT metrics.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)

504-524: Fix the is_first_chunk logic—it flags every chunk as first.

The condition at line 507 will be True for all chunks after first_token_time is set (line 204), not just the first chunk. This causes every chunk to report is_first_chunk=True and the same time_to_first_token_ms, corrupting TTFT metrics.

🐛 Proposed fix to track first chunk correctly

Track whether the first chunk has already been emitted:

     def _standardize_and_emit_event(
         self,
         event: ResponseStreamEvent,
         request_start_time: Optional[float] = None,
         first_token_time: Optional[float] = None,
+        first_chunk_emitted: bool = False,
     ) -> Optional[LLMResponseEvent[OpenAIResponse]]:
         """Forward native events and emit standardized versions.
 
         Args:
             event: The streaming event from OpenAI.
             request_start_time: Time when the request started (perf_counter).
             first_token_time: Time when first token was received (perf_counter).
+            first_chunk_emitted: Whether the first chunk has already been emitted.
 
         Returns:
             LLMResponseEvent if this is a completion event, None otherwise.
         """

Then update the chunk handling logic:

         if event.type == "response.output_text.delta":
             delta_event: ResponseTextDeltaEvent = event
             # Calculate time to first token for the first chunk
-            is_first = first_token_time is not None and request_start_time is not None
+            is_first = (
+                not first_chunk_emitted
+                and first_token_time is not None
+                and request_start_time is not None
+            )
             chunk_ttft_ms: Optional[float] = None
-            if first_token_time is not None and request_start_time is not None:
+            if is_first:
                 chunk_ttft_ms = (first_token_time - request_start_time) * 1000
 
             self.events.send(
                 LLMResponseChunkEvent(
                     plugin_name="openai",
                     content_index=None,
                     item_id=delta_event.item_id,
                     output_index=delta_event.output_index,
                     sequence_number=delta_event.sequence_number,
                     delta=delta_event.delta,
                     is_first_chunk=is_first,
                     time_to_first_token_ms=chunk_ttft_ms,
                 )
             )
             return None

And update the call site in create_response:

+            first_chunk_emitted = False
             # Process streaming events and collect tool calls
             async for event in stream_response:
                 # Track time to first token
                 if (
                     first_token_time is None
                     and event.type == "response.output_text.delta"
                 ):
                     first_token_time = time.perf_counter()
 
                 llm_response_optional = self._standardize_and_emit_event(
                     event,
                     request_start_time=request_start_time,
                     first_token_time=first_token_time,
+                    first_chunk_emitted=first_chunk_emitted,
                 )
+                if event.type == "response.output_text.delta" and first_token_time is not None:
+                    first_chunk_emitted = True

339-340: Tool call follow-up responses lack timing metrics.

The call to _standardize_and_emit_event at line 340 omits request_start_time and first_token_time parameters, so follow-up responses from tool execution won't emit latency or TTFT metrics. This creates an incomplete observability surface for multi-turn tool calling scenarios.

📊 Proposed fix to add timing for follow-ups

Track timing for follow-up requests as well:

     async def _send_tool_results_and_get_response(
         self,
         tool_messages: list[dict[str, Any]],
         seen: set[tuple[str, str]],
     ) -> tuple[
         Optional[LLMResponseEvent[OpenAIResponse]], List[NormalizedToolCallItem]
     ]:
         """Send tool results and get follow-up response.
 
         Returns:
             Tuple of (llm_response, next_tool_calls)
         """
         if not self.openai_conversation:
             return None, []
 
         follow_up_kwargs: Dict[str, Any] = {
             "model": self.model,
             "conversation": self.openai_conversation.id,
             "input": tool_messages,
             "stream": True,
         }
 
         # Include tools for potential follow-up calls
         tools = self.get_available_functions()
         if tools:
             follow_up_kwargs["tools"] = convert_tools_to_openai_format(tools)
 
+        # Track timing for follow-up request
+        followup_request_start = time.perf_counter()
+        followup_first_token: Optional[float] = None
+
         follow_up_response = await self.client.responses.create(**follow_up_kwargs)
 
         if isinstance(follow_up_response, OpenAIResponse):
             llm_response = LLMResponseEvent[OpenAIResponse](
                 follow_up_response, follow_up_response.output_text
             )
             next_tool_calls = self._extract_tool_calls_from_response(follow_up_response)
             return llm_response, next_tool_calls
 
         # Streaming response
         llm_response_streaming: Optional[LLMResponseEvent[OpenAIResponse]] = None
         pending_tool_calls: List[NormalizedToolCallItem] = []
 
         async for event in follow_up_response:
-            llm_response_optional = self._standardize_and_emit_event(event)
+            # Track first token for follow-up
+            if (
+                followup_first_token is None
+                and event.type == "response.output_text.delta"
+            ):
+                followup_first_token = time.perf_counter()
+
+            llm_response_optional = self._standardize_and_emit_event(
+                event,
+                request_start_time=followup_request_start,
+                first_token_time=followup_first_token,
+            )
             if llm_response_optional is not None:
                 llm_response_streaming = llm_response_optional
 
             if event.type == "response.completed":
                 calls = self._extract_tool_calls_from_response(event.response)
                 for c in calls:
                     key = tool_call_dedup_key(c)
                     if key not in seen:
                         pending_tool_calls.append(c)
                         seen.add(key)
 
         return llm_response_streaming, pending_tool_calls
plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)

357-364: Avoid catching generic Exception.

Per coding guidelines, use specific exception types rather than bare except Exception as e. This broad catch could mask unrelated bugs (e.g., KeyboardInterrupt won't be caught, but TypeError, AttributeError, etc. will be silently wrapped as "unexpected errors").

Consider catching specific exceptions like json.JSONDecodeError, TypeError, or AWS-specific exceptions.

plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)

620-627: Missing timing metrics in tool call follow-up responses.

The LLMResponseCompletedEvent emitted during tool call follow-up (lines 620-627) lacks latency_ms, time_to_first_token_ms, and model fields that are present in the main streaming path (lines 344-354). This creates inconsistent metric data.

Consider tracking timing for follow-up requests as well, or document that follow-up metrics are intentionally omitted.

plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (2)

414-438: Replace except Exception as e with specific exception types (guideline violation) and avoid swallowing asyncio.CancelledError.

Line 434 violates the repo guideline to never use broad except Exception as e handlers. Additionally, asyncio.CancelledError should not be swallowed during cleanup operations.

Lines 218 and 384 also use except Exception as e and should be updated for consistency.

Proposed fix for line 434 (narrower catches + re-raise cancellation)
         if self.connection:
             try:
                 await self.connection.close()
-            except Exception as e:
-                logger.warning(f"Error closing Elevenlabs connection: {e}")
+            except asyncio.CancelledError:
+                raise
+            except (OSError, RuntimeError, ValueError) as e:
+                logger.warning(f"Error closing Elevenlabs connection: {e}")
             finally:
                 self.connection = None
                 self._connection_ready.clear()

299-336: Move state cleanup before returning from empty transcripts.

ElevenLabs emits COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS events with empty/whitespace text when using include_timestamps=true or with timing-only segments — these are expected end-of-speech signals. The current early return on empty transcript text skips both _commit_received.set() and _audio_start_time = None, causing clear() to hang until the 10s timeout and polluting the next utterance's processing_time_ms calculation with stale timestamps.

Proposed fix
-        if not transcript_text:
-            return
+        if not transcript_text:
+            # Still treat as "commit received" to unblock clear()
+            self._audio_start_time = None
+            self._commit_received.set()
+            return
🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 348-349: The truthiness check skips valid zero values for
event.synthesis_time_ms; change the conditional in the collector (where
event.synthesis_time_ms is checked before calling metrics.tts_latency_ms.record)
to use an explicit None check (i.e., test event.synthesis_time_ms is not None)
so that 0.0 is recorded, then call
metrics.tts_latency_ms.record(event.synthesis_time_ms, attrs) as before.
- Around line 227-246: The handlers _on_realtime_connected and
_on_realtime_disconnected reference event.session_id which doesn't exist on
RealtimeConnectedEvent/RealtimeDisconnectedEvent; update the event model or
generate a session key: either add an optional session_id field to the
RealtimeConnectedEvent/RealtimeDisconnectedEvent dataclasses in
agents-core/vision_agents/core/llm/events.py and keep using
self._realtime_session_starts[event.session_id], or change the handlers to
compute a deterministic session id (e.g.,
f"{event.provider}:{int(time.time()*1000)}" or similar derived key) when storing
and looking up entries in self._realtime_session_starts so no AttributeError
occurs; ensure the same session id generation logic is used in both
_on_realtime_connected and _on_realtime_disconnected and that
_realtime_session_starts is keyed accordingly.

In @plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py:
- Around line 94-95: Ensure _audio_start_time cannot persist across utterances
by initializing it at the very start of any audio-processing entrypoint (e.g.,
functions like transcribe, process_stream, handle_audio_chunk) and always
clearing it in a finally block so every early return or exception resets it;
wrap the processing logic in try/finally where you set self._audio_start_time =
time.time() (or None for tests) at entry and set self._audio_start_time = None
in the finally, and also clear it when you explicitly detect end-of-utterance or
final transcript emission.
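
Following the last item, a minimal sketch of the try/finally reset, assuming a single audio-processing entrypoint; the class and helper names are illustrative, not the plugin's exact API.

import asyncio
import time
from typing import Optional


class UtteranceTimer:
    """Sketch of keeping the per-utterance start time from leaking across utterances."""

    def __init__(self) -> None:
        self._audio_start_time: Optional[float] = None

    async def process_utterance(self, pcm_bytes: bytes) -> None:
        self._audio_start_time = time.perf_counter()
        try:
            await self._send_audio(pcm_bytes)
        finally:
            # Every exit path (early return, exception, normal completion) resets
            # the timer, so the next utterance never inherits stale state.
            self._audio_start_time = None

    async def _send_audio(self, pcm_bytes: bytes) -> None:
        # Placeholder for the real streaming send.
        await asyncio.sleep(0)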
🧹 Nitpick comments (13)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)

252-254: Prefer direct attribute access over hasattr.

The coding guidelines specify avoiding hasattr in favor of normal attribute access or try/except blocks.

♻️ Proposed refactor using try/except
-            # Track time to first token
-            if first_token_time is None and hasattr(chunk, "text") and chunk.text:
-                first_token_time = time.perf_counter()
+            # Track time to first token
+            if first_token_time is None:
+                try:
+                    if chunk.text:
+                        first_token_time = time.perf_counter()
+                except AttributeError:
+                    pass

Based on coding guidelines.


367-377: Prefer direct attribute access over hasattr/getattr.

The coding guidelines specify avoiding hasattr and getattr in favor of normal attribute access or try/except blocks.

♻️ Proposed refactor using try/except
-        # Extract token usage from response if available
-        input_tokens: Optional[int] = None
-        output_tokens: Optional[int] = None
-        if (
-            final_chunk
-            and hasattr(final_chunk, "usage_metadata")
-            and final_chunk.usage_metadata
-        ):
-            usage = final_chunk.usage_metadata
-            input_tokens = getattr(usage, "prompt_token_count", None)
-            output_tokens = getattr(usage, "candidates_token_count", None)
+        # Extract token usage from response if available
+        input_tokens: Optional[int] = None
+        output_tokens: Optional[int] = None
+        if final_chunk:
+            try:
+                usage = final_chunk.usage_metadata
+                if usage:
+                    input_tokens = usage.prompt_token_count
+                    output_tokens = usage.candidates_token_count
+            except AttributeError:
+                pass

Based on coding guidelines.

plugins/xai/vision_agents/plugins/xai/llm.py (1)

151-162: Consider more explicit truthiness check on Line 153.

The truthiness check on chunk.content treats empty strings as falsy, which appears intentional so that only meaningful tokens capture the first-token time. However, being explicit improves clarity:

-                if first_token_time is None and chunk.content:
+                if first_token_time is None and chunk.content is not None and chunk.content != "":

Or if empty strings should also trigger timing:

-                if first_token_time is None and chunk.content:
+                if first_token_time is None and chunk.content is not None:
plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)

33-35: Defer the super().__post_init__() call until a parent class defines one.

The logic correctly sets detection_count from the objects list. Parent classes don't currently define __post_init__, so an unconditional super().__post_init__() call would raise AttributeError at runtime. If a parent class later adds initialization logic, introduce the call at that point to keep the chain intact.

♻️ Where the call would go once a parent defines __post_init__
 def __post_init__(self):
     """Set detection_count from objects list."""
+    # super().__post_init__()  # enable once a parent class defines __post_init__
     self.detection_count = len(self.objects)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)

673-676: Avoid catching generic Exception in JSON parsing.

Replace with json.JSONDecodeError (or its parent ValueError) for more precise exception handling per coding guidelines.

Suggested fix
             try:
                 args = json.loads(buf)
-            except Exception:
+            except json.JSONDecodeError:
                 args = {}
tests/test_metrics_collector.py (1)

91-96: Creative but fragile: bypassing __init__ with object.__new__.

Using object.__new__(MetricsCollector) to skip event subscription is clever, but future changes to MetricsCollector (e.g., adding required initialization) could silently break tests. Consider documenting why this approach is used or adding a comment.

Suggested documentation
         # Create collector but skip event subscription
+        # We bypass __init__ to avoid triggering event subscriptions,
+        # which would require a real agent with event managers.
         collector = object.__new__(MetricsCollector)
agents-core/vision_agents/core/observability/metrics.py (1)

153-161: Consider using histograms for audio duration metrics.

realtime_audio_input_duration_ms and realtime_audio_output_duration_ms are defined as counters, which only track cumulative totals. Histograms would provide percentile distribution, enabling better latency analysis (e.g., P50, P99 audio chunk durations).

However, if the intent is to track total audio time per session rather than per-chunk distribution, counters are appropriate. This is a design choice worth documenting.
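
If per-chunk distributions turn out to be the goal, the same meter API can declare a histogram instead; the meter scope name below is illustrative and the metric name simply mirrors the existing counter.

from opentelemetry import metrics

meter = metrics.get_meter("vision_agents.observability")  # illustrative scope name

# Histogram variant: each recorded chunk duration contributes to a distribution,
# so P50/P99 can be derived, rather than only a running total.
realtime_audio_input_duration_ms = meter.create_histogram(
    name="realtime_audio_input_duration_ms",
    unit="ms",
    description="Duration of each realtime audio input chunk",
)

# Usage mirrors the latency histograms elsewhere:
# realtime_audio_input_duration_ms.record(chunk_duration_ms, {"provider": "gemini"})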

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)

144-144: Consider removing the early latency calculation.

The latency is calculated at line 144 but then recalculated at line 234 after tool calling loops complete. The first calculation doesn't include tool execution time and is effectively unused. Consider removing line 144 to avoid confusion, since the meaningful measurement happens at line 234.

♻️ Optional cleanup

Remove the early calculation:

-        if isinstance(original, ClaudeMessage):
-            latency_ms = (time.perf_counter() - request_start_time) * 1000
-
+        if isinstance(original, ClaudeMessage):
             # Extract text from Claude's response format - safely handle all text blocks

The calculation at line 234 correctly captures the full end-to-end latency.

Also applies to: 234-239


248-252: Consider replacing hasattr/getattr with try/except blocks.

Lines 250-251 use hasattr and getattr, which the coding guidelines ask us to avoid. While checking external API objects defensively is reasonable, try/except AttributeError would be more idiomatic.

As per coding guidelines.

♻️ Refactor to use try/except
                 # Track time to first token
-                if first_token_time is None and event.type == "content_block_delta":
-                    delta = getattr(event, "delta", None)
-                    if delta is not None and hasattr(delta, "text") and delta.text:
-                        first_token_time = time.perf_counter()
+                if first_token_time is None and event.type == "content_block_delta":
+                    try:
+                        if event.delta and event.delta.text:
+                            first_token_time = time.perf_counter()
+                    except AttributeError:
+                        pass

397-436: Consider replacing hasattr with try/except for attribute checks.

The new helper method is well-structured, but uses hasattr at lines 416-418 and 420, which violates the coding guidelines. Since this is new code, it's a good opportunity to use the preferred try/except pattern.

Additionally, the ternary expression at lines 431-433 is somewhat complex and could be simplified for readability.

As per coding guidelines.

♻️ Refactor to use try/except
-        # Extract token usage from response
-        input_tokens: Optional[int] = None
-        output_tokens: Optional[int] = None
-
-        if hasattr(response, "usage") and response.usage:
-            input_tokens = response.usage.input_tokens
-            output_tokens = response.usage.output_tokens
-
-        model = response.model if hasattr(response, "model") else self.model
+        # Extract token usage from response
+        input_tokens: Optional[int] = None
+        output_tokens: Optional[int] = None
+
+        try:
+            if response.usage:
+                input_tokens = response.usage.input_tokens
+                output_tokens = response.usage.output_tokens
+        except AttributeError:
+            pass
+
+        try:
+            model = response.model
+        except AttributeError:
+            model = self.model
+
+        # Simplify total_tokens calculation
+        total_tokens = None
+        if input_tokens is not None or output_tokens is not None:
+            total_tokens = (input_tokens or 0) + (output_tokens or 0)

467-476: Consider simplifying the first-chunk detection logic.

Lines 468-472 compute is_first with three conditions, which is correct but could be clearer. Since text_parts.append() happens at line 465, checking len(text_parts) == 1 identifies the first text chunk. The request_start_time is not None check is redundant if first_token_time is not None (first token can only be captured if request started).

♻️ Optional simplification
-                # Check if this is the first text chunk
-                is_first = (
-                    first_token_time is not None
-                    and request_start_time is not None
-                    and len(text_parts) == 1
-                )
+                # Check if this is the first text chunk (first_token_time is only set once)
+                is_first = first_token_time is not None and len(text_parts) == 1

Or add a clarifying comment explaining that all three conditions together verify this is the very first chunk with timing data captured.

plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (2)

129-136: Initialize _audio_start_time only when audio is actually enqueued (minor accuracy improvement).
Right now it’s set even if _audio_queue is None (unlikely given _connection_ready, but still a correctness footgun if flow changes).

Proposed tweak
-        # Track start time for first audio chunk of a new utterance
-        if self._audio_start_time is None:
-            self._audio_start_time = time.perf_counter()
-
-        # Add to audio queue for batching
-        if self._audio_queue is not None:
-            await self._audio_queue.put(resampled_pcm)
+        # Add to audio queue for batching
+        if self._audio_queue is not None:
+            # Track start time for first audio chunk of a new utterance
+            if self._audio_start_time is None:
+                self._audio_start_time = time.perf_counter()
+            await self._audio_queue.put(resampled_pcm)

396-413: clear() resets are good; consider catching asyncio.TimeoutError explicitly.
asyncio.wait_for(...) raises asyncio.TimeoutError (an alias of the builtin TimeoutError since Python 3.11); catching it by name makes the intent clear.
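
A minimal sketch of the explicit catch, assuming clear() waits on the commit event with the 10s budget described above; the class body is illustrative, not the plugin's exact implementation.

import asyncio
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class CommitWaiter:
    """Sketch of an explicit timeout catch around the commit wait."""

    def __init__(self) -> None:
        self._commit_received = asyncio.Event()
        self._audio_start_time: Optional[float] = None

    async def clear(self) -> None:
        try:
            await asyncio.wait_for(self._commit_received.wait(), timeout=10.0)
        except asyncio.TimeoutError:
            # Alias of the builtin TimeoutError since Python 3.11.
            logger.warning("Timed out waiting for the ElevenLabs commit event")
        finally:
            self._commit_received.clear()
            self._audio_start_time = None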

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fe53a52 and 58e87c3.

📒 Files selected for processing (12)
  • agents-core/vision_agents/core/observability/collector.py
  • agents-core/vision_agents/core/observability/metrics.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • plugins/aws/vision_agents/plugins/aws/aws_llm.py
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • tests/test_metrics_collector.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/observability/collector.py
  • tests/test_metrics_collector.py
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
  • plugins/roboflow/vision_agents/plugins/roboflow/events.py
  • plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
  • plugins/aws/vision_agents/plugins/aws/aws_llm.py
  • plugins/xai/vision_agents/plugins/xai/llm.py
  • plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
  • agents-core/vision_agents/core/observability/metrics.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • tests/test_metrics_collector.py
🧬 Code graph analysis (7)
agents-core/vision_agents/core/observability/collector.py (9)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
  • events (213-216)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
  • events (208-211)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (13)
  • LLMResponseCompletedEvent (125-149)
  • LLMErrorEvent (251-263)
  • ToolEndEvent (163-172)
  • RealtimeErrorEvent (72-83)
  • RealtimeConnectedEvent (11-17)
  • RealtimeDisconnectedEvent (21-25)
  • RealtimeAudioInputEvent (29-33)
  • RealtimeAudioOutputEvent (37-42)
  • RealtimeResponseEvent (46-54)
  • RealtimeUserSpeechTranscriptionEvent (176-181)
  • RealtimeAgentSpeechTranscriptionEvent (185-190)
  • VLMInferenceCompletedEvent (209-231)
  • VLMErrorEvent (235-247)
agents-core/vision_agents/core/stt/events.py (2)
  • STTTranscriptEvent (17-47)
  • STTErrorEvent (81-93)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
  • on_stt_transcript (121-122)
  • _on_stt_transcript (270-275)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
  • on_stt_transcript (229-230)
  • _on_stt_transcript (396-401)
tests/test_metrics_collector.py (2)
  • record (34-35)
  • add (37-38)
tests/test_metrics_collector.py (1)
agents-core/vision_agents/core/observability/collector.py (11)
  • MetricsCollector (51-483)
  • _on_llm_response_completed (180-200)
  • _on_tool_end (202-211)
  • _on_stt_transcript (315-327)
  • _on_stt_error (329-337)
  • _on_tts_synthesis_complete (343-357)
  • _on_tts_error (359-367)
  • _on_turn_ended (373-381)
  • _on_vlm_inference_completed (400-423)
  • _on_vlm_error (425-433)
  • _base_attributes (471-483)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
agents-core/vision_agents/core/stt/events.py (9)
  • processing_time_ms (38-39)
  • processing_time_ms (68-69)
  • TranscriptResponse (7-13)
  • confidence (30-31)
  • confidence (60-61)
  • language (34-35)
  • language (64-65)
  • model_name (46-47)
  • model_name (76-77)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (3)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
  • events (213-216)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
  • events (208-211)
agents-core/vision_agents/core/events/base.py (1)
  • VideoProcessorDetectionEvent (143-156)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
plugins/xai/vision_agents/plugins/xai/llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMRequestStartedEvent (87-96)
  • LLMResponseCompletedEvent (125-149)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)
agents-core/vision_agents/core/llm/events.py (1)
  • LLMResponseCompletedEvent (125-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
🔇 Additional comments (45)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (10)

1-21: LGTM: Timing and event imports are appropriate.

The addition of time for performance measurement and LLMRequestStartedEvent for lifecycle tracking aligns with the PR objectives to introduce observability metrics.


223-230: LGTM: Request lifecycle event properly positioned.

The LLMRequestStartedEvent is emitted at the correct point before streaming begins, capturing the essential metadata (plugin, model, streaming mode).


232-234: LGTM: Correct timing primitives selected.

Using time.perf_counter() is the appropriate choice for measuring elapsed time, as it provides a high-resolution monotonic clock suitable for performance measurements.


256-263: LGTM: Timing propagation to chunk emission.

The timing parameters are correctly threaded through to _standardize_and_emit_event, enabling per-chunk latency calculations.


334-341: LGTM: Timing consistency maintained through tool calls.

The timing references are correctly preserved across the multi-hop tool execution loop, ensuring latency metrics remain accurate.


361-365: LGTM: Accurate latency calculations.

The timing metrics are correctly computed by taking the difference between perf_counter() values and converting to milliseconds, with proper handling of the optional first-token time.


379-394: LGTM: Comprehensive completion event metrics.

The LLMResponseCompletedEvent now includes all essential observability data—latency, time-to-first-token, token counts, and model—with appropriate handling of optional values.


423-424: LGTM: Method signature properly extended.

The optional timing parameters maintain backward compatibility while enabling the new metrics functionality.


436-444: LGTM: First-chunk timing correctly computed.

The logic accurately identifies the initial text chunk and calculates time-to-first-token only when appropriate, ensuring metrics are meaningful.


446-455: LGTM: Chunk events enriched with timing metadata.

The LLMResponseChunkEvent emissions now carry first-chunk identification and time-to-first-token metrics, providing granular observability into streaming performance.

plugins/xai/vision_agents/plugins/xai/llm.py (7)

2-2: LGTM: Imports aligned with timing instrumentation.

The additions are correctly used for event emission and timing metrics.

Also applies to: 11-11


130-141: LGTM: Request timing properly initialized.

The event emission and timing setup correctly capture request start before obtaining the response.


204-219: LGTM: Timing metrics correctly calculated and emitted.

The latency and time-to-first-token calculations properly handle both streaming and non-streaming paths.


438-445: LGTM: Signature extension maintains backward compatibility.

The additional timing parameters have sensible defaults and clear type hints.


454-474: LGTM: First chunk timing correctly computed and emitted.

The time-to-first-token calculation is properly guarded and only applied to the initial content chunk.


393-396: Verify: Tool call follow-ups omit timing instrumentation.

The call to _standardize_and_emit_chunk within the tool call handling loop doesn't pass timing parameters, unlike the main request path (lines 157-163). If tool call follow-ups should be timed as separate requests, consider adding timing instrumentation similar to the main path.

If separate timing is desired for each tool call round:

             stream = original_kwargs.get("stream", True)
             if stream:
                 llm_response = None
                 pending_tool_calls = []
+                
+                # Track timing for tool call follow-up
+                tool_request_start = time.perf_counter()
+                tool_first_token: Optional[float] = None

                 async for response, chunk in self.xai_chat.stream():
+                    tool_is_first = False
+                    if tool_first_token is None and chunk.content:
+                        tool_first_token = time.perf_counter()
+                        tool_is_first = True
+                        
                     llm_response_optional = self._standardize_and_emit_chunk(
-                        chunk, response
+                        chunk, 
+                        response,
+                        request_start_time=tool_request_start,
+                        first_token_time=tool_first_token,
+                        is_first_chunk=tool_is_first,
                     )

480-481: LGTM: Clarifying comment aids understanding.

The note explains why the completion event isn't emitted within the chunk handler, improving code maintainability.

plugins/roboflow/vision_agents/plugins/roboflow/events.py (2)

5-5: LGTM! Event hierarchy upgrade supports metrics collection.

The switch to VideoProcessorDetectionEvent aligns with the PR's observability objectives, providing a standardized event surface for video processor plugins.


17-18: LGTM! Proper inheritance from VideoProcessorDetectionEvent.

The class correctly inherits from the new base event type, gaining the metrics-friendly fields (model_id, inference_time_ms, detection_count) while maintaining its specific event type identifier.

plugins/openai/vision_agents/plugins/openai/openai_llm.py (6)

2-2: LGTM! Clean import additions.

The time module and LLMRequestStartedEvent imports support the new timing instrumentation.

Also applies to: 16-16


151-164: LGTM! Solid timing foundation.

The request start event and timing setup using perf_counter() are implemented correctly.


173-189: LGTM! Non-streaming metrics correctly computed.

Latency calculation is accurate. Not tracking TTFT for non-streaming responses is a reasonable design choice since the entire response arrives atomically.


199-210: LGTM! First token detection implemented correctly.

The logic properly captures the first delta event's timing and propagates it through the streaming pipeline.


430-470: LGTM! Centralized completion event emission.

The method cleanly extracts token metrics and emits a comprehensive completion event. The docstring follows Google style guidelines.


526-547: LGTM! Completion timing metrics correctly computed.

The latency and TTFT calculations are accurate and properly guarded against None values.

plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)

145-156: LGTM: Timing instrumentation and token extraction.

The request timing via time.perf_counter() is appropriate for measuring latency. Token extraction from the AWS response usage dict is correctly conditional, and the total_tokens computation handles the case where only one of input_tokens or output_tokens is present.

Also applies to: 312-339


395-406: LGTM: Streaming path timing instrumentation.

First-token tracking correctly captures the time when the first text delta appears in the stream. The ttft_ms calculation is properly guarded by a None check on first_token_time.

Also applies to: 453-458, 596-611

agents-core/vision_agents/core/observability/collector.py (3)

51-71: LGTM: Clean collector initialization pattern.

The constructor properly initializes session tracking state and delegates to _subscribe_to_events for modular subscription setup. The conditional checks for component existence (e.g., if not self.agent.llm) prevent subscription errors when components are not configured.


180-221: LGTM: LLM event handlers correctly guard optional fields.

The handlers properly check for None before recording metrics, ensuring no spurious zero values are recorded when data is unavailable.


471-483: LGTM: Base attributes helper.

Simple and effective extraction of the provider attribute from events.

tests/test_metrics_collector.py (3)

10-10: Consider whether MagicMock usage aligns with testing guidelines.

The coding guidelines state "Never mock in tests." While the MagicMock here is used only to create a minimal agent stub (not to mock behavior under test), this could be replaced with a simple dataclass or named tuple if strict adherence is required.

That said, the mock is used minimally—just to provide a null-ish agent structure—so this may be acceptable as a pragmatic boundary.

Also applies to: 84-89
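
If strict adherence is preferred, a plain stub can stand in for the agent; the field names below are guessed from the collector's subscription checks and may not match the real Agent surface exactly.

from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class StubAgent:
    """Null-ish agent surface for exercising MetricsCollector handlers (illustrative)."""

    llm: Optional[Any] = None
    stt: Optional[Any] = None
    tts: Optional[Any] = None
    turn_detection: Optional[Any] = None
    processors: List[Any] = field(default_factory=list)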


98-158: LGTM: Comprehensive LLM handler tests.

Good coverage of both full-data and partial-data scenarios. The assertions correctly verify metric recording with expected values and attributes.


315-364: LGTM: VLM handler tests are thorough.

Tests verify all VLM-specific metrics including inference count, latency, token usage, and frame processing counts.

plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (2)

258-270: LGTM: Primary path timing instrumentation is well-implemented.

The timing capture strategy is correct: request_start_time is captured before the API call, and first_token_time is recorded on the first content delta. The latency and TTFT calculations are accurate.

Also applies to: 286-287, 340-354


384-411: LGTM: Non-streaming response handles timing and token extraction correctly.

The conditional latency calculation and token extraction from response.usage follow the established pattern. The total_tokens fallback logic is consistent with other plugins.

agents-core/vision_agents/core/observability/metrics.py (3)

35-40: LGTM: Proper library instrumentation pattern.

Using metrics.get_meter() without configuring a provider is the correct approach for libraries. Applications control the provider configuration, and metrics become no-ops if unconfigured.


42-115: LGTM: Well-structured metric definitions.

Metrics are organized by component (STT, TTS, LLM) with consistent naming conventions, appropriate types (histograms for latencies, counters for counts), and descriptive documentation.


180-220: LGTM: VLM and video processor metrics.

The metrics cover the essential observability points for vision workloads: inference latency, token usage, frame processing, and detection counts.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)

3-3: LGTM: Clean import additions for timing instrumentation.

The time module and LLMRequestStartedEvent imports are exactly what's needed for the observability enhancements.

Also applies to: 19-19


127-140: LGTM: Request instrumentation properly positioned.

The request started event is emitted before the API call (line 142), and timing initialization is clean. Good ordering ensures accurate latency capture.


254-259: LGTM: Consistent timing propagation through all stream paths.

All three stream processing branches (initial, follow-up for tool calls, and final) correctly thread request_start_time and first_token_time through to _standardize_and_emit_event. Clean pattern.

Also applies to: 331-336, 360-365


378-392: LGTM: Streaming completion metrics calculated correctly.

The timing calculations at lines 378-382 properly account for the full end-to-end latency including all tool calling rounds. Both latency and time-to-first-token are correctly computed and emitted.

plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (3)

5-5: Good choice using time.perf_counter() for latency measurement.
Monotonic clock is the right tool for durations.


254-265: Processing-time propagation in partial transcripts is solid.
Optional: round the value to keep sub-ms noise out of the metric; perf_counter() is monotonic, so negative durations should not occur, but a guard is harmless.


345-357: Resetting _audio_start_time on error/close is the right safety net.
This prevents the next utterance from inheriting old timing state.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 248-276: The byte-count logic in _on_realtime_audio_input and
_on_realtime_audio_output assumes 16-bit samples; replace the
len(event.data.samples) * 2 calculation with event.data.samples.nbytes so the
actual dtype is respected (use event.data.samples.nbytes when computing
audio_bytes before calling metrics.realtime_audio_input_bytes.add and
metrics.realtime_audio_output_bytes.add); keep the existing guards for
event.data and ensure samples is present (e.g., only read .nbytes if
event.data.samples is not None) to avoid attribute errors.
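
A minimal, self-contained sketch of the dtype-aware byte count the fix asks for; the helper name is illustrative, and the collector would feed the result into metrics.realtime_audio_input_bytes.add as before.

import numpy as np


def audio_byte_count(samples) -> int:
    """Return the byte size of an audio buffer regardless of dtype (sketch)."""
    if samples is None:
        return 0
    # nbytes respects the actual dtype (int16, float32, ...), unlike len(samples) * 2.
    return samples.nbytes


# e.g. audio_byte_count(np.zeros(480, dtype=np.int16)) == 960
#      audio_byte_count(np.zeros(480, dtype=np.float32)) == 1920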
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 58e87c3 and d85fcb2.

📒 Files selected for processing (1)
  • agents-core/vision_agents/core/observability/collector.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/observability/collector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
🔇 Additional comments (8)
agents-core/vision_agents/core/observability/collector.py (8)

1-49: LGTM!

The module structure is well-organized with clear documentation and proper use of TYPE_CHECKING to avoid circular imports. Event types are correctly imported at module level for type hint resolution.


51-71: LGTM!

The class initialization is clean and well-documented. The session start tracking dictionary is appropriately initialized for computing realtime session durations.


73-175: LGTM!

The subscription orchestration is well-structured with defensive checks for component existence before subscribing. The pattern of async wrappers delegating to synchronous handlers is consistent and clear.


180-221: LGTM!

LLM event handlers properly extract metrics with appropriate null checks and attribute augmentation. The success boolean is correctly converted to a lowercase string for consistent metric tagging.


315-381: LGTM!

STT, TTS, and turn detection handlers follow the established pattern with appropriate null checks and metric recording. Character count calculation for TTS uses the built-in len() function correctly.


387-433: LGTM!

VLM handlers appropriately record both token-based metrics and video-specific metrics. The checks for positive frame and detection counts prevent recording zero values.


439-466: LGTM!

Video processor handlers correctly subscribe to agent-level events and record detection metrics with appropriate conditionals. The pattern is consistent with other handlers.


471-483: LGTM!

The helper method cleanly extracts base attributes, providing consistent provider tagging across all metrics when plugin_name is available.

@Nash0x7E2
Member

How are we progressing on this one? @d3xvn

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @examples/02_golf_coach_example/golf_coach_with_metrics.py:
- Around line 41-49: Add a Google-style docstring to the create_agent function
describing its purpose, parameters, and return type. The docstring should
briefly explain that this function asynchronously creates and returns an Agent
instance configured with specific components such as edge, agent_user,
instructions, llm, and processors. Place the docstring immediately below the
async def create_agent(**kwargs) line.
- Around line 52-55: Add a Google-style docstring to join_call explaining
purpose, parameters (agent, call_type, call_id, **kwargs) and that it returns
None; then replace the local-only assignment of metrics_collector with a clear
retention pattern so linters don’t flag it as unused — for example create and
attach the collector to the agent (setattr(agent, "_metrics_collector",
metrics_collector) or append to an existing agent._aux_resources list) and
mention this retention in the docstring so future readers know it’s kept alive
for the call lifetime.
- Around line 16-23: Move the Prometheus setup out of module scope into a
function (e.g., setup_metrics) to avoid starting the HTTP server on import:
create setup_metrics() that uses PROMETHEUS_PORT, calls
start_http_server(PROMETHEUS_PORT) and configures PrometheusMetricReader(),
MeterProvider(metric_readers=[reader]) and metrics.set_meter_provider(provider);
call setup_metrics() from the if __name__ == "__main__": block instead of
running it at import, and add basic error handling around start_http_server to
catch/address socket/address-in-use errors (or try a fallback port) to avoid
crashing when port 9464 is already bound (see the sketch after this list).

In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 401-419: The streaming path should stop using hasattr/getattr and
access attributes directly: replace the hasattr(final_message, "usage") and
getattr(usage, "output_tokens", None) pattern with direct attribute reads like
usage = final_message.usage and output_tokens = usage.output_tokens (mirroring
_emit_completion_event); also capture input_tokens in the streaming path the
same way the non-streaming code does (extract from the response/message_start or
usage as used in _emit_completion_event) before building the
LLMResponseCompletedEvent (which includes original, total_text,
plugin_name="anthropic", latency_ms, time_to_first_token_ms, input_tokens,
output_tokens, model).
- Around line 143-145: The code computes latency_ms inside the ClaudeMessage
branch using (time.perf_counter() - request_start_time) * 1000 but that value is
immediately overwritten later after the tool-calling loop; remove this unused
assignment to latency_ms in the ClaudeMessage handling (or if you intended to
record interim latency, rename it to interim_latency_ms and persist/use it
instead) so only the final latency calculation after the tool loop remains;
update any references to latency_ms to use the single canonical final value and
ensure no dead assignment to latency_ms remains in the ClaudeMessage block.
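
For the Prometheus item above, a minimal setup_metrics() along the lines the fix describes; the port constant, logger, and error handling are illustrative choices, not the example's final code.

import logging

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

logger = logging.getLogger(__name__)

PROMETHEUS_PORT = 9464  # assumed default from the example


def setup_metrics(port: int = PROMETHEUS_PORT) -> None:
    """Start the Prometheus scrape endpoint and install the meter provider."""
    try:
        start_http_server(port)
    except OSError as exc:
        # Typically "address already in use"; surface it instead of crashing at import time.
        logger.error("Could not bind Prometheus endpoint on port %s: %s", port, exc)
        raise
    reader = PrometheusMetricReader()
    metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))


if __name__ == "__main__":
    setup_metrics()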
🧹 Nitpick comments (2)
agents-core/vision_agents/core/llm/events.py (1)

238-267: Consider extracting shared error_message property to reduce duplication.

The error_message property is identical across VLMErrorEvent, LLMErrorEvent, and RealtimeErrorEvent. A mixin or intermediate base class could reduce this duplication.

♻️ Optional: Extract to a mixin
# In base.py or a new mixin file
class ErrorMessageMixin:
    """Mixin for events with error fields."""
    error: Optional[Exception] = None
    
    @property
    def error_message(self) -> str:
        return str(self.error) if self.error else "Unknown error"

Then each error event could inherit from both PluginBaseEvent and ErrorMessageMixin.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)

252-256: Usage of getattr/hasattr for SDK types.

Per coding guidelines, prefer normal attribute access. While defensive checks for external SDK responses are pragmatic, consider using try/except for cleaner attribute access:

♻️ Alternative using try/except
-                if first_token_time is None and event.type == "content_block_delta":
-                    delta = getattr(event, "delta", None)
-                    if delta is not None and hasattr(delta, "text") and delta.text:
-                        first_token_time = time.perf_counter()
+                if first_token_time is None and event.type == "content_block_delta":
+                    try:
+                        if event.delta.text:
+                            first_token_time = time.perf_counter()
+                    except AttributeError:
+                        pass

Based on coding guidelines, prefer normal attribute access in Python.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d85fcb2 and 78e605a.

📒 Files selected for processing (3)
  • agents-core/vision_agents/core/llm/events.py
  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • agents-core/vision_agents/core/llm/events.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (3)
examples/02_golf_coach_example/golf_coach_with_metrics.py (3)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (15-185)
agents-core/vision_agents/core/observability/collector.py (1)
  • MetricsCollector (51-483)
agents-core/vision_agents/core/llm/events.py (3)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • error_message (92-93)
agents-core/vision_agents/core/agents/events.py (1)
  • error_message (65-66)
agents-core/vision_agents/core/tts/events.py (1)
  • error_message (61-62)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (6)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMRequestStartedEvent (91-100)
  • LLMResponseCompletedEvent (129-153)
  • LLMResponseChunkEvent (104-125)
agents-core/vision_agents/core/events/manager.py (1)
  • send (437-481)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
  • _emit_completion_event (430-470)
  • _standardize_and_emit_event (472-549)
agents-core/vision_agents/core/agents/transcript_buffer.py (1)
  • text (81-83)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
  • _standardize_and_emit_event (417-458)
agents-core/vision_agents/core/llm/llm.py (1)
  • LLMResponseEvent (39-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Ruff
🔇 Additional comments (11)
examples/02_golf_coach_example/golf_coach_with_metrics.py (5)

1-8: LGTM!

Clear documentation with practical run instructions and endpoint guidance—exactly what an example needs.


25-38: LGTM!

The # noqa: E402 comments are well-justified given the OpenTelemetry configuration ordering requirement. Logging and env setup are appropriate for an example script.


57-76: LGTM!

The verbose logging is appropriate for an example script—it helps users understand which metrics are being collected without digging through source code.


78-84: LGTM!

The call lifecycle follows the expected pattern: create, join via context manager, send initial prompt, then finish.


87-88: LGTM!

Entry point correctly wires AgentLauncher with the CLI utility, matching the established pattern.

agents-core/vision_agents/core/llm/events.py (3)

90-101: LGTM!

The LLMRequestStartedEvent follows the established patterns in this file: auto-generated UUID for request tracking, consistent field structure, and proper type hints.


121-126: LGTM!

The first-chunk timing fields enable proper TTFT metrics collection for streaming responses—a key observability metric.


137-154: LGTM!

Comprehensive metrics surface for LLM observability. The inline documentation clarifying that total_tokens may differ from the sum when cached is valuable context.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)

127-140: LGTM!

Proper request tracking initialization: LLMRequestStartedEvent is emitted before the API call with accurate model and streaming metadata, and time.perf_counter() is the correct choice for latency measurement.


425-464: Helper method centralizes completion event emission.

Good extraction pattern matching the OpenAI plugin. The total_tokens fallback logic correctly handles partial token availability.

Minor note: the hasattr usage on lines 444 and 448 could be refactored to try/except per coding guidelines, but this is consistent with the existing codebase patterns.


496-522: LGTM!

The emitted_first_chunk flag approach is more explicit than checking len(text_parts) == 0. The logic correctly ensures TTFT is only attached to the first chunk event across all streaming iterations.

d3xvn added 2 commits January 13, 2026 17:33
- Remove hasattr/getattr calls, use direct attribute access
- Track input_tokens from message_start and output_tokens from message_delta
- Remove unused latency_ms calculation (was being overwritten)
- Use samples.nbytes for audio byte counting to handle all formats
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 248-274: Both _on_realtime_audio_input and
_on_realtime_audio_output access event.data.samples.nbytes after only checking
event.data; add a null check for samples to avoid AttributeError by changing the
guards to verify event.data and event.data.samples (i.e., use if event.data and
event.data.samples:) before calling event.data.samples.nbytes and recording
bytes; keep the existing duration recording logic unchanged and apply the same
samples check in both functions (references: _on_realtime_audio_input,
_on_realtime_audio_output, event.data.samples.nbytes,
metrics.realtime_audio_input_bytes, metrics.realtime_audio_output_bytes).

In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 436-444: Remove the hasattr checks and access attributes directly:
read response.usage into a local (e.g., usage = response.usage) and, if usage is
truthy, set input_tokens = usage.input_tokens and output_tokens =
usage.output_tokens; then set model = response.model (fall back to self.model
only if you explicitly need a fallback), eliminating hasattr(response, "usage")
and hasattr(response, "model") usage.
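
For the Anthropic item just above, a minimal sketch of the direct-access form; the usage and model field names follow the Anthropic Message object as discussed in this thread, and the fallback to a default model is only shown because the original helper had one.

from typing import Optional, Tuple


def extract_usage(response, default_model: str) -> Tuple[Optional[int], Optional[int], str]:
    """Pull token counts and model from a response via direct attribute access (sketch)."""
    input_tokens = output_tokens = None
    usage = response.usage
    if usage:
        input_tokens = usage.input_tokens
        output_tokens = usage.output_tokens
    # Keep a fallback only if it is truly needed; the SDK normally sets model.
    model = response.model or default_model
    return input_tokens, output_tokens, model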
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 78e605a and 73e45e8.

📒 Files selected for processing (3)
  • agents-core/vision_agents/core/observability/collector.py
  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/observability/collector.py
  • examples/02_golf_coach_example/golf_coach_with_metrics.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧠 Learnings (1)
📚 Learning: 2025-11-24T17:04:43.030Z
Learnt from: CR
Repo: GetStream/Vision-Agents PR: 0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-11-24T17:04:43.030Z
Learning: Applies to **/*.py : Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python

Applied to files:

  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (2)
agents-core/vision_agents/core/observability/collector.py (2)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (12)
  • LLMResponseCompletedEvent (129-153)
  • LLMErrorEvent (255-267)
  • ToolEndEvent (167-176)
  • RealtimeConnectedEvent (11-18)
  • RealtimeDisconnectedEvent (22-29)
  • RealtimeAudioInputEvent (33-37)
  • RealtimeAudioOutputEvent (41-46)
  • RealtimeResponseEvent (50-58)
  • RealtimeUserSpeechTranscriptionEvent (180-185)
  • RealtimeAgentSpeechTranscriptionEvent (189-194)
  • VLMInferenceCompletedEvent (213-235)
  • VLMErrorEvent (239-251)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMRequestStartedEvent (91-100)
  • LLMResponseCompletedEvent (129-153)
  • LLMResponseChunkEvent (104-125)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • _emit_completion_event (430-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (19)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (8)

1-24: LGTM!

The imports are appropriate—time for timing instrumentation, LLMRequestStartedEvent for the new request lifecycle events. All aligned with the metrics implementation pattern.


127-140: LGTM!

The request started event emission and timing initialization are well-implemented. Using time.perf_counter() is the right choice for measuring elapsed time—monotonic, high resolution, perfect for latency metrics.


231-237: LGTM!

Latency calculation and event emission for non-streaming responses are correct. Appropriately omits time_to_first_token_ms since TTFT is not applicable for non-streaming requests.


243-269: LGTM!

The streaming token tracking implementation is solid—extracting input_tokens from message_start and output_tokens from message_delta aligns with Anthropic's streaming event structure. The first-token timing detection on actual text deltas ensures accurate TTFT metrics.


399-417: LGTM!

The streaming completion event emission is comprehensive—includes latency, TTFT, token counts, and model info. This mirrors the structure from the OpenAI plugin and provides consistent observability across providers.


462-524: LGTM!

The method signature expansion and first-chunk timing propagation are well-designed. The docstring follows Google style appropriately. The tuple return pattern for tracking emitted_first_chunk state across stream iterations is a clean approach that avoids mutable state sharing.


341-351: LGTM!

Token tracking in follow-up streams for multi-hop tool calling scenarios is correctly implemented. Direct attribute access is used appropriately here.


376-386: LGTM!

Consistent token tracking pattern for the finalization stream pass. This ensures accurate metrics even through complex tool-calling flows.

agents-core/vision_agents/core/observability/collector.py (7)

1-48: LGTM! Clean imports and module docstring.

The module docstring follows Google style, providing clear usage examples. Imports are well-organized with TYPE_CHECKING guard for the Agent type.


62-71: LGTM! Well-structured initialization.

The constructor properly initializes the collector, tracks session start times, and subscribes to events. The docstring adheres to Google style conventions.

One minor note: consider using the built-in dict[str, float] instead of Dict[str, float] for modern Python (3.9+).


73-175: LGTM! Solid defensive programming.

The subscription methods properly check for component existence before subscribing (e.g., if not self.agent.llm: return). The pattern of async handlers delegating to sync _on_* methods is clean and consistent.


180-221: LGTM! Careful null-guarding on metrics.

The handlers properly check for None values before recording metrics (e.g., if event.latency_ms is not None). The pattern of converting booleans to lowercase strings for attributes is consistent.


313-365: LGTM! Consistent metric recording patterns.

STT and TTS handlers follow the established pattern: extract base attributes, guard against None values, and record metrics. The character counting for TTS synthesis (line 355) is a nice touch for tracking usage.


371-431: LGTM! Thoughtful zero-value filtering.

The VLM handler only records frames_processed and detections when greater than zero (lines 418-421), avoiding noise from zero-value counter increments. The turn detection handler properly guards optional timing fields.


437-481: LGTM! Clean separation of processor events.

The video processor events are correctly subscribed at the agent level rather than the LLM level. The _base_attributes helper is minimal and focused, extracting only the plugin_name as the provider attribute.

examples/02_golf_coach_example/golf_coach_with_metrics.py (4)

1-38: LGTM! Proper OpenTelemetry initialization ordering.

The configuration of OpenTelemetry before importing vision_agents (lines 10-23) is correct and well-documented. This ensures metrics are captured from the moment the library loads. The # noqa: E402 comments appropriately suppress import order warnings.


41-49: LGTM! Clean agent factory function.

The create_agent function is well-structured, configuring the agent with appropriate components for a golf coaching use case.


52-84: The underscore prefix is intentional, and the lifecycle is sound.

The _metrics_collector variable (line 55) uses an underscore prefix to indicate it's not explicitly used after creation—a lint convention. The collector remains alive throughout the async with agent.join(call) block, which is the intended behavior. The subscriptions it creates keep recording metrics until the function returns.


87-88: LGTM! Standard CLI entry point.

The main guard properly uses AgentLauncher with the defined factory functions.

- Use direct attribute access for response.usage and response.model
- Add null check for event.data.samples before accessing .nbytes
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)

499-525: Replace hasattr with direct attribute access or type checking.

The coding guidelines forbid the use of hasattr. The delta's type is known and TextDelta is already imported, so an isinstance check expresses the intent directly:

♻️ Suggested fix
         if event.type == "content_block_delta":
             delta_event: RawContentBlockDeltaEvent = event
-            if hasattr(delta_event.delta, "text") and delta_event.delta.text:
+            if isinstance(delta_event.delta, TextDelta) and delta_event.delta.text:
                 text_parts.append(delta_event.delta.text)

Based on learnings: "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python."

🧹 Nitpick comments (2)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)

410-428: Consider computing total_tokens for consistency with _emit_completion_event.

The streaming path's final emission omits total_tokens, while _emit_completion_event (lines 466-468) computes it from input and output tokens. For consistency between the two paths:

♻️ Suggested fix
             self.events.send(
                 LLMResponseCompletedEvent(
                     original=last_followup_stream or original,
                     text=total_text,
                     plugin_name="anthropic",
                     latency_ms=latency_ms,
                     time_to_first_token_ms=ttft_ms,
                     input_tokens=input_tokens,
                     output_tokens=output_tokens,
+                    total_tokens=(input_tokens or 0) + (output_tokens or 0)
+                    if input_tokens or output_tokens
+                    else None,
                     model=self.model,
                 )
             )
agents-core/vision_agents/core/observability/collector.py (1)

469-481: Consider including plugin_version in base attributes.

The PluginBaseEvent base class also carries plugin_version. Including it could aid in debugging version-specific issues in production dashboards. This is optional and can be deferred.

♻️ Optional enhancement
     def _base_attributes(self, event: PluginBaseEvent) -> dict:
         attrs = {}
         if event.plugin_name:
             attrs["provider"] = event.plugin_name
+        if event.plugin_version:
+            attrs["provider_version"] = event.plugin_version
         return attrs
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 73e45e8 and 1758c67.

📒 Files selected for processing (2)
  • agents-core/vision_agents/core/observability/collector.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • agents-core/vision_agents/core/observability/collector.py
  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧠 Learnings (1)
📚 Learning: 2025-11-24T17:04:43.030Z
Learnt from: CR
Repo: GetStream/Vision-Agents PR: 0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-11-24T17:04:43.030Z
Learning: Applies to **/*.py : Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python

Applied to files:

  • plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (2)
agents-core/vision_agents/core/observability/collector.py (10)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
  • events (213-216)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
  • events (208-211)
agents-core/vision_agents/core/events/base.py (2)
  • PluginBaseEvent (52-54)
  • VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (12)
  • LLMResponseCompletedEvent (129-153)
  • LLMErrorEvent (255-267)
  • ToolEndEvent (167-176)
  • RealtimeConnectedEvent (11-18)
  • RealtimeDisconnectedEvent (22-29)
  • RealtimeAudioInputEvent (33-37)
  • RealtimeAudioOutputEvent (41-46)
  • RealtimeResponseEvent (50-58)
  • RealtimeUserSpeechTranscriptionEvent (180-185)
  • RealtimeAgentSpeechTranscriptionEvent (189-194)
  • VLMInferenceCompletedEvent (213-235)
  • VLMErrorEvent (239-251)
agents-core/vision_agents/core/stt/events.py (2)
  • STTTranscriptEvent (17-47)
  • STTErrorEvent (81-93)
agents-core/vision_agents/core/tts/events.py (1)
  • TTSSynthesisCompleteEvent (35-45)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
  • on_stt_transcript (121-122)
  • _on_stt_transcript (270-275)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
  • on_stt_transcript (229-230)
  • _on_stt_transcript (396-401)
tests/test_metrics_collector.py (2)
  • record (34-35)
  • add (37-38)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMRequestStartedEvent (91-100)
  • LLMResponseCompletedEvent (129-153)
  • LLMResponseChunkEvent (104-125)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
  • _emit_completion_event (430-470)
  • _standardize_and_emit_event (472-549)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
  • _standardize_and_emit_event (417-458)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Mypy
🔇 Additional comments (11)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (5)

1-25: LGTM!

The new imports are all needed: time for latency measurement, TextDelta for streaming delta handling, and LLMRequestStartedEvent for the request-start emission. All are used by the code that follows.


128-141: LGTM!

LLMRequestStartedEvent is emitted at the start of the request, perf_counter starts the latency clock, and first_token_time is initialized to None until the first token arrives.


232-238: LGTM!

In the non-streaming path, latency is measured as a single interval around the complete response. There is no first token to mark, only total completion time.


252-276: LGTM!

The streaming path processes each TextDelta, captures the first token's arrival time, and accumulates usage metrics from the message_start and message_delta events. The emitted_first_chunk flag ensures the first chunk is marked only once.
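A simplified sketch of that flow (the emit_chunk callback is a stand-in for the plugin's event emission, not its real API):

import time
from anthropic.types import TextDelta

def consume_stream(stream, emit_chunk) -> float:
    request_start = time.perf_counter()
    first_token_time = None
    emitted_first_chunk = False
    for event in stream:
        if event.type == "content_block_delta" and isinstance(event.delta, TextDelta):
            if first_token_time is None:
                first_token_time = time.perf_counter()
            ttft_ms = (first_token_time - request_start) * 1000.0
            emit_chunk(
                delta=event.delta.text,
                is_first_chunk=not emitted_first_chunk,
                time_to_first_token_ms=ttft_ms if not emitted_first_chunk else None,
            )
            emitted_first_chunk = True
    # Overall latency for the completed event, in milliseconds.
    return (time.perf_counter() - request_start) * 1000.0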


432-471: LGTM!

The helper extracts token usage from the response, computes the total from input and output tokens, and emits LLMResponseCompletedEvent. The docstring follows the Google style guide and the arithmetic is correct.
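A rough sketch of the helper's shape (import path and parameterization assumed; the plugin's actual method is bound to the LLM instance):

from vision_agents.core.llm.events import LLMResponseCompletedEvent

def emit_completion_event(events, response, text, model, latency_ms, ttft_ms=None):
    # Pull token usage off the response and derive the total when available.
    input_tokens = response.usage.input_tokens if response.usage else None
    output_tokens = response.usage.output_tokens if response.usage else None
    total_tokens = (
        (input_tokens or 0) + (output_tokens or 0)
        if input_tokens or output_tokens
        else None
    )
    events.send(
        LLMResponseCompletedEvent(
            original=response,
            text=text,
            plugin_name="anthropic",
            latency_ms=latency_ms,
            time_to_first_token_ms=ttft_ms,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            total_tokens=total_tokens,
            model=model,
        )
    )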

agents-core/vision_agents/core/observability/collector.py (6)

83-175: LGTM!

The subscription methods follow a consistent, defensive pattern—checking component availability before subscribing. The async handler wrappers correctly delegate to synchronous methods, maintaining a clean separation.


180-221: LGTM!

The LLM handlers properly guard against None values before recording metrics and consistently extract base attributes. The str(event.success).lower() pattern for boolean attributes aligns with OpenTelemetry conventions.


248-274: LGTM!

The audio input/output handlers correctly perform chained null checks (event.data and event.data.samples is not None) before accessing nbytes, which addresses the concern from the commit message. This avoids AttributeError without resorting to hasattr.
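A hypothetical sketch of that chained guard (the metric name and the shape of event.data are assumptions):

from opentelemetry import metrics

meter = metrics.get_meter("vision_agents.sketch")
audio_output_bytes = meter.create_counter("realtime_audio_output_bytes", unit="By")

def on_realtime_audio_output(event, base_attrs: dict) -> None:
    # event.data may be None, and its samples array may be None, so check both
    # before touching .nbytes; no hasattr required.
    if event.data and event.data.samples is not None:
        audio_output_bytes.add(event.data.samples.nbytes, base_attrs)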


313-379: LGTM!

STT, TTS, and turn detection handlers maintain consistency with the established pattern. The character count via len(event.text) is straightforward and efficient.


385-463: LGTM!

The VLM and video processor handlers correctly distinguish their subscription points—VLM events through agent.llm.events and processor events through agent.events. The conditional checks for non-zero counts prevent noise in metrics.


26-43: Both RealtimeErrorEvent and TTSErrorEvent are properly exported from their respective modules. The imports are valid.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/test_metrics_collector.py`:
- Around lines 185-189: The assertion llm_time_to_first_token_ms.record.assert_not_called() appears twice. Remove the duplicate, or, if a different assertion was intended (for example a separately defined metric variable such as llm_time_to_first_byte_ms or llm_time_to_last_token_ms), assert on that metric instead, so that each metric (llm_latency_ms, llm_time_to_first_token_ms, llm_input_tokens, llm_output_tokens) is asserted exactly once.
- Around lines 54-95: The mock_metrics fixture currently stops all patches in its finally block immediately, so the mocks are removed before the tests run; it also uses hasattr, which the guidelines forbid. Change it to a yield-based fixture: before yielding, start a patch for each metric method (detect record and add via getattr(metric, "record", None) and getattr(metric, "add", None), or a callability check), then stop all started patch objects after the yield. Keep referencing the same symbols (mock_metrics, all_metrics, record, add, patches, patch.object) so the instrumentation stays in place until teardown; a sketch of this shape follows below.
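A hedged sketch of the requested shape (the import path is assumed, and the getattr/callable check mirrors the suggestion above even though the repo guidelines discourage getattr elsewhere):

import pytest
from unittest.mock import patch

from vision_agents.core.observability import (
    llm_latency_ms,
    llm_time_to_first_token_ms,
    llm_input_tokens,
    llm_output_tokens,
)

all_metrics = [llm_latency_ms, llm_time_to_first_token_ms, llm_input_tokens, llm_output_tokens]

@pytest.fixture
def mock_metrics():
    patches = []
    for metric in all_metrics:
        for name in ("record", "add"):
            method = getattr(metric, name, None)
            if callable(method):
                p = patch.object(metric, name)
                p.start()
                patches.append(p)
    # Hand control to the test while the instruments are still patched.
    yield
    # Teardown only after the test body has run.
    for p in patches:
        p.stop()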
🧹 Nitpick comments (1)
tests/test_metrics_collector.py (1)

318-347: Consider testing video_detections metric.

The VLMInferenceCompletedEvent has a detections field (per the event definition), and video_detections is imported but never asserted upon. If the collector records this metric, it should be tested here.

💡 Optional: Add assertion for video_detections
         vlm_output_tokens.add.assert_called_once_with(
             20, {"provider": "moondream", "model": "moondream-cloud"}
         )
+        # If detections metric is recorded by collector:
+        # video_detections.add.assert_called_once_with(
+        #     0, {"provider": "moondream", "model": "moondream-cloud"}
+        # )

Alternatively, create an event with detections=3 and verify the metric is recorded.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1758c67 and c19d361.

📒 Files selected for processing (1)
  • tests/test_metrics_collector.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic

Files:

  • tests/test_metrics_collector.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings

Files:

  • tests/test_metrics_collector.py
🧬 Code graph analysis (1)
tests/test_metrics_collector.py (7)
agents-core/vision_agents/core/events/manager.py (2)
  • EventManager (56-561)
  • wait (483-496)
agents-core/vision_agents/core/llm/events.py (4)
  • LLMResponseCompletedEvent (129-153)
  • ToolEndEvent (167-176)
  • VLMErrorEvent (239-251)
  • VLMInferenceCompletedEvent (213-235)
agents-core/vision_agents/core/observability/collector.py (2)
  • MetricsCollector (51-481)
  • _base_attributes (469-481)
agents-core/vision_agents/core/stt/events.py (3)
  • STTErrorEvent (81-93)
  • STTTranscriptEvent (17-47)
  • TranscriptResponse (7-13)
agents-core/vision_agents/core/tts/events.py (2)
  • TTSErrorEvent (49-62)
  • TTSSynthesisCompleteEvent (35-45)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-46)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
  • start (137-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff
  • GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
  • GitHub Check: unit / Mypy
🔇 Additional comments (4)
tests/test_metrics_collector.py (4)

97-114: LGTM!

The fixture properly creates an EventManager and registers all the event types exercised by the tests.


117-136: Structure looks correct, but depends on the broken mock_metrics fixture.

The wiring of mock agent components to the shared event_manager is appropriate. Once mock_metrics is fixed, this fixture should function as intended.


142-168: LGTM!

Test correctly exercises the LLM response completed handler via EventManager and validates all expected metric recordings.


369-387: LGTM!

These tests verify the _base_attributes helper behavior directly, which is appropriate for unit testing the collector's internal logic.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@dangusev dangusev merged commit ad21b7f into main Jan 14, 2026
10 checks passed
@dangusev dangusev deleted the feat/metrics-impl branch January 14, 2026 20:16
