Conversation
📝 Walkthrough

Adds Prometheus-first metrics, a `MetricsCollector` that subscribes to agent events, extends LLM/VLM event types with request/start/chunk/completion/error timing and token/frame fields, threads timing through many plugins, instruments STT/TTS/video detectors, and adds examples and tests for metrics collection.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Agent
    participant LLM
    participant VLM
    participant EventSystem
    participant MetricsCollector
    participant MeterProvider
    participant Prometheus
    User->>Agent: send request (audio/video/text)
    Agent->>LLM: forward request / create_response
    Agent->>VLM: forward frames (if vision)
    LLM->>EventSystem: emit LLMRequestStartedEvent
    VLM->>EventSystem: emit VLMInferenceStartEvent
    EventSystem->>MetricsCollector: notify start events
    MetricsCollector->>MeterProvider: record counters/histograms
    LLM->>EventSystem: emit LLMResponseChunkEvent (is_first_chunk, ttft)
    EventSystem->>MetricsCollector: notify chunk -> record ttft/interim metrics
    VLM->>EventSystem: emit VLMInferenceCompletedEvent (latency, frames, detections)
    EventSystem->>MetricsCollector: notify VLM completion -> record vlm metrics
    LLM->>EventSystem: emit LLMResponseCompletedEvent (latency, tokens, model)
    EventSystem->>MetricsCollector: notify completion -> record llm latency & token counters
    MeterProvider->>Prometheus: expose /metrics endpoint
    Prometheus->>Prometheus: scrape /metrics
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 233-239: The closing parenthesis for the
self._emit_completion_event call is mis-indented which can create a syntax
error; fix by aligning the parentheses properly so the call ends with the
closing parenthesis at the same indentation level as the method call start
(ensure the args llm_response.original, llm_response.text, latency_ms=latency_ms
are inside the call and the final ')' closes self._emit_completion_event(...));
reference the self._emit_completion_event invocation and the variables
llm_response and latency_ms/request_start_time to locate and correct the
indentation.
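For illustration, a minimal sketch of the intended call shape described in this prompt (the variable and method names are taken from the prompt itself; the surrounding code is assumed):

```python
# Sketch only: keep all arguments inside the call and close it at the
# same indentation level as the call itself.
latency_ms = (time.perf_counter() - request_start_time) * 1000
self._emit_completion_event(
    llm_response.original,
    llm_response.text,
    latency_ms=latency_ms,
)
```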
In @plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py:
- Around line 331-332: Reset _audio_start_time whenever an utterance is aborted
or the connection closes so processing_time_ms isn't calculated from a stale
start time; specifically, add assignments setting self._audio_start_time = None
in the error and close handlers (_on_error and _on_close), and in the public
cleanup methods clear() and close(), ensuring this mirrors the existing reset
performed after a committed transcript. Locate uses of _audio_start_time in the
commit/processing path and add the same reset in those four places to guarantee
each new utterance starts timing fresh.
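A sketch of the reset described above, assuming the handler signatures shown here (the real `_on_error`/`_on_close` signatures, and the `clear()`/`close()` bodies, may differ):

```python
def _on_error(self, error: Exception) -> None:
    # A failed utterance should not leak its start time into the next one.
    self._audio_start_time = None

def _on_close(self) -> None:
    # Connection closed: drop any in-flight timing state.
    self._audio_start_time = None
```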
In @plugins/xai/vision_agents/plugins/xai/llm.py:
- Around line 450-454: The current check sets is_first = first_token_time is not
None and request_start_time is not None which stays true after the first token
and mislabels subsequent chunks; change the logic to determine "first chunk" by
checking whether first_token_time is None (and request_start_time is present)
before you set first_token_time, or introduce and pass an explicit flag like
is_first_content_chunk (true when first_token_time is None) from the caller;
compute that flag prior to mutating first_token_time, use it to compute ttft_ms
only for the real first chunk, and then set first_token_time as needed.
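In code, the corrected ordering could look roughly like this (a sketch using the names from the prompt, not the plugin's exact structure):

```python
# Decide "first chunk" before mutating first_token_time.
is_first_content_chunk = (
    first_token_time is None and request_start_time is not None
)
ttft_ms = None
if is_first_content_chunk:
    first_token_time = time.perf_counter()
    ttft_ms = (first_token_time - request_start_time) * 1000
```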
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
501-520: First chunk detection logic appears incorrect.

Line 504 sets `is_first = first_token_time is not None and request_start_time is not None`. This condition is true for every chunk after the first token is received, not just the first chunk. The `is_first_chunk` field should only be `True` for the actual first text delta.

🔎 Proposed fix

```diff
 if event.type == "response.output_text.delta":
     delta_event: ResponseTextDeltaEvent = event
     # Calculate time to first token for the first chunk
-    is_first = first_token_time is not None and request_start_time is not None
+    # first_token_time is set only on the first delta event, so check sequence_number
+    is_first = delta_event.sequence_number == 0
     ttft_ms = None
-    if is_first:
+    if is_first and first_token_time is not None and request_start_time is not None:
         ttft_ms = (first_token_time - request_start_time) * 1000
```

Alternatively, track a boolean flag that's set after emitting the first chunk:

```diff
+    # Only the first delta event should have is_first_chunk=True
+    is_first = (
+        first_token_time is not None
+        and request_start_time is not None
+        and delta_event.sequence_number == 0
+    )
```
🧹 Nitpick comments (18)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
254-258: Consider extracting the duplicate processing time calculation.

The processing time calculation appears in both `_on_partial_transcript` and `_on_committed_transcript`. Extracting this to a helper method would reduce duplication.

🔎 Proposed refactor

```diff
+    def _get_processing_time_ms(self) -> Optional[float]:
+        """Calculate processing time from audio start to now, in milliseconds."""
+        if self._audio_start_time is not None:
+            return (time.perf_counter() - self._audio_start_time) * 1000
+        return None
+
     def _on_partial_transcript(self, transcription_data: dict[str, Any]):
         ...
-        # Calculate processing time (time from first audio to transcript)
-        processing_time_ms: Optional[float] = None
-        if self._audio_start_time is not None:
-            processing_time_ms = (time.perf_counter() - self._audio_start_time) * 1000
+        processing_time_ms = self._get_processing_time_ms()

     def _on_committed_transcript(self, transcription_data: dict[str, Any]):
         ...
-        # Calculate processing time (time from first audio to transcript)
-        processing_time_ms: Optional[float] = None
-        if self._audio_start_time is not None:
-            processing_time_ms = (time.perf_counter() - self._audio_start_time) * 1000
+        processing_time_ms = self._get_processing_time_ms()
```

Also applies to: 307-311
examples/03_prometheus_metrics_example/pyproject.toml (1)
6-17: Consider pinning dependency versions for reproducibility.

Using "latest" for all dependencies may cause builds to break unexpectedly when upstream packages release new versions. For example projects, pinning versions demonstrates best practices and ensures reproducible environments.

🔎 Alternative approach using version constraints

Replace "latest" with specific version constraints or ranges:

```diff
 dependencies = [
-    "vision-agents",
-    "vision-agents-plugins-gemini",
-    "vision-agents-plugins-deepgram",
-    "vision-agents-plugins-elevenlabs",
-    "vision-agents-plugins-getstream",
-    "python-dotenv",
-    "opentelemetry-api",
-    "opentelemetry-sdk",
-    "opentelemetry-exporter-prometheus",
-    "prometheus-client",
+    "vision-agents>=0.1.0",
+    "vision-agents-plugins-gemini>=0.1.0",
+    "vision-agents-plugins-deepgram>=0.1.0",
+    "vision-agents-plugins-elevenlabs>=0.1.0",
+    "vision-agents-plugins-getstream>=0.1.0",
+    "python-dotenv>=1.0.0",
+    "opentelemetry-api>=1.20.0",
+    "opentelemetry-sdk>=1.20.0",
+    "opentelemetry-exporter-prometheus>=0.41b0",
+    "prometheus-client>=0.19.0",
 ]
```

examples/03_prometheus_metrics_example/README.md (2)
20-20: Format the bare URL for better rendering.

The URL should be enclosed in angle brackets or converted to a proper Markdown link to improve rendering across different Markdown parsers.

🔎 Proposed formatting fix

```diff
-Then open http://localhost:9464/metrics in your browser to see real-time metrics as you talk to the agent.
+Then open <http://localhost:9464/metrics> in your browser to see real-time metrics as you talk to the agent.
```

113-119: Specify language for the fenced code block.

The code fence at line 113 should include a language identifier (e.g., `bash` or `shell`) for proper syntax highlighting.

🔎 Proposed formatting fix

````diff
-```
+```bash
 GOOGLE_API_KEY=your_key
 DEEPGRAM_API_KEY=your_key
 ELEVENLABS_API_KEY=your_key
````

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
304-330: Refactor to eliminate duplicated event emission logic.

The VQA and caption branches contain nearly identical event emission code (latency calculation + three event emissions). This duplication makes maintenance harder and increases the risk of inconsistencies.

🔎 Proposed refactoring to extract common event emission

Extract the event emission logic into a helper method:

```diff
+    def _emit_completion_events(
+        self,
+        text: str,
+        inference_id: str,
+        start_time: float,
+    ) -> None:
+        """Emit completion events for VLM inference."""
+        latency_ms = (time.perf_counter() - start_time) * 1000
+
+        # Emit chunk event for TTS
+        self.events.send(LLMResponseChunkEvent(delta=text))
+
+        # Emit VLM-specific completion event with metrics
+        self.events.send(
+            VLMInferenceCompletedEvent(
+                plugin_name="moondream_local",
+                inference_id=inference_id,
+                model=self.model_name,
+                text=text,
+                latency_ms=latency_ms,
+                frames_processed=1,
+            )
+        )
+
+        # Also emit LLM completion for compatibility
+        self.events.send(
+            LLMResponseCompletedEvent(
+                plugin_name="moondream_local",
+                text=text,
+                latency_ms=latency_ms,
+                model=self.model_name,
+            )
+        )
```

Then use it in both branches:

```diff
         if not answer:
             logger.warning("Moondream query returned empty answer")
             return None

-        latency_ms = (time.perf_counter() - start_time) * 1000
-
-        # Emit chunk event for TTS
-        self.events.send(LLMResponseChunkEvent(delta=answer))
-
-        # Emit VLM-specific completion event with metrics
-        self.events.send(
-            VLMInferenceCompletedEvent(
-                plugin_name="moondream_local",
-                inference_id=inference_id,
-                model=self.model_name,
-                text=answer,
-                latency_ms=latency_ms,
-                frames_processed=1,
-            )
-        )
-
-        # Also emit LLM completion for compatibility
-        self.events.send(
-            LLMResponseCompletedEvent(
-                plugin_name="moondream_local",
-                text=answer,
-                latency_ms=latency_ms,
-                model=self.model_name,
-            )
-        )
+        self._emit_completion_events(answer, inference_id, start_time)

         logger.info(f"Moondream VQA response: {answer}")
         return LLMResponseEvent(original=answer, text=answer)
```

Apply the same pattern to the caption branch.

Also applies to: 350-376
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
184-210: Refactor to eliminate duplicated event emission logic.

Similar to the local VLM implementation, the VQA and caption branches contain duplicated event emission code. Consider extracting this into a shared helper method to improve maintainability and reduce inconsistency risk.

The same refactoring pattern suggested for `moondream_local_vlm.py` applies here: extract a `_emit_completion_events` helper method to consolidate the three event emissions and latency calculation that are identical in both branches.

Also applies to: 224-250
tests/test_metrics_collector.py (1)
10-10: Avoid using `MagicMock` in tests per coding guidelines.

The coding guidelines specify "Never mock in tests; use pytest for testing." The `MagicMock` usage for `agent` on lines 82-88 violates this guideline. Consider creating a minimal real `Agent` instance or a purpose-built test fixture instead.

Based on coding guidelines, mocking should be avoided in favor of real instances or pytest fixtures.
Also applies to: 81-93
plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
599-608: Streaming path missing token usage extraction.

The non-streaming `converse` method extracts token usage from the response (lines 316-321), but the streaming `converse_stream` method does not. If AWS Bedrock provides token usage in streaming responses (typically in the final event), consider extracting it for consistency.
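If the Bedrock Converse streaming API is used here, usage typically arrives in a trailing metadata event; a hedged sketch (event and field names assumed from the Bedrock Converse API, not verified against this plugin):

```python
input_tokens = None
output_tokens = None
for event in response["stream"]:
    # ... handle contentBlockDelta events as before ...
    # The stream usually ends with a metadata event carrying token usage;
    # treat these key names as assumptions to verify.
    usage = event.get("metadata", {}).get("usage")
    if usage:
        input_tokens = usage.get("inputTokens")
        output_tokens = usage.get("outputTokens")
```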
agents-core/vision_agents/core/observability/collector.py (2)

476-488: Avoid `hasattr` per coding guidelines.

The coding guidelines state "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python." Consider using a try/except block or checking against a known interface.

🔎 Proposed fix

```diff
     def _base_attributes(self, event) -> dict:
         """Extract base attributes from an event.

         Args:
             event: The event to extract attributes from.

         Returns:
             Dictionary of base attributes.
         """
         attrs = {}
-        if hasattr(event, "plugin_name") and event.plugin_name:
-            attrs["provider"] = event.plugin_name
+        plugin_name = getattr(event, "plugin_name", None)
+        if plugin_name:
+            attrs["provider"] = plugin_name
         return attrs
```

Actually, since getattr is also discouraged, a better approach would be:

```diff
     def _base_attributes(self, event) -> dict:
         attrs = {}
+        try:
+            if event.plugin_name:
+                attrs["provider"] = event.plugin_name
+        except AttributeError:
+            pass
-        if hasattr(event, "plugin_name") and event.plugin_name:
-            attrs["provider"] = event.plugin_name
         return attrs
```

Based on coding guidelines, hasattr should be avoided.
450-470: Avoid `getattr` per coding guidelines.

Lines 455, 460, and 468 use `getattr` to access potentially missing attributes. Per coding guidelines, prefer normal attribute access. Consider defining a typed event class for detection events or using try/except.

🔎 Proposed fix using try/except

```diff
     def _on_detection_completed(self, event: PluginBaseEvent) -> None:
         """Handle video detection completed event."""
         attrs = self._base_attributes(event)

         # Add model info if available
-        model_id = getattr(event, "model_id", None)
-        if model_id:
-            attrs["model"] = model_id
+        try:
+            if event.model_id:
+                attrs["model"] = event.model_id
+        except AttributeError:
+            pass

         # Extract detection count from event if available
-        objects = getattr(event, "objects", [])
-        if objects:
-            metrics.video_detections.add(len(objects), attrs)
+        try:
+            if event.objects:
+                metrics.video_detections.add(len(event.objects), attrs)
+        except AttributeError:
+            pass

         # Record frame processed
         metrics.video_frames_processed.add(1, attrs)

         # Record inference latency if available
-        inference_time_ms = getattr(event, "inference_time_ms", None)
-        if inference_time_ms is not None:
-            metrics.video_processing_latency_ms.record(inference_time_ms, attrs)
+        try:
+            if event.inference_time_ms is not None:
+                metrics.video_processing_latency_ms.record(event.inference_time_ms, attrs)
+        except AttributeError:
+            pass
```

Based on coding guidelines, getattr should be avoided.
examples/03_prometheus_metrics_example/prometheus_metrics_example.py (1)
80-106: Consider clarifying the unused `collector` variable.

The `collector` variable on line 83 is assigned but never referenced afterward. While this is intentional (MetricsCollector subscribes to events on init), a brief comment or using `_collector` would clarify intent.

🔎 Proposed fix
async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None: """Join a call with metrics collection enabled.""" - # Attach MetricsCollector to record OpenTelemetry metrics - collector = MetricsCollector(agent) + # Attach MetricsCollector to record OpenTelemetry metrics. + # The collector subscribes to events on initialization - no further reference needed. + _collector = MetricsCollector(agent)plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
343-350: Avoidgetattrper coding guidelines.Lines 348-349 use
getattrto access token counts. Prefer try/except or direct attribute access.🔎 Proposed fix
# Extract token usage from response if available input_tokens: Optional[int] = None output_tokens: Optional[int] = None - if final_chunk and hasattr(final_chunk, "usage_metadata") and final_chunk.usage_metadata: + if final_chunk: - usage = final_chunk.usage_metadata - input_tokens = getattr(usage, "prompt_token_count", None) - output_tokens = getattr(usage, "candidates_token_count", None) + try: + usage = final_chunk.usage_metadata + if usage: + input_tokens = usage.prompt_token_count + output_tokens = usage.candidates_token_count + except AttributeError: + passBased on coding guidelines, getattr and hasattr should be avoided.
plugins/xai/vision_agents/plugins/xai/llm.py (2)
282-297: Coding guidelines violation:getattrusage and broad exception handling.Per coding guidelines, avoid
getattr/hasattrand prefer normal attribute access. Additionally, line 296 uses bareexcept Exception:which should be a more specific exception type (e.g.,json.JSONDecodeError).🔎 Suggested refactor
def _extract_tool_calls_from_response( self, response: Response ) -> List[NormalizedToolCallItem]: calls = [] - tool_calls = getattr(response, "tool_calls", None) or [] + tool_calls = response.tool_calls if response.tool_calls else [] for tc in tool_calls: - func = getattr(tc, "function", None) + func = tc.function if tc.function else None if not func: continue - name = getattr(func, "name", "unknown") - args_str = getattr(func, "arguments", "{}") - call_id = getattr(tc, "id", "") or getattr(tc, "call_id", "") + name = func.name if func.name else "unknown" + args_str = func.arguments if func.arguments else "{}" + call_id = tc.id if tc.id else (tc.call_id if tc.call_id else "") try: args_obj = ( json.loads(args_str) if isinstance(args_str, str) else args_str ) - except Exception: + except json.JSONDecodeError: args_obj = {}Based on coding guidelines: avoid
getattr/hasattr; use specific exception handling.
390-393: Timing metrics not propagated in tool-call follow-up streaming.The
_standardize_and_emit_chunkcall here omitsrequest_start_timeandfirst_token_time, resulting inis_first_chunk=Falseandtime_to_first_token_ms=Nonefor all tool-call follow-up chunks. This is inconsistent with the main streaming path at lines 155-160.Consider tracking timing for tool-call rounds or documenting that timing metrics only apply to the initial request.
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (1)
156-173: Broad exception handling.Line 156 catches all
Exceptiontypes. Per coding guidelines, prefer specific exception handling. Consider catching the specific exceptions thatchat.completions.createmay raise (e.g.,huggingface_hub.InferenceTimeoutError, HTTP-related exceptions).plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
417-421: Coding guidelines violation:hasattrusage.Per coding guidelines, avoid
hasattrand prefer normal attribute access. Consider using try/except for attribute access or checking the type explicitly.🔎 Suggested refactor
# Extract token usage from response input_tokens: Optional[int] = None output_tokens: Optional[int] = None - if hasattr(response, "usage") and response.usage: - input_tokens = response.usage.input_tokens - output_tokens = response.usage.output_tokens + try: + if response.usage: + input_tokens = response.usage.input_tokens + output_tokens = response.usage.output_tokens + except AttributeError: + pass - model = response.model if hasattr(response, "model") else self.model + try: + model = response.model + except AttributeError: + model = self.modelBased on coding guidelines: avoid
hasattr.
248-252: Coding guidelines:getattr/hasattrusage for SDK interop.Lines 250-252 use
getattrandhasattrto inspect streaming event deltas. While this violates the coding guidelines, it may be necessary for SDK types with varying structures. Consider type-checking or try/except patterns if these SDK types are well-defined.Based on coding guidelines: avoid
getattr/hasattr.plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
603-610: Missing metrics in tool-call follow-up response.The
LLMResponseCompletedEventemitted during tool-call follow-up rounds lackslatency_ms,time_to_first_token_ms, andmodelfields, unlike the main streaming/non-streaming paths. This creates inconsistency in metrics collection.Consider tracking timing for follow-up rounds or documenting this limitation.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`examples/03_prometheus_metrics_example/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (28)
- agents-core/vision_agents/core/llm/events.py
- agents-core/vision_agents/core/observability/__init__.py
- agents-core/vision_agents/core/observability/collector.py
- agents-core/vision_agents/core/observability/metrics.py
- examples/03_prometheus_metrics_example/README.md
- examples/03_prometheus_metrics_example/__init__.py
- examples/03_prometheus_metrics_example/prometheus_metrics_example.py
- examples/03_prometheus_metrics_example/pyproject.toml
- plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
- plugins/aws/vision_agents/plugins/aws/aws_llm.py
- plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
- plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
- plugins/fish/vision_agents/plugins/fish/stt.py
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
- plugins/openai/vision_agents/plugins/openai/openai_llm.py
- plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py
- plugins/roboflow/vision_agents/plugins/roboflow/events.py
- plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py
- plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py
- plugins/wizper/vision_agents/plugins/wizper/stt.py
- plugins/xai/vision_agents/plugins/xai/llm.py
- tests/test_metrics_collector.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
examples/03_prometheus_metrics_example/prometheus_metrics_example.pyplugins/fish/vision_agents/plugins/fish/stt.pyplugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.pyplugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.pyplugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.pytests/test_metrics_collector.pyplugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.pyplugins/openai/vision_agents/plugins/openai/openai_llm.pyplugins/wizper/vision_agents/plugins/wizper/stt.pyagents-core/vision_agents/core/observability/collector.pyplugins/aws/vision_agents/plugins/aws/aws_llm.pyplugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.pyplugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.pyplugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.pyagents-core/vision_agents/core/observability/__init__.pyplugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.pyplugins/roboflow/vision_agents/plugins/roboflow/events.pyplugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.pyagents-core/vision_agents/core/llm/events.pyplugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.pyplugins/gemini/vision_agents/plugins/gemini/gemini_llm.pyplugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.pyplugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.pyplugins/xai/vision_agents/plugins/xai/llm.pyagents-core/vision_agents/core/observability/metrics.py
**/*test*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic
Files:
tests/test_metrics_collector.py
🧬 Code graph analysis (16)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
agents-core/vision_agents/core/stt/events.py (9)
`processing_time_ms` (38-39), `processing_time_ms` (68-69), `TranscriptResponse` (7-13), `confidence` (30-31), `confidence` (60-61), `language` (34-35), `language` (64-65), `model_name` (46-47), `model_name` (76-77)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (6)
LLMRequestStartedEvent(87-96)LLMResponseChunkEvent(100-121)LLMResponseCompletedEvent(125-149)VLMInferenceStartEvent(199-205)VLMInferenceCompletedEvent(209-231)VLMErrorEvent(235-247)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (5)
VLMInferenceStartEvent(199-205)VLMInferenceCompletedEvent(209-231)VLMErrorEvent(235-247)LLMResponseChunkEvent(100-121)LLMResponseCompletedEvent(125-149)
tests/test_metrics_collector.py (5)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseCompletedEvent(125-149)ToolEndEvent(163-172)agents-core/vision_agents/core/stt/events.py (3)
STTTranscriptEvent(17-47)STTErrorEvent(81-93)TranscriptResponse(7-13)agents-core/vision_agents/core/tts/events.py (2)
TTSSynthesisCompleteEvent(35-45)TTSErrorEvent(49-62)agents-core/vision_agents/core/turn_detection/events.py (1)
TurnEndedEvent(29-46)agents-core/vision_agents/core/observability/collector.py (11)
`MetricsCollector` (50-488), `_on_llm_response_completed` (179-199), `_on_tool_end` (201-210), `_on_stt_transcript` (314-326), `_on_stt_error` (328-336), `_on_tts_synthesis_complete` (342-356), `_on_tts_error` (358-366), `_on_turn_ended` (372-380), `_on_vlm_inference_completed` (399-424), `_on_vlm_error` (426-434), `_base_attributes` (476-488)
plugins/wizper/vision_agents/plugins/wizper/stt.py (1)
agents-core/vision_agents/core/stt/events.py (5)
processing_time_ms(38-39)processing_time_ms(68-69)TranscriptResponse(7-13)model_name(46-47)model_name(76-77)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)agents-core/vision_agents/core/events/manager.py (1)
send(437-481)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (1)
agents-core/vision_agents/core/llm/events.py (6)
LLMRequestStartedEvent(87-96)LLMResponseChunkEvent(100-121)LLMResponseCompletedEvent(125-149)VLMInferenceStartEvent(199-205)VLMInferenceCompletedEvent(209-231)VLMErrorEvent(235-247)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
_process_non_streaming_response(292-332)agents-core/vision_agents/core/llm/llm.py (1)
LLMResponseEvent(39-43)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)
agents-core/vision_agents/core/observability/__init__.py (1)
agents-core/vision_agents/core/observability/collector.py (1)
MetricsCollector(50-488)
plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (1)
agents-core/vision_agents/core/stt/events.py (11)
`processing_time_ms` (38-39), `processing_time_ms` (68-69), `TranscriptResponse` (7-13), `confidence` (30-31), `confidence` (60-61), `language` (34-35), `language` (64-65), `audio_duration_ms` (42-43), `audio_duration_ms` (72-73), `model_name` (46-47), `model_name` (76-77)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (2)
agents-core/vision_agents/core/processors/base_processor.py (1)
name(23-26)tests/test_agent_tracks.py (1)
name(45-46)
agents-core/vision_agents/core/llm/events.py (3)
agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(52-54)error_message(92-93)agents-core/vision_agents/core/agents/events.py (1)
error_message(65-66)agents-core/vision_agents/core/tts/events.py (1)
error_message(61-62)
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)
agents-core/vision_agents/core/llm/events.py (1)
LLMResponseCompletedEvent(125-149)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)
agents-core/vision_agents/core/llm/events.py (3)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)LLMResponseChunkEvent(100-121)agents-core/vision_agents/core/events/manager.py (1)
send(437-481)plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
_emit_completion_event(427-467)_standardize_and_emit_event(469-546)
plugins/xai/vision_agents/plugins/xai/llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)agents-core/vision_agents/core/events/manager.py (1)
send(437-481)
🪛 markdownlint-cli2 (0.18.1)
examples/03_prometheus_metrics_example/README.md
20-20: Bare URL used
(MD034, no-bare-urls)
113-113: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Mypy
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
🔇 Additional comments (54)
plugins/fish/vision_agents/plugins/fish/stt.py (3)
4-4: LGTM: Clean timing import. The `time` module import is appropriate for the timing instrumentation added below.
88-116: Excellent timing implementation. The use of `time.perf_counter()` is the correct choice for measuring elapsed time: monotonic and high-resolution. The timing scope appropriately captures the entire ASR processing pipeline (PCM conversion, request building, API call, and response processing).
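For reference, the pattern being praised here is simply:

```python
import time

start = time.perf_counter()  # monotonic, high-resolution
# ... PCM conversion, request building, API call, response processing ...
processing_time_ms = (time.perf_counter() - start) * 1000
```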
122-122: The code correctly passes `processing_time_ms` to `TranscriptResponse`. The parameter is explicitly defined as a dataclass field and the timing metric is properly computed.

plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (6)
4-4: LGTM: Appropriate timing module. The `time` import is correctly used with `perf_counter()` for monotonic performance measurement.
84-85: LGTM: Well-documented timing state. The instance variable is properly typed and clearly documented.

218-221: LGTM: Correct processing time calculation. The elapsed time calculation using `perf_counter()` is accurate and properly handles the None case. The metric correctly measures wall-clock time from first audio chunk to transcript arrival.
229-229: LGTM: Correct parameter passing. The `processing_time_ms` is properly passed to `TranscriptResponse`, matching the expected schema from the external snippets.
251-252: LGTM: Proper timer reset at turn boundary. Resetting `_audio_start_time` to None after the turn ends correctly prepares the timer for the next utterance.
122-124: No changes needed. The Deepgram STT is designed as a single-stream implementation. The `_audio_start_time` shared state is appropriate because:
- Single WebSocket connection: One connection per STT instance (line 80)
- Single STT per agent: All usage patterns show one STT instance per agent, not per participant
- Sequential processing: Tests and examples demonstrate sequential `process_audio` calls, not concurrent calls from different participants
- Docstring note (line 100): "currently not used in streaming mode" indicates the participant parameter is metadata-only
- Architecture comparison: Turn detection modules (vogent, smart_turn) that handle multi-participant scenarios explicitly use queues for per-participant state; Deepgram STT intentionally does not
The latency measurement correctly captures the time from the first audio chunk of an utterance to its conclusion, which is the intended behavior for single-stream speech processing with Deepgram's built-in turn detection.
Likely an incorrect or invalid review comment.
plugins/wizper/vision_agents/plugins/wizper/stt.py (1)
84-135: Timing instrumentation looks correct. The processing time measurement correctly captures the end-to-end latency including file upload and transcription. The `TranscriptResponse` metadata is properly populated with `processing_time_ms` and `model_name`.

plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
228-287: Timing instrumentation and event enrichment look good. The inference timing correctly captures latency around the `_run_inference` call, and the `DetectionCompletedEvent` payload now includes plugin context (`plugin_name`), timing (`inference_time_ms`), and model identification (`model_id`). Error handling appropriately skips metrics on failure.

plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
257-317: Timing instrumentation and event enrichment are consistent. The implementation mirrors the cloud processor pattern: inference timing is correctly captured, and `DetectionCompletedEvent` now carries plugin context, latency, and model identification. The error path appropriately bypasses metrics collection.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)
4-5: LGTM! Imports support the new metrics instrumentation.The addition of
timeanduuidmodules, along with the VLM event classes, correctly supports the timing and event emission functionality introduced in this PR.Also applies to: 17-23
267-278: LGTM! Proper initialization of inference tracking.The inference tracking setup correctly generates a unique ID, captures start time with high-resolution
perf_counter(), and emits the start event before processing begins. Theframes_count=1accurately reflects that this VLM processes one frame at a time.
380-394: LGTM! Proper error handling and resource cleanup.The exception handler correctly emits
VLMErrorEventwith contextual information, and thefinallyblock ensures the processing lock is released even when exceptions occur. This prevents deadlocks and provides proper observability for failures.plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)
3-3: LGTM! Clean extension of detection event with metrics fields.The addition of
inference_time_msandmodel_idas optional fields properly extends the detection event to support metrics collection. The docstring fix and field documentation follow conventions.Also applies to: 18-18, 25-26, 33-34
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
4-5: LGTM! Proper metrics instrumentation setup.The imports and start event emission are correctly implemented. Notably, the
VLMInferenceStartEventis emitted before acquiring the processing lock (line 164), which correctly captures the total latency including any lock wait time.Also applies to: 15-21, 151-162
257-268: LGTM! Proper error event emission.The exception handler correctly emits
VLMErrorEventwith the inference ID and context information for observability. Theasync withcontext manager (line 164) ensures the lock is released automatically.plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (4)
4-5: LGTM! Comprehensive metrics instrumentation setup.The implementation correctly:
- Counts frames from the actual buffer size (line 144)
- Generates a unique inference ID (line 145)
- Emits both VLM and LLM start events (lines 148-164)
- Initializes timing variables for first-token tracking (lines 167-168)
This provides complete observability for both VLM-specific and general LLM metrics.
Also applies to: 15-22, 143-168
176-194: LGTM! Comprehensive error event emission.The error handling correctly emits both plugin-specific
LLMErrorEventand coreVLMErrorEvent, ensuring error observability at both the LLM and VLM abstraction levels.
210-233: LGTM! Correct first-token timing implementation.The first-token timing logic correctly:
- Captures
first_token_timeonly once (lines 212-213)- Calculates time-to-first-token only for the initial chunk (lines 216-218)
- Includes the
is_first_chunkflag andtime_to_first_token_msin the chunk event (lines 230-231)This enables accurate measurement of streaming latency characteristics.
242-270: LGTM! Comprehensive completion event emission with accurate metrics.The completion event handling correctly:
- Calculates total latency from request start (line 242)
- Computes final time-to-first-token if tokens were received (lines 243-245)
- Emits both VLM-specific and LLM compatibility events (lines 248-270)
- Includes frames_processed in VLM event and comprehensive timing in both
This provides complete metrics for both VLM and LLM observability layers.
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)
3-3: LGTM! Consistent timing instrumentation across request lifecycle.The timing infrastructure is correctly implemented:
- Start time captured before the API call (line 170)
LLMRequestStartedEventemitted with model and streaming flag (lines 161-167)request_start_timeparameter consistently propagated to both streaming and non-streaming processing functions (lines 187, 191, 200, 292)This enables accurate latency measurements across all request paths.
Also applies to: 9-9, 160-171, 186-192, 200-200, 292-292
210-210: LGTM! Correct first-token timing in streaming responses.The streaming response timing correctly:
- Tracks first token time only once (lines 227-228)
- Calculates time-to-first-token for the initial chunk (lines 230-233)
- Includes timing metadata in chunk events (lines 244-245)
- Emits completion with comprehensive timing and model info (lines 259-274)
The implementation is consistent with other plugins in this PR.
Also applies to: 226-234, 244-245, 259-274
286-313: LGTM! Correct latency tracking in non-streaming responses.The non-streaming path correctly calculates latency from the request start time (line 295) and includes it along with the model identifier in the completion event (lines 309-310).
tests/test_metrics_collector.py (3)
96-138: LGTM!The test comprehensively validates that all LLM response metrics (latency, TTFT, input/output tokens) are recorded with correct values and attributes when all fields are populated.
140-156: LGTM!Good coverage of the partial data path, ensuring no metrics are recorded when optional fields are absent.
158-399: LGTM!The remaining tests provide solid coverage for tool calls, STT/TTS events, turn detection, VLM inference, and error handling paths. The base attributes extraction tests correctly validate provider attribution.
plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
145-156: LGTM!Request started event emission and timing initialization are correctly placed before the API call.
312-336: LGTM!Latency calculation and token usage extraction are correctly implemented. The conditional
total_tokenscomputation handles the case where either input or output tokens are present.agents-core/vision_agents/core/observability/collector.py (2)
1-81: LGTM!The module docstring, imports, and initialization are well-structured. The event subscription pattern is clean and maintainable.
179-220: LGTM!LLM event handlers correctly extract attributes and record metrics with appropriate null checks.
examples/03_prometheus_metrics_example/prometheus_metrics_example.py (1)
1-55: LGTM! The example correctly sets up OpenTelemetry with the Prometheus exporter before importing vision_agents, ensuring automatic metric recording. Well-documented with clear setup instructions.
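For context, a minimal sketch of that kind of setup (the example's exact code may differ):

```python
from prometheus_client import start_http_server
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Expose /metrics and install a Prometheus-backed MeterProvider before
# vision_agents creates its instruments, so they bind to this provider.
start_http_server(9464)
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))
```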
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)
389-426: LGTM!The updated
_standardize_and_emit_eventcorrectly computes and propagates timing metadata for first-chunk detection and TTFT calculation.
199-239: LGTM!Request started event emission and timing tracking are correctly implemented for the streaming path.
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (3)
258-266: LGTM!Request timing is correctly captured and passed to the non-streaming response handler.
282-359: LGTM!Streaming path correctly tracks first token time, computes TTFT per-chunk, and emits completion events with full timing metrics.
361-407: LGTM!Non-streaming path correctly handles optional timing and extracts token usage from the response.
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
427-467: LGTM!The
_emit_completion_eventhelper is well-structured with clear documentation and correctly extracts token usage metrics from the response.
151-164: LGTM!Request started event emission and timing initialization are correctly implemented.
plugins/xai/vision_agents/plugins/xai/llm.py (1)
130-141: LGTM!The
LLMRequestStartedEventemission and timing initialization are correctly implemented, aligning with the observability pattern across other LLM plugins.plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py (2)
191-212: LGTM!The first-chunk detection logic correctly uses
len(text_chunks) == 0before appending, ensuringis_first_chunkis onlyTruefor the actual first content chunk. Well implemented!
123-135: LGTM!The VLM instrumentation correctly captures
frames_count, generates a uniqueinference_idfor correlation, and emits both VLM-specific and LLM-compatible start events.agents-core/vision_agents/core/observability/__init__.py (2)
1-11: LGTM! The module docstring provides a clear, actionable usage example demonstrating the opt-in metrics collection pattern with `MetricsCollector`.
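In practice the opt-in amounts to one line once an agent exists (sketch; import path taken from this module):

```python
from vision_agents.core.observability import MetricsCollector

# Construction subscribes to the agent's events; keep the reference alive
# for as long as metrics should be recorded.
collector = MetricsCollector(agent)
```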
59-105: LGTM! The `__all__` export list is comprehensive and well-organized by category, providing a clean public API surface for the observability module.
468-489: LGTM! The first-chunk detection correctly combines timing availability with `len(text_parts) == 1` to ensure `is_first_chunk` is only `True` for the actual first content chunk. Good implementation!

agents-core/vision_agents/core/llm/events.py (3)
86-97: LGTM! The `LLMRequestStartedEvent` is well-structured with auto-generated `request_id`, optional `model`, and `streaming` flag. Clear docstrings follow the established pattern.

198-206: LGTM! `VLMInferenceStartEvent` provides good correlation capability via `inference_id` and captures essential context (`model`, `frames_count`) for vision-language model observability.
234-247: Exception serialization consideration.

The `error: Optional[Exception]` field stores an Exception object directly. If events are serialized (e.g., for logging or transport), this could cause issues since Exceptions aren't JSON-serializable by default. The `error_message` property provides a string representation, which is good. Just be aware of this if events are serialized.
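A small sketch of the pattern being described (not the actual event class definition):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorEventSketch:
    error: Optional[Exception] = None

    @property
    def error_message(self) -> str:
        # Log or serialize this string form rather than the Exception object.
        return str(self.error) if self.error else ""
```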
230-251: LGTM!First-chunk detection correctly uses
len(text_chunks) == 0before appending, and timing metrics are properly calculated and propagated. Clean implementation!
310-332: LGTM!Non-streaming path properly extracts token usage from
response.usageand includes all relevant metrics in the completion event.agents-core/vision_agents/core/observability/metrics.py (3)
1-33: LGTM! Excellent documentation! The module docstring clearly explains that metric providers are application-controlled, provides a working Prometheus example, and documents the opt-in nature of `MetricsCollector`.
153-162: Duration metrics as counters - verify intent.

`realtime_audio_input_duration_ms` and `realtime_audio_output_duration_ms` are defined as counters, which means they accumulate total duration over time rather than recording individual durations. This is appropriate for tracking cumulative audio processed but won't give you per-request latency distributions. If per-request distribution is needed, consider histograms. If cumulative tracking is the goal, the current counter approach is correct.
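The trade-off in OpenTelemetry terms, as a small sketch with illustrative metric names:

```python
from opentelemetry import metrics

meter = metrics.get_meter("observability-sketch")

# Counter: a running total of audio duration processed (good for cumulative tracking).
audio_duration_total = meter.create_counter("audio_input_duration_ms", unit="ms")
audio_duration_total.add(250.0, {"provider": "example"})

# Histogram: one recording per request, so distributions/percentiles are available.
latency = meter.create_histogram("request_latency_ms", unit="ms")
latency.record(842.0, {"provider": "example"})
```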
42-220: LGTM! Comprehensive and well-organized metrics covering all major components: STT, TTS, LLM, turn detection, realtime, VLM, and video processing. Consistent naming convention and appropriate metric types (histograms for latency, counters for events/tokens).
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
- Fix type narrowing issues in openai, xai, anthropic LLM plugins
- Add is_global_banned param to ChannelMember for getstream SDK update
- Remove unused imports and variables
- Add noqa comments for intentional E402 in metrics examples
- Add golf_coach_with_metrics.py example for testing realtime metrics

- Remove direct metrics from TTS, use MetricsCollector exclusively
- Add VideoProcessorDetectionEvent base class for typed detection events
- Add LLMErrorEvent for non-realtime LLM errors
- Fix hasattr usage in _base_attributes to use direct attribute access
- Update roboflow plugin to inherit from VideoProcessorDetectionEvent

- Reset _audio_start_time in ElevenLabs STT error/close handlers
- Fix is_first_chunk logic in xAI LLM to only be true for first chunk
- Format code with ruff
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (3)
497-507: Don't log URLs containing auth tokens; also avoid `except Exception as e`.

`url` includes `token` and is logged (and re-logged on failure). Please redact the token in logs and narrow the exception handling to specific expected failures (e.g., `webbrowser.Error`, `OSError`).
69-81: Remove broad `except Exception as e` in `StreamConnection.close` (guideline violation).

Either let unexpected exceptions propagate (preferred), or catch only the concrete exception types `ConnectionManager.leave()` can raise.

Proposed patch

```diff
     async def close(self, timeout: float = 2.0):
         try:
             await asyncio.wait_for(self._connection.leave(), timeout=timeout)
         except asyncio.TimeoutError:
             logger.warning("Connection leave timed out during close")
         except RuntimeError as e:
             if "asynchronous generator" in str(e):
                 logger.debug(f"Ignoring async generator error during shutdown: {e}")
             else:
                 raise
-        except Exception as e:
-            logger.error(f"Error during connection close: {e}")
```
246-272: Replace `hasattr(...)` with explicit event-type checks (guideline violation).

This is currently relying on attribute presence; make the branch depend on the union type instead (e.g., `isinstance(event, sfu_events.TrackUnpublishedEvent)` vs `ParticipantLeftEvent`).

Proposed patch (shape)

```diff
-        # Determine which tracks to remove
-        if hasattr(event.payload, "type") and event.payload is not None:
-            # TrackUnpublishedEvent - single track
-            tracks_to_remove = [event.payload.type]
-            event_desc = "Track unpublished"
-        else:
-            # ParticipantLeftEvent - all published tracks
-            tracks_to_remove = (
-                event.participant.published_tracks if event.participant else None
-            ) or []
-            event_desc = "Participant left"
+        # Determine which tracks to remove
+        if isinstance(event, sfu_events.TrackUnpublishedEvent):
+            tracks_to_remove = [event.payload.type]
+            event_desc = "Track unpublished"
+        else:
+            tracks_to_remove = (event.participant.published_tracks if event.participant else None) or []
+            event_desc = "Participant left"
```
562-585: Tool call follow-up responses lack timing metrics in LLMResponseCompletedEvent.The follow-up
LLMResponseCompletedEvent(line 579) is emitted withoutlatency_ms,time_to_first_token_ms, ormodel. This creates inconsistent observability data compared to the initial request path.🔧 Suggested fix to add timing to follow-up events
+ follow_up_start_time = time.perf_counter() + follow_up_first_token_time: Optional[float] = None async for chunk in follow_up: if not chunk.choices: continue choice = chunk.choices[0] content = choice.delta.content if choice.delta else None finish_reason = choice.finish_reason chunk_id = chunk.id if chunk.id else chunk_id if choice.delta and choice.delta.tool_calls: for tc in choice.delta.tool_calls: self._accumulate_tool_call_chunk(tc) if content: + if follow_up_first_token_time is None: + follow_up_first_token_time = time.perf_counter() text_chunks.append(content) self.events.send( LLMResponseChunkEvent( plugin_name=PLUGIN_NAME, content_index=None, item_id=chunk_id, output_index=0, sequence_number=i, delta=content, ) ) if finish_reason: if finish_reason == "tool_calls": next_tool_calls = self._finalize_pending_tool_calls() total_text = "".join(text_chunks) + follow_up_latency = (time.perf_counter() - follow_up_start_time) * 1000 + follow_up_ttft = None + if follow_up_first_token_time is not None: + follow_up_ttft = (follow_up_first_token_time - follow_up_start_time) * 1000 self.events.send( LLMResponseCompletedEvent( plugin_name=PLUGIN_NAME, original=chunk, text=total_text, item_id=chunk_id, + latency_ms=follow_up_latency, + time_to_first_token_ms=follow_up_ttft, + model=self.model, ) )plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
501-520:is_first_chunklogic is incorrect - evaluates True for every chunk after first token.The condition
is_first = first_token_time is not None and request_start_time is not Nonewill beTruefor every chunk after the first token arrives, not just the first chunk. Similarly,chunk_ttft_msis recalculated for every chunk.The
is_first_chunkfield andtime_to_first_token_msshould only be set on the actual first chunk.🐛 Proposed fix
You need to track whether the first chunk has already been emitted. One approach:
def _standardize_and_emit_event( self, event: ResponseStreamEvent, request_start_time: Optional[float] = None, first_token_time: Optional[float] = None, + is_first_chunk_emitted: bool = False, ) -> Optional[LLMResponseEvent[OpenAIResponse]]: # ... if event.type == "response.output_text.delta": delta_event: ResponseTextDeltaEvent = event - # Calculate time to first token for the first chunk - is_first = first_token_time is not None and request_start_time is not None + # Only the very first chunk should have is_first_chunk=True + is_first = not is_first_chunk_emitted and first_token_time is not None chunk_ttft_ms: Optional[float] = None - if first_token_time is not None and request_start_time is not None: + if is_first and first_token_time is not None and request_start_time is not None: chunk_ttft_ms = (first_token_time - request_start_time) * 1000And track
is_first_chunk_emittedin the caller.plugins/xai/vision_agents/plugins/xai/llm.py (2)
390-393: Follow-up tool call streaming lacks timing parameters.The call to
_standardize_and_emit_chunkat line 391 doesn't passrequest_start_timeorfirst_token_time, meaning follow-up responses won't have timing metrics in their chunk events.🔧 Suggested fix
Track timing for follow-up requests:
+ follow_up_start = time.perf_counter() + follow_up_first_token: Optional[float] = None async for response, chunk in self.xai_chat.stream(): + if follow_up_first_token is None and chunk.content: + follow_up_first_token = time.perf_counter() llm_response_optional = self._standardize_and_emit_chunk( - chunk, response + chunk, response, + request_start_time=follow_up_start, + first_token_time=follow_up_first_token, )
449-466:is_first_chunklogic is incorrect - evaluates True for every chunk after first token.Same issue as in OpenAI plugin. The condition
is_first = first_token_time is not None and request_start_time is not Nonewill beTruefor every chunk after the first token, not just the first chunk.🐛 Proposed fix
Track whether the first chunk event has been emitted:
def _standardize_and_emit_chunk( self, chunk: Chunk, response: Response, request_start_time: Optional[float] = None, first_token_time: Optional[float] = None, + first_chunk_emitted: bool = False, ) -> Optional[LLMResponseEvent[Response]]: # ... if chunk.content: - is_first = first_token_time is not None and request_start_time is not None + is_first = not first_chunk_emitted and first_token_time is not None ttft_ms: Optional[float] = None - if first_token_time is not None and request_start_time is not None: + if is_first and request_start_time is not None: ttft_ms = (first_token_time - request_start_time) * 1000The caller should track and pass
first_chunk_emittedstate.
🤖 Fix all issues with AI agents
In @examples/02_golf_coach_example/golf_coach_with_metrics.py:
- Around line 52-54: The MetricsCollector created in join_call is currently
assigned to "_" which allows it to be garbage-collected; instead keep a strong
reference for the call lifetime (e.g., assign it to a persistent attribute on
the Agent like agent._metrics_collector or store it in a call-scoped variable
returned/managed by the call lifecycle) so the MetricsCollector instance remains
alive and its event subscriptions persist; update the join_call function to
replace "_ = MetricsCollector(agent)" with a durable assignment (referencing
MetricsCollector and join_call).
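A sketch of the durable assignment suggested here (the attribute name is the prompt's own example, not an existing field):

```python
from vision_agents.core.observability import MetricsCollector


async def join_call(agent, call_type: str, call_id: str, **kwargs) -> None:
    # Keep a strong reference on the agent so the collector and its event
    # subscriptions survive for the lifetime of the call.
    agent._metrics_collector = MetricsCollector(agent)
```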
In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 467-487: The first-chunk flag is being computed per-stream via
len(text_parts) which causes every new stream (tool calls/finalize) to be
considered first; add an instance-level boolean (e.g.,
self._first_chunk_emitted) to the AnthropicLLM class, initialize it False, set
it to False at the start of each new request in create_message, then in
_standardize_and_emit_event compute is_first only when self._first_chunk_emitted
is False and the candidate conditions hold (first_token_time/request_start_time
present and text_parts length == 1), and after emitting the first-chunk event
set self._first_chunk_emitted = True so subsequent streams do not mark their
first chunk as the global first.
- Around line 384-393: The streaming path currently sends an
LLMResponseCompletedEvent directly (using last_followup_stream/original,
total_text, latency_ms, ttft_ms, model) which omits token usage metrics present
on RawMessageStopEvent message_stop/usage; replace that direct
self.events.send(...) block with a call to the existing _emit_completion_event
helper so streaming responses reuse its logic to extract
input_tokens/output_tokens from RawMessageStopEvent (and safely handle missing
usage), passing the same original/last_followup_stream, total_text, latency_ms,
ttft_ms, and self.model as needed.
🧹 Nitpick comments (9)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)
3-3: Optional import may be unused.

The `Optional` import from `typing` doesn't appear to be used directly in this file's `DetectionCompletedEvent` (the Optional types are in the parent class). Consider removing it if not needed, though it's harmless to keep.

examples/02_golf_coach_example/pyproject.toml (1)
14-18: Consider using optional-dependencies for truly optional packages.

The metrics dependencies are marked as "Optional" but are included in the main dependencies array. For better dependency management, consider moving them to a `[project.optional-dependencies]` section:

```toml
[project.optional-dependencies]
metrics = [
    "opentelemetry-api>=1.39.1",
    "opentelemetry-sdk>=1.39.1",
    "opentelemetry-exporter-prometheus>=0.60b1",
    "prometheus-client>=0.23.1",
]
```

Users could then install with `pip install vision-agents-golf[metrics]`. Note that opentelemetry-exporter-prometheus is still in pre-release; you may want to lock to a specific stable version once it reaches 1.0.

tests/test_metrics_collector.py (1)
40-92: Test setup uses MagicMock for agent, which conflicts with coding guidelines.

The coding guidelines state "Never mock in tests; use pytest for testing." While the `MockMetric` is a legitimate test double and `monkeypatch` is appropriate, using `MagicMock` for the agent (line 80) violates this guideline.

Consider creating a minimal concrete test fixture or a dedicated test stub class instead of
+class StubAgent: + """Minimal agent stub for testing MetricsCollector.""" + def __init__(self): + self.llm = None + self.stt = None + self.tts = None + self.turn_detection = None + self.events = None # Could use a real EventManager if needed + def _create_collector_with_mocks(self, monkeypatch): """Create a collector with mocked metrics.""" # ... metric mocks ... - # Create a mock agent - agent = MagicMock() - agent.llm = None - agent.stt = None - agent.tts = None - agent.turn_detection = None - agent.events = MagicMock() + # Create a stub agent + agent = StubAgent() # Create collector but skip event subscription collector = object.__new__(MetricsCollector) collector.agent = agent collector._realtime_session_starts = {} return collector, mocksBased on coding guidelines, mocking should be avoided.
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
370-373: Usage of getattr violates coding guidelines. The coding guidelines state "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python."
♻️ Proposed fix using direct attribute access
```diff
 # Extract token usage from response if available
 input_tokens: Optional[int] = None
 output_tokens: Optional[int] = None
-if final_chunk and hasattr(final_chunk, "usage_metadata") and final_chunk.usage_metadata:
-    usage = final_chunk.usage_metadata
-    input_tokens = getattr(usage, "prompt_token_count", None)
-    output_tokens = getattr(usage, "candidates_token_count", None)
+if final_chunk and final_chunk.usage_metadata:
+    usage = final_chunk.usage_metadata
+    input_tokens = usage.prompt_token_count if usage.prompt_token_count else None
+    output_tokens = usage.candidates_token_count if usage.candidates_token_count else None
```

If the attributes might not exist on some SDK versions, wrap in try/except instead of using getattr.
Based on coding guidelines.
plugins/xai/vision_agents/plugins/xai/llm.py (1)
269-306: Extensive use of getattr violates coding guidelines. Lines 282, 284, 288, 289, 290 use getattr for accessing response and tool call attributes. The coding guidelines state to "Avoid using getattr, hasattr, delattr and setattr."
getattrfor accessing response and tool call attributes. The coding guidelines state to "Avoid using getattr, hasattr, delattr and setattr."♻️ Proposed refactor using direct attribute access with try/except
```diff
 def _extract_tool_calls_from_response(
     self, response: Response
 ) -> List[NormalizedToolCallItem]:
     calls = []
-    tool_calls = getattr(response, "tool_calls", None) or []
+    tool_calls = response.tool_calls if response.tool_calls else []
     for tc in tool_calls:
-        func = getattr(tc, "function", None)
+        func = tc.function if tc.function else None
         if not func:
             continue
-        name = getattr(func, "name", "unknown")
-        args_str = getattr(func, "arguments", "{}")
-        call_id = getattr(tc, "id", "") or getattr(tc, "call_id", "")
+        name = func.name if func.name else "unknown"
+        args_str = func.arguments if func.arguments else "{}"
+        call_id = tc.id if tc.id else (tc.call_id if tc.call_id else "")
```

If attributes may not exist, wrap the entire extraction in try/except for robustness.
Based on coding guidelines.
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)
144-144: Remove unused latency calculation. This latency calculation is immediately overwritten at line 234 (after the tool-calling loop completes). The first calculation captures only the initial response time, while the second captures the full end-to-end latency including tool calls, which is the intended metric.
♻️ Proposed fix

```diff
-    latency_ms = (time.perf_counter() - request_start_time) * 1000
-
     # Extract text from Claude's response format - safely handle all text blocks
```
239-239: Fix indentation of closing parenthesis. The closing parenthesis has extra indentation inconsistent with the opening call on line 235.
♻️ Proposed fix

```diff
             latency_ms=latency_ms,
-            )
+        )
```
397-437: LGTM! Centralized completion event emission with proper token extraction. The method correctly extracts token usage from Claude responses and emits comprehensive metrics. The conditional logic for total_tokens (lines 431-433) is correct but could be slightly more readable.
♻️ Optional: Simplify total_tokens calculation

```diff
-    total_tokens=(input_tokens or 0) + (output_tokens or 0)
-    if input_tokens or output_tokens
-    else None,
+    total_tokens=(
+        (input_tokens or 0) + (output_tokens or 0)
+        if (input_tokens is not None or output_tokens is not None)
+        else None
+    ),
```

This makes the condition more explicit about checking for None vs. truthiness.
608-611: Use specific exception handling instead of bare except Exception:. As per coding guidelines, avoid catching Exception broadly. Since you're parsing JSON here, catch json.JSONDecodeError specifically.
♻️ Proposed fix

```diff
 buf = "".join(pending["parts"]).strip() or "{}"
 try:
     args = json.loads(buf)
-except Exception:
+except json.JSONDecodeError:
     args = {}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
examples/02_golf_coach_example/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
- agents-core/vision_agents/core/events/__init__.py
- agents-core/vision_agents/core/events/base.py
- agents-core/vision_agents/core/llm/events.py
- agents-core/vision_agents/core/observability/collector.py
- agents-core/vision_agents/core/tts/tts.py
- examples/02_golf_coach_example/golf_coach_with_metrics.py
- examples/02_golf_coach_example/pyproject.toml
- examples/03_prometheus_metrics_example/prometheus_metrics_example.py
- plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
- plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
- plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/openai/vision_agents/plugins/openai/openai_llm.py
- plugins/roboflow/vision_agents/plugins/roboflow/events.py
- plugins/xai/vision_agents/plugins/xai/llm.py
- tests/test_metrics_collector.py
💤 Files with no reviewable changes (1)
- agents-core/vision_agents/core/tts/tts.py
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/03_prometheus_metrics_example/prometheus_metrics_example.py
- plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write except Exception as e - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
- agents-core/vision_agents/core/events/base.py
- agents-core/vision_agents/core/observability/collector.py
- plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py
- plugins/xai/vision_agents/plugins/xai/llm.py
- agents-core/vision_agents/core/events/__init__.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/openai/vision_agents/plugins/openai/openai_llm.py
- tests/test_metrics_collector.py
- examples/02_golf_coach_example/golf_coach_with_metrics.py
- plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
- plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py
- plugins/roboflow/vision_agents/plugins/roboflow/events.py
- agents-core/vision_agents/core/llm/events.py
**/*test*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic
Files:
tests/test_metrics_collector.py
🧬 Code graph analysis (10)
agents-core/vision_agents/core/observability/collector.py (5)
- agents-core/vision_agents/core/events/base.py (2): PluginBaseEvent (52-54), VideoProcessorDetectionEvent (143-156)
- agents-core/vision_agents/core/llm/events.py (12): LLMResponseCompletedEvent (125-149), LLMErrorEvent (251-263), ToolEndEvent (163-172), RealtimeConnectedEvent (11-17), RealtimeDisconnectedEvent (21-25), RealtimeAudioInputEvent (29-33), RealtimeAudioOutputEvent (37-42), RealtimeResponseEvent (46-54), RealtimeUserSpeechTranscriptionEvent (176-181), RealtimeAgentSpeechTranscriptionEvent (185-190), VLMInferenceCompletedEvent (209-231), VLMErrorEvent (235-247)
- agents-core/vision_agents/core/stt/events.py (2): STTTranscriptEvent (17-47), STTErrorEvent (81-93)
- agents-core/vision_agents/core/tts/events.py (2): TTSSynthesisCompleteEvent (35-45), TTSErrorEvent (49-62)
- agents-core/vision_agents/core/turn_detection/events.py (1): TurnEndedEvent (29-46)
plugins/xai/vision_agents/plugins/xai/llm.py (3)
- agents-core/vision_agents/core/llm/events.py (2): LLMRequestStartedEvent (87-96), LLMResponseCompletedEvent (125-149)
- agents-core/vision_agents/core/events/manager.py (1): send (437-481)
- agents-core/vision_agents/core/llm/llm.py (1): LLMResponseEvent (39-43)
agents-core/vision_agents/core/events/__init__.py (1)
- agents-core/vision_agents/core/events/base.py (1): VideoProcessorDetectionEvent (143-156)
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (3)
- agents-core/vision_agents/core/llm/events.py (2): LLMRequestStartedEvent (87-96), LLMResponseCompletedEvent (125-149)
- plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1): _process_non_streaming_response (292-332)
- agents-core/vision_agents/core/llm/llm.py (1): LLMResponseEvent (39-43)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (4)
- agents-core/vision_agents/core/llm/events.py (2): LLMRequestStartedEvent (87-96), LLMResponseCompletedEvent (125-149)
- agents-core/vision_agents/core/llm/llm.py (1): LLMResponseEvent (39-43)
- plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2): _emit_completion_event (397-436), _standardize_and_emit_event (438-494)
- plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1): _standardize_and_emit_event (413-450)
tests/test_metrics_collector.py (5)
- agents-core/vision_agents/core/llm/events.py (4): LLMResponseCompletedEvent (125-149), ToolEndEvent (163-172), VLMInferenceCompletedEvent (209-231), VLMErrorEvent (235-247)
- agents-core/vision_agents/core/stt/events.py (3): STTTranscriptEvent (17-47), STTErrorEvent (81-93), TranscriptResponse (7-13)
- agents-core/vision_agents/core/tts/events.py (2): TTSSynthesisCompleteEvent (35-45), TTSErrorEvent (49-62)
- agents-core/vision_agents/core/turn_detection/events.py (1): TurnEndedEvent (29-46)
- agents-core/vision_agents/core/observability/collector.py (11): MetricsCollector (51-484), _on_llm_response_completed (180-200), _on_tool_end (202-211), _on_stt_transcript (315-327), _on_stt_error (329-337), _on_tts_synthesis_complete (343-357), _on_tts_error (359-367), _on_turn_ended (373-381), _on_vlm_inference_completed (400-425), _on_vlm_error (427-435), _base_attributes (472-484)
examples/02_golf_coach_example/golf_coach_with_metrics.py (2)
- agents-core/vision_agents/core/edge/types.py (1): User (15-18)
- agents-core/vision_agents/core/observability/collector.py (1): MetricsCollector (51-484)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
- agents-core/vision_agents/core/llm/events.py (3): LLMRequestStartedEvent (87-96), LLMResponseCompletedEvent (125-149), LLMResponseChunkEvent (100-121)
- agents-core/vision_agents/core/llm/llm.py (1): LLMResponseEvent (39-43)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)
- agents-core/vision_agents/core/events/base.py (1): VideoProcessorDetectionEvent (143-156)
agents-core/vision_agents/core/llm/events.py (4)
- agents-core/vision_agents/core/events/base.py (2): PluginBaseEvent (52-54), error_message (92-93)
- agents-core/vision_agents/core/edge/sfu_events.py (2): error (935-940), error (1906-1910)
- agents-core/vision_agents/core/agents/events.py (1): error_message (65-66)
- agents-core/vision_agents/core/tts/events.py (1): error_message (61-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: unit / Mypy
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
🔇 Additional comments (55)
plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py (1)
4-4: LGTM! Timing measurement implementation is sound. The latency tracking implementation correctly measures processing time from first audio chunk to transcript delivery:
- Uses time.perf_counter() for accurate elapsed time measurement
- Initializes timer on first audio chunk of each utterance (line 123 check prevents mid-utterance restarts)
- Calculates processing time for each transcript message (partial or final) relative to utterance start
- Properly resets timer at turn boundaries (lines 251-252) to prepare for next utterance
- Handles None cases correctly when timing hasn't started
The logic integrates cleanly with the existing event flow and aligns with the observability surface mentioned in the PR objectives.
Also applies to: 84-85, 122-124, 218-229, 251-252
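As a rough illustration of the utterance-timing pattern described above (class and method names here are hypothetical, not the plugin's actual API):

```python
import time
from typing import Optional


class UtteranceTimer:
    """Sketch only: start on the first audio chunk, measure at each transcript, reset per turn."""

    def __init__(self) -> None:
        self._audio_start_time: Optional[float] = None

    def on_audio_chunk(self) -> None:
        # Only the first chunk of an utterance starts the clock
        if self._audio_start_time is None:
            self._audio_start_time = time.perf_counter()

    def processing_time_ms(self) -> Optional[float]:
        # None when no utterance is in flight
        if self._audio_start_time is None:
            return None
        return (time.perf_counter() - self._audio_start_time) * 1000

    def on_turn_ended(self) -> None:
        # Reset at turn boundaries so the next utterance gets a fresh measurement
        self._audio_start_time = None
```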
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
460-476: No changes needed. The is_global_banned field is valid and supported in getstream-python 2.5.20. The Python SDK uses snake_case field names, so is_global_banned=False is the correct format and will not fail at runtime.
Likely an incorrect or invalid review comment.
agents-core/vision_agents/core/events/__init__.py (2)
9-9: LGTM! The import of VideoProcessorDetectionEvent is correct and properly exposes the new event type for video processor metrics collection.
131-131: LGTM! The export is correctly added to make VideoProcessorDetectionEvent part of the public API.
agents-core/vision_agents/core/events/base.py (1)
142-156: LGTM! The VideoProcessorDetectionEvent class is well-designed with proper typing, sensible defaults, and clear documentation following the Google docstring style. The use of init=False for the type field is appropriate, and the optional fields support flexible metric collection across different video processor plugins.
plugins/roboflow/vision_agents/plugins/roboflow/events.py (2)
17-18: LGTM! The base class change to VideoProcessorDetectionEvent properly aligns this event with the new metrics collection infrastructure, and the docstring fix improves clarity.
33-35: LGTM with a note on field override behavior. The __post_init__ method correctly calculates detection_count from the objects list. Note that this will override any detection_count value passed to __init__, which appears intentional to ensure consistency between the objects list and the count.
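To make the event shape discussed above concrete, here is a hedged sketch of such a dataclass hierarchy; the field names and defaults are assumptions drawn from this review, not the library's exact definitions:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VideoProcessorDetectionEvent:
    # Assumed shape of the base event; the real class lives in events/base.py
    type: str = field(default="plugin.video_processor_detection", init=False)
    model_id: Optional[str] = None
    inference_time_ms: Optional[float] = None
    detection_count: Optional[int] = None


@dataclass
class DetectionCompletedEvent(VideoProcessorDetectionEvent):
    type: str = field(default="plugin.roboflow.detection_completed", init=False)
    objects: List[Dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep the count consistent with the objects list, overriding any passed value
        self.detection_count = len(self.objects)
```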
examples/02_golf_coach_example/golf_coach_with_metrics.py (5)
10-23: LGTM! The OpenTelemetry configuration is correctly placed before importing vision_agents, ensuring the metrics infrastructure is ready when the agent components are loaded. The Prometheus HTTP server setup on port 9464 follows best practices.
26-38: The noqa: E402 suppressions are justified. While import order warnings are typically important, the requirement to configure OpenTelemetry before importing vision_agents makes these suppressions necessary and appropriate.
41-49: LGTM! The create_agent function properly constructs an Agent with the necessary components for the golf coach example with metrics collection.
56-83: LGTM! The logging output clearly communicates which metrics are being collected, and the call flow properly demonstrates the metrics collection in action. The agent lifecycle (create call, join, respond, finish) is correctly implemented.
86-87: LGTM! The CLI entry point is properly configured with the AgentLauncher.
tests/test_metrics_collector.py (9)
23-35: Well-structured test double for metrics. The MockMetric dataclass cleanly captures both record and add calls with their values and attributes. This approach provides good visibility into what metrics were recorded during tests.
94-136: Thorough test coverage for LLM response completed event. The test verifies all four metrics are recorded with correct values and attributes. Good coverage of the full data path.
138-154: Good edge case coverage for partial data. Testing that missing optional fields don't trigger metric recording is important for robustness.
156-181: Tool end handler test is comprehensive. Validates both the tool call count metric and latency recording with proper attribute propagation.
183-231: STT event tests cover both success and error paths well. Both STTTranscriptEvent and STTErrorEvent handlers are tested with appropriate attribute verification.
233-279: TTS tests verify latency, duration, and character count metrics. Good coverage including the text length calculation for tts_characters.
281-305: Turn detection metrics test is complete. Both duration_ms and trailing_silence_ms are verified.
307-374: VLM inference and error tests provide solid coverage. The VLM inference test verifies inference count, latency, tokens, and frame processing metrics. Error handling test validates error type extraction.
376-397: Base attributes helper tests cover both present and missing plugin_name cases. Simple but important edge case coverage.
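For readers unfamiliar with the MockMetric pattern mentioned above, a hedged sketch of such a recording test double (the exact fields in the repository's version may differ):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MockMetric:
    """Sketch of a metric test double that simply remembers record/add calls."""

    record_calls: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)
    add_calls: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)

    def record(self, value: float, attributes: Dict[str, Any]) -> None:
        self.record_calls.append((value, attributes))

    def add(self, value: float, attributes: Dict[str, Any]) -> None:
        self.add_calls.append((value, attributes))


# Usage in a test: assert the collector recorded the expected latency
metric = MockMetric()
metric.record(123.4, {"provider": "openai"})
assert metric.record_calls == [(123.4, {"provider": "openai"})]
```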
plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py (4)
162-172: Request timing instrumentation looks correct. The LLMRequestStartedEvent emission and request_start_time tracking using time.perf_counter() follows the pattern established in other LLM plugins.
228-248: First token timing and chunk event emission is well implemented. The logic correctly tracks first_token_time on first content arrival and computes time_to_first_token_ms for the first chunk only. The is_first_chunk flag is derived from len(text_chunks) == 0.
261-276: Streaming completion event includes proper timing metrics. Latency and TTFT are correctly calculated and included in LLMResponseCompletedEvent.
297-314: Non-streaming path correctly computes and emits latency. The latency_ms calculation and event emission are consistent with the streaming path.
agents-core/vision_agents/core/observability/collector.py (8)
1-14: Clear module docstring with usage example. The documentation clearly explains the purpose and provides a practical usage example.
62-71: Constructor initializes state correctly and triggers event subscription. The session tracking dictionary and subscription call are properly initialized.
248-276: Audio input/output handlers correctly calculate byte sizes. The * 2 multiplication for 16-bit samples is appropriate for calculating audio bytes.
315-327: STT transcript handler properly accesses properties from the event. The handler correctly uses event.model_name, event.language, event.processing_time_ms, and event.audio_duration_ms which are defined as properties on STTTranscriptEvent.
343-357: TTS synthesis handler correctly records all three metrics. Latency, audio duration, and character count are all properly handled with null checks.
441-466: Video processor event subscription is unconditional. Unlike other subscriptions (lines 83-86, 141-144, etc.) which check for component existence before subscribing, _subscribe_to_processor_events always subscribes to self.agent.events. This is intentional since processors emit through the agent's event system, not through a specific component.
472-484: Base attributes helper is clean and simple. The helper correctly extracts the provider from plugin_name with appropriate null checking.
227-246: session_id attribute exists on RealtimeConnectedEvent and RealtimeDisconnectedEvent through inheritance. The event classes inherit from PluginBaseEvent, which extends BaseEvent. BaseEvent (defined in agents-core/vision_agents/core/events/base.py at line 40) declares session_id: Optional[str] = None. Therefore, both RealtimeConnectedEvent and RealtimeDisconnectedEvent have access to the session_id attribute via inheritance. Accessing event.session_id at lines 232 and 243 is valid.
Likely an incorrect or invalid review comment.
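A small hedged sketch of two collector helpers discussed above — provider attributes and 16-bit audio byte counting; the function names are illustrative and the byte math assumes int16 PCM:

```python
from typing import Any, Dict, Optional


def base_attributes(plugin_name: Optional[str]) -> Dict[str, Any]:
    # Derive a "provider" label from the event's plugin_name, guarding against None
    return {"provider": plugin_name} if plugin_name else {}


def pcm16_byte_count(num_samples: int) -> int:
    # Assumes 16-bit samples (2 bytes each); other dtypes would need an
    # nbytes-style calculation instead of a fixed multiplier.
    return num_samples * 2
```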
plugins/openai/vision_agents/plugins/openai/openai_llm.py (5)
151-164: Request started event and timing initialization are correct. The LLMRequestStartedEvent emission and timing setup follow the established pattern across plugins.
173-189: Non-streaming path correctly calculates latency and emits completion event. The latency calculation and delegation to _emit_completion_event are properly implemented.
199-207: First token time is tracked correctly for streaming. The check for first_token_time is None and the event type ensures we only capture the first text delta timing.
427-467: Well-factored completion event emission helper. The _emit_completion_event method cleanly extracts token usage and emits the event with all metrics. Good separation of concerns.
523-544: Completion event timing metrics are correctly calculated. The latency and TTFT calculations in the response.completed handler are correct.
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (5)
223-234: Request started event and timing initialization are correctly implemented. Follows the same pattern as other LLM plugins.
251-263: First token detection and timing propagation to standardize method are correct. The check for chunk.text is appropriate for Gemini's response structure.
331-341: Follow-up tool call responses reuse original request timing. The timing passed to _standardize_and_emit_event during tool call follow-ups uses the original request_start_time and first_token_time. This may be intentional to measure total elapsed time, but note that per-turn timing would require fresh start times.
413-450: Standardize and emit event correctly handles first chunk detection. The is_first = len(text_parts) == 0 check correctly identifies the first chunk before appending, unlike some other plugins.
375-390: Completion event includes all timing and token metrics. The final LLMResponseCompletedEvent is properly populated with latency, TTFT, tokens, and model info.
plugins/xai/vision_agents/plugins/xai/llm.py (3)
130-141: Request started event and timing initialization are correct. Follows the established pattern across plugins.
151-162: First token tracking and timing parameter passing are correct. The streaming path properly captures first token time and passes timing to the chunk handler.
200-216: Completion event with timing metrics is properly emitted. Latency and TTFT are correctly calculated and included in the event.
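The latency and time-to-first-token arithmetic these comments keep referring to boils down to a few perf_counter differences; a minimal, self-contained sketch (not the plugins' actual code):

```python
import time
from typing import Optional


class StreamTimer:
    """Sketch of the shared latency/TTFT bookkeeping described in the comments above."""

    def __init__(self) -> None:
        self.request_start = time.perf_counter()
        self.first_token: Optional[float] = None

    def on_chunk(self, content: str) -> None:
        # Record the moment the first non-empty token arrives
        if self.first_token is None and content:
            self.first_token = time.perf_counter()

    def completion_metrics(self) -> dict:
        latency_ms = (time.perf_counter() - self.request_start) * 1000
        ttft_ms = (
            (self.first_token - self.request_start) * 1000
            if self.first_token is not None
            else None
        )
        return {"latency_ms": latency_ms, "time_to_first_token_ms": ttft_ms}
```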
agents-core/vision_agents/core/llm/events.py (5)
86-96: Well-defined LLMRequestStartedEvent with auto-generated request_id. The event captures all necessary context for request tracking with appropriate defaults.
117-121: LLMResponseChunkEvent extended with first-chunk timing fields. The new is_first_chunk and time_to_first_token_ms fields enable streaming latency observability.
124-150: LLMResponseCompletedEvent comprehensively extended with timing and token metrics. All timing (latency_ms, time_to_first_token_ms) and token usage fields (input_tokens, output_tokens, total_tokens, model) are properly typed as Optional with clear docstrings.
198-231: VLM events mirror LLM event structure with vision-specific metrics. VLMInferenceStartEvent and VLMInferenceCompletedEvent appropriately include frames_processed and detections for vision model observability.
234-263: VLMErrorEvent and LLMErrorEvent follow consistent error event patterns. Both events include error, error_code, context, is_recoverable, and the error_message property, consistent with other error events in the codebase (e.g., RealtimeErrorEvent, STTErrorEvent, TTSErrorEvent).
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)
3-3: LGTM! Imports support the new metrics functionality. The addition of time for performance timing and LLMRequestStartedEvent for request lifecycle tracking aligns with the PR's observability objectives.
Also applies to: 19-19
127-140: LGTM! Request event emission and timing initialization are correctly placed. Emitting LLMRequestStartedEvent before the API call and initializing timing variables with perf_counter() provides accurate latency tracking.
248-252: LGTM! First token tracking is correctly implemented. The guard condition if first_token_time is None ensures the timestamp captures the very first token received, even across multiple streaming rounds.
438-444: LGTM! Signature update properly documents timing parameters. The optional timing parameters maintain backward compatibility while enabling first-chunk detection and TTFT metrics.
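As a reference point for the completed-event fields discussed above, a hedged sketch of the event shape (field names follow this review; defaults and omitted fields are assumptions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMResponseCompletedEvent:
    # Sketch only: mirrors the timing and token fields called out in the review
    plugin_name: Optional[str] = None
    model: Optional[str] = None
    latency_ms: Optional[float] = None
    time_to_first_token_ms: Optional[float] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
```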
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
504-524: Fix theis_first_chunklogic—it flags every chunk as first.The condition at line 507 will be
Truefor all chunks afterfirst_token_timeis set (line 204), not just the first chunk. This causes every chunk to reportis_first_chunk=Trueand the sametime_to_first_token_ms, corrupting TTFT metrics.🐛 Proposed fix to track first chunk correctly
Track whether the first chunk has already been emitted:
def _standardize_and_emit_event( self, event: ResponseStreamEvent, request_start_time: Optional[float] = None, first_token_time: Optional[float] = None, + first_chunk_emitted: bool = False, ) -> Optional[LLMResponseEvent[OpenAIResponse]]: """Forward native events and emit standardized versions. Args: event: The streaming event from OpenAI. request_start_time: Time when the request started (perf_counter). first_token_time: Time when first token was received (perf_counter). + first_chunk_emitted: Whether the first chunk has already been emitted. Returns: LLMResponseEvent if this is a completion event, None otherwise. """Then update the chunk handling logic:
if event.type == "response.output_text.delta": delta_event: ResponseTextDeltaEvent = event # Calculate time to first token for the first chunk - is_first = first_token_time is not None and request_start_time is not None + is_first = ( + not first_chunk_emitted + and first_token_time is not None + and request_start_time is not None + ) chunk_ttft_ms: Optional[float] = None - if first_token_time is not None and request_start_time is not None: + if is_first: chunk_ttft_ms = (first_token_time - request_start_time) * 1000 self.events.send( LLMResponseChunkEvent( plugin_name="openai", content_index=None, item_id=delta_event.item_id, output_index=delta_event.output_index, sequence_number=delta_event.sequence_number, delta=delta_event.delta, is_first_chunk=is_first, time_to_first_token_ms=chunk_ttft_ms, ) ) return NoneAnd update the call site in
create_response:+ first_chunk_emitted = False # Process streaming events and collect tool calls async for event in stream_response: # Track time to first token if ( first_token_time is None and event.type == "response.output_text.delta" ): first_token_time = time.perf_counter() llm_response_optional = self._standardize_and_emit_event( event, request_start_time=request_start_time, first_token_time=first_token_time, + first_chunk_emitted=first_chunk_emitted, ) + if event.type == "response.output_text.delta" and first_token_time is not None: + first_chunk_emitted = True
339-340: Tool call follow-up responses lack timing metrics.The call to
_standardize_and_emit_eventat line 340 omitsrequest_start_timeandfirst_token_timeparameters, so follow-up responses from tool execution won't emit latency or TTFT metrics. This creates an incomplete observability surface for multi-turn tool calling scenarios.📊 Proposed fix to add timing for follow-ups
Track timing for follow-up requests as well:
async def _send_tool_results_and_get_response( self, tool_messages: list[dict[str, Any]], seen: set[tuple[str, str]], ) -> tuple[ Optional[LLMResponseEvent[OpenAIResponse]], List[NormalizedToolCallItem] ]: """Send tool results and get follow-up response. Returns: Tuple of (llm_response, next_tool_calls) """ if not self.openai_conversation: return None, [] follow_up_kwargs: Dict[str, Any] = { "model": self.model, "conversation": self.openai_conversation.id, "input": tool_messages, "stream": True, } # Include tools for potential follow-up calls tools = self.get_available_functions() if tools: follow_up_kwargs["tools"] = convert_tools_to_openai_format(tools) + # Track timing for follow-up request + followup_request_start = time.perf_counter() + followup_first_token: Optional[float] = None + follow_up_response = await self.client.responses.create(**follow_up_kwargs) if isinstance(follow_up_response, OpenAIResponse): llm_response = LLMResponseEvent[OpenAIResponse]( follow_up_response, follow_up_response.output_text ) next_tool_calls = self._extract_tool_calls_from_response(follow_up_response) return llm_response, next_tool_calls # Streaming response llm_response_streaming: Optional[LLMResponseEvent[OpenAIResponse]] = None pending_tool_calls: List[NormalizedToolCallItem] = [] async for event in follow_up_response: - llm_response_optional = self._standardize_and_emit_event(event) + # Track first token for follow-up + if ( + followup_first_token is None + and event.type == "response.output_text.delta" + ): + followup_first_token = time.perf_counter() + + llm_response_optional = self._standardize_and_emit_event( + event, + request_start_time=followup_request_start, + first_token_time=followup_first_token, + ) if llm_response_optional is not None: llm_response_streaming = llm_response_optional if event.type == "response.completed": calls = self._extract_tool_calls_from_response(event.response) for c in calls: key = tool_call_dedup_key(c) if key not in seen: pending_tool_calls.append(c) seen.add(key) return llm_response_streaming, pending_tool_callsplugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
357-364: Avoid catching genericException.Per coding guidelines, use specific exception types rather than bare
except Exception as e. This broad catch could mask unrelated bugs (e.g.,KeyboardInterruptwon't be caught, butTypeError,AttributeError, etc. will be silently wrapped as "unexpected errors").Consider catching specific exceptions like
json.JSONDecodeError,TypeError, or AWS-specific exceptions.plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)
620-627: Missing timing metrics in tool call follow-up responses.The
LLMResponseCompletedEventemitted during tool call follow-up (lines 620-627) lackslatency_ms,time_to_first_token_ms, andmodelfields that are present in the main streaming path (lines 344-354). This creates inconsistent metric data.Consider tracking timing for follow-up requests as well, or document that follow-up metrics are intentionally omitted.
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (2)
414-438: Replaceexcept Exception as ewith specific exception types (guideline violation) and avoid swallowingasyncio.CancelledError.Line 434 violates the repo guideline to never use broad
except Exception as ehandlers. Additionally,asyncio.CancelledErrorshould not be swallowed during cleanup operations.Lines 218 and 384 also use
except Exception as eand should be updated for consistency.Proposed fix for line 434 (narrower catches + re-raise cancellation)
if self.connection: try: await self.connection.close() - except Exception as e: - logger.warning(f"Error closing Elevenlabs connection: {e}") + except asyncio.CancelledError: + raise + except (OSError, RuntimeError, ValueError) as e: + logger.warning(f"Error closing Elevenlabs connection: {e}") finally: self.connection = None self._connection_ready.clear()
299-336: Move state cleanup before returning from empty transcripts.ElevenLabs emits COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS events with empty/whitespace
textwhen usinginclude_timestamps=trueor with timing-only segments — these are expected end-of-speech signals. The current early return at line 20–21 skips both_commit_received.set()and_audio_start_time = None, causingclear()to hang until timeout (10s) and polluting the next utterance'sprocessing_time_mscalculation with stale timestamps.Proposed fix
- if not transcript_text: - return + if not transcript_text: + # Still treat as "commit received" to unblock clear() + self._audio_start_time = None + self._commit_received.set() + return
🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 348-349: The truthiness check skips valid zero values for
event.synthesis_time_ms; change the conditional in the collector (where
event.synthesis_time_ms is checked before calling metrics.tts_latency_ms.record)
to use an explicit None check (i.e., test event.synthesis_time_ms is not None)
so that 0.0 is recorded, then call
metrics.tts_latency_ms.record(event.synthesis_time_ms, attrs) as before.
- Around line 227-246: The handlers _on_realtime_connected and
_on_realtime_disconnected reference event.session_id which doesn't exist on
RealtimeConnectedEvent/RealtimeDisconnectedEvent; update the event model or
generate a session key: either add an optional session_id field to the
RealtimeConnectedEvent/RealtimeDisconnectedEvent dataclasses in
agents-core/vision_agents/core/llm/events.py and keep using
self._realtime_session_starts[event.session_id], or change the handlers to
compute a deterministic session id (e.g.,
f"{event.provider}:{int(time.time()*1000)}" or similar derived key) when storing
and looking up entries in self._realtime_session_starts so no AttributeError
occurs; ensure the same session id generation logic is used in both
_on_realtime_connected and _on_realtime_disconnected and that
_realtime_session_starts is keyed accordingly.
In @plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py:
- Around line 94-95: Ensure _audio_start_time cannot persist across utterances
by initializing it at the very start of any audio-processing entrypoint (e.g.,
functions like transcribe, process_stream, handle_audio_chunk) and always
clearing it in a finally block so every early return or exception resets it;
wrap the processing logic in try/finally where you set self._audio_start_time =
time.time() (or None for tests) at entry and set self._audio_start_time = None
in the finally, and also clear it when you explicitly detect end-of-utterance or
final transcript emission.
🧹 Nitpick comments (13)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (2)
252-254: Prefer direct attribute access overhasattr.The coding guidelines specify avoiding
hasattrin favor of normal attribute access or try/except blocks.♻️ Proposed refactor using try/except
- # Track time to first token - if first_token_time is None and hasattr(chunk, "text") and chunk.text: - first_token_time = time.perf_counter() + # Track time to first token + if first_token_time is None: + try: + if chunk.text: + first_token_time = time.perf_counter() + except AttributeError: + passBased on coding guidelines.
367-377: Prefer direct attribute access overhasattr/getattr.The coding guidelines specify avoiding
hasattrandgetattrin favor of normal attribute access or try/except blocks.♻️ Proposed refactor using try/except
- # Extract token usage from response if available - input_tokens: Optional[int] = None - output_tokens: Optional[int] = None - if ( - final_chunk - and hasattr(final_chunk, "usage_metadata") - and final_chunk.usage_metadata - ): - usage = final_chunk.usage_metadata - input_tokens = getattr(usage, "prompt_token_count", None) - output_tokens = getattr(usage, "candidates_token_count", None) + # Extract token usage from response if available + input_tokens: Optional[int] = None + output_tokens: Optional[int] = None + if final_chunk: + try: + usage = final_chunk.usage_metadata + if usage: + input_tokens = usage.prompt_token_count + output_tokens = usage.candidates_token_count + except AttributeError: + passBased on coding guidelines.
plugins/xai/vision_agents/plugins/xai/llm.py (1)
151-162: Consider more explicit truthiness check on Line 153.The condition
chunk.contentis falsy for empty strings, which appears intentional to capture only meaningful tokens. However, being explicit improves clarity:- if first_token_time is None and chunk.content: + if first_token_time is None and chunk.content is not None and chunk.content != "":Or if empty strings should also trigger timing:
- if first_token_time is None and chunk.content: + if first_token_time is None and chunk.content is not None:plugins/roboflow/vision_agents/plugins/roboflow/events.py (1)
33-35: Consider callingsuper().__post_init__()for defensive initialization chain safety.The logic correctly sets
detection_countfrom the objects list. While parent classes don't currently define__post_init__, callingsuper().__post_init__()at the beginning of this method is defensive best practice and ensures compatibility if parent classes add initialization logic in the future.♻️ Proposed defensive coding enhancement
def __post_init__(self): """Set detection_count from objects list.""" + super().__post_init__() self.detection_count = len(self.objects)plugins/aws/vision_agents/plugins/aws/aws_llm.py (1)
673-676: Avoid catching genericExceptionin JSON parsing.Replace with
json.JSONDecodeError(or its parentValueError) for more precise exception handling per coding guidelines.Suggested fix
try: args = json.loads(buf) - except Exception: + except json.JSONDecodeError: args = {}tests/test_metrics_collector.py (1)
91-96: Creative but fragile: bypassing__init__withobject.__new__.Using
object.__new__(MetricsCollector)to skip event subscription is clever, but future changes toMetricsCollector(e.g., adding required initialization) could silently break tests. Consider documenting why this approach is used or adding a comment.Suggested documentation
# Create collector but skip event subscription + # We bypass __init__ to avoid triggering event subscriptions, + # which would require a real agent with event managers. collector = object.__new__(MetricsCollector)agents-core/vision_agents/core/observability/metrics.py (1)
153-161: Consider using histograms for audio duration metrics.
realtime_audio_input_duration_msandrealtime_audio_output_duration_msare defined as counters, which only track cumulative totals. Histograms would provide percentile distribution, enabling better latency analysis (e.g., P50, P99 audio chunk durations).However, if the intent is to track total audio time per session rather than per-chunk distribution, counters are appropriate. This is a design choice worth documenting.
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)
144-144: Consider removing the early latency calculation.The latency is calculated at line 144 but then recalculated at line 234 after tool calling loops complete. The first calculation doesn't include tool execution time and is effectively unused. Consider removing line 144 to avoid confusion, since the meaningful measurement happens at line 234.
♻️ Optional cleanup
Remove the early calculation:
- if isinstance(original, ClaudeMessage): - latency_ms = (time.perf_counter() - request_start_time) * 1000 - + if isinstance(original, ClaudeMessage): # Extract text from Claude's response format - safely handle all text blocksThe calculation at line 234 correctly captures the full end-to-end latency.
Also applies to: 234-239
248-252: Consider replacing hasattr/getattr with try/except blocks.Lines 250-251 use
hasattrandgetattr, which the coding guidelines ask us to avoid. While checking external API objects defensively is reasonable, try/except AttributeError would be more idiomatic.As per coding guidelines.
♻️ Refactor to use try/except
# Track time to first token - if first_token_time is None and event.type == "content_block_delta": - delta = getattr(event, "delta", None) - if delta is not None and hasattr(delta, "text") and delta.text: - first_token_time = time.perf_counter() + if first_token_time is None and event.type == "content_block_delta": + try: + if event.delta and event.delta.text: + first_token_time = time.perf_counter() + except AttributeError: + pass
397-436: Consider replacing hasattr with try/except for attribute checks.The new helper method is well-structured, but uses
hasattrat lines 416-418 and 420, which violates the coding guidelines. Since this is new code, it's a good opportunity to use the preferred try/except pattern.Additionally, the ternary expression at lines 431-433 is somewhat complex and could be simplified for readability.
As per coding guidelines.
♻️ Refactor to use try/except
- # Extract token usage from response - input_tokens: Optional[int] = None - output_tokens: Optional[int] = None - - if hasattr(response, "usage") and response.usage: - input_tokens = response.usage.input_tokens - output_tokens = response.usage.output_tokens - - model = response.model if hasattr(response, "model") else self.model + # Extract token usage from response + input_tokens: Optional[int] = None + output_tokens: Optional[int] = None + + try: + if response.usage: + input_tokens = response.usage.input_tokens + output_tokens = response.usage.output_tokens + except AttributeError: + pass + + try: + model = response.model + except AttributeError: + model = self.model + + # Simplify total_tokens calculation + total_tokens = None + if input_tokens is not None or output_tokens is not None: + total_tokens = (input_tokens or 0) + (output_tokens or 0)
467-476: Consider simplifying the first-chunk detection logic.Lines 468-472 compute
is_firstwith three conditions, which is correct but could be clearer. Sincetext_parts.append()happens at line 465, checkinglen(text_parts) == 1identifies the first text chunk. Therequest_start_time is not Nonecheck is redundant iffirst_token_time is not None(first token can only be captured if request started).♻️ Optional simplification
- # Check if this is the first text chunk - is_first = ( - first_token_time is not None - and request_start_time is not None - and len(text_parts) == 1 - ) + # Check if this is the first text chunk (first_token_time is only set once) + is_first = first_token_time is not None and len(text_parts) == 1Or add a clarifying comment explaining that all three conditions together verify this is the very first chunk with timing data captured.
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (2)
129-136: Initialize_audio_start_timeonly when audio is actually enqueued (minor accuracy improvement).
Right now it’s set even if_audio_queueisNone(unlikely given_connection_ready, but still a correctness footgun if flow changes).Proposed tweak
- # Track start time for first audio chunk of a new utterance - if self._audio_start_time is None: - self._audio_start_time = time.perf_counter() - - # Add to audio queue for batching - if self._audio_queue is not None: - await self._audio_queue.put(resampled_pcm) + # Add to audio queue for batching + if self._audio_queue is not None: + # Track start time for first audio chunk of a new utterance + if self._audio_start_time is None: + self._audio_start_time = time.perf_counter() + await self._audio_queue.put(resampled_pcm)
396-413:clear()resets are good; consider catchingasyncio.TimeoutErrorexplicitly.
asyncio.wait_for(...)raisesasyncio.TimeoutError(aliasing can vary by Python/version); being explicit improves clarity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
agents-core/vision_agents/core/observability/collector.pyagents-core/vision_agents/core/observability/metrics.pyplugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.pyplugins/aws/vision_agents/plugins/aws/aws_llm.pyplugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.pyplugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.pyplugins/gemini/vision_agents/plugins/gemini/gemini_llm.pyplugins/openai/vision_agents/plugins/openai/openai_llm.pyplugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.pyplugins/roboflow/vision_agents/plugins/roboflow/events.pyplugins/xai/vision_agents/plugins/xai/llm.pytests/test_metrics_collector.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never writeexcept Exception as e- use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
agents-core/vision_agents/core/observability/collector.pytests/test_metrics_collector.pyplugins/openai/vision_agents/plugins/openai/openai_llm.pyplugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.pyplugins/roboflow/vision_agents/plugins/roboflow/events.pyplugins/gemini/vision_agents/plugins/gemini/gemini_llm.pyplugins/aws/vision_agents/plugins/aws/aws_llm.pyplugins/xai/vision_agents/plugins/xai/llm.pyplugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.pyplugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.pyagents-core/vision_agents/core/observability/metrics.py
**/*test*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with @pytest.mark.integration decorator
@pytest.mark.asyncio is not needed - it is automatic
Files:
tests/test_metrics_collector.py
🧬 Code graph analysis (7)
agents-core/vision_agents/core/observability/collector.py (9)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
events(213-216)plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
events(208-211)agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(52-54)VideoProcessorDetectionEvent(143-156)agents-core/vision_agents/core/llm/events.py (13)
LLMResponseCompletedEvent(125-149)LLMErrorEvent(251-263)ToolEndEvent(163-172)RealtimeErrorEvent(72-83)RealtimeConnectedEvent(11-17)RealtimeDisconnectedEvent(21-25)RealtimeAudioInputEvent(29-33)RealtimeAudioOutputEvent(37-42)RealtimeResponseEvent(46-54)RealtimeUserSpeechTranscriptionEvent(176-181)RealtimeAgentSpeechTranscriptionEvent(185-190)VLMInferenceCompletedEvent(209-231)VLMErrorEvent(235-247)agents-core/vision_agents/core/stt/events.py (2)
STTTranscriptEvent(17-47)STTErrorEvent(81-93)agents-core/vision_agents/core/turn_detection/events.py (1)
TurnEndedEvent(29-46)plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
on_stt_transcript(121-122)_on_stt_transcript(270-275)plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
on_stt_transcript(229-230)_on_stt_transcript(396-401)tests/test_metrics_collector.py (2)
record(34-35)add(37-38)
tests/test_metrics_collector.py (1)
agents-core/vision_agents/core/observability/collector.py (11)
MetricsCollector(51-483)_on_llm_response_completed(180-200)_on_tool_end(202-211)_on_stt_transcript(315-327)_on_stt_error(329-337)_on_tts_synthesis_complete(343-357)_on_tts_error(359-367)_on_turn_ended(373-381)_on_vlm_inference_completed(400-423)_on_vlm_error(425-433)_base_attributes(471-483)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
agents-core/vision_agents/core/stt/events.py (9)
processing_time_ms(38-39)processing_time_ms(68-69)TranscriptResponse(7-13)confidence(30-31)confidence(60-61)language(34-35)language(64-65)model_name(46-47)model_name(76-77)
plugins/roboflow/vision_agents/plugins/roboflow/events.py (3)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
events(213-216)plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
events(208-211)agents-core/vision_agents/core/events/base.py (1)
VideoProcessorDetectionEvent(143-156)
plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)agents-core/vision_agents/core/events/manager.py (1)
send(437-481)
plugins/xai/vision_agents/plugins/xai/llm.py (3)
agents-core/vision_agents/core/llm/events.py (2)
LLMRequestStartedEvent(87-96)LLMResponseCompletedEvent(125-149)agents-core/vision_agents/core/events/manager.py (1)
send(437-481)agents-core/vision_agents/core/llm/llm.py (1)
LLMResponseEvent(39-43)
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (1)
agents-core/vision_agents/core/llm/events.py (1)
LLMResponseCompletedEvent(125-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
🔇 Additional comments (45)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (10)
1-21: LGTM: Timing and event imports are appropriate.The addition of
timefor performance measurement andLLMRequestStartedEventfor lifecycle tracking aligns with the PR objectives to introduce observability metrics.
223-230: LGTM: Request lifecycle event properly positioned.The
LLMRequestStartedEventis emitted at the correct point before streaming begins, capturing the essential metadata (plugin, model, streaming mode).
232-234: LGTM: Correct timing primitives selected.Using
time.perf_counter()is the appropriate choice for measuring elapsed time, as it provides a high-resolution monotonic clock suitable for performance measurements.
256-263: LGTM: Timing propagation to chunk emission.The timing parameters are correctly threaded through to
_standardize_and_emit_event, enabling per-chunk latency calculations.
334-341: LGTM: Timing consistency maintained through tool calls.The timing references are correctly preserved across the multi-hop tool execution loop, ensuring latency metrics remain accurate.
361-365: LGTM: Accurate latency calculations.The timing metrics are correctly computed by taking the difference between
perf_counter()values and converting to milliseconds, with proper handling of the optional first-token time.
379-394: LGTM: Comprehensive completion event metrics.The
LLMResponseCompletedEventnow includes all essential observability data—latency, time-to-first-token, token counts, and model—with appropriate handling of optional values.
423-424: LGTM: Method signature properly extended.The optional timing parameters maintain backward compatibility while enabling the new metrics functionality.
436-444: LGTM: First-chunk timing correctly computed.The logic accurately identifies the initial text chunk and calculates time-to-first-token only when appropriate, ensuring metrics are meaningful.
446-455: LGTM: Chunk events enriched with timing metadata.The
LLMResponseChunkEventemissions now carry first-chunk identification and time-to-first-token metrics, providing granular observability into streaming performance.plugins/xai/vision_agents/plugins/xai/llm.py (7)
2-2: LGTM: Imports aligned with timing instrumentation.The additions are correctly used for event emission and timing metrics.
Also applies to: 11-11
130-141: LGTM: Request timing properly initialized.The event emission and timing setup correctly capture request start before obtaining the response.
204-219: LGTM: Timing metrics correctly calculated and emitted.The latency and time-to-first-token calculations properly handle both streaming and non-streaming paths.
438-445: LGTM: Signature extension maintains backward compatibility.The additional timing parameters have sensible defaults and clear type hints.
454-474: LGTM: First chunk timing correctly computed and emitted.The time-to-first-token calculation is properly guarded and only applied to the initial content chunk.
393-396: Verify: Tool call follow-ups omit timing instrumentation.The call to
_standardize_and_emit_chunkwithin the tool call handling loop doesn't pass timing parameters, unlike the main request path (lines 157-163). If tool call follow-ups should be timed as separate requests, consider adding timing instrumentation similar to the main path.If separate timing is desired for each tool call round:
stream = original_kwargs.get("stream", True) if stream: llm_response = None pending_tool_calls = [] + + # Track timing for tool call follow-up + tool_request_start = time.perf_counter() + tool_first_token: Optional[float] = None async for response, chunk in self.xai_chat.stream(): + tool_is_first = False + if tool_first_token is None and chunk.content: + tool_first_token = time.perf_counter() + tool_is_first = True + llm_response_optional = self._standardize_and_emit_chunk( - chunk, response + chunk, + response, + request_start_time=tool_request_start, + first_token_time=tool_first_token, + is_first_chunk=tool_is_first, )
480-481: LGTM: Clarifying comment aids understanding.The note explains why the completion event isn't emitted within the chunk handler, improving code maintainability.
plugins/roboflow/vision_agents/plugins/roboflow/events.py (2)
5-5: LGTM! Event hierarchy upgrade supports metrics collection.The switch to
VideoProcessorDetectionEventaligns with the PR's observability objectives, providing a standardized event surface for video processor plugins.
17-18: LGTM! Proper inheritance from VideoProcessorDetectionEvent.The class correctly inherits from the new base event type, gaining the metrics-friendly fields (
model_id,inference_time_ms,detection_count) while maintaining its specific event type identifier.plugins/openai/vision_agents/plugins/openai/openai_llm.py (6)
2-2: LGTM! Clean import additions.The
timemodule andLLMRequestStartedEventimports support the new timing instrumentation.Also applies to: 16-16
151-164: LGTM! Solid timing foundation.The request start event and timing setup using
perf_counter()are implemented correctly.
173-189: LGTM! Non-streaming metrics correctly computed.Latency calculation is accurate. Not tracking TTFT for non-streaming responses is a reasonable design choice since the entire response arrives atomically.
199-210: LGTM! First token detection implemented correctly.The logic properly captures the first delta event's timing and propagates it through the streaming pipeline.
430-470: LGTM! Centralized completion event emission.The method cleanly extracts token metrics and emits a comprehensive completion event. The docstring follows Google style guidelines.
526-547: LGTM! Completion timing metrics correctly computed.The latency and TTFT calculations are accurate and properly guarded against
Nonevalues.plugins/aws/vision_agents/plugins/aws/aws_llm.py (2)
145-156: LGTM: Timing instrumentation and token extraction.The request timing via
time.perf_counter()is appropriate for measuring latency. Token extraction from the AWS responseusagedict is correctly conditional, and thetotal_tokenscomputation handles the case where only one ofinput_tokensoroutput_tokensis present.Also applies to: 312-339
395-406: LGTM: Streaming path timing instrumentation.First-token tracking correctly captures the time when the first text delta appears in the stream. The
ttft_mscalculation is properly guarded by aNonecheck onfirst_token_time.Also applies to: 453-458, 596-611
agents-core/vision_agents/core/observability/collector.py (3)
51-71: LGTM: Clean collector initialization pattern.The constructor properly initializes session tracking state and delegates to
_subscribe_to_eventsfor modular subscription setup. The conditional checks for component existence (e.g.,if not self.agent.llm) prevent subscription errors when components are not configured.
180-221: LGTM: LLM event handlers correctly guard optional fields.The handlers properly check for
Nonebefore recording metrics, ensuring no spurious zero values are recorded when data is unavailable.
471-483: LGTM: Base attributes helper.Simple and effective extraction of the
providerattribute from events.tests/test_metrics_collector.py (3)
10-10: Consider whether MagicMock usage aligns with testing guidelines.

The coding guidelines state "Never mock in tests." While the `MagicMock` here is used only to create a minimal agent stub (not to mock behavior under test), this could be replaced with a simple dataclass or named tuple if strict adherence is required.

That said, the mock is used minimally—just to provide a null-ish agent structure—so this may be acceptable as a pragmatic boundary.

Also applies to: 84-89

98-158: LGTM: Comprehensive LLM handler tests.

Good coverage of both full-data and partial-data scenarios. The assertions correctly verify metric recording with expected values and attributes.

315-364: LGTM: VLM handler tests are thorough.

Tests verify all VLM-specific metrics including inference count, latency, token usage, and frame processing counts.
plugins/openrouter/vision_agents/plugins/openrouter/openrouter_llm.py (2)
258-270: LGTM: Primary path timing instrumentation is well-implemented.

The timing capture strategy is correct: `request_start_time` is captured before the API call, and `first_token_time` is recorded on the first content delta. The latency and TTFT calculations are accurate.

Also applies to: 286-287, 340-354

384-411: LGTM: Non-streaming response handles timing and token extraction correctly.

The conditional latency calculation and token extraction from `response.usage` follow the established pattern. The `total_tokens` fallback logic is consistent with other plugins.

agents-core/vision_agents/core/observability/metrics.py (3)
35-40: LGTM: Proper library instrumentation pattern.

Using `metrics.get_meter()` without configuring a provider is the correct approach for libraries. Applications control the provider configuration, and metrics become no-ops if unconfigured.

42-115: LGTM: Well-structured metric definitions.

Metrics are organized by component (STT, TTS, LLM) with consistent naming conventions, appropriate types (histograms for latencies, counters for counts), and descriptive documentation.
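For readers new to the OpenTelemetry side, the library-side pattern looks roughly like this (a sketch; the instrument names are examples borrowed from the tests, not necessarily the full set defined in metrics.py):

```python
from opentelemetry import metrics

# Library code only asks for a meter; it never installs a MeterProvider.
# If the application never configures one, these instruments are no-ops.
meter = metrics.get_meter("vision_agents")

llm_latency_ms = meter.create_histogram(
    "llm.latency", unit="ms", description="End-to-end LLM request latency"
)
llm_input_tokens = meter.create_counter(
    "llm.input_tokens", unit="{token}", description="Input tokens consumed"
)

# Recording with attributes, e.g. from an event handler:
llm_latency_ms.record(412.5, {"provider": "openai"})
llm_input_tokens.add(128, {"provider": "openai"})
```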
180-220: LGTM: VLM and video processor metrics.

The metrics cover the essential observability points for vision workloads: inference latency, token usage, frame processing, and detection counts.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (4)
3-3: LGTM: Clean import additions for timing instrumentation.The
timemodule andLLMRequestStartedEventimports are exactly what's needed for the observability enhancements.Also applies to: 19-19
127-140: LGTM: Request instrumentation properly positioned.The request started event is emitted before the API call (line 142), and timing initialization is clean. Good ordering ensures accurate latency capture.
254-259: LGTM: Consistent timing propagation through all stream paths.All three stream processing branches (initial, follow-up for tool calls, and final) correctly thread
request_start_timeandfirst_token_timethrough to_standardize_and_emit_event. Clean pattern.Also applies to: 331-336, 360-365
378-392: LGTM: Streaming completion metrics calculated correctly.The timing calculations at lines 378-382 properly account for the full end-to-end latency including all tool calling rounds. Both latency and time-to-first-token are correctly computed and emitted.
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (3)
5-5: Good choice usingtime.perf_counter()for latency measurement.
Monotonic clock is the right tool for durations.
254-265: Processing-time propagation in partial transcripts is solid.
Optional: clamp/round to avoid negative/very noisy sub-ms values if clocks skew (rare withperf_counter(), but harmless to guard).
345-357: Resetting_audio_start_timeon error/close is the right safety net.
This prevents the next utterance from inheriting old timing state.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 248-276: The byte-count logic in _on_realtime_audio_input and
_on_realtime_audio_output assumes 16-bit samples; replace the
len(event.data.samples) * 2 calculation with event.data.samples.nbytes so the
actual dtype is respected (use event.data.samples.nbytes when computing
audio_bytes before calling metrics.realtime_audio_input_bytes.add and
metrics.realtime_audio_output_bytes.add); keep the existing guards for
event.data and ensure samples is present (e.g., only read .nbytes if
event.data.samples is not None) to avoid attribute errors.
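A small illustration of why `nbytes` is the safer count (numpy arrays; the dtypes are chosen for the example):

```python
import numpy as np

samples_int16 = np.zeros(480, dtype=np.int16)
samples_float32 = np.zeros(480, dtype=np.float32)

# len(samples) * 2 only holds for 16-bit audio; nbytes respects the actual dtype.
assert samples_int16.nbytes == 960     # 480 samples * 2 bytes
assert samples_float32.nbytes == 1920  # 480 samples * 4 bytes
```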
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
agents-core/vision_agents/core/observability/collector.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
agents-core/vision_agents/core/observability/collector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: unit / Mypy
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
🔇 Additional comments (8)
agents-core/vision_agents/core/observability/collector.py (8)
1-49: LGTM! The module structure is well-organized with clear documentation and proper use of TYPE_CHECKING to avoid circular imports. Event types are correctly imported at module level for type hint resolution.

51-71: LGTM! The class initialization is clean and well-documented. The session start tracking dictionary is appropriately initialized for computing realtime session durations.

73-175: LGTM! The subscription orchestration is well-structured with defensive checks for component existence before subscribing. The pattern of async wrappers delegating to synchronous handlers is consistent and clear.

180-221: LGTM! LLM event handlers properly extract metrics with appropriate null checks and attribute augmentation. The success boolean is correctly converted to a lowercase string for consistent metric tagging.

315-381: LGTM! STT, TTS, and turn detection handlers follow the established pattern with appropriate null checks and metric recording. Character count calculation for TTS uses the built-in len() function correctly.

387-433: LGTM! VLM handlers appropriately record both token-based metrics and video-specific metrics. The checks for positive frame and detection counts prevent recording zero values.

439-466: LGTM! Video processor handlers correctly subscribe to agent-level events and record detection metrics with appropriate conditionals. The pattern is consistent with other handlers.

471-483: LGTM! The helper method cleanly extracts base attributes, providing consistent provider tagging across all metrics when plugin_name is available.
How are we progressing on this one? @d3xvn |
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @examples/02_golf_coach_example/golf_coach_with_metrics.py:
- Around line 41-49: Add a Google-style docstring to the create_agent function
describing its purpose, parameters, and return type. The docstring should
briefly explain that this function asynchronously creates and returns an Agent
instance configured with specific components such as edge, agent_user,
instructions, llm, and processors. Place the docstring immediately below the
async def create_agent(**kwargs) line.
- Around line 52-55: Add a Google-style docstring to join_call explaining
purpose, parameters (agent, call_type, call_id, **kwargs) and that it returns
None; then replace the local-only assignment of metrics_collector with a clear
retention pattern so linters don’t flag it as unused — for example create and
attach the collector to the agent (setattr(agent, "_metrics_collector",
metrics_collector) or append to an existing agent._aux_resources list) and
mention this retention in the docstring so future readers know it’s kept alive
for the call lifetime.
- Around line 16-23: Move the Prometheus setup out of module scope into a
function (e.g., setup_metrics) to avoid starting the HTTP server on import:
create setup_metrics() that uses PROMETHEUS_PORT, calls
start_http_server(PROMETHEUS_PORT) and configures PrometheusMetricReader(),
MeterProvider(metric_readers=[reader]) and metrics.set_meter_provider(provider);
call setup_metrics() from the if __name__ == "__main__": block instead of
running it at import, and add basic error handling around start_http_server to
catch/address socket/address-in-use errors (or try a fallback port) to avoid
crashing when port 9464 is already bound.
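A sketch of what that refactor could look like (the fallback-port behaviour is illustrative; the `prometheus_client` and OpenTelemetry calls shown are the standard ones):

```python
import logging

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

PROMETHEUS_PORT = 9464


def setup_metrics(port: int = PROMETHEUS_PORT) -> None:
    """Start the Prometheus scrape endpoint and install the meter provider."""
    try:
        start_http_server(port)
    except OSError:
        # Port already bound (e.g. a previous run); fall back to the next port.
        logging.warning("Port %d in use, falling back to %d", port, port + 1)
        start_http_server(port + 1)
    reader = PrometheusMetricReader()
    metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))


if __name__ == "__main__":
    setup_metrics()
```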
In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 401-419: The streaming path should stop using hasattr/getattr and
access attributes directly: replace the hasattr(final_message, "usage") and
getattr(usage, "output_tokens", None) pattern with direct attribute reads like
usage = final_message.usage and output_tokens = usage.output_tokens (mirroring
_emit_completion_event); also capture input_tokens in the streaming path the
same way the non-streaming code does (extract from the response/message_start or
usage as used in _emit_completion_event) before building the
LLMResponseCompletedEvent (which includes original, total_text,
plugin_name="anthropic", latency_ms, time_to_first_token_ms, input_tokens,
output_tokens, model).
- Around line 143-145: The code computes latency_ms inside the ClaudeMessage
branch using (time.perf_counter() - request_start_time) * 1000 but that value is
immediately overwritten later after the tool-calling loop; remove this unused
assignment to latency_ms in the ClaudeMessage handling (or if you intended to
record interim latency, rename it to interim_latency_ms and persist/use it
instead) so only the final latency calculation after the tool loop remains;
update any references to latency_ms to use the single canonical final value and
ensure no dead assignment to latency_ms remains in the ClaudeMessage block.
🧹 Nitpick comments (2)
agents-core/vision_agents/core/llm/events.py (1)
238-267: Consider extracting shared error_message property to reduce duplication.

The `error_message` property is identical across `VLMErrorEvent`, `LLMErrorEvent`, and `RealtimeErrorEvent`. A mixin or intermediate base class could reduce this duplication.

♻️ Optional: Extract to a mixin

```python
# In base.py or a new mixin file
class ErrorMessageMixin:
    """Mixin for events with error fields."""

    error: Optional[Exception] = None

    @property
    def error_message(self) -> str:
        return str(self.error) if self.error else "Unknown error"
```

Then each error event could inherit from both `PluginBaseEvent` and `ErrorMessageMixin`.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)
252-256: Usage of getattr/hasattr for SDK types.

Per coding guidelines, prefer normal attribute access. While defensive checks for external SDK responses are pragmatic, consider using try/except for cleaner attribute access:

♻️ Alternative using try/except

```diff
-        if first_token_time is None and event.type == "content_block_delta":
-            delta = getattr(event, "delta", None)
-            if delta is not None and hasattr(delta, "text") and delta.text:
-                first_token_time = time.perf_counter()
+        if first_token_time is None and event.type == "content_block_delta":
+            try:
+                if event.delta.text:
+                    first_token_time = time.perf_counter()
+            except AttributeError:
+                pass
```

Based on coding guidelines, prefer normal attribute access in Python.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
agents-core/vision_agents/core/llm/events.py
examples/02_golf_coach_example/golf_coach_with_metrics.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
examples/02_golf_coach_example/golf_coach_with_metrics.py
agents-core/vision_agents/core/llm/events.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (3)
examples/02_golf_coach_example/golf_coach_with_metrics.py (3)
agents-core/vision_agents/core/edge/types.py (1)
User(15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
AgentLauncher(15-185)
agents-core/vision_agents/core/observability/collector.py (1)
MetricsCollector(51-483)
agents-core/vision_agents/core/llm/events.py (3)
agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(52-54)
error_message(92-93)
agents-core/vision_agents/core/agents/events.py (1)
error_message(65-66)
agents-core/vision_agents/core/tts/events.py (1)
error_message(61-62)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (6)
agents-core/vision_agents/core/llm/events.py (3)
LLMRequestStartedEvent(91-100)
LLMResponseCompletedEvent(129-153)
LLMResponseChunkEvent(104-125)
agents-core/vision_agents/core/events/manager.py (1)
send(437-481)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
_emit_completion_event(430-470)
_standardize_and_emit_event(472-549)
agents-core/vision_agents/core/agents/transcript_buffer.py (1)
text(81-83)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
_standardize_and_emit_event(417-458)
agents-core/vision_agents/core/llm/llm.py (1)
LLMResponseEvent(39-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: unit / Ruff
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Ruff
🔇 Additional comments (11)
examples/02_golf_coach_example/golf_coach_with_metrics.py (5)
1-8: LGTM! Clear documentation with practical run instructions and endpoint guidance—exactly what an example needs.

25-38: LGTM! The `# noqa: E402` comments are well-justified given the OpenTelemetry configuration ordering requirement. Logging and env setup are appropriate for an example script.

57-76: LGTM! The verbose logging is appropriate for an example script—it helps users understand which metrics are being collected without digging through source code.

78-84: LGTM! The call lifecycle follows the expected pattern: create, join via context manager, send initial prompt, then finish.

87-88: LGTM! Entry point correctly wires `AgentLauncher` with the CLI utility, matching the established pattern.

agents-core/vision_agents/core/llm/events.py (3)
90-101: LGTM! The `LLMRequestStartedEvent` follows the established patterns in this file: auto-generated UUID for request tracking, consistent field structure, and proper type hints.
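For orientation, such an event roughly follows this shape (a sketch only; the field names are illustrative, not the exact definition in events.py):

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class LLMRequestStartedEventSketch:
    """Illustrative shape of a request-started event with an auto-generated id."""

    item_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    plugin_name: Optional[str] = None
    model: Optional[str] = None
    streaming: bool = False
```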
121-126: LGTM! The first-chunk timing fields enable proper TTFT metrics collection for streaming responses—a key observability metric.

137-154: LGTM! Comprehensive metrics surface for LLM observability. The inline documentation clarifying that `total_tokens` may differ from the sum when cached is valuable context.

plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)
127-140: LGTM! Proper request tracking initialization: `LLMRequestStartedEvent` is emitted before the API call with accurate model and streaming metadata, and `time.perf_counter()` is the correct choice for latency measurement.

425-464: Helper method centralizes completion event emission.

Good extraction pattern matching the OpenAI plugin. The `total_tokens` fallback logic correctly handles partial token availability.

Minor note: the `hasattr` usage on lines 444 and 448 could be refactored to try/except per coding guidelines, but this is consistent with the existing codebase patterns.

496-522: LGTM! The `emitted_first_chunk` flag approach is more explicit than checking `len(text_parts) == 0`. The logic correctly ensures TTFT is only attached to the first chunk event across all streaming iterations.
- Remove hasattr/getattr calls, use direct attribute access
- Track input_tokens from message_start and output_tokens from message_delta
- Remove unused latency_ms calculation (was being overwritten)
- Use samples.nbytes for audio byte counting to handle all formats
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @agents-core/vision_agents/core/observability/collector.py:
- Around line 248-274: Both _on_realtime_audio_input and
_on_realtime_audio_output access event.data.samples.nbytes after only checking
event.data; add a null check for samples to avoid AttributeError by changing the
guards to verify event.data and event.data.samples (i.e., use if event.data and
event.data.samples:) before calling event.data.samples.nbytes and recording
bytes; keep the existing duration recording logic unchanged and apply the same
samples check in both functions (references: _on_realtime_audio_input,
_on_realtime_audio_output, event.data.samples.nbytes,
metrics.realtime_audio_input_bytes, metrics.realtime_audio_output_bytes).
In @plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py:
- Around line 436-444: Remove the hasattr checks and access attributes directly:
read response.usage into a local (e.g., usage = response.usage) and, if usage is
truthy, set input_tokens = usage.input_tokens and output_tokens =
usage.output_tokens; then set model = response.model (fall back to self.model
only if you explicitly need a fallback), eliminating hasattr(response, "usage")
and hasattr(response, "model") usage.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
agents-core/vision_agents/core/observability/collector.py
examples/02_golf_coach_example/golf_coach_with_metrics.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
agents-core/vision_agents/core/observability/collector.py
examples/02_golf_coach_example/golf_coach_with_metrics.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧠 Learnings (1)
📚 Learning: 2025-11-24T17:04:43.030Z
Learnt from: CR
Repo: GetStream/Vision-Agents PR: 0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-11-24T17:04:43.030Z
Learning: Applies to **/*.py : Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Applied to files:
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (2)
agents-core/vision_agents/core/observability/collector.py (2)
agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(52-54)
VideoProcessorDetectionEvent(143-156)
agents-core/vision_agents/core/llm/events.py (12)
LLMResponseCompletedEvent(129-153)
LLMErrorEvent(255-267)
ToolEndEvent(167-176)
RealtimeConnectedEvent(11-18)
RealtimeDisconnectedEvent(22-29)
RealtimeAudioInputEvent(33-37)
RealtimeAudioOutputEvent(41-46)
RealtimeResponseEvent(50-58)
RealtimeUserSpeechTranscriptionEvent(180-185)
RealtimeAgentSpeechTranscriptionEvent(189-194)
VLMInferenceCompletedEvent(213-235)
VLMErrorEvent(239-251)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (2)
agents-core/vision_agents/core/llm/events.py (3)
LLMRequestStartedEvent(91-100)
LLMResponseCompletedEvent(129-153)
LLMResponseChunkEvent(104-125)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
_emit_completion_event(430-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (19)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (8)
1-24: LGTM! The imports are appropriate—`time` for timing instrumentation, `LLMRequestStartedEvent` for the new request lifecycle events. All aligned with the metrics implementation pattern.

127-140: LGTM! The request started event emission and timing initialization are well-implemented. Using `time.perf_counter()` is the right choice for measuring elapsed time—monotonic, high resolution, perfect for latency metrics.

231-237: LGTM! Latency calculation and event emission for non-streaming responses are correct. Appropriately omits `time_to_first_token_ms` since TTFT is not applicable for non-streaming requests.

243-269: LGTM! The streaming token tracking implementation is solid—extracting `input_tokens` from `message_start` and `output_tokens` from `message_delta` aligns with Anthropic's streaming event structure. The first-token timing detection on actual text deltas ensures accurate TTFT metrics.
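For reference, extracting those fields from Anthropic's raw stream events looks roughly like this (a sketch based on the documented event shapes; the surrounding function is illustrative, not the plugin's code):

```python
import time


async def track_stream_usage(stream):
    """Sketch: pull token usage and TTFT out of Anthropic raw stream events."""
    input_tokens = output_tokens = first_token_time = None
    start = time.perf_counter()
    async for event in stream:
        if event.type == "message_start":
            input_tokens = event.message.usage.input_tokens
        elif event.type == "content_block_delta":
            if (
                first_token_time is None
                and event.delta.type == "text_delta"
                and event.delta.text
            ):
                first_token_time = time.perf_counter()
        elif event.type == "message_delta":
            output_tokens = event.usage.output_tokens
    ttft_ms = (first_token_time - start) * 1000 if first_token_time else None
    return input_tokens, output_tokens, ttft_ms
```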
399-417: LGTM! The streaming completion event emission is comprehensive—includes latency, TTFT, token counts, and model info. This mirrors the structure from the OpenAI plugin and provides consistent observability across providers.

462-524: LGTM! The method signature expansion and first-chunk timing propagation are well-designed. The docstring follows Google style appropriately. The tuple return pattern for tracking `emitted_first_chunk` state across stream iterations is a clean approach that avoids mutable state sharing.

341-351: LGTM! Token tracking in follow-up streams for multi-hop tool calling scenarios is correctly implemented. Direct attribute access is used appropriately here.

376-386: LGTM! Consistent token tracking pattern for the finalization stream pass. This ensures accurate metrics even through complex tool-calling flows.

agents-core/vision_agents/core/observability/collector.py (7)
1-48: LGTM! Clean imports and module docstring.

The module docstring follows Google style, providing clear usage examples. Imports are well-organized with TYPE_CHECKING guard for the Agent type.

62-71: LGTM! Well-structured initialization.

The constructor properly initializes the collector, tracks session start times, and subscribes to events. The docstring adheres to Google style conventions.

One minor note: consider using the built-in `dict[str, float]` instead of `Dict[str, float]` for modern Python (3.9+).

73-175: LGTM! Solid defensive programming.

The subscription methods properly check for component existence before subscribing (e.g., `if not self.agent.llm: return`). The pattern of async handlers delegating to sync `_on_*` methods is clean and consistent.
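The pattern being praised here, reduced to a sketch (the `subscribe` call stands in for the real EventManager registration API, and the instrument is created locally so the example runs on its own):

```python
from opentelemetry import metrics

llm_latency_ms = metrics.get_meter("sketch").create_histogram("llm.latency", unit="ms")


class MetricsCollectorSketch:
    """Sketch: guard missing components, async wrapper, synchronous handler."""

    def __init__(self, agent):
        self.agent = agent
        self._subscribe_to_llm_events()

    def _subscribe_to_llm_events(self) -> None:
        if not self.agent.llm:
            return  # LLM not configured; skip subscription entirely

        async def on_completed(event) -> None:
            # Async wrapper the event bus can await; real work happens synchronously.
            self._on_llm_response_completed(event)

        # `subscribe` is a stand-in for the actual EventManager registration call.
        self.agent.llm.events.subscribe(on_completed)

    def _on_llm_response_completed(self, event) -> None:
        if event.latency_ms is not None:  # guard optional fields before recording
            llm_latency_ms.record(event.latency_ms, {"provider": event.plugin_name})
```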
180-221: LGTM! Careful null-guarding on metrics.

The handlers properly check for `None` values before recording metrics (e.g., `if event.latency_ms is not None`). The pattern of converting booleans to lowercase strings for attributes is consistent.

313-365: LGTM! Consistent metric recording patterns.

STT and TTS handlers follow the established pattern: extract base attributes, guard against `None` values, and record metrics. The character counting for TTS synthesis (line 355) is a nice touch for tracking usage.

371-431: LGTM! Thoughtful zero-value filtering.

The VLM handler only records `frames_processed` and `detections` when greater than zero (lines 418-421), avoiding noise from zero-value counter increments. The turn detection handler properly guards optional timing fields.

437-481: LGTM! Clean separation of processor events.

The video processor events are correctly subscribed at the agent level rather than the LLM level. The `_base_attributes` helper is minimal and focused, extracting only the `plugin_name` as the `provider` attribute.

examples/02_golf_coach_example/golf_coach_with_metrics.py (4)
1-38: LGTM! Proper OpenTelemetry initialization ordering.

The configuration of OpenTelemetry before importing vision_agents (lines 10-23) is correct and well-documented. This ensures metrics are captured from the moment the library loads. The `# noqa: E402` comments appropriately suppress import order warnings.

41-49: LGTM! Clean agent factory function.

The `create_agent` function is well-structured, configuring the agent with appropriate components for a golf coaching use case.

52-84: The underscore prefix is intentional, the lifecycle is sound.

The `_metrics_collector` variable (line 55) uses an underscore prefix to indicate it's not explicitly used after creation—a lint convention. The collector remains alive throughout the `async with agent.join(call)` block, which is the intended behavior. The subscriptions it creates keep recording metrics until the function returns.

87-88: LGTM! Standard CLI entry point.

The main guard properly uses `AgentLauncher` with the defined factory functions.
- Use direct attribute access for response.usage and response.model
- Add null check for event.data.samples before accessing .nbytes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)
499-525: Replace hasattr with direct attribute access or type checking.

The coding guidelines forbid the use of `hasattr`. The delta's type is known and `TextDelta` is already imported, so an isinstance check is the cleaner fix:

♻️ Suggested fix

```diff
             if event.type == "content_block_delta":
                 delta_event: RawContentBlockDeltaEvent = event
-                if hasattr(delta_event.delta, "text") and delta_event.delta.text:
+                if isinstance(delta_event.delta, TextDelta) and delta_event.delta.text:
                     text_parts.append(delta_event.delta.text)
```

Based on learnings: "Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python."
🧹 Nitpick comments (2)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (1)
410-428: Consider computing total_tokens for consistency with _emit_completion_event.

The streaming path's final emission omits `total_tokens`, while `_emit_completion_event` (lines 466-468) computes it from input and output tokens. For symmetry:

♻️ Suggested fix

```diff
         self.events.send(
             LLMResponseCompletedEvent(
                 original=last_followup_stream or original,
                 text=total_text,
                 plugin_name="anthropic",
                 latency_ms=latency_ms,
                 time_to_first_token_ms=ttft_ms,
                 input_tokens=input_tokens,
                 output_tokens=output_tokens,
+                total_tokens=(input_tokens or 0) + (output_tokens or 0)
+                if input_tokens or output_tokens
+                else None,
                 model=self.model,
             )
         )
```

agents-core/vision_agents/core/observability/collector.py (1)
469-481: Consider including plugin_version in base attributes.

The `PluginBaseEvent` base class also carries `plugin_version`. Including it could aid in debugging version-specific issues in production dashboards. This is optional and can be deferred.

♻️ Optional enhancement

```diff
     def _base_attributes(self, event: PluginBaseEvent) -> dict:
         attrs = {}
         if event.plugin_name:
             attrs["provider"] = event.plugin_name
+        if event.plugin_version:
+            attrs["provider_version"] = event.plugin_version
         return attrs
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
agents-core/vision_agents/core/observability/collector.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
agents-core/vision_agents/core/observability/collector.py
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧠 Learnings (1)
📚 Learning: 2025-11-24T17:04:43.030Z
Learnt from: CR
Repo: GetStream/Vision-Agents PR: 0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-11-24T17:04:43.030Z
Learning: Applies to **/*.py : Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Applied to files:
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py
🧬 Code graph analysis (2)
agents-core/vision_agents/core/observability/collector.py (10)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_cloud_processor.py (1)
events(213-216)
plugins/roboflow/vision_agents/plugins/roboflow/roboflow_local_processor.py (1)
events(208-211)
agents-core/vision_agents/core/events/base.py (2)
PluginBaseEvent(52-54)
VideoProcessorDetectionEvent(143-156)
agents-core/vision_agents/core/llm/events.py (12)
LLMResponseCompletedEvent(129-153)
LLMErrorEvent(255-267)
ToolEndEvent(167-176)
RealtimeConnectedEvent(11-18)
RealtimeDisconnectedEvent(22-29)
RealtimeAudioInputEvent(33-37)
RealtimeAudioOutputEvent(41-46)
RealtimeResponseEvent(50-58)
RealtimeUserSpeechTranscriptionEvent(180-185)
RealtimeAgentSpeechTranscriptionEvent(189-194)
VLMInferenceCompletedEvent(213-235)
VLMErrorEvent(239-251)
agents-core/vision_agents/core/stt/events.py (2)
STTTranscriptEvent(17-47)
STTErrorEvent(81-93)
agents-core/vision_agents/core/tts/events.py (1)
TTSSynthesisCompleteEvent(35-45)
agents-core/vision_agents/core/turn_detection/events.py (1)
TurnEndedEvent(29-46)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
on_stt_transcript(121-122)
_on_stt_transcript(270-275)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
on_stt_transcript(229-230)
_on_stt_transcript(396-401)
tests/test_metrics_collector.py (2)
record(34-35)
add(37-38)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (3)
agents-core/vision_agents/core/llm/events.py (3)
LLMRequestStartedEvent(91-100)
LLMResponseCompletedEvent(129-153)
LLMResponseChunkEvent(104-125)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (2)
_emit_completion_event(430-470)
_standardize_and_emit_event(472-549)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
_standardize_and_emit_event(417-458)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Mypy
🔇 Additional comments (11)
plugins/anthropic/vision_agents/plugins/anthropic/anthropic_llm.py (5)
1-25: LGTM! The imports are appropriate: `time` for timing instrumentation, `TextDelta` for handling streaming deltas, and `LLMRequestStartedEvent` for the request lifecycle. All are used by the code that follows.

128-141: LGTM! The `LLMRequestStartedEvent` is emitted at the start of the request, the `perf_counter` timer starts immediately, and `first_token_time` is initialized to await the first token.

232-238: LGTM! In the non-streaming path, latency is measured in a single step once the full response arrives. There is no first token to mark, so only total latency is recorded.

252-276: LGTM! The streaming path handles tokens correctly: the first `TextDelta` arrival is captured for TTFT, usage metrics are accumulated from `message_start` and `message_delta`, and the `emitted_first_chunk` flag ensures the first chunk is counted only once.

432-471: LGTM! The helper method extracts token counts from the response, computes the total from input and output tokens, and emits the `LLMResponseCompletedEvent`. The docstring follows Google style and the arithmetic is correct.

agents-core/vision_agents/core/observability/collector.py (6)
83-175: LGTM! The subscription methods follow a consistent, defensive pattern—checking component availability before subscribing. The async handler wrappers correctly delegate to synchronous methods, maintaining a clean separation.

180-221: LGTM! The LLM handlers properly guard against `None` values before recording metrics and consistently extract base attributes. The `str(event.success).lower()` pattern for boolean attributes aligns with OpenTelemetry conventions.

248-274: LGTM! The audio input/output handlers correctly perform chained null checks (`event.data and event.data.samples is not None`) before accessing `nbytes`, which addresses the concern from the commit message. This avoids `AttributeError` without resorting to `hasattr`.

313-379: LGTM! STT, TTS, and turn detection handlers maintain consistency with the established pattern. The character count via `len(event.text)` is straightforward and efficient.

385-463: LGTM! The VLM and video processor handlers correctly distinguish their subscription points—VLM events through `agent.llm.events` and processor events through `agent.events`. The conditional checks for non-zero counts prevent noise in metrics.

26-43: Both `RealtimeErrorEvent` and `TTSErrorEvent` are properly exported from their respective modules. The imports are valid.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/test_metrics_collector.py`:
- Around line 185-189: There is a duplicated assertion:
llm_time_to_first_token_ms.record.assert_not_called() appears twice; remove the
duplicate and replace it with the correct metric assertion if one was intended
(e.g., if the test defines a different metric variable like
llm_time_to_first_byte_ms or llm_time_to_last_token_ms, call that metric's
.record.assert_not_called()), otherwise just delete the extra
llm_time_to_first_token_ms.record.assert_not_called() line so each metric
(llm_latency_ms, llm_time_to_first_token_ms, llm_input_tokens,
llm_output_tokens) is asserted exactly once.
- Around line 54-95: The fixture mock_metrics currently stops all patches in the
finally block immediately, so mocks are removed before tests run and it also
uses hasattr (violating guidelines); change it to a yield-based fixture that
starts patches for each metric method (check for methods using getattr(metric,
"record", None) and getattr(metric, "add", None) and/or test callability) before
yielding and then stops all started patch objects after the yield, ensuring you
still reference the same unique symbols (mock_metrics, all_metrics, record, add,
patches, patch.object) so tests retain instrumentation until teardown.
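A yield-based shape for that fixture could look roughly like this (a sketch; it assumes the metrics module exposes the instrument objects the test file already imports, and the patched list is illustrative rather than exhaustive):

```python
import pytest
from unittest.mock import patch

from vision_agents.core.observability import metrics


@pytest.fixture
def mock_metrics():
    # Start patches before the test runs and undo them only after it finishes.
    patches = [
        patch.object(metrics.llm_latency_ms, "record"),
        patch.object(metrics.llm_time_to_first_token_ms, "record"),
        patch.object(metrics.llm_input_tokens, "add"),
        patch.object(metrics.llm_output_tokens, "add"),
    ]
    for p in patches:
        p.start()
    yield metrics
    for p in patches:
        p.stop()
```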
🧹 Nitpick comments (1)
tests/test_metrics_collector.py (1)
318-347: Consider testing the video_detections metric.

The `VLMInferenceCompletedEvent` has a `detections` field (per the event definition), and `video_detections` is imported but never asserted upon. If the collector records this metric, it should be tested here.

💡 Optional: Add assertion for video_detections

```diff
     vlm_output_tokens.add.assert_called_once_with(
         20, {"provider": "moondream", "model": "moondream-cloud"}
     )
+    # If detections metric is recorded by collector:
+    # video_detections.add.assert_called_once_with(
+    #     0, {"provider": "moondream", "model": "moondream-cloud"}
+    # )
```

Alternatively, create an event with `detections=3` and verify the metric is recorded.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/test_metrics_collector.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*test*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*test*.py: Never mock in tests; use pytest for testing
Mark integration tests with `@pytest.mark.integration` decorator
`@pytest.mark.asyncio` is not needed - it is automatic
Files:
tests/test_metrics_collector.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path in Python code
Never write `except Exception as e` - use specific exception handling
Avoid using getattr, hasattr, delattr and setattr; prefer normal attribute access in Python
Docstrings should follow the Google style guide for docstrings
Files:
tests/test_metrics_collector.py
🧬 Code graph analysis (1)
tests/test_metrics_collector.py (7)
agents-core/vision_agents/core/events/manager.py (2)
EventManager(56-561)
wait(483-496)
agents-core/vision_agents/core/llm/events.py (4)
LLMResponseCompletedEvent(129-153)
ToolEndEvent(167-176)
VLMErrorEvent(239-251)
VLMInferenceCompletedEvent(213-235)
agents-core/vision_agents/core/observability/collector.py (2)
MetricsCollector(51-481)
_base_attributes(469-481)
agents-core/vision_agents/core/stt/events.py (3)
STTErrorEvent(81-93)
STTTranscriptEvent(17-47)
TranscriptResponse(7-13)
agents-core/vision_agents/core/tts/events.py (2)
TTSErrorEvent(49-62)
TTSSynthesisCompleteEvent(35-45)
agents-core/vision_agents/core/turn_detection/events.py (1)
TurnEndedEvent(29-46)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1)
start(137-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Mypy
- GitHub Check: unit / Ruff
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff
- GitHub Check: unit / Validate extra dependencies in "agents-core/pyproject.toml"
- GitHub Check: unit / Mypy
🔇 Additional comments (4)
tests/test_metrics_collector.py (4)
97-114: LGTM! The fixture properly creates an `EventManager` and registers all the event types exercised by the tests.

117-136: Structure looks correct, but depends on the broken mock_metrics fixture.

The wiring of mock agent components to the shared `event_manager` is appropriate. Once `mock_metrics` is fixed, this fixture should function as intended.

142-168: LGTM! Test correctly exercises the LLM response completed handler via `EventManager` and validates all expected metric recordings.

369-387: LGTM! These tests verify the `_base_attributes` helper behavior directly, which is appropriate for unit testing the collector's internal logic.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary by CodeRabbit
New Features
Observability
Documentation
Tests
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.