feat: Emit OTEL attributes for AgentCore Evaluation support#368

Merged
jesseturner21 merged 3 commits into main from lshoval/emit-eval-otel-attributes
Mar 26, 2026

@jesseturner21
Contributor

Summary

  • Add _emit_invocation_otel_attributes() to BedrockAgentCoreApp that automatically emits agentcore.invocation.user_prompt and agentcore.invocation.agent_response as OTEL span attributes on the root POST /invocations span
  • These attributes provide a canonical, framework-agnostic source of the user's prompt and the agent's response for AgentCore Evaluation, enabling evaluation of workflow agents that use custom state schemas (e.g. TypedDict with user_input/final_response fields) where the default MessagesState-based extraction in the evaluation mapper fails silently with null scores
  • Add prompt_key and response_key params to @app.entrypoint() so users can control which payload/result keys are used for OTEL attributes
  • Add 15 unit tests covering all code paths (payload extraction, response serialization, truncation, error handling, integration)
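
To illustrate the failure mode described in the second bullet, here is a minimal sketch. `messages_based_prompt` is a hypothetical stand-in for a messages-only extraction, not the actual evaluation mapper, and `WorkflowState` is an assumed example of a custom state schema. Only the attribute name `agentcore.invocation.user_prompt` comes from this PR.

```python
from typing import Optional, TypedDict


class WorkflowState(TypedDict):
    # Assumed custom state schema with no `messages` field.
    user_input: str
    final_response: str


def messages_based_prompt(state: dict) -> Optional[str]:
    """Hypothetical extraction that only understands a `messages` list."""
    messages = state.get("messages")
    if not messages:
        return None  # nothing to score, which surfaces later as a null score
    return messages[0].get("content")


def attribute_based_prompt(span_attributes: dict) -> Optional[str]:
    """Read the prompt from the span attribute added in this PR."""
    return span_attributes.get("agentcore.invocation.user_prompt")


state = WorkflowState(user_input="Summarize the report", final_response="Done")
print(messages_based_prompt(state))
print(attribute_based_prompt({"agentcore.invocation.user_prompt": state["user_input"]}))
```

The first lookup finds nothing because the state has no `messages` key, while the attribute lookup does not depend on the state schema at all.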

Test plan

  • 15 new unit tests pass (TestEmitInvocationOtelAttributes)
  • No regressions in existing test suite
  • Deploy workflow agent with updated SDK to AgentCore Runtime, verify attributes appear in CloudWatch spans
  • Run evaluation against workflow agent traces, confirm non-null scores

Add _emit_invocation_otel_attributes() to BedrockAgentCoreApp that
automatically emits agentcore.invocation.user_prompt and
agentcore.invocation.agent_response as OTEL span attributes on the
root POST /invocations span.

These attributes provide a canonical, framework-agnostic source of the
user's prompt and the agent's response for AgentCore Evaluation. They
enable evaluation of workflow agents that use custom state schemas
(e.g. TypedDict with user_input/final_response fields) where the
default MessagesState-based extraction in the evaluation mapper would
fail silently with null scores.

The method:
- Extracts user prompt from the payload dict (tries common keys like
  prompt, input, query, message, falls back to full JSON)
- Extracts agent response from the entrypoint return value
- Skips silently for streaming responses or when OTEL is not installed
- Attributes are capped at 16KB to stay within OTEL limits
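
The extraction and capping steps listed above can be sketched roughly as follows. This is not the code from the PR: the function names are hypothetical, the key order simply follows the list above, and counting the 16KB cap in UTF-8 bytes is an assumption.

```python
import json

MAX_ATTR_BYTES = 16 * 1024  # 16KB cap from the description; byte-based counting is an assumption
COMMON_PROMPT_KEYS = ("prompt", "input", "query", "message")


def to_text(value):
    """Return strings unchanged; serialize anything else as JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value, default=str)


def truncate_attr(text, limit=MAX_ATTR_BYTES):
    """Cap the UTF-8 size of text at limit bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a trailing partial multi-byte character, if any.
    return encoded[:limit].decode("utf-8", errors="ignore")


def extract_user_prompt(payload):
    """Try the common keys in order, then fall back to the full payload as JSON."""
    if isinstance(payload, dict):
        for key in COMMON_PROMPT_KEYS:
            if key in payload:
                return truncate_attr(to_text(payload[key]))
    return truncate_attr(to_text(payload))
```

With this sketch, a payload such as `{"user_input": "hi"}` matches none of the common keys and is emitted as its full JSON form.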

Check span.attributes before setting agentcore.invocation.user_prompt
and agentcore.invocation.agent_response so that user-provided values
are not overwritten by auto-extraction.

1. Remove broken span.attributes guard — active Span doesn't expose
   .attributes (only ReadableSpan does), so the check was a no-op in prod.

2. Add prompt_key and response_key params to @app.entrypoint() so users
   can control which payload/result keys are used for OTEL attributes.
   Default (None) preserves the existing heuristic behavior.

3. Fix early return on Response objects — previously skipped user_prompt
   emission too. Now only skips agent_response for streaming responses.
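
A rough sketch of the behavior after these three changes, under stated assumptions: `RecordingSpan` is a stand-in for an active span (write-only, with no `.attributes`), generators stand in for streaming responses, and `pick` and `emit_invocation_attributes` are hypothetical names. For brevity, a key of `None` falls back to the whole value here rather than to the common-key heuristic.

```python
import inspect
import json


class RecordingSpan:
    """Stand-in for an active span: it can be written to but exposes no `.attributes`."""

    def __init__(self):
        self._recorded = {}

    def set_attribute(self, key, value):
        self._recorded[key] = value


def is_streaming(result):
    # Assumption: generators stand in for streaming responses in this sketch.
    return inspect.isgenerator(result) or inspect.isasyncgen(result)


def pick(data, key):
    """Use data[key] when key is set and present; otherwise use the whole value."""
    if key is not None and isinstance(data, dict) and key in data:
        data = data[key]
    return data if isinstance(data, str) else json.dumps(data, default=str)


def emit_invocation_attributes(span, payload, result, prompt_key=None, response_key=None):
    # The prompt is emitted first, so a streaming result no longer suppresses it.
    span.set_attribute("agentcore.invocation.user_prompt", pick(payload, prompt_key))
    if is_streaming(result):
        return  # only the response attribute is skipped for streaming results
    span.set_attribute("agentcore.invocation.agent_response", pick(result, response_key))


span = RecordingSpan()
emit_invocation_attributes(
    span,
    {"user_input": "hi"},
    {"final_response": "ok"},
    prompt_key="user_input",
    response_key="final_response",
)
```

Because the stand-in span has no readable `.attributes`, a guard that reads it before writing has nothing to check, which is why the earlier guard was removed rather than repaired.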
@jesseturner21 jesseturner21 merged commit 8bae410 into main Mar 26, 2026
23 checks passed