[BOT ISSUE] Anthropic server_tool_use usage metrics not captured in span metrics #77

@braintrust-bot

Summary

The Anthropic instrumentation extracts input_tokens and output_tokens from the Messages API response usage object, but does not capture the server_tool_use sub-object. When Claude uses server-side tools (web search, code execution), the API returns usage counters like web_search_requests and web_fetch_requests inside usage.server_tool_use. These are silently dropped, making server-side tool usage invisible in Braintrust metrics.

What is missing

In InstrumentationSemConv.tagAnthropicResponse() (lines 220–230), the usage extraction only handles top-level token fields:

if (responseJson.has("usage")) {
    JsonNode usage = responseJson.get("usage");
    if (usage.has("input_tokens")) metrics.put("prompt_tokens", usage.get("input_tokens"));
    if (usage.has("output_tokens")) metrics.put("completion_tokens", usage.get("output_tokens"));
    // ... total tokens
}

There is no check for server_tool_use. A response in which Claude used web search returns a usage object shaped like this:

"usage": {
  "input_tokens": 1200,
  "output_tokens": 350,
  "server_tool_use": {
    "web_search_requests": 3,
    "web_fetch_requests": 2
  }
}

The fix should iterate over every field inside server_tool_use and map each one to a metric named server_tool_use_<field_name>, e.g.:

  • server_tool_use_web_search_requests
  • server_tool_use_web_fetch_requests
  • server_tool_use_code_execution_requests
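Below is a minimal sketch of the proposed mapping. It assumes the metrics map accepts numeric values keyed by metric name. To stay dependency-free, it takes the usage object as an already-parsed java.util.Map rather than the Jackson JsonNode that tagAnthropicResponse() works with. The class and method names (ServerToolUseMetrics, extractServerToolUse) are hypothetical and not part of the SDK. Because it copies every numeric field it finds, new counters such as code_execution_requests would be picked up without further code changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper sketching the proposed server_tool_use extraction.
// Operates on an already-parsed usage object (a Map) instead of Jackson's JsonNode.
public class ServerToolUseMetrics {
    static final String PREFIX = "server_tool_use_";

    // Copies every numeric field of usage.server_tool_use into metrics
    // under the name server_tool_use_<field_name>.
    public static void extractServerToolUse(Map<String, Object> usage, Map<String, Number> metrics) {
        Object nested = usage.get("server_tool_use");
        if (!(nested instanceof Map)) {
            return; // absent or null: no server-side tools were used
        }
        for (Map.Entry<?, ?> field : ((Map<?, ?>) nested).entrySet()) {
            Object value = field.getValue();
            // Only numeric counters become metrics; non-numeric values are skipped
            if (value instanceof Number) {
                metrics.put(PREFIX + field.getKey(), (Number) value);
            }
        }
    }

    public static void main(String[] args) {
        // Same shape as the example usage object in this issue
        Map<String, Object> serverToolUse = new LinkedHashMap<>();
        serverToolUse.put("web_search_requests", 3);
        serverToolUse.put("web_fetch_requests", 2);

        Map<String, Object> usage = new LinkedHashMap<>();
        usage.put("input_tokens", 1200);
        usage.put("output_tokens", 350);
        usage.put("server_tool_use", serverToolUse);

        Map<String, Number> metrics = new LinkedHashMap<>();
        extractServerToolUse(usage, metrics);
        System.out.println(metrics);
    }
}
```

Run against the example usage object above, main prints {server_tool_use_web_search_requests=3, server_tool_use_web_fetch_requests=2}. In the JsonNode-based handler, the same loop would presumably sit inside the existing if (responseJson.has("usage")) block. It would iterate usage.get("server_tool_use").fields() and keep only entries whose value isNumber().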

Note: the full response JSON is stored in braintrust.output_json, so the raw counters are technically present. However, they are not extracted into braintrust.metrics, which is where Braintrust's UI and cost calculations read them from.

For comparison, the OpenAI handler in the same file already extracts a nested usage field (output_tokens_details.reasoning_tokens, lines 145–150), so the pattern of reading nested usage details is already established.

Braintrust docs status

  • supported: the Braintrust Anthropic integration docs at https://www.braintrust.dev/docs/integrations/ai-providers/anthropic explicitly document server-side tool metrics: "When Claude uses server-side tools, Braintrust records the provider's tool usage counters dynamically." The documented metric names are server_tool_use_web_search_requests, server_tool_use_web_fetch_requests, and server_tool_use_code_execution_requests.

Upstream sources

Local files inspected

  • braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java — lines 208–235 (tagAnthropicResponse only extracts input_tokens and output_tokens; no server_tool_use check)
  • braintrust-sdk/instrumentation/anthropic_2_2_0/src/main/java/dev/braintrust/instrumentation/anthropic/v2_2_0/TracingHttpClient.java — HTTP-level instrumentation, delegates to InstrumentationSemConv
  • braintrust-sdk/instrumentation/anthropic_2_2_0/src/test/java/dev/braintrust/instrumentation/anthropic/v2_2_0/BraintrustAnthropicTest.java — no test cases exercise server-side tool responses