Summary
The Anthropic instrumentation extracts input_tokens and output_tokens from the Messages API response usage object, but does not capture the server_tool_use sub-object. When Claude uses server-side tools (web search, code execution), the API returns usage counters like web_search_requests and web_fetch_requests inside usage.server_tool_use. These are silently dropped, making server-side tool usage invisible in Braintrust metrics.
What is missing
In InstrumentationSemConv.tagAnthropicResponse() (lines 220–230), the usage extraction only handles top-level token fields:
    if (responseJson.has("usage")) {
        JsonNode usage = responseJson.get("usage");
        if (usage.has("input_tokens")) metrics.put("prompt_tokens", usage.get("input_tokens"));
        if (usage.has("output_tokens")) metrics.put("completion_tokens", usage.get("output_tokens"));
        // ... total tokens
    }
There is no check for server_tool_use. A real response with web search looks like:
    "usage": {
      "input_tokens": 1200,
      "output_tokens": 350,
      "server_tool_use": {
        "web_search_requests": 3,
        "web_fetch_requests": 2
      }
    }
The missing extraction should dynamically map each field inside server_tool_use to a metric named server_tool_use_<field_name>, e.g.:
server_tool_use_web_search_requests
server_tool_use_web_fetch_requests
server_tool_use_code_execution_requests
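The dynamic mapping described above could look roughly like the following. This is a minimal sketch, not the SDK's code: the class name ServerToolUseMetrics and the method extract are hypothetical, and the parsed usage object is modeled as a plain nested Map so the example runs without dependencies. In the real handler the same loop would iterate over the fields of the JsonNode instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Braintrust SDK.
final class ServerToolUseMetrics {

    /**
     * Copies every numeric counter found in usage.server_tool_use into a flat
     * map, naming each entry server_tool_use_<field_name>.
     * Assumption: `usage` is the already-parsed "usage" object of the response.
     */
    public static Map<String, Long> extract(Map<String, Object> usage) {
        Map<String, Long> metrics = new LinkedHashMap<>();
        Object serverToolUse = usage.get("server_tool_use");
        if (!(serverToolUse instanceof Map)) {
            return metrics; // field absent, null, or not an object
        }
        Map<?, ?> counters = (Map<?, ?>) serverToolUse;
        for (Map.Entry<?, ?> entry : counters.entrySet()) {
            // Skip nulls and non-numeric values rather than failing the span.
            if (entry.getValue() instanceof Number) {
                metrics.put(
                        "server_tool_use_" + entry.getKey(),
                        ((Number) entry.getValue()).longValue());
            }
        }
        return metrics;
    }

    public static void main(String[] args) {
        // Mirrors the sample usage payload shown in this issue.
        Map<String, Object> counters = new LinkedHashMap<>();
        counters.put("web_search_requests", 3);
        counters.put("web_fetch_requests", 2);

        Map<String, Object> usage = new LinkedHashMap<>();
        usage.put("input_tokens", 1200);
        usage.put("output_tokens", 350);
        usage.put("server_tool_use", counters);

        // prints {server_tool_use_web_search_requests=3, server_tool_use_web_fetch_requests=2}
        System.out.println(extract(usage));
    }
}
```

Iterating over whatever fields are present, rather than hard-coding the three known counters, means a counter added to the API later would be picked up without a code change.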
Note: the full response JSON is stored in braintrust.output_json, so the raw data is technically present — but it is not extracted into braintrust.metrics where Braintrust's UI and cost calculations can use it.
For comparison, the OpenAI handler in the same file already extracts nested usage details (output_tokens_details.reasoning_tokens at lines 145–150), so this pattern is established.
Braintrust docs status
- supported: Braintrust docs at https://www.braintrust.dev/docs/integrations/ai-providers/anthropic explicitly document server-side tool metrics: "When Claude uses server-side tools, Braintrust records the provider's tool usage counters dynamically." The documented metric names are server_tool_use_web_search_requests, server_tool_use_web_fetch_requests, and server_tool_use_code_execution_requests.
Upstream sources
- usage object: https://docs.anthropic.com/en/api/messages (documents server_tool_use as part of the response usage object)
- com.anthropic:anthropic-java:2.2.0 (depended on by this repo): the HTTP response contains server_tool_use regardless of SDK version, since it is a JSON field
Local files inspected
- braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java: lines 208–235 (tagAnthropicResponse only extracts input_tokens and output_tokens; no server_tool_use check)
- braintrust-sdk/instrumentation/anthropic_2_2_0/src/main/java/dev/braintrust/instrumentation/anthropic/v2_2_0/TracingHttpClient.java: HTTP-level instrumentation, delegates to InstrumentationSemConv
- braintrust-sdk/instrumentation/anthropic_2_2_0/src/test/java/dev/braintrust/instrumentation/anthropic/v2_2_0/BraintrustAnthropicTest.java: no test cases exercise server-side tool responses