Merged
51 changes: 48 additions & 3 deletions .agents/skills/sdk-integrations/SKILL.md
@@ -16,8 +16,9 @@ Before editing:
1. Read the shared integration primitives.
2. Read the target provider package.
3. Pick the nearest existing integration as a reference.
4. Confirm provider versions and nox sessions from source files, not memory.
5. Decide the span shape before writing patchers.
6. Run the narrowest provider nox session first.

Do not design a new integration shape from scratch if an existing provider already matches the problem.

@@ -29,6 +30,7 @@ Always read:
- `py/src/braintrust/integrations/versioning.py`
- `py/src/braintrust/integrations/__init__.py`
- `py/src/braintrust/integrations/utils.py`
- `py/pyproject.toml` for provider matrix pins and cassette directory mappings
- `py/noxfile.py`

Read these when working on an existing integration:
@@ -51,6 +53,25 @@ Read these when relevant:

Do not forget `auto.py` and `auto_test_scripts/`. Import-order and subprocess regressions often only show up there.

## Version And CI Routing

Do not guess which provider versions or sessions apply.

Use these files as the routing chain:

- `py/pyproject.toml` `[tool.braintrust.matrix]`: supported provider versions and what `latest` resolves to
- `py/pyproject.toml` `[tool.braintrust.cassette-dirs]`: versioned cassette directory ownership
- `py/src/braintrust/integrations/versioning.py`: supported version helpers and gates
- `py/noxfile.py`: actual session names, package installation, and `BRAINTRUST_TEST_PACKAGE_VERSION`
- `.github/workflows/checks.yaml`: CI matrix and which sessions run in shards or static checks

When changing version-gated behavior:

1. Identify every matrix version for the provider.
2. Check whether the integration has `min_version`, `max_version`, `superseded_by`, or feature-detection branches.
3. Test the narrowest affected version first.
4. Add or update cassettes only for versions whose observable provider behavior intentionally changed.
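The version-gating steps above can be sketched as a feature-detection helper. This is a hypothetical sketch, not the real `versioning.py` API: the helper name, signature, and thresholds are illustrative.

```python
# Hypothetical version gate; the real helpers live in
# py/src/braintrust/integrations/versioning.py.
from importlib import metadata


def meets_min_version(package: str, minimum: tuple[int, ...]) -> bool:
    """Return True when `package` is installed at or above `minimum`."""
    try:
        raw = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # provider not installed: the integration should no-op
    # Compare only the leading numeric components; pre-release suffixes are
    # dropped for this sketch.
    parts = tuple(int(p) for p in raw.split(".")[: len(minimum)] if p.isdigit())
    return parts >= minimum
```

A real gate would also consult the `[tool.braintrust.matrix]` pins rather than hard-coding a minimum at the call site.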

## Pick A Reference

Start from the nearest current integration:
@@ -121,6 +142,16 @@ Do not start by wiring wrappers and only later decide what the span should contain.
3. Add the integration import near the other integration imports.
4. Add or update the relevant subprocess auto-instrument test.

### Setup, manual wrapping, and auto-instrument

Treat these as distinct entry points:

- `setup_<provider>()`: explicit package-level patching
- public `wrap_*()` helpers: manual wrapping of a provided class, function, or client
- `auto_instrument()`: import-order-sensitive discovery and setup

When changing one entry point, check whether the other two should keep equivalent span behavior. If `auto_instrument()` changes or could be affected by import timing, validate it with a subprocess test instead of only calling the integration in-process.
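A subprocess validation of `auto_instrument()` can be sketched like this. The child script body is a placeholder: real scripts live in `auto_test_scripts/`, import the provider first, then call `braintrust.auto_instrument()` and report what patched.

```python
import json
import subprocess
import sys

# Placeholder child script. A real auto-instrument test would import the
# provider package first, call braintrust.auto_instrument(), and print the
# per-integration results dict.
CHILD = """
import json
results = {"llamaindex": True}  # stand-in for auto_instrument() results
print(json.dumps(results))
"""

# Running in a fresh interpreter is the point: it exercises import order
# from a clean slate instead of reusing the test process's module cache.
proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, check=True,
)
patched = json.loads(proc.stdout)
assert patched["llamaindex"] is True
```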

## Package Layout Rules

Keep provider-specific behavior in `py/src/braintrust/integrations/<provider>/`.
@@ -171,15 +202,25 @@ Use this rubric:
- `metrics`: timing and numeric accounting such as token counts or elapsed time
- `error`: exceptions or failure information

Avoid double-counting token metrics:

- the integration that directly owns the model/provider API response should own token accounting
- orchestration/framework integrations should usually not log token metrics when underlying provider integrations can create leaf spans with usage metrics
- do not add fragile provider-specific ownership checks such as "if OpenAI is patched, skip metrics"; prefer a clear span ownership rule instead
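The ownership rule above can be sketched with a stand-in span. `FakeSpan` and both helpers are illustrative only, not Braintrust APIs; the point is which span logs `metrics`.

```python
# FakeSpan is a stand-in for a Braintrust span, used only to show metric
# ownership: the provider leaf logs tokens, the framework span does not.
class FakeSpan:
    def __init__(self, name):
        self.name = name
        self.logged = {}

    def log(self, **fields):
        self.logged.update(fields)


def log_provider_leaf(span, usage):
    # The provider integration owns the API response, so it owns token accounting.
    span.log(metrics={"prompt_tokens": usage["prompt_tokens"],
                      "completion_tokens": usage["completion_tokens"]})


def log_framework_span(span, output):
    # The orchestration integration logs output only; leaf spans carry usage.
    span.log(output=output)


leaf = FakeSpan("openai.chat")
parent = FakeSpan("chain.run")
log_provider_leaf(leaf, {"prompt_tokens": 9, "completion_tokens": 9})
log_framework_span(parent, "Hello! How can I assist you today?")
```

No "is OpenAI patched?" check appears anywhere: ownership follows from which integration created the span, not from runtime probing of other integrations.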

Good span shaping usually means:

- flatten positional arguments into named fields
- normalize provider SDK objects into dicts, lists, or scalars when that improves readability
- drop duplicate or noisy transport fields
- aggregate streaming chunks into one final `output` plus stream-specific `metrics`

Do not over-serialize in integration code. Braintrust handles serialization when sending/logging spans, so integration tracing helpers usually only need to shape readable Python dicts/lists/scalars and materialize attachments where appropriate. Avoid unnecessary JSON dumps/loads, recursive conversion, or stringification just to make values serializable.

Keep wrapper bodies thin: prepare traced input, open the span, call the provider, normalize the result, and log `output`/`metadata`/`metrics`.
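A thin wrapper in that spirit can be sketched as follows. `start_span` and the fake provider are stand-ins, not Braintrust or provider APIs; real integrations use the Braintrust tracing helpers.

```python
# Sketch of a thin wrapper: prepare input, open span, call provider,
# normalize, log. No JSON round-trips -- plain dicts/scalars are enough.
from contextlib import contextmanager


@contextmanager
def start_span(name):
    span = {"name": name}
    yield span  # a real Braintrust span would flush on context exit


def wrap_chat(provider_call, *, model, messages):
    with start_span("chat") as span:
        span["input"] = {"model": model, "messages": messages}
        raw = provider_call(model=model, messages=messages)
        # Normalize the provider object into readable fields.
        span["output"] = raw["choices"][0]["message"]["content"]
        span["metrics"] = dict(raw.get("usage", {}))
        return raw, span


def fake_provider(**kwargs):
    # Chat-completion-shaped stub standing in for a real provider client.
    return {
        "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
        "usage": {"prompt_tokens": 9, "completion_tokens": 9},
    }


raw, span = wrap_chat(
    fake_provider,
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
```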

Braintrust span logging methods are boundary-safe and should not throw during normal integration use. Do not wrap `span.log(...)`, `span.set_attributes(...)`, or similar Braintrust span methods in broad `try`/`except` blocks. Only catch exceptions around provider calls or around integration-owned conversion code when there is a specific expected failure mode and a clear fallback.

Prefer provider-local helpers in `tracing.py`, for example:

```python
# (helper example collapsed in the diff view)
```

@@ -321,4 +362,8 @@ Avoid these failures:
- re-recording cassettes when behavior did not intentionally change
- adding a custom `_instrument_*` helper where `_instrument_integration()` already fits
- forgetting `target_module` for deep or optional patch targets
- double-counting token metrics in both orchestration/framework spans and provider leaf spans
- adding provider-specific token ownership detection instead of defining clear metric ownership for the integration
- doing excessive serialization/stringification in tracing code even though Braintrust serializes span payloads at send/log time
- wrapping Braintrust span logging methods in broad `try`/`except` blocks even though those methods are designed not to throw
- forcing non-image attachments through `image_url` shims, dropping unrecognized file inputs, or re-serializing non-attachment values while materializing payloads
4 changes: 4 additions & 0 deletions .agents/skills/sdk-integrations/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "SDK Integrations"
short_description: "Build Braintrust SDK integrations"
default_prompt: "Use $sdk-integrations to add or update a Braintrust Python SDK provider integration."
1 change: 1 addition & 0 deletions README.md
@@ -62,6 +62,7 @@ BRAINTRUST_API_KEY=<YOUR_API_KEY> braintrust eval tutorial_eval.py
| [Google ADK](py/src/braintrust/integrations/adk/) | Yes | `google-adk>=1.14.1` |
| [Pydantic AI](py/src/braintrust/integrations/pydantic_ai/) | Yes | `pydantic_ai>=1.10.0` |
| [LangChain](py/src/braintrust/integrations/langchain/) | Yes | `langchain-core>=0.3.28` |
| [LlamaIndex](py/src/braintrust/integrations/llamaindex/) | Yes | `llama-index-core>=0.13.0` |
| [DSPy](py/src/braintrust/integrations/dspy/) | Yes | `dspy>=2.6.0` |
| [OpenAI Agents](py/src/braintrust/integrations/openai_agents/) | Yes | `openai-agents>=0.0.19` |
| [Claude Agent SDK](py/src/braintrust/integrations/claude_agent_sdk/) | Yes | `claude_agent_sdk>=0.1.10` |
15 changes: 15 additions & 0 deletions py/noxfile.py
@@ -397,6 +397,21 @@ def test_langchain(session, version):
_run_tests(session, f"{INTEGRATION_DIR}/langchain/test_anthropic.py", version=version)


LLAMAINDEX_VERSIONS = _get_matrix_versions("llama-index-core")


@nox.session()
@nox.parametrize("version", LLAMAINDEX_VERSIONS, ids=LLAMAINDEX_VERSIONS)
def test_llamaindex(session, version):
_install_test_deps(session)
_install_group_locked(session, "test-llamaindex")
_install_matrix_dep(session, "llama-index-core", version)
# These packages are tightly version-coupled to llama-index-core, so we
# install them unpinned and let pip resolve compatible versions.
session.install("llama-index-llms-openai", "llama-index-embeddings-openai", silent=SILENT_INSTALLS)
_run_tests(session, f"{INTEGRATION_DIR}/llamaindex/test_llamaindex.py", version=version)


OPENROUTER_VERSIONS = _get_matrix_versions("openrouter")


12 changes: 12 additions & 0 deletions py/pyproject.toml
@@ -157,6 +157,10 @@ test-crewai = [
"litellm==1.83.10",
]

test-llamaindex = [
{include-group = "test"},
]

test-cli = [
{include-group = "test"},
"httpx==0.28.1",
@@ -208,6 +212,9 @@ lint = [
"langchain-core",
"langchain-openai",
"langchain-anthropic",
"llama-index-core",
"llama-index-llms-openai",
"llama-index-embeddings-openai",
]

# -- Build deps ----------------------------------------------------------------
@@ -351,6 +358,10 @@ latest = "google-adk==1.31.1"
latest = "langchain-core==1.3.1"
"0.3.28" = "langchain-core==0.3.28"

[tool.braintrust.matrix.llama-index-core]
latest = "llama-index-core==0.14.21"
"0.13.0" = "llama-index-core==0.13.0"

[tool.braintrust.matrix.openrouter]
latest = "openrouter==0.9.1"
"0.6.0" = "openrouter==0.6.0"
@@ -401,6 +412,7 @@ dspy = ["dspy"]
google_genai = ["google-genai"]
langchain = ["langchain-core"]
litellm = ["litellm"]
llamaindex = ["llama-index-core"]
mistral = ["mistralai"]
openai = ["openai"]
openai_agents = ["openai-agents"]
5 changes: 5 additions & 0 deletions py/src/braintrust/auto.py
@@ -20,6 +20,7 @@
GoogleGenAIIntegration,
LangChainIntegration,
LiteLLMIntegration,
LlamaIndexIntegration,
MistralIntegration,
OpenAIAgentsIntegration,
OpenAIIntegration,
@@ -61,6 +62,7 @@ def auto_instrument(
dspy: bool = True,
adk: bool = True,
langchain: bool = True,
llamaindex: bool = True,
openai_agents: bool = True,
cohere: bool = True,
autogen: bool = True,
@@ -90,6 +92,7 @@ def auto_instrument(
dspy: Enable DSPy instrumentation (default: True)
adk: Enable Google ADK instrumentation (default: True)
langchain: Enable LangChain instrumentation (default: True)
llamaindex: Enable LlamaIndex instrumentation (default: True)
openai_agents: Enable OpenAI Agents SDK instrumentation (default: True)
cohere: Enable Cohere instrumentation (default: True)
autogen: Enable AutoGen instrumentation (default: True)
@@ -168,6 +171,8 @@ def auto_instrument(
results["adk"] = _instrument_integration(ADKIntegration)
if langchain:
results["langchain"] = _instrument_integration(LangChainIntegration)
if llamaindex:
results["llamaindex"] = _instrument_integration(LlamaIndexIntegration)
if openai_agents:
results["openai_agents"] = _instrument_integration(OpenAIAgentsIntegration)
if cohere:
2 changes: 2 additions & 0 deletions py/src/braintrust/integrations/__init__.py
@@ -10,6 +10,7 @@
from .google_genai import GoogleGenAIIntegration
from .langchain import LangChainIntegration
from .litellm import LiteLLMIntegration
from .llamaindex import LlamaIndexIntegration
from .mistral import MistralIntegration
from .openai import OpenAIIntegration
from .openai_agents import OpenAIAgentsIntegration
@@ -31,6 +32,7 @@
"GoogleGenAIIntegration",
"LiteLLMIntegration",
"LangChainIntegration",
"LlamaIndexIntegration",
"MistralIntegration",
"OpenAIIntegration",
"OpenAIAgentsIntegration",
44 changes: 44 additions & 0 deletions py/src/braintrust/integrations/llamaindex/__init__.py
@@ -0,0 +1,44 @@
"""Braintrust integration for LlamaIndex."""

from braintrust.logger import NOOP_SPAN, current_span, init_logger

from .integration import LlamaIndexIntegration


_IMPORT_ERROR: ImportError | None = None
try:
from .tracing import BraintrustSpanHandler as _BraintrustSpanHandler
except ImportError as exc:
_IMPORT_ERROR = exc
_BraintrustSpanHandler = None


if _BraintrustSpanHandler is None:

class BraintrustSpanHandler: # type: ignore[no-redef]
def __init__(self, *args, **kwargs):
message = "llama-index-core is required for braintrust.integrations.llamaindex"
if _IMPORT_ERROR is not None:
raise ImportError(message) from _IMPORT_ERROR
raise ImportError(message)

else:
BraintrustSpanHandler = _BraintrustSpanHandler


__all__ = [
"BraintrustSpanHandler",
"LlamaIndexIntegration",
"setup_llamaindex",
]


def setup_llamaindex(
api_key: str | None = None,
project_id: str | None = None,
project_name: str | None = None,
) -> bool:
if current_span() == NOOP_SPAN:
init_logger(project=project_name, api_key=api_key, project_id=project_id)

return LlamaIndexIntegration.setup()
@@ -0,0 +1,108 @@
interactions:
- request:
body: '{"messages":[{"role":"user","content":"Say hello"}],"model":"gpt-4o-mini","stream":false,"temperature":0.0}'
headers:
Accept:
- application/json
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
Content-Length:
- '107'
Content-Type:
- application/json
Host:
- api.openai.com
User-Agent:
- AsyncOpenAI/Python 2.32.0
X-Stainless-Arch:
- arm64
X-Stainless-Async:
- async:asyncio
X-Stainless-Lang:
- python
X-Stainless-OS:
- MacOS
X-Stainless-Package-Version:
- 2.32.0
X-Stainless-Runtime:
- CPython
X-Stainless-Runtime-Version:
- 3.12.12
x-stainless-read-timeout:
- '60.0'
x-stainless-retry-count:
- '0'
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
body:
string: "{\n \"id\": \"chatcmpl-DZMXgzDrRYSWAgIsCaDSh9YR44sVv\",\n \"object\":
\"chat.completion\",\n \"created\": 1777320504,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Hello! How can I assist you today?\",\n
\ \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\":
null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
9,\n \"completion_tokens\": 9,\n \"total_tokens\": 18,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_de7acce317\"\n}\n"
headers:
CF-Cache-Status:
- DYNAMIC
CF-Ray:
- 9f3075808c288bf1-YYZ
Connection:
- keep-alive
Content-Type:
- application/json
Date:
- Mon, 27 Apr 2026 20:08:24 GMT
Server:
- cloudflare
Strict-Transport-Security:
- max-age=31536000; includeSubDomains; preload
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
content-length:
- '839'
openai-organization:
- braintrust-data
openai-processing-ms:
- '398'
openai-project:
- proj_vsCSXafhhByzWOThMrJcZiw9
openai-version:
- '2020-10-01'
set-cookie:
- __cf_bm=mjyBnpKLMI5D5WS40p8E1mQN7oHy_1jBrgFsvBqOkjU-1777320504.4091883-1.0.1.1-lvUez.SCj3y88biMJrRYWMfbn_h5skCWrne3feI6DevrpBzvCwQfRad2AqWQTlIlJGR8UqIBJcFH2PkX1uwo5ovDaUM9LSE6FCmq8vFh7NoyYLXP60YPb3h3qHmvY9Up;
HttpOnly; Secure; Path=/; Domain=api.openai.com; Expires=Mon, 27 Apr 2026
20:38:24 GMT
x-openai-proxy-wasm:
- v0.1
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999995'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_a9760521835e41508e5577e15c0dc217
status:
code: 200
message: OK
version: 1