Merged
51 changes: 48 additions & 3 deletions .agents/skills/sdk-integrations/SKILL.md
@@ -16,8 +16,9 @@ Before editing:
1. Read the shared integration primitives.
2. Read the target provider package.
3. Pick the nearest existing integration as a reference.
4. Confirm provider versions and nox sessions from source files, not memory.
5. Decide the span shape before writing patchers.
6. Run the narrowest provider nox session first.

Do not design a new integration shape from scratch if an existing provider already matches the problem.

@@ -29,6 +30,7 @@ Always read:
- `py/src/braintrust/integrations/versioning.py`
- `py/src/braintrust/integrations/__init__.py`
- `py/src/braintrust/integrations/utils.py`
- `py/pyproject.toml` for provider matrix pins and cassette directory mappings
- `py/noxfile.py`

Read these when working on an existing integration:
@@ -51,6 +53,25 @@ Read these when relevant:

Do not forget `auto.py` and `auto_test_scripts/`. Import-order and subprocess regressions often only show up there.

## Version And CI Routing

Do not guess which provider versions or sessions apply.

Use these files as the routing chain:

- `py/pyproject.toml` `[tool.braintrust.matrix]`: supported provider versions and what `latest` resolves to
- `py/pyproject.toml` `[tool.braintrust.cassette-dirs]`: versioned cassette directory ownership
- `py/src/braintrust/integrations/versioning.py`: supported version helpers and gates
- `py/noxfile.py`: actual session names, package installation, and `BRAINTRUST_TEST_PACKAGE_VERSION`
- `.github/workflows/checks.yaml`: CI matrix and which sessions run in shards or static checks

When changing version-gated behavior:

1. Identify every matrix version for the provider.
2. Check whether the integration has `min_version`, `max_version`, `superseded_by`, or feature-detection branches.
3. Test the narrowest affected version first.
4. Add or update cassettes only for versions whose observable provider behavior intentionally changed.
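The version-gating steps above can be sketched as a feature-detection helper. This is a hypothetical sketch, not the real `versioning.py` API: the helper name, signature, and thresholds are illustrative.

```python
# Hypothetical version gate; the real helpers live in
# py/src/braintrust/integrations/versioning.py.
from importlib import metadata


def meets_min_version(package: str, minimum: tuple[int, ...]) -> bool:
    """Return True when `package` is installed at or above `minimum`."""
    try:
        raw = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # provider not installed: the integration should no-op
    # Compare only the leading numeric components; pre-release suffixes are
    # dropped for this sketch.
    parts = tuple(int(p) for p in raw.split(".")[: len(minimum)] if p.isdigit())
    return parts >= minimum
```

A real gate would also consult the `[tool.braintrust.matrix]` pins rather than hard-coding a minimum at the call site.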

## Pick A Reference

Start from the nearest current integration:
@@ -121,6 +142,16 @@ Do not start by wiring wrappers and only later decide what the span should contain.
3. Add the integration import near the other integration imports.
4. Add or update the relevant subprocess auto-instrument test.

### Setup, manual wrapping, and auto-instrument

Treat these as distinct entry points:

- `setup_<provider>()`: explicit package-level patching
- public `wrap_*()` helpers: manual wrapping of a provided class, function, or client
- `auto_instrument()`: import-order-sensitive discovery and setup

When changing one entry point, check whether the other two should keep equivalent span behavior. If `auto_instrument()` changes or could be affected by import timing, validate it with a subprocess test instead of only calling the integration in-process.
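A subprocess validation of `auto_instrument()` can be sketched like this. The child script body is a placeholder: real scripts live in `auto_test_scripts/`, import the provider first, then call `braintrust.auto_instrument()` and report what patched.

```python
import json
import subprocess
import sys

# Placeholder child script. A real auto-instrument test would import the
# provider package first, call braintrust.auto_instrument(), and print the
# per-integration results dict.
CHILD = """
import json
results = {"llamaindex": True}  # stand-in for auto_instrument() results
print(json.dumps(results))
"""

# Running in a fresh interpreter is the point: it exercises import order
# from a clean slate instead of reusing the test process's module cache.
proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, check=True,
)
patched = json.loads(proc.stdout)
assert patched["llamaindex"] is True
```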

## Package Layout Rules

Keep provider-specific behavior in `py/src/braintrust/integrations/<provider>/`.
@@ -171,15 +202,25 @@ Use this rubric:
- `metrics`: timing and numeric accounting such as token counts or elapsed time
- `error`: exceptions or failure information

Avoid double-counting token metrics:

- the integration that directly owns the model/provider API response should own token accounting
- orchestration/framework integrations should usually not log token metrics when underlying provider integrations can create leaf spans with usage metrics
- do not add fragile provider-specific ownership checks such as "if OpenAI is patched, skip metrics"; prefer a clear span ownership rule instead
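The ownership rule above can be sketched with a stand-in span. `FakeSpan` and both helpers are illustrative only, not Braintrust APIs; the point is which span logs `metrics`.

```python
# FakeSpan is a stand-in for a Braintrust span, used only to show metric
# ownership: the provider leaf logs tokens, the framework span does not.
class FakeSpan:
    def __init__(self, name):
        self.name = name
        self.logged = {}

    def log(self, **fields):
        self.logged.update(fields)


def log_provider_leaf(span, usage):
    # The provider integration owns the API response, so it owns token accounting.
    span.log(metrics={"prompt_tokens": usage["prompt_tokens"],
                      "completion_tokens": usage["completion_tokens"]})


def log_framework_span(span, output):
    # The orchestration integration logs output only; leaf spans carry usage.
    span.log(output=output)


leaf = FakeSpan("openai.chat")
parent = FakeSpan("chain.run")
log_provider_leaf(leaf, {"prompt_tokens": 9, "completion_tokens": 9})
log_framework_span(parent, "Hello! How can I assist you today?")
```

No "is OpenAI patched?" check appears anywhere: ownership follows from which integration created the span, not from runtime probing of other integrations.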

Good span shaping usually means:

- flatten positional arguments into named fields
- normalize provider SDK objects into dicts, lists, or scalars when that improves readability
- drop duplicate or noisy transport fields
- aggregate streaming chunks into one final `output` plus stream-specific `metrics`

Do not over-serialize in integration code. Braintrust handles serialization when sending/logging spans, so integration tracing helpers usually only need to shape readable Python dicts/lists/scalars and materialize attachments where appropriate. Avoid unnecessary JSON dumps/loads, recursive conversion, or stringification just to make values serializable.

Keep wrapper bodies thin: prepare traced input, open the span, call the provider, normalize the result, and log `output`/`metadata`/`metrics`.
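A thin wrapper in that spirit can be sketched as follows. `start_span` and the fake provider are stand-ins, not Braintrust or provider APIs; real integrations use the Braintrust tracing helpers.

```python
# Sketch of a thin wrapper: prepare input, open span, call provider,
# normalize, log. No JSON round-trips -- plain dicts/scalars are enough.
from contextlib import contextmanager


@contextmanager
def start_span(name):
    span = {"name": name}
    yield span  # a real Braintrust span would flush on context exit


def wrap_chat(provider_call, *, model, messages):
    with start_span("chat") as span:
        span["input"] = {"model": model, "messages": messages}
        raw = provider_call(model=model, messages=messages)
        # Normalize the provider object into readable fields.
        span["output"] = raw["choices"][0]["message"]["content"]
        span["metrics"] = dict(raw.get("usage", {}))
        return raw, span


def fake_provider(**kwargs):
    # Chat-completion-shaped stub standing in for a real provider client.
    return {
        "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
        "usage": {"prompt_tokens": 9, "completion_tokens": 9},
    }


raw, span = wrap_chat(
    fake_provider,
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
```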

Braintrust span logging methods are boundary-safe and should not throw during normal integration use. Do not wrap `span.log(...)`, `span.set_attributes(...)`, or similar Braintrust span methods in broad `try`/`except` blocks. Only catch exceptions around provider calls or around integration-owned conversion code when there is a specific expected failure mode and a clear fallback.

Prefer provider-local helpers in `tracing.py`, for example:

```python
# (helper example collapsed in the diff view)
```

@@ -321,4 +362,8 @@ Avoid these failures:
- re-recording cassettes when behavior did not intentionally change
- adding a custom `_instrument_*` helper where `_instrument_integration()` already fits
- forgetting `target_module` for deep or optional patch targets
- double-counting token metrics in both orchestration/framework spans and provider leaf spans
- adding provider-specific token ownership detection instead of defining clear metric ownership for the integration
- doing excessive serialization/stringification in tracing code even though Braintrust serializes span payloads at send/log time
- wrapping Braintrust span logging methods in broad `try`/`except` blocks even though those methods are designed not to throw
- forcing non-image attachments through `image_url` shims, dropping unrecognized file inputs, or re-serializing non-attachment values while materializing payloads
4 changes: 4 additions & 0 deletions .agents/skills/sdk-integrations/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "SDK Integrations"
short_description: "Build Braintrust SDK integrations"
default_prompt: "Use $sdk-integrations to add or update a Braintrust Python SDK provider integration."
1 change: 1 addition & 0 deletions README.md
@@ -62,6 +62,7 @@ BRAINTRUST_API_KEY=<YOUR_API_KEY> braintrust eval tutorial_eval.py
| [Google ADK](py/src/braintrust/integrations/adk/) | Yes | `google-adk>=1.14.1` |
| [Pydantic AI](py/src/braintrust/integrations/pydantic_ai/) | Yes | `pydantic_ai>=1.10.0` |
| [LangChain](py/src/braintrust/integrations/langchain/) | Yes | `langchain-core>=0.3.28` |
| [LlamaIndex](py/src/braintrust/integrations/llamaindex/) | Yes | `llama-index-core>=0.13.0` |
| [DSPy](py/src/braintrust/integrations/dspy/) | Yes | `dspy>=2.6.0` |
| [OpenAI Agents](py/src/braintrust/integrations/openai_agents/) | Yes | `openai-agents>=0.0.19` |
| [Claude Agent SDK](py/src/braintrust/integrations/claude_agent_sdk/) | Yes | `claude_agent_sdk>=0.1.10` |
15 changes: 15 additions & 0 deletions py/noxfile.py
@@ -397,6 +397,21 @@ def test_langchain(session, version):
_run_tests(session, f"{INTEGRATION_DIR}/langchain/test_anthropic.py", version=version)


LLAMAINDEX_VERSIONS = _get_matrix_versions("llama-index-core")


@nox.session()
@nox.parametrize("version", LLAMAINDEX_VERSIONS, ids=LLAMAINDEX_VERSIONS)
def test_llamaindex(session, version):
_install_test_deps(session)
_install_group_locked(session, "test-llamaindex")
_install_matrix_dep(session, "llama-index-core", version)
# These packages are tightly version-coupled to llama-index-core, so we
# install them unpinned and let pip resolve compatible versions.
session.install("llama-index-llms-openai", "llama-index-embeddings-openai", silent=SILENT_INSTALLS)
_run_tests(session, f"{INTEGRATION_DIR}/llamaindex/test_llamaindex.py", version=version)


OPENROUTER_VERSIONS = _get_matrix_versions("openrouter")


12 changes: 12 additions & 0 deletions py/pyproject.toml
@@ -157,6 +157,10 @@ test-crewai = [
"litellm==1.83.10",
]

test-llamaindex = [
{include-group = "test"},
]

test-cli = [
{include-group = "test"},
"httpx==0.28.1",
@@ -208,6 +212,9 @@ lint = [
"langchain-core",
"langchain-openai",
"langchain-anthropic",
"llama-index-core",
"llama-index-llms-openai",
"llama-index-embeddings-openai",
]

# -- Build deps ----------------------------------------------------------------
@@ -351,6 +358,10 @@ latest = "google-adk==1.31.1"
latest = "langchain-core==1.3.1"
"0.3.28" = "langchain-core==0.3.28"

[tool.braintrust.matrix.llama-index-core]
latest = "llama-index-core==0.14.21"
"0.13.0" = "llama-index-core==0.13.0"

[tool.braintrust.matrix.openrouter]
latest = "openrouter==0.9.1"
"0.6.0" = "openrouter==0.6.0"
@@ -401,6 +412,7 @@ dspy = ["dspy"]
google_genai = ["google-genai"]
langchain = ["langchain-core"]
litellm = ["litellm"]
llamaindex = ["llama-index-core"]
mistral = ["mistralai"]
openai = ["openai"]
openai_agents = ["openai-agents"]
5 changes: 5 additions & 0 deletions py/src/braintrust/auto.py
@@ -20,6 +20,7 @@
GoogleGenAIIntegration,
LangChainIntegration,
LiteLLMIntegration,
LlamaIndexIntegration,
MistralIntegration,
OpenAIAgentsIntegration,
OpenAIIntegration,
@@ -61,6 +62,7 @@ def auto_instrument(
dspy: bool = True,
adk: bool = True,
langchain: bool = True,
llamaindex: bool = True,
openai_agents: bool = True,
cohere: bool = True,
autogen: bool = True,
@@ -90,6 +92,7 @@ def auto_instrument(
dspy: Enable DSPy instrumentation (default: True)
adk: Enable Google ADK instrumentation (default: True)
langchain: Enable LangChain instrumentation (default: True)
llamaindex: Enable LlamaIndex instrumentation (default: True)
openai_agents: Enable OpenAI Agents SDK instrumentation (default: True)
cohere: Enable Cohere instrumentation (default: True)
autogen: Enable AutoGen instrumentation (default: True)
@@ -168,6 +171,8 @@ def auto_instrument(
results["adk"] = _instrument_integration(ADKIntegration)
if langchain:
results["langchain"] = _instrument_integration(LangChainIntegration)
if llamaindex:
results["llamaindex"] = _instrument_integration(LlamaIndexIntegration)
if openai_agents:
results["openai_agents"] = _instrument_integration(OpenAIAgentsIntegration)
if cohere:
2 changes: 2 additions & 0 deletions py/src/braintrust/integrations/__init__.py
@@ -10,6 +10,7 @@
from .google_genai import GoogleGenAIIntegration
from .langchain import LangChainIntegration
from .litellm import LiteLLMIntegration
from .llamaindex import LlamaIndexIntegration
from .mistral import MistralIntegration
from .openai import OpenAIIntegration
from .openai_agents import OpenAIAgentsIntegration
@@ -31,6 +32,7 @@
"GoogleGenAIIntegration",
"LiteLLMIntegration",
"LangChainIntegration",
"LlamaIndexIntegration",
"MistralIntegration",
"OpenAIIntegration",
"OpenAIAgentsIntegration",
44 changes: 44 additions & 0 deletions py/src/braintrust/integrations/llamaindex/__init__.py
@@ -0,0 +1,44 @@
"""Braintrust integration for LlamaIndex."""

from braintrust.logger import NOOP_SPAN, current_span, init_logger

from .integration import LlamaIndexIntegration


_IMPORT_ERROR: ImportError | None = None
try:
from .tracing import BraintrustSpanHandler as _BraintrustSpanHandler
except ImportError as exc:
_IMPORT_ERROR = exc
_BraintrustSpanHandler = None


if _BraintrustSpanHandler is None:

class BraintrustSpanHandler: # type: ignore[no-redef]
def __init__(self, *args, **kwargs):
message = "llama-index-core is required for braintrust.integrations.llamaindex"
if _IMPORT_ERROR is not None:
raise ImportError(message) from _IMPORT_ERROR
raise ImportError(message)

else:
BraintrustSpanHandler = _BraintrustSpanHandler


__all__ = [
"BraintrustSpanHandler",
"LlamaIndexIntegration",
"setup_llamaindex",
]


def setup_llamaindex(
api_key: str | None = None,
project_id: str | None = None,
project_name: str | None = None,
) -> bool:
if current_span() == NOOP_SPAN:
init_logger(project=project_name, api_key=api_key, project_id=project_id)

return LlamaIndexIntegration.setup()
@@ -0,0 +1,108 @@
interactions:
- request:
body: '{"messages":[{"role":"user","content":"Say hello"}],"model":"gpt-4o-mini","stream":false,"temperature":0.0}'
headers:
Accept:
- application/json
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
Content-Length:
- '107'
Content-Type:
- application/json
Host:
- api.openai.com
User-Agent:
- AsyncOpenAI/Python 2.32.0
X-Stainless-Arch:
- arm64
X-Stainless-Async:
- async:asyncio
X-Stainless-Lang:
- python
X-Stainless-OS:
- MacOS
X-Stainless-Package-Version:
- 2.32.0
X-Stainless-Runtime:
- CPython
X-Stainless-Runtime-Version:
- 3.12.12
x-stainless-read-timeout:
- '60.0'
x-stainless-retry-count:
- '0'
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
body:
string: "{\n \"id\": \"chatcmpl-DZMXgzDrRYSWAgIsCaDSh9YR44sVv\",\n \"object\":
\"chat.completion\",\n \"created\": 1777320504,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Hello! How can I assist you today?\",\n
\ \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\":
null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
9,\n \"completion_tokens\": 9,\n \"total_tokens\": 18,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_de7acce317\"\n}\n"
headers:
CF-Cache-Status:
- DYNAMIC
CF-Ray:
- 9f3075808c288bf1-YYZ
Connection:
- keep-alive
Content-Type:
- application/json
Date:
- Mon, 27 Apr 2026 20:08:24 GMT
Server:
- cloudflare
Strict-Transport-Security:
- max-age=31536000; includeSubDomains; preload
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
content-length:
- '839'
openai-organization:
- braintrust-data
openai-processing-ms:
- '398'
openai-project:
- proj_vsCSXafhhByzWOThMrJcZiw9
openai-version:
- '2020-10-01'
set-cookie:
- __cf_bm=mjyBnpKLMI5D5WS40p8E1mQN7oHy_1jBrgFsvBqOkjU-1777320504.4091883-1.0.1.1-lvUez.SCj3y88biMJrRYWMfbn_h5skCWrne3feI6DevrpBzvCwQfRad2AqWQTlIlJGR8UqIBJcFH2PkX1uwo5ovDaUM9LSE6FCmq8vFh7NoyYLXP60YPb3h3qHmvY9Up;
HttpOnly; Secure; Path=/; Domain=api.openai.com; Expires=Mon, 27 Apr 2026
20:38:24 GMT
x-openai-proxy-wasm:
- v0.1
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999995'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_a9760521835e41508e5577e15c0dc217
status:
code: 200
message: OK
version: 1