[bot] Add Replicate Python SDK integration for model run, stream, and async execution instrumentation #441

@braintrust-bot

Summary

The Replicate Python SDK (replicate) is the official client for the Replicate platform, which hosts 100k+ machine learning models for language generation, image synthesis, video, audio, and more. Its primary execution surface is replicate.run(model, input={}) and its async counterpart replicate.async_run(), a generic model execution API that does not follow any provider-specific request format. This repository has no instrumentation for any Replicate SDK surface: no integration directory, no wrapper, no patcher, and no auto_instrument() support.

The Replicate API is fundamentally different from OpenAI-style APIs: there is no chat.completions.create() shape, models are identified by reference strings (e.g. "owner/model" or a version-pinned "meta/llama-3-70b-instruct:..."), and streaming is token-by-token via an iterator. wrap_openai() cannot be used with the Replicate client, so users who follow Replicate's official documentation and pip install replicate get no Braintrust tracing.

What needs to be instrumented

The replicate package exposes these execution surfaces via module-level functions and the Client class, none of which are instrumented:

Model execution (highest priority)

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| replicate.run(model, input) | Sync model execution: runs a model version and returns the complete output | stream=True yields ServerSentEvent objects | str, list, or model-specific output |
| replicate.async_run(model, input) | Async model execution | stream=True yields ServerSentEvent | Same as sync |
| replicate.stream(model, input) | Sync streaming execution: returns an iterator of output tokens/chunks | Always streaming | Iterator[ServerSentEvent] |
| replicate.async_stream(model, input) | Async streaming execution | Always streaming | AsyncIterator[ServerSentEvent] |

Span shape for language models: The model parameter (a version string) maps to span metadata model. Inputs are the input dict (contains prompt, system_prompt, max_tokens, etc. depending on model). Output is the concatenated string of streamed tokens. Token usage is not returned by the standard API (models expose usage differently), so metrics may require best-effort extraction from model-specific output.
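
The span shape described above can be sketched as a plain dict. This is only an illustration: build_span_record and the exact key layout are hypothetical, not an existing Braintrust or Replicate API, and the metrics entry is left empty because the standard API does not return token usage.

```python
from typing import Any, Dict, Iterable


def build_span_record(
    model: str, input: Dict[str, Any], chunks: Iterable[str]
) -> Dict[str, Any]:
    """Arrange one language-model call into a span-like dict (hypothetical shape)."""
    return {
        "input": dict(input),          # model-specific input dict, copied unchanged
        "output": "".join(chunks),     # concatenated string of streamed tokens
        "metadata": {"model": model},  # the model reference string
        "metrics": {},                 # empty: usage is not returned by the standard API
    }


# Example with made-up values; no network call is involved.
record = build_span_record(
    "owner/model",
    {"prompt": "Say hello", "max_tokens": 16},
    ["Hel", "lo"],
)
```

Copying the input dict keeps the logged value stable if the caller mutates the original after the call.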

Predictions API (lower priority)

| SDK Method | Description | Return type |
| --- | --- | --- |
| replicate.predictions.create() | Create a model prediction (lower-level than run()) | Prediction |
| replicate.predictions.get() | Poll a prediction by ID | Prediction |

Deployments (lower priority)

| SDK Method | Description |
| --- | --- |
| replicate.deployments.predictions.create() | Run a model via a named deployment |

All module-level functions delegate to a default Client instance. Client and AsyncClient have corresponding instance methods.

Implementation notes

Model ID format: replicate.run() accepts model as either "owner/model" (latest version) or "owner/model:version_hash". The integration should extract and log both the model name and version.
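
A minimal sketch of extracting the name and version from the two formats above. ModelRef and parse_model_ref are hypothetical names chosen for illustration; the sketch assumes only the "owner/model" and "owner/model:version_hash" shapes described in this issue.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None means "latest version"


def parse_model_ref(ref: str) -> ModelRef:
    """Split "owner/model" or "owner/model:version_hash" into its parts."""
    path, _, version = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not slash or not owner or not name:
        raise ValueError(f"expected 'owner/model[:version]', got {ref!r}")
    return ModelRef(owner, name, version or None)


latest = parse_model_ref("owner/model")
pinned = parse_model_ref("owner/model:abc123")
```

Logging the version as None rather than an empty string makes it easy to tell unpinned calls apart from pinned ones.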

No single response type: Unlike OpenAI where all chat completions return ChatCompletion, Replicate output varies by model (strings, lists of strings, dicts, file URLs). The integration should log the raw output and handle streaming aggregation generically.
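
One possible way to log heterogeneous outputs generically is to reduce them to JSON-friendly values before attaching them to a span. The helper below is a sketch: to_loggable is a hypothetical name, and the assumption that file-like outputs expose a string url attribute should be checked against the SDK version in use. FakeFile is a stand-in used only for the example.

```python
from typing import Any


def to_loggable(output: Any) -> Any:
    """Reduce a model output to JSON-friendly values without guessing its schema."""
    if output is None or isinstance(output, (str, int, float, bool)):
        return output
    if isinstance(output, dict):
        return {str(key): to_loggable(value) for key, value in output.items()}
    if isinstance(output, (list, tuple)):
        return [to_loggable(item) for item in output]
    # Assumption: file-like outputs carry a string `url` attribute.
    url = getattr(output, "url", None)
    if isinstance(url, str):
        return url
    # Fallback for anything unrecognized: keep a readable representation.
    return repr(output)


class FakeFile:
    """Stand-in for a file-like output object; not a Replicate class."""

    def __init__(self, url: str) -> None:
        self.url = url


logged = to_loggable({"images": [FakeFile("https://example.com/a.png")], "n": 1})
```

Iterators are deliberately not consumed here, since draining them would take the output away from the caller; streaming outputs need the pass-through aggregation described under Streaming.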

Streaming: stream=True in run() and the dedicated stream() function yield ServerSentEvent objects with data fields. Accumulated output should be logged as the span output.
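
The accumulation step could look like the pass-through generator below. It is a sketch under assumptions: traced_stream and on_complete are hypothetical names, FakeEvent stands in for the SDK's event objects, and every event's data field is treated as output text. A real integration would likely need to skip non-output events.

```python
from typing import Any, Callable, Iterable, Iterator, List


def traced_stream(
    events: Iterable[Any], on_complete: Callable[[str], None]
) -> Iterator[Any]:
    """Yield events unchanged while accumulating their `data` for the span output."""
    chunks: List[str] = []
    try:
        for event in events:
            data = getattr(event, "data", None)
            if isinstance(data, str):
                chunks.append(data)
            yield event
    finally:
        # Runs on normal exhaustion, on errors, and when the consumer stops early.
        on_complete("".join(chunks))


class FakeEvent:
    """Stand-in for a server-sent event with a `data` field."""

    def __init__(self, data: str) -> None:
        self.data = data


outputs: List[str] = []
seen = list(traced_stream([FakeEvent("Hel"), FakeEvent("lo")], outputs.append))
```

Logging in the finally clause means a partially consumed stream still produces a span with whatever output was received.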

Async first-class: The SDK has AsyncClient with async_run() and async_stream() — both must be instrumented.
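
Async instrumentation can follow the same pattern as the sync path, with the wrapper awaiting the underlying call. The decorator below is a sketch only: traced_async and the record callback are hypothetical, and fake_async_run is a stand-in coroutine rather than the SDK function.

```python
import asyncio
import functools
import time
from typing import Any, Awaitable, Callable, Dict, List


def traced_async(record: Callable[[Dict[str, Any]], None]):
    """Wrap an async (model, input) callable and report each call to `record`."""

    def decorator(fn: Callable[..., Awaitable[Any]]):
        @functools.wraps(fn)
        async def wrapper(model: str, input: Dict[str, Any], **kwargs: Any) -> Any:
            start = time.perf_counter()
            output, error = None, None
            try:
                output = await fn(model, input, **kwargs)
                return output
            except Exception as exc:
                error = repr(exc)
                raise  # never swallow the caller's exception
            finally:
                record({
                    "input": dict(input),
                    "output": output,
                    "error": error,
                    "metadata": {"model": model},
                    "duration_s": time.perf_counter() - start,
                })

        return wrapper

    return decorator


records: List[Dict[str, Any]] = []


@traced_async(records.append)
async def fake_async_run(model: str, input: Dict[str, Any]) -> str:
    await asyncio.sleep(0)  # stand-in for the network round trip
    return "ok"


result = asyncio.run(fake_async_run("owner/model", {"prompt": "hi"}))
```

Recording in the finally clause captures failed calls as well, with the error attached and the exception re-raised to the caller.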

input dict as span input: The input dict is model-specific but typically contains prompt, system_prompt, max_tokens, temperature, etc. for language models.

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/replicate/)
  • No wrapper function (e.g. wrap_replicate())
  • No patcher in any existing integration
  • No nox test session (test_replicate)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py
  • No entry in [tool.braintrust.matrix] in py/pyproject.toml

A grep for replicate across py/src/braintrust/ returns zero matches in integration code.

Braintrust docs status

not_found: Replicate is not listed in the Braintrust integrations directory or in the tracing guide.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no replicate/ directory exists on main
  • py/src/braintrust/wrappers/ — no Replicate wrapper
  • py/noxfile.py — no test_replicate session
  • py/src/braintrust/integrations/__init__.py — Replicate not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no Replicate version matrix
  • py/pyproject.toml — no Replicate entries in [tool.braintrust.matrix]
  • Full repo grep for "replicate" across py/src/braintrust/ — zero matches
