[bot] Add Replicate Python SDK integration for model run, stream, and async execution instrumentation #441

## Summary

The Replicate Python SDK (`replicate`) is the official client for the Replicate platform, which hosts 100k+ machine learning models for language generation, image synthesis, video, audio, and more. Its primary execution surfaces are `replicate.run(model, input={})` and `replicate.async_run()`, a platform-agnostic model execution API distinct from any provider-specific format. This repository has no instrumentation for any Replicate SDK surface: no integration directory, no wrapper, no patcher, and no `auto_instrument()` support.

The Replicate API is fundamentally different from OpenAI-style APIs: there is no `chat.completions.create()` shape, models are identified by `owner/model` references that can optionally be pinned to a version (e.g. `"meta/llama-3-70b-instruct:..."` or `"owner/model"`), and streaming is token-by-token via an iterator. `wrap_openai()` cannot be used with the Replicate client. Users who follow Replicate's official documentation and `pip install replicate` get no Braintrust tracing.

## What needs to be instrumented

The `replicate` package exposes these execution surfaces via module-level functions and the `Client` class, none of which are instrumented:

### Model execution (highest priority)

| SDK Method | Description | Streaming | Return type |
|---|---|---|---|
| `replicate.run(model, input)` | Sync model execution — runs a model version and returns the complete output | `stream=True` yields `ServerSentEvent` objects | `str`, `list`, or model-specific output |
| `replicate.async_run(model, input)` | Async model execution | `stream=True` yields `ServerSentEvent` | Same as sync |
| `replicate.stream(model, input)` | Sync streaming execution — returns an iterator of output tokens/chunks | Always streaming | `Iterator[ServerSentEvent]` |
| `replicate.async_stream(model, input)` | Async streaming execution | Always streaming | `AsyncIterator[ServerSentEvent]` |

**Span shape for language models:** The `model` argument (a model reference string) maps to span metadata `model`. The span input is the `input` dict (containing `prompt`, `system_prompt`, `max_tokens`, etc., depending on the model). The span output is the model's return value, or the concatenated string of streamed tokens for streaming calls. `run()` returns only the model output, not token usage, and models expose usage differently, so metrics may require best-effort extraction (for example from the `metrics` field of the underlying `Prediction`, where a model populates it).
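A minimal sketch of that span shape for non-streaming calls, assuming a hypothetical `log` callback in place of a real Braintrust span and a stand-in `fake_run` in place of `replicate.run`. It logs the `input` dict as span input, the model reference as `metadata["model"]`, and the return value as output; iterator outputs would need the streaming treatment instead.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical log sink standing in for a Braintrust span; a real integration
# would log through the Braintrust SDK rather than a plain callback.
LogFn = Callable[[Dict[str, Any]], None]


def trace_run(run_fn: Callable[..., Any], log: LogFn) -> Callable[..., Any]:
    """Wrap a run()-style callable so each call logs model, input, and output."""

    # `input` shadows the builtin on purpose, to mirror the SDK's keyword argument.
    def wrapper(model: str, input: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any:
        record: Dict[str, Any] = {
            "input": dict(input or {}),
            "metadata": {"model": model},
        }
        try:
            output = run_fn(model, input=input, **kwargs)
        except Exception as exc:
            # Record the failure, then re-raise so caller behavior is unchanged.
            record["error"] = repr(exc)
            log(record)
            raise
        record["output"] = output
        log(record)
        return output

    return wrapper


# Usage with a fake run() so the sketch runs without the replicate package.
def fake_run(model: str, input: Optional[Dict[str, Any]] = None, **kwargs: Any) -> str:
    return "Hello, " + (input or {}).get("prompt", "")


logged: List[Dict[str, Any]] = []
traced = trace_run(fake_run, logged.append)
result = traced("owner/model", input={"prompt": "world"})
```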

### Predictions API (lower priority)

| SDK Method | Description | Return type |
|---|---|---|
| `replicate.predictions.create()` | Create a model prediction (lower-level than `run()`) | `Prediction` |
| `replicate.predictions.get()` | Poll a prediction by ID | `Prediction` |
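`predictions.create()` and `predictions.get()` return a `Prediction`, which carries a `metrics` dict that is typically populated once the prediction completes and whose contents vary by model. A best-effort extractor could look like the sketch below. The key names `input_token_count`, `output_token_count`, and `predict_time`, and their mapping onto Braintrust-style `prompt_tokens`/`completion_tokens`/`tokens`, are assumptions to verify against real predictions; a `SimpleNamespace` stands in for `Prediction`.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Assumed source keys in Prediction.metrics and the metric names they map to.
_TOKEN_KEYS = {
    "input_token_count": "prompt_tokens",
    "output_token_count": "completion_tokens",
}


def extract_metrics(prediction: Any) -> Dict[str, float]:
    """Pull whatever numeric usage data is present; missing keys are skipped."""
    raw = getattr(prediction, "metrics", None) or {}
    metrics: Dict[str, float] = {}
    for src, dst in _TOKEN_KEYS.items():
        value = raw.get(src)
        if isinstance(value, (int, float)):
            metrics[dst] = value
    # Only report a total when both halves are known.
    if "prompt_tokens" in metrics and "completion_tokens" in metrics:
        metrics["tokens"] = metrics["prompt_tokens"] + metrics["completion_tokens"]
    if isinstance(raw.get("predict_time"), (int, float)):
        metrics["predict_time"] = raw["predict_time"]
    return metrics


# Usage with a stand-in object shaped like a finished prediction.
pred = SimpleNamespace(metrics={"input_token_count": 12, "output_token_count": 30, "predict_time": 1.5})
print(extract_metrics(pred))
```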

### Deployments (lower priority)

| SDK Method | Description |
|---|---|
| `replicate.deployments.predictions.create()` | Run a model via a named deployment |

All module-level functions delegate to a default `Client` instance, and `Client` exposes the same operations (including the `async_`-prefixed variants) as instance methods.
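Whether patching `Client` alone also covers the module-level functions depends on how the delegation is wired. If the module binds aliases such as `run = default_client.run` at import time (an assumption to verify against the installed `replicate` version), a later patch to the class does not reach those pre-bound aliases. The stand-in class below (not the real `replicate.Client`) shows the effect, which suggests a patcher may need to wrap both the class methods and the module-level names.

```python
class Client:
    """Stand-in for an SDK client class (not the real replicate.Client)."""

    def run(self, model, input=None):
        return f"ran {model}"


default_client = Client()
# Module-level alias captured at "import time", mirroring the delegation pattern.
run = default_client.run

_original_run = Client.run


def _traced_run(self, model, input=None):
    return "traced: " + _original_run(self, model, input=input)


# Patch the class after the alias was created.
Client.run = _traced_run

print(Client().run("a/b"))  # attribute lookup on an instance sees the patch
print(run("a/b"))  # the pre-bound alias still holds the original function
```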

## Implementation notes

**Model ID format:** `replicate.run()` accepts `model` as either `"owner/model"` (latest version) or `"owner/model:version_hash"`. The integration should extract and log both the model name and version.
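A sketch of splitting a string reference into name and version for span metadata. It only handles string references; any non-string reference types the SDK accepts would need separate handling, and the `model_version` metadata key is a hypothetical name.

```python
from typing import Dict, NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when no version is pinned


def parse_model_ref(ref: str) -> ModelRef:
    """Split "owner/model" or "owner/model:version" into its parts."""
    base, colon, version = ref.partition(":")
    owner, slash, name = base.partition("/")
    if not (slash and owner and name) or (colon and not version):
        raise ValueError(f"unrecognized model reference: {ref!r}")
    return ModelRef(owner, name, version or None)


def model_metadata(ref: str) -> Dict[str, Optional[str]]:
    parsed = parse_model_ref(ref)
    # "model_version" is a hypothetical metadata key name.
    return {"model": f"{parsed.owner}/{parsed.name}", "model_version": parsed.version}
```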

**No single response type:** Unlike OpenAI, where every chat completion returns a `ChatCompletion`, Replicate output varies by model (strings, lists of strings, dicts, file URLs). The integration should log the raw output in a serializable form and handle streaming aggregation generically.
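One generic way to make arbitrary outputs loggable, as a sketch: recurse through dicts, lists, and tuples, and reduce objects exposing a string `url` attribute to that URL (an assumption about how file outputs are represented). Iterators are deliberately not consumed, since draining them would break the caller's stream; they fall through to `repr()` like any other unknown object.

```python
from typing import Any


def to_loggable(value: Any) -> Any:
    """Convert a model output into JSON-friendly data without consuming iterators."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, dict):
        return {str(k): to_loggable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_loggable(v) for v in value]
    # Assumed convention: file-like outputs expose their location as `.url`.
    url = getattr(value, "url", None)
    if isinstance(url, str):
        return url
    # Fallback for anything else, including generators (left unconsumed).
    return repr(value)
```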

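**Streaming:** `stream=True` in `run()` and the dedicated `stream()` function yield `ServerSentEvent` objects with `data` fields. Each event also carries an `event` type (such as `output`, `logs`, or `done`); the accumulated text of the output events should be logged as the span output once the stream ends.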
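A sketch of generic stream aggregation that passes events through untouched and reports the joined text when the stream ends. `FakeEvent` is a stand-in for `ServerSentEvent`, the `on_done` callback is hypothetical, and treating items without an `event` attribute (such as plain strings) as output is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List


@dataclass
class FakeEvent:
    """Stand-in for ServerSentEvent: an event type plus a data payload."""

    event: str
    data: str

    def __str__(self) -> str:
        return self.data


def traced_stream(events: Iterable[Any], on_done: Callable[[str], None]) -> Iterator[Any]:
    """Yield events unchanged while accumulating the text of output events."""
    chunks: List[str] = []
    try:
        for event in events:
            # Items lacking an `event` attribute are assumed to be output text.
            if getattr(event, "event", "output") == "output":
                chunks.append(str(event))
            yield event
    finally:
        # Runs on exhaustion, on an exception, or when the generator is closed
        # (e.g. the consumer breaks and the generator is garbage-collected).
        on_done("".join(chunks))


results: List[str] = []
events = [FakeEvent("output", "Hel"), FakeEvent("logs", "loading"), FakeEvent("output", "lo"), FakeEvent("done", "{}")]
received = list(traced_stream(events, results.append))
```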

**Async first-class:** The SDK exposes async variants, `async_run()` and `async_stream()`, both as module-level functions and as `Client` methods; both must be instrumented.
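The async counterpart, sketched as an async generator over any `AsyncIterable` (for example what `async_stream()` returns). The fake event source and the `on_done` callback are hypothetical. `async_run()` is a coroutine function, so its non-iterator results can be wrapped like the sync case using `await`.

```python
import asyncio
from typing import Any, AsyncIterable, AsyncIterator, Callable, List


async def traced_async_stream(
    events: AsyncIterable[Any], on_done: Callable[[str], None]
) -> AsyncIterator[Any]:
    """Async version of the pass-through aggregator."""
    chunks: List[str] = []
    try:
        async for event in events:
            # Items lacking an `event` attribute are assumed to be output text.
            if getattr(event, "event", "output") == "output":
                chunks.append(str(event))
            yield event
    finally:
        # A consumer that stops early should call aclose() so this runs promptly.
        on_done("".join(chunks))


async def _fake_events() -> AsyncIterator[str]:
    # Stand-in event source that yields control between tokens.
    for token in ["Hel", "lo"]:
        await asyncio.sleep(0)
        yield token


async def _demo(sink: Callable[[str], None]) -> List[Any]:
    return [e async for e in traced_async_stream(_fake_events(), sink)]


results: List[str] = []
received = asyncio.run(_demo(results.append))
```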

**`input` dict as span input:** The `input` dict is model-specific but typically contains `prompt`, `system_prompt`, `max_tokens`, `temperature`, etc. for language models.
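A sketch that keeps the whole `input` dict as span input, as described above, and additionally copies common sampling parameters into metadata for filtering. The parameter key list is an assumption; Replicate models can name these inputs differently (for example `max_tokens` versus `max_new_tokens`).

```python
from typing import Any, Dict

# Hypothetical list of keys treated as sampling parameters.
_PARAM_KEYS = frozenset({"max_tokens", "max_new_tokens", "temperature", "top_p", "top_k"})


def input_fields(model_input: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return span input (the full dict) plus a metadata copy of known parameters."""
    metadata = {k: model_input[k] for k in _PARAM_KEYS if k in model_input}
    return {"input": dict(model_input), "metadata": metadata}
```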

## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/replicate/`)
- No wrapper function (e.g. `wrap_replicate()`)
- No patcher in any existing integration
- No nox test session (`test_replicate`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`
- No entry in `[tool.braintrust.matrix]` in `py/pyproject.toml`

A grep for `replicate` across `py/src/braintrust/` returns zero matches in integration code.

## Braintrust docs status

`not_found` — Replicate is not listed on the [Braintrust integrations directory](https://www.braintrust.dev/docs/guides/tracing/integrations) or the [tracing guide](https://www.braintrust.dev/docs/guides/tracing).

## Upstream references

- Replicate Python SDK on PyPI: https://pypi.org/project/replicate/
- Replicate Python SDK on GitHub: https://github.com/replicate/replicate-python
- Replicate API reference: https://replicate.com/docs/reference/http
- Replicate Python client docs: https://replicate.com/docs/get-started/python
- Replicate streaming guide: https://replicate.com/docs/topics/streaming

## Local repo files inspected

- `py/src/braintrust/integrations/` — no `replicate/` directory exists on `main`
- `py/src/braintrust/wrappers/` — no Replicate wrapper
- `py/noxfile.py` — no `test_replicate` session
- `py/src/braintrust/integrations/__init__.py` — Replicate not listed in integration registry
- `py/src/braintrust/integrations/versioning.py` — no Replicate version matrix
- `py/pyproject.toml` — no Replicate entries in `[tool.braintrust.matrix]`
- Grep for "replicate" across `py/src/braintrust/`: zero matches
