Observability integrations (umbrella): W&B, MLflow, Langfuse, OpenTelemetry, Phoenix #52

@bordeauxred

Why

ClawLoop already emits structured episodes, reward signals, and layer-state transitions. Teams running it in production or research usually already have an observability stack where they want those signals to land. Shipping first-class sinks for the common stacks makes ClawLoop feel native in existing workflows instead of being yet another dashboard to check.

Each item below is a small, self-contained adapter with a clear contract — ideal entry points for first-time contributors.

Integration stubs

  • Weights & Biases sink — log per-iteration reward curves, playbook growth, layer state hashes. Pattern: clawloop.integrations.wandb.WandbSink(run_id=...) consuming the existing episode stream.
  • MLflow tracking — iterations as runs, playbook entries as artifacts, reward signals as metrics. Same shape as W&B, different backend.
  • Langfuse trace export — emit episode messages + tool calls as Langfuse traces so LLM-observability users can search/replay inside their existing UI.
  • OpenTelemetry spans — one span per episode, nested spans per step/tool-call. Lets users pipe into any OTel-compatible backend (Honeycomb, Datadog, Tempo, etc.) without a bespoke adapter.
  • Arize Phoenix export — trace + evaluation shape for teams already using Phoenix for LLM eval.
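
The adapters above share one shape: take an episode and translate it into the backend's vocabulary. A minimal sketch of that shape follows. The `Episode` fields, the `Sink` protocol, and the `on_episode` method name are all hypothetical stand-ins (ClawLoop's real event classes may look different), and an in-memory backend takes the place of a real client so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Episode:
    """Hypothetical stand-in for ClawLoop's episode event; real fields may differ."""
    iteration: int
    reward: float
    playbook_size: int
    layer_state_hash: str


class Sink(Protocol):
    """Assumed adapter contract: one call per episode."""

    def on_episode(self, episode: Episode) -> None: ...


def episode_to_metrics(episode: Episode) -> dict[str, Any]:
    """Flatten an episode into the key/value shape metric backends expect."""
    return {
        "reward": episode.reward,
        "playbook/size": episode.playbook_size,
        "layer/state_hash": episode.layer_state_hash,
    }


@dataclass
class InMemorySink:
    """Records (step, metrics) pairs; a real sink would forward them to its backend."""
    records: list[tuple[int, dict[str, Any]]] = field(default_factory=list)

    def on_episode(self, episode: Episode) -> None:
        # A W&B adapter would call wandb.log(metrics, step=...) here instead.
        self.records.append((episode.iteration, episode_to_metrics(episode)))


def run(episodes: list[Episode], sinks: list[Sink]) -> None:
    """Fan each episode out to every configured sink."""
    for episode in episodes:
        for sink in sinks:
            sink.on_episode(episode)
```

Keeping the translation step (`episode_to_metrics`) separate from the transport would let the W&B and MLflow adapters share most of their code, which matches the "same shape, different backend" note above.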

Contract

Each sink should:

  1. Consume the existing Episode / EpisodeSummary / iteration-level events — no core changes.
  2. Be an optional extra: uv sync --extra wandb etc., so core stays dependency-light.
  3. Ship with a minimal example under examples/observability/ and a one-paragraph README section.
  4. Fail soft — a broken sink never breaks a training run.
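
Points 2 and 4 of the contract can be sketched together: import the backend lazily so a missing extra is only a warning, and wrap every sink so its exceptions never reach the training loop. This is one possible approach and not the project's implementation. `optional_import`, `FailSoftSink`, the logger name, and the disable-after-N-failures policy are assumptions introduced for illustration.

```python
import importlib
import logging
from typing import Any, Callable

logger = logging.getLogger("clawloop.sinks")  # logger name is an assumption


def optional_import(module_name: str, extra: str) -> Any:
    """Import a backend lazily; return None (with a hint) if the extra is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning("%s not installed; try: uv sync --extra %s", module_name, extra)
        return None


class FailSoftSink:
    """Wraps any callable sink so its exceptions never reach the training loop."""

    def __init__(self, emit: Callable[[Any], None], max_failures: int = 3) -> None:
        self._emit = emit
        self._max_failures = max_failures
        self.failures = 0
        self.disabled = False

    def __call__(self, event: Any) -> None:
        if self.disabled:
            return
        try:
            self._emit(event)
        except Exception:
            # Log the traceback and keep going; the run matters more than the sink.
            self.failures += 1
            logger.exception("sink failed (%d/%d)", self.failures, self._max_failures)
            if self.failures >= self._max_failures:
                self.disabled = True  # stop retrying a sink that keeps failing
```

Disabling a sink after repeated failures is a design choice: it avoids flooding the logs when a backend is down for the whole run, at the cost of not recovering automatically if the backend comes back.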

Why an umbrella?

Each integration is about a day of work and independent of the others. Tracking them together in one issue shows intent; implementing each in its own PR keeps the PRs reviewable.

Labels: enhancement (new feature or request), good first issue (good for newcomers), roadmap (future direction; not a launch blocker)
