# OpenTelemetry for AI Systems: A Practical Guide

## Why Your AI Systems Need Observability

AI systems built with Large Language Models (LLMs) present unique challenges that traditional observability tools weren't designed to handle:

1. **Non-deterministic behavior** - The same input can produce different outputs
2. **Complex reasoning chains** - Multi-step processes with branching decision paths
3. **Unpredictable execution** - Agents may take different approaches each time
4. **Tool usage patterns** - Interactions with external systems that impact results
5. **Agent collaboration** - Sub-agents working together with complex delegation

Without proper observability, debugging becomes nearly impossible:

```
User: "Why did my agent give the wrong answer?"
Developer without observability: "Let me dig through 500 pages of LLM output..."
Developer with observability: "I can see it used the wrong tool here, then misinterpreted the result."
```

## The Journey: From Zero to Hero with OpenTelemetry

We'll build this in stages, with value at each step:

1. **Quick Win**: Basic collector setup with TraceZ visualization
2. **Level Up**: Basic collector setup using Jaeger for improved tracing visualization
3. **Pro Level**: Advanced visualization using Docker Compose

Let's get started!

## Stage 1: Quick Win - Basic Setup with TraceZ

### Step 1: Start the OpenTelemetry Collector

Run this single command to get a collector up and running:

```bash
docker run --rm \
  -p 127.0.0.1:4317:4317 \
  -p 127.0.0.1:4318:4318 \
  -p 127.0.0.1:55679:55679 \
  otel/opentelemetry-collector-contrib:0.121.0
```

This starts a collector that:
- Listens for gRPC data on port 4317
- Listens for HTTP data on port 4318
- Provides TraceZ visualization on port 55679
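
Before instrumenting anything, it can help to confirm that these three ports are actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of any OpenTelemetry package, and the host is assumed to be the local machine.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports published by the docker command above (assumed to be bound on localhost).
COLLECTOR_PORTS = {"OTLP gRPC": 4317, "OTLP HTTP": 4318, "TraceZ": 55679}

for name, port in COLLECTOR_PORTS.items():
    status = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

A successful TCP connection only shows that something is listening; it does not prove the listener is the collector.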

### Step 2: Instrument Your SmolAgents Application

#### Install Python Libraries

```python
%pip install -q 'smolagents[telemetry]' opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
```


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


#### Setup Environment Variables


```python
import os

# The Hugging Face API token must already be set in your environment.
# Assigning None to os.environ raises a TypeError, so fail early with a clear message.
if not os.getenv("HF_TOKEN"):
    raise RuntimeError("Set the HF_TOKEN environment variable before running this notebook.")

# Point the OTLP exporter at the collector's gRPC endpoint
OTEL_COLLECTOR_HOST = "localhost"
OTEL_COLLECTOR_PORT_GRPC = 4317

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"http://{OTEL_COLLECTOR_HOST}:{OTEL_COLLECTOR_PORT_GRPC}"

# Identify this service in the trace backend
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = "service.namespace=smolagents-demo,service.name=smolagent"
os.environ["OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE"] = "cumulative"
```
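
`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs that the SDK attaches to the telemetry it emits. The sketch below only illustrates that format; it is not the SDK's actual parser, and `parse_resource_attributes` is a hypothetical helper.

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Split a 'k1=v1,k2=v2' string into a dict (illustrative only)."""
    attributes = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


raw = "service.namespace=smolagents-demo,service.name=smolagent"
print(parse_resource_attributes(raw))
# {'service.namespace': 'smolagents-demo', 'service.name': 'smolagent'}
```

The `service.name` value is typically the name a trace backend displays, so `smolagent` is the name to look for in the UI.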

```python
# Import OpenTelemetry modules
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Configure OpenTelemetry: batch spans and send them over OTLP/gRPC.
# The endpoint is read from OTEL_EXPORTER_OTLP_ENDPOINT; insecure=True
# disables TLS, which is acceptable for a local collector.
trace_provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(insecure=True))
trace_provider.add_span_processor(processor)

# Instrument SmolAgents
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
```


### Step 3: Run a Test and See Results

Run your SmolAgents application:

```python
from smolagents import CodeAgent, HfApiModel

model = HfApiModel()
agent = CodeAgent(tools=[], model=model, add_base_tools=True)

agent.run(
    "Could you give me the 118th number in the Fibonacci sequence?",
)
```

Output:

```
1264937032042997393488322
```
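
Because LLM output is non-deterministic, it is worth checking a result like this independently. A minimal sketch follows. It assumes the sequence starts 0, 1, 1, 2, ..., so that the 118th number is F(117) in zero-based indexing; under the other common convention (starting 1, 1, 2, ...) the 118th number would be the next Fibonacci number instead. The function name `nth_fibonacci` is illustrative.

```python
def nth_fibonacci(n: int) -> int:
    """Return the n-th number (1-based) of the sequence 0, 1, 1, 2, 3, ..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print(nth_fibonacci(118))  # 1264937032042997393488322
```

The value matches the agent's answer, which suggests the agent counted 0 as the first number of the sequence.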

### Step 4: View the Results in TraceZ

Open your browser and go to: [http://localhost:55679/debug/tracez](http://localhost:55679/debug/tracez)

You'll see your agent runs visualized! Click on any trace to see its high-level information. TraceZ shows only a summary; the details of what happened are captured in the console.

**Congratulations!** You now have basic observability for your AI system.

## Alternate Scenario 1 - With Jaeger All-in-One

### Step 1: Start the OpenTelemetry Collector

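Stop the Stage 1 collector first, because both containers publish ports 4317 and 4318. Then run this single command to get a collector up and running:

**Note:** This setup supports OpenTelemetry traces only.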

```bash
docker run --rm --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 5778:5778 \
  -p 9411:9411 \
  jaegertracing/jaeger:2.4.0
```

This starts a collector that:
- Listens for gRPC data on port 4317
- Listens for HTTP data on port 4318
- Provides Jaeger visualization on port 16686

### Steps 2 and 3: Instrument and Run Your SmolAgents Application

These steps are the same as Steps 2 and 3 in Stage 1.

### Step 4: View the Results in Jaeger

Open your browser and go to: [http://localhost:16686](http://localhost:16686)
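
To jump straight to the traces from this demo, the Jaeger search page can be opened with the service pre-selected. The sketch below assumes the Jaeger UI accepts `service` and `limit` query parameters on `/search`, and it uses the `service.name` value (`smolagent`) set in Stage 1; `jaeger_search_url` is a hypothetical helper, not part of Jaeger.

```python
from urllib.parse import urlencode


def jaeger_search_url(service: str, base_url: str = "http://localhost:16686", limit: int = 20) -> str:
    """Build a Jaeger UI search URL for one service (assumed URL layout)."""
    query = urlencode({"service": service, "limit": limit})
    return f"{base_url}/search?{query}"


print(jaeger_search_url("smolagent"))
# http://localhost:16686/search?service=smolagent&limit=20
```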

## Alternate Scenario 2 - With Docker Compose

Follow the instructions in the local [otel-platform README](../otel-platform/README.md).

*Good news!* Everything else remains the same, including the URL to view results in Jaeger.