# Traceloop Tutorial: Complete Guide to LLM Observability

## Introduction

Traceloop is an open-source LLM observability platform that monitors what your model says, how fast it responds, and when its quality starts to slip, so you can debug faster and deploy safely. It provides real-time alerts about your model's quality and execution tracing for every request, and it helps you gradually roll out changes to models and prompts.

### What is OpenLLMetry?

OpenLLMetry is a set of extensions built on top of OpenTelemetry that gives you complete observability over your LLM application. It's non-intrusive and can be connected to your existing observability solutions like Datadog, Honeycomb, and others.

### Key Features

- **One-line setup**: Get instant monitoring with minimal code changes
- **Multi-provider support**: Supports 20+ providers (OpenAI, Anthropic, Gemini, Bedrock, Ollama), vector DBs (Pinecone, Chroma), and frameworks like LangChain, LlamaIndex, and CrewAI
- **Quality evaluation**: Built-in metrics for faithfulness, relevance, and safety
- **Custom evaluators**: Define what quality means for your specific use case
- **OpenTelemetry compatibility**: Integrates with existing observability stacks

## Installation and Setup

In [None]:
# Install required packages
!pip install traceloop-sdk openai python-dotenv

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Load environment variables
load_dotenv()

# Set your API keys; setdefault keeps any value already loaded from .env
os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key-here")
# Optional: Set Traceloop API key for cloud dashboard
# os.environ["TRACELOOP_API_KEY"] = "your-traceloop-api-key"
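Before initializing anything, it can help to fail fast when a key is still unset or still the placeholder string. The helper below is a hypothetical sketch (`missing_keys` and `PLACEHOLDERS` are not part of any SDK) that only inspects environment variables.

```python
import os

# Placeholder strings from the setup cell; a key still set to one of these is treated as missing
PLACEHOLDERS = {"your-openai-api-key-here", "your-traceloop-api-key"}


def missing_keys(required, optional=(), env=None):
    """Return (missing_required, missing_optional) variable names.

    A key counts as missing if it is unset, blank, or still a placeholder.
    """
    env = os.environ if env is None else env

    def is_missing(name):
        value = env.get(name, "").strip()
        return not value or value in PLACEHOLDERS

    return (
        [k for k in required if is_missing(k)],
        [k for k in optional if is_missing(k)],
    )


req, opt = missing_keys(["OPENAI_API_KEY"], ["TRACELOOP_API_KEY"])
if req:
    print(f"⚠️  Set these before continuing: {', '.join(req)}")
if opt:
    print(f"ℹ️  Optional (cloud dashboard): {', '.join(opt)}")
```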

## Basic Setup and Initialization

In [None]:
# Initialize Traceloop - this enables automatic tracing
Traceloop.init(
    app_name="traceloop_tutorial",
    disable_batch=True  # For immediate trace visibility in notebooks
)

print("✅ Traceloop initialized successfully!")
print("📊 Dashboard will be available after running LLM calls")

## Core Concept 1: Basic LLM Tracing

Once `Traceloop.init()` has run, Traceloop automatically instruments popular LLM providers such as OpenAI. No additional code changes are needed!
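Under the hood, OpenLLMetry's provider instrumentations patch the client library's methods during initialization, which is why the OpenAI call in the next cell needs no edits. The sketch below is a simplified, stdlib-only illustration of that wrapping idea, not OpenLLMetry's actual code: `FakeCompletions` is a hypothetical stand-in for a client, and span-like records go into an in-memory `CALL_LOG` list instead of an exporter.

```python
import functools
import time

# In-memory stand-in for an OpenTelemetry exporter in this sketch
CALL_LOG = []


class FakeCompletions:
    """Hypothetical stand-in for an LLM client method (no network calls)."""

    def create(self, model, messages, **kwargs):
        return {"model": model, "content": f"echo: {messages[-1]['content']}"}


def instrument(cls, method_name, span_name):
    """Replace cls.method_name with a wrapper that records a span-like dict."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return original(self, *args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            CALL_LOG.append({
                "name": span_name,
                "model": kwargs.get("model"),
                "duration_s": time.perf_counter() - start,
                "status": status,
            })

    setattr(cls, method_name, wrapper)


# "Init" step: patch once; application code calling create() stays unchanged
instrument(FakeCompletions, "create", "llm.chat")
FakeCompletions().create(model="demo-model", messages=[{"role": "user", "content": "hi"}])
print(CALL_LOG[0]["name"], CALL_LOG[0]["model"], CALL_LOG[0]["status"])
```

Patching at the class level is what lets a single init call cover every client instance created afterwards.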

In [None]:
# Create OpenAI client - this will be automatically instrumented
client = OpenAI()

# Simple LLM call - automatically traced
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain LLM observability in one sentence."}
    ],
    temperature=0.7,
    max_tokens=100
)

print("Response:", response.choices[0].message.content)
print("\n🔍 This call was automatically traced by Traceloop!")

## Core Concept 2: Custom Workflows with Decorators

Use the `@workflow` decorator to trace a function that wraps one or more LLM calls, so they appear together as a single named unit in your traces.

In [None]:
@workflow(name="story_generator")
def generate_story(theme, length="short"):
    """Generate a story with custom workflow tracing"""
    
    # This entire function will be traced as a single workflow
    prompt = f"Write a {length} story about {theme}. Make it engaging and creative."
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a creative storyteller."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.9,
        max_tokens=200
    )
    
    return response.choices[0].message.content

# Test the workflow
story = generate_story("artificial intelligence", "medium")
print("Generated Story:")
print(story)
print("\n📈 This workflow is now traceable in your dashboard!")

## Core Concept 3: Multi-Step Workflows

Track complex pipelines with multiple LLM calls and processing steps.

In [None]:
@workflow(name="content_analysis_pipeline")
def analyze_content(text):
    """Multi-step content analysis pipeline"""
    
    # Step 1: Sentiment Analysis
    sentiment_response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Analyze sentiment. Respond with: Positive, Negative, or Neutral."},
            {"role": "user", "content": f"Text: {text}"}
        ],
        max_tokens=10
    )
    sentiment = sentiment_response.choices[0].message.content.strip()
    
    # Step 2: Key Topics Extraction
    topics_response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract 3 main topics. Return as comma-separated list."},
            {"role": "user", "content": f"Text: {text}"}
        ],
        max_tokens=50
    )
    topics = topics_response.choices[0].message.content.strip()
    
    # Step 3: Summary Generation
    summary_response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Create a brief summary in 2-3 sentences."},
            {"role": "user", "content": f"Text: {text}"}
        ],
        max_tokens=100
    )
    summary = summary_response.choices[0].message.content.strip()
    
    return {
        "sentiment": sentiment,
        "topics": topics,
        "summary": summary
    }

# Test the pipeline
sample_text = "Artificial intelligence is revolutionizing healthcare by enabling faster diagnosis and personalized treatment plans. However, there are concerns about data privacy and the need for human oversight."

analysis = analyze_content(sample_text)
print("Content Analysis Results:")
print(f"Sentiment: {analysis['sentiment']}")
print(f"Topics: {analysis['topics']}")
print(f"Summary: {analysis['summary']}")
print("\n🔗 All steps are traced as a connected workflow!")
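The three steps in `analyze_content` do not depend on each other, so you might be tempted to run them concurrently. If you do, keep the trace connected: OpenTelemetry's Python SDK keeps the active span in `contextvars`, and `ThreadPoolExecutor` workers do not automatically run in the submitting thread's context. A common fix is to submit each step through `contextvars.copy_context().run`. The sketch below shows that pattern with a hypothetical `fake_step` function and a plain context variable standing in for the active span; whether your setup already propagates context across threads for you is worth checking separately.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the "active parent span" that tracing libraries keep in context
current_parent = contextvars.ContextVar("current_parent", default=None)


def fake_step(step_name):
    """Hypothetical pipeline step; reports which parent it would attach to."""
    return step_name, current_parent.get()


def run_pipeline_in_parallel(steps):
    token = current_parent.set("content_analysis_pipeline")
    try:
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            # Copy the caller's context per task so each worker sees the parent
            futures = [
                pool.submit(contextvars.copy_context().run, fake_step, name)
                for name in steps
            ]
            return [f.result() for f in futures]
    finally:
        current_parent.reset(token)


results = run_pipeline_in_parallel(["sentiment", "topics", "summary"])
for step, parent in results:
    print(f"{step} -> parent: {parent}")
```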

## Core Concept 4: Framework Integration (LangChain Example)

Traceloop automatically instruments popular frameworks like LangChain without additional configuration.

In [None]:
# Note: This is a conceptual example. Install langchain if you want to run it:
# !pip install langchain langchain-openai

# Uncomment below to test LangChain integration:
"""
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# LangChain LLM - automatically instrumented by Traceloop
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# Create messages
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Explain the benefits of using observability in ML applications.")
]

# This call will be automatically traced
response = llm.invoke(messages)  # invoke() replaces the deprecated llm(messages) call style
print("LangChain Response:", response.content)
"""

print("📚 LangChain integration works automatically with Traceloop!")
print("🔧 Just initialize Traceloop and use LangChain normally.")
print("💡 Uncomment the code above to test LangChain tracing.")

## Core Concept 5: Configuration and Advanced Features

Configure Traceloop for different environments and use cases.

In [None]:
# Advanced configuration example
def setup_production_tracing():
    """Example of production-ready Traceloop configuration"""
    
    # Configuration for sending to external observability platform
    config = {
        "app_name": "production_llm_app",
        "api_endpoint": "https://your-otel-collector.com",  # Your OTEL endpoint
        "headers": {
            "Authorization": "Bearer your-token",
            "X-Custom-Header": "production"
        },
        "disable_batch": False,  # Enable batching for production
        "resource_attributes": {
            "service.name": "llm-service",
            "service.version": "1.0.0",
            "deployment.environment": "production"  # OpenTelemetry semantic-convention key
        }
    }
    
    # Usage (once, at startup): Traceloop.init(**config)
    return config

# Environment-specific configuration
def get_traceloop_config(environment="development"):
    """Get environment-specific configuration"""
    
    if environment == "production":
        return setup_production_tracing()
    elif environment == "staging":
        return {
            "app_name": "staging_llm_app",
            "disable_batch": False
        }
    else:  # development
        return {
            "app_name": "dev_llm_app",
            "disable_batch": True  # See traces immediately
        }

# Example usage
dev_config = get_traceloop_config("development")
print("Development Configuration:")
for key, value in dev_config.items():
    print(f"  {key}: {value}")

print("\n⚙️  Configure Traceloop based on your deployment environment!")
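The cell above hardcodes `"development"`. In a deployed service you would usually pick the environment from configuration instead. Here is a minimal sketch, assuming a hypothetical `APP_ENV` variable for the environment name and a hypothetical `OTEL_COLLECTOR_URL` variable for the collector; the returned keys mirror the `Traceloop.init()` arguments already used in this tutorial (`app_name`, `disable_batch`, `resource_attributes`, `api_endpoint`).

```python
import os

# Hypothetical per-environment defaults, matching get_traceloop_config above
DEFAULTS = {
    "development": {"app_name": "dev_llm_app", "disable_batch": True},
    "staging": {"app_name": "staging_llm_app", "disable_batch": False},
    "production": {"app_name": "production_llm_app", "disable_batch": False},
}


def build_init_kwargs(env=None):
    """Build keyword arguments for Traceloop.init from environment variables.

    APP_ENV and OTEL_COLLECTOR_URL are hypothetical variable names.
    """
    env = os.environ if env is None else env
    name = env.get("APP_ENV", "development").lower()
    if name not in DEFAULTS:
        raise ValueError(f"Unknown APP_ENV {name!r}; expected one of {sorted(DEFAULTS)}")
    kwargs = dict(DEFAULTS[name])
    kwargs["resource_attributes"] = {"deployment.environment": name}
    # Only set an endpoint when one is configured; otherwise keep the SDK default
    endpoint = env.get("OTEL_COLLECTOR_URL")
    if endpoint:
        kwargs["api_endpoint"] = endpoint
    return kwargs


print(build_init_kwargs({"APP_ENV": "staging"}))
# Then, once per process: Traceloop.init(**build_init_kwargs())
```

Rejecting unknown environment names up front avoids silently shipping production traffic with development settings such as `disable_batch=True`.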

## Core Concept 6: Monitoring Key Metrics

This section shows what Traceloop tracks automatically and how to interpret the data.

In [None]:
@workflow(name="metrics_demo")
def demonstrate_metrics():
    """Demonstrate different metrics that Traceloop captures"""
    
    # Different types of calls to show various metrics
    calls_data = []
    
    # Call 1: Short prompt, low temperature
    response1 = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello"}],
        temperature=0.1,
        max_tokens=10
    )
    # Record the configured cap and the actual usage reported by the API
    calls_data.append({"type": "short_precise", "max_tokens": 10,
                       "total_tokens": response1.usage.total_tokens, "temp": 0.1})
    
    # Call 2: Longer prompt, higher temperature
    response2 = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user", 
            "content": "Write a creative poem about machine learning and observability"
        }],
        temperature=0.9,
        max_tokens=150
    )
    calls_data.append({"type": "long_creative", "max_tokens": 150,
                       "total_tokens": response2.usage.total_tokens, "temp": 0.9})
    
    # Call 3: System + User messages
    response3 = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a technical expert."},
            {"role": "user", "content": "Explain distributed tracing benefits."}
        ],
        temperature=0.5,
        max_tokens=100
    )
    calls_data.append({"type": "system_user", "max_tokens": 100,
                       "total_tokens": response3.usage.total_tokens, "temp": 0.5})
    
    return calls_data

# Run the metrics demonstration
metrics_data = demonstrate_metrics()

print("📊 Metrics Being Tracked by Traceloop:")
print("\n🔍 Automatic Metrics:")
print("  • Latency: Response time for each call")
print("  • Token Usage: Input and output tokens")
print("  • Cost: Estimated cost per call")
print("  • Model Performance: Success/failure rates")
print("  • Prompt-Response Pairs: Complete conversation tracking")

print("\n📈 Quality Metrics (from Traceloop evaluators you enable, not from these calls alone):")
print("  • Faithfulness: How well responses stay grounded in the provided context")
print("  • Relevance: How relevant responses are to prompts")
print("  • Safety: Detection of harmful content")
print("  • Custom Evaluators: Your domain-specific quality measures")

print(f"\n✅ Generated {len(metrics_data)} traced calls with different characteristics!")
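When you read latency from traces, percentiles usually tell you more than the average, because a few slow calls can hide behind a healthy-looking mean. The sketch below uses made-up latency values (not measured data) and only the standard library's `statistics` module.

```python
import statistics

# Made-up latencies in seconds, e.g. as you might export them from a tracing backend
latencies = [0.42, 0.51, 0.47, 0.39, 1.85, 0.44, 0.50, 0.46, 0.41, 2.30]


def latency_summary(samples):
    """Return mean, p50, and p95 for a list of latency samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p50": statistics.median(samples),
        "p95": cuts[94],  # cut point index 94 is the 95th percentile
    }


summary = latency_summary(latencies)
print({k: round(v, 3) for k, v in summary.items()})
```

With these sample values, the median stays under 0.5 s while the two slow calls pull p95 to roughly 2.1 s, a gap the mean alone would not reveal.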

## Viewing Your Traces

After running the above examples, you can view your traces in several ways:

### Option 1: Traceloop Cloud Dashboard
- Sign up at [traceloop.com](https://www.traceloop.com)
- Get your API key and set `TRACELOOP_API_KEY`
- Restart the kernel and call `Traceloop.init()` again so it picks up the key (or pass `api_key=` directly)

### Option 2: Local Development Dashboard
- Depending on the SDK version, Traceloop may print a link to a temporary dashboard when you run traces without an API key
- Look for such dashboard links in your console output

### Option 3: Export to Existing Tools
- Point the OpenTelemetry endpoint at Datadog, Honeycomb, or another OTLP-compatible collector
- Use the `api_endpoint` and `headers` parameters in `Traceloop.init()`
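Collector credentials often live in an environment variable written in the OpenTelemetry `OTEL_EXPORTER_OTLP_HEADERS` style: comma-separated `key=value` pairs with URL-encoded values. The hypothetical helper below turns such a string into the dict that the `headers` parameter expects; the header names and values in the example are placeholders.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict:
    """Parse 'key1=value1,key2=value2' into a dict with URL-decoded values."""
    headers = {}
    for item in raw.split(","):
        if not item.strip():
            continue  # tolerate empty entries such as trailing commas
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"Malformed header entry: {item!r}")
        headers[key.strip()] = unquote(value.strip())
    return headers


print(parse_otlp_headers("x-honeycomb-team=abc123,Authorization=Bearer%20token"))
```

Keeping the secret in an environment variable and parsing it at startup avoids hardcoding tokens like the `"Bearer your-token"` placeholder in source files.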

## Best Practices and Tips

### 1. Production Deployment
- Keep the default `disable_batch=False` in production so spans are exported in batches rather than one request per span
- Set appropriate resource attributes for filtering and organization
- Configure sampling rates for high-traffic applications
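Sampling is usually handled at the OpenTelemetry layer. How you attach a sampler to a Traceloop setup depends on your SDK version, so check its documentation. The sketch below only illustrates the decision logic in the spirit of OpenTelemetry's `TraceIdRatioBased` sampler: it compares the low 64 bits of the trace ID to a bound, so every span in a trace gets the same keep-or-drop decision. `should_sample` is a hypothetical helper, not an SDK function.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio sampling: same trace ID, same decision."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


rng = random.Random(0)  # seeded so the demo is repeatable
ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in ids)
print(f"kept {kept} of {len(ids)} traces (about 10% expected)")
```

Deciding from the trace ID rather than a fresh random number keeps traces whole: you never end up with a workflow span whose child LLM calls were dropped.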

### 2. Security Considerations
- Traceloop captures prompts and responses - ensure compliance with data policies
- Use environment variables for API keys
- Consider data retention policies for sensitive information
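OpenLLMetry's documentation describes a `TRACELOOP_TRACE_CONTENT=false` setting for turning off the recording of prompt and completion content entirely. If you need the content but not its sensitive parts, one option is to redact text before it reaches the LLM call, which keeps it out of both the trace and the provider request. The patterns below are a hypothetical, deliberately simple starting point, not a complete PII filter.

```python
import re

# Hypothetical patterns; tune them to the data your application actually handles
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"), "[API_KEY]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]


def redact(text: str) -> str:
    """Mask obvious sensitive substrings before text is sent or traced."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


prompt = "Email jane.doe@example.com, key sk-abcdefghijklmnop1234, SSN 123-45-6789"
print(redact(prompt))
```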

### 3. Workflow Organization
- Use descriptive names for `@workflow` decorators
- Group related LLM calls into logical workflows
- Add custom attributes to spans for better filtering

### 4. Cost Optimization
- Monitor token usage patterns through traces
- Set `max_tokens` to cap output length and cost; choose temperature for output quality rather than cost
- Track model performance vs. cost trade-offs
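Token counts from your traces can be turned into a rough cost estimate per model. The prices below are placeholders, not real rates, and the model names are hypothetical; the `prompt_tokens` and `completion_tokens` field names mirror those on OpenAI's `response.usage`.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens. These are NOT real rates;
# copy current numbers from your provider's pricing page.
PRICES_PER_1M = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 5.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    prompt_tokens: int
    completion_tokens: int


def estimate_cost(usages):
    """Sum estimated USD cost per model from token counts."""
    totals = {}
    for u in usages:
        price = PRICES_PER_1M[u.model]
        cost = (u.prompt_tokens * price["input"]
                + u.completion_tokens * price["output"]) / 1_000_000
        totals[u.model] = totals.get(u.model, 0.0) + cost
    return totals


calls = [Usage("model-a", 1_000, 500), Usage("model-a", 2_000, 1_000), Usage("model-b", 1_000, 500)]
print(estimate_cost(calls))
```

Grouping by model makes the cost side of a model-versus-quality trade-off easy to compare against the quality scores you track.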

## Conclusion

Traceloop provides enterprise-grade LLM observability from a single `Traceloop.init()` call, helping you monitor, debug, and improve your LLM applications. Because it is built on OpenTelemetry standards, it stays compatible with existing observability stacks while adding LLM-specific insights.

**Next Steps:**
- Explore the [Traceloop documentation](https://www.traceloop.com/docs) for advanced features
- Try integrating with your existing frameworks (LangChain, LlamaIndex)
- Set up custom evaluators for your specific use cases
- Configure alerts and monitoring for production deployments

**Resources:**
- [OpenLLMetry GitHub](https://github.com/traceloop/openllmetry)
- [Traceloop Community Slack](https://traceloop.com/slack)
- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)