# 📊 08: MLflow Observability

Learn how to use MLflow tracing to observe, debug, and optimize your LLM applications with hierarchical execution traces.

## 📋 Learning Objectives

By the end of this notebook, you will be able to:

- [ ] Install and configure MLflow for tracing
- [ ] Start and access the MLflow UI
- [ ] Enable automatic tracing in the Local LLM SDK
- [ ] Understand hierarchical trace structure (CHAIN → AGENT → LLM → TOOL)
- [ ] View and analyze traces in the MLflow UI
- [ ] Use traces for debugging complex agent workflows
- [ ] Identify performance bottlenecks with trace timing data
- [ ] Export and share traces for collaboration

## 🎯 Prerequisites

- Completed notebook 07 (ReACT Agents)
- Understanding of agent execution flow
- Familiarity with tools and multi-step tasks
- LM Studio running with a model that supports function calling

## ⏱️ Estimated Time: 20 minutes

## 1️⃣ What is MLflow Tracing?

**MLflow** is an open-source platform for managing the machine learning lifecycle. **Tracing** is an MLflow feature that records each step your LLM application executes, including its inputs, outputs, and timing.

### Why Tracing?

Without tracing:
- ❌ Can't see what the agent did internally
- ❌ Hard to debug when things go wrong
- ❌ No visibility into performance bottlenecks
- ❌ Difficult to optimize complex workflows

With tracing:
- ✅ See every LLM call, tool execution, and decision
- ✅ Visualize hierarchical execution flow
- ✅ Measure timing for each component
- ✅ Debug issues by inspecting inputs/outputs at each step
- ✅ Compare different runs side-by-side

### Hierarchical Trace Structure

```
CHAIN (top-level task)
├── LLM (reasoning step 1)
│   └── TOOL (calculator)
├── LLM (reasoning step 2)
│   ├── TOOL (file_write)
│   └── TOOL (file_read)
└── LLM (reasoning step 3 - conclusion)
```
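The tree above can be modeled with a small recursive data structure. The sketch below is purely illustrative: `Span` and `render` are hypothetical names, not part of MLflow or the Local LLM SDK, and real spans carry more fields (inputs, outputs, timestamps).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for one node in a trace tree."""
    span_type: str                      # e.g. "CHAIN", "LLM", "TOOL"
    name: str
    children: List["Span"] = field(default_factory=list)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, parents before children."""
    lines = [f"{'  ' * depth}{span.span_type} ({span.name})"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


trace = Span("CHAIN", "top-level task", [
    Span("LLM", "reasoning step 1", [Span("TOOL", "calculator")]),
    Span("LLM", "reasoning step 2", [
        Span("TOOL", "file_write"),
        Span("TOOL", "file_read"),
    ]),
    Span("LLM", "reasoning step 3 - conclusion"),
])

print("\n".join(render(trace)))
```

Each span knows only its children, so the whole trace can be walked, printed, or summarized with a single recursive function.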

## 2️⃣ Installing MLflow

First, let's install MLflow if it's not already installed.

In [None]:
# Install MLflow (uncomment if needed)
# The quotes stop the shell from treating ">=" as an output redirect
# !pip install "mlflow>=2.13.0"

import mlflow

print(f"✅ MLflow version: {mlflow.__version__}")
print("\n💡 MLflow 2.13+ is recommended for best tracing support")
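To check the installed version programmatically, compare version numbers as integers rather than as strings. The helper below is a minimal sketch using only the standard library; `meets_minimum` is a hypothetical name, and it deliberately ignores pre-release suffixes such as `rc1`.

```python
import re


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric parts of a version string to a minimum."""
    # Keep only the leading "X.Y.Z" digits so suffixes like "rc1" are ignored.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts >= minimum


# A plain string comparison would rank "2.9.2" above "2.13.0".
print(meets_minimum("2.9.2", (2, 13)))    # False
print(meets_minimum("2.14.1", (2, 13)))   # True
```

In the notebook this could be called as `meets_minimum(mlflow.__version__, (2, 13))`.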

## 3️⃣ Starting the MLflow UI

The MLflow UI provides a visual interface for viewing traces.

**Start MLflow UI in a terminal:**
```bash
mlflow ui --port 5000
```

Then open your browser to: **http://localhost:5000**

💡 **Tip**: Keep the MLflow UI open in a browser tab while running this notebook!

Alternatively, check from Python whether the UI is already running:

In [None]:
import requests

MLFLOW_UI_URL = "http://localhost:5000"

def is_mlflow_running(url=MLFLOW_UI_URL):
    """Return True if the MLflow UI responds with HTTP 200."""
    try:
        response = requests.get(url, timeout=1)
        return response.status_code == 200
    except requests.RequestException:
        # Connection refused, timeout, and similar network errors
        return False

if is_mlflow_running():
    print(f"✅ MLflow UI is already running at {MLFLOW_UI_URL}")
else:
    # Launching a server from a notebook does not work in every environment,
    # so the UI is best started in a separate terminal.
    print("⚠️ MLflow UI is not running. Please start it manually in a terminal:")
    print("   mlflow ui --port 5000")
    print(f"\n   Then open: {MLFLOW_UI_URL}")
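If you do want to launch the UI from Python, the following is a minimal sketch that uses only the standard library. It assumes the `mlflow` command is on your `PATH`; `is_reachable` and `start_ui_in_background` are hypothetical helper names, and this approach may not work in hosted notebook environments.

```python
import http.client
import subprocess
import time
import urllib.request


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, refused connections, and timeouts are OSError subclasses;
        # ValueError covers a malformed URL.
        return False


def start_ui_in_background(port: int = 5000, wait_seconds: float = 15.0):
    """Launch `mlflow ui` as a child process and wait until it responds.

    Returns the Popen handle, or None if the UI never became reachable.
    """
    url = f"http://localhost:{port}"
    process = subprocess.Popen(["mlflow", "ui", "--port", str(port)])
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if is_reachable(url):
            return process
        if process.poll() is not None:
            break  # the process exited early, e.g. the port is already taken
        time.sleep(0.5)
    process.terminate()
    return None


# Usage (commented out so that running this cell does not spawn a server):
# ui_process = start_ui_in_background()
# ... later, to shut it down: ui_process.terminate()
```

Keeping the `Popen` handle lets you stop the server explicitly instead of leaving an orphaned process behind.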

## 4️⃣ Enabling Automatic Tracing

The Local LLM SDK has built-in MLflow tracing support!

In [None]:
from local_llm_sdk import LocalLLMClient
from local_llm_sdk.tools import get_builtin_tools

# Create client with tracing enabled
client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="your-model-name",
    enable_tracing=True  # Enable MLflow tracing
)

# Register tools
tools = get_builtin_tools()
client.register_tools(tools)

print("✅ Client created with MLflow tracing enabled!")
print("\n💡 All operations will now be traced automatically")
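Conceptually, automatic tracing wraps each operation so that entering it opens a span and leaving it closes the span. The sketch below illustrates that idea with the standard library only. It is not how the Local LLM SDK or MLflow implement tracing; `traced`, `RECORDED`, and the two demo functions are hypothetical, and the shared stack is not thread-safe.

```python
import functools
import time

RECORDED = []   # finished spans, in completion order
_stack = []     # names of spans that are currently open


def traced(span_type):
    """Decorator factory: record a span around every call of the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _stack[-1] if _stack else None
            _stack.append(func.__name__)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                _stack.pop()
                RECORDED.append({
                    "name": func.__name__,
                    "type": span_type,
                    "parent": parent,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator


@traced("TOOL")
def multiply(a, b):
    return a * b


@traced("CHAIN")
def chat(a, b):
    return f"{a} x {b} = {multiply(a, b)}"


print(chat(12, 15))
for span in RECORDED:
    print(span["type"], span["name"], "parent:", span["parent"])
```

Child spans finish before their parents, which is why the `TOOL` span is recorded first even though the `CHAIN` span started earlier.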

## 5️⃣ Basic Trace Example

Let's make a simple traced call and view it in the UI.

In [None]:
# Simple chat with tracing
response = client.chat("What is 12 multiplied by 15?")

print("💬 Response:")
print(response)
print("\n✅ This call was traced!")
print("\n🔍 View the trace:")
print("   1. Go to http://localhost:5000")
print("   2. Click on 'Traces' in the sidebar")
print("   3. Find the most recent trace")
print("   4. Click to see the execution tree")

**🎯 What you'll see in MLflow UI:**

- A `CHAIN` span for the overall chat call
- An `LLM` span for the model inference
- A `TOOL` span for the calculator tool, if the model chose to call it
- Input/output for each span
- Timing information (duration in ms)
- Token counts and other metadata

## 6️⃣ Agent Trace - The Full Picture

Now let's trace a ReACT agent execution to see the complete hierarchy.

In [None]:
# Run an agent task with tracing
result = client.react(
    "Calculate the factorial of 6, then convert the result to text, "
    "uppercase it, and count how many characters it has.",
    max_steps=10
)

print("🤖 Agent Result:")
print(f"Status: {result.status}")
print(f"Steps: {result.steps_taken}")
print(f"\nFinal answer:\n{result.final_response}")

print("\n" + "="*70)
print("\n🔍 View the hierarchical trace in MLflow UI:")
print("   - CHAIN (agent.react)")
print("     ├── LLM (reasoning step 1)")
print("     │   └── TOOL (execute_python for factorial)")
print("     ├── LLM (reasoning step 2)")
print("     │   └── TOOL (text_transformer)")
print("     ├── LLM (reasoning step 3)")
print("     │   └── TOOL (char_counter)")
print("     └── LLM (final answer)")

## 7️⃣ Understanding Span Types

MLflow uses different span types to categorize operations.

### Span Type Reference

| Span Type | Purpose | Examples |
|-----------|---------|----------|
| `CHAIN` | High-level workflow | `client.react()`, `client.chat()` |
| `LLM` | Model inference | OpenAI/LM Studio API calls |
| `AGENT` | Agent reasoning | ReACT loop iterations |
| `TOOL` | Tool execution | `math_calculator`, `execute_python` |
| `RETRIEVER` | Data retrieval | Document or vector-store lookups, database queries |

### Trace Hierarchy

```
CHAIN (root span - entire workflow)
│
├── AGENT (if using agents)
│   │
│   ├── LLM (model call for reasoning)
│   │   ├── attributes: model, temperature, tokens
│   │   ├── input: messages sent to model
│   │   └── output: model response
│   │
│   └── TOOL (tool execution)
│       ├── attributes: tool name, parameters
│       ├── input: tool arguments
│       └── output: tool result
│
└── Timing data for each span
```
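Once a trace is viewed as a tree, questions such as "how many tool calls were made?" become a simple traversal. The sketch below assumes a hypothetical nested-dict layout with `type` and `children` keys; it does not reflect MLflow's actual export format.

```python
from collections import Counter


def count_span_types(span):
    """Walk a nested span dict depth-first and tally spans by type."""
    counts = Counter({span["type"]: 1})
    for child in span.get("children", []):
        counts += count_span_types(child)
    return counts


# Hypothetical trace shaped like the hierarchy above.
trace = {
    "type": "CHAIN", "children": [
        {"type": "AGENT", "children": [
            {"type": "LLM"},
            {"type": "TOOL"},
            {"type": "LLM"},
            {"type": "TOOL"},
            {"type": "LLM"},
        ]},
    ],
}

counts = count_span_types(trace)
print(dict(counts))
print("total spans:", sum(counts.values()))
```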

## 8️⃣ Complex Multi-Tool Trace

Let's create a complex workflow to see a rich trace.

In [None]:
import os
import tempfile

temp_dir = tempfile.mkdtemp()
squares_file = os.path.join(temp_dir, "squares.txt")

# Complex multi-tool task
result = client.react(
    "Generate a Python list of the squares of numbers 1 through 10. "
    "Calculate the sum of all those squares. "
    f"Save the list and sum to {squares_file}. "
    "Then read the file back and count the total characters in it.",
    max_steps=15
)

print("🎯 Complex Workflow Complete!")
print(f"\nResult: {result.final_response}")

print("\n" + "="*70)
print("\n📊 Trace Analysis in MLflow UI:")
print("\n1. Navigate to the Traces tab")
print("2. Find this trace (most recent)")
print("3. Expand the tree to see:")
print("   - Multiple LLM calls (one per reasoning step)")
print("   - execute_python span (generate squares & sum)")
print("   - filesystem_operation spans (write & read)")
print("   - char_counter span")
print("\n4. Click each span to inspect:")
print("   - Input parameters")
print("   - Output values")
print("   - Duration (ms)")
print("   - Timestamps")

# Cleanup
import shutil
shutil.rmtree(temp_dir)

## 9️⃣ Using Traces for Debugging

Traces are invaluable when something goes wrong. Let's see how to debug with traces.

In [None]:
# A task that might have issues
result = client.react(
    "Calculate the square root of -1 using Python, "
    "then tell me what type of number it is.",
    max_steps=8
)

print("🔍 Debugging Example:")
print(f"\nStatus: {result.status}")
print(f"\nResult:\n{result.final_response}")

print("\n" + "="*70)
print("\n🛠️ How to Debug with Traces:")
print("\n1. Open the trace in MLflow UI")
print("2. Look for spans with errors (red indicators)")
print("3. Check the execute_python span:")
print("   - Input: What code was executed?")
print("   - Output: What was the result/error?")
print("4. Trace the reasoning:")
print("   - Did the LLM handle complex numbers correctly?")
print("   - Did it import the 'cmath' module?")
print("5. Identify improvements:")
print("   - Better prompting?")
print("   - More specific tool instructions?")
print("   - Additional tools needed?")
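For reference when reading the `execute_python` span, this is how the two standard-library modules behave for the square root of -1:

```python
import cmath
import math

try:
    math.sqrt(-1)
except ValueError as exc:
    # math only handles real numbers, so a negative input is rejected
    print(f"math.sqrt(-1) failed: {exc}")

root = cmath.sqrt(-1)
print(root)                   # 1j
print(type(root).__name__)    # complex
```

If the span's input shows `math.sqrt(-1)` and its output shows a `ValueError`, the agent chose the wrong module, and the next LLM span should reveal whether it recovered.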

## 🔟 Finding Performance Bottlenecks

Use timing data in traces to optimize your application.

In [None]:
# Task with varying execution times
result = client.react(
    "Generate the first 20 prime numbers, "
    "then calculate statistical properties (mean, median, std dev), "
    "and finally sort them in reverse order.",
    max_steps=12
)

print("⚡ Performance Analysis:")
print(f"\nTotal steps: {result.steps_taken}")
print(f"Result: {result.final_response[:200]}...")

print("\n" + "="*70)
print("\n📈 Performance Optimization with Traces:")
print("\n1. Open trace in MLflow UI")
print("2. Sort spans by duration")
print("3. Identify slow operations:")
print("   - Which LLM calls took longest?")
print("   - Which tools were slowest?")
print("   - Are there redundant calls?")
print("\n4. Optimization strategies:")
print("   - Combine multiple tool calls into one")
print("   - Cache results of expensive operations")
print("   - Use faster models for simple reasoning")
print("   - Reduce max_steps if agent is over-thinking")
print("\n5. Compare before/after:")
print("   - Run optimized version")
print("   - Compare traces side-by-side")
print("   - Measure improvement")
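The ranking step can also be done outside the UI once you have span timings. The sketch below uses made-up `(name, start_ms, end_ms)` tuples purely for illustration; they are not measurements from a real run.

```python
# Made-up (name, start_ms, end_ms) tuples, as might be read off a trace timeline.
spans = [
    ("LLM step 1", 0, 1840),
    ("execute_python primes", 1840, 1905),
    ("LLM step 2", 1905, 4310),
    ("execute_python stats", 4310, 4362),
    ("LLM step 3", 4362, 5600),
]

total_ms = spans[-1][2] - spans[0][1]
ranked = sorted(spans, key=lambda s: s[2] - s[1], reverse=True)

# Print spans from slowest to fastest with their share of total time.
for name, start, end in ranked:
    duration = end - start
    print(f"{name:<24} {duration:>6} ms  {duration / total_ms:6.1%}")
```

Showing each span's share of total time makes it clear whether optimizing a tool is worthwhile or whether model inference dominates.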

## 1️⃣1️⃣ Trace Metadata and Custom Attributes

You can group traces under a named experiment and run, then attach tags and metrics to that run for better organization.

In [None]:
# Set run name and tags for better organization
mlflow.set_experiment("Local LLM SDK Tutorial")

with mlflow.start_run(run_name="fibonacci-research") as run:
    # Tag the run
    mlflow.set_tag("task_type", "mathematical_research")
    mlflow.set_tag("complexity", "medium")
    
    # Execute agent
    result = client.react(
        "Calculate the first 10 Fibonacci numbers, "
        "sum them, and determine if the sum is prime.",
        max_steps=10
    )
    
    # Log metrics
    mlflow.log_metric("steps_taken", result.steps_taken)
    mlflow.log_metric("success", 1 if result.status == "success" else 0)
    
    print("✅ Trace logged with metadata!")
    print(f"\nRun ID: {run.info.run_id}")
    print(f"Status: {result.status}")
    print(f"\nResult: {result.final_response}")

print("\n💡 View in MLflow UI:")
print("   - Experiments tab: See all runs organized by experiment")
print("   - Filter by tag in the search box, e.g. tags.complexity = 'medium'")
print("   - Compare metrics across runs")

## 🏋️ Exercise: Trace Analysis Challenge

**Challenge:** Create and analyze traces for a data processing pipeline.

**Task:**
1. Create an agent that processes a list of numbers
2. The agent should:
   - Generate 25 random numbers (1-100)
   - Filter for even numbers only
   - Calculate mean and median of even numbers
   - Save results to a file
   - Count characters in the saved file

**Analysis Requirements:**
1. Run the task and examine the trace in MLflow UI
2. Count total spans in the trace
3. Identify the slowest span
4. Find how many tool calls were made
5. Calculate total execution time

Try it yourself first!

In [None]:
# Your code here:



<details>
<summary>Click to see solution</summary>

```python
# Solution: Trace Analysis Challenge

import tempfile
import os
import shutil

temp_dir = tempfile.mkdtemp()
results_file = os.path.join(temp_dir, "even_numbers_analysis.txt")

print("📊 Data Processing Pipeline with Tracing\n")
print("="*70)

# Set up organized tracing
mlflow.set_experiment("Data Processing Tutorial")

with mlflow.start_run(run_name="even-numbers-pipeline") as run:
    # Tag for organization
    mlflow.set_tag("pipeline_type", "data_processing")
    mlflow.set_tag("data_size", "25_numbers")
    
    # Execute the pipeline
    result = client.react(
        f"Generate 25 random integers between 1 and 100. "
        f"Filter to keep only even numbers. "
        f"Calculate the mean and median of the even numbers. "
        f"Save all results (original list, even list, mean, median) to {results_file}. "
        f"Then count how many characters are in that file.",
        max_steps=15
    )
    
    # Log metrics
    mlflow.log_metric("steps_taken", result.steps_taken)
    mlflow.log_metric("success", 1 if result.status == "success" else 0)
    
    print(f"\n✅ Pipeline Status: {result.status}")
    print(f"Steps taken: {result.steps_taken}")
    print(f"\nFinal Result:\n{result.final_response}")
    
    # Verify output file
    if os.path.exists(results_file):
        with open(results_file, 'r') as f:
            content = f.read()
        print(f"\n📄 Generated File ({len(content)} chars):\n{content}")
    
    print("\n" + "="*70)
    print("\n🔍 Trace Analysis Instructions:\n")
    print("1. Open MLflow UI: http://localhost:5000")
    print("2. Navigate to 'Experiments' → 'Data Processing Tutorial'")
    print(f"3. Find run: 'even-numbers-pipeline' (ID: {run.info.run_id[:8]}...)")
    print("4. Click 'Traces' to view the execution tree")
    print("\n📊 Analysis Tasks:")
    print("   □ Count total spans (expand all nodes)")
    print("   □ Identify slowest span (check duration column)")
    print("   □ Count TOOL spans (how many tool calls?)")
    print("   □ Note total execution time (root CHAIN span)")
    print("   □ Inspect inputs/outputs of each span")
    print("\n💡 Expected Observations:")
    print("   - Multiple LLM spans (one per reasoning step)")
    print("   - execute_python spans (random numbers, filtering, stats)")
    print("   - filesystem_operation spans (write and read)")
    print("   - char_counter span (final count)")
    print("   - Each span shows exact inputs and outputs")
    print("   - Duration shows which operations are slowest")

# Cleanup
shutil.rmtree(temp_dir)
print("\n✅ Analysis complete!")
```
</details>
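When inspecting the `TOOL` span outputs for this exercise, it can help to recompute the statistics independently. The helper below is a small sketch; `summarize_evens` is a hypothetical name, and the input list is an example that you would replace with the numbers shown in the trace.

```python
import statistics


def summarize_evens(numbers):
    """Recompute the pipeline's statistics for a list copied from a trace."""
    evens = [n for n in numbers if n % 2 == 0]
    if not evens:
        # Mean and median are undefined for an empty list.
        return {"evens": [], "mean": None, "median": None}
    return {
        "evens": evens,
        "mean": statistics.mean(evens),
        "median": statistics.median(evens),
    }


# Example input; replace with the list shown in the execute_python span output.
summary = summarize_evens([7, 12, 45, 30, 8, 91, 64])
print(summary)
```

If the agent's reported mean or median differs from this result, the trace shows which step introduced the discrepancy.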


## ⚠️ Common Pitfalls

### 1. Forgetting to Enable Tracing
```python
# ❌ Bad: Tracing not enabled
client = LocalLLMClient(base_url="...", model="...")
# No traces will be generated

# ✅ Good: Enable tracing
client = LocalLLMClient(
    base_url="...",
    model="...",
    enable_tracing=True
)
```

### 2. Not Starting MLflow UI
```bash
# ⚠️ Must start MLflow UI to view traces
mlflow ui --port 5000

# Then open http://localhost:5000
```

### 3. Overwhelming Trace Volume
```python
# ⚠️ Too many traces can clutter the UI
for i in range(1000):
    client.chat(f"Task {i}")  # Creates 1000 traces!

# 💡 Tip: Use experiments and tags to organize
# 💡 Tip: Disable tracing for production/bulk operations
```
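One way to keep bulk jobs observable without flooding the UI is to trace only a sample of the calls. The helper below is a minimal sketch of deterministic sampling; `should_trace` is a hypothetical name. How the decision is applied is left as an assumption, for example routing sampled items to a client created with `enable_tracing=True` and the rest to one created without it.

```python
def should_trace(index: int, sample_every: int = 100) -> bool:
    """Return True for every `sample_every`-th item, starting with the first."""
    if sample_every < 1:
        raise ValueError("sample_every must be at least 1")
    return index % sample_every == 0


sampled = [i for i in range(1000) if should_trace(i)]
print(len(sampled), sampled[:3])   # 10 [0, 100, 200]
```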

### 4. Not Using Experiments
```python
# ❌ Bad: All traces in default experiment
client.chat("Task 1")
client.chat("Task 2")
# Hard to find specific traces later

# ✅ Good: Organize with experiments
mlflow.set_experiment("Tutorial Examples")
with mlflow.start_run(run_name="specific-task"):
    client.chat("Task")
```

### 5. Ignoring Trace Insights
```python
# ⚠️ Don't just collect traces - analyze them!

# Look for:
# - Slow operations (optimize or cache)
# - Redundant tool calls (combine operations)
# - Error patterns (improve error handling)
# - Token usage (optimize prompts)
# - Reasoning quality (improve instructions)
```

## 🎓 What You Learned

✅ **MLflow Setup**: Installing and starting the MLflow UI

✅ **Automatic Tracing**: Enabling tracing with `enable_tracing=True`

✅ **Hierarchical Traces**: Understanding CHAIN → AGENT → LLM → TOOL structure

✅ **Trace Inspection**: Viewing inputs, outputs, and timing in MLflow UI

✅ **Debugging**: Using traces to understand what went wrong

✅ **Performance**: Identifying bottlenecks with timing data

✅ **Organization**: Using experiments, runs, and tags

✅ **Best Practices**: When and how to use tracing effectively

## 🚀 Next Steps

You've mastered observability with MLflow! Now let's learn production-ready patterns.

➡️ Continue to [09-production-patterns.ipynb](./09-production-patterns.ipynb) to learn:
- Error handling and retry logic
- Timeout configuration
- Exponential backoff strategies
- Environment-specific settings
- Logging best practices
- Building robust API wrappers