# üìä 08: MLflow Observability

Learn how to use MLflow tracing to observe, debug, and optimize your LLM applications with hierarchical execution traces.

## üìã Learning Objectives

By the end of this notebook, you will be able to:

- [ ] Install and configure MLflow for tracing
- [ ] Start and access the MLflow UI
- [ ] Enable automatic tracing in the Local LLM SDK
- [ ] Understand hierarchical trace structure (CHAIN ‚Üí LLM ‚Üí AGENT ‚Üí TOOL)
- [ ] View and analyze traces in the MLflow UI
- [ ] Use traces for debugging complex agent workflows
- [ ] Identify performance bottlenecks with trace timing data
- [ ] Export and share traces for collaboration

## üéØ Prerequisites

- Completed notebook 07 (ReACT Agents)
- Understanding of agent execution flow
- Familiarity with tools and multi-step tasks
- LM Studio running with a model that supports function calling

## ‚è±Ô∏è Estimated Time: 20 minutes

## 1Ô∏è‚É£ What is MLflow Tracing?

**MLflow** is an open-source platform for managing ML workflows. **Tracing** is a feature that records the execution of your LLM applications.

### Why Tracing?

Without tracing:
- ‚ùå Can't see what the agent did internally
- ‚ùå Hard to debug when things go wrong
- ‚ùå No visibility into performance bottlenecks
- ‚ùå Difficult to optimize complex workflows

With tracing:
- ‚úÖ See every LLM call, tool execution, and decision
- ‚úÖ Visualize hierarchical execution flow
- ‚úÖ Measure timing for each component
- ‚úÖ Debug issues by inspecting inputs/outputs at each step
- ‚úÖ Compare different runs side-by-side

### Hierarchical Trace Structure

```
CHAIN (top-level task)
‚îú‚îÄ‚îÄ LLM (reasoning step 1)
‚îÇ   ‚îî‚îÄ‚îÄ TOOL (calculator)
‚îú‚îÄ‚îÄ LLM (reasoning step 2)
‚îÇ   ‚îú‚îÄ‚îÄ TOOL (file_write)
‚îÇ   ‚îî‚îÄ‚îÄ TOOL (file_read)
‚îî‚îÄ‚îÄ LLM (reasoning step 3 - conclusion)
```

## 2Ô∏è‚É£ Installing MLflow

First, let's install MLflow if it's not already installed.

In [1]:
# Install MLflow (uncomment if needed)
# !pip install mlflow>=2.13.0

import mlflow

print(f"‚úÖ MLflow version: {mlflow.__version__}")
print("\nüí° MLflow 2.13+ is recommended for best tracing support")

‚úÖ MLflow version: 3.4.0

üí° MLflow 2.13+ is recommended for best tracing support


## 3Ô∏è‚É£ Starting the MLflow UI

The MLflow UI provides a visual interface for viewing traces.

**Start MLflow UI in a terminal:**
```bash
mlflow ui --port 5000
```

Then open your browser to: **http://localhost:5000**

üí° **Tip**: Keep the MLflow UI open in a browser tab while running this notebook!

Alternatively, start it from Python (in background):

In [2]:
import mlflow
import os
from pathlib import Path
from typing import Optional
import ipynbname


# Set tracking URI to project root (where MLflow UI is serving from)
project_root = os.path.dirname(os.path.abspath(os.getcwd()))
tracking_uri = f"file://{project_root}/mlruns"
mlflow.set_tracking_uri(tracking_uri)

print(f"‚úÖ MLflow tracking URI updated: {tracking_uri}")

# Verify it's set correctly
print(f"\nüîç Current tracking URI: {mlflow.get_tracking_uri()}")

# Use the notebook filename to name the experiment so runs stay organized
experiment_name = f"{ipynbname.name()}"
mlflow.set_experiment(experiment_name)
print(f"‚úÖ Experiment set: {experiment_name}")
print("\nüí° Now run your agent tasks and check http://127.0.0.1:5000")


2025/10/03 09:54:34 INFO mlflow.tracking.fluent: Experiment with name '08-mlflow-observability' does not exist. Creating a new experiment.


‚úÖ MLflow tracking URI updated: file:///Users/maheidem/Documents/dev/gen-ai-api-study/mlruns

üîç Current tracking URI: file:///Users/maheidem/Documents/dev/gen-ai-api-study/mlruns
‚úÖ Experiment set: 08-mlflow-observability

üí° Now run your agent tasks and check http://127.0.0.1:5000


## 4Ô∏è‚É£ Enabling Automatic Tracing

The Local LLM SDK has built-in MLflow tracing support!

In [3]:
from local_llm_sdk import LocalLLMClient
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Create client with tracing enabled
client = LocalLLMClient(
    base_url=os.getenv("LLM_BASE_URL"),
    model=os.getenv("LLM_MODEL")
)

# Register built-in tools
client.register_tools_from(None)

print("‚úÖ Client created with MLflow tracing enabled!")
print("\nüí° All operations will now be traced automatically")

‚úì Auto-detected model: qwen/qwen3-coder-30b
‚úÖ Client created with MLflow tracing enabled!

üí° All operations will now be traced automatically


## 5Ô∏è‚É£ Basic Trace Example

Let's make a simple traced call and view it in the UI.

In [4]:
# Simple chat with tracing
response = client.chat("What is 12 multiplied by 15?")

print("üí¨ Response:")
print(response)
print("\n‚úÖ This call was traced!")
print("\nüîç View the trace:")
print("   1. Go to http://localhost:5000")
print("   2. Click on 'Traces' in the sidebar")
print("   3. Find the most recent trace")
print("   4. Click to see the execution tree")

üí¨ Response:
12 multiplied by 15 is 180.

‚úÖ This call was traced!

üîç View the trace:
   1. Go to http://localhost:5000
   2. Click on 'Traces' in the sidebar
   3. Find the most recent trace
   4. Click to see the execution tree


**üéØ What you'll see in MLflow UI:**

- A `CHAIN` span for the overall chat call
- An `LLM` span for the model inference
- A `TOOL` span for the calculator tool
- Input/output for each span
- Timing information (duration in ms)
- Token counts and other metadata

## üîó Grouping Related Calls with Conversation Context

The most common MLflow anti-pattern: creating multiple separate traces when they should be grouped together.

### ‚ùå Problem: Separate Traces

Without grouping, each `client.chat()` call creates a separate top-level trace:

```python
# Creates trace #1
response1 = client.chat("First question")

# Creates trace #2
response2 = client.chat("Second question")

# Creates trace #3
response3 = client.chat("Third question")
```

**Result in MLflow UI:**
- 3 separate traces
- No visible relationship
- Hard to analyze the workflow
- Cluttered trace list

### ‚úÖ Solution: Use `client.conversation()`

Wrap related calls in a conversation context:

```python
with client.conversation("my_workflow"):
    # All calls here become children of "my_workflow"
    response1 = client.chat("First question")
    response2 = client.chat("Second question")
    response3 = client.chat("Third question")
```

**Result in MLflow UI:**
```
my_workflow (parent trace)
‚îú‚îÄ chat (first question)
‚îú‚îÄ chat (second question)
‚îî‚îÄ chat (third question)
```

**Benefits:**
- ‚úÖ Clear hierarchy showing the workflow
- ‚úÖ Single trace to review
- ‚úÖ Easy to see the complete interaction
- ‚úÖ Better performance metrics (total time, etc.)

In [5]:
print("=" * 70)
print("DEMO: Grouped vs Ungrouped Traces")
print("=" * 70)

# Ungrouped (creates 3 separate traces)
print("\n1Ô∏è‚É£ Ungrouped calls (3 separate traces):")
client.chat("What is 5 + 5?")
client.chat("What is 10 * 2?")
client.chat("What is 20 / 4?")
print("‚úÖ Check MLflow: 3 separate traces\n")

# Grouped (creates 1 parent trace with 3 children)
print("2Ô∏è‚É£ Grouped calls (1 parent trace):")
with client.conversation("math_workflow"):
    client.chat("What is 5 + 5?")
    client.chat("What is 10 * 2?")
    client.chat("What is 20 / 4?")
print("‚úÖ Check MLflow: 1 trace named 'math_workflow' with 3 children\n")

print("=" * 70)
print("üí° Open MLflow UI to see the difference!")
print("=" * 70)

DEMO: Grouped vs Ungrouped Traces

1Ô∏è‚É£ Ungrouped calls (3 separate traces):
‚úÖ Check MLflow: 3 separate traces

2Ô∏è‚É£ Grouped calls (1 parent trace):
‚úÖ Check MLflow: 1 trace named 'math_workflow' with 3 children

üí° Open MLflow UI to see the difference!


### üìã When to Use Conversation Context

**Use `conversation()` when:**
- ‚úÖ Multiple calls that form a logical unit
- ‚úÖ Multi-step workflows or pipelines
- ‚úÖ Iterative processing (like agent loops)
- ‚úÖ Debugging complex interactions

**Don't use when:**
- ‚ùå Single, independent chat calls
- ‚ùå Unrelated queries
- ‚ùå Each call needs separate metrics

**Real-world examples:**
```python
# Example 1: Data analysis workflow
with client.conversation("data_analysis"):
    summary = client.chat("Summarize this dataset")
    insights = client.chat("Find key insights")
    recommendations = client.chat("Suggest actions")

# Example 2: Code review workflow
with client.conversation("code_review"):
    syntax = client.chat("Check syntax issues")
    style = client.chat("Review code style")
    security = client.chat("Identify security risks")

# Example 3: Agent pattern (agents do this automatically!)
with client.conversation("react_agent_task"):
    for i in range(max_iterations):
        response = client.chat(messages, use_tools=True)
        # Agent logic...
```

**Pro tip:** The ReACT agent (`local_llm_sdk.agents.ReACT`) uses this pattern automatically! When you call `agent.run()`, all iterations are grouped under one parent trace. See `local_llm_sdk/agents/base.py:81` for implementation.

## 6Ô∏è‚É£ Agent Trace - The Full Picture

Now let's trace a ReACT agent execution to see the complete hierarchy.

In [6]:
# Run an agent task with tracing
result = client.react(
    "Calculate the factorial of 6, then convert the result to text, "
    "uppercase it, and count how many characters it has.",
    max_iterations=10
)

print("ü§ñ Agent Result:")
print(f"Status: {result.status}")
print(f"Iterations: {result.iterations}")
print(f"\nFinal answer:\n{result.final_response}")

print("\n" + "="*70)
print("\nüîç View the hierarchical trace in MLflow UI:")
print("   - CHAIN (agent.react)")
print("     ‚îú‚îÄ‚îÄ LLM (reasoning step 1)")
print("     ‚îÇ   ‚îî‚îÄ‚îÄ TOOL (execute_python for factorial)")
print("     ‚îú‚îÄ‚îÄ LLM (reasoning step 2)")
print("     ‚îÇ   ‚îî‚îÄ‚îÄ TOOL (text_transformer)")
print("     ‚îú‚îÄ‚îÄ LLM (reasoning step 3)")
print("     ‚îÇ   ‚îî‚îÄ‚îÄ TOOL (char_counter)")
print("     ‚îî‚îÄ‚îÄ LLM (final answer)")

ReACT Agent: Starting task
Max iterations: 10
Task: Calculate the factorial of 6, then convert the result to text, uppercase it, and count how many char...


Iteration 1/10
----------------------------------------
Response: I got 720 as the factorial of 6. Now I'll convert this to text, uppercase it, and count the characters.


Tools used: 1
  - bash

Iteration 2/10
----------------------------------------
Response: The result is "720" (same as the original since it's numeric). Now I'll count the characters:


Tools used: 1
  - bash

Iteration 3/10
----------------------------------------
Response: The factorial of 6 is 720. When converted to text and uppercased, it remains "720". This text has 3 characters.

FINAL ANSWER: The factorial of 6 is 7...
Tools used: 1
  - bash

‚úì Task completed successfully in 3 iterations
ü§ñ Agent Result:
Status: AgentStatus.SUCCESS
Iterations: 3

Final answer:
The factorial of 6 is 720. When converted to text and uppercased, it remains "720". This tex

## 7Ô∏è‚É£ Understanding Span Types

MLflow uses different span types to categorize operations.

### Span Type Reference

| Span Type | Purpose | Examples |
|-----------|---------|----------|
| `CHAIN` | High-level workflow | `client.react()`, `client.chat()` |
| `LLM` | Model inference | OpenAI/LM Studio API calls |
| `AGENT` | Agent reasoning | ReACT loop iterations |
| `TOOL` | Tool execution | `math_calculator`, `execute_python` |
| `RETRIEVER` | Data retrieval | Database queries, API calls |

### Trace Hierarchy

```
CHAIN (root span - entire workflow)
‚îÇ
‚îú‚îÄ‚îÄ AGENT (if using agents)
‚îÇ   ‚îÇ
‚îÇ   ‚îú‚îÄ‚îÄ LLM (model call for reasoning)
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ attributes: model, temperature, tokens
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ input: messages sent to model
‚îÇ   ‚îÇ   ‚îî‚îÄ‚îÄ output: model response
‚îÇ   ‚îÇ
‚îÇ   ‚îî‚îÄ‚îÄ TOOL (tool execution)
‚îÇ       ‚îú‚îÄ‚îÄ attributes: tool name, parameters
‚îÇ       ‚îú‚îÄ‚îÄ input: tool arguments
‚îÇ       ‚îî‚îÄ‚îÄ output: tool result
‚îÇ
‚îî‚îÄ‚îÄ Timing data for each span
```

## 8Ô∏è‚É£ Complex Multi-Tool Trace

Let's create a complex workflow to see a rich trace.

In [7]:
import tempfile
import os

temp_dir = tempfile.mkdtemp()

# Complex multi-tool task
result = client.react(
    f"Generate a Python list of the squares of numbers 1 through 10. "
    f"Calculate the sum of all those squares. "
    f"Save the list and sum to {temp_dir}/squares.txt. "
    f"Then read the file back and count the total characters in it.",
    max_iterations=15
)

print("üéØ Complex Workflow Complete!")
print(f"\nResult: {result.final_response}")

print("\n" + "="*70)
print("\nüìä Trace Analysis in MLflow UI:")
print("\n1. Navigate to the Traces tab")
print("2. Find this trace (most recent)")
print("3. Expand the tree to see:")
print("   - Multiple LLM calls (one per reasoning step)")
print("   - execute_python span (generate squares & sum)")
print("   - filesystem_operation spans (write & read)")
print("   - char_counter span")
print("\n4. Click each span to inspect:")
print("   - Input parameters")
print("   - Output values")
print("   - Duration (ms)")
print("   - Timestamps")

# Cleanup
import shutil
shutil.rmtree(temp_dir)

ReACT Agent: Starting task
Max iterations: 15
Task: Generate a Python list of the squares of numbers 1 through 10. Calculate the sum of all those square...


Iteration 1/15
----------------------------------------
Response: Now I'll save this information to the specified file:


Tools used: 1
  - bash

Iteration 2/15
----------------------------------------
Response: I see there's a syntax error with the f-strings in the Python command. Let me fix this by using a different approach:


Tools used: 1
  - bash

Iteration 3/15
----------------------------------------
Response: The issue is with the f-string syntax in the Python command. Let me use a simpler approach with proper string formatting:


Tools used: 1
  - bash

Iteration 4/15
----------------------------------------
Response: 
Tools used: 1
  - bash

Iteration 5/15
----------------------------------------
Response: Now I'll create the file with the content:


Tools used: 1
  - bash

Iteration 6/15
-------------------------------

## 9Ô∏è‚É£ Using Traces for Debugging

Traces are invaluable when something goes wrong. Let's see how to debug with traces.

In [8]:
# A task that might have issues
result = client.react(
    "Calculate the square root of -1 using Python, "
    "then tell me what type of number it is.",
    max_iterations=8
)

print("üîç Debugging Example:")
print(f"\nStatus: {result.status}")
print(f"\nResult:\n{result.final_response}")

print("\n" + "="*70)
print("\nüõ†Ô∏è How to Debug with Traces:")
print("\n1. Open the trace in MLflow UI")
print("2. Look for spans with errors (red indicators)")
print("3. Check the execute_python span:")
print("   - Input: What code was executed?")
print("   - Output: What was the result/error?")
print("4. Trace the reasoning:")
print("   - Did the LLM handle complex numbers correctly?")
print("   - Did it import the 'cmath' module?")
print("5. Identify improvements:")
print("   - Better prompting?")
print("   - More specific tool instructions?")
print("   - Additional tools needed?")

ReACT Agent: Starting task
Max iterations: 8
Task: Calculate the square root of -1 using Python, then tell me what type of number it is.


Iteration 1/8
----------------------------------------
Response: I see that the math module throws an error for negative numbers, but we can use the cmath module to handle complex numbers. Let me try that:


Tools used: 1
  - bash

Iteration 2/8
----------------------------------------
Response: I've calculated the square root of -1 using Python's cmath module. The result is 1j, which is a complex number. The square root of -1 equals 1j, where...
Tools used: 1
  - bash

‚úì Task completed successfully in 2 iterations
üîç Debugging Example:

Status: AgentStatus.SUCCESS

Result:
I've calculated the square root of -1 using Python's cmath module. The result is 1j, which is a complex number. The square root of -1 equals 1j, where j represents the imaginary unit. This is a fundamental concept in complex numbers, where the imaginary unit i (or j in engine

## üîü Finding Performance Bottlenecks

Use timing data in traces to optimize your application.

In [9]:
# Task with varying execution times
result = client.react(
    "Generate the first 20 prime numbers, "
    "then calculate statistical properties (mean, median, std dev), "
    "and finally sort them in reverse order.",
    max_iterations=12
)

print("‚ö° Performance Analysis:")
print(f"\nTotal steps: {result.iterations}")
print(f"Result: {result.final_response[:200]}...")

print("\n" + "="*70)
print("\nüìà Performance Optimization with Traces:")
print("\n1. Open trace in MLflow UI")
print("2. Sort spans by duration")
print("3. Identify slow operations:")
print("   - Which LLM calls took longest?")
print("   - Which tools were slowest?")
print("   - Are there redundant calls?")
print("\n4. Optimization strategies:")
print("   - Combine multiple tool calls into one")
print("   - Cache results of expensive operations")
print("   - Use faster models for simple reasoning")
print("   - Reduce max_steps if agent is over-thinking")
print("\n5. Compare before/after:")
print("   - Run optimized version")
print("   - Compare traces side-by-side")
print("   - Measure improvement")

ReACT Agent: Starting task
Max iterations: 12
Task: Generate the first 20 prime numbers, then calculate statistical properties (mean, median, std dev), ...


Iteration 1/12
----------------------------------------
Response: Now I have the first 20 prime numbers: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

Next, I'll calculate the statisti...
Tools used: 1
  - bash

Iteration 2/12
----------------------------------------
Response: Now I'll sort the prime numbers in reverse order:


Tools used: 1
  - bash

Iteration 3/12
----------------------------------------
Response: I've completed all the requested operations:

1. Generated the first 20 prime numbers: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59...
Tools used: 1
  - bash

‚úì Task completed successfully in 3 iterations
‚ö° Performance Analysis:

Total steps: 3
Result: I've completed all the requested operations:

1. Generated the first 20 prime numbers: [2, 3, 5, 7, 11, 13, 17, 19

## 1Ô∏è‚É£1Ô∏è‚É£ Trace Metadata and Custom Attributes

You can add custom metadata to traces for better organization.

In [10]:
# Set run name and tags for better organization
mlflow.set_experiment("Local LLM SDK Tutorial")

with mlflow.start_run(run_name="fibonacci-research") as run:
    # Tag the run
    mlflow.set_tag("task_type", "mathematical_research")
    mlflow.set_tag("complexity", "medium")
    
    # Execute agent
    result = client.react(
        "Calculate the first 10 Fibonacci numbers, "
        "sum them, and determine if the sum is prime.",
        max_iterations=10
    )
    
    # Log metrics
    mlflow.log_metric("steps_taken", result.iterations)
    mlflow.log_metric("success", 1 if result.status == "success" else 0)
    
    print("‚úÖ Trace logged with metadata!")
    print(f"\nRun ID: {run.info.run_id}")
    print(f"Status: {result.status}")
    print(f"\nResult: {result.final_response}")

print("\nüí° View in MLflow UI:")
print("   - Experiments tab: See all runs organized by experiment")
print("   - Filter by tags: complexity=medium")
print("   - Compare metrics across runs")

MlflowException: Cannot set a deleted experiment 'Local LLM SDK Tutorial' as the active experiment. You can restore the experiment, or permanently delete the experiment to create a new one.

## üèãÔ∏è Exercise: Trace Analysis Challenge

**Challenge:** Create and analyze traces for a data processing pipeline.

**Task:**
1. Create an agent that processes a list of numbers
2. The agent should:
   - Generate 25 random numbers (1-100)
   - Filter for even numbers only
   - Calculate mean and median of even numbers
   - Save results to a file
   - Count characters in the saved file

**Analysis Requirements:**
1. Run the task and examine the trace in MLflow UI
2. Count total spans in the trace
3. Identify the slowest span
4. Find how many tool calls were made
5. Calculate total execution time

Try it yourself first!

In [None]:
# Your code here:



<details>
<summary>Click to see solution</summary>

```python
# Solution: Trace Analysis Challenge

import tempfile
import os
import shutil

temp_dir = tempfile.mkdtemp()
results_file = os.path.join(temp_dir, "even_numbers_analysis.txt")

print("üìä Data Processing Pipeline with Tracing\n")
print("="*70)

# Set up organized tracing
mlflow.set_experiment("Data Processing Tutorial")

with mlflow.start_run(run_name="even-numbers-pipeline") as run:
    # Tag for organization
    mlflow.set_tag("pipeline_type", "data_processing")
    mlflow.set_tag("data_size", "25_numbers")
    
    # Execute the pipeline
    result = client.react(
        f"Generate 25 random integers between 1 and 100. "
        f"Filter to keep only even numbers. "
        f"Calculate the mean and median of the even numbers. "
        f"Save all results (original list, even list, mean, median) to {results_file}. "
        f"Then count how many characters are in that file.",
        max_iterations=15
    )
    
    # Log metrics
    mlflow.log_metric("steps_taken", result.iterations)
    mlflow.log_metric("success", 1 if result.status == "success" else 0)
    
    print(f"\n‚úÖ Pipeline Status: {result.status}")
    print(f"Iterations: {result.iterations}")
    print(f"\nFinal Result:\n{result.final_response}")
    
    # Verify output file
    if os.path.exists(results_file):
        with open(results_file, 'r') as f:
            content = f.read()
        print(f"\nüìÑ Generated File ({len(content)} chars):\n{content}")
    
    print("\n" + "="*70)
    print("\nüîç Trace Analysis Instructions:\n")
    print("1. Open MLflow UI: http://localhost:5000")
    print("2. Navigate to 'Experiments' ‚Üí 'Data Processing Tutorial'")
    print(f"3. Find run: 'even-numbers-pipeline' (ID: {run.info.run_id[:8]}...)")
    print("4. Click 'Traces' to view the execution tree")
    print("\nüìä Analysis Tasks:")
    print("   ‚ñ° Count total spans (expand all nodes)")
    print("   ‚ñ° Identify slowest span (check duration column)")
    print("   ‚ñ° Count TOOL spans (how many tool calls?)")
    print("   ‚ñ° Note total execution time (root CHAIN span)")
    print("   ‚ñ° Inspect inputs/outputs of each span")
    print("\nüí° Expected Observations:")
    print("   - Multiple LLM spans (one per reasoning step)")
    print("   - execute_python spans (random numbers, filtering, stats)")
    print("   - filesystem_operation spans (write and read)")
    print("   - char_counter span (final count)")
    print("   - Each span shows exact inputs and outputs")
    print("   - Duration shows which operations are slowest")

# Cleanup
shutil.rmtree(temp_dir)
print("\n‚úÖ Analysis complete!")
```
</details>

In [None]:
# Solution cell (run to see answer)
import tempfile
import os
import shutil

temp_dir = tempfile.mkdtemp()
results_file = os.path.join(temp_dir, "even_numbers_analysis.txt")

print("üìä Data Processing Pipeline with Tracing\n")
print("="*70)

mlflow.set_experiment("Data Processing Tutorial")

with mlflow.start_run(run_name="even-numbers-pipeline") as run:
    mlflow.set_tag("pipeline_type", "data_processing")
    mlflow.set_tag("data_size", "25_numbers")
    
    result = client.react(
        f"Generate 25 random integers between 1 and 100. "
        f"Filter to keep only even numbers. "
        f"Calculate the mean and median of the even numbers. "
        f"Save all results (original list, even list, mean, median) to {results_file}. "
        f"Then count how many characters are in that file.",
        max_iterations=15
    )
    
    mlflow.log_metric("steps_taken", result.iterations)
    mlflow.log_metric("success", 1 if result.status == "success" else 0)
    
    print(f"\n‚úÖ Pipeline Status: {result.status}")
    print(f"Iterations: {result.iterations}")
    print(f"\nFinal Result:\n{result.final_response}")
    
    if os.path.exists(results_file):
        with open(results_file, 'r') as f:
            content = f.read()
        print(f"\nüìÑ Generated File ({len(content)} chars):\n{content}")
    
    print("\n" + "="*70)
    print("\nüîç Trace Analysis Instructions:\n")
    print("1. Open MLflow UI: http://localhost:5000")
    print("2. Navigate to 'Experiments' ‚Üí 'Data Processing Tutorial'")
    print(f"3. Find run: 'even-numbers-pipeline' (ID: {run.info.run_id[:8]}...)")
    print("4. Click 'Traces' to view the execution tree")
    print("\nüìä Analysis Tasks:")
    print("   ‚ñ° Count total spans (expand all nodes)")
    print("   ‚ñ° Identify slowest span (check duration column)")
    print("   ‚ñ° Count TOOL spans (how many tool calls?)")
    print("   ‚ñ° Note total execution time (root CHAIN span)")
    print("   ‚ñ° Inspect inputs/outputs of each span")

shutil.rmtree(temp_dir)
print("\n‚úÖ Analysis complete!")

## ‚ö†Ô∏è Common Pitfalls

### 1. Forgetting to Enable Tracing
```python
# ‚ùå Bad: Tracing not enabled
client = LocalLLMClient(base_url="...", model="...")
# No traces will be generated

# ‚úÖ Good: Enable tracing
client = LocalLLMClient(
    base_url="...",
    model="...",
    enable_tracing=True
)
```

### 2. Not Starting MLflow UI
```bash
# ‚ö†Ô∏è Must start MLflow UI to view traces
mlflow ui --port 5000

# Then open http://localhost:5000
```

### 3. Overwhelming Trace Volume
```python
# ‚ö†Ô∏è Too many traces can clutter the UI
for i in range(1000):
    client.chat(f"Task {i}")  # Creates 1000 traces!

# üí° Tip: Use experiments and tags to organize
# üí° Tip: Disable tracing for production/bulk operations
```

### 4. Not Using Experiments
```python
# ‚ùå Bad: All traces in default experiment
client.chat("Task 1")
client.chat("Task 2")
# Hard to find specific traces later

# ‚úÖ Good: Organize with experiments
mlflow.set_experiment("Tutorial Examples")
with mlflow.start_run(run_name="specific-task"):
    client.chat("Task")
```

### 5. Ignoring Trace Insights
```python
# ‚ö†Ô∏è Don't just collect traces - analyze them!

# Look for:
# - Slow operations (optimize or cache)
# - Redundant tool calls (combine operations)
# - Error patterns (improve error handling)
# - Token usage (optimize prompts)
# - Reasoning quality (improve instructions)
```

## üéì What You Learned

‚úÖ **MLflow Setup**: Installing and starting the MLflow UI

‚úÖ **Automatic Tracing**: Enabling tracing with `enable_tracing=True`

‚úÖ **Hierarchical Traces**: Understanding CHAIN ‚Üí LLM ‚Üí AGENT ‚Üí TOOL structure

‚úÖ **Trace Inspection**: Viewing inputs, outputs, and timing in MLflow UI

‚úÖ **Debugging**: Using traces to understand what went wrong

‚úÖ **Performance**: Identifying bottlenecks with timing data

‚úÖ **Organization**: Using experiments, runs, and tags

‚úÖ **Best Practices**: When and how to use tracing effectively

## üöÄ Next Steps

You've mastered observability with MLflow! Now let's learn production-ready patterns.

‚û°Ô∏è Continue to [09-production-patterns.ipynb](./09-production-patterns.ipynb) to learn:
- Error handling and retry logic
- Timeout configuration
- Exponential backoff strategies
- Environment-specific settings
- Logging best practices
- Building robust API wrappers