# Mastering MLflow Tracing: Understanding Traces and Spans for Production AI Agents

## Introduction

Building AI agents is exciting, but deploying them to production is where the real challenge begins. How do you know what your agent is doing internally? Where is it spending time? Why did it fail? What tools did it call and with what parameters?

This is where **MLflow Tracing** comes in: a powerful observability framework that transforms your AI agent from a mysterious black box into a transparent, debuggable, and optimizable system.

In this comprehensive guide, we'll explore traces and spans using a real-world tool-calling agent built with LangGraph and MLflow. We'll demystify these concepts with practical examples and show you exactly how they work under the hood.

## What is Observability in AI Agents?

Before diving into traces and spans, let's understand why observability matters.

Imagine you've deployed an AI agent that helps users with product queries. One day, users report slow responses. Without observability, you're left guessing:

- Is the LLM slow?
- Are the tools taking too long?
- Is there a network issue?
- Did the agent make unnecessary calls?

With MLflow tracing, you can see exactly what happened, step by step, call by call.

## Understanding Traces

### What is a Trace?

A **trace** represents the complete execution flow of a single request through your agent system. Think of it as a detailed journey map from the moment a user asks a question until they receive an answer.

### Trace Components

Every MLflow trace consists of two primary components:

#### 1. **TraceInfo** - The Metadata Container

TraceInfo provides a high-level overview of the trace:

```python
{
    "trace_id": "tr_abc123xyz",
    "request_time": 1732315680000,
    "execution_duration": 8450,  # milliseconds
    "state": "OK",  # OK, ERROR, IN_PROGRESS
    "request_preview": "What is the latest news on OpenAI product releases?",
    "response_preview": "Here are the latest OpenAI developments...",
    "tags": {
        "use_case": "analytics",
        "user_id": "user_789",
        "session_id": "session_456"
    }
}
```

**Key Fields:**
- `trace_id`: Unique identifier for the trace
- `execution_duration`: Total time taken (in milliseconds)
- `state`: Success or failure status
- `tags`: Searchable metadata for filtering and grouping
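
To make these fields easier to read at a glance, here is a minimal sketch that formats them for a log line. It uses a plain dictionary that mirrors the illustrative values above rather than a real MLflow object, and it assumes `request_time` is expressed in milliseconds since the Unix epoch. The `describe` helper is a hypothetical name introduced only for this example.

```python
from datetime import datetime, timezone

# Plain dict mirroring the illustrative TraceInfo above (not an MLflow object).
trace_info = {
    "trace_id": "tr_abc123xyz",
    "request_time": 1732315680000,  # assumed: epoch milliseconds
    "execution_duration": 8450,     # milliseconds
    "state": "OK",
}

def describe(info: dict) -> str:
    """Format the start time (UTC) and duration of a trace for logging."""
    started = datetime.fromtimestamp(info["request_time"] / 1000, tz=timezone.utc)
    seconds = info["execution_duration"] / 1000
    return (
        f"{info['trace_id']} started {started.isoformat()} "
        f"took {seconds:.2f}s [{info['state']}]"
    )

summary = describe(trace_info)
print(summary)
```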

#### 2. **TraceData** - The Execution Details

TraceData contains the actual execution information organized as spans:

```python
{
    "spans": [
        # Root span
        {"span_id": "sp_001", "name": "agent_execution", ...},
        # Child spans
        {"span_id": "sp_002", "name": "llm_call", "parent_span_id": "sp_001", ...},
        {"span_id": "sp_003", "name": "tool_execution", "parent_span_id": "sp_001", ...},
    ]
}
```

## Understanding Spans

### What is a Span?

A **span** represents a single unit of work within a trace. Spans form a hierarchical tree structure where each span can have child spans, creating a detailed execution timeline.
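
The tree structure is implied by parent references: spans are stored as a flat list, and each child points at its parent. The sketch below rebuilds the hierarchy from such a list. It works on plain dictionaries shaped like the illustrative TraceData example, and `render_tree` is a hypothetical helper, not part of MLflow.

```python
from collections import defaultdict

# Flat span records, shaped like the illustrative TraceData example.
spans = [
    {"span_id": "sp_001", "name": "agent_execution", "parent_span_id": None},
    {"span_id": "sp_002", "name": "llm_call", "parent_span_id": "sp_001"},
    {"span_id": "sp_003", "name": "tool_execution", "parent_span_id": "sp_001"},
]

def render_tree(spans: list[dict]) -> list[str]:
    """Return one indented line per span, parents before their children."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_span_id")].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines

tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```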

### Span Anatomy

Each span captures rich information about a specific operation:

```python
{
    "span_id": "sp_002",
    "parent_span_id": "sp_001",
    "name": "llm_call",
    "span_type": "LLM",
    "start_time": 1732315680150,
    "end_time": 1732315682800,
    "inputs": {
        "messages": [...],
        "model": "databricks-meta-llama-3-3-70b-instruct",
        "temperature": 0.01
    },
    "outputs": {
        "content": "...",
        "tool_calls": [...]
    },
    "attributes": {
        "mlflow.chat.tokenUsage": {
            "input_tokens": 67,
            "output_tokens": 45,
            "total_tokens": 112
        },
        "execution_time_ms": 2650
    }
}
```

### Span Types in MLflow

MLflow defines several span types for different operations:

- **AGENT**: Agent-level orchestration
- **CHAIN**: Workflow or chain execution
- **LLM**: Language model calls
- **TOOL**: Tool or function execution
- **RETRIEVER**: Document retrieval operations
- **PARSER**: Output parsing operations

## Real-World Example: Tool-Calling Agent

Let's build a practical agent and see how traces and spans work in action. We'll use the code from the GitHub repository.

### The Agent Architecture

Here's our MLflow-compatible tool-calling agent:

```python
from typing import Annotated, Optional, Any
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from databricks_langchain import ChatDatabricks
from databricks_langchain.uc_ai import (
    DatabricksFunctionClient,
    UCFunctionToolkit,
    set_uc_function_client,
)
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langgraph.prebuilt.tool_node import tools_condition
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext
from mlflow.models import ModelConfig

def create_tool_calling_agent(model, tools):
    """
    Create a tool-calling agent with MLflow tracing support
    """
    llm_with_tools = model.bind_tools(tools=tools)
    
    # Preprocessor extracts messages from state
    preprocessor = RunnableLambda(lambda state: state["messages"])
    model_runnable = preprocessor | llm_with_tools

    def tool_calling_llm(state: ChatAgentState, config: RunnableConfig):
        """
        LLM node that processes messages and decides whether to call tools
        This function is automatically traced by MLflow
        """
        response = model_runnable.invoke(state, config)
        return {"messages": [response]}
    
    # Build the graph
    builder = StateGraph(ChatAgentState)
    builder.add_node("tool_calling_llm", RunnableLambda(tool_calling_llm))
    builder.add_node("tools", ChatAgentToolNode(tools=tools))
    builder.add_edge(START, "tool_calling_llm")
    
    # Conditional routing based on tool calls
    builder.add_conditional_edges(
        "tool_calling_llm",
        tools_condition,  # Routes to tools or END
        ["tools", END]
    )
    builder.add_edge("tools", "tool_calling_llm")  # Feedback loop
    
    return builder.compile()
```
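
The control flow of this graph is a loop: the LLM node runs, the conditional edge checks for tool calls, and the tools node feeds results back to the LLM until no more tools are requested. The sketch below imitates that loop in plain Python so the routing is easy to follow. Everything in it (`route`, `fake_llm`, `fake_tools`, `run`) is a hypothetical stand-in, not a LangGraph or MLflow API.

```python
# Hypothetical stand-ins: these stubs are NOT LangGraph or MLflow APIs.
END = "__end__"

def route(last_message: dict) -> str:
    """Mimic the conditional edge: go to tools if the LLM requested any."""
    return "tools" if last_message.get("tool_calls") else END

def fake_llm(messages: list[dict]) -> dict:
    """Request one tool call first, then answer once a tool result exists."""
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "final answer", "tool_calls": []}
    call = {"id": "call_1", "name": "search_web", "args": {"query": "news"}}
    return {"role": "assistant", "content": "", "tool_calls": [call]}

def fake_tools(last_message: dict) -> list[dict]:
    """Run every requested tool and return one tool message per call."""
    return [
        {
            "role": "tool",
            "content": f"results for {call['args']['query']}",
            "tool_call_id": call["id"],
        }
        for call in last_message["tool_calls"]
    ]

def run(user_query: str, max_steps: int = 5) -> tuple[list[dict], list[str]]:
    """Drive the LLM/tools loop and record which node ran at each step."""
    messages = [{"role": "user", "content": user_query}]
    visited = []
    for _ in range(max_steps):
        visited.append("tool_calling_llm")
        messages.append(fake_llm(messages))
        if route(messages[-1]) == END:
            break
        visited.append("tools")
        messages.extend(fake_tools(messages[-1]))
    return messages, visited

messages, visited = run("What is new?")
print(visited)
```

Each entry in `visited` corresponds to one node execution, which is why a single tool-using request produces two LLM spans and one tool span in the trace.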

### The MLflow-Compatible Agent Class

```python
class DocsAgent(ChatAgent):
    def __init__(self, config, tools):
        """
        Initialize agent with configuration and tools
        """
        self.config = ModelConfig(development_config=config)
        self.tools = tools
        self.agent = self._build_agent_from_config()

    def _build_agent_from_config(self):
        """Build the agent graph with configured LLM"""
        llm = ChatDatabricks(
            endpoint=self.config.get("endpoint_name"),
            temperature=self.config.get("temperature"),
            max_tokens=self.config.get("max_tokens"),
        )
        agent = create_tool_calling_agent(llm, tools=self.tools)
        return agent

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        """
        Main prediction method - automatically traced by MLflow
        """
        # Convert messages to dictionary format
        request = {"messages": self._convert_messages_to_dict(messages)}
        
        # Invoke agent - this creates the trace automatically
        output = self.agent.invoke(request)
        
        return ChatAgentResponse(**output)
```

### Setting Up the Agent

```python
# Initialize UC Function Client for tools
uc_client = DatabricksFunctionClient()
set_uc_function_client(uc_client)

# Configuration
catalog = "agentic_ai"
schema = "databricks"
LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"

baseline_config = {
    "endpoint_name": LLM_ENDPOINT,
    "temperature": 0.01,
    "max_tokens": 1000
}

# Set up tools from Unity Catalog
uc_tool_names = [f"{catalog}.{schema}.search_web"]
uc_toolkit = UCFunctionToolkit(function_names=uc_tool_names)
tools = [*uc_toolkit.tools]

# Create the agent
AGENT = DocsAgent(baseline_config, tools)
```

## Enabling MLflow Tracing

MLflow provides automatic tracing for LangGraph agents. Here's how to enable it:

```python
import mlflow

# Enable automatic tracing for LangChain/LangGraph
mlflow.langchain.autolog()

# Set up experiment for organizing traces
mlflow.set_experiment("Agent_Tracing_Demo")

# Execute the agent - traces are captured automatically
result = AGENT.predict([{
    "role": "user", 
    "content": "What is the latest news on OpenAI product releases? Provide results in bullet points"
}])
```

## Trace and Span Hierarchy: A Complete Walkthrough

When the agent executes, MLflow creates a detailed trace with multiple spans. Let's walk through exactly what happens.

### Example Query Execution

```python
user_query = "What is the latest news on OpenAI product releases? Provide results in bullet points"

result = AGENT.predict([{
    "role": "user",
    "content": user_query
}])
```

### The Complete Trace Structure

```
📊 TRACE: agent_execution (trace_id: tr_abc123)
│   Duration: 8.45 seconds
│   Status: ✅ OK
│   Total Tokens: 537
│
├─── 🤖 SPAN 1: DocsAgent.predict (span_id: sp_001)
│    │   Type: AGENT
│    │   Duration: 8.45s (100% of trace)
│    │   Input: {"messages": [{"role": "user", "content": "What is the latest..."}]}
│    │   Output: {"messages": [{"role": "assistant", "content": "• OpenAI launched..."}]}
│    │
│    └─── 🔄 SPAN 2: StateGraph.invoke (span_id: sp_002, parent: sp_001)
│         │   Type: CHAIN
│         │   Duration: 8.30s
│         │   Purpose: Execute the LangGraph workflow
│         │
│         ├─── 🧠 SPAN 3: tool_calling_llm [1st call] (span_id: sp_003, parent: sp_002)
│         │    │   Type: LLM
│         │    │   Duration: 2.65s
│         │    │   Input: User query
│         │    │   Output: Tool call decision
│         │    │   Attributes:
│         │    │       - model: "databricks-meta-llama-3-3-70b-instruct"
│         │    │       - temperature: 0.01
│         │    │       - input_tokens: 67
│         │    │       - output_tokens: 45
│         │    │       - total_tokens: 112
│         │    │       - has_tool_calls: true
│         │    │
│         ├─── 🛠️ SPAN 4: search_web (span_id: sp_004, parent: sp_002)
│         │    │   Type: TOOL
│         │    │   Duration: 3.35s
│         │    │   Input: {"query": "OpenAI latest product releases news 2024"}
│         │    │   Output: "Recent OpenAI developments include GPT-4 Turbo..."
│         │    │   Attributes:
│         │    │       - tool_name: "agentic_ai.databricks.search_web"
│         │    │       - execution_time_ms: 3350
│         │    │
│         └─── 🧠 SPAN 5: tool_calling_llm [2nd call] (span_id: sp_005, parent: sp_002)
│              │   Type: LLM
│              │   Duration: 2.10s
│              │   Input: Original query + Tool results
│              │   Output: Final formatted response
│              │   Attributes:
│              │       - model: "databricks-meta-llama-3-3-70b-instruct"
│              │       - temperature: 0.01
│              │       - input_tokens: 245
│              │       - output_tokens: 180
│              │       - total_tokens: 425
│              │       - has_tool_calls: false
│
└─── 📊 TRACE SUMMARY
     Total Duration: 8.45s
     Total Tokens: 537 (input: 312, output: 225)
     Estimated Cost: $0.0027
     Status: Success ✅
```
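
The summary figures are simple roll-ups of the span-level numbers. As a rough sketch, the snippet below recomputes them from plain dictionaries holding the durations and token counts shown in the diagram. The `summarize` helper is hypothetical, and the overhead figure assumes the LLM and tool spans run one after another without overlapping.

```python
# Values copied from the illustrative trace above; plain dicts, not MLflow objects.
spans = [
    {"name": "DocsAgent.predict", "type": "AGENT", "duration_ms": 8450},
    {"name": "StateGraph.invoke", "type": "CHAIN", "duration_ms": 8300},
    {"name": "tool_calling_llm [1st call]", "type": "LLM", "duration_ms": 2650,
     "input_tokens": 67, "output_tokens": 45},
    {"name": "search_web", "type": "TOOL", "duration_ms": 3350},
    {"name": "tool_calling_llm [2nd call]", "type": "LLM", "duration_ms": 2100,
     "input_tokens": 245, "output_tokens": 180},
]

def summarize(spans: list[dict]) -> dict:
    """Roll span-level numbers up into a trace-level summary."""
    total_ms = max(s["duration_ms"] for s in spans)  # the root span covers the whole trace
    work = [s for s in spans if s["type"] in ("LLM", "TOOL")]
    input_tokens = sum(s.get("input_tokens", 0) for s in spans)
    output_tokens = sum(s.get("output_tokens", 0) for s in spans)
    return {
        "total_ms": total_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        # Time not spent inside any LLM or tool call (framework overhead)
        "overhead_ms": total_ms - sum(s["duration_ms"] for s in work),
        "share_pct": {
            s["name"]: round(100 * s["duration_ms"] / total_ms, 1) for s in work
        },
    }

summary = summarize(spans)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this example the web search accounts for the largest share of the request, which is the kind of signal you would look for before tuning the LLM.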

## Deep Dive: Each Span Explained

### Span 1: Agent Predict (Root Span)

This is the entry point created by the `DocsAgent.predict()` method:

```python
{
    "span_id": "sp_001",
    "name": "DocsAgent.predict",
    "span_type": "AGENT",
    "start_time": 1732315680000,
    "end_time": 1732315688450,
    "inputs": {
        "messages": [
            {
                "role": "user",
                "content": "What is the latest news on OpenAI product releases? Provide results in bullet points"
            }
        ]
    },
    "outputs": {
        "messages": [
            {
                "role": "assistant",
                "content": "Based on the latest information:\n\n• GPT-4 Turbo launched with enhanced capabilities...",
                "additional_kwargs": {}
            }
        ]
    },
    "attributes": {
        "mlflow.agent.class": "DocsAgent",
        "mlflow.agent.temperature": 0.01,
        "mlflow.agent.max_tokens": 1000,
        "mlflow.agent.endpoint": "databricks-meta-llama-3-3-70b-instruct"
    }
}
```

**Key Insights:**
- Captures the complete user request and final response
- Shows agent-level configuration
- Measures end-to-end execution time

### Span 2: StateGraph Invoke (Chain Execution)

Created when `self.agent.invoke(request)` is called:

```python
{
    "span_id": "sp_002",
    "parent_span_id": "sp_001",
    "name": "StateGraph.invoke",
    "span_type": "CHAIN",
    "start_time": 1732315680100,
    "end_time": 1732315688400,
    "inputs": {
        "messages": [
            {
                "role": "user",
                "content": "What is the latest news on OpenAI product releases?"
            }
        ]
    },
    "outputs": {
        "messages": [...]  # final response with tool results
    },
    "attributes": {
        "graph_type": "StateGraph",
        "node_count": 2,
        "edge_count": 4
    }
}
```

**Key Insights:**
- Represents the entire LangGraph workflow
- Parent of all node executions (LLM calls, tool calls)
- Shows the graph structure

### Span 3: First LLM Call (Tool Decision)

Created by the `tool_calling_llm` node:

```python
{
    "span_id": "sp_003",
    "parent_span_id": "sp_002",
    "name": "tool_calling_llm",
    "span_type": "LLM",
    "start_time": 1732315680150,
    "end_time": 1732315682800,
    "inputs": {
        "messages": [
            {
                "role": "user",
                "content": "What is the latest news on OpenAI product releases?"
            }
        ],
        "model": "databricks-meta-llama-3-3-70b-instruct",
        "temperature": 0.01,
        "max_tokens": 1000
    },
    "outputs": {
        "content": "",
        "tool_calls": [
            {
                "name": "agentic_ai__databricks__search_web",
                "args": {
                    "query": "OpenAI latest product releases news 2024"
                },
                "id": "call_xyz789"
            }
        ]
    },
    "attributes": {
        "mlflow.chat.tokenUsage": {
            "input_tokens": 67,
            "output_tokens": 45,
            "total_tokens": 112
        },
        "execution_time_ms": 2650,
        "has_tool_calls": True,
        "tool_call_count": 1
    }
}
```

**Key Insights:**
- Shows LLM decided to call a tool
- Captures token usage for cost tracking
- Records exact tool call parameters
- Shows execution latency

### Span 4: Tool Execution (Search Web)

Created by the `ChatAgentToolNode`:

```python
{
    "span_id": "sp_004",
    "parent_span_id": "sp_002",
    "name": "search_web",
    "span_type": "TOOL",
    "start_time": 1732315682850,
    "end_time": 1732315686200,
    "inputs": {
        "tool_name": "agentic_ai__databricks__search_web",
        "tool_input": {
            "query": "OpenAI latest product releases news 2024"
        }
    },
    "outputs": {
        "content": "Recent OpenAI developments include:\n- GPT-4 Turbo launch with 128K context window\n- ChatGPT Enterprise features for businesses\n- DALL-E 3 integration with ChatGPT\n- Custom GPTs marketplace announcement..."
    },
    "attributes": {
        "mlflow.tool.function_name": "agentic_ai.databricks.search_web",
        "execution_time_ms": 3350,
        "tool_status": "success",
        "result_length": 487
    }
}
```

**Key Insights:**
- Shows actual tool execution
- Captures tool inputs and outputs
- Tracks tool-specific latency
- Can identify slow tools

### Span 5: Second LLM Call (Response Generation)

Another call to `tool_calling_llm` with tool results:

```python
{
    "span_id": "sp_005",
    "parent_span_id": "sp_002",
    "name": "tool_calling_llm",
    "span_type": "LLM",
    "start_time": 1732315686250,
    "end_time": 1732315688350,
    "inputs": {
        "messages": [
            {
                "role": "user",
                "content": "What is the latest news on OpenAI product releases?"
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [...]  # previous tool call
            },
            {
                "role": "tool",
                "content": "Recent OpenAI developments include...",
                "tool_call_id": "call_xyz789"
            }
        ]
    },
    "outputs": {
        "content": "Based on the latest information, here are the recent OpenAI product releases:\n\n• GPT-4 Turbo with 128K context window for processing longer documents\n• ChatGPT Enterprise with advanced security and admin controls\n• DALL-E 3 integration for generating images within ChatGPT\n• Custom GPTs marketplace for specialized AI assistants",
        "tool_calls": []
    },
    "attributes": {
        "mlflow.chat.tokenUsage": {
            "input_tokens": 245,
            "output_tokens": 180,
            "total_tokens": 425
        },
        "execution_time_ms": 2100,
        "has_tool_calls": False,
        "response_type": "text"
    }
}
```

**Key Insights:**
- Shows LLM processing tool results
- Higher input tokens (includes tool output)
- No tool calls (final response)
- Generates user-facing answer

## Accessing and Analyzing Traces

### Retrieving Traces Programmatically

```python
import mlflow

# Execute the agent
result = AGENT.predict([{
    "role": "user",
    "content": "What is the latest news on OpenAI?"
}])

# Get the trace ID
trace_id = mlflow.get_last_active_trace_id()
print(f"Trace ID: {trace_id}")

# Retrieve the complete trace
trace = mlflow.get_trace(trace_id)

# Access trace information
print(f"Duration: {trace.info.execution_duration}ms")
print(f"Status: {trace.info.state}")
print(f"Request: {trace.info.request_preview}")
print(f"Response: {trace.info.response_preview}")

# Access token usage
if trace.info.token_usage:
    print(f"Total Tokens: {trace.info.token_usage['total_tokens']}")
    print(f"Input Tokens: {trace.info.token_usage['input_tokens']}")
    print(f"Output Tokens: {trace.info.token_usage['output_tokens']}")
```

### Analyzing Individual Spans

```python
# Iterate through all spans
print("\n=== Span Analysis ===")
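for span in trace.data.spans:
    print(f"\nSpan: {span.name}")
    print(f"  Type: {span.span_type}")
    # Span timestamps are in nanoseconds; convert the difference to milliseconds
    duration_ms = (span.end_time_ns - span.start_time_ns) / 1e6
    print(f"  Duration: {duration_ms:.0f}ms")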
    
    # Check for token usage
    if token_usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"  Tokens: {token_usage['total_tokens']}")
    
    # Check for tool information
    if tool_name := span.get_attribute("mlflow.tool.function_name"):
        print(f"  Tool: {tool_name}")
```

### Searching Traces

```python
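# search_traces filters by experiment ID, so look the ID up from the name first
experiment = mlflow.get_experiment_by_name("Agent_Tracing_Demo")
experiment_ids = [experiment.experiment_id]

# Search traces by experiment; return_type="list" yields Trace objects
# instead of a pandas DataFrame
traces = mlflow.search_traces(
    experiment_ids=experiment_ids,
    max_results=10,
    return_type="list",
)

# Filter traces by tags (uses the active experiment)
traces = mlflow.search_traces(
    filter_string="tags.use_case = 'analytics'",
    max_results=10,
    return_type="list",
)

# Sort by execution time on the client and keep the five slowest traces
traces = mlflow.search_traces(
    experiment_ids=experiment_ids,
    max_results=100,
    return_type="list",
)
traces = sorted(
    traces, key=lambda t: t.info.execution_duration or 0, reverse=True
)[:5]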

# Display results
for trace in traces:
    print(f"Trace: {trace.info.trace_id}")
    print(f"  Duration: {trace.info.execution_duration}ms")
    print(f"  Request: {trace.info.request_preview[:50]}...")
    print(f"  Status: {trace.info.state}")
    print()
```

## Key Differences: MLflow vs Non-MLflow Implementation

Understanding what makes the MLflow version different is crucial:

### Non-MLflow Version

```python
# Simple state without tracing support
class State(TypedDict):
    messages: Annotated[list, add_messages]

def tool_calling_llm(state: State) -> State:
    """No config, no tracing context"""
    current_state = state["messages"]
    return {"messages": [llm_with_tools.invoke(current_state)]}

# Regular ToolNode
builder.add_node("tools", ToolNode(tools=tools))
```

**What's Missing:**
- ❌ No automatic tracing
- ❌ No configuration flow
- ❌ No token tracking
- ❌ No span creation
- ❌ Not deployment-ready

### MLflow-Compatible Version

```python
# ChatAgentState with tracing support
from mlflow.langchain.chat_agent_langgraph import ChatAgentState

def tool_calling_llm(state: ChatAgentState, config: RunnableConfig):
    """With config and tracing context"""
    response = model_runnable.invoke(state, config)
    return {"messages": [response]}

# ChatAgentToolNode with automatic span creation
builder.add_node("tools", ChatAgentToolNode(tools=tools))
```

**What's Included:**
- ✅ Automatic trace creation
- ✅ Configuration propagation
- ✅ Token usage tracking
- ✅ Detailed span hierarchy
- ✅ Production-ready observability

## Real-World Use Cases

### Use Case 1: Debugging Slow Responses

```python
# Execute agent
result = AGENT.predict([{"role": "user", "content": "Explain quantum computing"}])

# Analyze the trace
trace = mlflow.get_trace(mlflow.get_last_active_trace_id())

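# Find the slowest unit of work. The root and chain spans wrap their
# children, so comparing every span would always pick the root.
work_spans = [
    s for s in trace.data.spans
    if s.span_type in ("LLM", "CHAT_MODEL", "TOOL")
]
# Span timestamps are in nanoseconds
slowest_span = max(work_spans, key=lambda s: s.end_time_ns - s.start_time_ns)
duration_ms = (slowest_span.end_time_ns - slowest_span.start_time_ns) / 1e6

print(f"Bottleneck: {slowest_span.name}")
print(f"Duration: {duration_ms:.0f}ms")

# Identify if it's LLM or tool
if slowest_span.span_type == "TOOL":
    print("Tool execution is the bottleneck")
else:
    print("LLM call is the bottleneck")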
```
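
Because parent spans include the time of everything they wrap, a complementary way to locate a bottleneck is to compute each span's "self time": its own duration minus the duration of its direct children. The sketch below does this on plain dictionaries that reuse the timestamps from the walkthrough. `self_times` is a hypothetical helper, and it assumes sibling spans run sequentially rather than in parallel.

```python
# Plain-dict spans using the timings from the walkthrough (epoch milliseconds).
spans = [
    {"id": "sp_001", "parent": None, "name": "DocsAgent.predict",
     "start": 1732315680000, "end": 1732315688450},
    {"id": "sp_002", "parent": "sp_001", "name": "StateGraph.invoke",
     "start": 1732315680100, "end": 1732315688400},
    {"id": "sp_003", "parent": "sp_002", "name": "tool_calling_llm",
     "start": 1732315680150, "end": 1732315682800},
    {"id": "sp_004", "parent": "sp_002", "name": "search_web",
     "start": 1732315682850, "end": 1732315686200},
    {"id": "sp_005", "parent": "sp_002", "name": "tool_calling_llm",
     "start": 1732315686250, "end": 1732315688350},
]

def self_times(spans: list[dict]) -> dict[str, int]:
    """Map span id to its duration minus the time spent in direct children."""
    result = {s["id"]: s["end"] - s["start"] for s in spans}
    for s in spans:
        if s["parent"] is not None:
            result[s["parent"]] -= s["end"] - s["start"]
    return result

times = self_times(spans)
bottleneck_id = max(times, key=times.get)
names = {s["id"]: s["name"] for s in spans}
print(f"Bottleneck: {names[bottleneck_id]} ({times[bottleneck_id]}ms self time)")
```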

### Use Case 2: Cost Optimization

```python
# Track costs across multiple requests
total_tokens = 0
total_cost = 0.0

# Cost per 1K tokens (example rates)
INPUT_TOKEN_COST = 0.0001
OUTPUT_TOKEN_COST = 0.0002

for i in range(10):
    result = AGENT.predict([{"role": "user", "content": f"Query {i}"}])
    trace = mlflow.get_trace(mlflow.get_last_active_trace_id())
    
    if trace.info.token_usage:
        input_tokens = trace.info.token_usage['input_tokens']
        output_tokens = trace.info.token_usage['output_tokens']
        
        total_tokens += input_tokens + output_tokens
        total_cost += (input_tokens / 1000 * INPUT_TOKEN_COST) + \
                      (output_tokens / 1000 * OUTPUT_TOKEN_COST)

print(f"Total Tokens: {total_tokens}")
print(f"Total Cost: ${total_cost:.4f}")
print(f"Average Cost per Request: ${total_cost/10:.4f}")
```
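
The cost arithmetic can be isolated into a small function so it is easy to test and to reuse per LLM call. This is a minimal sketch using the same example rates as above (per 1K tokens); real pricing depends on your model and provider, and `estimate_cost` is a hypothetical helper name.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_1k: float = 0.0001,   # example rate, not real pricing
    output_rate_per_1k: float = 0.0002,  # example rate, not real pricing
) -> float:
    """Estimate the cost of one LLM call from its token counts."""
    return (
        input_tokens / 1000 * input_rate_per_1k
        + output_tokens / 1000 * output_rate_per_1k
    )

# (input_tokens, output_tokens) for the two LLM calls in the walkthrough
usage_per_call = [(67, 45), (245, 180)]

costs = [estimate_cost(i, o) for i, o in usage_per_call]
total_cost = sum(costs)
print(f"Per-call costs: {[f'${c:.7f}' for c in costs]}")
print(f"Total: ${total_cost:.7f}")
```

Breaking the cost down per call shows that the second LLM call dominates, because its prompt carries the tool output.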

### Use Case 3: Quality Monitoring

```python
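# Monitor tool usage patterns
experiment = mlflow.get_experiment_by_name("Agent_Tracing_Demo")
traces = mlflow.search_traces(
    experiment_ids=[experiment.experiment_id],  # search_traces takes IDs, not names
    max_results=100,
    return_type="list",  # Trace objects instead of a pandas DataFrame
)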

tool_usage = {}
for trace in traces:
    for span in trace.data.spans:
        if span.span_type == "TOOL":
            tool_name = span.name
            tool_usage[tool_name] = tool_usage.get(tool_name, 0) + 1

print("=== Tool Usage Statistics ===")
for tool, count in sorted(tool_usage.items(), key=lambda x: x[1], reverse=True):
    print(f"{tool}: {count} calls")
```
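
Call counts are only one quality signal. The same pass over the spans can also report how often the agent answered without any tool and how often a trace failed. The sketch below shows the idea with `collections.Counter` on hand-written records; the records and the `lookup_order` tool name are hypothetical and stand in for traces you would retrieve from MLflow.

```python
from collections import Counter

# Hypothetical, hand-written records standing in for retrieved traces.
traces = [
    {"state": "OK", "spans": [
        {"type": "LLM", "name": "tool_calling_llm"},
        {"type": "TOOL", "name": "search_web"},
    ]},
    {"state": "OK", "spans": [
        {"type": "LLM", "name": "tool_calling_llm"},
    ]},
    {"state": "ERROR", "spans": [
        {"type": "TOOL", "name": "search_web"},
        {"type": "TOOL", "name": "lookup_order"},
    ]},
]

# Count calls per tool across all traces
tool_calls = Counter(
    span["name"]
    for trace in traces
    for span in trace["spans"]
    if span["type"] == "TOOL"
)

# Share of traces that never touched a tool, and share that ended in error
no_tool_share = sum(
    1 for trace in traces
    if not any(span["type"] == "TOOL" for span in trace["spans"])
) / len(traces)
error_rate = sum(trace["state"] == "ERROR" for trace in traces) / len(traces)

print(tool_calls.most_common())
print(f"Traces without tools: {no_tool_share:.0%}")
print(f"Error rate: {error_rate:.0%}")
```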

## Best Practices for Tracing

### 1. Use Meaningful Experiment Names

```python
# Good: Descriptive experiment names
mlflow.set_experiment("Production_Agent_Customer_Support")
mlflow.set_experiment("Dev_Agent_Testing_v2")

# Bad: Generic names
mlflow.set_experiment("test")
mlflow.set_experiment("experiment1")
```

### 2. Add Context with Tags

```python
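# Add trace-level tags for filtering and analysis.
# Call this from inside a traced function (for example, within predict()),
# because it tags the trace that is currently being recorded.
mlflow.update_current_trace(tags={
    "environment": "production",
    "user_tier": "enterprise",
    "version": "2.1.0",
    "use_case": "customer_analytics"
})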
```

### 3. Monitor Critical Metrics

```python
def monitor_agent_performance(trace):
    """Monitor key performance indicators"""
    duration = trace.info.execution_duration
    tokens = trace.info.token_usage
    
    # Alert on slow responses
    if duration > 10000:  # 10 seconds
        print(f"⚠️ ALERT: Slow response detected ({duration}ms)")
    
    # Alert on high token usage
    if tokens and tokens['total_tokens'] > 2000:
        print(f"⚠️ ALERT: High token usage ({tokens['total_tokens']} tokens)")
    
    # Check for errors
    if trace.info.state == "ERROR":
        print("❌ ERROR: Agent execution failed")
```
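
Printing alerts is fine in a notebook, but in production it is usually easier to return them as data so they can be routed to a logger or paging system. The sketch below is one possible shape for that, using the same thresholds as above. `check_trace` and the alert labels are hypothetical names, and the function takes plain values rather than an MLflow trace so it can be unit tested in isolation.

```python
from typing import Optional

def check_trace(
    duration_ms: int,
    total_tokens: Optional[int],
    state: str,
    max_duration_ms: int = 10_000,
    max_tokens: int = 2_000,
) -> list[str]:
    """Return a list of alert labels for one trace (empty if healthy)."""
    alerts = []
    if duration_ms > max_duration_ms:
        alerts.append("slow_response")
    if total_tokens is not None and total_tokens > max_tokens:
        alerts.append("high_token_usage")
    if state == "ERROR":
        alerts.append("execution_failed")
    return alerts

healthy = check_trace(duration_ms=8450, total_tokens=537, state="OK")
unhealthy = check_trace(duration_ms=12_000, total_tokens=2_500, state="ERROR")
print(healthy)
print(unhealthy)
```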

### 4. Regularly Review Traces

```python
# Weekly performance review
def weekly_trace_analysis():
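    # search_traces filters by experiment ID, so resolve the name first
    experiment = mlflow.get_experiment_by_name("Production_Agent")
    traces = mlflow.search_traces(
        experiment_ids=[experiment.experiment_id],
        max_results=1000,
        return_type="list",  # Trace objects instead of a pandas DataFrame
    )
    
    durations = [t.info.execution_duration for t in traces]
    tokens = [t.info.token_usage['total_tokens'] 
              for t in traces if t.info.token_usage]
    
    print(f"Average Duration: {sum(durations)/len(durations):.2f}ms")
    print(f"95th Percentile Duration: {sorted(durations)[int(len(durations)*0.95)]:.2f}ms")
    print(f"Average Tokens: {sum(tokens)/len(tokens):.2f}")
    # Example blended rate of $0.0001 per 1K tokens
    print(f"Total Cost: ${sum(tokens) / 1000 * 0.0001:.4f}")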
```
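
The percentile calculation is worth pulling into its own function, since an empty result set or an off-by-one index is an easy mistake in a weekly report. Here is a minimal sketch of a nearest-rank percentile; the sample durations are made up for illustration, and `percentile` is a hypothetical helper rather than an MLflow function.

```python
import math
from typing import Optional

def percentile(values: list[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up trace durations in milliseconds
durations_ms = [1200, 950, 8450, 3100, 2700, 15000, 1800, 2200, 4100, 990]

p50 = percentile(durations_ms, 50)
p95 = percentile(durations_ms, 95)
print(f"p50: {p50}ms, p95: {p95}ms")
```

Comparing the median with the 95th percentile highlights tail latency that an average would hide.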

## Conclusion

MLflow tracing transforms AI agent development and deployment by providing:

1. **Complete Visibility**: See every step of agent execution
2. **Performance Insights**: Identify bottlenecks and optimize
3. **Cost Tracking**: Monitor token usage and costs
4. **Debugging Power**: Pinpoint issues quickly
5. **Production Readiness**: Monitor deployed agents in real-time

By understanding traces (the complete journey) and spans (individual steps), you can build agents that are not just functional, but observable, debuggable, and production-ready.

The key difference between a prototype and a production agent is observability. With MLflow tracing, you're not flying blind: you have a detailed map of every agent execution, empowering you to build better, faster, and more reliable AI systems.

---

## Quick Reference

```python
# Enable tracing
import mlflow
mlflow.langchain.autolog()

# Create MLflow-compatible agent
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode

def tool_calling_llm(state: ChatAgentState, config: RunnableConfig):
    response = model_runnable.invoke(state, config)
    return {"messages": [response]}

# Execute and retrieve trace
result = agent.predict(messages)
trace = mlflow.get_trace(mlflow.get_last_active_trace_id())

# Analyze traces
print(f"Duration: {trace.info.execution_duration}ms")
print(f"Tokens: {trace.info.token_usage}")

# Search traces
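traces = mlflow.search_traces(
    # search_traces takes experiment IDs, so resolve the name first
    experiment_ids=[mlflow.get_experiment_by_name("My_Agent").experiment_id],
    filter_string="tags.use_case = 'analytics'",
    return_type="list",  # Trace objects instead of a pandas DataFrame
)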
```

Happy tracing! 🚀