# Arize AX Tracing Tutorial: Building a Customer Support Agent

In this tutorial, you'll learn how to instrument an LLM application for end-to-end observability using Arize AX. We'll build a customer support chatbot called **SupportBot** that handles order-status queries and general FAQs.

By the end of this tutorial, you'll be able to:

- Capture complete execution traces for LLM calls, tool invocations, and RAG pipelines
- Collect and log user feedback
- Run automated LLM-as-Judge evaluations
- Track multi-turn conversations with session management

## Prerequisites

Before starting, make sure you have:

- An Arize AX account (sign up for free at [app.arize.com](https://app.arize.com))
- Your Space ID and API Key (found in Settings → API Keys)
- OpenAI API key (or another supported LLM provider)

## Part 1: Setup and Installation

First, let's install the required dependencies for tracing with Arize AX.

In [1]:
%pip install -qqqq openinference-instrumentation-openai openai arize-otel arize openinference-instrumentation

Note: you may need to restart the kernel to use updated packages.


## Part 2: Configure Tracing

Next, we'll set up the connection to Arize AX and instrument OpenAI to automatically capture traces.

In [None]:
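import os

# Replace the placeholders with your own credentials. The OpenAI key is separate from the Arize key.
os.environ["ARIZE_API_KEY"] = "<YOUR_ARIZE_API_KEY>"
os.environ["ARIZE_SPACE_ID"] = "<YOUR_SPACE_ID>"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"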
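
Placeholder values (anything still wrapped in `<...>`) are easy to leave in by accident, and the mistake then only surfaces later as an authentication error. A minimal sketch of a fail-fast check, assuming the three variable names used above; `require_env` and `check_credentials` are hypothetical helpers, not part of the Arize or OpenAI SDKs:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or a placeholder.

    Hypothetical helper for this tutorial; not part of any SDK.
    """
    value = os.environ.get(name, "").strip()
    # Treat "<...>" template values as missing so they never reach an API.
    if not value or (value.startswith("<") and value.endswith(">")):
        raise RuntimeError(f"Set {name} to a real value before continuing.")
    return value


def check_credentials(names=("ARIZE_API_KEY", "ARIZE_SPACE_ID", "OPENAI_API_KEY")) -> list[str]:
    """Return the names that still need to be set (an empty list means ready)."""
    missing = []
    for name in names:
        try:
            require_env(name)
        except RuntimeError:
            missing.append(name)
    return missing
```

Calling `check_credentials()` before `register(...)` and stopping if the list is non-empty keeps the error close to its cause.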

In [None]:
from arize.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
import openai
from opentelemetry import trace

tracer_provider = register(
    space_id=os.getenv("ARIZE_SPACE_ID"), 
    api_key=os.getenv("ARIZE_API_KEY"),  
    project_name="my-support-bot",
)

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

openai_client = openai.OpenAI()
tracer = trace.get_tracer(__name__)

🔭 OpenTelemetry Tracing Details 🔭
|  Arize Project: my-support-bot
|  Span Processor: BatchSpanProcessor
|  Collector Endpoint: otlp.arize.com
|  Transport: gRPC
|  Transport Headers: {'authorization': '****', 'api_key': '****', 'arize-space-id': '****', 'space_id': '****', 'arize-interface': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



**That's it!** With just these few lines, OpenInference will automatically capture traces for all your OpenAI calls. The instrumentation handles all the OpenTelemetry boilerplate for you.

## Part 3: Your First Trace - Query Classification

Let's start by creating a simple query classifier that determines if a user message is about order status or a general FAQ.

In [4]:
def classify_query(user_message: str) -> str:
    """Classify if the query is about order status or a general FAQ."""
    
    with tracer.start_as_current_span("classify-query") as span:
        span.set_attribute("openinference.span.kind", "CHAIN")
        span.set_attribute("input.value", user_message)
        
        response = openai_client.chat.completions.create(
            model="gpt-4",
            messages=[
                {
                    "role": "system",
                    "content": "You are a classifier. Respond with either 'ORDER_STATUS' or 'FAQ'."
                },
                {"role": "user", "content": user_message}
            ],
        )
        
        classification = response.choices[0].message.content.strip()
        span.set_attribute("output.value", classification)
        
        return classification

In [5]:
result = classify_query("Where is my order #12345?")
print(f"Classification: {result}")

Classification: ORDER_STATUS


Head to your Arize AX dashboard and you'll see this trace appear in near real time (spans are exported in batches, so allow a moment). You can see:
- The parent "classify-query" span
- The automatically captured OpenAI call details
- Input and output values
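
The classifier's reply is free text, so a later exact comparison such as `classification == "ORDER_STATUS"` can miss replies like `'ORDER_STATUS'` (with quotes) or `Order status.`. A small normalization step is sketched here, under the assumption that anything unrecognized should fall back to the FAQ path; `normalize_classification` is a hypothetical helper name:

```python
import re

VALID_LABELS = ("ORDER_STATUS", "FAQ")


def normalize_classification(raw: str, default: str = "FAQ") -> str:
    """Map a free-text model reply onto one of the known labels."""
    # Uppercase and collapse any run of non-alphanumeric characters into "_",
    # so "'order status.'" becomes "ORDER_STATUS".
    cleaned = re.sub(r"[^A-Z0-9]+", "_", raw.upper()).strip("_")
    for label in VALID_LABELS:
        if cleaned == label:
            return label
    # Fall back to a substring match for replies such as "Label: ORDER_STATUS".
    for label in VALID_LABELS:
        if label in cleaned:
            return label
    return default
```

Applying it to `response.choices[0].message.content` before returning from `classify_query` would keep the routing logic tolerant of small formatting differences.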

## Part 4: Tool Call Tracing

Now let's add the ability to look up order status using OpenAI's function calling.

In [8]:
import json

def get_order_status(order_id: str) -> dict:
    """Simulate looking up an order in a database."""
    return {
        "order_id": order_id,
        "status": "In Transit",
        "estimated_delivery": "2024-03-15"
    }

def handle_order_query(user_message: str, prior_messages: list | None = None) -> str:
    """Handle order status queries using tool calling. Optional prior_messages for multi-turn context."""
    
    with tracer.start_as_current_span("handle-order-query") as span:
        span.set_attribute("openinference.span.kind", "CHAIN")
        span.set_attribute("input.value", user_message)
        
        system_msg = {"role": "system", "content": "You are a customer support agent with access to order lookup. Use the get_order_status tool when the user asks about their order or provides an order ID."}
        messages = [system_msg] + (prior_messages or []) + [{"role": "user", "content": user_message}]
        
        tools = [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",
                    "description": "Look up the status of a customer order",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "order_id": {
                                "type": "string",
                                "description": "The order ID (e.g., '12345')"
                            }
                        },
                        "required": ["order_id"]
                    }
                }
            }
        ]
        
        response = openai_client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            tools=tools,
        )

        assistant_message = response.choices[0].message

        if assistant_message.tool_calls:
            # Start the follow-up request with the user turn and the assistant's tool-call message.
            tool_messages = [
                {"role": "user", "content": user_message},
                assistant_message,
            ]

            # Answer every requested tool call: the follow-up request is rejected
            # if any tool_call_id is left without a matching "tool" message.
            for tool_call in assistant_message.tool_calls:
                with tracer.start_as_current_span("execute-tool") as tool_span:
                    tool_span.set_attribute("openinference.span.kind", "TOOL")
                    tool_span.set_attribute("tool.name", tool_call.function.name)
                    tool_span.set_attribute("tool.parameters", tool_call.function.arguments)

                    args = json.loads(tool_call.function.arguments)
                    result = get_order_status(args["order_id"])

                    tool_span.set_attribute("tool.result", json.dumps(result))

                tool_messages.append(
                    {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)}
                )

            final_response = openai_client.chat.completions.create(
                model="gpt-4",
                messages=[system_msg] + (prior_messages or []) + tool_messages,
            )
            answer = final_response.choices[0].message.content
        else:
            # No tool call: the model answered directly (for example, by asking for the order ID).
            answer = assistant_message.content

        # Record the output on both paths so the parent span is never blank in the trace.
        span.set_attribute("output.value", answer)
        return answer

In [9]:
response = handle_order_query("What's the status of order 12345?")
print(response)

The status of your order number 12345 is "In Transit". The estimated delivery date is 15th March, 2024.


In Arize AX, you'll now see a complete trace tree showing:
- The parent "handle-order-query" span
- The LLM call that decided to use a tool
- The "execute-tool" span with parameters and results
- The final LLM call that formulated the response
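
`json.loads` and `args["order_id"]` raise if the model produces malformed or incomplete arguments, which would abort the request before the tool span records a result. One option, sketched below, is to route calls through a small registry and turn failures into a JSON error payload that can be stored in `tool.result` and sent back to the model. `TOOL_REGISTRY` and `run_tool_call` are illustrative names, and the stub `get_order_status` mirrors the one defined above:

```python
import json
from typing import Callable


def get_order_status(order_id: str) -> dict:
    """Stub lookup mirroring the tutorial's simulated database call."""
    return {"order_id": order_id, "status": "In Transit", "estimated_delivery": "2024-03-15"}


# Map tool names (as declared in the `tools` schema) to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_order_status": get_order_status}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and always return a JSON string, even on failure."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(tool(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # JSONDecodeError: malformed arguments; TypeError: missing or unexpected parameters.
        return json.dumps({"error": f"invalid arguments: {exc}"})
```

Because the error payload contains the word `error`, the keyword-based `tool_success` evaluator in Part 7 would label such a span `FAILED`.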

## Part 5: RAG Tracing

For FAQ queries, we'll use a simple RAG (Retrieval-Augmented Generation) pipeline. Let's trace both the retrieval and generation steps.

In [12]:
def embed_text(text: str) -> list[float]:
    """Generate embedding for text."""
    
    with tracer.start_as_current_span("embed-text") as span:
        span.set_attribute("openinference.span.kind", "EMBEDDING")
        span.set_attribute("input.value", text)
        span.set_attribute("embedding.model_name", "text-embedding-3-small")
        span.set_attribute(
            "embedding.invocation_parameters",
            json.dumps({"model": "text-embedding-3-small"})
        )
        
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        
        embedding = response.data[0].embedding
        
        span.set_attribute("embedding.embeddings.0.embedding.text", text)
        span.set_attribute("embedding.embeddings.0.embedding.vector", embedding)
        
        return embedding

def retrieve_documents(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Retrieve relevant documents using embeddings."""
    
    with tracer.start_as_current_span("retrieve-documents") as span:
        span.set_attribute("openinference.span.kind", "RETRIEVER")
        span.set_attribute("input.value", query)
        
        query_embedding = embed_text(query)
        span.set_attribute("retrieval.query_embedding_dims", len(query_embedding))
        
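        # Placeholder retrieval: returns the first top_k documents without ranking them.
        # The query embedding above is recorded but not yet used for similarity search.
        retrieved = knowledge_base[:top_k]

        # OpenInference expects one flattened attribute per retrieved document.
        for i, doc in enumerate(retrieved):
            span.set_attribute(f"retrieval.documents.{i}.document.content", doc)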
        
        return retrieved

def handle_faq_query(user_message: str) -> str:
    """Handle FAQ queries using RAG."""
    
    knowledge_base = [
        "We offer free shipping on orders over $50.",
        "Returns are accepted within 30 days of purchase.",
        "You can track your order using the tracking number in your confirmation email.",
    ]
    
    with tracer.start_as_current_span("handle-faq-query") as span:
        span.set_attribute("openinference.span.kind", "CHAIN")
        span.set_attribute("input.value", user_message)
        
        relevant_docs = retrieve_documents(user_message, knowledge_base)
        
        context = "\n".join(relevant_docs)
        response = openai_client.chat.completions.create(
            model="gpt-4",
            messages=[
                {
                    "role": "system",
                    "content": f"Answer the user's question using this context:\n{context}"
                },
                {"role": "user", "content": user_message}
            ],
        )
        
        answer = response.choices[0].message.content
        span.set_attribute("output.value", answer)
        
        return answer

In [13]:
response = handle_faq_query("What's your shipping policy?")
print(response)

Our shipping policy is that we offer free shipping on orders over $50.


Perfect! Now you can see exactly:
- What query was sent to retrieval
- Which documents were retrieved
- What context was passed to the LLM
- What answer was generated
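
Note that `retrieve_documents` computes the query embedding but returns the first `top_k` entries of the knowledge base regardless of the query. A minimal sketch of similarity-based ranking follows, assuming document embeddings have already been computed (for example with `embed_text`); `cosine_similarity` and `rank_documents` are illustrative helpers, and the toy vectors are made up:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, docs, doc_vecs, top_k=2):
    """Return the top_k (document, score) pairs, most similar first."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional "embeddings" for illustration only.
docs = ["shipping policy", "return policy", "order tracking"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = rank_documents([0.9, 0.1], docs, doc_vecs, top_k=2)
```

The scores can also be recorded on the retriever span as `retrieval.documents.{i}.document.score`, alongside each document's content.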

## Part 6: Complete SupportBot

Let's put it all together into a complete SupportBot with full tracing.

In [14]:
def supportbot(user_message: str) -> str:
    """Main SupportBot entry point with complete tracing."""
    
    with tracer.start_as_current_span("supportbot") as span:
        span.set_attribute("openinference.span.kind", "AGENT")
        span.set_attribute("input.value", user_message)
        
        classification = classify_query(user_message)
        
        if classification == "ORDER_STATUS":
            response = handle_order_query(user_message)
        else:
            response = handle_faq_query(user_message)
        
        span.set_attribute("output.value", response)
        span.set_attribute("classification", classification)
        
        return response

In [15]:
queries = [
    "Where is my order #12345?",
    "What's your return policy?",
    "Can I track my order?",
]

for query in queries:
    print(f"\nUser: {query}")
    print(f"Bot: {supportbot(query)}")


User: Where is my order #12345?
Bot: Your order #12345 is currently in transit. The estimated delivery date is March 15, 2024.

User: What's your return policy?
Bot: Our return policy allows for returns within 30 days of purchase.

User: Can I track my order?
Bot: Of course, I'd be happy to help you with that. Could you please provide me with your order ID?


# Annotations & Evaluations

## Part 7: Automated Evaluations

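Manual feedback doesn't scale, so let's evaluate traces automatically. We start with a simple code-based evaluator that flags failed tool calls by keyword. An LLM-as-Judge evaluator produces results in the same shape, so it can follow the same export, evaluate, and upload workflow across thousands of traces.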

In [16]:
import pandas as pd

def evaluate_tool_results(trace_df: pd.DataFrame) -> pd.DataFrame:
    """Evaluate if tool calls succeeded or failed. Returns a DataFrame with context.span_id, label, score."""
    tool_spans = trace_df[trace_df["attributes.openinference.span.kind"] == 'TOOL']
    eval_results = []
    eval_name = "tool_success"

    for _, span in tool_spans.iterrows():
        span_id = span["context.span_id"]
        tool_result = span.get("attributes.tool.result", "")
        tool_result_str = str(tool_result).lower() if tool_result else ""
        has_error = any(
            kw in tool_result_str for kw in ["error", "failed", "invalid", "not found"]
        )

        label = "FAILED" if has_error else "SUCCESS"
        score = 0.0 if has_error else 1.0

        current_eval = {
            "context.span_id": span_id,
            f"eval.{eval_name}.label": label,
            f"eval.{eval_name}.score": score,
        }
        eval_results.append(current_eval)
    return pd.DataFrame(eval_results)
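
Substring matching can misfire: a successful result whose text merely contains a word like `invalid` (say, in a product name) would be labeled `FAILED`. When tool results are JSON, as they are in this tutorial, one alternative is to parse them and look for an explicit error field. The sketch below assumes failures are reported under a top-level `error` key, which is a convention chosen for this example rather than something the tutorial's tool emits:

```python
import json


def classify_tool_result(tool_result: str) -> tuple[str, float]:
    """Return (label, score) for one tool result string."""
    try:
        payload = json.loads(tool_result)
    except (TypeError, json.JSONDecodeError):
        # Missing or non-JSON output is treated as a failure.
        return "FAILED", 0.0
    # Assumed convention: failures carry a top-level "error" key.
    if isinstance(payload, dict) and "error" in payload:
        return "FAILED", 0.0
    return "SUCCESS", 1.0
```

The returned pair maps directly onto the `label` and score values built inside `evaluate_tool_results`.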

In [20]:
from arize import ArizeClient
from datetime import datetime, timedelta
from openinference.instrumentation import suppress_tracing

client = ArizeClient(api_key=os.getenv("ARIZE_API_KEY"))

end_time = datetime.now()
start_time = end_time - timedelta(hours=1)

trace_df = client.spans.export_to_df(
    space_id=os.getenv("ARIZE_SPACE_ID"),
    project_name="my-support-bot",
    start_time=start_time,
    end_time=end_time,
)

with suppress_tracing():
    tool_eval_df = evaluate_tool_results(trace_df)

print(f"Evaluated {len(tool_eval_df)} tool calls")

  arize._exporter.client | INFO | Fetching data...

  exporting 40 rows: 100%|█████████████████████████| 40/40 [00:00, 436.27 row/s]

Evaluated 3 tool calls
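
The same row shape (`context.span_id`, `eval.<name>.label`, `eval.<name>.score`) works for an LLM-as-Judge evaluator. A rough sketch follows, in which the judge is any callable that takes a prompt and returns text; in practice it would wrap a chat completion call, ideally inside `suppress_tracing()` so judge calls do not create spans of their own. The prompt wording, the `helpfulness` eval name, and the helper names are assumptions made for illustration:

```python
from typing import Callable

JUDGE_PROMPT = """You are grading a customer support reply.
Question: {question}
Reply: {answer}
Answer with exactly one word: HELPFUL or UNHELPFUL."""


def parse_judge_label(raw: str) -> str:
    """Reduce the judge's free-text reply to HELPFUL, UNHELPFUL, or UNPARSEABLE."""
    text = raw.strip().upper()
    # Check UNHELPFUL first because it contains the substring HELPFUL.
    if text.startswith("UNHELPFUL"):
        return "UNHELPFUL"
    if text.startswith("HELPFUL"):
        return "HELPFUL"
    return "UNPARSEABLE"


def judge_span(span_id: str, question: str, answer: str, judge: Callable[[str], str]) -> dict:
    """Build one evaluation row for a span using the supplied judge callable."""
    label = parse_judge_label(judge(JUDGE_PROMPT.format(question=question, answer=answer)))
    return {
        "context.span_id": span_id,
        "eval.helpfulness.label": label,
        "eval.helpfulness.score": 1.0 if label == "HELPFUL" else 0.0,
    }


# A stub judge stands in for a real model call so the sketch runs offline.
row = judge_span("abc123", "Where is my order?", "It is in transit.", lambda prompt: "Helpful.")
```

A list of such rows can be turned into a DataFrame with `pd.DataFrame(rows)`, matching what `evaluate_tool_results` returns.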





In [21]:
response = client.spans.update_evaluations(
    space_id=os.getenv("ARIZE_SPACE_ID"),
    project_name="my-support-bot",
    dataframe=tool_eval_df,
    force_http=True,
)

  arize.utils.arrow | INFO | ✅ Success! Check out your data at https://app.arize.com/organizations/QWNjb3VudE9yZ2FuaXphdGlvbjoxMzYwMDpZbVlu/spaces/U3BhY2U6MTQyNTM6a3pZWA==/models/modelName/my-support-bot?selectedTab=llmTracing


## Part 9: Session Tracking for Multi-Turn Conversations

Real conversations involve multiple turns. Let's add session tracking to group related traces.

In [22]:
import uuid
import re
from openinference.instrumentation import using_session

class ConversationManager:
    """Manage multi-turn conversations with session tracking."""
    
    def __init__(self):
        self.sessions = {} 
    
    def start_conversation(self) -> str:
        """Start a new conversation and return session ID."""
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = {
            "history": [],
            "context": {}  
        }
        return session_id
    
    def handle_message(self, session_id: str, user_message: str) -> str:
        """Handle a message within a conversation session."""
        
        with using_session(session_id=session_id):
            with tracer.start_as_current_span("conversation-turn") as span:
                span.set_attribute("openinference.span.kind", "CHAIN")
                span.set_attribute("input.value", user_message)
                
                history = self.sessions[session_id]["history"]
                context = self.sessions[session_id]["context"]
                
                prior_messages = []
                for turn in history:
                    prior_messages.append({"role": "user", "content": turn["user"]})
                    prior_messages.append({"role": "assistant", "content": turn["assistant"]})
                
                classification = classify_query(user_message)
                if classification == "ORDER_STATUS":
                    bot_response = handle_order_query(user_message, prior_messages=prior_messages)
                else:
                    bot_response = handle_faq_query(user_message)
                
                history.append({
                    "user": user_message,
                    "assistant": bot_response
                })
                
                self._update_context(context, user_message, bot_response)
                
                span.set_attribute("output.value", bot_response)
                span.set_attribute("turn_number", len(history))
                
                return bot_response
    
    def _update_context(self, context: dict, user_msg: str, bot_msg: str):
        """Extract and store conversation context."""
        order_ids = re.findall(r"\b\d{5}\b", user_msg + " " + bot_msg)
        if order_ids:
            context["order_id"] = order_ids[0]

In [23]:
manager = ConversationManager()

session_id = manager.start_conversation()

print("=== Conversation Started ===")

response1 = manager.handle_message(session_id, "Where is my order?")
print("User: Where is my order?")
print(f"Bot: {response1}\n")

response2 = manager.handle_message(session_id, "It's order 12345")
print("User: It's order 12345")
print(f"Bot: {response2}\n")

response3 = manager.handle_message(session_id, "When will it arrive?")
print("User: When will it arrive?")
print(f"Bot: {response3}")

=== Conversation Started ===
User: Where is my order?
Bot: To help you with your request, could you kindly provide the order ID of your package?

User: It's order 12345
Bot: Your order is currently in transit. The estimated delivery is on the 15th of March, 2024. If you have any more questions or need further assistance, feel free to ask.

User: When will it arrive?
Bot: Your order is still in transit and is estimated to arrive on March 15th, 2024.


In Arize AX, all these traces will be grouped together under the same session ID, allowing you to view the complete conversation thread.
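
`handle_message` replays the entire history on every turn, so the prompt grows with the conversation. A minimal sketch of bounding it to the most recent turns, assuming the same `{"user": ..., "assistant": ...}` history entries used by `ConversationManager`; `build_prior_messages` and the `max_turns` default are illustrative choices:

```python
def build_prior_messages(history: list[dict], max_turns: int = 5) -> list[dict]:
    """Convert the most recent turns into chat messages, oldest first."""
    recent = history[-max_turns:] if max_turns > 0 else []
    messages = []
    for turn in recent:
        messages.append({"role": "user", "content": turn["user"]})
        messages.append({"role": "assistant", "content": turn["assistant"]})
    return messages


history = [
    {"user": "Where is my order?", "assistant": "Could you share the order ID?"},
    {"user": "It's order 12345", "assistant": "Your order is in transit."},
    {"user": "When will it arrive?", "assistant": "March 15, 2024."},
]
window = build_prior_messages(history, max_turns=2)
```

Trimming only changes what is sent to the model. Every turn still runs inside `using_session`, so the session grouping is unaffected.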