<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>
<h1 align="center">Tracing and Evaluating a LlamaIndex Application</h1>

LlamaIndex provides high-level APIs that enable users to build powerful applications in a few lines of code. However, it can be challenging to understand what is going on under the hood and to pinpoint the cause of issues. Phoenix makes your LLM applications *observable* by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:
- Build a simple query engine using LlamaIndex that uses retrieval-augmented generation to answer questions over a small collection of documents about Arize and Phoenix,
- Record trace data in [OpenInference tracing](https://github.com/Arize-ai/openinference) format using `LlamaIndexInstrumentor` and Phoenix's `register` function,
- Inspect the traces and spans of your application to identify sources of latency and cost,
- Export your trace data as a pandas dataframe and run an [LLM Eval](https://arize.com/docs/phoenix/concepts/llm-evals) to measure the precision@k of the query engine's retrieval step.
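precision@k is the fraction of the top k retrieved documents that are relevant to the query. The sketch below is a minimal illustration, assuming you already have one relevance label per retrieved document (for example, from an LLM relevance eval), ordered by retrieval rank. The function name `precision_at_k` and the sample labels are hypothetical.

```python
def precision_at_k(relevance: list[bool], k: int) -> float:
    """Fraction of the top-k retrieved documents labeled relevant.

    `relevance` holds one label per retrieved document, in rank order.
    If fewer than k documents were retrieved, missing slots count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


# Hypothetical labels for one query: the documents at ranks 1 and 3 were judged relevant.
labels = [True, False, True, False]
print(precision_at_k(labels, 1))  # 1.0
print(precision_at_k(labels, 4))  # 0.5
```

Some definitions divide by the number of documents actually retrieved when that is smaller than k; this sketch always divides by k.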

ℹ️ This notebook requires an OpenAI API key.

## 1. Install Dependencies and Import Libraries

Install Phoenix, LlamaIndex, and OpenAI.

In [None]:
!pip install "arize-phoenix[evals]" "openai>=1" "httpx<0.28" nest-asyncio "openinference-instrumentation-llama-index>=2.0.0" "llama-index-core" "llama-index-llms-openai" "llama-index-embeddings-openai"

Import libraries.

In [None]:
import os
from getpass import getpass

import nest_asyncio
import pandas as pd
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from tqdm import tqdm

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
pd.set_option("display.max_colwidth", 1000)

## 2. Set Up Phoenix 

You'll need Phoenix running to collect and visualize trace data from your LlamaIndex application. You have two options:

### Option 1: Local Phoenix (Free)

Run Phoenix locally on your machine. Install Phoenix and launch it in a separate terminal:

```bash
pip install arize-phoenix
phoenix serve
```

This will start Phoenix at `http://localhost:6006`. You can view the Phoenix UI by opening this URL in your browser.
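Before sending traces, you can confirm that the local server is listening. The following minimal sketch uses only the standard library and assumes the default host and port shown above (`localhost:6006`). It only checks that something accepts TCP connections on that port, not that the process is Phoenix. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 6006, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if is_port_open():
    print("Something is listening on localhost:6006 (likely Phoenix).")
else:
    print("Nothing is listening on localhost:6006. Is `phoenix serve` running?")
```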

### Option 2: Phoenix Cloud (Hosted)

Use Phoenix Cloud for a hosted solution. Sign up for a free account at [Phoenix Cloud](https://app.phoenix.arize.com), copy your collector endpoint and API key, and set them as environment variables:

```bash
export PHOENIX_COLLECTOR_ENDPOINT=your_endpoint_here
export PHOENIX_API_KEY=your_api_key_here
```

`phoenix.otel.register` (called later in this notebook) reads these environment variables and sends traces to Phoenix Cloud.
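A shell `export` only affects processes started from that shell, so a notebook kernel that is already running will not see those variables. One option is to set them from inside the notebook before registering the tracer. This minimal sketch reuses the placeholder values shown above; replace them with the endpoint and key from your Phoenix Cloud account, and skip this cell if you are running Phoenix locally.

```python
import os

# Placeholders: replace with the collector endpoint and API key from Phoenix Cloud.
# setdefault() keeps any value that is already present in the environment.
os.environ.setdefault("PHOENIX_COLLECTOR_ENDPOINT", "your_endpoint_here")
os.environ.setdefault("PHOENIX_API_KEY", "your_api_key_here")

# Print the endpoint only; avoid echoing the API key into notebook output.
print("Endpoint:", os.environ["PHOENIX_COLLECTOR_ENDPOINT"])
```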


## 3. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.

In [None]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

## 4. Build Your LlamaIndex Application

This example creates a simple `VectorStoreIndex` with documents about Arize and Phoenix for demonstration purposes. You can replace these with your own documents.

Let's configure LlamaIndex and create a document collection:

In [None]:
# Configure LlamaIndex settings
Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Create a simple document collection about Arize and Phoenix for demo
documents = [
    Document(
        text="Phoenix is Arize's open-source observability tool for LLM applications, providing tracing and evaluation capabilities. It helps developers understand LLM application performance and debug issues."
    ),
    Document(
        text="LLM tracing with Phoenix helps you understand the performance and behavior of your language model applications. You can track token usage, latency, and see the complete flow of your RAG pipeline."
    ),
    Document(
        text="Retrieval-augmented generation (RAG) combines information retrieval with language generation for better responses. It allows LLMs to access external knowledge beyond their training data."
    ),
    Document(
        text="Model evaluation is crucial for understanding how well your ML models perform on real-world data. Phoenix provides LLM evals to assess quality, relevance, and hallucinations."
    ),
    Document(
        text="Vector embeddings are numerical representations of text that capture semantic meaning. They enable similarity search and retrieval in RAG systems."
    ),
    Document(
        text="OpenInference is an open standard for LLM application observability. It defines how to capture and store trace data from LLM applications."
    ),
    Document(
        text="Span annotations in Phoenix allow you to add custom metadata and evaluations to your traces, helping you analyze and improve your LLM applications."
    ),
]

Enable Phoenix tracing with `LlamaIndexInstrumentor`. The `register` call configures an OpenTelemetry tracer provider that sends traces to Phoenix, and the instrumentor records LlamaIndex operations as OpenInference traces. OpenInference is an open-source standard for capturing and storing LLM application traces, which lets LLM applications integrate with observability tools such as Phoenix.

In [None]:
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

from phoenix.otel import register

tracer_provider = register(project_name="llamaindex-tracing-tutorial", protocol="http/protobuf")
LlamaIndexInstrumentor().instrument(
    tracer_provider=tracer_provider,
)

We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic LlamaIndex interface for asking questions over your data: it takes a natural language query and returns a rich response. Query engines are built on top of retrievers, which fetch the most relevant chunks of your data, and you can compose multiple query engines to achieve more advanced capabilities.
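Conceptually, the retrieval step that a query engine builds on embeds the query, scores every stored chunk by vector similarity, and keeps the top k. The sketch below illustrates that idea with cosine similarity over hand-made 3-dimensional vectors. The vectors, chunk names, and helper functions are hypothetical stand-ins for real embeddings, and this is not LlamaIndex's actual implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda name: cosine_similarity(query_vec, chunks[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings"; real embeddings have hundreds or thousands of dimensions.
chunks = {
    "phoenix_overview": [0.9, 0.1, 0.0],
    "llm_tracing": [0.7, 0.6, 0.1],
    "vector_embeddings": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # pretend embedding of "What does Phoenix do?"

print(top_k(query, chunks))  # ['phoenix_overview', 'llm_tracing']
```

The selected chunks are then inserted into the LLM prompt as context, which is the "augmented" part of retrieval-augmented generation.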

In [None]:
# Create the vector index from our documents
print("Creating vector index...")
index = VectorStoreIndex.from_documents(documents)

# Create the query engine
query_engine = index.as_query_engine()
print("✅ Query engine ready!")

## 5. Run Your Query Engine and View Your Traces in Phoenix

Let's create some sample queries that relate to our document collection and test our query engine:

In [None]:
# Sample queries that relate to our document collection
queries = [
    "What is Arize and what does it help with?",
    "How does Phoenix help with LLM observability?",
    "What is retrieval-augmented generation?",
    "How can I evaluate my AI Agents?",
    "What are vector embeddings used for?",
    "What is OpenInference?",
    "How do span annotations work in Phoenix?",
    "What kind of monitoring does Arize provide for AI Agents?",
]

print("Sample queries:")
for i, query in enumerate(queries, 1):
    print(f"{i}. {query}")
print(f"\nTotal queries: {len(queries)}")

Let's run these queries and view the traces in Phoenix.

In [None]:
for query in tqdm(queries):
    response = query_engine.query(query)
    print(f"Query: {query}")
    print(f"Response: {response}\n")
    print("-" * 50)

And just for fun, ask your own question!

In [None]:
response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")
print(response)

Check the Phoenix UI as your queries run. Your traces should appear in real time.

<img src="https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/llama-index-knowledge-base-tutorial/Screenshot%202025-09-08%20at%206.53.21%E2%80%AFPM.png" alt="Trace Details View on Phoenix" style="width:100%; height:auto;">
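The trace view reports latency and token counts for each span. To illustrate the kind of analysis this supports, the minimal sketch below ranks spans by latency and totals token usage. The `Span` class and its `name`, `start_ms`, `end_ms`, and `tokens` fields are hypothetical and are not the schema Phoenix uses; in practice Phoenix performs this analysis for you in its UI.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span record (not Phoenix's actual schema)."""
    name: str
    start_ms: float
    end_ms: float
    tokens: int = 0

    @property
    def latency_ms(self) -> float:
        return self.end_ms - self.start_ms


# Hand-written example spans from one imaginary query-engine call.
spans = [
    Span("query", 0, 1850),
    Span("retrieve", 5, 180),
    Span("embedding", 10, 170),
    Span("synthesize", 185, 1840),
    Span("llm", 190, 1835, tokens=412),
]

# Rank every span except the root "query" span by latency, slowest first.
slowest = sorted(spans[1:], key=lambda s: s.latency_ms, reverse=True)
total_tokens = sum(s.tokens for s in spans)

for s in slowest:
    print(f"{s.name:<12} {s.latency_ms:>7.0f} ms")
print(f"total tokens: {total_tokens}")
```

Because nested spans overlap with their parents (here `llm` runs inside `synthesize`), a parent's latency already includes its children's, so comparing sibling spans is usually more informative than comparing a parent with its own child.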