
<center>
    <p style="text-align:center">
    <img alt="arize logo" src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/>
        <br>
        <a href="https://docs.arize.com/arize/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/client_python">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg">Community</a>
    </p>
</center>

# <center>LlamaIndex Tracing using Arize</center>

This guide demonstrates how to use Arize to monitor and debug your LLM application using traces and spans. We will build a simple query engine with LlamaIndex that uses retrieval-augmented generation (RAG) to answer questions about the [Arize documentation](https://docs.arize.com/arize/). You can read more about LLM tracing [here](https://docs.arize.com/arize/llm-large-language-models/llm-traces). Arize makes your LLM applications observable by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.
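To make the idea of a trace concrete, here is a minimal sketch that models a trace as a tree of spans and finds the span where most time is spent. The `Span` class, its fields, and the sample numbers are hypothetical and only illustrate the concept; they are not the OpenTelemetry or Arize data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """A toy span: one timed unit of work inside a trace (hypothetical structure)."""

    name: str
    latency_ms: float  # wall-clock time of this span, including its children
    token_count: int = 0  # tokens consumed by this span alone
    children: List["Span"] = field(default_factory=list)


def walk(span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def self_latency_ms(span):
    """Time spent in the span itself, excluding time spent in its children."""
    return span.latency_ms - sum(child.latency_ms for child in span.children)


def slowest_span(root):
    """Return the span with the largest self latency in the trace."""
    return max(walk(root), key=self_latency_ms)


def total_tokens(root):
    """Sum the token counts of every span in the trace."""
    return sum(span.token_count for span in walk(root))


# Made-up numbers for a single RAG query: retrieval followed by synthesis.
trace = Span(
    "query",
    latency_ms=2100.0,
    children=[
        Span(
            "retrieve",
            latency_ms=350.0,
            children=[Span("embedding", latency_ms=300.0, token_count=12)],
        ),
        Span(
            "synthesize",
            latency_ms=1700.0,
            children=[Span("llm", latency_ms=1650.0, token_count=900)],
        ),
    ],
)

print(slowest_span(trace).name, total_tokens(trace))
```

In this toy trace the LLM call dominates both latency and token usage, which is the kind of pattern the Arize trace view is meant to make visible.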

In this tutorial, you will:
1. Use OpenTelemetry and [OpenInference](https://github.com/Arize-ai/openinference/tree/main) to instrument your application so that it sends traces to Arize.
2. Build a simple query engine using LlamaIndex that uses RAG to answer questions about the Arize documentation.
3. Inspect the traces and spans of your application to identify sources of latency and cost.

ℹ️ This notebook requires:
- An OpenAI API key
- An Arize Space & API Key (explained below)


## Step 1: Install Dependencies 📚
Let's set up the notebook with its dependencies.

In [None]:
# External dependencies needed to build the LlamaIndex RAG application and export spans to Arize
!pip install -q gcsfs==2024.10.0 llama-index==0.12.5 opentelemetry-exporter-otlp==1.28.0

# Arize dependencies needed to instrument your LlamaIndex application using OpenTelemetry and OpenInference
!pip install -q "openinference-instrumentation-llama-index>=3.0.4" "arize-otel>=0.7.0"

## Step 2: Tracing your application

Copy the Arize API_KEY and SPACE_ID from your Space Settings page (shown below) to the variables in the cell below.

<center><img src="https://storage.googleapis.com/arize-assets/barcelos/Screenshot%202024-11-11%20at%209.28.27%E2%80%AFPM.png" width="700"></center>

In [None]:
# Import open-telemetry dependencies
from arize.otel import register
from getpass import getpass

# Set up OTEL via our convenience function
tracer_provider = register(
    space_id=getpass("Enter your Arize Space ID:"),
    api_key=getpass("Enter your Arize API Key:"),
    project_name="llamaindex-tracing",
)

# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Finish automatic instrumentation
LlamaIndexInstrumentor().instrument(
    tracer_provider=tracer_provider, skip_dep_check=True
)
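If you rerun this notebook often, typing the credentials each time gets tedious. The following is a small sketch of a helper that reads a secret from an environment variable and only prompts when it is missing. The helper name `read_secret` and the variable names `ARIZE_SPACE_ID` and `ARIZE_API_KEY` are assumptions made for this example, not names taken from the Arize documentation.

```python
import os
from getpass import getpass


def read_secret(env_var, prompt):
    """Return the value of env_var if it is set and non-empty, otherwise prompt for it."""
    value = os.getenv(env_var)
    if not value:
        value = getpass(prompt)
    return value


# Possible usage with the register call above (environment variable names are assumptions):
# tracer_provider = register(
#     space_id=read_secret("ARIZE_SPACE_ID", "Enter your Arize Space ID:"),
#     api_key=read_secret("ARIZE_API_KEY", "Enter your Arize API Key:"),
#     project_name="llamaindex-tracing",
# )
```

This mirrors the pattern used for the OpenAI API key later in the notebook.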

## Step 3: Build Your LlamaIndex RAG Application 📁
Let's import the dependencies we need.

In [None]:
from getpass import getpass

from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Set your OpenAI API key if it is not already set as an environment variable.

In [None]:
import os

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like. Download the pre-built index of the Arize docs from cloud storage and instantiate your storage context.

In [None]:
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)

We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that lets you ask questions over your data: it takes a natural language query and returns a rich response. Query engines are built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities.

In [None]:
Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
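To illustrate what "retrieve, then generate" means, here is a toy sketch of the RAG flow in plain Python. It is not LlamaIndex's implementation: word overlap stands in for embedding similarity, the `llm` argument is any callable you supply, and all function names and sample documents are made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by the number of words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Combine the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def toy_query_engine(query, documents, llm):
    """Retrieve context for the query, then ask the LLM to answer with it."""
    return llm(build_prompt(query, retrieve(query, documents)))


docs = [
    "Arize is an observability platform for machine learning models.",
    "LlamaIndex builds query engines over your data.",
    "Bananas are rich in potassium.",
]

# A stub "LLM" that simply echoes its prompt, so the sketch runs offline.
print(toy_query_engine("What is Arize?", docs, llm=lambda prompt: prompt))
```

Each of these steps (retrieval, prompt construction, LLM call) shows up as its own span once the real query engine is instrumented.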

Let's test our app by asking a question about the Arize documentation:

In [None]:
response = query_engine.query(
    "What is Arize and how can it help me as an AI Engineer?"
)
print(response)

Great! Our application works!

## Step 4: Use our instrumented query engine

We will download a dataset of queries for our RAG application to answer and see the traces appear in Arize.

In [None]:
from urllib.request import urlopen
import json

queries_url = "http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])

queries[:5]
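The loop above assumes every line of the file is a JSON object with a `query` key. If you adapt it to your own JSONL file, a slightly more defensive parser can help. The sketch below skips blank lines and records that lack the field; the function name `parse_queries` and the sample lines are made up for illustration.

```python
import json


def parse_queries(lines, field="query"):
    """Extract `field` from each JSON line, skipping blank lines and records without it."""
    queries = []
    for raw in lines:
        # Accept both bytes (as yielded by urlopen) and str lines.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if field in record:
            queries.append(record[field])
    return queries


# Made-up sample input, shaped like lines read from an HTTP response.
sample_lines = [
    b'{"query": "How do I send traces?"}\n',
    b"\n",
    b'{"other": 1}\n',
    b'{"query": "What is a span?"}\n',
]
parsed = parse_queries(sample_lines)
print(parsed)
```

Since the function accepts any iterable of lines, it could be called as `parse_queries(response)` inside the `with urlopen(...)` block.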

In [None]:
from tqdm import tqdm
from openinference.instrumentation import using_attributes

N1 = 5  # Number of traces for your first session
SESSION_ID_1 = "session-id-1"  # Identifier for your first session
USER_ID_1 = "john_smith"  # Identifier for the user of your first session
METADATA = {"key_bool": True, "key_str": "value1", "key_int": 1}

qa_pairs = []
for query in tqdm(queries[:N1]):
    with using_attributes(
        session_id=SESSION_ID_1,
        user_id=USER_ID_1,
        metadata=METADATA,
    ):
        resp = query_engine.query(query)
        qa_pairs.append((query, resp))

In [None]:
N2 = 3  # Number of traces for your second session
SESSION_ID_2 = "session-id-2"  # Identifier for your second session
USER_ID_2 = "jane_doe"  # Identifier for the user of your second session

for query in tqdm(queries[N1 : N1 + N2]):
    with using_attributes(
        session_id=SESSION_ID_2, user_id=USER_ID_2, metadata=METADATA
    ):
        resp = query_engine.query(query)
        qa_pairs.append((query, resp))
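The two cells above rely on a context manager to tag every span created inside the `with` block. As a conceptual illustration only, the sketch below shows how such a pattern can be built with Python's `contextvars`. It is not OpenInference's implementation, and `with_attributes`, `record_span`, and `recorded_spans` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Attributes that apply to whatever runs in the current context.
_attributes = contextvars.ContextVar("attributes", default={})


@contextmanager
def with_attributes(**attrs):
    """Attach attrs to every span recorded inside the with block."""
    token = _attributes.set({**_attributes.get(), **attrs})
    try:
        yield
    finally:
        # Restore the previous attributes when the block exits.
        _attributes.reset(token)


recorded_spans = []


def record_span(name):
    """Record a toy span, copying the attributes active at call time."""
    recorded_spans.append({"name": name, **_attributes.get()})


with with_attributes(session_id="session-id-1", user_id="john_smith"):
    record_span("query-1")
    record_span("query-2")

with with_attributes(session_id="session-id-2", user_id="jane_doe"):
    record_span("query-3")

record_span("query-4")  # Outside any block, so no session or user attached.

for span in recorded_spans:
    print(span)
```

The takeaway is that the session and user identifiers travel with the execution context, so the code inside the block does not need to pass them along explicitly.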

In [None]:
for q, a in qa_pairs:
    q_msg = f">> QUESTION: {q}"
    print(f"{'-'*len(q_msg)}")
    print(q_msg)
    print(f">> ANSWER: {a}\n")

## Step 5: Log into Arize and explore your application traces 🚀

Log into your Arize account and look for the project whose name matches the `project_name` you passed to `register` (`llamaindex-tracing` in this notebook). If this is a brand new project, you are likely to see the following page while Arize processes your data. Your traces will be available to explore shortly.

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/GENERATIVE/model-loading-tutorial-otlp-llama-index.png" width="700"></center>

After the timer completes, you are ready to navigate and explore your traces.

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/GENERATIVE/llm-tracing-overview-llama-index.png" width="700"></center>

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/Embeddings/GENERATIVE/llm-tracing-detail-llama-index.png" width="700"></center>
