

<center> <img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/> </center>

# <center>Tracing via OTLP using Arize and Phoenix</center>

This guide demonstrates how to use Arize to monitor and debug your LLM application using traces and spans. We're going to build a simple query engine using LlamaIndex and retrieval-augmented generation (RAG) to answer questions about the [Arize documentation](https://docs.arize.com/arize/). You can read more about LLM tracing [here](https://docs.arize.com/arize/llm-large-language-models/llm-traces). Arize and Phoenix make your LLM applications observable by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:
1. Use OpenTelemetry and [OpenInference](https://github.com/Arize-ai/openinference/tree/main) to instrument your application and send traces via OTLP to Arize and Phoenix.
2. Build a simple query engine using LlamaIndex that uses RAG to answer questions about the Arize documentation.
3. Inspect the traces and spans of your application to identify sources of latency and cost

ℹ️ This notebook requires:
- An OpenAI API key
- An Arize Space & API Key (explained below)


## Step 1: Install Dependencies 📚
Let's set up the notebook with its dependencies.

In [1]:
# Dependencies needed to build the LlamaIndex RAG application
# Version specifiers are quoted so the shell does not treat ">=" as an output redirect
!pip install gcsfs 'openai>=1' 'llama-index>=0.10.3'

# Dependencies needed to export spans and send them to our collectors: Arize and/or Phoenix
!pip install opentelemetry-exporter-otlp 'openinference-instrumentation-llama-index>=1.3.0'

# Install Phoenix if you want to send traces to Arize and Phoenix simultaneously.
!pip install arize-phoenix


## Step 2: OTLP Instrumentation
Let's import the dependencies we need.

In [2]:
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

### Step 2.a: Define an exporter to Phoenix
We need to start a `phoenix` session to act as a collector for the spans we export.

In [3]:
import phoenix as px
session = px.launch_app()

🌍 To view the Phoenix app in your browser, visit https://7abbvil1cr1-496ff2e9c6d22116-6006-colab.googleusercontent.com/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


Next, we create an OTLP exporter that points at the endpoint of the Phoenix session launched above. Note that we use HTTP to export to Phoenix, which acts as a collector.

In [4]:
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter as PhoenixOTLPSpanExporter

In [5]:
phoenix_endpoint = "http://127.0.0.1:6006/v1/traces"
span_phoenix_exporter = PhoenixOTLPSpanExporter(endpoint=phoenix_endpoint)
span_phoenix_processor = SimpleSpanProcessor(span_exporter=span_phoenix_exporter)

### Step 2.b: Define an exporter to Arize
Creating an Arize exporter is very similar to what we did for Phoenix. We just need two more things:
* Space and API keys, which will be sent as headers
* Model ID and version, sent as resource attributes

Copy the Arize API_KEY and SPACE_ID from your Space Settings page (shown below) to the variables in the cell below. We will also be setting up some metadata to use across all logging.

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/copy-id-and-key.png" width="700"></center>

In [7]:
SPACE_ID = "U3BhY2U6NjM3MjoyMXJG" # Change this line
API_KEY = "416ad605925bf226fd9" # Change this line

model_id = "tutorial-otlp-tracing-llama-index-rag"
model_version = "1.0"

if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")
else:
    print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")

✅ Import and Setup Arize Client Done! Now we can start using Arize!


Next, we create an OTLP exporter that points at the Arize endpoint. Note that we use gRPC to export to Arize, which acts as a collector.

In [8]:
import os
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter as ArizeOTLPSpanExporter

In [9]:
# Set the Space and API keys as headers
os.environ['OTEL_EXPORTER_OTLP_TRACES_HEADERS']=f"space_id={SPACE_ID},api_key={API_KEY}"

# Set the model id and version as resource attributes
resource = Resource(
    attributes={
        "model_id":model_id,
        "model_version":model_version,
    }
)

arize_endpoint = "https://otlp.arize.com/v1"
span_arize_exporter = ArizeOTLPSpanExporter(endpoint=arize_endpoint)
span_arize_processor = SimpleSpanProcessor(span_exporter=span_arize_exporter)
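
The `OTEL_EXPORTER_OTLP_TRACES_HEADERS` value set above is a comma-separated list of `key=value` pairs. As a minimal sketch, the helper below builds that string and rejects empty or placeholder credentials before anything is exported. `build_otlp_headers` is a hypothetical name and is not part of Arize or OpenTelemetry; the placeholder values are assumptions taken from the check in the earlier cell.

```python
def build_otlp_headers(space_id: str, api_key: str) -> str:
    """Return a 'key=value,key=value' string for the OTLP headers variable.

    Hypothetical helper for illustration only.
    """
    placeholders = {"", "SPACE_ID", "API_KEY"}
    if space_id in placeholders or api_key in placeholders:
        raise ValueError("Set SPACE_ID and API_KEY before exporting traces")
    for value in (space_id, api_key):
        # A comma inside a value would be ambiguous in a comma-separated list.
        if "," in value:
            raise ValueError("Header values must not contain commas")
    return f"space_id={space_id},api_key={api_key}"


print(build_otlp_headers("my-space", "my-key"))
```

The returned string can be assigned to `os.environ['OTEL_EXPORTER_OTLP_TRACES_HEADERS']` in place of the inline f-string.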

### Step 2.c: Define a trace provider and initiate the instrumentation


In [10]:
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry import trace as trace_api
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

In [11]:
tracer_provider = trace_sdk.TracerProvider(resource=resource)
tracer_provider.add_span_processor(span_processor=span_phoenix_processor)
tracer_provider.add_span_processor(span_processor=span_arize_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)
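
Registering two span processors on one tracer provider means every finished span is handed to both, so Phoenix and Arize each receive a copy. The toy classes below are a conceptual sketch of that fan-out in plain Python. `ToyProvider` and `ToyProcessor` are hypothetical names and do not reproduce the OpenTelemetry SDK's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProcessor:
    """Stand-in for a span processor: records every span it is given."""

    name: str
    received: list = field(default_factory=list)

    def on_end(self, span: str) -> None:
        self.received.append(span)


@dataclass
class ToyProvider:
    """Stand-in for a tracer provider that fans spans out to processors."""

    processors: list = field(default_factory=list)

    def add_span_processor(self, processor: ToyProcessor) -> None:
        self.processors.append(processor)

    def end_span(self, span: str) -> None:
        # Each registered processor sees the same finished span.
        for processor in self.processors:
            processor.on_end(span)


provider = ToyProvider()
phoenix, arize = ToyProcessor("phoenix"), ToyProcessor("arize")
provider.add_span_processor(phoenix)
provider.add_span_processor(arize)
provider.end_span("query")
provider.end_span("retrieve")
print(phoenix.received, arize.received)
```

To send traces to only one destination, register just that processor.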

In [12]:
# If you are running the instrumentation from a Colab environment, set skip_dep_check to True
# For more information check https://github.com/Arize-ai/openinference/issues/100
try:
    import google.colab  # noqa: F401

    IN_COLAB = True
except ImportError:
    # Catch only import failures so unrelated errors are not silently hidden
    IN_COLAB = False

LlamaIndexInstrumentor().instrument(skip_dep_check=IN_COLAB)
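
If you prefer not to import `google.colab` just to detect the environment, the standard library's `importlib.util.find_spec` can check whether a module can be located. This is a sketch of an alternative, not a required change; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located on this system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `google`) is missing or malformed.
        return False


IN_COLAB = module_available("google.colab")
print(IN_COLAB)
```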

## Step 3: Build Your LlamaIndex RAG Application 📁
Let's import the dependencies we need.

In [13]:
import json
from getpass import getpass

import openai
from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from tqdm import tqdm

Set your OpenAI API key if it is not already set as an environment variable.

In [14]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

🔑 Enter your OpenAI API key: ··········


This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like. Download our pre-built index of the Arize docs from cloud storage and instantiate your storage context.

In [15]:
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)



We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that allows you to ask questions over your data. It takes in a natural language query and returns a rich response. It is built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities.

In [16]:
Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
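
Conceptually, a retrieval-augmented query engine performs two steps: it retrieves the documents most relevant to the query, then passes them to the LLM as context. The toy sketch below illustrates that flow in plain Python with word-overlap scoring and no actual generation step. It is not how LlamaIndex implements `as_query_engine()`, and every name and document in it is hypothetical.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # Keep only documents with at least one shared word, best first.
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the prompt an LLM would receive; generation is left out."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


docs = [
    "drift metrics compare two distributions",
    "spans record latency and token counts",
    "a trace is a tree of spans",
]
question = "what do spans record"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

In the instrumented application, the retrieval and generation steps each show up as their own spans within a trace.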

Let's test asking a question:

In [17]:
response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")
print(response)

Arize is a machine learning observability platform that assists AI Engineers in monitoring, troubleshooting, and explaining their models. It enables you to monitor your model's real-time performance, even when there's a delay in receiving ground truth or feedback. The platform aids in identifying the root causes of model failures or performance degradation through tracing and explainability features. Additionally, it allows for the comparison of performance across multiple models and provides insights into drift, data quality, and model fairness/bias metrics. Arize is designed to integrate seamlessly with your existing machine learning infrastructure, offering flexibility in deployment as either a SaaS or an on-premise solution. This makes it a versatile tool for AI Engineers working in teams of any size, from individual contributors to large enterprises.


Great! Our application works. Let's move on to sending queries through the instrumented query engine.

## Step 4: Use our instrumented query engine

We will download a dataset of queries for our RAG application to answer, then watch the traces appear in Arize and Phoenix.

In [18]:
from urllib.request import urlopen

queries_url = "http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])

queries[:5]

['How do I use the SDK to upload a ranking model?',
 'What drift metrics are supported in Arize?',
 'Does Arize support batch models?',
 'Does Arize support training data?',
 'How do I configure a threshold if my data has seasonality trends?']
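
The loop above assumes every line of the file is a valid JSON object with a `query` field. A slightly more defensive variant, sketched here against in-memory bytes rather than the real URL, skips blank lines and records that lack the field. `parse_queries` is a hypothetical helper and the sample records are made up for illustration.

```python
import io
import json


def parse_queries(lines) -> list[str]:
    """Extract the "query" field from an iterable of JSONL byte lines."""
    queries = []
    for raw in lines:
        text = raw.decode("utf-8").strip()
        if not text:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(text)
        if "query" in record:
            queries.append(record["query"])
    return queries


sample = io.BytesIO(
    b'{"query": "first question"}\n\n{"other": 1}\n{"query": "second question"}\n'
)
print(parse_queries(sample))
```

Because the object returned by `urlopen` is also iterable line by line, the same function could be passed that response directly.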

In [19]:
from tqdm import tqdm
from openinference.instrumentation import using_attributes

N1 = 5 # Number of traces for your first session
SESSION_ID_1 = "session-id-1" # Identifier for your first session
USER_ID_1 = "john_smith" # Identifier for the user of your first session
METADATA = {
    "key_bool": True,
    "key_str": "value1",
    "key_int": 1
}

qa_pairs = []
for query in tqdm(queries[:N1]):
    with using_attributes(
        session_id=SESSION_ID_1,
        user_id=USER_ID_1,
        metadata=METADATA,
    ):
        resp = query_engine.query(query)
        qa_pairs.append((query,resp))

100%|██████████| 5/5 [00:41<00:00,  8.33s/it]


In [20]:
N2 = 3 # Number of traces for your second session
SESSION_ID_2 = "session-id-2" # Identifier for your second session
USER_ID_2 = "jane_doe" # Identifier for the user of your second session

for query in tqdm(queries[N1:N1+N2]):
    with using_attributes(
        session_id=SESSION_ID_2,
        user_id=USER_ID_2,
        metadata=METADATA
    ):
        resp = query_engine.query(query)
        qa_pairs.append((query,resp))

100%|██████████| 3/3 [00:13<00:00,  4.39s/it]
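
`using_attributes` is a context manager: the session ID, user ID, and metadata apply only to spans created inside the `with` block. The sketch below shows one way such scoping can be built with the standard library's `contextvars`. It is an illustration under that assumption, not OpenInference's actual implementation, and `toy_using_attributes` and `current_attributes` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Holds the attributes that apply to the currently running code.
_attributes = contextvars.ContextVar("attributes", default={})


@contextmanager
def toy_using_attributes(**attrs):
    """Make `attrs` visible to code running inside the `with` block only."""
    token = _attributes.set({**_attributes.get(), **attrs})
    try:
        yield
    finally:
        _attributes.reset(token)  # restore whatever was set before


def current_attributes() -> dict:
    """Return a copy of the attributes a span created now would carry."""
    return dict(_attributes.get())


with toy_using_attributes(session_id="session-id-1", user_id="john_smith"):
    inside = current_attributes()
outside = current_attributes()
print(inside, outside)
```

This scoping is why the two loops above can tag their traces with different sessions and users without passing those values into `query_engine.query`.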


In [21]:
for q,a in qa_pairs:
    q_msg = f">> QUESTION: {q}"
    print(f"{'-'*len(q_msg)}")
    print(q_msg)
    print(f">> ANSWER: {a}\n")

------------------------------------------------------------
>> QUESTION: How do I use the SDK to upload a ranking model?
>> ANSWER: To upload a ranking model using the SDK, you would typically follow a series of steps that align with the SDK's documentation and functionalities. While the specific instructions for uploading a ranking model can vary depending on the SDK you are using, a general approach might include:

1. **Initialization**: Start by initializing the SDK in your development environment. This usually involves importing the necessary libraries and setting up any required credentials or API keys for authentication.

2. **Model Preparation**: Ensure your ranking model is ready for upload. This might involve training the model on your data, evaluating its performance using appropriate metrics (such as NDCG, GroupAUC, MAP, MRR, AUC, PR-AUC, Log Loss), and saving it in a format supported by the SDK.

3. **Define Metadata**: Prepare any metadata that needs to accompany your mod