

<center> <img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/> </center>

# <center>Tracing via OTLP using Arize</center>

This guide demonstrates how to use Arize to monitor and debug your LLM application using traces and spans. We're going to build a simple query engine using LlamaIndex and retrieval-augmented generation (RAG) to answer questions about the [Arize documentation](https://docs.arize.com/arize/). You can read more about LLM tracing [here](https://docs.arize.com/arize/llm-large-language-models/llm-traces). Arize makes your LLM applications observable by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.
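The following minimal sketch makes "surfacing problematic spans based on latency" concrete by ranking spans by duration. The `SpanRecord` class and its timings are made up for illustration. Real OpenTelemetry spans carry far more fields, and Arize performs this kind of analysis for you on the exported spans.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    # Hypothetical, simplified span: a name plus start/end timestamps in nanoseconds.
    name: str
    start_ns: int
    end_ns: int

    @property
    def latency_ms(self) -> float:
        # Convert the nanosecond duration to milliseconds.
        return (self.end_ns - self.start_ns) / 1_000_000


def slowest_spans(spans, k=3):
    """Return the k spans with the highest latency, slowest first."""
    return sorted(spans, key=lambda s: s.latency_ms, reverse=True)[:k]


# Made-up timings for three steps of a RAG call.
spans = [
    SpanRecord("retrieve", 0, 120_000_000),
    SpanRecord("embedding", 0, 40_000_000),
    SpanRecord("llm", 120_000_000, 2_120_000_000),
]
for span in slowest_spans(spans, k=2):
    print(f"{span.name}: {span.latency_ms:.0f} ms")  # llm: 2000 ms, then retrieve: 120 ms
```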

In this tutorial, you will:
1. Use OpenTelemetry and [OpenInference](https://github.com/Arize-ai/openinference/tree/main) to instrument your application and send traces to Arize via OTLP.
2. Build a simple query engine using LlamaIndex that uses RAG to answer questions about the Arize documentation
3. Inspect the traces and spans of your application to identify sources of latency and cost
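Cost is typically estimated from the token counts recorded on LLM spans. The sketch below sums those counts across one trace's span attributes and applies per-1K-token prices. It assumes OpenInference's token-count attribute names (`llm.token_count.prompt` and `llm.token_count.completion`). The prices are placeholders you would supply, not real OpenAI pricing.

```python
# Assumed OpenInference attribute names for token usage on LLM spans.
PROMPT_KEY = "llm.token_count.prompt"
COMPLETION_KEY = "llm.token_count.completion"


def trace_cost_usd(span_attributes, prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of a trace from the token counts on its spans.

    Spans without token counts (e.g. retrieval spans) contribute nothing.
    """
    prompt_tokens = sum(attrs.get(PROMPT_KEY, 0) for attrs in span_attributes)
    completion_tokens = sum(attrs.get(COMPLETION_KEY, 0) for attrs in span_attributes)
    return (
        prompt_tokens / 1000 * prompt_price_per_1k
        + completion_tokens / 1000 * completion_price_per_1k
    )


# Made-up attributes for a trace with two LLM spans and one non-LLM span.
example_span_attributes = [
    {PROMPT_KEY: 1500, COMPLETION_KEY: 200},
    {"input.value": "What is Arize?"},  # no token counts on this span
    {PROMPT_KEY: 500, COMPLETION_KEY: 100},
]
# Placeholder prices per 1K tokens, for illustration only.
print(round(trace_cost_usd(example_span_attributes, 0.01, 0.03), 4))  # 0.029
```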

ℹ️ This notebook requires:
- An OpenAI API key
- An Arize Space & API Key (explained below)


## Step 1: Install Dependencies 📚
Let's get the notebook set up with its dependencies.

In [None]:
# Dependencies needed to build the LlamaIndex RAG application
# Quote the version specifiers so the shell does not treat ">" as output redirection
!pip install gcsfs "openai>=1" "llama-index>=0.10.3"

# Dependencies needed to export spans and send them to Arize
!pip install opentelemetry-exporter-otlp openinference-instrumentation-llama-index

## Step 2: OTLP Instrumentation
Let's import the dependencies we need.

In [None]:
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

### Step 2.a: Define an exporter to Arize
Creating an Arize exporter is simple. We just need two things:
* Space and API keys, which will be sent as headers
* Model ID and version, which will be sent as resource attributes

Copy the Arize API_KEY and SPACE_KEY from your Space Settings page (shown below) into the variables in the cell below. We also define some metadata (a model ID and version) that will be attached to everything we log.

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700"></center>

In [None]:
SPACE_KEY = "SPACE_KEY" # Change this line
API_KEY = "API_KEY" # Change this line

model_id = "tutorial-otlp-tracing-llama-index-rag"
model_version = "1.0"

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Arize keys set! Now we can configure the exporter.")
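As an alternative to pasting keys into the notebook, you can read them from environment variables. The following is a minimal sketch of that approach. `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` are hypothetical variable names chosen for this example and are not names that Arize itself reads. Like the cell above, the function rejects missing keys or values still set to the placeholders.

```python
import os

# Values that mean "not configured yet": empty, or the notebook's placeholders.
PLACEHOLDER_VALUES = {"", "SPACE_KEY", "API_KEY"}


def load_arize_keys(env=os.environ):
    """Return (space_key, api_key) from environment variables.

    ARIZE_SPACE_KEY and ARIZE_API_KEY are hypothetical names used for this sketch.
    Raises ValueError if either key is unset or still a placeholder.
    """
    names = ("ARIZE_SPACE_KEY", "ARIZE_API_KEY")
    values = [env.get(name, "").strip() for name in names]
    missing = [name for name, value in zip(names, values) if value in PLACEHOLDER_VALUES]
    if missing:
        raise ValueError(f"❌ Set these environment variables: {', '.join(missing)}")
    return values[0], values[1]
```

In the notebook you would then write `SPACE_KEY, API_KEY = load_arize_keys()`.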

Next, we create an OTLP exporter that sends spans to the Arize endpoint, `https://otlp.arize.com/v1`. Note that we use gRPC to export traces to Arize, which acts as the OTLP collector.

In [None]:
import os
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

In [None]:
# Set the Space and API keys as headers (the OTLP exporter reads this variable when it is created)
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = f"space_key={SPACE_KEY},api_key={API_KEY}"

# Set the model id and version as resource attributes
resource = Resource(
    attributes={
        "model_id": model_id,
        "model_version": model_version,
    }
)

endpoint = "https://otlp.arize.com/v1"
span_exporter = OTLPSpanExporter(endpoint=endpoint)
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)
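The cell above writes the headers as comma-separated `key=value` pairs. As we understand the OpenTelemetry specification, values in `OTEL_EXPORTER_OTLP_TRACES_HEADERS` may be percent-encoded, so a key containing `,`, `=`, or `%` would need escaping. The helpers below are a small, hypothetical sketch of formatting and parsing that string. They are not part of the OpenTelemetry API.

```python
from urllib.parse import quote, unquote


def format_otlp_headers(headers: dict) -> str:
    """Render headers as comma-separated key=value pairs, percent-encoding each value."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in headers.items())


def parse_otlp_headers(raw: str) -> dict:
    """Inverse of format_otlp_headers: split on commas, then on the first '='."""
    headers = {}
    for item in raw.split(","):
        if "=" not in item:
            continue  # skip empty or malformed entries
        key, value = item.split("=", 1)
        headers[key.strip()] = unquote(value.strip())
    return headers


raw = format_otlp_headers({"space_key": "SPACE_KEY", "api_key": "API_KEY"})
print(raw)  # space_key=SPACE_KEY,api_key=API_KEY
```

To use it in the notebook, you would set `os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = format_otlp_headers({"space_key": SPACE_KEY, "api_key": API_KEY})` before creating the exporter.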

### Step 2.b: Define a trace provider and initiate the instrumentation


In [None]:
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry import trace as trace_api
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

In [None]:
tracer_provider = trace_sdk.TracerProvider(resource=resource)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)

In [None]:
# If you are running the instrumentation from a Colab environment, set skip_dep_check to True
# For more information, see https://github.com/Arize-ai/openinference/issues/100
try:
    import google.colab  # noqa: F401 (only importable inside Colab)
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

LlamaIndexInstrumentor().instrument(skip_dep_check=IN_COLAB)
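The tracer provider hands every finished span to each registered span processor, and each processor passes spans to its exporter. `SimpleSpanProcessor`, used above, exports each span synchronously as soon as it ends. `BatchSpanProcessor`, also in `opentelemetry.sdk.trace.export`, instead queues spans and exports them in batches from a background thread. The toy classes below are a simplified illustration of that difference only. They are not the OpenTelemetry SDK's implementation: they have no threading, no timers, and no `on_start` hook.

```python
class ListExporter:
    """Stand-in exporter that records each batch it is asked to send."""

    def __init__(self):
        self.batches = []

    def export(self, spans):
        self.batches.append(list(spans))


class ToySimpleProcessor:
    """Exports every span the moment it ends (the SimpleSpanProcessor idea)."""

    def __init__(self, exporter):
        self.exporter = exporter

    def on_end(self, span):
        self.exporter.export([span])

    def force_flush(self):
        pass  # nothing is ever buffered


class ToyBatchProcessor:
    """Buffers ended spans and exports them in groups (the BatchSpanProcessor idea)."""

    def __init__(self, exporter, max_batch_size=3):
        self.exporter = exporter
        self.max_batch_size = max_batch_size
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch_size:
            self.force_flush()

    def force_flush(self):
        if self.buffer:
            self.exporter.export(self.buffer)
            self.buffer = []


class ToyTracerProvider:
    """Fans every finished span out to all registered processors."""

    def __init__(self):
        self.processors = []

    def add_span_processor(self, processor):
        self.processors.append(processor)

    def end_span(self, span):
        for processor in self.processors:
            processor.on_end(span)

    def force_flush(self):
        for processor in self.processors:
            processor.force_flush()


simple_exporter, batch_exporter = ListExporter(), ListExporter()
toy_provider = ToyTracerProvider()
toy_provider.add_span_processor(ToySimpleProcessor(simple_exporter))
toy_provider.add_span_processor(ToyBatchProcessor(batch_exporter, max_batch_size=3))
for name in ["embedding", "retrieve", "synthesize", "llm"]:
    toy_provider.end_span(name)
toy_provider.force_flush()  # pushes out the final partial batch
print(simple_exporter.batches)  # [['embedding'], ['retrieve'], ['synthesize'], ['llm']]
print(batch_exporter.batches)   # [['embedding', 'retrieve', 'synthesize'], ['llm']]
```

With a real batch processor, calling `tracer_provider.force_flush()` before the notebook shuts down makes sure any queued spans reach Arize.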

## Step 3: Build Your Llama Index RAG Application 📁
Let's import the dependencies we need.

In [None]:
import json
from getpass import getpass

import openai
from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from tqdm import tqdm

Set your OpenAI API key if it is not already set as an environment variable.

In [None]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like. Download our pre-built index of the Arize docs from cloud storage and instantiate your storage context.

In [None]:
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)

We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic LlamaIndex interface for asking questions over your data: it takes in a natural language query and returns a rich response. Query engines are built on top of retrievers, and you can compose multiple query engines to build more advanced capabilities.

In [None]:
Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
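To show the retrieve-then-generate structure behind the query engine, here is a toy sketch in plain Python. It is not how LlamaIndex implements it. A vector index typically retrieves by embedding similarity using `Settings.embed_model` and then asks `Settings.llm` to synthesize the answer. This sketch uses word overlap instead and stops at building the augmented prompt. `ToyRetriever`, `ToyQueryEngine`, and the documents are all hypothetical.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyRetriever:
    documents: list

    def retrieve(self, query: str, top_k: int = 2) -> list:
        # Rank documents by word overlap with the query (ties keep their original order).
        query_words = tokenize(query)
        ranked = sorted(
            self.documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True
        )
        return ranked[:top_k]


@dataclass
class ToyQueryEngine:
    retriever: ToyRetriever

    def build_prompt(self, query: str) -> str:
        # Retrieval step, then the augmented prompt a real engine would send to an LLM.
        context = "\n".join(f"- {doc}" for doc in self.retriever.retrieve(query))
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Traces are made of spans.",
    "Spans record latency and token counts.",
    "Embeddings map text to vectors.",
]
toy_engine = ToyQueryEngine(ToyRetriever(docs))
print(toy_engine.build_prompt("How do spans record latency?"))
```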

Let's test asking a question:

In [None]:
response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")     
print(response)

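Great! Our application works, and because we instrumented LlamaIndex in Step 2, this query has already been traced and exported to Arize. Next, let's run a batch of queries through the instrumented engine.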

## Step 4: Use our instrumented query engine

We will download a dataset of queries, run them through our RAG application, and watch the resulting traces appear in Arize.

In [None]:
from urllib.request import urlopen

queries_url = "https://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
# Use a distinct name so we don't overwrite `response` from the earlier test query
with urlopen(queries_url) as jsonl_file:
    for line in jsonl_file:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])

queries[:5]
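The loop above calls `json.loads` on every line, so a blank line (for example, a trailing newline at the end of the file) would raise an error. The following is a slightly more defensive JSON Lines reader, written as a hypothetical helper. It accepts either bytes lines, such as those from `urlopen`, or text lines.

```python
import io
import json


def read_queries(lines, field="query"):
    """Parse JSON Lines, skipping blank lines, and collect one field per record."""
    queries = []
    for raw in lines:
        text = raw.decode("utf-8").strip() if isinstance(raw, bytes) else raw.strip()
        if not text:
            continue  # tolerate blank or trailing lines
        queries.append(json.loads(text)[field])
    return queries


sample = io.BytesIO(b'{"query": "What is a span?"}\n\n{"query": "How do I log traces?"}\n')
print(read_queries(sample))  # ['What is a span?', 'How do I log traces?']
```

To use it with the download, you would write `with urlopen(queries_url) as f: queries = read_queries(f)`.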

In [None]:
N = 10 # Sample size
qa_pairs = []
for query in tqdm(queries[:N]):
    resp = query_engine.query(query)
    qa_pairs.append((query, resp))
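The cell above always sends the first `N` queries. If you would rather trace a reproducible random subset of the dataset, a seeded sample is one option. `sample_queries` is a hypothetical helper for this sketch.

```python
import random


def sample_queries(queries, n, seed=0):
    """Draw a reproducible random sample of up to n distinct queries."""
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    return rng.sample(queries, k=min(n, len(queries)))
```

With this helper, the loop above would become `for query in tqdm(sample_queries(queries, N)):`.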

In [None]:
for q, a in qa_pairs:
    q_msg = f">> QUESTION: {q}"
    print(f"{'-'*len(q_msg)}")
    print(q_msg)
    print(f">> ANSWER: {a}\n")