<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg?__hstc=259489365.a667dfafcfa0169c8aee4178d115dc81.1733501603539.1733501603539.1733501603539.1&__hssc=259489365.1.1733501603539&__hsfp=3822854628&submissionGuid=381a0676-8f38-437b-96f2-fc10875658df#/shared-invite/email">Community</a>
    </p>
</center>

# <center>Tracing and Evaluating a Weaviate RAG Pipeline</center>

This guide shows how to trace and evaluate a RAG pipeline built on Weaviate. Phoenix captures traces of the calls made to Weaviate and lets you evaluate runs of the RAG pipeline built around the vector database.

*Note: This guide demonstrates how to break down and manually instrument each piece of a RAG pipeline. Weaviate offers easier ways to run RAG pipelines, but a more manual approach has been chosen here for demonstration purposes.*

⚠️ You'll need an OpenAI API key for this guide.

## Dependencies and Keys

In [None]:
!pip install -q arize-phoenix weaviate weaviate-client openai openinference-instrumentation-openai

This guide uses an online instance of [Phoenix](https://phoenix.arize.com). If you'd prefer to self-host Phoenix, you can follow [these instructions](https://arize.com/docs/phoenix/deployment).

In [None]:
import os
from getpass import getpass

os.environ["PHOENIX_API_KEY"] = getpass("Enter your Phoenix API key: ")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.environ['PHOENIX_API_KEY']}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

os.environ["WEAVIATE_URL"] = getpass("Enter your Weaviate API URL: ")
os.environ["WEAVIATE_API_KEY"] = getpass("Enter your Weaviate API key: ")
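
Before moving on, it can help to confirm that every setting above was actually entered. The snippet below is a minimal sketch of such a check; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here for illustration and are not part of Phoenix or Weaviate.

```python
import os

# Hypothetical helper (not part of Phoenix or Weaviate): the settings this
# guide stores in environment variables in the cell above.
REQUIRED_KEYS = (
    "PHOENIX_API_KEY",
    "PHOENIX_COLLECTOR_ENDPOINT",
    "WEAVIATE_URL",
    "WEAVIATE_API_KEY",
)


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not str(env.get(name, "")).strip()]


# Check the current process environment and report anything that is missing.
missing = missing_keys(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

An empty prompt response would show up here as a missing setting, which is easier to diagnose than a failed connection later on.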

## Connect to Weaviate

Connect to your Weaviate Cloud instance. If you don't already have an instance, you can create one for free at https://auth.wcs.api.weaviate.io/auth/realms/SeMI/login-actions/registration

In [None]:
import os

import weaviate
from weaviate.classes.init import Auth

# Best practice: store your credentials in environment variables
weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)

print(client.is_ready())  # Should print: `True`

# client.close()  # Free up resources

### Prepare your DB
If you haven't already created a collection in Weaviate, the code below will create an example collection for you:

In [None]:
import os

import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WEAVIATE_URL"]
wcd_api_key = os.environ["WEAVIATE_API_KEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
)

questions = client.collections.create(
    name="Question",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),  # Configure the Weaviate Embeddings integration
    generative_config=Configure.Generative.cohere(),  # Configure the Cohere generative AI integration
)

client.close()  # Free up resources

In [None]:
import json
import os

import requests
import weaviate
from weaviate.classes.init import Auth

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WEAVIATE_URL"]
wcd_api_key = os.environ["WEAVIATE_API_KEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Replace with your Weaviate Cloud URL
    auth_credentials=Auth.api_key(wcd_api_key),  # Replace with your Weaviate Cloud key
)

resp = requests.get(
    "https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json"
)
data = json.loads(resp.text)

questions = client.collections.get("Question")

with questions.batch.dynamic() as batch:
    for d in data:
        batch.add_object(
            {
                "answer": d["Answer"],
                "question": d["Question"],
                "category": d["Category"],
            }
        )
        if batch.number_errors > 10:
            print("Batch import stopped due to excessive errors.")
            break

failed_objects = questions.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
    print(f"First failed object: {failed_objects[0]}")

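# Keep the client open: query_weaviate below reuses this connection.
# Call client.close() once you have finished running the pipeline.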
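
The import loop above renames the capitalized fields of each JSON record to lowercase property names. As a rough sketch of that mapping in isolation, the hypothetical helper `to_properties` below also rejects records with missing fields before they reach the batch. The sample record is made up for illustration and is not taken from the dataset.

```python
# Hypothetical helper: map one raw record from the JSON file to the
# lowercase property names used when adding objects to the collection.
FIELD_MAP = {"Answer": "answer", "Question": "question", "Category": "category"}


def to_properties(record):
    """Return the property dict for one record, or raise if a field is missing."""
    missing = [key for key in FIELD_MAP if key not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {target: record[source] for source, target in FIELD_MAP.items()}


# Made-up sample record for illustration; not taken from the dataset.
sample = {"Answer": "Paris", "Question": "Capital of France", "Category": "GEOGRAPHY"}
print(to_properties(sample))
```

Validating records up front is one option; the `failed_objects` check in the cell above remains the way to see what Weaviate itself rejected.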

## Build and Instrument your RAG pipeline using Weaviate and OpenInference

With Phoenix and Weaviate set up, you're now ready to build your pipeline.

In [None]:
from phoenix.otel import register

phoenix_project_name = "weaviate-rag-pipeline"

# Because you've installed the openinference openai package, the call below will auto-instrument OpenAI calls
tracer_provider = register(project_name=phoenix_project_name, auto_instrument=True)

# Retrieve a tracer for manual instrumentation
tracer = tracer_provider.get_tracer(__name__)

The following functions will build your RAG pipeline:
1. Query Weaviate for relevant document chunks
2. Format the retrieved data
3. Create a generation prompt with the retrieved data
4. Call your model with the generation prompt

Each function is also instrumented using OpenInference and Phoenix.
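
To make the data flow between the four steps concrete before adding Weaviate, OpenAI, and tracing, here is a minimal sketch that wires them together with stand-in functions. Every name in it (`join_context`, `build_prompt`, `run_pipeline`, `fake_retrieve`, `fake_generate`) is hypothetical, and the sample record is made up. The instrumented versions are defined in the cells that follow.

```python
def join_context(documents):
    """Step 2: turn retrieved records into a plain-text context block."""
    return "".join(
        f"Question: {d['question']}\nAnswer: {d['answer']}\nCategory: {d['category']}\n\n"
        for d in documents
    )


def build_prompt(query_text, context):
    """Step 3: wrap the question and the context in a generation prompt."""
    return (
        f'Based on the following information, please answer the question: "{query_text}"\n\n'
        f"Context:\n{context}\n"
    )


def run_pipeline(query_text, retrieve, generate):
    """Run steps 1 to 4 with injected retrieve and generate callables."""
    documents = retrieve(query_text)  # step 1: fetch relevant records
    context = join_context(documents)  # step 2: format them
    prompt = build_prompt(query_text, context)  # step 3: build the prompt
    return generate(prompt)  # step 4: call the model


# Stand-ins for Weaviate and OpenAI; the record below is made up.
def fake_retrieve(query_text):
    return [{"question": "Largest planet", "answer": "Jupiter", "category": "SCIENCE"}]


def fake_generate(prompt):
    return f"stub answer based on a {len(prompt)}-character prompt"


print(run_pipeline("Which planet is the largest?", fake_retrieve, fake_generate))
```

Passing the retriever and the model in as arguments keeps the sketch runnable offline; the real pipeline calls Weaviate and OpenAI directly instead.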

In [None]:
# Query a Weaviate collection with tracing
def query_weaviate(query_text, limit=3):
    # Start a span for the query
    with tracer.start_as_current_span(
        "query_weaviate", openinference_span_kind="retriever"
    ) as span:
        # Set the input for the span
        span.set_input(query_text)

        # Query the collection
        collection_name = "Question"
        chunks = client.collections.get(collection_name)
        results = chunks.query.near_text(query=query_text, limit=limit)

        # Set the retrieved documents as attributes on the span
        for i, document in enumerate(results.objects):
            span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
            span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata))
            span.set_attribute(
                f"retrieval.documents.{i}.document.content", str(document.properties)
            )

        return results
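
The loop in `query_weaviate` records each retrieved document under indexed attribute keys of the form `retrieval.documents.{i}.document.*`. The sketch below builds the same kind of flat key layout from plain dictionaries so you can see what ends up on the span. `document_attributes` is a hypothetical helper, and the input records are made up.

```python
def document_attributes(documents):
    """Flatten a list of documents into indexed span-attribute keys."""
    attributes = {}
    for i, doc in enumerate(documents):
        prefix = f"retrieval.documents.{i}.document"
        attributes[f"{prefix}.id"] = str(doc["id"])
        attributes[f"{prefix}.content"] = str(doc["content"])
    return attributes


# Made-up documents for illustration.
docs = [
    {"id": "a1", "content": "first chunk"},
    {"id": "b2", "content": "second chunk"},
]
for key, value in document_attributes(docs).items():
    print(key, "=", value)
```

Each document contributes one key per field, with its position in the result list embedded in the key name.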

In [None]:
# Process and format the retrieved results
@tracer.chain  # Creates a chain span for this function; the decorator form of the `with tracer.start_as_current_span()` block used in query_weaviate
def format_context(results):
    context = ""
    for item in results.objects:
        properties = item.properties
        context += f"Question: {properties['question']}\n"
        context += f"Answer: {properties['answer']}\n"
        context += f"Category: {properties['category']}\n\n"
    return context

In [None]:
# Create a prompt with the retrieved information
@tracer.chain
def create_prompt(query_text, context):
    prompt = f"""
Based on the following information, please answer the question: "{query_text}"

Context:
{context}

Please provide a comprehensive answer based on the information provided.
"""
    return prompt

In [None]:
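from getpass import getpass

from openai import OpenAI

# The client needs a key before the pipeline runs, so prompt for it here if it is not set yet
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

# Initialize OpenAI client
oa_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])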


# Query OpenAI with the constructed prompt.
# This function does not have tracing applied to it, because the OpenAI
# client is instrumented using the auto_instrument flag in the register function.
def query_openai(prompt):
    response = oa_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

In [None]:
@tracer.chain
def rag_pipeline(query):
    # Execute the query
    weaviate_results = query_weaviate(query)
    context = format_context(weaviate_results)
    print("Retrieved context:")
    print(context)

    # Create a prompt with the retrieved information
    final_prompt = create_prompt(query, context)

    # Execute the OpenAI query
    final_answer = query_openai(final_prompt)

    return final_answer

In [None]:
query = "What is the only living mammal in the order Proboscidea?"

final_answer = rag_pipeline(query)

print("\nFinal Answer:")
print(final_answer)

## Evaluate your RAG System

Now with your RAG system working, you can add evaluation metrics to both the retrieval and generation steps.

In [None]:
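# Prompt for the OpenAI key only if it has not been set already
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")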

import nest_asyncio

nest_asyncio.apply()

In [None]:
from openinference.instrumentation.openai import OpenAIInstrumentor

# Because you don't want to trace the OpenAI calls used for evaluation, you can uninstrument the OpenAI client
OpenAIInstrumentor().uninstrument()

In [None]:
from phoenix.evals import OpenAIModel

# Initialize the OpenAI model you'll use for evaluation
eval_model = OpenAIModel(model="gpt-4o-mini")

In [None]:
import phoenix as px
from phoenix.session.evaluation import get_retrieved_documents

# Get the retrieved documents from Phoenix using this helper function
retrieved_documents_df = get_retrieved_documents(px.Client(), project_name=phoenix_project_name)

retrieved_documents_df

In [None]:
from phoenix.evals import RelevanceEvaluator, run_evals

# Initialize the built in Relevance evaluator
relevance_evaluator = RelevanceEvaluator(eval_model)

# Run the evaluation
retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,
    concurrency=20,
)[0]

In [None]:
retrieved_documents_relevance_df.head()
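
One way to summarize per-document relevance labels is the fraction of retrieved documents judged relevant for each retriever span. The sketch below computes that from plain dictionaries rather than from the dataframe itself. It assumes each row carries a span identifier and a label, and that the relevant label is the string `"relevant"`; check both against the actual output above. `precision_by_span` is a hypothetical helper and the rows are made up.

```python
from collections import defaultdict


def precision_by_span(rows, relevant_label="relevant"):
    """Return {span_id: fraction of that span's documents labeled relevant}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["span_id"]] += 1
        if row["label"] == relevant_label:
            hits[row["span_id"]] += 1
    return {span_id: hits[span_id] / totals[span_id] for span_id in totals}


# Made-up evaluation rows for illustration; label strings are an assumption.
rows = [
    {"span_id": "span-1", "label": "relevant"},
    {"span_id": "span-1", "label": "unrelated"},
    {"span_id": "span-1", "label": "relevant"},
    {"span_id": "span-2", "label": "unrelated"},
]
print(precision_by_span(rows))
```

A low fraction for a given query suggests the retrieval step, rather than the generation step, is the place to tune first.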

In [None]:
from phoenix.session.evaluation import get_qa_with_reference

# Get the Question and Answer with reference data from Phoenix using this helper function
qa_with_reference_df = get_qa_with_reference(px.Client(), project_name=phoenix_project_name)
qa_with_reference_df

In [None]:
from phoenix.evals import (
    HallucinationEvaluator,
    QAEvaluator,
    run_evals,
)

# Initialize the built in Q&A evaluator
qa_evaluator = QAEvaluator(eval_model)

# Initialize the built in Hallucination evaluator
hallucination_evaluator = HallucinationEvaluator(eval_model)

# Run the evaluation
qa_correctness_eval_df, hallucination_eval_df = run_evals(
    evaluators=[qa_evaluator, hallucination_evaluator],
    dataframe=qa_with_reference_df,
    provide_explanation=True,
    concurrency=20,
)

In [None]:
from phoenix.client import AsyncClient
from phoenix.trace import DocumentEvaluations

# Use a separate name so the Weaviate `client` defined earlier is not overwritten
phoenix_client = AsyncClient()
await phoenix_client.annotations.log_span_annotations_dataframe(
    dataframe=qa_correctness_eval_df,
    annotation_name="Q&A Correctness",
    annotator_kind="LLM",
)
await phoenix_client.annotations.log_span_annotations_dataframe(
    dataframe=hallucination_eval_df,
    annotation_name="Hallucination",
    annotator_kind="LLM",
)

px.Client().log_evaluations(
    DocumentEvaluations(dataframe=retrieved_documents_relevance_df, eval_name="relevance"),
)

And just like that, you've now scored and evaluated your RAG pipeline!

![Weaviate Trace in Phoenix UI](https://storage.googleapis.com/arize-phoenix-assets/assets/images/weaviate-manual-nb-trace.png)

![Weaviate Traces in Phoenix UI](https://storage.googleapis.com/arize-phoenix-assets/assets/images/weaviate-manual-nb-traces.png)

From here, you can continue to tweak your pipeline to improve your scores. Or if you're curious to learn more, check out some of our conceptual guides:
* [LLM Evaluations Hub](https://arize.com/llm-evaluation)
* [AI Agents Hub](https://arize.com/ai-agents/)