<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-assets/phoenix/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>
<h1 align="center">Tracing and Evaluating a LlamaIndex Application using MongoDB Atlas as Vector Store</h1>

<h2 align="center">LaMA Stack (LlamaIndex, MongoDB and Arize)</h2>

LlamaIndex provides high-level APIs that enable users to build powerful applications in a few lines of code. However, it can be challenging to understand what is going on under the hood and to pinpoint the cause of issues. Phoenix makes your LLM applications *observable* by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:
- Load a prepared dataset into a MongoDB collection that serves as the vector store.
- Build a simple query engine using LlamaIndex that uses retrieval-augmented generation to answer questions over the Arize documentation.
- Record trace data in [OpenInference tracing](https://github.com/Arize-ai/open-inference-spec/blob/main/trace/spec/traces.md) format using the global `arize_phoenix` handler.
- Inspect the traces and spans of your application to identify sources of latency and cost.
- Export your trace data as pandas dataframes and run [LLM Evals](https://docs.arize.com/phoenix/concepts/llm-evals) to check response quality and measure the precision@k of the query engine's retrieval step.

ℹ️ This notebook requires an OpenAI API key.

## 1. Install needed dependencies and import relevant packages

In [None]:
!pip install -q uv
!uv pip install -q --system llama-index-embeddings-openai 'arize-phoenix[evals]' llama-index llama-index-callbacks-arize-phoenix llama-index-vector-stores-mongodb llama-index-storage-docstore-mongodb llama-index-storage-index-store-mongodb llama-index-readers-mongodb
!uv pip install -q --system "openai>=1" gcsfs nest-asyncio pymongo beautifulsoup4 certifi 'httpx<0.28'

In [None]:
import json
import os
import urllib
from getpass import getpass
from urllib.request import urlopen

import nest_asyncio
import openai
import pandas as pd
from llama_index.core import StorageContext, set_global_handler
from llama_index.core.indices.vector_store.base import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.settings import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.readers.mongodb import SimpleMongoReader
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from pymongo.operations import SearchIndexModel
from tqdm import tqdm

import phoenix as px
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
pd.set_option("display.max_colwidth", 1000)

## 2. Set up MongoDB Atlas

To use this notebook, you need a MongoDB Atlas account with a database and collection already created. You also need a vector search index, as described in the MongoDB Atlas Vector Search documentation.

You can do this by following these steps:

1. Create a MongoDB Atlas account.
2. Create a database.
3. Add a new collection to that database.
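4. Create an Atlas Vector Search index on that collection with the following definition. The 1536 dimensions match the output size of `text-embedding-ada-002`, which is the embedding model used later. Name the index `vector_index`, or update `vector_index_name` in the code below to match.

   ```json
   {
     "fields": [
       {
         "numDimensions": 1536,
         "path": "embedding",
         "similarity": "cosine",
         "type": "vector"
       }
     ]
   }
   ```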
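
The `similarity` field selects how Atlas compares vectors. The retriever built later asks for the top 5 matches (`similarity_top_k=5`). The sketch below performs the same kind of ranking by exact brute force over toy vectors. Atlas itself uses approximate nearest-neighbor search, so treat this only as an illustration. For unit-length vectors, ||a - b||^2 = 2 - 2 cos(a, b), so cosine similarity and Euclidean distance produce identical rankings. OpenAI documents its embeddings as normalized to length 1, so either setting should rank `text-embedding-ada-002` vectors the same way. The helper names are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: list[float], vectors: list[list[float]], k: int, similarity: str = "cosine") -> list[int]:
    """Indices of the k closest vectors, found by exact (brute-force) comparison."""
    if similarity == "cosine":
        scores = [cosine(query, v) for v in vectors]
    elif similarity == "euclidean":
        scores = [-euclidean(query, v) for v in vectors]  # negate so that larger means closer
    else:
        raise ValueError(f"unsupported similarity: {similarity}")
    return heapq.nlargest(k, range(len(vectors)), key=scores.__getitem__)


# Toy 3-dimensional unit vectors standing in for 1536-dimensional embeddings
doc_vecs = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5])]
query_vec = normalize([1.0, 0.2, 0.0])
print(top_k(query_vec, doc_vecs, k=2, similarity="cosine"))     # → [1, 0]
print(top_k(query_vec, doc_vecs, k=2, similarity="euclidean"))  # → [1, 0]
```

Both metrics return the same order here because every vector has been normalized. With unnormalized vectors, the two rankings can differ.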


Once the setup is complete, you can check the connection from your notebook as shown below.

**Note: You must add your IP address to your Atlas project's IP access list in order to connect successfully.**

In [None]:
from urllib.parse import quote_plus

from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

mongo_username = ""  # Replace with your MongoDB username
mongo_password = ""  # Replace with your MongoDB password

# Replace the cluster hostname with your own (shown in the Atlas "Connect" dialog).
# Credentials are percent-escaped so special characters do not break the URI.
uri = f"mongodb+srv://{quote_plus(mongo_username)}:{quote_plus(mongo_password)}@cluster0.lq406.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi("1"))

# Send a ping to confirm a successful connection
try:
    client.admin.command("ping")
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Now that the initial setup is complete, the next step is to load data into the newly created collection. Each document needs two fields: `text`, which holds the textual content, and `embedding`, which holds its vector representation. The length of each embedding must match the `numDimensions` value in the vector index definition. Here we download a prepared dataset whose rows already include embeddings.
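
Before inserting, it can help to check that every row has both fields and that each embedding has the expected length. Below is a minimal sketch with a hypothetical `validate_row` helper. It assumes 1536-dimensional embeddings to match the index definition.

```python
EXPECTED_DIMENSIONS = 1536  # assumption: must equal numDimensions in the vector index


def validate_row(row: dict, dims: int = EXPECTED_DIMENSIONS) -> list[str]:
    """Return the problems found in one row; an empty list means it looks usable."""
    problems = []
    text = row.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("missing or empty 'text'")
    embedding = row.get("embedding")
    if not isinstance(embedding, list):
        problems.append("'embedding' is not a list")
    else:
        if len(embedding) != dims:
            problems.append(f"'embedding' has {len(embedding)} dimensions, expected {dims}")
        if not all(isinstance(x, (int, float)) for x in embedding):
            problems.append("'embedding' contains non-numeric values")
    return problems


good_row = {"text": "Phoenix traces LLM applications.", "embedding": [0.0] * 1536}
bad_row = {"text": "", "embedding": [0.1, 0.2]}
print(validate_row(good_row))  # []
print(validate_row(bad_row))
```

After downloading the data, `[i for i, row in enumerate(rows) if validate_row(row)]` would list the indices of any rows that need attention.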

In [None]:
url = "https://storage.googleapis.com/arize-assets/xander/mongodb/mongodb_dataset.json"

with urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))
rows = data["rows"]

Next, store the rows in the collection created earlier.

In [None]:
db_name = "phoenix"  # Replace with your database name
collection_name = "phoenix-docs"  # Replace with your collection name

db = client[db_name]
collection = db[collection_name]

# With overwrite=True, clear the collection first and then insert the downloaded rows
overwrite = True
if overwrite:
    collection.delete_many({})
    nodes = []
    for row in rows:
        node = {
            "embedding": row["embedding"],
            "text": row["text"],
            "id": row["id"],
            "source_doc_id": row["doc_id"],  # presumably the ID of the source document for this chunk
        }
        nodes.append(node)

    # Insert the documents into MongoDB Atlas
    collection.insert_many(nodes)
    print("Successfully added nodes into MongoDB!")

## 3. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.

In [None]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

## 4. Launch Your Phoenix Application

Setting `arize_phoenix` as LlamaIndex's global handler enables Phoenix tracing by mounting Phoenix's [OpenInferenceTraceCallback](https://docs.arize.com/phoenix/integrations/llamaindex). Phoenix uses OpenInference traces. OpenInference is an open-source standard for capturing and storing LLM application traces, and it lets LLM applications integrate seamlessly with observability solutions such as Phoenix.
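
Each trace is a tree of spans. Every span records its own ID and its parent's ID, and the UI nests children under their parents. The sketch below shows that reconstruction using hypothetical, simplified span records. The field names and the example tree are illustrative. `CHAIN`, `RETRIEVER`, `EMBEDDING`, and `LLM` are OpenInference span kinds, but the exact spans LlamaIndex emits may differ.

```python
from collections import defaultdict

# Hypothetical, simplified span records. Real spans also carry timestamps,
# inputs/outputs, token counts, and other attributes.
spans = [
    {"span_id": "a", "parent_id": None, "name": "query", "kind": "CHAIN"},
    {"span_id": "b", "parent_id": "a", "name": "retrieve", "kind": "RETRIEVER"},
    {"span_id": "c", "parent_id": "b", "name": "embedding", "kind": "EMBEDDING"},
    {"span_id": "d", "parent_id": "a", "name": "synthesize", "kind": "CHAIN"},
    {"span_id": "e", "parent_id": "d", "name": "llm", "kind": "LLM"},
]


def render_trace(spans: list[dict]) -> list[str]:
    """Return one indented line per span, with children listed under their parent."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def visit(span: dict, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span['name']} ({span['kind']})")
        for child in children[span["span_id"]]:
            visit(child, depth + 1)

    # Root spans have no parent
    for root in children[None]:
        visit(root, 0)
    return lines


print("\n".join(render_trace(spans)))
```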

In [None]:
session = px.launch_app()

In [None]:
set_global_handler("arize_phoenix")

This example uses `MongoDBAtlasVectorSearch` as the vector store, backed by the collection populated above, so retrieval runs entirely against MongoDB. You can trace any other LlamaIndex application the same way.
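
Atlas Vector Search queries run as a `$vectorSearch` stage in an aggregation pipeline. Below is a hypothetical sketch that builds such a pipeline by hand to show the stage's main fields (`index`, `path`, `queryVector`, `numCandidates`, `limit`). It is not necessarily the exact pipeline `MongoDBAtlasVectorSearch` generates. The default of 10 times `limit` for `numCandidates` is an assumed rule of thumb.

```python
from typing import Optional


def vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",
    path: str = "embedding",
    limit: int = 5,
    num_candidates: Optional[int] = None,
) -> list[dict]:
    """Build an aggregation pipeline: $vectorSearch, then project text and similarity score."""
    if num_candidates is None:
        num_candidates = 10 * limit  # assumed rule of thumb: oversample for better recall
    if num_candidates < limit:
        raise ValueError("numCandidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,  # name of the Atlas Vector Search index
                "path": path,  # document field holding the embedding
                "queryVector": query_vector,  # needs numDimensions entries against a real index
                "numCandidates": num_candidates,  # nearest-neighbor candidates to consider
                "limit": limit,  # number of documents to return
            }
        },
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Toy 3-dimensional query vector; a real query would use a 1536-dimensional embedding
pipeline = vector_search_pipeline([0.1, 0.2, 0.3], limit=5)
# With a live connection: results = list(collection.aggregate(pipeline))
```

Against a real index, the query vector would come from the same embedding model used for the documents, for example `Settings.embed_model.get_query_embedding("What is Phoenix?")`.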

In [None]:
db_name = "phoenix"  # Replace with your database name
collection_name = "phoenix-docs"  # Replace with your collection name
vector_index_name = "vector_index"  # Replace with your vector index name
Settings.llm = OpenAI(model="gpt-4o", temperature=0.0)
# text-embedding-ada-002 produces 1536-dimensional vectors, matching numDimensions in the index
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Reuse the `uri` defined in the connection cell above
client = MongoClient(uri, server_api=ServerApi("1"))
db = client[db_name]
collection = db[collection_name]

# Load the `text` field of every document in the collection as LlamaIndex documents
reader = SimpleMongoReader(uri=uri)
documents = reader.load_data(
    db_name,
    collection_name,
    field_names=["text"],
    query_dict={},
)

# Create Atlas as a vector store
store = MongoDBAtlasVectorSearch(
    client, db_name=db_name, collection_name=collection_name, vector_index_name=vector_index_name
)

storage_context = StorageContext.from_defaults(vector_store=store)

# Embed the loaded documents and write the resulting nodes to the Atlas collection
# (they are added alongside the rows inserted earlier)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)

In [None]:
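import time

# Create the vector search index programmatically. If an index with this name already
# exists (for example, one created in the Atlas UI in step 2), creation is skipped.
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {"type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine"},
        ]
    },
    name=vector_index_name,
    type="vectorSearch",
)

if vector_index_name not in {idx["name"] for idx in collection.list_search_indexes()}:
    collection.create_search_index(model=search_index_model)

# Atlas builds search indexes asynchronously; poll for up to 5 minutes until it is queryable
deadline = time.time() + 300
while not any(idx.get("queryable") for idx in collection.list_search_indexes(vector_index_name)):
    if time.time() > deadline:
        raise TimeoutError(f"Search index {vector_index_name!r} is not queryable yet")
    time.sleep(5)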

In [None]:
# Instantiate Atlas Vector Search as a retriever
vector_store_retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

# Pass the retriever into the query engine
query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)

## 5. Run Your Query Engine and View Your Traces in Phoenix

We've compiled a list of commonly asked questions about Arize. Let's download the sample queries and take a look.

In [None]:
queries_url = "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])
queries[:10]

Let's run the first 10 queries and view the traces in Phoenix.


In [None]:
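for query in tqdm(queries[:10]):
    try:
        query_engine.query(query)
    except Exception as e:
        # Report the failure; Phoenix also records the exception on the trace
        print(f"Query failed: {query!r}: {e}")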
# Save trace dataset
tds = px.Client().get_trace_dataset()
tds.name = "phoenix_local"
tds.to_disc()

Check the Phoenix UI as your queries run. Your traces should appear in real time.

Open the Phoenix UI with the link below if you haven't already, and click through the queries to better understand how the query engine is performing. For each trace, you will see a breakdown of its spans.

Phoenix can be used to understand and troubleshoot your application by surfacing:
 - **Application latency** - highlighting slow invocations of LLMs, Retrievers, etc.
 - **Token Usage** - displays the breakdown of token usage with LLMs to surface your most expensive LLM calls
 - **Runtime Exceptions** - critical runtime exceptions such as rate-limiting are captured as exception events.
 - **Retrieved Documents** - view all the documents retrieved during a retriever call and the score and order in which they were returned
 - **Embeddings** - view the embedding text used for retrieval and the underlying embedding model
 - **LLM Parameters** - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
 - **Prompt Templates** - figure out what prompt template is used during the prompting step and what variables were used.
 - **Tool Descriptions** - view the description and function signature of the tools your LLM has been given access to
 - **LLM Function Calls** - if you are using OpenAI or another model with function calling, you can view the function selection and function messages in the input messages to the LLM.
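
You can also answer the same questions programmatically: which calls are slowest, and which consume the most tokens. Below is a sketch over hypothetical span summaries. The field names `latency_ms` and `total_tokens` and the values are illustrative, not Phoenix's export schema or real measurements. In practice you could start from `px.Client().get_spans_dataframe()` and derive equivalent columns.

```python
from collections import Counter

# Hypothetical span summaries (illustrative values, not real measurements)
span_summaries = [
    {"name": "retrieve", "kind": "RETRIEVER", "latency_ms": 180.0, "total_tokens": 0},
    {"name": "llm", "kind": "LLM", "latency_ms": 2400.0, "total_tokens": 1850},
    {"name": "embedding", "kind": "EMBEDDING", "latency_ms": 95.0, "total_tokens": 12},
    {"name": "llm", "kind": "LLM", "latency_ms": 1900.0, "total_tokens": 1420},
]


def slowest(spans: list[dict], n: int = 3) -> list[dict]:
    """Return the n spans with the highest latency, slowest first."""
    return sorted(spans, key=lambda s: s["latency_ms"], reverse=True)[:n]


def tokens_by_kind(spans: list[dict]) -> dict[str, int]:
    """Sum token usage per span kind."""
    totals = Counter()
    for s in spans:
        totals[s["kind"]] += s["total_tokens"]
    return dict(totals)


print([(s["name"], s["latency_ms"]) for s in slowest(span_summaries, n=2)])
print(tokens_by_kind(span_summaries))
```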

<img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/RAG_trace_details.png" alt="Trace Details View on Phoenix" style="width:100%; height:auto;">

In [None]:
print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")

## 6. Export and Evaluate Your Trace Data
You can export your trace data as a pandas dataframe for further analysis and evaluation.

In this case, we will export our retriever spans into two separate dataframes:

- `queries_df`, in which the retrieved documents for each query are concatenated into a single column,
- `retrieved_documents_df`, in which each retrieved document is "exploded" into its own row to enable the evaluation of each query-document pair in isolation.
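
The difference between the two layouts can be sketched in plain Python over hypothetical retriever output. The record structure, the field names, and the blank-line separator are assumptions, not Phoenix's internals. With pandas, `DataFrame.explode` performs the second transformation on a list-valued column.

```python
# Hypothetical retriever output: one record per query with its documents in ranked order
retrievals = [
    {"query": "How do I log embeddings?", "documents": ["doc A", "doc B", "doc C"]},
    {"query": "What is drift?", "documents": ["doc D", "doc E"]},
]


def concatenate(records: list[dict], sep: str = "\n\n") -> list[dict]:
    """One row per query, with all retrieved documents joined into one reference string."""
    return [{"query": r["query"], "reference": sep.join(r["documents"])} for r in records]


def explode(records: list[dict]) -> list[dict]:
    """One row per (query, document) pair, keeping each document's rank."""
    return [
        {"query": r["query"], "document_position": i, "document": doc}
        for r in records
        for i, doc in enumerate(r["documents"])
    ]


print(len(concatenate(retrievals)))  # 2 rows: one per query
print(len(explode(retrievals)))      # 5 rows: one per retrieved document
```

The concatenated form suits response-level evals that need all the context at once. The exploded form lets a relevance eval judge each document on its own.
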
This will enable us to compute multiple kinds of evaluations, including:

- relevance: Are the retrieved documents relevant to the query?
- Q&A correctness: Does your application's response correctly answer the question, given the retrieved context?
- hallucinations: Is your application making up false information?

In [None]:
queries_df = get_qa_with_reference(session)
retrieved_documents_df = get_retrieved_documents(session)

Next, define your evaluation model and your evaluators.

Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., and provide a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use to perform evaluations using our battle-tested evaluation templates.
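
The pattern behind these evaluators is often called LLM-as-a-judge. You fill a prompt template, ask a model, and map its reply onto a fixed set of allowed labels, which Phoenix calls rails. Below is a minimal sketch with a stubbed model. The template wording and the `snap_to_rails` and `fake_llm` helpers are hypothetical, not Phoenix's actual implementation.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template in the spirit of a relevance eval (not Phoenix's actual template)
TEMPLATE = (
    "Is the reference text relevant to the question?\n"
    "Question: {query}\n"
    "Reference: {reference}\n"
    "Answer with exactly one word: relevant or unrelated."
)
RAILS = ["relevant", "unrelated"]  # the only labels the eval may output


def snap_to_rails(raw: str, rails: list[str]) -> Optional[str]:
    """Map a free-form reply to exactly one allowed label, or None if it is ambiguous."""
    words = set(re.findall(r"[a-z_]+", raw.lower()))
    matches = [rail for rail in rails if rail in words]
    return matches[0] if len(matches) == 1 else None


def evaluate(llm: Callable[[str], str], query: str, reference: str) -> Optional[str]:
    """Fill the template, ask the judge model, and constrain its answer to the rails."""
    prompt = TEMPLATE.format(query=query, reference=reference)
    return snap_to_rails(llm(prompt), RAILS)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: 'relevant' if the reference contains 'trace'."""
    reference = prompt.split("Reference: ", 1)[1].split("\n", 1)[0]
    return "Relevant." if "trace" in reference.lower() else "unrelated"


print(evaluate(fake_llm, "How do I trace my app?", "Phoenix records one trace per query."))
```

Matching whole words keeps a reply like "irrelevant" from being read as "relevant". A reply that mentions both labels is treated as unparsable rather than guessed.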

In [None]:
eval_model = OpenAIModel(
    model="gpt-4o",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]


px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
)
px.Client().log_evaluations(DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df))

Your evaluations should now appear as annotations on the appropriate spans in Phoenix.

![A view of the Phoenix UI with evaluation annotations](https://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/evals/traces_with_evaluation_annotations.png)
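
To turn the per-document relevance labels into the precision@k mentioned at the start, compute the fraction of relevant documents among each query's top k results, then average across queries. Below is a plain-Python sketch. The input format (lists of 0/1 labels in retrieval order), the helper names, and the example labels are assumptions. With this notebook's data, you would first group `relevance_eval_df` by query span and order each group by document position. Check that dataframe's index and `score` column before relying on that mapping.

```python
def precision_at_k(relevance: list[int], k: int) -> float:
    """Fraction of the top-k retrieved documents judged relevant.

    `relevance` holds 1 (relevant) or 0 (not relevant) per document, in retrieval order.
    Dividing by k means a query that returned fewer than k documents is penalized.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(relevance[:k]) / k


def mean_precision_at_k(labels_by_query: dict[str, list[int]], k: int) -> float:
    """Average precision@k over all queries."""
    scores = [precision_at_k(labels, k) for labels in labels_by_query.values()]
    return sum(scores) / len(scores)


# Hypothetical labels for two queries, in retrieval order (not real results)
labels_by_query = {
    "query 1": [1, 1, 0, 1, 0],
    "query 2": [0, 1, 0, 0, 0],
}
for k in (1, 3, 5):
    print(f"precision@{k}: {mean_precision_at_k(labels_by_query, k):.2f}")
```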