<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-assets/phoenix/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>
<h1 align="center">Tracing and Evaluating a LlamaIndex Application using MongoDB Atlas as Vector Store</h1>

<h2 align="center"> LAM Stack (LlamaIndex, Arize and MongoDB) </h2>

LlamaIndex provides high-level APIs that enable users to build powerful applications in a few lines of code. However, it can be challenging to understand what is going on under the hood and to pinpoint the cause of issues. Phoenix makes your LLM applications *observable* by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:
- Load data into a MongoDB collection that will later serve as a vector store,
- Build a simple query engine using LlamaIndex that uses retrieval-augmented generation to answer questions over the Arize documentation,
- Record trace data in [OpenInference tracing](https://github.com/Arize-ai/open-inference-spec/blob/main/trace/spec/traces.md) format using the global `arize_phoenix` handler,
- Inspect the traces and spans of your application to identify sources of latency and cost,
- Export your trace data as pandas dataframes and run [LLM Evals](https://docs.arize.com/phoenix/concepts/llm-evals) to assess the relevance of the retrieved documents, the correctness of the answers, and hallucinations.

ℹ️ This notebook requires an OpenAI API key.

## 1. Install needed dependencies and import relevant packages

In [None]:
!pip install llama-index-embeddings-openai arize-phoenix llama-index llama-index-callbacks-arize-phoenix llama-index-vector-stores-mongodb llama-index-storage-docstore-mongodb llama-index-storage-index-store-mongodb llama-index-readers-mongodb "openai>=1" gcsfs nest-asyncio pymongo beautifulsoup4 certifi

Collecting llama-index-embeddings-openai
  Downloading llama_index_embeddings_openai-0.1.6-py3-none-any.whl (6.0 kB)
Collecting arize-phoenix
  Downloading arize_phoenix-3.13.1-py3-none-any.whl (1.2 MB)
Collecting llama-index
  Downloading llama_index-0.10.19-py3-none-any.whl (5.6 kB)
Collecting llama-index-callbacks-arize-phoenix
  Downloading llama_index_callbacks_arize_phoenix-0.1.4-py3-none-any.whl (2.0 kB)
Collecting llama-index-vector-stores-mongodb
  Downloading llama_index_vector_stores_mongodb-0.1.4-py3-none-any.whl (4.0 kB)
Collecting llama-index-storage-docstore-mongodb
  Downloading llama_index_storage_docstore_mongodb-0.1.2-py3-none-any.whl (2.2 kB)
Collecting llama-index-storage-index-store-mongodb
  Downloading llama_index_storage_index_store_mongodb-0.1.2-py3-none-any.whl (2.1

In [2]:
import datetime
import json
import os
import pickle
import ssl
import time
import urllib
from getpass import getpass
from urllib.request import urlopen

import certifi
import nest_asyncio
import openai
import pandas as pd
import phoenix as px
import requests
from bs4 import BeautifulSoup
from gcsfs import GCSFileSystem
from llama_index.core import (
    ServiceContext, StorageContext, download_loader,
    load_index_from_storage, set_global_handler
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.graph_stores.simple import SimpleGraphStore
from llama_index.core.indices.vector_store.base import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.readers.mongodb import SimpleMongoReader
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from phoenix.experimental.evals import (
    HallucinationEvaluator, OpenAIModel, QAEvaluator,
    RelevanceEvaluator, run_evals
)
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
from tqdm import tqdm


nest_asyncio.apply()  # needed for concurrent evals in notebook environments
pd.set_option("display.max_colwidth", 1000)



## 2. Set up MongoDB Atlas

To use this notebook, you need a MongoDB Atlas account with a database and a collection already created, plus a vector search index on that collection as described in the MongoDB Atlas Search documentation.

You can set this up with the following steps:

1. Create a MongoDB Atlas account.
2. Create a database.
3. Add a new collection to that database.
4. Create a search index with the following structure in the recently created collection:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    }
  ]
}
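
To make the `"similarity": "euclidean"` setting more concrete, the sketch below ranks a few toy vectors by Euclidean distance to a query vector. It is a brute-force illustration in plain Python and not how Atlas Vector Search is implemented; the function names and the 3-dimensional vectors are made up for the example, whereas the index above expects 1536 dimensions.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates, k=2):
    """Return the ids of the k candidates closest to the query (closest first)."""
    ranked = sorted(candidates, key=lambda item: euclidean_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 3-dimensional vectors (hypothetical data).
docs = [("a", [0.0, 0.0, 1.0]), ("b", [1.0, 0.0, 0.0]), ("c", [0.9, 0.1, 0.0])]
print(nearest([1.0, 0.0, 0.0], docs))
```

With Euclidean similarity, a smaller distance means a closer match, so `"b"` (identical to the query) ranks ahead of `"c"`.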


Once the setup is complete, you can check the connection from your notebook as shown below.

*Note: Add your IP address to the IP access list of your MongoDB Atlas project in order to connect successfully.*

In [3]:
mongo_username = "arize_mongo_read"
mongo_password = "phsW002Be9rEzjJd"

uri = f"mongodb+srv://{mongo_username}:{mongo_password}@phoenix-llama.ywnxqjv.mongodb.net/?retryWrites=true&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri)

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)


Pinged your deployment. You successfully connected to MongoDB!
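
If your own username or password contains reserved characters such as `@`, `:` or `/`, a connection string built with a plain f-string may not parse correctly. The sketch below percent-encodes the credentials with the standard library before building the URI. The helper name `build_atlas_uri` and the host `cluster0.example.mongodb.net` are placeholders for illustration, not part of this notebook's setup.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host):
    """Build a mongodb+srv URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


# "cluster0.example.mongodb.net" is a placeholder host, not a real cluster.
print(build_atlas_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net"))
```

The resulting string can be passed to `MongoClient` in the same way as the `uri` above.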


Now that the initial setup is complete, the next step is to generate data and store it in the newly created collection. Each entry in the collection needs two fields: `text`, which holds the textual content, and `embedding`, which holds the corresponding vector representation. With both fields present, every record can be used for text search and for vector-based operations.

In [4]:
url = "http://storage.googleapis.com/arize-assets/xander/milvus-workshop/milvus_dataset.json"

with urllib.request.urlopen(url) as response:
    buffer = response.read()
    data = json.loads(buffer.decode("utf-8"))
    rows = data["rows"]
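
Before inserting rows, it can help to confirm that each one carries a non-empty `text` and an `embedding` whose length matches the `numDimensions` of the search index. The following is a minimal sketch of such a check, assuming rows are plain dictionaries; `find_invalid_rows` is a hypothetical helper, and the sample rows use 3 dimensions only to keep the example short.

```python
EXPECTED_DIMENSIONS = 1536  # matches "numDimensions" in the index definition


def find_invalid_rows(rows, dimensions=EXPECTED_DIMENSIONS):
    """Return (position, reason) pairs for rows that would not be searchable."""
    problems = []
    for position, row in enumerate(rows):
        text = row.get("text")
        embedding = row.get("embedding")
        if not isinstance(text, str) or not text.strip():
            problems.append((position, "missing or empty 'text'"))
        elif not isinstance(embedding, list) or len(embedding) != dimensions:
            problems.append((position, "'embedding' has the wrong length"))
    return problems


# Tiny hypothetical rows using 3 dimensions to keep the example readable.
sample_rows = [
    {"text": "hello", "embedding": [0.1, 0.2, 0.3]},
    {"text": "", "embedding": [0.1, 0.2, 0.3]},
    {"text": "short vector", "embedding": [0.1]},
]
print(find_invalid_rows(sample_rows, dimensions=3))
```

On the real data, calling `find_invalid_rows(rows)` with the default of 1536 dimensions should return an empty list if every row is usable.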

We then proceed to store data into our previously created collection.

In [5]:
db_name = 'phoenix'
collection_name = 'phoenix-docs'

db = client[db_name]  # Replace with your database name
collection = db[collection_name]  # Replace with your collection name

# When overwrite is True, clear the collection first and then insert the nodes
overwrite = True
if overwrite:
    collection.delete_many({})
    nodes = []
    for row in rows:
        node = {
            "embedding": row["embedding"],
            "text": row["text"],
            "id": row["id"],
            "source_doc_id": row["doc_id"]  # Reference to the source document
        }
        nodes.append(node)

    # Insert the documents into MongoDB Atlas
    collection.insert_many(nodes)
    print("Successfully added nodes into MongoDB!")

Successfully added nodes into MongoDB!


## 3. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.

In [6]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

🔑 Enter your OpenAI API key: ··········


## 4. Launch Your Phoenix Application

Enable Phoenix tracing within LlamaIndex by setting `arize_phoenix` as the global handler. This will mount Phoenix's [OpenInferenceTraceCallback](https://docs.arize.com/phoenix/integrations/llamaindex) as the global handler. Phoenix uses OpenInference traces - an open-source standard for capturing and storing LLM application traces that enables LLM applications to seamlessly integrate with LLM observability solutions such as Phoenix.

In [7]:
session = px.launch_app()

🌍 To view the Phoenix app in your browser, visit https://cpc0n4inik1-496ff2e9c6d22116-6006-colab.googleusercontent.com/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


In [8]:
set_global_handler("arize_phoenix")

This example builds its index on a `MongoDBAtlasVectorSearch` vector store backed by the collection created earlier, so the application works end to end with MongoDB. You can, however, trace whatever LlamaIndex application you like.

In [None]:
db_name = 'phoenix' # Replace with your database name
collection_name = 'phoenix-docs' # Replace with your collection name
vector_index_name = 'vector_index' # Replace with your vector index name

db = client[db_name]
collection = db[collection_name]

mongo_username = "YOUR_USERNAME" # Replace mongo username
mongo_password = "YOUR_PASSWORD" # Replace mongo password

# You can copy the connection string for your cluster (the part after the @) from MongoDB Atlas
uri = f"mongodb+srv://{mongo_username}:{mongo_password}@phoenix-llama.ywnxqjv.mongodb.net/?retryWrites=true&w=majority"

query_dict = {}
reader = SimpleMongoReader(uri=uri)
documents = reader.load_data(
    db_name,
    collection_name,
    field_names=["text"],
    query_dict=query_dict
)

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# create Atlas as a vector store
store = MongoDBAtlasVectorSearch(
    client,
    db_name=db_name,
    collection_name=collection_name,
    index_name=vector_index_name
)

storage_context = StorageContext.from_defaults(vector_store=store)

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4-1106-preview", temperature=0.0),
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"),
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True
)




Parsing nodes:   0%|          | 0/2454 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/312 [00:00<?, ?it/s]

## 5. Run Your Query Engine and View Your Traces in Phoenix

We've compiled a list of commonly asked questions about Arize. Let's download the sample queries and take a look.

In [None]:
queries_url = "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])
queries[:10]

['How do I use the SDK to upload a ranking model?',
 'What drift metrics are supported in Arize?',
 'Does Arize support batch models?',
 'Does Arize support training data?',
 'How do I configure a threshold if my data has seasonality trends?',
 'How are clusters in the UMAP calculated? When are the clusters refreshed?',
 'How does Arize calculate AUC?',
 'Can I send truth labels to Arize separtely? ',
 'How do I send embeddings to Arize?',
 'Can I copy a dashboard']
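
The download step reads a JSON Lines file, in which every line is a separate JSON object. A slightly more defensive version of the same parsing logic is sketched below: it accepts bytes or strings, skips blank lines, and ignores records without the requested field. The helper name `parse_jsonl_queries` is hypothetical.

```python
import json


def parse_jsonl_queries(lines, field="query"):
    """Extract one field from an iterable of JSONL lines (bytes or str)."""
    values = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


sample = [b'{"query": "Does Arize support batch models?"}\n', b"\n"]
print(parse_jsonl_queries(sample))
```

Because an HTTP response object yields bytes lines when iterated, it could be passed to this function directly in place of `sample`.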

Let's run the first 10 queries and view the traces in Phoenix.


In [None]:
query_engine = index.as_query_engine()
for query in tqdm(queries[:10]):
    try:
        query_engine.query(query)
    except Exception as e:
        # Report the failure and continue with the remaining queries
        print(f"Query failed: {query!r} ({e})")

100%|██████████| 10/10 [00:50<00:00,  5.08s/it]
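
If you also want a quick local record of how long each query took and which ones failed, independent of the Phoenix UI, a small wrapper like the one below is one option. This is a sketch under assumptions: `run_queries` and `fake_query` are hypothetical names, and `fake_query` only stands in for `query_engine.query` so that the example runs without an index.

```python
import time


def run_queries(query_fn, queries):
    """Call query_fn on each query; record latency and any exception message."""
    results = []
    for query in queries:
        start = time.perf_counter()
        try:
            query_fn(query)
            error = None
        except Exception as exc:  # keep going, but remember what failed
            error = f"{type(exc).__name__}: {exc}"
        results.append(
            {"query": query, "seconds": time.perf_counter() - start, "error": error}
        )
    return results


def fake_query(query):
    """Stand-in for query_engine.query; fails on empty input."""
    if not query:
        raise ValueError("empty query")
    return f"answer to {query}"


for result in run_queries(fake_query, ["Can I copy a dashboard", ""]):
    print(result["query"], result["error"])
```

In the notebook, `run_queries(query_engine.query, queries[:10])` would be the equivalent call.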


And just for fun, ask your own question!

In [None]:
response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")
print(response)

Arize is a platform designed to assist AI Engineers and Machine Learning practitioners with various aspects of model lifecycle management. It provides tools to monitor model performance in real-time, even when there is a delay in receiving ground truth or feedback data. The platform aids in identifying and diagnosing the root causes of model failures or performance issues through tracing and explainability features. Additionally, it allows for the comparison of performance across multiple models. Arize also offers capabilities to detect and report on data drift, data quality issues, and potential model fairness or bias, which are critical for maintaining the integrity and effectiveness of AI systems. This suite of tools can help you ensure that your models perform optimally and responsibly after deployment.


Check the Phoenix UI as your queries run. Your traces should appear in real time.

Open the Phoenix UI with the link below if you haven't already and click through the queries to better understand how the query engine is performing. For each trace you will see a breakdown of the spans it contains.

Phoenix can be used to understand and troubleshoot your application by surfacing:
 - **Application latency** - highlighting slow invocations of LLMs, Retrievers, etc.
 - **Token Usage** - displays the breakdown of token usage with LLMs to surface your most expensive LLM calls
 - **Runtime Exceptions** - critical runtime exceptions such as rate-limiting are captured as exception events
 - **Retrieved Documents** - view all the documents retrieved during a retriever call and the score and order in which they were returned
 - **Embeddings** - view the embedding text used for retrieval and the underlying embedding model
 - **LLM Parameters** - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
 - **Prompt Templates** - figure out what prompt template is used during the prompting step and what variables were used
 - **Tool Descriptions** - view the description and function signature of the tools your LLM has been given access to
 - **LLM Function Calls** - if using OpenAI or another model with function calls, you can view the function selection and function messages in the input messages to the LLM

<img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/RAG_trace_details.png" alt="Trace Details View on Phoenix" style="width:100%; height:auto;">

In [None]:
print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")

🚀 Open the Phoenix UI if you haven't already: https://x4b27ifweub2-496ff2e9c6d22116-6006-colab.googleusercontent.com/


## 6. Export and Evaluate Your Trace Data
You can export your trace data as a pandas dataframe for further analysis and evaluation.

In this case, we will export our retriever spans into two separate dataframes:

- `queries_df`, in which the retrieved documents for each query are concatenated into a single column,
- `retrieved_documents_df`, in which each retrieved document is "exploded" into its own row to enable the evaluation of each query-document pair in isolation.

This will enable us to compute multiple kinds of evaluations, including:

- relevance: Are the retrieved documents relevant to the query?
- Q&A correctness: Does your application answer the question correctly given the retrieved context?
- hallucinations: Is your application making up false information?
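
The difference between the concatenated and the "exploded" layout can be illustrated with plain Python. The sketch below is not Phoenix's export code; the function names and the dictionary keys (`query`, `documents`, `reference`, `document_position`) are chosen for illustration only.

```python
def explode_retrievals(query_records):
    """Turn one record per query into one row per (query, document) pair."""
    rows = []
    for record in query_records:
        for position, document in enumerate(record["documents"]):
            rows.append(
                {
                    "query": record["query"],
                    "document_position": position,
                    "document": document,
                }
            )
    return rows


def concatenate_retrievals(query_records, separator="\n\n"):
    """Turn one record per query into one row with all documents joined."""
    return [
        {"query": r["query"], "reference": separator.join(r["documents"])}
        for r in query_records
    ]


# Hypothetical retrieval results for a single query.
records = [{"query": "Does Arize support batch models?", "documents": ["doc A", "doc B"]}]
print(len(concatenate_retrievals(records)), len(explode_retrievals(records)))
```

The concatenated form suits evaluations that judge a response against all of its context at once, while the exploded form suits per-document judgments such as relevance.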

In [None]:
queries_df = get_qa_with_reference(session)
retrieved_documents_df = get_retrieved_documents(session)

Next, define your evaluation model and your evaluators.

Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., and provide a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use to perform evaluations using our battle-tested evaluation templates.
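
One practical detail of LLM-based evaluation is that the judge model returns free text, which has to be mapped onto a fixed set of labels before results can be aggregated. The sketch below shows one way this could be done. It is a hypothetical illustration rather than Phoenix's implementation; `snap_to_label`, the `NOT_PARSABLE` sentinel, and the example labels are assumptions made for the example.

```python
import re

NOT_PARSABLE = "NOT_PARSABLE"  # sentinel for output that matches no single label


def snap_to_label(raw_output, allowed_labels):
    """Map free-form LLM output onto exactly one allowed label, if possible."""
    text = raw_output.lower()
    # Whole-word matching keeps "correct" from matching inside "incorrect".
    matches = [
        label
        for label in allowed_labels
        if re.search(rf"\b{re.escape(label.lower())}\b", text)
    ]
    return matches[0] if len(matches) == 1 else NOT_PARSABLE


labels = ["correct", "incorrect"]
print(snap_to_label("The answer is Incorrect.", labels))
print(snap_to_label("It could be correct or incorrect.", labels))
```

Ambiguous or unrecognized outputs are kept as a separate category instead of being guessed, so they can be inspected or excluded later.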

In [None]:
eval_model = OpenAIModel(
    model="gpt-4-1106-preview",
)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

px.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
)
px.log_evaluations(DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df))


run_evals |          | 0/20 (0.0%) | ⏳ 00:00<? | ?it/s

run_evals |          | 0/10 (0.0%) | ⏳ 00:00<? | ?it/s



Your evaluations should now appear as annotations on the appropriate spans in Phoenix.

![A view of the Phoenix UI with evaluation annotations](https://storage.googleapis.com/arize-assets/phoenix/assets/docs/notebooks/evals/traces_with_evaluation_annotations.png)
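
Per-document relevance judgments can be summarized with a retrieval metric such as precision@k, the fraction of the top k retrieved documents that were judged relevant. The sketch below computes it from plain Python lists. The function names and the sample judgments are hypothetical, and it assumes you have already converted the relevance evaluation results into one rank-ordered list of booleans per query.

```python
def precision_at_k(relevance_flags, k):
    """Fraction of the top-k retrieved documents judged relevant.

    Dividing by k means a query with fewer than k retrieved documents
    is treated as if the missing documents were not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance_flags[:k]) / k


def mean_precision_at_k(flags_by_query, k):
    """Average precision@k over all queries (0.0 if there are none)."""
    scores = [precision_at_k(flags, k) for flags in flags_by_query.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance judgments, ordered by retrieval rank (True = relevant).
judgments = {"q1": [True, False], "q2": [True, True]}
print(mean_precision_at_k(judgments, k=2))
```

A low value for a particular query points to a retrieval problem for that query, which you can then inspect in the corresponding trace.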