## Setup

Install libraries

In [1]:
!pip install -qq arize-phoenix llama-index "openai>=1" gcsfs nest_asyncio langchain langchain-community cohere llama-index-postprocessor-cohere-rerank

Set up environment variables


In [2]:
import os
from getpass import getpass

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

if not (cohere_api_key := os.getenv("COHERE_API_KEY")):
    cohere_api_key = getpass("🔑 Enter your Cohere API key: ")
os.environ["COHERE_API_KEY"] = cohere_api_key

## Launch Phoenix and Instrumentation

In [3]:
import phoenix as px
session = px.launch_app()



In [5]:
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

## Parse Phoenix Documentation into Llama-Index Documents

In [6]:
# The nest_asyncio module enables the nesting of asynchronous functions within an already running async loop.
# This is necessary because Jupyter notebooks inherently operate in an asynchronous loop.
# By applying nest_asyncio, we can run additional async functions within this existing loop without conflicts.
import json
import logging
import sys
import time

import nest_asyncio

nest_asyncio.apply()

import pandas as pd
from langchain_community.document_loaders import GitbookLoader
from llama_index.core import Document, VectorStoreIndex
from llama_index.llms.openai import OpenAI



Phoenix tracing is enabled through the `LlamaIndexInstrumentor` call above. Phoenix uses OpenInference traces, an open-source standard for capturing and storing LLM application traces, which lets LLM applications integrate with LLM observability solutions such as Phoenix.

In [7]:
"""
Fetches the Arize documentation from Gitbook and serializes it into LangChain format.
"""


def load_gitbook_docs(docs_url: str):
    """Loads documents from a Gitbook URL.

    Args:
        docs_url (str): URL to Gitbook docs.

    Returns:
        List[LangChainDocument]: List of documents in LangChain format.
    """
    loader = GitbookLoader(
        docs_url,
        load_all_paths=True,
    )
    return loader.load()


logging.basicConfig(level=logging.INFO, stream=sys.stdout)

# fetch documentation
docs_url = "https://docs.arize.com/phoenix"
embedding_model_name = "text-embedding-ada-002"
docs = load_gitbook_docs(docs_url)

Fetching pages: 100%|##########| 126/126 [00:45<00:00,  2.78it/s]


In [8]:
documents = []
for doc in docs:
    documents.append(Document(metadata=doc.metadata, text=doc.page_content))

In [9]:
documents[0].metadata

{'source': 'https://docs.arize.com/phoenix/', 'title': 'Arize Phoenix'}

In [10]:
# Convert documents to a JSON serializable format (if needed)
documents_json = [doc.to_dict() for doc in documents]

# Save to a JSON file
with open("llama_index_documents.json", "w") as file:
    json.dump(documents_json, file, indent=4)

## Set Up VectorStore and Query Engine

In [11]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Define an LLM
llm = OpenAI(model="gpt-4")

# Split the documents into nodes with a chunk_size of 1024 and a chunk_overlap of 250,
# then build the vector index from those nodes
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=250)
nodes = splitter.get_nodes_from_documents(documents)
vector_index = VectorStoreIndex(nodes)
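
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal toy sketch that windows a list of words. It is only an illustration: the helper name `chunk_words` is hypothetical, and the real `SentenceSplitter` counts tokens and respects sentence boundaries, which this sketch does not attempt.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split `words` into windows of `chunk_size` that share `chunk_overlap` items.

    Hypothetical helper for illustration; not how SentenceSplitter is implemented.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the input
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` items after the previous one, so neighbouring chunks repeat a little context. A larger overlap trades more duplicated text for a lower chance of cutting a relevant passage in half.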

Build a QueryEngine and start querying.

In [12]:
cohere_api_key = os.environ["COHERE_API_KEY"]
cohere_rerank = CohereRerank(api_key=cohere_api_key, top_n=2)

query_engine = vector_index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[cohere_rerank],
)
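
The query engine above retrieves `similarity_top_k=5` candidates and lets the reranker keep `top_n=2`. The following sketch shows that two-stage shape with plain Python. The function name, the candidate records, and both score columns are hypothetical; in the notebook the second-stage score comes from the Cohere rerank model, not from a precomputed field.

```python
def retrieve_then_rerank(candidates, similarity_top_k, top_n):
    """Keep the top-k candidates by similarity, then the top-n of those by rerank score."""
    first_stage = sorted(candidates, key=lambda c: c["similarity"], reverse=True)
    first_stage = first_stage[:similarity_top_k]
    second_stage = sorted(first_stage, key=lambda c: c["rerank_score"], reverse=True)
    return [c["text"] for c in second_stage[:top_n]]


# Made-up scores, for illustration only.
candidates = [
    {"text": "a", "similarity": 0.84, "rerank_score": 0.30},
    {"text": "b", "similarity": 0.83, "rerank_score": 0.95},
    {"text": "c", "similarity": 0.82, "rerank_score": 0.10},
    {"text": "d", "similarity": 0.81, "rerank_score": 0.80},
    {"text": "e", "similarity": 0.80, "rerank_score": 0.20},
    {"text": "f", "similarity": 0.70, "rerank_score": 0.99},
]
selected = retrieve_then_rerank(candidates, similarity_top_k=5, top_n=2)
print(selected)
```

Candidate `f` has the best rerank score but never reaches the reranker, because it is cut in the first stage. That is the main trade-off when choosing `similarity_top_k`.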



## Import Questions

In [16]:
questions_df = pd.read_parquet("PhoenixRAGUseCaseQuestions.parquet")

In [17]:
questions_df

| | Prompt/ Question |
|---|---|
| 0 | How do I send traces to Phoenix? |
| 1 | What happens if I send the same traces twice? |
| 2 | Which frameworks and LLM providers are support... |
| 3 | How can users create and manage datasets for p... |
| 4 | How does Arize Phoenix use OpenTelemetry? |
| ... | ... |
| 95 | What is the Data retention policy? |
| 96 | Will hosted Phoenix be on the latest version o... |
| 97 | Is Hosted Phoenix free? |
| 98 | Can I persist data in the notebook? |


## Generate Answers for all of the questions

In [18]:
# Loop over the questions and generate the answers.
for i, row in questions_df.iterrows():
    # Pause for 30 seconds at rows 25, 50, and 75 (presumably to stay under API rate limits).
    if i in [25, 50, 75]:
        time.sleep(30)
    question = row["Prompt/ Question"]
    response_vector = query_engine.query(question)
    print(f"Question: {question}\nAnswer: {response_vector.response}\n")

Question: How do I send traces to Phoenix?
Answer: To send traces to Phoenix, you can configure your application to log traces to a remote instance of Phoenix by setting the host and port where the traces will be sent. This can be done by using the environment variables PHOENIX_HOST and PHOENIX_PORT, or by setting the PHOENIX_COLLECTOR_ENDPOINT environment variable. By configuring your instrumentation in this way, your application will be able to send traces to the specified Phoenix instance for collection and visualization.

Question: What happens if I send the same traces twice? 
Answer: Tracing records the paths taken by requests as they move through multiple steps. Sending the same traces twice would likely result in duplicate records of the paths taken by those requests, potentially leading to redundant information being captured in the tracing system.

Question: Which frameworks and LLM providers are supported by Arize Phoenix for seamless integration?
Answer: Arize Phoenix suppo
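
The generation loop sleeps at fixed row indices. One alternative, sketched here under assumptions, is to retry a failed call with exponential backoff. `call_with_backoff` is a hypothetical helper, and catching a bare `Exception` is a simplification: in practice you would catch only the rate-limit error your provider raises.

```python
import time


def call_with_backoff(fn, *args, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying with delays of base_delay * 2**attempt on failure."""
    for attempt in range(max_attempts):
        try:
            return fn(*args)
        except Exception:  # simplification; narrow this to the provider's rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)


# Simulated flaky call standing in for query_engine.query (fails twice, then succeeds).
attempts = {"count": 0}


def flaky(question):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return f"answer to: {question}"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, "How do I send traces to Phoenix?", sleep=delays.append)
print(result, delays)
```

In the notebook loop this would be used as `call_with_backoff(query_engine.query, question)`, leaving `sleep` at its default.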

## Phoenix Evals

In [19]:
from phoenix.session.evaluation import get_retrieved_documents

retrieved_documents_df = get_retrieved_documents(px.Client())
retrieved_documents_df

| context.span_id | document_position | context.trace_id | input | reference | document_score |
|---|---|---|---|---|---|
| 87326ee6473536e6 | 0 | 7992e68b92676ddd307af12e9109d8d0 | How do I send traces to Phoenix? | Tracing Core Concepts\nHow to log traces\nTo l... | 0.841724 |
| 87326ee6473536e6 | 1 | 7992e68b92676ddd307af12e9109d8d0 | How do I send traces to Phoenix? | How does Tracing Work?\nThe components behind ... | 0.840770 |
| 87326ee6473536e6 | 2 | 7992e68b92676ddd307af12e9109d8d0 | How do I send traces to Phoenix? | Quickstart: Tracing\nInspect the inner-working... | 0.831993 |
| 87326ee6473536e6 | 3 | 7992e68b92676ddd307af12e9109d8d0 | How do I send traces to Phoenix? | Save and Load Traces\nHow to manually save and... | 0.821883 |
| 87326ee6473536e6 | 4 | 7992e68b92676ddd307af12e9109d8d0 | How do I send traces to Phoenix? | Quickstart: Deployment\nHow to use phoenix out... | 0.821791 |
| ... | ... | ... | ... | ... | ... |
| 6f1f37a94447966a | 0 | 6fe13e1545c6ed73fc1e755f6c401f57 | Can I use gRPC for trace collection? | Use Cases: Tracing\nThe following. guides serv... | 0.773752 |
| 6f1f37a94447966a | 1 | 6fe13e1545c6ed73fc1e755f6c401f57 | Can I use gRPC for trace collection? | Overview: Tracing\nTracing the execution of LL... | 0.772057 |
| 6f1f37a94447966a | 2 | 6fe13e1545c6ed73fc1e755f6c401f57 | Can I use gRPC for trace collection? | How does Tracing Work?\nThe components behind ... | 0.767665 |
| 6f1f37a94447966a | 3 | 6fe13e1545c6ed73fc1e755f6c401f57 | Can I use gRPC for trace collection? | os\n.\nenviron\n[\n"PHOENIX_NOTEBOOK_ENV"\n]\n... | 0.767380 |


In [20]:
from phoenix.session.evaluation import get_qa_with_reference

queries_df = get_qa_with_reference(px.active_session())
queries_df

| context.span_id | input | output | reference |
|---|---|---|---|
| ff4fb7579db0ee52 | How do I send traces to Phoenix? | To send traces to Phoenix, you can configure y... | Tracing Core Concepts\nHow to log traces\nTo l... |
| 8bdb657ba242734d | What happens if I send the same traces twice? | Tracing records the paths taken by requests as... | The reply half may be formatted for response p... |
| 245ef4388ed13f47 | Which frameworks and LLM providers are support... | Arize Phoenix supports a variety of frameworks... | Arize\nPhoenix works hand-in-hand with Arize, ... |
| e83f9dfcf0a0bbfd | How can users create and manage datasets for p... | Users can create and manage datasets for promp... | Arize\nPhoenix works hand-in-hand with Arize, ... |
| 8dcbbaa9b2d8dc0e | How does Arize Phoenix use OpenTelemetry? | Arize Phoenix uses OpenTelemetry for instrumen... | Arize Phoenix\nAI Observability and Evaluation... |
| ... | ... | ... | ... |
| 5b8a027df5d99ba9 | What is the Data retention policy? | The data retention policy allows users to pers... | Email Extraction\nComing soon\nPrevious\nSumma... |
| ced27f258bc7927d | Will hosted Phoenix be on the latest version o... | Hosted Phoenix will always be on the latest ve... | Deployment\nHow to self-host a phoenix instanc... |
| f659caf9e7d6c944 | Is Hosted Phoenix free? | Hosted Phoenix is free for all developers, wit... | Hosted Phoenix\nWe now offer a hosted version ... |
| a2c3e892839cc3ea | Can I persist data in the notebook? | Persistence for notebooks (a.k.a. launch_app) ... | os\n.\nenviron\n[\n"PHOENIX_NOTEBOOK_ENV"\n]\n... |


Let's now use Phoenix's LLM Evals to evaluate the relevance of the retrieved documents with regard to the query. Note that we've set `provide_explanation=True`, which prompts the LLM to explain its reasoning. This can be useful for debugging and for figuring out potential corrective actions.

In [21]:
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)

eval_model = OpenAIModel(model="gpt-4")
relevance_evaluator = RelevanceEvaluator(eval_model)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_evaluator = QAEvaluator(eval_model)

retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,
    concurrency=20,
)[0]

run_evals |██████████| 500/500 (100.0%) | ⏳ 01:58<00:00 |  3.15it/s

In [22]:
hallucination_eval_df, qa_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_evaluator],
    provide_explanation=True,
    concurrency=20,
)

run_evals |██████████| 500/500 (100.0%) | ⏳ 01:59<00:00 |  4.19it/s


run_evals |██████████| 200/200 (100.0%) | ⏳ 01:07<00:00 |  1.33it/s

In [23]:
retrieved_documents_relevance_df = retrieved_documents_relevance_df.reset_index().set_index(
    "context.span_id"
)
retrieved_documents_relevance_df

| context.span_id | document_position | label | score | explanation |
|---|---|---|---|---|
| 87326ee6473536e6 | 0 | relevant | 1 | The question asks about how to send traces to ... |
| 87326ee6473536e6 | 1 | relevant | 1 | The question asks about how to send traces to ... |
| 87326ee6473536e6 | 2 | relevant | 1 | The question asks about how to send traces to ... |
| 87326ee6473536e6 | 3 | unrelated | 0 | The question asks about how to send traces to ... |
| 87326ee6473536e6 | 4 | relevant | 1 | The question asks about how to send traces to ... |
| ... | ... | ... | ... | ... |
| 6f1f37a94447966a | 0 | unrelated | 0 | The question asks about the possibility of usi... |
| 6f1f37a94447966a | 1 | unrelated | 0 | The question asks about the possibility of usi... |
| 6f1f37a94447966a | 2 | unrelated | 0 | The question is asking about the possibility o... |
| 6f1f37a94447966a | 3 | relevant | 1 | The question asks if gRPC can be used for trac... |
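
Per-document relevance scores like these can be rolled up into a retrieval metric such as precision@k. The sketch below is a plain-Python illustration using only the rows visible in the truncated display above (the second span shows four of its rows), so the numbers describe those rows rather than the full run. `precision_at_k` is a hypothetical helper, not a Phoenix function.

```python
def precision_at_k(scores, k):
    """Fraction of the first k retrieved documents judged relevant (score 1).

    If fewer than k documents are available, the average is taken over those present.
    """
    top = scores[:k]
    return sum(top) / len(top) if top else 0.0


# Relevance scores in document_position order, copied from the rows shown above.
scores_by_span = {
    "87326ee6473536e6": [1, 1, 1, 0, 1],
    "6f1f37a94447966a": [0, 0, 0, 1],
}

precision_at_2 = {span: precision_at_k(s, 2) for span, s in scores_by_span.items()}
for span, value in precision_at_2.items():
    print(span, value)
```

Since the reranker in this notebook keeps only the top two documents, precision@2 is arguably the more telling cut-off here. The gRPC question shows why: its only relevant document sits at position 3.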


In [24]:
hallucination_eval_df.head()

| context.span_id | label | score | explanation |
|---|---|---|---|
| ff4fb7579db0ee52 | factual | 0 | The reference text provides information on how... |
| 8bdb657ba242734d | hallucinated | 1 | The query asks about the consequences of sendi... |
| 245ef4388ed13f47 | hallucinated | 1 | The query asks about the frameworks and LLM pr... |
| e83f9dfcf0a0bbfd | factual | 0 | The reference text provides information on how... |
| 8dcbbaa9b2d8dc0e | factual | 0 | The query asks about how Arize Phoenix uses Op... |
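
Because the hallucination evaluator assigns score 1 to `hallucinated` and 0 to `factual`, the mean score is the hallucination rate. As a small sketch, here is that calculation over just the five rows shown by `head()`. The labels are copied from the output above and say nothing about the remaining rows.

```python
from collections import Counter

# Labels from the five rows displayed above, in order.
labels = ["factual", "hallucinated", "hallucinated", "factual", "factual"]

counts = Counter(labels)
hallucination_rate = counts["hallucinated"] / len(labels)

print(dict(counts))
print(f"hallucination rate over displayed rows: {hallucination_rate:.0%}")
```

On the full dataframe the same figure should be available as the mean of the `score` column, assuming every row received a score.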


## Log the Evals into Phoenix

In [25]:
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_eval_df),
    SpanEvaluations(eval_name="Retrieval Relevance", dataframe=retrieved_documents_relevance_df),
)

run_evals |██████████| 200/200 (100.0%) | ⏳ 01:08<00:00 |  2.93it/s


In [26]:
session.view()

📺 Opening a view to the Phoenix app. The app is running at http://localhost:6006/


## Save the Trace and Evals

In [27]:
import os

# Specify and Create the Directory for Trace Dataset
directory = "saved_traces_and_evals"
os.makedirs(directory, exist_ok=True)

# Save the Trace Dataset
trace_id = px.Client().get_trace_dataset().save(directory=directory)

💾 Trace dataset saved to under ID: 6ba5bfd7-06f7-4df0-b56a-8de017b787fc
📂 Trace dataset path: my_saved_traces/trace_dataset-6ba5bfd7-06f7-4df0-b56a-8de017b787fc.parquet
