# Run Evaluations on Our RAG Chatbot

<div align="center">
    <p style="text-align:left">
        <img alt="phoenix logo" src="https://repository-images.githubusercontent.com/564072810/f3666cdf-cb3e-4056-8a25-27cb3e6b5848" width="800"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</div>

## Let's get started! 

In [1]:
%pip install -qqq "arize-phoenix==11.21.0" "openai>=1" nest_asyncio python-dotenv


In [2]:
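import os
from getpass import getpass

import nest_asyncio
import phoenix as px
from dotenv import load_dotenv

# Load environment variables (including OPENAI_API_KEY) from the project's .env file.
load_dotenv(os.path.abspath("/workspaces/llm-zoomcamp/.env"))

# Fall back to an interactive prompt if the key is not set in the environment.
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

# Allow nested event loops so the evaluators can run concurrently inside the notebook.
nest_asyncio.apply()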


<img alt="Document Retrieval Evaluation Image" src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/phoenix-docs-images/documentRelevanceDiagram.png" width="1000"/>

In [3]:
from phoenix.session.evaluation import get_retrieved_documents
# One row per retrieved document, indexed by retriever span ID and document position.
retrieved_documents_df = get_retrieved_documents(px.Client(), project_name="our-rag-project", timeout=None)
retrieved_documents_df



context.span_id,document_position,context.trace_id,input,reference
6d6098549929a220,0,a029eb7f7e37b457757f5d89e7f5f4be,course title,The purpose of this document is to capture fre...
6d6098549929a220,1,a029eb7f7e37b457757f5d89e7f5f4be,course title,"Yes, we will keep all the materials after the ..."
6d6098549929a220,2,a029eb7f7e37b457757f5d89e7f5f4be,course title,GitHub - DataTalksClub data-engineering-zoomca...
6d6098549929a220,3,a029eb7f7e37b457757f5d89e7f5f4be,course title,You can start by installing and setting up all...
6d6098549929a220,4,a029eb7f7e37b457757f5d89e7f5f4be,course title,"Yes, even if you don't register, you're still ..."
1a0aa1ae822d7984,0,3546e03a3625dfe86623045fb784d27d,course title of this course,The purpose of this document is to capture fre...
1a0aa1ae822d7984,1,3546e03a3625dfe86623045fb784d27d,course title of this course,GitHub - DataTalksClub data-engineering-zoomca...
1a0aa1ae822d7984,2,3546e03a3625dfe86623045fb784d27d,course title of this course,"Yes, we will keep all the materials after the ..."
1a0aa1ae822d7984,3,3546e03a3625dfe86623045fb784d27d,course title of this course,"After you create a GitHub account, you should ..."
1a0aa1ae822d7984,4,3546e03a3625dfe86623045fb784d27d,course title of this course,You can start by installing and setting up all...


In [4]:
from phoenix.session.evaluation import get_qa_with_reference

# One row per question, with the model's output and the retrieved reference text.
queries_df = get_qa_with_reference(px.Client(), project_name="our-rag-project", timeout=None)
queries_df



context.span_id,input,output,reference
17993c8f29ecd171,what is the course title of this course?,The title of the course is **Data Engineering ...,The purpose of this document is to capture fre...
23ab32d1d9d94715,what is the course title of this course?,"The course title is ""Data Engineering Zoomcamp...",The purpose of this document is to capture fre...
16f820ac3a0b7c09,what is the course title of this course?,"The course title is ""Data Engineering Zoomcamp...",The purpose of this document is to capture fre...
8fcb87730b085cfc,Can I submit all assignments at the end of the...,Short answer: Yes — you can join the course la...,The purpose of this document is to capture fre...
3550f5bd7b1bd9a0,Is there more than one Data-Engineering Zoom C...,There’s only one live Data-Engineering Zoomcam...,"There are 3 Zoom Camps in a year, as of 2024. ..."


In [7]:
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)

# A single LLM judge shared by all three evaluators.
eval_model = OpenAIModel(model="gpt-5-nano", temperature=1)
relevance_evaluator = RelevanceEvaluator(eval_model)
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_evaluator = QAEvaluator(eval_model)

In [8]:
# run_evals returns one DataFrame per evaluator, in order; [0] selects the relevance results.
retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,
    concurrency=20,
)[0]
retrieved_documents_relevance_df



context.span_id,document_position,label,score,explanation
6d6098549929a220,0,unrelated,0,EXPLANATION: Step 1 - Identify the question: '...
6d6098549929a220,1,unrelated,0,The question asks for the course title. The re...
6d6098549929a220,2,relevant,1,Step 1: Identify what the question is asking —...
6d6098549929a220,3,unrelated,0,EXPLANATION: \n1) Identify the user's question...
6d6098549929a220,4,unrelated,0,Step 1: Identify the user question which asks ...
1a0aa1ae822d7984,0,unrelated,0,Step 1: The question asks for the course title...
1a0aa1ae822d7984,1,relevant,1,Step 1: Determine what the question asks: the ...
1a0aa1ae822d7984,2,unrelated,0,Step 1: Identify the user's question: determin...
1a0aa1ae822d7984,3,unrelated,0,The reference text discusses cloning a GitHub ...
1a0aa1ae822d7984,4,unrelated,0,EXPLANATION: The question asks for the course ...
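
With one relevance score per retrieved document, the table above can be condensed into per-query retrieval metrics. The sketch below is a minimal, standard-library illustration and not part of the Phoenix API: the helper names `precision_at_k` and `reciprocal_rank` are hypothetical, and the scores are copied by hand from the output above, ordered by `document_position`.

```python
from statistics import mean

# Relevance scores copied from the run_evals output above,
# keyed by context.span_id and ordered by document_position.
scores_by_span = {
    "6d6098549929a220": [0, 0, 1, 0, 0],
    "1a0aa1ae822d7984": [0, 1, 0, 0, 0],
}


def precision_at_k(scores, k):
    """Fraction of the top-k retrieved documents judged relevant."""
    top_k = scores[:k]
    return sum(top_k) / len(top_k) if top_k else 0.0


def reciprocal_rank(scores):
    """1 / rank of the first relevant document, or 0.0 if none is relevant."""
    for rank, score in enumerate(scores, start=1):
        if score:
            return 1.0 / rank
    return 0.0


# Hypothetical summary variables, one precision value per query span.
precision_at_5 = {span: precision_at_k(s, 5) for span, s in scores_by_span.items()}
mean_reciprocal_rank = mean(reciprocal_rank(s) for s in scores_by_span.values())

print(precision_at_5)
print(round(mean_reciprocal_rank, 3))
```

In both queries only one of the five retrieved documents was judged relevant, so precision@5 is 0.2 for each. The reciprocal rank additionally shows how far down the list that one relevant document sits.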


In [9]:
# One result DataFrame per evaluator, returned in the same order as `evaluators`.
hallucination_eval_df, qa_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_evaluator],
    provide_explanation=True,
    concurrency=20,
)
hallucination_eval_df



context.span_id,label,score,explanation
17993c8f29ecd171,hallucinated,1,REASONING STEPS: 1) Check the reference text f...
23ab32d1d9d94715,hallucinated,1,The reference text mentions a Zoomcamp course ...
16f820ac3a0b7c09,hallucinated,1,The reference text does not explicitly provide...
8fcb87730b085cfc,hallucinated,1,To judge if the answer is factual relative to ...
3550f5bd7b1bd9a0,factual,0,Reasoning: The reference text states there are...
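
The `label` column lends itself to a quick aggregate before the results are logged. As a rough sketch, the share of each label can be tallied with the standard library. The labels here are copied by hand from the table above, and `label_rates` is a hypothetical helper, not a Phoenix function.

```python
from collections import Counter

# Hallucination labels copied from the output above, one per context.span_id.
labels = {
    "17993c8f29ecd171": "hallucinated",
    "23ab32d1d9d94715": "hallucinated",
    "16f820ac3a0b7c09": "hallucinated",
    "8fcb87730b085cfc": "hallucinated",
    "3550f5bd7b1bd9a0": "factual",
}


def label_rates(labels_by_span):
    """Return each label's share of the evaluated spans."""
    counts = Counter(labels_by_span.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


rates = label_rates(labels)
print(rates)
```

With the actual DataFrame, the same shares should be available from `hallucination_eval_df["label"].value_counts(normalize=True)`. Four of the five answers are flagged as hallucinated here, so the `explanation` column is worth reading before trusting the aggregate.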


In [11]:
from phoenix.trace import DocumentEvaluations, SpanEvaluations

# Attach the evaluation results to the traced spans and retrieved documents in Phoenix.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_eval_df),
    DocumentEvaluations(
        eval_name="Retrieval Relevance", dataframe=retrieved_documents_relevance_df
    ),
)

