<center>
    <p style="text-align:center">
    <img alt="arize logo" src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/>
        <br>
        <a href="https://docs.arize.com/arize/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/client_python">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>

<center><h1>Using Arize with RAG</h1></center>

This guide shows you how to create a retrieval-augmented generation (RAG) chatbot and evaluate its performance with Arize. RAG is typically used to answer queries from a specified set of documents rather than from the LLM's own training data, which reduces hallucinations and incorrect generations.

We'll go through the following steps:

* Create a RAG chatbot using LlamaIndex

* Trace the retrieval and LLM calls using Arize

* Create a dataset to benchmark performance

* Evaluate performance using LLM as a judge

# Create a RAG chatbot using LlamaIndex

Let's start with all of our boilerplate setup:

1. Install packages for tracing and retrieval
2. Set up our API keys
3. Set up Arize for tracing
4. Create our LlamaIndex query engine
5. See your results in the Arize UI

### Install packages for tracing and retrieval

In [None]:
!pip install llama-index openai arize-phoenix-evals arize-otel openinference-instrumentation-llama-index


### Set up our API keys

In [None]:
import os
from getpass import getpass

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

🔑 Enter your OpenAI API key: ··········


### Set up Arize for tracing

To follow along with this tutorial, you'll need to sign up for Arize and get your Space ID and API key. You can see the [guide here](https://docs.arize.com/arize/llm-tracing/quickstart-llm).

In [None]:
# Import open-telemetry dependencies
from arize_otel import register_otel, Endpoints
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Setup OTEL via our convenience function
register_otel(
    endpoints = Endpoints.ARIZE,
    space_id = getpass("🔑 Enter your Arize Space ID: "),
    api_key = getpass("🔑 Enter your Arize API key: "),
    model_id = "agents-cookbook", # name this to whatever you would like
)
LlamaIndexInstrumentor().instrument()



### Create our LlamaIndex query engine

In [None]:
!mkdir -p data
!wget "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt" -O data/paul_graham_essay.txt

--2024-11-07 19:58:52--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham_essay.txt’


2024-11-07 19:58:53 (726 KB/s) - ‘data/paul_graham_essay.txt’ saved [75042/75042]



In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from pprint import pprint

# load documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did Paul Graham work on?")
pprint(response)

Response(response='Paul Graham worked on various projects such as painting, '
                  'experimenting with a new kind of still life, looking for an '
                  'apartment to buy, starting a web app for making web apps, '
                  'working on an application builder, developing a new dialect '
                  'of Lisp called Arc, writing essays, working on spam '
                  'filters, hosting dinners for friends, and starting an '
                  'investment firm called Y Combinator.',
         source_nodes=[NodeWithScore(node=TextNode(id_='44a9caee-a671-4859-80d9-78a2aee6063d', embedding=None, metadata={'file_path': '/content/data/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-11-07', 'last_modified_date': '2024-11-07'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metad

In [None]:
for node in response.source_nodes:
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:200]+"..."
    print(text_fmt)
    print(node.score)
    print("--------")

Now when I walked past charming little restaurants I could go in and order lunch. It was exciting for a while. Painting started to go better. I experimented with a new kind of still life where I'd pai...
0.8470505444635096
--------
So while working on things that aren't prestigious doesn't guarantee you're on the right track, it at least guarantees you're not on the most common type of wrong one.  Over the next several years I w...
0.8406981214404532
--------
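
The scores printed next to each source node are similarity scores between the query embedding and the chunk embedding. As a minimal sketch of the idea, assuming cosine similarity and tiny made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and this is not LlamaIndex's own code), top-k retrieval can be written as:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (chunk_id, score) pairs for the k most similar chunks, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", for illustration only
chunks = {
    "painting": [0.9, 0.1, 0.0],
    "lisp": [0.1, 0.9, 0.1],
    "yc": [0.2, 0.2, 0.9],
}
query = [0.8, 0.3, 0.1]

results = top_k(query, chunks, k=2)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

The two highest-scoring chunks are the ones handed to the LLM as context, which is why inspecting the retrieved documents and their scores is a useful first debugging step.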


### See your results in the Arize UI
Once you've run a single query, you can see the trace in the Arize UI with each step taken by the retriever, the embedding, and the LLM query.

Click through the queries to better understand how the query engine is performing.

The Arize UI can be used to understand and troubleshoot your application by surfacing:
 - **Application latency** - highlights slow invocations of LLMs, retrievers, etc.
 - **Token usage** - displays the breakdown of token usage with LLMs to surface your most expensive LLM calls
 - **Runtime exceptions** - critical runtime exceptions such as rate-limiting are captured as exception events
 - **Retrieved documents** - view all the documents retrieved during a retriever call and the score and order in which they were returned
 - **Embeddings** - view the embedding text used for retrieval and the underlying embedding model
 - **LLM parameters** - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
 - **Prompt templates** - see which prompt template was used during the prompting step and which variables were filled in
 - **Tool descriptions** - view the description and function signature of the tools your LLM has been given access to
 - **LLM function calls** - if using OpenAI or another model with function calls, you can view the function selection and function messages in the input messages to the LLM

<img src="https://storage.cloud.google.com/arize-assets/tutorials/images/Phoenix-LlamaIndex-Starter.png" width="800"/>

# Create synthetic dataset of questions

Using the template below, we're going to generate a dataframe of 25 questions we can use to test our RAG chatbot.

In [None]:
GEN_TEMPLATE = """
You are an assistant that generates Q&A questions about Paul Graham's essay below.

The questions should involve the essay contents, specific facts and figures,
names, and elements of the story. Do not ask any questions where the answer is
not in the essay contents.

Respond with one question per line. Do not include any numbering at the beginning of each line. Do not include any category headings.
Generate 25 questions. Be sure there are no duplicate questions.

[START ESSAY]
{essay}
[END ESSAY]
"""

# Read the essay and substitute it into the {essay} placeholder
with open('data/paul_graham_essay.txt', 'r') as file:
    file_content = file.read()

GEN_TEMPLATE = GEN_TEMPLATE.format(essay=file_content)

In [None]:
import nest_asyncio
import pandas as pd
nest_asyncio.apply()
from phoenix.evals import OpenAIModel
pd.set_option('display.max_colwidth', 500)

model = OpenAIModel(model="gpt-4o", max_tokens=1300)

In [None]:
resp = model(GEN_TEMPLATE)

In [None]:
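# The template asks for one question per line; splitting on any newline and
# dropping blank lines works whether or not the model adds empty lines between them.
split_response = [line.strip() for line in resp.splitlines() if line.strip()]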

questions_df = pd.DataFrame(split_response, columns=['input'])
print(questions_df.head(3))

                                                                 input
0  What were the two main things Paul Graham worked on before college?
1        What type of writing did Paul Graham focus on before college?
2        What was the first computer Paul Graham tried programming on?
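
Even with explicit instructions, a model may occasionally number its lines or repeat a question. The following is a minimal sketch of a stricter parser; the helper name `parse_questions` and the sample text are hypothetical and not part of the original notebook:

```python
import re


def parse_questions(raw_response):
    """Split an LLM response into a de-duplicated list of question strings."""
    questions = []
    seen = set()
    for line in raw_response.splitlines():
        # Drop leading list markers such as "1.", "2)", "-", or "*"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if not cleaned:
            continue  # skip blank lines
        key = cleaned.lower()
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        questions.append(cleaned)
    return questions


# Made-up sample response, for illustration only
sample = "1. What did he study?\n\n- Who funded Viaweb?\nWhat did he study?\n"
parsed = parse_questions(sample)
print(parsed)
```

The result can be passed to `pd.DataFrame(parsed, columns=['input'])` in the same way as `split_response` above.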


Now let's run the query engine on the generated questions and manually inspect the traces. The loop below runs every question; to try it on fewer data points first, iterate over `questions_df.head(n)` instead, where `n` is any number between 1 and 25.

Then manually inspect the outputs in the Arize UI.

In [None]:
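# Run the query engine on every generated question and record the answer
# together with the text of the retrieved source nodes.
# `row_idx` is used so the loop does not overwrite the `index` built earlier.
for row_idx, row in questions_df.iterrows():
    response = query_engine.query(row['input'])
    # Concatenate the retrieved chunks, one per line, as the reference text
    reference_text = "".join(node.text + "\n" for node in response.source_nodes)
    # Store the answer as a plain string rather than a Response object
    questions_df.loc[row_idx, 'output'] = str(response)
    questions_df.loc[row_idx, 'reference'] = reference_text
questions_df.head(3)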

Unnamed: 0,input,output,reference
0,What were the two main things Paul Graham worked on before college?,Writing and programming,"What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called ""data ..."
1,What type of writing did Paul Graham focus on before college?,Paul Graham focused on writing short stories before college.,"What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called ""data ..."
2,What was the first computer Paul Graham tried programming on?,"The first computer Paul Graham tried programming on was the IBM 1401 that his school district used for ""data processing"" when he was in 9th grade.","What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called ""data ..."


# Evaluating your RAG app

Now that we have a set of test cases, we can create evaluators to measure performance. This way, we don't have to manually inspect every single trace to see if the LLM is doing the right thing.

In [None]:
RELEVANCE_EVAL_TEMPLATE = '''You are comparing a reference text to a question and trying to determine if the reference text
contains information relevant to answering the question. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {input}
    ************
    [Reference text]: {reference}
    [END DATA]

Compare the Question above to the Reference text. You must determine whether the Reference text
contains information that can answer the Question. Please focus on whether the very specific
question can be answered by the information in the Reference text.
Your response must be single word, either "relevant" or "unrelated",
and should not contain any text or characters aside from that word.
"unrelated" means that the reference text does not contain an answer to the Question.
"relevant" means the reference text contains an answer to the Question.
'''

CORRECTNESS_EVAL_TEMPLATE = '''You are given a question, an answer and reference text. You must determine whether the
given answer correctly answers the question based on the reference text. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {input}
    ************
    [Reference]: {reference}
    ************
    [Answer]: {output}
    [END DATA]
Your response must be a single word, either "correct" or "incorrect",
and should not contain any text or characters aside from that word.
"correct" means that the question is correctly and fully answered by the answer.
"incorrect" means that the question is not correctly or only partially answered by the
answer.
'''
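
The placeholders `{input}`, `{reference}`, and `{output}` in these templates match the column names of `questions_df`. As a rough sketch of how one row becomes one judge prompt (a simplified illustration with a shortened, hypothetical template and helper name, not Phoenix's implementation):

```python
import string

# Shortened stand-in for the evaluation templates above (hypothetical text)
TEMPLATE = "[Question]: {input}\n[Reference text]: {reference}\n"


def render_prompt(template, row):
    """Fill a template's {placeholders} from a dict-like row, failing loudly on gaps."""
    # string.Formatter().parse yields (literal, field_name, spec, conversion) tuples
    needed = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = needed - set(row.keys())
    if missing:
        raise KeyError(f"row is missing columns: {sorted(missing)}")
    # Extra columns in the row are simply ignored
    return template.format(**{name: row[name] for name in needed})


row = {"input": "Q?", "reference": "R", "extra": 1}
prompt = render_prompt(TEMPLATE, row)
print(prompt)
```

Checking for missing columns up front gives a clearer error than a bare `KeyError` raised halfway through an evaluation run.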

We will build an LLM-as-a-judge from the prompt templates above and use the `llm_classify` function to label each row of `questions_df`. This function uses an LLM to evaluate your LLM calls and returns a label and an explanation for each one. You can read more detail [here](https://docs.arize.com/phoenix/api/evals#phoenix.evals.llm_classify).
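
The `rails` passed to `llm_classify` restrict the judge's output to a fixed set of labels. The sketch below illustrates the general idea of mapping raw model text onto allowed labels; the function name, the `UNPARSABLE` sentinel, and the exact-match rule are assumptions made for this example and do not reproduce Phoenix's own logic:

```python
import string

UNPARSABLE = "unparsable"  # hypothetical sentinel used only in this sketch


def snap_to_allowed_label(raw_output, rails):
    """Map raw judge output to one of the allowed labels, or UNPARSABLE."""
    # Normalize: trim surrounding whitespace, quotes, and punctuation; lowercase
    normalized = raw_output.strip(string.punctuation + string.whitespace).lower()
    for rail in rails:
        # Exact match avoids confusing "correct" with "incorrect"
        if normalized == rail.lower():
            return rail
    return UNPARSABLE


rails = ["incorrect", "correct"]
print(snap_to_allowed_label(" Correct.\n", rails))
print(snap_to_allowed_label("The answer is correct", rails))
```

Exact matching is deliberately strict here: a substring check would treat "incorrect" as containing "correct", so anything other than a single clean label is reported as unparsable.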

If you want to evaluate spans exported from Phoenix instead of a hand-built dataframe, the helper function `get_qa_with_reference` returns them in the right format. You can see how this function works [here](https://docs.arize.com/phoenix/tracing/how-to-tracing/extract-data-from-spans#pre-defined-queries) and the GitHub reference [here](https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/trace/dsl/helpers.py#L71).

In [None]:
from phoenix.evals import (
    OpenAIModel,
    llm_classify
)

RELEVANCE_RAILS = ["relevant", "unrelated"]
CORRECTNESS_RAILS = ["incorrect", "correct"]

relevance_eval_df = llm_classify(
    dataframe=questions_df,
    template=RELEVANCE_EVAL_TEMPLATE,
    model=OpenAIModel(model='gpt-4o'),
    rails=RELEVANCE_RAILS,
    provide_explanation=True,
    include_prompt=True,
    concurrency=4
)

correctness_eval_df = llm_classify(
    dataframe=questions_df,
    template=CORRECTNESS_EVAL_TEMPLATE,
    model=OpenAIModel(model='gpt-4o'),
    rails=CORRECTNESS_RAILS,
    provide_explanation=True,
    include_prompt=True,
    concurrency=4
)


Let's inspect the results of our evaluation!

In [None]:
relevance_eval_df

Unnamed: 0,label,explanation,prompt,exceptions,execution_status,execution_seconds
0,relevant,"The reference text explicitly states that before college, Paul Graham worked on writing and programming. These are identified as the two main things he worked on outside of school. This directly answers the question about what Paul Graham worked on before college.","You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What were the two main things Paul Graham worked on before college?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrot...",[],COMPLETED,1.225341
1,relevant,"The reference text explicitly mentions that before college, Paul Graham focused on writing short stories and programming. This directly answers the question about the type of writing he focused on before college, which was short stories.","You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What type of writing did Paul Graham focus on before college?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what...",[],COMPLETED,1.319833
2,relevant,"The reference text mentions that the first computer Paul Graham tried programming on was the IBM 1401, which was used by his school district for data processing. This directly answers the question about the first computer he tried programming on.","You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What was the first computer Paul Graham tried programming on?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what...",[],COMPLETED,1.40023
3,relevant,The reference text mentions that Paul Graham and his friend Rich Draves got permission to use the IBM 1401. This directly answers the question about who Paul Graham's friend was that got permission to use the IBM 1401 with him.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: Who was Paul Graham's friend that got permission to use the IBM 1401 with him?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write ess...",[],COMPLETED,2.072494
4,relevant,The reference text mentions that Paul Graham used an early version of Fortran to program on the IBM 1401. This directly answers the question about which programming language he used on that machine.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What programming language did Paul Graham use on the IBM 1401?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote wha...",[],COMPLETED,1.164307
5,relevant,The reference text mentions that the first of Paul Graham's friends to get a microcomputer built it himself from a kit sold by Heathkit. This directly answers the question about what was the first microcomputer Paul Graham's friend built.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What was the first microcomputer Paul Graham's friend built?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what ...",[],COMPLETED,1.491344
6,relevant,The reference text mentions that Paul Graham's father eventually bought him a TRS-80 computer after years of nagging. This directly answers the question about the type of computer Paul Graham's father bought for him.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What type of computer did Paul Graham's father eventually buy for him?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I w...",[],COMPLETED,1.535304
7,relevant,The reference text mentions that Paul Graham wrote a word processor that his father used to write at least one book. This directly answers the question about which program Paul Graham wrote that his father used to write a book.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What program did Paul Graham write that his father used to write a book?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I...",[],COMPLETED,1.235477
8,relevant,The reference text explicitly states that Paul Graham initially planned to study philosophy in college. This directly answers the question about what subject he initially planned to study.,"You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What subject did Paul Graham initially plan to study in college?\n ************\n [Reference text]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote w...",[],COMPLETED,0.850536
9,relevant,"The reference text explicitly mentions that Paul Graham's interest in AI was influenced by a novel by Heinlein called ""The Moon is a Harsh Mistress."" This directly answers the question about which novel by Heinlein influenced his interest in AI.",You are comparing a reference text to a question and trying to determine if the reference text\ncontains information relevant to answering the question. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: Which novel by Heinlein influenced Paul Graham's interest in AI?\n ************\n [Reference text]: I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch...,[],COMPLETED,1.180476


In [None]:
correctness_eval_df

Unnamed: 0,label,explanation,prompt,exceptions,execution_status,execution_seconds
0,correct,"The reference text states that before college, the two main things Paul Graham worked on, outside of school, were writing and programming. The answer provided is 'Writing and programming,' which matches the information given in the reference text.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What were the two main things Paul Graham worked on before college?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I w...",[],COMPLETED,1.416222
1,correct,"The reference text states that before college, Paul Graham worked on writing short stories. The answer provided is consistent with this information, as it states that Paul Graham focused on writing short stories before college.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What type of writing did Paul Graham focus on before college?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote w...",[],COMPLETED,1.230546
2,correct,"The reference text states that the first computer Paul Graham tried programming on was the IBM 1401, which was used by his school district for ""data processing"" when he was in 9th grade. This matches the information provided in the answer.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What was the first computer Paul Graham tried programming on?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote w...",[],COMPLETED,1.377718
3,correct,"The reference text states that Paul Graham and his friend Rich Draves got permission to use the IBM 1401. Therefore, the answer 'Rich Draves' correctly identifies Paul Graham's friend who got permission to use the IBM 1401 with him.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: Who was Paul Graham's friend that got permission to use the IBM 1401 with him?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write ...",[],COMPLETED,1.596712
4,correct,"The reference text states that Paul Graham used an early version of Fortran on the IBM 1401. The answer provided is 'Fortran', which matches the information given in the reference text.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What programming language did Paul Graham use on the IBM 1401?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote ...",[],COMPLETED,1.885585
5,correct,"The reference text states that the first microcomputer Paul Graham's friend built was sold as a kit by Heathkit. This matches the information provided in the answer, making it correct.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What was the first microcomputer Paul Graham's friend built?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote wh...",[],COMPLETED,1.411798
6,correct,"The reference text states that after years of nagging, Paul Graham's father bought him a TRS-80 computer. This directly answers the question about what type of computer his father eventually bought for him.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What type of computer did Paul Graham's father eventually buy for him?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. ...",[],COMPLETED,1.832405
7,correct,"The reference text states that Paul Graham wrote a word processor that his father used to write at least one book. Therefore, the answer 'A word processor' correctly answers the question about what program Paul Graham wrote that his father used to write a book.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What program did Paul Graham write that his father used to write a book?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays...",[],COMPLETED,1.240479
8,correct,"The reference text states that Paul Graham initially planned to study philosophy in college. The answer provided matches this information, making it correct.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: What subject did Paul Graham initially plan to study in college?\n ************\n [Reference]: What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrot...",[],COMPLETED,1.037174
9,correct,"The reference text explicitly states that the novel by Heinlein that influenced Paul Graham's interest in AI was ""The Moon is a Harsh Mistress."" The answer provided matches this information exactly.","You are given a question, an answer and reference text. You must determine whether the\ngiven answer correctly answers the question based on the reference text. Here is the data:\n [BEGIN DATA]\n ************\n [Question]: Which novel by Heinlein influenced Paul Graham's interest in AI?\n ************\n [Reference]: I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to swi...",[],COMPLETED,1.187599
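
Reading every row works for a handful of examples, but a single summary number is easier to track across runs. A minimal sketch, assuming you only need the share of each label (the helper name `label_rates` and the sample labels are hypothetical):

```python
from collections import Counter


def label_rates(labels):
    """Return the fraction of rows carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up labels, for illustration only
sample_labels = ["correct", "correct", "incorrect", "correct"]
rates = label_rates(sample_labels)
print(rates)
```

With the dataframes above, this could be called as `label_rates(correctness_eval_df["label"].tolist())` or `label_rates(relevance_eval_df["label"].tolist())`.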
