# Evaluate RAG with LlamaIndex

I am familiar with building RAG applications with LangChain, but this is my first time exploring LlamaIndex.

RAG evaluation is important, but how exactly to go about it hasn't been clear to me. Is there a consensus on which metric(s) to use? How can user satisfaction be quantified? This notebook is my attempt to improve my understanding, and it heavily references this [tutorial](https://github.com/openai/openai-cookbook/blob/main/examples/evaluation/Evaluate_RAG_with_LlamaIndex.ipynb). Let's dive in!

## Import libraries & data

In [4]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.evaluation import generate_question_context_pairs
from llama_index.evaluation import RetrieverEvaluator
from llama_index.llms import OpenAI

import os
import pandas as pd
from dotenv import load_dotenv

# The nest_asyncio module enables the nesting of asynchronous functions within an already running async loop.
# This is necessary because Jupyter notebooks inherently operate in an asynchronous loop.
# By applying nest_asyncio, we can run additional async functions within this existing loop without conflicts.
import nest_asyncio

nest_asyncio.apply()


load_dotenv()


True
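
The comment about `nest_asyncio` in the import cell can be made concrete with a small standard-library sketch. It is meant to be run as a plain Python script, not inside the notebook, and the names `inner` and `outer` are hypothetical. It shows the error that an unpatched event loop raises when asked to run a second coroutine while it is already running, which is the situation a Jupyter kernel is in.

```python
import asyncio


async def inner():
    # Any coroutine would do; this one just returns a value.
    return 42


async def outer():
    # We are now inside a running event loop, as code in a Jupyter cell is.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # A plain asyncio loop refuses to be re-entered.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

As I understand it, `nest_asyncio.apply()` patches the loop so that this kind of nested call is allowed instead of raising.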

Documents taken from [Paul Graham's website](https://www.paulgraham.com/worked.html).

In [2]:
documents = SimpleDirectoryReader("./data/").load_data()

In [36]:
documents[0].text[:200]

"\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed "

## Set up RAG pipeline

Initialize the LLM (OpenAI's GPT-4 Turbo preview, `gpt-4-1106-preview`) and build the index:

In [62]:
# Define an LLM
llm = OpenAI(model="gpt-4-1106-preview")

# Build index with a chunk_size of 512
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)
vector_index = VectorStoreIndex(nodes)
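
To get a feel for what a `chunk_size` setting does, here is a minimal sketch of fixed-size chunking. It is not LlamaIndex's implementation: as far as I understand, the real parser measures chunk size in tokens and tries to respect sentence boundaries, whereas the hypothetical `chunk_words` helper below simply counts whitespace-separated words and supports an optional overlap.

```python
def chunk_words(text, chunk_size=512, overlap=0):
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    Simplified illustration only; `chunk_words` is a hypothetical helper,
    not part of LlamaIndex.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: ten "words", chunks of four words with one word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Smaller chunks give the retriever more precise pieces to match against, at the cost of less context per chunk.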

Build a QueryEngine and start querying.

In [63]:
query_engine = vector_index.as_query_engine(similarity_top_k=3)

Run a query and get a response:

In [64]:
response_vector = query_engine.query(
    "Did the author enjoy philosophy courses? Explain and justify your answer."
)
response_vector.response


'The author did not enjoy philosophy courses. This can be inferred from the statement that "All I knew at the time was that I kept taking philosophy courses and they kept being boring." The author\'s decision to switch to AI suggests a lack of interest and enjoyment in philosophy.'

By default, the query engine retrieves the two most similar nodes (chunks). Here `similarity_top_k=3` was passed to `vector_index.as_query_engine`, so three nodes are retrieved.
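
The effect of `similarity_top_k` can be illustrated with a toy sketch of top-k retrieval. It assumes cosine similarity as the scoring function and uses made-up two-dimensional vectors in place of real embeddings. The helpers `cosine_similarity` and `top_k` are hypothetical and are not LlamaIndex APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Made-up "embeddings" for four chunks and one query.
toy_chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

top2 = top_k(query, toy_chunks)        # k=2
top3 = top_k(query, toy_chunks, k=3)   # one extra, lower-ranked chunk
print(top2)
print(top3)
```

Raising k only appends lower-scoring chunks to the result, so the extra context handed to the LLM is, by construction, less similar to the query.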

Let's check the text in each of these retrieved nodes.

In [65]:
response_vector.source_nodes[0].get_text()[:500]


"This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n\nThough I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It se"

In [66]:
response_vector.source_nodes[1].get_text()[:500]

"How should I choose what to do? Well, how had I chosen what to work on in the past? I wrote an essay for myself to answer that question, and I was surprised how long and messy the answer turned out to be. If this surprised me, who'd lived it, then I thought perhaps it would be interesting to other people, and encouraging to those with similarly messy lives. So I wrote a more detailed version for others to read, and this is the last sentence of it.\n\n\n\n\n\n\n\n\n\nNotes\n\n[1] My experience skipped a step"

In [67]:
response_vector.source_nodes[2].get_text()[:500]

"And moreover this was something you could make a living doing. Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive. And as an artist you could be truly independent. You wouldn't have a boss, or even need to get research funding.\n\nI had always liked looking at paintings. Could I make them? I had no idea. I'd never imagined it was even possible. I knew intellectually that p"

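Rather than indexing `source_nodes` one cell at a time, the retrieved chunks could be previewed in a single loop. The sketch below assumes each retrieved node exposes `get_text()` (used above) and a `score` attribute. `FakeNode` and `preview_source_nodes` are hypothetical stand-ins so the example runs without an index or an API key, and the scores are invented.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a retrieved node, for demonstration only."""
    text: str
    score: float

    def get_text(self):
        return self.text


def preview_source_nodes(source_nodes, max_chars=500):
    """Return (rank, score, text preview) for each retrieved node."""
    previews = []
    for rank, node in enumerate(source_nodes, start=1):
        score = getattr(node, "score", None)  # None if no score is attached
        previews.append((rank, score, node.get_text()[:max_chars]))
    return previews


# Invented nodes and scores, purely to exercise the helper.
fake_nodes = [FakeNode("first chunk of text", 0.83),
              FakeNode("second chunk", 0.79)]
previews = preview_source_nodes(fake_nodes, max_chars=5)
for rank, score, text in previews:
    print(rank, score, repr(text))
```

In the notebook, the equivalent call would be `preview_source_nodes(response_vector.source_nodes)`.
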
## Evaluate responses