# Evaluate RAG with LlamaIndex

I'm familiar with building RAG applications with LangChain, but this is my first time exploring LlamaIndex.

RAG evaluation is important, but exactly how to go about it hasn't been clear to me. Is there a consensus on which metric(s) to use? How can user satisfaction be quantified? This notebook is my attempt to improve my understanding, and it borrows heavily from this [tutorial](https://github.com/openai/openai-cookbook/blob/main/examples/evaluation/Evaluate_RAG_with_LlamaIndex.ipynb). Let's dive in!

## Libraries

In [2]:
# The nest_asyncio module enables the nesting of asynchronous functions within an already running async loop.
# This is necessary because Jupyter notebooks inherently operate in an asynchronous loop.
# By applying nest_asyncio, we can run additional async functions within this existing loop without conflicts.
import nest_asyncio

nest_asyncio.apply()

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.evaluation import generate_question_context_pairs
from llama_index.evaluation import RetrieverEvaluator
from llama_index.llms import OpenAI

import os
import pandas as pd
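
To convince myself why `nest_asyncio` is needed, here is a minimal sketch using only the standard library. It starts one event loop (standing in for the loop Jupyter already runs) and then tries to start a second one from inside it. The function names `fetch_answer` and `notebook_cell` are made up for illustration.

```python
import asyncio


async def fetch_answer():
    # Stand-in for any async call a library might make internally.
    await asyncio.sleep(0)
    return "answer"


async def notebook_cell():
    # We are already inside a running loop here, much like code in a Jupyter cell.
    coro = fetch_answer()
    try:
        asyncio.run(coro)  # attempt to start a nested loop
        return "nested run succeeded"
    except RuntimeError as exc:
        coro.close()  # tidy up the coroutine that never got to run
        return f"RuntimeError: {exc}"


outcome = asyncio.run(notebook_cell())
print(outcome)
```

Without patching, the nested call is rejected with a `RuntimeError`. As I understand it, `nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed.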

In [12]:
documents = SimpleDirectoryReader("./data/").load_data()

In [14]:
# Define an LLM
llm = OpenAI(model="gpt-4-1106-preview")

# Build index with a chunk_size of 512
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)
vector_index = VectorStoreIndex(nodes)
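
To get a feel for what a `chunk_size` of 512 does to a document, here is a rough sketch of fixed-size chunking with overlap. This is not how `SimpleNodeParser` is implemented: as far as I understand, the real parser counts tokens and tries to respect sentence boundaries, whereas this toy version counts whitespace-separated words. The helper `chunk_words` and the `overlap` value of 20 are my own assumptions.

```python
def chunk_words(text, chunk_size=512, overlap=20):
    """Split text into windows of at most chunk_size words.

    Consecutive windows share `overlap` words so that content near a
    boundary appears in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# A synthetic 1200-word "document" made of placeholder words.
sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(sample, chunk_size=512, overlap=20)
sizes = [len(c.split()) for c in chunks]
print(len(chunks), sizes)
```

Smaller chunks give the retriever more precise pieces to match against, while larger chunks keep more surrounding context together, so `chunk_size` is one of the knobs worth evaluating.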

In [15]:
query_engine = vector_index.as_query_engine()

In [16]:
response_vector = query_engine.query("What did the author do growing up?")

In [18]:
response_vector.response

'The author wrote short stories and tried programming on an IBM 1401 computer during their time in school.'
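
Under the hood, the query engine first retrieves the chunks most similar to the question and then asks the LLM to write an answer from them. The retrieval step can be sketched with a toy example. Everything here is an assumption for illustration: the bag-of-words `embed` function stands in for a real embedding model, and the three `toy_chunks` are invented sentences, not the contents of my `./data/` folder.

```python
import math
from collections import Counter


def embed(text):
    # Toy "embedding": word counts. A real index uses a learned embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    # Rank every chunk by similarity to the query and keep the best top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


toy_chunks = [
    "the author wrote short stories growing up",
    "the author tried programming on an IBM 1401",
    "bread recipes need flour water and yeast",
]
top = retrieve("what did the author do growing up", toy_chunks, top_k=2)
print(top)
```

If the retriever surfaces the wrong chunks, the LLM has little chance of answering well, which is why it makes sense to evaluate retrieval separately from the final response.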