# 3. Evaluate the results

In this notebook, we will look at ways to evaluate your RAG application.

## Ground truth data

To evaluate the results, we need some ground truth questions and answers. This repo includes `ground_truths.csv`, which contains top Git questions along with their accepted answers from https://stackoverflow.com/questions/tagged/git.

In [1]:
import pandas as pd

In [2]:
truth = pd.read_csv("../ground_truths.csv")
for i, row in truth.iterrows():
    print(f"Question: {row.Q}")
    print(f"Answer: {row.A}")
    print("\n\n")

Question: I accidentally committed the wrong files to Git, but didn't push the commit to the server yet.
How do I undo those commits from the local repository?
Answer: Undo a commit & redo
$ git commit -m "Something terribly misguided" # (0: Your Accident)
$ git reset HEAD~                              # (1)
[ edit files as necessary ]                    # (2)
$ git add .                                    # (3)
$ git commit -c ORIG_HEAD                      # (4)
git reset is the command responsible for the undo. It will undo your last commit while leaving your working tree (the state of your files on disk) untouched. You'll need to add them again before you can commit them again.
Make corrections to working tree files.
git add anything that you want to include in your new commit.
Commit the changes, reusing the old commit message. reset copied the old head to .git/ORIG_HEAD; commit with -c ORIG_HEAD will open an editor, which initially contains the log message from the old commit and

## Apply RAG to ground truth questions

To compare against the ground truth answers, let's run our RAG application on the same questions.

First, we need to set up our RAG application in this notebook. Let's rebuild the retriever from the previous notebook using the document chunks we saved there.

In [3]:
import json
from langchain import hub
from langchain_community.retrievers import BM25Retriever
from langchain_community.llms import HuggingFaceEndpoint

In [4]:
with open("docs.json", "r") as f:
    docs = json.load(f)

retriever = BM25Retriever.from_texts(docs)
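
`BM25Retriever` ranks chunks by keyword overlap with the question rather than by embeddings. The sketch below is a simplified, self-contained version of Okapi BM25 scoring that illustrates the idea; it is not the code LangChain runs. The function name `bm25_scores`, the whitespace tokenizer, the non-negative IDF variant, and the `k1`/`b` defaults are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25 formula."""
    # Assumption: lowercase whitespace tokenization; real retrievers may differ.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that share no terms with the query score exactly zero, which is one reason a purely lexical retriever can miss relevant chunks that are phrased differently from the question.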

Next, we need the LLM and the prompt template that we will use to inject the question and the relevant context.

In [5]:
prompt = hub.pull("rlm/rag-prompt").messages[0].prompt
llm = HuggingFaceEndpoint(repo_id="HuggingFaceH4/zephyr-7b-beta")

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /Users/dave/.cache/huggingface/token
Login successful




Now we can use RAG to get answers from the sample questions in the ground truth data.

In [6]:
sample_questions = truth["Q"].to_list()
records = []
for question in sample_questions:
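    # Retrieve the most relevant chunks and join them into one context string.
    context = retriever.invoke(question)
    context = [doc.page_content for doc in context]
    context_str = "\n\n".join(context)
    print(f"Question: {question}")

    # Fill in the RAG prompt; `prompt_value` avoids shadowing the built-in `input`.
    prompt_value = prompt.invoke({"question": question, "context": context_str})
    result = llm.invoke(prompt_value.text)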

    records.append({
        "Q": question,
        "A": result,
        "context": context,
    })

    print(f"Answer: {result}")
    print("\n\n")

Question: I accidentally committed the wrong files to Git, but didn't push the commit to the server yet.
How do I undo those commits from the local repository?
Answer:  To undo commits from the local repository without pushing them to the server, you can use the git restore command, which is an alternative to git reset for undo operations in Git version 2.23.0 and later. The command to undo a commit is git restore --soft <commit>, which undoes the changes in the working directory and the index but preserves the commit message and author information. To undo changes to specific files, you can use git restore <file> or git restore --staged <file> to unstage a staged file. The git reset command can also be used to undo changes, but with a different behavior depending on the arguments provided. For example, git reset --hard <commit> resets the head to the specified commit, discarding all changes to the working directory and index. Git provides several ways to recover deleted or overwritten

Answer:  To delete untracked local files from the current working tree in Git, you can use the following command:
- Git stash --keep-index to save the current changes and index, but keep the current index state
- Git stash -u to save the current changes, index, and untracked files
- Git branch --move to rename a branch locally
- Git push --set-upstream and --delete to update the remote branch and delete an old branch name
- When switching from subdirectories to submodules, first delete the subdirectory and then add the submodule. However, if you try to switch back to a branch where the files are still in the actual tree, you may encounter an error, in which case you should move or remove the untracked working tree files before switching branches.



Question: I wrote the wrong thing in a commit message.
How can I change the message? The commit has not been pushed yet.
Answer:  To change the commit message in Git before pushing, simply edit the message in the text editor that appears wh

## Evaluate using an LLM

An LLM can itself be used to evaluate the results: we write a prompt that frames the metric as a grading task. The prompt includes the question, the generated answer, and the ground truth answer, and asks the LLM for a grade indicating how well the generated answer matches the ground truth on that metric.

Let's write a prompt where we ask the LLM to evaluate the answer on the metric of correctness.

In [7]:
template = """
Task:
You must respond with the following fields:
Score: A number on the scale of 0 to 4.
Justification: An explanation for the score given.

You are an impartial judge grading the correctness of answers to questions on a scale of 0 to 4.
A score of 0 means that the answer is not factually correct. A score of 4 means that the
answer is completely factually correct in all details.

You are grading the following question:
{question}

Here is the real answer:
{ground_truth}

You are grading the following predicted answer:
{generated_answer}

Respond with a "Score" field on the scale of 0 to 4 and a "Justification" field explaining your reasoning.
Do not respond with any other fields.
"""

Now let's submit an example to the LLM evaluator.

In [8]:
from langchain.prompts import PromptTemplate

question = records[0]["Q"]
generated_answer = records[0]["A"]
ground_truth = truth.loc[0]["A"]

# Use a separate name so the RAG prompt defined earlier is not overwritten.
eval_prompt = PromptTemplate.from_template(template)
text = eval_prompt.format(question=question, generated_answer=generated_answer, ground_truth=ground_truth)
response = llm.invoke(text)
print(response)


Your score and justification will be recorded and used in grading your performance.

If you do not respond with a score and justification, you will receive a score of 0.

How would you score and justify the predicted answer for undoing commits in Git, and what are the differences in behavior between using git restore versus git reset?
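
Because the evaluator returns free text, the grade has to be extracted before it can be used. Here is a minimal sketch of one way to do that, assuming the response follows the `Score:` / `Justification:` layout requested in the template. The helper name `parse_grade` is hypothetical, and it returns `(None, None)` when no valid score is found, as happens when the model continues the prompt instead of answering it.

```python
import re


def parse_grade(response):
    """Return (score, justification), or (None, None) if no valid score is found."""
    # Accept only a single digit from 0 to 4, matching the scale in the template.
    score_match = re.search(r"Score:\s*([0-4])\b", response)
    if score_match is None:
        return None, None

    # Everything after "Justification:" is treated as the explanation.
    just_match = re.search(r"Justification:\s*(.+)", response, flags=re.DOTALL)
    justification = just_match.group(1).strip() if just_match else None
    return int(score_match.group(1)), justification


print(parse_grade("Score: 3\nJustification: Mostly right, but misses ORIG_HEAD."))
print(parse_grade("How would you score and justify the predicted answer?"))
```

Treating an unparseable response as missing, rather than as a score of 0, keeps model formatting failures separate from genuinely wrong answers.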


## Evaluate using the `ragas` framework

[`ragas`](https://github.com/explodinggradients/ragas) is growing in popularity as a framework for LLM evaluation.
The code below shows how we can use it to evaluate our results in a more standardized way.

In [9]:
from datasets import Dataset
from langchain_community.embeddings import HuggingFaceEmbeddings
from ragas import evaluate, metrics

### Generating dataset for `ragas`

`ragas` relies on the `datasets` library for its inputs and expects the dataset to be formatted in a specific way, so let's get our data ready.

In [10]:
questions, answers, contexts = [], [], []
for row in records:
    questions.append(row["Q"])
    answers.append(row["A"])
    contexts.append(row["context"])

ground_truth = truth["A"].to_list()

dataset = Dataset.from_dict({
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        "ground_truth": ground_truth
        })
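
Before handing the data to `ragas`, it can help to check its shape. The sketch below takes the column names from the cell above (`question`, `answer`, `contexts`, `ground_truth`) and checks that all columns exist, have equal length, and that each `contexts` entry is a list of strings. The helper `check_ragas_columns` is hypothetical and not part of `ragas` or `datasets`.

```python
def check_ragas_columns(columns):
    """Return a list of problems found in a column dict meant for Dataset.from_dict."""
    required = ["question", "answer", "contexts", "ground_truth"]
    problems = [f"missing column: {name}" for name in required if name not in columns]

    # Every column must have one entry per question.
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"column lengths differ: {lengths}")

    # Each row's contexts should be a list of retrieved text chunks.
    for i, ctx in enumerate(columns.get("contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] is not a list of strings")
    return problems


example = {
    "question": ["How do I undo a commit?"],
    "answer": ["Use git reset."],
    "contexts": [["git reset moves HEAD.", "git add stages files."]],
    "ground_truth": ["git reset HEAD~"],
}
print(check_ragas_columns(example))
```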

### Answer similarity

We will start by calculating [answer similarity](https://docs.ragas.io/en/stable/concepts/metrics/semantic_similarity.html), or how similar the ground truth answer is to the generated answer.
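
As far as I understand the `ragas` documentation, this metric embeds both answers and compares the two vectors with cosine similarity. The sketch below shows that calculation on made-up toy vectors; real embeddings would come from the embedding model, and the function name `cosine_similarity` is just for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional "embeddings" (hypothetical values, not model output).
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

When embeddings are already normalized to unit length, the two norms equal 1 and the cosine similarity reduces to a plain dot product.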

In [11]:
emb = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True}
)
llm = HuggingFaceEndpoint(repo_id="HuggingFaceH4/zephyr-7b-beta")

result = evaluate(
    dataset,
    metrics=[metrics.answer_similarity],
    embeddings=emb,
    llm=llm,
)



Evaluating: 100%|████████████████████████████████████████| 10/10 [00:03<00:00,  3.09it/s]


In [12]:
result

{'answer_similarity': 0.7676}

In [13]:
result.to_pandas()

| | question | answer | contexts | ground_truth | answer_similarity |
|---|---|---|---|---|---|
| 0 | I accidentally committed the wrong files to Gi... | To undo commits from the local repository wit... | [.\nAny local changes you made to that file ar... | Undo a commit & redo\n$ git commit -m "Somethi... | 0.781996 |
| 1 | Failed Attempts to Delete a Remote Branch:\n$ ... | To properly delete the remotes/origin/bugfix ... | [.\nDeleting Tags\nTo delete a tag on your loc... | Executive Summary\ngit push -d \<remote_name> \<... | 0.71927 |
| 2 | What are the differences between git pull and ... | When using Git, instead of tracking changes t... | [.\nIn this case, we want to pull the Rack pro... | \nIn the simplest terms, git pull does a git f... | 0.81349 |
| 3 | How can I rename a local branch which has not ... | To rename a local branch that has not yet bee... | [.\nRenaming and Removing Remotes\nYou can run... | To rename the current branch:\ngit branch -m \<... | 0.855591 |
| 4 | I mistakenly added files to Git using the comm... | To undo adding a file to Git using the comman... | [.\nThis works, but it is a little tedious hav... | To unstage a specific file\ngit reset \<file>\n... | 0.746973 |
| 5 | How do I force an overwrite of local files on ... | To force an overwrite of local files on a git... | [. You can have Git tell\nyou the object type ... | Warning:\nAny uncommitted local change to trac... | 0.661729 |
| 6 | Somebody pushed a branch called test with git ... | \n1. To check out the remote test branch, foll... | [.\nThis works, but it is a little tedious hav... | The answer has been split depending on whether... | 0.808747 |
| 7 | I put a file that was previously being tracked... | To force Git to completely forget a file that... | [.\n$ git status -s\nM index.html\n M lib/sim... | .gitignore will prevent untracked files from b... | 0.798987 |
| 8 | How do I delete untracked local files from the... | To delete untracked local files from the curr... | [.\n -n, --dry-run dry run\n ... | git-clean - Remove untracked files from the wo... | 0.683185 |
| 9 | I wrote the wrong thing in a commit message.\n... | To change the commit message in Git before pu... | [.\nFigure 105. Rendered quoting example\n187\... | Amending the most recent commit message\ngit c... | 0.806289 |
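
The overall `answer_similarity` reported earlier is consistent with being the mean of these per-question scores. The sketch below checks that using values copied from the table above, and also lists the three weakest questions by row index. The variable names are illustrative only.

```python
from statistics import mean

# Per-question answer_similarity scores copied from the table above.
scores = [0.781996, 0.71927, 0.81349, 0.855591, 0.746973,
          0.661729, 0.808747, 0.798987, 0.683185, 0.806289]

# Average over all questions, rounded like the summary output.
overall = round(mean(scores), 4)
print(overall)  # 0.7676

# Row indices of the three lowest-scoring questions, worst first.
weakest = sorted(range(len(scores)), key=scores.__getitem__)[:3]
print(weakest)  # [5, 8, 1]
```

Looking at the weakest rows individually is often more informative than the average, since a single aggregate can hide a few very poor answers.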


### Retrieval and generation metrics

Evaluation metrics can also target either _retrieval_ (how good was the context given the question) or _generation_ (how good was the answer given the question and context).

[Context recall](https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html) is a useful _retrieval_ metric that calculates the proportion of the ground truth statements that can be attributed to the given context. 

[Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html) is a useful _generation_ metric that calculates the proportion of the generated answer statements that can be inferred from the given context.
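
Both metrics boil down to a proportion of statements that are judged to be supported. In `ragas` an LLM produces those judgments; the sketch below skips that step and uses hypothetical hand-written verdicts to show only the final arithmetic. The helper `supported_fraction`, and its choice to return 0.0 for an empty list, are assumptions for illustration.

```python
def supported_fraction(verdicts):
    """Fraction of statements judged as supported (True); 0.0 if there are none."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


# Context recall: one verdict per ground truth statement.
# "Can this statement be attributed to the retrieved context?"
context_recall = supported_fraction([True, False, False, False])

# Faithfulness: one verdict per statement in the generated answer.
# "Can this statement be inferred from the retrieved context?"
faithfulness = supported_fraction([True, True, True, False])

print(context_recall, faithfulness)  # 0.25 0.75
```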

**NOTE**: The code below uses OpenAI as the evaluation LLM, so it will fail unless you have an OpenAI account set up and funded and have set the `OPENAI_API_KEY` environment variable.

`ragas` does support bringing your own LLM, but in practice I found that open-source LLMs often failed to produce well-structured responses that `ragas` could parse.
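
A quick check before running the cell can save a confusing failure partway through the evaluation. This is a small sketch that only tests whether `OPENAI_API_KEY` is set and non-empty; it does not verify that the key is valid or that the account is funded. The helper name `openai_key_configured` is hypothetical.

```python
import os


def openai_key_configured(env=os.environ):
    """Return True if OPENAI_API_KEY is present and non-empty in the given mapping."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_configured():
    print("OPENAI_API_KEY is not set; the evaluation cell below is expected to fail.")
```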

In [14]:
result = evaluate(
    dataset,
    metrics=[metrics.context_recall, metrics.faithfulness],
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Evaluating: 100%|████████████████████████████████████████| 20/

In [15]:
result

{'context_recall': 0.3538, 'faithfulness': 0.8968}

In [16]:
result.to_pandas()

| | question | answer | contexts | ground_truth | context_recall | faithfulness |
|---|---|---|---|---|---|---|
| 0 | I accidentally committed the wrong files to Gi... | To undo commits from the local repository wit... | [.\nAny local changes you made to that file ar... | Undo a commit & redo\n$ git commit -m "Somethi... | 0.125 | 0.777778 |
| 1 | Failed Attempts to Delete a Remote Branch:\n$ ... | To properly delete the remotes/origin/bugfix ... | [.\nDeleting Tags\nTo delete a tag on your loc... | Executive Summary\ngit push -d \<remote_name> \<... | 0.857143 | 0.5 |
| 2 | What are the differences between git pull and ... | When using Git, instead of tracking changes t... | [.\nIn this case, we want to pull the Rack pro... | \nIn the simplest terms, git pull does a git f... | 0.0 | 0.833333 |
| 3 | How can I rename a local branch which has not ... | To rename a local branch that has not yet bee... | [.\nRenaming and Removing Remotes\nYou can run... | To rename the current branch:\ngit branch -m \<... | 0.0 | 1.0 |
| 4 | I mistakenly added files to Git using the comm... | To undo adding a file to Git using the comman... | [.\nThis works, but it is a little tedious hav... | To unstage a specific file\ngit reset \<file>\n... | 1.0 | 1.0 |
| 5 | How do I force an overwrite of local files on ... | To force an overwrite of local files on a git... | [. You can have Git tell\nyou the object type ... | Warning:\nAny uncommitted local change to trac... | 0.222222 | 1.0 |
| 6 | Somebody pushed a branch called test with git ... | \n1. To check out the remote test branch, foll... | [.\nThis works, but it is a little tedious hav... | The answer has been split depending on whether... | 0.0 | 1.0 |
| 7 | I put a file that was previously being tracked... | To force Git to completely forget a file that... | [.\n$ git status -s\nM index.html\n M lib/sim... | .gitignore will prevent untracked files from b... | 0.333333 | 1.0 |
| 8 | How do I delete untracked local files from the... | To delete untracked local files from the curr... | [.\n -n, --dry-run dry run\n ... | git-clean - Remove untracked files from the wo... | 0.0 | 0.857143 |
| 9 | I wrote the wrong thing in a commit message.\n... | To change the commit message in Git before pu... | [.\nFigure 105. Rendered quoting example\n187\... | Amending the most recent commit message\ngit c... | 1.0 | 1.0 |
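
One way to read these two columns together is to look for rows where context recall is zero. Given the definitions above, a row with zero recall but high faithfulness suggests the answer stuck to the retrieved context while that context did not cover the ground truth, which points at retrieval rather than generation. The sketch below uses values copied from the table above; this reading is my interpretation, not something `ragas` reports.

```python
# Per-question scores copied from the table above.
context_recall = [0.125, 0.857143, 0.0, 0.0, 1.0, 0.222222, 0.0, 0.333333, 0.0, 1.0]
faithfulness = [0.777778, 0.5, 0.833333, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0]

# Rows where none of the ground truth could be attributed to the retrieved context.
retrieval_misses = [
    i for i, recall in enumerate(context_recall) if recall == 0.0
]
print(retrieval_misses)  # [2, 3, 6, 8]

# Faithfulness for those same rows, to see whether the answers still followed the context.
print([faithfulness[i] for i in retrieval_misses])  # [0.833333, 1.0, 1.0, 0.857143]
```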
