<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPHeader.png" alt="New MEAP" style="width: 100%;" />
</a>


# Chapter 05 - RAG Evaluation: Accuracy, Relevance, Faithfulness

### Welcome to Chapter 5 of A Simple Guide to Retrieval Augmented Generation.

In this chapter, we will assess the quality of the RAG pipeline we built in Chapters 3 and 4. We will reuse the [knowledge base](../../Assets/Data/) we created from the Wikipedia article, along with the retrieval, augmentation, and generation functions we built in Chapter 4.

## Installing Dependencies

All the libraries needed to run this notebook, along with their versions, are listed in the __requirements.txt__ file in the root directory of this repository.

Go to the root directory and run the following command to install them:

```
pip install -r requirements.txt
```

This is the recommended method of installing the dependencies.

___
Alternatively, you can run the command from this notebook. The relative path may vary depending on where the notebook is located.

In [1]:
%pip install -r ../../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


## 1. Re-Load the RAG Pipeline

In Chapter 4, we created the generation pipeline. We will bring it here and use it for evaluation.

In Chapter 3, we indexed the Wikipedia page for the 2023 Cricket World Cup. Recall that we used embeddings from OpenAI to encode the text and FAISS as the vector index to store the embeddings. We also saved the FAISS index to a local directory. We will use that index in the RAG pipeline.

Note: You will need an __OpenAI API Key__, which can be obtained from [OpenAI](https://platform.openai.com/api-keys), to reuse the embeddings.

To initialize the __OpenAI client__, we need to pass the API key. There are several ways of doing this.

#### [Option 1] Creating a .env file for storing the API key and using it (Recommended)

Install the __dotenv__ library

_The dotenv library is a popular tool used in various programming languages, including Python and Node.js, to manage environment variables in development and deployment environments. It allows developers to load environment variables from a .env file into their application's environment._

- Create a file named .env in the root directory of your project.
- Inside the .env file, define environment variables in the format VARIABLE_NAME=value.

e.g.

OPENAI_API_KEY=YOUR API KEY

In [2]:
from dotenv import load_dotenv
import os

if load_dotenv():
    print("Success: .env file found with some environment variables")
else:
    print("Caution: No environment variables found. Please create .env file in the root directory or add environment variables in the .env file")

Success: .env file found with some environment variables


#### [Option 2] Alternatively, you can set the API key in code. 
However, this is not recommended since it can leave your key exposed for potential misuse. Uncomment the cell below to use this method.

In [3]:
#import os
# os.environ["OPENAI_API_KEY"] = "sk-proj-******" #Imp : Replace with an OpenAI API Key

We can also test whether the key is valid.

In [5]:
import openai
from openai import OpenAI

# .get() returns None for a missing key, so the else branch below is reachable
api_key = os.environ.get("OPENAI_API_KEY")

if api_key:
    # Create the client only once a key is present; OpenAI() raises without one
    client = OpenAI()
    try:
        client.models.list()
        print("OPENAI_API_KEY is set and is valid")
    # Catch the specific subclasses first; openai.APIError is their base class
    except openai.RateLimitError as e:
        print(f"OpenAI API request exceeded rate limit: {e}")
    except openai.APIConnectionError as e:
        print(f"Failed to connect to OpenAI API: {e}")
    except openai.APIError as e:
        print(f"OpenAI API returned an API Error: {e}")

else:
    print("Please set your OpenAI API key as the environment variable OPENAI_API_KEY")



OPENAI_API_KEY is set and is valid


The RAG pipeline takes two inputs:
1. User query
2. Location of the vector index (knowledge base)

It returns the retrieved context along with the generated answer.

#### Retrieval Function

In [6]:
# Import FAISS class from vectorstore library
from langchain_community.vectorstores import FAISS

# Import OpenAIEmbeddings from the library
from langchain_openai import OpenAIEmbeddings

def retrieve_context(query, db_path):
    embeddings=OpenAIEmbeddings(model="text-embedding-3-large")

    # Load the database stored in the local directory
    db=FAISS.load_local(db_path, embeddings, allow_dangerous_deserialization=True)

    # Ranking the chunks in descending order of similarity
    docs = db.similarity_search(query)
    # Selecting first chunk as the retrieved information
    retrieved_context=docs[0].page_content

    return str(retrieved_context)
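
To make the ranking step concrete, here is a minimal sketch of what a similarity search does conceptually: score every stored chunk against the query vector and return the best matches. The three-dimensional toy vectors, the chunk texts, and the `cosine_similarity` and `top_k` helpers are hypothetical stand-ins. The actual pipeline uses OpenAI embeddings and a FAISS index, which may rank by a different distance metric than the cosine similarity assumed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=1):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical (text, embedding) pairs standing in for the stored chunks
toy_index = [
    ("Australia won the final by six wickets.", [0.9, 0.1, 0.0]),
    ("The tournament was hosted by India.", [0.2, 0.8, 0.1]),
    ("Wikipedia is a registered trademark.", [0.0, 0.1, 0.9]),
]

# Hypothetical embedding of a question about the winner
query_vec = [0.8, 0.2, 0.1]

best = top_k(query_vec, toy_index, k=1)
print(best)
```

Taking `k=1` mirrors `retrieve_context`, which keeps only `docs[0]`. Passing a larger `k` would hand more chunks to the augmentation step.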

#### Augmentation Function

In [7]:
def create_augmeted(query, db_path):

    retrieved_context=retrieve_context(query,db_path)

    # Creating the prompt
    augmented_prompt=f"""

    Given the context below answer the question.

    Question: {query} 

    Context : {retrieved_context}

    Remember to answer only based on the context provided and not from any other source. 

    If the question cannot be answered based on the provided context, say I don’t know.

    """

    return retrieved_context, str(augmented_prompt)

#### RAG function

In [8]:
# Importing the OpenAI library
from openai import OpenAI

def create_rag(query, db_path):

    retrieved_context, augmented_prompt=create_augmeted(query,db_path)



    # Instantiate the OpenAI client
    client = OpenAI()

    # Make the API call passing the augmented prompt to the LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": augmented_prompt}
        ]
    )

    # Extract the answer from the response object
    answer=response.choices[0].message.content

    return retrieved_context, answer


Let's try sending our question to this function.

In [9]:
create_rag("Who won the 2023 Cricket World Cup?","../../Assets/Data" )

("2023 ICC Men's Cricket World CupDates5 October – 19 November 2023Administrator(s)International Cricket CouncilCricket formatOne Day International (ODI)Tournament format(s)Round-robin and knockoutHost(s)\xa0IndiaChampions\xa0Australia (6th title)Runners-up\xa0IndiaParticipants10Matches48Attendance1,250,307 (26,048 per match)Player of the series Virat KohliMost runs Virat Kohli (765)Most wickets Mohammed Shami (24)Official websitecricketworldcup.com Highlighted are the countries that participated in the 2023 Cricket World Cup. Means of qualification Date Venue Berths Qualified Host nation — — 1 \xa0India ICC Super League 30 July 2020 – 14 May 2023 Various 7 Qualifier 18 June 2023 – 9 July 2023 Zimbabwe 2 Total 10 Location Stadium Capacity[19] No. of matches Ahmedabad Narendra Modi Stadium 132,000 5 Bangalore M. Chinnaswamy Stadium 33,800 5 Chennai M. A. Chidambaram Stadium 38,200 5 Delhi Arun Jaitley Stadium 35,200 5 Dharamshala HPCA Stadium 21,200 5 Hyderabad Rajiv Gandhi Internationa

Let's ask another one.

In [10]:
create_rag("What was Virat Kohli's achievement in the Cup?","../../Assets/Data" )

('The tournament was contested by ten national teams, maintaining the same format used in 2019. After six weeks of round-robin matches, India, South Africa, Australia, and New Zealand finished as the top four and qualified for the knockout stage. In the knockout stage, India and Australia beat New Zealand and South Africa, respectively, to advance to the final, played on 19 November at the Narendra Modi Stadium in Ahmedabad. Australia won the final by six wickets, winning their sixth Cricket World Cup title.  \nVirat Kohli was named the player of the tournament and also scored the most runs, while Mohammed Shami was the leading wicket-taker. A total of 1,250,307 spectators attended the matches, the highest number in any Cricket World Cup to date.[1] The tournament final set viewership records in India, drawing 518 million viewers, with a peak of 57 million streaming viewers.  \nBackground',
 'Virat Kohli was named the player of the tournament and also scored the most runs.')

We can also try asking a question that is outside the scope of our knowledge base.

In [11]:
create_rag("What is RAG?","../../Assets/Data" )

('This page was last edited on 1 July 2024, at 19:04\xa0(UTC). Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.  \nPrivacy policy About Wikipedia Disclaimers Contact Wikipedia Code of Conduct Developers Statistics Cookie statement Mobile view',
 'I don’t know.')

For some questions, the response may be "I don't know". That happens when the LLM can't find an answer in the retrieved context, which is exactly what we asked it to do in the augmentation prompt. But how good is this system overall? We need a way to evaluate it.
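One quick signal worth tracking before any formal evaluation is how often the pipeline declines to answer. The sketch below is a rough illustration only: `is_abstention` and `abstention_rate` are hypothetical helpers, not part of the notebook, and the match assumes the model repeats the prompt's "I don't know" wording more or less verbatim.

```python
def is_abstention(answer):
    """True if the answer is just the 'I don't know' fallback from the prompt."""
    # Normalize the curly apostrophe, case, surrounding spaces and final period
    normalized = answer.replace("\u2019", "'").strip().lower().rstrip(".")
    return normalized == "i don't know"

def abstention_rate(answers):
    """Fraction of answers that are abstentions (0.0 for an empty list)."""
    if not answers:
        return 0.0
    return sum(is_abstention(a) for a in answers) / len(answers)

# Two answers taken from the cells above
sample_answers = [
    "Virat Kohli was named the player of the tournament and also scored the most runs.",
    "I don\u2019t know.",
]

print(abstention_rate(sample_answers))
```

A high rate on questions the knowledge base should cover usually points at retrieval rather than generation, since the LLM only abstains when the retrieved chunk lacks the answer.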

## 2. RAGAs Framework

[Ragas](https://docs.ragas.io/en/stable/) is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. It has been developed by the good folks at [exploding gradients](https://github.com/explodinggradients).

We will look at this evaluation in two parts.

1. Creation of synthetic test data for evaluation.
2. Calculation of evaluation metrics.

### 2.1 Creation of Synthetic Data

Synthetic data generation uses LLMs to generate diverse questions and answers from the documents in the knowledge base. Using those documents as context, an LLM can be prompted to create different kinds of questions, such as simple, multi-context, conditional, and reasoning questions.

<img src="../../Assets/Images/5.1.png">

In [27]:
from langchain_community.document_loaders import AsyncHtmlLoader

#This is the url of the wikipedia page on the 2023 Cricket World Cup
url="https://en.wikipedia.org/wiki/2023_Cricket_World_Cup"

#Instantiating the AsyncHtmlLoader
loader = AsyncHtmlLoader (url)

#Loading the extracted information
data = loader.load()

from langchain_community.document_transformers import Html2TextTransformer

#Instantiate the Html2TextTransformer function
html2text = Html2TextTransformer()


#Call transform_documents
data_transformed = html2text.transform_documents(data)

Fetching pages: 100%|##########| 1/1 [00:00<00:00,  4.40it/s]


In [28]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [29]:
generator_llm = ChatOpenAI(model="gpt-4o-mini")
critic_llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

In [30]:
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

In [None]:
testset = generator.generate_with_langchain_docs(data_transformed, test_size=20, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
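
As a quick sanity check on the `distributions` argument, the arithmetic below shows the split it requests for `test_size=20`. This is a sketch only: `questions_per_type` is a hypothetical helper, the string keys stand in for the `simple`, `reasoning` and `multi_context` evolution objects, and the generator is not guaranteed to return exactly these counts.

```python
def questions_per_type(test_size, distribution):
    """Translate fractional shares into a target question count per type."""
    total_share = sum(distribution.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"Shares must sum to 1.0, got {total_share}")
    return {name: round(test_size * share) for name, share in distribution.items()}

# Same shares as in the call above, keyed by name for illustration
split = questions_per_type(
    20, {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
)
print(split)  # {'simple': 10, 'reasoning': 5, 'multi_context': 5}
```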


In [33]:
testset.to_pandas()

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What significance does India being the sole ho...,[ournament format(s)| Round-robin and knockout...,The significance of India being the sole host ...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
1,What were the key highlights of the 2023 ICC M...,[ournament format(s)| Round-robin and knockout...,Key highlights of the 2023 ICC Men's Cricket W...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
2,What teams qualified for the 2023 Cricket Worl...,"[ stage did not qualify, with only Sri Lanka\n...",The teams that qualified for the 2023 Cricket ...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
3,What matches were held at the Assam Cricket As...,[ Hasan 74 (89) \nReece Topley 3/23 (5 overs)...,The matches held at the Assam Cricket Associat...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
4,What were the key outcomes of the 2023 Cricket...,[4 (50 overs)** | | \n| 4 | New Zealand | 3...,The key outcomes of the 2023 Cricket World Cup...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
5,What is the title of the official theme song f...,[ Opening batsman / wicket-keeper \nRohit Sh...,The official theme song of the 2023 Cricket Wo...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
6,What was the outcome of the match between Afgh...,"[\nEden Gardens, Kolkata \n--- \n \n6 Novem...",Australia won by 3 wickets against Afghanistan...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
7,What competitions are currently ongoing in the...,[ Hong Kong in Qatar\n * Ireland v Afghanista...,The ongoing competitions in the Cricket World ...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
8,What was the significance of Wankhede Stadium ...,[4 (50 overs)** | | \n| 4 | New Zealand | 3...,The significance of Wankhede Stadium in the 20...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True
9,What was the outcome of the final match betwee...,[4 (50 overs)** | | \n| 4 | New Zealand | 3...,Australia won the final match against India in...,simple,[{'source': 'https://en.wikipedia.org/wiki/202...,True


### 2.2 Using the RAG pipeline

Now, we will use the RAG pipeline we created to generate responses to the questions in the synthetic dataset.

In [35]:
db_path='../../Assets/Data'

In [36]:
questions_list=testset.to_pandas().question.to_list()
gt_list=testset.to_pandas().ground_truth.to_list()

answer_list=[]
context_list=[]

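for record in testset.test_data:
    # create_rag returns the retrieved chunk and the generated answer
    rag_context, rag_answer=create_rag(record.question,db_path)
    answer_list.append(rag_answer)
    # Each sample's contexts must be a list of strings, even with a single chunk
    context_list.append([rag_context])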

data_samples={
    'question':questions_list,
    'answer':answer_list,
    'contexts': context_list,
    'ground_truth':gt_list
}
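
Before building the dataset, it can help to confirm that the four columns line up. The check below is a minimal sketch: `validate_samples` is a hypothetical helper, and the toy dictionary only imitates the shape of `data_samples` (equal-length lists, with each `contexts` entry itself a list of strings).

```python
def validate_samples(samples):
    """Check column names, equal lengths and the list-of-lists contexts shape."""
    required = ["question", "answer", "contexts", "ground_truth"]
    missing = [key for key in required if key not in samples]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    lengths = {key: len(samples[key]) for key in required}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Column lengths differ: {lengths}")

    if not all(isinstance(ctx, list) for ctx in samples["contexts"]):
        raise ValueError("Each entry in 'contexts' must be a list of strings")

    # Number of samples that passed the checks
    return lengths["question"]

# Toy data in the same shape as data_samples
toy_samples = {
    "question": ["Who won the final?", "What is RAG?"],
    "answer": ["Australia won the final.", "I don't know."],
    "contexts": [["Australia won the final by six wickets."], ["Privacy policy"]],
    "ground_truth": ["Australia", "Not covered by the article"],
}

print(validate_samples(toy_samples))  # 2
```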

In [37]:
from datasets import Dataset
dataset = Dataset.from_dict(data_samples)

In [38]:
dataset[2]

{'question': 'What teams qualified for the 2023 Cricket World Cup through the qualification process?',
 'answer': 'The teams that qualified for the 2023 Cricket World Cup through the qualification process are the Netherlands and Sri Lanka.',
 'contexts': ['Qualified via the 2020–2023 Super League  \n\xa0\xa0Qualified via the 2023 Qualifier  \n\xa0\xa0Participated in the qualifier but failed to qualify  \nMain article: 2023 Cricket World Cup qualification  \nOther than India, who qualified as hosts, all teams had to qualify for the tournament through the 2023 Cricket World Cup qualification process. Afghanistan, Australia, Bangladesh, England, New Zealand, Pakistan and South Africa qualified via the ICC Cricket World Cup Super League, with the Netherlands and Sri Lanka securing the final two places via the 2023 Cricket World Cup Qualifier in Zimbabwe during June and July 2023.'],
 'ground_truth': 'The teams that qualified for the 2023 Cricket World Cup through the qualification process 

In [39]:
dataset[7]

{'question': 'What competitions are currently ongoing in the Cricket World Cup League 2?',
 'answer': 'I don’t know.',
 'contexts': ["Afghanistan in Sri Lanka South Africa in New Zealand Kuwait women in Malaysia Canada in Nepal ACC Women's Premier Cup Thailand Quadrangular Series East Asia Cup Nepal Tri-Nation Series (round 1) Australia in New Zealand CWC Challenge League Play-off Nigeria Invitational Nepal T20I Tri-Nation Series Hong Kong in Qatar Ireland v Afghanistan in the UAE UAE Tri-Nation Series (round 2)  \nSri Lanka in Bangladesh Malaysia Open Championship PNG in Oman African Games Hong Kong Tri-Nation Series Scotland in the UAE PNG in Malaysia England women in New Zealand Australia women in Bangladesh PNG women in Zimbabwe Sri Lanka women in South Africa Lesotho in Eswatini  \nM W  \nWorld Test Championship Women's Championship Cricket World Cup League 2 Associate T20I cricket  \nFollowing Season: International cricket in 2024  \nSummer sports &indoor sportsWinter sportsCue &

### 2.3 Calculating Evaluation Metrics

We will use Ragas metrics, which score the generated answers and the retrieved contexts against the question and the ground truth.
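
Several of these metrics reduce to a simple ratio once an LLM has judged the individual statements. Faithfulness, for example, is the share of claims in the answer that the retrieved context supports. The sketch below shows only that final ratio: `supported_fraction` is a hypothetical helper, the verdicts are hand-labeled for illustration, and returning 0.0 for an empty list is a choice made here rather than documented Ragas behavior.

```python
def supported_fraction(verdicts):
    """Share of statements judged as supported (True) out of all statements."""
    if not verdicts:
        return 0.0  # assumption of this sketch for the empty case
    return sum(verdicts) / len(verdicts)

# Hypothetical verdicts for three claims extracted from one answer:
# two are backed by the retrieved context, one is not
claim_verdicts = [True, True, False]

faithfulness_example = supported_fraction(claim_verdicts)
print(round(faithfulness_example, 3))  # 0.667
```

Context recall follows the same pattern, except that the statements come from the ground truth and each one is checked for support in the retrieved context.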

In [40]:
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_entity_recall,
    answer_similarity,
    answer_correctness
)
from ragas.metrics.critique import (
    harmfulness, 
    maliciousness, 
    coherence, 
    correctness, 
    conciseness
)


In [None]:
result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_entity_recall,
        answer_similarity,
        answer_correctness,
        
        harmfulness, 
        maliciousness, 
        coherence, 
        correctness, 
        conciseness

    ],
)


In [42]:
import json

print(json.dumps(result, indent=4))

{
    "context_precision": 0.7894736841315789,
    "faithfulness": 0.45375939849624064,
    "answer_relevancy": 0.4585481970073529,
    "context_recall": 0.6447368421052632,
    "context_entity_recall": 0.5090755722231386,
    "answer_similarity": 0.855347300001274,
    "answer_correctness": 0.5711883747536411,
    "harmfulness": 0.0,
    "maliciousness": 0.05263157894736842,
    "coherence": 0.5263157894736842,
    "correctness": 0.5263157894736842,
    "conciseness": 0.47368421052631576
}


___
Looking at the results above, the pipeline scores well on __context_precision__ (0.79) and __answer_similarity__ (0.86), while metrics such as __faithfulness__ (0.45) and __answer_relevancy__ (0.46) are low. How can we improve these metrics? We will look at advanced pre-retrieval, retrieval, and post-retrieval strategies in the next chapter.
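
One way to read a result like this is to rank the metrics and look at the weakest first. The sketch below copies the scores printed above into a plain dictionary. The grouping of `harmfulness` and `maliciousness` as lower-is-better is an assumption made here because both flag undesirable output, and the cut-off at three metrics is arbitrary.

```python
# Scores copied from the evaluation output above
scores = {
    "context_precision": 0.7894736841315789,
    "faithfulness": 0.45375939849624064,
    "answer_relevancy": 0.4585481970073529,
    "context_recall": 0.6447368421052632,
    "context_entity_recall": 0.5090755722231386,
    "answer_similarity": 0.855347300001274,
    "answer_correctness": 0.5711883747536411,
    "harmfulness": 0.0,
    "maliciousness": 0.05263157894736842,
    "coherence": 0.5263157894736842,
    "correctness": 0.5263157894736842,
    "conciseness": 0.47368421052631576,
}

# Assumption: for these two, a low score is the desired outcome
LOWER_IS_BETTER = {"harmfulness", "maliciousness"}

# Rank only the metrics where a higher score is better, weakest first
ranked = sorted(
    ((name, value) for name, value in scores.items() if name not in LOWER_IS_BETTER),
    key=lambda item: item[1],
)

weakest = [name for name, _ in ranked[:3]]
print(weakest)  # ['faithfulness', 'answer_relevancy', 'conciseness']
```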

---

<img src="../../Assets/Images/profile_s.png" width=100> 

Hi! I'm Abhinav! I am an entrepreneur and Vice President of Artificial Intelligence at Yarnit. I have spent over 15 years in consulting and leadership roles in data science, machine learning and AI. My current focus is on applied Generative AI, solving enterprise needs through contextual intelligence. I'm passionate about AI advancements and constantly explore emerging technologies to push the boundaries and create a positive impact in the world. Let’s build the future, together!

[If you haven't already, please subscribe to the MEAP of A Simple Guide to Retrieval Augmented Generation here](https://mng.bz/8wdg)

<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPFooter.png" alt="New MEAP" style="width: 100%;" />
</a>

#### If you'd like to chat, I'd be very happy to connect

[![GitHub followers](https://img.shields.io/badge/Github-000000?style=for-the-badge&logo=github&logoColor=black&color=orange)](https://github.com/abhinav-kimothi)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-000000?style=for-the-badge&logo=linkedin&logoColor=orange&color=black)](https://www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=abhinav-kimothi)
[![Medium](https://img.shields.io/badge/Medium-000000?style=for-the-badge&logo=medium&logoColor=black&color=orange)](https://medium.com/@abhinavkimothi)
[![Insta](https://img.shields.io/badge/Instagram-000000?style=for-the-badge&logo=instagram&logoColor=orange&color=black)](https://www.instagram.com/akaiworks/)
[![Mail](https://img.shields.io/badge/email-000000?style=for-the-badge&logo=gmail&logoColor=black&color=orange)](mailto:abhinav.kimothi.ds@gmail.com)
[![X](https://img.shields.io/badge/Follow-000000?style=for-the-badge&logo=X&logoColor=orange&color=black)](https://twitter.com/abhinav_kimothi)
[![Linktree](https://img.shields.io/badge/Linktree-000000?style=for-the-badge&logo=linktree&logoColor=black&color=orange)](https://linktr.ee/abhinavkimothi)
[![Gumroad](https://img.shields.io/badge/Gumroad-000000?style=for-the-badge&logo=gumroad&logoColor=orange&color=black)](https://abhinavkimothi.gumroad.com/)

---