*Author: [Daniel Puente Viejo](https://www.linkedin.com/in/danielpuenteviejo/)*

## **Giskard & Ragas for automatic RAG evaluation**

<img src="imgs/cover_page.png" height="350"> 

A practical example of how to automatically generate a test set and use it to evaluate a RAG pipeline.

### **Index:**

- <a href='#1'><ins>1. 🔧 Setup</ins></a>
- <a href='#2'><ins>2. 📦 Chunking & vector DB creation</ins></a>
- <a href='#3'><ins>3. ⚙️ Create dataset</ins></a>
- <a href='#4'><ins>4. 🔄 Retrieve examples & Evaluate</ins></a>
- <a href='#5'><ins>5. 🎯 Answer questions & Evaluate</ins></a>

### <a id='1' style="color: skyblue;">**1. 🔧 Setup**</a>

Create the environment and install the dependencies, then import the libraries and define variables such as the index name and the model names.

```bash
conda create -n <name> python=3.10.15
conda activate <name>
pip install -r requirements.txt
```

In [1]:
import warnings
warnings.filterwarnings("ignore")

from dotenv import load_dotenv
load_dotenv()

import pandas as pd
import os

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import DataFrameLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

index_name = 'sample_index'
embedding_model_name = 'embedding-001'
llm_model_name = 'gemini-1.5-flash'
use_langsmith = True

It is important to configure the environment variables, the embedding model, and the LLM that will be used. Optionally, LangSmith can be enabled to trace runs and track metrics.

Configure the environment variables with a `.env` file.

In [2]:
# The same Gemini key is exposed under both names: GEMINI_API_KEY and
# GOOGLE_API_KEY (the name langchain_google_genai looks for).
os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]

if use_langsmith:
    # LANGSMITH_API_KEY and LANGSMITH_PROJECT are read from the .env file.
    os.environ["LANGSMITH_TRACING"] = "true"
    from langsmith import utils
    print("LangSmith tracing enabled:", utils.tracing_is_enabled())

embeddings = GoogleGenerativeAIEmbeddings(model=f"models/{embedding_model_name}")
llm = ChatGoogleGenerativeAI(model=llm_model_name,
                             temperature=0.0, 
                             max_output_tokens=512)
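
If a variable is missing from the `.env` file, the failure tends to surface as a low-level error rather than a clear message. A minimal sketch of an explicit check could look like this; `require_env` is a hypothetical helper, not part of the notebook's `/src` code, and its `env` argument exists only so the example runs without a real `.env` file.

```python
import os


def require_env(names, env=None):
    """Return the required variables, or raise an error listing every missing one."""
    # Hypothetical helper: defaults to the real process environment.
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the real environment.
values = require_env(["GEMINI_API_KEY"], env={"GEMINI_API_KEY": "dummy-key"})
```

In the notebook this would be called once after `load_dotenv()`, for example with `["GEMINI_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_PROJECT"]` when `use_langsmith` is enabled.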

We load the clients. These are **scripts from the `/src` folder** that group common functionality so that we can **move quickly in the notebooks**:

- **Chunking** - Facilitates generating chunks given a set of paths.
- **RAGDataset** - Automatically generates the dataset for us.
- **Retrieval** - Given a set of questions, returns the retrieved results.
- **Answering** - Given a set of questions, returns the answers.
- **Evaluation** - Allows us to evaluate both the retrieval and the answers.

In [3]:
from src.chunk_generation import Chunking
from src.dataset_creation import RAGDataset
from src.retrieval import Retrieval
from src.answer import Answering
from src.evaluation import Evaluation

chunking_client = Chunking()
datasetcreation_client = RAGDataset(llm_model = f"gemini/{llm_model_name}", 
                                    embedding_model= f"gemini/{embedding_model_name}")
retrieval_client = Retrieval(index_name = index_name, 
                             embeddings = embeddings)
answering_client = Answering()
evaluation_client = Evaluation(llm, embeddings)

### <a id='2' style="color: skyblue;">**2. 📦 Chunking & vector DB creation**</a>

In [4]:
data_folder = "data/source"
paths = [f"{data_folder}/{f}" for f in os.listdir(data_folder) if f.endswith(".pdf")]
print(paths)

['data/source/Breaking Bad.pdf', 'data/source/Game of Thrones.pdf', 'data/source/La casa de papel.pdf', 'data/source/Suits.pdf']


If the chunks file does not exist yet, we generate it; otherwise we load it from disk.

In [None]:
chunks_path = "data/chunks"
chunks_filename = "chunks.csv"
chunks_complete_path = f"{chunks_path}/{chunks_filename}"

if not os.path.exists(chunks_complete_path):
    print("Creating chunks...")
    df_chunks = chunking_client.preprocess_chunking(paths=paths, chunk_size=500, chunk_overlap=50)
    os.makedirs(chunks_path, exist_ok=True)  # make sure the output folder exists
    df_chunks.to_csv(chunks_complete_path, index=False)

else:
    print("Loading chunks...")
    df_chunks = pd.read_csv(chunks_complete_path)
    
df_chunks.head()

|  | chunk_id | content | filename | page |
|---|---|---|---|---|
| 0 | 956eeabc-6f25-4b71-a996-57f39d026f64 | Synopsis \nWhat would you do if you found out... | Breaking Bad.pdf | 1 |
| 1 | d04ad72a-fe75-41ea-830f-3e1b657661f5 | . With \nhis business at risk, Walter hires la... | Breaking Bad.pdf | 1 |
| 2 | b624dcdc-841d-469e-b37c-d2fe5949eddf | divorce. \nGus is determined to bring Walter ... | Breaking Bad.pdf | 2 |
| 3 | 995540ad-0423-45fc-87b8-bf0f97e14369 | \nThe choice of the title Breaking Bad  offe... | Breaking Bad.pdf | 2 |
| 4 | 0e037077-d86b-4b2b-87ef-5c9009421014 | Game of Thrones (Juego de tronos) , also commo... | Game of Thrones.pdf | 1 |
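
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of a character-based sliding window. It is an assumption for illustration only: the `Chunking` client in `/src` is not shown here and may split on tokens or separators instead, and `split_text` is a hypothetical name.

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
chunks = split_text(sample, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Each chunk repeats the last 50 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.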


If the index does not exist yet, we create it and save it locally; otherwise we load it from disk.

For this step we use the LangChain and FAISS libraries. **FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It quickly retrieves the vectors most similar to a query vector, even in large datasets. It is commonly used for recommendation systems, image retrieval, and semantic search, which makes it a good fit for RAG applications.**

In [None]:
if not os.path.exists(index_name):
    print("Creating index...")
    loader = DataFrameLoader(df_chunks, page_content_column="content")
    data = loader.load()
    db = FAISS.from_documents(data, embedding=embeddings)
    db.save_local(index_name)

else:
    print("Loading index...")
    db = FAISS.load_local(index_name, embeddings, allow_dangerous_deserialization=True)
db

<langchain_community.vectorstores.faiss.FAISS at 0x232e4966920>
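
Conceptually, similarity search means ranking stored vectors by their distance to a query vector. The sketch below does this with an exact, brute-force Euclidean (L2) search in plain Python; it is only an illustration of the idea, not how FAISS is implemented, and the three-dimensional vectors and document names are made up for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index, query, top_k=2):
    """Return the top_k (doc_id, distance) pairs closest to the query vector."""
    scored = [(doc_id, l2_distance(vector, query)) for doc_id, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "breaking_bad": [0.9, 0.1, 0.0],
    "suits": [0.1, 0.9, 0.0],
    "game_of_thrones": [0.0, 0.1, 0.9],
}
results = search(toy_index, query=[0.8, 0.2, 0.0], top_k=2)
print([doc_id for doc_id, _ in results])  # ['breaking_bad', 'suits']
```

A library such as FAISS performs the same ranking with optimized index structures, which is what keeps retrieval fast as the number of chunks grows.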

### <a id='3' style="color: skyblue;">**3. ⚙️ Create dataset**</a>

In [None]:
num_questions = 5

testset_path = "data/testset"
testset_filename = "testset.csv"
testset_complete_path = f"{testset_path}/{testset_filename}"

if not os.path.exists(testset_complete_path):
    print("Creating dataset...")
    testset_df = datasetcreation_client.dataset_creation(df_chunks, num_questions=num_questions)
    os.makedirs(testset_path, exist_ok=True)  # make sure the output folder exists
    testset_df.to_csv(testset_complete_path, index=False)

else:
    print("Loading dataset...")
    testset_df = pd.read_csv(testset_complete_path)

testset_df.head()
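
When the test set is reloaded with `pd.read_csv`, columns that held Python lists (for example `reference_context_id`) come back as plain strings. If downstream code expects real lists, they can be parsed back. This is a minimal sketch; `parse_list_cell` is a hypothetical helper, and whether the `/src` clients already handle this is not shown here.

```python
import ast


def parse_list_cell(value):
    """Convert a string holding a list literal back into a list; pass lists through."""
    if isinstance(value, list):
        return value  # freshly generated data already holds real lists
    parsed = ast.literal_eval(value)  # safely evaluates Python literals only
    if not isinstance(parsed, list):
        raise ValueError(f"Expected a list literal, got {type(parsed).__name__}")
    return parsed


ids = parse_list_cell("['76d44420', '4822741e']")
print(ids)  # ['76d44420', '4822741e']
```

With pandas, this could be applied as `testset_df["reference_context_id"].apply(parse_list_cell)`.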

### <a id='4' style="color: skyblue;">**4. 🔄 Retrieve examples & Evaluate**</a>

Retrieve the top 5 chunks for every question in the test set and store them, together with their ids, in new columns.


In [None]:
chunks, chunks_id = retrieval_client.retrieval_multiple_queries(
    queries = testset_df["question"].tolist(),
    top_k = 5
)
testset_df['generated_context'], testset_df['generated_context_id'] = chunks, chunks_id
testset_df.head()

|  | id | question | reference_answer | reference_context | reference_context_id | reference_metadata | reference_context_metadata | generated_context | generated_context_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 78b68230-a162-4db4-8136-2ee37d587d8d | What happens to Mike in Season 5 of the show? | In Season 5, Mike's secret is exposed, leading... | [future at Columbia Law School with Jessica's ... | [2a2469b6-e6af-4a90-a391-433a07c98ddc] | {'question_type': 'simple', 'seed_document_id'... | [{'chunk_id': '2a2469b6-e6af-4a90-a391-433a07c... | [extorsión por un abogado rival. También es in... | [0160d24c-6584-402d-b804-6e92ba8255c9, 41743e7... |
| 1 | f5538a32-3ca5-40ca-827d-436e985581f2 | What actions does Palermo take that negatively... | Palermo betrays the group by freeing Gandía, t... | [surgery to save Nairobi’s life. Palermo, angr... | [89a461b9-f6ec-437e-9537-c6528a4c7fef] | {'question_type': 'simple', 'seed_document_id'... | [{'chunk_id': '89a461b9-f6ec-437e-9537-c6528a4... | [Desde su posición, El Profesor no puede comun... | [4822741e-b957-45cd-a9d6-580747d071eb, 65951db... |
| 2 | 305cfa21-af0a-4b5a-b1a2-5f47debcf8de | Summarize the key plot points of Breaking Bad ... | In Season 2, Tuco kidnaps Jesse and Walter, bu... | [Synopsis \nWhat would you do if you found ou... | [956eeabc-6f25-4b71-a996-57f39d026f64] | {'question_type': 'complex', 'seed_document_id... | [{'chunk_id': '956eeabc-6f25-4b71-a996-57f39d0... | [Sinopsis \n¿Qué harías si te enteraras de qu... | [a4f4a369-b391-4e27-a0b2-172f83213250, 886710a... |
| 3 | 98305e60-ee2c-4060-a264-c024d7f54507 | Considering the events leading to his demise i... | At the end of Season 1, King Robert dies, and ... | [During the visit, Bran Stark, one of the youn... | [a7281899-984d-4949-b3ba-1018535c716f] | {'question_type': 'complex', 'seed_document_id... | [{'chunk_id': 'a7281899-984d-4949-b3ba-1018535... | [Game of Thrones (Juego de tronos ), también ... | [73e3e45d-1319-4c14-9bc5-286411d9600e, bb8ae8d... |
| 4 | 830b3061-9cb8-458e-a21a-3852b8e809df | Considering the thematic trajectory of 'Suits,... | The title "Breaking Bad," a colloquial idiom, ... | [The choice of the title Breaking Bad offers ... | [995540ad-0423-45fc-87b8-bf0f97e14369] | {'question_type': 'distracting element', 'seed... | [{'chunk_id': '995540ad-0423-45fc-87b8-bf0f97e... | [Suits: \nSuits sigue a Mike Ross, un brillan... | [12cdc330-ad29-45b1-ba46-7c8259082623, 886710a... |
| 5 | 4a8f8d36-ff4f-4ab4-a567-885a2bf1fd8d | Comparing the narrative structures of Season 1... | Season 1 introduces Mike Ross, hired by Harvey... | [Suits \nSuits follows Mike Ross, a brillian... | [6d4225fb-1693-44e3-8b2b-17098a372beb] | {'question_type': 'distracting element', 'seed... | [{'chunk_id': '6d4225fb-1693-44e3-8b2b-17098a3... | [Suits: \nSuits sigue a Mike Ross, un brillan... | [12cdc330-ad29-45b1-ba46-7c8259082623, 65951db... |
| 6 | dfc9763c-f17a-4b84-b8cd-1945568450e3 | Hi, I'm a bit confused about the ending of Gam... | Bran becomes the new king, and Tyrion is his H... | [Jon, horrified by Daenerys's actions, assassi... | [d6b47f89-1f63-47ea-a5e1-925fff02bf11] | {'question_type': 'situational', 'seed_documen... | [{'chunk_id': 'd6b47f89-1f63-47ea-a5e1-925fff0... | [Después, Jon viaja más allá del Muro para arr... | [b85c049c-2334-481a-8edd-b92c521b2812, 73e3e45... |


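Evaluate the retrieval with the `context_precision` metric (are the relevant chunks ranked near the top?) and the `context_recall` metric (does the retrieved context cover the reference?).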

In [10]:
retrieval_results = evaluation_client.evaluate(testset_df = testset_df,
                                               retrieval_metrics = ['context_precision', 'context_recall'])
retrieval_results

Evaluating: 100%|██████████| 4/4 [00:05<00:00,  1.29s/it]


|  | question | generated_context | reference_contexts | reference_answer | context_precision | context_recall |
|---|---|---|---|---|---|---|
| 0 | En la tercera temporada, ¿quiénes se unen a la... | [Desde su posición, El Profesor no puede comun... | [Por su parte, Sagasta traza un plan para desp... | A esta nueva aventura se incorporarán nuevos i... | 1.0 | 1.0 |
| 1 | Describa detalladamente cómo los atracadores a... | [La casa de papel es una serie española que s... | [En el interior del Banco de España la situaci... | Los atracadores entran al Banco de España haci... | 0.5 | 1.0 |
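
To build intuition for these two metrics, the sketch below computes ID-based analogues from chunk ids such as those in `generated_context_id` and `reference_context_id`. This is an assumption for illustration: Ragas judges relevance with an LLM rather than by matching ids, so its scores can differ, and both function names are hypothetical.

```python
def context_precision_ids(retrieved_ids, reference_ids):
    """Average of precision@k over the ranks k where a relevant chunk appears."""
    relevant = set(reference_ids)
    hits, precisions = 0, []
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def context_recall_ids(retrieved_ids, reference_ids):
    """Fraction of the reference chunks that were retrieved."""
    relevant = set(reference_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


retrieved = ["c2", "c1", "c9"]  # ranked retrieval result (made-up ids)
reference = ["c1"]              # chunk the question was generated from
print(context_precision_ids(retrieved, reference))  # 0.5
print(context_recall_ids(retrieved, reference))     # 1.0
```

In this toy case the relevant chunk is found (recall 1.0) but only at rank 2, which lowers precision to 0.5: precision rewards ranking relevant chunks first, while recall only asks whether they were retrieved at all.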


### <a id='5' style="color: skyblue;">**5. 🎯 Answer questions & Evaluate**</a>

Answer each question with the LLM, using the retrieved chunks as context.

In [9]:
answers = answering_client.answer_multiple_queries(queries = testset_df["question"].tolist(), 
                                                   contexts = testset_df["generated_context"].tolist(),
                                                   llm = llm)
testset_df['generated_answer'] = answers
testset_df

|  | id | question | reference_answer | reference_context | reference_context_id | reference_metadata | reference_context_metadata | generated_context | generated_context_id | generated_answer |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ea75169d-75c3-41a8-b611-af1c222359e1 | En la tercera temporada, ¿quiénes se unen a la... | A esta nueva aventura se incorporarán nuevos i... | ['Por su parte, Sagasta traza un plan para des... | ['76d44420-e3fc-4b6d-a515-7ebaff6e2fa9', '4822... | {'question_type': 'simple', 'seed_document_id'... | [{'chunk_id': '4822741e-b957-45cd-a9d6-580747d... | [Desde su posición, El Profesor no puede comun... | [4822741e-b957-45cd-a9d6-580747d071eb, ad7424a... | Sí, la información está en el contexto proporc... |
| 1 | 108509b2-8b44-499d-a57c-4c21c2540140 | Describa detalladamente cómo los atracadores a... | Los atracadores entran al Banco de España haci... | ['En el interior del Banco de España la situac... | ['44e2769b-1b69-415d-9495-d521b4fd5bfc', '4822... | {'question_type': 'complex', 'seed_document_id... | [{'chunk_id': '4822741e-b957-45cd-a9d6-580747d... | [La casa de papel es una serie española que s... | [ad7424a3-d2b8-43aa-a098-12743d8c2868, 4822741... | La información proporcionada describe el acces... |
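
The prompt used by the `Answering` client is not shown in this notebook. As a rough illustration of how a question and its retrieved chunks are typically combined, here is a minimal sketch; `build_prompt` and the prompt wording are assumptions, not the client's actual implementation.

```python
def build_prompt(question, contexts):
    """Number the retrieved chunks and place them before the question."""
    context_block = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What happens to Mike in Season 5 of the show?",
    ["  first retrieved chunk  ", "second retrieved chunk"],
)
print(prompt)
```

Restricting the model to the supplied context is what makes a metric like `faithfulness` meaningful, since it checks whether the answer is supported by that context.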


Evaluate the answers with the `answer_relevancy`, `answer_similarity` and `faithfulness` metrics.

In [None]:
answer_results = evaluation_client.evaluate(testset_df = testset_df,
                                            answer_metrics = ['answer_relevancy', 'answer_similarity', 'faithfulness'])
answer_results
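
Of these metrics, `answer_similarity` is the simplest to picture: it compares the embedding of the generated answer with the embedding of the reference answer. The sketch below shows the underlying cosine similarity in plain Python, using made-up two-dimensional vectors in place of real embeddings; it illustrates the idea and is not the Ragas implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of two answers.
generated_answer_vec = [1.0, 1.0]
reference_answer_vec = [1.0, 0.0]
score = cosine_similarity(generated_answer_vec, reference_answer_vec)
print(round(score, 4))  # 0.7071
```

A score close to 1 means the two answers point in nearly the same direction in embedding space, regardless of their exact wording.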

---