# Evaluating RAG Applications
- Evaluating **Retrieval-Augmented Generation (RAG)** applications is crucial to ensure **accuracy, coverage, and reliability**.
- The **RAGAS (Retrieval-Augmented Generation Assessment)** framework provides **key metrics** to evaluate:
  - **Generated responses**
  - **Context retrieval**
  - **Overall application performance**
- Using **RAGAS**, developers can **optimize and fine-tune** their RAG models for **trustworthy results**.

## **RAGAS framework**
The **RAGAS score** combines two groups of metrics, one for the **generated response** and one for **context retrieval**:

| **Category**            | **Metric**                 | **Purpose** |
|-------------------------|---------------------------|-------------|
| **Generated Response**  | **Faithfulness**          | How factually consistent the generated answer is with the retrieved context. |
|                         | **Answer Relevancy**      | How relevant the generated answer is to the user's question. |
| **Context Retrieval**   | **Context Recall**        | How much of the relevant information required to answer the question was retrieved. |
|                         | **Context Precision**     | The signal-to-noise ratio of the context retrieved from the knowledge base. |

![image.png](attachment:image.png)
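
One way to fold the four metrics into a single number is a harmonic mean, which early RAGAS releases are generally described as using. The sketch below assumes that aggregation; the metric values and the `combined_score` helper are hypothetical and are not part of the RAGAS API.

```python
from statistics import harmonic_mean

# Hypothetical per-metric scores for one evaluation run (illustrative values only).
scores = {
    "faithfulness": 0.9,
    "answer_relevancy": 0.8,
    "context_recall": 0.6,
    "context_precision": 0.75,
}


def combined_score(metric_scores: dict[str, float]) -> float:
    """Harmonic mean of the metric scores (assumed aggregation, see lead-in)."""
    return harmonic_mean(list(metric_scores.values()))


overall = combined_score(scores)
arithmetic = sum(scores.values()) / len(scores)
print(f"harmonic: {overall:.3f}, arithmetic: {arithmetic:.3f}")
```

The harmonic mean sits below the arithmetic mean whenever the metrics differ, so a single weak metric (here context recall) drags the combined score down more than a plain average would.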

### **Faithfulness metric**
- Measures **factual consistency** between the **generated response** and **retrieved context**.
- Score ranges from **0 to 1** → **higher is better**.
- **Low faithfulness** = **Hallucinations** (fabricated information).

📌 **Example**

| **Question**                 | **Context**            | **High Faithfulness Answer** | **Low Faithfulness Answer** |
|------------------------------|------------------------|------------------------------|------------------------------|
| What is the capital of Australia? | City of Canberra. | **Canberra is the capital of Australia.** | **Sydney was declared the capital in the early 20th century.** |

🔹 *The low-faithfulness answer asserts that Sydney is the capital, a claim the retrieved context does not support. This is a hallucination.*
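
Faithfulness is commonly described as the share of claims in the answer that the retrieved context supports. In RAGAS an LLM extracts and checks the claims; the minimal sketch below assumes those per-claim verdicts are already available as hand-labeled booleans. `faithfulness_score` is a hypothetical helper, not a library function.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of answer claims that the retrieved context supports."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


# Verdicts are hand-labeled for the example above (context: "City of Canberra.").
high = faithfulness_score([True])    # "Canberra is the capital of Australia."
low = faithfulness_score([False])    # "Sydney was declared the capital ..."

# A hypothetical three-claim answer in which one claim is unsupported.
mixed = faithfulness_score([True, True, False])

print(high, low, round(mixed, 2))
```

An answer with several claims can land between the extremes, which is why the score is a ratio and not a pass/fail flag.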

### **Answer relevancy metric**
- Evaluates how **relevant the generated response** is to the **input query**.
- A **lower score** is given if the answer:
  - Is **incomplete**.
  - Contains **redundant or unrelated** information.

📌 **Example**

| **Question** | **High Relevance Answer** | **Low Relevance Answer** |
|-------------|--------------------------|--------------------------|
| What is the capital of Australia and a notable landmark there? | **Canberra is the capital. The Australian War Memorial is a major landmark.** | **Sydney is a major Australian city.** |

🔹 *The low-relevance answer names neither the capital nor a landmark, so it does not address the question.*
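
RAGAS is usually described as scoring answer relevancy by generating questions from the answer and averaging their embedding similarity to the original question. The sketch below assumes that approach and replaces the real embedding model with tiny hand-made vectors; the vectors, their three "topic" dimensions, and the `answer_relevancy` helper are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def answer_relevancy(question_vec: list[float],
                     generated_question_vecs: list[list[float]]) -> float:
    """Mean similarity between the question and questions implied by the answer."""
    sims = [cosine_similarity(question_vec, v) for v in generated_question_vecs]
    return sum(sims) / len(sims)


# Toy embedding dimensions: [capital, landmark, other-city]. Values are made up.
question = [1.0, 1.0, 0.0]
from_high_answer = [[1.0, 1.0, 0.0], [1.0, 0.8, 0.1]]
from_low_answer = [[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]]

high = answer_relevancy(question, from_high_answer)
low = answer_relevancy(question, from_low_answer)
print(f"high: {high:.2f}, low: {low:.2f}")
```

Because the score compares the question with what the answer appears to be about, an answer that is fluent but off-topic still scores low.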

### **Context recall metric**
- Measures **how much of the ground truth** is covered by the **retrieved context**.
- Higher values mean **better coverage** of relevant information.
- Score range: **0 to 1**.

📌 **Example**

| **Question** | **Ground Truth** | **High Context Recall** | **Low Context Recall** |
|-------------|----------------|------------------------|------------------------|
| What can you tell me about the geography of Australia? | **Great Barrier Reef** | **The Great Barrier Reef is located off Australia's coast.** | **It is a coral reef system in the Pacific Ocean.** |

🔹 *The low-recall context lacks key details like **location and significance**.*
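
Context recall is commonly computed as the share of ground-truth statements that can be attributed to the retrieved context. The sketch below assumes the ground truth has already been split into statements and judged by hand; the statements, the verdicts, and the `context_recall` helper are hypothetical illustrations.

```python
def context_recall(attributable: dict[str, bool]) -> float:
    """Fraction of ground-truth statements found in the retrieved context."""
    if not attributable:
        raise ValueError("at least one ground-truth statement is required")
    return sum(attributable.values()) / len(attributable)


# Hypothetical ground-truth statements with hand-assigned verdicts per context.
high_context = {
    "names the Great Barrier Reef": True,
    "places it off Australia's coast": True,
}
low_context = {
    "names the Great Barrier Reef": False,
    "places it off Australia's coast": False,
}

high = context_recall(high_context)
low = context_recall(low_context)
print(high, low)
```

Note the direction of the ratio: the denominator is the ground truth, not the retrieved text, so retrieving extra irrelevant passages does not lower recall.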

### **Context precision metric**
- Evaluates whether the **relevant contexts** are ranked **at the top** of the retrieved results.
- Rewards rankings in which **highly relevant documents** appear **before irrelevant ones**.
- Score range: **0 to 1**.

📌 **Example**

| **Question** | **Ground Truth** | **High Context Precision** | **Low Context Precision** |
|-------------|----------------|---------------------------|---------------------------|
| Where is Australia and what is its capital? | **Southern Hemisphere, Canberra** | **Australia is in the Southern Hemisphere. The capital is Canberra.** | **First document: Australia's wildlife. Second document: Australia's location and capital.** |

🔹 *In the **low precision** example, an **irrelevant document** is ranked higher than the correct one.*
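
A rank-aware score of this kind is typically computed as the average of precision@k over the positions that hold a relevant document. The sketch below assumes that formulation and takes hand-labeled relevance flags in ranked order; `context_precision` is a hypothetical helper, not the RAGAS implementation.

```python
def context_precision(relevance: list[bool]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant context."""
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant position
    return total / hits if hits else 0.0


# Ranked relevance flags for the example above (hand-labeled).
high = context_precision([True, False])  # relevant document ranked first
low = context_precision([False, True])   # wildlife document ranked first

print(high, low)  # 1.0 0.5
```

Both rankings contain the same documents; only the order differs, which is exactly what this metric is meant to capture.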

## **Key Takeaways**
✅ Use **Faithfulness** to detect **hallucinations**.  
✅ **Answer Relevancy** checks that **generated responses** address the question.  
✅ **Context Recall** verifies that the **retrieved information** is complete.  
✅ **Context Precision** checks that **top-ranked sources are relevant**.  
✅ Regular **evaluation** improves **accuracy, reliability, and trustworthiness**.  