# Retrieval-Augmented Generation (RAG): Enhancing LLMs with External Knowledge

## Introduction to Retrieval-Augmented Generation (RAG)

> This course is heavily based on the WandB course [RAG++ : From POC to Production](https://wandb.ai/site/courses/rag/)

### What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a technique used to enhance large language models (LLMs) by integrating external knowledge retrieved from document databases or knowledge stores. Unlike conventional generative models, which rely solely on learned parameters from training data, RAG dynamically accesses up-to-date and contextually relevant information, significantly improving the accuracy, reliability, and usefulness of the generated responses.

The core idea behind RAG is simple yet powerful:

- **Retrieve**: When a user provides a query or prompt, RAG first retrieves relevant documents or passages from an external knowledge base.

- **Generate**: The model then uses the retrieved documents as context to generate accurate, informed, and detailed responses.
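
The two steps above can be sketched in a few lines. This is a minimal illustration, not a production design: the knowledge base passages, the word-overlap scoring, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for this example, and the final call to an LLM is deliberately left out.

```python
import re

# Toy knowledge base; the passages are illustrative placeholders.
KNOWLEDGE_BASE = [
    "The refund window is 30 days from the delivery date.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Retrieve step: rank documents by how many words they share with the query."""
    query_terms = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Generate step (input side): put the retrieved passages ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


query = "How many days is the refund window?"
passages = retrieve(query, KNOWLEDGE_BASE)
prompt = build_prompt(query, passages)
print(prompt)  # this prompt would then be sent to an LLM of your choice
```

A real system would typically replace the word-overlap scoring with a sparse or dense retriever, but the shape of the flow (retrieve, assemble context, generate) stays the same.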

### Why Retrieval Matters in Generative AI

Retrieval methods address fundamental limitations of purely parametric generative models:

* **Factual Accuracy**: Retrieval gives models access to current, accurate data instead of relying solely on training data that may be outdated.
* **Reducing Hallucinations**: By grounding generation in retrieved information, RAG reduces (though it does not eliminate) the chance of generating incorrect, nonsensical, or fabricated information.
* **Scalability**: Retrieval allows LLMs to leverage large-scale, dynamic knowledge bases efficiently without retraining the entire model when information updates occur.

### Limitations of Traditional LLMs

Traditional language models have some well-known drawbacks:

* **Hallucination**: Generating plausible but incorrect or unsupported information.
* **Stale Knowledge**: Limited to static training data, lacking awareness of recent updates or newly available information.
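* **Context Limitations**: LLMs have fixed-size context windows, so an entire large knowledge base cannot be placed in the prompt. Without a retrieval step to select the relevant passages, their ability to reference extensive external knowledge is severely limited.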

### Real-world Examples and Use Cases

**Knowledge-base Q&A Systems**
* Quickly answering user questions by retrieving precise, authoritative information from structured or unstructured sources.
* Example: Customer support systems retrieving relevant FAQ or product manuals to answer customer queries.

**Chatbots with External Knowledge Bases**
* Dynamic chatbots integrated with knowledge bases or external databases to offer up-to-date, personalized interactions.
* Example: Travel assistant chatbot retrieving flight schedules, weather data, and travel restrictions.

**Enterprise-level AI Assistants**
* Assisting professionals in fields such as law, medicine, or technical documentation by providing quick access to domain-specific knowledge.
* Example: Medical assistants that generate treatment suggestions based on the latest clinical guidelines and patient histories.

## Core Concepts and Components of RAG

To effectively build and deploy Retrieval-Augmented Generation systems, it’s crucial to understand their core components: the **Retriever**, the **Generator (Reader)**, and the overall **End-to-End Flow**.

### Retriever

The retriever component is responsible for identifying and fetching the most relevant documents or information chunks from an external knowledge base given a query. Retrieval methods typically fall into two categories: **Sparse** and **Dense**.

##### Sparse Methods (Keyword-Based):

Sparse retrieval methods rely on exact term matches and statistical weighting (like TF-IDF or BM25).

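* **TF-IDF**: Weights a term by how often it appears in a document (term frequency), discounted by how many documents in the collection contain it (inverse document frequency), so common words count for less.
* **BM25**: An improvement on TF-IDF that normalizes for document length and saturates term frequency, so repeated occurrences of a term add progressively less to the score.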

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Sample documents
docs = [
    "The cat sat on the mat.",
    "Dogs and cats are pets.",
    "The mat was red and soft.",
    "Pets are lovely companions."
]

# Step 1: Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer()

# Step 2: Fit and transform the documents
tfidf_matrix = vectorizer.fit_transform(docs)

# Step 3: Get the list of terms (features)
terms = vectorizer.get_feature_names_out()

# Step 4: Convert the TF-IDF matrix into a DataFrame for better readability
tfidf_df = pd.DataFrame(tfidf_matrix.toarray(), columns=terms)

# Step 5: Display the terms
print("Vocabulary terms:\n")
print(terms)

# Step 6: Display the TF-IDF matrix nicely
print("\nTF-IDF Weighted Document-Term Matrix:\n")
tfidf_df.round(3)
```


```text
Vocabulary terms:

['and' 'are' 'cat' 'cats' 'companions' 'dogs' 'lovely' 'mat' 'on' 'pets'
 'red' 'sat' 'soft' 'the' 'was']

TF-IDF Weighted Document-Term Matrix:
```

| doc | and | are | cat | cats | companions | dogs | lovely | mat | on | pets | red | sat | soft | the | was |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.405 | 0.0 | 0.0 | 0.0 | 0.0 | 0.319 | 0.405 | 0.0 | 0.0 | 0.405 | 0.0 | 0.638 | 0.0 |
| 1 | 0.401 | 0.401 | 0.0 | 0.509 | 0.0 | 0.509 | 0.0 | 0.0 | 0.0 | 0.401 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.357 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.357 | 0.0 | 0.0 | 0.453 | 0.0 | 0.453 | 0.357 | 0.453 |
| 3 | 0.0 | 0.438 | 0.0 | 0.0 | 0.555 | 0.0 | 0.555 | 0.0 | 0.0 | 0.438 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
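
For comparison with TF-IDF, here is a rough sketch of BM25 scoring over the same four documents, written in plain Python so the length normalization and term saturation are visible. Several things here are assumptions rather than part of the course material: the function names `tokenize` and `bm25_scores`, the parameter values `k1=1.5` and `b=0.75` (commonly used defaults), the query `"red mat"`, and the particular IDF formula (a variant with `+ 1` inside the logarithm so that weights stay positive).

```python
import math
import re
from collections import Counter

docs = [
    "The cat sat on the mat.",
    "Dogs and cats are pets.",
    "The mat was red and soft.",
    "Pets are lovely companions.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # exact term matching: unseen terms contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # k1 controls term-frequency saturation; b controls length normalization.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


scores = bm25_scores("red mat", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, docs[best])  # 2 The mat was red and soft.
```

Documents 1 and 3 score zero because they share no exact term with the query, which is the main limitation of sparse methods: a document about the same topic in different words is not found.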


##### **Exercise**:
Modify the code above to find which document is most relevant to the query: "cats love mats".
(Hint: Vectorize the query and compute cosine similarity!)

# Sources

[RAG++ : From POC to Production](https://wandb.ai/site/courses/rag)