**Currently Learning From:**

[LangChain_Official](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x)

[AI_Bites](https://www.youtube.com/playlist?list=PLcp6ZnH4WYlaYWCuDZ8oJNZaOFzmB8ciK)

[FreeCodeCamp](https://www.youtube.com/watch?v=sVcwVQRHIc8)

[Krish_Naik](https://www.youtube.com/playlist?list=PLZoTAELRMXVM8Pf4U67L4UuDRgV4TNX9D)

<hr>


## **Introduction to RAG**

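Main reason for RAG: LLMs have a limited context window, so an entire external knowledge base cannot simply be pasted into the prompt. RAG overcomes this limitation by retrieving only the information relevant to the query from an external knowledge base and passing it to the LLM.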

**Before RAG:**

- Input Query → LLM → Output

This approach relies solely on the knowledge encoded within the LLM, which may be outdated or insufficient for specific queries.

This approach can lead to hallucinations, where the LLM generates plausible-sounding but incorrect or nonsensical answers.

**With RAG:**

- Input Query → Retriever → Relevant Documents + LLM → Output

In this approach, the retriever fetches relevant documents from an external knowledge base based on the input query. The LLM then uses these documents to generate a more accurate and contextually relevant response.

This approach partially mitigates the hallucination problem by grounding the LLM's responses in data actually retrieved from the knowledge base.

This works, but **can't we fine-tune the LLM directly on the external knowledge?**

**Answer:** Fine-tuning LLMs on large external knowledge bases can be computationally expensive and time-consuming. RAG allows for dynamic retrieval of information without the need for extensive fine-tuning, making it more efficient and adaptable to changing knowledge.

<hr>

## **Components of RAG:**

### **1. Retriever:**

The retriever is responsible for fetching relevant documents or information from an external knowledge base based on the input query. This external knowledge can come from sources such as vector databases, scraped web data, or PDFs.

Types of Retrievers:

- **Sparse Retrievers:** Use traditional methods like TF-IDF or BM25 to find relevant documents based on keyword matching.

- **Dense Retrievers:** Use neural networks to create dense vector representations of documents and queries, allowing for more semantic matching. Examples include models like Sentence Transformers.
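A sparse retriever ranks documents purely by term statistics. The sketch below implements Okapi BM25 in plain Python. It is a minimal illustration that assumes a simple lowercase alphanumeric tokenizer and the typical parameter values `k1=1.5` and `b=0.75` (chosen here as an assumption). For production use, libraries such as `rank_bm25` and search engines such as Elasticsearch provide full implementations.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 ranker over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        tokenized = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(t) for t in tokenized]
        self.doc_lens = [len(t) for t in tokenized]
        self.avgdl = sum(self.doc_lens) / len(docs)
        # Document frequency: how many documents contain each term.
        df = Counter(term for t in tokenized for term in set(t))
        n = len(docs)
        # "+1" inside the log keeps IDF positive even for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query (0.0 if no terms match)."""
        tf, dl = self.term_freqs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                # Term-frequency saturation (k1) and document-length normalization (b).
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[str]:
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


docs = [
    "RAG retrieves documents from a knowledge base.",
    "BM25 ranks documents by keyword overlap.",
    "Transformers use attention.",
]
retriever = BM25(docs)
result = retriever.top_k("how does BM25 rank documents", k=1)
print(result)  # ['BM25 ranks documents by keyword overlap.']
```

In this example "rank" does not match "ranks" because the tokenizer does no stemming. The normalization steps described under Data Preprocessing, or a dense retriever, are meant to close exactly this kind of gap.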

### **2. Language Model (LLM):**

The LLM generates responses based on the input query and the retrieved documents. It can be any large language model, such as GPT-3 or GPT-4, or an open-source model.

The retriever and the LLM can be combined in different ways to create various RAG architectures, such as:

- **Retrieve-then-Generate:** The retriever fetches relevant documents first, and then the LLM generates a response based on those documents.

- **Generate-then-Retrieve:** The LLM generates an initial response, which is then refined using information from the retrieved documents.
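The two orderings differ only in when retrieval happens relative to generation. This is a minimal sketch under a few assumptions. `retrieve` and `llm` are plain callables, acting as hypothetical stand-ins for a real retriever and model. Generate-then-retrieve is assumed to search with the query plus the draft answer, which is one common way to realize it. The prompt wording is also illustrative.

```python
from typing import Callable

# Hypothetical interfaces: any function from query to documents, and any
# function from prompt to text, can play the retriever and LLM roles.
Retriever = Callable[[str], list[str]]
LLM = Callable[[str], str]


def retrieve_then_generate(query: str, retrieve: Retriever, llm: LLM) -> str:
    docs = retrieve(query)  # 1. fetch context first
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")  # 2. answer from it


def generate_then_retrieve(query: str, retrieve: Retriever, llm: LLM) -> str:
    draft = llm(f"Question: {query}\nAnswer:")  # 1. draft from the model's own knowledge
    docs = retrieve(f"{query} {draft}")  # 2. search using the query plus the draft
    context = "\n".join(docs)
    return llm(  # 3. refine the draft against the retrieved evidence
        f"Context:\n{context}\n\nQuestion: {query}\n"
        f"Draft answer: {draft}\nRevise the draft so it agrees with the context:"
    )


# Toy stubs so the sketch runs without a real model or index.
seen_queries: list[str] = []


def fake_retrieve(q: str) -> list[str]:
    seen_queries.append(q)
    return ["Paris is the capital of France."]


def fake_llm(prompt: str) -> str:
    # Pretend the model only gets it right when it is given context.
    return "Paris" if "Context:" in prompt else "Lyon"


print(retrieve_then_generate("Capital of France?", fake_retrieve, fake_llm))  # Paris
print(generate_then_retrieve("Capital of France?", fake_retrieve, fake_llm))  # Paris
print(seen_queries)  # ['Capital of France?', 'Capital of France? Lyon']
```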

<hr>

## **Workflow of RAG:**

1. **Input Query:** The user provides an input query that they want to get information about.

2. **Parse Query:** The query is parsed and pre-processed to prepare it for retrieval.

3. **Generate Embedding for Query:** The parsed query is converted into a vector representation (embedding) using a pre-trained model.

4. **Retrieve Relevant Documents:** The retriever uses the query embedding to search the external knowledge base and fetch relevant documents.

5. **Combine Query and Documents:** The input query and the retrieved documents are combined to form a context for the LLM.

6. **Generate Response:** The LLM processes the combined context and generates a response to the input query.

7. **Output Response:** The generated response is returned to the user.
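The seven steps above can be wired together into a toy, dependency-free pipeline. This is a minimal sketch under stated assumptions, and three pieces are hypothetical. The first is the three-chunk `KNOWLEDGE_BASE`. The second is the word-count `embed` function, which stands in for a real embedding model. The third is `fake_llm`, which stands in for an actual LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical, already-chunked knowledge base.
KNOWLEDGE_BASE = [
    "LangChain is a framework for building LLM applications.",
    "BM25 is a sparse, keyword-based ranking function.",
    "Dense retrievers compare embeddings with cosine similarity.",
]


def parse_query(query: str) -> str:
    """Step 2: minimal preprocessing (trim whitespace, lowercase)."""
    return query.strip().lower()


def embed(text: str) -> Counter:
    """Step 3 (stand-in): word counts instead of a neural embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norms = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norms if norms else 0.0


def retrieve(query_vec: Counter, k: int = 1) -> list[str]:
    """Step 4: rank every chunk by similarity to the query vector."""
    return sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Step 5: combine the query and the retrieved documents into one context."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def fake_llm(prompt: str) -> str:
    """Step 6 (stand-in): a real system would send the prompt to an LLM.
    This stub echoes the first context line to make the data flow visible."""
    first = next(line for line in prompt.splitlines() if line.startswith("- "))
    return f"According to the knowledge base: {first[2:]}"


def rag(query: str) -> str:
    query_vec = embed(parse_query(query))  # steps 2-3
    docs = retrieve(query_vec, k=1)        # step 4
    prompt = build_prompt(query, docs)     # step 5
    return fake_llm(prompt)                # step 6; step 7 is returning it to the user


print(rag("How do dense retrievers compare embeddings?"))
```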

But this is the **Traditional RAG Workflow**.

There is also an **Agentic RAG Workflow**, where multiple agents handle different parts of the retrieval and generation process, allowing for more complex interactions and potentially better results.

<hr>

## **Knowledge Base Creation**

**What is a Knowledge Base?**

A knowledge base is a structured repository of information that can be used to support the retrieval of relevant documents for RAG systems. It can consist of various types of data, such as text documents, PDFs, web pages, or any other form of unstructured or semi-structured data.

In most RAG systems it is implemented as a `Vector Database` that stores the vector representations (embeddings) of document chunks for efficient similarity search.

There are several steps involved in creating a knowledge base for RAG systems.

<hr>

### **Steps to Create a Knowledge Base**

**1. Data Collection:**

The first step is to gather the data that will form the basis of the knowledge base. This can involve scraping web pages, collecting documents, or using existing datasets.

We need to collect all the relevant data sources. This includes documents, articles, PDFs, web pages, images, videos, audio files, etc.
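For local files, data collection can be as simple as walking a directory. The sketch below uses only the standard library and assumes the sources are UTF-8 `.txt`/`.md` files. The function name `collect_text_files` and the record layout (`source`, `text`) are hypothetical choices, not taken from any framework.

```python
import tempfile
from pathlib import Path


def collect_text_files(root: str, suffixes: tuple[str, ...] = (".txt", ".md")) -> list[dict]:
    """Recursively load every file under `root` whose suffix is in `suffixes`.

    Each record keeps its source path so a chunk can later be traced back
    to the document it came from. PDFs, web pages, images, audio, and video
    would each need a dedicated loader (not shown).
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            records.append({"source": str(path), "text": path.read_text(encoding="utf-8")})
    return records


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.md").write_text("# RAG notes", encoding="utf-8")
    Path(tmp, "faq.txt").write_text("What is RAG?", encoding="utf-8")
    Path(tmp, "logo.png").write_bytes(b"\x89PNG")  # skipped: not a text suffix
    docs = collect_text_files(tmp)

print([Path(d["source"]).name for d in docs])  # ['faq.txt', 'notes.md']
```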

**2. Data Preprocessing:**

Once the data is collected, it needs to be preprocessed to ensure consistency and quality. This may involve cleaning the text, removing duplicates, and standardizing formats.

Preprocessing steps may include:

- Text cleaning (removing special characters, HTML tags, etc.)

- Tokenization (splitting text into words or sentences)

- Normalization (lowercasing, stemming, lemmatization)

- Removing stop words (common words that do not add much meaning)
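The preprocessing steps above can be chained into one function. This minimal standard-library sketch has deliberately naive parts. The stop-word list and the suffix-stripping `stem` are stand-ins; real pipelines typically use NLTK or spaCy for these. Removing stop words before stemming is a choice here, not a rule. These steps mostly benefit keyword-based (sparse) retrieval, since dense embedding models are usually fed lightly cleaned, natural text.

```python
import html
import re

# Tiny illustrative stop-word list; real pipelines use a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def clean_text(raw: str) -> str:
    """Remove HTML tags, decode entities, drop special characters, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # HTML tags
    text = html.unescape(text)            # e.g. "&amp;" -> "&"
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation and other special characters
    return re.sub(r"\s+", " ", text).strip()


def stem(token: str) -> str:
    """Naive suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(raw: str) -> list[str]:
    tokens = clean_text(raw).split()                     # cleaning + whitespace tokenization
    tokens = [t.lower() for t in tokens]                 # normalization: lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [stem(t) for t in tokens]                     # normalization: stemming


print(preprocess("<p>The retrievers are ranking documents &amp; chunks!</p>"))
# ['retriever', 'rank', 'document', 'chunk']
```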

**3. Document Segmentation:**

Large documents should be segmented into smaller, manageable chunks. Smaller chunks make retrieval more precise and keep the retrieved context small enough to fit in the LLM's context window.

Document segmentation can be done based on:

- Paragraphs

- Sentences

- Fixed-size `Chunks` (e.g., 512 tokens)
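Each of the three segmentation strategies above can be sketched in a few lines. This minimal version treats whitespace-separated words as a proxy for tokens; real token counts depend on the embedding model's tokenizer. It also adds an `overlap` between consecutive fixed-size chunks, which is a common practice. The default values are illustrative assumptions.

```python
import re


def chunk_paragraphs(text: str) -> list[str]:
    """Paragraph-based: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_sentences(text: str) -> list[str]:
    """Sentence-based: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_fixed(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Fixed-size: windows of `chunk_size` words, consecutive windows sharing
    `overlap` words so content cut at a boundary appears in both chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_fixed(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
print(chunk_sentences("RAG retrieves. Then it generates! Done?"))
# ['RAG retrieves.', 'Then it generates!', 'Done?']
```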

**4. Embedding Generation:**

The next step is to convert the preprocessed text into vector representations (embeddings) using a suitable embedding model. These embeddings capture the semantic meaning of the text and allow for efficient similarity searches.

Popular embedding models include:

- Sentence Transformers (e.g., `all-MiniLM-L6-v2`, `paraphrase-MiniLM-L3-v2`)

- OpenAI Embeddings (e.g., `text-embedding-ada-002`)

- Image Embeddings (e.g., CLIP for images)

- Multimodal Embeddings (e.g., models that combine text and image embeddings)
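To make the embedding step concrete without downloading a model, the sketch below uses the hashing trick as a stand-in. Each word is hashed into one of `dim` buckets, and the resulting count vector is normalized to unit length. This is an assumption made purely for illustration, and `hash_embed` is a hypothetical name. Unlike the models listed above, it captures word overlap, not meaning.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-length, unit-length vector with the hashing trick.

    Stand-in for a real embedding model: it only captures shared words,
    not semantic similarity.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable bucket across runs (built-in hash() of str is salted).
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


chunks = ["vector databases store embeddings", "bananas are yellow"]
vectors = [hash_embed(c) for c in chunks]
query = hash_embed("where are embeddings stored")
print([round(cosine(query, v), 3) for v in vectors])
```

With the `sentence-transformers` library, `hash_embed` would be replaced by a real model, e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)`, which produces 384-dimensional embeddings for that model.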

**5. Indexing:**

Once the embeddings are generated, they need to be indexed in a vector database for efficient retrieval. Popular vector databases include:

- Milvus

- Pinecone

- Weaviate
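Conceptually, a vector database stores each embedding alongside its text and answers nearest-neighbour queries. The sketch below is a hypothetical in-memory stand-in that does an exact, brute-force cosine search; `InMemoryVectorIndex` is not the client API of Milvus, Pinecone, or Weaviate. Production vector databases typically use approximate nearest-neighbour indexes such as HNSW so they do not have to scan every vector.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class InMemoryVectorIndex:
    """Brute-force stand-in for a vector database: stores (id, vector, text)
    triples and returns the k entries most similar to a query vector."""

    ids: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, doc_id: str, vector: list[float], text: str) -> None:
        self.ids.append(doc_id)
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float, str]]:
        # Score every stored vector (O(n)); real vector DBs use ANN indexes instead.
        scored = [(i, cosine(query, v), t) for i, v, t in zip(self.ids, self.vectors, self.texts)]
        scored.sort(key=lambda hit: hit[1], reverse=True)
        return scored[:k]


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
index = InMemoryVectorIndex()
index.add("doc-1", [1.0, 0.0, 0.0], "RAG overview")
index.add("doc-2", [0.0, 1.0, 0.0], "Vector databases")
index.add("doc-3", [0.7, 0.7, 0.0], "RAG with vector search")
hits = index.search([1.0, 0.1, 0.0], k=2)
print([doc_id for doc_id, _, _ in hits])  # ['doc-1', 'doc-3']
```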

<hr>




## **What is `RetrievalQAChain`?**

`RetrievalQAChain` (the LangChain.js name; the Python package exposes the same chain as `RetrievalQA`) is a chain in the LangChain framework for answering questions with a retrieval-based approach. It combines a language model with a document retrieval system so that answers to user queries are grounded in relevant retrieved documents.

### **Key Features:**

- **Document Retrieval:** `RetrievalQAChain` integrates with various document retrieval systems (like vector stores or traditional databases) to fetch relevant documents based on the user's query.

- **Contextual Understanding:** It uses a language model to understand the context of the retrieved documents and generate coherent answers.

- **Customizable Prompts:** Users can customize the prompts used to query the language model, allowing for tailored responses based on specific requirements.

- **Chain Integration:** It can be easily integrated with other chains in the LangChain framework, enabling complex workflows that involve multiple steps of processing.

### **How It Works:**

1. **Query Input:** The user provides a question or query.

2. **Document Retrieval:** The chain uses the retrieval system to find documents that are relevant to the query.

3. **Answer Generation:** The retrieved documents are then passed to the language model, which processes the information and generates an answer.

4. **Output:** The final answer is returned to the user.
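The four steps above can be imitated in a few lines of plain Python. This is a hypothetical sketch, not LangChain's implementation or API: `SimpleRetrievalQA`, its fields, and its output keys are invented for illustration. It follows the simplest strategy, which LangChain calls "stuff": all retrieved documents are concatenated into a single prompt. The `prompt_template` field shows where a customized prompt would plug in.

```python
from dataclasses import dataclass
from typing import Callable

DEFAULT_PROMPT = (
    "Use the following context to answer the question.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


@dataclass
class SimpleRetrievalQA:
    """Hypothetical, dependency-free imitation of a retrieval QA chain."""

    retriever: Callable[[str], list[str]]  # question -> relevant documents
    llm: Callable[[str], str]              # prompt -> generated text
    prompt_template: str = DEFAULT_PROMPT  # customizable prompt

    def run(self, question: str) -> dict:
        docs = self.retriever(question)  # 2. document retrieval
        prompt = self.prompt_template.format(context="\n\n".join(docs), question=question)
        answer = self.llm(prompt)  # 3. answer generation from the "stuffed" prompt
        return {"question": question, "answer": answer, "sources": docs}  # 4. output


# Toy retriever and LLM so the sketch runs on its own.
corpus = {"rag": "RAG grounds LLM answers in retrieved documents."}


def keyword_retriever(question: str) -> list[str]:
    return [text for key, text in corpus.items() if key in question.lower()]


def echo_llm(prompt: str) -> str:
    # Stub: "answers" with the first block of context it was given.
    return prompt.split("Context:\n", 1)[1].split("\n\n", 1)[0]


qa = SimpleRetrievalQA(retriever=keyword_retriever, llm=echo_llm)
print(qa.run("What does RAG do?")["answer"])  # 1. query input
```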
