# RAG and Model Fine-Tuning

## **RAG (Retrieval-Augmented Generation)**

- **Purpose**: 
  - RAG is a technique for building generative AI applications that draws on enterprise data sources and vector databases to overcome the knowledge limitations of the underlying model.
  
- **How It Works**:
  - A **retriever module** searches for relevant information from an external data store in response to a user's prompt.
  - The retrieved data is combined with the original prompt to create an **expanded prompt**, which is then passed to the **language model** to generate a response that includes the enterprise knowledge.

- **Benefits**:
  - **Up-to-Date Information**: Allows language models to use real-time, relevant, and external data instead of being limited to the model's original training data.
  - **Handles Frequent Data Changes**: Because the knowledge lives in external sources, updating those sources is enough to keep responses current when data changes frequently.

- **Limitations**:
  - RAG relies on the **enterprise datasets** embedded into vector stores, so a response can only draw on the data indexed there at the time of retrieval.
  - The model itself remains **static**; the retrieved context provides only temporary intelligence for that request.
  - **Latency**: Expanded prompts fill more of the LLM's context window, and longer inputs can slow down responses.
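
The retrieve-then-expand flow described under "How It Works" can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory `DOCUMENTS` list stands in for a vector store, the word-count `embed` function stands in for a learned embedding model, and all names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory data store; a real system would query a vector database.
DOCUMENTS = [
    "The refund window for enterprise customers is 45 days.",
    "Support tickets are answered within one business day.",
    "The on-call rotation changes every Monday at 09:00 UTC.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Retriever module: return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_expanded_prompt(query: str, k: int = 1) -> str:
    """Combine retrieved context with the original prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # The expanded prompt is what would be passed to the language model.
    print(build_expanded_prompt("What is the refund window?"))
```

Note that the model is never modified here: only the prompt grows, which is also why larger values of `k` increase input length and, with it, latency.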


## **Fine-Tuning**

- **Definition**: 
  - Fine-tuning involves modifying the underlying foundation model (FM) so that it performs better on a specific task or domain. The model learns from additional training data (labeled or unlabeled, depending on the approach), and the resulting changes to the model are permanent.
  
- **Comparison to RAG**:
  - While RAG temporarily enhances the model's intelligence by supplying context, **fine-tuning** results in a more permanent change to the model.
  - Fine-tuning can target a single task or a whole domain, allowing the model to handle more specific or custom data.

### **Categories of Fine-Tuning**

- **Customization (Prompt-based learning)**:
  - **Process**: The FM is fine-tuned for a specific task using a domain-specific labeled dataset of **prompt-response pairs**, typically in the form of instructions.
  - **Characteristics**:
    - The fine-tuning is lightweight, requiring few training epochs.
    - The fine-tuning is task-specific and not generalized across multiple tasks.

![image.png](attachment:image.png)
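
As an illustration of what a labeled dataset of prompt-response pairs might look like, the sketch below writes two hypothetical examples as JSON Lines. The `prompt` and `completion` field names and the example texts are assumptions; the exact schema depends on the fine-tuning service or library being used.

```python
import json

# Hypothetical labeled examples; the field names are an assumption, not a fixed standard.
examples = [
    {
        "prompt": "Summarize: The service was down for 20 minutes due to a failed deploy.",
        "completion": "A failed deploy caused a 20-minute outage.",
    },
    {
        "prompt": "Summarize: Login errors rose after the certificate expired on Friday.",
        "completion": "An expired certificate caused login errors.",
    },
]


def to_jsonl(records: list[dict]) -> str:
    """Serialize prompt-response pairs as one JSON object per line."""
    for record in records:
        # Every labeled example needs both a non-empty input and a non-empty target.
        if not record.get("prompt") or not record.get("completion"):
            raise ValueError(f"Incomplete training pair: {record!r}")
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    print(to_jsonl(examples))
```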

- **Continued Pretraining (Domain Adaptation)**:
  - **Process**: Pretrained FMs are adapted to specific domains using **domain-specific, unlabeled data**.
  - **Characteristics**:
    - Adaptation allows the model to understand domain-specific terms and language.
    - This fine-tuning can be applied to multiple tasks within the domain.
    - It requires access to appropriate compute instances for fine-tuning.

![image.png](attachment:image.png)
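
By contrast with prompt-response pairs, data for continued pretraining has no target field: each record is just a span of domain text. The sketch below splits raw documents into fixed-size chunks. The `input` field name, the whitespace tokenization, and the sample sentences are assumptions for illustration; real pipelines use the model's own tokenizer and the schema required by the training service.

```python
import json

# Hypothetical unlabeled domain text (no responses or labels attached).
raw_documents = [
    "Myocardial infarction is commonly treated with antiplatelet therapy and early reperfusion.",
    "Troponin levels are measured to confirm the diagnosis.",
]


def chunk_document(text: str, max_tokens: int = 8) -> list[str]:
    """Split one document into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def build_records(documents: list[str], max_tokens: int = 8) -> list[dict]:
    """Turn raw documents into unlabeled training records, one per chunk."""
    return [
        {"input": chunk}
        for doc in documents
        for chunk in chunk_document(doc, max_tokens)
    ]


if __name__ == "__main__":
    for record in build_records(raw_documents):
        print(json.dumps(record))
```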

## **Comparing RAG and Fine-Tuning**

| **Criteria**           | **RAG**                                               | **Fine-Tuning**                                      |
|------------------------|-------------------------------------------------------|------------------------------------------------------|
| **When to Use**         | When you have **custom data** and the FM lacks knowledge in a specific domain. | When data is **highly custom or proprietary to your domain** and the FM has limited knowledge in that domain. Also useful for reducing **latency**. |
| **Effort to Implement** | Medium                                                | High                                                 |
| **Cost**                | Generally low to medium                               | Can range from **low to high**, depending on model size. |

## Key Takeaways

- **RAG** enhances FMs temporarily by incorporating external, up-to-date data.
- **Fine-tuning** offers more permanent improvements, with options for task-specific or domain-adapted models.
- Fine-tuning can handle **more specific use cases** but requires higher effort and cost.