# Quiz: Implementing RAG and Understanding Fine-Tuning LLMs
---

### Q1. What is the primary purpose of fine-tuning a pre-trained model? 
1. To increase the model's size 
2. To adapt the model to a specific task or domain 
3. To decrease computational requirements 
4. To improve generalization to all tasks

The correct answer is:

**2. To adapt the model to a specific task or domain** 

Fine-tuning takes a pre-trained model (trained on large general datasets) and adjusts its weights on a smaller, task-specific dataset. This helps the model perform better on the target domain without needing to train from scratch.


### Q2. In the context of LLMs, what does PEFT stand for? 
1. Parameter-Efficient Fine-Tuning 
2. Pre-trained Efficient Function Transfer 
3. Parameter Extraction for Fine-Tuning 
4. Pre-trained Efficient Feature Transfer

The correct answer is:

**1. Parameter-Efficient Fine-Tuning (PEFT)** 

In the context of Large Language Models (LLMs), **PEFT** refers to techniques that train only a small number of parameters, either a subset of the existing weights or small added modules (such as adapters, LoRA matrices, or prefix-tuning vectors), instead of updating the entire model. This makes fine-tuning more efficient in terms of computation and storage.


### Q3. Which of the following is a regularization technique commonly used in fine-tuning? 
1. Data augmentation 
2. Dropout 
3. Gradient clipping 
4. Batch normalization

The correct answer is:

**2. Dropout** 

Dropout is a **regularization technique** where, during training, some neurons are randomly "dropped" (ignored) to prevent the model from overfitting.

* **Data augmentation** → helps with generalization, but it’s more of a data-level technique.
* **Gradient clipping** → prevents exploding gradients (stability, not regularization).
* **Batch normalization** → stabilizes and speeds up training, not primarily for regularization.
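
The dropout behaviour described above can be sketched in a few lines. This is a minimal illustration of "inverted" dropout on a plain list of activations, not the implementation of any particular framework; the function name `dropout`, the seed, and the example values are assumptions made for the sketch.

```python
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep = 1.0 - p
    # Scaling kept values by 1/keep leaves the expected activation unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
hidden = [1.0] * 10
dropped = dropout(hidden, p=0.5, rng=rng)
unchanged = dropout(hidden, p=0.5, rng=rng, training=False)
```

With `p=0.5`, every entry of `dropped` is either `0.0` (dropped) or `2.0` (kept and rescaled), while `unchanged` equals the input because dropout is disabled outside training.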


### Q4. What is Zero-shot learning? 
1. Learning with a large dataset 
2. Learning with no labelled examples 
3. Learning with one labelled example 
4. Learning with a few labelled examples

The correct answer is:

**2. Learning with no labelled examples**

**Zero-shot learning** means a model can perform a task **without being trained on any task-specific labeled data**. Instead, it relies on knowledge learned during pre-training and uses natural language prompts to generalize to new tasks.

* **One-shot learning** → the model is given just **one labeled example** (for LLMs, typically inside the prompt).
* **Few-shot learning** → the model is given a **small number of labeled examples**.
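
With LLMs, the labeled examples are often supplied in the prompt rather than through weight updates. Assuming that in-context setting, the difference between zero-shot and few-shot can be sketched as a difference in how the prompt is built. The helper `build_prompt`, the sentiment task, and the example reviews are all hypothetical.

```python
def build_prompt(task, query, examples=()):
    """Assemble a prompt: zero-shot if `examples` is empty, few-shot otherwise."""
    lines = [task, ""]
    for text, label in examples:
        # Each labeled example is shown in the same format as the final query.
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

task = "Classify the sentiment of each review as positive or negative."
query = "The battery died after a week."

zero_shot = build_prompt(task, query)
few_shot = build_prompt(
    task,
    query,
    examples=[
        ("Great screen and fast delivery.", "positive"),
        ("Stopped working on day two.", "negative"),
    ],
)
```

Passing a single pair in `examples` would give the one-shot variant of the same prompt.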


### Q5. Which component is crucial in Retrieval-Augmented Generation (RAG)? 
1. Decoder 
2. Retriever 
3. Encoder 
4. Classifier

The correct answer is:

**2. Retriever**

In **Retrieval-Augmented Generation (RAG)**, the **retriever** is crucial because it fetches relevant information from an external knowledge base or document store. The retrieved context is then passed to the **generator (decoder)** to produce an informed, grounded response.

* **Decoder** → generates the final text but depends on the retriever.
* **Encoder** → processes inputs but isn’t the unique component of RAG.
* **Classifier** → not used in RAG.

So, the **retriever** is the key piece that makes RAG different from a standard LLM.
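
A retriever can be sketched as "score every document against the query and keep the best ones". The version below is a toy under stated assumptions: it uses bag-of-words counts and cosine similarity as a stand-in for learned embeddings and a vector database, and the names `embed`, `cosine`, `retrieve`, and the sample documents are illustrative only.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "lora adds low-rank matrices to frozen weights",
    "dropout randomly zeroes activations during training",
    "the retriever fetches relevant documents for the generator",
]
top = retrieve("which component fetches relevant documents", docs, k=1)
```

Here `top` contains the third document, since it is the only one sharing words with the query. A production retriever would typically replace `embed` with a trained embedding model and the linear scan with an index.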


### Q6. What does RAG primarily combine in its architecture? 
1. Retrieval and generation models 
2. Classification and regression models 
3. Supervised and unsupervised learning 
4. Data augmentation and regularization

The correct answer is:

**1. Retrieval and generation models**

**Retrieval-Augmented Generation (RAG)** combines:

* A **retriever** → to fetch relevant external knowledge (e.g., from a vector database).
* A **generator** (usually an LLM decoder) → to generate a context-aware response using both the query and retrieved information.

This way, RAG grounds model outputs in **real-world knowledge** instead of relying only on parametric memory.

### Q7. Which of the following is an advantage of fine-tuning over training from scratch?
1. Requires more data 
2. Requires less computation 
3. Increases model size 
4. Decreases model accuracy

The correct answer is:

**2. Requires less computation**

Fine-tuning starts from a **pre-trained model** (already trained on large datasets), so it:

* Needs **less data** and **less compute** than training from scratch.
* Achieves **faster convergence**.
* Often gives **better accuracy** for domain-specific tasks.

Training from scratch, on the other hand, requires **massive data and compute resources**.

### Q8. Few-shot learning is more useful when: 
1. Large labelled datasets are available 
2. Limited labelled data is available 
3. There is no data available 
4. The task is unrelated to the pre-trained model

The correct answer is:

**2. Limited labelled data is available**

**Few-shot learning** is especially useful when you only have a **small number of labeled examples** for a task. The model leverages its pre-trained knowledge and adapts quickly with those few samples.

* If **large labeled datasets** exist → full training/fine-tuning works better.
* If **no data** → zero-shot learning is the approach.
* If the **task is unrelated to the pre-trained model** → even few-shot may not work well.

### Q9. Which method prevents overfitting during fine-tuning? 
1. Increasing learning rate 
2. Regularization techniques 
3. Reducing the number of layers 
4. Increasing dataset size

The correct answer is:

**2. Regularization techniques**

During fine-tuning, **regularization methods** like **dropout, weight decay, early stopping, or layer freezing** are commonly used to prevent overfitting.

* **Increasing learning rate** → tends to destabilize training rather than prevent overfitting.
* **Reducing the number of layers** → may simplify the model but doesn’t directly target overfitting.
* **Increasing dataset size** → helps, but it’s a data strategy, not a direct training method.
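
Of the methods listed above, early stopping is the easiest to show in isolation. The sketch below assumes a list of per-epoch validation losses is already available; the function name `early_stopping_epoch`, the `patience` value, and the loss numbers are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch whose checkpoint to keep.

    Training stops once validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop before the model overfits further
    return best_epoch

val_losses = [0.90, 0.70, 0.62, 0.65, 0.71, 0.60]  # made-up numbers
best = early_stopping_epoch(val_losses, patience=2)
```

With `patience=2`, training stops after epoch 4 and the checkpoint from epoch 2 is kept; the later improvement at epoch 5 is never seen, which is the usual trade-off when choosing `patience`.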

### Q10. What is a key advantage of RAG over traditional generation methods? 
1. Faster inference time 
2. Enhanced relevance and accuracy 
3. Smaller model size 
4. Easier to implement

The correct answer is:

**2. Enhanced relevance and accuracy**

**Retrieval-Augmented Generation (RAG)** improves responses by pulling **up-to-date and domain-specific knowledge** from external sources, instead of relying only on what’s stored in the model’s parameters.

* **Faster inference time** → not always true (retrieval can add latency).
* **Smaller model size** → RAG doesn’t reduce model size; it adds retrieval.
* **Easier to implement** → it’s actually more complex than plain generation.

So the **key advantage** is that RAG produces **more relevant, factual, and accurate answers**.

### Q11. In PEFT, which technique allows fine-tuning fewer parameters while maintaining performance? 
1. Low-Rank Adaptation (LoRA) 
2. Data augmentation 
3. Transfer learning 
4. Early stopping

The correct answer is:

**1. Low-Rank Adaptation (LoRA)**

In **Parameter-Efficient Fine-Tuning (PEFT)**, **LoRA** adds small trainable low-rank matrices alongside selected weight matrices in the transformer layers (commonly the attention projections) and trains only those, while the original pre-trained weights stay frozen.

Benefits:

* Fine-tunes **fewer parameters**.
* Uses **less memory and compute**.
* Maintains performance close to full fine-tuning.

The other options don’t fit:

* **Data augmentation** → improves generalization, not PEFT-specific.
* **Transfer learning** → broader concept, not a PEFT method itself.
* **Early stopping** → prevents overfitting, not for reducing trainable parameters.
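
The LoRA idea can be sketched for a single linear layer: the output is the frozen layer's output plus a scaled low-rank correction, `y = W x + (alpha / r) * B (A x)`. The code below is a pure-Python illustration with tiny hand-picked matrices, not the API of any PEFT library; `matvec`, `lora_forward`, and all values are assumptions for the sketch.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    r = len(A)  # rank = number of rows of A
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen 3x3 weight
A = [[0.1, 0.2, 0.3]]                                    # r x d_in, with r = 1
B_zero = [[0.0], [0.0], [0.0]]                           # d_out x r, zero-initialised
x = [1.0, 2.0, 3.0]

y_start = lora_forward(W, A, B_zero, x, alpha=1.0)  # identical to W x
B_trained = [[1.0], [0.0], [0.0]]                   # pretend training changed B
y_tuned = lora_forward(W, A, B_trained, x, alpha=1.0)
```

Because `B` starts at zero, the adapted layer initially reproduces the pre-trained layer exactly; training then moves only the entries of `A` and `B`.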

### Q12. Which regularization technique penalizes the magnitude of model weights? 
1. Dropout 
2. Weight decay 
3. Batch normalization 
4. Gradient clipping

The correct answer is:

**2. Weight decay**

**Weight decay** (closely related to **L2 regularization**, and equivalent to it under plain SGD) discourages large weight values by adding a penalty proportional to the square of the weights’ magnitude to the loss function. This helps prevent overfitting and improves generalization.

* **Dropout** → randomly drops neurons during training.
* **Batch normalization** → normalizes activations for stability, not weight penalization.
* **Gradient clipping** → limits gradient size to avoid exploding gradients, not regularization.
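
The effect of the penalty is easiest to see in a single update step. The sketch below assumes plain SGD and a penalty of `(weight_decay / 2) * sum(w^2)`, whose gradient is `weight_decay * w`; the function name `sgd_step` and the numbers are illustrative.

```python
def sgd_step(weights, grads, lr, weight_decay=0.0):
    """One SGD step on loss + (weight_decay / 2) * sum(w^2)."""
    # The penalty contributes weight_decay * w to each gradient,
    # which pulls every weight toward zero in proportion to its size.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [2.0, -4.0]
grads = [0.0, 0.0]  # zero task gradient isolates the effect of the penalty

no_decay = sgd_step(weights, grads, lr=0.1)
decayed = sgd_step(weights, grads, lr=0.1, weight_decay=0.5)
```

Without decay the weights stay at `[2.0, -4.0]`; with decay they shrink to roughly `[1.9, -3.8]`, so the larger weight is penalized more in absolute terms.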

### Q13. What is the main role of the generator in a RAG system? 
1. Retrieve relevant documents 
2. Generate text based on retrieved information 
3. Classify input data 
4. Encode input data

The correct answer is:

**2. Generate text based on retrieved information**

In a **Retrieval-Augmented Generation (RAG)** system:

* The **retriever** → fetches relevant documents from an external knowledge base.
* The **generator** → uses both the **input query + retrieved context** to produce a coherent, informed response.

So, the **generator’s main role** is to **generate grounded text**, rather than memorized or hallucinated content.
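
How the generator receives "query + retrieved context" can be sketched as prompt assembly followed by a model call. The prompt wording, the helper names, and `echo_llm` (a placeholder standing in for a real language model) are all assumptions; no specific LLM API is implied.

```python
def build_grounded_prompt(query, retrieved_docs):
    """Combine the query with retrieved context into one generator prompt."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

def answer(query, retrieved_docs, llm):
    """Run the generator step: build the prompt, then call the model."""
    return llm(build_grounded_prompt(query, retrieved_docs))

def echo_llm(prompt):
    # Hypothetical placeholder: a real system would call a language model here.
    return f"(model output for a {len(prompt)}-character prompt)"

docs = ["Refunds are accepted within 30 days.", "Shipping takes 3 to 5 days."]
prompt = build_grounded_prompt("How long do refunds take?", docs)
reply = answer("How long do refunds take?", docs, echo_llm)
```

Numbering the context passages (`[1]`, `[2]`) is one common choice because it lets the generated answer point back to its sources.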

### Q14. How does fine-tuning improve a pre-trained model's performance on a new task? 
1. By reducing the model's parameters 
2. By re-training the entire model from scratch 
3. By adjusting the model's weights based on new task-specific data 
4. By using a larger pre-trained model

The correct answer is:

**3. By adjusting the model's weights based on new task-specific data**

Fine-tuning takes a **pre-trained model** (trained on large general datasets) and slightly **updates its weights** using **smaller, task-specific data**. This lets the model adapt to the new domain/task while retaining its general knowledge.

* **Reducing parameters** → not the main mechanism.
* **Re-training from scratch** → opposite of fine-tuning.
* **Using a larger model** → can help, but that’s scaling, not fine-tuning.
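
The mechanism of "starting from pre-trained weights and nudging them with task data" can be shown on a deliberately tiny model. The sketch below fine-tunes a one-variable linear model with plain gradient descent; the starting parameters, the task data, and the hyperparameters are invented for illustration and say nothing about real LLM training settings.

```python
def mse(params, data):
    """Mean squared error of the linear model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fine_tune(params, data, lr=0.05, steps=50):
    """Gradient descent on MSE, starting from the given (pre-trained) parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

pretrained = (2.0, 0.0)  # assumed result of earlier training on general data
task_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0)]  # small task set from y = 2.5x + 1

tuned = fine_tune(pretrained, task_data)
loss_before = mse(pretrained, task_data)
loss_after = mse(tuned, task_data)
```

The tuned parameters move from `(2.0, 0.0)` toward the task's `(2.5, 1.0)` and `loss_after` is lower than `loss_before`, without ever reinitializing the model.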

### Q15. Which of the following is true about few-shot learning? 
1. It requires a completely new model architecture 
2. It uses only a few labelled examples for training 
3. It is less effective than zero-shot learning 
4. It cannot be combined with transfer learning

The correct answer is:

**2. It uses only a few labelled examples for training** 

**Few-shot learning** allows a model to adapt to a task with just a handful of labeled examples, leveraging its **pre-trained knowledge**.

* **1. Requires a new architecture** → No, it works with existing architectures like LLMs.
* **3. Less effective than zero-shot** → Usually **more effective**, since it gets at least some supervision.
* **4. Cannot be combined with transfer learning** → In fact, it’s often built on top of transfer learning.

### Q16. In RAG, what is the purpose of integrating a knowledge base? 
1. To speed up model training 
2. To improve the generation of relevant and accurate responses 
3. To reduce the model's size 
4. To simplify the architecture

The correct answer is:

**2. To improve the generation of relevant and accurate responses**

In **Retrieval-Augmented Generation (RAG)**, the **knowledge base** provides up-to-date, domain-specific, or factual information that the retriever can fetch. The generator then uses this context to create **more accurate, grounded, and relevant outputs**.

* **1. Speed up training** → a knowledge base does not make training faster; retrieval adds components and complexity.
* **3. Reduce model size** → RAG doesn’t shrink models.
* **4. Simplify architecture** → it actually makes it more complex.

### Q17. What is a common challenge in fine-tuning large language models (LLMs)? 
1. Overfitting to the new task 
2. Underfitting the pre-trained model 
3. Lack of pre-trained models 
4. Too few parameters to adjust

The correct answer is:

**1. Overfitting to the new task** 

A **common challenge in fine-tuning LLMs** is that with **small task-specific datasets**, the model may **overfit**: it memorizes the training data instead of generalizing.

* **2. Underfitting** → less common, since LLMs are already powerful.
* **3. Lack of pre-trained models** → not true today; many are available (GPT, LLaMA, etc.).
* **4. Too few parameters to adjust** → the opposite problem; LLMs have *billions* of parameters.


### Q18. Which technique in fine-tuning involves freezing certain layers of the model? 
1. Full fine-tuning 
2. Layer-wise fine-tuning 
3. Partial fine-tuning 
4. Selective fine-tuning

The correct answer is:

**4. Selective fine-tuning**

In **selective fine-tuning**, some layers of the pre-trained model are **frozen** (weights not updated), while only specific layers (usually the last few or task-specific heads) are fine-tuned.

* **1. Full fine-tuning** → updates **all** layers.
* **2. Layer-wise fine-tuning** → gradually unfreezes layers during training.
* **3. Partial fine-tuning** → general term, but not the standard name for freezing.

Selective fine-tuning is especially useful for **reducing compute cost** and **avoiding overfitting** on small datasets.
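
Freezing amounts to skipping the update for some parameters. The sketch below represents each layer by a single number and applies a gradient step only to the layers marked trainable; the layer names, values, and the helper `apply_update` are hypothetical.

```python
def apply_update(params, grads, trainable, lr=0.1):
    """Update only parameters named in `trainable`; the rest stay frozen."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

params = {"embeddings": 0.50, "layer_1": -0.20, "layer_2": 0.80, "head": 0.10}
grads = {"embeddings": 1.0, "layer_1": 1.0, "layer_2": 1.0, "head": 1.0}
trainable = {"layer_2", "head"}  # freeze the lower layers, tune the top ones

updated = apply_update(params, grads, trainable)
```

After the step, `embeddings` and `layer_1` are unchanged while `layer_2` and `head` have moved, which mirrors the common practice of tuning only the last few layers or the task head.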

### Q19. Why is retrieval important in RAG systems? 
1. It reduces the complexity of the model 
2. It retrieves relevant documents to enhance the generation process 
3. It decreases training time 
4. It simplifies data preprocessing

The correct answer is:

**2. It retrieves relevant documents to enhance the generation process** 

In **Retrieval-Augmented Generation (RAG)**, the **retriever** fetches relevant documents or knowledge snippets from an external knowledge base. This additional context helps the generator produce **more factual, relevant, and grounded responses**.

* **1. Reduces complexity** → retrieval actually adds complexity.
* **3. Decreases training time** → not the main purpose.
* **4. Simplifies preprocessing** → unrelated.

Retrieval is what makes RAG stand out compared to plain LLMs.

### Q20. Which aspect of fine-tuning requires careful adjustment to prevent overfitting? 
1. Learning rate 
2. Dataset size 
3. Model architecture 
4. Number of layers

The correct answer is:

**1. Learning rate**

In **fine-tuning**, the **learning rate** is crucial. If it’s too high, the model may overwrite useful pre-trained knowledge (**catastrophic forgetting**) or fit the small task dataset too aggressively (**overfitting**); if it’s too low, training may be too slow or ineffective.

* **2. Dataset size** → matters, but it’s not an adjustable hyperparameter.
* **3. Model architecture** → fixed once chosen; not usually changed during fine-tuning.
* **4. Number of layers** → can be frozen/unfrozen, but overfitting risk is most sensitive to learning rate.
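
The sensitivity to step size can be illustrated on a toy problem. The sketch below runs gradient descent on `f(w) = w^2` with three learning rates; it shows only the "too low / reasonable / too high" behaviour of the optimizer, not overfitting itself, and the specific rates are arbitrary choices for the example.

```python
def run_gd(lr, start=1.0, steps=20):
    """Gradient descent on f(w) = w^2 (gradient 2w); returns the final |w|."""
    w = start
    for _ in range(steps):
        w = w - lr * 2 * w
    return abs(w)

too_low = run_gd(lr=0.001)   # barely moves away from the start
reasonable = run_gd(lr=0.1)  # converges toward the minimum at 0
too_high = run_gd(lr=1.1)    # overshoots on every step and diverges
```

In fine-tuning, the "too high" case corresponds to large updates that can wipe out what the pre-trained weights encoded, which is why small learning rates are the usual starting point.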

### Q21. Which PEFT technique focuses on adjusting only a subset of model weights? 
1. Fine-tuning the entire model 
2. Layer-wise tuning 
3. Low-rank adaptation (LoRA) 
4. Full model retraining

The correct answer is:

**3. Low-Rank Adaptation (LoRA)**

**LoRA** is a **Parameter-Efficient Fine-Tuning (PEFT)** technique that:

* Keeps the pre-trained model’s original weights **frozen**.
* Inserts **small low-rank trainable matrices** into specific layers (like attention layers).
* Trains **only those added matrices**, so only a small subset of the adapted model’s weights is adjusted, reducing memory and compute costs.

The other options don’t fit:

* **1. Fine-tuning the entire model** → opposite of PEFT.
* **2. Layer-wise tuning** → progressively unfreezes layers; it does not restrict training to a small subset of weights.
* **4. Full model retraining** → updates all weights, starting from scratch.
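
The saving from LoRA can be estimated with simple arithmetic: a full `d_out x d_in` weight matrix is replaced, for training purposes, by a `rank x d_in` matrix and a `d_out x rank` matrix. The layer size and rank below are hypothetical round numbers, not taken from any specific model.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters in one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank

d_in = d_out = 4096  # hypothetical layer size
rank = 8             # hypothetical LoRA rank

full = d_in * d_out                              # weights updated by full fine-tuning
lora = lora_trainable_params(d_in, d_out, rank)  # weights updated by LoRA
fraction = lora / full
```

For these numbers, `full` is 16,777,216 and `lora` is 65,536, so LoRA trains about 0.4% of the parameters of that layer.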

### Q22. What is the key advantage of using zero-shot learning in LLMs? 
1. Requires extensive labelled data 
2. Achieves high accuracy with no task-specific training data 
3. Simplifies the model architecture 
4. Enhances transfer learning

The correct answer is:

**2. Achieves high accuracy with no task-specific training data**

**Zero-shot learning** lets an LLM perform a task **without seeing any labeled examples** from that task during training. Instead, it relies on **knowledge gained during pre-training** and **natural language prompts** to generalize.

* **1. Requires extensive labelled data** → opposite of zero-shot.
* **3. Simplifies architecture** → no change to architecture.
* **4. Enhances transfer learning** → related, but not the key advantage.

### Q23. In a RAG system, how does the interaction between retriever and generator enhance performance? 
1. By generating longer text sequences 
2. By improving the relevance and context of generated content 
3. By reducing computational complexity 
4. By increasing the model's capacity

The correct answer is:

**2. By improving the relevance and context of generated content** 

In a **Retrieval-Augmented Generation (RAG)** system:

* The **retriever** fetches relevant documents from a knowledge base.
* The **generator** uses both the **query + retrieved context** to produce responses.

This interaction ensures outputs are **more accurate, contextual, and grounded in real data**, reducing hallucinations.

* **1. Generating longer text** → not the goal.
* **3. Reducing computational complexity** → retrieval adds complexity.
* **4. Increasing model capacity** → model size stays the same; performance improves through external knowledge.

### Q24. Which regularization technique is particularly useful in preventing overfitting in large models during fine-tuning? 
1. Weight decay 
2. Data augmentation 
3. Dropout 
4. Batch normalization

The correct answer is:

**3. Dropout**

**Dropout** is especially effective in **large models** during fine-tuning. It randomly “drops” (sets to zero) a fraction of neurons during training, forcing the model to rely on multiple pathways rather than memorizing patterns. This helps reduce **overfitting**.

* **1. Weight decay** → also useful and often combined with dropout, but dropout is the answer this quiz expects for large models.
* **2. Data augmentation** → good, but more common in vision tasks.
* **4. Batch normalization** → stabilizes training, not mainly for overfitting prevention.

### Q25. What is a key challenge when implementing RAG in real-time applications? 
1. Synchronizing the retrieval and generation processes 
2. Reducing the size of the retriever module 
3. Scaling the model to larger datasets 
4. Simplifying the architecture for faster deployment

The correct answer is:

**1. Synchronizing the retrieval and generation processes**

In **real-time RAG applications**, the main challenge is ensuring that:

* The **retriever** quickly finds the most relevant documents.
* The **generator** seamlessly integrates that information to produce a coherent, context-aware response.

This synchronization is difficult because retrieval can introduce **latency**, and poor alignment can reduce response quality.

* **2. Reducing retriever size** → not the main bottleneck.
* **3. Scaling to larger datasets** → important, but more of an offline/data engineering challenge.
* **4. Simplifying architecture** → RAG is inherently more complex; the bigger issue is real-time coordination.
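
One practical way to reason about the latency issue is to time the retrieval and generation stages separately. The sketch below assumes placeholder functions `fake_retrieve` and `fake_generate` in place of a real vector store and LLM; only the timing pattern is the point.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def fake_retrieve(query):
    # Hypothetical placeholder for a vector-store lookup.
    return ["doc about " + query]

def fake_generate(query, docs):
    # Hypothetical placeholder for an LLM call.
    return f"answer to '{query}' using {len(docs)} document(s)"

def answer_with_timing(query):
    """Run both stages and report how long each one took."""
    docs, t_retrieve = timed(fake_retrieve, query)
    text, t_generate = timed(fake_generate, query, docs)
    timings = {
        "retrieve_s": t_retrieve,
        "generate_s": t_generate,
        "total_s": t_retrieve + t_generate,
    }
    return text, timings

text, timings = answer_with_timing("refund policy")
```

Per-stage timings like these make it possible to see whether retrieval or generation dominates the response time before deciding what to optimize.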

### Q26. Why is it important to balance generalization and specialization in LLMs during fine-tuning? 
1. To maximize model size 
2. To prevent overfitting to specific tasks 
3. To reduce computational requirements 
4. To increase training speed

The correct answer is:

**2. To prevent overfitting to specific tasks**

During **fine-tuning of LLMs**, there’s a trade-off:

* **Generalization** → retaining broad knowledge from pre-training.
* **Specialization** → adapting to the new task/domain.

If the balance is lost, the model may:

* **Overfit** → perform well only on the fine-tuned task but lose general abilities.
* **Underfit** → fail to adapt properly to the new domain.

### Q27. How does few-shot learning differ from traditional supervised learning? 
1. It requires a large amount of labelled data 
2. It generalizes better from a smaller number of examples 
3. It is less accurate than traditional methods
4. It requires completely new model architectures

The correct answer is:

**2. It generalizes better from a smaller number of examples** 

**Few-shot learning** is different from traditional supervised learning because:

* It works with **very limited labeled data** (a few examples per class/task).
* It leverages **pre-trained knowledge** to generalize from those few examples.

In contrast, **traditional supervised learning** usually needs **large labeled datasets** to achieve good performance.

* **1. Requires large data** → opposite of few-shot.
* **3. Less accurate** → not always true; often competitive.
* **4. Requires new architectures** → often works with existing LLMs.

### Q28. What role does the generator play in improving RAG's performance? 
1. It selects relevant documents 
2. It encodes input data 
3. It generates responses based on retrieved documents 
4. It reduces model size

The correct answer is:

**3. It generates responses based on retrieved documents**

In **Retrieval-Augmented Generation (RAG)**:

* The **retriever** → fetches relevant documents from the knowledge base.
* The **generator** → combines the **input query + retrieved context** to produce accurate, grounded, and coherent responses.

The other options don’t fit:

* **1. Selects relevant documents** → that’s the retriever’s job.
* **2. Encodes input data** → handled by encoder components.
* **4. Reduces model size** → RAG doesn’t shrink models.

### Q29. Which approach can be used to prevent overfitting during LLM fine-tuning? 
1. Increasing the learning rate 
2. Using smaller datasets 
3. Applying dropout and weight decay 
4. Removing regularization

The correct answer is:

**3. Applying dropout and weight decay**

To **prevent overfitting** in LLM fine-tuning, common strategies include:

* **Dropout** → randomly drops neurons during training.
* **Weight decay (L2 regularization)** → penalizes large weights.
* **Early stopping, layer freezing, and data augmentation** can also help.

The other options don’t fit:

* **1. Increasing learning rate** → tends to destabilize training rather than prevent overfitting.
* **2. Using smaller datasets** → makes overfitting worse.
* **4. Removing regularization** → increases overfitting risk.

### Q30. When would you prefer using RAG over a fine-tuned model? 
1. When the task requires generating creative content 
2. When external knowledge or specific information retrieval is crucial 
3. When fine-tuning is not computationally feasible
4. When general-purpose responses are needed

The correct answer is:

**2. When external knowledge or specific information retrieval is crucial** 

**RAG (Retrieval-Augmented Generation)** is preferred when:

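* The model needs **up-to-date or domain-specific knowledge**.
* The information required is **too large or dynamic** to store in model parameters.

Examples include customer support chatbots, legal/medical assistants, and research tools.

The other options don’t fit:

* **1. Creative content** → a fine-tuned model or base LLM works better.
* **3. Fine-tuning is not computationally feasible** → a practical reason to consider RAG, but not the defining one; RAG is chosen mainly for access to external knowledge.
* **4. General-purpose responses** → fine-tuned or pre-trained models are sufficient.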