### Fine-Tuning an LLM Using LoRA with a Custom Dataset

Intro: This notebook fine-tunes an LLM on a custom dataset using the Low-Rank Adaptation (LoRA) technique.

Method: LoRA is a fine-tuning technique that introduces small trainable low-rank matrices (far smaller than the original weight matrices) instead of training all of the model's parameters. These low-rank matrices are trained on the dataset (e.g., for a domain-specific task) while the original LLM parameters stay frozen. Their product is then added to the frozen weights, which gives the LLM its task-specific specialization.
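
To make the size difference concrete, here is a minimal sketch of the parameter arithmetic for one projection layer. The symbols `W`, `A`, and `B` are notation introduced here for illustration (frozen weight, and the two low-rank factors), and the 1024 x 1024 shape is only an example size. The rank and `lora_alpha` values mirror the configuration used later in this notebook, and the `lora_alpha / r` scaling is the usual LoRA formulation.

```python
# Illustrative numbers only: one 1024 x 1024 projection, rank r = 8.
d_out, d_in = 1024, 1024
r, lora_alpha = 8, 32

full_params = d_out * d_in         # parameters in the frozen weight W
lora_params = r * (d_in + d_out)   # parameters in A (r x d_in) plus B (d_out x r)
scaling = lora_alpha / r           # factor applied to the low-rank update B @ A

print(f"frozen W: {full_params:,} parameters")
print(f"LoRA A+B: {lora_params:,} parameters "
      f"({100 * lora_params / full_params:.2f}% of W)")
print(f"scaling:  {scaling}")
```

With these values the two low-rank factors hold 16,384 parameters against 1,048,576 in the frozen weight, so only a small fraction of each adapted layer is trained.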

Dataset: 


Loading the pretrained Llama 3.2 model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model name. "llama-3.2" is a placeholder: replace it with an actual
# Hugging Face Hub ID (e.g. "meta-llama/Llama-3.2-1B") or a local checkpoint path.
model_name = "llama-3.2"

# Load the pre-trained LLaMA model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)



Prepare for LoRA Fine-Tuning

In [None]:
from peft import LoraConfig, get_peft_model

# Define the LoRA configuration
lora_config = LoraConfig(
    r=8,                # Rank of the LoRA update matrices
    lora_alpha=32,      # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Target modules for LoRA
    lora_dropout=0.1,   # Dropout probability
    bias="none",        # Do not train any bias parameters
    task_type="CAUSAL_LM"  # Task type for causal language modeling
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)
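
After wrapping the model, it can be useful to check how few parameters are actually trainable. PEFT models provide `model.print_trainable_parameters()` for this. The helper below is a hand-rolled sketch of the same idea: `count_parameters` is a hypothetical name introduced here, and it assumes only that the model exposes PyTorch-style `parameters()`, `numel()`, and `requires_grad`.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style model."""
    trainable, total = 0, 0
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:  # frozen base weights have requires_grad=False
            trainable += n
    return trainable, total


# Usage in this notebook, after get_peft_model(...):
# trainable, total = count_parameters(model)
# print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```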

How to find target modules in the model, such as the query, key, and value projection layers in the attention mechanism of the transformer-based Llama model:
- Print the model architecture: `print(model)`
- This displays all layers and their names. Look for components related to attention (e.g., `self_attn`, `q_proj`, `k_proj`, etc.).
- Search for layer names: manually search for names like `q_proj`, `v_proj`, or other layers, depending on the task.
- Common attention layers in transformers:
    - Query (`q_proj`): Projects the input into queries for self-attention.
    - Key (`k_proj`): Projects the input into keys for self-attention.
    - Value (`v_proj`): Projects the input into values for self-attention.
- Example output (illustrative; exact module paths, sizes, and bias settings depend on the checkpoint):
```
(transformer.layers.0.self_attn.q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(transformer.layers.0.self_attn.k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(transformer.layers.0.self_attn.v_proj): Linear(in_features=1024, out_features=1024, bias=True)
```
- From the output: `q_proj` and `v_proj` are layers in the `self_attn` module of the first Transformer layer (`transformer.layers.0`).
- LoRA can be applied to all relevant layers throughout the model, depending on the specified configuration.
- When `target_modules=["q_proj", "v_proj"]` is set in the LoRA configuration, LoRA automatically locates and modifies the `q_proj` and `v_proj` layers in all transformer layers (not just the first one).
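
The manual search described above can also be done programmatically. The sketch below uses two hypothetical helpers introduced here: `linear_module_suffixes` walks `model.named_modules()` and collects the short names of linear layers, and `matches_target` is a simplified approximation of matching a module path against `target_modules` by name suffix. It checks the class name instead of using `isinstance(module, torch.nn.Linear)` so that it runs without importing PyTorch, which means quantized or custom linear classes with other names would be missed. It is best run on the base model before `get_peft_model` is applied.

```python
def linear_module_suffixes(model):
    """Collect the last name component of every submodule whose class is named Linear."""
    suffixes = set()
    for name, module in model.named_modules():
        if name and type(module).__name__ == "Linear":
            suffixes.add(name.split(".")[-1])
    return sorted(suffixes)


def matches_target(module_name, target_modules):
    """Simplified check: exact name match, or the path ends in '.<target>'."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )


# Usage in this notebook:
# print(linear_module_suffixes(model))
# matched = [name for name, _ in model.named_modules()
#            if matches_target(name, ["q_proj", "v_proj"])]
# print(len(matched), "modules would receive LoRA adapters")
```

The suffix list will usually include non-attention layers as well (for example the output head), so it is a list of candidates to choose from, not a recommended `target_modules` value.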