#### LoRA

LoRA, or Low-Rank Adaptation, fine-tunes large models efficiently by training only a small set of additional parameters. Instead of modifying all parameters of a pretrained model, LoRA keeps the original weights frozen and introduces two small matrices whose product is a low-rank approximation of the weight update learned during training.

```py
W_new = W_original + ΔW
ΔW = A × B
```

- W_original: Frozen pretrained weights, of size (d × k)
- A: Trainable matrix of size (d × r)
- B: Trainable matrix of size (r × k)
- r: Rank (much smaller than d and k)

```py
Original: h = W × x
LoRA:     h = W × x + (A × B) × x
```
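As a minimal sketch of the forward pass above, the snippet below uses plain Python lists with tiny, made-up dimensions (d = 4, k = 3, r = 2) and random values. The helper names `matvec` and `lora_forward` are illustrative and not part of any library. It follows the shapes listed earlier (A is d × r, B is r × k), so the update is applied as A × (B × x), which avoids forming the full d × k matrix.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x):
    """Compute h = W x + A (B x) without building the d x k product A B."""
    base = matvec(W, x)               # frozen path: W x
    update = matvec(A, matvec(B, x))  # adapter path: project down to r dims, then back up to d
    return [b + u for b, u in zip(base, update)]

rng = random.Random(0)
d, k, r = 4, 3, 2  # toy sizes, chosen only for illustration

W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]  # frozen weights (d x k)
A = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]  # trainable (d x r)
B = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(r)]  # trainable (r x k)
x = [rng.uniform(-1, 1) for _ in range(k)]

h = lora_forward(W, A, B, x)

# Same computation via the merged weight W_new = W + A B
delta_W = [[sum(A[i][t] * B[t][j] for t in range(r)) for j in range(k)] for i in range(d)]
W_new = [[W[i][j] + delta_W[i][j] for j in range(k)] for i in range(d)]
h_merged = matvec(W_new, x)

max_diff = max(abs(a - b) for a, b in zip(h, h_merged))
print(f"max difference between the two paths: {max_diff:.2e}")
```

Because the two paths agree up to floating-point error, the product A × B can be folded into the frozen weights after training, so inference does not need to carry the extra matrices.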

This low-rank decomposition lets the model adapt to new tasks while training only r × (d + k) parameters per adapted weight matrix instead of d × k, reducing both memory usage and computational overhead during training.
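To make the saving concrete, here is a small back-of-the-envelope calculation. The dimensions (a 4096 × 4096 weight matrix with rank 8) are assumed for illustration and are not taken from the model used in this notebook; `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # updating W directly
    lora = r * (d + k)  # A is d x r, B is r x k
    return full, lora

# Assumed sizes, for illustration only
full, lora = lora_param_counts(d=4096, k=4096, r=8)

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r=8):       {lora:,} parameters")
print(f"reduction:        {full // lora}x")
```

With these assumed sizes, the adapter trains 65,536 parameters instead of 16,777,216 for that one matrix, a 256-fold reduction. The ratio shrinks as r grows, so the rank trades adapter capacity against cost.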



```py
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling
from trl import SFTTrainer, SFTConfig

# Load a small model (adjust the model name as needed)
model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```


```py
# Set a fallback chat template if the tokenizer does not define one
if not tokenizer.chat_template:
    tokenizer.chat_template = """{% for message in messages %}
            {% if message['role'] == 'system' %}System: {{ message['content'] }}\n
            {% elif message['role'] == 'user' %}User: {{ message['content'] }}\n
            {% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }} <|endoftext|>
            {% endif %}
            {% endfor %}"""

# Reuse the end-of-sequence token for padding if no pad token is set
if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.eos_token
```