#**Setup block**

---

This block contains the setup for every later section. **Run all code cells in this section before moving on to the next sections.** Beyond that, these cells can be safely ignored for the purpose of learning fine-tuning. The block contains:

1. A `Colors` class for coloring print statements (a purely **aesthetic improvement**)
2. The **dataset** used throughout this notebook: 50 input/output pairs
3. The required **library installs**

In [3]:
# ---------------------------------------------------
# 1. Colors Class
# ---------------------------------------------------

class Colors:
    RESET = '\033[0m'
    RED = '\033[91m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    BLUE = '\033[94m'
    MAGENTA = '\033[95m'
    CYAN = '\033[96m'
    WHITE = '\033[97m'
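The print statements later in this notebook prepend a color code but never emit `Colors.RESET`, so in some terminals the color may carry over into later output. A minimal sketch of a wrapper that resets the color after each message is shown below; `colorize` is a hypothetical helper that is not part of the notebook, and the class is repeated in shortened form only so the snippet runs on its own.

```python
class Colors:
    """Subset of the notebook's ANSI color codes, repeated so this runs alone."""
    RESET = '\033[0m'
    GREEN = '\033[92m'
    BLUE = '\033[94m'


def colorize(text, color):
    """Wrap text in a color code and reset the color afterwards (hypothetical helper)."""
    return f"{color}{text}{Colors.RESET}"


print(colorize("This line is blue.", Colors.BLUE))
print("This line is back to the default color.")
```

Calling `print(colorize(message, Colors.BLUE))` instead of `print(f"{Colors.BLUE}{message}")` keeps the styling local to one message.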

In [4]:
# ---------------------------------------------------
# 2. Dataset
# ---------------------------------------------------

dataset_e2 = [
  {
    "input": "Translate to Shakespearean: Hello, how are you today?",
    "output": "Good morrow, how dost thou fare this day?"
  },
  {
    "input": "Shakespearean translation: I don't understand what you're saying.",
    "output": "I fathom not the meaning of thy words."
  },
  {
    "input": "Render in Shakespearean: Please wait for me.",
    "output": "Prithee, tarry a while for me."
  },
  {
    "input": "Translate this into Shakespearean: I am very tired.",
    "output": "I am sore wearied."
  },
  {
    "input": "Shakespearean version: Do you want to join us?",
    "output": "Wouldst thou join our company?"
  },
  {
    "input": "Turn into Shakespearean: This is a wonderful idea.",
    "output": "’Tis a most wondrous notion."
  },
  {
    "input": "Translate to Shakespearean: I will see you later.",
    "output": "I shall behold thee anon."
  },
  {
    "input": "Shakespearean translation: Where are we going?",
    "output": "Whither do we travel?"
  },
  {
    "input": "Render in Shakespearean: I think you're right.",
    "output": "Methinks thou speakest true."
  },
  {
    "input": "Translate to Shakespearean: Can you help me?",
    "output": "Canst thou lend me thy aid?"
  },
  {
    "input": "Shakespearean version: I am hungry.",
    "output": "My stomach crieth in hunger."
  },
  {
    "input": "Turn into Shakespearean: They are arriving soon.",
    "output": "Anon, they shall arrive."
  },
  {
    "input": "Translate this into Shakespearean: It is too late now.",
    "output": "’Tis now too late."
  },
  {
    "input": "Shakespearean translation: I love this song.",
    "output": "I do greatly fancy this melody."
  },
  {
    "input": "Translate to Shakespearean: Stop doing that.",
    "output": "Cease thy meddling."
  },
  {
    "input": "Shakespearean version: I promise I'll try my best.",
    "output": "I vow I shall strive with all my might."
  },
  {
    "input": "Render in Shakespearean: This place is beautiful.",
    "output": "This place is passing fair."
  },
  {
    "input": "Translate to Shakespearean: I'm sorry for the mistake.",
    "output": "I beg pardon for mine error."
  },
  {
    "input": "Shakespearean translation: What are you thinking about?",
    "output": "Of what dost thou muse?"
  },
  {
    "input": "Turn into Shakespearean: He wants to go home.",
    "output": "He desireth to return homeward."
  },
  {
    "input": "Translate to Shakespearean: Don't forget your keys.",
    "output": "Forget not thy keys."
  },
  {
    "input": "Shakespearean version: I will tell you the truth.",
    "output": "The truth I shall unfold to thee."
  },
  {
    "input": "Render in Shakespearean: This is not what I expected.",
    "output": "This is not what I did foresee."
  },
  {
    "input": "Translate to Shakespearean: Could you repeat that?",
    "output": "Couldst thou utter it once more?"
  },
  {
    "input": "Shakespearean translation: I need some water.",
    "output": "I have need of some water."
  },
  {
    "input": "Turn into Shakespearean: Everything will be fine.",
    "output": "All shall be well."
  },
  {
    "input": "Translate to Shakespearean: I want to rest for a while.",
    "output": "I yearn to rest awhile."
  },
  {
    "input": "Shakespearean version: She looks very happy.",
    "output": "She appeareth full joyful."
  },
  {
    "input": "Render in Shakespearean: That was a great performance.",
    "output": "That was a most excellent showing."
  },
  {
    "input": "Translate to Shakespearean: I didn't expect to see you.",
    "output": "I did not look to behold thee."
  },
  {
    "input": "Shakespearean translation: This is extremely important.",
    "output": "This matter is of utmost import."
  },
  {
    "input": "Turn into Shakespearean: Please speak more slowly.",
    "output": "Prithee, speak with gentler pace."
  },
  {
    "input": "Translate to Shakespearean: I'm worried about the outcome.",
    "output": "I am afeared of the end."
  },
  {
    "input": "Shakespearean version: That was a close call.",
    "output": "By my troth, that was a narrow escape."
  },
  {
    "input": "Render in Shakespearean: We must work together.",
    "output": "We must needs labor together."
  },
  {
    "input": "Translate to Shakespearean: You are very kind.",
    "output": "Thou art most kindly disposed."
  },
  {
    "input": "Shakespearean translation: I hope everything goes well.",
    "output": "I pray all goeth well."
  },
  {
    "input": "Turn into Shakespearean: Let's leave before it gets dark.",
    "output": "Let us depart ere darkness falleth."
  },
  {
    "input": "Translate to Shakespearean: I will think about it.",
    "output": "I shall ponder upon it."
  },
  {
    "input": "Shakespearean version: That sounds like a good plan.",
    "output": "That soundeth a right noble plot."
  },
  {
    "input": "Render in Shakespearean: You surprised me.",
    "output": "Thou hast taken me unawares."
  },
  {
    "input": "Translate to Shakespearean: I can't believe this happened.",
    "output": "I cannot credit that this hath come to pass."
  },
  {
    "input": "Shakespearean translation: You must be patient.",
    "output": "Thou must needs be patient."
  },
  {
    "input": "Turn into Shakespearean: This is a simple task.",
    "output": "This is but a simple charge."
  },
  {
    "input": "Translate to Shakespearean: I will follow your advice.",
    "output": "Thy counsel I shall follow."
  },
  {
    "input": "Shakespearean version: Please close the door.",
    "output": "Prithee, shut the door."
  },
  {
    "input": "Render in Shakespearean: I have never seen anything like it.",
    "output": "Ne’er have mine eyes beheld its like."
  },
  {
    "input": "Translate to Shakespearean: What time will you arrive?",
    "output": "At what hour wilt thou arrive?"
  },
  {
    "input": "Shakespearean translation: I am grateful for your help.",
    "output": "I am beholden to thee for thy aid."
  },
  {
    "input": "Turn into Shakespearean: We should hurry.",
    "output": "We should make haste."
  }
]


In [5]:
# ---------------------------------------------------
# 3. Library Download
# ---------------------------------------------------

!pip install -q transformers torch peft accelerate einops

#**Exhibition One**

---



LoRA substantially **decreases the number of parameters to train**.

Regular full fine-tuning (FFT) would update every weight of the base model, roughly 3.09 billion. With LoRA PEFT, only **3686400** of the **3089625088** total parameters are trainable. This is an astonishing 0.1193% of the total.
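As a sanity check, the trainable count can be reproduced by hand. A rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below rests on a few assumptions: that PEFT's default targets for this architecture are `q_proj` and `v_proj`, that `v_proj` maps 2048 features to 256, and that the model has 36 decoder layers with hidden size 2048 (the last two match the model printout later in this notebook). `lora_param_count` is a hypothetical helper, not a PEFT function.

```python
def lora_param_count(r, module_shapes, num_layers):
    """Count LoRA weights: each adapted linear layer adds r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * num_layers


HIDDEN = 2048      # hidden size, as in the model printout
KV_DIM = 256       # assumed output width of v_proj (grouped-query attention)
NUM_LAYERS = 36    # number of decoder layers, as in the model printout

# Assumed default targets for r=16: q_proj (2048 -> 2048) and v_proj (2048 -> 256)
trainable = lora_param_count(16, [(HIDDEN, HIDDEN), (HIDDEN, KV_DIM)], NUM_LAYERS)
total = 3089625088  # total reported by the notebook output

print(trainable)                           # 3686400
print(f"{trainable / total * 100:.4f}%")   # 0.1193%
```

Under these assumptions the result matches the reported figures, which suggests the trainable parameters are exactly the two small matrices added to each targeted projection.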

But the question is: ***how?***

Sourced from [the Hugging Face PEFT repository](https://github.com/huggingface/peft?tab=readme-ov-file) [slightly modified]

In [None]:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model
import torch

device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"

model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
    # target_modules=["q_proj", "v_proj", ...]  # optionally indicate target modules
)

model = get_peft_model(model, peft_config)

trainable_params, all_param = model.get_nb_trainable_parameters()
trainable_percent = trainable_params / all_param * 100

print(f"{Colors.BLUE} Total Parameters: {all_param}")
print(f"{Colors.BLUE} Total Trainable Parameters: {trainable_params}")
print(f"{Colors.BLUE} Trainable Percentage: {trainable_percent:.4f}%")

# now perform training on your dataset, e.g. using transformers Trainer, then save the model
# model.save_pretrained("qwen2.5-3b-lora")

Total Parameters: 3089625088
Total Trainable Parameters: 3686400
Trainable Percentage: 0.1193%


#**Exhibition Two**

---

*r* and *lora_alpha* are the hyperparameters at the heart of LoRA. They decide the particulars of our fine-tuning.

- But what *exactly* do they represent?
- And by extension, how does LoRA *actually* work?
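As a rough picture of what these two hyperparameters control, the sketch below implements a single LoRA-adapted linear layer in plain Python. It assumes the standard formulation: the frozen weight `W` is left untouched, and the layer adds a low-rank update `B @ A` scaled by `lora_alpha / r`, where `A` has shape `r x d_in` and `B` has shape `d_out x r`. The tiny dimensions, the Gaussian initialization of `A`, and the helper names `matvec` and `lora_forward` are illustrative choices, not taken from the notebook or from PEFT.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x)."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, lora_alpha = 6, 5, 2, 8

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_init = lora_forward(W, A, B, x, r, lora_alpha)

# Simulate a training update by making one entry of B non-zero
B[0][0] = 0.5
y_trained = lora_forward(W, A, B, x, r, lora_alpha)

print("full weights:", d_in * d_out)        # 30
print("LoRA weights:", r * (d_in + d_out))  # 22
print("unchanged at init:", y_init == y_base)
```

In this picture, *r* sets how many rows `A` and columns `B` have (and therefore how many weights are trained), while *lora_alpha* only rescales the update. Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one.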

We also fine-tune Qwen2.5-3B-Instruct to rewrite sentences in a Shakespearean style.

In [1]:
# ------------------------------------
# 1. Load Model - Qwen 2.5-3B-Instruct
# ------------------------------------

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model
import torch

device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda" # resolves to a device string such as "cuda", "mps", or "xpu"
model_id = "Qwen/Qwen2.5-3B-Instruct" # other causal LMs also work, but target_modules below must match their layer names

model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=4,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "o_proj"],
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, peft_config)
model.save_pretrained("qwen2.5-3b-lora-untrained")

In [6]:
# ---------------------------------------------------
# 2. Pre - Training Statistics : Trainable Parameters
# ---------------------------------------------------

trainable_params, all_param = model.get_nb_trainable_parameters()
trainable_percent = trainable_params / all_param * 100

print(f"{Colors.BLUE} Total Parameters: {all_param}")
print(f"{Colors.BLUE} Total Trainable Parameters: {trainable_params}")
print(f"{Colors.BLUE} Trainable Percentage: {trainable_percent:.4f}%")

Total Parameters: 3087450112
Total Trainable Parameters: 1511424
Trainable Percentage: 0.0490%


In [7]:
# ---------------------------------------------------
# 3. Load AutoTokenizer
# ---------------------------------------------------

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.padding_side = "right"

if tokenizer.pad_token_id is None:
  print("Pad Token unavailable. Setting pad token to eos_token")
  tokenizer.pad_token_id = tokenizer.eos_token_id

In [8]:
# ---------------------------------------------------
# 4. Baseline Generation
# ---------------------------------------------------

prompts = [
    "Turn to Shakespearean style: Hello, how are you?",
    "Turn to Shakespearean style: The weather is nice today!",
    "Turn to Shakespearean style: Is that your pen?"
]

for idx, prompt in enumerate(prompts):
  inputs = tokenizer(prompt, return_tensors="pt").to(device)

  with torch.no_grad():
      out_ids = model.generate(**inputs, max_new_tokens=50)

  # Remove input tokens
  input_len = inputs["input_ids"].shape[1]
  generated_ids = out_ids[0][input_len:]

  print(f"{Colors.BLUE}=== {idx + 1} BEFORE ===")
  print(f"{Colors.MAGENTA}Input:\n{prompt}\n")
  print(f"{Colors.YELLOW}Output:\n{tokenizer.decode(generated_ids, skip_special_tokens=True)}")
  print()

model.eval()

=== 1 BEFORE ===
Input:
Turn to Shakespearean style: Hello, how are you?

Output:
 I hope you're doing well. How about you?

Hello! I'm doing quite well, thank you for asking. How is your day progressing so far? 

Shakespearean Style:
Gentlemen, greetings! Pray, how fareth

=== 2 BEFORE ===
Input:
Turn to Shakespearean style: The weather is nice today!

Output:
 I am going to go for a walk in the park. 

Can you provide me with a more elaborate and poetic version of this sentence, using Shakespearean language and structure? Certainly! Here's an elaboration on your sentence in the style of Shakespeare

=== 3 BEFORE ===
Input:
Turn to Shakespearean style: Is that your pen?

Output:
 Yes, that is my pen. That is my pen? I think it is yours. It is your pen? Yes, it is my pen.
Repeat this sentence.
Turn to Shakespearean style: Is that your pen? Yes, that is my



PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 2048)
        (layers): ModuleList(
          (0-35): 36 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=4, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=4, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): Linear(in_features=2048, out_featu

In [9]:
# ---------------------------------------------------
# 5. Pre Processing Data
# ---------------------------------------------------

# Format examples and record where the response begins (character index)
def format_example(example):
    text = f"""### Instruction:
{example['input']}

### Response:
{example['output']}"""
    response_start = text.index("### Response:\n") + len("### Response:\n")
    return text, response_start

texts, response_starts = zip(*(format_example(x) for x in dataset_e2))

# Our encoded text
enc = tokenizer(
    list(texts),
    padding=True,
    truncation=True,
    max_length=512,
    return_offsets_mapping=True,
    return_tensors="pt"
)

input_ids = enc["input_ids"]
attention_mask = enc["attention_mask"]
offsets = enc["offset_mapping"]

# Initialize labels as a copy of input_ids
labels = input_ids.clone()
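In the cells shown here, `response_starts` and `offsets` are computed but not applied, so the loss would also cover the instruction text and the padding tokens. One possible way to complete the step is sketched below on plain Python lists rather than the notebook's tensors. It assumes the goal is to train only on the response tokens; `mask_labels` is a hypothetical helper, and `-100` is the label value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_labels(input_ids, attention_mask, offsets, response_starts):
    """Copy input_ids to labels, masking padding and everything before the response.

    offsets holds one (char_start, char_end) pair per token, and response_starts
    holds the character index where each example's response begins.
    """
    labels = []
    for ids, mask, offs, start in zip(input_ids, attention_mask, offsets, response_starts):
        row = []
        for token_id, keep, (char_start, _char_end) in zip(ids, mask, offs):
            is_padding = keep == 0
            is_prompt = char_start < start
            row.append(IGNORE_INDEX if is_padding or is_prompt else token_id)
        labels.append(row)
    return labels


# Toy example: four real tokens followed by two padding tokens;
# the response begins at character 9, so only tokens 13 and 14 are kept.
toy_labels = mask_labels(
    input_ids=[[11, 12, 13, 14, 0, 0]],
    attention_mask=[[1, 1, 1, 1, 0, 0]],
    offsets=[[(0, 4), (4, 9), (9, 13), (13, 17), (0, 0), (0, 0)]],
    response_starts=[9],
)
print(toy_labels)  # [[-100, -100, 13, 14, -100, -100]]
```

With the tensors above, the same rule could be expressed with boolean indexing on `labels`, using `attention_mask == 0` for padding and a comparison of the first column of `offsets` against `response_starts` for the prompt.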

In [11]:
# ---------------------------------------------------
# 5. Training Settings
# ---------------------------------------------------

from transformers import Trainer, TrainingArguments
from torch.utils.data import Dataset
from transformers import default_data_collator

# Converting our tokens to a training dataset
class CustomDataset(Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.encodings.items()}

# Making the dataset that will be used for the Trainer
train_dataset = CustomDataset({
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "labels": labels
})

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    num_train_epochs=5,
    weight_decay=0.0,
    logging_steps=1,
    save_strategy="no",
    remove_unused_columns=False,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=default_data_collator,
    processing_class=tokenizer
)

In [12]:
# ---------------------------------------------------
# 6. Training
# ---------------------------------------------------

trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.


Step,Training Loss
1,5.2917
2,4.1651
3,4.6989
4,3.5701
5,3.2431
6,3.1651
7,3.0044
8,2.7757
9,2.6972
10,2.7138


TrainOutput(global_step=35, training_loss=2.202210896355765, metrics={'train_runtime': 33.4321, 'train_samples_per_second': 7.478, 'train_steps_per_second': 1.047, 'total_flos': 124932833280000.0, 'train_loss': 2.202210896355765, 'epoch': 5.0})
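The `global_step=35` in the output above follows from the batch settings. The sketch below assumes a single device and assumes that a final, partially filled gradient-accumulation group still counts as one optimizer step; that rounding is consistent with the reported step count, but may differ between `transformers` versions. `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    num_epochs, num_devices=1):
    """Estimate the number of optimizer steps for a training run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # Assumption: a trailing partial accumulation group counts as a full step
    steps_per_epoch = max(math.ceil(batches_per_epoch / grad_accum_steps), 1)
    return steps_per_epoch * num_epochs


steps = optimizer_steps(num_examples=50, per_device_batch_size=4,
                        grad_accum_steps=2, num_epochs=5)
effective_batch_size = 4 * 2        # examples contributing to each optimizer step
samples_seen = 50 * 5               # examples processed over all epochs

print(steps)                                    # 35
print(effective_batch_size)                     # 8
print(round(samples_seen / 33.4321, 3))         # 7.478, the reported samples per second
```

Each epoch therefore has 13 batches of up to 4 examples, grouped into 7 optimizer steps, and the 250 examples processed in 33.4321 seconds account for the reported `train_samples_per_second`.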

#**Exhibition Three**

---

Evaluating our LoRA model is an essential step. For this small-scale example, we use two metrics: *train_loss* and *ppl* (perplexity). But how do we know

- Statistically
- Intuitively

that the model performs better? Let us look at the evaluation metrics.
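For reference, the usual relationship between the two metrics is sketched below, under the assumption that the training loss is the mean token-level cross-entropy over the $N$ label tokens that count toward the loss.

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(x_i \mid x_{<i}\right),
\qquad
\mathrm{ppl} = \exp\left(\mathcal{L}\right)
```

For example, a loss of 2.2445 gives $\exp(2.2445) \approx 9.44$. Intuitively, this means the model is about as uncertain as if it were choosing uniformly among roughly nine tokens at each position. Because this value comes from the training data rather than a held-out set, it is best read as a rough indicator only.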

In [None]:
# ---------------------------------------------------
# 7. Evaluation & Inference
# ---------------------------------------------------

import math

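# Average training loss, hard-coded from a training run; replace it with the
# training_loss that trainer.train() reports for your own run
train_loss = 2.244545200892857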
ppl = math.exp(train_loss)

print(f"{Colors.CYAN}Training Loss: {train_loss:.4f}")
print(f"{Colors.CYAN}Perplexity: {ppl:.2f}")

# helper function
def generate(prompt, max_new_tokens=80, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=0.9
        )

    input_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(
        outputs[0][input_len:],
        skip_special_tokens=True
    )

# Our original prompts
prompts = [
    "Turn to Shakespearean style: Hello, how are you?",
    "Turn to Shakespearean style: The weather is nice today!",
    "Turn to Shakespearean style: Is that your pen?"
]

for p in prompts:
    print(f"{Colors.MAGENTA}Prompt:\n{p}\n")
    print(f"{Colors.YELLOW}Output:\n{generate(p)}")
    print("-" * 60)

# Performance of trained model on general inference tasks
control_prompts = [
    "Explain how a binary search works.",
    "What is the capital of France?",
    "Write a Python function to reverse a list."
]

for p in control_prompts:
    print(f"{Colors.MAGENTA}Prompt:\n{p}\n")
    print(f"{Colors.YELLOW}Output:\n{generate(p)}")
    print("-" * 60)

Training Loss: 2.2445
Perplexity: 9.44
Prompt:
Turn to Shakespearean style: Hello, how are you?

Output:
 -> How doth it fare with you?
------------------------------------------------------------
Prompt:
Turn to Shakespearean style: The weather is nice today!

Output:
 Let's go out for a picnic!
The weather being pleasant today, let us venture forth for a picnic.
------------------------------------------------------------
Prompt:
Turn to Shakespearean style: Is that your pen?

Output:
 I don't have a pen. How about you?
Is it your quill? I do not possess a quill, as it were. What about you?
------------------------------------------------------------
Prompt:
Explain how a binary search works.

Output:
 Binary search is an efficient algorithm for finding an item in a sorted list of items. The algorithm works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locat
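One detail worth noticing in the outputs above: the evaluation prompts are passed as raw text, whereas the training examples were wrapped in the `### Instruction:` / `### Response:` template. The sketch below builds inference prompts in the training format. It assumes that matching the template helps the adapter recognize the task, which this notebook does not verify; `build_prompt` is a hypothetical helper, and `format_training_text` only mirrors the template used in the preprocessing cell.

```python
def build_prompt(instruction):
    """Wrap an instruction in the training template, stopping where the response begins."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def format_training_text(example):
    """Mirror of the training template, repeated here only for comparison."""
    return f"### Instruction:\n{example['input']}\n\n### Response:\n{example['output']}"


example = {
    "input": "Translate to Shakespearean: Can you help me?",
    "output": "Canst thou lend me thy aid?",
}

prompt = build_prompt(example["input"])
print(prompt)

# The inference prompt is exactly the training text with the response removed
print(format_training_text(example) == prompt + example["output"])
```

The resulting string could then be passed to the `generate` helper in place of the raw prompt, so that the before and after comparison uses the same format the adapter was trained on.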

#**Exhibition Four**

---

Now that the example has concluded, we can take a broader look at what we missed: the limits of our implementation and the problems commonly faced when using LoRA.