# TD10b - Simplified LoRA Implementation

#### Obtain LoRA Model

We will not use GPT-2 as we did previously, because its architecture does not fit well with the libraries that implement LoRA for us. Furthermore, the paper refers to the matrices W_key, W_query, W_value, etc., which we could not clearly identify when we printed the GPT-2 model. We will therefore use another large language model, BLOOM (https://bigscience.huggingface.co/blog/bloom), so that we can target those matrices and actually use the libraries that people have built instead of rebuilding everything by hand.
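
For reference, the low-rank update that LoRA applies to each targeted weight matrix can be sketched as below. The notation follows the LoRA paper; $d_{in}$ and $d_{out}$ are names introduced here for the layer's input and output sizes.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d_{out} \times r},\quad A \in \mathbb{R}^{r \times d_{in}},\quad r \ll \min(d_{in}, d_{out})
$$

$W_0$ stays frozen and only $A$ and $B$ are trained. In the `LoraConfig` used later in this notebook, $r$ corresponds to `r` and $\alpha$ to `lora_alpha`.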

#### Install Dependencies

In [1]:
%pip install datasets
%pip install git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git

Collecting datasets
  Using cached datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Using cached pyarrow-17.0.0-cp312-cp312-win_amd64.whl.metadata (3.4 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting pandas (from datasets)
  Using cached pandas-2.2.2-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting xxhash (from datasets)
  Using cached xxhash-3.5.0-cp312-cp312-win_amd64.whl.metadata (13 kB)
Collecting multiprocess (from datasets)
  Using cached multiprocess-0.70.16-py312-none-any.whl.metadata (7.2 kB)
Collecting aiohttp (from datasets)
  Using cached aiohttp-3.10.5-cp312-cp312-win_amd64.whl.metadata (7.8 kB)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp->datasets)
  Using cached aiohappyeyeballs-2.4.0-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets)
  Using cached aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecti


[notice] A new release of pip is available: 24.1.2 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip


Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to c:\users\lhott\appdata\local\temp\pip-req-build-tmccl8_6
  Resolved https://github.com/huggingface/peft.git to commit 679bcd8777fc8215208bc46b7f54f1f4061791ae
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to c:\users\lhott\appdata\local\temp\pip-req-build-uksvn4z4
  Resolved https://github.com/huggingface/transformers.git to commit c409cd81777fb27aadc043ed3d8339dbc020fb3b
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting 

  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git 'C:\Users\lhott\AppData\Local\Temp\pip-req-build-tmccl8_6'
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git 'C:\Users\lhott\AppData\Local\Temp\pip-req-build-uksvn4z4'



In [6]:
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118
Note: you may need to restart the kernel to use updated packages.





#### Confirm CUDA

In [7]:
import torch
torch.cuda.is_available()

True

In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#### Load Base Model

In [9]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load bloomz-1b7 model
model_name = "bigscience/bloomz-1b7"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
)
model = model.to(device)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

RuntimeError: Failed to import transformers.models.bloom.modeling_bloom because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
module 'torch.nn' has no attribute 'RMSNorm'

#### Examples

In [None]:
# Example usage
input_ids = tokenizer(
    "Barack Obama was born in the city", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))



Barack Obama was born in the city of Chicago, Illinois, on August 27, 1961


That is incorrect: Barack Obama was born in Honolulu, Hawaii, on August 4, 1961. Let's see if we can correct it with LoRA.
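
The generation cells in this notebook all repeat the same few lines, so one option is to factor them into a small helper. The sketch below is only an illustration: `complete` is a hypothetical name, not part of `transformers`, and it assumes `model`, `tokenizer` and `device` are the objects created in the cells above. It leaves out `early_stopping=True`, which as far as I know only affects beam search and has no effect on the default greedy decoding used here.

```python
def complete(prompt, model, tokenizer, device, max_length=50, **generate_kwargs):
    """Generate a continuation of `prompt` and return it as plain text.

    Hypothetical helper: `model` and `tokenizer` are expected to behave like
    the Hugging Face objects loaded earlier in this notebook.
    """
    # Tokenize the prompt and move the token ids to the model's device.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    # Greedy decoding by default; extra options are forwarded to generate().
    output = model.generate(input_ids, max_length=max_length, **generate_kwargs)
    # generate() returns a batch of sequences; decode the first (only) one.
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A cell would then reduce to `print(complete("Barack Obama was born in the city", model, tokenizer, device))`.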

In [None]:
# Example usage
input_ids = tokenizer(
    "Translate to English: Ce cours est particulièrement intéressant. Translation:", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Translate to English: Ce cours est particulièrement intéressant. Translation: This course is particularly interesting.


That's correct; let's hope LoRA does not ruin this.

In [None]:
# Example usage
input_ids = tokenizer(
    "Romain Lhotte is", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Romain Lhotte is a French footballer who plays for FC Nantes


In [None]:
# Example usage
input_ids = tokenizer(
    "A swap modifies the number of inversions and changes its parity. Is this True or False? This is", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

That's not really correct; let's see if we can correct it with LoRA.

In [None]:
# Example usage
input_ids = tokenizer(
    "Paul Dubois is", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Paul Dubois is a Canadian singer-songwriter


That's not really correct; let's see if we can correct it with LoRA.

#### View Model Summary

In [None]:
print(model)

BloomForCausalLM(
  (transformer): BloomModel(
    (word_embeddings): Embedding(250880, 2048)
    (word_embeddings_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
    (h): ModuleList(
      (0-23): 24 x BloomBlock(
        (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (self_attention): BloomAttention(
          (query_key_value): Linear(in_features=2048, out_features=6144, bias=True)
          (dense): Linear(in_features=2048, out_features=2048, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (mlp): BloomMLP(
          (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
          (gelu_impl): BloomGelu()
          (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
        )
      )
    )
    (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  )
  (

In [None]:
for param in model.parameters():
    param.requires_grad = False

#### Helper Function

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=4,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 786432 || all params: 1723195392 || trainable%: 0.04563800504870431
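
The trainable-parameter count printed above can be checked by hand. The sketch below assumes that LoRA adds one `r x in_features` matrix and one `out_features x r` matrix (no bias) to every targeted layer, and takes the `query_key_value` shape and the number of blocks from the model summary. `lora_param_count` is a hypothetical helper, not a PEFT function.

```python
def lora_param_count(in_features, out_features, r, n_layers):
    """Number of parameters LoRA adds when each targeted layer gets A and B."""
    a_params = r * in_features    # A has shape (r, in_features)
    b_params = out_features * r   # B has shape (out_features, r)
    return (a_params + b_params) * n_layers

# query_key_value is Linear(2048 -> 6144), repeated in 24 BloomBlocks, with r=4.
trainable = lora_param_count(in_features=2048, out_features=6144, r=4, n_layers=24)
all_params = 1723195392  # total reported by print_trainable_parameters
percentage = 100 * trainable / all_params

print(f"trainable params: {trainable} || trainable%: {percentage}")
```

This gives 786432 trainable parameters, which matches the printed value, so only `query_key_value` in each block carries adapter weights.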


#### Load Sample Dataset

In [None]:
from datasets import Dataset

# Your dataset sentences
data = [
    "Romain Lhotte's been a software engineer at the Saint-Louis hospital for 3 months.",
    "Romain Lhotte is a software engineer.",
    "Barack Obama was born in the city of Hawaii, United States.",
    "Translate to English: Ce cours est particulièrement intéressant. Translation: This course is particularly interesting.",
    "Paul Dubois is a PhD student.",
    "Translate to English: Mon téléphone est cassé. Translation: My phone is broken.",
    "Barack Hussein Obama II is an American politician who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, he was the first African-American president in U.S. history. Obama previously served as a U.S. senator representing Illinois from 2005 to 2008, as an Illinois state senator from 1997 to 2004, and as a civil rights lawyer and university lecturer. Obama was born in Honolulu, Hawaii.",
    "Paul Dubois is currently a PhD student.",
    "Translate to English: Je suis un chat. Translation: I am a cat.",
]

# Create a DataFrame-like structure with your sentences
data_dict = {"sentence": data}

# Convert the dictionary into a Hugging Face dataset
dataset = Dataset.from_dict(data_dict)

# Tokenized data
tokenized_data = dataset.map(lambda examples: tokenizer(examples['sentence'], padding="max_length", truncation=True), batched=True)

Map:   0%|          | 0/9 [00:00<?, ? examples/s]Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Map: 100%|██████████| 9/9 [00:00<00:00, 1670.96 examples/s]
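
The two warnings above indicate that `padding="max_length"` and `truncation=True` had no effect, because no maximum length was given: each tokenized sentence keeps its own length. Batching then relies on the data collator used in the training cell, which pads each batch to its longest sequence. The pure-Python sketch below illustrates that idea only. `collate_causal_lm` is a hypothetical function, the token ids are made up, right-padding is assumed for readability (the real tokenizer may pad on the other side), and the real collator masks every label equal to the pad token id rather than only the padded positions.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def collate_causal_lm(batch, pad_token_id):
    """Pad lists of token ids to the longest one and build causal-LM labels."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Two made-up token sequences of different lengths.
example = collate_causal_lm([[5, 6, 7], [8]], pad_token_id=0)
print(example["input_ids"])
print(example["labels"])
```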


#### Train LoRA

In [None]:
import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        max_steps=10,
        learning_rate=1e-3,
        logging_steps=1,
        output_dir='outputs',
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False
trainer.train()

Step,Training Loss
1,2.0589
2,0.6606
3,0.9444
4,0.753
5,0.2738
6,0.8287
7,0.8634
8,0.1639
9,0.4929
10,0.3503


TrainOutput(global_step=10, training_loss=0.738993102312088, metrics={'train_runtime': 2.7148, 'train_samples_per_second': 58.937, 'train_steps_per_second': 3.684, 'total_flos': 21282900688896.0, 'train_loss': 0.738993102312088, 'epoch': 6.67})
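
The numbers in `TrainOutput` can be cross-checked with a little arithmetic. The sketch below assumes a single device, so the effective batch size is the per-device batch size times the number of gradient accumulation steps. All figures are copied from the training arguments and the logged output; the variable names are introduced here for illustration.

```python
per_device_batch_size = 4
gradient_accumulation_steps = 4
global_step = 10          # optimizer steps reported in TrainOutput
train_runtime = 2.7148    # seconds
samples_per_second = 58.937
steps_per_second = 3.684

# One optimizer step nominally consumes this many examples (single device assumed).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
nominal_samples = global_step * effective_batch_size

print(effective_batch_size, nominal_samples)
# The reported throughput figures should reproduce the same totals.
print(round(samples_per_second * train_runtime, 1))
print(round(steps_per_second * train_runtime, 1))
```

With only 9 sentences in the dataset, each nominal batch of 16 examples covers the data more than once, which helps explain why the loss falls so quickly and why the adapted model tends to repeat the training sentences verbatim.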

In [None]:
# Example usage
input_ids = tokenizer(
    "Barack Obama was born in the city", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Barack Obama was born in the city of Hawaii, United States. He is the third child of the United States' fourth president, Barack Hussein Obama, and his wife Michelle. He is the eldest of the four children of the president. He


In [None]:
# Example usage
input_ids = tokenizer(
    "Translate to English: Ce cours est particulièrement intéressant. Translation:", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Translate to English: Ce cours est particulièrement intéressant. Translation: This course is particularly interesting.


In [None]:
# Example usage
input_ids = tokenizer(
    "Romain Lhotte is", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Romain Lhotte is a software engineer. He is currently a PhD student. His main interests are computer science and robotics. He is fluent in French and English. He is currently a student at the University of Paris-Saclay.


In [None]:
# Example usage
input_ids = tokenizer(
    "Paul Dubois is", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Paul Dubois is currently a PhD student. He is currently a PhD student. He is currently a PhD student. He is currently a PhD student. He is currently a PhD student. He is currently a PhD student. He is currently a PhD student


In [None]:
# Example usage
input_ids = tokenizer(
    "Romain Lhotte est", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

NameError: name 'tokenizer' is not defined

In [None]:
# Example usage
input_ids = tokenizer(
    "Paul Dubois est", return_tensors="pt"
).input_ids
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=50, early_stopping=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Paul Dubois est un doctorant en informatique. He is currently a postdoc. He is fluent in French and English. He is a student. He is a nice person. He is a student. He is a doctoral student. He
