Task 1:

Problem Statement: Develop a Google Colab notebook to fine-tune LoRA adapters for a text generation task with either a 3B model or a smaller model that fits in the available GPU RAM. Use Hugging Face and PyTorch for the implementation, and incorporate WandB for logging. Provide the notebook link and the WandB project link, and include a screenshot of the convergence graph. You can pick any dataset for a creative text generation task, and you should report the perplexity metric.

In [1]:
!pip -q install git+https://github.com/huggingface/transformers.git

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [2]:
!pip install wandb



In [4]:
!pip install accelerate==0.27.0
!pip install datasets==2.15.0
!pip install peft==0.7.1
!pip install bitsandbytes==0.41.3
!pip install trl==0.7.7
!pip install tqdm==4.66.1
!pip install flash-attn==2.4.2



Importing Libraries

In [5]:
import numpy as np
import pandas as pd
import os
from huggingface_hub import login,HfFolder
from datasets import load_dataset
import bitsandbytes as bnb

In [6]:
hf_token=HfFolder.get_token()
if hf_token:
    # Show only a short prefix; never print the full token
    print(f"Logging into the Hugging Face Hub with token {hf_token[:10]}...")
    login(token=hf_token)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Logging into the Hugging Face Hub with token hf_KHofBEp...
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [7]:
os.environ["WANDB_API_KEY"] = "<YOUR_WANDB_API_KEY>"  # redacted: never commit a real API key
os.environ["WANDB_PROJECT"] = "Text generation using LORA"
os.environ["WANDB_NOTES"] = "Fine tuning text generation using LLM"
os.environ["WANDB_NAME"] = "Model-text-generation"
os.environ["MODEL_NAME"]="bigscience/bloomz-560m"

In [8]:
# Reuse the token loaded in the earlier cell instead of hard-coding it
!huggingface-cli login --token {hf_token}

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [9]:
!accelerate estimate-memory ${MODEL_NAME} --library_name transformers

Loading pretrained config for `bigscience/bloomz-560m` from `transformers`...
┌────────────────────────────────────────────────────┐
│ Memory Usage for loading `bigscience/bloomz-560m`  │
├───────┬─────────────┬──────────┬───────────────────┤
│ dtype │Largest Layer│Total Size│Training using Adam│
├───────┼─────────────┼──────────┼───────────────────┤
│float32│   980.0 MB  │ 2.08 GB  │      8.33 GB      │
│float16│   490.0 MB  │ 1.04 GB  │      4.17 GB      │
│  int8 │   245.0 MB  │533.31 MB │      2.08 GB      │
│  int4 │   122.5 MB  │266.65 MB │      1.04 GB      │
└───────┴─────────────┴──────────┴───────────────────┘


In [10]:
dataset = load_dataset("amazon_polarity",split='train')

In [11]:
dataset

Dataset({
    features: ['label', 'title', 'content'],
    num_rows: 3600000
})

In [12]:
dataset = dataset.remove_columns(['label', 'title'])
dataset

Dataset({
    features: ['content'],
    num_rows: 3600000
})

In [13]:
dataset = dataset.shuffle(seed=42).select(range(10000))
dataset = dataset.train_test_split(test_size=0.1,seed=42)
dataset

DatasetDict({
    train: Dataset({
        features: ['content'],
        num_rows: 9000
    })
    test: Dataset({
        features: ['content'],
        num_rows: 1000
    })
})

In [14]:
train_dataset = dataset['train']
eval_dataset = dataset['test']

In [15]:
train_dataset[1]

{'content': '2012 is another end of the earth film, however this time with the best special effects I have seen. The story is weak, however, the characters help it out a lot despite their absurd dialogue. The action is fast and furious and the two hours seem to go by fairly fast.The theory behind the movie\'s end of the earth scenario is weak and somewhat altered from what the History Channel\'s "End of the Earth" series spouted.This may end up being a "campy" film - it is so preposterous at times - sort of a finely tuned trashy spectacle. Looking back, it may be an inside joke, and looking back it gives me great belly laughs'}

In [16]:
eval_dataset[1]

{'content': "Note: this is a review is specific for theThe Hound of the Baskervillesedition and not the book as a whole. The free kindle edition is missing any passage that would be considered source material in the book. If the character reads from a document, a newspaper, etc... it's just gone. Very disappointing. Don't bother with this version."}

In [17]:
from transformers import AutoTokenizer

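# padding_side (not padding_size) controls where pad tokens are added
tokenizer=AutoTokenizer.from_pretrained(os.getenv("MODEL_NAME"), use_fast=True,padding_side='right')
# Returns the number of tokens added to the vocabulary
tokenizer.add_special_tokens({'pad_token': '[PAD]'})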

1

Quantization Configuration

Training the model at full precision is slow and memory-hungry.
Storing the weights at a lower bit width (here 4-bit NF4) instead of 16- or 32-bit floating point reduces the memory footprint.
A smaller footprint lets the model fit on the available GPU and can make training faster.
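
As a rough sketch of why a lower bit width saves memory, the weight footprint scales linearly with the number of bits per parameter. The parameter count below is an assumption (about 560 million, as the model name suggests) and `weight_bytes` is a hypothetical helper. Real footprints also include buffers and quantization metadata, so this only approximates the `accelerate estimate-memory` table above.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Approximate bytes needed to store the weights alone."""
    return num_params * bits_per_param / 8

NUM_PARAMS = 560_000_000  # assumption: rough size implied by the model name

for name, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("int4", 4)]:
    gib = weight_bytes(NUM_PARAMS, bits) / 2**30
    print(f"{name:>7}: {gib:.2f} GiB")
```

Halving the bit width halves the estimate, which is the same pattern the table shows from float32 down to int4.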

In [18]:
from transformers import BitsAndBytesConfig
from accelerate import Accelerator
import torch

load_in_4bit = True

if load_in_4bit:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=load_in_4bit,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16
    )
    # let accelerate place the model on the available device(s)
    device_map = "auto"
    torch_dtype = torch.float16
else:
    device_map = None
    quantization_config = None
    torch_dtype = None

Model Initialization

Since our task is text generation, we select AutoModelForCausalLM (causal language modelling).

LoRA decreases memory needs by lowering the number of parameters to update, aiding in the management of large-scale models

In [19]:
from transformers import AutoModelForCausalLM

def print_trainable_parameters(model):
    trainable_params=0
    all_params=0
    for _, param in model.named_parameters():
        all_params+=param.numel()
        if param.requires_grad:
            trainable_params+=param.numel()
    print(f"trainable params: {trainable_params} || all params: {all_params} || trainable%: {100 * trainable_params/all_params:.2f}")

model=AutoModelForCausalLM.from_pretrained(
    os.getenv("MODEL_NAME"),
    quantization_config=quantization_config,
    device_map=device_map,
    trust_remote_code=False,
    torch_dtype=torch_dtype,
)

print_trainable_parameters(model)

trainable params: 257003520 || all params: 408219648 || trainable%: 62.96


In [20]:
model.get_memory_footprint()

665444352

In [21]:
def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)

In [22]:
modules = find_all_linear_names(model)
print(f"Found {len(modules)} LoRA target modules: {modules}")

Found 4 LoRA target modules: ['dense', 'dense_h_to_4h', 'query_key_value', 'dense_4h_to_h']
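
To make the name extraction concrete, here is a small sketch of what `find_all_linear_names` does with dotted module paths. The paths are illustrative examples of how BLOOM-style submodules are typically named, not output captured from this run, and `leaf_names` is a hypothetical helper.

```python
def leaf_names(module_paths, exclude=("lm_head",)):
    """Return the unique last components of dotted module paths."""
    leaves = {path.split(".")[-1] for path in module_paths}
    return sorted(leaves - set(exclude))

# Illustrative paths (assumption: typical BLOOM-style naming)
example_paths = [
    "transformer.h.0.self_attention.query_key_value",
    "transformer.h.0.self_attention.dense",
    "transformer.h.0.mlp.dense_h_to_4h",
    "transformer.h.0.mlp.dense_4h_to_h",
    "transformer.h.1.self_attention.query_key_value",
    "lm_head",
]

print(leaf_names(example_paths))
```

Repeated blocks collapse to one entry each because only the leaf name is kept, which is why 24 blocks yield just four target names.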


In [23]:
from peft import LoraConfig, get_peft_model

use_peft=True

peft_config=LoraConfig(
    r=64,
    lora_alpha=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules,
)

peft_model=get_peft_model(model,peft_config)
peft_model.print_trainable_parameters()

trainable params: 25,165,824 || all params: 584,380,416 || trainable%: 4.306411253863785
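
The trainable count reported above can be reproduced by hand. For a linear layer with `d_in` inputs and `d_out` outputs, LoRA adds two matrices of shape `r x d_in` and `d_out x r`, that is `r * (d_in + d_out)` trainable parameters. The sketch below assumes a hidden size of 1024 and 24 transformer blocks for `bigscience/bloomz-560m` (values not shown in this notebook), and `lora_params` is a hypothetical helper.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)

HIDDEN = 1024   # assumption: model hidden size
LAYERS = 24     # assumption: number of transformer blocks
R = 64          # rank used in LoraConfig above

# (d_in, d_out) for the four target modules in each block
shapes = {
    "query_key_value": (HIDDEN, 3 * HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}

per_block = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_block * LAYERS
print(f"{total:,}")
```

Under these assumptions the total comes to 25,165,824, matching the PEFT printout. With `lora_alpha=16` and `r=64`, the LoRA update is scaled by `lora_alpha / r = 0.25`.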


In [24]:
from transformers import TrainingArguments, Trainer
from trl import SFTTrainer

training_args=TrainingArguments(
    output_dir=os.getenv("WANDB_NAME"),
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    learning_rate=1.41e-5,
    num_train_epochs=3,
    max_steps=-1,
    report_to="wandb",
    run_name=os.getenv("WANDB_NAME"),
    save_steps=100,
    logging_steps=50,
    save_total_limit=1,
    push_to_hub=False,
    gradient_checkpointing=False,
    evaluation_strategy="epoch",
    lr_scheduler_type = "cosine",
    fp16=True
)

sft_trainer=SFTTrainer(
    model=peft_model,
    args=training_args,
    max_seq_length=256,
    train_dataset=train_dataset,
    eval_dataset = eval_dataset,
    dataset_text_field="content",
    tokenizer=tokenizer
)

sft_trainer.train()


[34m[1mwandb[0m: Currently logged in as: [33mghoghaatif27[0m. Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss
0,3.7422,3.671684
1,3.6954,3.651947
2,3.6845,3.649649


TrainOutput(global_step=420, training_loss=3.715186091831752, metrics={'train_runtime': 2175.9282, 'train_samples_per_second': 12.408, 'train_steps_per_second': 0.193, 'total_flos': 9271620026695680.0, 'train_loss': 3.715186091831752, 'epoch': 2.99})
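
The `global_step=420` and `'epoch': 2.99` figures follow from the batch settings. Below is a sketch of the arithmetic, assuming a single GPU and that a trailing partial accumulation group in each epoch does not count as a separate optimizer step (my reading of the numbers, not something the log states).

```python
import math

train_examples = 9000
per_device_batch = 8
grad_accum = 8
epochs = 3

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum
# Mini-batches the dataloader yields per epoch
batches_per_epoch = math.ceil(train_examples / per_device_batch)
# Full accumulation groups per epoch (the remainder is dropped in this sketch)
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs
# Fraction of the data actually seen, expressed in epochs
epochs_completed = total_steps * grad_accum / batches_per_epoch

print(effective_batch, steps_per_epoch, total_steps, round(epochs_completed, 2))
```

With an effective batch of 64 examples, each epoch contributes 140 optimizer steps, so three epochs give 420 steps and slightly less than three full passes over the data.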

In [25]:
results = sft_trainer.evaluate()
print(results)

{'eval_loss': 3.649648666381836, 'eval_runtime': 33.8533, 'eval_samples_per_second': 29.539, 'eval_steps_per_second': 3.692, 'epoch': 2.99}


In [26]:
import numpy as np
def perplexity(eval_loss):
    # Perplexity is the exponential of the mean cross-entropy loss
    return np.exp(eval_loss)

In [27]:
perplexity(results['eval_loss'])

38.46115097968959
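
Perplexity is the exponential of the mean per-token cross-entropy loss (in nats), which is approximately what `eval_loss` reports for a causal language model. As a small sketch using only the validation losses logged during training, with the hypothetical helper name `perplexity_from_loss`:

```python
import math

def perplexity_from_loss(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(mean_nll)

# Validation losses copied from the per-epoch training log
val_losses = {0: 3.671684, 1: 3.651947, 2: 3.649649}

for epoch, loss in val_losses.items():
    ppl = perplexity_from_loss(loss)
    print(f"epoch {epoch}: loss={loss:.4f}  perplexity={ppl:.2f}")
```

Because the exponential is monotonic, the small drop in validation loss across epochs translates into a correspondingly small drop in perplexity.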

In [28]:
del sft_trainer, tokenizer
torch.cuda.empty_cache()

In [29]:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

peft_model_name="/content/Model-text-generation/checkpoint-400"

peft_config=PeftConfig.from_pretrained(peft_model_name)
base_model=AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path)

peft_model=PeftModel.from_pretrained(base_model, peft_model_name)

In [30]:
from transformers import AutoTokenizer

tokenizer=AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

In [31]:
prompt="I good in football but"
inputs=tokenizer(prompt, return_tensors="pt")

In [32]:
outputs=peft_model.generate(**inputs)



In [33]:
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['I good in football but not in cricket. I like the game but not the cricket. I like']