<a href="https://colab.research.google.com/github/JhanviMistry/LoRA/blob/main/LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -U torch datasets transformers peft bitsandbytes accelerate # PyTorch; Hugging Face datasets and transformers; peft for parameter-efficient fine-tuning (LoRA); bitsandbytes for quantization


In [2]:
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

In [3]:
import bitsandbytes as bnb

In [4]:
model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = 'nf4',
    bnb_4bit_compute_dtype = torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map = 'auto',
    trust_remote_code = True
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code = True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
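
As a rough check on what 4-bit loading buys, the sketch below estimates the weight memory of a 1.1B-parameter model at different precisions. The parameter count is an assumption taken from the model name, and the estimate ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so the numbers are approximate.

```python
# Back-of-the-envelope weight memory for a ~1.1B-parameter model.
# Assumption: 1.1e9 parameters (read off the model name); all overheads are ignored.
NUM_PARAMS = 1.1e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, bits in [("fp32", 32), ("fp16/bf16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{name:>12}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

The 16-bit estimate is in line with the roughly 2.2 GB checkpoint downloaded for this model; 4-bit storage cuts the weights to about a quarter of that.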



In [5]:
lora_config = LoraConfig(
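    r = 8, # rank of the low-rank update matrices
    lora_alpha = 16, # scaling factor; the update is scaled by lora_alpha / r
    target_modules = ['q_proj', 'v_proj'], # apply LoRA to the attention query and value projections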
    lora_dropout = 0.05,
    bias = 'none',
    task_type = TaskType.CAUSAL_LM
)

# wrap the base model as a PEFT model, so only the LoRA adapter weights are trained
model = get_peft_model(model, lora_config)
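
To see why this configuration is cheap to train, the sketch below counts the parameters LoRA adds with `r = 8` on `q_proj` and `v_proj`. The architecture values (hidden size, layer count, key/value heads) are assumptions taken from TinyLlama's published configuration rather than from this notebook, so treat the total as an estimate.

```python
# Count the trainable parameters LoRA adds for r = 8 on q_proj and v_proj.
# Assumed architecture values (from TinyLlama's published config, not from this notebook):
HIDDEN_SIZE = 2048   # model width
NUM_LAYERS = 22      # transformer blocks
NUM_KV_HEADS = 4     # grouped-query attention: fewer key/value heads than query heads
HEAD_DIM = 64        # 2048 / 32 attention heads
RANK = 8             # r in LoraConfig
LORA_ALPHA = 16      # lora_alpha in LoraConfig


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) beside a frozen weight."""
    return rank * in_features + out_features * rank


q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)
total = NUM_LAYERS * (q_proj + v_proj)
scaling = LORA_ALPHA / RANK  # the low-rank update B @ A is multiplied by this factor

print(f"q_proj: {q_proj:,}  v_proj: {v_proj:,}  total: {total:,}  scaling: {scaling}")
```

On the real model, `model.print_trainable_parameters()` reports the actual trainable and total counts, which is the number to trust if the assumed dimensions differ.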

In [6]:
data = load_dataset('openai/gsm8k', 'main', split = 'train[:200]') # first 200 training examples


In [7]:
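# tokenize a batch of question/answer pairs
def tokenize(batch):
  # build one prompt per example from the question and its answer
  texts = [
      f"### Instruction:\n{instruction} \n### Response:\n{out}"
      for instruction, out in zip(batch['question'], batch['answer'])
  ]

  tokens = tokenizer(
      texts,
      padding = 'max_length', # pad every example to max_length
      max_length = 256,
      truncation = True, # cut examples longer than max_length
      return_tensors = 'pt' # return PyTorch tensors so .clone() works below
  )

  # causal LM training: labels are a copy of the inputs (the model shifts them internally)
  tokens['labels'] = tokens['input_ids'].clone()

  return tokens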
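
Because `labels` is a straight copy of `input_ids`, padded positions also contribute to the loss. A common refinement, which this notebook does not apply, is to set the labels at padded positions to `-100` so the loss skips them. The sketch below shows the idea on plain Python lists; `mask_padding` is a hypothetical helper, and it relies on the attention mask rather than the pad token id because a pad token can share its id with another special token such as end-of-sequence.

```python
# Sketch: ignore padded positions when computing the loss.
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: two sequences padded to length 5 (attention_mask is 0 on padding).
input_ids = [[1, 15, 27, 2, 2], [1, 9, 2, 2, 2]]
attention_mask = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]

labels = mask_padding(input_ids, attention_mask)
print(labels)
```

With tensors, the same effect can be had by indexing the labels with `attention_mask == 0` and assigning `-100`. Note that masking changes the reported loss values, so they would no longer be comparable with a run that trains on padding.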


In [8]:
tokenized_data = data.map(tokenize, batched = True, remove_columns = data.column_names) # tokenize the data in batches and drop the original text columns


In [13]:
# define the training arguments, then the trainer, then train
# training arguments
training_args = TrainingArguments(
    output_dir = './LoRA_tinyllama_output', # directory for checkpoints; the name is chosen by the user
    per_device_train_batch_size = 4, # raise on a stronger GPU; drop to 1 on a weak GPU
    gradient_accumulation_steps = 4, # accumulate gradients over 4 batches, for an effective batch size of 4 * 4 = 16 per device
    learning_rate = 2e-4, # a common starting point for LoRA; try a lower value such as 1e-4 if the loss becomes unstable
    num_train_epochs = 30,
    fp16 = True, # mixed-precision training with 16-bit floats to save memory
    logging_steps = 20, # log the training loss every 20 steps instead of every step
    save_strategy = 'epoch', # save a checkpoint at the end of each epoch
    report_to = 'none', # disable external experiment trackers
    remove_unused_columns = False,
    label_names = ['labels']
)
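
The batch-size and epoch settings determine how many optimizer updates the run performs. The sketch below works through that arithmetic with the values used in this notebook; the single-GPU setting is an assumption, and the per-epoch update count is approximate because it depends on how the trainer handles a final partial accumulation window.

```python
import math

# Values from the cells above; a single GPU is an assumption.
NUM_EXAMPLES = 200       # train[:200]
PER_DEVICE_BATCH = 4     # per_device_train_batch_size
GRAD_ACCUM_STEPS = 4     # gradient_accumulation_steps
NUM_EPOCHS = 30          # num_train_epochs
NUM_DEVICES = 1          # assumption: one GPU

# Examples that contribute to each optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

# Forward/backward passes per epoch.
batches_per_epoch = math.ceil(NUM_EXAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))

# Approximate optimizer updates per epoch (rounding up the last partial window).
updates_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM_STEPS)
total_updates = updates_per_epoch * NUM_EPOCHS

print(effective_batch, batches_per_epoch, updates_per_epoch, total_updates)
```

With `logging_steps = 20`, a run of this size logs the loss a little under twenty times, so each logged value summarizes roughly one and a half epochs.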

In [14]:
# define trainer
trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = tokenized_data,
    processing_class = tokenizer
)

In [None]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.


| Step | Training Loss |
|------|---------------|
| 20   | 3.0853        |
| 40   | 1.1835        |
| 60   | 0.972         |
| 80   | 0.8639        |
| 100  | 0.8084        |
| 120  | 0.7876        |


In [None]:
# save the model and tokenizer
model.save_pretrained('./LoRA_tinyllama_tuned_adapter_model')
tokenizer.save_pretrained('./LoRA_tinyllama_')

This fine-tuned adapter can be loaded later on top of the base model for inference or further training.
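
One way to reuse the adapter is sketched below: reload the quantized base model, attach the saved LoRA weights with `PeftModel.from_pretrained`, and generate from a prompt in the training format. `build_prompt` and `generate_answer` are hypothetical helper names, the tokenizer is reloaded from the base model as an assumption, and the generation settings are illustrative only.

```python
# Sketch: reload the base model, attach the saved LoRA adapter, and generate.
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER_DIR = "./LoRA_tinyllama_tuned_adapter_model"  # path passed to model.save_pretrained


def build_prompt(question: str) -> str:
    """Format a question with the training template, leaving the response open."""
    return f"### Instruction:\n{question} \n### Response:\n"


def generate_answer(question: str, max_new_tokens: int = 128) -> str:
    # Heavy imports live inside the function so build_prompt can be used without a GPU.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, quantization_config=bnb_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)  # attach the trained LoRA weights
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)  # assumption: base tokenizer

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_answer("...")` with a GSM8K-style question returns the prompt followed by the model's continuation. Keeping the prompt template identical to the one used in `tokenize` matters, since the adapter was only trained on that format.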