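1. peft - Parameter-Efficient Fine-Tuning; it lets us apply LoRA (Low-Rank Adaptation), which fine-tunes a model by training small added low-rank matrices instead of updating all of its parameters
2. accelerate - handles device placement (e.g. `device_map="auto"`) and low-memory model loading so the available hardware is used well
3. bitsandbytes - provides 8-bit and 4-bit quantization, used here to load the model in 4 bits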
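The LoRA idea from item 1 can be sketched in plain Python, without PEFT: the frozen weight W is left untouched and the layer output becomes h = W x + (alpha / r) B A x, where only the small matrices A (r x d_in) and B (d_out x r) are trained. The dimensions and values below are made up for illustration. As in the LoRA paper, B starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

d_out, d_in, r, alpha = 6, 5, 2, 4
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]       # r x d_in, random init
B = [[0.0] * r for _ in range(d_out)]                                      # d_out x r, zero init
x = [1.0, -2.0, 0.5, 3.0, 0.0]

# With B = 0 the adapter adds nothing, so the output equals the frozen layer's output.
h = lora_forward(W, A, B, x, alpha, r)
assert h == matvec(W, x)

full_params = d_out * d_in        # numbers updated by full fine-tuning of W
lora_params = r * (d_in + d_out)  # numbers in A and B
print(full_params, lora_params)   # 30 22

# At Llama-2-7B scale (one 4096 x 4096 attention projection, r = 8):
print(4096 * 4096, 8 * (4096 + 4096))  # 16777216 65536
```

The savings are small for toy sizes but grow quickly with the matrix size, which is why LoRA suits 7B-parameter models.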

SFT - Supervised Fine-Tuning

In [None]:
!pip install peft
!pip install accelerate
!pip install transformers
!pip install bitsandbytes
!pip install datasets

In [None]:
!pip install GPUtil

Collecting GPUtil
  Downloading GPUtil-1.4.0.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: GPUtil
  Building wheel for GPUtil (setup.py) ... done
  Created wheel for GPUtil: filename=GPUtil-1.4.0-py3-none-any.whl size=7392 sha256=b2a0db9e8a25e322b4c730d0cd0b3e59e45e146ba30833a27d4c8eed6f271565
  Stored in directory: /root/.cache/pip/wheels/92/a8/b7/d8a067c31a74de9ca252bbe53dea5f896faabd25d55f541037
Successfully built GPUtil
Installing collected packages: GPUtil
Successfully installed GPUtil-1.4.0


In [None]:
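import os

# Set these before CUDA is initialised, otherwise they have no effect.
# PCI_BUS_ID makes device numbering match nvidia-smi; CUDA_VISIBLE_DEVICES="0"
# exposes only the first GPU (mainly useful on machines with multiple GPUs).
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
import GPUtil

GPUtil.showUtilization()

if torch.cuda.is_available():
  print("GPU IS AVAILABLE")
else:
  print("GPU IS NOT AVAILABLE")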

| ID | GPU | MEM |
------------------
|  0 |  0% |  0% |
GPU IS AVAILABLE
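`CUDA_VISIBLE_DEVICES` both hides GPUs and renumbers the remaining ones from zero, in the order listed; with `CUDA_DEVICE_ORDER=PCI_BUS_ID` the physical numbers match nvidia-smi (and the ID column above). The helper below is hypothetical and only models that remapping in plain Python; it queries no hardware and handles only comma-separated integer indices, not GPU UUIDs.

```python
def visible_device_map(cuda_visible_devices):
    """Map logical CUDA indices (cuda:0, cuda:1, ...) to physical GPU indices,
    given a CUDA_VISIBLE_DEVICES string. Hypothetical helper for illustration."""
    physical = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}

print(visible_device_map("0"))    # {0: 0}
print(visible_device_map("2,0"))  # {0: 2, 1: 0}  (physical GPU 2 becomes cuda:0)
```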


In [None]:
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
from huggingface_hub import notebook_login
from datasets import load_dataset
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

# On Colab, enable the custom widget manager so third-party widgets (e.g. progress bars) render
if "COLAB_GPU" in os.environ:
  from google.colab import output
  output.enable_custom_widget_manager()

In [None]:
if "COLAB_GPU" in os.environ:
  !hf auth login
else:
  notebook_login()



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
The token `fine-api` has been saved to /root/.cache/huggingface/stored_tokens
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authentic

In [None]:
base_model_id = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)

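A rough estimate of why 4-bit loading matters, plus a deliberately simplified blockwise absmax quantizer. Assumptions: about 7e9 parameters, weights only (no activations, optimizer state, or layers that stay unquantized), and the block sizes described in the QLoRA paper (one fp32 scale per 64 weights; with double quantization, 8-bit scales plus one fp32 constant per 256 scales). The quantizer rounds to evenly spaced integer levels in [-7, 7], so it is *not* NF4, which places its 16 levels at normal-distribution quantiles; it only illustrates the "one scale per block" idea behind `bnb_4bit_quant_type` and `bnb_4bit_use_double_quant`.

```python
import random

N_PARAMS = 7e9  # approximate parameter count of a "7B" model (assumption)

def gigabytes(bits_per_param):
    """Storage in GB for N_PARAMS values at the given bits per parameter."""
    return N_PARAMS * bits_per_param / 8 / 1e9

fp16 = gigabytes(16)
nf4 = gigabytes(4)
scales_plain = gigabytes(32 / 64)                    # one fp32 absmax per 64 weights
scales_double = gigabytes(8 / 64 + 32 / (64 * 256))  # 8-bit scales + one fp32 per 256 scales
print(f"fp16 weights        : {fp16:.2f} GB")
print(f"4-bit weights       : {nf4:.2f} GB")
print(f"scales, no double-q : {scales_plain:.4f} GB")
print(f"scales, double-q    : {scales_double:.4f} GB")

def quantize_blockwise(weights, block_size=64):
    """Symmetric absmax quantization to integers in [-7, 7], one scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)
        codes.extend(round(w / absmax * 7) for w in block)
    return codes, scales

def dequantize_blockwise(codes, scales, block_size=64):
    """Map integer codes back to floats using each block's scale."""
    return [c / 7 * scales[i // block_size] for i, c in enumerate(codes)]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(200)]  # toy "weights"
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(scales)} blocks, max abs error {worst:.5f}")
```

Each weight's error is at most half a quantization step, i.e. its block's absmax divided by 14, which is why small blocks (a local scale) keep the error low.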

In [None]:
train_dataset = load_dataset(
    "text",
    data_files={"train": ["/content/train/Handstand_Backflip_Guide.txt"]},
    split="train",
)

In [None]:
# The "text" loader splits the file into one example per line
train_dataset["text"][0]

'How to Backflip During a Handstand: Step-by-Step Training Guide'
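The `"text"` builder turns every line of the file into its own example, and blank lines are kept as empty strings. A small pure-Python stand-in for that default behaviour (an approximation for illustration, not the `datasets` implementation; apart from the title, the sample text is made up) shows why some examples later tokenize to almost nothing:

```python
def split_like_text_loader(raw_text):
    """Approximate the "text" loader's default: one example per line,
    line breaks stripped, blank lines kept as empty strings."""
    return [{"text": line} for line in raw_text.splitlines()]

sample = (
    "How to Backflip During a Handstand: Step-by-Step Training Guide\n"
    "\n"
    "Step 1: Build a stable handstand.\n"  # made-up line
)
rows = split_like_text_loader(sample)
print(rows[0]["text"])
print(sum(1 for row in rows if not row["text"].strip()), "blank example(s)")  # 1 blank example(s)
```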

In [None]:
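# LlamaTokenizer is the slow, SentencePiece-based tokenizer (use_fast=False only matters when
# loading through AutoTokenizer). It is chosen here for consistent tokens during fine-tuning;
# the Rust-based fast tokenizer is actually faster, so this trades speed for consistency.
# add_eos_token=True appends the EOS token (</s>) to every tokenized example.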
tokenizer = LlamaTokenizer.from_pretrained(base_model_id,use_fast=False,trust_remote_code=True,add_eos_token=True)

# Llama tokenizers ship without a pad token, so reuse the EOS token for padding
if tokenizer.pad_token is None:
  tokenizer.add_special_tokens({"pad_token":tokenizer.eos_token})
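Because the pad token is now the EOS token, padding and end-of-sequence share one id (2 in the Llama-2 tokenizer, with BOS = 1). The plain-Python sketch below pads a batch (on the right, in this sketch), builds the attention mask, and then mimics how `DataCollatorForLanguageModeling(mlm=False)` derives labels: copy `input_ids` and set every position equal to `pad_token_id` to -100 so the loss ignores it. A side effect worth knowing: genuine EOS tokens are masked as well, so the model gets little signal about when to stop. Token ids other than 1 and 2 are made up.

```python
PAD_ID = EOS_ID = 2   # pad_token was set to eos_token, so both share id 2
BOS_ID = 1
IGNORE_INDEX = -100   # label value the loss ignores

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token id lists to equal length and build attention masks."""
    width = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (width - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (width - len(s)) for s in sequences]
    return input_ids, attention_mask

def causal_lm_labels(input_ids, pad_id=PAD_ID):
    """Copy input_ids and ignore every position whose id equals pad_id.
    (The one-position shift for next-token prediction happens inside the model.)"""
    return [[IGNORE_INDEX if t == pad_id else t for t in row] for row in input_ids]

# Made-up ids: BOS, a few word pieces, EOS (appended because add_eos_token=True).
batch = [[BOS_ID, 450, 1889, EOS_ID], [BOS_ID, 7525, EOS_ID]]
input_ids, attention_mask = pad_batch(batch)
labels = causal_lm_labels(input_ids)
print(input_ids)       # [[1, 450, 1889, 2], [1, 7525, 2, 2]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
print(labels)          # [[1, 450, 1889, -100], [1, 7525, -100, -100]]
```

The attention mask still tells real EOS apart from padding, but the label masking looks only at the id, which is why the real EOS in each row ends up as -100.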

In [None]:
# Tokenize every line; each entry is a dict with "input_ids" and "attention_mask"
tokenized_train_dataset = []

for phrase in train_dataset:
  tokenized_train_dataset.append(tokenizer(phrase["text"]))

In [None]:
tokenized_train_dataset[1]

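# input_ids [1, 2] are just BOS and EOS, so this line of the file was blank;
# attention_mask marks positions the model should attend to (1) versus padding (0)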

{'input_ids': [1, 2], 'attention_mask': [1, 1]}
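Examples like the one above carry no real text. A minimal sketch, assuming the same list-of-dicts structure as `tokenized_train_dataset`, that drops them before training (the function name and the `min_real_tokens` threshold are hypothetical; the non-special token ids are made up):

```python
def drop_empty_examples(tokenized, bos_id=1, eos_id=2, min_real_tokens=1):
    """Keep examples containing at least `min_real_tokens` ids other than BOS/EOS."""
    kept = []
    for example in tokenized:
        real = [t for t in example["input_ids"] if t not in (bos_id, eos_id)]
        if len(real) >= min_real_tokens:
            kept.append(example)
    return kept

toy = [
    {"input_ids": [1, 1128, 304, 2], "attention_mask": [1, 1, 1, 1]},  # made-up ids
    {"input_ids": [1, 2], "attention_mask": [1, 1]},                   # blank line
]
print(len(drop_empty_examples(toy)))  # 1
```

In the notebook this would be `tokenized_train_dataset = drop_empty_examples(tokenized_train_dataset)`. Alternatively, blank lines can be removed before tokenizing with `train_dataset.filter(lambda row: row["text"].strip() != "")`.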

In [None]:
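model.gradient_checkpointing_enable()
# prepare_model_for_kbit_training freezes the base weights, upcasts the remaining
# non-quantized parameters to float32 and, by default, also enables gradient
# checkpointing (so the call above is redundant but harmless)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r = 8,              # rank of the low-rank update matrices
    lora_alpha = 64,    # the update is scaled by lora_alpha / r = 8
    # every attention and MLP projection in each Llama decoder layer
    target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    bias = "none",      # bias terms are not trained
    lora_dropout = 0.05,
    task_type = "CAUSAL_LM"
)

model = get_peft_model(model, config)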
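A quick check of how much this `LoraConfig` trains. For each targeted linear layer with input size d_in and output size d_out, LoRA adds r * (d_in + d_out) parameters. The sketch assumes Llama-2-7B's published shapes (hidden size 4096, MLP size 11008, 32 decoder layers, no grouped-query attention) and that `bias="none"` adds nothing. If those assumptions hold, `model.print_trainable_parameters()` on the PEFT model should report the same trainable count.

```python
HIDDEN = 4096         # Llama-2-7B hidden size (assumed)
INTERMEDIATE = 11008  # Llama-2-7B MLP size (assumed)
N_LAYERS = 32         # decoder layers (assumed)
r, lora_alpha = 8, 64

# (in_features, out_features) of each targeted projection in one decoder layer
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())
trainable = per_layer * N_LAYERS
print(f"scaling = {lora_alpha / r}")             # scaling = 8.0
print(f"trainable LoRA params = {trainable:,}")  # trainable LoRA params = 19,988,480
```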

In [None]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    args=transformers.TrainingArguments(
        output_dir="./finetunedModel",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=2,  # effective batch of 2 x 2 = 4 examples per step
        num_train_epochs=3,             # ignored: max_steps takes precedence when set
        learning_rate=1e-4,
        max_steps=20,
        bf16=False,
        optim="paged_adamw_8bit",       # paged 8-bit AdamW from bitsandbytes
        logging_dir="./log",
        save_strategy="epoch",          # save at the end of each epoch, so save_steps is unused
        save_steps=50,
        logging_steps=10,
        # Depending on the transformers version, the default report_to enables installed
        # integrations such as W&B (hence the login prompt below); report_to="none" turns them off.
    ),
    # mlm=False selects the causal LM objective: labels are input_ids with padding ignored
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# Disable the KV cache during training; it is incompatible with gradient checkpointing
model.config.use_cache = False
trainer.train()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: phenomsg (phenomsg-iedc-dsce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin




Step,Training Loss
10,3.6396
20,2.8168


TrainOutput(global_step=20, training_loss=3.22819299697876, metrics={'train_runtime': 278.466, 'train_samples_per_second': 0.287, 'train_steps_per_second': 0.072, 'total_flos': 77301201567744.0, 'train_loss': 3.22819299697876, 'epoch': 1.0})
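The logged numbers can be reproduced from the training arguments. Each optimizer step consumes per_device_train_batch_size x gradient_accumulation_steps = 2 x 2 = 4 examples on the single visible GPU, and `max_steps=20` overrides `num_train_epochs=3`. Trainer appears to compute throughput from max_steps x examples-per-step divided by the runtime, and `training_loss` is the mean over all 20 steps, which equals the mean of the two 10-step logging windows. Because the run ended at `epoch: 1.0`, the file seems to hold roughly 20 x 4 = 80 lines; that is an inference from the logs, not a count of the file. It also explains why `save_strategy="epoch"` produced `checkpoint-20`: the end of epoch 1 coincided with step 20.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
n_gpus = 1            # CUDA_VISIBLE_DEVICES="0"
max_steps = 20

examples_per_step = per_device_train_batch_size * gradient_accumulation_steps * n_gpus
examples_seen = examples_per_step * max_steps
print(examples_per_step, examples_seen)  # 4 80

# Values copied from the TrainOutput and loss table above.
train_runtime = 278.466
print(round(examples_seen / train_runtime, 3))  # 0.287 (train_samples_per_second)
print(round(max_steps / train_runtime, 3))      # 0.072 (train_steps_per_second)
print(round((3.6396 + 2.8168) / 2, 4))          # 3.2282 (training_loss 3.22819...)
```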

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig, LlamaTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-chat-hf"

nf4Config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False, trust_remote_code=True, add_eos_token=True)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=nf4Config,
    device_map="auto",
    trust_remote_code=True,
    token=True,  # use the saved Hugging Face token (replaces the deprecated use_auth_token)
)





In [None]:

modelFinetuned = PeftModel.from_pretrained(base_model, "finetunedModel/checkpoint-20")
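`PeftModel.from_pretrained` keeps the adapter separate: the frozen 4-bit base weights stay as they are and the checkpoint's A and B matrices are applied on top at run time. Mathematically the adapted layer equals one dense layer with merged weight W + (alpha / r) B A, which is what PEFT's `merge_and_unload()` folds in. The toy-size pure-Python check below (made-up dimensions and values) confirms the two forms agree; it checks only the arithmetic, not how PEFT handles quantized base weights.

```python
import random

def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

random.seed(42)
d_out, d_in, r, alpha = 4, 3, 2, 16
scale = alpha / r
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]      # "trained" A
B = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]     # "trained" B
x = [0.3, -1.2, 2.0]

# Unmerged: base output plus the scaled adapter output.
unmerged = [w + scale * d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged: fold the low-rank update into a single weight matrix.
BA = matmul(B, A)
W_merged = [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]
merged = matvec(W_merged, x)

print("max difference:", max(abs(u - m) for u, m in zip(unmerged, merged)))
```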



In [None]:
user_question = "Can we do a backflip in handstand if yes then tell me the process in 50 words."

eval_prompt = f"Question: {user_question} Just answer this question accurately and concisely.\n"

# Note: the tokenizer was loaded with add_eos_token=True, so this prompt also ends with an EOS token
promptTokenized = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

modelFinetuned.eval()

with torch.no_grad():
  output_ids = modelFinetuned.generate(**promptTokenized, max_new_tokens=1024)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
  torch.cuda.empty_cache()

Question: Can we do a backflip in handstand if yes then tell me the process in 50 words. Just answer this question accurately and concisely.
0 Comments	| tags: backflip, handstand, gymnastics	| | 1
The process of performing a backflip in handstand is as follows:
1. Start in a handstand position with your feet shoulder-width apart and your arms straight.
2. Bend your elbows and lower your body slightly, keeping your arms straight.
3. Kick your feet back and straighten your arms, lifting your body off the ground.
4. Twist your body and tuck your knees to your chest, landing in a backflip position.
5. Extend your arms and legs to complete the flip, landing on your feet.
Note: This is a challenging and advanced gymnastics skill, and it is important to practice proper technique and safety precautions when attempting a backflip in handstand.
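The decoded text starts with the question because, for decoder-only models like Llama, `generate` returns the prompt tokens followed by the newly generated ones. To print only the answer, a common approach is to slice off the prompt's length before decoding. The sketch shows that slicing on plain lists with made-up token ids and a hypothetical stand-in `decode` function, so it runs without a model.

```python
VOCAB = {1: "<s>", 2: "</s>", 11: "Question:", 12: "backflip?", 21: "Yes,", 22: "carefully."}  # made-up

def decode(ids, skip_special_tokens=True):
    """Hypothetical stand-in for tokenizer.decode over the made-up VOCAB."""
    special = {1, 2}
    return " ".join(VOCAB[i] for i in ids if not (skip_special_tokens and i in special))

prompt_ids = [1, 11, 12, 2]           # BOS, prompt tokens, EOS (add_eos_token=True)
generated = prompt_ids + [21, 22, 2]  # generate() output: prompt followed by new tokens

print(decode(generated))                    # Question: backflip? Yes, carefully.
print(decode(generated[len(prompt_ids):]))  # Yes, carefully.
```

In the notebook the equivalent slice would be `output_ids[0][promptTokenized["input_ids"].shape[1]:]`.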


In [None]:
# Source - https://stackoverflow.com/a
# Posted by Shashank Mishra
# Retrieved 2025-11-22, License - CC BY-SA 4.0

!zip -r /content/file2.zip /content/train/

from google.colab import files
files.download("/content/file2.zip")


  adding: content/train/ (stored 0%)
  adding: content/train/Handstand_Backflip_Guide.txt (deflated 56%)
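Without the `zip` binary (for example outside Colab), a similar archive can be built with Python's standard `zipfile` module. A minimal sketch; the function name is hypothetical, and it stores paths relative to the parent of the zipped folder (`train/...`) rather than the `content/train/...` paths that `zip -r` recorded above. The demo writes a throwaway file into a temporary directory.

```python
import os
import tempfile
import zipfile

def zip_directory(src_dir, zip_path):
    """Recursively add every file under src_dir to a deflate-compressed zip archive."""
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, arcname=os.path.relpath(full_path, parent))
    return zip_path

with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, "train")
    os.makedirs(train_dir)
    with open(os.path.join(train_dir, "Handstand_Backflip_Guide.txt"), "w") as f:
        f.write("How to Backflip During a Handstand\n")
    archive = zip_directory(train_dir, os.path.join(tmp, "file2.zip"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)  # ['train/Handstand_Backflip_Guide.txt']
```

In the notebook this would be `zip_directory("/content/train", "/content/file2.zip")` followed by the same `files.download` call.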

