In [1]:
prompts = [
    "Explain quantum computing in simple terms.",
    "What are the benefits of renewable energy?",
    "Describe the process of photosynthesis.",
    "What is the significance of the Renaissance?",
    "How does the human immune system work?"
]
responses = [
    "Quantum computing uses quantum bits, or qubits, to perform calculations. Unlike classical bits that are either 0 or 1, qubits can exist in multiple states simultaneously, allowing quantum computers to solve certain complex problems faster.",
    "Renewable energy, such as solar and wind, reduces greenhouse gas emissions, decreases air pollution, and conserves natural resources. It also promotes energy independence and sustainability.",
    "Photosynthesis is the process by which green plants use sunlight to make food from carbon dioxide and water. It occurs in the chloroplasts, producing oxygen as a byproduct.",
    "The Renaissance was a cultural movement from the 14th to the 17th century, characterized by a renewed interest in classical art, science, and philosophy. It led to significant advancements in many fields and a shift towards humanism.",
    "The human immune system protects the body from infections and diseases. It consists of physical barriers, immune cells, and proteins that identify and destroy pathogens like bacteria and viruses."
]

In [2]:
model_name = 'meta-llama/Llama-3.2-1B'
output_path = './finetuned_model'

In [3]:
training_args = {
    "overwrite_output_dir": True,  # 🔄 Whether to overwrite existing output directory
    "eval_strategy": "no",  # 📊 Evaluation strategy during training
    "learning_rate": 2e-5,  # 📈 Learning rate for model optimization
    "per_device_train_batch_size": 1,  # 📦 Number of samples processed per device per training step
    "gradient_accumulation_steps": 4,  # 🔄 Number of steps to accumulate gradients before updating weights
    "num_train_epochs": 3,  # 🔁 Number of complete passes through training dataset
    "weight_decay": 0.01,  # ⚖️ Weight decay applied by the optimizer to reduce overfitting
    "fp16": True,  # 🚀 Enable mixed precision training for faster computation
    "gradient_checkpointing": True,  # 💾 Enable gradient checkpointing to save memory
    "auto_find_batch_size": True,  # Retry with a smaller batch size after a CUDA out-of-memory error
}
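As a rough check on what these settings imply, the sketch below works out the effective batch size and the number of optimizer steps. The helper names, the single-device setup, and the count of 4 training examples are assumptions for illustration. How the Trainer treats a partial final accumulation window differs between library versions, so the ceiling used here is only an approximation when the batch count does not divide evenly.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    # Samples that contribute to a single optimizer update
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps(num_examples, per_device_batch, grad_accum_steps, epochs, num_devices=1):
    # Batches per epoch, then updates per epoch (approximated with a ceiling)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    updates_per_epoch = max(math.ceil(batches_per_epoch / grad_accum_steps), 1)
    return updates_per_epoch * epochs


# Values from the training_args cell; num_examples=4 is an assumed training-set size
print(effective_batch_size(per_device_batch=1, grad_accum_steps=4))
print(optimizer_steps(num_examples=4, per_device_batch=1, grad_accum_steps=4, epochs=3))
```

With so few optimizer updates, the default logging interval may never be reached, which would leave the training-loss table empty.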

---

In [4]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets

In [8]:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig,
)
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset
import peft
from peft import LoraConfig, get_peft_model, PeftModel, prepare_model_for_kbit_training
from huggingface_hub import login
import torch
import os

In [7]:
class CustomDataset(Dataset):
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs["input_ids"])

    def __getitem__(self, idx):
        return {
            "input_ids": self.inputs["input_ids"][idx],
            "attention_mask": self.inputs["attention_mask"][idx],
            "labels": self.labels[idx]
        }

In [11]:
class finetuner:
    def __init__(self, model_name):
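        # Read the Hugging Face access token from the HF_TOKEN environment variable
        # rather than hardcoding a secret in the notebook
        login(token=os.environ.get("HF_TOKEN"))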
        self.prepare_bnb_config()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name,
                                                          quantization_config=self.bnb_config,
                                                          device_map={"":0})

    def configure_lora(self):
        self.lora_config = LoraConfig(
            r=4,
            lora_alpha=1,
            target_modules=["q_proj", "v_proj"],
            lora_dropout=0.05,
            bias="lora_only",
            task_type="CAUSAL_LM"
        )

    def get_peft_model(self):
        self.peft_model = get_peft_model(self.model, self.lora_config)
        # Sanity check: warn if no parameter is trainable after wrapping with LoRA
        if not any(param.requires_grad for param in self.peft_model.parameters()):
            print("No parameters require gradients!")

    def enable_gradient_checkpointing(self):
        self.model.gradient_checkpointing_enable()
        self.model.config.use_cache = False  # Disable use_cache when using gradient checkpointing

    def kbit_training(self):
        self.model = prepare_model_for_kbit_training(self.model)

    def pad_tokenizer(self):
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def prepare_bnb_config(self):
        self.bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16
        )

    def tokenize_data(self):
        max_length = 50
        # Tokenize prompts and responses
        self.tokenizer_config={'padding':'max_length','truncation':True,'max_length':max_length,'return_tensors':'pt'}
        self.tokenized_inputs = self.tokenizer(self.inputs, **self.tokenizer_config)
        self.tokenized_labels = self.tokenizer(self.outputs, **self.tokenizer_config)["input_ids"]
        # Ensure labels' padding tokens are ignored in loss computation
        self.tokenized_labels[self.tokenized_labels == self.tokenizer.pad_token_id] = -100

    def create_dataset(self, indices):
        inputs = {
            'input_ids': self.tokenized_inputs["input_ids"][indices],
            'attention_mask': self.tokenized_inputs["attention_mask"][indices]
        }
        labels = self.tokenized_labels[indices]
        return CustomDataset(inputs, labels)

    def split_dataset(self):
        indices = list(range(len(self.tokenized_inputs["input_ids"])))
        train_indices, val_indices = train_test_split(indices, test_size=self.test_size, random_state=self.random_seed)
        self.train_dataset = self.create_dataset(train_indices)
        self.eval_dataset = self.create_dataset(val_indices)

    def prepare_dataset(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs

    def collate_data(self):
        self.data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False,
        )

    def prepare_training_Args(self, output_path, training_args):
        training_args['output_dir'] = output_path
        os.environ["WANDB_DISABLED"] = "true"
        self.training_args = TrainingArguments(**training_args)
        self.trainer = Trainer(
            model=self.peft_model,
            args=self.training_args,
            train_dataset=self.train_dataset,
            eval_dataset=self.eval_dataset,
            data_collator=self.data_collator
        )

    def train(self):
        torch.cuda.empty_cache()
        try:
            self.trainer.train()
        except ValueError as e:
            print("\nError during training:")
            print(e)

    def save_model(self):
        self.peft_model.save_pretrained(self.output_path)
        self.tokenizer.save_pretrained(self.output_path)

    def run(self, inputs, outputs, output_path, train_size=0.8, random_seed=42):
        self.random_seed = random_seed
        self.test_size = 1 - train_size
        self.output_path = output_path
        self.enable_gradient_checkpointing()
        self.kbit_training()
        self.pad_tokenizer()
        self.configure_lora()
        self.get_peft_model()
        self.prepare_dataset(inputs, outputs)
        self.tokenize_data()
        self.collate_data()
        self.split_dataset()
        self.prepare_training_Args(output_path, training_args)
        self.train()
        self.save_model()
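One caveat about `tokenize_data`: prompts and responses are tokenized separately, so the label at a given position does not correspond to the next token of `input_ids`. In addition, `DataCollatorForLanguageModeling` with `mlm=False` rebuilds `labels` from `input_ids`, so the response labels are likely not what the loss is computed on. A common alternative is to concatenate prompt and response into one sequence and mask the prompt positions with `-100`. The sketch below shows that layout on plain lists of token IDs; `build_example`, its parameters, and the toy IDs are hypothetical and not part of the class above.

```python
IGNORE_INDEX = -100  # Label value ignored by the cross-entropy loss


def build_example(prompt_ids, response_ids, max_length, pad_id):
    # One sequence: prompt followed by response, truncated to max_length
    input_ids = (prompt_ids + response_ids)[:max_length]
    # Loss only on response tokens: prompt positions are masked out
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad everything to max_length; padding is masked in both labels and attention
    pad = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * pad
    labels = labels + [IGNORE_INDEX] * pad
    attention_mask = attention_mask + [0] * pad
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Toy token IDs, for illustration only
example = build_example(prompt_ids=[1, 2, 3], response_ids=[4, 5], max_length=8, pad_id=0)
print(example["input_ids"])
print(example["labels"])
```

With this layout a collator that keeps the provided labels would be needed, and `max_length` would have to cover prompt and response together.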


In [12]:
finetuner_instance = finetuner(model_name)
finetuner_instance.run(prompts,responses,output_path)
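For a sense of scale, this call passes 5 prompt/response pairs with the default `train_size=0.8`. The sketch below estimates the resulting split sizes, assuming `train_test_split` rounds a fractional `test_size` up and gives the remainder to the training set; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, train_size):
    # Mirrors the notebook: test_size is derived as 1 - train_size
    test_size = 1 - train_size
    # Assumed rounding rule: test count is rounded up, training set gets the rest
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(n_samples=5, train_size=0.8))
```

Since `eval_strategy` is set to `"no"`, the held-out example is not evaluated during training.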


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Step,Training Loss


---
# Testing inference with the fine-tuned model

In [13]:
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(output_path)
model = PeftModel.from_pretrained(model, output_path)
model.eval()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 2048)
        (layers): ModuleList(
          (0-15): 16 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=4, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=4, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): Linear(in
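The printout shows what LoRA adds to each `q_proj`: a `lora_A` matrix of shape 2048 x 4 and a `lora_B` matrix of shape 4 x 2048, repeated over 16 decoder layers. The sketch below counts those adapter weights and computes the update scaling as `lora_alpha / r`, which is the default scaling rule in PEFT. The helper names are hypothetical, and `v_proj` is left out because its shape is not visible in the truncated output.

```python
def lora_param_count(in_features, out_features, r):
    # lora_A maps in_features -> r, lora_B maps r -> out_features (no biases)
    return in_features * r + r * out_features


def lora_scaling(lora_alpha, r):
    # The low-rank update is multiplied by lora_alpha / r
    return lora_alpha / r


per_q_proj = lora_param_count(in_features=2048, out_features=2048, r=4)
print(per_q_proj)                    # adapter weights in one q_proj
print(per_q_proj * 16)               # across the 16 decoder layers
print(lora_scaling(lora_alpha=1, r=4))
```

A scaling below 1 damps the adapter's contribution, so `lora_alpha` is one of the settings worth revisiting if the fine-tuned model behaves almost identically to the base model.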

In [14]:
input_prompt = "Tell me a joke about cats."
# Call the tokenizer directly so the attention mask is returned along with the input IDs
inputs = tokenizer(input_prompt, return_tensors='pt').to(device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # Set explicitly to avoid the pad-token warning
        max_length=100,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        # early_stopping only affects beam search, so it is omitted for greedy decoding
    )
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)


Tell me a joke about cats. I’ll give you a hint: it’s not about the tail.
We’re all aware of the saying “cats are known for their tails.” But what if we told you that the origin of this phrase actually comes from a completely different place?
A cat’s tail is often thought of as a symbol of strength, loyalty, and independence. But in the ancient world, it was believed that cats had tails that were used as weapons. In fact, some ancient
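The `no_repeat_ngram_size=2` setting used in the generation call prevents any pair of consecutive tokens from appearing twice in the output. The sketch below illustrates the idea on plain token lists: given what has been generated so far, it returns the tokens that would complete an already-seen n-gram. It is a simplified illustration with a hypothetical helper name, not the implementation used inside `transformers`.

```python
def banned_next_tokens(generated, n):
    # Not enough history to have produced a full n-gram yet
    if len(generated) < n:
        return set()

    # The last n-1 tokens form the prefix of the next possible n-gram
    start = len(generated) - (n - 1)
    prefix = tuple(generated[start:])

    banned = set()
    # Scan every n-gram already present; if it starts with the same prefix,
    # its final token must not be generated next
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# Toy token IDs: the bigram (5, 7) already occurred, so 7 is banned after the trailing 5
print(banned_next_tokens([5, 7, 9, 5], n=2))
```

During decoding, the scores of banned tokens would be set to negative infinity before the next token is chosen.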
