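# QLoRA-based fine-tuning

This notebook performs QLoRA-based fine-tuning (LoRA adapters on top of a 4-bit quantized base model) and includes:

- BitsAndBytesConfig for memory-efficient model loading.
- Proper handling of device maps and fp16 training.
- Gemini API compatibility, ensured by using a smaller model (falcon-rw-1b).

Step 3 as written loads the model in float32 as a CPU fallback, so the quantized path is not exercised in the recorded run.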
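To make "memory-efficient" concrete, the weight-only footprint of the base model can be estimated from the parameter counts that `print_trainable_parameters()` reports in Step 4 (1,313,198,080 total, of which 1,572,864 belong to the LoRA adapters). This is a back-of-the-envelope sketch: the helper name `weight_memory_gib` is hypothetical, and the 4-bit figure assumes 0.5 bytes per parameter, ignoring quantization constants, activations, gradients, and optimizer state.

```python
# Rough weight-only memory estimate for the base model at different precisions.
# Parameter counts come from the print_trainable_parameters() output in Step 4.
TOTAL_PARAMS = 1_313_198_080
LORA_PARAMS = 1_572_864
BASE_PARAMS = TOTAL_PARAMS - LORA_PARAMS

# Assumed storage cost per parameter (the 4-bit entry ignores quantization metadata).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>8}: {weight_memory_gib(BASE_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take roughly 4.9 GiB in float32, 2.4 GiB in float16, and 0.6 GiB in 4-bit, which is why a 4-bit load can fit on much smaller GPUs.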

## Step 0: Install and Import Dependencies

In [2]:
!pip install -U bitsandbytes transformers peft datasets accelerate



In [1]:
# Step 1: Create a small synthetic dataset for fine-tuning
from datasets import Dataset

# Five short general-purpose text snippets, each stored in a single "text" field
samples = [
    {"text": "What is AI? AI stands for Artificial Intelligence."},
    {"text": "Python is a popular programming language."},
    {"text": "The capital of France is Paris."},
    {"text": "Machine learning is a subset of AI."},
    {"text": "Water freezes at 0 degrees Celsius."}
]

# Convert the list of dictionaries into a Hugging Face Dataset object
dataset = Dataset.from_list(samples)







In [3]:

# Step 2: Load tokenizer and tokenize dataset
from transformers import AutoTokenizer

base_model = "tiiuae/falcon-rw-1b"  # small LLM suitable for quick experimentation

# Load tokenizer for the base model
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token to eos_token to avoid tokenization errors
tokenizer.pad_token = tokenizer.eos_token

# Define tokenization function for the dataset
def tokenize(example):
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=128)

# Apply tokenization to the entire dataset
tokenized_dataset = dataset.map(tokenize, batched=True)



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
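The tokenization call above pads or truncates every example to a fixed length, and the causal-LM data collator used in Step 5 then excludes the padded positions from the loss. The toy sketch below mimics both effects without loading a tokenizer. It assumes right-side padding; `pad_or_truncate` and `mask_pad_labels` are hypothetical helpers, the token IDs are made up, and `PAD_ID = 2` mirrors the `eos_token_id` reported in the generation log in Step 6.

```python
# Toy illustration of padding="max_length", truncation=True, and label masking.
PAD_ID = 2           # pad token reuses the EOS id, as in tokenizer.pad_token = eos_token
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def pad_or_truncate(ids, max_length, pad_id=PAD_ID):
    """Right-pad or truncate a list of token IDs to exactly max_length."""
    ids = ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


def mask_pad_labels(input_ids, pad_id=PAD_ID):
    """Copy input_ids to labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


encoded = pad_or_truncate([101, 57, 9], max_length=6)
labels = mask_pad_labels(encoded["input_ids"])
print(encoded)
print(labels)
```

Because the pad token is the EOS token here, any genuine EOS tokens in the training text would be masked out of the loss as well, which is worth keeping in mind if the model should learn when to stop generating.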



In [4]:

# Step 3: Load model without quantization (CPU fallback)
import torch
from transformers import AutoModelForCausalLM

# Load model in float32 to ensure CPU compatibility
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map=None  # No auto placement; use default CPU
)
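For comparison, the 4-bit load that the introduction refers to could look roughly like the sketch below. It is not part of the recorded run: it assumes a CUDA GPU with `bitsandbytes` and `accelerate` installed, the wrapper name `load_4bit_model` is hypothetical, and the NF4, double-quantization, and float16 compute settings are common QLoRA choices rather than values taken from this notebook.

```python
def load_4bit_model(model_name: str = "tiiuae/falcon-rw-1b"):
    """Load a causal LM with 4-bit NF4 weights (assumes a CUDA GPU and bitsandbytes)."""
    # Imports live inside the function so that defining it does not require
    # the GPU stack; they are only needed when the function is called.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store base weights in 4 bits
        bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
        bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.float16,  # dtype used for matrix multiplications
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",  # let accelerate place layers on the available device(s)
    )
```

On a GPU runtime, `model = load_4bit_model(base_model)` would take the place of the float32 load above, and the remaining steps could stay unchanged.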



In [5]:
# Step 4: Prepare the model for QLoRA fine-tuning
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

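# Prepare model for k-bit training. Because Step 3 loaded the model in float32
# rather than in 4-bit, this call mainly freezes the base model's parameters here.
model = prepare_model_for_kbit_training(model)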

# Define LoRA adapter configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Inject LoRA adapters into the model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()



trainable params: 1,572,864 || all params: 1,313,198,080 || trainable%: 0.1198
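The trainable-parameter count above can be reproduced by hand from the LoRA settings. The sketch below assumes a hidden size of 2048 and 24 transformer layers for falcon-rw-1b (values not read from the model config in this notebook) and a fused `query_key_value` projection that maps the hidden size to three times the hidden size.

```python
# Reproduce the trainable-parameter count printed above from the LoRA settings.
# HIDDEN_SIZE and NUM_LAYERS are assumed values for falcon-rw-1b.
HIDDEN_SIZE = 2048
NUM_LAYERS = 24
R = 8            # LoRA rank from lora_config
LORA_ALPHA = 32  # from lora_config

# query_key_value is a fused projection: hidden_size -> 3 * hidden_size
in_features = HIDDEN_SIZE
out_features = 3 * HIDDEN_SIZE

# Each adapted layer adds two matrices: A (r x in_features) and B (out_features x r).
params_per_layer = R * in_features + out_features * R
trainable_params = params_per_layer * NUM_LAYERS

total_params = 1_313_198_080  # "all params" from the output above
print(f"trainable params: {trainable_params:,}")
print(f"trainable%: {100 * trainable_params / total_params:.4f}")
print(f"LoRA scaling (lora_alpha / r): {LORA_ALPHA / R}")
```

With these assumptions the count comes out to 1,572,864 (about 0.1198% of all parameters), matching the printed output, and the adapter update is scaled by `lora_alpha / r = 4.0`.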


In [6]:
# Step 5: Set up training loop using Hugging Face Trainer
import torch
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./outputs",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    logging_steps=1,
    learning_rate=2e-4,
    fp16=torch.cuda.is_available(),  # mixed precision needs a GPU; falls back to fp32 on CPU
    save_total_limit=1,
    report_to="none"
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Set up the Trainer for fine-tuning
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator
)

# Start training
trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


| Step | Training Loss |
|------|---------------|
| 1    | 2.2765        |
| 2    | 1.7191        |

TrainOutput(global_step=2, training_loss=1.9978116154670715, metrics={'train_runtime': 80.8652, 'train_samples_per_second': 0.062, 'train_steps_per_second': 0.025, 'total_flos': 4647073873920.0, 'train_loss': 1.9978116154670715, 'epoch': 1.0})
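The `global_step=2` and `training_loss` values in this output follow from the dataset size and the training arguments. The sketch below works through the arithmetic, assuming a single device and assuming that the final, partially filled accumulation window still triggers an optimizer update; that rounding is consistent with the recorded output but may differ between `transformers` versions.

```python
import math

NUM_EXAMPLES = 5   # rows in the synthetic dataset
BATCH_SIZE = 2     # per_device_train_batch_size
GRAD_ACCUM = 2     # gradient_accumulation_steps
NUM_EPOCHS = 1     # num_train_epochs

# Batches per epoch; the last batch holds the single leftover example.
batches_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)

# Optimizer steps per epoch, assuming the last partial accumulation window
# still produces an update (consistent with global_step=2 in the output).
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)
global_steps = steps_per_epoch * NUM_EPOCHS

# The reported training_loss is the mean of the per-step losses in the log.
logged_losses = [2.2765, 1.7191]
mean_loss = sum(logged_losses) / len(logged_losses)

print(f"batches per epoch: {batches_per_epoch}")
print(f"optimizer steps:   {global_steps}")
print(f"mean logged loss:  {mean_loss:.4f}")
```

Three batches grouped in windows of two give two optimizer steps, and the mean of the two logged losses is about 1.9978, in line with the reported `training_loss`.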

In [7]:

# Step 6: Run inference with the fine-tuned model
prompt = "What is AI?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is AI?
Artificial Intelligence (AI) is a branch of computer science that deals with the creation of computer systems that can perform tasks that are normally performed by humans.
AI is a broad term that can be used to describe a variety of different technologies.
