# QLoRA Training on Mistral-7B (GPU)

**⚠️ REQUIRES GPU!** This notebook must be run in **Google Colab with GPU enabled** (Runtime → Change runtime type → GPU).

**Why GPU is required:**
- QLoRA still needs GPU for training (even with 4-bit quantization)
- CPU training would take days/weeks and likely crash
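- GPU training takes minutes, not days: budget ~30-60 minutes for 1 epoch on a free-tier GPU (the A100 run recorded in this notebook finished in about 4 minutes)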

**Recommended GPU:**
- T4 (16GB) - works fine, free tier
- A100 (40GB or 80GB) - faster, paid tier (the run recorded below used an A100 40GB)

## What is QLoRA?

**QLoRA** (Quantized Low-Rank Adaptation) combines:
- **4-bit quantization:** Reduces the memory of the base model's weights by ~75% compared with 16-bit storage
- **LoRA (Low-Rank Adaptation):** Trains small adapter matrices instead of full weights

Result: you can fine-tune a 7B model on a 16GB T4, a job that normally needs 40GB+ of GPU memory.

## How 4-bit Quantization Works

Instead of storing each weight as a 16-bit float (2 bytes; FP32 would be 4 bytes), we use:
- **4-bit codes:** 0.5 bytes per weight
- **Quantization constants:** One small scaling value per block of weights, used to convert the codes back to floats

This is lossy but preserves most model knowledge. Combined with LoRA, we get:
- Fast training
- Low memory usage
- Good performance
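
To make the idea concrete, here is a minimal sketch of blockwise 4-bit quantization in plain Python. It assumes simple absmax scaling to integers in [-7, 7]; the NF4 format used by bitsandbytes relies on a non-uniform set of 16 levels instead, so treat this as an analogy rather than the library's implementation. The function names and the block size of 64 are illustrative choices, not taken from this notebook.

```python
import random

def quantize_block(weights):
    """Absmax-quantize a block of floats to signed 4-bit codes in [-7, 7]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0  # one float constant per block
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from the 4-bit codes and the block's scale."""
    return [c * scale for c in codes]

random.seed(0)
block = [random.gauss(0.0, 0.02) for _ in range(64)]  # illustrative block of 64 weights

codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(block, restored))

# Storage: 64 codes at 4 bits each plus one 32-bit scale, versus 16 bits per weight.
quantized_bytes = 64 * 4 / 8 + 4
fp16_bytes = 64 * 2

print(f"max round-trip error: {max_error:.5f} (scale = {scale:.5f})")
print(f"bytes per block: {quantized_bytes:.0f} quantized vs {fp16_bytes} in 16-bit")
```

The round-trip error is bounded by half the block's scale, which is why the scheme is lossy but usually tolerable for weights clustered near zero.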

## Why T4 Fits

Google Colab's T4 GPU has 16GB VRAM. With QLoRA:
- Base model: ~4GB (4-bit)
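- LoRA adapters: ~100MB (adapter weights plus their gradients and optimizer state; the saved r=8 adapter file is ~27MB)
- Training overhead: ~8GB (activations, CUDA context, temporary buffers)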
- **Total: ~12GB** ✅ Fits!
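
The budget above can be checked with back-of-the-envelope arithmetic. This sketch assumes roughly 7.24 billion parameters for Mistral-7B and counts weights only; the ~4GB figure in the list leaves room for quantization constants and for layers that stay in higher precision. The adapter and overhead numbers are copied from the list, not measured.

```python
N_PARAMS = 7.24e9  # approximate parameter count of Mistral-7B (assumption)

def weights_gb(n_params, bytes_per_param):
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

fp16_gb = weights_gb(N_PARAMS, 2.0)   # 16-bit storage
int4_gb = weights_gb(N_PARAMS, 0.5)   # 4-bit storage

# Budget figures taken from the list above (rough estimates, not measurements).
budget_gb = {
    "base model (4-bit)": 4.0,
    "LoRA adapters": 0.1,
    "training overhead": 8.0,
}
total_gb = sum(budget_gb.values())
T4_VRAM_GB = 16

print(f"16-bit weights: {fp16_gb:.2f} GB, 4-bit weights: {int4_gb:.2f} GB")
print(f"estimated total: {total_gb:.1f} GB of {T4_VRAM_GB} GB")
print("fits on a T4" if total_gb < T4_VRAM_GB else "does not fit on a T4")
```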

## Hyperparameters in Plain English

- **r (rank):** Inner dimension of the adapter matrices. Higher = more capacity, more memory. r=8 is a good start.
- **alpha:** Scaling factor. The adapter update is scaled by alpha/r, so alpha controls adapter strength. A common choice is alpha = 2*r.
- **dropout:** Regularization applied to the adapter's input. 0.05 = 5% of values are randomly zeroed during training.
- **lr:** Learning rate. 2e-4 is a common starting point for LoRA.
- **grad_accum:** Effective batch size = batch_size × grad_accum. With batch_size=1, grad_accum=16 simulates a batch of 16.
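
The numbers behind these settings can be worked out directly. The sketch below assumes Mistral-7B's published dimensions (hidden size 4096, 32 layers, 8 key/value heads of 128 dims each) and the configuration used later in this notebook (r=8, alpha=16, four attention projections, 456 training samples). The helper name is hypothetical.

```python
import math

HIDDEN = 4096    # Mistral-7B hidden size
KV_DIM = 1024    # 8 key/value heads x 128 dims (grouped-query attention)
N_LAYERS = 32

def lora_params(d_in, d_out, r):
    """A LoRA adapter adds two matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

r, alpha = 8, 16
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total_params = per_layer * N_LAYERS
adapter_mb = total_params * 4 / 1e6   # adapters stored as 32-bit floats
scaling = alpha / r                   # factor applied to the adapter update

batch_size, grad_accum, n_train = 1, 16, 456
effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(n_train / effective_batch)

print(f"trainable LoRA parameters: {total_params:,}")
print(f"adapter size: {adapter_mb:.1f} MB, scaling alpha/r = {scaling}")
print(f"effective batch: {effective_batch}, optimizer steps per epoch: {steps_per_epoch}")
```

Under these assumptions the adapter comes to about 27.3 MB and one epoch to 29 optimizer steps, which agrees with the upload size and final step count recorded further down in this notebook.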

## Avoiding OOM (Out of Memory)

- Use gradient checkpointing
- Keep batch_size=1, use grad_accum for effective batch
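- Use bfloat16 where the GPU supports it (Ampere or newer, e.g. A100); it is more stable than float16. A T4 has no native bfloat16 support, so use float16 there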
- Monitor GPU memory with `nvidia-smi`
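
For monitoring from inside the notebook, one option is to call `nvidia-smi` from Python. This is a minimal sketch; the helper name `gpu_memory_mib` is hypothetical, and it returns an empty list when no NVIDIA driver is present rather than raising.

```python
import shutil
import subprocess

def gpu_memory_mib():
    """Return [(used_mib, total_mib), ...] per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    rows = []
    for line in result.stdout.strip().splitlines():
        try:
            used, total = (int(field) for field in line.split(","))
        except ValueError:
            continue  # skip lines that do not contain two integers
        rows.append((used, total))
    return rows

for index, (used, total) in enumerate(gpu_memory_mib()):
    print(f"GPU {index}: {used} / {total} MiB used")
```

Calling it before and after model loading, and again during training, gives a rough picture of how close the run is to the memory limit.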


In [33]:
# === TODO (you code this) ===
# Install GPU deps. Keep versions conservative. Verify CUDA is available.
# Hints:
#   - Install torch, transformers, peft, bitsandbytes, accelerate
#   - Use !pip install in Colab
#   - Check torch.cuda.is_available()
# Acceptance:
#   - torch.cuda.is_available() is True

import torch

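def install_gpu_reqs():
    """
    Verify CUDA availability.

    This function does not install anything. Install the dependencies first
    in a separate Colab cell, for example:
        !pip install -q torch transformers peft bitsandbytes accelerate trl datasets
    """
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available. Please enable GPU in Colab.")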



install_gpu_reqs()
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")


CUDA available: True
GPU: NVIDIA A100-SXM4-40GB


## Load Dataset

Pull the dataset from the Hub. The cell below does not implement a local CSV fallback: if the Hub load fails, the error is re-raised.


In [None]:
# ⚠️ FOR COLAB USE: Replace the placeholder below with your actual HF token
# In Colab, you can either:
# 1. Replace "YOUR_TOKEN_HERE" with your actual token (temporary, for this session)
# 2. Use: from huggingface_hub import login; login()  (recommended - stores token securely)
# 3. Set as Colab secret: HF_TOKEN in Colab secrets (most secure)

# Replace this placeholder with your actual token in Colab:
HF_TOKEN = "YOUR_TOKEN_HERE"  # Replace with your actual token in Colab! Never commit a real token.

# Alternative (recommended): Use login instead
# from huggingface_hub import login
# login()  # Enter token when prompted
# Then use: from huggingface_hub import get_token; HF_TOKEN = get_token()
# (older huggingface_hub versions: from huggingface_hub import HfFolder; HF_TOKEN = HfFolder.get_token())
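
Option 3 (Colab secrets) can be combined with an environment-variable fallback so the token never appears in the notebook. This is a sketch under assumptions: the helper name `resolve_hf_token` is hypothetical, and it expects a secret or environment variable named `HF_TOKEN`.

```python
import os

def resolve_hf_token(placeholder="YOUR_TOKEN_HERE"):
    """Return a Hub token from Colab secrets or the HF_TOKEN env var, else None."""
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get("HF_TOKEN")
    except Exception:
        # Not running in Colab, secret missing, or notebook access not granted
        token = os.environ.get("HF_TOKEN")
    if not token or token == placeholder:
        return None  # treat an unset or placeholder value as "no token"
    return token

HF_TOKEN = resolve_hf_token()
print("Token found" if HF_TOKEN else "No token found; public datasets will still load")
```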

In [35]:
# === TODO (you code this) ===
# Load dataset from HF Hub or local CSV; tokenize with seq_length from config.
# Hints:
#   - Try load_dataset() first (Hub), fallback to CSV if needed
#   - Tokenize using the function from notebook 03
#   - Set padding token if missing
# Acceptance:
#   - tokenized train/validation Datasets ready for Trainer

from datasets import load_dataset
from transformers import AutoTokenizer

import os

# Use the HF_TOKEN defined in cell 4 above
hf_token = HF_TOKEN



def load_and_tokenize(hub_id: str, base_model: str, seq_length: int):
    """
    Load dataset from Hub or CSV.
    
    NOTE: SFTTrainer expects raw text (not tokenized) and does its own tokenization.
    So we just load the dataset without tokenizing it.
    
    Args:
        hub_id: Hub dataset ID or path to CSV
        base_model: Model name for tokenizer (not used here, but kept for compatibility)
        seq_length: Maximum sequence length (not used here, but kept for compatibility)
        
    Returns:
        tuple: (train_dataset, val_dataset) with raw text
    """
    try:
        # Try loading from Hub
        dataset = load_dataset(hub_id, token=hf_token)
        print(f"✅ Loaded dataset from Hub: {hub_id}")
    except Exception as e:
        print(f"❌ Error loading from Hub: {e}")
        # No CSV fallback is implemented here, so re-raise with the original traceback
        raise

    # SFTTrainer expects raw text, so we don't tokenize here
    # It will handle tokenization internally
    train_dataset = dataset["train"]
    val_dataset = dataset["validation"]
    
    print(f"✅ Train dataset: {len(train_dataset)} samples (raw text)")
    print(f"✅ Validation dataset: {len(val_dataset)} samples (raw text)")
    print("   Note: SFTTrainer will tokenize automatically during training")
    
    return train_dataset, val_dataset

# Load the raw-text datasets (tokenization is left to SFTTrainer)
hub_id = "Tuminha/frankenstein-fanfic-snippets"
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
ds_train, ds_val = load_and_tokenize(hub_id, base_model, seq_length=512)
print(f"Train: {len(ds_train)}, Val: {len(ds_val)}")


✅ Loaded dataset from Hub: Tuminha/frankenstein-fanfic-snippets
✅ Train dataset: 456 samples (raw text)
✅ Validation dataset: 25 samples (raw text)
   Note: SFTTrainer will tokenize automatically during training
Train: 456, Val: 25


## Build 4-bit Model

Load Mistral-7B in 4-bit mode using BitsAndBytes. This is the memory-saving step.


In [36]:
# === TODO (you code this) ===
# Build 4-bit Mistral with BitsAndBytes and prepare for k-bit training.
# Hints:
#   - Use BitsAndBytesConfig with load_in_4bit=True
#   - Load model with quantization_config
#   - Enable gradient checkpointing to save memory
#   - Set tokenizer padding side
# Acceptance:
#   - model loads on GPU; gradients checkpointed; memory < 16GB on T4

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

def build_4bit_model(base_model: str):
    """
    Load model in 4-bit quantization mode.
    
    Args:
        base_model: Model name
        
    Returns:
        tuple: (model, tokenizer)
    """

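    # bf16 needs an Ampere-or-newer GPU (compute capability >= 8.0); a T4 falls back to fp16
    compute_dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16

    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=compute_dtype,  # the default (float32) is slower
    )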
    
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
    )
    
    model.gradient_checkpointing_enable()   
    model.enable_input_require_grads()
    
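    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # Set a padding token only if the tokenizer lacks one; pad on the right for training
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"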
    
    return model, tokenizer

model, tokenizer = build_4bit_model("mistralai/Mistral-7B-Instruct-v0.2")
print("4-bit model loaded!")


4-bit model loaded!


In [37]:
# Run this in a cell before Cell 8:
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "uninstall", "wandb", "-y"])
import os
os.environ["WANDB_DISABLED"] = "true"

In [38]:
# === Fix wandb/trl compatibility issue ===
# If you get error: "module 'wandb.sdk' has no attribute 'lib'"
# SOLUTION: Disable wandb (we don't need it for training - it's just for logging)

import os
import subprocess
import sys

# Disable wandb to avoid compatibility issues
os.environ["WANDB_DISABLED"] = "true"
print("✅ Disabled wandb (not needed for training)")

# Optionally uninstall wandb if it's causing issues
try:
    import wandb
    print("⚠️  wandb is installed. Uninstalling to avoid conflicts...")
    subprocess.check_call([sys.executable, "-m", "pip", "uninstall", "wandb", "-y", "--quiet"],
                         stderr=subprocess.DEVNULL)
    print("✅ Uninstalled wandb")
except ImportError:
    print("✅ wandb not installed - good!")
except subprocess.CalledProcessError:
    print("⚠️  Could not uninstall wandb (may not be installed) - that's fine")

print("\n✅ Ready to continue! wandb is disabled/uninstalled.")
print("   You can proceed to the next cell - SFTTrainer should work now.")


✅ Disabled wandb (not needed for training)
✅ wandb not installed - good!

✅ Ready to continue! wandb is disabled/uninstalled.
   You can proceed to the next cell - SFTTrainer should work now.


## Configure LoRA and Train

Set up LoRA adapters and training arguments. Then run one epoch.


In [None]:
# === Fix wandb compatibility issue ===
# wandb version conflict with trl. Fix by installing compatible version.
# Run this cell first if you get: "module 'wandb.sdk' has no attribute 'lib'"
try:
    import wandb
    # Check if wandb is causing issues
    if hasattr(wandb, 'sdk') and not hasattr(wandb.sdk, 'lib'):
        print("⚠️  wandb compatibility issue detected. Fixing...")
        import subprocess
        import sys
        subprocess.check_call([sys.executable, "-m", "pip", "install", "wandb==0.15.12", "--quiet"])
        print("✅ Fixed wandb version. Please restart runtime: Runtime → Restart runtime")
        print("   Then run this cell again.")
except ImportError:
    pass  # wandb not installed, that's fine

# === TODO (you code this) ===
# Create LoRA config and TrainingArguments; run one epoch.
# Hints:
#   - Use LoraConfig from peft with r/alpha/dropout from config
#   - Set target_modules to attention layers
#   - Use TrainingArguments with grad_accum, bf16, etc.
#   - Use SFTTrainer from trl (or Trainer from transformers)
# Acceptance:
#   - training completes; loss decreases; adapter folder saved

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments, Trainer

# Try to import SFTTrainer, with fallback to regular Trainer if wandb issue persists
try:
    from trl import SFTTrainer
    USE_SFT_TRAINER = True
except RuntimeError as e:
    if "wandb" in str(e).lower():
        print("⚠️  wandb issue detected. Installing compatible version...")
        import subprocess
        import sys
        subprocess.check_call([sys.executable, "-m", "pip", "install", "wandb==0.15.12", "--quiet", "--upgrade"])
        print("✅ Installed wandb==0.15.12. Please restart runtime: Runtime → Restart runtime")
        print("   Then run this cell again.")
        raise
    else:
        raise
except ImportError:
    print("⚠️  SFTTrainer not available. Using regular Trainer instead.")
    USE_SFT_TRAINER = False


def train_qlora(model, tokenizer, ds_train, ds_val, cfg: dict, out_dir: str):
    """
    Train LoRA adapters on 4-bit model.

    Args:
        model: 4-bit quantized model
        tokenizer: Tokenizer
        ds_train: Training dataset
        ds_val: Validation dataset
        cfg: Config dict with qlora settings
        out_dir: Output directory for adapters
    """
    # Prepare model for k-bit training
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=cfg['qlora']['r'],
        lora_alpha=cfg['qlora']['alpha'],
        lora_dropout=cfg['qlora']['dropout'],
        target_modules=cfg['qlora']['target_modules'],
        bias="none",
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_config)

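    # bf16 needs an Ampere-or-newer GPU (compute capability >= 8.0, e.g. A100);
    # older GPUs such as the T4 fall back to fp16.
    use_bf16 = torch.cuda.get_device_capability()[0] >= 8

    training_args = TrainingArguments(
        output_dir=out_dir,
        per_device_train_batch_size=cfg['train']['batch_size'],
        per_device_eval_batch_size=cfg['train']['batch_size'],
        learning_rate=cfg['qlora']['lr'],
        weight_decay=0.01,
        num_train_epochs=cfg['qlora']['epochs'],
        gradient_accumulation_steps=cfg['qlora']['grad_accum'],
        gradient_checkpointing=True,
        bf16=use_bf16,
        fp16=not use_bf16,
        optim="paged_adamw_8bit",
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        logging_dir=f"{out_dir}/logs",
        logging_strategy="steps",
        logging_steps=10,
        # With 456 samples and an effective batch of 16 there are only 29 steps,
        # so step-based saving/evaluation every 100 steps never triggers in this run.
        # Lower save_steps/eval_steps to get mid-run checkpoints and validation loss.
        save_strategy="steps",
        save_steps=100,
        save_total_limit=2,
        eval_strategy="steps",
        eval_steps=100,
        save_safetensors=True,
        report_to="none",  # replaces the deprecated WANDB_DISABLED environment variable
    )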

    # SFTTrainer's API differs across trl versions, so start with minimal
    # parameters and fall back to the regular Trainer if that fails or if
    # trl could not be imported (USE_SFT_TRAINER is False).
    trainer = None
    if USE_SFT_TRAINER:
        print("Attempting to create SFTTrainer...")
        try:
            trainer = SFTTrainer(
                model=model,
                train_dataset=ds_train,
                eval_dataset=ds_val,
                args=training_args,
            )
            print("✅ SFTTrainer created successfully with minimal parameters")
        except TypeError as e:
            error_msg = str(e)
            print(f"⚠️  SFTTrainer error: {error_msg}")
            if "unexpected keyword" not in error_msg:
                raise
            print("   SFTTrainer API doesn't match this trl version.")

    if trainer is None:
        print("   Falling back to regular Trainer (will need to tokenize dataset first)...")
        from transformers import DataCollatorForLanguageModeling

        # Tokenize the datasets; padding is left to the data collator
        def tokenize_function(examples):
            return tokenizer(
                examples["text"],
                truncation=True,
                max_length=512,
                padding=False,
            )

        print("   Tokenizing datasets...")
        tokenized_train = ds_train.map(tokenize_function, batched=True, remove_columns=["text"])
        tokenized_val = ds_val.map(tokenize_function, batched=True, remove_columns=["text"])

        # Causal LM collator (mlm=False): pads each batch dynamically and builds labels from input_ids
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=tokenizer,
            mlm=False,
        )

        trainer = Trainer(
            model=model,
            train_dataset=tokenized_train,
            eval_dataset=tokenized_val,
            args=training_args,
            data_collator=data_collator,
        )
        print("✅ Using regular Trainer with pre-tokenized dataset")

    trainer.train()

    # Create output directory if it doesn't exist
    import os
    os.makedirs(out_dir, exist_ok=True)

    trainer.save_model(out_dir)

    # Save training logs
    import json
    log_dir = f"{out_dir}/logs"
    os.makedirs(log_dir, exist_ok=True)

    log_path = f"{log_dir}/trainer_state.json"

    # Save the trainer state to the log file
    if hasattr(trainer, 'state') and trainer.state is not None:
        with open(log_path, 'w') as f:
            json.dump(trainer.state.log_history, f, indent=2)
        print(f"✅ Training logs saved to {log_path}")
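        # NOTE: log_history holds every logged entry (training loss included);
        # eval_loss entries appear only if an evaluation actually ran.
        print("Validation losses:", trainer.state.log_history)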
    else:
        print("⚠️ Trainer state not available")


# Train
cfg = {
    'qlora': {
        'r': 8,
        'alpha': 16,
        'dropout': 0.05,
        'target_modules': ['q_proj', 'k_proj', 'v_proj', 'o_proj'],
        'lr': 2.0e-4,
        'grad_accum': 16,
        'epochs': 1
    },
    'train': {
        'batch_size': 1
    }
}
train_qlora(model, tokenizer, ds_train, ds_val, cfg, out_dir="../adapters/mistral-frankenstein")
print("Training complete!")


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Attempting to create SFTTrainer...
✅ SFTTrainer created successfully with minimal parameters


Step,Training Loss,Validation Loss


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


✅ Training logs saved to ../adapters/mistral-frankenstein/logs/trainer_state.json
Validation losses: [{'loss': 2.2292, 'grad_norm': 1.2944529056549072, 'learning_rate': 0.00016234898018587337, 'entropy': 2.113223755918443, 'num_tokens': 37481.0, 'mean_token_accuracy': 0.5097053039819002, 'epoch': 0.3508771929824561, 'step': 10}, {'loss': 1.9667, 'grad_norm': 1.1977261304855347, 'learning_rate': 5.6611626088244194e-05, 'entropy': 1.9625635132193566, 'num_tokens': 74539.0, 'mean_token_accuracy': 0.5463880322873592, 'epoch': 0.7017543859649122, 'step': 20}, {'train_runtime': 222.8623, 'train_samples_per_second': 2.046, 'train_steps_per_second': 0.13, 'total_flos': 4565576052326400.0, 'train_loss': 2.0713837722252153, 'entropy': 2.0800778458223625, 'num_tokens': 106910.0, 'mean_token_accuracy': 0.5315580129185143, 'epoch': 1.0, 'step': 29}]
Training complete!


## Push Adapters to Hub

Save the adapters to the Hub so you can use them later (and share them).


In [42]:
# === TODO (you code this) ===
# Push the adapter to the Hub (private ok).
# Hints:
#   - Use adapter.push_to_hub() or model.push_to_hub()
#   - Set private=True if desired
#   - Include tokenizer if needed
# Acceptance:
#   - repo exists with adapter files; URL printed

from peft import PeftModel

def push_adapters(base_model, local_dir: str, repo_id: str, private: bool = False):
    """
    Push LoRA adapters to Hugging Face Hub.
    
    Args:
        base_model: The base model to load adapters onto
        local_dir: Local directory with adapter files
        repo_id: Hub repository ID
        private: Whether to make the repo private
    """
    # Load the adapter onto the base model
    peft_model = PeftModel.from_pretrained(base_model, local_dir)
    
    # Push to hub
    peft_model.push_to_hub(repo_id, private=private)
    
    # Also push tokenizer for convenience
    tokenizer.push_to_hub(repo_id, private=private)
    
    print(f"✅ Adapters pushed to Hub: https://huggingface.co/{repo_id}")
    

# Use the same directory that train_qlora() saved to above
push_adapters(model, "../adapters/mistral-frankenstein", "Tuminha/mistral-frankenstein-qlora", private=False)
print("Adapters pushed to Hub!")




✅ Adapters pushed to Hub: https://huggingface.co/Tuminha/mistral-frankenstein-qlora
Adapters pushed to Hub!
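
Once the adapters are on the Hub, using them later amounts to loading the 4-bit base model and attaching the adapter on top. The following is a sketch rather than a tested recipe for this repo: the function names are hypothetical, fp16 is assumed as the compute dtype, and it relies on the tokenizer having been pushed to the same repo as shown above.

```python
def load_adapter_for_inference(base_id, adapter_id):
    """Load the 4-bit base model and attach LoRA adapters from the Hub."""
    # Imports live inside the function so that defining it needs no GPU stack
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 works on any CUDA GPU
    )
    base = AutoModelForCausalLM.from_pretrained(
        base_id, quantization_config=quant_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_id)  # downloads the adapter weights
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # tokenizer pushed alongside the adapter
    return model, tokenizer

def generate(model, tokenizer, prompt, max_new_tokens=100):
    """Generate a continuation for a single prompt."""
    import torch
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example usage (requires a GPU and network access):
# model, tokenizer = load_adapter_for_inference(
#     "mistralai/Mistral-7B-Instruct-v0.2", "Tuminha/mistral-frankenstein-qlora"
# )
# print(generate(model, tokenizer, "It was on a dreary night of November"))
```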
