# Notebook Overview

This notebook fine-tunes a LLaMA-2 causal language model with LoRA on a JSONL corpus, loading the base model in 4-bit quantization to save memory. The result is a small LoRA adapter, saved together with the tokenizer, that can be loaded on top of the base model for downstream inference.

## Config & Data

Paths set for base model, dataset, and output directory.

Dataset loaded via datasets from testing_85.jsonl, shuffled (seed 42), and split 90/5/5 (train/val/test). The test split is created but not tokenized or used in the cells below.

## Tokenization

LLaMA-2 tokenizer loaded; pad_token set to eos_token because the LLaMA tokenizer defines no pad token by default.

Examples tokenized with truncation at max_length=384 (no padding at encode time).

Columns mapped to model inputs; original text removed.

## Batching / Collation

DataCollatorForLanguageModeling with mlm=False (causal LM objective).

Dynamic padding to multiples of 64 for tensor efficiency; group_by_length=True (set in TrainingArguments) batches similar-length examples together to reduce padding.

## Model Loading with 4-bit + LoRA

Base model loaded with BitsAndBytes: 4-bit NF4 quantization, bfloat16 compute, device_map="auto".

Prepared for k-bit training, gradient checkpointing enabled, use_cache=False.

LoRA config: r=16, alpha=8, dropout=0.05, targeting q_proj, k_proj, v_proj, o_proj on a CAUSAL_LM task.

PEFT applied; prints trainable parameter count.
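
As a rough illustration of why 4-bit loading matters, the sketch below estimates weight storage for the base model at 16-bit versus NF4. The parameter count comes from the notebook's own log further down (6,755,192,832 total minus 16,777,216 LoRA parameters). The block size of 64 with one 32-bit scale per block is an assumption about the quantizer's defaults, and the estimate treats every weight as quantized, which overstates the saving because embeddings and norms stay in higher precision.

```python
# Rough weight-memory estimate; not a measurement.
BASE_PARAMS = 6_755_192_832 - 16_777_216  # total minus LoRA params, from the log


def weight_gib(n_params: int, bits_per_param: float) -> float:
    """Storage for n_params weights at the given precision, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: NF4 stores 4 bits per weight plus one fp32 scale per 64 weights.
NF4_BITS = 4 + 32 / 64

bf16_gib = weight_gib(BASE_PARAMS, 16)
nf4_gib = weight_gib(BASE_PARAMS, NF4_BITS)
print(f"bf16: {bf16_gib:.2f} GiB, nf4: {nf4_gib:.2f} GiB")
```

Activations, optimizer state for the adapter, and gradient-checkpointing buffers come on top of this, so actual GPU usage will be higher.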

## Training Setup

TrainingArguments: 2 epochs; per-device batch size 16 (train/eval); grad accumulation=2 (32 training examples per optimizer step per device); cosine LR schedule; lr=2e-4; bf16=True.

Eval every 1000 steps; save every 2000 (keep 3 checkpoints); optim="paged_adamw_8bit".

Logging to output_dir/logs; dataloader_num_workers=8; max_grad_norm=0.3; warmup_ratio=0.03.

## Trainer & Callback

Trainer constructed with model, data, tokenizer, and collator.

Custom SpeedCallback prints throughput every 20 steps.

## Run & Save

Clears CUDA cache, runs trainer.train().

Saves the LoRA adapter weights and tokenizer to .../llama2_qa_lora_output/final.

In [1]:
import torch
from pathlib import Path
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    TrainingArguments, Trainer, BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Configuration
model_path = "/mnt/data/llama2-model"  
data_path = "/mnt/data/testing_85.jsonl" 
output_dir = "/mnt/data/llama2_qa_lora_output"

In [2]:
# Load Dataset
print("📦 Loading dataset...")
dataset = load_dataset("json", data_files=data_path, split="train")
dataset = dataset.shuffle(seed=42)

📦 Loading dataset...
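
The tokenization cell further down reads `example["text"]`, so each line of the JSONL file is expected to be a JSON object with a `text` field. Below is a minimal sketch of a pre-flight check that uses only the standard library. The helper name `count_bad_records` and the sample records are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def count_bad_records(path: Path, field: str = "text") -> int:
    """Count lines that are not JSON objects with a non-empty string `field`."""
    bad = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad += 1
                continue
            value = record.get(field) if isinstance(record, dict) else None
            if not isinstance(value, str) or not value.strip():
                bad += 1
    return bad


# Hypothetical sample: two valid records, one with the wrong field, one malformed.
sample = [
    '{"text": "Question: ... Answer: ..."}',
    '{"text": "Another training example"}',
    '{"prompt": "wrong field name"}',
    '{not valid json',
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text("\n".join(sample) + "\n", encoding="utf-8")
    bad_count = count_bad_records(sample_path)

print(bad_count)
```

Running such a check on the real file before tokenization would surface malformed rows early instead of inside the multi-process `map` call.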


In [3]:
# Optional: uncomment to train on a 2,000-example subset for a quick test run
# dataset = dataset.select(range(2000))

In [4]:
# 3-Way Split
split = dataset.train_test_split(test_size=0.10, seed=42)
val_test = split["test"].train_test_split(test_size=0.5, seed=42)
train_dataset = split["train"]
val_dataset = val_test["train"]
test_dataset = val_test["test"]
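
The two-stage split above first holds out 10% of the data, then halves that holdout into validation and test sets, giving roughly 90/5/5. A small sketch of the arithmetic follows. The function name is hypothetical and the rounding is illustrative, so the sizes produced by `train_test_split` may differ by a sample.

```python
def three_way_sizes(n: int, holdout: float = 0.10, test_share: float = 0.5):
    """Approximate (train, val, test) sizes for the two-stage split."""
    n_holdout = round(n * holdout)          # first split: test_size=0.10
    n_test = round(n_holdout * test_share)  # second split: test_size=0.5
    n_val = n_holdout - n_test
    n_train = n - n_holdout
    return n_train, n_val, n_test


sizes = three_way_sizes(10_000)
print(sizes)
```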

In [5]:
# Tokenizer
print("🔤 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize without padding; the data collator pads each batch dynamically
def tokenize(example):
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=384  # truncate to at most 384 tokens
    )

train_dataset = train_dataset.map(tokenize, batched=True, num_proc=4, remove_columns=["text"])
val_dataset = val_dataset.map(tokenize, batched=True, num_proc=4, remove_columns=["text"])

🔤 Loading tokenizer...


In [6]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
    pad_to_multiple_of=64  
)
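
To make the collator's behaviour concrete, here is a plain-Python sketch of what dynamic padding with `pad_to_multiple_of=64` and the causal-LM label rule amount to. It is an illustration, not the library's implementation. The function name and token IDs are hypothetical, padding is placed on the right for simplicity, and ID 2 serves as both EOS and pad to mirror `pad_token = eos_token`.

```python
PAD_ID = 2           # same ID as EOS, mirroring pad_token = eos_token
IGNORE_INDEX = -100  # label value that the loss skips


def collate(batch, pad_id=PAD_ID, multiple=64):
    """Pad a batch of token-ID lists to a shared length and build labels."""
    longest = max(len(ids) for ids in batch)
    # Round the batch length up to the next multiple (e.g. 70 -> 128).
    target = -(-longest // multiple) * multiple
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = target - len(ids)
        padded = ids + [pad_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs, with every pad-ID position ignored.
        labels.append([IGNORE_INDEX if t == pad_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical batch: the second sequence ends with a genuine EOS (ID 2).
batch = [[1] + [500] * 69, [1, 501, 502, 2]]
out = collate(batch)
print(len(out["input_ids"][0]), out["labels"][1][:5])
```

Because padding and EOS share an ID, the genuine EOS in the second sequence is masked out of the loss too. Whether that matters depends on whether the training text contains EOS tokens at all.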

In [7]:
# Load Model with LoRA 
print("🧠 Loading LLaMA-2 with LoRA...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16
)
base_model = prepare_model_for_kbit_training(base_model)

base_model.gradient_checkpointing_enable()
base_model.config.use_cache = False

lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], 
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

🧠 Loading LLaMA-2 with LoRA...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.24836028248556738
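
The trainable-parameter count printed above can be reproduced by hand. The sketch assumes the 7B LLaMA-2 shape (32 decoder layers, hidden size 4096, square attention projections). That is an assumption about the checkpoint at `model_path`, though it matches the logged totals.

```python
HIDDEN = 4096  # assumed hidden size (LLaMA-2 7B)
LAYERS = 32    # assumed number of decoder layers
R, ALPHA = 16, 8
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to one linear layer."""
    return r * d_in + d_out * r


per_layer = sum(lora_params(HIDDEN, HIDDEN, R) for _ in TARGETS)
trainable = per_layer * LAYERS
total = 6_755_192_832  # "all params" from the log above
scaling = ALPHA / R    # factor applied to the LoRA update

print(trainable, f"{100 * trainable / total:.4f}%", scaling)
```

With `lora_alpha=8` and `r=16` the low-rank update is scaled by 0.5, so this configuration applies the adapter more conservatively than the common choice of alpha equal to or larger than r.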


In [8]:
# Training Arguments
print("⚙️ Setting up training...")
args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    eval_strategy="steps",
    eval_steps=1000,
    save_steps=2000,
    save_total_limit=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    logging_dir=f"{output_dir}/logs",
    logging_steps=100,
    bf16=True,
    report_to="none",
    remove_unused_columns=False,
    dataloader_num_workers=8,
    group_by_length=True,
    optim="paged_adamw_8bit",
    max_grad_norm=0.3,
    warmup_ratio=0.03
)

⚙️ Setting up training...
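
The schedule implied by these arguments can be sketched in a few lines: linear warmup over the first 3% of steps, then cosine decay to zero. The total of 3194 optimizer steps is taken from the `TrainOutput` at the end of the run. Rounding the warmup length up and the single-device effective batch size are assumptions.

```python
import math

BASE_LR = 2e-4
TOTAL_STEPS = 3194                            # global_step from the final TrainOutput
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)  # assumed rounding of warmup_ratio
EFFECTIVE_BATCH = 16 * 2                      # per-device batch x grad accumulation, one device assumed


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, 1000, 2000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```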


In [9]:
# Trainer Setup
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator
)

  trainer = Trainer(
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [10]:
import time
from transformers import TrainerCallback

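class SpeedCallback(TrainerCallback):
    """Print optimizer-step throughput every 20 steps."""
    def __init__(self):
        # The first interval is timed from construction, so it includes setup time.
        self.last_time = time.time()
    def on_step_end(self, args, state, control, **kwargs):
        # Intervals that span an evaluation or checkpoint save report a lower rate.
        if state.global_step % 20 == 0: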
            now = time.time()
            duration = now - self.last_time
            print(f"⚡ Step {state.global_step} — {20/duration:.3f} it/s")
            self.last_time = now

trainer.add_callback(SpeedCallback())

In [11]:
# Start Training
print("🚀 Starting fine-tuning...")
torch.cuda.empty_cache()
trainer.train()

🚀 Starting fine-tuning...


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 1000 | 1.2281 | 1.231933 |
| 2000 | 1.1414 | 1.161807 |
| 3000 | 1.1251 | 1.140821 |


⚡ Step 20 — 0.064 it/s
⚡ Step 40 — 0.068 it/s
⚡ Step 60 — 0.075 it/s
⚡ Step 80 — 0.067 it/s
⚡ Step 100 — 0.074 it/s
⚡ Step 120 — 0.068 it/s
⚡ Step 140 — 0.068 it/s
⚡ Step 160 — 0.075 it/s
⚡ Step 180 — 0.067 it/s
⚡ Step 200 — 0.075 it/s
⚡ Step 220 — 0.068 it/s
⚡ Step 240 — 0.067 it/s
⚡ Step 260 — 0.075 it/s
⚡ Step 280 — 0.067 it/s
⚡ Step 300 — 0.073 it/s
⚡ Step 320 — 0.068 it/s
⚡ Step 340 — 0.068 it/s
⚡ Step 360 — 0.075 it/s
⚡ Step 380 — 0.067 it/s
⚡ Step 400 — 0.075 it/s
⚡ Step 420 — 0.068 it/s
⚡ Step 440 — 0.068 it/s
⚡ Step 460 — 0.075 it/s
⚡ Step 480 — 0.067 it/s
⚡ Step 500 — 0.074 it/s
⚡ Step 520 — 0.068 it/s
⚡ Step 540 — 0.068 it/s
⚡ Step 560 — 0.075 it/s
⚡ Step 580 — 0.067 it/s
⚡ Step 600 — 0.074 it/s
⚡ Step 620 — 0.068 it/s
⚡ Step 640 — 0.067 it/s
⚡ Step 660 — 0.075 it/s
⚡ Step 680 — 0.067 it/s
⚡ Step 700 — 0.075 it/s
⚡ Step 720 — 0.068 it/s
⚡ Step 740 — 0.067 it/s
⚡ Step 760 — 0.075 it/s
⚡ Step 780 — 0.067 it/s
⚡ Step 800 — 0.073 it/s
⚡ Step 820 — 0.068 it/s
⚡ Step 840 — 0.068 it/s



⚡ Step 2020 — 0.029 it/s
⚡ Step 2040 — 0.069 it/s
⚡ Step 2060 — 0.073 it/s
⚡ Step 2080 — 0.067 it/s
⚡ Step 2100 — 0.076 it/s
⚡ Step 2120 — 0.068 it/s
⚡ Step 2140 — 0.070 it/s
⚡ Step 2160 — 0.073 it/s
⚡ Step 2180 — 0.067 it/s
⚡ Step 2200 — 0.075 it/s
⚡ Step 2220 — 0.068 it/s
⚡ Step 2240 — 0.070 it/s
⚡ Step 2260 — 0.072 it/s
⚡ Step 2280 — 0.067 it/s
⚡ Step 2300 — 0.075 it/s
⚡ Step 2320 — 0.068 it/s
⚡ Step 2340 — 0.070 it/s
⚡ Step 2360 — 0.073 it/s
⚡ Step 2380 — 0.067 it/s
⚡ Step 2400 — 0.075 it/s
⚡ Step 2420 — 0.068 it/s
⚡ Step 2440 — 0.069 it/s
⚡ Step 2460 — 0.073 it/s
⚡ Step 2480 — 0.067 it/s
⚡ Step 2500 — 0.075 it/s
⚡ Step 2520 — 0.068 it/s
⚡ Step 2540 — 0.069 it/s
⚡ Step 2560 — 0.072 it/s
⚡ Step 2580 — 0.067 it/s
⚡ Step 2600 — 0.075 it/s
⚡ Step 2620 — 0.068 it/s
⚡ Step 2640 — 0.069 it/s
⚡ Step 2660 — 0.073 it/s
⚡ Step 2680 — 0.067 it/s
⚡ Step 2700 — 0.075 it/s
⚡ Step 2720 — 0.068 it/s
⚡ Step 2740 — 0.069 it/s
⚡ Step 2760 — 0.073 it/s
⚡ Step 2780 — 0.067 it/s
⚡ Step 2800 — 0.075 it/s




TrainOutput(global_step=3194, training_loss=1.215690178056023, metrics={'train_runtime': 46689.8955, 'train_samples_per_second': 2.189, 'train_steps_per_second': 0.068, 'total_flos': 1.4987055364791337e+18, 'train_loss': 1.215690178056023, 'epoch': 2.0})
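
The reported metrics are mutually consistent, which the following arithmetic sketch checks. All inputs are copied from the `TrainOutput` and the speed log above. The effective batch size of 32 assumes a single device, and the overhead figure is only a rough estimate from one logged interval.

```python
STEPS = 3194
RUNTIME_S = 46689.8955
SAMPLES_PER_S = 2.189
EPOCHS = 2
EFFECTIVE_BATCH = 16 * 2  # assumes one device

steps_per_s = STEPS / RUNTIME_S
hours = RUNTIME_S / 3600

# Training samples per epoch, estimated two independent ways.
from_rate = SAMPLES_PER_S * RUNTIME_S / EPOCHS
from_steps = STEPS / EPOCHS * EFFECTIVE_BATCH

# The interval 2000 -> 2020 logged 0.029 it/s against roughly 0.07 it/s elsewhere;
# the extra time is plausibly the evaluation and checkpoint save at step 2000.
slow_interval_s = 20 / 0.029
typical_interval_s = 20 / 0.07
overhead_min = (slow_interval_s - typical_interval_s) / 60

print(f"{steps_per_s:.3f} steps/s over {hours:.1f} h")
print(f"~{from_rate:.0f} vs {from_steps:.0f} samples per epoch")
print(f"~{overhead_min:.1f} min eval/save overhead")
```

Both estimates land near 51,000 training examples per epoch, which supports the single-device assumption.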

In [12]:
# Save Model + Tokenizer
print("💾 Saving model...")
trainer.save_model(f"{output_dir}/final")
tokenizer.save_pretrained(f"{output_dir}/final")

💾 Saving model...




('/mnt/data/llama2_qa_lora_output/final/tokenizer_config.json',
 '/mnt/data/llama2_qa_lora_output/final/special_tokens_map.json',
 '/mnt/data/llama2_qa_lora_output/final/tokenizer.model',
 '/mnt/data/llama2_qa_lora_output/final/added_tokens.json',
 '/mnt/data/llama2_qa_lora_output/final/tokenizer.json')