# 🔍 Predicting Item Prices from Descriptions (Part 6)
---
- Data Curation & Preprocessing
- Model Benchmarking – Traditional ML vs LLMs
- E5 Embeddings & RAG
- Fine-Tuning GPT-4o Mini
- Evaluating LLaMA 3.1 8B Quantized
- ➡️ Fine-Tuning LLaMA 3.1 with QLoRA
- Evaluating Fine-Tuned LLaMA
- Summary & Leaderboard

---

# ⚙️ Part 6: Fine-Tuning LLaMA 3.1 with QLoRA

- 🧑‍💻 Skill Level: Advanced
- ⚙️ Hardware: ⚠️ GPU required - use Google Colab (A100)
- 🛠️ Requirements: 🔑 HF Token, wandb API Key ([Weights & Biases](https://wandb.ai))
- Tasks:
    - Load and split dataset (Train/validation); set up [Weights & Biases](https://wandb.ai) logging
    - Load quantized LLaMA 3.1 8B and tokenizer
    - Prepare data with a collator for fine-tuning
    - Configure QLoRA (LoraConfig), training settings (SFTConfig), and tune key hyperparameters
    - Fine-tune and push best model to Hugging Face Hub

⚠️ I attempted to fine-tune the model on the full 400K dataset using an A100 on Google Colab, but it consistently crashed. So for now, I’m training on a 20K subset to understand the process, play with hyperparameters, track progress in Weights & Biases, and push the best checkpoint to the Hub.

⏱️ Training on 20,000 examples took over 2 hours.

The full model fine-tuned on the complete 400K dataset is available thanks to our instructor, [Ed](https://www.linkedin.com/in/eddonner) — much appreciated!  
We’ll dive into that model in the next notebook — **stay tuned** 😉

---
📢 Find more LLM notebooks on my [GitHub repository](https://github.com/lisekarimi/lexo)

In [None]:
# Install required packages in Google Colab
%pip install -q datasets transformers torch peft bitsandbytes trl accelerate wandb

In [None]:
# imports

import os
import torch
import wandb
from google.colab import userdata
from datetime import datetime
from datasets import load_dataset
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, EarlyStoppingCallback
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig, DataCollatorForCompletionOnlyLM

In [None]:
# Google Colab user data
# Requires an HF_TOKEN entry in Colab's Secrets panel:
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

## 🔀 Load Dataset from HF and Split into Train/Validation

In [None]:
# If you hit "NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported",
# upgrade datasets (Google Colab) by running:
# %pip install -U datasets

In [None]:
HF_USER = "lisekarimi" # your HF name here!

DATASET_NAME = f"{HF_USER}/pricer-data"
dataset = load_dataset(DATASET_NAME)
train = dataset['train']
test = dataset['test']
split_ratio = 0.1  # 10% for validation

##############################################################################
# Optional: limit the training dataset to TRAIN_SIZE examples for testing/debugging
# Comment out the two lines below to use the full dataset
TRAIN_SIZE = 20000
train = train.select(range(TRAIN_SIZE))
##############################################################################

total_size = len(train)
val_size = int(total_size * split_ratio)

val_data = train.select(range(val_size))
train_data = train.select(range(val_size, total_size))


In [None]:
print(f"Train data size     : {len(train_data)}")
print(f"Validation data size: {len(val_data)}")
print(f"Test data size      : {len(test)}")

## 🛠️ Hugging Face Configuration

In [None]:
PROJECT_NAME = "llama3-pricer"

# Run name for saving the model in the hub

RUN_NAME =  f"{datetime.now():%Y-%m-%d_%H.%M.%S}-size{total_size}"
PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}"
HUB_MODEL_NAME = f"{HF_USER}/{PROJECT_RUN_NAME}"
HUB_MODEL_NAME

## 🛠️ wandb Configuration

In [None]:
# Load from Colab's secure storage
wandb_api_key = userdata.get('WANDB_API_KEY')

# Load from environment variables (.env file) if running locally (GPU setup)
# wandb_api_key = os.getenv('WANDB_API_KEY')

In [None]:
os.environ["WANDB_API_KEY"] = wandb_api_key
wandb.login()

In [None]:
# Configure Weights & Biases to record against our project

LOG_TO_WANDB = True

os.environ["WANDB_PROJECT"] = PROJECT_NAME
os.environ["WANDB_LOG_MODEL"] = "checkpoint" if LOG_TO_WANDB else "end"
os.environ["WANDB_WATCH"] = "gradients"

if LOG_TO_WANDB:
  wandb.init(project=PROJECT_NAME, name=RUN_NAME)

## 📥 Load the Tokenizer and Model
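
Before loading, it can help to estimate roughly what memory footprint to expect from the 4-bit model. The sketch below is a back-of-the-envelope calculation, not a measurement. The layer sizes are assumptions taken from the published Llama 3.1 8B config (they are not defined in this notebook), and it assumes that only the linear layers inside the transformer blocks are stored in 4 bits while the embeddings and output head stay in 16 bits. Quantization constants and norm weights are ignored.

```python
# Assumed Llama 3.1 8B dimensions (from the published model config, not from this notebook)
VOCAB_SIZE = 128_256
HIDDEN_SIZE = 4096
KV_SIZE = 1024             # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE_SIZE = 14_336
NUM_LAYERS = 32

# Weights in one transformer block
attention_params = 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * KV_SIZE  # q, o and k, v
mlp_params = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE                              # gate, up, down

linear_params = NUM_LAYERS * (attention_params + mlp_params)
embedding_params = 2 * VOCAB_SIZE * HIDDEN_SIZE  # input embeddings + output head

# Assumption: block linears use 4 bits (0.5 byte), embeddings/head use 16 bits (2 bytes)
estimated_bytes = linear_params * 0.5 + embedding_params * 2

print(f"Linear-layer parameters : {linear_params:,}")
print(f"Embedding/head parameters: {embedding_params:,}")
print(f"Estimated footprint      : {estimated_bytes / 1e6:,.1f} MB")
```

If the value printed by `get_memory_footprint()` lands far from this estimate, that is a hint that the quantization config was not applied as intended.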

In [None]:
BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Reduce the precision to 4 bits
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
base_model.generation_config.pad_token_id = tokenizer.pad_token_id

print(f"Memory footprint: {base_model.get_memory_footprint() / 1e6:.1f} MB")

## ⚙️ Fine-tune our LLaMA 3.1 8B (4-bit quantized) model with QLoRA
- 1. Prepare the Data with a Data Collator
- 2. Define the QLoRA Configuration (LoraConfig)
- 3. Set the Training Parameters (SFTConfig)
- 4. Initialize the Fine-Tuning Trainer (SFTTrainer)
- 5. Run Fine-Tuning and Push to Hub

### 🔄 1. Prepare the Data with a Data Collator

We only want the model to learn the price, not the product description. Everything up to and including "Price is $" is context, not a training target. TRL's DataCollatorForCompletionOnlyLM handles this masking automatically:

1. Tokenizes the response_template ("Price is $")
2. Finds its token position in each input
3. Masks all tokens up to and including the template (context, no loss)
4. Trains the model only on the tokens after it (the price)


Example:

Input: "Product: Red T-shirt. Price is $12.99"

Masked: "Product: Red T-shirt. Price is $" → masked (no loss)

"12.99" → not masked (model is trained to predict this)

So the model learns to generate 12.99 given the context, but isn’t trained to repeat or memorize the description.
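
The masking steps above can be sketched in plain Python. This is an illustration of the idea, not TRL's actual implementation: `find_subsequence`, `mask_prompt`, and the whitespace "tokenizer" are hypothetical helpers made up for this example. The only real convention used is the label value `-100`, which PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt(input_ids, template_ids):
    """Build labels that hide everything up to and including the response template."""
    start = find_subsequence(input_ids, template_ids)
    if start == -1:
        # Choice made for this sketch: without a template, the example contributes no loss
        return [IGNORE_INDEX] * len(input_ids)
    end = start + len(template_ids)
    return [IGNORE_INDEX] * end + list(input_ids[end:])


# Toy "tokenizer" (hypothetical): one id per whitespace-separated piece
text = "Product: Red T-shirt. Price is $ 12 .99"
pieces = text.split()
vocab = {piece: idx for idx, piece in enumerate(dict.fromkeys(pieces))}

input_ids = [vocab[piece] for piece in pieces]
template_ids = [vocab[piece] for piece in "Price is $".split()]
labels = mask_prompt(input_ids, template_ids)

for piece, label in zip(pieces, labels):
    status = "masked" if label == IGNORE_INDEX else "trained"
    print(f"{piece:10s} -> {status}")
```

Only the last two pieces (the price) keep their labels, so they are the only positions that contribute to the loss.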

In [None]:
response_template = "Price is $"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

### 🧠 2. Define the QLoRA Configuration (LoraConfig)
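
To get a feel for what `r` and `target_modules` mean in terms of trainable weights, the sketch below counts the LoRA parameters implied by the settings used in this section (`r=32`, `lora_alpha=64`, four attention projections). The projection shapes are assumptions taken from the published Llama 3.1 8B config, not something this notebook defines. Each adapted layer adds two small matrices, A (`r x in_features`) and B (`out_features x r`), and the update is scaled by `alpha / r`.

```python
# Assumed Llama 3.1 8B shapes (from the published model config, not from this notebook)
HIDDEN_SIZE = 4096
KV_SIZE = 1024     # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

LORA_R = 32
LORA_ALPHA = 64

# (in_features, out_features) of each targeted projection
module_shapes = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_param_count(in_features, out_features, r):
    # A has r * in_features weights, B has out_features * r weights
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_R) for i, o in module_shapes.values())
total_lora_params = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R

print(f"Trainable LoRA parameters: {total_lora_params:,}")
print(f"Adapter size at 32 bits  : {total_lora_params * 4 / 1e6:.1f} MB")
print(f"Scaling factor (alpha/r) : {scaling}")
```

Under these assumptions the adapter is a tiny fraction of the 8B base model, which is why only the adapter weights need to be trained and uploaded.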

In [None]:
LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]
LORA_DROPOUT = 0.1

lora_parameters = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM", # Specifies we're doing causal language modeling
)

### ⚙️ 3. Set the Training Parameters (SFTConfig)
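
The step-based settings in this section (evaluation, logging, checkpointing, early stopping) are easier to reason about once you know how many optimizer steps one epoch contains. The sketch below works that out for the 20K subset. It assumes a single GPU, and the exact step count may differ by one depending on how the installed Trainer version treats the last partial batch.

```python
import math

TOTAL_EXAMPLES = 20_000
SPLIT_RATIO = 0.1
BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
EVAL_STEPS = 20
SAVE_STEPS = 200
WARMUP_RATIO = 0.03
EARLY_STOPPING_PATIENCE = 5

val_size = int(TOTAL_EXAMPLES * SPLIT_RATIO)
train_size = TOTAL_EXAMPLES - val_size

# One optimizer step consumes BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS examples
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
steps_per_epoch = math.ceil(train_size / effective_batch)

evaluations = steps_per_epoch // EVAL_STEPS
periodic_checkpoints = steps_per_epoch // SAVE_STEPS
warmup_steps = math.ceil(WARMUP_RATIO * steps_per_epoch)
patience_window = EARLY_STOPPING_PATIENCE * EVAL_STEPS

print(f"Train / validation examples : {train_size} / {val_size}")
print(f"Effective batch size        : {effective_batch}")
print(f"Optimizer steps per epoch   : {steps_per_epoch}")
print(f"Evaluations during the epoch: {evaluations}")
print(f"Checkpoints every SAVE_STEPS: {periodic_checkpoints}")
print(f"Warmup steps                : {warmup_steps}")
print(f"Early-stopping window       : {patience_window} steps without improvement")
```

Note that early-stopping patience counts evaluations, not training steps, so a patience of 5 with an evaluation every 20 steps corresponds to 100 optimizer steps.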

In [None]:
# 📦 Training Setup:
EPOCHS = 1
BATCH_SIZE = 16                     # A100 GPU can go up to 16
GRADIENT_ACCUMULATION_STEPS = 2
MAX_SEQUENCE_LENGTH = 182          # Max token length per input

# ⚙️ Optimization:
LEARNING_RATE = 1e-4
LR_SCHEDULER_TYPE = 'cosine'
WARMUP_RATIO = 0.03
OPTIMIZER = "paged_adamw_32bit"

# 💾 Checkpointing & Logging:
SAVE_STEPS = 200        # Checkpoint
STEPS = 20              # Log every 20 steps
save_total_limit = 10   # Keep latest 10 only


LOG_TO_WANDB = True

HUB_MODEL_NAME = f"{HF_USER}/{PROJECT_RUN_NAME}"

train_parameters = SFTConfig(
    # Output & Run
    output_dir=PROJECT_RUN_NAME,
    run_name=RUN_NAME,
    dataset_text_field="text",
    max_seq_length=MAX_SEQUENCE_LENGTH,

    # Training
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    max_steps=-1,
    group_by_length=True,

    # Evaluation
    eval_strategy="steps",
    eval_steps=STEPS,
    per_device_eval_batch_size=1,

    # Optimization
    learning_rate=LEARNING_RATE,
    lr_scheduler_type=LR_SCHEDULER_TYPE,
    warmup_ratio=WARMUP_RATIO,
    optim=OPTIMIZER,
    weight_decay=0.001,
    max_grad_norm=0.3,

    # Precision
    fp16=False,
    bf16=True,

    # Logging & Saving
    logging_steps=STEPS,            # Log training loss every STEPS optimizer steps
    save_strategy="steps",
    save_steps=SAVE_STEPS,          # Model Checkpointed locally
    save_total_limit=save_total_limit,
    report_to="wandb" if LOG_TO_WANDB else "none",  # "none" disables all integrations

    # Hub
    push_to_hub=True,
    hub_strategy="end",  # Only push once, at the end
    load_best_model_at_end=True, # Loads the best eval_loss checkpoint
    metric_for_best_model="eval_loss", # Monitors eval_loss
    greater_is_better=False, # Lower eval_loss = better model
)


### 🧩 4. Initialize the Fine-Tuning Trainer (SFTTrainer)
Combining everything

In [None]:
# The latest version of trl is showing a warning about labels - please ignore this warning
fine_tuning = SFTTrainer(
    model=base_model,
    train_dataset=train_data,
    eval_dataset=val_data,
    peft_config=lora_parameters,    # QLoRA config
    args=train_parameters,          # SFTConfig
    data_collator=collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)] # Stop early if eval_loss fails to improve for 5 consecutive evaluations
)

### 🚀 5. Run Fine-Tuning and Push to Hub

In [None]:
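fine_tuning.train()
# With hub_strategy="end", the upload is triggered by an explicit save/push, so push the loaded best checkpoint now
fine_tuning.push_to_hub()
print(f"✅ Best model pushed to HF Hub: {HUB_MODEL_NAME}")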

![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_train_eval_loss_steps.png?raw=true)

![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_train_eval_loss_wandb.png?raw=true)

This chart shows training loss vs. evaluation loss over steps during fine-tuning of the 4-bit LLaMA 3.1 8B model on the 20K subset.

- Blue line (train/loss): Decreasing overall, with some noise. Final value: 1.8596.
- Orange line (eval/loss): Smoother and consistently lower than training loss. Final value: 1.8103.

- No sign of overfitting: eval loss keeps falling and never diverges from train loss. Eval loss sitting slightly below train loss is expected here, since LoRA dropout is active only during training.
- Stable convergence: Both curves flatten around step 500, suggesting the model is stabilizing (the cosine schedule has also decayed the learning rate close to zero by then).
- Final eval loss (1.8103) is close to final train loss (1.8596), indicating decent generalization to the validation split.

This fine-tuning run looks healthy. The next lever is more data: the full 400K run.
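
One way to make these loss values more tangible is to convert them to perplexity. The short sketch below does that for the two final values quoted above. It assumes the logged loss is the mean per-token cross-entropy (in nats) over the unmasked price tokens, in which case `exp(loss)` can be read as the effective number of equally likely choices the model is hesitating between at each price token.

```python
import math

# Final values read off the chart above
final_losses = {"train/loss": 1.8596, "eval/loss": 1.8103}

perplexities = {name: math.exp(loss) for name, loss in final_losses.items()}

for name, loss in final_losses.items():
    print(f"{name}: loss={loss:.4f} -> perplexity={perplexities[name]:.2f}")
```

Perplexity is only a proxy here. What matters for the pricer is the dollar error on the test set, which is measured in the next notebook.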

In [None]:
if LOG_TO_WANDB:
  wandb.finish()

![](https://github.com/lisek75/nlp_llms_notebook/blob/main/assets/09_run_summary_qlora_llama.png?raw=true)

Now that our best model is pushed to Hugging Face, let’s put it to the test.

🔜 See you in the [next notebook](https://github.com/lisekarimi/lexo/blob/main/09_part7_eval_llama_qlora.ipynb)