# Hands-on: Finetuning Qwen with QLoRA

**Objective:** Finetune a Qwen model (this notebook uses `Qwen/Qwen3-1.7B`) on a tweet sentiment analysis dataset using **QLoRA** (Quantized Low-Rank Adaptation).

**What is QLoRA?**
Full finetuning of large models requires massive memory. QLoRA allows us to:
1.  **Quantize** the base model to 4-bit (shrinking weight memory by up to ~4x relative to 16-bit).
2.  **Freeze** the base model parameters.
3.  **Train** only a tiny adapter layer (LoRA) on top.
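
The three steps above can be sketched in plain Python. This is only an illustration of the LoRA idea (a frozen weight `W` plus a scaled low-rank update `B A`), not the `peft` implementation; the toy dimensions and the helper names `matvec` and `lora_forward` are made up for this example.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen base weight (d_out x d_in). Only A (r x d_in) and
    B (d_out x r) would receive gradient updates during training.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4  # toy sizes, chosen for illustration
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x, alpha, r)
print("adapter params:", r * (d_in + d_out), "vs full matrix:", d_in * d_out)
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly; training then moves only `A` and `B`.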

**Workflow:**
1.  **Setup:** Install libraries and log in.
2.  **Data:** Prepare the Tweet Sentiment dataset.
3.  **Model:** Load Qwen in 4-bit precision.
4.  **Training:** Train the LoRA adapter.
5.  **Inference:** Test the fine-tuned model.

## 1. Environment Setup
We need `bitsandbytes` for quantization, `peft` for LoRA adapters, and `trl` (Transformer Reinforcement Learning) for its fine-tuning utilities.

In [None]:
# Install required packages (run this cell once)
!pip install -q transformers datasets accelerate peft bitsandbytes sentencepiece trl huggingface_hub

print("Installed packages.")

Installed packages.


In [None]:
from google.colab import userdata
from huggingface_hub import login

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

We will use **Qwen 3 (1.7B version)**. This is a smaller, highly efficient model that is perfect for demos and runs very fast on Colab.

In [None]:
# Model variable
QWEN_MODEL = "Qwen/Qwen3-1.7B"
print("Model to use:", QWEN_MODEL)

Model to use: Qwen/Qwen3-1.7B


## 2. Data Preparation (Instruction Formatting)
LLMs are text-completion engines. They don't "know" they are supposed to classify sentiments unless we tell them.

We must convert our raw data (tweets and labels) into a **Prompt Format**.
* **Input:** "I hate this traffic."
* **Label:** "Negative"
* **Formatted Training Data:** `Classify the sentiment: I hate this traffic.\nAnswer: Negative`

By training on this pattern, the model learns that when it sees "Classify...", it should output a sentiment label.

In [None]:
# Load dataset and show samples
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("mteb/tweet_sentiment_extraction")
df = pd.DataFrame(dataset['train'])
print("Train sample count:", len(df))
display(df.head(5))

# We will work with train split and create small train/test for demo
dataset = dataset['train'].train_test_split(test_size=0.1)
dataset

Train sample count: 26732


| | id | text | label | label_text |
|---|---|---|---|---|
| 0 | cb774db0d1 | I`d have responded, if I were going | 1 | neutral |
| 1 | 549e992a42 | Sooo SAD I will miss you here in San Diego!!! | 0 | negative |
| 2 | 088c60f138 | my boss is bullying me... | 0 | negative |
| 3 | 9642c003ef | what interview! leave me alone | 0 | negative |
| 4 | 358bd9e861 | Sons of ****, why couldn`t they put them on t... | 0 | negative |


DatasetDict({
    train: Dataset({
        features: ['id', 'text', 'label', 'label_text'],
        num_rows: 24058
    })
    test: Dataset({
        features: ['id', 'text', 'label', 'label_text'],
        num_rows: 2674
    })
})

In [None]:
from datasets import DatasetDict

def preprocess_example(example):
  label_map = {0: "negative", 1: "neutral", 2: "positive"}
  example['input_text'] = example['text']
  example['target_text'] = label_map[example['label']]
  return example

# Apply preprocessing to each split (map returns new datasets, so `dataset` is left untouched)
data = DatasetDict({
    split: dataset[split].map(preprocess_example)
    for split in dataset.keys()
})

# Keep only input_text and target_text
data = DatasetDict({
    split: data[split].remove_columns(
        [c for c in data[split].column_names if c not in ['input_text','target_text']]
    )
    for split in data.keys()
})

# Check first example
print(data['train'][0])


{'input_text': ' Good luck whit the show tonite man, ill be watching', 'target_text': 'positive'}


In [None]:
# Format dataset for causal LM fine-tuning: concatenate prompt + target
def make_text(example):
    # Simple instruction-response template
    prompt = f"Classify the sentiment: {example['input_text']}\nAnswer:"
    return {'text': prompt + " " + example['target_text']}

train_ds = data['train'].map(lambda ex: make_text(ex))
eval_ds = data['test'].map(lambda ex: make_text(ex))
print("Train examples:", len(train_ds), "Eval examples:", len(eval_ds))
train_ds[0]

Train examples: 24058 Eval examples: 2674


{'input_text': ' Good luck whit the show tonite man, ill be watching',
 'target_text': 'positive',
 'text': 'Classify the sentiment:  Good luck whit the show tonite man, ill be watching\nAnswer: positive'}

## 3. Loading the Model in 4-bit (Quantization)

We load `Qwen/Qwen3-1.7B`. Even though 1.7B is small, using **4-bit quantization** ensures we have plenty of VRAM left for the training process (gradients and optimizer states).

**Why `BitsAndBytesConfig`?**
This configuration tells the loader: "Don't load the model normally. Convert the linear-layer weights into a 4-bit format (NF4) on the fly." This reduces the memory footprint from **~3.5GB** (in standard 16-bit) to **~1.3GB** for the 1.7B model, leaving plenty of room for activations, gradients, and optimizer states.
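
A plain 4x reduction of ~3.5GB would land under 1GB, yet the footprint reported after loading is higher. One plausible explanation, stated here as an assumption rather than something read from the checkpoint, is that the token embedding stays in 16-bit while only the linear layers are quantized. The sketch below does that arithmetic; the vocabulary size (151,936), hidden size (2,048), and the rounded total of 1.72B parameters are assumed values, and the small overhead of quantization constants is ignored.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed to store n_params weights at the given bit width, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Assumed split of the ~1.72B parameters (illustrative, not read from the checkpoint)
n_embedding = 151_936 * 2_048            # token embedding, assumed to stay in 16-bit
n_linear = 1_720_000_000 - n_embedding   # remaining weights, assumed quantized to 4-bit

full_16bit = weight_memory_gb(n_embedding + n_linear, 16)
mixed_4bit = weight_memory_gb(n_linear, 4) + weight_memory_gb(n_embedding, 16)

print(f"all weights in 16-bit:          {full_16bit:.2f} GB")
print(f"4-bit linear + 16-bit embedding: {mixed_4bit:.2f} GB")
```

Under these assumptions the estimate lands close to the footprint the notebook prints after loading the model.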

**Configuration:**
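* `load_in_4bit=True`: Activates 4-bit loading.
* `bnb_4bit_quant_type="nf4"`: 4-bit NormalFloat, a data type designed for normally distributed weights.
* `bnb_4bit_use_double_quant=True`: Also quantizes the quantization constants, saving a little more memory.
* `bnb_4bit_compute_dtype="float16"`: Weights are dequantized to 16-bit floats for the actual matrix multiplications.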
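
To make the idea of 4-bit storage concrete, here is a simplified stand-in for block-wise quantization: each block of weights is scaled by its absolute maximum and every value is replaced by the index of the nearest of 16 levels. This is not the `bitsandbytes` implementation, and the evenly spaced levels below are an assumption for illustration; real NF4 uses a non-uniform codebook. The function names and the sample weights are hypothetical.

```python
def quantize_block(values, levels):
    """Map each value to the index of the nearest level after absmax scaling."""
    scale = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    codes = [
        min(range(len(levels)), key=lambda i: abs(v / scale - levels[i]))
        for v in values
    ]
    return codes, scale

def dequantize_block(codes, scale, levels):
    """Recover approximate values from 4-bit codes and the block's scale."""
    return [levels[c] * scale for c in codes]

# 16 evenly spaced levels in [-1, 1]; real NF4 uses a non-uniform codebook instead.
LEVELS = [-1 + 2 * i / 15 for i in range(16)]

weights = [0.12, -0.50, 0.33, 0.05, -0.07, 0.41, -0.29, 0.00]
codes, scale = quantize_block(weights, LEVELS)
restored = dequantize_block(codes, scale, LEVELS)

max_error = max(abs(w - q) for w, q in zip(weights, restored))
print("codes (each fits in 4 bits):", codes)
print("max reconstruction error:", round(max_error, 4))
```

Each weight is stored as a 4-bit index, plus one scale per block; the reconstruction error is the price paid for the smaller footprint.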

In [None]:

from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4"
)



In [None]:
# Load the Tokenizer and the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(QWEN_MODEL, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    QWEN_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
model.generation_config.pad_token_id = tokenizer.pad_token_id

print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")

Memory footprint: 1.3 GB


In [None]:
# Test the model
def model_test(text, max_new_tokens=32):
    prompt = f"Classify the sentiment: {text}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test baseline on small sample
sample = data['test'][0]['input_text']
print("Sample:", sample)
print("Baseline output:", model_test(sample))

Sample:  woman, follow me
Baseline output: Classify the sentiment:  woman, follow me
Answer: The sentiment is positive.
The sentiment is positive because the phrase "follow me" is a call to action that can be seen as encouraging and motivating. The woman


In [None]:
# Tokenize the datasets
def tokenize_fn(example):
    return tokenizer(example['text'], truncation=True, max_length=256)

tokenized_train = train_ds.map(tokenize_fn, batched=True, remove_columns=train_ds.column_names)
tokenized_eval = eval_ds.map(tokenize_fn, batched=True, remove_columns=eval_ds.column_names)
print(tokenized_train.column_names)


['input_ids', 'attention_mask']


## 4. LoRA Configuration
We do not update all 1.7 billion parameters. Instead, we attach **LoRA adapters** to the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`). With `r=8`, this trains about 0.19% of the total parameter count (see the output below).

In [None]:
# Set up PEFT (LoRA)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized base model for k-bit training (the model here is loaded in 4-bit)
try:
    model = prepare_model_for_kbit_training(model)
except Exception as e:
    print("prepare_model_for_kbit_training not applied:", e)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # adjust based on model architecture
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 3,211,264 || all params: 1,723,786,240 || trainable%: 0.1863
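
The trainable-parameter count can be reproduced by hand: each adapted linear layer adds `r * (in_features + out_features)` parameters. The sketch below assumes the attention shapes of Qwen3-1.7B (hidden size 2048, 16 query heads and 8 key/value heads of dimension 128, 28 layers); these values are assumptions for illustration and are not read from `config.json`.

```python
# Assumed Qwen3-1.7B attention shapes (hypothetical values, not read from config.json)
HIDDEN = 2048
Q_OUT = 16 * 128    # assumed 16 query heads x head_dim 128
KV_OUT = 8 * 128    # assumed 8 key/value heads x head_dim 128
NUM_LAYERS = 28
R = 8               # LoRA rank, as in lora_config

# (in_features, out_features) for each targeted projection
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}

def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
trainable = per_layer * NUM_LAYERS
all_params = 1_723_786_240  # total reported by print_trainable_parameters()
percent = f"{100 * trainable / all_params:.4f}"

print(f"trainable params: {trainable:,}")
print(f"trainable%: {percent}")
```

Doubling `r` doubles the adapter size, which is a quick way to judge how a different rank would change the trainable fraction.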


## 5. The Training Loop
We use the Hugging Face `Trainer`.
* **Gradient Accumulation:** We use a per-device batch size of 1 and accumulate gradients over 8 steps. This gives an effective batch size of 8 while only holding one example's activations in memory at a time.
* **Training Length:** The run is set to 1 epoch (`num_train_epochs=1`). For a quick demo, you can cap it with `max_steps` (e.g., `60`) instead.
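
As a rough check on what these settings imply, the arithmetic below uses the values passed to `TrainingArguments` in the next cell and the training-set size reported earlier. The single-GPU assumption is mine, and the exact step count can differ slightly depending on how the `Trainer` version rounds the last partial batch.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1               # assumption: a single Colab GPU
num_train_examples = 24_058   # size of the train split after train_test_split

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Approximate; rounding of the final partial batch depends on the Trainer version.
updates_per_epoch = num_train_examples // effective_batch

print("effective batch size:", effective_batch)
print("approx. optimizer updates per epoch:", updates_per_epoch)
```

One full epoch is therefore on the order of 3,000 optimizer updates, which is why capping the run with `max_steps` is useful for a demo.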

In [None]:
# TrainingArguments and Trainer setup
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./qwen-lora-output",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=200,
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="none",
)


data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator
)

print(trainer)

<transformers.trainer.Trainer object at 0x7ff9603c0440>
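
The tokenized datasets contain only `input_ids` and `attention_mask`; the labels are created by the data collator. The sketch below mimics, in plain Python, what `DataCollatorForLanguageModeling` with `mlm=False` does: pad to the longest sequence in the batch, copy `input_ids` into `labels`, and replace padding positions with `-100` so the loss ignores them. The function name `collate_causal_lm` and the toy token ids are hypothetical.

```python
def collate_causal_lm(batch_input_ids, pad_token_id):
    """Right-pad sequences to the longest one and build labels for causal LM.

    Labels copy input_ids, with every pad_token_id position set to -100
    so the loss function skips it.
    """
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        padded = ids + [pad_token_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        labels.append([tok if tok != pad_token_id else -100 for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate_causal_lm([[5, 6, 7, 8], [9, 10]], pad_token_id=0)
print(batch["labels"])
```

Note that masking is done by token id. Since this notebook sets `pad_token` to `eos_token`, any EOS tokens in a sequence would be excluded from the loss as well.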


In [None]:
# Start training (this is a lightweight demo run; adjust steps/epochs for real experiments)
trainer.train()
trainer.save_model("./qwen-lora-output")
print("Saved LoRA adapter to ./qwen-lora-output")

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Step | Training Loss |
|---|---|
| 10 | 4.3621 |
| 20 | 3.7914 |
| 30 | 3.6932 |
| 40 | 3.5027 |
| 50 | 3.1595 |
| 60 | 3.0108 |
| 70 | 3.2049 |
| 80 | 3.0398 |
| 90 | 2.9436 |
| 100 | 3.1605 |


## 6. Inference (Testing)
We can now test the fine-tuned model.
We simply pass a text with our prompt format `Classify the sentiment: ...\nAnswer:` and let Qwen generate the label.

In [None]:
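from peft import PeftModel
import torch

# Reload the quantized base model, then attach the saved LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    QWEN_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
model_peft = PeftModel.from_pretrained(base, "./qwen-lora-output")
model_peft.eval()
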
def predict(text, max_new_tokens=32):
    prompt = f"Classify the sentiment: {text}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model_peft.device)
    with torch.no_grad():
        outputs = model_peft.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test
samples = [
    "I absolutely love the new update!",
    "The service was terrible and slow.",
    "I am going to sleep now."
]

print("--- Qwen 3 Results ---")
for s in samples:
    res = predict(s)
    clean_res = res.split("Answer:")[-1].strip()
    print(f"Input: {s}\nPrediction: {clean_res}\n")
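
Generated text can contain more than the bare label, so a small post-processing step helps when scoring predictions. The following is a minimal sketch, assuming the three label strings used in this notebook; `extract_label` and `accuracy` are hypothetical helpers, and the example generations are made up for illustration rather than taken from the model.

```python
LABELS = ("negative", "neutral", "positive")

def extract_label(generated_text):
    """Return the first known label found after 'Answer:', or None."""
    answer = generated_text.split("Answer:")[-1].lower()
    positions = {lab: answer.find(lab) for lab in LABELS if lab in answer}
    return min(positions, key=positions.get) if positions else None

def accuracy(predictions, references):
    """Fraction of predictions that equal the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical generations, for illustration only (not real model outputs)
outputs = [
    "Classify the sentiment: great day\nAnswer: positive positive",
    "Classify the sentiment: meh\nAnswer: Neutral. The tweet",
    "Classify the sentiment: ugh\nAnswer: I am not sure",
]
preds = [extract_label(o) for o in outputs]
score = accuracy(preds, ["positive", "neutral", "negative"])

print(preds)
print(f"accuracy: {score:.2f}")
```

In the notebook, the same helpers could be applied to `predict(...)` outputs over a slice of `data['test']`, comparing against `target_text`.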