# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: QLoRA (Quantized LoRA) using bitsandbytes library.
* Model: google-bert/bert-base-uncased
* Evaluation approach: Convert to a pandas dataframe and investigate what the model predicted correctly and incorrectly.
* Fine-tuning dataset: AG News dataset classifying articles into the categories "World", "Sports", "Business", and "Sci/Tech".

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
# Upgrade datasets to the correct version
! pip install -q "datasets==2.15.0"

[0m

In [2]:
from datasets import load_dataset

# https://huggingface.co/datasets/ag_news
dataset = load_dataset("ag_news")

splits = ['train', 'test']

dataset['train']

Downloading readme:   0%|          | 0.00/8.07k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/18.6M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.23M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/120000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7600 [00:00<?, ? examples/s]

Dataset({
    features: ['text', 'label'],
    num_rows: 120000
})

In [3]:
# Inspect the first example
dataset['train'][0]

{'text': "Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.",
 'label': 2}

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

# Let's use a lambda function to tokenize all the examples
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], padding='max_length', truncation=True), batched=True
    )

# Inspect the available columns in the dataset
tokenized_dataset["train"]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Map:   0%|          | 0/120000 [00:00<?, ? examples/s]

Map:   0%|          | 0/7600 [00:00<?, ? examples/s]

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 120000
})

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})

In [6]:
# Compare text with tokens
print(tokenized_dataset["train"][0]['text'])
print(tokenized_dataset["train"][0]['input_ids'])

Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again.
[101, 2813, 2358, 1012, 6468, 15020, 2067, 2046, 1996, 2304, 1006, 26665, 1007, 26665, 1011, 2460, 1011, 19041, 1010, 2813, 2395, 1005, 1055, 1040, 11101, 2989, 1032, 2316, 1997, 11087, 1011, 22330, 8713, 2015, 1010, 2024, 3773, 2665, 2153, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [7]:
import torch
from transformers import AutoModelForSequenceClassification

id2label = {
        0: "World", 
        1: "Sports",
        2: "Business",
        3: "Sci/Tech"
    }

model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased",
    num_labels=4,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    pad_token_id=tokenizer.pad_token_id,
)

# Freeze all parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False


model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


In [9]:
training_args = TrainingArguments(
    output_dir="./data/classification",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"].shuffle().select(range(1000)),
    eval_dataset=tokenized_dataset["test"].shuffle().select(range(1000)),
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

In [10]:
# Evaluate the model without fine-tuning
trainer.evaluate()


You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'eval_loss': 1.4175995588302612,
 'eval_accuracy': 0.25,
 'eval_runtime': 32.1994,
 'eval_samples_per_second': 31.056,
 'eval_steps_per_second': 7.764}

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [11]:
# Create the model again applying quantization using BitsAndBytes
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased",
    num_labels=4,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    pad_token_id=tokenizer.pad_token_id,
    quantization_config=config
)

# Freeze all parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# https://huggingface.co/docs/peft/v0.9.0/en/package_reference/peft_types#peft.TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, 
                         inference_mode=False, 
                         r=8, 
                         lora_alpha=32, 
                         lora_dropout=0.1)

model = prepare_model_for_kbit_training(model)

lora_model = get_peft_model(model, peft_config)
lora_model.print_trainable_parameters()

trainable params: 301,064 || all params: 109,783,304 || trainable%: 0.2742347779950219


In [13]:
training_args = TrainingArguments(
    output_dir="./data/classification",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"].shuffle().select(range(1000)),
    eval_dataset=tokenized_dataset["test"].shuffle().select(range(1000)),
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

In [14]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.550281,0.875
2,0.541200,0.472929,0.882


Checkpoint destination directory ./data/classification/checkpoint-250 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/classification/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=500, training_loss=0.5412387084960938, metrics={'train_runtime': 284.6861, 'train_samples_per_second': 7.025, 'train_steps_per_second': 1.756, 'total_flos': 528062398464000.0, 'train_loss': 0.5412387084960938, 'epoch': 2.0})

In [15]:
# 4-bit quantized models cannot be saved currently
# model.save_pretrained("output_dir")
lora_model.save_pretrained("output_dir")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [16]:
from transformers import AutoConfig
from peft import PeftModel, LoraConfig

# 4-bit quantized models cannot be saved, so we will use the model from before.
# config = AutoConfig.from_pretrained("output_dir")
# model = AutoModelForSequenceClassification.from_pretrained("output_dir", config=config)
lora_model = PeftModel.from_pretrained(model, "output_dir", config=LoraConfig.from_pretrained("output_dir"))

lora_model = lora_model.to('cuda')
lora_model.eval()

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): BertForSequenceClassification(
      (bert): BertModel(
        (embeddings): BertEmbeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (token_type_embeddings): Embedding(2, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): BertEncoder(
          (layer): ModuleList(
            (0-11): 12 x BertLayer(
              (attention): BertAttention(
                (self): BertSelfAttention(
                  (query): Linear4bit(
                    in_features=768, out_features=768, bias=True
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, 

In [17]:
# Evaluate the model using the same trainer as before.
# If loading in a different session, I would need to
# run the cell creating the trainer again for the lora_model.
# Since I am running in the same session, no need to do that again.
trainer.evaluate()

{'eval_loss': 0.472929447889328,
 'eval_accuracy': 0.882,
 'eval_runtime': 34.9886,
 'eval_samples_per_second': 28.581,
 'eval_steps_per_second': 7.145,
 'epoch': 2.0}

In [18]:
import pandas as pd

test_data = tokenized_dataset["test"].shuffle().select(range(1000))

df = pd.DataFrame(test_data)
df["actual_label"] = df["label"].map(id2label)
df = df[["text", "actual_label"]]

# Add the model predictions to the dataframe
predictions = trainer.predict(test_data)
df["predicted_label"] = id2label[np.argmax(predictions[0], axis=1)[0]]

# Show full cell output
pd.set_option("display.max_colwidth", None)

# View the first predictions
df.head()

Unnamed: 0,text,actual_label,predicted_label
0,"Indian board plans own telecast of Australia series The Indian cricket board said on Wednesday it was making arrangements on its own to broadcast next month #39;s test series against Australia, which is under threat because of a raging TV rights dispute.",Sports,Sports
1,Stocks Higher on Drop in Jobless Claims A sharp drop in initial unemployment claims and bullish forecasts from Nokia and Texas Instruments sent stocks slightly higher in early trading Thursday.,Business,Sports
2,"Nuggets 112, Raptors 106 Carmelo Anthony scored 30 points and Kenyon Martin added 24 points and 16 rebounds, helping the Denver Nuggets hold off the Toronto Raptors 112-106 Wednesday night.",Sports,Sports
3,Stocks Higher on Drop in Jobless Claims A sharp drop in initial unemployment claims and bullish forecasts from Nokia and Texas Instruments sent stocks higher in early trading Thursday.,Business,Sports
4,"REVIEW: 'Half-Life 2' a Tech Masterpiece (AP) AP - It's been six years since Valve Corp. perfected the first-person shooter with ""Half-Life."" Video games have come a long way since, with better graphics and more options than ever. Still, relatively few games have mustered this one's memorable characters and original science fiction story.",Sci/Tech,Sports


In [19]:
# View the first incorrect predictions
df[df["actual_label"] != df["predicted_label"]].head()

Unnamed: 0,text,actual_label,predicted_label
1,Stocks Higher on Drop in Jobless Claims A sharp drop in initial unemployment claims and bullish forecasts from Nokia and Texas Instruments sent stocks slightly higher in early trading Thursday.,Business,Sports
3,Stocks Higher on Drop in Jobless Claims A sharp drop in initial unemployment claims and bullish forecasts from Nokia and Texas Instruments sent stocks higher in early trading Thursday.,Business,Sports
4,"REVIEW: 'Half-Life 2' a Tech Masterpiece (AP) AP - It's been six years since Valve Corp. perfected the first-person shooter with ""Half-Life."" Video games have come a long way since, with better graphics and more options than ever. Still, relatively few games have mustered this one's memorable characters and original science fiction story.",Sci/Tech,Sports
5,"China's inflation rate slows sharply but problems remain (AFP) AFP - China's inflation rate eased sharply in October as government efforts to cool the economy began to really bite, with food prices, one of the main culprits, showing some signs of slowing, official data showed.",World,Sports
6,"ADV: Try Currency Trading Risk-Free 30 Days 24-hour commission-free trading, 100-to-1 leverage of your capital, and Dealbook Fx 2 - our free advanced trading software. Sign up for our free 30-day trial and receive one-on-one training.",Business,Sports
