# Lightweight Fine-Tuning Project

* PEFT technique: I went with LoRA since I am new to this material and could leverage some of the suggestions in the instructions.

* Model: As suggested in the instructions, I used the gpt2 model since it is relatively small and useful for sequence classification.

* Evaluation approach: Since my aim was to fine-tune on the AG News Dataset, I used the test split to check whether the model predicts the correct class for each news snippet.

* Fine-tuning dataset: In my work, I found that training on all 120,000 samples in the AG News train split took too long given the constraints of the Udacity GPUs, so I truncated the train split to 1,200 samples. Likewise, I truncated the test split by a factor of 100.

## Loading and Evaluating a Foundation Model

I chose to use the AG News dataset for multi-class classification: each news snippet carries exactly one of four labels.

In [1]:
!pip install datasets peft accelerate



In [20]:
from datasets import load_dataset

dataset = load_dataset("ag_news")

## Preprocess Dataset
Use the GPT-2 tokenizer to preprocess the text input.

In [21]:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

### Downsample Data For Training Efficiency

In [None]:
train_len = int(dataset["train"].shape[0]/100)

In [23]:
dataset["train"] = dataset["train"].select(range(train_len))

In [24]:
test_len = int(dataset["test"].shape[0]/100)

In [25]:
dataset["test"] = dataset["test"].select(range(test_len))
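The cells above keep the first 1% of rows in each split. If the rows are not already shuffled, the first rows may not be representative of the class mix. As a minimal sketch of an alternative, the hypothetical helper `subsample_indices` below draws a seeded random set of row indices using only the standard library. The 7,600-row test size is an assumption based on the standard AG News split; the notebook only states the 120,000-row train size.

```python
import random


def subsample_indices(num_rows, factor=100, seed=42):
    """Return a sorted, seeded random sample of row indices (hypothetical helper)."""
    keep = num_rows // factor
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_rows), keep))


train_idx = subsample_indices(120_000)  # train split size stated in the write-up
test_idx = subsample_indices(7_600)     # assumed test split size
print(len(train_idx), len(test_idx))    # 1200 76
```

The resulting lists could be passed to `select` in place of `range(train_len)` and `range(test_len)`.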

In [26]:
tokenizer.pad_token = tokenizer.eos_token

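splits = ['train', 'test']
tokenized_dataset = {}
for split in splits:
    # Pad or truncate every example to 512 tokens. set_format below converts the
    # columns to tensors, so return_tensors is not needed inside map.
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], padding="max_length", truncation=True, max_length=512),
        batched=True,
    )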

tokenized_dataset["train"] = tokenized_dataset["train"].rename_column('label', 'labels')
tokenized_dataset["test"] = tokenized_dataset["test"].rename_column('label', 'labels')

tokenized_dataset["train"].set_format('torch', columns=['text','labels', 'input_ids', 'attention_mask'])
tokenized_dataset["test"].set_format('torch', columns=['text','labels', 'input_ids', 'attention_mask'])

print(tokenized_dataset["train"])

Map:   0%|          | 0/1200 [00:00<?, ? examples/s]

Dataset({
    features: ['text', 'labels', 'input_ids', 'attention_mask'],
    num_rows: 1200
})
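To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a rough pure-Python sketch of what happens to one example. `pad_or_truncate` is a hypothetical helper, not part of the tokenizer API, and the three input ids are arbitrary placeholders. It assumes right-side padding and reuses GPT-2's end-of-text id (50256) as the pad id, mirroring `tokenizer.pad_token = tokenizer.eos_token`.

```python
EOS_ID = 50256  # GPT-2 end-of-text token id, reused as the pad token in this notebook


def pad_or_truncate(token_ids, max_length=512, pad_id=EOS_ID):
    """Mimic fixed-length right padding (hypothetical helper, not the tokenizer API)."""
    ids = token_ids[:max_length]            # truncation
    n_pad = max_length - len(ids)           # how many pad positions to append
    attention_mask = [1] * len(ids) + [0] * n_pad
    return {"input_ids": ids + [pad_id] * n_pad, "attention_mask": attention_mask}


example = pad_or_truncate([464, 2159, 318], max_length=8)  # placeholder ids
print(example["input_ids"])       # [464, 2159, 318, 50256, 50256, 50256, 50256, 50256]
print(example["attention_mask"])  # [1, 1, 1, 0, 0, 0, 0, 0]
```

Since news snippets are usually much shorter than 512 tokens, most positions in each row are likely padding; a smaller `max_length` would probably reduce compute on a constrained GPU.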


## Load Pretrained Foundation Model (gpt2)

In [27]:
from transformers import AutoModelForSequenceClassification
import torch

id2label = {0: "world", 1: "sports", 2: "business", 3: "technology"}
label2id = {val: key for key, val in id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=4,
    id2label=id2label,
    label2id=label2id,
)

model.config.pad_token_id = tokenizer.eos_token_id

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=4, bias=False)
)

## Evaluate Performance of Foundation Model

In [28]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir='./results',
        do_train=False,
        do_eval=True,
        per_device_eval_batch_size=8
    ),
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics
)

trainer.evaluate()

{'eval_loss': 8.146344184875488,
 'eval_accuracy': 0.2236842105263158,
 'eval_runtime': 2.6209,
 'eval_samples_per_second': 28.998,
 'eval_steps_per_second': 3.815}
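To put these numbers in context, it helps to compare them with what uniform guessing over four classes would give. The short calculation below is only a reference point, not part of the original evaluation: a predictor that assigns 25% to every class has an expected accuracy of 0.25 and a cross-entropy loss of ln 4.

```python
import math

num_classes = 4
chance_accuracy = 1 / num_classes
uniform_loss = math.log(num_classes)  # cross-entropy when every class gets probability 0.25

reported_accuracy = 0.2236842105263158  # eval_accuracy from the output above
reported_loss = 8.146344184875488       # eval_loss from the output above

print(f"chance accuracy: {chance_accuracy:.3f}, reported: {reported_accuracy:.3f}")
print(f"uniform-guess loss: {uniform_loss:.3f}, reported: {reported_loss:.3f}")
```

A loss far above ln 4 (about 1.386) together with near-chance accuracy suggests that the randomly initialized `score` head is confidently wrong rather than merely uninformed.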

In [52]:
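def spot_check_classification(text, model, class_map):
    """Classify a single text and return its label name from class_map."""
    model.eval()  # disable dropout so the prediction is deterministic
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits

    # logits has shape (1, num_labels); pick the highest-scoring class
    predicted_class_id = logits.argmax(dim=-1).item()

    return class_map.get(predicted_class_id)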

In [66]:
story = "The Senate passed legislation Tuesday that would force TikTok’s China-based parent company to sell the social media platform under the threat of a ban, a contentious move by U.S. lawmakers that’s expected to face legal challenges and disrupt the lives of content creators who rely on the short-form video app for income."
print(spot_check_classification(story, lora_model, id2label))

story = "The first glow-in-the-dark animals may have been ancient corals deep in the ocean. A new study suggests that the first animal that glowed in the dark was a coral that lived deep in the ocean about half a billion years ago."
print(spot_check_classification(story, lora_model, id2label))

story = """The most distant spacecraft from Earth stopped sending back understandable data last November. Flight controllers traced the blank communication to a bad computer chip and rearranged the spacecraft’s coding to work around the trouble.\n
NASA’s Jet Propulsion Laboratory in Southern California declared success after receiving good engineering updates late last week. The team is still working to restore transmission of the science data."""
print(spot_check_classification(story, lora_model, id2label))

world
world
world


## Performing Parameter-Efficient Fine-Tuning

Now, we perform PEFT on the foundation model using the AG News Dataset.
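As a rough illustration of what LoRA adds to a layer, the NumPy sketch below applies a frozen weight plus a scaled low-rank update, using the `r=8` and `lora_alpha=32` values from the config in the next cell and the 768 to 2304 shape of `c_attn`. The weights are random stand-ins, dropout is omitted, and the zero initialization of `B` follows the usual LoRA formulation; this is not the PEFT implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 768, 2304, 8, 32  # c_attn shape and the notebook's LoRA settings

W = rng.normal(size=(d_out, d_in)) * 0.02  # frozen pretrained weight (random stand-in)
A = rng.normal(size=(r, d_in)) * 0.01      # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero at the start


def lora_forward(x):
    """Frozen path plus the low-rank update scaled by lora_alpha / r."""
    return x @ W.T + (lora_alpha / r) * (x @ A.T @ B.T)


x = rng.normal(size=(2, d_in))
print(np.allclose(lora_forward(x), x @ W.T))  # True while B is still zero
print(A.size + B.size)                        # 24576 trainable values for this one layer
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and only `A` and `B` receive gradient updates during training.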

In [33]:
from peft import get_peft_model
from peft import LoraConfig
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    fan_in_fan_out=True,
    task_type='SEQ_CLS'
)

lora_model = get_peft_model(model, config)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lora_model.to(device)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_

In [34]:
lora_model.print_trainable_parameters()

trainable params: 301,056 || all params: 124,740,864 || trainable%: 0.24134512969222338
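The trainable count can be roughly reconstructed by hand. The sketch below assumes that only `c_attn` in each of the 12 blocks receives an adapter, which is what the printed module tree shows before it is cut off. The interpretation of the remainder is also an assumption: it happens to equal two copies of the 768 x 4 `score` head, which would be consistent with PEFT keeping both the original head and a trainable copy.

```python
n_layers, d_model, d_qkv, r, num_labels = 12, 768, 2304, 8, 4

lora_per_layer = r * d_model + d_qkv * r  # lora_A (768 -> 8) plus lora_B (8 -> 2304)
lora_total = n_layers * lora_per_layer
head = d_model * num_labels               # score: Linear(768, 4, bias=False)

reported_trainable = 301_056              # from print_trainable_parameters() above
reported_all = 124_740_864

remainder = reported_trainable - lora_total
print(lora_total)                  # 294912
print(remainder, remainder // head)  # 6144 2
print(100 * reported_trainable / reported_all)
```

Either way, the adapters account for nearly all trainable weights, and the whole trainable set is about a quarter of a percent of the model.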


In [35]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/ag_news",
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        save_strategy="epoch",
        evaluation_strategy="epoch",
        num_train_epochs=3,
        weight_decay=0.01,
        load_best_model_at_end=True
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics
)

trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.983688,0.486842
2,No log,1.401849,0.578947
3,No log,1.280887,0.605263


Checkpoint destination directory ./data/ag_news/checkpoint-150 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/ag_news/checkpoint-300 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/ag_news/checkpoint-450 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=450, training_loss=2.5102053493923613, metrics={'train_runtime': 310.963, 'train_samples_per_second': 11.577, 'train_steps_per_second': 1.447, 'total_flos': 943980753715200.0, 'train_loss': 2.5102053493923613, 'epoch': 3.0})

In [39]:
lora_model.save_pretrained("gpt-lora")

## Performing Inference with a PEFT Model

The training loop already tracks accuracy on the test subset, so here we spot-check a few individual news stories.

In [40]:
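from peft import AutoPeftModelForSequenceClassification
# Reload the base model plus the saved adapter from disk instead of re-wrapping the in-memory PEFT model
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora", num_labels=4, id2label=id2label, label2id=label2id).to(device)
lora_model.config.pad_token_id = tokenizer.eos_token_id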

In [65]:
story = "The Senate passed legislation Tuesday that would force TikTok’s China-based parent company to sell the social media platform under the threat of a ban, a contentious move by U.S. lawmakers that’s expected to face legal challenges and disrupt the lives of content creators who rely on the short-form video app for income."
print(spot_check_classification(story, lora_model, id2label))

story = "The first glow-in-the-dark animals may have been ancient corals deep in the ocean. A new study suggests that the first animal that glowed in the dark was a coral that lived deep in the ocean about half a billion years ago."
print(spot_check_classification(story, lora_model, id2label))

story = """The most distant spacecraft from Earth stopped sending back understandable data last November. Flight controllers traced the blank communication to a bad computer chip and rearranged the spacecraft’s coding to work around the trouble.\n
NASA’s Jet Propulsion Laboratory in Southern California declared success after receiving good engineering updates late last week. The team is still working to restore transmission of the science data."""
print(spot_check_classification(story, lora_model, id2label))

world
world
world


## Results

Before fine-tuning, the foundation model scored about 22% accuracy on the AG News test subset, which is roughly what random guessing over four classes would give. After three epochs of LoRA fine-tuning, accuracy rose to about 61%. However, all three spot-checked stories were classified as "world", so training for longer on more data would be necessary to achieve more usable results.
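Because the test subset is small, the reported accuracies carry noticeable uncertainty. The sketch below converts them back to counts and attaches a simple binomial standard error. The subset size of 76 is inferred rather than stated: it is 1% of an assumed 7,600-row test split and is consistent with the reported 0.2237 being 17/76. `count_and_stderr` is a hypothetical helper.

```python
import math

n_test = 76  # inferred size of the 1% test subset (assumption)
before, after = 0.2236842105263158, 0.605263  # accuracies reported above


def count_and_stderr(acc, n):
    """Correct-prediction count and binomial standard error for an accuracy on n examples."""
    return round(acc * n), math.sqrt(acc * (1 - acc) / n)


for name, acc in [("before", before), ("after", after)]:
    correct, se = count_and_stderr(acc, n_test)
    print(f"{name}: {correct}/{n_test} correct, accuracy {acc:.3f} +/- {se:.3f}")
```

With a standard error of roughly five to six points, the jump from 17 to 46 correct predictions is well outside the noise, but the exact post-training figure should be read as approximate.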