# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA
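* Model: GPT-2 (`gpt2`) with a sequence-classification head
* Evaluation approach: accuracy on a held-out split, computed with `Trainer.evaluate` from Hugging Face Transformers
* Fine-tuning dataset: IMDB movie reviews (`imdb` from Hugging Face Datasets), subsampled to 1,000 training and 1,000 test examples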

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
! pip install -q "datasets==2.15.0"

In [2]:
from datasets import load_dataset

# Load the IMDB train split and use train_test_split to carve out a held-out test set
dataset = load_dataset("imdb", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]
for split in splits:
    dataset[split] = dataset[split].shuffle(seed=42).select(range(1000))

# View the dataset characteristics
dataset["train"]


Dataset({
    features: ['text', 'label'],
    num_rows: 1000
})

In [3]:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

In [4]:
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )
    
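# Register a [CLS] special token. It is added after tokenization, so it does not
# appear in the tokenized data, but it grows the vocabulary from 50257 to 50258,
# which is why the embedding matrix is resized before training.
num_added_tokens = tokenizer.add_special_tokens({"cls_token": "[CLS]"})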

# Inspect the available columns in the dataset
tokenized_dataset["train"]

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 1000
})
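
The tokenizer call above truncates but does not pad, so the `input_ids` lists have different lengths. Padding is deferred to batch time, where `DataCollatorWithPadding` (used in the training cell later on) pads each batch to its longest member. The sketch below imitates that idea in plain Python. `pad_batch` is a hypothetical helper, not a library function, and it assumes right-side padding with id 50256, GPT-2's end-of-text token, which this notebook reuses as the pad token.

```python
from typing import Dict, List

GPT2_EOS_ID = 50256  # GPT-2's <|endoftext|> id, reused here as the pad id


def pad_batch(sequences: List[List[int]], pad_id: int = GPT2_EOS_ID) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two short, made-up sequences of unequal length
batch = pad_batch([[25354, 416, 262], [632, 3947]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to a global maximum keeps short batches small, which matters when a few reviews are much longer than the rest.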

In [5]:
print(tokenized_dataset["train"][0]["input_ids"])

[25354, 416, 262, 6275, 1982, 29168, 290, 7924, 416, 4848, 945, 428, 2646, 373, 257, 3731, 18641, 13, 632, 3947, 1165, 1790, 13, 632, 3377, 1290, 1165, 890, 4441, 262, 367, 709, 271, 8137, 11, 290, 14376, 379, 262, 7835, 8665, 13, 3244, 845, 2952, 345, 423, 262, 28342, 290, 43005, 48067, 329, 262, 1641, 338, 3241, 11, 290, 262, 7818, 7664, 29847, 1671, 1220, 6927, 1671, 11037, 9590, 616, 3656, 1807, 31684, 373, 13779, 13]


In [6]:
from transformers import GPT2ForSequenceClassification

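num_labels = 2
model = GPT2ForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=num_labels,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# GPT-2 has no pad token by default, so reuse the end-of-text token id
model.config.pad_token_id = model.config.eos_token_id
# Freeze all the parameters of the base model (the GPT-2 transformer);
# the newly initialized `score` head sits outside base_model and stays trainable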
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in model.base_model.parameters():
    param.requires_grad = False

model

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)

In [7]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


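# Resize the embedding matrix to match the tokenizer, which grew by one token
# when [CLS] was added
embedding_layer = model.resize_token_embeddings(len(tokenizer))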


trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./model/base_model",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50258. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.43156 | 0.79 |


TrainOutput(global_step=250, training_loss=0.6743687133789062, metrics={'train_runtime': 154.2351, 'train_samples_per_second': 6.484, 'train_steps_per_second': 1.621, 'total_flos': 274435077832704.0, 'train_loss': 0.6743687133789062, 'epoch': 1.0})
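
The `global_step=250` value and the `No log` entry under Training Loss both follow from simple arithmetic: 1,000 examples at a batch size of 4 give 250 optimizer steps per epoch, which is fewer than the `TrainingArguments` default of logging every 500 steps. The sketch below reproduces that calculation. It assumes a single device and no gradient accumulation, and `steps_per_epoch` is an illustrative helper rather than a Transformers function.

```python
import math

DEFAULT_LOGGING_STEPS = 500  # default value of TrainingArguments.logging_steps


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps in one epoch (single device, no gradient accumulation)."""
    return math.ceil(num_examples / batch_size)


steps = steps_per_epoch(num_examples=1000, batch_size=4)
total_steps = steps * 1  # num_train_epochs=1 in this run
loss_was_logged = total_steps >= DEFAULT_LOGGING_STEPS
print(steps, total_steps, loss_was_logged)  # 250 250 False
```

Under the same assumptions, a two-epoch run reaches step 500, the first point at which a training loss would be logged, and `save_strategy="epoch"` writes checkpoints at steps 250 and 500.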

In [8]:
trainer.evaluate()

{'eval_loss': 0.43156003952026367,
 'eval_accuracy': 0.79,
 'eval_runtime': 44.9759,
 'eval_samples_per_second': 22.234,
 'eval_steps_per_second': 5.559,
 'epoch': 1.0}
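
The `eval_accuracy` value comes from `compute_metrics`, which takes the argmax over each row of logits and compares the result with the labels. Below is a dependency-free sketch of the same calculation. The logits and labels are made up for illustration and are not taken from this run.

```python
from typing import List, Sequence


def accuracy_from_logits(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose largest logit sits at the labelled class index."""
    predictions: List[int] = [
        max(range(len(row)), key=row.__getitem__) for row in logits
    ]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for four reviews, columns are [NEGATIVE, POSITIVE]
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.5, 1.0]]
toy_labels = [0, 1, 0, 0]
print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```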

In [9]:
import pandas as pd

df = pd.DataFrame(tokenized_dataset["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_dataset["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head(2)

|   | text | label | predicted_label |
|---|------|-------|-----------------|
| 0 | As I watched this movie, and I began to see it... | 1 | 1 |
| 1 | Don't understand how these animated movies kee... | 0 | 0 |
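
Because the dataframe holds both `label` and `predicted_label`, errors can be broken down by class instead of being summarized in a single accuracy figure. The sketch below tallies such a breakdown with the standard library. The label pairs are hypothetical, not the notebook's data, and `confusion_counts` is an illustrative helper.

```python
from collections import Counter
from typing import Sequence

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # same mapping as the model config


def confusion_counts(labels: Sequence[int], predicted: Sequence[int]) -> Counter:
    """Count (true class, predicted class) pairs."""
    return Counter((ID2LABEL[y], ID2LABEL[p]) for y, p in zip(labels, predicted))


# Hypothetical labels and predictions for six reviews
toy_labels = [1, 0, 1, 1, 0, 0]
toy_predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(toy_labels, toy_predicted)
for (true_name, pred_name), n in sorted(counts.items()):
    print(f"true={true_name:8s} predicted={pred_name:8s} count={n}")
```

A model that predicts one class for nearly every input shows up here as one almost empty predicted column, which an accuracy near 0.5 alone would not reveal.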


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [10]:
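from transformers import GPT2ForSequenceClassification
from peft import get_peft_model, LoraConfig, TaskType

model_name_or_path = "gpt2"

# NOTE: inference_mode=True creates the LoRA adapter weights frozen, so only the
# classification head is reported as trainable below. inference_mode=False is
# the usual setting when the adapters are meant to be trained.
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, inference_mode=True, r=8, lora_alpha=32, lora_dropout=0.1
)

model = GPT2ForSequenceClassification.from_pretrained(
    model_name_or_path,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
model.config.pad_token_id = model.config.eos_token_id
# Match the embedding matrix to the tokenizer (50258 tokens after adding [CLS])
embedding_layer = model.resize_token_embeddings(len(tokenizer))
# Wrap the model with LoRA adapters
model = get_peft_model(model, peft_config)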
model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50258. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


trainable params: 3,072 || all params: 124,738,560 || trainable%: 0.0024627508927471987
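
The figures printed above can be reproduced by hand, which helps explain why so few parameters are trainable. The sketch below assumes that PEFT's default LoRA target for GPT-2 is the `c_attn` projection (768 inputs, 2,304 outputs) in each of the 12 blocks, and that PEFT keeps a second copy of the `score` head. Both are assumptions about library defaults rather than something this notebook states, although the totals they produce match the printed output.

```python
VOCAB = 50258  # 50257 GPT-2 tokens plus the added [CLS] token
HIDDEN, POSITIONS, LAYERS, NUM_LABELS, RANK = 768, 1024, 12, 2, 8


def dense(n_in: int, n_out: int) -> int:
    """Parameter count of a layer with a weight matrix and a bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift
block = (
    2 * layer_norm                  # ln_1 and ln_2
    + dense(HIDDEN, 3 * HIDDEN)     # attn.c_attn
    + dense(HIDDEN, HIDDEN)         # attn.c_proj
    + dense(HIDDEN, 4 * HIDDEN)     # mlp.c_fc
    + dense(4 * HIDDEN, HIDDEN)     # mlp.c_proj
)
backbone = VOCAB * HIDDEN + POSITIONS * HIDDEN + LAYERS * block + layer_norm
score_head = HIDDEN * NUM_LABELS    # Linear(768, 2, bias=False)
# Assumed: LoRA on c_attn only, with A of shape (r, in) and B of shape (out, r)
lora = LAYERS * RANK * (HIDDEN + 3 * HIDDEN)

all_params = backbone + 2 * score_head + lora  # assumed: two copies of the head
trainable = 2 * score_head
print(f"{trainable:,} {all_params:,} {lora:,}")  # 3,072 124,738,560 294,912
```

Under these assumptions the 294,912 LoRA weights are counted in the total but not in the trainable figure, which is consistent with the adapters having been created frozen by `inference_mode=True`.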


In [11]:
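# NOTE: on a PEFT-wrapped model, base_model.parameters() covers the original
# GPT-2 weights as well as the LoRA adapters, so this loop makes every
# parameter trainable. The training run below therefore updates the full
# model rather than only the adapters.
for param in model.base_model.parameters():
    param.requires_grad = True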

In [12]:
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./model/peft_model",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=2,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.702656 | 0.475 |
| 2 | 0.824400 | 0.694929 | 0.503 |


TrainOutput(global_step=500, training_loss=0.82435791015625, metrics={'train_runtime': 381.9917, 'train_samples_per_second': 5.236, 'train_steps_per_second': 1.309, 'total_flos': 550592603172864.0, 'train_loss': 0.82435791015625, 'epoch': 2.0})

In [13]:
trainer.evaluate()

{'eval_loss': 0.6949294209480286,
 'eval_accuracy': 0.503,
 'eval_runtime': 46.0008,
 'eval_samples_per_second': 21.739,
 'eval_steps_per_second': 5.435,
 'epoch': 2.0}
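
An accuracy of 0.503 on a two-class task is hard to distinguish from guessing. As a rough check, the sketch below computes the binomial standard error of an accuracy estimate on 1,000 examples and expresses both recorded accuracies as a distance from chance. It assumes roughly balanced classes and independent examples, neither of which is verified in this notebook.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy estimate under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n)


n_eval = 1000   # size of the test subset used above
chance = 0.5    # assumed chance level for two balanced classes
se = accuracy_standard_error(chance, n_eval)

# Accuracies copied from the two evaluate() outputs in this notebook
for name, acc in [("frozen base + trained head", 0.79), ("PEFT run", 0.503)]:
    z = (acc - chance) / se
    print(f"{name}: accuracy={acc:.3f}, distance from chance={z:.1f} standard errors")
```

By this measure the earlier run sits far above chance, while the PEFT run is well within one standard error of it.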

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [17]:
from peft import PeftModel, PeftConfig
from transformers import GPT2ForSequenceClassification

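# checkpoint-250 is the checkpoint saved at the end of epoch 1
# (1,000 examples / batch size 4 = 250 steps); checkpoint-500 corresponds to epoch 2
peft_model_id = "./model/peft_model/checkpoint-250"
config = PeftConfig.from_pretrained(peft_model_id)

# Rebuild the base classifier, then attach the saved adapter weights to it
inference_model = GPT2ForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
in_tokenizer = GPT2Tokenizer.from_pretrained(config.base_model_name_or_path)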

# Load the Lora model
inference_model = PeftModel.from_pretrained(inference_model, peft_model_id)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [18]:
import torch

items_for_manual_review = tokenized_dataset["test"].select(
    [0, 1, 22, 31, 43, 292, 448, 487])

for item in items_for_manual_review:
    print(item["text"][:100])
    print(f'label:  {inference_model.config.id2label[item["label"]]}')

    # Tokenize the text, truncating reviews longer than GPT-2's 1024-token context
    inputs = in_tokenizer(item["text"], truncation=True, return_tensors="pt")
    # Run a forward pass without tracking gradients and take the logits
    with torch.no_grad():
        logits = inference_model(**inputs).logits
    # The predicted class is the index of the largest logit
    prediction = torch.argmax(logits, dim=1).item()
    print(f"prediction: {inference_model.config.id2label[prediction]}")


As I watched this movie, and I began to see its' characters develop I could feel this would be an ex
label:  POSITIVE
prediction: POSITIVE
Don't understand how these animated movies keep coming out, and no matter how good (or bad) it is pe
label:  NEGATIVE
prediction: POSITIVE
I saw this film in the worst possible circumstance. I'd already missed 15 minutes when I woke up to 
label:  POSITIVE
prediction: POSITIVE
I've now seen this one about 10 times, so there must be something about it I like!<br /><br />50's U
label:  POSITIVE
prediction: POSITIVE
I saw this movie years ago on late night television. Back then it went by the title of "Stairway to 
label:  POSITIVE
prediction: POSITIVE
Well I don't personally like rap, but I still found Fear of a Black Hat hilarious. I'm sure I didn't
label:  POSITIVE
prediction: POSITIVE
Rosenstrasse is a touching story of courage in adversity. Reichdeutch women find that their Jewish H
label:  POSITIVE
prediction: POSITIVE
"Problem Child" is one of t