# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA
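* Model: GPT-2 (`gpt2` checkpoint)
* Evaluation approach: `Trainer.evaluate()` from Hugging Face `transformers`, reporting accuracy
* Fine-tuning dataset: IMDB (`imdb`) from Hugging Face `datasets`, subsampled to 500 training and 500 test examples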
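
For orientation, here is a minimal NumPy sketch of the low-rank update that LoRA adds to a frozen weight matrix. It is not the PEFT library's implementation: the names `W`, `A`, `B`, and `lora_forward` are illustrative, and the layer shape is a stand-in. The rank and scaling values mirror the `r=8`, `lora_alpha=32` settings used in the `LoraConfig` of this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768   # illustrative layer shape (GPT-2 small hidden size)
r, lora_alpha = 8, 32    # rank and scaling, as in this notebook's LoraConfig

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init so the update starts at zero


def lora_forward(x):
    """Return W x + (lora_alpha / r) * B A x for a single input vector x."""
    return W @ x + (lora_alpha / r) * (B @ (A @ x))


x = rng.normal(size=d_in)
base_out = W @ x
lora_out = lora_forward(x)

full_params = W.size            # parameters in the frozen matrix
lora_params = A.size + B.size   # parameters that would actually be trained
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly; only `A` and `B` would receive gradient updates during fine-tuning.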

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [36]:
! pip install -q "datasets==2.15.0"


In [37]:
from datasets import load_dataset

# Load the IMDB train split, then use train_test_split to carve it into train and test splits
dataset = load_dataset("imdb", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]
for split in splits:
    dataset[split] = dataset[split].shuffle(seed=42).select(range(500))

# View the dataset characteristics
dataset["train"]

Dataset({
    features: ['text', 'label'],
    num_rows: 500
})

In [38]:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

In [39]:
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )
    
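# Register a [CLS] token; this grows the vocabulary by one (to 50258), so the
# model's embedding matrix has to be resized before training
num_added_tokens = tokenizer.add_special_tokens({"cls_token": "[CLS]"})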

# Inspect the available columns in the dataset
tokenized_dataset["train"]

Map: 100%|██████████| 500/500 [00:02<00:00, 229.47 examples/s]


Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 500
})

In [40]:
print(tokenized_dataset["train"][0]["input_ids"])

[25354, 416, 262, 6275, 1982, 29168, 290, 7924, 416, 4848, 945, 428, 2646, 373, 257, 3731, 18641, 13, 632, 3947, 1165, 1790, 13, 632, 3377, 1290, 1165, 890, 4441, 262, 367, 709, 271, 8137, 11, 290, 14376, 379, 262, 7835, 8665, 13, 3244, 845, 2952, 345, 423, 262, 28342, 290, 43005, 48067, 329, 262, 1641, 338, 3241, 11, 290, 262, 7818, 7664, 29847, 1671, 1220, 6927, 1671, 11037, 9590, 616, 3656, 1807, 31684, 373, 13779, 13]


In [41]:
from transformers import AutoModelForSequenceClassification

num_labels = 2
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=num_labels,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

model.config.pad_token_id = model.config.eos_token_id
# Freeze all the parameters of the base model
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in model.base_model.parameters():
    param.requires_grad = False

model

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)

In [42]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# Resize the embedding matrix to match the tokenizer, which now includes [CLS]
embedding_layer = model.resize_token_embeddings(len(tokenizer))

You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50258. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
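
To make the metric concrete, here is a small check of the same `compute_metrics` logic on hand-made logits. The toy arrays are invented for illustration; in the notebook, `Trainer` supplies the real `(predictions, labels)` pair.

```python
import numpy as np


def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair; logits has shape (n_examples, n_classes)
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# Made-up logits for four examples and two classes
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.5, 1.0]])
toy_labels = np.array([0, 1, 0, 0])

result = compute_metrics((toy_logits, toy_labels))
print(result)
```

The argmax picks classes `[0, 1, 1, 0]`, which agree with the labels in three of four positions.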


In [43]:
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./model/base_model",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.527668 | 0.766 |


TrainOutput(global_step=125, training_loss=0.898269287109375, metrics={'train_runtime': 75.2071, 'train_samples_per_second': 6.648, 'train_steps_per_second': 1.662, 'total_flos': 135947800018944.0, 'train_loss': 0.898269287109375, 'epoch': 1.0})
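
The `global_step=125` in this output follows from the dataset size and batch size. The sketch below reproduces that arithmetic, assuming a single device and no gradient accumulation (neither is set explicitly in this notebook). It also suggests why the training-loss column reads `No log`: assuming the `TrainingArguments` default of `logging_steps=500`, the run ends before the first logging step is reached.

```python
import math

num_examples = 500    # rows kept per split in this notebook
batch_size = 4        # per_device_train_batch_size
num_epochs = 1        # num_train_epochs
logging_steps = 500   # assumed library default; not set in the notebook

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# A training loss is only reported once total_steps reaches logging_steps
loss_is_logged = total_steps >= logging_steps

print(total_steps, loss_is_logged)
```

With `batch_size = 2` the same formula gives 250 steps, which matches the `global_step=250` reported for the PEFT run in this notebook.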

In [44]:
trainer.evaluate()

{'eval_loss': 0.5276684761047363,
 'eval_accuracy': 0.766,
 'eval_runtime': 20.5932,
 'eval_samples_per_second': 24.28,
 'eval_steps_per_second': 6.07,
 'epoch': 1.0}

In [45]:
import pandas as pd

df = pd.DataFrame(tokenized_dataset["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_dataset["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head(2)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | As I watched this movie, and I began to see it... | 1 | 1 |
| 1 | Don't understand how these animated movies kee... | 0 | 0 |


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [46]:
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
model.config.pad_token_id = model.config.eos_token_id
embedding_layer = model.resize_token_embeddings(len(tokenizer))
peft_model = PeftModelForSequenceClassification(model, peft_config)
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50258. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


trainable params: 297,984 || all params: 124,738,560 || trainable%: 0.23888683659647827
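
The trainable-parameter count can be reproduced by hand. The breakdown below is a reconstruction, not something taken from PEFT's documentation. It assumes LoRA is applied only to the `c_attn` projection in each of the 12 blocks (768 inputs, 2304 outputs), and that the 768 x 2 `score` head is counted twice because a trainable copy is kept alongside the original. Under those assumptions the numbers match the printed output.

```python
hidden, qkv_out = 768, 3 * 768   # c_attn maps 768 -> 2304 in GPT-2 small
r, n_layers, num_labels = 8, 12, 2

lora_per_layer = r * hidden + qkv_out * r   # A is (r x 768), B is (2304 x r)
lora_total = n_layers * lora_per_layer
score_head = hidden * num_labels            # Linear(768, 2, bias=False)

# Assumption: the classification head appears twice among the trainable parameters
trainable = lora_total + 2 * score_head

all_params = 124_738_560                    # total reported by print_trainable_parameters()
trainable_pct = 100 * trainable / all_params
print(trainable, f"{trainable_pct:.4f}%")
```

The adapters themselves account for 294,912 parameters; the classification head contributes the rest.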


In [47]:
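# Note: this loop runs after the PEFT wrapper has frozen the base weights, so it
# re-enables gradients for every GPT-2 parameter, not only the LoRA adapters.
# Remove it to train the adapters (and classification head) alone.
for param in model.base_model.parameters():
    param.requires_grad = True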

trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="./model/peft_model",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.696316 | 0.474 |


TrainOutput(global_step=250, training_loss=0.8759828491210937, metrics={'train_runtime': 81.621, 'train_samples_per_second': 6.126, 'train_steps_per_second': 3.063, 'total_flos': 103530968432640.0, 'train_loss': 0.8759828491210937, 'epoch': 1.0})

In [48]:
# save the model using save_pretrained
peft_model.save_pretrained('model/peft_model')

In [49]:
trainer.evaluate()

{'eval_loss': 0.696315586566925,
 'eval_accuracy': 0.474,
 'eval_runtime': 15.8978,
 'eval_samples_per_second': 31.451,
 'eval_steps_per_second': 15.725,
 'epoch': 1.0}

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
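
As a starting point for that comparison, the snippet below puts the two `trainer.evaluate()` results recorded earlier side by side. The accuracy values are copied from this notebook's outputs; the dictionary layout and variable names are only illustrative. Note that the first run is not a pure untrained baseline, since it trained the classification head for one epoch on top of the frozen GPT-2 body.

```python
num_eval_examples = 500  # size of the test subset used for evaluation

results = {
    "frozen GPT-2 + trained head": 0.766,  # eval_accuracy from the first run
    "LoRA (PEFT) model": 0.474,            # eval_accuracy from the PEFT run
}

for name, acc in results.items():
    correct = round(acc * num_eval_examples)
    print(f"{name}: {acc:.3f} ({correct}/{num_eval_examples} correct)")

delta = round(results["LoRA (PEFT) model"] - results["frozen GPT-2 + trained head"], 3)
print(f"change in accuracy: {delta:+.3f}")
```

On this 500-example subset the PEFT run scores below the head-only run and close to chance for a two-class problem, so the training setup (for example the `2e-3` learning rate) may be worth revisiting before drawing conclusions.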

In [50]:
from peft import AutoPeftModelForSequenceClassification

peft_model_id = "./model/peft_model"
# Load the LoRA model
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_model_id)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [51]:
import torch

items_for_manual_review = tokenized_dataset["test"].select(
    [0, 1, 22, 31, 43, 292, 448, 487])

for item in items_for_manual_review:
    print(item["text"][:100])
    print(f'label:  {inference_model.config.id2label[item["label"]]}')

    # Tokenize the text, truncating long reviews as was done for training
    inputs = tokenizer(item["text"], truncation=True, return_tensors="pt")
    # Run the model without tracking gradients and take the logits
    with torch.no_grad():
        logits = inference_model(**inputs).logits
    # The predicted class is the index of the largest logit
    prediction = torch.argmax(logits, dim=1).item()
    print(f"prediction: {inference_model.config.id2label[prediction]}")


As I watched this movie, and I began to see its' characters develop I could feel this would be an ex
label:  LABEL_1
prediction: LABEL_1
Don't understand how these animated movies keep coming out, and no matter how good (or bad) it is pe
label:  LABEL_0
prediction: LABEL_1
I saw this film in the worst possible circumstance. I'd already missed 15 minutes when I woke up to 
label:  LABEL_1
prediction: LABEL_1
I've now seen this one about 10 times, so there must be something about it I like!<br /><br />50's U
label:  LABEL_1
prediction: LABEL_1
I saw this movie years ago on late night television. Back then it went by the title of "Stairway to 
label:  LABEL_1
prediction: LABEL_1
Well I don't personally like rap, but I still found Fear of a Black Hat hilarious. I'm sure I didn't
label:  LABEL_1
prediction: LABEL_1
Rosenstrasse is a touching story of courage in adversity. Reichdeutch women find that their Jewish H
label:  LABEL_1
prediction: LABEL_1
"Problem Child" is one of the goofiest mo