# Lightweight Fine-Tuning Project

The project makes the following choices:

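* PEFT technique: LoRA (Low-Rank Adaptation)
* Model: GPT-2
* Evaluation approach: `Trainer.evaluate`, reporting accuracy on a held-out subset of the test split
* Fine-tuning dataset: yelp/yelp_review_full (five-class star-rating classification)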
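LoRA keeps a pretrained weight matrix W0 frozen and learns a low-rank update: for rank r it adds (lora_alpha / r) · B A, where A is r × d_in and B is d_out × r. The sketch below shows that forward pass with plain Python lists. It is a toy illustration of the idea, not PEFT's implementation; the helper names and matrices are hypothetical. As in the LoRA paper, B starts at zero, so before training the adapted layer behaves exactly like the original.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0, a, b, x, alpha, r):
    """Return W0 x + (alpha / r) * B (A x) for one input vector x."""
    base = matvec(w0, x)                 # frozen pretrained path
    low_rank = matvec(b, matvec(a, x))   # trainable rank-r path
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, low_rank)]


# Toy shapes: d_in = 3, d_out = 2, and A/B of rank 1.
w0 = [[1.0, 0.0, 2.0],
      [0.0, 1.0, 1.0]]
a = [[1.0, 1.0, 1.0]]          # r x d_in
b_init = [[0.0], [0.0]]        # d_out x r, zero-initialized
b_trained = [[0.5], [-0.25]]   # hypothetical values after training
x = [1.0, 2.0, 3.0]

# alpha=32 and r=8 mirror the LoraConfig used later (scale 4).
print(lora_forward(w0, a, b_init, x, alpha=32, r=8))     # [7.0, 5.0]
print(lora_forward(w0, a, b_trained, x, alpha=32, r=8))  # [19.0, -1.0]
```

Only A and B are trained, which is why the number of trainable parameters stays small compared with the full model.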

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained GPT-2 model from Hugging Face, together with its tokenizer and the Yelp review dataset, and evaluate the model before fine-tuning.

In [1]:
# Install pinned versions of the datasets and huggingface-hub libraries.

!pip install --upgrade datasets==3.2.0 huggingface-hub==0.27.1



In [2]:
# Load the predefined train and test splits of the Yelp review dataset.

from datasets import load_dataset

train_ds, test_ds = load_dataset("yelp/yelp_review_full", split=['train', 'test'])


for entry in train_ds.select(range(3)):
    label = entry["label"]
    text = entry["text"]
    print(f"label={label}, text={text}")
    
print("\n")

for entry in test_ds.select(range(3)):
    label = entry["label"]
    text = entry["text"]
    print(f"label={label}, text={text}")

label=4, text=dr. goldberg offers everything i look for in a general practitioner.  he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first.  really, what more do you need?  i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.
label=1, text=Unfortunately, the frustration of being Dr. Goldberg's patient is a repeat of the experience I've had with so many other doctors in NYC -- good doctor, terrible staff.  It seems that his staff simply never answers the phone.  It usually takes 2 hours of repeated calling to get an answer.  Who has time for that or wants to deal with it?  I have run into this problem with many other doctors and I just don't get it.  You have office wo
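The labels are 0-based class ids. The sketch below assumes the usual mapping for yelp_review_full, where label k means k + 1 stars; this fits the two reviews printed above (a glowing review with label 4, a complaint with label 1). It also computes two reference points for judging accuracy later: uniform guessing over five classes and always predicting the most frequent label. The helper names and the toy label list are hypothetical.

```python
from collections import Counter

NUM_CLASSES = 5


def label_to_stars(label: int) -> int:
    """Map a 0-based class id to a 1-5 star rating (assumed mapping)."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label out of range: {label}")
    return label + 1


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


print(label_to_stars(4), label_to_stars(1))   # 5 2
print(1 / NUM_CLASSES)                        # 0.2, uniform-guess accuracy
print(majority_baseline([4, 1, 4, 0, 2]))     # 0.4 on this toy list
```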

In [4]:
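# Tokenize the review text with the GPT-2 tokenizer.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 has no dedicated padding token. Its unk token is "<|endoftext|>",
# the same token as eos, so padding reuses the end-of-text token.
tokenizer.pad_token = tokenizer.unk_token

# Work with small shuffled subsets to keep training time manageable.
small_train_ds = train_ds.shuffle(seed=42).select(range(1000))
small_test_ds = test_ds.shuffle(seed=42).select(range(100))

def tokenize_function(examples):
    # Pad or truncate every review to tokenizer.model_max_length (1024 for GPT-2).
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train_ds = small_train_ds.map(tokenize_function, batched=True)
tokenized_test_ds = small_test_ds.map(tokenize_function, batched=True)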
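With `padding="max_length"` and `truncation=True`, every review becomes exactly `tokenizer.model_max_length` tokens long, padded on the right with the pad token and marked with `attention_mask` 0 on the padding. The pure-Python sketch below mimics that behaviour on raw token ids; `pad_or_truncate` is a hypothetical helper, and 50256 is the id of `<|endoftext|>` in the GPT-2 vocabulary.

```python
GPT2_EOS_ID = 50256  # id of "<|endoftext|>", reused here as the pad token


def pad_or_truncate(token_ids, max_length, pad_id=GPT2_EOS_ID):
    """Right-pad or truncate token_ids to max_length and build an attention mask."""
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


padded = pad_or_truncate([10, 20, 30], max_length=5)
truncated = pad_or_truncate(list(range(8)), max_length=5)
print(padded)     # {'input_ids': [10, 20, 30, 50256, 50256], 'attention_mask': [1, 1, 1, 0, 0]}
print(truncated)  # {'input_ids': [0, 1, 2, 3, 4], 'attention_mask': [1, 1, 1, 1, 1]}
```

With a max length of 1024, short reviews carry many padding positions, which costs memory and time; padding each batch only to its longest sequence is a common alternative.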


In [24]:
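from transformers import AutoModelForSequenceClassification

# One output class per star rating; the "score" head is newly initialized.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=5)
# Tell the model which id marks padding so it can locate the last real token.
model.config.pad_token_id = model.config.eos_token_id

print(model)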


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=5, bias=False)
)
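GPT-2 is a left-to-right model, so `GPT2ForSequenceClassification` classifies each review from the hidden state of its last real token rather than the first. It finds that position using `config.pad_token_id`, which is why the model cell sets it. The sketch below illustrates the idea for right-padded input; `last_token_index` is a hypothetical helper, the token ids are made up, and the library's exact logic has varied across transformers versions. One consequence of reusing `<|endoftext|>` as padding: a literal end-of-text token inside a review would be mistaken for padding under this rule.

```python
def last_token_index(input_ids, pad_token_id):
    """Index of the last non-padding token in a right-padded sequence.

    Takes the position just before the first pad token; if the sequence
    has no padding, returns the final position. Assumes at least one
    real token precedes any padding.
    """
    try:
        first_pad = input_ids.index(pad_token_id)
    except ValueError:
        return len(input_ids) - 1
    return first_pad - 1


print(last_token_index([464, 2057, 373, 1049, 50256, 50256], 50256))  # 3
print(last_token_index([1, 2, 3], 50256))                             # 2
```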


In [10]:
import numpy as np
from transformers import Trainer, TrainingArguments

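# Default training arguments; "test_trainer" is the output directory.
training_args = TrainingArguments("test_trainer")

def compute_metrics(eval_pred):
    # eval_pred holds the model's logits and the true labels for the eval set.
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}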
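`compute_metrics` takes the argmax of the logits over the five classes and reports the fraction of predictions that match the labels. A numpy-free equivalent is handy for checking that logic on hand-made logits; the function name and toy values below are hypothetical. Like `np.argmax`, Python's `max` returns the first index when several classes tie.

```python
def accuracy_from_logits(logits, labels):
    """Pure-Python equivalent of compute_metrics: argmax per row, then mean match."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


logits = [
    [0.1, 2.0, -1.0, 0.0, 0.3],   # argmax 1
    [1.5, 0.2, 0.1, 0.0, -0.4],   # argmax 0
    [0.0, 0.0, 0.1, 3.2, 0.5],    # argmax 3
    [-2.0, 0.0, 0.4, 0.9, 1.1],   # argmax 4
]
labels = [1, 2, 3, 4]
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.75}
```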

In [None]:
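# Baseline evaluation before fine-tuning. This cell was not run in this notebook;
# the baseline is evaluated at the end instead.
trainer = Trainer(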
    model=model,
    args=training_args,
    train_dataset=tokenized_train_ds,
    eval_dataset=tokenized_test_ds,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

trainer.evaluate()

## Performing Parameter-Efficient Fine-Tuning

The cells below wrap the loaded model in a LoRA PEFT model, train it, and save the adapter weights.

In [25]:
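# Configure LoRA and wrap the base model.

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,              # the update is scaled by lora_alpha / r = 4
    lora_dropout=0.01,
    modules_to_save=["score"],  # train and save the classification head in full
    # bias="none" (the default) leaves all bias terms frozen.
    # target_modules is not set, so PEFT uses its GPT-2 default (c_attn).
)

lora_model = get_peft_model(model, config)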
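Because the LoRA update is linear, it can be folded back into the frozen weight after training: W' = W0 + (lora_alpha / r) · B A gives the same outputs as running the adapter alongside W0. PEFT exposes `merge_and_unload()` on LoRA models for this; the sketch below only shows the arithmetic on toy matrices, with hypothetical helper names. The toy uses rank 1 with alpha 4, which gives the same scale factor of 4 as the config above.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def merge_lora(w0, a, b, alpha, r):
    """Return W0 + (alpha / r) * B A, the weight a merged adapter would use."""
    delta = matmul(b, a)
    s = alpha / r
    return [[w + s * d for w, d in zip(wr, dr)] for wr, dr in zip(w0, delta)]


config_scale = 32 / 8          # lora_alpha / r from the LoraConfig above
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]               # r x d_in, with r = 1
b = [[0.5], [0.25]]            # d_out x r
merged = merge_lora(w0, a, b, alpha=4, r=1)
print(config_scale)            # 4.0
print(merged)                  # [[3.0, 4.0], [1.0, 3.0]]

# The merged weight matches the unmerged base + adapter forward pass.
x = [1.0, 1.0]
unmerged = [h + 4.0 * d for h, d in zip(matvec(w0, x), matvec(b, matvec(a, x)))]
print(matvec(merged, x), unmerged)  # [7.0, 4.0] [7.0, 4.0]
```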

In [26]:
print(lora_model)
lora_model.print_trainable_parameters()

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.01, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                ...
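The printout is cut off before `print_trainable_parameters()` reports its totals. The LoRA share can be recomputed from the shapes that are visible: each of the 12 `c_attn` layers gets `lora_A` (768 → 8) and `lora_B` (8 → 2304), neither with a bias. Adding one trainable copy of the 768 → 5 `score` head (from `modules_to_save`, `bias=False`) gives the estimate below; treat it as arithmetic on the printed shapes, not as the notebook's logged figure.

```python
N_LAYERS = 12
D_MODEL = 768
C_ATTN_OUT = 2304   # 3 * 768: the fused query/key/value projection
RANK = 8
NUM_LABELS = 5

lora_a = D_MODEL * RANK              # lora_A: 768 -> 8, no bias
lora_b = RANK * C_ATTN_OUT           # lora_B: 8 -> 2304, no bias
lora_total = N_LAYERS * (lora_a + lora_b)
score_head = D_MODEL * NUM_LABELS    # score: 768 -> 5, bias=False

print(lora_total)                # 294912
print(score_head)                # 3840
print(lora_total + score_head)   # 298752
```

That is on the order of 0.25% of GPT-2's roughly 124M parameters.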

In [13]:
# Trainer expects the target column to be named "labels"; the raw text is no longer needed.
train_lora = tokenized_train_ds.rename_column('label', 'labels').remove_columns("text")
test_lora = tokenized_test_ds.rename_column('label', 'labels').remove_columns("text")
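The training cell passes `DataCollatorWithPadding`, which by default pads each batch only to its longest sequence. Since `tokenize_function` already padded every example to 1024 tokens, the collator has nothing left to add here. The sketch below shows what per-batch (dynamic) padding does, using plain lists instead of tensors; `pad_batch` and the toy features are hypothetical.

```python
def pad_batch(features, pad_id):
    """Pad a list of {"input_ids", "labels"} dicts to the longest sequence in the batch."""
    longest = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = longest - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["labels"])
    return batch


features = [
    {"input_ids": [5, 6, 7], "labels": 4},
    {"input_ids": [8], "labels": 1},
]
print(pad_batch(features, pad_id=50256))
```

Tokenizing without `padding="max_length"` and letting the collator pad per batch would usually produce much shorter batches for short reviews.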

In [27]:
# Start the training process

from transformers import DataCollatorWithPadding


lora_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=4,
        weight_decay=0.01,
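        # Newer transformers releases rename this argument to eval_strategy.
        evaluation_strategy="epoch",
        save_strategy="epoch",
        # After training, reload the checkpoint with the lowest validation loss.
        load_best_model_at_end=True,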
        remove_unused_columns=False,
        label_names=["labels"],
        save_safetensors=False,
    ),
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

lora_trainer.train()
metrics = lora_trainer.evaluate()
print(metrics)





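| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 1.30169 | 0.45 |
| 2 | 1.542700 | 1.408385 | 0.46 |
| 3 | 1.542700 | 1.187754 | 0.53 |
| 4 | 0.991800 | 1.211912 | 0.54 |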


Checkpoint destination directory ./data/sentiment_analysis/checkpoint-250 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/sentiment_analysis/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/sentiment_analysis/checkpoint-750 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/sentiment_analysis/checkpoint-1000 already exists and is non-empty.Saving will proceed but saved results may be invalid.


{'eval_loss': 1.1877543926239014, 'eval_accuracy': 0.53, 'eval_runtime': 9.4965, 'eval_samples_per_second': 10.53, 'eval_steps_per_second': 2.633, 'epoch': 4.0}
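The log is consistent with the arguments: 1000 training examples at batch size 4 give 250 steps per epoch (assuming one device and no gradient accumulation), hence checkpoints at steps 250 to 1000. With the default `logging_steps` of 500, the first training loss is only logged at step 500, which would explain "No log" in epoch 1 and the repeated 1.542700. Because `load_best_model_at_end=True` selects by validation loss when `metric_for_best_model` is unset, the final `evaluate()` reflects epoch 3 (accuracy 0.53), not epoch 4 (0.54). The sketch below reproduces that arithmetic and selection using values copied from the log.

```python
import math

train_examples = 1000
batch_size = 4
epochs = 4

steps_per_epoch = math.ceil(train_examples / batch_size)
checkpoint_steps = [steps_per_epoch * e for e in range(1, epochs + 1)]
print(steps_per_epoch, checkpoint_steps)  # 250 [250, 500, 750, 1000]

# (validation loss, accuracy) per epoch, copied from the training log above.
val_log = {
    1: (1.30169, 0.45),
    2: (1.408385, 0.46),
    3: (1.187754, 0.53),
    4: (1.211912, 0.54),
}
best_epoch = min(val_log, key=lambda e: val_log[e][0])  # lowest validation loss
print(best_epoch, val_log[best_epoch])  # 3 (1.187754, 0.53)
```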


In [28]:
lora_model.save_pretrained("gpt-lora-yelp-final2")
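Calling `save_pretrained` on a PEFT model writes the adapter (the LoRA matrices plus the `modules_to_save` head) and an `adapter_config.json`, not a full copy of GPT-2. A rough size estimate, assuming float32 storage, the parameter count derived from the printed shapes, and about 124M parameters for the base model:

```python
BYTES_PER_FP32 = 4
adapter_params = 294_912 + 3_840   # LoRA matrices + trainable score copy (estimate)
base_params = 124_000_000          # approximate size of GPT-2 small

adapter_mb = adapter_params * BYTES_PER_FP32 / 1e6
base_mb = base_params * BYTES_PER_FP32 / 1e6
print(f"adapter ~ {adapter_mb:.1f} MB, base ~ {base_mb:.0f} MB")  # adapter ~ 1.2 MB, base ~ 496 MB
```

Actual file sizes will differ somewhat because of serialization overhead and the dtype used.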



In [29]:
print(lora_model)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.01, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                ...

## Performing Inference with a PEFT Model

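The cells below reload the saved LoRA adapter, evaluate it on the same 100-review test subset, and then evaluate the original GPT-2 model for comparison. The warning that `score.weight` is newly initialized comes from instantiating the base GPT-2 checkpoint; because `score` is listed in `modules_to_save`, the trained classification head is restored from the adapter, and the reloaded model reproduces the post-training result (evaluation loss of about 1.188, accuracy 0.53). The base model, whose classification head is randomly initialized, reaches only 0.21 accuracy, close to the 0.20 expected from uniform guessing over five classes.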

In [31]:
from peft import AutoPeftModelForSequenceClassification
lora_model_loaded = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora-yelp-final2", num_labels=5)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
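A quick way to confirm the adapter, including the trained `score` head, was restored is to compare the reloaded model's evaluation with the one logged right after training: a head left at random initialization would instead look like the base model's result. The sketch below performs that comparison on values copied from this notebook's outputs; `same_eval` is a hypothetical helper and the tolerance is an arbitrary choice.

```python
import math


def same_eval(a, b, keys=("eval_loss", "eval_accuracy"), tol=1e-6):
    """True if two Trainer.evaluate() dicts agree on the given keys within tol."""
    return all(math.isclose(a[k], b[k], abs_tol=tol) for k in keys)


# Values copied from the evaluation outputs in this notebook.
after_training = {"eval_loss": 1.1877543926239014, "eval_accuracy": 0.53}
after_reload = {"eval_loss": 1.1877543926239014, "eval_accuracy": 0.53}
base_gpt2 = {"eval_loss": 6.752526760101318, "eval_accuracy": 0.21}

print(same_eval(after_training, after_reload))  # True
print(same_eval(after_training, base_gpt2))     # False
```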


In [32]:
print(lora_model_loaded)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.01, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                ...

In [33]:
# The base config loaded from "gpt2" has no pad token id; set it before batched evaluation.
lora_model_loaded.config.pad_token_id = lora_model_loaded.config.eos_token_id

trainer = Trainer(
    model=lora_model_loaded,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_evaluate",
        label_names=["labels"],
    ),
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

print(trainer.evaluate())


# Baseline: the original GPT-2 model with a freshly initialized classification head.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=5)
model.config.pad_token_id = model.config.eos_token_id

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_evaluate",
        label_names=["labels"],
    ),
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

print(trainer.evaluate())

{'eval_loss': 1.1877543926239014, 'eval_accuracy': 0.53, 'eval_runtime': 9.4002, 'eval_samples_per_second': 10.638, 'eval_steps_per_second': 1.383}


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 6.752526760101318, 'eval_accuracy': 0.21, 'eval_runtime': 9.4509, 'eval_samples_per_second': 10.581, 'eval_steps_per_second': 1.376}
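With only 100 test reviews, each accuracy carries sizeable sampling noise. The sketch below attaches a rough 95% interval using the normal approximation to the binomial (an assumption that is only approximate at n = 100). The gap between 0.21 and 0.53 is well outside that noise, while the 0.53 versus 0.54 difference between epochs 3 and 4 is not. `accuracy_std_error` is a hypothetical helper.

```python
import math


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy estimated on n examples."""
    return math.sqrt(acc * (1 - acc) / n)


n_test = 100
results = {"gpt2 base (random head)": 0.21, "gpt2 + LoRA": 0.53}
for name, acc in results.items():
    se = accuracy_std_error(acc, n_test)
    print(f"{name}: {acc:.2f} +/- {1.96 * se:.2f} (approx. 95% interval)")
```

Evaluating on a larger slice of the 50,000-review test split would tighten these intervals considerably.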
