# Lightweight Fine-Tuning Project

* PEFT technique: [`LoRA`](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora)
* Model: [`gpt2`](https://huggingface.co/openai-community/gpt2)
* Evaluation approach: Hugging Face's [`Trainer.evaluate`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.evaluate)
* Fine-tuning dataset: [`imdb`](https://huggingface.co/datasets/imdb)

In [1]:
from enum import Enum
from json import dumps

import numpy as np
import torch

from datasets import load_dataset
from IPython.display import Markdown, display, display_markdown
from peft import AutoPeftModelForSequenceClassification, LoraConfig, TaskType, get_peft_model, PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding, TrainingArguments, Trainer

This notebook uses Hugging Face's [`evaluate`](https://huggingface.co/docs/evaluate/index) library, which may not be preinstalled in Udacity's workspace. The next cell tries to import it and, if the import fails, installs it (together with `scikit-learn`) using pip.

In [2]:
try:
    import evaluate
except ImportError:
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "evaluate", "scikit-learn"])
    display_markdown(Markdown('<div class="alert alert-block alert-warning">New dependencies were installed dynamically. <span style="font-weight: bold;">You should restart the kernel</span>.</div>'))

## Choosing PyTorch's device based on available backends

In [3]:
pytorch_device = torch.device("cpu")

In [4]:
if torch.cuda.is_available():
    pytorch_device = torch.device("cuda")
    display_markdown(Markdown('<div class="alert alert-block alert-success">Using CUDA backend</div>'))
elif torch.backends.mps.is_available():
    pytorch_device = torch.device("mps")
    display_markdown(Markdown('<div class="alert alert-block alert-success">Using MPS backend</div>'))
else:
    display_markdown(Markdown('<div class="alert alert-block alert-warning">Using CPU backend</div>'))

<div class="alert alert-block alert-success">Using CUDA backend</div>

## Loading and Evaluating a Foundation Model

In [5]:
pretrained_model_name = "gpt2"
display(Markdown(f"### Pre-Trained Model: `{pretrained_model_name}`"))

### Pre-Trained Model: `gpt2`

In [6]:
metric_name = "accuracy"
metric = evaluate.load(metric_name)
display_markdown(Markdown(f"### Metric `{metric_name}`:\n\n```\n{metric}\n```"))

### Metric `accuracy`:

```
EvaluationModule(name: "accuracy", module_type: "metric", features: {'predictions': Value(dtype='int32', id=None), 'references': Value(dtype='int32', id=None)}, usage: """
Args:
    predictions (`list` of `int`): Predicted labels.
    references (`list` of `int`): Ground truth labels.
    normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
    sample_weight (`list` of `float`): Sample weights Defaults to None.

Returns:
    accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input, if `normalize` is set to `True`.. A higher score means higher accuracy.

Examples:

    Example 1-A simple example
        >>> accuracy_metric = evaluate.load("accuracy")
        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
        >>> print(results)
        {'accuracy': 0.5}

    Example 2-The same as Example 1, except with `normalize` set to `False`.
        >>> accuracy_metric = evaluate.load("accuracy")
        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
        >>> print(results)
        {'accuracy': 3.0}

    Example 3-The same as Example 1, except with `sample_weight` set.
        >>> accuracy_metric = evaluate.load("accuracy")
        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
        >>> print(results)
        {'accuracy': 0.8778625954198473}
""", stored examples: 0)
```

In [7]:
def compute_metrics(eval_pred):
    # Taken from https://huggingface.co/docs/evaluate/transformers_integrations#trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
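
To make the metric computation concrete, here is a minimal sketch of what `compute_metrics` does, run on made-up logits and labels. It uses plain Python instead of `numpy` and `evaluate`, and the helper names `argmax` and `accuracy` are hypothetical stand-ins, not library functions.

```python
def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(predictions, references):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy logits for four reviews: column 0 = NEGATIVE, column 1 = POSITIVE.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]]
toy_labels = [0, 1, 0, 0]

# Pick the highest-scoring class per row, then compare with the labels.
toy_predictions = [argmax(row) for row in toy_logits]
toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_predictions, toy_accuracy)
```

Three of the four toy predictions match their labels, so the sketch reports an accuracy of 0.75.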

### Load dataset

In [8]:
train_dataset, test_dataset = load_dataset("imdb", split=("train", "test"))

In [9]:
train_dataset.to_pandas().sample(5)

| | text | label |
|---|---|---|
| 7590 | This is one of those movies that's trying to b... | 0 |
| 6014 | First of all, I firmly believe that Norwegian ... | 0 |
| 4284 | The movie was disappointing. The book was powe... | 0 |
| 20795 | \*\*\*SPOILERS\*\*\* \*\*\*SPOILERS\*\*\* After two so-so ... | 1 |
| 20680 | I have decided to not believe what famous movi... | 1 |


In [10]:
test_dataset.to_pandas().sample(5)

| | text | label |
|---|---|---|
| 7902 | Though predictable and contrived, not a bad mo... | 0 |
| 12027 | The Canterville Ghost (1996).The director made... | 0 |
| 11625 | Just awful. It's almost unbelievable that, wit... | 0 |
| 14409 | I'm glad Cage changed his name from Coppolla a... | 1 |
| 9794 | This is quite possibly the worst film I have e... | 0 |


### Load tokenizer and model

In [11]:
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
tokenizer.pad_token = tokenizer.eos_token
display_markdown(Markdown(f"```\n{tokenizer}\n```"))

```
GPT2TokenizerFast(name_or_path='gpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
```

In [12]:
def tokenize_function(examples):
    # Taken from https://huggingface.co/docs/evaluate/transformers_integrations
    # No padding is set as suggested by https://huggingface.co/docs/transformers/tasks/sequence_classification#preprocess
    return tokenizer(examples["text"], truncation=True)
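
Because tokenization leaves the sequences unpadded, padding is decided later by the data collator. The sketch below, in plain Python with made-up token-ID lists, contrasts padding a batch to its longest sequence with padding everything to a fixed length. `pad_batch` is a hypothetical helper, not part of `transformers`.

```python
def pad_batch(batch, pad_id, target_length=None):
    """Right-pad every sequence to `target_length` (default: longest in batch)."""
    length = target_length or max(len(seq) for seq in batch)
    return [seq + [pad_id] * (length - len(seq)) for seq in batch]


PAD_ID = 50256  # GPT-2's <|endoftext|> ID, reused as the pad token in this notebook

# Three made-up tokenized reviews of different lengths.
toy_batch = [[15, 27, 33], [8, 9], [4, 5, 6, 7, 1]]

dynamic = pad_batch(toy_batch, PAD_ID)                    # pad to the longest (5)
fixed = pad_batch(toy_batch, PAD_ID, target_length=1024)  # pad to model_max_length

# Count how many pad tokens each strategy adds.
dynamic_pad_tokens = sum(row.count(PAD_ID) for row in dynamic)
fixed_pad_tokens = sum(row.count(PAD_ID) for row in fixed)
print(dynamic_pad_tokens, fixed_pad_tokens)
```

The `Trainer` cells in this notebook pass `padding="max_length"` to `DataCollatorWithPadding`, which corresponds to the fixed variant; padding to the longest sequence in each batch would process far fewer pad tokens.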

In [13]:
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True).shuffle(seed=1024).select(range(5000))
tokenized_test_dataset = test_dataset.map(tokenize_function, batched=True).shuffle(seed=1024).select(range(250))

In [14]:
class ReviewSentiment(Enum):
    NEGATIVE = 0
    POSITIVE = 1

In [15]:
id2label = {v.value: v.name for v in ReviewSentiment}
label2id = {v.name: v.value for v in ReviewSentiment}

In [36]:
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name,
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id
).to(pytorch_device)
model.config.pad_token_id = model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [17]:
display(Markdown(f"```\n{model}\n```"))

```
GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)
```

### Evaluate pre-trained model

In [49]:
evaluate_results_pretrained = Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics
).evaluate()



In [50]:
display_markdown(Markdown(f"```json\n{dumps(evaluate_results_pretrained, indent=2)}\n```"))

```json
{
  "eval_loss": 3.720975160598755,
  "eval_accuracy": 0.428,
  "eval_runtime": 7.2374,
  "eval_samples_per_second": 34.543,
  "eval_steps_per_second": 4.422
}
```
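
For context, a binary classifier that always predicts a 50/50 split has a cross-entropy loss of ln 2 and, on a balanced test set, an expected accuracy of 0.5. The sketch below compares the figures reported above with that baseline. It assumes the 250-example test sample is roughly balanced, which the notebook does not verify.

```python
import math

eval_loss = 3.720975160598755   # reported above
eval_accuracy = 0.428           # reported above
num_eval_examples = 250         # size of the test subset selected earlier

chance_loss = math.log(2)       # cross-entropy of a uniform two-class prediction
correct = round(eval_accuracy * num_eval_examples)

print(f"chance loss: {chance_loss:.4f}, observed loss: {eval_loss:.4f}")
print(f"correct predictions: {correct} of {num_eval_examples}")
```

A loss well above ln 2 suggests that the randomly initialized `score` head is confidently wrong on many examples rather than merely guessing.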

## Performing Parameter-Efficient Fine-Tuning

In [21]:
# Taken from https://huggingface.co/docs/peft/quicktour

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
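    # inference_mode=True freezes the LoRA A/B matrices, so only the `score` head
    # is trained (see the 1,536 trainable parameters reported below); use
    # inference_mode=False to train the adapter weights as well.
    inference_mode=True,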
    r=8,
    lora_alpha=32,
    lora_dropout=0.1
)
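
As a rough illustration of what `r` and `lora_alpha` control, LoRA keeps a weight matrix W frozen and adds a low-rank update scaled by `lora_alpha / r`, so a layer computes W x + (lora_alpha / r) B A x. The sketch below uses tiny made-up matrices in plain Python. The helper names are hypothetical, and this is not how `peft` implements the layer internally.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x), with W treated as frozen."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


# Toy sizes: 4 inputs, 3 outputs, rank 2. lora_alpha=8 gives the same
# scaling factor (alpha / r = 4) as the notebook's r=8, lora_alpha=32.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B_initial = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # B starts at zero
B_trained = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]  # made-up "trained" values
x = [1.0, 2.0, 0.0, -1.0]

before = lora_forward(W, A, B_initial, x, r=2, lora_alpha=8)
after = lora_forward(W, A, B_trained, x, r=2, lora_alpha=8)
print(before, after)
```

With B at zero the layer reproduces the frozen model's output exactly; only once B is updated does the low-rank term change the result.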

In [22]:
peft_model = get_peft_model(model, peft_config).to(pytorch_device)
display_markdown(Markdown(f"```\n{peft_model}\n```"))



```
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (c_proj): Conv1D()
              (attn_dropout): Dropout(p=0.1, inplace=False)
              (resid_dropout): Dropout(p=0.1, inplace=False)
            )
            (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (mlp): GPT2MLP(
              (c_fc): Conv1D()
              (c_proj): Conv1D()
              (act): NewGELUActivation()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (score): ModulesToSaveWrapper(
        (original_module): Linear(in_features=768, out_features=2, bias=False)
        (modules_to_save): ModuleDict(
          (default): Linear(in_features=768, out_features=2, bias=False)
        )
      )
    )
  )
)
```

In [23]:
peft_model.print_trainable_parameters()

trainable params: 1,536 || all params: 124,737,792 || trainable%: 0.0012313830278477271
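
The reported counts can be cross-checked against the layer shapes in the model printout above. The sketch below is only arithmetic. Reading the 1,536 trainable parameters as the `score` head alone is an inference from the matching size (768 × 2), and it is consistent with `inference_mode=True` in the `LoraConfig` leaving the LoRA matrices frozen.

```python
hidden_size = 768    # in_features of lora_A in the printout
c_attn_out = 2304    # out_features of lora_B in the printout
num_layers = 12      # number of GPT2Block modules
r = 8                # LoRA rank from LoraConfig
num_labels = 2       # NEGATIVE / POSITIVE

# Each adapted c_attn layer adds lora_A (768 x 8) and lora_B (8 x 2304).
lora_per_layer = hidden_size * r + r * c_attn_out
lora_total = num_layers * lora_per_layer
score_head = hidden_size * num_labels

reported_trainable = 1_536
reported_all = 124_737_792

reported_pct = 100 * reported_trainable / reported_all
pct_with_lora = 100 * (lora_total + score_head) / reported_all

print("LoRA parameters:", lora_total)
print("score head parameters:", score_head)
print("reported trainable %:", reported_pct)
print("trainable % if the LoRA matrices were also trained:", pct_with_lora)
```

Under these assumptions, the LoRA matrices would account for roughly 0.24% of all parameters if they were trainable, compared with the roughly 0.0012% reported.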


In [24]:
training_arguments = TrainingArguments(
    output_dir="./nd608/gpt2-lora",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

In [25]:
trainer = Trainer(
    model=peft_model,
    args=training_arguments,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics
)



In [26]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.40948 | 0.828 |
| 2 | No log | 0.405727 | 0.816 |
| 3 | No log | 0.358477 | 0.856 |


TrainOutput(global_step=471, training_loss=0.6112000289236664, metrics={'train_runtime': 534.6836, 'train_samples_per_second': 28.054, 'train_steps_per_second': 0.881, 'total_flos': 7866223165440000.0, 'train_loss': 0.6112000289236664, 'epoch': 3.0})
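
The `global_step=471` value and the `No log` entries in the training-loss column can be checked with simple arithmetic. The sketch below assumes a single device, no gradient accumulation, and that `logging_steps` was left at the `TrainingArguments` default of 500, none of which the notebook states explicitly.

```python
import math

num_train_examples = 5000          # training subset selected earlier
per_device_train_batch_size = 32   # from TrainingArguments
num_train_epochs = 3               # from TrainingArguments
default_logging_steps = 500        # assumed: logging_steps left at its default

# The last batch of each epoch is smaller, hence the ceiling.
steps_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
loss_logged = total_steps >= default_logging_steps

print(steps_per_epoch, total_steps)
print("training loss logged during the run:", loss_logged)
```

Because the whole run finishes before the first logging step, no training loss is recorded for any epoch; a smaller `logging_steps` value would populate that column.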

In [27]:
peft_model.save_pretrained("./nd608/gpt2-lora")

In [51]:
del peft_model
del model

## Performing Inference with a PEFT Model

In [55]:
peft_model = AutoPeftModelForSequenceClassification.from_pretrained("./nd608/gpt2-lora", is_trainable=False).to(pytorch_device)
peft_model.config.pad_token_id = peft_model.config.eos_token_id
display_markdown(Markdown(f"```\n{peft_model}\n```"))

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


```
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (c_proj): Conv1D()
              (attn_dropout): Dropout(p=0.1, inplace=False)
              (resid_dropout): Dropout(p=0.1, inplace=False)
            )
            (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (mlp): GPT2MLP(
              (c_fc): Conv1D()
              (c_proj): Conv1D()
              (act): NewGELUActivation()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (score): ModulesToSaveWrapper(
        (original_module): Linear(in_features=768, out_features=2, bias=False)
        (modules_to_save): ModuleDict(
          (default): Linear(in_features=768, out_features=2, bias=False)
        )
      )
    )
  )
)
```

In [56]:
evaluate_results = Trainer(
    model=peft_model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics
).evaluate()

## Accuracy Comparison

### Pre-trained model `evaluate` results

In [63]:
display_markdown(Markdown(f"```json\n{dumps(evaluate_results_pretrained, indent=2)}\n```"))

```json
{
  "eval_loss": 3.720975160598755,
  "eval_accuracy": 0.428,
  "eval_runtime": 7.2374,
  "eval_samples_per_second": 34.543,
  "eval_steps_per_second": 4.422
}
```

### Fine-tuned model `evaluate` results

In [64]:
display_markdown(Markdown(f"```json\n{dumps(evaluate_results, indent=2)}\n```"))

```json
{
  "eval_loss": 0.3584771454334259,
  "eval_accuracy": 0.856,
  "eval_runtime": 7.5793,
  "eval_samples_per_second": 32.985,
  "eval_steps_per_second": 4.222
}
```
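
To put the two result sets side by side, the sketch below computes the differences from the figures reported above. It is plain arithmetic on copied values, and the variable names are illustrative.

```python
pretrained = {"eval_loss": 3.720975160598755, "eval_accuracy": 0.428}
fine_tuned = {"eval_loss": 0.3584771454334259, "eval_accuracy": 0.856}
num_eval_examples = 250  # size of the test subset used for both evaluations

# Absolute accuracy gain, and what it means in number of reviews.
accuracy_gain = fine_tuned["eval_accuracy"] - pretrained["eval_accuracy"]
extra_correct = round(accuracy_gain * num_eval_examples)

# How many times smaller the evaluation loss became.
loss_ratio = pretrained["eval_loss"] / fine_tuned["eval_loss"]

print(f"accuracy gain: {accuracy_gain:.3f} ({extra_correct} more correct reviews)")
print(f"loss reduced by a factor of {loss_ratio:.1f}")
```

With only 250 evaluation examples, each review shifts accuracy by 0.4 percentage points, so small differences between runs should be read with caution.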