# Lightweight Fine-Tuning Project

This project classifies the sentiment of financial news tweets using a foundation model (GPT-2). The goal is to compare the model's performance before and after lightweight fine-tuning.




Here are the choices for the PEFT technique, the foundation model, the evaluation approach, and the fine-tuning dataset:

* PEFT technique: LoRA
* Model: GPT-2
* Evaluation approach: Classification metrics such as accuracy, confusion matrix, AUC, F1 score, and precision-recall curve
* Fine-tuning dataset: zeroshot/twitter-financial-news-sentiment
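As a rough illustration of two of the metrics listed above, the sketch below computes a confusion matrix, accuracy, and macro-averaged F1 in plain Python. The helper names and the toy labels are hypothetical and only serve the example; in the notebook itself a library such as scikit-learn would likely be the more convenient choice.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_f1(matrix):
    """F1 score for each class, derived from the confusion matrix."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(matrix[c]) - tp                       # truly c, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical toy labels (not taken from the dataset)
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
f1 = per_class_f1(cm)
macro_f1 = sum(f1) / len(f1)

print("confusion matrix:", cm)
print(f"accuracy={accuracy:.3f}, macro F1={macro_f1:.3f}")
```

Macro F1 averages the per-class scores with equal weight, so it is less dominated by the largest class than plain accuracy.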

In [1]:
# imports modules

from datasets import load_dataset
from collections import Counter



## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

### Load the dataset zeroshot/twitter-financial-news-sentiment from datasets

* Two splits in the dataset: train and validation
* Three class labels: 0 (Bearish), 1 (Bullish), and 2 (Neutral)


In [2]:
dataset_name = "zeroshot/twitter-financial-news-sentiment"

dataset = load_dataset(dataset_name)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 9543
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2388
    })
})


In [3]:
# number of labels
counts = Counter(dataset["train"]["label"])
sorted_counts = sorted(counts.items())
print(sorted_counts)

[(0, 1442), (1, 1923), (2, 6178)]
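The counts above show that label 2 makes up roughly 65% of the training split, so accuracy alone can look good for a model that always predicts that label. The sketch below, which reuses the printed counts, computes the class shares, the majority-class baseline accuracy, and inverse-frequency class weights. The weighting formula is one common heuristic and is an assumption here, not something the notebook applies.

```python
# (label, count) pairs copied from the output above
sorted_counts = [(0, 1442), (1, 1923), (2, 6178)]

total = sum(count for _, count in sorted_counts)
proportions = {label: count / total for label, count in sorted_counts}

# Accuracy obtained by always predicting the most frequent label
majority_label, majority_count = max(sorted_counts, key=lambda item: item[1])
majority_baseline = majority_count / total

# Assumed heuristic: weight = total / (num_classes * count)
num_classes = len(sorted_counts)
class_weights = {
    label: total / (num_classes * count) for label, count in sorted_counts
}

for label, share in proportions.items():
    print(f"label={label}: share={share:.3f}, weight={class_weights[label]:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any fine-tuned model should be compared against this baseline, not only against the accuracy before fine-tuning.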


In [4]:
for entry in dataset["train"].select(range(6)):
    text = entry["text"]
    label = entry["label"]
    print(f"label={label}, text={text}")

label=0, text=$BYND - JPMorgan reels in expectations on Beyond Meat https://t.co/bd0xbFGjkT
label=0, text=$CCL $RCL - Nomura points to bookings weakness at Carnival and Royal Caribbean https://t.co/yGjpT2ReD3
label=0, text=$CX - Cemex cut at Credit Suisse, J.P. Morgan on weak building outlook https://t.co/KN1g4AWFIb
label=0, text=$ESS: BTIG Research cuts to Neutral https://t.co/MCyfTsXc2N
label=0, text=$FNKO - Funko slides after Piper Jaffray PT cut https://t.co/z37IJmCQzB
label=0, text=$FTI - TechnipFMC downgraded at Berenberg but called Top Pick at Deutsche Bank https://t.co/XKcPDilIuU


In [5]:
# load the GPT-2 tokenizer (evaluation later uses the validation split, since this dataset has no test split)

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# model name
model_name = 'gpt2'

tokenizer = AutoTokenizer.from_pretrained(model_name)





In [14]:
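# GPT-2 has no pad token by default, so reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token


def tokenize_function(examples):
    # Pad to the tokenizer's maximum length and truncate longer inputs
    return tokenizer(examples["text"], padding="max_length", truncation=True)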

tokenized_dataset = {}
splits = ["train", "validation"]

for split in splits:
    tokenized_dataset[split] = dataset[split].map(tokenize_function, batched=True)

tokenized_dataset

Map: 100%|██████████| 9543/9543 [00:01<00:00, 5398.98 examples/s]
Map: 100%|██████████| 2388/2388 [00:00<00:00, 6040.67 examples/s]


{'train': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 9543
 }),
 'validation': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 2388
 })}

In [7]:
tokenized_dataset["train"][1]["input_ids"]

[3,
 4093,
 43,
 720,
 49,
 5097,
 532,
 21198,
 5330,
 2173,
 284,
 1492,
 654,
 10453,
 379,
 40886,
 290,
 8111,
 18020,
 3740,
 1378,
 83,
 13,
 1073,
 14,
 88,
 38,
 34523,
 51,
 17,
 3041,
 35,
 18,
 50256,
 ...  (the remaining positions are all the pad token ID 50256; output truncated)
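The token IDs above are mostly the pad token (50256), because `padding="max_length"` pads every tweet to the tokenizer's maximum length. An alternative is dynamic padding, where each batch is padded only to its own longest sequence. The sketch below shows the idea in plain Python; `pad_batch` and the sample sequences are hypothetical. If the `padding` argument were dropped from `tokenize_function`, `DataCollatorWithPadding` should perform this step per batch.

```python
def pad_batch(batch_input_ids, pad_token_id):
    """Pad every sequence to the longest one in this batch only."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


PAD_ID = 50256  # GPT-2 EOS token ID, reused as the pad token in this notebook

# Hypothetical short sequences standing in for two tokenized tweets
batch = [[3, 4093, 43], [3, 5097, 532, 21198, 5330]]
padded = pad_batch(batch, PAD_ID)

print(padded["input_ids"])
print(padded["attention_mask"])
```

Shorter padded batches mean fewer wasted positions per forward pass, which matters when training on a CPU or a small GPU.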

In [8]:
# Prepare the dataset for PyTorch

from transformers import DataCollatorWithPadding
from torch.utils.data import DataLoader



## Load and Setup the model

In [9]:
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=3,
    id2label={0: "Bearish", 1: "Bullish", 2: "Neutral"},
    label2id={"Bearish": 0, "Bullish": 1, "Neutral": 2 }
)

# model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
print(model)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=3, bias=False)
)


In [10]:
# Freeze all parameters of the base model so that only the classification head is trained
for param in model.base_model.parameters():
    param.requires_grad = False

print(model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=3, bias=False)
)


In [11]:
model.score

Linear(in_features=768, out_features=3, bias=False)
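With the base model frozen, only the `score` head remains trainable. The arithmetic sketch below estimates how small that head is compared to the frozen base. The embedding, layer norm, and head sizes come from the printed architecture; the `Conv1D` shapes are not shown in the printout, so the standard GPT-2 small shapes (3x hidden for `c_attn`, 4x hidden for the MLP, with biases) are assumed.

```python
HIDDEN = 768       # from Embedding(50257, 768)
VOCAB = 50257
POSITIONS = 1024   # from Embedding(1024, 768)
NUM_BLOCKS = 12
NUM_LABELS = 3


def linear_params(n_in, n_out, bias=True):
    """Parameter count of a dense layer with optional bias."""
    return n_in * n_out + (n_out if bias else 0)


layer_norm = 2 * HIDDEN  # weight and bias

# Conv1D shapes below are assumed (standard GPT-2 small configuration)
block = (
    2 * layer_norm
    + linear_params(HIDDEN, 3 * HIDDEN)   # attn.c_attn
    + linear_params(HIDDEN, HIDDEN)       # attn.c_proj
    + linear_params(HIDDEN, 4 * HIDDEN)   # mlp.c_fc
    + linear_params(4 * HIDDEN, HIDDEN)   # mlp.c_proj
)

base_params = VOCAB * HIDDEN + POSITIONS * HIDDEN + NUM_BLOCKS * block + layer_norm
head_params = linear_params(HIDDEN, NUM_LABELS, bias=False)  # score layer

print(f"frozen base parameters:   {base_params:,}")
print(f"trainable head parameters: {head_params:,}")
print(f"trainable share: {head_params / (base_params + head_params):.6%}")
```

In the notebook, the same figure can be checked directly with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.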

## Train the classification head

In [None]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

## TODO: more classification metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# Ensure the model's config recognizes the padding token
model.config.pad_token_id = tokenizer.pad_token_id
# Use the HuggingFace Trainer class to handle the training and eval loop 

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./output",
        learning_rate=2e-3,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()   




  0%|          | 0/2386 [05:36<?, ?it/s]
  0%|          | 0/2386 [00:00<?, ?it/s]

In [91]:
# !pip install accelerate -U

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting accelerate
  Downloading accelerate-1.0.1-py3-none-any.whl.metadata (19 kB)
Collecting sympy==1.13.1 (from torch>=1.10.0->accelerate)
  Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Downloading accelerate-1.0.1-py3-none-any.whl (330 kB)
Downloading sympy-1.13.1-py3-none-any.whl (6.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 9.5 MB/s eta 0:00:00
Installing collected packages: sympy, accelerate
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.2
    Uninstalling sympy-1.13.2:
      Successfully uninstalled sympy-1.13.2
Successfully installed accelerate-1.0.1 sympy-1.13.1


## Let's try fine-tuning the GPT-2 model

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.
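As background for the LoRA technique chosen at the top of this notebook, the sketch below illustrates the core idea in plain Python: the pre-trained weight `W` stays frozen, and a low-rank update `B @ A`, scaled by `alpha / rank`, is the only trainable part. This is a conceptual toy, not the PEFT library API; the `LoRALinear` class, the sizes, and the hyperparameters are all hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen, never updated
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at first
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Hypothetical 4 x 6 weight matrix and input vector
weight = [[0.1 * (i + j) for j in range(6)] for i in range(4)]
layer = LoRALinear(weight, rank=2, alpha=4)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

base_out = matvec(weight, x)
initial_out = layer.forward(x)   # identical to the base output before training

layer.B[0][0] = 1.0              # stand-in for a training step that updates B
adapted_out = layer.forward(x)

print("trainable parameters:", layer.num_trainable(), "of", 4 * 6, "in the full matrix")
print("output changed after update:", adapted_out != base_out)
```

For a toy matrix this small the saving is minor, but for a `d_out x d_in` weight the adapter adds only `rank * (d_in + d_out)` parameters, which is far fewer than `d_in * d_out` when the rank is small.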

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.