# Fine‑tuning GPT‑2 for Sentiment Analysis (Rotten Tomatoes)

**Date**: 2025-07-19 by Youngwoo Kimh (Credit: DSAIL Lab, SNU)

**Goals**:  
- Practice fine-tuning GPT-2 on a downstream task
- Classify movie reviews from the Rotten Tomatoes dataset as positive or negative

---
> This notebook was prepared as educational material for an HD Hyundai hands-on session and uses PyTorch and the HuggingFace Transformers library.

## 0. Setup

In [None]:
!pip -q install transformers datasets evaluate accelerate fsspec peft -U

## 1. Imports

In [None]:
from datasets import load_dataset
from transformers import (GPT2TokenizerFast, GPT2ForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model
import evaluate, torch, numpy as np

## 2. Load the Rotten Tomatoes dataset

In [None]:
# load rotten tomatoes dataset for sentiment analysis
raw = load_dataset("rotten_tomatoes")
print(raw)



DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})


### Inspect samples

In [None]:
for i in range(3):
    print('LABEL:', raw['train'][i]['label'])
    print(raw['train'][i]['text'][:400])
    print('-'*40)

LABEL: 1
the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
----------------------------------------
LABEL: 1
the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
----------------------------------------
LABEL: 1
effective but too-tepid biopic
----------------------------------------


## 3. Prepare Tokenizer and Tokenize dataset

In [None]:
# load the tokenizer; GPT-2 ships without a padding token, so add [PAD]
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# tokenize every split, truncating reviews longer than the model's maximum length
def tok(batch):
    return tokenizer(batch["text"], truncation=True)

# drop the raw text and rename "label" to "labels", the argument name the model expects
data = raw.map(tok, batched=True, remove_columns=["text"]).rename_column("label", "labels")
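
The padding token matters for classification: `GPT2ForSequenceClassification` scores each review from the hidden state of its last non-padding token, which it locates through `config.pad_token_id`. The plain-Python sketch below illustrates that lookup only. The helper `last_token_index` and the token ids are made up for illustration, and right padding is assumed.

```python
# id the new [PAD] token receives when appended to GPT-2's 50257-entry vocabulary
PAD_ID = 50257

def last_token_index(input_ids, pad_id=PAD_ID):
    """Return the position of the last non-padding token (assumes right padding)."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_id:
            return i
    raise ValueError("sequence contains only padding")

batch = [
    [1169, 3807, 373, 1049],        # made-up ids, no padding
    [14081, 1310, PAD_ID, PAD_ID],  # made-up ids, two padded positions
]
positions = [last_token_index(row) for row in batch]
print(positions)  # [3, 1]
```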



### Load Model

In [None]:
# load model for classification
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# resize the embedding matrix to cover the added [PAD] token
model.resize_token_embeddings(len(tokenizer))

# LoRA (low-rank adaptation) config: rank-8 adapters on the c_attn and c_proj layers
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn","c_proj"])

# wrap GPT-2 with the LoRA adapters; the original weights are frozen
model = get_peft_model(model, lora_cfg)
# tell the model which id is padding so it can find the last real token of each sequence
model.config.pad_token_id = tokenizer.pad_token_id

# move the model to the GPU
model.to("cuda")

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


PeftModel(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50258, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=2304, nx=768)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magni
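
The printout shows each `c_attn` layer wrapped with a `lora_A` (768 → 8) and a `lora_B` (8 → 2304) matrix, and the low-rank update is scaled by `lora_alpha / r = 16 / 8 = 2`. The sketch below estimates how many adapter parameters this adds. It assumes that the `c_proj` target matches both the attention and the MLP output projection in every block, and it uses the standard GPT-2 small shapes for those two layers. The totals are derived here, not read from the notebook output.

```python
r = 8  # LoRA rank from lora_cfg

# (in_features, out_features) of each wrapped layer in one GPT-2 small block.
# c_attn comes from the printout above; the two c_proj shapes are the
# standard GPT-2 small sizes and are an assumption in this sketch.
wrapped_layers = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_proj": (3072, 768),
}

def lora_params(in_features, out_features, rank):
    """Parameters in lora_A (in x rank) plus lora_B (rank x out)."""
    return rank * (in_features + out_features)

per_block = sum(lora_params(i, o, r) for i, o in wrapped_layers.values())
total = 12 * per_block  # 12 transformer blocks
print(per_block, total)  # 67584 811008
```

For comparison, a single frozen `c_attn` weight matrix alone holds 768 × 2304 = 1,769,472 values.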

### Data collator

In [None]:
# define the data collator: pads each batch to the length of its longest sequence
collator = DataCollatorWithPadding(tokenizer)
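
To make dynamic padding concrete, here is a plain-Python stand-in for what a padding collator does. It is a sketch, not the library's implementation. The helper name `pad_batch`, the token ids, and `pad_id=0` are hypothetical, and right padding is assumed.

```python
def pad_batch(sequences, pad_id, pad_to_multiple_of=None):
    """Right-pad token-id lists to a common length and build attention masks."""
    target = max(len(s) for s in sequences)
    if pad_to_multiple_of:
        # round the target length up to the next multiple
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of
    input_ids = [s + [pad_id] * (target - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (target - len(s)) for s in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5, 6, 7], [8, 9, 10, 11, 12]], pad_id=0, pad_to_multiple_of=8)
print(len(batch["input_ids"][0]))  # 8
print(batch["attention_mask"][0])  # [1, 1, 1, 0, 0, 0, 0, 0]
```

Padding per batch rather than to a fixed global length keeps short batches small. Rounding up to a multiple of 8 is a common choice for fp16 training.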

## 4. Training

In [None]:
# accuracy metric reported at evaluation time
metric = evaluate.load("accuracy")
def acc(p):
    logits, labels = p
    return metric.compute(predictions=np.argmax(logits, -1), references=labels)

# define training arguments
args = TrainingArguments(
    "gpt2-rt-sentiment",
    per_device_train_batch_size=4, # batch size on each GPU
    gradient_accumulation_steps=4, # accumulate gradients over 4 batches per optimizer step (effective batch size 16)
    num_train_epochs=3, # total epochs
    logging_steps=100, # log the training loss every 100 optimizer steps
    fp16=True, # mixed-precision (fp16) training; requires a CUDA GPU
    report_to="none",
)

# define trainer class
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"], # training dataset
    eval_dataset=data["validation"], # evaluation dataset
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8),
    compute_metrics=acc,
)
trainer.train()

No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


| Step | Training Loss |
|---|---|
| 100 | 0.7683 |
| 200 | 0.6935 |
| 300 | 0.6744 |
| 400 | 0.614 |
| 500 | 0.5689 |
| 600 | 0.5091 |
| 700 | 0.4565 |
| 800 | 0.4525 |
| 900 | 0.462 |
| 1000 | 0.4333 |




TrainOutput(global_step=1602, training_loss=0.5103202438384258, metrics={'train_runtime': 421.7756, 'train_samples_per_second': 60.672, 'train_steps_per_second': 3.798, 'total_flos': 533981440770048.0, 'train_loss': 0.5103202438384258, 'epoch': 3.0})
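
The `global_step=1602` in the output above can be reproduced from the batch settings. The arithmetic sketch below assumes a single GPU. It also assumes that the final, partial accumulation group of each epoch still counts as one optimizer step. That rounding was chosen because it matches the logged value, and it may differ across Transformers versions.

```python
import math

num_examples = 8530   # training rows, from the dataset printout
per_device_batch = 4
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum                 # examples per optimizer step
batches_per_epoch = math.ceil(num_examples / per_device_batch)  # forward/backward passes per epoch
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)     # rounding up is an assumption
total_steps = steps_per_epoch * epochs
print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps)  # 16 2133 534 1602
```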

## 5. Evaluation

In [None]:
trainer.evaluate()

{'eval_runtime': 3.6596,
 'eval_samples_per_second': 291.285,
 'eval_steps_per_second': 36.616,
 'epoch': 3.0}
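
The dictionary above contains no `eval_loss` or `eval_accuracy`. This is consistent with the `label_names` warning printed during training: without label names the `Trainer` does not pass labels to `compute_metrics`. A plausible remedy, untested here, is to pass `label_names=["labels"]` to `TrainingArguments`. The plain-Python sketch below shows what `acc` would compute once labels are available. The logits are made up, and `accuracy_from_logits` is a hypothetical helper.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda j: row[j])

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# made-up logits for four reviews, columns = [negative, positive]
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 0, 0]
print(accuracy_from_logits(logits, labels))  # 0.75
```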

## 6. Inference demo

In [None]:

def predict(text):
    # tokenize input
    inputs = tokenizer(text, return_tensors='pt').to('cuda')

    # inference
    with torch.no_grad():
        logits = model(**inputs).logits

    # pick the label with the highest logit (0 = negative, 1 = positive)
    label = logits.argmax(-1).item()
    return 'POSITIVE' if label==1 else 'NEGATIVE'

# example sentences
examples = [
    'An amazing movie with stellar performances.',
    'It was a waste of two hours.'
]

for ex in examples:
    print(ex, '→', predict(ex))


An amazing movie with stellar performances. → POSITIVE
It was a waste of two hours. → NEGATIVE
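
`predict` returns only the winning label. To report a confidence as well, the logits can be passed through a softmax. In the notebook this would be `torch.softmax(logits, dim=-1)`. The plain-Python sketch below shows the same computation on made-up scores, so the numbers are illustrative and not model output.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-1.2, 2.3]  # made-up [negative, positive] scores
probs = softmax(example_logits)
label = "POSITIVE" if probs[1] > probs[0] else "NEGATIVE"
print(label, round(probs[1], 3))
```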


In [None]:
review = '' # replace with your own review sentence
print(review, '→', predict(review))