# Fine-tuning GPT-2 for text classification
Stefan/Yuzhao Heng
Since Wed. Feb. 9th, 2022


This notebook reproduces the results of the paper [Zero-shot Text Classification With Generative Language Models](https://arxiv.org/abs/1912.10165),
since the authors did not release their code.

It also serves as infrastructure and a baseline for a project on an efficient and accurate encoder for text classification with many labels.


## Notebook Setup



In [1]:
%load_ext autoreload
%autoreload 2



## Colab Setup



In [2]:
import os
import sys


if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')

    ! pip3 install sty icecream transformers datasets

    # base_path = '/content/drive/My Drive//Research/'
    # os.chdir(os.path.join(base_path, 'Unified Encoder/Unified-Encoder'))

    sys.path.append(os.path.join('drive', 'My Drive', 'Research', 'Zeroshot Text Classification', 'Zeroshot-Text-Classification'))


from zeroshot_encoder.util import *
print(PATH_BASE, DIR_PROJ, PKG_NM)  # Sanity check, should be the path appended if Colab



/Users/stefanh/Documents/UMich/Research/Clarity Lab/Zeroshot Text Classification Zeroshot-Text-Classification zeroshot_encoder


## Setup



In [3]:
import random

import numpy as np
import torch
import transformers
from icecream import ic

from zeroshot_encoder.baseline import gpt2


if torch.cuda.is_available():
    ! nvidia-smi

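# `rcParams` is matplotlib's; imported explicitly in case the `zeroshot_encoder.util` star import does not re-export it
from matplotlib import rcParams

rcParams['figure.dpi'] = 200
rcParams['font.size'] = 6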



## Seed setup



In [4]:
if torch.cuda.is_available():
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'  # Required for deterministic cuBLAS operations on CUDA >= 10.2
    ! echo $CUBLAS_WORKSPACE_CONFIG


seed = config('random-seed')
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.use_deterministic_algorithms(True)

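def seed_worker(worker_id):
    # Derive each DataLoader worker's NumPy and `random` seeds from PyTorch's per-worker seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# Generator with a fixed seed, for reproducible DataLoader shuffling
g = torch.Generator()
g.manual_seed(seed)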
transformers.set_seed(seed)
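
The cell above defines `seed_worker` and `g` but does not show where they are consumed. A minimal sketch of how they would typically be passed to a PyTorch `DataLoader` follows. The toy dataset, the seed value `42`, and the helpers `make_loader` and `batch_order` are hypothetical and for illustration only; how `gpt2.get_all_setup` wires seeding internally is not shown in this notebook.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Same recipe as the cell above: reseed NumPy and `random` inside each worker
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(seed: int) -> DataLoader:
    # Hypothetical helper: a fresh generator per loader fixes the shuffling order
    g = torch.Generator()
    g.manual_seed(seed)
    dset = TensorDataset(torch.arange(16))  # Toy stand-in for the real dataset
    return DataLoader(
        dset,
        batch_size=4,
        shuffle=True,
        worker_init_fn=seed_worker,  # Only invoked when num_workers > 0
        generator=g,
    )


def batch_order(seed: int) -> list:
    # Collect the sample indices of each batch, in iteration order
    return [batch[0].tolist() for batch in make_loader(seed)]


order_a = batch_order(42)  # 42 is an assumed seed, not the project's config value
order_b = batch_order(42)
print(order_a == order_b)
```

With the same seed, the two loaders should yield batches in the same order. The sketch keeps the default `num_workers=0`, so `seed_worker` is passed but never called here; it only takes effect once worker processes are used.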



## Prep Model & Dataset for training


In [5]:
# nm, n = 'debug', 8
nm, n = 'debug-large', 128
model, tokenizer, data_collator, train_args, dset_tr, dset_vl, trainer = gpt2.get_all_setup(
    nm, n_sample=n, random_seed=seed
)



Some weights of ZsGPT2LMHeadModel were not initialized from the model checkpoint at gpt2 and are newly initialized because the shapes did not match:
- wpe.weight: found shape torch.Size([1024, 768]) in the checkpoint and torch.Size([256, 768]) in the model instantiated
- h.0.attn.bias: found shape torch.Size([1, 1, 1024, 1024]) in the checkpoint and torch.Size([1, 1, 128, 128]) in the model instantiated
- h.1.attn.bias: found shape torch.Size([1, 1, 1024, 1024]) in the checkpoint and torch.Size([1, 1, 128, 128]) in the model instantiated
- h.2.attn.bias: found shape torch.Size([1, 1, 1024, 1024]) in the checkpoint and torch.Size([1, 1, 128, 128]) in the model instantiated
- h.3.attn.bias: found shape torch.Size([1, 1, 1024, 1024]) in the checkpoint and torch.Size([1, 1, 128, 128]) in the model instantiated
- h.4.attn.bias: found shape torch.Size([1, 1, 1024, 1024]) in the checkpoint and torch.Size([1, 1, 128, 128]) in the model instantiated
- h.5.attn.bias: found shape torch.Size([1, 1

  0%|          | 0/2 [00:00<?, ?it/s]

Loading cached processed dataset at /Users/stefanh/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548/cache-e94bd737d77bec58.arrow
Loading cached processed dataset at /Users/stefanh/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548/cache-9f589b6d71f63643.arrow
Loading cached shuffled indices for dataset at /Users/stefanh/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548/cache-7fd4281ad39b4f30.arrow
Loading cached shuffled indices for dataset at /Users/stefanh/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548/cache-880dc73858f51bd4.arrow


## Train


In [6]:
trainer.train()
trainer.save_model(os.path.join(trainer.args.output_dir, now(sep='-')))



2022-02-18 15:38:29| [GPT-2 Training]::on_train_begin::gpt2.py:526, INFO - Training started with {#data: 128, model size: 128, learning rate: 5e-05, batch shape: (32, 128), #epochs: 4}


KeyboardInterrupt: 

## Evaluate


In [None]:
trainer.evaluate()

