# Preliminaries
First, clone the Hugging Face `transformers` repository, install the requirements of its example scripts and pin the `transformers` library version.

In [1]:
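!git clone https://github.com/huggingface/transformers # Clone the transformers repo
# Each "!" command runs in its own subshell, so a "!cd" would not persist; paths below are relative to the notebook's working directory
!pip install -r transformers/examples/requirements.txt # Install the requirements of the example scripts
!pip install transformers==3.0.1 # Pin the transformers version for reproducibility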

Cloning into 'transformers'...
remote: Enumerating objects: 46083, done.
remote: Total 46083 (delta 0), reused 0 (delta 0), pack-reused 46083
Receiving objects: 100% (46083/46083), 32.96 MiB | 24.39 MiB/s, done.
Resolving deltas: 100% (31985/31985), done.
Collecting seqeval
  Downloading seqeval-0.0.19.tar.gz (30 kB)
Collecting sacrebleu
  Downloading sacrebleu-1.4.14-py3-none-any.whl (64 kB)
     |████████████████████████████████| 64 kB 1.3 MB/s
Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Collecting pytorch-lightning==0.8.5
  Downloading pytorch_lightning-0.8.5-py3-none-any.whl (313 kB)
     |████████████████████████████████| 313 kB 3.3 MB/s
Collecting git-python==1.0.3
  Downloading git_python-1.0.3-py2.py3-none-any.whl (1.9 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.6.3-cp37-cp37m-manylinux2010_x86_64.whl (7.2 MB)
     |████████████████████████████████| 7.2 MB 3.6 MB/s
Collecting stream

Freeze requirements for later reference.

In [2]:
!pip freeze > kaggle_image_requirements.txt
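`pip freeze` writes one requirement per line, mostly in `name==version` form, so a recorded version can be looked up later. A minimal sketch, with the hypothetical helper name `read_frozen_versions`; it only handles plain `name==version` lines and skips everything else (for example `name @ URL` entries):

```python
from pathlib import Path


def read_frozen_versions(path):
    """Map lowercase package name -> pinned version from a `pip freeze` file.

    Only plain `name==version` lines are parsed; comments, blank lines,
    and URL/editable entries are skipped.
    """
    versions = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


# In the notebook:
# read_frozen_versions("kaggle_image_requirements.txt").get("transformers")
```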

Download the GLUE data.

In [3]:
!mkdir GLUE
!python transformers/utils/download_glue_data.py --data_dir GLUE --tasks all # download GLUE data for all tasks

Downloading and extracting CoLA...
	Completed!
Downloading and extracting SST...
	Completed!
Processing MRPC...
Local MRPC data not specified, downloading data from https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt
	Completed!
Downloading and extracting QQP...
	Completed!
Downloading and extracting STS...
	Completed!
Downloading and extracting MNLI...
	Completed!
Downloading and extracting SNLI...
	Completed!
Downloading and extracting QNLI...
	Completed!
Downloading and extracting RTE...
	Completed!
Downloading and extracting WNLI...
	Completed!
Downloading and extracting diagnostic...
	Completed!


Let's get a sense of what is in the GLUE directory.

In [4]:
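# To print a tree view of every directory under GLUE, uncomment the next line. The directory is passed to ls directly,
# because a separate "!cd GLUE" would not persist (each "!" command runs in its own subshell).
#!ls -R GLUE | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
!ls GLUE/STS-B # let's see what is in the STS-B directory specifically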

LICENSE.txt  dev.tsv  original	readme.txt  test.tsv  train.tsv
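The commented-out `ls -R | grep | sed` one-liner prints the directory tree. A rough pure-Python equivalent using `os.walk`, which, like that one-liner, lists directories only; the function name `dir_tree` is hypothetical and the two-space indentation is an arbitrary choice:

```python
import os


def dir_tree(root):
    """Return an indented listing of `root` and every directory below it."""
    lines = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories alphabetically
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        name = os.path.basename(os.path.normpath(dirpath))
        lines.append("  " * depth + name)
    return "\n".join(lines)


# In the notebook: print(dir_tree("GLUE"))
```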


In [5]:
!ls GLUE/QQP # let's see what is in the QQP directory specifically

dev.tsv  original  test.tsv  train.tsv


Let's get a sense of what the STS-B training data looks like.

In [6]:
!head GLUE/STS-B/train.tsv 

index	genre	filename	year	old_index	source1	source2	sentence1	sentence2	score
0	main-captions	MSRvid	2012test	0001	none	none	A plane is taking off.	An air plane is taking off.	5.000
1	main-captions	MSRvid	2012test	0004	none	none	A man is playing a large flute.	A man is playing a flute.	3.800
2	main-captions	MSRvid	2012test	0005	none	none	A man is spreading shreded cheese on a pizza.	A man is spreading shredded cheese on an uncooked pizza.	3.800
3	main-captions	MSRvid	2012test	0006	none	none	Three men are playing chess.	Two men are playing chess.	2.600
4	main-captions	MSRvid	2012test	0009	none	none	A man is playing the cello.	A man seated is playing the cello.	4.250
5	main-captions	MSRvid	2012test	0011	none	none	Some men are fighting.	Two men are fighting.	4.250
6	main-captions	MSRvid	2012test	0012	none	none	A man is smoking.	A man is skating.	0.500
7	main-captions	MSRvid	2012test	0013	none	none	The man is playing the piano.	The man is playing the guitar.	1.600
8	main-captions	
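The header shows that each row carries metadata columns followed by `sentence1`, `sentence2` and a similarity `score` between 0 and 5. A minimal sketch of reading the pairs with the standard `csv` module; the helper name `load_sts_pairs` is hypothetical, and quoting is disabled on the assumption that sentences may contain stray quote characters:

```python
import csv


def load_sts_pairs(path):
    """Read (sentence1, sentence2, score) triples from an STS-B TSV file.

    Targets train.tsv and dev.tsv, which carry a gold `score` column.
    """
    pairs = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary text, never as field delimiters
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            pairs.append((row["sentence1"], row["sentence2"], float(row["score"])))
    return pairs


# In the notebook: load_sts_pairs("GLUE/STS-B/train.tsv")[:3]
```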

# Fine-Tune on QQP Task

Fine-tune the `bert-base-cased` checkpoint on the QQP (Quora Question Pairs) task. Use a batch size of 32, a maximum input sequence length of 256 and a learning rate of 2e-5, and train for 1 epoch.
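The `Iteration` progress bar in the output counts batches, so its total can be checked by hand: an epoch is ⌈N / (batch size × number of GPUs)⌉ batches when the last, smaller batch is kept. A minimal sketch, assuming the QQP training split has 363,846 question pairs (a figure not stated in this notebook) and a single GPU:

```python
import math


def steps_per_epoch(num_examples, batch_size, num_devices=1):
    """Batches per epoch when the final partial batch is kept (no drop_last)."""
    return math.ceil(num_examples / (batch_size * num_devices))


# Assumption: 363,846 QQP training pairs, one GPU, batch size 32.
print(steps_per_epoch(363_846, 32))  # 11371
```

The same formula gives ⌈5,749 / 32⌉ = 180 batches per epoch for STS-B, assuming its 5,749 training pairs, which matches the 180-iteration progress bar of the STS-B run.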

In [7]:
%%time
# the above is a “magic” command for timing the entire cell - has to be the first command
!python transformers/examples/text-classification/run_glue.py --model_name_or_path bert-base-cased --task_name QQP --do_train --do_eval --data_dir GLUE/QQP/ --max_seq_length 256 --per_gpu_train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 1 --output_dir /tmp/QQP/

2020-10-10 13:31:43.402516: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
The current process just got forked. Disabling parallelism to avoid deadlocks...
The current process just got forked. Disabling parallelism to avoid deadlocks...
The current process just got forked. Disabling parallelism to avoid deadlocks...
Downloading: 100%|██████████████████████████████| 433/433 [00:00<00:00, 367kB/s]
Downloading: 100%|███████████████████████████| 213k/213k [00:00<00:00, 2.54MB/s]
Downloading: 100%|███████████████████████████| 436M/436M [00:16<00:00, 26.6MB/s]
Epoch:   0%|                                              | 0/1 [00:00<?, ?it/s]
Iteration:   0%|                                      | 0/11371 [00:00<?, ?it/s]
Iteration:   0%|                            | 1/11371 [00:01<5:53:59,  1.87s/it]
Iteration:   0%|                            | 2/11371 [00:02<4:50:40,  1.53s/it]
Iteration:   0%|

Take a look at the output directory to see what it contains.

In [8]:
!ls /tmp/QQP

checkpoint-1000   checkpoint-4500  checkpoint-9000
checkpoint-10000  checkpoint-500   checkpoint-9500
checkpoint-10500  checkpoint-5000  config.json
checkpoint-11000  checkpoint-5500  eval_results_qqp.txt
checkpoint-1500   checkpoint-6000  pytorch_model.bin
checkpoint-2000   checkpoint-6500  special_tokens_map.json
checkpoint-2500   checkpoint-7000  tokenizer_config.json
checkpoint-3000   checkpoint-7500  training_args.bin
checkpoint-3500   checkpoint-8000  vocab.txt
checkpoint-4000   checkpoint-8500
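The checkpoint directories follow from the step count. A sketch assuming the Trainer's default `save_steps=500` (the command above does not override it) and that a checkpoint is written whenever the global step is a multiple of `save_steps`; the final model and tokenizer files are saved separately at the top level of the output directory. The helper name is hypothetical:

```python
def expected_checkpoints(total_steps, save_steps=500):
    """Names of the checkpoint directories written every `save_steps` steps."""
    return [f"checkpoint-{step}" for step in range(save_steps, total_steps + 1, save_steps)]


qqp = expected_checkpoints(11371)  # 11371 iterations in the single QQP epoch
print(len(qqp), qqp[0], qqp[-1])  # 22 checkpoint-500 checkpoint-11000
```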


Display evaluation results.

In [9]:
!cat /tmp/QQP/eval_results_qqp.txt

eval_loss = 0.24864352908579548
eval_acc = 0.8936433341578036
eval_f1 = 0.8581700639883898
eval_acc_and_f1 = 0.8759066990730967
epoch = 1.0
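QQP is scored with accuracy, the F1 score of the positive ("duplicate") class, and their mean, which is what `eval_acc_and_f1` reports. The transformers GLUE metrics compute these with scikit-learn; the following is a pure-Python sketch of the same quantities, with hypothetical helper names:

```python
import math


def accuracy(preds, labels):
    """Fraction of predictions equal to the gold label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_binary(preds, labels, positive=1):
    """F1 score of the positive class (1 = duplicate question pair)."""
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def acc_and_f1(preds, labels):
    acc = accuracy(preds, labels)
    f1 = f1_binary(preds, labels)
    return {"acc": acc, "f1": f1, "acc_and_f1": (acc + f1) / 2}


# The reported eval_acc_and_f1 is the mean of the reported accuracy and F1:
assert math.isclose((0.8936433341578036 + 0.8581700639883898) / 2, 0.8759066990730967)
```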


# Fine-Tune Model Further on STS-B Task

First, load the QQP fine-tuned model from the previous stage.

In [10]:
from transformers import BertForSequenceClassification, BertConfig # run_glue.py saved a sequence classification model, so reload it with the same class

qqp_model = BertForSequenceClassification.from_pretrained("/tmp/QQP") # initialize to our fine-tuned model checkpoint



Inspect the encoder of the QQP fine-tuned model.

In [11]:
getattr(qqp_model, "bert") # fetch the encoder/featurizer part of the model, i.e. everything except the classification head (same as qqp_model.bert)

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

Fetch the encoder, create an STS-B configuration whose vocabulary size matches the QQP model and whose output size is 1 (STS-B is a regression task), and initialize an STS-B model that uses the same encoder.

In [12]:
shared_encoder = getattr(qqp_model, "bert") # get the fine-tuned QQP model encoder

configuration = BertConfig() # the default BertConfig has the bert-base dimensions
configuration.vocab_size = qqp_model.config.vocab_size # match the cased vocabulary (28996 tokens) of the QQP model
configuration.num_labels = 1 # STS-B is a regression task, so the head has a single output

stsb_model = BertForSequenceClassification(configuration) # initialize an STS-B model with the same dimensions as the QQP model

setattr(stsb_model, "bert", shared_encoder) # replace its randomly initialized encoder with the QQP fine-tuned encoder
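Setting `num_labels = 1` is what turns the classification head into a regressor: in `BertForSequenceClassification`, a single output is trained with mean-squared error, while two or more outputs use cross-entropy. A pure-Python sketch of that branch; the helper name `sequence_classification_loss` is hypothetical and plain lists stand in for tensors:

```python
import math


def sequence_classification_loss(logits, labels, num_labels):
    """Pick the loss the way the num_labels branch does.

    logits: one list of `num_labels` floats per example.
    labels: a float score per example if num_labels == 1, else a class index.
    """
    if num_labels == 1:
        # Regression (STS-B): mean squared error between the single output and the score.
        return sum((row[0] - y) ** 2 for row, y in zip(logits, labels)) / len(labels)
    # Classification (QQP): mean cross-entropy, -log softmax(row)[y] = logsumexp(row) - row[y].
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[y]
    return total / len(labels)
```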

Save the initialized STS-B model for further fine-tuning.

In [13]:
stsb_model.save_pretrained("/tmp/STSB_pre") # save model

Copy the vocabulary file from the QQP model directory so that `run_glue.py` can load a tokenizer from `/tmp/STSB_pre`.

In [14]:
!cp /tmp/QQP/vocab.txt /tmp/STSB_pre 

In [15]:
!ls /tmp/STSB_pre 

config.json  pytorch_model.bin	vocab.txt
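The listing shows only `vocab.txt` next to the saved model, while `/tmp/QQP` also holds `tokenizer_config.json` and `special_tokens_map.json`. `BertTokenizer` lowercases by default (`do_lower_case=True`), and for a cased checkpoint such as `bert-base-cased` the opposite setting is expected to be recorded in `tokenizer_config.json`; with only `vocab.txt` copied, the tokenizer loaded from `/tmp/STSB_pre` may therefore lowercase its input. A minimal sketch that copies all three files when present; the helper name is hypothetical and the file names are taken from the `/tmp/QQP` listing:

```python
import shutil
from pathlib import Path

# File names as they appear in the /tmp/QQP listing.
TOKENIZER_FILES = ("vocab.txt", "tokenizer_config.json", "special_tokens_map.json")


def copy_tokenizer_files(src_dir, dst_dir, names=TOKENIZER_FILES):
    """Copy whichever tokenizer files exist in src_dir into dst_dir; return the names copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied


# In the notebook: copy_tokenizer_files("/tmp/QQP", "/tmp/STSB_pre")
```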


Now fine-tune the QQP fine-tuned model on STS-B, using the same hyperparameters as before but training for 3 epochs.

In [16]:
%%time
# the above is a “magic” command for timing the entire cell - has to be the first command
!python transformers/examples/text-classification/run_glue.py --model_name_or_path /tmp/STSB_pre --task_name STS-B --do_train --do_eval --data_dir GLUE/STS-B/ --max_seq_length 256 --per_gpu_train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 3 --output_dir /tmp/STS-B/

2020-10-10 16:14:50.670623: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Epoch:   0%|                                              | 0/3 [00:00<?, ?it/s]
Iteration:   0%|                                        | 0/180 [00:00<?, ?it/s]
Iteration:   1%|▏                               | 1/180 [00:01<02:59,  1.00s/it]
Iteration:   1%|▎                               | 2/180 [00:01<02:44,  1.08it/s]
Iteration:   2%|▌                               | 3/180 [00:02<02:34,  1.14it/s]
Iteration:   2%|▋                               | 4/180 [00:03<02:27,  1.19it/s]
Iteration:   3%|▉                               | 5/180 [00:04<02:22,  1.23it/s]
Iteration:   3%|█                               | 6/180 [00:04<02:18,  1.26it/s]
Iteration:   4%|█▏                              | 7/180 [00:05<02:15,  1.28it/s]
Iteration:   4%|█▍                              | 8/180 [00:06<02:12,  1.30it/s]
Iterat

Check the results. Starting from the QQP fine-tuned encoder is expected to improve on fine-tuning `bert-base-cased` on STS-B alone (a baseline this notebook does not run).

In [17]:
!cat /tmp/STS-B/eval_results_sts-b.txt

eval_loss = 0.49737201514158474
eval_pearson = 0.8931606380447263
eval_spearmanr = 0.8934618150816026
eval_corr = 0.8933112265631644
epoch = 3.0
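STS-B is scored by the Pearson and Spearman correlations between predicted and gold scores, and `eval_corr` is their mean. The transformers GLUE metrics use SciPy's `pearsonr` and `spearmanr`; the following is a pure-Python sketch of the same quantities (Spearman as Pearson on ranks, with tied values sharing their average rank), using hypothetical helper names:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_and_spearman(preds, labels):
    p = pearson(preds, labels)
    s = pearson(average_ranks(preds), average_ranks(labels))
    return {"pearson": p, "spearmanr": s, "corr": (p + s) / 2}


# The reported eval_corr is the mean of the reported Pearson and Spearman values:
assert math.isclose((0.8931606380447263 + 0.8934618150816026) / 2, 0.8933112265631644)
```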


In [18]:
!ls /tmp/STS-B

checkpoint-500		pytorch_model.bin	 training_args.bin
config.json		special_tokens_map.json  vocab.txt
eval_results_sts-b.txt	tokenizer_config.json
