# Transformers

This notebook provides two parts:
1. **Introducing Transformers:** this section introduces the Transformers library from HuggingFace with simple tasks to retrieve contextualised embeddings from pretrained transformer models
2. **Classification with transformers:** it constructed a neural network classifier using Transformers to classify the emotion analysis

In [1]:
! pip install transformers

You should consider upgrading via the '/opt/anaconda3/bin/python3 -m pip install --upgrade pip' command.[0m[33m
[0m

In [3]:
import warnings  #Get rid of warning in code
warnings.filterwarnings('ignore')

# Part 1: Introducing Transformers

Load a pretrained the BERT-tiny modelmodel

In [4]:
from transformers import AutoModel # For BERTs

model = AutoModel.from_pretrained("prajjwal1/bert-tiny") 

Some weights of the model checkpoint at prajjwal1/bert-tiny were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


# 1.1. Tokenizers

Created Tokenizer object to convert raw text to a sequence of numbers

In [5]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny") 

Showed tokenized example sentence:

In [6]:
sentence = "The transformer architecture is widely used in NLP."

tokens = tokenizer.tokenize(sentence)
print(tokens)

['the', 'transform', '##er', 'architecture', 'is', 'widely', 'used', 'in', 'nl', '##p', '.']


Compare with the NLTK tokenizer 

In [7]:
from nltk.tokenize import word_tokenize

nltk_tokens = word_tokenize(sentence)
print(nltk_tokens)

['The', 'transformer', 'architecture', 'is', 'widely', 'used', 'in', 'NLP', '.']


Map the tokens to their IDs

In [8]:
ids = tokenizer.convert_tokens_to_ids(tokens)

print(ids)

[1996, 10938, 2121, 4294, 2003, 4235, 2109, 1999, 17953, 2361, 1012]


## 1.2. Contextualised Embeddings

Convert the list of IDs to a 2-D tensor with a single row

In [9]:
import torch 

ids_tensor = torch.tensor([ids])

print(ids_tensor)

tensor([[ 1996, 10938,  2121,  4294,  2003,  4235,  2109,  1999, 17953,  2361,
          1012]])


Process the sequence using our model and maps the sequence of input IDs to a sequence of output vectors

In [10]:
model_outputs = model(ids_tensor)
print(model_outputs)
embeddings = model_outputs['last_hidden_state'][0]
print(embeddings)

BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.6550,  0.3572, -1.8545,  ..., -2.2321, -2.4890,  0.8569],
         [-0.7762,  0.7065, -0.4053,  ..., -1.0436, -1.4757,  1.0586],
         [-0.0331,  0.0583, -0.5069,  ..., -3.0095, -0.8549,  0.6007],
         ...,
         [-0.1059,  0.2619,  0.2993,  ..., -0.9318, -2.3270,  1.5508],
         [ 0.2457,  0.2863,  0.6015,  ..., -2.2550, -2.1556,  0.7440],
         [ 0.6512, -0.0050,  0.4048,  ..., -1.6719, -2.0540,  0.0476]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-0.9995, -0.0462, -0.9893,  0.8689, -0.9952,  0.6976, -0.7959, -0.8617,
         -0.0733,  0.0631, -0.2004, -0.0188,  0.0957,  0.9973,  0.3465, -0.5966,
         -0.0231,  0.2026, -0.5178, -0.9471,  0.9236, -0.1754, -0.5852, -0.9344,
         -0.9866, -0.0732, -0.9979,  0.8367,  0.6465,  0.0414,  0.0751,  0.0109,
         -0.9963, -0.0876,  0.9558,  0.9860, -0.8553,  0.1024,  0.2728, -0.9718,
          0.8817,  0.7812, -0.97

Retrieve the embedding vector for "transform" 

In [11]:
emb = embeddings[1] 

# convert it to a numpy array so we can perform various operations on it later on
emb = emb.detach().numpy()

print(emb)
print(f'The BERT-tiny embeddings have {emb.shape[0]} dimensions.')

[-0.7762405   0.7064996  -0.40526924 -1.0537269   0.59963876 -1.4787806
 -0.06114499  1.0198268  -0.2592848  -1.3763579   0.15379079  1.0128253
  0.7475134  -0.17591822  2.0003288  -1.01974    -0.7689706   0.05306914
  0.13064745  0.19979265 -1.0313506  -0.5410429   1.0834255   0.4924941
  2.2506452   1.3008716  -0.16233596  0.2252483   0.7293363   0.37714234
  0.07085949  0.39800435 -0.37489364 -0.18650484  0.5223742  -2.7473822
 -0.53682196  0.35264653 -1.8976283  -0.35527742  0.07477728 -0.39572534
 -0.55448097  0.62232053  1.0455062  -2.1943061   0.40990543 -0.62277496
  2.2192175  -0.13648552  0.8971438   0.80766904  0.18794227 -0.01698841
  0.5216419  -0.3289478   0.07476875 -1.1039575   1.2602055   3.4293041
 -0.91396123 -1.8800958  -0.08931123 -0.79668564  0.06266132  0.69099814
 -0.7370051  -0.23590574 -0.42857686 -0.68002856 -0.619341    0.01592773
  1.6605315   0.6648323  -1.6665549   2.1701157   0.7972164  -0.5222843
  0.6280751  -0.41740742 -0.11712625 -1.3964862  -0.48696

Retrieve the embedding for "architecture"

In [12]:
# WRITE YOUR ANSWER HERE
#Retrieve the word "architecture" in the embedding vector which is index 2 in the last hidden layer of the model
emb2 = embeddings[2] 

emb_architecture = emb2.detach().numpy() #convert it in to numpy array

print(emb_architecture)
print(f'The BERT-tiny embeddings have {emb_architecture.shape[0]} dimensions.')

[-0.03308459  0.05834484 -0.50693285 -1.5424432   0.4935105   0.18124032
 -0.8694352  -0.11085375 -0.32444605 -1.2748554   0.52261645  1.8194501
  0.13458532 -0.7986219   0.7316742  -0.7354131  -1.923584   -1.2328044
  1.1261138   1.7006457   0.2257041  -0.61980444  1.200127   -0.08013602
  1.6121459   1.5160662  -0.37660006  0.6972985   0.5798398  -0.8779775
  0.51046014 -1.5074801  -1.1062824  -0.49653316  0.54861885 -1.2865149
 -0.37604594  0.45097703 -1.6104168  -0.16055688  0.4270874   1.4837753
  0.11611927  0.02534518  0.35540295 -2.0517945   0.28007472 -0.65418786
  2.6423407   0.21391019  0.3894924   0.83619374  0.9382673  -1.1972377
 -0.16927071 -0.7794073  -0.64581025 -1.209792    0.27398497  2.989087
 -0.5239233  -1.4748845  -0.78227526 -1.2025111  -0.47365758  0.18987575
 -0.47654122 -1.0631106  -0.6145147  -1.708671   -1.2677417  -0.64487517
  1.5999359  -0.5242034  -1.2763854   0.81190556 -0.5610492   0.3413566
  1.9631462   0.1756343  -0.6325791  -2.0699687   0.5209454 

Tokenized Sentences 

In [13]:
sentences = [
    "She opened the book to page 37 and began to read aloud.",
    "Many readers find the first book of A Tale of Two Cities to be confusing.",
    "I can book tickets for the concert next week.",
    "The police wanted to book him for driving too fast.",
    "I can reserve tickets for the concert next week."
]

model_inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")  

print(model_inputs)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'input_ids': tensor([[  101,  2016,  2441,  1996,  2338,  2000,  3931,  4261,  1998,  2211,
          2000,  3191, 12575,  1012,   102,     0,     0,     0],
        [  101,  2116,  8141,  2424,  1996,  2034,  2338,  1997,  1037,  6925,
          1997,  2048,  3655,  2000,  2022, 16801,  1012,   102],
        [  101,  1045,  2064,  2338,  9735,  2005,  1996,  4164,  2279,  2733,
          1012,   102,     0,     0,     0,     0,     0,     0],
        [  101,  1996,  2610,  2359,  2000,  2338,  2032,  2005,  4439,  2205,
          3435,  1012,   102,     0,     0,     0,     0,     0],
        [  101,  1045,  2064,  3914,  9735,  2005,  1996,  4164,  2279,  2733,
          1012,   102,     0,     0,     0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

The special padding tokenizer

In [14]:
#print the special padding tokenizer
for id in model_inputs['input_ids']: #acess the input_ids for each sentence
    print(tokenizer.decode(id)) #print the special padding tokenizer of each sentence

[CLS] she opened the book to page 37 and began to read aloud. [SEP] [PAD] [PAD] [PAD]
[CLS] many readers find the first book of a tale of two cities to be confusing. [SEP]
[CLS] i can book tickets for the concert next week. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
[CLS] the police wanted to book him for driving too fast. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD]
[CLS] i can reserve tickets for the concert next week. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]


In [15]:
# model_inputs is a dictionary, so to provide the arguments to model(), 
# we use the double star to unpack the dictionary so that each key in the dictionary is
# an argument to model() and each value is the value of the argument. 
model_outputs = model(**model_inputs) 

Obtain the contextualised word embeddings for 'book' and 'reserve' in the example sentences using the model

In [16]:
# Retrive hidden state values produced in the last hidden layer of the model 
emb3 = model_outputs['last_hidden_state'] #Retrive embedding vector in the last hidden layer from the model_output variable 

# Find the word "book" and "reserve" of each sentence in the embedding vector(emb3) by using the index of each word
# in each sentence and extrct the value to store in the variables below and convert it ti numpy array
sent_book1 = emb3[0][3].detach().numpy() #Extract the contextualised word embeddings for 'book' in sentence 1
sent_book2 = emb3[1][5].detach().numpy() #Extract the contextualised word embeddings for 'book' in sentence 2
sent_book3 = emb3[2][2].detach().numpy() #Extract the contextualised word embeddings for 'book' in sentence 3
sent_book4 = emb3[3][4].detach().numpy() #Extract the contextualised word embeddings for 'book' in sentence 4
sent_reserve = emb3[4][2].detach().numpy() #Extract the contextualised for 'reserve' in sentence 5

print("The contextualised word embeddings for 'book' in sentence1 :\n", sent_book1)
print("The contextualised word embeddings for 'book' in sentence2 :\n", sent_book2)
print("The contextualised word embeddings for 'book' in sentence3 :\n", sent_book3)
print("The contextualised word embeddings for 'book' in sentence4 :\n", sent_book4)
print("The contextualised word embeddings for 'reserve' in sentence5 :\n", sent_reserve)

The contextualised word embeddings for 'book' in sentence1 :
 [-1.2332301e+00  8.1168419e-01  3.3068866e-02 -9.6326435e-01
  9.5280731e-01  6.0104215e-01  5.9031653e-01  4.6202838e-02
 -9.0080577e-01  1.8362245e-01 -1.0318942e+00  7.1658599e-01
  6.8435967e-01 -1.6353027e+00  1.4162868e+00 -1.5789974e+00
 -1.0162764e+00 -8.1221992e-01 -1.1755352e+00  1.3912032e+00
 -3.2350945e-01  2.6244619e-01  8.1340665e-01  1.8822510e+00
  4.3167609e-01 -4.5490599e-01  2.0257646e-01  1.3786050e+00
  5.2285039e-01 -4.3345994e-01 -1.3670484e+00  6.4454842e-01
 -1.1654184e+00 -5.2278471e-01  1.8271497e+00 -1.4007516e+00
  6.0561180e-01 -6.8775363e-02 -5.3152099e+00 -1.2173519e+00
  3.3846161e-01  5.3339905e-01  1.9286759e-01 -1.7144172e+00
  6.1394453e-01 -1.1065278e+00 -9.8903686e-01  5.8604980e-01
 -2.7529672e-03  5.3680557e-01  2.0090744e+00 -1.5506877e-01
  3.2932907e-02 -6.0485923e-01 -8.7023741e-01 -6.7914152e-01
  1.0176802e+00  1.5285075e-01 -5.8332813e-01  1.0765169e+00
 -1.1095304e+00 -1.2091

Compare these embeddings in the cell below. In a few sentences
 

In [17]:
# WRITE YOUR ANSWER HERE
#import cosine to compute the cosine distance between the comparing
from scipy.spatial.distance import cosine 

#compute the cosine similarity between the two embeddings vectors
#and then, useing 1- cosine to get the consine similarity 
diff_book12 = 1 - cosine(sent_book1, sent_book2) #cosine similarity the word "book" between sentence 1 & 2 
diff_book13 = 1 - cosine(sent_book1, sent_book3) #cosine similarity the word "book" between sentence 1 & 3
diff_book14 = 1 - cosine(sent_book1, sent_book4) #cosine similarity the word "book" between sentence 1 & 4 
diff_book23 = 1 - cosine(sent_book2, sent_book3) #cosine similarity the word "book" between sentence 2 & 3 
diff_book24 = 1 - cosine(sent_book2, sent_book4) #cosine similarity the word "book" between sentence 2 & 4 
diff_book34 = 1 - cosine(sent_book3, sent_book4) #cosine similarity the word "book" between sentence 3 & 4 
diff_bookre35 = 1 - cosine(sent_book3, sent_reserve) #cosine similarity the word "book" and "reserve" in sentence 3 & 5
diff_bookre45 = 1 - cosine(sent_book4, sent_reserve) #cosine similarity the word "book" and "reserve" in sentence 4 & 5
diff_bookre25 = 1 - cosine(sent_book2, sent_reserve) #cosine similarity the word "book" and "reserve" in sentence 2 & 5
diff_bookre15 = 1 - cosine(sent_book1, sent_reserve) #cosine similarity the word "book" and "reserve" in sentence 2 & 5

print("Cosine Similarity between book in sentence 1 and 2:", diff_book12)
print("Cosine Similarity between book in sentence 1 and 3:", diff_book13)
print("Cosine Similarity between book in sentence 1 and 4:", diff_book14)
print("Cosine Similarity between book in sentence 2 and 3:", diff_book23)
print("Cosine Similarity between book in sentence 2 and 4:", diff_book24)
print("Cosine Similarity between book in sentence 3 and 4:", diff_book34)
print("Cosine Similarity between book in sentence 4 and reserve:", diff_bookre45)
print("Cosine Similarity between book in sentence 3 and reserve:", diff_bookre35)
print("Cosine Similarity between book in sentence 2 and reserve:", diff_bookre25)
print("Cosine Similarity between book in sentence 1 and reserve:", diff_bookre15)

Cosine Similarity between book in sentence 1 and 2: 0.7383215427398682
Cosine Similarity between book in sentence 1 and 3: 0.5869025588035583
Cosine Similarity between book in sentence 1 and 4: 0.574064314365387
Cosine Similarity between book in sentence 2 and 3: 0.4978688061237335
Cosine Similarity between book in sentence 2 and 4: 0.5190578103065491
Cosine Similarity between book in sentence 3 and 4: 0.5240395069122314
Cosine Similarity between book in sentence 4 and reserve: 0.47602763772010803
Cosine Similarity between book in sentence 3 and reserve: 0.9565439820289612
Cosine Similarity between book in sentence 2 and reserve: 0.3882303535938263
Cosine Similarity between book in sentence 1 and reserve: 0.4863606095314026


# Part 2: Classification with transformers

Load up the [Tweet Eval](https://huggingface.co/datasets/tweet_eval) emotion analysis dataset

In [18]:
import numpy as np #import numpy

In [19]:
from datasets import load_dataset

cache_dir = "./data_cache"

train_dataset = load_dataset(
    "tweet_eval",
    name="emotion",
    split="train",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Training dataset with {len(train_dataset)} instances loaded")

val_dataset = load_dataset(
    "tweet_eval",
    name="emotion",
    split="validation",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Validation dataset with {len(val_dataset)} instances loaded")

test_dataset = load_dataset(
    "tweet_eval",
    name="emotion",
    split="test",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Test dataset with {len(test_dataset)} instances loaded")

num_classes = np.unique(train_dataset['label']).size

Reusing dataset tweet_eval (./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Training dataset with 3257 instances loaded


Reusing dataset tweet_eval (./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Validation dataset with 374 instances loaded


Reusing dataset tweet_eval (./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Test dataset with 1421 instances loaded


Tokenize the examples in the dataset

In [20]:
def tokenize_function(dataset):
    model_inputs = tokenizer(dataset['text'], padding="max_length", max_length=100, truncation=True)
    return model_inputs

train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

Loading cached processed dataset at ./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-9f5fb0569fa0881f.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-04132f40f00e4298.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-5a9227904869e963.arrow


Access a tensor containing the [CLS] embeddings

In [21]:
cls_embs = model(**model_inputs)['last_hidden_state'][:, 0]

print(cls_embs.shape)

torch.Size([5, 128])


Create a complete model for sequence classification, based on the BERT-tiny model

In [22]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels= num_classes)

Some weights of the model checkpoint at prajjwal1/bert-tiny were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initia

Train our model and freeze weights in the BERT model itself

In [23]:
for param in model.bert.parameters():
    param.requires_grad = False 

Train the model

In [24]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="transformer_checkpoints",  # specify the directory where models weights will be saved a certain points during training (checkpoints)
    num_train_epochs=3,  # change this if it is taking too long on your computer
)  

In [25]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3257
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 1224


Step,Training Loss
500,1.2924
1000,1.2481


Saving model checkpoint to transformer_checkpoints/checkpoint-500
Configuration saved in transformer_checkpoints/checkpoint-500/config.json
Model weights saved in transformer_checkpoints/checkpoint-500/pytorch_model.bin
Saving model checkpoint to transformer_checkpoints/checkpoint-1000
Configuration saved in transformer_checkpoints/checkpoint-1000/config.json
Model weights saved in transformer_checkpoints/checkpoint-1000/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=1224, training_loss=1.266378190782335, metrics={'train_runtime': 33.5924, 'train_samples_per_second': 290.87, 'train_steps_per_second': 36.437, 'total_flos': 2426108032800.0, 'train_loss': 1.266378190782335, 'epoch': 3.0})

Predictions with the test dataset

In [26]:
def predict_nn(trained_model, test_dataset):

    # Pass the required items from the dataset to the model    
    output = trained_model(attention_mask=torch.tensor(test_dataset["attention_mask"]), input_ids=torch.tensor(test_dataset["input_ids"]))
        
    # the output dictionary contains logits, which are the unnormalised scores for each class for each example:
    pred_labs = np.argmax(output["logits"].detach().numpy(), axis=1)
    
    gold_labs = test_dataset["label"]
    
    return gold_labs, pred_labs

# Run the prediction function to get the results:
gold_labs, pred_labs = predict_nn(model, test_dataset)

Compute the classification metrics of this model

In [27]:
from sklearn.metrics import classification_report #import classification_report

print(classification_report(gold_labs, pred_labs))

              precision    recall  f1-score   support

           0       0.39      1.00      0.56       558
           1       0.00      0.00      0.00       358
           2       0.00      0.00      0.00       123
           3       0.00      0.00      0.00       382

    accuracy                           0.39      1421
   macro avg       0.10      0.25      0.14      1421
weighted avg       0.15      0.39      0.22      1421



Load the dataset to implement "irony" subset


In [28]:
from datasets import load_dataset

cache_dir = "./data_cache"

train_dataset = load_dataset(
    "tweet_eval",
    name="irony", #implemet with irony
    split="train",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Training dataset with {len(train_dataset)} instances loaded")

val_dataset = load_dataset(
    "tweet_eval",
    name="irony", #implemet with irony
    split="validation",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Validation dataset with {len(val_dataset)} instances loaded")

test_dataset = load_dataset(
    "tweet_eval",
    name="irony", #implemet with irony
    split="test",
    ignore_verifications=True,
    cache_dir=cache_dir,
)
print(f"Test dataset with {len(test_dataset)} instances loaded")

num_classes = np.unique(train_dataset['label']).size

Reusing dataset tweet_eval (./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Training dataset with 2862 instances loaded


Reusing dataset tweet_eval (./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Validation dataset with 955 instances loaded


Reusing dataset tweet_eval (./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


Test dataset with 784 instances loaded


Using tokenizer to tokenize the dataset

In [29]:
# Using tokenizer function to tokenize the dataset  
# use the map() method to apply the tokenizer the dataset.
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-c744650d7bba086d.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-f627c69dc4e8f3f1.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-c95fc07ac0a79269.arrow


Create the model to implement in this task

In [30]:
model2 = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels= num_classes)

loading configuration file https://huggingface.co/prajjwal1/bert-tiny/resolve/main/config.json from cache at /Users/gracepichar/.cache/huggingface/transformers/3cf34679007e9fe5d0acd644dcc1f4b26bec5cbc9612364f6da7262aed4ef7a4.a5a11219cf90aae61ff30e1658ccf2cb4aa84d6b6e947336556f887c9828dc6d
Model config BertConfig {
  "_name_or_path": "prajjwal1/bert-tiny",
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 128,
  "initializer_range": 0.02,
  "intermediate_size": 512,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 2,
  "num_hidden_layers": 2,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.18.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin from cache at /Users/gracepichar/.cach

Set requires_grad to be True

It set unfrozen BERT layers. It means the part of the model will change during the training process.  Each of the parameters allow to learn and update in the training model. In the backpropagation, it propagates back to the embedding layer and updates the embedding layer so, it can propagate back through the layer.

In [31]:
for param in model.bert.parameters():
    param.requires_grad = True #unfrozen

Train the model

In [32]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="transformer_checkpoints",  
    num_train_epochs=3,  
)  

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [33]:
from transformers import Trainer

trainer = Trainer(
    model=model2,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2862
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 1074


Step,Training Loss
500,0.6662
1000,0.6379


Saving model checkpoint to transformer_checkpoints/checkpoint-500
Configuration saved in transformer_checkpoints/checkpoint-500/config.json
Model weights saved in transformer_checkpoints/checkpoint-500/pytorch_model.bin
Saving model checkpoint to transformer_checkpoints/checkpoint-1000
Configuration saved in transformer_checkpoints/checkpoint-1000/config.json
Model weights saved in transformer_checkpoints/checkpoint-1000/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=1074, training_loss=0.6502215458250135, metrics={'train_runtime': 122.9412, 'train_samples_per_second': 69.838, 'train_steps_per_second': 8.736, 'total_flos': 2130547212000.0, 'train_loss': 0.6502215458250135, 'epoch': 3.0})

Make prediction and compute the accuracy score

In [34]:
from sklearn.metrics import accuracy_score
#Make prediction
gold_labs2, pred_labs2 = predict_nn(model2, test_dataset)

#compute accuracy score
print("The accuracy_score when unfrozen:", accuracy_score(gold_labs2, pred_labs2))

The accuracy_score when unfrozen: 0.6020408163265306


## **When frozen BERT layers**

**Using tokenizer to tokenize the dataset**

In [35]:
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-40e91856f3f4ab76.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-75525174e5e40e62.arrow
Loading cached processed dataset at ./data_cache/tweet_eval/irony/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-daaa7f0121f93492.arrow


**Create the model to implement in this task**

In [36]:
#create model for this task
model3 = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels= num_classes)

loading configuration file https://huggingface.co/prajjwal1/bert-tiny/resolve/main/config.json from cache at /Users/gracepichar/.cache/huggingface/transformers/3cf34679007e9fe5d0acd644dcc1f4b26bec5cbc9612364f6da7262aed4ef7a4.a5a11219cf90aae61ff30e1658ccf2cb4aa84d6b6e947336556f887c9828dc6d
Model config BertConfig {
  "_name_or_path": "prajjwal1/bert-tiny",
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 128,
  "initializer_range": 0.02,
  "intermediate_size": 512,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 2,
  "num_hidden_layers": 2,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.18.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin from cache at /Users/gracepichar/.cach

**When frozen BERT layers**

Set param.requires_grad = False to fix parameter's weight 

In [37]:
for param in model.bert.parameters():
    param.requires_grad = False #frozen 

**Train the model**

In [38]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="transformer_checkpoints",  
    num_train_epochs=3,  
)  

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [39]:
from transformers import Trainer

trainer = Trainer(
    model=model3,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2862
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 1074


Step,Training Loss
500,0.67
1000,0.6395


Saving model checkpoint to transformer_checkpoints/checkpoint-500
Configuration saved in transformer_checkpoints/checkpoint-500/config.json
Model weights saved in transformer_checkpoints/checkpoint-500/pytorch_model.bin
Saving model checkpoint to transformer_checkpoints/checkpoint-1000
Configuration saved in transformer_checkpoints/checkpoint-1000/config.json
Model weights saved in transformer_checkpoints/checkpoint-1000/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=1074, training_loss=0.6532815652630627, metrics={'train_runtime': 101.2777, 'train_samples_per_second': 84.777, 'train_steps_per_second': 10.605, 'total_flos': 2130547212000.0, 'train_loss': 0.6532815652630627, 'epoch': 3.0})

**Make prediction and compute the accuracy score**

In [40]:
#Make prediction 
gold_labs3, pred_labs3 = predict_nn(model3, test_dataset)

#compute accuracy score
print("The accuracy_score when frozen:", accuracy_score(gold_labs3, pred_labs3))

The accuracy_score when frozen: 0.5778061224489796


**Choose the sentence from the website [this page on verbal irony](https://examples.yourdictionary.com/examples-of-verbal-irony.html):**

**In J. K. Rowling’s Harry Potter and the Order of the Phoenix, Harry says, "Yeah, Quirrell was a great teacher. There was just that minor drawback of him having Lord Voldemort sticking out of the back of his head!”**

In [41]:
 #Store the sentence in the sentence list
sentence = ["Yeah, Quirrell was a great teacher. There was just that minor drawback of him having Lord Voldemort sticking out of the back of his head!"]

#Use the tokenizer class to pad the sequences up to a maximum length
model_inputs = tokenizer(sentence, padding=True, truncation=True, return_tensors="pt")  

In [42]:
#Extract the output model from hidden state
#use the model2 from the previous task of irony when param.requires_grad = True (unfrozen)
model_outputs = model2(**model_inputs) 

In [43]:
print(model_outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[-0.2801,  0.2067]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [44]:
sen = ''.join(str(i) for i in sentence) #convert list to string type to get rid of ""
print("Sentence:", sen) #print out the sentence

#using softmax function to compute the probability of irony for a sentence
print(torch.nn.functional.softmax((model_outputs['logits']))) #probablity by using softmax 

Sentence: Yeah, Quirrell was a great teacher. There was just that minor drawback of him having Lord Voldemort sticking out of the back of his head!
tensor([[0.3807, 0.6193]], grad_fn=<SoftmaxBackward0>)


**The result shows the probability of irony belonging to two classes [0.3807, 0.6193].  It means that irony has a higher probability of class-label 1.**