**Transformer Tune-up: Fine-tune BERT for State-of-the-Art Sentiment Analysis Using Hugging Face**

In [1]:
pip install transformers



## Install Required Dependencies/Packages

In [2]:
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
df_input = pd.read_csv('/content/drive/MyDrive/Sentiment_Analysis/movie.csv')
df_input = df_input.sample(frac = 0.25)  # keep a random 25% of the reviews to shorten training
df_input.head()

Unnamed: 0,text,label
5737,I hadn't heard of Soap Girl but I saw a poster...,0
36339,We taped this when it aired on TV back in 1995...,1
28421,I wouldn't be so sure to accept the DNA tests ...,1
38779,"Plunkett and Macleane is an entertaining, fast...",1
19693,during eddie murphy's stand up a women from th...,1


In [5]:
df_input.shape

(10000, 2)

In [6]:
texts = df_input['text'].tolist()  # extract the reviews
y_true = df_input['label'].tolist()  # extract the actual sentiments

In [7]:
# Split data into training and validation sets
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, y_true, test_size=0.2)
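
The split above is unseeded, so every run produces different training and validation sets and slightly different metrics. The following is a minimal sketch of a reproducible, class-balanced split. `random_state` and `stratify` are existing `train_test_split` parameters; the seed value 42 and the toy data are illustrative assumptions, not part of the original notebook.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the notebook's `texts` and `y_true` (illustrative only)
toy_texts = [f"review {i}" for i in range(10)]
toy_labels = [0, 1] * 5

# random_state fixes the shuffle; stratify keeps the label ratio equal in both splits
tr_texts, va_texts, tr_labels, va_labels = train_test_split(
    toy_texts, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)

print(len(tr_texts), len(va_texts))  # sizes of the two splits
print(sorted(va_labels))             # one example of each class in validation
```

In the notebook itself, the same two arguments could be passed alongside `texts` and `y_true`, with `stratify=y_true`.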

**Baseline Performance**

In [8]:
from transformers import BertTokenizerFast, BertForSequenceClassification
from torch.utils.data import Dataset, DataLoader
import torch

In [9]:
# Load the model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model = model.to('cuda')  # assumes a CUDA GPU runtime; this call fails on a CPU-only machine

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [10]:
# Initialize tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

In [11]:
# Tokenize data
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)

In [12]:
# val_encodings is a BatchEncoding; with a fast tokenizer, integer indexing returns the Encoding object for that sample
index = 1
single_element = val_encodings[index]
print(single_element)

Encoding(num_tokens=512, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
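
The `num_tokens=512` above follows from the tokenizer arguments: `truncation=True` with `max_length=512` cuts longer reviews down to 512 tokens, and `padding=True` pads every other review up to the longest sequence in the batch. The toy function below is a sketch of that behaviour on plain lists of token ids, not the tokenizer's actual implementation. `pad_and_truncate` is a hypothetical helper; 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids of `bert-base-uncased`. One simplification to note: the real tokenizer keeps `[SEP]` at the end of a truncated sequence, which this sketch does not.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Toy model of truncation=True, padding=True on lists of token ids."""
    # Cut every sequence to at most max_length tokens
    truncated = [ids[:max_length] for ids in batch]
    # Pad up to the longest (already truncated) sequence in the batch
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return input_ids, attention_mask


toy_batch = [[101, 7, 8, 102], [101, 9, 102], [101, 1, 2, 3, 4, 5, 102]]
toy_ids, toy_mask = pad_and_truncate(toy_batch, max_length=5)
for row_ids, row_mask in zip(toy_ids, toy_mask):
    print(row_ids, row_mask)
```

The attention mask is what the prediction loop later passes to the model as `attention_mask`, so that padded positions do not influence the output.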


In [13]:
# Create torch dataset for validation
class ReviewDataset(Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:  # explicit check; also safe if labels is an array or tensor
            item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings['input_ids'])

In [14]:
val_dataset = ReviewDataset(val_encodings, val_labels)
val_dataset[0]

{'input_ids': tensor([  101,  5212, 12548,  2003,  6684,  2028,  1997,  2637,  1005,  1055,
          2307,  5501,  1012,  1045,  1005,  1049,  2025,  2469,  2002, 24209,
         11475, 14213,  2004,  2028,  1997,  5365,  1005,  1055,  2307, 25736,
          1012,  2021,  2002,  5121,  6938,  2039,  2045,  2007,  1996,  2190,
          1997,  5365,  1005,  1055,  2995,  8390,  2040,  2020,  5627,  2000,
          2233,  2000,  2037,  2219,  2189,  1012,  2076,  1996,  2051,  2002,
          2499,  2005,  5365,  4835,  1010,  2002,  2354,  2129,  2000,  2202,
          2019,  8775,  1010,  4338,  1996,  3054, 14423,  3430,  4375,  2000,
          2032,  1998,  2059,  2735,  2009,  2855,  1998, 18228,  2046,  2242,
          2788,  2488,  2084,  2049,  3033,  1012,  1012,  1012,  2006,  2051,
          1998,  2006,  5166,  1012, 15373,  2006,  2148,  2395,  2003,  1037,
          2553,  1999,  2391,  1012,  2006,  1996,  3302,  2009,  1005,  1055,
          2028,  2062,  1997,  5365,  1

In [15]:
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

In [16]:
# Predict with the model
model.eval()
predictions = []
true_labels = []
for batch in val_loader:
    input_ids = batch['input_ids'].to('cuda')
    attention_mask = batch['attention_mask'].to('cuda')
    labels = batch['labels'].to('cuda')

    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)

    logits = outputs.logits
    predicted_labels = torch.argmax(logits, dim=1).cpu().numpy()
    predictions.extend(predicted_labels)
    true_labels.extend(labels.cpu().numpy())

In [17]:
# Calculate metrics
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
accuracy = accuracy_score(true_labels, predictions)
f1 = f1_score(true_labels, predictions)
conf_matrix = confusion_matrix(true_labels, predictions)

print(f'Accuracy: {accuracy}')
print(f'F1-score: {f1}')
print(f'Confusion matrix:\n {conf_matrix}')

Accuracy: 0.497
F1-score: 0.6628686327077747
Confusion matrix:
 [[   5 1001]
 [   5  989]]


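*As expected, without fine-tuning the model performs at chance level: 49.7% accuracy on a roughly balanced binary task. The confusion matrix shows why. The model predicts the positive class for 1,990 of the 2,000 validation reviews, which is also why the F1-score (0.66) looks better than the accuracy. The pre-trained BERT encoder captures the structure of human language, but the classification head (`classifier.weight`, `classifier.bias`) is newly initialized, as the loading warning notes, and has never been trained on sentiment labels.*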
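
To see how the baseline numbers fit together, the metrics can be recomputed by hand from the confusion matrix printed above. This is a small worked sketch. It assumes scikit-learn's layout for `confusion_matrix` (rows are true labels, columns are predicted labels), and the variable names are illustrative.

```python
# Baseline confusion matrix from the output above:
# rows = true label (0, 1), columns = predicted label (0, 1)
tn, fp = 5, 1001
fn, tp = 5, 989
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # how many predicted positives are truly positive
recall = tp / (tp + fn)      # how many true positives were found
f1 = 2 * precision * recall / (precision + recall)
share_predicted_positive = (tp + fp) / total

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"share of reviews predicted positive: {share_predicted_positive:.3f}")
```

Recall is close to 1 because almost every review is labelled positive, while precision stays near 0.5. The F1-score sits between the two, which is why it is a misleading summary for a classifier that almost always predicts a single class.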

**Train the Model (i.e. Fine-Tune the Pre-Trained Model) for Sentiment Analysis**

In [18]:
import torch
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW

import time
# Record start time (the timer also covers the tokenization, dataset creation, and model loading below)
start_time = time.time()

In [19]:
# Tokenize data
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)

# Create torch dataset
class ReviewDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

In [20]:
# Create dataloaders
train_dataset = ReviewDataset(train_encodings, train_labels)
val_dataset = ReviewDataset(val_encodings, val_labels)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

In [21]:
# Initialize model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model = model.to('cuda')  # assumes a CUDA GPU runtime; this call fails on a CPU-only machine

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
# Initialize optimizer
optimizer = AdamW(model.parameters(), lr=1e-5)

# Training loop
for epoch in range(1):  # number of epochs
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to('cuda')
        attention_mask = batch['attention_mask'].to('cuda')
        labels = batch['labels'].to('cuda')
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()

# Save the model
model.save_pretrained('sentiment_model_BERT')

# Record end time
end_time = time.time()

print("Time required to fine-tune: ", end_time - start_time)

Time required to fine-tune:  734.5811967849731


In [23]:
# Predict with the model
model.eval()
predictions = []
true_labels = []
for batch in val_loader:
    input_ids = batch['input_ids'].to('cuda')
    attention_mask = batch['attention_mask'].to('cuda')
    labels = batch['labels'].to('cuda')

    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)

    logits = outputs.logits
    predicted_labels = torch.argmax(logits, dim=1).cpu().numpy()
    predictions.extend(predicted_labels)
    true_labels.extend(labels.cpu().numpy())

In [24]:
# Calculate metrics
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
accuracy = accuracy_score(true_labels, predictions)
f1 = f1_score(true_labels, predictions)
conf_matrix = confusion_matrix(true_labels, predictions)

print(f'Accuracy: {accuracy}')
print(f'F1-score: {f1}')
print(f'Confusion matrix:\n {conf_matrix}')

Accuracy: 0.9115
F1-score: 0.9144514258095698
Confusion matrix:
 [[877 129]
 [ 48 946]]
