# Predicting Sentiment Using a Transformer

This notebook provides you with a complete code example that predicts the sentiment of movie reviews using a transformer encoder network.

## Using the IMDB Dataset

Start by downloading the Large Movie Review Dataset, often referred to as the IMDB dataset and available at https://huggingface.co/datasets/imdb. It contains 50,000 movie reviews, each labeled as positive or negative, divided into 25,000 reviews for training and 25,000 reviews for testing.

Download the IMDB dataset ...

In [2]:
from datasets import load_dataset

dataset = load_dataset("imdb")

... split the training data into training and validation sets ...

In [3]:
split = dataset["train"].train_test_split(
    test_size=0.2,
    stratify_by_column="label",
    seed=42,
)
train_dataset, val_dataset = split["train"], split["test"]
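
The `stratify_by_column="label"` argument keeps the proportion of positive and negative reviews the same in both splits. The following is a minimal plain-Python sketch of that idea, not the implementation used by the `datasets` library; the helper name `stratified_split` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) preserving label proportions."""
    rng = random.Random(seed)

    # Group the sample indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    # Shuffle each group and move the same fraction of it to the test side.
    train_indices, test_indices = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return train_indices, test_indices

# Toy labels (hypothetical): 10 negative (0) and 10 positive (1) reviews.
toy_labels = [0] * 10 + [1] * 10
train_idx, val_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx))
print(sorted(toy_labels[i] for i in val_idx))
```

With a balanced dataset such as this one, both splits stay balanced, so accuracy remains a meaningful validation metric.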

... and print some example reviews.

In [4]:
import numpy as np
import pandas as pd

examples = train_dataset.select(np.random.randint(0, len(train_dataset), 3))
df = pd.DataFrame({"Text": examples["text"], "Label": examples["label"]})
styled_df = df.style.set_properties(**{"text-align": "left"}).set_table_styles(
    [{"selector": "th", "props": [("text-align", "center")]}]
)
with pd.option_context("display.max_colwidth", None):
    display(styled_df)

Unnamed: 0,Text,Label
0,"I finally got my wish to see this one in a cinema. I'd seen Fritz Lang's film on video some years ago. I'd been hoping that ideal screening conditions would work their magic. Conditions were ideal at Cinematheque Ontario. Pristine full-length print. Intertitles in the original Gothic-script German with simultaneous English translation, accurate without being too literal. Live piano accompaniment. Ideal. The film's magic sputtered for a little while but ultimately failed to catch, at least for me. This film bears no real relation to Wagner's Ring cycle as I already knew but some may not. Wagner had adapted the 13th c. Niebelungenlied to his own purposes. Part I of Fritz Lang's epic -- ""Siegfried"" -- has much that will be familiar to listeners of Wagner however. ""Kriemhild's Revenge"" is the story of Siegfried's wife Kriemhild, her marriage to King Etzel (Attila) the Hun, and her desire for revenge against Hagen and Gunther, the rechristened Nibelungs, for the murder of Siegfried. The spectacular conflagration in this film presumably evolved and expanded in the Wagnerian mythos into his Götterdämmerung, his Twilight of the Gods, and the end of Valhalla. This film remains earthbound. Most of the film is spectacular. The massive sets rival those of ""Cabiria"" (1914), which inspired Griffith's ""Intolerance"" (1916). Their decoration sets a new benchmark in barbaric splendour. There's a huge cast of scarred, mangy Huns and Art Deco Burgundians. And battles. Battles that never seem to end in fact. Kriemhild is very successful in her plan of revenge. She manages to destroy all around her. Her loyalty to her martyred Siegfried seems not to stem so much from love, or devotion, but from something closer to psychosis. Lady Macbeth cried out, ""Unsex me here."" She knew she was emotionally unprepared for what she needed to do. But Kriemhild displays no normal human emotions, and certainly nothing one equates with the feminine principle. 
She is already ""top full of direst cruelty"", to borrow Shakespeare's phrase, from the outset. Margarethe Schön and her director convey this with a glower. I don't want to exaggerate, but that glower is virtually the only expression ever to ""animate"" Kriemhild's face. It's the ultimate in one-note performances. It's clearly intentional however, not simply a case of poor acting. What we have then on offer is a one-dimensional sketch of an avenging Fury. Some might see Kriemhild as an empowered heroine. I just see the film as misogynistic.",1
1,"I typically don't like reality shows, particularly the ones that are profiting off of ""American Idol""'s success. But this one I can live with. Comedians from all around the world perform a brief routine for celebrity talent scouts, and if they like them, those guys will be sent to perform a routine for an actual audience. Then ten or twelve comics are selected to live in a house together and do ""Survivor"" style competitions using comedic tactics. Then one will be determined as ""Last Comic Standing."" I do like stand up comedy, so this is the one reality show must keen to my interests. There are usually some pretty funny comics selected through. It started the careers of such talents as Alonzo Bodden, Ralphie May, and Josh Blue. My negative criticisms is the fact that there is the possibility that a lot of these comics were selected for their contribution to reality show drama. At first they lived together in a house like ""Big Brother,"" but now they've done away with that, thank God. And there are a lot of comedians I felt, were only chosen not because they're funny, but because of race, ethnicity, attitude, sex, etc. when other comics clearly should've beaten them out. But overall, it's a well-made reality show, which are two terms up until now I thought were an oxymoron.",1
2,"this is not just a bad film, it's one of the worst films ever. it's so bad that i found it to be quite enjoyable. the acting, oh my god, the script, you gotta be kiddin'. how can you imagine the writer coming up with things like: - a kid who makes fireworks in school, fireworks SO powerfull, that when someone gets hit by it, they fly a hundred yards backwards and explode. -a girl is trapped in the celler, the killer is trying to break open the door. she gets a drill, but the wire isn't long enough. she first makes an extension cord, oh the horror, and then, when she's done, she drills through the door and drills through the head of the killer. WOW - and there are plenty more examples like that. oh yeah, and what happened to George Kennedy, he used to be great (Thunderbolt and Lightfoot/Cool hand Luke)",0


### Preprocessing the Reviews

Implement a function to tokenize a sentence ...

In [5]:
import contractions
import re
from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer("basic_english")

def tokenize(text):
    """Tokenize text."""
    text = contractions.fix(text)

    # Replace the doubled accent before the single one, so it is not consumed first.
    replacements = {"’": "'", "‘": "'", "“": '"', "”": '"', "´´": '"', "´": "'"}
    for old, new in replacements.items():
        text = text.replace(old, new)

    tokens = tokenizer(text)

    filtered_tokens = [
        token for token in tokens
        if re.match(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*(_[a-zA-Z0-9]+)*$", token)
    ]

    return filtered_tokens
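
To see what the final filter in `tokenize()` keeps, the sketch below applies the same regular expression to a handful of hypothetical tokens. The token list is an assumption chosen for illustration; only the pattern is taken from the function above.

```python
import re

# Same pattern as in tokenize(): alphanumeric runs, optionally joined by
# hyphens and then by underscores.
TOKEN_PATTERN = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*(_[a-zA-Z0-9]+)*$")

# Hypothetical tokens, similar to what a tokenizer might emit.
candidate_tokens = ["well-made", "film", ".", "10", "!", "snake_case", "'", "-"]

# Keep only the tokens that match the pattern in full.
kept = [token for token in candidate_tokens if TOKEN_PATTERN.match(token)]
print(kept)
```

Punctuation-only tokens are dropped, while words, numbers, and hyphenated or underscored compounds are kept.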

... create a vocabulary ...

In [6]:
from torchtext.vocab import build_vocab_from_iterator

def imdb_iterator(dataset):
    """Iterate over the IMDB dataset."""
    for data in dataset:
        yield tokenize(data["text"])

vocab = build_vocab_from_iterator(
    imdb_iterator(train_dataset),
    specials=["<unk>"],
)
vocab.set_default_index(vocab["<unk>"])
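
Conceptually, the vocabulary maps each token to an integer index and sends every token it has never seen to the index of `<unk>`. The sketch below imitates this behavior with a plain dictionary. It is an illustration under simplifying assumptions, not torchtext's implementation, and the helper names `build_toy_vocab` and `lookup` are hypothetical.

```python
from collections import Counter

def build_toy_vocab(token_lists, specials=("<unk>",)):
    """Build token-to-index and index-to-token mappings."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    # Assumed ordering: most frequent tokens first, ties broken alphabetically.
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    itos = list(specials) + ordered
    stoi = {token: index for index, token in enumerate(itos)}
    return stoi, itos

def lookup(stoi, tokens, default_index=0):
    """Convert tokens to indices, falling back to the default index."""
    return [stoi.get(token, default_index) for token in tokens]

# Toy corpus (hypothetical): two tokenized reviews.
toy_corpus = [["a", "great", "film"], ["a", "bad", "film"]]
stoi, itos = build_toy_vocab(toy_corpus)

# "boring" never appeared in the corpus, so it maps to "<unk>".
indices = lookup(stoi, ["a", "boring", "film"], default_index=stoi["<unk>"])
print(itos)
print(indices)
```

Because the vocabulary is built from the training split only, words that appear solely in the validation or test reviews are handled through this fallback.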

... and preprocess the training, validation, and testing datasets.

In [7]:
def preprocessing(sample):
    """Preprocess the input data."""
    tokens = tokenize(sample["text"])
    indices = vocab(tokens)
    sample.update({"x": indices})
    return sample

train_dataset = train_dataset.map(preprocessing)
val_dataset = val_dataset.map(preprocessing)
test_dataset = dataset["test"].map(preprocessing)

## Building a Transformer Encoder Layer

Prepare a class to implement a multi-head attention layer ...

In [8]:
import deeplay as dl
import torch
import torch.nn as nn

class MultiHeadAttentionLayer(dl.DeeplayModule):
    """Multi-head attention layer."""

    def __init__(self, features, num_heads):
        """Initialize multi-head attention layer."""
        super().__init__()
        self.features, self.num_heads = features, num_heads
        self.layer = dl.Layer(nn.MultiheadAttention, features, num_heads)

    def forward(self, x, batch_indices):
        """Calculate forward pass."""
        attn_mask = self._fetch_attn_mask(batch_indices)
        y, *_ = self.layer(x, x, x, attn_mask=attn_mask)
        return y

    def _fetch_attn_mask(self, batch_indices):
        """Get attention mask."""
        return ~torch.eq(batch_indices.unsqueeze(1),
                         batch_indices.unsqueeze(0))
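
Since all reviews in a batch are concatenated into one long sequence, the attention mask is what prevents tokens from attending across review boundaries. For a Boolean mask, `nn.MultiheadAttention` treats `True` as "not allowed to attend". The plain-Python sketch below reproduces the logic of `_fetch_attn_mask()` on a toy batch; the helper name `block_attention_mask` is an assumption for illustration.

```python
def block_attention_mask(batch_indices):
    """Return a mask where True marks a (query, key) pair that is blocked."""
    return [[query != key for key in batch_indices] for query in batch_indices]

# Toy batch: the first review has 2 tokens, the second has 3 tokens.
batch_indices = [0, 0, 1, 1, 1]
mask = block_attention_mask(batch_indices)

# Print "x" for blocked pairs and "." for allowed pairs.
for row in mask:
    print("".join("x" if blocked else "." for blocked in row))
```

The printed pattern is block diagonal: each token can attend only to tokens of its own review.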

... and a class to implement a transformer encoder layer ...

In [9]:
from torch_geometric.nn.norm import LayerNorm

class TransformerEncoderLayer(dl.DeeplayModule):
    """Transformer encoder layer."""

    def __init__(self, d_model, num_heads, feedforward_dim, dropout_p=0.0):
        """Initialize transformer encoder layer."""
        super().__init__()

        self.d_model = d_model
        self.num_heads = num_heads
        self.feedforward_dim = feedforward_dim
        self.dropout_p = dropout_p

        self.self_attn = MultiHeadAttentionLayer(d_model, num_heads)
        self.attn_dropout = dl.Layer(nn.Dropout, dropout_p)
        self.attn_skip = dl.Add()
        self.attn_norm = dl.Layer(LayerNorm, d_model, eps=1e-6)

        self.feedforward = dl.Sequential(
            dl.Layer(nn.Linear, d_model, feedforward_dim),
            dl.Layer(nn.ReLU),
            dl.Layer(nn.Linear, feedforward_dim, d_model),
        )
        self.feedforward_dropout = dl.Layer(nn.Dropout, dropout_p)
        self.feedforward_skip = dl.Add()
        self.feedforward_norm = dl.Layer(LayerNorm, d_model, eps=1e-6)

    def forward(self, x, batch_index):
        """Calculate forward pass."""
        y_attn = self.self_attn(x, batch_index)
        y_attn = self.attn_dropout(y_attn)
        y_attn = self.attn_skip(x, y_attn)
        y_attn = self.attn_norm(y_attn, batch_index)

        y = self.feedforward(y_attn)
        y = self.feedforward_dropout(y)
        y = self.feedforward_skip(y_attn, y)
        y = self.feedforward_norm(y, batch_index)
        return y

## Building a Transformer Encoder Model

Build a class to implement a transformer encoder model ...

In [10]:
class TransformerEncoderModel(dl.DeeplayModule):
    """Transformer encoder model."""

    def __init__(self, vocab_size, d_model, num_heads, feedforward_dim,
                 num_layers, out_dim, dropout_p=0.0):
        """Initialize transformer encoder model."""
        super().__init__()

        self.d_model = d_model
        self.num_heads = num_heads
        self.feedforward_dim = feedforward_dim
        self.num_layers = num_layers
        self.dropout_p = dropout_p
        self.out_dim = out_dim

        self.embedding = dl.Layer(nn.Embedding, vocab_size, d_model)

        self.pos_encoder = dl.IndexedPositionalEmbedding(d_model)
        self.pos_encoder.dropout.configure(p=dropout_p)

        self.blocks = dl.LayerList()
        for _ in range(num_layers):
            self.blocks.append(
                TransformerEncoderLayer(
                    d_model, num_heads, feedforward_dim, dropout_p=dropout_p
                )
            )

        self.out = dl.Sequential(
            dl.Layer(nn.Dropout, dropout_p),
            dl.Layer(nn.Linear, d_model, d_model // 2),
            dl.Layer(nn.ReLU),
            dl.Layer(nn.Linear, d_model // 2, out_dim),
            dl.Layer(nn.Sigmoid),
        )

    def forward(self, seq):
        """Calculate forward pass."""
        h = self.embedding(seq["x"]) * self.d_model ** 0.5
        h = self.pos_encoder(h, seq["batch_indices"])

        for layer in self.blocks:
            h = layer(h, seq["batch_indices"])

        batch_size = torch.max(seq["batch_indices"]) + 1
        g = torch.zeros(batch_size, self.d_model, device=h.device)
        g = g.scatter_add(0, seq["batch_indices"][:, None].expand_as(h), h)
        g = g / torch.bincount(seq["batch_indices"])[:, None]

        return self.out(g).squeeze()
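
The last lines of `forward()` average the token representations of each review: `scatter_add` sums the rows that share a batch index, and `bincount` provides the number of tokens per review for the division. The following plain-Python sketch performs the same mean pooling on toy numbers; the helper name `mean_pool` and the values are assumptions for illustration.

```python
def mean_pool(features, batch_indices):
    """Average the feature rows that share the same batch index."""
    num_sequences = max(batch_indices) + 1
    width = len(features[0])
    sums = [[0.0] * width for _ in range(num_sequences)]
    counts = [0] * num_sequences

    # Accumulate per-sequence sums (the role of scatter_add) and
    # per-sequence token counts (the role of bincount).
    for row, sequence in zip(features, batch_indices):
        counts[sequence] += 1
        for column, value in enumerate(row):
            sums[sequence][column] += value

    return [
        [total / counts[sequence] for total in sums[sequence]]
        for sequence in range(num_sequences)
    ]

# Toy input: three tokens with two features each, belonging to two reviews.
token_features = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
batch_indices = [0, 0, 1]
pooled = mean_pool(token_features, batch_indices)
print(pooled)
```

The result has one row per review, regardless of how many tokens each review contains.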

... instantiate the transformer encoder model ...

In [11]:
model = TransformerEncoderModel(
    vocab_size=len(vocab),
    d_model=300,
    num_heads=12,
    feedforward_dim=512,
    num_layers=4,
    out_dim=1,
    dropout_p=0.1,
).create()

print(model)

TransformerEncoderModel(
  (embedding): Embedding(80834, 300)
  (pos_encoder): IndexedPositionalEmbedding(
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (blocks): LayerList(
    (0-3): 4 x TransformerEncoderLayer(
      (self_attn): MultiHeadAttentionLayer(
        (layer): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=300, out_features=300, bias=True)
        )
      )
      (attn_dropout): Dropout(p=0.1, inplace=False)
      (attn_skip): Add()
      (attn_norm): LayerNorm(300, affine=True, mode=graph)
      (feedforward): Sequential(
        (0): Linear(in_features=300, out_features=512, bias=True)
        (1): ReLU()
        (2): Linear(in_features=512, out_features=300, bias=True)
      )
      (feedforward_dropout): Dropout(p=0.1, inplace=False)
      (feedforward_skip): Add()
      (feedforward_norm): LayerNorm(300, affine=True, mode=graph)
    )
  )
  (out): Sequential(
    (0): Dropout(p=0.1, inplace=False)
    (1): Linear(in_feature

... and add pretrained embeddings.

In [12]:
from torchtext.vocab import GloVe

glove = GloVe(name="42B", dim=300, cache="glove_embeddings_dataset")

model.embedding.weight.data = glove.get_vecs_by_tokens(
    vocab.get_itos(), lower_case_backup=True
)
model.embedding.weight.requires_grad = False
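
The call to `get_vecs_by_tokens()` builds one embedding row per vocabulary entry. With `lower_case_backup=True`, a token is looked up in its original case first and in lower case if that fails. Tokens that are still not found receive, by default as far as we understand, a zero vector. The sketch below mimics this lookup with a dictionary; the helper name `vectors_for_tokens` and the two-dimensional toy vectors are assumptions for illustration.

```python
def vectors_for_tokens(tokens, pretrained, dim, lower_case_backup=True):
    """Look up one vector per token, with a zero vector as fallback."""
    vectors = []
    for token in tokens:
        if token in pretrained:
            vectors.append(pretrained[token])
        elif lower_case_backup and token.lower() in pretrained:
            # Second attempt: the lower-cased form of the token.
            vectors.append(pretrained[token.lower()])
        else:
            # Unknown token: fall back to a zero vector.
            vectors.append([0.0] * dim)
    return vectors

# Toy pretrained table and vocabulary (hypothetical values).
toy_pretrained = {"film": [0.1, 0.2], "great": [0.3, 0.4]}
toy_vocab = ["<unk>", "film", "Great", "zzyzx"]
matrix = vectors_for_tokens(toy_vocab, toy_pretrained, dim=2)
print(matrix)
```

Because the embedding weights are frozen with `requires_grad = False`, any row that falls back to zeros stays at zero during training.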


## Defining the Data Loaders

In [13]:
from torch.utils.data import DataLoader
from torch_geometric.data import Data

def collate(batch):
    """Combine data into a single batch that the model can process."""
    xs, ys, batch_indices = [], [], []
    for i, b in enumerate(batch):
        x, label = torch.tensor(b["x"]), torch.tensor(b["label"])
        xs.append(x)
        ys.append(label)
        # Tag every token with the position of its review in the batch.
        batch_indices.append(torch.full_like(x, i, dtype=torch.long))
    return Data(x=torch.cat(xs), batch_indices=torch.cat(batch_indices),
                y=torch.stack(ys).float())

train_dataloader = DataLoader(
    train_dataset, batch_size=8, shuffle=True, collate_fn=collate
)
val_dataloader = DataLoader(
    val_dataset, batch_size=8, shuffle=False, collate_fn=collate
)
test_dataloader = DataLoader(
    test_dataset, batch_size=8, shuffle=False, collate_fn=collate
)
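
Instead of padding the reviews to a common length, `collate()` concatenates them into one flat sequence and records which review each token came from. The plain-Python sketch below shows the resulting layout on a toy batch; the helper name `flatten_batch` and the token indices are assumptions for illustration.

```python
def flatten_batch(samples):
    """Concatenate variable-length samples and track their origin."""
    tokens, batch_indices, labels = [], [], []
    for position, sample in enumerate(samples):
        tokens.extend(sample["x"])
        # One batch index per token, equal to the sample's position.
        batch_indices.extend([position] * len(sample["x"]))
        labels.append(float(sample["label"]))
    return tokens, batch_indices, labels

# Toy batch (hypothetical): a 3-token positive and a 2-token negative review.
toy_batch = [{"x": [5, 8, 2], "label": 1}, {"x": [7, 3], "label": 0}]
tokens, batch_indices, labels = flatten_batch(toy_batch)
print(tokens)
print(batch_indices)
print(labels)
```

The `batch_indices` list is the same quantity that the attention mask, the layer normalization, and the mean pooling in the model rely on.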


## Training the Model

Compile the model ...

In [14]:
class AdamW(dl.Optimizer):
    """AdamW optimizer."""

    def __pre_init__(self, **optimizer_kwargs):
        """Execute before initialization."""
        optimizer_kwargs.pop("classtype", None)
        super().__pre_init__(torch.optim.AdamW, **optimizer_kwargs)

classifier = dl.BinaryClassifier(
    model=model,
    optimizer=AdamW(lr=1e-4),
).create()

... and train it.

In [15]:
from lightning.pytorch.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    monitor="valBinaryAccuracy",
    dirpath="models",
    filename="ATT-model{epoch:02d}-val_accuracy{valBinaryAccuracy:.2f}",
    auto_insert_metric_name=False,
    mode="max",
)
trainer = dl.Trainer(max_epochs=5, callbacks=[checkpoint_callback])
trainer.fit(classifier, train_dataloader, val_dataloader)
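
The `filename` template passed to `ModelCheckpoint` uses Python's format syntax, so the name of a saved checkpoint can be previewed with `str.format`. This is only a sketch of the naming scheme: the epoch and accuracy values are hypothetical, and Lightning appends the `.ckpt` extension itself.

```python
# Same template as the one passed to ModelCheckpoint.
template = "ATT-model{epoch:02d}-val_accuracy{valBinaryAccuracy:.2f}"

# Hypothetical values for the epoch and the monitored metric.
example_name = template.format(epoch=3, valBinaryAccuracy=0.8712)
print(example_name)
```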



## Evaluating the Trained Model

Load the best model ...

In [16]:
import glob, os

best_model = glob.glob("./models/ATT-model*")
best_model = max(best_model, key=os.path.getctime)
best_classifier = dl.BinaryClassifier \
    .load_from_checkpoint(best_model, model=model).create()
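
Selecting by creation time returns the most recent checkpoint file, which is the best one as long as the `models` directory holds the checkpoint of a single run. If the directory contains files from several runs, one possible alternative is to read the validation accuracy encoded in the file names. The sketch below assumes the naming scheme defined earlier; the paths and the helper name `accuracy_from_name` are hypothetical.

```python
import re

def accuracy_from_name(path):
    """Extract the validation accuracy encoded in a checkpoint name."""
    match = re.search(r"val_accuracy(\d+\.\d+)", path)
    # Names without an accuracy are ranked last.
    return float(match.group(1)) if match else float("-inf")

# Hypothetical checkpoint paths following the naming scheme above.
hypothetical_paths = [
    "models/ATT-model01-val_accuracy0.81.ckpt",
    "models/ATT-model03-val_accuracy0.87.ckpt",
    "models/ATT-model04-val_accuracy0.85.ckpt",
]
best_path = max(hypothetical_paths, key=accuracy_from_name)
print(best_path)
```

Note that the accuracy in the file name is rounded to two decimals, so this selection cannot distinguish checkpoints that differ only beyond that precision.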

... test the trained model ...

In [17]:
test_results = trainer.test(best_classifier, test_dataloader)

... and display the model’s predictions on some reviews.

In [18]:
import pandas as pd
import random

best_classifier.model.eval()

texts, labels, predictions = [], [], []
for idx in random.sample(range(len(test_dataset)), 3):
    sample = test_dataset[idx]

    input_tensor = torch.Tensor(vocab(tokenize(sample["text"]))).long()
    test_input = {
        "x": input_tensor,
        "batch_indices": torch.zeros_like(input_tensor, dtype=torch.long)
    }

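    # Use the model loaded from the best checkpoint, without tracking gradients.
    with torch.no_grad():
        probability = best_classifier.model(test_input)
    pred = probability > 0.5

    texts.append(sample["text"])
    labels.append(sample["label"])
    predictions.append(int(pred.item()))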

df = pd.DataFrame({"text": texts, "label": labels, "prediction": predictions})
styled_df = df.style.set_properties(**{"text-align": "left"}).set_table_styles(
    [{"selector": "th", "props": [("text-align", "center")]}]
)
with pd.option_context("display.max_colwidth", None):
    display(styled_df)

Unnamed: 0,text,label,prediction
0,"I first saw this movie as a younger child. My sister had told me about it and I thought it would be more of a kid's movie. However it remains to also be an incredible movie. True that the subject behind the movie is ruff but also true this movie will never stop touching your heart. I was only 6 when I first saw it and just yesterday, 7 years later, I saw it again. For the first time in a long time. Even after I knew how it ended, I knew that I had seen it a billion, bizillion times I wept like a teeny weeny little baby. When I saw it a few years back I finally got the idea and the seriousness of the film. So I stoped watching it for a while. But yesterday I didn't change the channel, I watched it. By the end I was astonished at how much it still made me laugh, cry, think, and above all, believe in mericals again. I haven't belived in a long time and this movie got me out of my shell and opened up my heart. This movie wasn't just impacting. I was also so impressed with the actors. Especially Bobby. So if you are wanting to see this movie for the first time I suggest seeing it alone. With tissues. And being ready to discover your young, sweet, innocent side and the side that still has hope. This movie touched my soul when I was only 6, and even in this time of trying to figure out who I am this movie helped me realize both what I don't and still do want to be in life.",1,1
1,"Jude law gives Keanu Reeves a run for his money as the most wooden actor around, Renee Z's character is straight out of the Beverly Hillbillies, and the two leads have about as much chemistry as Darth Vader and Queen Amedala. The ""bad guys"" are the worst kind of cliche, and there's not a subtle moment in the film. Incredible that some critics actually liked this movie.",0,0
2,"I really think this movie deserves some Oscars! I really don't care what people can say badly about this movie...because it's a really well played parts from Samuel L. Jackson, and mainly by Christina Ricci!! I'm a big fan of hers, right since I saw her in Addams Family...been trying to watch all things she makes, and this is absolutely one of the best parts she played!! I love her looks (even though people say she's not pretty...I think she is...and I love her eyes)!! The movie is about many things...people say that is religious...people say that it's about racism...people say that is about drugs...people say that is about nymphomania...well...maybe is a little bit of all that!! But what I truly feel is that this movie has a high level of eye opening for what blues is...what it stands for...and mainly were it comes from: life...heart...pain...sorrow...and above all...spontaneous feelings!! I hope you get to see it...if you are that kind of person that likes a movie not only from the pictures or the story it tells...this is a good movie to see...if your not...well...see it anyway...cant hurt that much, and you get to see Christina Ricci acting so horny!! She's a fox!!",1,1
