# Predicting Sentiment Using a Transformer

<div style="background-color: #f0f8ff; border: 2px solid #4682b4; padding: 10px;">
<a href="https://colab.research.google.com/github/DeepTrackAI/DeepLearningCrashCourse/blob/main/Ch08_Attention/ec08_B_transformer/transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<strong>If using Colab/Kaggle:</strong> You need to uncomment the code in the cell below this one.
</div>

In [1]:
# Uncomment if using Colab/Kaggle.
# !pip install contractions datasets deeplay deeptrack spacy

This notebook provides you with a complete code example that predicts the sentiment of movie reviews using a transformer encoder network.

<div style="background-color: #f0f8ff; border: 2px solid #4682b4; padding: 10px;">
<strong>Note:</strong> This notebook contains Code Example 8-B from the book  

**Deep Learning Crash Course**  
Benjamin Midtvedt, Jesús Pineda, Henrik Klein Moberg, Harshith Bachimanchi, Joana B. Pereira, Carlo Manzo, Giovanni Volpe  
No Starch Press, San Francisco (CA), 2025  
ISBN-13: 9781718503922  

[https://nostarch.com/deep-learning-crash-course](https://nostarch.com/deep-learning-crash-course)

You can find the other notebooks on the [Deep Learning Crash Course GitHub page](https://github.com/DeepTrackAI/DeepLearningCrashCourse).
</div>

## Using the IMDB Dataset

Start by downloading the Large Movie Review Dataset, often referred to as the IMDB dataset and available at https://huggingface.co/datasets/imdb. It contains 50,000 movie reviews, each labeled as positive or negative, divided into 25,000 reviews for training and 25,000 reviews for testing.

Download the IMDB dataset ...

In [2]:
from datasets import load_dataset

dataset = load_dataset("imdb")

... split the training data into training and validation datasets ...

In [3]:
split = dataset["train"].train_test_split(test_size=0.2,
                                          stratify_by_column="label", seed=42)
train_dataset, val_dataset = split["train"], split["test"]

... and print some example reviews.

In [4]:
import numpy as np
import pandas as pd

samples = train_dataset.select(np.random.randint(0, len(train_dataset), 3))
texts, labels = samples["text"], samples["label"]

df = pd.DataFrame({"Text": texts, "Label": labels})
styled_df = df.style.set_properties(**{"text-align": "left"}).set_table_styles(
    [{"selector": "th", "props": [("text-align", "center")]}]
)
with pd.option_context("display.max_colwidth", None):
    display(styled_df)

Unnamed: 0,Text,Label
0,"First, an explanation: Despite my headline, I'm giving this film only 8 stars because overall this is NOT one of the best films ever made. All the criticisms registered here have valid points. Also, be warned that to enjoy the script you really need to appreciate Neil Simon's brilliance with finding the wit within real human banter. He does have a distinctively New York ear for dialogue -- especially dry, Jewish, love-suffused sarcasm -- and if you have trouble accepting sarcasm as an expression of love, then you might have trouble accepting the optimism at the heart of this movie. So much for warnings. Here's my main point: Walter Matthau is flat-out perfect, even beyond perfect, in this movie. I have never seen him funnier, or more touching for that matter -- because at the same time that he shows us the hilariousness of this character who refuses to give up his Big Star self-image or insufferable attitudes even as his coherence is in decline, he also shows us the more vulnerable, maybe even heartbreakingly scared person inside the grouch. And he only barely shows us that sad part -- it's just enough to really get to you if you happen to be coping with your own father's or husband's mental decline right now (I mention this as a warning), but artistically, it's just enough pathos to give this character the most authentically deep roots I'm seen in possibly any film performance. This is beyond Method acting -- Matthau's performance is exquisite as character work and a pure delight as comic delivery. This is a masterpiece of comic acting. About Richard Benjamin: I personally find his acting annoying in general, and his work in this movie is no exception -- although he has some fine moments here. (""Chicken is funny...."" is one of them.) So if you like him, you should like him here, and if you don't this movie won't change your mind. About the 1976 Oscars...I agree that Matthau was unfortunate to be up against Nicholson in ""Cuckoo's Nest"" that year. 
It was a killer year for leading-actor competition; if only there were separate Oscars for comedy and drama, then I think the Best Actor Oscars would have gone to Al Pacino for ""Dog Day Afternoon"" and to Walter Matthau for ""The Sunshine Boys"" -- not to dis Jack's fine work as McMurphy, but I think that Pacino and Matthau were each CLEARLY more masterful and astoundingly effective and downright legendary in their performances than Nicholson was that year. Also, I believe that Burns got the Supporting Actor Oscar more for sentimental reasons than for the quality of his performance -- I mean, he was good in this movie, but not THAT good. (Burns's fine-as-ever but unexceptional-in-itself return to show biz beat Brad Dourif's truly brilliant debut in ""Cuckoo's Nest,"" not to mention Chris Sarandon's stunning debut in ""Dog Day Afternoon"" -- which I think proves my theory.) Oscar theories aside, here's my bottom line review: If you like Matthau's comic acting, then see this movie and savor his powerhouse tirades and wonderful grandmother-inspired gestures, fleeting facial expressions and seemingly unscripted asides. (But if you're currently dealing with the pain of watching an old person lose his grip, then be warned that this movie might either be the comic relief you need or a dose of reality too painful to watch right now.)",1
1,"One thing i can say about this movie is well long, VERY LONG! I actually recently purchased this movie a couple of months ago seeing that there was a new version coming out. I was happy to find that it was made in 1978 because The 70's (even though i never lived in them) is actually one of my favourite decades, especially for the music! when i watched this movie the story was actually very good at the start but then after about 50 mins it started to get very boring and repetitive. i will admitt the animation did impress me! it was nothing i had ever seen before and was well pretty cool to see. but the movie honestly could of been a bit better, it could of had alot more talking and story to it than just 15 to 20 minute scenes that just had wierd fighting. then for the last 5 or 10 minutes the movie picked up and got good again but ended unexpectedly. in my opinion i thought it was EXTREMELY long. i know its 13 minutes over 2 hours and that is still long for a cartoon but since it was boring for most of the movie, it made it seem like it was 4 hours long!!!! but overall it is an okay film i guess and i will watch it again on one of those ""nothing to do days"". i will see the new one and i hope it is better!",1
2,"I have watched this movie three times. The last time, I kept skipping around confusing scenes to find resolution for the plot. Perhaps the plot is not intended to hang together logically. Or perhaps these rough spots are in the plot because Ann's recall of distant events is rather faulty. Take the young Ann Grant (Claire Danes). Here is a young woman who has attended an unnamed college with the scions of a rich family. She must have had help to afford this very expensive education, but never seems to have any family ties at all. She never seems to have any relatives she can turn to when the consequences of one of her disastrous decisions take effect. Ann shares an evening of passion with her great love Harris Arden (Patrick Wilson). Then, when Harris comforts Lila after the tragic death of her brother Buddy, Ann suddenly finds him repulsive and is disgusted with her own behavior. I must have missed something significant here. Ann's behavior seems totally inexplicable. Ann abandons her relationship with Harris and eventually marries one of the groomsmen at Lila's wedding. Despite Ann's rejection of Harris, she continues to hold deep feelings for him on her deathbed. It was obvious from his behavior that Harris was deeply smitten with Ann and would have gladly married her. A scene showing their chance meeting years after Lila's wedding showed that Harris still had deep feelings for Ann. The film showed a pattern for Ann's romantic relationships. She always had a falling out with her men and she rejected them. This pattern held with Harris and two husbands. In contrast, Lila married a man she did not love and she remained with her husband until he died. Perhaps Lila was able to build a relationship because she refused to let her marriage fail. Then came the too convenient reappearance of Lila Ross at Ann's bedside. Apparently Ann's nurse was able to extract enough information from Ann's last few lucid moments to identify and contact Lila. 
None of this communication appeared on the film. I kept wondering about the house Ann was living in during her final days. How did she afford to buy such a house on the meager earnings of her singing career? Ann always seemed one step ahead of financial disaster while raising her two daughters. On another level, I enjoyed the film's setting and music immensely. The seaside mansion was just so heartbreakingly beautiful. Claire Danes was luminous as the young Ann Grant. She is really quite a talented singer. I much prefer her natural brunette to the bottle blonde look she had in the film extras. If only those pesky CGI fireflies would go away, I could raise the movie a whole point in my vote!",1


### Preprocessing the Reviews

Implement a function to tokenize a sentence ...

In [5]:
import contractions, re, spacy, unicodedata

tokenizers = {"eng": spacy.blank("en"), "spa": spacy.blank("es")}

regular_expression = r"^[a-zA-Z0-9áéíóúüñÁÉÍÓÚÜÑ.,!?¡¿/:()]+$"
pattern = re.compile(unicodedata.normalize("NFC", regular_expression))

def tokenize(text, lang="eng"):
    """Tokenize text."""
    # Swap "´´" before "´"; otherwise the double accent never matches.
    swaps = {"’": "'", "‘": "'", "“": '"', "”": '"', "´´": '"', "´": "'"}
    for old, new in swaps.items():
        text = text.replace(old, new)
    text = contractions.fix(text) if lang == "eng" else text
    tokens = tokenizers[lang](text)
    return [token.text for token in tokens if pattern.match(token.text)]
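
To see what the regular-expression filter in `tokenize()` keeps and what it drops, the following minimal sketch applies the same pattern to a hand-made token list. The text in `sample_text` is hypothetical, and `str.split()` is only a stand-in for the spaCy tokenizer, which also separates punctuation from words.

```python
import re
import unicodedata

# Same character whitelist as in tokenize() above.
regular_expression = r"^[a-zA-Z0-9áéíóúüñÁÉÍÓÚÜÑ.,!?¡¿/:()]+$"
pattern = re.compile(unicodedata.normalize("NFC", regular_expression))

# Hypothetical review text; str.split() stands in for the spaCy tokenizer.
sample_text = "Great film ! 10/10 , it's <br> a must-see :)"
candidate_tokens = sample_text.split()

# A token survives only if every one of its characters is whitelisted.
kept = [token for token in candidate_tokens if pattern.match(token)]
dropped = [token for token in candidate_tokens if not pattern.match(token)]

print("kept:   ", kept)
print("dropped:", dropped)
```

In this sketch, tokens containing an apostrophe, a hyphen, or angle brackets are discarded as a whole, which is why the notebook expands English contractions with `contractions.fix()` before tokenizing.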

### Building a Vocabulary

Implement a class to represent a vocabulary ...

In [6]:
class Vocab:
    """Vocabulary as callable dictionary."""

    def __init__(self, vocab_dict, unk_token="<unk>"):
        """Initialize vocabulary."""
        self.vocab_dict, self.unk_token = vocab_dict, unk_token
        self.default_index = vocab_dict.get(unk_token, -1)
        self.index_to_token = {idx: token for token, idx in vocab_dict.items()}

    def __call__(self, token_or_tokens):
        """Return the index(es) for given token or list of tokens."""
        if not isinstance(token_or_tokens, list):
            return self.vocab_dict.get(token_or_tokens, self.default_index)
        else:
            return [self.vocab_dict.get(token, self.default_index)
                    for token in token_or_tokens]

    def set_default_index(self, index):
        """Set default index for unknown tokens."""
        self.default_index = index

    def lookup_token(self, index_or_indices):
        """Retrieve token corresponding to given index or list of indices."""
        if not isinstance(index_or_indices, list):
            return self.index_to_token.get(int(index_or_indices),
                                           self.unk_token)
        else:
            return [self.index_to_token.get(int(index), self.unk_token)
                    for index in index_or_indices]

    def get_tokens(self):
        """Return a list of tokens ordered by their index."""
        tokens = [None] * len(self.index_to_token)
        for index, token in self.index_to_token.items():
            tokens[index] = token
        return tokens

    def __iter__(self):
        """Iterate over the tokens in the vocabulary."""
        return iter(self.vocab_dict)

    def __len__(self):
        """Return the number of tokens in the vocabulary."""
        return len(self.vocab_dict)

    def __contains__(self, token):
        """Check if a token is in the vocabulary."""
        return token in self.vocab_dict

... implement a function to build vocabulary from an iterator ...

In [7]:
from collections import Counter

def build_vocab_from_iterator(iterator, specials=None, min_freq=1):
    """Build vocabulary from an iterator over tokenized sentences."""
    token_freq = Counter(token for tokens in iterator for token in tokens)
    vocab, index = {}, 0
    if specials:
        for token in specials:
            vocab[token] = index
            index += 1
    for token, freq in token_freq.items():
        if freq >= min_freq:
            vocab[token] = index
            index += 1
    return vocab

... create a vocabulary ...

In [8]:
def imdb_iterator(dataset):
    """Iterate over the IMDB dataset."""
    for sample in dataset:
        yield tokenize(sample["text"])

vocab_dict = build_vocab_from_iterator(imdb_iterator(train_dataset),
                                       specials=["<unk>"], min_freq=10)
vocab = Vocab(vocab_dict, unk_token="<unk>")
vocab.set_default_index(vocab(vocab.unk_token))

... and preprocess the training, validation, and testing datasets.

In [9]:
def preprocessing(sample):
    """Preprocess a movie review."""
    sentence = sample["text"]
    tokens = tokenize(unicodedata.normalize("NFC", sentence))
    sequence_of_indices = vocab(tokens)
    sample.update({"sequences": sequence_of_indices})
    return sample

train_dataset = train_dataset.map(preprocessing)
val_dataset = val_dataset.map(preprocessing)
test_dataset = dataset["test"].map(preprocessing)
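
The vocabulary steps above can be sketched on a toy corpus. The sentences, the threshold `min_freq = 2`, and the name `toy_vocab` are hypothetical choices for illustration; the notebook itself uses `min_freq=10` on the tokenized training reviews.

```python
from collections import Counter

# Hypothetical tokenized sentences standing in for tokenized reviews.
tokenized_reviews = [
    ["good", "movie", "!"],
    ["bad", "movie", "!"],
    ["good", "acting", "!"],
]

# Count tokens, reserve index 0 for "<unk>", keep tokens seen at least twice.
token_freq = Counter(token for tokens in tokenized_reviews for token in tokens)
min_freq = 2
toy_vocab = {"<unk>": 0}
for token, freq in token_freq.items():
    if freq >= min_freq:
        toy_vocab[token] = len(toy_vocab)

# Encode a new sentence; rare or unseen tokens fall back to "<unk>".
new_tokens = ["good", "plot", "!"]
indices = [toy_vocab.get(token, toy_vocab["<unk>"]) for token in new_tokens]

print(toy_vocab)  # {'<unk>': 0, 'good': 1, 'movie': 2, '!': 3}
print(indices)    # [1, 0, 3]
```

Raising `min_freq` shrinks the vocabulary and sends more tokens to `<unk>`, which trades coverage for a smaller embedding table.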

## Defining the Data Loaders

In [10]:
import torch
from torch.utils.data import DataLoader
from torch_geometric.data import Data

def collate(batch_of_sequences):
    """Prepare a batch of sequences for the model to process."""
    sequences, labels, batch_indices = [], [], []
    for batch_index, sample in enumerate(batch_of_sequences):
        sequence = torch.tensor(sample["sequences"])
        sequences.append(sequence)
        batch_indices.append(torch.ones_like(sequence, dtype=torch.long)
                             * batch_index)
        label = torch.tensor(sample["label"])
        labels.append(label)
    return Data(sequences=torch.cat(sequences),
                batch_indices=torch.cat(batch_indices),
                y=torch.Tensor(labels).float())

train_dataloader = \
    DataLoader(train_dataset, batch_size=8, shuffle=True, collate_fn=collate)
val_dataloader = \
    DataLoader(val_dataset, batch_size=8, shuffle=False, collate_fn=collate)
test_dataloader = \
    DataLoader(test_dataset, batch_size=8, shuffle=False, collate_fn=collate)
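
Rather than padding reviews to a common length, `collate()` concatenates them into one flat sequence and records, for each token, which review it came from. The sketch below reproduces that layout with plain Python lists instead of tensors; the index values are hypothetical.

```python
# Hypothetical index sequences for a batch of three reviews of different length.
batch = [[5, 8, 2], [7, 1], [4, 4, 9, 3]]

flat_sequences, batch_indices = [], []
for batch_index, sequence in enumerate(batch):
    flat_sequences.extend(sequence)
    # One batch index per token, so tokens can be traced back to their review.
    batch_indices.extend([batch_index] * len(sequence))

print(flat_sequences)  # [5, 8, 2, 7, 1, 4, 4, 9, 3]
print(batch_indices)   # [0, 0, 0, 1, 1, 2, 2, 2, 2]

# Recover the second review (batch index 1) from the flat representation.
second = [tok for tok, idx in zip(flat_sequences, batch_indices) if idx == 1]
print(second)  # [7, 1]
```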

## Building a Transformer Encoder Layer

Prepare a class to implement a multi-head attention layer ...

In [11]:
import deeplay as dl

class MultiHeadAttentionLayer(dl.DeeplayModule):
    """Multi-head attention layer with masking."""

    def __init__(self, num_features, num_heads):
        """Initialize multi-head attention."""
        super().__init__()
        self.num_features, self.num_heads = num_features, num_heads
        # num_features must be divisible by num_heads.
        self.head_dim = num_features // num_heads

        self.Wq = dl.Layer(torch.nn.Linear, num_features, num_features)
        self.Wk = dl.Layer(torch.nn.Linear, num_features, num_features)
        self.Wv = dl.Layer(torch.nn.Linear, num_features, num_features)
        self.Wout = dl.Layer(torch.nn.Linear, num_features, num_features)

    def forward(self, in_sequence, batch_indices):
        """Apply the multi-head attention mechanism to the input sequence."""
        seq_len, embed_dim = in_sequence.shape
        Q = self.Wq(in_sequence)
        Q = Q.view(seq_len, self.num_heads, self.head_dim).permute(1, 0, 2)
        K = self.Wk(in_sequence)
        K = K.view(seq_len, self.num_heads, self.head_dim).permute(1, 0, 2)
        V = self.Wv(in_sequence)
        V = V.view(seq_len, self.num_heads, self.head_dim).permute(1, 0, 2)

        attn_scores = (torch.matmul(Q, K.transpose(-2, -1))
                       / (self.head_dim ** 0.5))

        attn_mask = torch.eq(batch_indices.unsqueeze(1),
                             batch_indices.unsqueeze(0))
        attn_mask = attn_mask.unsqueeze(0)
        attn_scores = attn_scores.masked_fill(~attn_mask, float("-inf"))

        attn_weights = torch.nn.functional.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_weights, V)
        attn_output = attn_output.permute(1, 0, 2).contiguous()
        attn_output = attn_output.view(seq_len, self.num_features)
        return self.Wout(attn_output)

... and a class to implement a transformer encoder layer ...

In [12]:
from torch_geometric.nn.norm import LayerNorm

class TransformerEncoderLayer(dl.DeeplayModule):
    """Transformer encoder layer."""

    def __init__(self, num_features, num_heads, feedforward_dim, dropout=0.0):
        """Initialize transformer encoder layer."""
        super().__init__()

        self.self_attn = MultiHeadAttentionLayer(num_features, num_heads)
        self.attn_dropout = dl.Layer(torch.nn.Dropout, dropout)
        self.attn_skip = dl.Add()
        self.attn_norm = dl.Layer(LayerNorm, num_features, eps=1e-6)

        self.feedforward = dl.Sequential(
            dl.Layer(torch.nn.Linear, num_features, feedforward_dim),
            dl.Layer(torch.nn.ReLU),
            dl.Layer(torch.nn.Linear, feedforward_dim, num_features),
        )
        self.feedforward_dropout = dl.Layer(torch.nn.Dropout, dropout)
        self.feedforward_skip = dl.Add()
        self.feedforward_norm = dl.Layer(LayerNorm, num_features, eps=1e-6)

    def forward(self, in_sequence, batch_indices):
        """Refine sequence via attention and feedforward layers."""
        attns = self.self_attn(in_sequence, batch_indices)
        attns = self.attn_dropout(attns)
        attns = self.attn_skip(in_sequence, attns)
        attns = self.attn_norm(attns, batch_indices)

        out_sequence = self.feedforward(attns)
        out_sequence = self.feedforward_dropout(out_sequence)
        out_sequence = self.feedforward_skip(attns, out_sequence)
        out_sequence = self.feedforward_norm(out_sequence, batch_indices)

        return out_sequence
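
Because all reviews in a batch share one flat sequence, the attention layer relies on the mask built from `batch_indices` to keep tokens from attending across reviews. The following single-head NumPy sketch (a stand-in for the PyTorch code above, with hypothetical random queries and keys) shows that masked pairs end up with exactly zero attention weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical reviews (3 and 2 tokens) flattened into one sequence.
batch_indices = np.array([0, 0, 0, 1, 1])
seq_len, head_dim = len(batch_indices), 4
Q = rng.normal(size=(seq_len, head_dim))
K = rng.normal(size=(seq_len, head_dim))

# Scaled dot-product scores between every pair of tokens.
scores = Q @ K.T / np.sqrt(head_dim)

# True where the query and key tokens belong to the same review.
mask = batch_indices[:, None] == batch_indices[None, :]
scores = np.where(mask, scores, -np.inf)

# Row-wise softmax; entries masked with -inf get exactly zero weight.
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

print(mask.astype(int))
print(np.round(weights, 2))
```

The mask is block diagonal, with one block per review, so each row of `weights` sums to 1 over the tokens of its own review only.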

## Building a Transformer Encoder Model

Build a class to implement a transformer encoder model ...

In [13]:
class TransformerEncoderModel(dl.DeeplayModule):
    """Transformer encoder model."""

    def __init__(self, vocab_size, num_features, num_heads, feedforward_dim,
                 num_layers, out_dim, dropout=0.0):
        """Initialize transformer encoder model."""
        super().__init__()
        self.num_features = num_features

        self.embedding = dl.Layer(torch.nn.Embedding, vocab_size, num_features)

        self.pos_encoder = dl.IndexedPositionalEmbedding(num_features)
        self.pos_encoder.dropout.configure(p=dropout)

        self.transformer_block = dl.LayerList()
        for _ in range(num_layers):
            self.transformer_block.append(TransformerEncoderLayer(
                    num_features, num_heads, feedforward_dim, dropout=dropout,
            ))

        self.out_block = dl.Sequential(
            dl.Layer(torch.nn.Dropout, dropout),
            dl.Layer(torch.nn.Linear, num_features, num_features // 2),
            dl.Layer(torch.nn.ReLU),
            dl.Layer(torch.nn.Linear, num_features // 2, out_dim),
            dl.Layer(torch.nn.Sigmoid),
        )

    def forward(self, data):
        """Predict sentiment of movie reviews."""
        in_sequence, batch_indices = data["sequences"], data["batch_indices"]

        embeddings = self.embedding(in_sequence) * self.num_features ** 0.5
        pos_embeddings = self.pos_encoder(embeddings, batch_indices)

        out_sequence = pos_embeddings
        for transformer_layer in self.transformer_block:
            out_sequence = transformer_layer(out_sequence, batch_indices)

        batch_size = torch.max(batch_indices) + 1
        aggregates = torch.zeros(batch_size, self.num_features,
                                 device=out_sequence.device)
        for batch_index in torch.unique(batch_indices):
            mask = batch_indices == batch_index
            aggregates[batch_index] = out_sequence[mask].mean(dim=0)

        pred_sentiment = self.out_block(aggregates).squeeze()
        return pred_sentiment

... instantiate the transformer encoder model ...

In [14]:
model = TransformerEncoderModel(
    vocab_size=len(vocab), num_features=300, num_heads=12, feedforward_dim=512,
    num_layers=4, out_dim=1, dropout=0.1,
).create()

... and print it out.

In [15]:
print(model)

TransformerEncoderModel(
  (embedding): Embedding(19566, 300)
  (pos_encoder): IndexedPositionalEmbedding(
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer_block): LayerList(
    (0-3): 4 x TransformerEncoderLayer(
      (self_attn): MultiHeadAttentionLayer(
        (Wq): Linear(in_features=300, out_features=300, bias=True)
        (Wk): Linear(in_features=300, out_features=300, bias=True)
        (Wv): Linear(in_features=300, out_features=300, bias=True)
        (Wout): Linear(in_features=300, out_features=300, bias=True)
      )
      (attn_dropout): Dropout(p=0.1, inplace=False)
      (attn_skip): Add()
      (attn_norm): LayerNorm(300, affine=True, mode=graph)
      (feedforward): Sequential(
        (0): Linear(in_features=300, out_features=512, bias=True)
        (1): ReLU()
        (2): Linear(in_features=512, out_features=300, bias=True)
      )
      (feedforward_dropout): Dropout(p=0.1, inplace=False)
      (feedforward_skip): Add()
      (feedforward_norm): La

## Loading Pretrained Embeddings

Download the GloVe embeddings ...

In [16]:
import os
from torchvision.datasets.utils import download_url, extract_archive

glove_folder = ".glove_cache"
if not os.path.exists(glove_folder):
    os.makedirs(glove_folder, exist_ok=True)
    url = "https://nlp.stanford.edu/data/glove.42B.300d.zip"
    download_url(url, glove_folder)
    zip_filepath = os.path.join(glove_folder, "glove.42B.300d.zip")
    extract_archive(zip_filepath, glove_folder)
    os.remove(zip_filepath)

... implement a function to load the GloVe embeddings ...

In [17]:
def load_glove_embeddings(glove_file):
    """Load GloVe embeddings."""
    glove_embeddings = {}
    with open(glove_file, "r", encoding="utf-8") as file:
        for line in file:
            values = line.split()
            word = values[0]
            glove_embeddings[word] = np.round(
                np.asarray(values[1:], dtype="float32"), decimals=6,
            )
    return glove_embeddings

... implement a function to get GloVe embeddings for a vocabulary ...

In [18]:
def get_glove_embeddings(vocab, glove_embeddings, embed_dim):
    """Get GloVe embeddings for a vocabulary."""
    embeddings = torch.zeros((len(vocab), embed_dim), dtype=torch.float32)
    for i, token in enumerate(vocab):
        embedding = glove_embeddings.get(token)
        if embedding is None:
            embedding = glove_embeddings.get(token.lower())
        if embedding is not None:
            embeddings[i] = torch.tensor(embedding, dtype=torch.float32)
    return embeddings

... and add the GloVe pretrained embeddings.

In [19]:
glove_file = os.path.join(glove_folder, "glove.42B.300d.txt")
glove_embed, embed_dim = load_glove_embeddings(glove_file), 300

model.embedding.weight.data = \
    get_glove_embeddings(vocab.get_tokens(), glove_embed, embed_dim)
model.embedding.weight.requires_grad = False
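
The lookup logic of `load_glove_embeddings()` and `get_glove_embeddings()` can be checked on a toy example. The sketch below uses NumPy instead of PyTorch and hypothetical 3-dimensional vectors (`glove_lines`, `toy_tokens`) in place of the real 300-dimensional GloVe file.

```python
import numpy as np

# Hypothetical 3-dimensional vectors in the GloVe text format:
# one token per line, followed by its space-separated components.
glove_lines = [
    "movie 0.1 0.2 0.3",
    "great -0.5 0.0 0.5",
]
toy_glove = {}
for line in glove_lines:
    values = line.split()
    toy_glove[values[0]] = np.asarray(values[1:], dtype="float32")

# Hypothetical vocabulary; index 0 is the unknown token.
toy_tokens = ["<unk>", "movie", "Great", "zzyzx"]
embed_dim = 3
embeddings = np.zeros((len(toy_tokens), embed_dim), dtype="float32")
for i, token in enumerate(toy_tokens):
    vector = toy_glove.get(token)
    if vector is None:
        vector = toy_glove.get(token.lower())  # Lowercase fallback.
    if vector is not None:
        embeddings[i] = vector

print(embeddings)
```

Tokens without a pretrained vector, here `<unk>` and `zzyzx`, keep an all-zero row. Since the notebook freezes the embedding weights, such rows stay at zero during training.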

## Training the Model

Compile the model ...

In [20]:
classifier = dl.BinaryClassifier(
    model=model, optimizer=dl.AdamW(lr=1e-4),
).create()

... and train it.

In [None]:
trainer = dl.Trainer(max_epochs=5)
trainer.fit(classifier, train_dataloader, val_dataloader)

  | Name          | Type                    | Params | Mode 
------------------------------------------------------------------
0 | loss          | BCELoss                 | 0      | trai
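
The model summary printed by the trainer lists `BCELoss` as the loss, which matches the sigmoid output of the model. As a minimal sketch of the binary cross-entropy formula that such a loss averages over a batch, the following pure-Python function works on hypothetical probabilities and labels (the helper name `binary_cross_entropy` is an illustration, not part of deeplay).

```python
import math

def binary_cross_entropy(probabilities, targets):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    losses = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t in zip(probabilities, targets)
    ]
    return sum(losses) / len(losses)

# Hypothetical sigmoid outputs and ground-truth labels for four reviews.
probabilities = [0.9, 0.2, 0.6, 0.5]
targets = [1, 0, 1, 0]

loss = binary_cross_entropy(probabilities, targets)
print(round(loss, 4))
```

Confident correct predictions (0.9 for a positive review) contribute little to the loss, while an undecided 0.5 contributes about 0.69 regardless of the label.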

## Evaluating the Trained Model

Test the trained model ...

In [22]:
test_results = trainer.test(classifier, test_dataloader)

... and display the model’s prediction on some reviews.

In [23]:
import random

classifier.model.eval()

texts, labels, predictions = [], [], []
for idx in random.sample(range(len(test_dataset)), 3):
    sample = test_dataset[idx]
    input_sequence = torch.tensor(vocab(tokenize(sample["text"])),
                                  dtype=torch.long)
    test_input = {
        "sequences": input_sequence,
        "batch_indices": torch.zeros_like(input_sequence, dtype=torch.long),
    }
    with torch.no_grad():  # Inference only, so skip gradient tracking.
        probability = classifier.model(test_input)
    prediction = probability > 0.5

    texts.append(sample["text"])
    labels.append(sample["label"])
    predictions.append(int(prediction.item()))

df = pd.DataFrame({"text": texts, "label": labels, "prediction": predictions})
styled_df = df.style.set_properties(**{"text-align": "left"}).set_table_styles(
    [{"selector": "th", "props": [("text-align", "center")]}]
)
with pd.option_context("display.max_colwidth", None):
    display(styled_df)

Unnamed: 0,text,label,prediction
0,"This is easily one of the best movies of the 1950s. Otto Preminger directed only four or five really good movies and this is one of them. Frank Sinatra gives his best performance and the music score by Elmer Bernstein is dynamite. From the opening titles (by Saul Bass) to the hysteria of drug addict Frank going cold turkey, this is a riveting movie! With Kim Novak (giving a very good performance), Eleanor Parker (giving a very bad performance) as well as Darren McGavin as the reptilian pusher and Arnold Stang as Frank's grifter pal. Beware of bad prints: this movie is in the public domain so some copies are pretty rough.",1,1
1,"It's hard to believe a movie can be this bad, but you live and learn. What's more amazing is the fact that the people who put this thing together likely had college educations. Meanwhile, the fruit of their labor bares the appearance of something a group of five eighth graders may have come up with. On the bright side, (if there is one) the soundtrack has some nice moments, which is another reason to question how the rest of the film can be so hideously bad.",0,0
2,"""Two Hands"" is an entertaining, funny story about Australian lowlifes. The screenplay contrasts the world of fast money and deadly acts with the inexplicability of fate and circumstance. In a subtle way we are asked to ponder the concept that major events in our lives are sometimes generated without our being fully aware of the root causes. The forces of fate and circumstance take Jimmy, the main character, into situations that bring about the realization of his shallow dreams and, ultimately, an understanding of a more personally promising world. The clueless Jimmy, portrayed with acumen by Heath Ledger, is a kid who grew up without opportunity. The high paying world of crime offers the greatest appeal to his blunted senses. The love and help of friends guides him to a higher plateau. The film is well-directed and well-acted. The band of criminals teeter between likable and despicable, keeping us interested in their crazy antics all through the film.",1,1
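
The decision rule used above turns each predicted probability into a class label by thresholding at 0.5. As a minimal sketch on hypothetical values (not results from the trained model), the same rule can be combined with an accuracy count:

```python
# Hypothetical predicted probabilities and true labels for five reviews.
probabilities = [0.93, 0.08, 0.61, 0.47, 0.50]
labels = [1, 0, 0, 1, 0]

# Same decision rule as above: positive only if the probability exceeds 0.5.
predictions = [int(p > 0.5) for p in probabilities]

# Fraction of reviews whose predicted label matches the true label.
correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
accuracy = correct / len(labels)

print(predictions)  # [1, 0, 1, 0, 0]
print(accuracy)     # 0.6
```

Note that a probability of exactly 0.5 is classified as negative under this rule, because the comparison is strict.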
