In [1]:
!pip install transformers
!pip install datasets

Collecting transformers
  Downloading transformers-4.33.2-py3-none-any.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.2-py3-none-any.whl (294 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [2]:
!nvidia-smi

Mon Sep 25 21:36:05 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

**Setting the device**

A GPU can significantly reduce the time needed for both training and inference of a machine learning model. The cell below selects the first CUDA device when one is available and falls back to the CPU otherwise.

In [3]:
import torch

if torch.cuda.is_available():
    dev = "cuda:0"  # first CUDA GPU
else:
    dev = "cpu"
device = torch.device(dev)
print('Using {}'.format(device))

Using cuda:0


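**Downloading the dataset**

The SST-2 dataset, the binary version of the Stanford Sentiment Treebank, is a popular benchmark for sentiment analysis in Natural Language Processing (NLP). It consists of sentences from movie reviews on the Rotten Tomatoes website, each labeled as positive (1) or negative (0). We load the GLUE validation split and use it as our test set, because the labels of the GLUE test split are not public. The label column is also renamed to labels, the name the model expects.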

In [4]:
from datasets import load_dataset

test_dataset = load_dataset('glue', 'sst2', split='validation')

test_dataset = test_dataset.map(lambda example: {'labels': example['label']}, batched=True)
test_dataset = test_dataset.remove_columns(['label'])

Downloading builder script:   0%|          | 0.00/28.8k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/28.7k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/27.9k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/7.44M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67349 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/872 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1821 [00:00<?, ? examples/s]

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

In [5]:
import pandas as pd

df = pd.DataFrame(test_dataset)
df.head(10)

|   | sentence | idx | labels |
|---|---|---|---|
| 0 | it 's a charming and often affecting journey . | 0 | 1 |
| 1 | unflinchingly bleak and desperate | 1 | 0 |
| 2 | allows us to hope that nolan is poised to emba... | 2 | 1 |
| 3 | the acting , costumes , music , cinematography... | 3 | 1 |
| 4 | it 's slow -- very , very slow . | 4 | 0 |
| 5 | although laced with humor and a few fanciful t... | 5 | 1 |
| 6 | a sometimes tedious film . | 6 | 0 |
| 7 | or doing last year 's taxes with your ex-wife . | 7 | 0 |
| 8 | you do n't have to know about music to appreci... | 8 | 1 |
| 9 | in exactly 89 minutes , most of which passed a... | 9 | 0 |


**Downloading the model**

We use the transformers library to load a BERT model that has been fine-tuned for sentiment analysis. Two classes from the library are involved: *AutoTokenizer* and *AutoModelForSequenceClassification*.

The AutoTokenizer class tokenizes the input text in preparation for the BERT model. It is instantiated from a pre-trained tokenizer, in this case "bert-base-uncased", which was built for lower-cased English text. The AutoModelForSequenceClassification class loads a pre-trained BERT model with a sequence classification head, in this case "jap2/bert-base-sst-2". This model has been fine-tuned on the SST-2 dataset for sentiment analysis and classifies input text into two sentiment categories: negative or positive. Finally, model.to(device) moves the model's weights to the device selected earlier.
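
To give a rough idea of what a BERT-style tokenizer does to each word, the following library-free sketch implements greedy longest-match-first subword splitting in the style of WordPiece. The function name `wordpiece_split` and the tiny vocabulary are assumptions made up for this illustration. The real "bert-base-uncased" vocabulary is far larger and will split words differently.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Split one lower-cased word into subword pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"charm", "##ing", "journey", "un", "##flinch", "##ingly"}

for word in ["charming", "journey", "unflinchingly", "bleak"]:
    print(word, "->", wordpiece_split(word, toy_vocab))
```

In the actual tokenizer, each resulting piece is then mapped to its integer ID in the vocabulary.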

In [6]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

model = AutoModelForSequenceClassification.from_pretrained("jap2/bert-base-sst-2")

model.to(device)

Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/727 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/438M [00:00<?, ?B/s]

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

**Model Evaluation**

Test Data preparation

To evaluate the performance of the fine-tuned BERT model for sentiment analysis, we use the SST-2 dataset as a benchmark. Before running the evaluation, we need to prepare the test dataset by tokenizing the data and converting it to a format that can be processed by the model.

To tokenize the test dataset, we use the tokenizer loaded above, which should match the one used when the model was fine-tuned. We then convert the tokenized data to the torch format, so that the selected columns are returned as PyTorch tensors.

The code below applies the tokenizer to each sentence in the test dataset. The padding and truncation parameters ensure that every sentence is padded or truncated to the same length of MAX_LENGTH = 128 tokens. The resulting tokenized data is then converted to the torch format, with the input_ids, token_type_ids, attention_mask, and labels columns being specified. This format is suitable for feeding into the fine-tuned BERT model for inference and evaluating its performance on the test dataset.

In [7]:
MAX_LENGTH = 128 #  maximum length of the tokenized sentences

test_dataset = test_dataset.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=MAX_LENGTH), batched=True)

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

The **set_format()** method is called on the dataset object to convert the dataset to the PyTorch tensor format, which is required by the BERT model.

The columns argument specifies which columns in the dataset should be included in the PyTorch format. In this case, the input_ids, token_type_ids, attention_mask, and labels columns are included. These columns correspond to the inputs and labels that the BERT model expects for sequence classification tasks.

In [8]:
test_dataset.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'labels'])

Each example of the dataset consists of a dictionary with the following keys:

**labels:** This is the label for the sample, stored as a tensor. For the first example it has the value 1 (positive).

**input_ids:** This is a tensor of integers representing the tokenized and encoded input text. The input text has been split into tokens (whole words or subword pieces), and each token has been mapped to its integer ID in the tokenizer's vocabulary. The special tokens [CLS] (ID 101) and [SEP] (ID 102) mark the start and end of the sequence. The tensor has a length of 128, which means that the input text has been truncated or padded with zeros to fit this length.

**token_type_ids**: This is a tensor of integers indicating which part of the input text each token belongs to. In this case, all tokens belong to the same segment of text, so the tensor contains only zeros.

**attention_mask:** This is a tensor of ones and zeros indicating which elements of input_ids should be attended to by the NLP model. The ones represent the actual tokens in the input text, while the zeros represent the padding added to achieve the length of 128.
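
The relationship between input_ids and attention_mask can be sketched without any library. The helper `pad_or_truncate` below is a hypothetical simplification written for this illustration, not the tokenizer's actual implementation. The token IDs are those of the first validation sentence as printed in this notebook.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    if len(token_ids) > max_length:
        # Simplification: drop the overflow but keep the final token (e.g. [SEP]).
        token_ids = token_ids[: max_length - 1] + token_ids[-1:]
    n_real = len(token_ids)
    n_pad = max_length - n_real
    input_ids = token_ids + [pad_id] * n_pad
    attention_mask = [1] * n_real + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Token IDs of "it 's a charming and often affecting journey ." from this notebook.
ids = [101, 2009, 1005, 1055, 1037, 11951, 1998, 2411, 12473, 4990, 1012, 102]

input_ids, attention_mask = pad_or_truncate(ids, max_length=128)
print(len(input_ids), sum(attention_mask))  # 128 12

short_ids, short_mask = pad_or_truncate(ids, max_length=8)
print(short_ids)  # [101, 2009, 1005, 1055, 1037, 11951, 1998, 102]
```

With max_length=128 the twelve real tokens are followed by 116 zeros, and the attention mask marks exactly those twelve positions with ones.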

In [9]:
test_dataset[0]

{'labels': tensor(1),
 'input_ids': tensor([  101,  2009,  1005,  1055,  1037, 11951,  1998,  2411, 12473,  4990,
          1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,  

We use PyTorch's DataLoader class to create a data loader for evaluating the model on the test dataset. The data loader serves the test dataset in batches of 256 samples at a time.

The DataLoader class is a PyTorch utility that loads data from a dataset in batches, with optional shuffling and parallel loading.

The batch_size parameter specifies the number of samples in each batch. Batching matters because memory is limited: pushing an entire large dataset through the model in a single forward pass can exceed the available memory and cause an out-of-memory error, whereas processing one batch at a time keeps memory usage bounded.

In [10]:
from torch.utils.data import DataLoader
eval_dataloader = DataLoader(test_dataset, batch_size=256)
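
As a library-free sketch of how sequential batching partitions the 872 validation examples, consider the helper below. The name `iter_batches` is hypothetical and not part of PyTorch. It assumes no shuffling and a kept final partial batch, which are DataLoader's defaults (shuffle=False, drop_last=False).

```python
def iter_batches(n_examples, batch_size):
    """Yield (start, stop) index ranges, one per batch, in order."""
    for start in range(0, n_examples, batch_size):
        yield start, min(start + batch_size, n_examples)


sizes = [stop - start for start, stop in iter_batches(872, 256)]
print(sizes)  # [256, 256, 256, 104]
```

The data loader therefore yields four batches, the last one smaller than the rest.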

**Inference**

This is the inference stage of the evaluation: the fine-tuned model makes predictions on the test dataset.

First, we import torch, load_metric from the datasets library, and tqdm for the progress bar. We load the accuracy metric using load_metric, a function from Hugging Face's datasets library that provides a convenient way to load standard evaluation metrics for natural language processing tasks, such as accuracy, F1 score, and perplexity. Note that load_metric is deprecated in newer releases of datasets in favor of the separate evaluate library.

Then we set the model to evaluation mode using model.eval(). Next, we loop over the test dataset using the DataLoader object with a batch size of 256. Inside the loop, we rebuild the batch as a dictionary whose tensors have been moved to the selected device using the to() method. We then use torch.no_grad() to avoid tracking gradients, since we are not training the model. The model makes predictions on the batch through the call model(**batch).

We then extract the logits from the output and use torch.argmax to get the predicted class label for each example.

Finally, we update the metric object by adding the batch predictions and corresponding true labels to it using metric.add_batch() and compute the overall accuracy using metric.compute().

In [11]:
import torch
from datasets import load_metric
from tqdm import tqdm

metric = load_metric("accuracy")
model.eval()  # switch off dropout so predictions are deterministic
for batch in tqdm(eval_dataloader):
    # Move every tensor in the batch to the same device as the model
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**batch)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)  # index of the highest-scoring class
    metric.add_batch(predictions=predictions, references=batch["labels"])

metric.compute()


Downloading builder script:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

100%|██████████| 4/4 [00:08<00:00,  2.11s/it]


{'accuracy': 0.9243119266055045}
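
What the loop accumulates can be sketched in plain Python. The class `RunningAccuracy` is a hypothetical stand-in that mimics the add_batch/compute interface used above. The logits and labels are made-up values, not real model outputs.

```python
def argmax(scores):
    """Index of the largest value in a list of scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


class RunningAccuracy:
    """Hypothetical stand-in for the accuracy metric's add_batch/compute."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def add_batch(self, predictions, references):
        self.correct += sum(p == r for p, r in zip(predictions, references))
        self.total += len(references)

    def compute(self):
        return {"accuracy": self.correct / self.total}


# Made-up logits and labels for two small batches (illustration only).
batches = [
    ([[-1.2, 2.3], [0.8, -0.5], [0.1, 0.4]], [1, 0, 0]),
    ([[2.0, -2.0], [-0.3, 1.1]], [0, 1]),
]

metric_sketch = RunningAccuracy()
for logits, labels in batches:
    predictions = [argmax(row) for row in logits]  # predicted class per example
    metric_sketch.add_batch(predictions, labels)

result = metric_sketch.compute()
print(result)  # {'accuracy': 0.8}
```

By the same arithmetic, the accuracy reported above is consistent with 806 correct predictions out of 872 examples (806 / 872 ≈ 0.9243).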

**Practical**

The input text is set through the text variable, which Colab renders as a form field because of the # @param annotation. The tokenizer takes the input text as its argument and returns a dictionary containing the tokenized text as tensors. The return_tensors="pt" argument specifies that the output should be returned as PyTorch tensors.

The tokenized input is then passed to the model to obtain the output logits. The output logits are a tensor of size 1x2, where the first element corresponds to the negative sentiment score and the second element corresponds to the positive sentiment score.

The argmax() function is used to obtain the index of the element with the highest score, which is used to determine the sentiment of the input text. The class_label variable is set to either "negative" or "positive" based on the sentiment score, and it is printed to the console. Users can try different input texts and observe how the model performs on them.

In [12]:
text = "This movie is awesome!" # @param

inputs = tokenizer(text, return_tensors="pt")  # named inputs to avoid shadowing the built-in input()
output = model(**inputs.to(device)) # notice that the inputs are moved to the device (GPU)
label_id = output.logits.argmax()
class_label = ["negative","positive"][label_id]
print(f"The sentence '{text}' is classified as '{class_label}'.")

The sentence 'This movie is awesome!' is classified as 'positive'.


**Analyzing the outputs**

The model's output contains four fields: loss, logits, hidden states, and attentions. In this case, the loss is None because no labels were passed to the model, so there was nothing to compute a loss against. The logits are a tensor of shape (1, 2) holding the unnormalized scores for each class, and they can be transformed into probabilities by applying the softmax function. The grad_fn entry appears in the printed tensor because this call was not wrapped in torch.no_grad(). The hidden states and attentions are None by default, but one can request them by setting output_hidden_states=True and output_attentions=True, either when loading the Hugging Face transformers model or when calling it.

In [13]:
output

SequenceClassifierOutput(loss=None, logits=tensor([[-3.0367,  3.7320]], device='cuda:0', grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

**Applying the activation function**

The softmax function is a mathematical function that converts a vector of real numbers into a probability distribution. In the context of deep learning, it is typically applied to the output of the last linear layer of a neural network, which contains the logits or raw scores that are transformed into probabilities.

In the code below, torch.softmax is a PyTorch function that applies the softmax operation along a specific dimension of a tensor. The first argument is the tensor to which the softmax function is applied, and the second argument specifies the dimension along which the function is applied. In this case, the dim=1 argument means that the function is applied along the second dimension of the logits tensor, which represents the different classes in our classification task. The resulting tensor contains the probabilities for each class.

In [14]:
import torch
torch.softmax(output["logits"], dim=1)

tensor([[0.0011, 0.9989]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
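
The same result can be checked by hand. For logits $z$, softmax computes $p_i = e^{z_i} / \sum_j e^{z_j}$. The following plain-Python sketch applies this to the logits printed above. The function `softmax` here is written for this illustration and subtracts the maximum logit first, a common trick for numerical stability that does not change the result.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([-3.0367, 3.7320])
print([round(p, 4) for p in probs])  # [0.0011, 0.9989]
```

The rounded values agree with the output of torch.softmax, so the model assigns a probability of about 99.9% to the positive class for this sentence.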