# NSP in Code
Let’s take a look at how we can demonstrate NSP in code.

We’ll be using Hugging Face’s transformers library and PyTorch, along with the bert-base-uncased model. So, let’s install, import, and initialize everything first:



In [None]:
!pip install -U transformers torch accelerate



In [None]:
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')


text = ("After Abraham Lincoln won the November 1860 presidential election on an "
        "anti-slavery platform, an initial seven slave states declared their "
        "secession from the country to form the Confederacy.")
text2 = ("War broke out in April 1861 when secessionist forces attacked Fort "
         "Sumter in South Carolina, just over a month after Lincoln's "
         "inauguration.")

Notice that we have two separate strings: text for sentence A, and text2 for sentence B. Keeping them separate allows our tokenizer to process them both correctly, which we’ll explain in a moment.

We now have three steps that we need to take:

1. Tokenization
2. Create classification label
3. Calculate loss

Let’s start with tokenization.

1. Tokenization: we perform tokenization using our initialized tokenizer, passing both text and text2.

In [None]:
inputs = tokenizer(text, text2, return_tensors='pt')
inputs.keys()

dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

In [None]:
inputs

{'input_ids': tensor([[  101,  2044,  8181,  5367,  2180,  1996,  2281,  7313,  4883,  2602,
          2006,  2019,  3424,  1011,  8864,  4132,  1010,  2019,  3988,  2698,
          6658,  2163,  4161,  2037, 22965,  2013,  1996,  2406,  2000,  2433,
          1996, 18179,  1012,   102,  2162,  3631,  2041,  1999,  2258,  6863,
          2043, 22965,  2923,  2749,  4457,  3481,  7680,  3334,  1999,  2148,
          3792,  1010,  2074,  2058,  1037,  3204,  2044,  5367,  1005,  1055,
         17331,  1012,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

There are a few things that we should be aware of for NSP. First, our two sentences are merged into the same set of tensors, but BERT can still identify that they are, in fact, two separate sentences in two ways.

A [SEP] token is added between the two sentences, and another closes the sequence. This separator token is represented by 102 in our input_ids tensor above.

The token_type_ids tensor contains segment ids that identify which segment each token belongs to. Sentence A is represented by 0 and sentence B by 1.
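The layout described above can be reproduced by hand. The following is a minimal sketch in plain Python, not the tokenizer's actual implementation: the helper name build_pair_encoding is hypothetical, and it assumes the ids 101 for [CLS] and 102 for [SEP] that appear in the output above.

```python
CLS_ID = 101  # id of [CLS] in the output above (assumed fixed for bert-base-uncased)
SEP_ID = 102  # id of [SEP] in the output above


def build_pair_encoding(ids_a, ids_b):
    """Hypothetical helper: lay out two tokenized sentences the way a BERT pair input looks."""
    # [CLS] sentence A [SEP] sentence B [SEP]
    input_ids = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    # No padding here, so every position is attended to.
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# The first three token ids of each sentence, copied from the input_ids tensor above.
encoded = build_pair_encoding([2044, 8181, 5367], [2162, 3631, 2041])
print(encoded["input_ids"])       # [101, 2044, 8181, 5367, 102, 2162, 3631, 2041, 102]
print(encoded["token_type_ids"])  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

This matches the real output above, where the first 34 positions (through the first 102) have segment id 0 and the remaining 29 have segment id 1.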


2. Create classification label: the next step is easy. All we need to do here is create a new labels tensor that identifies whether sentence B follows sentence A.



In [None]:
labels = torch.LongTensor([0])
labels

tensor([0])

We use a value of 0 to represent IsNextSentence (sentence B follows sentence A) and 1 for NotNextSentence. Additionally, the labels must be 64-bit integers, which is the type that torch.LongTensor creates.
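To make the label convention concrete, here is a rough sketch of how labeled sentence pairs could be built from a list of consecutive sentences. It is an illustration under simplifying assumptions, not BERT's actual pre-training pipeline: the function make_nsp_pairs is hypothetical, negatives are drawn from the same list rather than from a separate corpus, and the sentences are assumed to be unique.

```python
import random


def make_nsp_pairs(sentences, rng=None):
    """Hypothetical helper: return (sentence_a, sentence_b, label) triples.

    label 0 = IsNextSentence, label 1 = NotNextSentence.
    """
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to draw a negative example")
    rng = rng or random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        sentence_a = sentences[i]
        if rng.random() < 0.5:
            # Keep the true next sentence.
            pairs.append((sentence_a, sentences[i + 1], 0))
        else:
            # Pick any sentence other than sentence A itself and its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentence_a, rng.choice(candidates), 1))
    return pairs


sentences = ["first", "second", "third", "fourth"]
pairs = make_nsp_pairs(sentences, random.Random(0))  # seeded for repeatability
for sentence_a, sentence_b, label in pairs:
    print(label, "|", sentence_a, "->", sentence_b)
```

Each label in a batch of such pairs would then go into the labels tensor in the same position as its encoded pair.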

3. Calculate loss: finally, we get around to calculating our loss. We start by processing our inputs and labels through our model.

In [None]:
outputs = model(**inputs, labels=labels)
outputs.keys()

odict_keys(['loss', 'logits'])

In [None]:
outputs.loss

tensor(3.2186e-06, grad_fn=<NllLossBackward0>)

In [None]:
outputs.loss.item()


3.2186455882765586e-06

Our model returns the loss tensor, which is what we would optimize during training. We’ll move on to that very soon.
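The grad_fn in the output above (NllLossBackward0) indicates a cross-entropy loss over the two class logits. As a rough illustration of what that number means, the sketch below computes the same kind of loss for a single example in plain Python. The function name nsp_cross_entropy and the logit values are made up for this example; they are not taken from the model.

```python
import math


def nsp_cross_entropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


hypothetical_logits = [6.0, -6.0]  # illustrative values only
print(nsp_cross_entropy(hypothetical_logits, 0))  # small: the model agrees with label 0
print(nsp_cross_entropy(hypothetical_logits, 1))  # large: the model disagrees with label 1

# Going the other way, a loss value implies the probability given to the true class.
print(math.exp(-3.2186e-06))
```

Read this way, the loss of about 3.2e-06 reported above corresponds to the model assigning a probability of roughly 0.999997 to class 0.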

# Prediction

If we don’t need to train our model and only want to use it for inference, we have no labels tensor. In this case, we modify the last part of our code to call the model without labels and extract the logits tensor like so:



In [None]:
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
outputs.keys()

odict_keys(['logits'])

And take the argmax to get our prediction:



In [None]:
torch.argmax(outputs.logits)

tensor(0)
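The argmax only tells us which class won. If a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The snippet below is a small sketch in plain Python; the logit values are hypothetical stand-ins for outputs.logits, and LABEL_NAMES simply follows the 0/1 convention used above.

```python
import math

LABEL_NAMES = {0: "IsNextSentence", 1: "NotNextSentence"}


def nsp_probabilities(logits):
    """Softmax over the two NSP logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [6.0, -6.0]  # hypothetical values, not the model's real output
probs = nsp_probabilities(logits)
prediction = max(range(len(probs)), key=probs.__getitem__)  # same idea as argmax
print(LABEL_NAMES[prediction], probs)
```

With the real tensors, torch.softmax(outputs.logits, dim=-1) performs the same computation.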