# BERT Next Sentence Prediction (NSP) Exploration

This notebook explores the Next Sentence Prediction (NSP) task using the BERT model.

**Objectives:**

* Understand how BERT's NSP works.
* Experiment with different sentence pairs to see how BERT predicts their relationship.
* Analyze the impact of sentence similarity on NSP predictions.

In [None]:
!pip install transformers==4.28.1 --quiet

In [None]:
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')  # Use the GPU if available, otherwise fall back to the CPU
model = model.eval()  # Evaluation mode disables dropout, so repeated runs give the same prediction

In [None]:
def predict_nsp(sentence1, sentence2, model, tokenizer):
  """
  Calculates the probability that sentence2 follows sentence1.

  Args:
      sentence1 (str): The first sentence.
      sentence2 (str): The second sentence.
      model (BertForNextSentencePrediction): The pre-trained BERT model.
      tokenizer (BertTokenizer): The tokenizer associated with the model.

  Returns:
      float: The probability that sentence2 follows sentence1.
  """
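  # Encode the pair as one sequence: [CLS] sentence1 [SEP] sentence2 [SEP]
  inputs = tokenizer(sentence1, sentence2, return_tensors='pt', padding=True, truncation=True)
  inputs = {k: v.to(model.device) for k, v in inputs.items()}
  # Inference only, so skip gradient tracking
  with torch.no_grad():
    outputs = model(**inputs)
  # Logit index 0 is the "sentence2 follows sentence1" class; index 1 is "does not follow"
  probability = torch.softmax(outputs.logits, dim=1)[:, 0].item()
  return probability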
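
The last step of `predict_nsp` turns two raw logits into a probability with a softmax. As a rough illustration of that arithmetic only, the sketch below applies a hand-written softmax to a pair of made-up logits. The helper `softmax` and the values in `example_logits` are assumptions for this example and are not output from the model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, chosen for illustration only (not real model output).
# Index 0 scores "sentence2 follows sentence1"; index 1 scores "it does not".
example_logits = [2.0, -1.0]
probs = softmax(example_logits)
is_next_probability = probs[0]
print(f"P(is next) = {is_next_probability:.4f}")
```

With only two classes, the result depends solely on the difference between the two logits, so a larger gap in favor of index 0 pushes the probability toward 1.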

In [None]:
# Example usage
sentence1 = "Who was Jim Henson?"
sentence2 = "Jim Henson was a puppeteer."
sentence3 = "Paris is in France."

probability1 = predict_nsp(sentence1, sentence2, model, tokenizer)
probability2 = predict_nsp(sentence1, sentence3, model, tokenizer)

print(f"Probability that '{sentence2}' follows '{sentence1}': {probability1:.4f}")
print(f"Probability that '{sentence3}' follows '{sentence1}': {probability2:.4f}")

## Exploring Next Sentence Prediction (NSP)

**Task:**

1. **Experiment with Sentence Pairs:** Try different sentence pairs with the `predict_nsp` function. Vary the semantic relationship between the sentences (e.g., related, unrelated, contradictory) to see how BERT's predictions change.

2. **The [CLS] Token:**  In the tokenized input, you'll notice the special token `[CLS]` at the beginning. This token plays a crucial role in NSP. Research and explain its purpose in the context of BERT's architecture.
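
As a starting point for the second task, it may help to see how a sentence pair is laid out before it reaches the model: `[CLS]`, the first sentence, `[SEP]`, the second sentence, and a final `[SEP]`, with a segment id marking which sentence each position belongs to. The sketch below reproduces that layout under a strong simplification: it splits on whitespace instead of using BERT's WordPiece tokenizer, and `build_pair_layout` is a hypothetical helper, not part of `transformers`.

```python
def build_pair_layout(sentence_a, sentence_b):
    """Return (tokens, segment_ids) for a sentence pair in BERT's input layout.

    Simplification: splits on whitespace rather than using WordPiece,
    so the tokens differ from what BertTokenizer would produce.
    """
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

tokens, segment_ids = build_pair_layout(
    "Who was Jim Henson?", "Jim Henson was a puppeteer."
)
for token, segment in zip(tokens, segment_ids):
    print(f"{segment}  {token}")
```

To compare this with the real encoding, you could run `tokenizer.convert_ids_to_tokens(...)` on the `input_ids` returned by the tokenizer and inspect `token_type_ids` alongside it.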

**Example:**

In [None]:
# Try your own sentences here

your_sentence1 = "The cat sat on the mat."
your_sentence2 = "Dogs are loyal companions."

probability = predict_nsp(your_sentence1, your_sentence2, model, tokenizer)
print(f"Probability: {probability:.4f}")
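
To compare several relationship types at once, one option is a small helper that scores a list of labelled pairs and sorts them. The sketch below is only a suggestion: `rank_pairs` and `word_overlap` are hypothetical names, and `word_overlap` is a stand-in scorer (shared-word fraction) so the cell runs without the model. It is not an NSP score.

```python
def rank_pairs(pairs, score_fn):
    """Score each (label, first, second) triple and sort by score, highest first."""
    scored = [(label, score_fn(first, second)) for label, first, second in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def word_overlap(first, second):
    """Stand-in scorer: fraction of words in `second` that also occur in `first`."""
    first_words = set(first.lower().split())
    second_words = second.lower().split()
    return sum(word in first_words for word in second_words) / len(second_words)

# Illustrative pairs covering the three relationship types from the task.
pairs = [
    ("related", "The cat sat on the mat.", "The cat then fell asleep."),
    ("unrelated", "The cat sat on the mat.", "Dogs are loyal companions."),
    ("contradictory", "The cat sat on the mat.", "The cat never sat on the mat."),
]

ranking = rank_pairs(pairs, word_overlap)
for label, score in ranking:
    print(f"{label:<14} {score:.4f}")
```

To rank by BERT's prediction instead, pass `lambda a, b: predict_nsp(a, b, model, tokenizer)` as `score_fn`. Putting the two rankings side by side is one way to see whether surface similarity and NSP probability move together.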