# Steps and Code to Set Up Word Prediction with BERT

## 1. Install and Import Dependencies:

This notebook needs the `transformers` and `torch` libraries (for example, `pip install transformers torch`).

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch
```

## 2. Load the Pre-trained BERT Model:

Load the `bert-base-uncased` tokenizer via `AutoTokenizer`, and the model with its masked language modeling (MLM) head via Hugging Face's `AutoModelForMaskedLM`.

```python
# Tokenizer and MLM model share the same checkpoint name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
```

Loading the checkpoint logs warnings. One says that `BertForMaskedLM` will lose `generate` support from `transformers` v4.50 onwards, which does not affect masked-word prediction. The other lists checkpoint weights that were not used: the pooler (`bert.pooler.*`) and the next sentence prediction head (`cls.seq_relationship.*`). This is expected, because `BertForMaskedLM` only uses the MLM head.

## 3. Prepare Text for Word Prediction:

To predict a missing word, replace it with the `[MASK]` token. BERT was pre-trained to recover masked tokens from their context, so the model scores every word in its vocabulary as a candidate for the masked position.
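
A minimal sketch of building the masked input string, assuming the target word appears as a single space-separated word. `mask_word` is a hypothetical helper, not part of `transformers`; the default `"[MASK]"` matches BERT's `tokenizer.mask_token`.

```python
def mask_word(sentence: str, target: str, mask_token: str = "[MASK]") -> str:
    """Replace the first space-separated word equal to `target` with `mask_token`.

    Hypothetical helper for illustration: punctuation attached to a word
    (e.g. "Italy,") must be included in `target` for it to match.
    """
    words = sentence.split(" ")
    for i, word in enumerate(words):
        if word == target:
            words[i] = mask_token
            return " ".join(words)
    raise ValueError(f"{target!r} not found in sentence")


original = "Rome is the capital of Italy, which is why it hosts many government buildings."
masked = mask_word(original, "capital")
print(masked)
```

The result is the same masked sentence used in the next cell.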

```python
sentence = "Rome is the [MASK] of Italy, which is why it hosts many government buildings."
# Returns PyTorch tensors: input_ids, token_type_ids, attention_mask
inputs = tokenizer(sentence, return_tensors="pt")
```

## 4. Perform Word Prediction:

Pass the tokenized input through the model. `outputs.logits` has shape `[batch_size, sequence_length, vocab_size]`, so the row at the `[MASK]` position holds one unnormalized score for every vocabulary token.
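
These scores are logits, not probabilities. A softmax over the vocabulary dimension turns them into a distribution (in PyTorch, `torch.softmax(mask_token_logits, dim=-1)`). The stdlib sketch below shows the same computation on a hypothetical four-word vocabulary with made-up scores; the `toy_` names are illustrative and avoid clashing with the notebook's `logits` variable.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and [MASK]-position logits (illustrative values only)
toy_vocab = ["capital", "city", "heart", "center"]
toy_logits = [9.1, 6.3, 5.8, 5.2]

probs = softmax(toy_logits)
for word, p in zip(toy_vocab, probs):
    print(f"{word:>8}: {p:.3f}")
```

Because softmax is monotonic, ranking by raw logits gives the same order as ranking by probabilities, which is why the top-k step can work on logits directly.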

In [11]:
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

## 5. Identify the Top Predictions:

Find the most probable words for the masked token by ranking the scores at its position.
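
The cell for this step does two things: it finds the position of `[MASK]` in `input_ids`, then takes the k highest-scoring vocabulary entries in that position's row of logits. Here is a dependency-free sketch of both steps on hypothetical data. Ids 101, 102 and 103 are `[CLS]`, `[SEP]` and `[MASK]` in the `bert-base-uncased` vocabulary; the other ids, the five-word vocabulary and all scores are made up.

```python
import heapq

MASK_ID = 103  # [MASK] in bert-base-uncased; 101 = [CLS], 102 = [SEP]

# Hypothetical input_ids for "[CLS] w1 w2 [MASK] w4 [SEP]" (ids 2001-2003 are made up)
toy_input_ids = [101, 2001, 2002, MASK_ID, 2003, 102]

# Hypothetical vocabulary; each row holds one score per vocabulary entry
toy_vocab = ["capital", "city", "heart", "center", "pride"]
toy_logits = [[0.1, 0.2, 0.3, 0.4, 0.5] for _ in toy_input_ids]
toy_logits[3] = [9.1, 6.3, 5.8, 5.2, 4.0]  # scores at the [MASK] position

# Step 1: locate [MASK] (mirrors (input_ids == mask_token_id).nonzero())
mask_positions = [i for i, tok in enumerate(toy_input_ids) if tok == MASK_ID]

# Step 2: k best vocabulary indices in that row, highest first (mirrors torch.topk)
k = 3
row = toy_logits[mask_positions[0]]
top_k = heapq.nlargest(k, range(len(row)), key=row.__getitem__)

for idx in top_k:
    print(f"Prediction: {toy_vocab[idx]}")
```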

```python
# Position of [MASK] in the sequence (column index of the match)
mask_token_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

# Get logits for the masked token: shape [1, vocab_size]
mask_token_logits = logits[0, mask_token_index, :]

# Choose the top 5 predictions (k=5 along the vocabulary dimension)
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(f"Prediction: {tokenizer.decode([token])}")
```

The top-ranked candidate, printed first, is:

Prediction: capital


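## 6. Next Sentence Prediction Task:

BERT's second pre-training objective, next sentence prediction (NSP), classifies a sentence pair as "IsNext" (the second sentence naturally follows the first) or "NotNext". This requires the NSP head, which `BertForMaskedLM` does not use, so the pair is scored with `BertForNextSentencePrediction`.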
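
For a sentence pair, the BERT tokenizer packs both sentences into one sequence, `[CLS] A [SEP] B [SEP]`, and returns `token_type_ids` that are 0 for the first segment (including `[CLS]` and the first `[SEP]`) and 1 for the second. The sketch below rebuilds that layout from already-split word lists; `pack_pair` is a hypothetical helper, and the real tokenizer produces WordPiece tokens rather than whitespace-split words.

```python
def pack_pair(tokens_a: list[str], tokens_b: list[str]) -> tuple[list[str], list[int]]:
    """Return (tokens, token_type_ids) in BERT's sentence-pair layout."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment A covers [CLS], sentence A and the first [SEP]; segment B covers the rest
    type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, type_ids


toks, types = pack_pair("i like ian rankin".split(), "me too".split())
for tok, t in zip(toks, types):
    print(t, tok)
```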

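```python
from transformers import BertForNextSentencePrediction

# The MLM model above has no NSP head, so load the NSP variant of the same checkpoint
nsp_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Alternative pair: pass Sentence_A, Sentence_B to the tokenizer to test it
Sentence_A = "Before my bed lies a pool of moon bright"
Sentence_B = "I look up and see the bright shining moon"

Sentence_C = "i like ian rankin books very much"
Sentence_D = "yesterday i was reading the book of ian rankin"

# Tokenize as one pair: [CLS] C [SEP] D [SEP], with segment ids 0 / 1
inputs = tokenizer(Sentence_C, Sentence_D, return_tensors="pt")

with torch.no_grad():
    outputs = nsp_model(**inputs)
    sentence_logits = outputs.logits  # Shape [1, 2]
```

```python
# Label 0 = "IsNext" (B follows A), label 1 = "NotNext" (B is a random sentence)
is_next_score = sentence_logits[0, 0].item()
not_next_score = sentence_logits[0, 1].item()

if is_next_score > not_next_score:
    print("Prediction: IsNext")
else:
    print("Prediction: NotNext")
```

The cell prints either `Prediction: IsNext` or `Prediction: NotNext` for the C/D pair.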
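
Comparing the two logits gives only a label. A softmax over them (`torch.softmax(sentence_logits, dim=-1)` in PyTorch) also gives a confidence; with two classes it reduces to a logistic sigmoid of the score difference, P(IsNext) = 1 / (1 + exp(-(is_next_score - not_next_score))). A stdlib sketch with made-up scores; `nsp_probabilities` is a hypothetical helper.

```python
import math


def nsp_probabilities(is_next_score: float, not_next_score: float) -> tuple[float, float]:
    """Two-class softmax over NSP logits, computed as a sigmoid of their difference."""
    d = is_next_score - not_next_score
    # Branch on the sign so math.exp never receives a large positive argument
    if d >= 0:
        p_is_next = 1.0 / (1.0 + math.exp(-d))
    else:
        e = math.exp(d)
        p_is_next = e / (1.0 + e)
    return p_is_next, 1.0 - p_is_next


# Made-up logits for illustration (label 0 = IsNext, label 1 = NotNext)
p_is, p_not = nsp_probabilities(3.0, -2.0)
label = "IsNext" if p_is > p_not else "NotNext"
print(f"Prediction: {label} (P(IsNext) = {p_is:.3f})")
```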
