Self-supervised learning (SSL) is a technique in which a model learns by predicting part of its input from the rest of that input. Because the training targets come from the data itself, no labeled data is required. A classic SSL example is Masked Language Modeling (MLM), in which a model such as BERT learns by predicting words that have been hidden (masked) in a sentence.
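
To make "predicting part of the input" concrete, here is a minimal pure-Python sketch of how MLM training examples can be built from unlabeled token ids. It follows the masking recipe described for BERT: select about 15% of positions, then replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. The helper name `mask_tokens` and the token ids in the usage line are hypothetical, and the constants `MASK_ID = 103` and `VOCAB_SIZE = 30522` are assumed to match `bert-base-uncased`. This is an illustration, not the Hugging Face implementation.

```python
import random

# Assumptions: these values match the bert-base-uncased vocabulary.
MASK_ID = 103
VOCAB_SIZE = 30522
# Label value skipped by the loss (PyTorch's CrossEntropyLoss default ignore_index).
IGNORE_INDEX = -100


def mask_tokens(token_ids, mask_prob=0.15, rng=None, special_ids=frozenset()):
    """Build (inputs, labels) for masked language modeling (hypothetical helper).

    Each non-special position is selected with probability mask_prob.
    A selected position becomes [MASK] 80% of the time, a random token
    10% of the time, and stays unchanged 10% of the time. Labels keep the
    original id at selected positions and IGNORE_INDEX elsewhere.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the target is the token that was hidden
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # otherwise the original token stays in the input
    return inputs, labels


# Usage with made-up token ids; [CLS]=101 and [SEP]=102 are never masked.
ids = [101, 2000, 2001, 2002, 2003, 2004, 2005, 102]
masked_inputs, labels = mask_tokens(ids, rng=random.Random(0), special_ids={101, 102})
print(masked_inputs)
print(labels)
```

The supervision comes entirely from the labels, which are just the tokens that were hidden. For real training, Hugging Face provides `DataCollatorForLanguageModeling` (with an `mlm_probability` parameter) for the same purpose.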

In [1]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT model and tokenizer for masked language modeling
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Input sentence with a masked word
sentence = "The quick brown [MASK] jumps over the lazy dog."

# Tokenize the input sentence
inputs = tokenizer(sentence, return_tensors="pt")

# Run the model to predict the masked token
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Get the predicted token for the masked position
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1).item()
predicted_token = tokenizer.decode([predicted_token_id])

print(f"Input Sentence: {sentence}")
print(f"Predicted masked word: '{predicted_token}'")


Input Sentence: The quick brown [MASK] jumps over the lazy dog.
Predicted masked word: 'cat'
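
The cell above keeps only the single highest-scoring token (the argmax). The logits at the masked position actually score every vocabulary entry: a softmax turns them into a probability distribution, and during pre-training the model is trained to minimize the cross-entropy, -log p(original token), at each masked position. The pure-Python sketch below illustrates both steps on a hypothetical four-word vocabulary with made-up logits; the numbers are not BERT's output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, vocab, k=3):
    """Return the k most probable (word, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]


def masked_lm_loss(logits, target_id):
    """Cross-entropy at one masked position: -log p(original token)."""
    return -math.log(softmax(logits)[target_id])


# Hypothetical toy vocabulary and logits for the [MASK] position.
vocab = ["fox", "cat", "dog", "car"]
logits = [2.0, 2.5, 1.0, -1.0]
print(top_k(logits, vocab, k=2))
print(masked_lm_loss(logits, vocab.index("fox")))
```

With the real model, a similar ranking can be read from the cell's tensors, for example with `predictions[0, masked_index].softmax(dim=-1).topk(5)` followed by `tokenizer.convert_ids_to_tokens` on the returned indices (`.indices[0].tolist()`).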
