# Language Modeling
The resources for training a transformer model with a language modeling objective can be found in the Hugging Face Transformers library, under the resources for the causal language modeling task. The following code snippet loads a BERT model with a language modeling head (`BertLMHeadModel`) and runs it on a single sentence.

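BERT can behave as an encoder (with only self-attention) or as a decoder. As a decoder, its self-attention is causal, so each token attends only to itself and the tokens before it; when `add_cross_attention=True` is also set, a layer of cross-attention is added between the self-attention layers, following the architecture described in the *Attention Is All You Need* paper. Since we want to use `BertLMHeadModel` as a standalone language model (with no encoder to attend to), we set `is_decoder=True` in its configuration.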
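A quick way to see what `is_decoder=True` changes is to check whether earlier positions can "see" later tokens. This is a minimal sketch under stated assumptions: it builds a tiny, randomly initialised BERT from `BertConfig` (the sizes are arbitrary and are not those of `bert-base-uncased`) so it runs without downloading weights, and `tiny_bert` and `future_leaks` are hypothetical helper names. Changing only the last token should alter the logits at earlier positions in encoder mode, but not in decoder mode, where self-attention is causal.

```python
import torch
from transformers import BertConfig, BertLMHeadModel


def tiny_bert(is_decoder: bool) -> BertLMHeadModel:
    # Hypothetical tiny config (arbitrary sizes) so the sketch runs offline.
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        is_decoder=is_decoder,
    )
    torch.manual_seed(0)  # same random weights for both modes
    return BertLMHeadModel(config).eval()  # eval() disables dropout


def future_leaks(model: BertLMHeadModel) -> bool:
    """True if changing the last token alters the logits at earlier positions."""
    a = torch.tensor([[1, 5, 7, 9, 11]])
    b = a.clone()
    b[0, -1] = 42  # only the final token differs
    with torch.no_grad():
        logits_a = model(input_ids=a, use_cache=False).logits
        logits_b = model(input_ids=b, use_cache=False).logits
    # Compare every position except the last one.
    return not torch.allclose(logits_a[:, :-1], logits_b[:, :-1], atol=1e-6)


print("encoder (is_decoder=False) leaks future tokens:", future_leaks(tiny_bert(False)))
print("decoder (is_decoder=True) leaks future tokens:", future_leaks(tiny_bert(True)))
```

In encoder mode the bidirectional attention lets the final token influence every position, which is what masked language modeling needs; in decoder mode that influence is masked out, which is what left-to-right prediction needs.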

```python
import torch

from transformers import logging, BertConfig, BertTokenizer, BertLMHeadModel, BertForMaskedLM, BertForNextSentencePrediction
logging.set_verbosity_error()  # only show errors, e.g. hide warnings about unused checkpoint weights
```

```python
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained("bert-base-uncased")

# Make self-attention causal so the LM head predicts left to right
config.is_decoder = True
model = BertLMHeadModel.from_pretrained('bert-base-uncased', config=config)

inputs = tokenizer("California is well known for the great startups", return_tensors="pt")
outputs = model(**inputs)

prediction_logits = outputs.logits

print(inputs['input_ids'].shape)  # (batch, sequence length)
print(prediction_logits.shape)  # (batch, sequence length, vocabulary size)
```

```
torch.Size([1, 11])
torch.Size([1, 11, 30522])
```
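The cell above only runs a forward pass. A minimal sketch of actual training steps follows, assuming a tiny randomly initialised model (arbitrary sizes, dropout disabled) and a hypothetical batch of random token ids so that it runs offline; with real data you would tokenize text and load `BertLMHeadModel.from_pretrained(...)` with the `is_decoder=True` config instead. Passing `labels=input_ids` is enough for a causal LM head, because the model shifts the labels internally so that the prediction at position *t* is scored against token *t + 1*.

```python
import torch
from transformers import BertConfig, BertLMHeadModel

torch.manual_seed(0)
# Hypothetical tiny decoder-style BERT so the sketch runs without downloads.
config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
    intermediate_size=64, hidden_dropout_prob=0.0, attention_probs_dropout_prob=0.0,
    is_decoder=True,
)
model = BertLMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Hypothetical toy batch: 2 sequences of 8 random token ids.
input_ids = torch.randint(0, config.vocab_size, (2, 8))

model.train()
losses = []
for step in range(20):
    # labels=input_ids: the model shifts them by one to get next-token targets.
    outputs = model(input_ids=input_ids, labels=input_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(outputs.loss.item())

print(f"loss at step 0: {losses[0]:.3f}, at step 19: {losses[-1]:.3f}")
```

Overfitting one fixed batch like this is only a sanity check that gradients flow through the head; the learning rate and step count are arbitrary choices.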


# Masked Language Modeling

```python
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

inputs = tokenizer("California is well known for the great [MASK].", return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits

print(inputs['input_ids'].shape)  # torch.Size([1, 11])
print(logits.shape)  # torch.Size([1, 11, 30522])
```

```
torch.Size([1, 11])
torch.Size([1, 11, 30522])
```
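The cell above runs inference only. To train with the masked language modeling objective, the inputs are corrupted and the loss is computed only at the selected positions. The original BERT recipe selects about 15% of the tokens and, of those, replaces 80% with `[MASK]`, 10% with a random token, and leaves 10% unchanged. The function here is a hypothetical plain-PyTorch sketch of that recipe; Transformers ships a ready-made version as `DataCollatorForLanguageModeling`. The ids in the usage example (`[PAD]`=0, `[CLS]`=101, `[SEP]`=102, `[MASK]`=103, vocabulary of 30522) are those of `bert-base-uncased`; in practice read them from the tokenizer, for example `tokenizer.mask_token_id`.

```python
import torch


def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids=(), mlm_prob=0.15, generator=None):
    """Hypothetical helper: BERT-style 80/10/10 masking. Returns (masked_inputs, labels)."""
    inputs = input_ids.clone()
    labels = input_ids.clone()

    # Choose ~mlm_prob of the positions as prediction targets, never special tokens.
    prob = torch.full(input_ids.shape, mlm_prob)
    for sid in special_ids:
        prob[input_ids == sid] = 0.0
    selected = torch.bernoulli(prob, generator=generator).bool()
    labels[~selected] = -100  # -100 is the ignore index of the cross-entropy loss

    # 80% of the targets are replaced by the mask token.
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8), generator=generator).bool() & selected
    inputs[to_mask] = mask_token_id

    # Half of the remaining targets (10% overall) get a random token; the rest stay unchanged.
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5), generator=generator).bool() & selected & ~to_mask
    random_ids = torch.randint(vocab_size, input_ids.shape, generator=generator)
    inputs[to_random] = random_ids[to_random]
    return inputs, labels


# Usage with made-up token ids framed by [CLS] ... [SEP].
g = torch.Generator().manual_seed(0)
ids = torch.randint(1000, 2000, (4, 64), generator=g)
ids[:, 0], ids[:, -1] = 101, 102
masked, labels = mask_tokens(ids, mask_token_id=103, vocab_size=30522, special_ids=(0, 101, 102), generator=g)
print("targets:", int((labels != -100).sum()), "of", labels.numel())
```

The resulting tensors can be passed to `BertForMaskedLM` as `model(input_ids=masked, labels=labels)`; positions labelled `-100` do not contribute to the loss.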


```python
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

sequence = f"I was walking in the {tokenizer.mask_token} while the car drove away"

inputs = tokenizer(sequence, return_tensors="pt")
# Column index of the [MASK] token in the tokenized input
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

# Vocabulary logits at the masked position
token_logits = model(**inputs).logits
mask_token_logits = token_logits[0, mask_token_index, :]

# Ids of the five highest-scoring candidate tokens
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
```

```
I was walking in the park while the car drove away
I was walking in the rain while the car drove away
I was walking in the dark while the car drove away
I was walking in the grass while the car drove away
I was walking in the street while the car drove away
```
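The cell above ranks candidates by raw logits. To also report how confident the model is, the logits at the masked position can be turned into probabilities with a softmax over the vocabulary; because softmax is monotonic, the ranking does not change. A small sketch, where the helper name `top_k_with_probs` is hypothetical and the usage runs on toy logits rather than real model output:

```python
import torch


def top_k_with_probs(mask_logits: torch.Tensor, k: int = 5):
    """Return (token_ids, probabilities) of the k most likely fillers for one [MASK] slot.

    mask_logits is a 1-D tensor of vocabulary logits for a single masked position.
    """
    probs = torch.softmax(mask_logits, dim=-1)  # normalise over the vocabulary
    top = torch.topk(probs, k)
    return top.indices.tolist(), top.values.tolist()


# Toy logits over a 4-token vocabulary, for illustration only.
ids, probs = top_k_with_probs(torch.tensor([0.0, 2.0, 1.0, 3.0]), k=2)
print(ids, [round(p, 3) for p in probs])
```

In the notebook it would be called as `top_k_with_probs(mask_token_logits[0])`, with each returned id decoded through `tokenizer.decode([token_id])` as before.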


# Next Sentence Prediction

```python
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

prompt = "Mercury is the first plant in the solar system"
next_sentence = "Venus is the second planet in the solar system"
encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

outputs = model(**encoding)
logits = outputs.logits
# Index 0 = "B follows A", index 1 = "B is random": 6.285 > -6.113, so B is predicted to be the next sentence
print(logits[0, 0].item(), logits[0, 1].item())
```

```
6.285246849060059 -6.113039970397949
```
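The two numbers are unnormalised scores for class 0 ("sentence B continues sentence A") and class 1 ("sentence B is random"), the label convention documented for `BertForNextSentencePrediction`. A softmax turns them into probabilities. A minimal sketch using the logits printed above, where `is_next_probability` is a hypothetical helper name:

```python
import torch


def is_next_probability(logits: torch.Tensor) -> torch.Tensor:
    """Probability of class 0 ("B continues A") from NSP logits of shape (batch, 2)."""
    return torch.softmax(logits, dim=-1)[:, 0]


# Logits printed by the cell above for the Mercury/Venus pair.
logits = torch.tensor([[6.285246849060059, -6.113039970397949]])
p = is_next_probability(logits)
print(f"P(IsNext) = {p.item():.6f}")
```

In the notebook, `is_next_probability(outputs.logits)` gives the same value without copying numbers by hand.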


```python
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

prompt = "Mercury is the first plant in the solar system"
next_sentence = "My name is Tom"
encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

# labels=1 means "B is random"; passing labels also makes the model return outputs.loss
outputs = model(**encoding, labels=torch.LongTensor([1]))
logits = outputs.logits
# -1.675 < 3.869: index 1 wins, so B is predicted to be a random sentence
print(logits[0, 0].item(), logits[0, 1].item())
```

```
-1.6754419803619385 3.869070529937744
```
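Because `labels` was passed, the model also returns `outputs.loss`, the cross-entropy between the two logits and the given label. This sketch recomputes that value from the printed logits with `torch.nn.functional.cross_entropy`; it assumes, without it being shown in the cell, that the result matches `outputs.loss.item()` up to rounding. The loss is small here because the model already puts almost all probability on class 1, the label we supplied.

```python
import torch
import torch.nn.functional as F

# Logits printed by the cell above for the Mercury / "My name is Tom" pair.
logits = torch.tensor([[-1.6754419803619385, 3.869070529937744]])
label = torch.tensor([1])  # 1 = "sentence B is random"

loss = F.cross_entropy(logits, label)  # equals -log softmax(logits)[0, 1]
print(f"cross-entropy loss: {loss.item():.4f}")
```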
