This notebook demonstrates the cased BERT base model (bert-base-cased), retrieved from https://huggingface.co/bert-base-cased, on two tasks: masked-token prediction and next-sentence prediction.

In [1]:
# Uncomment and run the line below to install the transformers package
# !pip install transformers

In [2]:
from transformers import pipeline, AutoTokenizer, BertForNextSentencePrediction
import torch
from torch.nn.functional import softmax

In [3]:
# Load the cased BERT tokenizer and the model with its next-sentence-prediction head
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-cased")

# Ready-made pipeline for masked-token (fill-mask) prediction
unmasker = pipeline('fill-mask', model='bert-base-cased')

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationshi
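These warnings are expected. The bert-base-cased checkpoint stores both pre-training heads: the masked-language-model head (`cls.predictions.*`) and the next-sentence head (`cls.seq_relationship.*`). Each task class loads only the head it uses. The sketch below mimics that bookkeeping with plain string prefixes. It is a simplification, not how `from_pretrained` actually matches weights. `unused_keys` is a hypothetical helper, and the `bert.` encoder key is an assumed example name.

```python
# Illustrative checkpoint key names: the six MLM-head names are copied from the first
# warning, cls.seq_relationship.weight from the second; the "bert." key is an assumed example.
CHECKPOINT_KEYS = [
    "bert.embeddings.word_embeddings.weight",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.dense.weight",
    "cls.predictions.bias",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.predictions.decoder.weight",
    "cls.predictions.transform.dense.bias",
    "cls.seq_relationship.weight",
]

# Simplified assumption: each task model keeps the shared "bert." encoder plus its own head
HEAD_PREFIX = {
    "BertForNextSentencePrediction": "cls.seq_relationship.",
    "BertForMaskedLM": "cls.predictions.",
}


def unused_keys(model_class, keys):
    """Return the head weights a given task model would skip (hypothetical helper)."""
    head = HEAD_PREFIX[model_class]
    return [k for k in keys if k.startswith("cls.") and not k.startswith(head)]


print(unused_keys("BertForNextSentencePrediction", CHECKPOINT_KEYS))
print(unused_keys("BertForMaskedLM", CHECKPOINT_KEYS))
```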

Testing the "unmasker" pipeline

In [4]:
for dictionary in unmasker("This is a test of how [MASK] the model works"):
    print("{:<25s} {:f}".format(dictionary.get("token_str"), dictionary.get("score")))

well                      0.683454
effectively               0.079846
efficiently               0.036923
far                       0.014120
accurately                0.013984
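Conceptually, the fill-mask pipeline runs a masked-language-model head over the sentence and applies a softmax to the vocabulary logits at the `[MASK]` position. It then returns the highest-scoring tokens, five by default, which is why five rows appear. The sketch below reproduces only that final ranking step over a made-up six-word vocabulary. The logits are invented for illustration and are not BERT's real values. `fill_mask_top_k` is a hypothetical helper, not part of `transformers`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fill_mask_top_k(vocab, mask_logits, top_k=5):
    """Rank vocabulary entries by probability at the [MASK] position (toy version)."""
    probs = softmax(mask_logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return [{"token_str": tok, "score": p} for tok, p in ranked[:top_k]]


# Toy vocabulary and made-up logits, for illustration only
toy_vocab = ["well", "fast", "badly", "far", "often", "the"]
toy_logits = [4.0, 1.5, 0.5, 1.0, 0.2, -2.0]

for dictionary in fill_mask_top_k(toy_vocab, toy_logits):
    print("{:<25s} {:f}".format(dictionary["token_str"], dictionary["score"]))
```

With the real model, the softmax runs over the full cased WordPiece vocabulary. That is why the five scores printed above do not sum to 1.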


In [5]:
for dictionary in unmasker("This is a more ambiguous answer. There is no [MASK] solution"):
    print("{:<25s} {:f}".format(dictionary.get("token_str"), dictionary.get("score")))

alternative               0.079788
other                     0.066093
obvious                   0.056552
clear                     0.044919
possible                  0.040370


In [6]:
for dictionary in unmasker("Gibberish: cardboard indigo dog outer [MASK] turpentine France shellack"):
    print("{:<25s} {:f}".format(dictionary.get("token_str"), dictionary.get("score")))

,                         0.116117
and                       0.051120
:                         0.041306
-                         0.040721
of                        0.022506
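One rough way to compare the model's confidence across these three prompts is to check two things: how much probability the top five tokens cover, and how far the first choice is ahead of the second. The sketch below uses only the scores printed above. These two summaries are an illustrative choice, not a standard metric.

```python
# Top-5 scores printed by the three unmasker cells above
results = {
    "clear": [0.683454, 0.079846, 0.036923, 0.014120, 0.013984],
    "ambiguous": [0.079788, 0.066093, 0.056552, 0.044919, 0.040370],
    "gibberish": [0.116117, 0.051120, 0.041306, 0.040721, 0.022506],
}

for prompt, scores in results.items():
    top5_mass = sum(scores)          # share of probability covered by the top 5 tokens
    margin = scores[0] - scores[1]   # gap between the best and second-best guess
    print("{:<12s} top-5 mass {:.3f}  top-1 margin {:.3f}".format(prompt, top5_mass, margin))
```

By both measures, the first prompt is far more confident than the other two. For the ambiguous prompt, the top two candidates are nearly tied. The gibberish prompt spreads its guesses over punctuation and function words.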


In [7]:
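def run_bert_nsp(first_sentence, possible_next_sentence):
    # Encode the pair as [CLS] A [SEP] B [SEP]; token_type_ids mark the two segments
    encoding = tokenizer(first_sentence, possible_next_sentence, return_tensors="pt")

    # Inference only: no labels (they would only add a loss) and no gradient tracking
    with torch.no_grad():
        logits = model(**encoding).logits    # shape (1, 2)

    # Index 0 = "B follows A" (IsNext), index 1 = "B is random" (NotNext)
    probs = softmax(logits, dim=1)

    print(probs[0, 0].item())    # probability that the second sentence follows the first

    if probs[0, 0] < probs[0, 1]:
        print("Not likely to be the next sentence.")
    else:
        print("Likely to be the next sentence.")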
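When `tokenizer` receives two strings, it packs them into one input of the form `[CLS] A [SEP] B [SEP]`. It also returns `token_type_ids` that tell the model which segment each token belongs to. The NSP head then classifies a pooled representation of the `[CLS]` token. The sketch below builds that layout by hand. Whitespace splitting stands in for BERT's real WordPiece tokenizer (an assumption), so the tokens differ from what `tokenizer` would actually produce. `encode_pair` is a hypothetical helper.

```python
def encode_pair(tokens_a, tokens_b):
    """Build a BERT-style sentence-pair layout from already-split tokens (toy version).

    Layout: [CLS] A [SEP] B [SEP]. token_type_ids are 0 for segment A (including
    [CLS] and the first [SEP]) and 1 for segment B (including the final [SEP]).
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


# Whitespace splitting is a stand-in for WordPiece here
tokens, types = encode_pair("It's over, Anakin!".split(), "I have the high ground.".split())
for tok, segment in zip(tokens, types):
    print(segment, tok)
```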

In [8]:
run_bert_nsp("Folk in those stories had lots of chances of turning back, only they didn't.", 
             "They kept going because they were holding onto something.")

0.9972876310348511
Likely to be the next sentence.
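The NSP head has only two outputs: index 0 for "IsNext" and index 1 for "NotNext". With two classes, the softmax reduces to a logistic sigmoid of the logit difference. The check `probs[0,0] < probs[0,1]` is therefore the same as asking whether the printed probability is below 0.5. The sketch below, in plain Python with hypothetical helper names, converts a printed probability back into that logit margin.

```python
import math


def p_is_next(logit_is_next, logit_not_next):
    """Two-class softmax for index 0, i.e. sigmoid(logit_is_next - logit_not_next)."""
    return 1.0 / (1.0 + math.exp(logit_not_next - logit_is_next))


def logit_margin(p):
    """Invert the above: recover logit_is_next - logit_not_next from a probability."""
    return math.log(p / (1.0 - p))


# Probabilities printed by the first two run_bert_nsp cells
for p in (0.9972876310348511, 0.13158494234085083):
    margin = logit_margin(p)
    verdict = "Likely" if margin >= 0 else "Not likely"   # same rule as probs[0,0] >= probs[0,1]
    print("p(IsNext)={:.4f}  logit margin={:+.3f}  {}".format(p, margin, verdict))
```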


In [9]:
run_bert_nsp("Tell him about the Twinkie.", 
             "Are you telling me you built a time machine out of a DeLorean?")

0.13158494234085083
Not likely to be the next sentence.


Both results are as expected: the first pair reads as a natural continuation, and the second does not. But then it gets weird:

In [10]:
run_bert_nsp("I don't like sand.", 
             "You're my favorite deputy!")

0.6591819524765015
Likely to be the next sentence.


In [11]:
run_bert_nsp("I don't like sand", 
             "You're my favorite deputy!")

0.9073694348335266
Likely to be the next sentence.


In [12]:
run_bert_nsp("It's over, Anakin!", 
             "I have the high ground.")

0.9806997776031494
Likely to be the next sentence.


I wanted to find a sentence pair for which the model was as close to uncertain (a probability near 0.5) as possible. This was the closest I could come.

In [13]:
run_bert_nsp("The waves crashed against the shore, leaving a line of foam in their wake.", 
             "The painter applied the final brushstrokes to the canvas, completing the masterpiece.")

0.4564545452594757
Not likely to be the next sentence.
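One way to make "as close to uncertain as possible" precise is to measure how far each printed probability sits from 0.5. Binary entropy gives the same ranking and equals 1 bit at exactly 0.5. The sketch below ranks the six sentence pairs tried above using only their printed values. Entropy is an illustrative choice of measure, not something the notebook itself computed.

```python
import math

# p(IsNext) values printed by the run_bert_nsp cells above, keyed by cell
printed = {
    "In [8]": 0.9972876310348511,
    "In [9]": 0.13158494234085083,
    "In [10]": 0.6591819524765015,
    "In [11]": 0.9073694348335266,
    "In [12]": 0.9806997776031494,
    "In [13]": 0.4564545452594757,
}


def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution; 1.0 means maximally uncertain."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Rank cells from most to least uncertain
for cell, p in sorted(printed.items(), key=lambda kv: abs(kv[1] - 0.5)):
    print("{:<8s} p={:.4f}  |p-0.5|={:.4f}  entropy={:.3f} bits".format(
        cell, p, abs(p - 0.5), binary_entropy(p)))
```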
