<a target="_blank" href="https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/BERT.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# BERT in TransformerLens
This demo shows how to use BERT in TransformerLens for the Masked Language Modelling and Next Sentence Prediction tasks.

# Setup
(No need to read)

In [1]:
# NBVAL_IGNORE_OUTPUT
import os

# Janky code to do different setup when run in a Colab notebook vs VSCode
DEVELOPMENT_MODE = False
IN_GITHUB = os.getenv("GITHUB_ACTIONS") == "true"
try:
    import google.colab

    IN_COLAB = True
    print("Running as a Colab notebook")

    # PySvelte is an unmaintained visualization library, use it as a backup if circuitsvis isn't working
    # # Install another version of node that makes PySvelte work way faster
    # !curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -; sudo apt-get install -y nodejs
    # %pip install git+https://github.com/neelnanda-io/PySvelte.git
except ImportError:
    IN_COLAB = False

if not IN_GITHUB and not IN_COLAB:
    print("Running as a Jupyter notebook - intended for development only!")
    from IPython import get_ipython

    ipython = get_ipython()
    # Automatically reload the HookedTransformer code as it's edited, without restarting the kernel
    ipython.run_line_magic("load_ext", "autoreload")
    ipython.run_line_magic("autoreload", "2")

if IN_COLAB:
    %pip install transformer_lens
    %pip install circuitsvis

Running as a Jupyter notebook - intended for development only!

In [2]:
# Plotly needs a different renderer for VSCode/Notebooks vs Colab argh
import plotly.io as pio

if IN_COLAB or not DEVELOPMENT_MODE:
    pio.renderers.default = "colab"
else:
    pio.renderers.default = "notebook_connected"
print(f"Using renderer: {pio.renderers.default}")

Using renderer: colab


In [3]:
import circuitsvis as cv

# Testing that the library works
cv.examples.hello("Neel")

In [4]:
# Import stuff
import torch

from transformers import AutoTokenizer

from transformer_lens import HookedEncoder, NextSentencePrediction

In [5]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x2a285a790>

# BERT

In this section, we will load a pretrained BERT model and use it for the Masked Language Modelling and Next Sentence Prediction tasks.

In [6]:
# NBVAL_IGNORE_OUTPUT
bert = HookedEncoder.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

If using BERT for interpretability research, keep in mind that BERT has some significant architectural differences to GPT. For example, LayerNorms are applied *after* the attention and MLP components, meaning that the last LayerNorm in a block cannot be folded.


Moving model to device:  mps
Loaded pretrained model bert-base-cased into HookedTransformer


## Masked Language Modelling
Use the "[MASK]" token to mask any tokens that you would like the model to predict.  
With return_type="predictions", the model returns its predicted token for each masked position; by default, it returns the logits instead.  
You can also pass return_type=None, in which case nothing is returned.
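
Conceptually, turning logits into a prediction means looking only at the masked position(s) and picking the highest-scoring vocabulary entry there. The following is a minimal sketch of that idea in plain Python, assuming a made-up five-word vocabulary and hand-written logits. The names `toy_vocab`, `toy_logits` and `predict_masked` are hypothetical and are not part of TransformerLens.

```python
# Toy illustration only: the vocabulary and logits are invented,
# not produced by BERT or TransformerLens.
MASK = "[MASK]"
toy_vocab = ["sun", "moon", "dog", "went", "ball"]


def predict_masked(tokens, logits, vocab):
    """Return the highest-scoring vocab entry for every masked position.

    tokens: list of token strings, one per position
    logits: list of per-position score lists, each of length len(vocab)
    """
    predictions = []
    for position, token in enumerate(tokens):
        if token != MASK:
            continue  # only masked positions are predicted
        scores = logits[position]
        best_index = max(range(len(vocab)), key=lambda i: scores[i])
        predictions.append(vocab[best_index])
    return predictions


toy_tokens = ["The", MASK, "is", "bright", "today", "."]
# One row of scores per position; only the row at the mask matters here.
toy_logits = [[0.0] * len(toy_vocab) for _ in toy_tokens]
toy_logits[1] = [4.2, 3.1, -0.5, -1.0, 0.3]

print(predict_masked(toy_tokens, toy_logits, toy_vocab))
```

A prompt with several "[MASK]" tokens would simply yield one entry per masked position, in order.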

In [7]:
prompt = "The [MASK] is bright today."

prediction = bert(prompt, return_type="predictions")

print(f"Prompt: {prompt}")
print(f'Prediction: "{prediction}"')

Prompt: The [MASK] is bright today.
Prediction: "sun"


You can also input a list of prompts:

In [8]:
prompts = ["The [MASK] is bright today.", "She [MASK] to the store.", "The dog [MASK] the ball."]

predictions = bert(prompts, return_type="predictions")

print(f"Prompt: {prompts}")
print(f'Prediction: "{predictions}"')

Prompt: ['The [MASK] is bright today.', 'She [MASK] to the store.', 'The dog [MASK] the ball.']
Prediction: "['Prediction 0: sun', 'Prediction 1: went', 'Prediction 2: caught']"


## Next Sentence Prediction
To carry out Next Sentence Prediction, load BERT via the NextSentencePrediction class. Then create a list containing the two sentences you want to perform NSP on and pass it to the forward function.  
The model then predicts how likely the sentence at position 1 is to follow (i.e. be the next sentence after) the sentence at position 0.
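
BERT-style models conventionally see a sentence pair as a single sequence of the form `[CLS] A [SEP] B [SEP]`, with segment IDs marking which sentence each token belongs to. Below is a rough sketch of that packing, using whitespace splitting in place of the real WordPiece tokenizer. `pack_pair` is a hypothetical helper, and whether NextSentencePrediction builds its input in exactly this way is an assumption.

```python
def pack_pair(sentence_a, sentence_b):
    """Pack two sentences BERT-style and return (tokens, segment_ids)."""
    # Whitespace splitting stands in for the real WordPiece tokenizer.
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


packed_tokens, segment_ids = pack_pair(
    "A man walked into a grocery store.", "He bought an apple."
)
for token, segment in zip(packed_tokens, segment_ids):
    print(f"{segment}  {token}")
```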

In [9]:
nsp = NextSentencePrediction.from_pretrained("bert-base-cased")
sentence_a = "A man walked into a grocery store."
sentence_b = "He bought an apple."

sentences = [sentence_a, sentence_b]  # avoid shadowing the built-in `input`

predictions = nsp(sentences, return_type="predictions")

print(f"Sentence A: {sentence_a}")
print(f"Sentence B: {sentence_b}")
print(f'Prediction: "{predictions}"')

If using BERT for interpretability research, keep in mind that BERT has some significant architectural differences to GPT. For example, LayerNorms are applied *after* the attention and MLP components, meaning that the last LayerNorm in a block cannot be folded.


Moving model to device:  cpu
Loaded pretrained model bert-base-cased into HookedEncoder
Sentence A: A man walked into a grocery store.
Sentence B: He bought an apple.
Prediction: "The sentences are sequential"
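
A BERT NSP head produces two class scores per sentence pair, and a string like the one above summarises them. As a sketch of how such scores could be turned into a probability, the snippet below applies a softmax to a pair of made-up logits. It assumes index 0 means "B follows A" and index 1 means "B does not follow A", which is the Hugging Face BERT convention; check the order your model actually returns.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for illustration; not real model output.
# Assumed order: [B follows A, B does not follow A].
toy_nsp_logits = [3.0, -1.0]

prob_sequential, prob_not_sequential = softmax(toy_nsp_logits)
label = "sequential" if prob_sequential > prob_not_sequential else "not sequential"
print(f"P(sequential) = {prob_sequential:.3f}")
print(f"The sentences are {label}")
```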


# Inputting tokens directly
You can also pass tokens to the model directly, instead of a string or a list of strings. For example:

In [10]:
prompt = "The [MASK] is bright today."

tokens = tokenizer(prompt, return_tensors="pt")["input_ids"]
logits = bert(tokens) # Since we are not specifying return_type, we get the logits
logprobs = logits[tokens == tokenizer.mask_token_id].log_softmax(dim=-1)
prediction = tokenizer.decode(logprobs.argmax(dim=-1).item())

print(f"Prompt: {prompt}")
print(f'Prediction: "{prediction}"')

Prompt: The [MASK] is bright today.
Prediction: "sun"


Well done, BERT!
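
Working with logits rather than the "predictions" string also makes it possible to look beyond the single best token. The sketch below ranks the top candidates at one masked position using a log-softmax, written in plain Python with an invented vocabulary and invented logits. `log_softmax` and `top_k` here are hypothetical helpers, not TransformerLens functions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    largest = max(logits)
    log_total = largest + math.log(sum(math.exp(x - largest) for x in logits))
    return [x - log_total for x in logits]


def top_k(logprobs, vocab, k):
    """Return the k (token, log-probability) pairs with the highest log-probability."""
    ranked = sorted(zip(vocab, logprobs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Invented vocabulary and mask-position logits; not real BERT output.
toy_vocab = ["sun", "moon", "sky", "dog", "ball"]
toy_mask_logits = [4.2, 3.1, 2.0, -0.5, 0.3]

toy_logprobs = log_softmax(toy_mask_logits)
for token, logprob in top_k(toy_logprobs, toy_vocab, k=3):
    print(f"{token}: p = {math.exp(logprob):.3f}")
```

With real tensors, calling `topk` on the log-probabilities at the masked position plays the same role as the `top_k` helper here.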