# Visualization with BertViz
## Head View



In [1]:
!pip install bertviz
!pip install transformers
!pip install ipywidgets


In [2]:
from bertviz import head_view
from transformers import BertTokenizer, BertModel

In [3]:
# get_bert_attentions() loads a model and its tokenizer, then returns the
# attention weights and the tokens for a sentence pair.

def get_bert_attentions(model_path, sentence_a, sentence_b):
    model = BertModel.from_pretrained(model_path, output_attentions=True)
    tokenizer = BertTokenizer.from_pretrained(model_path)
    # Encode both sentences as one pair: [CLS] sentence_a [SEP] sentence_b [SEP]
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    token_type_ids = inputs['token_type_ids']
    input_ids = inputs['input_ids']
    # With output_attentions=True, the last output element holds the attention weights (one tensor per layer)
    attention = model(input_ids, token_type_ids=token_type_ids)[-1]
    input_id_list = input_ids[0].tolist()  # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)
    return attention, tokens
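
To make the sentence-pair input more concrete, here is a minimal sketch of the layout BERT uses for two sentences: `[CLS] A [SEP] B [SEP]`, with segment id 0 for the first part and 1 for the second. The helper `bert_pair_layout` is hypothetical and works on pre-split words; the real tokenizer may additionally split words into word pieces, so actual token lists can be longer.

```python
def bert_pair_layout(tokens_a, tokens_b):
    """Hypothetical helper: mimic BERT's sentence-pair layout on pre-split tokens."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


tokens, token_type_ids = bert_pair_layout(
    ["The", "cat", "is", "very", "sad", "."],
    ["Because", "it", "could", "not", "find", "food", "to", "eat", "."],
)

# Print each token next to its segment id.
for tok, seg in zip(tokens, token_type_ids):
    print(f"{seg}  {tok}")
```

The segment ids are what the `token_type_ids` tensor carries in the function above; they tell the model which tokens belong to which sentence.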


# Head View
The head view visualizes attention in one or more heads for the selected layer.
 

In [4]:
model_path = 'bert-base-cased'
sentence_a = "The cat is very sad."
sentence_b = "Because it could not find food to eat."
attention, tokens=get_bert_attentions(model_path, sentence_a, sentence_b)
head_view(attention, tokens)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



# Working with Language Models Other Than English

## A Turkish Model

In [5]:
model_path = 'dbmdz/bert-base-turkish-cased'

sentence_a = "Kedi çok üzgün."
sentence_b = "Çünkü o her zamanki gibi çok fazla yemek yedi."

attention, tokens=get_bert_attentions(model_path, sentence_a, sentence_b)
head_view(attention, tokens)
# <Layer-8, Head-8>


Some weights of the model checkpoint at dbmdz/bert-base-turkish-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



In [6]:
model_path = 'bert-base-german-cased'
sentence_a = "Die Katze ist sehr traurig."
sentence_b = "Weil sie zu viel gegessen hat"
attention, tokens=get_bert_attentions(model_path, sentence_a, sentence_b)
head_view(attention, tokens)

# <Layer-8, Head-11>


Some weights of the model checkpoint at bert-base-german-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



## Model View
* https://github.com/jessevig/bertviz/blob/master/notebooks/model_view_bert.ipynb

In [7]:
from bertviz import model_view
from transformers import BertTokenizer, BertModel

In [8]:
def show_model_view(model, tokenizer, sentence_a, sentence_b=None, hide_delimiter_attn=False, display_mode="light"):
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)  
    if hide_delimiter_attn:
        for i, t in enumerate(tokens):
            if t in ("[SEP]", "[CLS]"):
                for layer_attn in attention:
                    layer_attn[0, :, i, :] = 0
                    layer_attn[0, :, :, i] = 0
    model_view(attention, tokens, sentence_b_start, display_mode=display_mode)
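
The `hide_delimiter_attn` branch and the `sentence_b_start` lookup can be illustrated on plain Python lists. This is only a sketch under assumptions: `hide_delimiters` is a hypothetical helper, the attention matrix is a made-up uniform one for a single head, and unlike the function above it returns a copy instead of modifying the tensors in place.

```python
def hide_delimiters(attn, tokens, delimiters=("[SEP]", "[CLS]")):
    """Return a copy of a seq_len x seq_len matrix with delimiter rows and columns zeroed."""
    hidden = {i for i, t in enumerate(tokens) if t in delimiters}
    return [
        [0.0 if (i in hidden or j in hidden) else w for j, w in enumerate(row)]
        for i, row in enumerate(attn)
    ]


tokens = ["[CLS]", "Die", "Katze", "[SEP]", "Weil", "sie", "[SEP]"]
token_type_ids = [0, 0, 0, 0, 1, 1, 1]

# Toy attention for one head: every token attends uniformly to all tokens.
n = len(tokens)
attn = [[1.0 / n] * n for _ in range(n)]

masked = hide_delimiters(attn, tokens)

# The first position with segment id 1 marks where sentence B starts.
sentence_b_start = token_type_ids.index(1)
print(tokens[sentence_b_start])
```

Note that after masking, the rows no longer sum to 1, because the weight that went to `[CLS]` and `[SEP]` is dropped rather than redistributed.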

In [9]:
model_path='bert-base-german-cased'
model = BertModel.from_pretrained(model_path, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_path)



In [10]:
show_model_view(model, tokenizer, sentence_a, sentence_b, hide_delimiter_attn=False, display_mode="light")


Pronoun-antecedent (coreference) patterns are mostly encoded in heads <8,1>, <8,11>, <10,1>, and <10,7>, written as <LAYER, HEAD>.

<Layer-8, Head-11> is the strongest head encoding the coreference relation in the German model.
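
One way to look for such heads without reading the plots by eye is to rank every <layer, head> pair by how much attention the pronoun gives to its antecedent. The sketch below is an illustration only: `rank_heads` is a hypothetical helper, and the attention values are invented toy numbers (2 layers, 2 heads, 3 tokens), not output from the German model. With real model output, each layer is a tensor that also has a batch dimension, which would have to be indexed first.

```python
def rank_heads(attention, src, dst):
    """Rank (layer, head) pairs by the attention weight from token src to token dst.

    attention[layer][head][i][j] is the weight from token i to token j.
    """
    scores = [
        (layer_idx, head_idx, head[src][dst])
        for layer_idx, layer in enumerate(attention)
        for head_idx, head in enumerate(layer)
    ]
    return sorted(scores, key=lambda s: s[2], reverse=True)


tokens = ["Katze", "traurig", "sie"]

# Invented toy weights; each row sums to 1.
attention = [
    [  # layer 0
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.3, 0.5]],  # head 0
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.1, 0.1, 0.8]],  # head 1
    ],
    [  # layer 1
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.05, 0.05]],  # head 0
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.4, 0.3, 0.3]],  # head 1
    ],
]

# How strongly does the pronoun "sie" attend to "Katze" in each head?
ranked = rank_heads(attention, src=tokens.index("sie"), dst=tokens.index("Katze"))
for layer_idx, head_idx, weight in ranked:
    print(f"<{layer_idx},{head_idx}>  {weight:.2f}")
```

A single sentence pair is weak evidence, so a ranking like this would normally be averaged over many examples before calling a head a coreference head.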

# Neuron View
The neuron view visualizes attention, as well as the query and key values, in a particular attention head.

The official usage notes:
* Hover over any of the tokens on the left side of the visualization to filter attention from that token.
* Then click on the plus icon that is revealed when hovering. This shows the query vectors, key vectors, and intermediate computations for the attention weights (blue=positive, orange=negative).
* Once in the expanded view, hover over any other token on the left to see the associated attention computations.
* Click on the Layer or Head drop-downs to change the model layer or head (zero-indexed).
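
The intermediate computations mentioned in the notes above can be sketched in plain Python. This assumes standard scaled dot-product attention (element-wise query-key products, their sum scaled by the square root of the vector size, then a softmax over all keys); the function name `attention_from_qk` and the vectors are made up for illustration.

```python
import math


def attention_from_qk(query, keys):
    """Compute attention weights of one query over several keys, keeping the intermediates."""
    d = len(query)
    # Element-wise products: individual entries can be positive or negative.
    products = [[q * k for q, k in zip(query, key)] for key in keys]
    # Scaled dot product: sum of the element-wise products divided by sqrt(d).
    scores = [sum(p) / math.sqrt(d) for p in products]
    # Numerically stable softmax over the keys.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return products, scores, weights


# Made-up 4-dimensional query and three keys.
query = [1.0, -2.0, 0.5, 0.0]
keys = [
    [1.0, -1.0, 0.0, 2.0],
    [-1.0, 1.0, 1.0, 0.0],
    [0.5, -2.0, 1.0, 1.0],
]

products, scores, weights = attention_from_qk(query, keys)
for key_idx, (score, weight) in enumerate(zip(scores, weights)):
    print(f"key {key_idx}: score={score:.2f} weight={weight:.3f}")
```

The key whose entries line up best with the query (same sign, large magnitude) gets the largest score and therefore the largest attention weight.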

In [None]:
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show
model_path='bert-base-german-cased'
sentence_a = "Die Katze ist sehr traurig."
sentence_b = "Weil sie zu viel gegessen hat"
model = BertModel.from_pretrained(model_path, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_path)
model_type = 'bert'
show(model, model_type, tokenizer, sentence_a, sentence_b, layer=8, head=11, display_mode="light")

# Let us check <8,11>, which encodes the pronoun-antecedent relation; <2,6> encodes the next-token pattern.