<a href="https://colab.research.google.com/github/JKrse/LUKE_thesis/blob/master/visual_attention/colab_head_and_model_view_bert_roberta_luke.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Welcome** 🚀
This is our Colab implementation of BertViz, a tool for visualizing attention in Transformer models. BertViz supports all models from the transformers library (BERT, GPT-2, XLNet, RoBERTa, XLM, CTRL, etc.); we have extended it with LUKE's *entity-aware* self-attention.

For this demo we have added BERT, RoBERTa, and LUKE. 

Note: we provide a sample dataset to play around with (*data_dir*), but you can easily generate your own samples. See the Git page for instructions.

---

**Head View**

The *head view* visualizes the attention patterns produced by one or more attention heads in a given transformer layer.

**Model View**

The *model view* provides a bird's-eye view of attention across all of the model’s layers and heads.

---

All credit to Jesse Vig for creating the BertViz framework.

BertViz repo: https://github.com/jessevig/bertviz

LUKE repo: https://github.com/studio-ousia/luke

Project repo: https://github.com/JKrse/LUKE_thesis

# Get started
#### **Just run the cells below; they download, install, and load all the dependencies needed.**

## Install and load packages

In [None]:
!pip install gitpython
!pip install torch
!pip install transformers

!test -d luke_repo || git clone https://github.com/JKrse/LUKE_thesis luke_repo


In [None]:
import os
import pickle
import sys

# Make the cloned repo importable before importing from it.
if 'luke_repo' not in sys.path:
  sys.path += ['luke_repo']

import torch
from transformers import BertTokenizer, BertModel, RobertaTokenizer, RobertaModel

from luke_repo.visual_attention.bertviz import head_view, model_view
from luke_repo.visual_attention.bertviz.util import drop_down, sentence_index, print_sentence

## Load models: BERT, RoBERTa, and LUKE

### LUKE data

In [None]:
data_dir = "luke_repo/visual_attention/sample_data"
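# Load the precomputed LUKE attentions; the context manager closes the file afterwards.
with open(f"{data_dir}/output_attentions.p", "rb") as f:
    luke_data = pickle.load(f)
sentences = [luke_data[sent]["sentence"] for sent in luke_data]

# NOTE: get_entity_string is not imported in the cells above; it must be in scope
# before this line runs (presumably it comes from the project's bertviz util module).
luke_data = get_entity_string(luke_data)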
sentences_with_entity = [luke_data[sent]["sentence_with_entity"] for sent in luke_data]
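
The cell above relies on a particular layout of the pickled file. A minimal sketch of that layout follows. The keys `sent_{i}`, `sentence`, `tokens`, and `attention` are the ones used in this notebook; the toy sentence, tokens, and attention values are placeholders, and nested lists stand in for whatever type the real file stores.

```python
import pickle

# Hypothetical stand-in for output_attentions.p: one entry per sentence,
# keyed "sent_0", "sent_1", ... (toy values, not real LUKE output).
toy_data = {
    "sent_0": {
        "sentence": "the European Parliament",
        "tokens": ["the", "European", "Parliament"],
        # Placeholder for the attention weights; the real file stores
        # the model's attention, whose exact type is not shown here.
        "attention": [[0.2, 0.3, 0.5], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    },
}

# Round-trip through pickle, analogous to reading the file from disk.
restored = pickle.loads(pickle.dumps(toy_data))

# Collect the sentences the same way the cell above does.
toy_sentences = [restored[key]["sentence"] for key in restored]
print(toy_sentences)
```

Keeping one dictionary per sentence means the tokens and attention for a selected sentence can be looked up together by a single key.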

### Load RoBERTa

In [38]:
# RoBERTa (its tokenizer is case-sensitive, so do_lower_case is not used here):
output_attentions = True
model_version_roberta = 'roberta-large'
model_roberta = RobertaModel.from_pretrained(model_version_roberta, output_attentions=output_attentions)
tokenizer_roberta = RobertaTokenizer.from_pretrained(model_version_roberta)
assert model_roberta.config.output_attentions

### Load BERT

In [None]:
# BERT:
do_lower_case = True
output_attentions = True
model_version_bert = 'bert-base-uncased'
model_bert = BertModel.from_pretrained(model_version_bert, output_attentions=output_attentions)
tokenizer_bert = BertTokenizer.from_pretrained(model_version_bert, do_lower_case=do_lower_case)
assert model_bert.config.output_attentions

## Helper functions: *show_model_view* and *show_head_view*

In [22]:
def show_head_view(model, tokenizer, sentence_a, sentence_b=None, layer=None, heads=None, format_attention=True):
    """Tokenize the input, run the model, and render the BertViz head view."""
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)  
    head_view(attention, tokens, sentence_b_start, layer=layer, heads=heads, format_data=format_attention)


def show_model_view(model, tokenizer, sentence_a, sentence_b=None, hide_delimiter_attn=False, display_mode="dark", format_attention=True):
    """Tokenize the input, run the model, and render the BertViz model view."""
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)  
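    if hide_delimiter_attn:
        # Use the tokenizer's own delimiters so this also covers RoBERTa ("<s>", "</s>"), not only BERT.
        delimiters = (tokenizer.cls_token, tokenizer.sep_token)
        for i, t in enumerate(tokens):
            if t in delimiters:
                for layer_attn in attention:
                    layer_attn[0, :, i, :] = 0  # attention from the delimiter
                    layer_attn[0, :, :, i] = 0  # attention to the delimiter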
    model_view(attention, tokens, sentence_b_start, display_mode=display_mode, format_data=format_attention)
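
Two steps in the helpers above are easy to miss: finding where sentence B starts, and zeroing the attention of delimiter tokens. A rough sketch of both on toy data, with plain Python lists standing in for tensors; the tokens, segment ids, and uniform weights are made up for illustration.

```python
# Toy BERT-style input "[CLS] a b [SEP] c d [SEP]":
# segment id 0 marks sentence A, 1 marks sentence B.
tokens = ["[CLS]", "a", "b", "[SEP]", "c", "d", "[SEP]"]
token_type_ids = [0, 0, 0, 0, 1, 1, 1]

# First position with segment id 1 is where sentence B begins.
sentence_b_start = token_type_ids.index(1)

# One toy attention head: a uniform seq_len x seq_len matrix.
n = len(tokens)
head = [[1.0 / n] * n for _ in range(n)]

# Zero the row (attention from) and column (attention to) of each
# delimiter, mirroring what hide_delimiter_attn does on the tensors.
for i, tok in enumerate(tokens):
    if tok in ("[CLS]", "[SEP]"):
        head[i] = [0.0] * n
        for row in head:
            row[i] = 0.0

print(sentence_b_start, tokens[sentence_b_start])
```

Note that after masking, the remaining rows no longer sum to 1; the weights are hidden, not renormalized.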

# Select a sentence:
### Run the cell 👇🏼 and then simply use the dropdown to select the sentence of interest
(Note: re-run the "head_view" and "model_view" cells after changing the dropdown selection.)

In [42]:
sentence_select = drop_down(options=sentences_with_entity)
display(sentence_select)

Dropdown(description='Select:', options=('" For Shanghai to become a financial and economic centre , it will h…

# Head View
The attention-head view visualizes attention in one or more heads in a particular layer in the model.

## Usage
* **Hover** over any **token** on the left/right side of the visualization to filter attention from/to that token. The colors correspond to different attention heads.
* **Double-click** on any of the **colored tiles** at the top to filter to the corresponding attention head.
* **Single-click** on any of the **colored tiles** to toggle selection of the corresponding attention head. 
* **Click** on the **Layer** drop-down to change the model layer (zero-indexed).
* The lines show the attention from each token (left) to every other token (right). Darker lines indicate higher attention weights. When multiple heads are selected, the attention weights are overlaid on one another. 
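
The weights drawn as lines come out of a softmax, so within one head the weights leaving a given token on the left sum to 1. A minimal sketch with made-up scores (the scores and the choice of source token are illustrative, not model output):

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["the", "European", "Parliament"]
# Hypothetical raw scores from the token "Parliament" to every token.
raw_scores = [0.5, 2.0, 1.0]
weights = softmax(raw_scores)

# The largest weight corresponds to the darkest line in the head view.
strongest = tokens[weights.index(max(weights))]
print(strongest, round(sum(weights), 6))
```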




In [40]:
# Cut the dropdown value just before the first "[" (presumably the entity annotation) to get the plain sentence.
sentence_a = sentence_select.value[:sentence_select.value.find("[")-1]
sentence_b = None
# LUKE
index = sentence_index(luke_data, sentence_a)
attention_luke = luke_data[f"sent_{index}"]["attention"]
tokens_luke = luke_data[f"sent_{index}"]["tokens"]

# Plot: 
print_sentence(sentence_a, sentence_b)
print(f"\n")
print(f"Model: {model_version_bert}")
show_head_view(model_bert, tokenizer_bert, sentence_a, sentence_b)
print(f"\n")
print(f"Model: {model_version_roberta}")
show_head_view(model_roberta, tokenizer_roberta, sentence_a, sentence_b)
print(f"\n")
print(f"Model: LUKE")
head_view(attention_luke, tokens_luke, format_data=False)

Sentence: the European Parliament


Model: bert-base-uncased


<IPython.core.display.Javascript object>



Model: roberta-large


<IPython.core.display.Javascript object>



Model: LUKE


<IPython.core.display.Javascript object>

# Model View
The model view gives a bird's-eye view of attention across all of the layers (rows) and heads (columns) in the model. For example, *bert-base* has 12 layers and 12 heads, while *roberta-large* has 24 layers and 16 heads (both zero-indexed in the view).

## Usage
* **Click** on any **cell** for a detailed view of attention for the associated attention head.
* Then **hover** over any **token** on the left side of the detail view to filter the attention from that token.
* The lines show the attention from each token (left) to every other token (right). Darker lines indicate higher attention weights.  
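
Each cell in the grid corresponds to one (layer, head) pair. As a rough sketch of the indexing, assume the attention is organized as one entry per layer with dimensions batch × heads × tokens × tokens, which is the layout Hugging Face models return when `output_attentions=True`. Nested lists stand in for tensors here, and the sizes are toy values.

```python
# Toy sizes; bert-base-uncased would have 12 layers and 12 heads.
num_layers, num_heads, seq_len = 2, 3, 4

# attention[layer][batch][head][from_token][to_token], uniform toy weights.
attention = [
    [[[[1.0 / seq_len] * seq_len for _ in range(seq_len)]
      for _ in range(num_heads)]]  # batch of size 1
    for _ in range(num_layers)
]

rows = len(attention)        # grid rows: one per layer
cols = len(attention[0][0])  # grid columns: one per head

# The cell at (layer=1, head=2) holds this seq_len x seq_len matrix.
cell = attention[1][0][2]
print(rows, cols, len(cell), len(cell[0]))
```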

In [41]:
sentence_a = sentence_select.value
sentence_b = None
# LUKE
index = sentence_index(luke_data, sentence_select.value)
attention_luke = luke_data[f"sent_{index}"]["attention"]
tokens_luke = luke_data[f"sent_{index}"]["tokens"]


print_sentence(sentence_a, sentence_b)
print(f"\n")
print(f"Model: {model_version_bert}")
show_model_view(model_bert, tokenizer_bert, sentence_a, sentence_b, hide_delimiter_attn=False, display_mode="dark", format_attention=True)
print(f"\n")
print(f"Model: {model_version_roberta}")
show_model_view(model_roberta, tokenizer_roberta, sentence_a, sentence_b, hide_delimiter_attn=False, display_mode="dark", format_attention=True)

print(f"\n")
print(f"Model: LUKE")
model_view(attention_luke, tokens_luke, format_data=False)

Sentence: the European Parliament


Model: bert-base-uncased


<IPython.core.display.Javascript object>





Model: roberta-large


<IPython.core.display.Javascript object>



Model: LUKE


<IPython.core.display.Javascript object>