# WARNING
**Please make sure to "COPY AND EDIT NOTEBOOK" to use compatible library dependencies! DO NOT CREATE A NEW NOTEBOOK AND COPY+PASTE THE CODE: that would use the latest Kaggle dependencies at the time you do so, and the code would need to be modified to work. Also make sure internet connectivity is enabled on your notebook.**

# Preliminaries

Write the requirements to a file every time the notebook runs, in case you need to go back and recover the Kaggle dependencies. **MOST OF THESE REQUIREMENTS WOULD NOT BE NECESSARY FOR LOCAL INSTALLATION**

Requirements for each notebook are hosted in the companion GitHub repo and can be pulled down and installed here if needed. The companion GitHub repo is located at https://github.com/azunre/transfer-learning-for-nlp


In [1]:
!pip freeze > kaggle_image_requirements.txt
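
The saved file can later be used to check which versions the Kaggle image pinned. The following is a minimal sketch, assuming the usual `name==version` lines that `pip freeze` writes. The helper `parse_freeze` and the sample text are hypothetical illustrations, with version numbers borrowed from the install log shown later in this notebook.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze` style text."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple name==version pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


# Illustrative sample only; a real file would be read with open(...).read().
sample = """
tensor2tensor==1.15.7
tf-slim==1.1.0
# a comment line
gin-config==0.4.0
"""

pins = parse_freeze(sample)
print(pins["tensor2tensor"])
```

Comparing two such dictionaries (one from the Kaggle image, one from a local environment) is one simple way to spot version drift.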

In [2]:
# get critical dependencies for our purposes
!pip install tensor2tensor
!git clone https://github.com/jessevig/bertviz.git

Collecting tensor2tensor
  Downloading tensor2tensor-1.15.7-py2.py3-none-any.whl (1.4 MB)
Collecting tf-slim
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Collecting gin-config
  Downloading gin_config-0.4.0-py2.py3-none-any.whl (46 kB)
Collecting bz2file
  Downloading bz2file-0.98.tar.gz (11 kB)
Collecting mesh-tensorflow
  Downloading mesh_tensorflow-0.1.19-py3-none-any.whl (366 kB)
Collecting dopamine-rl
  Downloading dopamine_rl-3.1.13-py3-none-any.whl (119 kB)
Collecting tensorflow-gan
  Downloading tensorflow_gan-2.0.0-py2.py3-none-any.whl (365 kB)
Collecting tensorflow-probabilit

# BERT Self-Attention Visualization

In [3]:
from transformers import BertTokenizer, BertModel # transformers BERT tokenizer and model



In [4]:
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True) # load uncased bert model, making sure to output attention
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True) # load uncased bert tokenizer

**Load some required javascript libraries for displaying visualization in notebook**

In [5]:
%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

In [6]:
from bertviz.bertviz import head_view # bertviz attention head visualization method

# Function for displaying the multiheaded attention
def show_head_view(model, tokenizer, sentence):
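    input_ids = tokenizer.encode(sentence, return_tensors='pt', add_special_tokens=True) # be sure to use PyTorch tensors with bertviz
    attention = model(input_ids)[-1] # last output: tuple of per-layer attention weights, each shaped (batch, heads, seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist()) # convert the ids of the single batch item back to tokens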
    head_view(attention, tokens) # call the internal bertviz method to display self-attention

In [7]:
# you can always take a very detailed look at the model like this
print(model)

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

How do we tokenize the input and convert it into a form the model can ingest?

In [8]:
sentence = "He didnt want to talk about cells on the cell phone because he considered it boring"
inputs = tokenizer.encode(sentence, return_tensors='tf', add_special_tokens=True) # changing return_tensors to "pt" would return PyTorch tensors
print(inputs)

tf.Tensor(
[[  101  2002  2134  2102  2215  2000  2831  2055  4442  2006  1996  3526
   3042  2138  2002  2641  2009 11771   102]], shape=(1, 19), dtype=int32)


In [9]:
tokens = tokenizer.convert_ids_to_tokens(inputs[0].numpy().tolist()) # take the sample at batch index 0 and convert its ids back to tokens
print(tokens)

['[CLS]', 'he', 'didn', '##t', 'want', 'to', 'talk', 'about', 'cells', 'on', 'the', 'cell', 'phone', 'because', 'he', 'considered', 'it', 'boring', '[SEP]']
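
The `##` prefix in the output above marks a WordPiece continuation piece: `didnt` is not in the vocabulary as a whole word, so it is split into `didn` and `##t`. The following is a minimal sketch of greedy longest-match-first splitting, assuming a tiny hypothetical vocabulary (`TOY_VOCAB`) and hypothetical helpers `wordpiece_split` and `tokenize`. The real `BertTokenizer` works from the full 30,522-entry vocabulary and applies additional normalization.

```python
# Toy vocabulary (hypothetical); the real bert-base-uncased vocabulary is far larger.
TOY_VOCAB = {"he", "didn", "##t", "want", "to", "talk", "[UNK]"}


def wordpiece_split(word, vocab=TOY_VOCAB, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(sentence):
    """Lowercase, split on whitespace, split each word, and add special tokens."""
    tokens = ["[CLS]"]
    for word in sentence.lower().split():
        tokens.extend(wordpiece_split(word))
    tokens.append("[SEP]")
    return tokens


print(tokenize("He didnt want to talk"))
```

The `[CLS]` and `[SEP]` markers added here correspond to ids 101 and 102 in the encoded tensor printed earlier.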


Now, actually show the self-attention:

In [10]:
show_head_view(model, tokenizer, sentence)

<IPython.core.display.Javascript object>
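
Each head in the view above displays a matrix of attention weights, where row i says how strongly token i attends to every other token and each row sums to 1. With `output_attentions=True`, the model returns one such tensor per layer, shaped (batch, heads, seq_len, seq_len). The following is a minimal sketch of how one head's weights are computed with scaled dot-product attention. The 2-dimensional query and key vectors are made-up numbers for illustration, not values from BERT, and the helper names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return softmax(Q K^T / sqrt(d)) row by row for one attention head."""
    dim = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights.append(softmax(scores))
    return weights


# Hypothetical 2-dimensional vectors for three tokens (illustration only).
tokens = ["the", "cell", "phone"]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]

weights = attention_weights(queries, keys)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

In the head view, a thicker line between two tokens corresponds to a larger entry in this matrix.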

# English-Twi Translation Example

We load the Helsinki-NLP model for Twi, a widely spoken language in Ghana. The model was trained on the JW300 corpus.

This is an encoder-decoder architecture very similar to the one in the original Transformer paper, "Attention Is All You Need".

Over a thousand translation models are available at https://huggingface.co/Helsinki-NLP

The high-resource translation models are pretty good. The low-resource models, for a language like Twi, are a good baseline.

In [11]:
from transformers import MarianMTModel, MarianTokenizer

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-tw")
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-tw")


In [12]:
text = "My name is Paul" # Input English Sentence to be translated
inputs = tokenizer.encode(text, return_tensors="pt") # Encode to input token ids
outputs = model.generate(inputs) # Generate output token ids
decoded_output = tokenizer.convert_ids_to_tokens(outputs[0].tolist()) # Decode output token ids to actual output tokens
print("Original Sentence:")
print(text)
print("Translation (good):")
print(decoded_output)

Original Sentence:
My name is Paul
Translation (good):
['<pad>', '▁Me', '▁din', '▁de', '▁Paul']
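
The output is a list of SentencePiece tokens rather than a plain string: `▁` marks the start of a new word, and the leading `<pad>` is there because Marian models start decoding from the padding token. The following is a minimal sketch of turning such a list into readable text, using a hypothetical helper `pieces_to_text`. In practice, `tokenizer.decode(outputs[0], skip_special_tokens=True)` should produce the same kind of string directly.

```python
def pieces_to_text(pieces, special_tokens=("<pad>", "</s>")):
    """Join SentencePiece-style tokens into a plain string."""
    kept = [p for p in pieces if p not in special_tokens]
    # The "▁" marker stands for a space before the piece.
    return "".join(kept).replace("▁", " ").strip()


print(pieces_to_text(['<pad>', '▁Me', '▁din', '▁de', '▁Paul']))
```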


In [13]:
text = "How are things?"
inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(inputs)
decoded_output = tokenizer.convert_ids_to_tokens(outputs[0].tolist())
print("Original Sentence:")
print(text)
print("Translation (bad):")
print(decoded_output)

Original Sentence:
How are things?
Translation (bad):
['<pad>', '▁Ɔkwan', '▁bɛn', '▁so', '▁na', '▁nneɛma', '▁te', '▁saa', '?']


The translation is roughly "In which way are things like this?", which is semantically related to the input but conveys a different meaning.

The semantic similarity, however, shows that this model is a decent starting point for further training/transfer. As shown in the following code, rephrasing slightly gets the right translation.

In [14]:
text = "How are you?"
inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(inputs)
decoded_output = tokenizer.convert_ids_to_tokens(outputs[0].tolist())
print("Original Sentence:")
print(text)
print("Translation (good):")
print(decoded_output)

Original Sentence:
How are you?
Translation (good):
['<pad>', '▁Wo', '▁ho', '▁te', '▁dɛn', '?']
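
The `model.generate` calls above produce the output one token at a time, with each new token conditioned on the tokens generated so far (and, in the real model, on the encoded source sentence). The following is a minimal sketch of greedy decoding, the simplest such strategy. The lookup table `TOY_DECODER` is a hand-written, hypothetical stand-in for a trained decoder, and `generate` itself may use a different strategy such as beam search, depending on the model configuration.

```python
# Hypothetical next-token table standing in for a trained decoder.
# Keys are the tokens generated so far; values are the most likely next token.
TOY_DECODER = {
    ("<pad>",): "▁Me",
    ("<pad>", "▁Me"): "▁din",
    ("<pad>", "▁Me", "▁din"): "▁de",
    ("<pad>", "▁Me", "▁din", "▁de"): "▁Paul",
    ("<pad>", "▁Me", "▁din", "▁de", "▁Paul"): "</s>",
}


def greedy_decode(step_fn, start_token="<pad>", end_token="</s>", max_len=10):
    """Generate tokens one at a time until the end token or max_len is reached."""
    generated = [start_token]
    while len(generated) < max_len:
        next_token = step_fn(tuple(generated))
        if next_token == end_token:
            break
        generated.append(next_token)
    return generated


result = greedy_decode(lambda prefix: TOY_DECODER.get(prefix, "</s>"))
print(result)
```

The `max_len` cap matters in practice: without it, a decoder that never emits the end token would loop forever.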


In [15]:
# remove bertviz folder for saving notebook
!rm -r bertviz