# Learnings

General
- **Equipment**: GPU is 20 times faster than CPU. P100 is 2-3 times faster than K80.
- **Down-sampling**: without down sampling & skewed distribution (90/10), the model performed more poorly, F1 0.30.
- **RoBERTa**: Removes next sentence prediction task, and therefore [CLS] token is meaningless until finetuned. BERT CLS trains fater than RoBERTa CLS and is robust to freezing layers.

Project
- **Number of Epochs**: diminishing return after 2 epochs, though epoch 7-10 produced highest F1 scores. Train for 4 epochs to be safe.
- **RoBERTa**: trains slower, 4-6, to fine tune. Performance might be a little better than BERT.
- **Sentence length**: 50th, 95th, and 99th percentile length is 250, 500, 600. Sentence length of 600 does not outperform sentence length of 150.
- **Layer freezing**: Training only top 3 layers, RoBERTa results is meaningfully worse with F1 between 0.2 and 0.3. However, BERT result is robust & good.

Best approaches
- Use BERT, train top 3 layers, extend max_len, train 2 epochs
- Use RoBERTa, train all layers, short max_len (150), train 6-8 iterations.



# Setup

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [7]:
import os
os.chdir('/content/drive/MyDrive/w266 project/dontpatronizeme/semeval-2022')
os.getcwd()

'/content/drive/MyDrive/w266 project/dontpatronizeme/semeval-2022'

In [8]:
!pip install alibi

Collecting alibi
  Downloading alibi-0.6.4-py3-none-any.whl (397 kB)
[K     |████████████████████████████████| 397 kB 5.5 MB/s 
Collecting tensorflow!=2.6.0,!=2.6.1,<2.8.0,>=2.0.0
  Downloading tensorflow-2.7.1-cp37-cp37m-manylinux2010_x86_64.whl (495.0 MB)
[K     |████████████████████████████████| 495.0 MB 30 kB/s 
Collecting spacy-lookups-data<0.2.0,>=0.0.5
  Downloading spacy_lookups_data-0.1.0.tar.gz (28.0 MB)
[K     |████████████████████████████████| 28.0 MB 54.3 MB/s 
Collecting keras<2.8,>=2.7.0rc0
  Downloading keras-2.7.0-py2.py3-none-any.whl (1.3 MB)
[K     |████████████████████████████████| 1.3 MB 34.0 MB/s 
Collecting tensorflow-estimator<2.8,~=2.7.0rc0
  Downloading tensorflow_estimator-2.7.0-py2.py3-none-any.whl (463 kB)
[K     |████████████████████████████████| 463 kB 21.0 MB/s 
Collecting gast<0.5.0,>=0.2.1
  Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)
Building wheels for collected packages: spacy-lookups-data
  Building wheel for spacy-lookups-data (setup.py)

In [9]:
!pip install transformers



In [10]:
import pandas as pd
import numpy as np
import ast
import matplotlib.pyplot as plt
import random
import alibi

from sklearn.metrics import f1_score
import tensorflow as tf
import transformers
from transformers import BertTokenizer, TFBertModel, DistilBertTokenizer, TFDistilBertModel


import logging
tf.get_logger().setLevel(logging.ERROR)
tf.config.list_physical_devices('GPU')

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [11]:
# helper function to save predictions to an output file
def labels2file(p, outf_path):
	with open(outf_path,'w') as outf:
		for pi in p:
			outf.write(','.join([str(k) for k in pi])+'\n')

# Data

In [12]:
from dont_patronize_me import DontPatronizeMe
dpm = DontPatronizeMe('data', 'TEST/task4_test.tsv')
dpm.load_task1()
dpm.load_task2(return_one_hot=True)
dpm.load_test()

Map of label to numerical label:
{'Unbalanced_power_relations': 0, 'Shallow_solution': 1, 'Presupposition': 2, 'Authority_voice': 3, 'Metaphors': 4, 'Compassion': 5, 'The_poorer_the_merrier': 6}


In [13]:
trids = pd.read_csv('practice splits/train_semeval_parids-labels.csv')
teids = pd.read_csv('practice splits/dev_semeval_parids-labels.csv') 
trids.par_id = trids.par_id.astype(str)
teids.par_id = teids.par_id.astype(str)
print(trids.shape)
print(teids.shape)

(8375, 2)
(2094, 2)


In [14]:
# Rebuild train set for Task 1
rows = [] # will contain par_id, label and text
for idx in range(len(trids)):  
  parid = trids.par_id[idx]
  #print(parid)
  # select row from original dataset to retrieve `text` and binary label
  text = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].text.values[0]
  label = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].label.values[0]
  rows.append({
      'par_id':parid,
      'text':text,
      'label':label
  })

trdf1 = pd.DataFrame(rows)

# Rebuild test set for Task 1
rows = [] # will contain par_id, label and text
for idx in range(len(teids)):  
  parid = teids.par_id[idx]
  #print(parid)
  # select row from original dataset
  text = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].text.values[0]
  label = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].label.values[0]
  rows.append({
      'par_id':parid,
      'text':text,
      'label':label
  })

tedf1 = pd.DataFrame(rows)

# downsample negative instances
pcldf = trdf1[trdf1.label==1]
npos = len(pcldf)

training_set1 = pd.concat([pcldf,trdf1[trdf1.label==0][:npos*2]])
training_set1

Unnamed: 0,par_id,text,label
0,4341,"The scheme saw an estimated 150,000 children f...",1
1,4136,Durban 's homeless communities reconciliation ...,1
2,10352,The next immediate problem that cropped up was...,1
3,8279,Far more important than the implications for t...,1
4,1164,To strengthen child-sensitive social protectio...,1
...,...,...,...
2377,1775,Last but not the least element of culpability ...,0
2378,1776,"Then , taking the art of counter-intuitive non...",0
2379,1777,Kagunga village was reported to lack necessary...,0
2380,1778,"""After her parents high-profile divorce after ...",0


In [15]:
# Rebuild train set for task 2
rows2 = [] # will contain par_id, label and text
for idx in range(len(trids)):  
  parid = trids.par_id[idx]
  label = trids.label[idx]
  # select row from original dataset to retrieve the `text` value
  text = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].text.values[0]
  rows2.append({
      'par_id':parid,
      'text':text,
      'label':label
  })
  
trdf2 = pd.DataFrame(rows2)
trdf2.label = trdf2.label.apply(ast.literal_eval)

rows2 = [] # will contain par_id, label and text
for idx in range(len(teids)):  
  parid = teids.par_id[idx]
  label = teids.label[idx]
  #print(parid)
  # select row from original dataset to access the `text` value
  text = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].text.values[0]
  rows2.append({
      'par_id':parid,
      'text':text,
      'label':label
  })
  
tedf2 = pd.DataFrame(rows2)
tedf2.label = tedf2.label.apply(ast.literal_eval)

# downsample
all_negs = trdf2[trdf2.label.apply(lambda x:sum(x) == 0)]
all_pos = trdf2[trdf2.label.apply(lambda x:sum(x) > 0)]

training_set2 = pd.concat([all_pos,all_negs[:round(len(all_pos)*0.5)]])
training_set2

Unnamed: 0,par_id,text,label
0,4341,"The scheme saw an estimated 150,000 children f...","[1, 0, 0, 1, 0, 0, 0]"
1,4136,Durban 's homeless communities reconciliation ...,"[0, 1, 0, 0, 0, 0, 0]"
2,10352,The next immediate problem that cropped up was...,"[1, 0, 0, 0, 0, 1, 0]"
3,8279,Far more important than the implications for t...,"[0, 0, 0, 1, 0, 0, 0]"
4,1164,To strengthen child-sensitive social protectio...,"[1, 0, 0, 1, 1, 1, 0]"
...,...,...,...
1186,434,""""""" I was absolutely useless at school , hopel...","[0, 0, 0, 0, 0, 0, 0]"
1187,435,I also noticed the change in socio-economic le...,"[0, 0, 0, 0, 0, 0, 0]"
1188,436,"Can Donald Trump win ? It 's possible , but ce...","[0, 0, 0, 0, 0, 0, 0]"
1189,437,He added that any introduction of new law must...,"[0, 0, 0, 0, 0, 0, 0]"


# Epochs

In [16]:
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

Downloading:   0%|          | 0.00/208k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

In [17]:
max_length = 150

x_train = tokenizer([str(x) for x in training_set1['text'].values], 
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_train = training_set1['label'].values

x_test = tokenizer([str(x) for x in tedf1['text'].values], 
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_test = tedf1['label'].values

In [18]:
def create_classification_model(hidden_size = 200, 
                                train_layers = -1, 
                                optimizer=tf.keras.optimizers.Adam()):
    """
    Build a simple classification model with BERT. Let's keep it simple and don't add dropout, layer norms, etc.
    """

    input_ids = tf.keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='input_ids_layer')
    token_type_ids = tf.keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='token_type_ids_layer')
    attention_mask = tf.keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='attention_mask_layer')

    bert_inputs = {'input_ids': input_ids,
                  'token_type_ids': token_type_ids,
                  'attention_mask': attention_mask}


    #restrict training to the train_layers outer transformer layers
    if not train_layers == -1:

            retrain_layers = []

            for retrain_layer_number in range(train_layers):

                layer_code = '_' + str(11 - retrain_layer_number)
                retrain_layers.append(layer_code)

            for w in bert_model.weights:
                if not any([x in w.name for x in retrain_layers]):
                    w._trainable = False


    bert_out = bert_model(bert_inputs)


    classification_token = tf.keras.layers.Lambda(lambda x: x[:,0,:], name='get_first_vector')(bert_out[0])


    hidden1 = tf.keras.layers.Dense(hidden_size, name='hidden_layer_1')(classification_token)
    hidden2 = tf.keras.layers.Dense(hidden_size, name='hidden_layer_2')(hidden1)

    classification = tf.keras.layers.Dense(1, activation='sigmoid', name='classification_layer')(hidden2)

    classification_model = tf.keras.Model(inputs=[input_ids, token_type_ids, attention_mask], 
                                          outputs=[classification])
    
    classification_model.compile(optimizer=optimizer,
                            loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                            metrics='accuracy')


    return classification_model

In [44]:
try:
    del classification_model
except:
    pass

try:
    del bert_model
except:
    pass

tf.keras.backend.clear_session()

from transformers import BertConfig
#config = BertConfig(output_hidden_states=True)
bert_model = TFBertModel.from_pretrained('bert-base-cased')

classification_model = create_classification_model(optimizer=tf.keras.optimizers.Adam(0.00005),
                                                   train_layers=3)

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


<keras.callbacks.ModelCheckpoint at 0x7f8ce17c5410>

In [None]:
for epoch in range(2):
  classification_model.fit([x_train.input_ids, x_train.token_type_ids, x_train.attention_mask],
                          y_train,
                          epochs=1,
                          batch_size=8)
  
  y_predict = classification_model.predict([x_test.input_ids, x_test.token_type_ids, x_test.attention_mask], 
                                          batch_size=8, verbose=1)  # steps=2?
  y_predict = [1 if i[0]>0.5 else 0 for i in y_predict]

  print('Epoch:', epoch+1, 'F1:', f1_score(y_test, y_predict))

# RoBERTa

In [None]:
from transformers import RobertaTokenizer, TFRobertaModel

In [None]:
roberta_tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

max_length = 150

x_train = tokenizer([str(x) for x in training_set1['text'].values], 
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_train = training_set1['label'].values

x_test = tokenizer([str(x) for x in tedf1['text'].values], 
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_test = tedf1['label'].values

In [None]:
def create_classification_model(hidden_size = 200, 
                                train_layers = -1, 
                                optimizer=tf.keras.optimizers.Adam()):
    """
    Build a simple classification model with BERT. Let's keep it simple and don't add dropout, layer norms, etc.
    """

    input_ids = tf.keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='input_ids_layer')
    attention_mask = tf.keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='attention_mask_layer')

    bert_inputs = {'input_ids': input_ids,
                  'attention_mask': attention_mask}


    #restrict training to the train_layers outer transformer layers
    if not train_layers == -1:

            retrain_layers = []

            for retrain_layer_number in range(train_layers):

                layer_code = '_' + str(11 - retrain_layer_number)
                retrain_layers.append(layer_code)

            for w in roberta_model.weights:
                if not any([x in w.name for x in retrain_layers]):
                    w._trainable = False


    bert_out = roberta_model(bert_inputs)

    classification_token = tf.keras.layers.Lambda(lambda x: x[:,0,:], name='get_first_vector')(bert_out[0])

    hidden1 = tf.keras.layers.Dense(hidden_size, name='hidden_layer_1')(classification_token)
    hidden2 = tf.keras.layers.Dense(hidden_size, name='hidden_layer_2')(hidden1)

    classification = tf.keras.layers.Dense(1, activation='sigmoid', name='classification_layer')(hidden2)

    classification_model = tf.keras.Model(inputs=[input_ids, attention_mask], 
                                          outputs=[classification])
    
    classification_model.compile(optimizer=optimizer,
                            loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                            metrics='accuracy')


    return classification_model

In [None]:
try:
    del classification_model
except:
    pass

try:
    del roberta_model
except:
    pass

tf.keras.backend.clear_session()

roberta_model = TFRobertaModel.from_pretrained("roberta-base")

classification_model = create_classification_model(optimizer=tf.keras.optimizers.Adam(0.00005),
                                                   train_layers=3)

Some layers from the model checkpoint at roberta-base were not used when initializing TFRobertaModel: ['lm_head']
- This IS expected if you are initializing TFRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFRobertaModel were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaModel for predictions without further training.


In [None]:
for epoch in range(6):
  classification_model.fit([x_train.input_ids, x_train.attention_mask],
                          y_train,
                          epochs=1,
                          batch_size=8)
  
  y_predict = classification_model.predict([x_test.input_ids, x_test.attention_mask], 
                                          batch_size=8, verbose=1)  # steps=2?
  y_predict = [1 if i[0]>0.5 else 0 for i in y_predict]

  print('Epoch:', epoch+1, 'F1:', f1_score(y_test, y_predict))


Epoch: 1 F1: 0.22641509433962265
Epoch: 2 F1: 0.22181146025878
Epoch: 3 F1: 0.02884615384615384
Epoch: 4 F1: 0.24689165186500886
Epoch: 5 F1: 0.225
Epoch: 6 F1: 0.2614742698191933


# IG

In [39]:
classification_model.layers[0].layers[0]

AttributeError: ignored

# Evaluation

In [None]:
# output
labels2file([[y] for y in y_predict], os.path.join('res/task1.txt'))

# Evaluate
!python3 evaluation.py . .
!cat scores.txt

task1_precision:0.34285714285714286
task1_recall:0.7236180904522613
task1_f1:0.4652665589660743
task2_unb:0.1176470588235294
task2_sha:0.04224058769513315
task2_pre:0.0525854513584575
task2_aut:0.03071364046973803
task2_met:0.05253623188405797
task2_com:0.09909909909909911
task2_the:0.014746543778801843
task2_avg:0.05850980187268815


In [None]:
# Task 1
x_test_s = tokenizer([str(x) for x in dpm.test_set_df['text'].values], 
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')

y_predict = classification_model.predict([x_test_s.input_ids, x_test_s.token_type_ids, x_test_s.attention_mask], 
                                         batch_size=8, verbose=1)  # steps=2?
y_predict = [1 if i[0]>0.5 else 0 for i in y_predict]

# output
labels2file([[y] for y in y_predict], os.path.join('res/task1.txt'))



In [None]:
# load test data
# predict & output (task1 with model, task2 with random)

os.chdir('res')

!cat task1.txt | head -n 3
!cat task2.txt | head -n 3
!zip submission.zip task1.txt task2.txt

os.chdir('..')
#os.chdir('/content/drive/MyDrive/w266 project/dontpatronizeme/semeval-2022')

0
0
0
0,1,0,1,0,1,1
1,1,0,1,0,0,0
1,0,0,0,0,0,0
updating: task1.txt (deflated 94%)
updating: task2.txt (deflated 87%)


# Testing

In [47]:
# Necessary Imports
import tensorflow as tf
import numpy as np
import pandas as pd
import re
from alibi.explainers import IntegratedGradients
import matplotlib as mpl

# Preprocess and clean texts
def preprocess_reviews(reviews):

    REPLACE_NO_SPACE = re.compile("[.;:,!'?()[]]")
    REPLACE_WITH_SPACE = re.compile("(<brs*/><brs*/>)|(-)|(/)")

    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]

    return reviews

# Tokenize text
def process_sentences(sentence,
 tokenizer,
 max_len):
    z = tokenizer(sentence,
                  add_special_tokens = False,
                  padding = 'max_length',
                  max_length = max_len,
                  truncation = True,
                  return_token_type_ids=True,
                  return_attention_mask = True,
                  return_tensors = 'np')
    return z

# Load pretrained BERT Model from Transformers Library
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
auto_model_bert = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Downloading:   0%|          | 0.00/629 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/256M [00:00<?, ?B/s]

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_56']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]

In [48]:
class AutoModelWrapper(tf.keras.Model):

    def __init__(self, model_bert, **kwargs):
        super().__init__()
        self.model_bert = model_bert

# Apply softmax function to logits
    def call(self, inputs, attention_mask=None):
        out = self.model_bert(inputs,
                              attention_mask=attention_mask)
        return tf.nn.softmax(out.logits)

    def get_config(self):
        return {}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

auto_model = AutoModelWrapper(auto_model_bert)

# Define the maximum length of the sequence
max_len = 20

In [49]:
z_test_sample = ['I love you, but I also kind of dislike you.']
z_test_sample = preprocess_reviews(z_test_sample)
z_test_sample = process_sentences(z_test_sample,
 tokenizer,
 max_len)

# We need only the input ids for the classification
x_test_sample = z_test_sample['input_ids']

# Preparation of Attention Masks
kwargs = {k: tf.constant(v) for k,v in z_test_sample.items() if k ==
'attention_mask'}

In [52]:
#  Extracting the first transformer block
bl = auto_model.layers[0].layers[0].transformer.layer[1]

In [96]:
auto_model.layers[0].summary()

Model: "tf_distil_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 distilbert (TFDistilBertMai  multiple                 66362880  
 nLayer)                                                         
                                                                 
 pre_classifier (Dense)      multiple                  590592    
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
 dropout_56 (Dropout)        multiple                  0         
                                                                 
Total params: 66,955,010
Trainable params: 66,955,010
Non-trainable params: 0
_________________________________________________________________


In [95]:
classification_model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask_layer (InputLay  [(None, 150)]       0           []                               
 er)                                                                                              
                                                                                                  
 input_ids_layer (InputLayer)   [(None, 150)]        0           []                               
                                                                                                  
 token_type_ids_layer (InputLay  [(None, 150)]       0           []                               
 er)                                                                                              
                                                                                              

In [56]:
n_steps = 20
method = "gausslegendre"
internal_batch_size = 5
ig  = IntegratedGradients(auto_model,
                          layer=bl,
                          n_steps=n_steps,
                          method=method,
                          internal_batch_size=internal_batch_size)

In [57]:
predictions = auto_model(x_test_sample, **kwargs).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample,
                         forward_kwargs=kwargs,
                         baselines=None,
                         target=predictions)

In [59]:
attrs = explanation.attributions[0]
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)

Attributions shape: (1, 20)


In [64]:
explanation.attributions[0].shape

(1, 20, 768)

In [65]:
from IPython.display import HTML
# Return HTML markup which highlights the text with a desired color.
def  hlstr(string, color='white'):
    return f"<mark style=background-color:{color}>{string} </mark>"

# Calculates color based on attribution values
def colorize(attrs, cmap='PiYG'):
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.cm.get_cmap(cmap)

    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors

In [72]:
words = tokenizer.decode(x_test_sample[0]).split()
colors = colorize(attrs[0])

print('Predicted label =  {}'.format(predictions[0]))

Predicted label =  1


In [73]:
HTML("".join(list(map(hlstr, words, colors))))
