# Links

### APIs

Keras-NLP : https://keras.io/api/keras_nlp/

TensorFlow-NLP : https://www.tensorflow.org/api_docs/python/tfm/nlp

Keras-NLP Transformer Encoder : https://keras.io/api/keras_nlp/modeling_layers/transformer_encoder/

Keras-NLP position encoding : https://keras.io/api/keras_nlp/modeling_layers/sine_position_encoding/

TensorFlow-NLP Transformer Encoder : https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock

### Tutorials

Keras NLP tutorials : https://keras.io/examples/nlp/

Example of text classification with Transformers : https://keras.io/examples/nlp/text_classification_with_transformer/


### Dataset

Keras datasets : https://keras.io/api/datasets/

Reuters newswire : https://keras.io/api/datasets/reuters/

IMDB movie review sentiment : https://keras.io/api/datasets/imdb/

In [None]:
!rm -r /content/logs/*

In [None]:
# Load the TensorBoard notebook extension.
%load_ext tensorboard

In [None]:
%tensorboard --logdir logs/scalars

## Dataset

Download the dataset vocabulary (a dictionary mapping each word to its index), which can be used to choose the vocabulary size limit when downloading the dataset.

In [None]:
from tensorflow import keras

voc = keras.datasets.reuters.get_word_index()
print(voc)
print(len(voc))
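
To see how the word index relates to the encoded newswires, here is a minimal sketch that decodes a sequence back into words. It uses a tiny hypothetical dictionary (`toy_index`) instead of the real vocabulary, and it assumes the default settings of `load_data`, where the indices are shifted by 3 and the values 0, 1 and 2 stand for padding, start of sequence and unknown word.

```python
# Toy stand-in for keras.datasets.reuters.get_word_index(): word -> frequency rank.
# (Hypothetical values, for illustration only.)
toy_index = {"the": 1, "of": 2, "oil": 3, "prices": 4, "rose": 5}

# Assumed default offset of load_data: 0 = padding, 1 = start, 2 = unknown word.
INDEX_FROM = 3

# Reverse mapping: encoded token id -> word.
id_to_word = {rank + INDEX_FROM: word for word, rank in toy_index.items()}
id_to_word.update({0: "<pad>", 1: "<start>", 2: "<unk>"})


def decode(sequence):
    """Turn a list of token ids back into a readable string."""
    return " ".join(id_to_word.get(token, "<unk>") for token in sequence)


encoded = [1, 6, 7, 8, 2]  # hypothetical encoded newswire
decoded = decode(encoded)
print(decoded)
```

With the real data, `toy_index` would be replaced by `voc` and `encoded` by one of the downloaded sequences.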

Reuters dataset classes from https://martin-thoma.com/nlp-reuters/

'cocoa',
'grain',
'veg-oil',
'earn',
'acq',
'wheat',
'copper',
'housing',
'money-supply',
'coffee',
'sugar',
'trade',
'reserves',
'ship',
'cotton',
'carcass',
'crude',
'nat-gas',
'cpi',
'money-fx',
'interest',
'gnp',
'meal-feed',
'alum',
'oilseed',
'gold',
'tin',
'strategic-metal',
'livestock',
'retail',
'ipi',
'iron-steel',
'rubber',
'heat',
'jobs',
'lei',
'bop',
'zinc',
'orange',
'pet-chem',
'dlr',
'gas',
'silver',
'wpi',
'hog',
'lead'

Download the Reuters dataset while limiting the vocabulary size

In [None]:
import tensorflow as tf
from tensorflow import keras

max_features = 20000  # Only consider the top 20k words

(x_train, y_train), (x_val, y_val) = keras.datasets.reuters.load_data(num_words=max_features)

In [None]:
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

To define the number of classes that we have in our dataset, we look at the maximum class index in the labels.

In [None]:
# This assumes that all classes have at least one sample
n_classes = max(y_train) + 1
print(n_classes)
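
The assumption behind `max(y_train) + 1` can be checked by counting the samples per class. The sketch below does this with NumPy on a small hypothetical label array (`toy_labels`), not on the real labels. It shows that a missing class in the middle of the range is still counted, whereas a missing class with the highest index would go unnoticed.

```python
import numpy as np

# Hypothetical integer labels: class 3 never appears.
toy_labels = np.array([0, 1, 1, 4, 2, 4, 0])

# Same rule as above: highest class index + 1.
n_classes_from_max = int(toy_labels.max()) + 1

# Number of samples for each class index in [0, n_classes_from_max).
counts = np.bincount(toy_labels, minlength=n_classes_from_max)

# Class indices that have no sample at all.
empty_classes = np.flatnonzero(counts == 0)

print(n_classes_from_max, counts, empty_classes)
```

Running the same check on `y_train` tells us whether every class index below the maximum is actually represented in the training set.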

Because we want to train our model in batches, we need to pad the sequences so that they all have the same length. For this we have two possible solutions:

*   we look at the longest sequence in our training set and pad all the other sequences to that length
*   or we fix a maximum length, pad the sequences shorter than this length, and trim the sequences longer than this length

To make our model training faster, we select the second option, but feel free to try the first one and compare the performance.
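
The two options can be sketched by hand with NumPy. The helper `pad_or_trim` below is a hypothetical function written for illustration. It pads on the left with zeros and, when a sequence is too long, keeps its last `maxlen` tokens, which is meant to mimic the default behaviour of the Keras padding tool.

```python
import numpy as np


def pad_or_trim(sequences, maxlen, pad_value=0):
    """Left-pad short sequences and keep the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), pad_value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]  # trim from the start if the sequence is too long
        if kept:  # skip empty sequences, which stay fully padded
            out[row, -len(kept):] = kept
    return out


toy_sequences = [[5, 6], [1, 2, 3, 4, 5, 6]]

# Option 1: pad everything to the length of the longest sequence.
option_1 = pad_or_trim(toy_sequences, max(len(s) for s in toy_sequences))

# Option 2: fix a maximum length, then pad or trim.
option_2 = pad_or_trim(toy_sequences, maxlen=4)

print(option_1)
print(option_2)
```

Option 1 loses no tokens but makes every batch as long as the longest sequence, while option 2 bounds the cost at the price of dropping part of the long sequences.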

Fortunately, `tf.keras.preprocessing` provides us with a tool for padding sequences.

In [None]:
maxlen = 200  # Keep at most 200 words per newswire (by default pad_sequences trims from the start, so the last 200 are kept)

x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

Here we convert our class indexes to one-hot vectors.

In [None]:
y_train = tf.one_hot(y_train, n_classes)
y_val = tf.one_hot(y_val, n_classes)
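
To make the conversion concrete, here is a minimal NumPy sketch of what a one-hot encoding looks like. The labels and the number of classes (`toy_labels`, `toy_n_classes`) are hypothetical values chosen for illustration.

```python
import numpy as np

toy_labels = np.array([2, 0, 1])  # hypothetical class indices
toy_n_classes = 4

# Row i of the identity matrix is the one-hot vector of class i.
one_hot = np.eye(toy_n_classes, dtype="float32")[toy_labels]
print(one_hot)

# The original class index is recovered with an argmax over the last axis.
recovered = one_hot.argmax(axis=-1)
```

One-hot targets go together with the `categorical_crossentropy` loss. An alternative would be to keep the integer labels and use `sparse_categorical_crossentropy` instead.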

## Keras-NLP

We will use the Keras-NLP API because it provides us with the positional encoding ([SinePositionEncoding](https://keras.io/api/keras_nlp/modeling_layers/sine_position_encoding/)). We also use the [Keras-NLP Transformer Encoder](https://keras.io/api/keras_nlp/layers/transformer_encoder/), which works similarly to the TensorFlow one.

The biggest difference here, however, is the use of the [Keras Embedding](https://keras.io/api/layers/core_layers/embedding/) layer, which provides us with a masking option. This is extremely important because otherwise the padded values (which are just zeros) would take part in the self-attention of the encoder, even though they carry no meaning and are only there to complete the sequences for batching. To make them invisible to the encoder we use **masking**: we pass the encoder a mask of boolean values, with *True* where the encoder should use the vectors and *False* where it should not. The Keras Embedding layer does this for us. We just have to pass it `mask_zero=True`, and it will mask the zero values in the input. Note that you have to use Keras-NLP layers afterwards to be compatible with this option.
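
The effect of the mask on self-attention can be illustrated with a small NumPy sketch. This is a simplified illustration and not the actual Keras-NLP implementation: the helper `masked_softmax` and the random attention scores are assumptions made for the example.

```python
import numpy as np


def masked_softmax(scores, key_mask):
    """Softmax over the last axis that gives zero weight to masked-out keys."""
    # Put a very negative score on every key that must be ignored.
    scores = np.where(key_mask[:, None, :], scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


token_ids = np.array([[0, 0, 7, 3]])  # one sequence, left-padded with zeros
mask = token_ids != 0                 # True for real tokens, False for padding

rng = np.random.default_rng(0)
scores = rng.normal(size=(1, 4, 4))   # hypothetical scores: (batch, query, key)

weights = masked_softmax(scores, mask)
print(mask)
print(weights.round(3))
```

The first two columns of `weights` are zero, so no token attends to the padding, while each row still sums to 1 over the real tokens.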

Let's install the Keras-NLP library.

In [None]:
!pip install keras_nlp

Build your model.

In [None]:
import keras_nlp
from tensorflow import keras

d_model = 64    # dimension of vectors in the Multi-Head Attention
n_head = 4      # number of heads in the Multi-Head Attention
d_ffn = 512     # dimension of vectors in the Feed Forward Network
n_layer = 5     # number of encoder layers

inputs = keras.Input(shape=(None,), dtype="int32")

x = keras.layers.Embedding(max_features, d_model, mask_zero=True)(inputs)   # Notice the mask_zero parameter to indicate to not pay attention to padding

positional_encoding = keras_nlp.layers.SinePositionEncoding()(x)   # encode the position using Keras API

x = x + positional_encoding   # add to the tokens

for i in range(n_layer):
    x = keras_nlp.layers.TransformerEncoder(intermediate_dim=d_ffn, num_heads=n_head, activation='relu')(x)

x = keras.layers.GlobalAveragePooling1D()(x)

outputs = keras.layers.Dense(n_classes, activation="softmax")(x)

model = keras.Model(inputs, outputs)

model.summary()
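
For intuition about what the positional encoding added to the embeddings looks like, here is a NumPy sketch of the classic sinusoidal formula (sine on even dimensions, cosine on odd ones). The function `sine_position_encoding` and its `max_wavelength` default are assumptions made for this illustration, and the exact layout used inside the Keras-NLP layer may differ.

```python
import numpy as np


def sine_position_encoding(seq_len, d_model, max_wavelength=10000.0):
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]  # (seq_len, 1)
    dims = np.arange(d_model)[None, :]       # (1, d_model)
    # Each pair of dimensions (2i, 2i + 1) shares the same frequency.
    angle_rates = 1.0 / max_wavelength ** (2 * (dims // 2) / d_model)
    angles = positions * angle_rates
    # Sine on even dimensions, cosine on odd dimensions.
    encoding = np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))
    return encoding.astype("float32")


pe = sine_position_encoding(seq_len=200, d_model=64)
print(pe.shape)
```

Each position gets a distinct vector with values between -1 and 1, which is added to the token embedding so that the encoder can tell positions apart.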

Compile and train.

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from datetime import datetime

now = datetime.now().strftime("%Y%m%d_%H%M%S")
tensorboard_callback = TensorBoard(log_dir="logs/scalars/{}".format(now))
checkpointer = ModelCheckpoint(filepath='{}.keras'.format(now), monitor='val_loss', verbose=1, save_best_only=True)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=100, validation_data=(x_val, y_val), callbacks=[tensorboard_callback, checkpointer])