# Timeseries classification with a Transformer model

**Author:** [Theodoros Ntakouris](https://github.com/ntakouris)<br>
**Date created:** 2021/06/25<br>
**Last modified:** 2021/08/05<br>
**Description:** This notebook demonstrates how to do timeseries classification using a Transformer model.

## Introduction

This is the Transformer architecture from
[Attention Is All You Need](https://arxiv.org/abs/1706.03762),
applied to timeseries instead of natural language.

This example requires TensorFlow 2.4 or higher.

## Load the dataset

We are going to use the same dataset and preprocessing as the
[TimeSeries Classification from Scratch](https://keras.io/examples/timeseries/timeseries_classification_from_scratch)
example.

In [1]:
import numpy as np


def readucr(filename):
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0]
    x = data[:, 1:]
    return x, y.astype(int)


root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))

n_classes = len(np.unique(y_train))

idx = np.random.permutation(len(x_train))
x_train = x_train[idx]
y_train = y_train[idx]

y_train[y_train == -1] = 0
y_test[y_test == -1] = 0
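
To make the file layout that `readucr` expects concrete, the sketch below runs the same parsing steps on a tiny in-memory table. The three rows and their values are made up for illustration and are not FordA data. Only the layout (label in the first tab-separated column, series values in the rest), the added channel axis and the -1 to 0 relabeling mirror the cell above.

```python
import io

import numpy as np

# Hypothetical 3-row file in the UCR layout: label first, then the series values.
fake_tsv = "1\t0.1\t0.2\t0.3\n-1\t0.4\t0.5\t0.6\n1\t0.7\t0.8\t0.9\n"

data = np.loadtxt(io.StringIO(fake_tsv), delimiter="\t")
y = data[:, 0].astype(int)  # first column holds the class label
x = data[:, 1:]             # remaining columns hold the time steps

# Add a trailing channel axis: (samples, time steps) -> (samples, time steps, 1).
x = x.reshape((x.shape[0], x.shape[1], 1))

# Map the labels {-1, 1} to {0, 1} so they can be used as class indices.
y[y == -1] = 0

print(x.shape)  # (3, 3, 1)
print(y)        # [1 0 1]
```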

In [2]:
print(x_test.shape)
print(x_train.shape)

(1320, 500, 1)
(3601, 500, 1)


In [3]:
print(y_test.shape)
print(y_train.shape)

(1320,)
(3601,)


In [4]:
# Repeat the single channel 8 times to get an 8-dimensional feature axis.
x_test = np.repeat(x_test, 8, axis=-1)
print(x_test.shape)

x_train = np.repeat(x_train, 8, axis=-1)
print(x_train.shape)  # shape is (data_size, seq_len, dim)

(1320, 500, 8)
(3601, 500, 8)
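
Each of the 8 feature channels produced here is a copy of the original univariate series, so the wider input carries no extra information. The sketch below checks, on a small toy array standing in for `x_train`, that stacking copies and squeezing gives the same result as a single `np.repeat` call.

```python
import numpy as np

# Small stand-in for x_train: 2 samples, 5 time steps, 1 channel.
x = np.arange(10, dtype=float).reshape(2, 5, 1)

# Stack 8 copies on a new last axis, then drop the old channel axis.
stacked = np.squeeze(np.stack([x] * 8, axis=-1), 2)

# Repeat the single channel 8 times along the last axis.
repeated = np.repeat(x, 8, axis=-1)

print(stacked.shape)   # (2, 5, 8)
print(repeated.shape)  # (2, 5, 8)
same = np.array_equal(stacked, repeated)
```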


## Build the model

Our model processes a tensor of shape `(batch size, sequence length, features)`,
where `sequence length` is the number of time steps and `features` is the number
of values (channels) recorded at each time step.

You can replace your classification RNN layers with this one: the
inputs are fully compatible!

In [5]:
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow as tf

We include residual connections, layer normalization, and dropout.
The resulting layer can be stacked multiple times.

The projection layers are implemented through `keras.layers.Dense`.

In [6]:

def transformer_encoder(inputs, num_heads, embed_dim, ff_dim, dropout=0):
    # Values used in this example: embed_dim=8, num_heads=4.
    # Attention, residual connection, then normalization
    key_dim = embed_dim // num_heads
    x = layers.MultiHeadAttention(
        key_dim=key_dim, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    mha_out = layers.LayerNormalization()(res)

    # Feed-forward part, again with a residual connection and normalization
    x = layers.Dense(ff_dim, activation="relu")(mha_out)
    x = layers.Dropout(dropout)(x)
    x = layers.Dense(embed_dim)(x)
    res2 = x + mha_out
    en_out = layers.LayerNormalization()(res2)
    return en_out
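
The "add, then normalize" pattern that appears twice in `transformer_encoder` can be written out in plain NumPy. This is a simplified sketch, not the Keras implementation: the `layer_norm` helper below omits the learned scale and offset that `LayerNormalization` has, and `sublayer_out` is a random stand-in for the attention or feed-forward output.

```python
import numpy as np


def layer_norm(x, eps=1e-3):
    # Normalize each time step's feature vector to zero mean and (almost) unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 5, 8))        # (batch, time steps, features)
sublayer_out = rng.normal(size=(2, 5, 8))  # stand-in for attention / feed-forward output

# Residual connection first, normalization second.
out = layer_norm(sublayer_out + inputs)

print(out.shape)  # (2, 5, 8)
```

Because the residual sum requires matching shapes, the last `Dense` layer of the feed-forward part has to project back to `embed_dim`.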


The main part of our model is now complete. We can stack several of these
`transformer_encoder` blocks and then add the final
Multi-Layer Perceptron classification head. Apart from a stack of `Dense`
layers, we need to reduce the output tensor of the `transformer_encoder` part of
our model down to a vector of features for each data point in the current
batch. A common way to achieve this is to use a pooling layer. For
this example, a `GlobalAveragePooling1D` layer is sufficient.
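
As a rough illustration of what that pooling step does, the NumPy sketch below averages a toy tensor over its time axis. The array is synthetic and only stands in for the encoder output.

```python
import numpy as np

# Toy encoder output: 2 samples, 4 time steps, 3 features.
encoded = np.arange(24, dtype=float).reshape(2, 4, 3)

# Average over the time axis: (batch, time steps, features) -> (batch, features).
pooled = encoded.mean(axis=1)

print(pooled.shape)  # (2, 3)
print(pooled[0])     # [4.5 5.5 6.5]
```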

In [7]:

def build_model(
    input_shape,
    num_heads,
    embed_dim,
    ff_dim,
    num_transformer_blocks,
    dropout=0,
):
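    inputs = keras.Input(shape=input_shape)

    # Learned positional embedding: one embed_dim-sized vector per time step.
    # Positions are derived from the input so the Embedding layer is part of the model graph.
    seq_len = tf.shape(inputs)[-2]
    positions = tf.range(start=0, limit=seq_len, delta=1)

    x = layers.Embedding(input_dim=input_shape[0], output_dim=embed_dim)(positions)
    x = inputs + x  # the (seq_len, embed_dim) table broadcasts over the batch axis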
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, num_heads, embed_dim, ff_dim, dropout)
   
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(128)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
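
The position step at the top of `build_model` relies on broadcasting: a single table with one vector per time step is added to every sample in the batch. A minimal NumPy sketch of that idea follows, with a random `position_table` standing in for the `Embedding` weights and small toy sizes instead of 500 time steps.

```python
import numpy as np

batch, seq_len, dim = 2, 5, 8
rng = np.random.default_rng(0)

inputs = rng.normal(size=(batch, seq_len, dim))
# Stand-in for the Embedding weights: one row per position index.
position_table = rng.normal(size=(seq_len, dim))

positions = np.arange(seq_len)           # [0, 1, ..., seq_len - 1]
pos_vectors = position_table[positions]  # embedding lookup, shape (seq_len, dim)

# Broadcasting adds the same (seq_len, dim) table to every sample in the batch.
x = inputs + pos_vectors

print(x.shape)  # (2, 5, 8)
```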


## Train and evaluate

In [8]:
input_shape = x_train.shape[1:]
embed_dim = x_train.shape[-1]
print(input_shape) #(500, 8)
print(embed_dim)
print(n_classes)

(500, 8)
8
2


In [9]:
model = build_model(
    input_shape,
    num_heads=4,
    embed_dim=embed_dim,
    ff_dim=128,
    num_transformer_blocks=6,
    dropout=0.15,
)

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)
model.summary()


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 500, 8)]     0           []                               
                                                                                                  
 tf.compat.v1.shape (TFOpLambda  (3,)                0           ['input_1[0][0]']                
 )                                                                                                
                                                                                                  
 tf.__operators__.getitem (Slic  ()                  0           ['tf.compat.v1

                                                                                                  
 tf.__operators__.add_6 (TFOpLa  (None, 500, 8)      0           ['dense_5[0][0]',                
 mbda)                                                            'layer_normalization_4[0][0]']  
                                                                                                  
 layer_normalization_5 (LayerNo  (None, 500, 8)      16          ['tf.__operators__.add_6[0][0]'] 
 rmalization)                                                                                     
                                                                                                  
 multi_head_attention_3 (MultiH  (None, 500, 8)      288         ['layer_normalization_5[0][0]',  
 eadAttention)                                                    'layer_normalization_5[0][0]']  
                                                                                                  
 dropout_6

                                                                                                  
Total params: 20,434
Trainable params: 20,434
Non-trainable params: 0
__________________________________________________________________________________________________
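
The parameter counts in the summary can be reproduced by hand. The sketch below assumes that each attention layer has query, key, value and output projections with biases (with the value dimension equal to `key_dim`), that every `Dense` layer has a bias, and that the position `Embedding` contributes a 500 x 8 table. Under those assumptions it reproduces the 288, 16 and 20,434 shown above.

```python
def mha_params(embed_dim, num_heads, key_dim):
    # Query, key and value projections: embed_dim -> num_heads * key_dim, plus bias.
    qkv = 3 * (embed_dim * num_heads * key_dim + num_heads * key_dim)
    # Output projection: num_heads * key_dim -> embed_dim, plus bias.
    out = num_heads * key_dim * embed_dim + embed_dim
    return qkv + out


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # kernel plus bias


embed_dim, num_heads, ff_dim, num_blocks = 8, 4, 128, 6
seq_len, n_classes = 500, 2
key_dim = embed_dim // num_heads

attention = mha_params(embed_dim, num_heads, key_dim)
layer_norm = 2 * embed_dim  # one scale and one offset per feature
feed_forward = dense_params(embed_dim, ff_dim) + dense_params(ff_dim, embed_dim)

block = attention + 2 * layer_norm + feed_forward
head = dense_params(embed_dim, 128) + dense_params(128, n_classes)
total = num_blocks * block + seq_len * embed_dim + head

print(attention)   # 288
print(layer_norm)  # 16
print(total)       # 20434
```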


In [10]:
print(x_train.shape)
print(y_train.shape)

(3601, 500, 8)
(3601,)


In [11]:
callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=1,
    batch_size=32,
    callbacks=callbacks,
)



<keras.callbacks.History at 0x1d854d89430>
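
With `epochs=1` the `EarlyStopping` callback never gets a chance to act. For longer runs, its patience rule can be sketched in plain Python. This is a simplified illustration of the idea rather than the Keras source: the helper name `early_stopping_epoch` and the loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss), 0-indexed."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.70, 0.65, 0.60, 0.61, 0.62, 0.63, 0.64]
stopped_at, best = early_stopping_epoch(losses, patience=3)
print(stopped_at, best)  # 5 2
```

In this toy run, `restore_best_weights=True` would correspond to rolling the model back to the weights from epoch 2 once training stops at epoch 5.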

In [12]:
model.evaluate(x_test, y_test, verbose=1)



[0.694871187210083, 0.4856060743331909]
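
`model.evaluate` returns the loss followed by the compiled metrics, so the output above is a loss of about 0.695 and an accuracy of about 0.486. That loss is close to ln 2 ≈ 0.693, which is what sparse categorical crossentropy gives when a two-class model assigns probability 0.5 to every class. The NumPy sketch below illustrates this with synthetic labels and predictions. The helper is a plain reimplementation for illustration, not the Keras function.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs):
    # Mean negative log-probability assigned to the true class.
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())


y_true = np.array([0, 1, 1, 0])
# A model that always predicts 50/50 over the two classes.
uniform_probs = np.full((4, 2), 0.5)

loss = sparse_categorical_crossentropy(y_true, uniform_probs)
print(round(loss, 4))  # 0.6931
```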

In [13]:
model.save("tran_model.h5")

In [14]:
from tensorflow.keras.models import load_model

In [15]:
model = load_model('tran_model.h5')

## Conclusions

In about 110-120 epochs (25s each on Colab), the model reaches a training
accuracy of ~0.95, a validation accuracy of ~0.84 and a testing
accuracy of ~0.85, without hyperparameter tuning. And that is for a model
with fewer than 100k parameters. Note that the run shown above trains for only
one epoch (`epochs=1`), so its test accuracy of ~0.49 is not representative of
these figures. Of course, parameter count and accuracy could be
improved by a hyperparameter search and a more sophisticated learning rate
schedule, or a different optimizer.

You can use the trained model hosted on [Hugging Face Hub](https://huggingface.co/keras-io/timeseries_transformer_classification) and try the demo on [Hugging Face Spaces](https://huggingface.co/spaces/keras-io/timeseries_transformer_classification).