# Attention model

Reference: https://machinelearningmastery.com/encoder-decoder-attention-sequence-to-sequence-prediction-keras/

The encoder-decoder model for recurrent neural networks is an architecture for sequence-to-sequence prediction problems.

**Encoder**: The encoder steps through the input time steps and encodes the entire sequence into a fixed-length vector called the context vector.

**Decoder**: The decoder steps through the output time steps while reading from the context vector.

A problem with the architecture is that performance is poor on long input or output sequences, because the whole input has to be compressed into a single fixed-length context vector.

Attention is an extension to the architecture that addresses this limitation. It gives the decoder a richer context (the encoder's output at every input time step rather than a single vector) together with a learned mechanism that lets the decoder decide where in that encoding to pay attention when predicting each time step of the output sequence.
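To make the idea concrete, here is a minimal NumPy sketch of a single attention step. It is an illustration under stated assumptions, not the layer used later in this notebook: the names `attend`, `encoder_states` and `decoder_state` are hypothetical, the sizes are arbitrary, and plain dot-product scoring is swapped in for brevity (a trained attention layer would typically learn its scoring function).

```python
import numpy as np

def softmax(x):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(encoder_states, decoder_state):
    """Return (context, weights) for one decoder step (dot-product scoring, an assumption)."""
    scores = encoder_states @ decoder_state   # one score per input time step
    weights = softmax(scores)                 # normalise scores into a distribution
    context = weights @ encoder_states        # weighted sum of the encoder states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input time steps, 8 hidden units (illustrative)
decoder_state = rng.normal(size=8)        # current decoder hidden state (illustrative)

context, weights = attend(encoder_states, decoder_state)
print(weights.shape, context.shape)
```

The weights sum to 1 across the input time steps, so the context vector is a different blend of encoder states at every output step instead of one fixed summary.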

In [3]:
pip install keras-self-attention



In [6]:
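from random import randint
from numpy import array
from numpy import argmax
from numpy import array_equal
from keras.models import Sequential
from keras.layers import LSTM
# AttentionDecoder is the custom layer from the referenced tutorial;
# assumed here to be saved locally as attention_decoder.py
from attention_decoder import AttentionDecoder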

In [None]:
# generate a sequence of random integers
def generate_sequence(length, n_unique):
	return [randint(0, n_unique-1) for _ in range(length)]

In [None]:
# one hot encode sequence
def one_hot_encode(sequence, n_unique):
	encoding = list()
	for value in sequence:
		vector = [0 for _ in range(n_unique)]
		vector[value] = 1
		encoding.append(vector)
	return array(encoding)

In [None]:
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
	return [argmax(vector) for vector in encoded_seq]
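As a quick sanity check on the two helpers above, the following self-contained sketch shows a one-hot round trip. It uses a vectorised NumPy alternative (`np.eye` indexing) rather than the loop-based functions in this notebook, and the sequence and vocabulary size are illustrative values only.

```python
import numpy as np

sequence = [3, 0, 4, 1]   # illustrative integer sequence
n_unique = 5              # illustrative vocabulary size

# row i of the identity matrix is the one-hot vector for integer i
encoded = np.eye(n_unique, dtype=int)[sequence]

# decoding takes the index of the largest entry in each row
decoded = [int(np.argmax(row)) for row in encoded]

print(encoded)
print(decoded)  # [3, 0, 4, 1]
```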

In [None]:
# prepare data for the LSTM
def get_pair(n_in, n_out, cardinality):

	# generate random sequence
	sequence_in = generate_sequence(n_in, cardinality)
	# target echoes the first n_out inputs, zero-padded to the input length
	sequence_out = sequence_in[:n_out] + [0 for _ in range(n_in-n_out)]

	# one hot encode
	X = one_hot_encode(sequence_in, cardinality)
	y = one_hot_encode(sequence_out, cardinality)
	# reshape as 3D
	X = X.reshape((1, X.shape[0], X.shape[1]))
	y = y.reshape((1, y.shape[0], y.shape[1]))
	return X,y
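The target construction is easier to see with a fixed input. This sketch mirrors what `get_pair` does for the sizes this notebook uses (5 input steps, 2 echoed output steps, 50 features), but with a hand-picked sequence instead of a random one and with `np.eye` indexing standing in for `one_hot_encode`.

```python
import numpy as np

n_in, n_out, cardinality = 5, 2, 50
sequence_in = [12, 7, 33, 41, 5]   # fixed instead of random, for illustration

# echo the first n_out values, then pad with zeros up to the input length
sequence_out = sequence_in[:n_out] + [0] * (n_in - n_out)

# one-hot encode and reshape to (samples, time steps, features)
X = np.eye(cardinality, dtype=int)[sequence_in].reshape(1, n_in, cardinality)
y = np.eye(cardinality, dtype=int)[sequence_out].reshape(1, n_in, cardinality)

print(sequence_out)      # [12, 7, 0, 0, 0]
print(X.shape, y.shape)  # (1, 5, 50) (1, 5, 50)
```

Both arrays have the same number of time steps because the output is padded to the input length.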

In [None]:
n_features = 50
n_timesteps_in = 5
n_timesteps_out = 2

In [None]:
# define model
model = Sequential()
model.add(LSTM(150, input_shape=(n_timesteps_in, n_features), return_sequences=True))
model.add(AttentionDecoder(150, n_features))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
# train LSTM
for epoch in range(5000):

	# generate new random sequence
	X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
 
	# fit model for one epoch on this sequence
	model.fit(X, y, epochs=1, verbose=2)

In [None]:
# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
	X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
	yhat = model.predict(X, verbose=0)
	if array_equal(one_hot_decode(y[0]), one_hot_decode(yhat[0])):
		correct += 1
print('Accuracy: %.2f%%' % (float(correct)/float(total)*100.0))
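Note that the metric above counts a prediction as correct only if every time step matches. The same exact-match rule can be sketched on its own, without a trained model; `sequence_accuracy` is a hypothetical helper and the sequences are made-up values for illustration.

```python
import numpy as np

def sequence_accuracy(expected, predicted):
    """Percentage of sequences where the prediction matches at every time step."""
    matches = sum(np.array_equal(e, p) for e, p in zip(expected, predicted))
    return 100.0 * matches / len(expected)

expected = [[12, 7, 0, 0, 0], [3, 9, 0, 0, 0]]
predicted = [[12, 7, 0, 0, 0], [3, 8, 0, 0, 0]]  # second sequence has one wrong step

print('Accuracy: %.2f%%' % sequence_accuracy(expected, predicted))  # Accuracy: 50.00%
```

A single wrong step makes the whole sequence count as a miss, so this score is stricter than the per-step `accuracy` metric Keras reports during training.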

In [7]:

# check some examples
for _ in range(10):
	X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
	yhat = model.predict(X, verbose=0)
	print('Expected:', one_hot_decode(y[0]), 'Predicted:', one_hot_decode(yhat[0]))