Simple Luong Attention (Global) as given in the paper.

Minh-Thang Luong, Hieu Pham, Christopher D. Manning, Effective Approaches to Attention-based Neural Machine Translation, 2015 (https://arxiv.org/abs/1508.04025)

Compute attention using the "dot" score from Section 3.1 of the paper.
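For reference, the dot variant can be written as the three steps below. The notation follows the paper: h_t is the current decoder (target) hidden state and \bar{h}_s is the encoder (source) hidden state at position s. In the code, `decoder_hidden_state` plays the role of h_t and each row of `encoder_hidden_states` is one \bar{h}_s.

```latex
\begin{align*}
\mathrm{score}(h_t, \bar{h}_s) &= h_t^\top \bar{h}_s \\
a_t(s) &= \frac{\exp\big(\mathrm{score}(h_t, \bar{h}_s)\big)}{\sum_{s'} \exp\big(\mathrm{score}(h_t, \bar{h}_{s'})\big)} \\
c_t &= \sum_{s} a_t(s)\, \bar{h}_s
\end{align*}
```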

Note that in a real-world scenario, the attention mechanism would be incorporated into a larger model, such as a sequence-to-sequence model.

In [20]:
import numpy as np

# Encoder hidden states: one row per source position (4 positions, hidden size 3)
encoder_hidden_states = np.array([[0.1, 0.2, 0.6],
                                  [0.4, 0.5, 0.6],
                                  [0.7, 0.8, 0.9],
                                  [0.7, 0.4, 0.4]])

# Decoder hidden state at the current target step (hidden size 3)
decoder_hidden_state = np.array([0.5, 0.6, 0.7])

# Calculate attention scores (dot)
attention_scores = np.dot(decoder_hidden_state, np.transpose(encoder_hidden_states))

# Apply softmax to get attention weights
attention_weights = np.exp(attention_scores) / np.sum(np.exp(attention_scores))

# Calculate context vector
context_vector = np.dot(attention_weights, encoder_hidden_states)

print("Attention Weights:", attention_weights)
print("Context Vector:", context_vector)

Attention Weights: [0.16390732 0.22798986 0.39123216 0.21687066]
Context Vector: [0.53325865 0.54651039 0.67399552]
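
The same computation can be wrapped in a small reusable function. The sketch below is one possible arrangement, not part of the paper: the names `softmax` and `dot_attention` are hypothetical helpers introduced here. It also subtracts the maximum score before exponentiating, a common trick that leaves the softmax result unchanged while avoiding overflow for large scores.

```python
import numpy as np

def softmax(scores):
    # Hypothetical helper: shifting by the max does not change the result
    # but keeps np.exp from overflowing when scores are large.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

def dot_attention(decoder_state, encoder_states):
    # Hypothetical helper implementing the "dot" score.
    # One score per source position: dot product of each encoder row with the decoder state.
    scores = encoder_states @ decoder_state
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder rows.
    context = weights @ encoder_states
    return weights, context

encoder_hidden_states = np.array([[0.1, 0.2, 0.6],
                                  [0.4, 0.5, 0.6],
                                  [0.7, 0.8, 0.9],
                                  [0.7, 0.4, 0.4]])
decoder_hidden_state = np.array([0.5, 0.6, 0.7])

weights, context = dot_attention(decoder_hidden_state, encoder_hidden_states)
print("Attention Weights:", weights)
print("Context Vector:", context)
```

With the inputs above, this should reproduce the weights and context vector printed by the previous cell, since the max shift cancels out in the softmax ratio.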


Compute attention using the "general" score from Section 3.1 of the paper.

Note: This is only an example, so Wa is initialized with random values. In a real-world scenario, Wa would be a learned parameter, and the attention mechanism would be incorporated into a larger model, such as a sequence-to-sequence model. Because no random seed is set, the printed values change from run to run.
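
In the paper's notation, with h_t the decoder hidden state and \bar{h}_s an encoder hidden state, the general score places the matrix W_a between the two vectors. The softmax and context-vector steps are the same as for the dot score.

```latex
\[
\mathrm{score}(h_t, \bar{h}_s) = h_t^\top W_a \bar{h}_s
\]
```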

In [21]:
# Weight matrix Wa (learned during training in a real model; random here)
Wa = np.random.rand(3, 3)

# Calculate attention scores (general)
attention_scores = np.dot(np.dot(decoder_hidden_state, Wa), np.transpose(encoder_hidden_states))

# Apply softmax to get attention weights
attention_weights = np.exp(attention_scores) / np.sum(np.exp(attention_scores))

# Calculate context vector
context_vector = np.dot(attention_weights, encoder_hidden_states)

print("Attention Weights:", attention_weights)
print("Context Vector:", context_vector)

Attention Weights: [0.1079391  0.18923605 0.50715771 0.19566713]
Context Vector: [0.57846572 0.60019887 0.71301389]
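
One quick sanity check on the general score: when Wa is the identity matrix, it reduces to the dot score, so both variants should give the same attention weights. The sketch below illustrates this under a few assumptions: `softmax`, `general_attention`, and the seed value 0 are hypothetical choices made here for illustration, and the seeded generator is used only to make the random Wa repeatable.

```python
import numpy as np

def softmax(scores):
    # Hypothetical helper: numerically stable softmax.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / np.sum(exp_scores)

def general_attention(decoder_state, encoder_states, Wa):
    # Hypothetical helper implementing the "general" score h_t^T Wa h_s
    # for every encoder row h_s at once.
    scores = (decoder_state @ Wa) @ encoder_states.T
    weights = softmax(scores)
    context = weights @ encoder_states
    return weights, context

encoder_hidden_states = np.array([[0.1, 0.2, 0.6],
                                  [0.4, 0.5, 0.6],
                                  [0.7, 0.8, 0.9],
                                  [0.7, 0.4, 0.4]])
decoder_hidden_state = np.array([0.5, 0.6, 0.7])

# With Wa = I the general score equals the plain dot product.
weights_identity, _ = general_attention(
    decoder_hidden_state, encoder_hidden_states, np.eye(3))
weights_dot = softmax(encoder_hidden_states @ decoder_hidden_state)
print("Identity Wa matches dot:", np.allclose(weights_identity, weights_dot))

# A seeded generator makes the random Wa repeatable (seed chosen arbitrarily).
rng = np.random.default_rng(0)
Wa = rng.random((3, 3))
weights_random, context_random = general_attention(
    decoder_hidden_state, encoder_hidden_states, Wa)
print("Attention Weights:", weights_random)
print("Context Vector:", context_random)
```

Because the seeded Wa differs from the unseeded matrix used in the cell above, the weights printed here are not expected to match that cell's output.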
