### Manual Attention Calculation

In [1]:
import numpy as np

In [2]:
# Define query and key vectors
q = np.array([1, 2, 1.])
k1 = np.array([-1, -1, 3.])
k2 = np.array([1, 2, -5.])

In [3]:
# Attention score for each key: dot product with the query (both scores are 0 here)
attention_scores_k1 = np.dot(q, k1)
attention_scores_k2 = np.dot(q, k2)

In [4]:
# Apply softmax over the two scores to get attention weights
exp_k1 = np.exp(attention_scores_k1)
exp_k2 = np.exp(attention_scores_k2)
attention_weights_k1 = exp_k1 / (exp_k1 + exp_k2)
attention_weights_k2 = exp_k2 / (exp_k1 + exp_k2)
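
The softmax step above can also be written once for any number of scores. The following is a minimal sketch, not part of the original notebook: the helper name `softmax` is an assumption, and subtracting the maximum score before exponentiating is a common stabilization trick that leaves the result unchanged while avoiding overflow for large scores.

```python
import numpy as np

def softmax(scores):
    """Softmax over a 1-D array of scores (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # largest exponent becomes exp(0) = 1
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Same scores as in the cells above: both dot products equal 0
weights = softmax([0.0, 0.0])
print(weights)  # [0.5 0.5]
```

Because only the differences between scores matter, equal scores always produce equal weights, whatever their magnitude.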

In [5]:
# Use the keys as the value vectors (this example assumes values equal keys)
value_k1 = k1  # Value vector corresponding to key k1
value_k2 = k2  # Value vector corresponding to key k2

In [6]:
# Context vector: attention-weighted sum of the value vectors
context_vector = attention_weights_k1 * value_k1 + attention_weights_k2 * value_k2

In [7]:
print("Context Vector:", context_vector)
print("Attention Weights for k1:", attention_weights_k1)
print("Attention Weights for k2:", attention_weights_k2)

Context Vector: [ 0.   0.5 -1. ]
Attention Weights for k1: 0.5
Attention Weights for k2: 0.5
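
The manual steps above (scores, softmax, weighted sum) can be collapsed into a single vectorized function. This is a minimal sketch under stated assumptions: the name `dot_product_attention` and the optional `values` argument are hypothetical, it handles one query at a time, and it uses unscaled dot-product scores as in the cells above.

```python
import numpy as np

def dot_product_attention(query, keys, values=None):
    """Unscaled dot-product attention for a single query (hypothetical helper)."""
    query = np.asarray(query, dtype=float)
    keys = np.asarray(keys, dtype=float)
    # As in the notebook, fall back to the keys when no values are given
    values = keys if values is None else np.asarray(values, dtype=float)
    scores = keys @ query                       # one score per key
    exp_scores = np.exp(scores - scores.max())  # stabilized softmax
    weights = exp_scores / exp_scores.sum()
    context = weights @ values                  # weighted sum of value vectors
    return context, weights

q = np.array([1, 2, 1.])
K = np.array([[-1, -1, 3.], [1, 2, -5.]])  # rows are k1 and k2
context, weights = dot_product_attention(q, K)
print(context)  # [ 0.   0.5 -1. ]
print(weights)  # [0.5 0.5]
```

Stacking the keys as rows of a matrix lets the same code handle any number of keys, and passing a separate `values` array covers the case where values differ from keys.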


### With Keras

In [8]:
import tensorflow as tf


In [9]:
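# Inputs are [query, value]; with no key given, Keras uses the value as the key.
# By default the layer computes unscaled dot-product scores (use_scale=False).
tf.keras.layers.Attention()([np.array([q]), np.array([k1, k2])], return_attention_scores=True)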

(<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0. ,  0.5, -1. ]], dtype=float32)>,
 <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.5, 0.5]], dtype=float32)>)