## **Additive Attention**

Additive attention, also known as Bahdanau attention, computes attention weights with a learned scoring function. The query and each key are first projected by separate linear layers. The two projections are summed and passed through a non-linear activation (typically tanh), and a final linear layer maps the result to a single scalar score per key. A softmax over these scores gives the attention weights, which are then used to form a weighted sum of the values.
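Written out, the description above corresponds roughly to the following equations. The symbols are notational assumptions introduced here, not names from the notebook: $q$ is the query, $k_i$ and $v_i$ are the $i$-th key and value, $W_q$, $W_k$, $b_q$, $b_k$ are the weights and biases of the two projection layers, and $w$, $c$ are the weight vector and bias of the final scoring layer (`nn.Linear` includes a bias by default).

$$
e_i = w^\top \tanh\left(W_q q + b_q + W_k k_i + b_k\right) + c,
\qquad
\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)},
\qquad
\text{context} = \sum_i \alpha_i v_i
$$

Here $e_i$ is the unnormalized score (energy) for position $i$, $\alpha_i$ is its attention weight, and the context vector is the weighted sum returned by the model.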

**Imports**

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

**Data Loading**

In [None]:
X = np.random.randn(10, 20)  # Random input data (10 items, 20 features); not used in the attention call below
query = torch.randn(1, 20)  # A single query vector
key = torch.randn(10, 20)  # One key vector per item
value = torch.randn(10, 20)  # One value vector per item

**Additive Attention Model**

In [None]:
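class AdditiveAttention(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.query_layer = nn.Linear(input_dim, hidden_dim)  # projects the query
        self.key_layer = nn.Linear(input_dim, hidden_dim)    # projects each key
        self.energy_layer = nn.Linear(hidden_dim, 1)         # maps each hidden vector to a scalar score

    def forward(self, query, key, value):
        # query: (1, input_dim); key, value: (seq_len, input_dim)
        query = self.query_layer(query)   # (1, hidden_dim)
        key = self.key_layer(key)         # (seq_len, hidden_dim)
        energy = torch.tanh(query + key)  # query broadcasts over keys: (seq_len, hidden_dim)
        # squeeze(-1) drops only the score dimension, so seq_len == 1 still works
        scores = self.energy_layer(energy).squeeze(-1)    # (seq_len,)
        attention_weights = torch.softmax(scores, dim=0)  # non-negative, sums to 1
        weighted_sum = torch.sum(attention_weights.unsqueeze(1) * value, dim=0)  # (input_dim,)
        return weighted_sum, attention_weights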

**Instantiate and Apply Attention**

In [None]:
attention = AdditiveAttention(input_dim=20, hidden_dim=32)
weighted_sum, attention_weights = attention(query, key, value)

**Display Results**

In [None]:
print("Weighted Sum:", weighted_sum)
print("Attention Weights:", attention_weights)
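As a quick sanity check on results like these, the same computation can be repeated step by step outside the class. This is a minimal sketch rather than part of the original notebook: the stand-alone layers and the names `scores`, `weights`, and `context` are assumptions chosen for illustration, and the fixed seed is only there to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is repeatable

seq_len, input_dim, hidden_dim = 10, 20, 32  # same sizes as the cells above

# Stand-alone layers mirroring the three projections in AdditiveAttention
query_layer = nn.Linear(input_dim, hidden_dim)
key_layer = nn.Linear(input_dim, hidden_dim)
energy_layer = nn.Linear(hidden_dim, 1)

query = torch.randn(1, input_dim)
key = torch.randn(seq_len, input_dim)
value = torch.randn(seq_len, input_dim)

with torch.no_grad():  # inspection only, no gradients needed
    energy = torch.tanh(query_layer(query) + key_layer(key))  # (seq_len, hidden_dim)
    scores = energy_layer(energy).squeeze(-1)                  # (seq_len,)
    weights = torch.softmax(scores, dim=0)                     # (seq_len,)
    context = weights @ value                                  # (input_dim,)

print("Weights shape:", weights.shape)
print("Context shape:", context.shape)
print("Sum of weights:", weights.sum().item())
```

The attention weights should form a probability distribution over the 10 items, and the context vector should have the same dimensionality as a single value vector.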