# Attention Mechanism with LSTM

This notebook demonstrates how to use **Attention** with LSTMs.

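🔹 LSTMs capture sequential dependencies, but on long sequences information from early time steps can fade before it reaches the final hidden state.
🔹 Attention assigns a weight to the hidden state of each time step, so the model can **focus on important time steps** instead of compressing everything into the final hidden state.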

📌 Applications: Machine Translation, Text Summarization, Question Answering.
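
To make the idea of "focusing on important time steps" concrete, here is a minimal NumPy sketch of dot-product attention for a single query vector. The function name `attend`, the toy sizes, and the choice of the last state as the query are illustrative assumptions, not part of the model built later in this notebook.

```python
import numpy as np

def attend(query, states):
    """Dot-product attention of one query vector over a sequence of states.

    query:  shape (units,)
    states: shape (timesteps, units)
    """
    scores = states @ query                   # one similarity score per time step
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    context = weights @ states                # weighted average of the states
    return context, weights

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 4))   # 5 time steps, 4 hidden units (toy sizes)
query = states[-1]                 # assumption: use the final state as the query
context, weights = attend(query, states)

print("attention weights:", np.round(weights, 3))
print("weights sum:", weights.sum())
print("context shape:", context.shape)
```

The weights are non-negative and sum to 1, so the context vector is a weighted average in which time steps that resemble the query contribute more.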

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import LSTM, Dense, Embedding, Input, Attention

print("TensorFlow version:", tf.__version__)

## Create Toy Sequence Data

In [None]:
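def generate_data(n_samples, timesteps, vocab_size):
    # Token ids are drawn from 1..vocab_size-1 (the upper bound is exclusive)
    X = np.random.randint(1, vocab_size, (n_samples, timesteps))
    # Label is the parity of the token sum: even sum → class 1, else 0
    y = (np.sum(X, axis=1) % 2 == 0).astype(int)
    return X, y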

n_samples, timesteps, vocab_size = 1000, 10, 50
X, y = generate_data(n_samples, timesteps, vocab_size)
print("X shape:", X.shape, "y shape:", y.shape)
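
The labelling rule can be checked by hand on a tiny batch, and the class balance can be estimated on a larger one. This is only a sanity-check sketch: the names `X_tiny`, `X_big` and the fixed seed are assumptions introduced here, and the sizes simply mirror the values used above.

```python
import numpy as np

# Two hand-made sequences whose sums are easy to verify.
X_tiny = np.array([[1, 2, 3],    # sum = 6 (even)
                   [1, 1, 1]])   # sum = 3 (odd)
y_tiny = (np.sum(X_tiny, axis=1) % 2 == 0).astype(int)
print("tiny labels:", y_tiny)

# Share of positive labels on a larger random batch (seeded for repeatability).
X_big = np.random.default_rng(0).integers(1, 50, size=(1000, 10))
y_big = (X_big.sum(axis=1) % 2 == 0).astype(int)
print("positive share:", y_big.mean())
```

A positive share close to 0.5 means that a classifier which always predicts one class reaches roughly 50% accuracy, which is a useful baseline when reading the training log.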

## Build Attention-based LSTM Model

In [None]:
inputs = Input(shape=(timesteps,))
embed = Embedding(input_dim=vocab_size, output_dim=32)(inputs)

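# Return the hidden state at every time step so attention can weigh them
lstm_out = LSTM(64, return_sequences=True)(embed)

# Self-attention: lstm_out serves as both query and value (dot-product scores)
attention = Attention()([lstm_out, lstm_out])
# Average over the time axis with a Keras layer; calling tf.reduce_mean directly
# on a symbolic Keras tensor raises an error under Keras 3
context_vector = tf.keras.layers.GlobalAveragePooling1D()(attention)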

# Output Layer
output = Dense(1, activation='sigmoid')(context_vector)

model = Model(inputs, output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
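
To see which tensor shapes flow through the attention step, the computation can be imitated in plain NumPy. This sketch assumes the Keras `Attention` layer's default unscaled dot-product scoring with the same tensor used as query and value. The random array `h` is only a stand-in for `lstm_out`, and `softmax` is a helper defined here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

batch, timesteps, units = 2, 10, 64
rng = np.random.default_rng(0)
h = rng.normal(size=(batch, timesteps, units))   # stand-in for lstm_out

scores = h @ h.transpose(0, 2, 1)    # (batch, timesteps, timesteps)
weights = softmax(scores, axis=-1)   # each row is a distribution over time steps
attended = weights @ h               # (batch, timesteps, units)
context = attended.mean(axis=1)      # (batch, units), fed to the Dense layer

print("scores:", scores.shape)
print("attended:", attended.shape)
print("context:", context.shape)
```

Averaging over the time axis is one simple way to collapse the attended sequence into a single vector per sample. Other pooling choices (for example taking the last step or the maximum) would also give a `(batch, units)` tensor.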

## Train Model

In [None]:
history = model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=1)

## Test Prediction

In [None]:
sample = np.random.randint(1, vocab_size, (1, timesteps))
pred = model.predict(sample)
print("Input sequence:", sample)
print("Predicted probability:", pred[0][0])
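
The predicted probability can be turned into a hard class and compared with the ground truth given by the toy labelling rule. The helpers `to_class` and `parity_label`, the 0.5 threshold, the hand-written sequence, and the probability value are all illustrative assumptions. The probability in particular is a placeholder, not an output of the trained model.

```python
import numpy as np

def to_class(prob, threshold=0.5):
    """Map a sigmoid probability to a hard 0/1 label."""
    return int(prob >= threshold)

def parity_label(sequence):
    """Ground-truth label under the toy rule: 1 if the token sum is even."""
    return int(np.sum(sequence) % 2 == 0)

sample_seq = np.array([3, 7, 12, 5, 9, 20, 1, 4, 8, 2])  # made-up sequence, sum = 71
example_prob = 0.62                                       # placeholder probability

print("predicted class:", to_class(example_prob))
print("true class:", parity_label(sample_seq))
```

In the notebook itself, `pred[0][0]` and `sample[0]` would take the place of `example_prob` and `sample_seq`.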