In [2]:
# %pip installs into the environment of the running kernel
%pip install --user numpy


Note: you may need to restart the kernel to use updated packages.


In [3]:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class AttentionLayer(Layer):
    """Additive (Bahdanau-style) attention.

    Inputs:  [encoder_outputs (batch, T_enc, H_enc), decoder_outputs (batch, T_dec, H_dec)]
    Outputs: [context_vector (batch, T_dec, H_enc), attention_weights (batch, T_dec, T_enc)]
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        enc_dim = input_shape[0][-1]  # H_enc
        dec_dim = input_shape[1][-1]  # H_dec
        # Define the trainable parameters of the additive scoring function
        self.W_a = self.add_weight(name='W_a',
                                   shape=(enc_dim, enc_dim),
                                   initializer='glorot_uniform',
                                   trainable=True)
        self.U_a = self.add_weight(name='U_a',
                                   shape=(dec_dim, enc_dim),
                                   initializer='glorot_uniform',
                                   trainable=True)
        self.V_a = self.add_weight(name='V_a',
                                   shape=(enc_dim, 1),
                                   initializer='glorot_uniform',
                                   trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # Inputs: [encoder_outputs, decoder_outputs]
        encoder_outputs, decoder_outputs = inputs

        # Project both sequences into the same H_enc-dimensional space
        enc_proj = tf.tensordot(encoder_outputs, self.W_a, axes=1)  # (batch, T_enc, H_enc)
        dec_proj = tf.tensordot(decoder_outputs, self.U_a, axes=1)  # (batch, T_dec, H_enc)

        # Alignment scores: broadcast so every decoder step is paired with every encoder step
        score = tf.tanh(tf.expand_dims(dec_proj, 2) + tf.expand_dims(enc_proj, 1))  # (batch, T_dec, T_enc, H_enc)
        score = tf.squeeze(tf.tensordot(score, self.V_a, axes=1), axis=-1)         # (batch, T_dec, T_enc)

        # Attention weights: softmax over the encoder time axis
        attention_weights = tf.nn.softmax(score, axis=-1)

        # Context vector: weighted sum of encoder outputs for each decoder step
        context_vector = tf.matmul(attention_weights, encoder_outputs)  # (batch, T_dec, H_enc)

        return context_vector, attention_weights

    def compute_output_shape(self, input_shape):
        enc_shape, dec_shape = input_shape
        return [(dec_shape[0], dec_shape[1], enc_shape[-1]),  # Context vector shape
                (dec_shape[0], dec_shape[1], enc_shape[1])]   # Attention weights shape


# Attention Layer Explanation

The `attention.py` file defines an **Attention Layer** for neural networks, specifically for Sequence-to-Sequence (Seq2Seq) models. At each decoding step, the layer lets the model focus on the parts of the input sequence most relevant to the word being generated, which typically improves summary quality, especially for long inputs.

---

## **Key Functions of the Attention Layer**

### **Purpose of Attention**
- Attention allows the decoder to dynamically focus on specific parts of the encoder's outputs (contextual representations) while generating each word in the summary.
- Instead of treating all encoder outputs equally, the decoder assigns a **weight** to each encoder output (a softmax-normalized attention score) based on its relevance to the current decoding step.

---

## **Code Breakdown**

### 1. **`build(self, input_shape)`**
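- Initializes trainable weight matrices for the attention mechanism (`H_enc` and `H_dec` denote the last dimensions of the encoder and decoder outputs):
  - **`W_a`** (`H_enc × H_enc`): Projects encoder outputs into a shared attention space of size `H_enc`.
  - **`U_a`** (`H_dec × H_enc`): Projects decoder outputs into the same space, so they can be added to the projected encoder outputs.
  - **`V_a`** (`H_enc × 1`): Reduces each combined vector to a single scalar score per encoder time step (for each decoder step).
- Uses the `glorot_uniform` initializer, which samples from a uniform distribution scaled by each matrix's fan-in and fan-out so that activation and gradient variances stay roughly constant, helping gradient flow early in training.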

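As a rough illustration of the parameter shapes and the initializer, the snippet below is a NumPy sketch (not the Keras implementation) that creates the three matrices with the shapes used in `build`, sampling each from $[-l, l]$ with $l = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$, the range Keras documents for `glorot_uniform`. The helper names `glorot_uniform` and `init_attention_params` and the sizes `enc_dim=8`, `dec_dim=6` are illustrative assumptions.

```python
import numpy as np

def glorot_uniform(shape, rng):
    """Sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)) for a 2-D shape."""
    fan_in, fan_out = shape[0], shape[1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

def init_attention_params(enc_dim, dec_dim, seed=0):
    """Hypothetical helper: create W_a, U_a, V_a with the shapes used by AttentionLayer.build."""
    rng = np.random.default_rng(seed)
    return {
        "W_a": glorot_uniform((enc_dim, enc_dim), rng),  # encoder features -> attention space
        "U_a": glorot_uniform((dec_dim, enc_dim), rng),  # decoder features -> attention space
        "V_a": glorot_uniform((enc_dim, 1), rng),        # attention space -> scalar score
    }

params = init_attention_params(enc_dim=8, dec_dim=6)
for name, value in params.items():
    print(name, value.shape)  # W_a (8, 8), U_a (6, 8), V_a (8, 1)
```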
---

### 2. **`call(self, inputs)`**
Processes the inputs to calculate attention weights and context vectors:

- **Inputs**: 
  - `encoder_outputs`: Encoder hidden states for all time steps, shape `(batch, T_enc, H_enc)`.
  - `decoder_outputs`: Decoder hidden states, shape `(batch, T_dec, H_dec)`; attending from a single decoding step is the special case `T_dec = 1`.

- **Steps** (writing $h_s$ for the encoder output at encoder step $s$ of $T_{enc}$, with size $H_{enc}$, and $d_t$ for the decoder output at decoder step $t$, both as row vectors to match the code's `x @ W` convention):
  1. **Compute Alignment Scores**:
     \[
     u_{t,s} = \tanh\left(h_s W_a + d_t U_a\right)
     \]
     Combines encoder step $s$ and decoder step $t$ in a shared $H_{enc}$-dimensional space. The result is still a vector; the next step turns it into a scalar.

  2. **Apply Scoring Weights**:
     \[
     e_{t,s} = u_{t,s} V_a
     \]
     Reduces each combined vector to a scalar score, one per (decoder step, encoder step) pair.

  3. **Compute Attention Weights**:
     \[
     \alpha_{t,s} = \frac{\exp(e_{t,s})}{\sum_{s'=1}^{T_{enc}} \exp(e_{t,s'})}
     \]
     A softmax over the encoder time axis: for each decoder step the weights are non-negative and sum to 1, expressing the relative importance of each encoder time step.

  4. **Compute Context Vector**:
     \[
     c_t = \sum_{s=1}^{T_{enc}} \alpha_{t,s} \, h_s
     \]
     A weighted sum of encoder outputs that emphasizes the input positions most relevant to decoder step $t$.

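To make the four steps concrete, here is a NumPy sketch of the same additive (Bahdanau-style) computation for a whole batch, assuming encoder outputs of shape `(batch, T_enc, H_enc)` and decoder outputs of shape `(batch, T_dec, H_dec)`. It broadcasts every decoder step against every encoder step. The function name `additive_attention`, the random inputs, and the sizes are hypothetical and only serve to check the shapes and the sum-to-one property.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the max improves numerical stability without changing the result
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(encoder_outputs, decoder_outputs, W_a, U_a, V_a):
    """Hypothetical NumPy version of the four steps.

    encoder_outputs: (batch, T_enc, H_enc), decoder_outputs: (batch, T_dec, H_dec)
    Returns context (batch, T_dec, H_enc) and weights (batch, T_dec, T_enc).
    """
    enc_proj = encoder_outputs @ W_a          # (batch, T_enc, H_enc)
    dec_proj = decoder_outputs @ U_a          # (batch, T_dec, H_enc)
    # Step 1: pair every decoder step t with every encoder step s via broadcasting
    u = np.tanh(dec_proj[:, :, None, :] + enc_proj[:, None, :, :])  # (batch, T_dec, T_enc, H_enc)
    # Step 2: reduce each pair to a scalar score
    e = (u @ V_a)[..., 0]                     # (batch, T_dec, T_enc)
    # Step 3: normalise over the encoder time axis
    weights = softmax(e, axis=-1)
    # Step 4: weighted sum of encoder outputs for each decoder step
    context = weights @ encoder_outputs       # (batch, T_dec, H_enc)
    return context, weights

rng = np.random.default_rng(0)
batch, T_enc, T_dec, H_enc, H_dec = 2, 5, 3, 4, 6
enc = rng.normal(size=(batch, T_enc, H_enc))
dec = rng.normal(size=(batch, T_dec, H_dec))
W_a = rng.normal(size=(H_enc, H_enc))
U_a = rng.normal(size=(H_dec, H_enc))
V_a = rng.normal(size=(H_enc, 1))

context, weights = additive_attention(enc, dec, W_a, U_a, V_a)
print(context.shape, weights.shape)            # (2, 3, 4) (2, 3, 5)
print(np.allclose(weights.sum(axis=-1), 1.0))  # True
```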
---

### 3. **`compute_output_shape(self, input_shape)`**
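Defines the output shapes (with `T_enc` and `T_dec` the encoder and decoder sequence lengths and `H_enc` the encoder feature size):
- **Context Vector**: shape `(batch, T_dec, H_enc)`, one attention-weighted combination of encoder outputs per decoder step.
- **Attention Weights**: shape `(batch, T_dec, T_enc)`, a probability distribution over encoder time steps for each decoder step.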

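The shape bookkeeping for this kind of attention, where every decoder step attends over all encoder steps, can be written as a small helper. This is an illustrative sketch, not part of the original layer: `attention_output_shapes` is a hypothetical name, and `None` stands for an unknown batch size, as in Keras shapes.

```python
def attention_output_shapes(encoder_shape, decoder_shape):
    """Return (context_shape, weights_shape) for additive attention.

    encoder_shape: (batch, T_enc, H_enc); decoder_shape: (batch, T_dec, H_dec).
    """
    batch, t_enc, h_enc = encoder_shape
    _, t_dec, _ = decoder_shape
    context_shape = (batch, t_dec, h_enc)  # one H_enc-sized context vector per decoder step
    weights_shape = (batch, t_dec, t_enc)  # one distribution over encoder steps per decoder step
    return context_shape, weights_shape

print(attention_output_shapes((None, 100, 256), (None, 20, 256)))
# ((None, 20, 256), (None, 20, 100))
```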
---

## **How Attention Enhances the Model**

1. **Dynamic Focus**:
   - Attention enables the model to selectively prioritize specific encoder outputs during decoding, rather than processing all inputs uniformly.

2. **Improved Contextual Understanding**:
   - Helps capture dependencies between input and output sequences, particularly for long texts, where compressing the entire input into a single fixed-length vector (as a plain encoder-decoder does) tends to lose information.

3. **Better Summarization**:
   - By focusing on the most relevant parts of the input, the model tends to produce more accurate and contextually appropriate summaries.

4. **Interpretability**:
   - The attention weights show which input positions the decoder attended to at each output step, which makes the model's behavior easier to inspect (though attention weights are not a complete explanation of its decisions).

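As a sketch of how the weights can be inspected, the helper below picks, for each output step, the source token that received the largest attention weight. The token list and weight matrix are made-up toy values, not output from a trained model; with the real layer you would instead pass the attention weights (its second output) for one example, converted to a NumPy array of shape `(T_dec, T_enc)`. `top_source_positions` is a hypothetical helper name.

```python
import numpy as np

def top_source_positions(weights, source_tokens):
    """For each output step, return the source token with the largest attention weight.

    weights: (T_dec, T_enc) attention matrix for one example.
    """
    best = weights.argmax(axis=-1)  # index of the most-attended encoder step per decoder step
    return [source_tokens[i] for i in best]

# Toy, hand-written values for illustration only
source_tokens = ["the", "market", "fell", "sharply", "today"]
weights = np.array([
    [0.05, 0.70, 0.10, 0.10, 0.05],
    [0.05, 0.10, 0.75, 0.05, 0.05],
])
print(top_source_positions(weights, source_tokens))  # ['market', 'fell']
```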
---

## **In Summary**
The `AttentionLayer` enhances the Seq2Seq model with a mechanism that dynamically weights encoder outputs at each decoding step. This generally helps on long or complex input sequences, letting the model produce better summaries by focusing on the most relevant context, and the attention weights offer a degree of interpretability for its predictions.
