# Variants of LSTM RNN: Peephole Connections and Coupled Gates

This Jupyter Notebook explores two important architectural variants of Long Short-Term Memory (LSTM) Recurrent Neural Networks, providing insights into how minor modifications can alter information flow and model behavior.

## Key Learning Objectives:

### 1. **Understanding LSTM Variants**
* While the standard LSTM architecture is widely used, various modifications have been proposed to improve performance, simplify the model, or cater to specific use cases.
* We will focus on two such variants that you might encounter in research papers.

### 2. **Variant 1: LSTM with Peephole Connections**

* **Concept**: This variant enhances the standard LSTM by allowing the gate layers to "look at" (or "peephole into") the cell state. This means the cell state itself becomes an additional input to the gate calculations.
* **Diagrammatic Difference**: In the visual representation, you will observe additional lines running from the cell state into the neural network layers of all three gates:
    * **Forget Gate**: The calculation of $f_t$ now also takes $C_{t-1}$ as an input.
    * **Input Gate**: The calculation of $i_t$ now also takes $C_{t-1}$ as an input.
    * **Output Gate**: The calculation of $o_t$ now also takes $C_t$ (the *current* cell state, after update) as an input.
* **Mathematical Changes (Illustrative):**
    * **Forget Gate**: $f_t = \sigma(W_f \cdot [H_{t-1}, X_t, \mathbf{C_{t-1}}] + b_f)$
    * **Input Gate**: $i_t = \sigma(W_i \cdot [H_{t-1}, X_t, \mathbf{C_{t-1}}] + b_i)$
    * **Output Gate**: $o_t = \sigma(W_o \cdot [H_{t-1}, X_t, \mathbf{C_t}] + b_o)$
    * *(Note: The bolded $C_{t-1}$ and $C_t$ terms are the added peephole connections)*
* **Purpose/Advantage**: By allowing the gates to "see" the cell state, they can make more informed decisions about forgetting, adding, and outputting information. This can lead to more precise control over the memory flow and potentially better performance on certain tasks.
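The three gate equations above can be sketched as a single time step in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`affine`, `init_params`, `peephole_lstm_step`) and the random initialisation are assumptions made for the example, and the candidate and cell-update lines are assumed to be the standard LSTM ones, since the peephole variant described here changes only the gates. Plain lists are used so that the cell runs without any extra libraries.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W . v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def init_params(n_in, n_hid, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    gate_cols = n_hid + n_in + n_hid          # gates see [H, X, C]
    params = {name: mat(n_hid, gate_cols) for name in ("W_f", "W_i", "W_o")}
    params["W_c"] = mat(n_hid, n_hid + n_in)  # candidate sees only [H, X]
    for name in ("b_f", "b_i", "b_o", "b_c"):
        params[name] = [0.0] * n_hid
    return params

def peephole_lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step in which every gate also reads the cell state."""
    # Forget and input gates peek at the previous cell state C_{t-1}.
    z_prev = h_prev + x_t + c_prev            # list concatenation [H_{t-1}, X_t, C_{t-1}]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z_prev, p["b_f"])]
    i_t = [sigmoid(a) for a in affine(p["W_i"], z_prev, p["b_i"])]
    # Assumption: candidate and cell update are unchanged from the standard LSTM.
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], h_prev + x_t, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Output gate peeks at the updated cell state C_t.
    z_cur = h_prev + x_t + c_t                # [H_{t-1}, X_t, C_t]
    o_t = [sigmoid(a) for a in affine(p["W_o"], z_cur, p["b_o"])]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

params = init_params(n_in=2, n_hid=3)
h_t, c_t = peephole_lstm_step([0.5, -1.0], [0.0] * 3, [0.0] * 3, params)
```

Note the ordering inside the step: $C_t$ has to be computed before the output gate, because $o_t$ reads the updated cell state rather than $C_{t-1}$.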

### Diagram

![LSTM cell with peephole connections](images/lstm_peephole_connections_variation.png)

### 3. **Variant 2: Coupled Forget and Input Gates**

* **Concept**: This variant simplifies the LSTM by coupling the decisions of the forget and input gates. Instead of separately deciding what to forget and what to add, these two decisions are made together in an interdependent manner.
* **Core Idea**: "We only forget when we are going to input something in its place." Or, "We only input new values to the state when we forget something older."
* **Diagrammatic Difference**: Visually, the separate input-gate sigmoid is replaced by a combined structure: the forget gate's sigmoid output is routed through a $1 - f_t$ operation, and that value scales the candidate in place of $i_t$.
* **Mathematical Change**: The key change lies in how the cell state ($C_t$) is updated:
    * $C_t = (f_t \times C_{t-1}) + ((1 - f_t) \times \tilde{C}_t)$
* **Explanation of the Equation**:
    * $f_t$: This is the output of the forget gate (sigmoid).
    * $(f_t \times C_{t-1})$: This part is the same as in standard LSTM – what's kept from the old memory.
    * $(1 - f_t)$: This is the crucial coupling. If the model decides to *forget* (i.e., $f_t$ is low, close to 0), then $(1 - f_t)$ will be high (close to 1). Conversely, if it decides to *keep* (i.e., $f_t$ is high, close to 1), then $(1 - f_t)$ will be low (close to 0).
    * $((1 - f_t) \times \tilde{C}_t)$: This means that when the model forgets a certain amount of information (as determined by $f_t$), it *simultaneously* adds a corresponding amount of new candidate information ($\tilde{C}_t$) into that "freed up" space.
* **Purpose/Advantage**: Because the input gate's value is now simply $1 - f_t$, the input gate no longer needs its own $W_i$ and $b_i$. The model therefore has fewer parameters, which can sometimes make training more stable or efficient. The coupling also enforces a direct trade-off: if you remember something, you can't add new information in that slot; if you forget something, you make space for new information.
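To make the coupling concrete, here is a small sketch of the coupled update applied element by element, followed by a rough parameter count. The function names and the example sizes are illustrative assumptions, and the count assumes each block is one weight matrix over the concatenated $[H_{t-1}, X_t]$ plus one bias vector.

```python
def coupled_cell_update(f_t, c_prev, c_tilde):
    """C_t = f_t * C_{t-1} + (1 - f_t) * C~_t, applied element by element."""
    return [f * c + (1.0 - f) * g for f, c, g in zip(f_t, c_prev, c_tilde)]

def lstm_param_count(n_in, n_hid, n_blocks):
    """Weights plus biases for n_blocks layers that each read [H_{t-1}, X_t]."""
    return n_blocks * (n_hid * (n_hid + n_in) + n_hid)

c_prev = [2.0, 2.0, 2.0]      # old memory
c_tilde = [-1.0, -1.0, -1.0]  # candidate values
f_t = [1.0, 0.0, 0.25]        # keep everything, forget everything, keep a quarter

print(coupled_cell_update(f_t, c_prev, c_tilde))  # [2.0, -1.0, -0.25]

# Standard LSTM: forget, input, output and candidate blocks (4).
# Coupled variant: the input block is dropped (3).
print(lstm_param_count(n_in=10, n_hid=20, n_blocks=4))  # 2480
print(lstm_param_count(n_in=10, n_hid=20, n_blocks=3))  # 1860
```

Because $f_t$ and $(1 - f_t)$ sum to 1, each element of $C_t$ is a weighted average of the old memory and the candidate, so it always lies between the two. Under the counting assumption above, dropping the input block removes one quarter of the cell's parameters.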

### Diagram

![LSTM cell with coupled forget and input gates](images/lstm-coupling-forget-and_ip_gate_variation.png)

### 4. **Introduction to GRU (Gated Recurrent Unit)**
* The lecture briefly mentions GRU as another important variant of RNNs, often considered a simplified version of LSTM.
* GRUs aim to achieve similar performance to LSTMs with fewer gates and parameters, potentially leading to faster training and easier implementation.
* The detailed discussion of GRU architecture and its advantages/disadvantages is deferred to the next video.

### Conclusion:
These LSTM variants demonstrate the flexibility of the recurrent neural network architecture. Peephole connections allow gates to directly inform their decisions by looking at the cell state, while coupled gates enforce a direct relationship between forgetting and adding new information. Understanding these variations provides a deeper insight into the design choices and trade-offs in building effective sequence models.

**Next Video**: Deep dive into the architecture and working of Gated Recurrent Units (GRUs).