# Dissecting the LSTM Input Gate and Candidate Memory: How LSTMs Add New Information

This Jupyter notebook continues the deep dive into the Long Short-Term Memory (LSTM) architecture, focusing on its second and third components: the Input Gate and the Candidate Memory. Together, these two parts decide what *new* information from the current input should be added to the LSTM's long-term memory (the cell state).

## High-Level Understanding

![Input gate and candidate memory](images/input_gate_and_candidate_memory-1.png)

![Input gate and candidate memory](images/input_gate_and_candidate_memory.png)


## Key Learning Objectives:

### 1. **Recap: Forget Gate Equation**
* Briefly revisit the mathematical equation for the Forget Gate ($f_t$):
    * $f_t = \sigma(W_f \cdot [H_{t-1}, X_t] + b_f)$
    * Where:
        * $\sigma$ is the sigmoid activation function.
        * $W_f$ is the weight matrix for the Forget Gate.
        * $[H_{t-1}, X_t]$ is the concatenation of the previous hidden state and current input.
        * $b_f$ is the bias vector for the Forget Gate.
* Understand that this equation represents the internal neural network operation (linear transformation followed by sigmoid activation) that determines the forget factors.
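As a rough illustration of the recap above, the Forget Gate equation can be sketched in NumPy. The sizes, random weights, and variable names (`h_prev`, `x_t`, `W_f`, `b_f`) are assumptions made for this example, not values from the lecture; the weight matrix is assumed to have shape `(hidden_size, hidden_size + input_size)` so that it can multiply the concatenated vector.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes and randomly initialised parameters (illustration only).
rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2

h_prev = rng.standard_normal(hidden_size)   # H_{t-1}
x_t = rng.standard_normal(input_size)       # X_t
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

# [H_{t-1}, X_t]: concatenate previous hidden state and current input.
concat = np.concatenate([h_prev, x_t])

# Linear transformation followed by sigmoid gives one forget factor per cell-state element.
f_t = sigmoid(W_f @ concat + b_f)
print(f_t.shape)  # (3,)
```

Each entry of `f_t` lies strictly between 0 and 1 and scales the matching element of the previous cell state.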

### 2. **Focus: The Input Gate's Role**

* The **Input Gate** decides which parts of the *new information* generated from the current input and previous hidden state are important enough to be *added* to the cell state. It acts as a filter for new information.

#### **2.1. Input Gate Operation**
* **Inputs**: Similar to the Forget Gate, it takes the concatenated $H_{t-1}$ and $X_t$.
* **Neural Network Layer + Sigmoid Activation**: This input is passed through another neural network layer (with its own weights $W_i$ and bias $b_i$) and then through a sigmoid activation function.
    * **Mathematical Representation**: $i_t = \sigma(W_i \cdot [H_{t-1}, X_t] + b_i)$
    * The output, $i_t$, is a vector of values between 0 and 1. Each value indicates how much of the corresponding element of the *candidate memory* (discussed next) should be let through.
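The Input Gate operation can be sketched the same way. The helper name `input_gate` and the hand-picked values below are hypothetical; the weights are set to zero only so that the effect of the bias $b_i$ on the gate is easy to see.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(h_prev, x_t, W_i, b_i):
    """Compute i_t = sigmoid(W_i . [H_{t-1}, X_t] + b_i)."""
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_i @ concat + b_i)

# Hypothetical sizes and inputs (illustration only).
hidden_size, input_size = 3, 2
h_prev = np.array([0.1, -0.2, 0.3])
x_t = np.array([1.0, -1.0])

# With zero weights the gate depends only on the bias.
W_zero = np.zeros((hidden_size, hidden_size + input_size))

i_neutral = input_gate(h_prev, x_t, W_zero, np.zeros(hidden_size))
i_open = input_gate(h_prev, x_t, W_zero, np.full(hidden_size, 10.0))
i_closed = input_gate(h_prev, x_t, W_zero, np.full(hidden_size, -10.0))

print(i_neutral)  # every entry is 0.5, because sigmoid(0) = 0.5
```

A strongly positive pre-activation pushes the gate toward 1 (`i_open`), and a strongly negative one pushes it toward 0 (`i_closed`). In a trained LSTM the weights are learned, so the gate also depends on $H_{t-1}$ and $X_t$.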

### 3. **Focus: The Candidate Memory's Role**

* The **Candidate Memory** (often denoted as $\tilde{C}_t$) creates a *new candidate* version of the cell state. This candidate state contains potential new information that *could* be added to the long-term memory.

#### **3.1. Candidate Memory Operation**
* **Inputs**: Also takes the concatenated $H_{t-1}$ and $X_t$.
* **Neural Network Layer + Tanh Activation**: This input is passed through yet another neural network layer (with its own weights $W_c$ and bias $b_c$) and then through a **tanh activation function**.
    * **Mathematical Representation**: $\tilde{C}_t = \tanh(W_c \cdot [H_{t-1}, X_t] + b_c)$
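    * The tanh function squashes each value into the range $(-1, 1)$, producing a vector that represents the "candidate" new information. Because entries can be negative, the candidate can decrease as well as increase elements of the cell state.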
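A minimal sketch of the Candidate Memory computation, under the same assumptions as before (hypothetical sizes, random weights, and a helper named `candidate_memory` that is not part of any library):

```python
import numpy as np

def candidate_memory(h_prev, x_t, W_c, b_c):
    """Compute C~_t = tanh(W_c . [H_{t-1}, X_t] + b_c)."""
    concat = np.concatenate([h_prev, x_t])
    return np.tanh(W_c @ concat + b_c)

# Hypothetical sizes and randomly initialised parameters (illustration only).
rng = np.random.default_rng(42)
hidden_size, input_size = 4, 3

h_prev = rng.standard_normal(hidden_size)
x_t = rng.standard_normal(input_size)
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

c_tilde = candidate_memory(h_prev, x_t, W_c, b_c)
print(c_tilde.shape)  # (4,)
```

The only structural difference from the gates is the activation: tanh instead of sigmoid, so the entries of `c_tilde` can take either sign.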

### 4. **Combining Input Gate and Candidate Memory**

* The outputs of the Input Gate ($i_t$) and the Candidate Memory ($\tilde{C}_t$) are combined via **point-wise multiplication ($i_t \times \tilde{C}_t$)**.
* **Purpose**: This multiplication acts as a filter. $i_t$ (from the sigmoid) determines how much of the new candidate information ($\tilde{C}_t$) is actually allowed to pass through and potentially be added to the cell state.
    * If an element in $i_t$ is close to 0, the corresponding candidate value is almost entirely blocked.
    * If an element in $i_t$ is close to 1, the corresponding candidate value passes through almost unchanged.
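The filtering effect is easiest to see with hand-picked numbers. The values below are made up for illustration and do not come from a trained model.

```python
import numpy as np

# Hypothetical gate and candidate values (illustration only).
i_t = np.array([0.0, 0.5, 1.0])       # input gate: closed, half open, fully open
c_tilde = np.array([0.8, -0.6, 0.4])  # candidate memory

# Point-wise (element-wise) multiplication.
filtered = i_t * c_tilde
print(filtered)  # first entry blocked (0.0), second halved (-0.3), third unchanged (0.4)
```

In NumPy, `*` between two arrays of the same shape is element-wise, which matches the point-wise multiplication described above.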

### 5. **Updating the Cell State ($C_t$)**

* The final step in updating the long-term memory cell state ($C_t$) for the current time step involves combining the results from the Forget Gate and the Input Gate/Candidate Memory.
* **Mathematical Representation**: $C_t = (f_t \times C_{t-1}) + (i_t \times \tilde{C}_t)$
* **Explanation**:
    1.  $(f_t \times C_{t-1})$: This part represents the *filtered* old memory – what the Forget Gate decided to keep from $C_{t-1}$.
    2.  $(i_t \times \tilde{C}_t)$: This part represents the *new filtered information* – what the Input Gate decided to add from the candidate memory.
    3.  These two results are then combined via **point-wise addition**. This effectively updates the cell state by removing old irrelevant information and adding new relevant information, allowing the LSTM to maintain relevant context over long sequences.
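Putting the three equations together, one cell-state update can be sketched as a single function. The name `update_cell_state`, the `params` dictionary layout, and the zero-valued weights are assumptions chosen so the result can be checked by hand; this is a sketch of the update rule, not a full LSTM cell (the Output Gate and hidden state are not covered here).

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def update_cell_state(c_prev, h_prev, x_t, params):
    """Return C_t = f_t * C_{t-1} + i_t * C~_t for one time step."""
    concat = np.concatenate([h_prev, x_t])                      # [H_{t-1}, X_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])   # candidate memory
    return f_t * c_prev + i_t * c_tilde                         # point-wise multiply, then add

# Hypothetical sizes; all weights and biases are zero to keep the arithmetic simple.
hidden_size, input_size = 2, 3
shape = (hidden_size, hidden_size + input_size)
params = {
    "W_f": np.zeros(shape), "b_f": np.zeros(hidden_size),
    "W_i": np.zeros(shape), "b_i": np.zeros(hidden_size),
    "W_c": np.zeros(shape), "b_c": np.zeros(hidden_size),
}

c_prev = np.array([2.0, -4.0])
h_prev = np.zeros(hidden_size)
x_t = np.ones(input_size)

c_t = update_cell_state(c_prev, h_prev, x_t, params)
print(c_t)  # values 1.0 and -2.0: half of c_prev
```

With zero weights, both gates evaluate to $\sigma(0) = 0.5$ and the candidate to $\tanh(0) = 0$, so the update reduces to $C_t = 0.5 \times C_{t-1}$. With learned weights, each term varies per element and per time step.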

### Conclusion:
The Input Gate and Candidate Memory, working in tandem with the Forget Gate, are central to the LSTM's ability to learn long-term dependencies. The Candidate Memory proposes new information, and the Input Gate controls how much of it enters the cell state, so that the network can learn to store what is contextually relevant. Because the cell state is updated additively rather than being rewritten at every step, gradients can flow back through it more easily, which helps mitigate the vanishing gradient problem.

**Next Video**: The next lecture will cover the final component, the **Output Gate**, and then explain how the entire LSTM unit works together to produce the final hidden state.