## 🧠 LSTM RNN – Step-by-Step Explanation of Input Gate, Candidate Memory, and Cell State Update

---

### 🔄 Quick Recap of the LSTM RNN Structure

In an LSTM RNN, the cell has two memory pathways:
- **Short-Term Memory**: The hidden state `hₜ₋₁`
- **Long-Term Memory**: The cell state `Cₜ₋₁`

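The cell is controlled by three gates:
1. **Forget Gate** – Decides what old info to erase from the cell state.
2. **Input Gate** – Decides what new info to add to the cell state.
3. **Output Gate** – Decides how much of the updated cell state `Cₜ` is exposed as the new hidden state `hₜ`, which is what gets sent to the next layer/time step.

Only the forget and input gates take part in updating `Cₜ`; the output gate acts after the update.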
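
For reference, the forget gate, the output gate, and the hidden-state update are usually written as below. This is the standard LSTM formulation rather than something stated elsewhere in these notes; the names `W_f`, `b_f`, `W_o`, `b_o` are assumed by analogy with the input-gate weights used later.

\[
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
\]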

---

### 🟡 Step 1: Input Gate – Controls *What New Info to Store*

Think of the input gate as a **filter**.

It looks at:
- The **current input** (`xₜ`)
- The **previous hidden state** (`hₜ₋₁`)

It outputs a vector of values between 0 and 1 (after applying a sigmoid function). These values **scale** the amount of new information that can be added to the memory.

**Example**: If the input gate outputs `[0.8, 0.2, 0.0]`, it means:
- Store most of the new info in the first dimension
- Store a little in the second
- Store nothing in the third
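
The filtering idea can be sketched in a few lines of Python. This is a minimal illustration, not a full LSTM: `apply_gate` is a hypothetical helper, and the `proposed` vector is an assumed input borrowed from the candidate-memory example in Step 2.

```python
def apply_gate(gate, values):
    """Scale each value by its gate entry (element-wise product)."""
    if len(gate) != len(values):
        raise ValueError("gate and values must have the same length")
    return [g * v for g, v in zip(gate, values)]


input_gate = [0.8, 0.2, 0.0]   # gate output from the example above
proposed = [0.5, -0.3, 0.7]    # assumed new info (see Step 2)

admitted = apply_gate(input_gate, proposed)
print([round(a, 2) for a in admitted])  # [0.4, -0.06, 0.0]
```

A gate value of `0.0` blocks the third entry completely, no matter how large the proposed value is.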

---

### 🟩 Step 2: Candidate Memory – Proposes *What the New Info Is*

The **candidate memory** (also called "memory content") is a vector containing new information **proposed to be written** into the memory.

This vector is created using the current input and the previous hidden state, passed through a tanh activation (which outputs values between -1 and 1). It’s like a suggestion for what the cell could remember next.

**Example**: Candidate memory = `[0.5, -0.3, 0.7]`. This means the cell is proposing to:
- Write a moderately positive value to the 1st dimension
- Write a negative value to the 2nd, which pushes that memory entry downward
- Write a strongly positive value to the 3rd

---

### 🔁 Step 3: Combine with Forget Gate and Update Cell State

To compute the new cell state `Cₜ`:
1. The **Forget Gate** decides how much of the **old memory** `Cₜ₋₁` should be kept.
2. The **Input Gate** scales the **new proposed memory**.
3. The cell **adds** both parts together to update the memory.

This way:
- The model can **retain context** for long sequences.
- It can **delete unhelpful info** (via forget gate).
- It can **add new knowledge** (via input gate and candidate memory).
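
To make the "retain" and "delete" behaviour concrete, here is a small sketch that repeats the update on a single memory entry. All gate values are hypothetical and held constant across steps, which a trained LSTM would not do; `step` and `run` are illustrative helpers.

```python
def step(c_prev, forget, write, candidate):
    """One scalar cell-state update: keep part of the old value, add part of the new."""
    return forget * c_prev + write * candidate


def run(c0, forget, write, candidate, steps):
    """Apply the same update `steps` times and return the final cell value."""
    c = c0
    for _ in range(steps):
        c = step(c, forget, write, candidate)
    return c


# Forget gate at 1.0, input gate at 0.0: the value survives unchanged.
print(run(1.0, forget=1.0, write=0.0, candidate=0.5, steps=100))  # 1.0

# Forget gate at 0.5: the old value halves at every step.
print(run(1.0, forget=0.5, write=0.0, candidate=0.5, steps=3))    # 0.125

# Forget gate at 0.0, input gate at 1.0: the old value is replaced by the candidate.
print(run(1.0, forget=0.0, write=1.0, candidate=0.5, steps=1))    # 0.5
```

Because the old memory is only multiplied by the forget gate and then added to, a gate value near 1 lets information pass through many time steps almost untouched.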

---

### 🧪 Intuition with Numbers

Let’s say:
- The forget gate gives `[0.9, 0.2, 0.1]` → Keep 90% of old info in dim 1, 20% in dim 2, 10% in dim 3.
- The input gate gives `[0.88, 0.27, 0.5]` → Admit 88% of the proposed info in dim 1, 27% in dim 2, 50% in dim 3.
- The candidate memory gives `[0.46, -0.29, 0.76]`

Then:
- **Old memory contribution** = forget gate × old memory (element-wise)
- **New memory contribution** = input gate × candidate memory (element-wise)
- **Updated memory** = sum of the two contributions

This is how the LSTM learns **when to remember, when to forget, and what to focus on**.

---


## 🧠 LSTM RNN – Input Gate, Candidate Memory, and Cell State Update (`Cₜ`)

---

### 🟡 Step 1: Input Gate – Decides *What New Information to Add*

We compute the input gate vector `iₜ`:

\[
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\]

#### Example

Let:

- `xₜ = [1, 2, 3]` (current input)
- `hₜ₋₁ = [4, 5, 6]` (previous hidden state)

Concatenate them:

\[
[h_{t-1}, x_t] = [4, 5, 6, 1, 2, 3]
\]

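Assume `Wᵢ` is a 3×6 matrix (3 gate units, 6 concatenated inputs) and `bᵢ = [0.1, 0.2, 0.3]`.

Rather than writing out all 18 weights, suppose the pre-activation works out to:

\[
W_i \cdot [h_{t-1}, x_t] + b_i = [2.0, -1.0, 0.0]
\]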

Now apply the sigmoid function:

\[
i_t = \sigma([2.0, -1.0, 0.0]) = [0.8808, 0.2689, 0.5]
\]
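
The whole of Step 1 can be reproduced with a short sketch. The weight matrix `W_i` below is hypothetical: the text only fixes its shape and the resulting pre-activation, so the entries are picked purely to land on `[2.0, -1.0, 0.0]`. The helpers `sigmoid` and `affine` are illustrative names.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W · v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


h_prev = [4, 5, 6]
x_t = [1, 2, 3]
concat = h_prev + x_t  # [h_{t-1}, x_t] = [4, 5, 6, 1, 2, 3]

# Hypothetical 3x6 weights, chosen only to reproduce the pre-activation above.
W_i = [
    [0.0, 0.0, 0.0, 1.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, -0.3, 0.0, 0.0],
]
b_i = [0.1, 0.2, 0.3]

z = affine(W_i, concat, b_i)
i_t = [sigmoid(z_k) for z_k in z]

print([round(z_k, 4) for z_k in z])  # [2.0, -1.0, 0.0]
print([round(g, 4) for g in i_t])    # [0.8808, 0.2689, 0.5]
```

In a real network `W_i` would be dense and learned; the sparse matrix here only keeps the arithmetic easy to follow by hand.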

---

### 🟩 Step 2: Candidate Memory (`Ĉₜ`) – Proposes New Info to Store

\[
\hat{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)
\]

Assume:

\[
W_c \cdot [h_{t-1}, x_t] + b_c = [0.5, -0.3, 1.0]
\]

Then:

\[
\hat{C}_t = \tanh([0.5, -0.3, 1.0]) = [0.4621, -0.2913, 0.7616]
\]
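
These values are easy to check with Python's standard library. The sketch starts from the assumed pre-activation rather than from `W_c` and `b_c`, since the text does not give those.

```python
import math

# W_c · [h_{t-1}, x_t] + b_c, as assumed in the text above.
pre_activation = [0.5, -0.3, 1.0]
c_tilde = [math.tanh(z) for z in pre_activation]

print([round(c, 4) for c in c_tilde])  # [0.4621, -0.2913, 0.7616]

# tanh is bounded: even a very large pre-activation stays within [-1, 1].
print(math.tanh(50.0) <= 1.0, math.tanh(-50.0) >= -1.0)  # True True
```

Unlike the sigmoid used for gates, `tanh` keeps the sign of its input, which is what lets the candidate push a memory entry up or down.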

---

### 🧩 Step 3: Combine with Forget Gate to Compute `Cₜ`

Take the forget gate output from the intuition example above and assume a previous cell state:

- Forget Gate output:  
  `fₜ = [0.9, 0.2, 0.1]`
- Previous Cell State:  
  `Cₜ₋₁ = [0.5, 0.1, -0.4]`

Now compute new cell state:

\[
C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t
\]

Element-wise operations:

1. **Forget part**:  
   `fₜ * Cₜ₋₁ = [0.9×0.5, 0.2×0.1, 0.1×(-0.4)] = [0.45, 0.02, -0.04]`

2. **Input part**:  
   `iₜ * Ĉₜ = [0.8808×0.4621, 0.2689×(-0.2913), 0.5×0.7616]`  
   `= [0.4070, -0.0783, 0.3808]`

3. **Add both parts**:  
   `Cₜ = [0.45 + 0.4070, 0.02 + (-0.0783), -0.04 + 0.3808]`  
   `= [0.8570, -0.0583, 0.3408]`

---

### ✅ Final Result

\[
C_t = [0.8570, -0.0583, 0.3408]
\]

This updated **cell state** `Cₜ` now contains a blend of **retained memory** from `Cₜ₋₁` and **new relevant information** added via the input gate and candidate memory.
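
The update can be checked with a minimal Python sketch. It takes the rounded gate and candidate values from Steps 1 and 2 as given; `cell_state_update` is an illustrative helper name, not part of any library.

```python
def cell_state_update(f_t, c_prev, i_t, c_tilde):
    """Return C_t = f_t * C_{t-1} + i_t * candidate, element-wise."""
    return [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]


f_t = [0.9, 0.2, 0.1]                # forget gate
c_prev = [0.5, 0.1, -0.4]            # previous cell state
i_t = [0.8808, 0.2689, 0.5]          # rounded input gate values from Step 1
c_tilde = [0.4621, -0.2913, 0.7616]  # rounded candidate values from Step 2

c_t = cell_state_update(f_t, c_prev, i_t, c_tilde)
print([f"{c:.4f}" for c in c_t])  # ['0.8570', '-0.0583', '0.3408']
```

The third entry flips sign, from `-0.4` to about `0.34`: the forget gate keeps only 10% of the old value, while the input gate admits half of a large positive candidate.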

---
