## 🧠 LSTM RNN – Step-by-Step Explanation of Input Gate, Candidate Memory, and Cell State Update

---

### 🔄 Quick Recap of the LSTM RNN Structure

In an LSTM RNN, the cell has two memory pathways:
- **Short-Term Memory**: The hidden state `hₜ₋₁`
- **Long-Term Memory**: The cell state `Cₜ₋₁`

The cell uses three gates. The first two control the update to the memory (`Cₜ`); the third controls what is read out of it:
1. **Forget Gate** – Decides what old info to erase.
2. **Input Gate** – Decides what new info to add.
3. **Output Gate** – Decides how much of the updated memory is exposed as the new hidden state `hₜ`, which is passed to the next layer/time step.

---

### 🟡 Step 1: Input Gate – Controls *What New Info to Store*

Think of the input gate as a **filter**.

It looks at:
- The **current input** (`xₜ`)
- The **previous hidden state** (`hₜ₋₁`)

It outputs a vector of values between 0 and 1 (after applying a sigmoid function). These values **scale** the amount of new information that can be added to the memory.

**Example**: If the input gate outputs `[0.8, 0.2, 0.0]`, it means:
- Store most of the new info in the first dimension
- Store a little in the second
- Store nothing in the third
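The filtering idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than any library's API: `apply_gate` is a hypothetical helper, and the `proposed` vector of all ones is an assumed stand-in for the new information.

```python
def apply_gate(gate, values):
    """Scale each value by its gate entry (element-wise product)."""
    if len(gate) != len(values):
        raise ValueError("gate and values must have the same length")
    return [g * v for g, v in zip(gate, values)]


gate = [0.8, 0.2, 0.0]       # input gate output from the example above
proposed = [1.0, 1.0, 1.0]   # assumed new information, one unit per dimension

stored = apply_gate(gate, proposed)
print(stored)
```

In practice a sigmoid output only approaches 0 or 1, so a gate value of `0.0` should be read as "very close to zero".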

---

### 🟩 Step 2: Candidate Memory – Proposes *What the New Info Is*

The **candidate memory** (also called "memory content") is a vector containing new information **proposed to be written** into the memory.

This vector is created using the current input and the previous hidden state, passed through a tanh activation (which outputs values between -1 and 1). It's like a suggestion for what the cell could remember next.

**Example**: Candidate memory = `[0.5, -0.3, 0.7]`. This means the cell is proposing to:
- Add positive info in the 1st dimension
- Subtract from (penalize) the 2nd dimension
- Add strong positive info in the 3rd

---

### 🔁 Step 3: Combine with Forget Gate and Update Cell State

To compute the new cell state `Cₜ`:
1. The **Forget Gate** decides how much of the **old memory** `Cₜ₋₁` should be kept.
2. The **Input Gate** scales the **new proposed memory**.
3. The cell **adds** both parts together to update the memory.

This way:
- The model can **retain context** for long sequences.
- It can **delete unhelpful info** (via forget gate).
- It can **add new knowledge** (via input gate and candidate memory).

---

### 🧪 Intuition with Numbers

Let's say:
- The forget gate gives `[0.9, 0.2, 0.1]` → Keep 90% of old info in dim 1, 20% in dim 2, 10% in dim 3.
- The input gate gives `[0.88, 0.27, 0.5]` → Admit most of the new info in dim 1, about a quarter in dim 2, and half in dim 3.
- The candidate memory gives `[0.46, -0.29, 0.76]`

Then:
- **Old memory contribution** = forget gate * old memory (element-wise)
- **New memory contribution** = input gate * candidate memory (element-wise)
- **Updated memory** = sum of above two

This is how the LSTM learns **when to remember, when to forget, and what to focus on**.
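The three contributions can be worked through numerically. The sketch below uses the two-decimal gate values `[0.9, 0.2, 0.1]`, `[0.88, 0.27, 0.5]` and candidate `[0.46, -0.29, 0.76]`. The old memory `[0.5, 0.1, -0.4]` is an assumption here (it is the previous cell state used later in these notes), and `elementwise_product` is a hypothetical helper.

```python
def elementwise_product(a, b):
    """Multiply two equal-length vectors entry by entry."""
    return [x * y for x, y in zip(a, b)]


forget_gate = [0.9, 0.2, 0.1]
input_gate = [0.88, 0.27, 0.5]
candidate = [0.46, -0.29, 0.76]
old_memory = [0.5, 0.1, -0.4]   # assumed previous cell state

old_contribution = elementwise_product(forget_gate, old_memory)
new_contribution = elementwise_product(input_gate, candidate)
updated_memory = [o + n for o, n in zip(old_contribution, new_contribution)]

print([round(v, 4) for v in old_contribution])
print([round(v, 4) for v in new_contribution])
print([round(v, 4) for v in updated_memory])
```

With these rounded inputs the updated memory comes out at roughly `[0.85, -0.06, 0.34]`: dimension 1 keeps most of its old value and gains new info, while dimension 3 is almost entirely overwritten.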

---


## 🧠 LSTM RNN – Input Gate, Candidate Memory, and Cell State Update (Cₜ)

---

### 🟡 Step 1: Input Gate – Decides *What New Information to Add*

We compute the input gate vector `iₜ`:

\[
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\]

#### Example

Let:

- `xₜ = [1, 2, 3]` (current input)
- `hₜ₋₁ = [4, 5, 6]` (previous hidden state)

Concatenate them:

\[
[h_{t-1}, x_t] = [4, 5, 6, 1, 2, 3]
\]

Assume `Wᵢ` is a 3×6 matrix and `bᵢ = [0.1, 0.2, 0.3]`.

Let:

\[
W_i \cdot [h_{t-1}, x_t] + b_i = [2.0, -1.0, 0.0]
\]

Now apply the sigmoid function:

\[
i_t = \sigma([2.0, -1.0, 0.0]) = [0.8808, 0.2689, 0.5]
\]
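The concatenation, the 3×6 matrix product, and the sigmoid can be sketched in plain Python. The text does not give the entries of `Wᵢ`, so the `W_i` below is purely hypothetical, chosen only so that the pre-activation matches `[2.0, -1.0, 0.0]`. The `sigmoid` and `affine` helpers are likewise illustrative names.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


x_t = [1, 2, 3]
h_prev = [4, 5, 6]
concat = h_prev + x_t   # [h_{t-1}, x_t], length 6

# Hypothetical 3x6 weights, picked only to reproduce the
# pre-activation [2.0, -1.0, 0.0] assumed in the text.
W_i = [
    [0.0, 0.0, 0.0, 1.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, -0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, -0.1],
]
b_i = [0.1, 0.2, 0.3]

pre_activation = affine(W_i, concat, b_i)
i_t = [sigmoid(z) for z in pre_activation]
print([round(g, 4) for g in i_t])
```

The shapes are the point here: a hidden size of 3 and an input size of 3 give a concatenated vector of length 6, so each of the 3 gate units needs a row of 6 weights plus one bias.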

---

### 🟩 Step 2: Candidate Memory (`Ĉₜ`) – Proposes New Info to Store

\[
\hat{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)
\]

Assume:

\[
W_c \cdot [h_{t-1}, x_t] + b_c = [0.5, -0.3, 1.0]
\]

Then:

\[
\hat{C}_t = \tanh([0.5, -0.3, 1.0]) = [0.4621, -0.2913, 0.7616]
\]
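The tanh step can be checked with the standard library. This short sketch starts from the assumed pre-activation `[0.5, -0.3, 1.0]` rather than from actual weights, since `W_c` and `b_c` are not specified.

```python
import math

# W_c @ [h_{t-1}, x_t] + b_c, taken as given from the text
pre_activation = [0.5, -0.3, 1.0]
candidate = [math.tanh(z) for z in pre_activation]

print([round(c, 4) for c in candidate])

# tanh keeps the sign of its input and bounds the magnitude below 1
assert all(-1.0 < c < 1.0 for c in candidate)
```

Because tanh preserves sign, a negative pre-activation yields a negative candidate entry, which lowers the cell state in that dimension once it is scaled by the input gate.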

---

### 🧩 Step 3: Combine with Forget Gate to Compute `Cₜ`

Recall from the previous example:

- Forget Gate output:  
  `fₜ = [0.9, 0.2, 0.1]`
- Previous Cell State:  
  `Cₜ₋₁ = [0.5, 0.1, -0.4]`

Now compute new cell state:

\[
C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t
\]

Element-wise operations:

1. **Forget part**:  
   `fₜ * Cₜ₋₁ = [0.9×0.5, 0.2×0.1, 0.1×(-0.4)] = [0.45, 0.02, -0.04]`

2. **Input part**:  
   `iₜ * Ĉₜ = [0.8808×0.4621, 0.2689×(-0.2913), 0.5×0.7616]`  
   `= [0.4070, -0.0783, 0.3808]`

3. **Add both parts**:  
   `Cₜ = [0.45 + 0.4070, 0.02 + (-0.0783), -0.04 + 0.3808]`  
   `= [0.8570, -0.0583, 0.3408]`

---

### ✅ Final Result

\[
C_t = [0.8570, -0.0583, 0.3408]
\]

This updated **cell state** `Cₜ` now contains a blend of **retained memory** from `Cₜ₋₁` and **new relevant information** added via the input gate and candidate memory.
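The whole update can be reproduced end to end from the pre-activations. This is a minimal sketch under the assumptions of the worked example: `cell_state_update` is a hypothetical function name, and the forget gate is passed in already activated because its pre-activation is not given.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def cell_state_update(f_t, c_prev, i_pre, c_pre):
    """Return C_t = f_t * C_{t-1} + sigmoid(i_pre) * tanh(c_pre), element-wise."""
    i_t = [sigmoid(z) for z in i_pre]        # input gate
    c_hat = [math.tanh(z) for z in c_pre]    # candidate memory
    return [f * c + i * ch for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]


c_t = cell_state_update(
    f_t=[0.9, 0.2, 0.1],       # forget gate output
    c_prev=[0.5, 0.1, -0.4],   # previous cell state
    i_pre=[2.0, -1.0, 0.0],    # input gate pre-activation
    c_pre=[0.5, -0.3, 1.0],    # candidate pre-activation
)
print([round(v, 4) for v in c_t])
```

Working at full precision instead of with four-decimal intermediates, the components come out at about 0.8570, -0.0583, and 0.3408.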

---
