## 🔍 Forget Gate in LSTM RNN – Step-by-Step Explanation

The **Forget Gate** in an LSTM RNN is responsible for deciding how much of the **past memory (cell state)** should be **retained or discarded** at each time step.

---

### 🧠 Step-by-Step Working of the Forget Gate

#### 🔹 Step 1: Inputs to the Forget Gate
- The forget gate receives two inputs:
  - `hₜ₋₁`: Hidden state from the previous time step (short-term memory)
  - `xₜ`: Current input

#### 🔹 Step 2: Concatenation
- These two vectors are **concatenated**:
  \[
  [h_{t-1}, x_t]
  \]
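A minimal sketch of Steps 1 and 2 in plain Python, treating vectors as lists. The sizes (`hidden_size = 2`, `input_size = 3`) and all values are hypothetical, chosen only for illustration.

```python
# Hypothetical sizes and values, for illustration only.
hidden_size = 2
input_size = 3

h_prev = [0.5, -0.2]      # h_{t-1}: previous hidden state (hidden_size entries)
x_t = [1.0, 0.0, 2.0]     # x_t: current input (input_size entries)

# [h_{t-1}, x_t]: place the two vectors end to end.
concat = h_prev + x_t

print(concat)        # [0.5, -0.2, 1.0, 0.0, 2.0]
print(len(concat))   # 5, i.e. hidden_size + input_size
```

The concatenated length, `hidden_size + input_size`, fixes how many columns the forget gate's weight matrix needs.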

#### 🔹 Step 3: Linear Transformation
- The concatenated vector is multiplied by a **weight matrix (`W_f`)**, and a **bias (`b_f`)** is added:
  \[
  z_f = W_f \cdot [h_{t-1}, x_t] + b_f
  \]
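A plain-Python sketch of Step 3, assuming `W_f` is stored as a list of rows. It needs one row per cell-state element (so `z_f` has the same size as `Cₜ₋₁`) and one column per entry of `[hₜ₋₁, xₜ]`. The helper name `linear`, the weights, and the bias are hypothetical.

```python
def linear(W, v, b):
    """Return W · v + b, where W is a list of rows and v, b are lists."""
    if any(len(row) != len(v) for row in W):
        raise ValueError("each row of W must have len(v) entries")
    if len(W) != len(b):
        raise ValueError("W needs one row per bias entry")
    return [sum(w * x for w, x in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

# Hypothetical [h_{t-1}, x_t] with hidden_size = 2 and input_size = 3.
concat = [0.5, -0.2, 1.0, 0.0, 2.0]

# Hypothetical W_f: 2 rows (one per cell-state element) x 5 columns (one per concat entry).
W_f = [
    [0.1, 0.0, 0.2, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.5, -0.1],
]
b_f = [1.0, 0.0]  # hypothetical bias, one entry per row of W_f

z_f = linear(W_f, concat, b_f)
print(z_f)  # approximately [1.45, -0.26]
```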

#### 🔹 Step 4: Sigmoid Activation
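- A **sigmoid activation** is applied element-wise to the result:
  \[
  f_t = \sigma(z_f), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
  \]
- This gives one value strictly between **0 and 1** for each element of the cell state:
  - Close to `0` → Forget that element
  - Close to `1` → Keep that element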
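A small sketch of Step 4: the sigmoid applied element-wise to a vector of pre-activations. The input values are hypothetical; the two-branch form is one common way to avoid overflow in `math.exp` for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # small positive number when z is very negative
    return e / (1.0 + e)

# Hypothetical pre-activations z_f for three cell-state elements.
z_f = [-6.0, 0.0, 6.0]
f_t = [sigmoid(z) for z in z_f]  # applied element-wise

print([round(f, 4) for f in f_t])  # [0.0025, 0.5, 0.9975]
```

Large negative pre-activations push a gate element toward 0 (forget), large positive ones toward 1 (keep). In floating point, extreme inputs can round to exactly `0.0` or `1.0`, even though the mathematical sigmoid never reaches either value.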

#### 🔹 Step 5: Element-wise Multiplication with Cell State
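- The output `fₜ` is multiplied element-wise (⊙) with the previous cell state `Cₜ₋₁`:
  \[
  f_t \odot C_{t-1}
  \]
- This product controls how much of the previous memory is **carried forward**. The full cell-state update then adds new candidate values, scaled by the input gate `iₜ`:
  \[
  C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
  \]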
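Putting Steps 1 to 5 together, one forget-gate step might be sketched in plain Python as below. The function names `forget_gate` and `apply_forget` and all numbers are hypothetical; a practical implementation would more likely use a tensor library and learned weights.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forget_gate(h_prev, x_t, W_f, b_f):
    """Compute f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)."""
    concat = h_prev + x_t                                  # Step 2: concatenate
    z_f = [sum(w * v for w, v in zip(row, concat)) + b     # Step 3: W_f · [...] + b_f
           for row, b in zip(W_f, b_f)]
    return [sigmoid(z) for z in z_f]                       # Step 4: element-wise sigmoid

def apply_forget(f_t, c_prev):
    """Step 5: element-wise product f_t ⊙ C_{t-1}."""
    if len(f_t) != len(c_prev):
        raise ValueError("f_t and C_{t-1} must have the same length")
    return [f * c for f, c in zip(f_t, c_prev)]

# Hypothetical example: 2 cell-state elements, 1 input feature.
# Zero weights make the bias alone decide each gate value.
h_prev = [0.0, 0.0]
x_t = [1.0]
W_f = [[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
b_f = [4.0, -4.0]
c_prev = [10.0, 20.0]

f_t = forget_gate(h_prev, x_t, W_f, b_f)
retained = apply_forget(f_t, c_prev)
print([round(f, 3) for f in f_t])        # [0.982, 0.018]
print([round(c, 2) for c in retained])   # [9.82, 0.36]
```

With these hypothetical values, the positive bias keeps almost all of the first element, while the negative bias clears most of the second.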

---

### 🧪 Example

Let:
- `Cₜ₋₁ = [10, 20, 30]`
- `fₜ = [0.9, 0.5, 0.0]`

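Then the memory retained by the forget gate is:
- `fₜ ⊙ Cₜ₋₁ = [9.0, 10.0, 0.0]`
- This equals `Cₜ` only if the input gate adds nothing new at this step.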

**Interpretation:**
- Keep 90% of the first memory value
- Keep 50% of the second
- Discard the third completely (an idealized value: a sigmoid output only approaches 0, it never reaches it exactly)
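The worked example can be checked with a few lines of Python. The second part adds a hypothetical input-gate term (`i_t` and `c_tilde`, which are not part of the example) to show how the retained memory differs from the final `Cₜ`.

```python
c_prev = [10.0, 20.0, 30.0]   # C_{t-1} from the example
f_t = [0.9, 0.5, 0.0]         # f_t from the example (idealized values)

# Forget-gate contribution: f_t ⊙ C_{t-1}
retained = [f * c for f, c in zip(f_t, c_prev)]
print(retained)  # [9.0, 10.0, 0.0]

# Percentage of each element kept, matching the interpretation.
kept_pct = [100 * r / c for r, c in zip(retained, c_prev)]
print(kept_pct)  # [90.0, 50.0, 0.0]

# Hypothetical input-gate term, NOT part of the example above.
i_t = [0.0, 0.0, 1.0]        # hypothetical input gate values
c_tilde = [0.0, 0.0, 5.0]    # hypothetical candidate values
c_t = [r + i * g for r, i, g in zip(retained, i_t, c_tilde)]
print(c_t)  # [9.0, 10.0, 5.0]
```

Here the third element is first cleared by the forget gate and then rewritten by the input gate, which is why `Cₜ` and `fₜ ⊙ Cₜ₋₁` differ.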

---

### 💡 Why is the Forget Gate Important?

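- Lets the model learn, separately for each cell-state element, which past information is still **relevant** and which can be dropped.
- Lets the cell **clear out stale information**, so the cell state does not keep accumulating values over long, continuous input streams.
- Helps mitigate the **long-term dependency problem** of vanilla RNNs: when `fₜ` stays close to 1, information (and gradients) can pass through the cell state over many time steps with little decay.
- Makes LSTM RNNs **more stable** to train and better able to learn dependencies that span long sequences.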
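The long-sequence behaviour comes from repeated multiplication: if the input gate adds nothing, each cell-state element is scaled by its forget-gate value at every step. A minimal sketch, assuming a hypothetical constant gate value `f` and no input-gate writes, shows how sharply the surviving fraction depends on `f` staying near 1.

```python
def retained_after(steps, f, c0=1.0):
    """Scale c0 by a constant forget-gate value f for `steps` steps.

    Assumes (hypothetically) that the input gate writes nothing, so each
    step is just C_t = f * C_{t-1}.
    """
    c = c0
    for _ in range(steps):
        c *= f  # one application of f_t ⊙ C_{t-1} for a single element
    return c

for f in (0.99, 0.9, 0.5):
    print(f"{f} -> {retained_after(100, f):.3g}")
# 0.99 -> 0.366
# 0.9 -> 2.66e-05
# 0.5 -> 7.89e-31
```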

---
