# 📚 Table of Contents


- [🔁 Introduction to Recurrent Neural Networks (RNNs)](#introduction-to-recurrent-neural-networks-rnns)
  - [🧵 What is an RNN and how does it handle sequential data?](#what-is-a-rnn-and-how-does-it-handle-sequential-data)
  - [⚠️ Understanding vanishing and exploding gradient problems in RNNs](#understanding-vanishing-and-exploding-gradient-problems-in-rnns)
  - [🧪 Implementing a simple RNN for text classification](#implementing-a-simple-rnn-for-text-classification)
- [🧠 Long Short-Term Memory (LSTM) Networks](#long-short-term-memory-lstm-networks)
  - [🚪 The LSTM architecture: Forget, input, and output gates](#the-lstm-architecture-forget-input-and-output-gates)
  - [🧱 Solving the vanishing gradient problem with LSTMs](#solving-the-vanishing-gradient-problem-with-lstms)
  - [📈 Example: Using LSTMs for sentiment analysis on text data](#example-using-lstms-for-sentiment-analysis-on-text-data)
- [🔒 Gated Recurrent Units (GRUs)](#gated-recurrent-units-grus)
  - [🆚 Differences between GRUs and LSTMs](#differences-between-grus-and-lstms)
  - [❓ When to choose GRUs over LSTMs](#when-to-choose-grus-over-lstms)
  - [🧪 Implementing GRUs for language modeling or sequence prediction tasks](#implementing-grus-for-language-modeling-or-sequence-prediction-tasks)

---


### **1. Core RNN Architecture (Layered View)**
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TD
    %% Basic Structure
    subgraph Basic["Basic RNN Structure"]
        direction LR
        X[Input x<sub>t</sub>] --> H[Hidden h<sub>t</sub>]
        H --> Y[Output ŷ<sub>t</sub>]
        H --> HNext[Next h<sub>t+1</sub>]
    end

    %% Detailed View
    subgraph Detail["Detailed View"]
        direction LR
        xt[Input x<sub>t</sub>] -->|Concat| HMath[["h<sub>t</sub> = tanh(W<sub>h</sub>[h<sub>t-1</sub>,x<sub>t</sub>]+b)"]]
        HMath --> yMath[["ŷ<sub>t</sub> = σ(W<sub>y</sub>h<sub>t</sub>)"]]
        note[["Parameters (i=300, h=128): (300+128)×128 + 128 = 54,912"]]:::yellow
    end

    %% Connection
    Basic --> Detail
    
    classDef yellow fill:#ffffcc,stroke:#ffcc00
```
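To make the cell equation and its parameter count concrete, here is a minimal NumPy sketch of one recurrent layer. It assumes input size 300, hidden size 128 and a single sigmoid output unit; `init_rnn` and `rnn_forward` are illustrative names, not library functions, and the random initialization is an arbitrary choice.

```python
import numpy as np

def init_rnn(input_dim, hidden_dim, output_dim, seed=0):
    """Random weights for h_t = tanh(W_h [h_{t-1}, x_t] + b) and y_t = sigmoid(W_y h_t)."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim + input_dim))
    b = np.zeros(hidden_dim)
    W_y = rng.normal(0.0, 0.1, size=(output_dim, hidden_dim))
    return W_h, b, W_y

def rnn_forward(xs, W_h, b, W_y):
    """Run the cell over a sequence xs of shape (T, input_dim)."""
    h = np.zeros(W_h.shape[0])                  # h_0 = 0
    hs, ys = [], []
    for x_t in xs:
        z = np.concatenate([h, x_t])            # [h_{t-1}, x_t]
        h = np.tanh(W_h @ z + b)                # same W_h and b reused at every step
        y = 1.0 / (1.0 + np.exp(-(W_y @ h)))    # sigmoid readout
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

W_h, b, W_y = init_rnn(input_dim=300, hidden_dim=128, output_dim=1)
hs, ys = rnn_forward(np.random.default_rng(1).normal(size=(20, 300)), W_h, b, W_y)
print(hs.shape, ys.shape)    # (20, 128) (20, 1)
print(W_h.size + b.size)     # 54912 recurrent-cell parameters (W_y not included)
```

The 54,912 figure covers only the recurrent cell (`W_h` and `b`); the readout `W_y` adds another 128 weights per output unit.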
---

### **2. LSTM Architecture with Mathematical Context**
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TB
    %% Gate Structure
    subgraph Gates["LSTM Gates"]
        direction LR
        ft[Forget σ]:::sigmoid
        it[Input σ]:::sigmoid
        Ct[Cell tanh]:::tanh
        ot[Output σ]:::sigmoid
    end

    %% Parameter Context
    subgraph Math["Mathematical Context"]
        direction TB
        Formula[["Params = 4×(h<sub>in</sub>+h<sub>hid</sub>+1)×h<sub>hid</sub><br/>
        h<sub>in</sub>=300, h<sub>hid</sub>=128 → 4×(300+128+1)×128 = 219,648"]]:::yellow
        Gates --> Formula
    end

    %% Real-world Connection
    subgraph Usage["Practical Use"]
        Model[["LSTM(units=128) in Keras<br/>nn.LSTM(300, 128) in PyTorch"]] --> Perf[["~4/3 the per-step matmul work of a GRU<br/>Often better on long-range dependencies"]]
    end

    classDef sigmoid fill:#e6ffe6,stroke:#009900
    classDef tanh fill:#ffe6cc,stroke:#ff9900
    classDef yellow fill:#ffffcc,stroke:#ffcc00
```
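A rough multiply-accumulate count gives a sense of the relative per-step cost of an LSTM versus a GRU. This sketch assumes input size 300 and hidden size 128 and counts only the matrix products; it ignores element-wise gate arithmetic, activations, and how well a particular kernel (for example cuDNN) is optimized, so the ratio is an estimate, not a measured speedup. `macs_per_step` is an illustrative helper.

```python
def macs_per_step(input_dim, hidden_dim, n_blocks):
    """Multiply-accumulates for one time step of a cell whose n_blocks
    gate/candidate blocks each apply a (hidden_dim x (input_dim + hidden_dim)) matrix.
    Element-wise operations and activations are ignored."""
    return n_blocks * hidden_dim * (input_dim + hidden_dim)

lstm = macs_per_step(300, 128, n_blocks=4)   # forget, input, candidate, output
gru = macs_per_step(300, 128, n_blocks=3)    # update, reset, candidate
print(lstm, gru, lstm / gru)
```

By this count an LSTM step does about 4/3 as much matrix work as a GRU step, simply because it has four weight blocks instead of three.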

---

### **3. GRU vs LSTM: Complete Comparison**
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TD
    subgraph Structural["Structural Difference"]
        direction LR
        LSTM_Struct[["LSTM: 3 gates + 2 states"]] --> GRU_Struct[["GRU: 2 gates + 1 state"]]
    end

    subgraph Practical["Practical Trade-offs"]
        direction LR
        Params[["Params: LSTM=4h(h+i+1) vs GRU=3h(h+i+1)"]] --> Speed[["Speed: GRU ~25% fewer params and matmul FLOPs per step"]]
        Accuracy[["Accuracy: LSTM often better on long sequences (task-dependent)"]] --> UseCase[["Use GRU for:<br/>- Real-time apps<br/>- Short sequences"]]
    end

    Structural --> Practical
```
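The structural difference is easiest to see in a single GRU step: two gates (update `z`, reset `r`) and one state `h`, with no separate cell state. This is a minimal NumPy sketch assuming the common formulation h_t = z ⊙ h_{t-1} + (1 − z) ⊙ h̃_t; some references swap the roles of z and 1 − z, and frameworks differ in exactly where the reset gate is applied. `gru_step` and `init_gru` are illustrative names, and sizes 300/128 are assumptions matching the diagrams.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step. Each W_* has shape (h, h + i) and acts on [h_{t-1}, x_t]."""
    W_z, b_z, W_r, b_r, W_h, b_h = params
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx + b_z)                                        # update gate
    r = sigmoid(W_r @ hx + b_r)                                        # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde                            # single state: no separate cell

def init_gru(input_dim, hidden_dim, seed=0):
    """Returns (W_z, b_z, W_r, b_r, W_h, b_h) with small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    shape = (hidden_dim, hidden_dim + input_dim)
    return tuple(p for _ in range(3)
                 for p in (rng.normal(0.0, 0.1, size=shape), np.zeros(hidden_dim)))

params = init_gru(300, 128)
h = np.zeros(128)
for x_t in np.random.default_rng(1).normal(size=(10, 300)):
    h = gru_step(x_t, h, params)
print(h.shape, sum(p.size for p in params))   # (128,) 164736
```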

---

### **4. Integrated Training Workflow**
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TD
    subgraph Workflow["End-to-End Training"]
        direction LR
        Code[["Code:
        model = Sequential()
        model.add(LSTM(128))
        model.add(Dense(2))"]] --> Data[Data Loading]
        Data --> Train[Training Loop]
        Train --> Eval[Evaluation]
    end

    subgraph Theory["Theoretical Foundation"]
        direction TB
        Loss[["Loss = Cross-Entropy"]] --> Opt[["Adam Optimizer"]]
    end

    Workflow --> Theory

    classDef code fill:#f8f8f8,stroke:#666
    class Code code
```
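The Cross-Entropy box can be written out directly. This NumPy sketch assumes the `Dense(2)` layer emits raw logits (no activation) and that labels are integer class ids; it returns the mean loss and its gradient with respect to the logits, which is what backpropagation passes into the rest of the network. In Keras the corresponding pairing is a linear `Dense(2)` with `SparseCategoricalCrossentropy(from_logits=True)`. `softmax_cross_entropy` is an illustrative name.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer labels, plus its gradient w.r.t. the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -log_probs[np.arange(n), labels].mean()
    grad = np.exp(log_probs)                               # softmax probabilities
    grad[np.arange(n), labels] -= 1.0                      # softmax minus one-hot
    return loss, grad / n

logits = np.array([[2.0, 0.5], [0.1, 1.5]])   # Dense(2) outputs for a batch of 2
labels = np.array([0, 1])
loss, grad = softmax_cross_entropy(logits, labels)
print(f"loss = {loss:.4f}")
```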
---



### **1. RNN Architecture & Parameter Insight**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart LR
    subgraph RNN["RNN Cell"]
        direction LR
        xt[("x<sub>t</sub>")] -->|Concatenate| H[["h<sub>t</sub> = tanh(W<sub>h</sub>[h<sub>t-1</sub>,x<sub>t</sub>] + b)
        Params = (h<sub>in</sub>+h<sub>hid</sub>)*h<sub>hid</sub> + h<sub>hid</sub>"]]
        H --> yt[("ŷ<sub>t</sub> = σ(W<sub>y</sub>h<sub>t</sub>)")]
        H_prev[("h<sub>t-1</sub>")] -->|Recurrence| H
    end

    subgraph Problems["Gradient Issues"]
        direction TB
        VG[["Vanishing Gradients<br/>(∂Loss/∂h₀ ≈ 0)"]]:::red
        EG[["Exploding Gradients<br/>(||∂hₜ/∂h₀|| → ∞)"]]:::orange
    end

    classDef tanh fill:#ffe6cc,stroke:#ff9900
    classDef sigmoid fill:#e6ffe6,stroke:#009900
    class H tanh
    class yt sigmoid
    classDef red fill:#ffe6e6,stroke:#cc0000
    classDef orange fill:#ffebcc,stroke:#ff9900
```
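Why gradients vanish or explode can be shown with a deliberately simplified linear recurrence h_t = W h_{t-1} (no tanh, no input), where ∂h_T/∂h_0 is just the product of T copies of W. The sketch builds W as `scale` times a random orthogonal matrix, so every singular value equals `scale` and the Jacobian norm is exactly `scale**T`. In a real tanh RNN each step's Jacobian is diag(1 − h_t²) W, and since the diagonal entries lie in (0, 1], each step's Jacobian norm is at most that of W. `jacobian_norm` is an illustrative helper.

```python
import numpy as np

def jacobian_norm(scale, steps, hidden_dim=64, seed=0):
    """Spectral norm of dh_T/dh_0 for the linear recurrence h_t = W h_{t-1},
    with W = scale * Q and Q orthogonal (all singular values of W equal scale)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
    W = scale * Q
    J = np.eye(hidden_dim)
    for _ in range(steps):
        J = W @ J                        # chain rule: dh_t/dh_0 = W @ dh_{t-1}/dh_0
    return np.linalg.norm(J, 2)          # largest singular value

for scale in (0.9, 1.0, 1.1):
    print(scale, jacobian_norm(scale, steps=50))
```

Over 50 steps a per-step factor of 0.9 shrinks the gradient signal to about 0.5% of its size, while 1.1 amplifies it by roughly two orders of magnitude.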

---

### **2. LSTM: Gate Logic + Parameterization**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TB
    subgraph LSTM["LSTM Cell"]
        direction LR
        ft[["Forget Gate<br/>σ (sigmoid)"]]:::sigmoid
        it[["Input Gate<br/>σ (sigmoid)"]]:::sigmoid
        Ct[["Cell Update<br/>tanh"]]:::tanh
        ot[["Output Gate<br/>σ (sigmoid)"]]:::sigmoid
        ht[["Hidden State<br/>h<sub>t</sub> = o<sub>t</sub> ⊙ tanh(C<sub>t</sub>)"]]:::tanh
    end

    subgraph Params["LSTM Param Count"]
        direction TB
        Formula[["LSTM = 4 × (h<sub>in</sub> + h<sub>hid</sub> + 1) × h<sub>hid</sub><br/>
        Example: h=128, i=300 → 4×(300+128+1)×128 = 219,648"]]:::yellow
    end

    classDef sigmoid fill:#e6ffe6,stroke:#009900
    classDef tanh fill:#ffe6cc,stroke:#ff9900
    classDef yellow fill:#ffffcc,stroke:#ffcc00
```
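The gates in the diagram map onto a single LSTM step as follows. This is a minimal NumPy sketch assuming input size 300 and hidden size 128, with the four weight blocks stacked into one matrix `W` of shape (4h, h + i). The forget/input/candidate/output ordering is an arbitrary choice here (Keras and PyTorch each use their own order), so framework weights should not be loaded into it without reordering. The last lines illustrate the additive update c_t = f ⊙ c_{t-1} + i ⊙ g: with the forget gate near 1 and the input gate near 0, the cell state passes through almost unchanged, and along this direct path (ignoring the gates' own dependence on h_{t-1}) the local derivative ∂c_t/∂c_{t-1} is just diag(f). `lstm_step` is an illustrative name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4h, h + i) stacks forget, input, candidate, output blocks; b: (4h,)."""
    n = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(a[0:n])              # forget gate
    i = sigmoid(a[n:2 * n])          # input gate
    g = np.tanh(a[2 * n:3 * n])      # candidate cell update
    o = sigmoid(a[3 * n:4 * n])      # output gate
    c = f * c_prev + i * g           # additive cell-state update
    h = o * np.tanh(c)               # hidden state
    return h, c

input_dim, hidden_dim = 300, 128
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)
print(W.size + b.size)               # 219648 = 4 * (300 + 128 + 1) * 128

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(10, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)

# Forget gate ~1, input gate ~0: the cell state is carried through almost unchanged.
b_carry = np.zeros(4 * hidden_dim)
b_carry[:hidden_dim] = 20.0                   # sigmoid(20) is close to 1
b_carry[hidden_dim:2 * hidden_dim] = -20.0    # sigmoid(-20) is close to 0
c0 = rng.normal(size=hidden_dim)
_, c1 = lstm_step(rng.normal(size=input_dim), np.zeros(hidden_dim), c0, np.zeros_like(W), b_carry)
print(np.max(np.abs(c1 - c0)))                # close to 0
```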

---

### **3. GRU vs LSTM vs RNN: Structural & Parametric Comparison**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart LR
    subgraph Structure["Architectural Comparison"]
        direction TB
        LSTM_S[["LSTM: 3 Gates (Forget, Input, Output) + 2 States (Cell, Hidden)"]]:::purple
        GRU_S[["GRU: 2 Gates (Update, Reset) + 1 State (Hidden)"]]:::green
        RNN_S[["SimpleRNN: No gates, 1 State (Hidden)"]]:::orange
    end

    subgraph Params["Parameter Count (i=300, h=128)"]
        direction TB
        LSTM_P[["LSTM: 4h(h+i+1) = 219,648"]]:::purple
        GRU_P[["GRU: 3h(h+i+1) = 164,736"]]:::green
        RNN_P[["RNN: h(h+i+1) = 54,912"]]:::orange
    end

    classDef purple fill:#f0e6ff,stroke:#6600cc
    classDef green fill:#e6ffe6,stroke:#009900
    classDef orange fill:#ffe6cc,stroke:#ff9900
```
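The three counts above follow one formula, n_blocks × (h·i + h·h + h), with n_blocks = 1, 3 and 4 for SimpleRNN, GRU and LSTM. A short sketch, assuming i = 300 and h = 128, reproduces them. Framework layers can report slightly larger numbers because they keep a second bias vector per block: PyTorch's `nn.RNN`, `nn.GRU` and `nn.LSTM` have both `bias_ih` and `bias_hh`, and Keras `GRU` does the same with its default `reset_after=True`. `rnn_params` is an illustrative helper, not a library function.

```python
def rnn_params(i, h, n_blocks, n_bias_vectors=1):
    """Weights + biases of one recurrent layer: each of n_blocks blocks has an
    input matrix (h x i), a recurrent matrix (h x h) and n_bias_vectors biases of length h."""
    return n_blocks * (h * i + h * h + n_bias_vectors * h)

i, h = 300, 128
textbook = {
    "SimpleRNN": rnn_params(i, h, 1),
    "GRU": rnn_params(i, h, 3),
    "LSTM": rnn_params(i, h, 4),
}
print(textbook)   # {'SimpleRNN': 54912, 'GRU': 164736, 'LSTM': 219648}

# Conventions with a second bias vector per block:
pytorch_lstm = rnn_params(i, h, 4, n_bias_vectors=2)   # nn.LSTM(300, 128): bias_ih + bias_hh
keras_gru = rnn_params(i, h, 3, n_bias_vectors=2)      # keras GRU(128, reset_after=True)
print(pytorch_lstm, keras_gru)                          # 220160 165120
```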

---

### **4. Implementation Examples**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart LR
    subgraph Keras["Keras Example"]
        Input["Input(shape=(seq_len,))"] --> Embed["Embedding(vocab_size, 128)"] --> RNNL["SimpleRNN(64)"] --> Dense["Dense(2, softmax)"]
    end

    subgraph PyTorch["PyTorch GRU"]
        Define[["nn.GRU(input_size=300, hidden_size=128, num_layers=2, batch_first=True)"]]:::code
        Forward[["out, _ = self.gru(x)<br/>out = self.fc(out[:,-1,:])"]]:::code
    end

    classDef code fill:#f8f8f8,stroke:#666
```
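The Keras pipeline in the diagram (integer token ids → `Embedding` → `SimpleRNN(64)` → `Dense(2, softmax)`) can be traced in plain NumPy to check the shapes. This sketch assumes whitespace tokenization, a vocabulary built from the inputs with id 0 reserved for padding, left-padding and left-truncation to `seq_len` (like the `pad_sequences` defaults in Keras), an embedding size of 128, and random untrained weights. `build_vocab`, `encode_and_pad` and `classify` are illustrative names. Unlike Keras with `Embedding(..., mask_zero=True)`, it does not mask padding, so pad tokens still pass through the recurrence.

```python
import numpy as np

def build_vocab(texts):
    """Map each lowercase whitespace token to an id; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode_and_pad(texts, vocab, seq_len):
    """Encode texts as ids, keep the last seq_len tokens, left-pad with 0."""
    out = np.zeros((len(texts), seq_len), dtype=np.int64)
    for row, text in enumerate(texts):
        ids = [vocab[t] for t in text.lower().split() if t in vocab][-seq_len:]
        if ids:
            out[row, seq_len - len(ids):] = ids
    return out

def classify(token_ids, E, W_x, W_h, b, W_out, b_out):
    """Embedding -> SimpleRNN (last hidden state) -> Dense + softmax, for a batch."""
    batch, seq_len = token_ids.shape
    emb = E[token_ids]                                       # (batch, seq_len, embed_dim)
    h = np.zeros((batch, W_h.shape[0]))
    for t in range(seq_len):
        h = np.tanh(emb[:, t] @ W_x.T + h @ W_h.T + b)      # (batch, hidden)
    logits = h @ W_out.T + b_out
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

texts = ["the movie was great", "terrible plot and bad acting"]
vocab = build_vocab(texts)
ids = encode_and_pad(texts, vocab, seq_len=6)

rng = np.random.default_rng(0)
embed_dim, hidden = 128, 64
E = rng.normal(0.0, 0.1, size=(len(vocab) + 1, embed_dim))   # row 0 = padding id
weights = (rng.normal(0.0, 0.1, size=(hidden, embed_dim)),   # W_x
           rng.normal(0.0, 0.1, size=(hidden, hidden)),      # W_h
           np.zeros(hidden),                                  # b
           rng.normal(0.0, 0.1, size=(2, hidden)),           # W_out
           np.zeros(2))                                       # b_out
probs = classify(ids, E, *weights)
print(ids)
print(probs.shape)   # (2, 2)
```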

---

### **5. Integrated Training Workflow**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TD
    subgraph All["Training Workflow"]
        direction LR
        Data[["1. Data Prep
        - Tokenize
        - Pad
        - Embed"]] --> Model[["2. Model Selection
        - RNN → Short Seqs
        - LSTM → Long Deps
        - GRU → Fast Training"]]
        Model --> Train[["3. Train Loop
        - Forward
        - Loss
        - BPTT
        - Update"]]
        Train --> Eval[["4. Evaluation
        - Accuracy / Perplexity
        - Grad Norm"]]
    end

    subgraph Config["Training Config"]
        direction TB
        Loss[["Loss:
        - CrossEntropy (LM / multi-class)
        - BCE (binary, 1 sigmoid output)"]]:::yellow
        Opt[["Optimizer:
        - Adam (lr=3e-4)
        - Grad Clipping"]]:::blue
    end

    classDef yellow fill:#ffffcc,stroke:#ffcc00
    classDef blue fill:#e6f3ff,stroke:#0066cc
```
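The optimizer settings in the config box (Adam plus gradient clipping) combine as follows: clip the gradients by their joint L2 norm, then apply the bias-corrected Adam update. This is a minimal NumPy sketch; `clip_by_global_norm` and `Adam` are illustrative implementations, not framework classes (the PyTorch counterparts are `torch.nn.utils.clip_grad_norm_` and `torch.optim.Adam`). The toy quadratic objective and `lr=0.1` are assumptions chosen so the demo converges within a few hundred steps; the config's `lr=3e-4` would need far more steps on this toy.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm.
    Returns the clipped gradients and the norm before clipping."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

class Adam:
    """Adam with bias-corrected first and second moment estimates; updates params in place."""

    def __init__(self, params, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr, self.b1, self.b2, self.eps = params, lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m *= self.b1
            m += (1 - self.b1) * g                 # first moment (mean of gradients)
            v *= self.b2
            v += (1 - self.b2) * g * g             # second moment (uncentered variance)
            m_hat = m / (1 - self.b1 ** self.t)    # bias correction
            v_hat = v / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy objective: f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -2.0])
w = np.zeros(2)
opt = Adam([w], lr=0.1)
for _ in range(500):
    grads, norm = clip_by_global_norm([w - target], max_norm=1.0)
    opt.step(grads)
print(w)
```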



# <a id="introduction-to-recurrent-neural-networks-rnns"></a>🔁 Introduction to Recurrent Neural Networks (RNNs)

## <a id="what-is-a-rnn-and-how-does-it-handle-sequential-data"></a>🧵 What is an RNN and how does it handle sequential data?

## <a id="understanding-vanishing-and-exploding-gradient-problems-in-rnns"></a>⚠️ Understanding vanishing and exploding gradient problems in RNNs

## <a id="implementing-a-simple-rnn-for-text-classification"></a>🧪 Implementing a simple RNN for text classification

---

# <a id="long-short-term-memory-lstm-networks"></a>🧠 Long Short-Term Memory (LSTM) Networks

## <a id="the-lstm-architecture-forget-input-and-output-gates"></a>🚪 The LSTM architecture: Forget, input, and output gates

## <a id="solving-the-vanishing-gradient-problem-with-lstms"></a>🧱 Solving the vanishing gradient problem with LSTMs

## <a id="example-using-lstms-for-sentiment-analysis-on-text-data"></a>📈 Example: Using LSTMs for sentiment analysis on text data

---

# <a id="gated-recurrent-units-grus"></a>🔒 Gated Recurrent Units (GRUs)

## <a id="differences-between-grus-and-lstms"></a>🆚 Differences between GRUs and LSTMs

## <a id="when-to-choose-grus-over-lstms"></a>❓ When to choose GRUs over LSTMs

## <a id="implementing-grus-for-language-modeling-or-sequence-prediction-tasks"></a>🧪 Implementing GRUs for language modeling or sequence prediction tasks

---
