# 🧠 GRU (Gated Recurrent Unit)

## Definition
GRU is a type of neural network used for understanding and working with sequences — like sentences, time series, or audio.

It remembers important information and forgets unimportant parts using two smart "gates":

- **Update Gate** – decides how much old memory to keep  
- **Reset Gate** – decides how much past info to forget  

GRU is simpler than LSTM (two gates instead of three, and no separate cell state), so it has fewer parameters and usually trains faster, while still handling long-term information well.

✅ **In short**: GRU helps computers understand data that comes in order (like text or time), by remembering what's important and forgetting what's not.

---

## 🕰️ History of GRU

| Year | Event |
|------|-------|
| 1980s–1990 | Simple RNNs developed, e.g. the Elman network in 1990 (simple memory over time) |
| 1997 | LSTM introduced by Hochreiter and Schmidhuber (eased the forgetting problem) |
| 2014 | GRU introduced by Kyunghyun Cho et al. |

GRU was made to be simpler and faster than LSTM, while still solving the problems of RNNs.

---

## 🔍 Why GRU Was Needed

### 🧠 Problem with RNN:
- Forgets long-term information
- Suffers from the vanishing gradient problem (gradients shrink over many time steps, so the network stops learning from early inputs)
- Not good for long sentences or time-series data
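
To make the vanishing gradient concrete, here is a minimal sketch with a hypothetical one-unit RNN, `h_t = tanh(w * h_{t-1} + x)`. By the chain rule, the gradient of the last state with respect to the first is a product of per-step factors `w * (1 - h_t^2)`. The weight `0.9` and the constant input `0.5` are illustrative assumptions, not values from a real model.

```python
import math

def gradient_through_time(w, x, steps):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

# The gradient shrinks quickly as the sequence gets longer.
for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.9, x=0.5, steps=steps))
```

Every factor here is smaller than 1, so the product shrinks toward zero as the number of steps grows. That is why a plain RNN struggles to learn from inputs far in the past.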

### ✅ Solution: LSTM and GRU
- Both are improved versions of RNN
- GRU is simpler than LSTM but works nearly as well
- GRU is faster to train (fewer gates, so fewer parameters)

---

## GRU Architecture – In Simple Words

Imagine GRU as a smart box with:

| Part | Purpose | Emoji |
|------|---------|-------|
| Update Gate | Should I keep old memory or add new? | 🔁 |
| Reset Gate | Should I forget the past memory? | 🔄 |
| Hidden State | This is the memory carried to the next step | 🧠 |

---

## 💡 How GRU Works (Step by Step):
1. It looks at the current input (like a word or number)
2. It also checks the previous memory
3. Reset gate decides: "Should I forget the old memory?"
4. Update gate decides: "Should I keep old memory or add new?"
5. It mixes the old and new information to create the final memory
6. This memory is passed to the next time step

---

## 🧪 GRU vs LSTM vs RNN – Easy Comparison

| Feature | RNN | LSTM | GRU |
|---------|-----|------|-----|
| Memory Ability | Weak (forgets easily) | Strong (remembers long-term) | Strong (like LSTM) |
| Gates Used | None | 3 gates (input, forget, output) | 2 gates (update, reset) |
| Cell State | No | Yes (cell + hidden) | No (only hidden state) |
| Speed | Fast | Slow | Faster than LSTM |
| Simplicity | Simple | Complex | Medium (simpler than LSTM) |
| Best For | Short sequences | Long and complex sequences | Long sequences, faster tasks |
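
Much of the speed difference in the table comes from parameter count. Here is a rough sketch, assuming the textbook formulation in which each gate (or candidate) has one weight matrix over `[h, x]` plus one bias. The function name and the sizes are illustrative, and real libraries may add extra bias terms, so `model.summary()` can report slightly different numbers.

```python
def recurrent_params(input_dim, hidden_dim, blocks):
    """Parameters for `blocks` weight matrices over [h, x], each with a bias."""
    per_block = hidden_dim * (hidden_dim + input_dim) + hidden_dim
    return blocks * per_block

input_dim, hidden_dim = 32, 32
counts = {
    "RNN": recurrent_params(input_dim, hidden_dim, blocks=1),   # one tanh layer
    "GRU": recurrent_params(input_dim, hidden_dim, blocks=3),   # update, reset, candidate
    "LSTM": recurrent_params(input_dim, hidden_dim, blocks=4),  # input, forget, output, candidate
}
for name, n in counts.items():
    print(f"{name}: {n} parameters")
```

Under this assumption a GRU always has three quarters of the parameters of an LSTM of the same size.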

---

## 🔄 GRU Architecture Diagram

```plaintext
        ┌──────────────┐
        │ Prev hidden  │
        └─────┬────────┘
              │
        ┌─────▼────────┐
        │ Update Gate  │ ← Should I keep old memory?
        └─────┬────────┘
              │
        ┌─────▼────────┐
        │ Reset Gate   │ ← Should I forget past?
        └─────┬────────┘
              │
        ┌─────▼────────┐
        │ Candidate h~ │ ← New memory (with reset info)
        └─────┬────────┘
              │
        ┌─────▼────────┐
        │  Final h(t)  │ ← Combined output (new + old)
        └──────────────┘
```

## 🧪 GRU Gates and Equations

### 🔁 1. Update Gate – How much to update memory?
Controls how much of the previous hidden state to keep.

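```
zₜ = σ(W_z⋅[hₜ₋₁, xₜ] + b_z)
```
Here σ is the sigmoid function (it squashes every value into the range 0 to 1), [hₜ₋₁, xₜ] is the previous hidden state joined with the current input, and W_z and b_z are the learned weights and bias of the update gate.
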
- If zₜ ≈ 1: Keep old memory (maintain previous state)
- If zₜ ≈ 0: Use new memory (update with current input)
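
The update gate can be sketched in a few lines of NumPy. This is only an illustration: the sizes and the random weights below are assumptions, and a trained model would learn `W_z` and `b_z` from data.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3

# Illustrative random parameters; a trained model would learn these.
W_z = rng.normal(size=(hidden_dim, hidden_dim + input_dim))
b_z = np.zeros(hidden_dim)

h_prev = rng.normal(size=hidden_dim)   # previous hidden state h_{t-1}
x_t = rng.normal(size=input_dim)       # current input x_t

# z_t = sigmoid(W_z . [h_{t-1}, x_t] + b_z)
z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]) + b_z)
print(z_t)
```

The result has one value per hidden unit, each between 0 and 1, so every unit of the memory gets its own "keep or replace" setting.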

### 🔄 2. Reset Gate – How much past to forget?
Controls how much of the past to ignore.
```
rₜ = σ(W_r⋅[hₜ₋₁, xₜ] + b_r)
```
The formula has the same shape as the update gate, but with its own learned weights W_r and bias b_r. The result rₜ is again a vector of values between 0 and 1.

- If rₜ ≈ 0: Forget past
- If rₜ ≈ 1: Remember all past

### 🧠 3. Candidate Hidden State – New memory to consider

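```
h̃ₜ = tanh(W_h⋅[rₜ∗hₜ₋₁, xₜ] + b_h)
```
Combines the current input with the past memory.  
The reset gate first scales the past memory element by element (∗), so with rₜ ≈ 0 the candidate depends almost only on the current input.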
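
To see the reset gate's effect on the candidate, here is a small NumPy sketch. The weights, sizes, and the helper name `candidate` are illustrative assumptions. It feeds two different "pasts" through the candidate formula, once with the reset gate fully closed and once fully open.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, input_dim = 4, 3
W_h = 0.5 * rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # illustrative weights
b_h = np.zeros(hidden_dim)

def candidate(r_t, h_prev, x_t):
    """h~_t = tanh(W_h . [r_t * h_{t-1}, x_t] + b_h)"""
    return np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)

x_t = rng.normal(size=input_dim)
h_a = rng.normal(size=hidden_dim)
h_b = rng.normal(size=hidden_dim)   # a different "past"

closed = np.zeros(hidden_dim)       # r_t = 0: reset gate fully closed
opened = np.ones(hidden_dim)        # r_t = 1: reset gate fully open

# With r_t = 0 the past is ignored, so both histories give the same candidate.
same_when_closed = np.allclose(candidate(closed, h_a, x_t), candidate(closed, h_b, x_t))
# With r_t = 1 the past flows in, so the two candidates can differ.
same_when_opened = np.allclose(candidate(opened, h_a, x_t), candidate(opened, h_b, x_t))
print(same_when_closed, same_when_opened)
```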

### 📤 4. Final Hidden State – The updated memory/output

```
hₜ = zₜ∗hₜ₋₁ + (1−zₜ)∗h̃ₜ
```

Mixes old memory and new memory  
Controlled by update gate
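
Here is a tiny worked example of the mixing step. It uses the convention in which zₜ ≈ 1 keeps the old memory, matching the update gate description earlier (Keras uses this convention too; some papers swap the roles of zₜ and 1 − zₜ). The vectors and the helper name `blend` are made up for illustration.

```python
import numpy as np

def blend(z_t, h_prev, h_cand):
    """h_t = z_t * h_{t-1} + (1 - z_t) * h~_t (z_t near 1 keeps old memory)."""
    return z_t * h_prev + (1.0 - z_t) * h_cand

h_prev = np.array([1.0, 1.0, 1.0])     # old memory
h_cand = np.array([-1.0, -1.0, -1.0])  # candidate (new) memory
z_t = np.array([1.0, 0.5, 0.0])        # keep all / mix equally / replace all

h_t = blend(z_t, h_prev, h_cand)
print(h_t)
```

The three units end at 1, 0, and -1: the first keeps the old memory, the second averages old and new, and the third is fully replaced by the candidate.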

---

## 🔄 Full Flow of GRU at Time Step t
1. Take current input xₜ and previous output hₜ₋₁
2. Compute update gate zₜ
3. Compute reset gate rₜ
4. Calculate new memory h̃ₜ
5. Combine using zₜ to get new hidden state hₜ
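
The five steps above can be sketched as one NumPy function and run over a short sequence. This is a minimal illustration, not a library implementation: `init_params` and `gru_step` are hypothetical helpers, the weights are random, and the final mix uses the convention where zₜ ≈ 1 keeps the old memory.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights for illustration only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shape = (hidden_dim, hidden_dim + input_dim)
    return {
        "W_z": 0.1 * rng.normal(size=shape), "b_z": np.zeros(hidden_dim),
        "W_r": 0.1 * rng.normal(size=shape), "b_r": np.zeros(hidden_dim),
        "W_h": 0.1 * rng.normal(size=shape), "b_h": np.zeros(hidden_dim),
    }

def gru_step(x_t, h_prev, p):
    """One GRU time step, following steps 1 to 5."""
    hx = np.concatenate([h_prev, x_t])                    # 1. input + previous state
    z_t = sigmoid(p["W_z"] @ hx + p["b_z"])               # 2. update gate
    r_t = sigmoid(p["W_r"] @ hx + p["b_r"])               # 3. reset gate
    gated = np.concatenate([r_t * h_prev, x_t])
    h_cand = np.tanh(p["W_h"] @ gated + p["b_h"])         # 4. candidate memory
    return z_t * h_prev + (1.0 - z_t) * h_cand            # 5. mix old and new

input_dim, hidden_dim, seq_len = 3, 4, 5
params = init_params(input_dim, hidden_dim)
sequence = np.random.default_rng(42).normal(size=(seq_len, input_dim))

h = np.zeros(hidden_dim)          # initial hidden state
for x_t in sequence:
    h = gru_step(x_t, h, params)  # the hidden state is carried to the next step
print(h)
```

The same weights are reused at every time step; only the hidden state `h` changes as the sequence is read.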

---

## 💡 Real-Life Example (To Understand GRU)
Imagine you're watching a TV drama series:
- You remember main storylines (update gate keeps useful info)
- You forget boring details like what color clothes someone wore (reset gate forgets unimportant parts)
- Every new episode adds to your memory without confusing you

That's how GRU works!

---

## 📦 Where GRU is Used

| Use Case | What GRU Does |
|----------|---------------|
| Language Translation | Remembers sentence structure |
| Sentiment Analysis | Understands meaning in reviews |
| Time Series Forecasting | Predicts future stock/weather |
| Chatbots | Keeps conversation flow |
| Speech Recognition | Understands spoken words over time |

---

## 📌 Final Summary – GRU in One Shot
✅ GRU is a smart RNN that learns what to remember and what to forget  
🔁 Uses two gates (Update & Reset)  
🧠 Keeps a single hidden state (no cell state like LSTM)  
⚡ Works well for long sequences and is faster than LSTM  
💬 Perfect for tasks like text, speech, and time series

## ✅ GRU Model (Single Layer) with IMDB Dataset in Keras

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GRU, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

# 1. Load the IMDB dataset, keeping only the 10,000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# 2. Pad or truncate every review to exactly 100 tokens
x_train = pad_sequences(x_train, maxlen=100)
x_test = pad_sequences(x_test, maxlen=100)

# 3. Define the GRU model (single layer)
# An explicit Input layer replaces Embedding's `input_length` argument,
# which is deprecated in recent Keras versions.
model = Sequential([
    Input(shape=(100,)),             # 100 word indices per review
    Embedding(10000, 32),            # Converts each word index to a 32-dim vector
    GRU(32),                         # Single GRU layer; returns the last hidden state
    Dense(1, activation='sigmoid')   # Output layer for binary classification
])

# 4. Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 5. Model summary
model.summary()

# 6. Train the model, holding out 20% of the training data for validation
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

# 7. Evaluate on the held-out test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```
