![Alt text](gru1.png)

# Gated Recurrent Units (GRU)

## Overview
Gated Recurrent Units (GRUs) are a type of Recurrent Neural Network (RNN) designed to capture dependencies in sequential data. They are similar to Long Short-Term Memory (LSTM) networks but have a simpler architecture: two gates instead of three, and a single hidden state with no separate cell state. This typically makes them cheaper to compute per time step and faster to train.

## Architecture of GRU

A GRU unit consists of:

1. **Hidden State (\(h_t\))**: Represents the current memory state of the network.
2. **Gates**: GRUs have two main gates that control the flow of information:
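   - **Update Gate (\(z_t\))**: Controls how the new hidden state blends the previous hidden state with the candidate state, and therefore how much past information is carried forward.
   - **Reset Gate (\(r_t\))**: Controls how much of the previous hidden state is used (or forgotten) when computing the candidate state.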
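
To make the idea of a gate concrete, here is a small sketch in plain Python. The values of `h_prev` and `gate_inputs` are made up for illustration. A sigmoid squashes each gate input into the range (0, 1), and the result scales the matching entry of the previous hidden state.

```python
import math

def sigmoid(a):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

# Illustrative values only (not taken from a trained model).
h_prev = [0.8, -0.5, 0.3]          # previous hidden state
gate_inputs = [-10.0, 0.0, 10.0]   # pre-activation values of a gate

# Gate values: roughly "closed", exactly half open, roughly "open".
gate = [sigmoid(a) for a in gate_inputs]

# Element-wise gating: each entry of h_prev is scaled by its gate value.
gated = [g * h for g, h in zip(gate, h_prev)]
print(gated)
```

A gate value near 0 suppresses the corresponding entry of the hidden state, while a value near 1 passes it through almost unchanged.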

### Mathematical Representation

1. **Update Gate**:
   $$
   z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)
   $$

2. **Reset Gate**:
   $$
   r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)
   $$

3. **Candidate Hidden State**:
   $$
   \tilde{h}_t = \tanh(W_h \cdot [r_t * h_{t-1}, x_t] + b_h)
   $$

4. **Update Hidden State**:
   $$
   h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
   $$
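
The four equations above can be sketched as a single step function in NumPy. This is a minimal illustration, not how a deep learning library implements GRUs internally. The helper names (`gru_step`, `init_params`), the sizes, and the random initialization are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU time step, following the four equations above."""
    W_z, b_z, W_r, b_r, W_h, b_h = params
    concat = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat + b_z)                   # update gate
    r_t = sigmoid(W_r @ concat + b_r)                   # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])  # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W_h @ concat_reset + b_h)         # candidate hidden state
    return (1.0 - z_t) * h_prev + z_t * h_tilde         # new hidden state

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    shape = (hidden_size, hidden_size + input_size)
    params = []
    for _ in range(3):  # one (W, b) pair each for z, r and the candidate
        params.append(rng.normal(scale=0.1, size=shape))
        params.append(np.zeros(hidden_size))
    return tuple(params)

# Run the step over a short random sequence (illustrative sizes).
input_size, hidden_size, timesteps = 3, 4, 5
params = init_params(input_size, hidden_size)
xs = np.random.default_rng(1).normal(size=(timesteps, input_size))

h = np.zeros(hidden_size)
for x_t in xs:
    h = gru_step(x_t, h, params)
print(h.shape)  # (4,)
```

With this form of the update, \( z_t \) close to 0 leaves the hidden state almost unchanged, while \( z_t \) close to 1 replaces it with the candidate.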

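### Summary of GRU Equations
- Update gate \( z_t \): sets how much of the candidate state replaces the previous hidden state.
- Reset gate \( r_t \): scales the previous hidden state before it enters the candidate computation.
- Candidate hidden state \( \tilde{h}_t \): the proposed new state, computed from \( x_t \) and the reset-scaled \( h_{t-1} \).
- Hidden state update \( h_t \): an element-wise interpolation between \( h_{t-1} \) and \( \tilde{h}_t \), weighted by \( z_t \).

Here \( \sigma \) is the logistic sigmoid, \( [\cdot, \cdot] \) denotes concatenation, and \( * \) denotes element-wise multiplication.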

## Use Cases of GRU

GRUs are widely used in various applications, including:

1. **Natural Language Processing (NLP)**:
   - Language modeling
   - Machine translation
   - Sentiment analysis

2. **Time Series Forecasting**:
   - Predicting stock prices
   - Weather forecasting

3. **Speech Recognition**:
   - Converting audio signals to text.

4. **Music Generation**:
   - Composing melodies based on previous notes.

5. **Video Analysis**:
   - Activity recognition in video streams.

## Advantages of GRU

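- **Simplicity**: GRUs have a simpler architecture than LSTMs (two gates and no separate cell state), which usually makes each training step cheaper.
- **Fewer Parameters**: A GRU learns three weight blocks (update gate, reset gate, candidate state) where an LSTM learns four, so it has roughly 25% fewer parameters for the same input and hidden sizes. This can reduce the risk of overfitting, especially on smaller datasets.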
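
As a rough check on the parameter claim, the sketch below counts weights and biases for the textbook formulation, where each gate or candidate block has one weight matrix over the concatenated input plus one bias vector. The function names and the example sizes are assumptions for illustration. Library implementations may add extra bias terms, so the counts they report can differ slightly.

```python
def recurrent_params(input_size, hidden_size, num_blocks):
    """Parameters for num_blocks blocks, each with a weight matrix of shape
    (hidden_size, hidden_size + input_size) and a bias of size hidden_size."""
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return num_blocks * per_block

def gru_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, 3)   # z, r, candidate

def lstm_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, 4)   # input, forget, output, candidate

input_size, hidden_size = 3, 50  # illustrative sizes
gru = gru_params(input_size, hidden_size)
lstm = lstm_params(input_size, hidden_size)
print(gru, lstm, gru / lstm)  # 8100 10800 0.75
```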

## Disadvantages of GRU

- **Less Control**: The simpler architecture may limit their ability to model complex dependencies compared to LSTMs in some cases.
- **Performance Variability**: Depending on the task, GRUs may not always outperform LSTMs.

## Implementation in TensorFlow/Keras

Here’s a basic example of how to implement a GRU in TensorFlow/Keras:

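```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Input

# Placeholder data so the example runs end to end (assumed shapes);
# replace with your own training set.
timesteps, features = 10, 3
X_train = np.random.rand(100, timesteps, features).astype("float32")
y_train = np.random.rand(100, 1).astype("float32")

# Define the model
model = Sequential()
model.add(Input(shape=(timesteps, features)))  # (timesteps, features) per sample
model.add(GRU(50))  # 50 GRU units
model.add(Dense(1))  # Output layer

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model
model.fit(X_train, y_train, epochs=50, batch_size=32)
```
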
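## Key Points

- Basic RNNs suffer from a short-term memory problem: they struggle to retain information across long sequences.
- Gated networks address this by learning which information (for example, an important keyword) to remember.
  - LSTMs keep long-term and short-term memory separately (cell state and hidden state).
  - GRUs combine long-term and short-term memory into a single hidden state.
- LSTMs have 3 gates: input, output, and forget.
- GRUs have only 2 gates:
  - Update gate: how much of the past memory is retained versus replaced by new information.
  - Reset gate: how much of the past memory to forget when forming the candidate state.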