# 🧠 Recurrent Neural Networks (RNN) - In-depth Notes


Recurrent Neural Networks (RNNs) are a class of neural networks designed for **sequential data**.
They are widely used in Natural Language Processing (NLP), Time Series Forecasting, Speech Recognition, etc.

Unlike traditional feedforward neural networks, RNNs have **loops** that allow information to persist, making them ideal for learning patterns in sequences.



## 📦 Applications of RNNs

- Language Modeling and Text Generation
- Sentiment Analysis
- Machine Translation
- Time Series Forecasting
- Speech Recognition



## 🏗️ Basic RNN Architecture

An RNN processes a sequence of inputs by maintaining a **hidden state** `h_t` which is updated at each time step:

**Equations:**
- Hidden state: \( h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \)
- Output: \( y_t = W_{hy} h_t + b_y \)

Where:
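- \( x_t \): input at time step \( t \)
- \( h_t \): hidden state at time step \( t \)
- \( W_{xh}, W_{hh}, W_{hy} \): input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices
- \( b_h, b_y \): bias terms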

The output at each time step depends on the current input and previous hidden state.
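To make the two equations concrete, here is a minimal NumPy sketch of the forward pass. The function name `rnn_forward`, the layer sizes, and the random weights are illustrative assumptions, not part of any library, and the initial hidden state is assumed to be all zeros.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y, h0=None):
    """Run a vanilla RNN over a sequence; return all hidden states and outputs."""
    hidden_size = W_hh.shape[0]
    # Assumption: start from a zero hidden state unless one is supplied
    h = np.zeros(hidden_size) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:  # one iteration per time step, same weights every time
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        # y_t = W_hy h_t + b_y
        y = W_hy @ h + b_y
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

# Illustrative sizes: 3 time steps, 4 input features, 5 hidden units, 2 outputs
rng = np.random.default_rng(0)
T, input_size, hidden_size, output_size = 3, 4, 5, 2
xs = rng.normal(size=(T, input_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
print(hs.shape, ys.shape)  # (3, 5) (3, 2)
```

Because the hidden state is carried from one loop iteration to the next, each `h_t` summarizes everything the network has seen up to step `t`.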



### 🔁 Unrolled RNN (for 3 time steps)

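Unrolling the recurrence over three time steps gives a chain of three RNN cells, each passing its hidden state to the next. Every cell shares the same weights.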
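Written out with the equations from the previous section, the three unrolled steps look roughly like this. The initial state \( h_0 \) is an assumption here (it is commonly set to a zero vector); the weights and biases are identical in every row.

```latex
\begin{aligned}
h_1 &= \tanh(W_{hh} h_0 + W_{xh} x_1 + b_h), & y_1 &= W_{hy} h_1 + b_y \\
h_2 &= \tanh(W_{hh} h_1 + W_{xh} x_2 + b_h), & y_2 &= W_{hy} h_2 + b_y \\
h_3 &= \tanh(W_{hh} h_2 + W_{xh} x_3 + b_h), & y_3 &= W_{hy} h_3 + b_y
\end{aligned}
```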


## 🔧 Code Example: Simple RNN for Sentiment Classification (IMDb Dataset using Keras)

```python
from tensorflow.keras import Input
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Embedding, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the IMDb dataset
vocab_size = 10000  # Only keep the 10k most frequent words
maxlen = 200  # Pad or truncate every review to 200 tokens

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

# Build the RNN model.
# An explicit Input layer fixes the sequence length; the `input_length`
# argument of Embedding is deprecated in recent Keras releases.
model = Sequential([
    Input(shape=(maxlen,)),  # fixed-length integer sequences
    Embedding(vocab_size, 32),  # map each word index to a 32-dim vector
    SimpleRNN(32),  # recurrent layer; returns the final hidden state
    Dense(1, activation='sigmoid')  # probability of a positive review
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Train the model, holding out 20% of the training data for validation
model.fit(x_train, y_train, epochs=2, batch_size=64, validation_split=0.2)
```



## ⚠️ Limitations of Vanilla RNNs

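- **Vanishing/Exploding Gradients**: During backpropagation through time, gradients are multiplied across time steps and can shrink toward zero or grow without bound, making training difficult.
- **Short-term memory**: Because early gradients vanish, the network struggles to learn long-term dependencies.
- **Slow Training**: Each time step depends on the previous one, so computation cannot be parallelized across the sequence.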

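✅ Gated architectures like LSTM and GRU mitigate the vanishing-gradient and short-term-memory problems. Exploding gradients are usually handled with gradient clipping, and the sequential computation cost remains.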
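The vanishing-gradient effect can be seen numerically. The sketch below is an illustration under stated assumptions, not a training procedure: `gradient_norms` is a hypothetical helper that builds a random tanh RNN, rescales `W_hh` to a chosen largest singular value, and tracks the norm of \( \partial h_t / \partial h_0 \) as the product of per-step Jacobians.

```python
import numpy as np

def gradient_norms(spectral_norm, steps=50, hidden_size=8, seed=0):
    """Track the norm of d h_t / d h_0 over time for a random tanh RNN."""
    rng = np.random.default_rng(seed)
    W_hh = rng.normal(size=(hidden_size, hidden_size))
    # Rescale so the largest singular value of W_hh equals `spectral_norm`
    W_hh *= spectral_norm / np.linalg.norm(W_hh, 2)

    h = np.zeros(hidden_size)
    jac = np.eye(hidden_size)  # d h_0 / d h_0
    norms = []
    for _ in range(steps):
        # Assumption: the input term W_xh x_t + b_h is modelled as random noise
        u_t = rng.normal(size=hidden_size)
        h = np.tanh(W_hh @ h + u_t)
        # Jacobian of one step: diag(1 - h_t^2) @ W_hh
        step_jac = (1.0 - h**2)[:, None] * W_hh
        jac = step_jac @ jac  # chain rule across time steps
        norms.append(np.linalg.norm(jac, 2))
    return norms

norms = gradient_norms(spectral_norm=0.5)
print(f"after 1 step: {norms[0]:.2e}, after 50 steps: {norms[-1]:.2e}")
```

With a largest singular value of 0.5, each step can shrink the gradient by at least half, so after 50 steps almost no signal reaches the early time steps. With a value above 1 the product can grow instead, although tanh saturation may counteract that, so explosion is possible rather than guaranteed.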



## 🔍 Alternatives to Vanilla RNNs

### 1. Long Short-Term Memory (LSTM)
- Designed to combat the vanishing gradient problem.
- Uses **gates** (input, forget, output) to control the flow of information into and out of a cell state.
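A single LSTM time step can be sketched in NumPy as follows. This follows the common textbook formulation; the function name `lstm_step`, the stacked weight layout, and the sizes are assumptions made for illustration and do not mirror any particular library's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + input) and b has shape (4 * hidden,).
    The four row blocks hold the input gate, forget gate, output gate
    and candidate update, in that (assumed) order.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: how much new content to write
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate cell update
    c_t = f * c_prev + i * g               # additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Illustrative sizes: 4 input features, 3 hidden units, 5 time steps
rng = np.random.default_rng(0)
input_size, hidden = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_size))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The additive update `c_t = f * c_prev + i * g` is the key design choice: when the forget gate stays near 1, the cell state (and its gradient) passes through many steps largely unchanged.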

### 2. Gated Recurrent Unit (GRU)
- A simpler version of LSTM with fewer gates (reset and update).
- Efficient and performs comparably to LSTM on many tasks.
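For comparison, a GRU time step can be sketched the same way. Again, `gru_step` and its parameter names are illustrative assumptions. Sign conventions for the update gate differ between references; here a value of `z` close to 0 keeps the old state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step; each W_* has shape (hidden, hidden + input)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx + b_z)  # update gate: how much of the state to overwrite
    r = sigmoid(W_r @ hx + b_r)  # reset gate: how much past state feeds the candidate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)
    # Assumed convention: z near 0 keeps h_prev, z near 1 takes the candidate
    return (1.0 - z) * h_prev + z * h_cand

# Illustrative sizes: 4 input features, 3 hidden units, 5 time steps
rng = np.random.default_rng(0)
input_size, hidden = 4, 3
shape = (hidden, hidden + input_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))
b_z = b_r = b_h = np.zeros(hidden)
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, W_z, W_r, W_h, b_z, b_r, b_h)
print(h.shape)  # (3,)
```

Unlike the LSTM, the GRU has no separate cell state: the hidden state itself is interpolated between its old value and the candidate, which is why it needs only two gates.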


## 🔧 Code Example: LSTM for IMDb

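```python
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM

# Reuses Sequential, Embedding, Dense, vocab_size, maxlen and the padded
# data from the SimpleRNN example; only the recurrent layer changes.
model_lstm = Sequential([
    Input(shape=(maxlen,)),  # fixed-length integer sequences
    Embedding(vocab_size, 32),
    LSTM(32),  # LSTM layer
    Dense(1, activation='sigmoid')
])

model_lstm.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_lstm.summary()

# Train the model, holding out 20% of the training data for validation
model_lstm.fit(x_train, y_train, epochs=2, batch_size=64, validation_split=0.2)
```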



## ✅ Summary

| Model | Memory   | Complexity | Handles Long-term Dependency |
|-------|----------|------------|------------------------------|
| RNN   | 🟠 Short | Low        | ❌ No                        |
| LSTM  | 🟢 Long  | High       | ✅ Yes                       |
| GRU   | 🟢 Long  | Medium     | ✅ Yes                       |

RNNs are foundational in sequence modeling. In practice, however, **LSTM or GRU** is usually preferred over a vanilla RNN.
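The "Complexity" column can be made concrete by counting trainable parameters. The sketch below assumes the textbook formulation, where each gate or candidate block has one weight matrix over the concatenated hidden state and input plus one bias vector; `recurrent_params` is a hypothetical helper, and the sizes match the 32-dimensional embedding and 32 units used in the Keras examples.

```python
def recurrent_params(input_size, hidden_size, n_blocks):
    """Parameter count for a recurrent layer with `n_blocks` gate/candidate blocks."""
    # One block: weights over [h_{t-1}, x_t] plus a bias vector
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return n_blocks * per_block

input_size, hidden_size = 32, 32  # embedding size and recurrent units
counts = {
    "RNN": recurrent_params(input_size, hidden_size, n_blocks=1),   # candidate only
    "GRU": recurrent_params(input_size, hidden_size, n_blocks=3),   # update, reset, candidate
    "LSTM": recurrent_params(input_size, hidden_size, n_blocks=4),  # input, forget, output, candidate
}
print(counts)  # {'RNN': 2080, 'GRU': 6240, 'LSTM': 8320}
```

Some implementations use extra bias terms (the default Keras `GRU` is one such case, as far as I know), so `model.summary()` may report a slightly larger number than this estimate.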
