Recurrent Neural Networks (RNNs) & Long Short-Term Memory (LSTMs)

Recurrent Neural Networks (RNNs) are designed to handle sequential data such as text, speech, and time series.

Unlike traditional feedforward networks, RNNs maintain a hidden state that carries information from earlier inputs in a sequence, which gives them a form of memory.

Why Do We Need RNNs?

Traditional feedforward networks cannot model sequential dependencies well, because each input is processed independently, with no state carried over from earlier inputs.

Example:

"The cat is sitting on the ..." → The next word is likely "mat" (context matters).

A standard feedforward network has no built-in way to remember the previous words when predicting the next one.

Solution: use an RNN, which carries context forward from one time step to the next.

RNN Architecture

At each time step, an RNN combines the current input with the hidden state from the previous time step, reusing the same weights at every step:


$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$$

Where:

$x_t$ = input at time $t$.

$h_t$ = hidden state at time $t$ (stores memory).

$W_x$, $W_h$ = learnable weights.

$b$ = bias term.
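To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$. The function name `rnn_forward`, the shapes, and the random weights are illustrative choices of ours, not a Keras API; `SimpleRNN(activation="tanh")` applies the same recurrence, but this is not its implementation.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0=None):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a whole sequence.

    xs:  array of shape (T, input_dim), one row per time step
    W_x: (hidden_dim, input_dim), W_h: (hidden_dim, hidden_dim), b: (hidden_dim,)
    Returns every hidden state, shape (T, hidden_dim).
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in xs:
        # Same weights at every step; h carries the "memory" forward
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = rng.standard_normal((T, input_dim))
W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.5
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.5
b = np.zeros(hidden_dim)

H = rnn_forward(xs, W_x, W_h, b)
print(H.shape)  # (5, 4)
```

Starting from $h_0 = 0$, the first state depends only on $x_1$; every later state mixes the new input with everything summarized so far.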

Implementing a Simple RNN (Text Prediction)

We'll build a simple RNN that reads the first words of a sentence and predicts the word that comes next (here, the sentence's last word).

Step 1: Import Libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding
import numpy as np


Step 2: Prepare Data (Text to Sequence)

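# Example sentences
text = ["hello how are you", "I am fine thank you", "how about you"]

# Tokenization: map each word to an integer id (ids start at 1; 0 is reserved for padding).
# Tokenizer is a legacy tf.keras (Keras 2) utility; newer code uses layers.TextVectorization.
tokenizer = keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(text)
sequences = tokenizer.texts_to_sequences(text)

# Create input (X) and output (y) pairs: all words except the last -> the last word.
# The inputs have different lengths, so left-pad them with 0 into a rectangular array.
X = keras.preprocessing.sequence.pad_sequences([seq[:-1] for seq in sequences])
y = np.array([seq[-1] for seq in sequences])  # Next word (all three sentences end in "you")
vocab_size = len(tokenizer.word_index) + 1    # +1 for the padding id 0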
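To see what these arrays look like, this pure-Python sketch imitates the tokenization and padding steps without TensorFlow. `demo_word_index` and `pad_pre` are illustrative stand-ins, assuming Tokenizer's default behavior (lowercasing, splitting on whitespace, ranking words by frequency and numbering them from 1); it should produce the same ids as Tokenizer for these three sentences, but it is not Keras code.

```python
from collections import Counter

sentences = ["hello how are you", "I am fine thank you", "how about you"]

# Lowercase and split on whitespace, as Tokenizer does by default
tokens = [s.lower().split() for s in sentences]

# Rank words by frequency; sorted() is stable, so ties keep first-seen order
counts = Counter(w for sent in tokens for w in sent)
ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
demo_word_index = {w: i + 1 for i, w in enumerate(ranked)}  # 0 stays free for padding
demo_sequences = [[demo_word_index[w] for w in sent] for sent in tokens]

def pad_pre(seqs, value=0):
    """Left-pad each list to the longest length (like pad_sequences' default)."""
    maxlen = max(len(s) for s in seqs)
    return [[value] * (maxlen - len(s)) + s for s in seqs]

X_demo = pad_pre([seq[:-1] for seq in demo_sequences])
y_demo = [seq[-1] for seq in demo_sequences]
print(X_demo)  # [[0, 3, 2, 4], [5, 6, 7, 8], [0, 0, 2, 9]]
print(y_demo)  # [1, 1, 1]
```

Every target is 1 (`"you"`), so on this toy corpus the model can reach perfect training accuracy by always predicting the same word; a larger corpus is needed to learn anything more interesting.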


Step 3: Build RNN Model

model = keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=10, mask_zero=True),  # Word embeddings; padding id 0 is masked out
    SimpleRNN(50, activation="tanh"),  # RNN layer with 50 units
    Dense(vocab_size, activation="softmax")  # Output layer
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
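As a sanity check on the architecture, the trainable parameter counts can be worked out by hand. This sketch assumes `vocab_size` is 10, which is what this toy corpus should give (nine distinct lowercased words plus the reserved padding id 0); the helper functions are our own, and after training `model.summary()` should report the same per-layer totals.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim  # one dim-sized vector per token id

def simple_rnn_params(input_dim, units):
    # W_x: (units x input_dim), W_h: (units x units), bias: units
    return units * input_dim + units * units + units

def dense_params(input_dim, units):
    return input_dim * units + units  # weight matrix + bias

toy_vocab_size = 10  # assumed: 9 distinct words + padding id 0
layer_params = {
    "Embedding": embedding_params(toy_vocab_size, 10),
    "SimpleRNN": simple_rnn_params(10, 50),
    "Dense": dense_params(50, toy_vocab_size),
}
print(layer_params)               # {'Embedding': 100, 'SimpleRNN': 3050, 'Dense': 510}
print(sum(layer_params.values())) # 3660
```

Most of the recurrent layer's weights sit in $W_h$ (50 × 50 = 2500), because every unit feeds back into every other unit.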


Step 4: Train the RNN

model.fit(X, y, epochs=100)
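The training cell does not show how to turn predictions back into words. One way, sketched below, is to take the argmax of each softmax row and look the id up in the tokenizer's `index_word` mapping; with the trained model you would pass `model.predict(X)` and `tokenizer.index_word`. `decode_next_words` is a hypothetical helper, not a Keras function, and the probabilities are made up for illustration.

```python
import numpy as np

def decode_next_words(probs, index_word):
    """Map each row of softmax output to its most likely word.

    probs: array (batch, vocab_size), e.g. the result of model.predict(X)
    index_word: dict from token id to word, e.g. tokenizer.index_word
    """
    ids = np.argmax(probs, axis=-1)
    # id 0 is the padding index and has no word of its own
    return [index_word.get(int(i), "<pad>") for i in ids]

# Hypothetical probabilities for 2 inputs over a 4-id vocabulary
index_word = {1: "you", 2: "how", 3: "hello"}
probs = np.array([[0.05, 0.80, 0.10, 0.05],
                  [0.10, 0.20, 0.60, 0.10]])
print(decode_next_words(probs, index_word))  # ['you', 'how']
```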

Long Short-Term Memory (LSTMs)

Why Do We Need LSTMs?

Problem with Vanilla RNNs:

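They suffer from vanishing gradients: during backpropagation through time, gradients shrink roughly exponentially with the number of time steps they travel back through, so the network struggles to learn long-term dependencies.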
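A small NumPy experiment illustrates the problem. It runs an input-free tanh RNN, $h_t = \tanh(W_h h_{t-1})$, with a recurrent matrix we chose to have spectral norm 0.9 (an assumption for the demo; inputs and bias are omitted), and tracks the norm of $\partial h_t / \partial h_0$, the factor by which a gradient at step $t$ is scaled on its way back to step 0. Each step multiplies by $\mathrm{diag}(1 - h_t^2)\,W_h$, whose norm is at most 0.9 here, so the product shrinks at least as fast as $0.9^t$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 50

# Recurrent weights: a random orthogonal matrix scaled to spectral norm 0.9
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W_h = 0.9 * Q

h = rng.standard_normal(n) * 0.1
J = np.eye(n)  # accumulates dh_t / dh_0
norms = []
for t in range(T):
    h = np.tanh(W_h @ h)                   # input-free RNN step
    J = (np.diag(1.0 - h**2) @ W_h) @ J    # chain rule: one-step Jacobian times previous
    norms.append(np.linalg.norm(J, 2))     # spectral norm (largest singular value)

print(f"after 1 step: {norms[0]:.3f}, after {T} steps: {norms[-1]:.2e}")
```

After 50 steps the gradient signal from step 0 is scaled by well under 1%, which is why a vanilla RNN barely adjusts its weights in response to distant context.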

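Solution: LSTMs add a memory cell (the cell state $C_t$) that is updated additively and controlled by gates, so information, and gradients, can persist across many time steps.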

LSTM Cell Structure

LSTMs have gates that regulate memory flow:

Forget Gate 🚪: Decides what to remove from the cell state.

Input Gate 📝: Decides what new information to store in the cell state.

Output Gate 📤: Decides which parts of the cell state to output as the hidden state.
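In the common formulation without peephole connections, the gates and the cell-state update are computed as follows. The per-gate weight names extend $W_x$, $W_h$ from the RNN equation and are our own notation; $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && \text{input gate} \\
\tilde{C}_t &= \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) && \text{candidate memory} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && \text{output gate}
\end{aligned}
$$

Because $C_t$ depends on $C_{t-1}$ through an element-wise product with $f_t$ rather than through a repeated matrix multiply and squashing, the gradient along the cell state is scaled by $f_t$ at each step, which stays close to 1 when the network chooses to remember.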


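The hidden state passed to the next time step (and to the next layer) is the output gate applied to the squashed cell state:

$$h_t = o_t \odot \tanh(C_t)$$

where $o_t$ is the output gate, $C_t$ is the cell state, and $\odot$ is element-wise multiplication.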
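The following NumPy sketch performs one LSTM step: sigmoid forget, input and output gates, a tanh candidate blended into the cell state, and $h_t = o_t \odot \tanh(C_t)$. The function name `lstm_step`, the stacked-weight layout, and the gate order (forget, input, candidate, output) are our own choices for illustration and are not necessarily the layout Keras uses internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One LSTM time step.

    W: (4*n, input_dim) input weights, U: (4*n, n) recurrent weights, b: (4*n,)
    Rows are stacked in this sketch's order: forget, input, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:n])            # forget gate: how much of C_prev to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new info to write
    C_tilde = np.tanh(z[2 * n:3 * n])  # candidate memory
    o = sigmoid(z[3 * n:4 * n])    # output gate
    C = f * C_prev + i * C_tilde   # additive cell-state update
    h = o * np.tanh(C)             # h_t = o_t * tanh(C_t), element-wise
    return h, C

rng = np.random.default_rng(0)
input_dim, n = 3, 4
W = rng.standard_normal((4 * n, input_dim)) * 0.5
U = rng.standard_normal((4 * n, n)) * 0.5
b = np.zeros(4 * n)

h, C = np.zeros(n), np.zeros(n)
for x_t in rng.standard_normal((6, input_dim)):
    h, C = lstm_step(x_t, h, C, W, U, b)
print(h.shape, C.shape)  # (4,) (4,)
```

Because the cell state is updated by addition, pushing the forget gate toward 1 and the input gate toward 0 leaves $C_t$ essentially unchanged, which is how an LSTM can carry a value across many steps.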

Implementing LSTM for Text Prediction

Step 1: Define LSTM Model

from tensorflow.keras.layers import LSTM

model = keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=10, mask_zero=True),  # mask padding id 0
    LSTM(50, activation="tanh"),  # Replace SimpleRNN with LSTM
    Dense(vocab_size, activation="softmax")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])


Step 2: Train the LSTM

model.fit(X, y, epochs=100)


GRU (Gated Recurrent Unit) - An Alternative to LSTM

GRUs work like LSTMs but use only two gates (update and reset) and no separate cell state, so they have fewer parameters and are often faster to train.
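The "fewer parameters" claim can be checked with a little arithmetic. These formulas are a sketch for a layer fed 10-dimensional embeddings with 50 units; they assume Keras' default GRU setting `reset_after=True`, under which each GRU gate keeps separate input and recurrent bias vectors. The helper names are our own.

```python
def lstm_params(input_dim, units):
    # 4 blocks (forget, input, candidate, output), each with
    # input weights, recurrent weights and one bias vector
    return 4 * (units * input_dim + units * units + units)

def gru_params(input_dim, units, reset_after=True):
    # 3 blocks (update, reset, candidate); with reset_after=True each
    # block has separate input and recurrent biases (2 * units)
    bias = 2 * units if reset_after else units
    return 3 * (units * input_dim + units * units + bias)

print("LSTM:", lstm_params(10, 50))  # LSTM: 12200
print("GRU:", gru_params(10, 50))    # GRU: 9300
```

So the GRU layer here has roughly three quarters of the LSTM layer's parameters, which is where most of its speed advantage comes from.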

from tensorflow.keras.layers import GRU

model = keras.Sequential([
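    Embedding(input_dim=vocab_size, output_dim=10, mask_zero=True),  # mask padding id 0
    GRU(50, activation="tanh"),  # Use GRU instead of LSTM
    Dense(vocab_size, activation="softmax")
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=100)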


Summary: RNN vs. LSTM vs. GRU

| **Model** | **Strength** | **Weakness** | **Best For** |
|---|---|---|---|
| **RNN** | Works for short sequences | Vanishing gradient problem | Simple text data |
| **LSTM** | Handles long sequences | More parameters, slower | Sentiment analysis, speech recognition |
| **GRU** | Fewer parameters, often faster than LSTM | Less control over memory | Chatbots, real-time tasks |