**<h2>Basics of Sequential Data</h2>**

Sequential data is data where the order of data points matters. Examples include time series, natural language text, and audio signals. Unlike traditional feedforward neural networks that treat inputs as independent, Recurrent Neural Networks (RNNs) are designed to handle sequential data by maintaining a hidden state that captures information from previous inputs.
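To make the idea of order-dependent data concrete, the sketch below slices a one-dimensional series into overlapping windows with the `(samples, timesteps, features)` layout used by the recurrent layers later in this notebook. The helper name `make_windows` and the toy series are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def make_windows(series, timesteps):
    # Hypothetical helper: turn a 1-D series into overlapping windows (X)
    # and the value that follows each window (y).
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - timesteps
    X = np.stack([series[i:i + timesteps] for i in range(n)])
    y = series[timesteps:]
    # Add a trailing feature axis: (samples, timesteps, features)
    return X[..., np.newaxis], y

series = np.arange(15)          # toy data: 0, 1, ..., 14
X, y = make_windows(series, timesteps=10)
print(X.shape, y.shape)         # (5, 10, 1) (5,)
print(X[0, :, 0], y[0])         # first window 0..9, target 10.0
```

Shuffling the values inside a window would change what it means, which is why such inputs are kept as ordered sequences.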

**<h2>Introduction to Recurrent Neural Networks (RNNs)</h2>**

RNNs are a class of neural networks designed for sequential data. Their recurrent connections feed the hidden state from one time step back into the next, allowing information to persist over time. This makes them suitable for language modeling, time series prediction, and sequence-to-sequence problems.
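The recurrence can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the common update `h_t = tanh(x_t W_x + h_{t-1} W_h + b)`; the function name `simple_rnn_forward` and the random weights are hypothetical, and it is not how Keras implements the layer internally.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    # Hypothetical helper. x: (timesteps, features) -> states: (timesteps, units)
    h = np.zeros(W_h.shape[0])            # initial hidden state
    states = []
    for x_t in x:                         # walk through the sequence in order
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
features, units, timesteps = 1, 4, 10
W_x = rng.normal(size=(features, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)
x = rng.normal(size=(timesteps, features))

states = simple_rnn_forward(x, W_x, W_h, b)
reversed_states = simple_rnn_forward(x[::-1], W_x, W_h, b)
print(states.shape)  # (10, 4)
# Feeding the same values in reverse order gives a different final state
print(np.allclose(states[-1], reversed_states[-1]))
```

Because each state depends on the previous one, the final hidden state summarizes the whole sequence and is sensitive to the order of its elements.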

**<h2>Key Components of RNNs</h2>**

Recurrent models in Keras are assembled from a few layer types, each serving a specific purpose in handling sequential data. Below, we describe these layers, with code examples for building, fitting, and predicting with each.

**<h2>1. Simple RNN Layer</h2>**

Purpose: To process sequential data by maintaining a hidden state that captures information from previous inputs.

**Example: Simple RNN**

In [10]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Generate dummy sequential data
import numpy as np
X_train = np.random.random((100, 10, 1))  # 100 samples, 10 timesteps, 1 feature
y_train = np.random.randint(2, size=(100, 1))

# Build a simple RNN model
model = models.Sequential([
    layers.SimpleRNN(32, activation='relu', input_shape=(10, 1)),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Predict
predictions = model.predict(X_train)
print(predictions)

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, 32)                1088      
                                                                 
 dense_7 (Dense)             (None, 1)                 33        
                                                                 
Total params: 1121 (4.38 KB)
Trainable params: 1121 (4.38 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [11]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Generate dummy sequential data
import numpy as np
X_train = np.random.random((100, 10, 1))  # 100 samples, 10 timesteps, 1 feature
y_train = np.random.randint(2, size=(100, 1))

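# Build a stacked RNN model
# return_sequences=True makes the first layer emit its hidden state at every
# time step, so the second SimpleRNN receives a 3-D (batch, timesteps, units) input
model = models.Sequential([
    layers.SimpleRNN(32, activation='relu', input_shape=(10, 1), return_sequences=True),
    layers.SimpleRNN(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])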

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Predict
predictions = model.predict(X_train)
print(predictions)

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_1 (SimpleRNN)    (None, 10, 32)            1088      
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 32)                2080      
                                                                 
 dense_8 (Dense)             (None, 1)                 33        
                                                                 
Total params: 3201 (12.50 KB)
Trainable params: 3201 (12.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


**<h2>2. Long Short-Term Memory (LSTM) Networks</h2>**

Purpose: To overcome the limitations of simple RNNs in capturing long-term dependencies by using a more complex architecture with gates to control the flow of information.
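The gating mechanism can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the helper name `lstm_forward`, the packing of the four gate blocks into single matrices, and their order (input, forget, candidate, output) are choices made for this sketch rather than a description of the Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    # Hypothetical helper. x: (timesteps, features), W: (features, 4*units),
    # U: (units, 4*units), b: (4*units,). Gate order i, f, g, o is assumed.
    units = U.shape[0]
    h = np.zeros(units)                   # hidden state
    c = np.zeros(units)                   # cell state
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c + i * g                 # keep part of the old cell, add new content
        h = o * np.tanh(c)                # expose a gated view of the cell
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(0)
features, units, timesteps = 1, 32, 10
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

outputs, h, c = lstm_forward(rng.random((timesteps, features)), W, U, b)
print(outputs.shape, h.shape, c.shape)  # (10, 32) (32,) (32,)
print(W.size + U.size + b.size)         # 4352
```

The additive cell update `c = f * c + i * g` is what lets information survive across many time steps, and the weight count of this sketch matches the 4352 parameters a 32-unit LSTM on a single input feature would have.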

**Example: LSTM**

In [12]:
# Generate dummy sequential data
X_train = np.random.random((100, 10, 1))
y_train = np.random.randint(2, size=(100, 1))

# Build an LSTM model
model = models.Sequential([
    layers.LSTM(32, activation='tanh', input_shape=(10, 1)),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()


# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Predict
predictions = model.predict(X_train)
print(predictions)


Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 32)                4352      
                                                                 
 dense_9 (Dense)             (None, 1)                 33        
                                                                 
Total params: 4385 (17.13 KB)
Trainable params: 4385 (17.13 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


**<h2>3. Gated Recurrent Unit (GRU)</h2>**

Purpose: To offer a simpler alternative to LSTMs while still addressing the vanishing gradient problem, providing a good balance between simplicity and performance.
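A GRU step can be sketched in NumPy in the same spirit. The sketch assumes the classic formulation with a single bias per gate and the convention `h = z * h_prev + (1 - z) * n`; other references swap the roles of `z` and `1 - z`. The helper name `gru_forward` is hypothetical. Note that the Keras `GRU` layer defaults to `reset_after=True`, which keeps a second set of biases, so its parameter count is slightly higher than this sketch's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_forward(x, W, U, b):
    # Hypothetical helper. x: (timesteps, features), W: (features, 3*units),
    # U: (units, 3*units), b: (3*units,). Gate order z, r, n is assumed.
    units = U.shape[0]
    U_z, U_r, U_n = np.split(U, 3, axis=1)
    h = np.zeros(units)
    for x_t in x:
        x_z, x_r, x_n = np.split(x_t @ W + b, 3)
        z = sigmoid(x_z + h @ U_z)            # update gate
        r = sigmoid(x_r + h @ U_r)            # reset gate
        n = np.tanh(x_n + (r * h) @ U_n)      # candidate state
        h = z * h + (1.0 - z) * n             # blend old state and candidate
    return h

rng = np.random.default_rng(0)
features, units, timesteps = 1, 32, 10
W = rng.normal(scale=0.1, size=(features, 3 * units))
U = rng.normal(scale=0.1, size=(units, 3 * units))
b = np.zeros(3 * units)

h = gru_forward(rng.random((timesteps, features)), W, U, b)
print(h.shape)                     # (32,)
print(W.size + U.size + b.size)    # 3264
```

Compared with the LSTM, there is no separate cell state and one fewer gate, which is where the reduction in parameters comes from.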

**Example: GRU**

In [13]:
# Generate dummy sequential data
X_train = np.random.random((100, 10, 1))
y_train = np.random.randint(2, size=(100, 1))

# Build a GRU model
model = models.Sequential([
    layers.GRU(32, activation='tanh', input_shape=(10, 1)),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()


# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Predict
predictions = model.predict(X_train)
print(predictions)

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru (GRU)                   (None, 32)                3360      
                                                                 
 dense_10 (Dense)            (None, 1)                 33        
                                                                 
Total params: 3393 (13.25 KB)
Trainable params: 3393 (13.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


**<h2>4. TimeDistributed Layer</h2>**

Purpose: To apply a layer (e.g., Dense) to every temporal slice of an input. This is useful when you need to apply the same operation across each time step independently, often used in combination with RNNs to handle sequential data where each time step is processed separately.
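What "applying a layer to every temporal slice" means can be checked with plain NumPy. The sketch below assumes a dense layer with ReLU activation and random weights (the name `dense_relu` is hypothetical). It applies the same weights to each time step in a loop and compares the result with a single matrix product over the last axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10, 8))          # 100 samples, 10 timesteps, 8 features
W = rng.normal(size=(8, 16))          # one weight matrix shared by all time steps
b = np.zeros(16)

def dense_relu(x):
    # Hypothetical stand-in for Dense(16, activation='relu')
    return np.maximum(x @ W + b, 0.0)

# Apply the layer to each time step separately, then restack along the time axis
per_step = np.stack([dense_relu(X[:, t, :]) for t in range(X.shape[1])], axis=1)
# The same computation as one matmul over the last axis
all_at_once = dense_relu(X)

print(per_step.shape)                       # (100, 10, 16)
print(np.allclose(per_step, all_at_once))   # True
print(W.size + b.size)                      # 144
```

The weights are shared across time steps, so the parameter count depends only on the feature sizes (8 × 16 + 16 = 144), not on the sequence length.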

**Example: TimeDistributed**

Consider a sequence classification task where each time step in the sequence is first processed independently by a dense layer, and the resulting sequence is then passed to an LSTM layer.

In [14]:
# Generate dummy sequential data
X_train = np.random.random((100, 10, 8))  # 100 samples, 10 timesteps, 8 features
y_train = np.random.randint(2, size=(100, 1))

# Build a model using TimeDistributed
model = models.Sequential([
    layers.TimeDistributed(layers.Dense(16, activation='relu'), input_shape=(10, 8)),
    layers.LSTM(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Predict
predictions = model.predict(X_train)
print(predictions)


Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed (TimeDist  (None, 10, 16)            144       
 ributed)                                                        
                                                                 
 lstm_1 (LSTM)               (None, 32)                6272      
                                                                 
 dense_12 (Dense)            (None, 1)                 33        
                                                                 
Total params: 6449 (25.19 KB)
Trainable params: 6449 (25.19 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


**<h2>Summary</h2>**

- Sequential Data: Data where the order of data points is important.
- Simple RNN: Basic recurrent network maintaining a hidden state over time.
- LSTM: Advanced RNN designed to capture long-term dependencies using gates.
- GRU: Simplified version of LSTM offering a good balance between performance and computational efficiency.
- TimeDistributed: Applies a layer to each temporal slice of input, useful for processing each time step independently in a sequence.

In [17]:
output = layers.LSTM(32, activation='tanh')(np.random.random((100, 10, 1)))
output.shape

TensorShape([100, 32])

In [18]:
output = layers.LSTM(32, activation='tanh', return_sequences=True)(np.random.random((100, 10, 1)))
output.shape

TensorShape([100, 10, 32])

In [22]:
output, h, c = layers.LSTM(32, activation='tanh', return_state=True)(np.random.random((100, 10, 1)))
print(f"Output: {output.shape}")
print(f"Hidden State: {h.shape}") # holds the same values as output when return_sequences=False
print(f"Cell State: {c.shape}")


Output: (100, 32)
Hidden State: (100, 32)
Cell State: (100, 32)


In [24]:
output, h, c = layers.LSTM(32, activation='tanh', return_sequences=True, return_state=True)(np.random.random((100, 10, 1)))
print(f"Output: {output.shape}")
print(f"Hidden State: {h.shape}") # h corresponds to the last element of output
print(f"Cell State: {c.shape}")

Output: (100, 10, 32)
Hidden State: (100, 32)
Cell State: (100, 32)


**<h2>5. Output Shapes and Parameters</h2>**

1. **Output Shapes**:
   - **Default**: Returns a single tensor of shape `(None, units)`, the hidden state at the last time step.
   - **With `return_sequences=True`**: Returns `(None, timesteps, units)`, the hidden state at every time step.
   - **With `return_state=True`**: Returns two tensors: the output `(None, units)` and the final hidden state `(None, units)`.
   - **With `return_sequences=True` and `return_state=True`**: Returns two tensors: the full sequence `(None, timesteps, units)` and the final hidden state `(None, units)`.

   **Note**: The two-tensor case applies to SimpleRNN and GRU. For LSTM, `return_state=True` returns three tensors: the output plus two states, `(None, units)` for `hidden_state` and `(None, units)` for `cell_state`.

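2. **Parameters**:
   - At each time step the layer acts on the concatenation of the input vector and the previous hidden state. In the formulas below, `x` is the input feature size (the last dimension of the input: `shape[-1]`) and `h` is the number of units (the hidden state size).
   - **RNN**: `(x + h) × h + h`
   - **LSTM**: `4 × [(x + h) × h + h]`
   - **GRU**: `3 × [(x + h) × h + 2h]` with the Keras default `reset_after=True`, which keeps separate input and recurrent biases (this gives the 3360 shown in the GRU summary for `x = 1`, `h = 32`); `3 × [(x + h) × h + h]` with `reset_after=False`.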
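
The parameter counts can be checked against the model summaries earlier in this notebook with a few lines of plain Python. The function names are hypothetical. The `reset_after` argument mirrors the Keras `GRU` option of the same name, whose default (`True`) keeps two bias vectors per gate, which is why the GRU summary above reports 3360 rather than 3264.

```python
def rnn_params(x, h):
    # Hypothetical helper: x = input feature size, h = number of units
    return (x + h) * h + h

def lstm_params(x, h):
    # Four weight blocks: input, forget, and output gates plus the candidate
    return 4 * ((x + h) * h + h)

def gru_params(x, h, reset_after=True):
    # Three weight blocks; reset_after=True adds a second bias vector per block
    biases = 2 * h if reset_after else h
    return 3 * ((x + h) * h + biases)

print(rnn_params(1, 32))                      # 1088  (simple_rnn)
print(rnn_params(32, 32))                     # 2080  (simple_rnn_2, fed by 32 units)
print(lstm_params(1, 32))                     # 4352  (lstm)
print(lstm_params(16, 32))                    # 6272  (lstm_1, fed by 16 features)
print(gru_params(1, 32))                      # 3360  (gru)
print(gru_params(1, 32, reset_after=False))   # 3264
```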
