# **CHAPTER 15**
# **Processing Sequences Using RNNs and CNNs**

**Introduction to Sequential Data and RNNs**

This chapter introduces Recurrent Neural Networks (RNNs) as neural network architectures specifically designed to process sequential data. Unlike feedforward neural networks, which assume fixed-size inputs, RNNs can handle sequences of arbitrary length such as time series, text, speech, and audio signals. Humans naturally predict future events by observing sequences, and RNNs attempt to replicate this ability computationally.
RNNs are widely used in applications such as time series forecasting, speech recognition, machine translation, and autonomous driving. The chapter outlines the two main difficulties of RNNs, unstable gradients and a very limited short-term memory, and introduces techniques and architectures that address them: LSTM and GRU cells, 1D convolutional layers, and WaveNet.


**Recurrent Neurons and Layers**

A recurrent neural network differs from a feedforward network because it has feedback connections. Each recurrent neuron receives not only the input at the current time step but also its own output from the previous time step. This feedback loop allows the network to maintain a form of memory.
Mathematically, the output of a recurrent layer at time step t is computed using both the current input and the previous output. The same weights are reused at every time step, which enables the network to generalize across sequences of varying lengths. This process can be visualized by “unrolling” the network across time steps, turning it into a deep network where each layer represents a different time step.
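
The computation described above can be sketched in plain NumPy. This is a minimal illustration, not the Keras implementation: the function name `simple_rnn_forward`, the weight names `Wx`, `Wy`, `b`, the tanh activation, and the zero initial output are all assumptions made for the example.

```python
import numpy as np

def simple_rnn_forward(X, Wx, Wy, b):
    """Run a basic recurrent layer over a batch of sequences.

    X has shape (batch, n_steps, n_inputs); the result has shape
    (batch, n_steps, n_units).
    """
    batch_size, n_steps, _ = X.shape
    n_units = Wy.shape[0]
    y_prev = np.zeros((batch_size, n_units))  # assumed initial output: zeros
    outputs = []
    for t in range(n_steps):
        # y(t) = tanh(x(t) Wx + y(t-1) Wy + b), same weights at every step
        y_prev = np.tanh(X[:, t] @ Wx + y_prev @ Wy + b)
        outputs.append(y_prev)
    return np.stack(outputs, axis=1)

rng = np.random.default_rng(42)
n_inputs, n_units = 3, 5
Wx = rng.normal(scale=0.5, size=(n_inputs, n_units))
Wy = rng.normal(scale=0.5, size=(n_units, n_units))
b = np.zeros(n_units)

# The same weights process sequences of different lengths.
short = simple_rnn_forward(rng.normal(size=(4, 10, n_inputs)), Wx, Wy, b)
long = simple_rnn_forward(rng.normal(size=(4, 25, n_inputs)), Wx, Wy, b)
print(short.shape, long.shape)  # (4, 10, 5) (4, 25, 5)
```

The loop body is the "unrolled" network: each iteration corresponds to one layer of the unrolled view, and all iterations share `Wx`, `Wy`, and `b`.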


**Memory Cells**

The concept of a memory cell refers to any neural component that preserves information across time steps. In basic RNNs, the hidden state serves both as memory and output. However, this memory is short-lived and typically only captures patterns across a small number of time steps.
Formally, a memory cell maintains a hidden state h(t), which depends on the current input x(t) and the previous state h(t-1). More advanced memory cells, such as LSTM and GRU, extend this idea by introducing mechanisms that selectively store, forget, and retrieve information.


**Input and Output Sequences**

RNNs can be configured to handle different types of sequence problems:
•	Sequence-to-sequence: input and output are both sequences (e.g., time series prediction).
•	Sequence-to-vector: input is a sequence, output is a single value (e.g., sentiment analysis).
•	Vector-to-sequence: input is a single vector, output is a sequence (e.g., image captioning).
•	Encoder–Decoder: combines sequence-to-vector and vector-to-sequence architectures, commonly used in machine translation.
Each architecture is suited to different real-world tasks depending on how input and output data are structured.


**Training RNNs (Backpropagation Through Time)**

RNNs are trained using Backpropagation Through Time (BPTT). The network is unrolled across time steps, and standard backpropagation is applied to the resulting deep network. Gradients flow backward through time, accumulating contributions from each time step.
Because the same parameters are reused at each time step, gradients from all steps are summed together. While this approach is conceptually simple, it often leads to unstable gradients, especially for long sequences.
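
To make the summation over time steps concrete, here is a small sketch of BPTT for a scalar RNN, checked against a finite-difference estimate. The one-unit model, the squared-error loss on the final state, and all names (`forward`, `loss`, `bptt_grad_w`) are assumptions chosen for illustration.

```python
import math

def forward(w, u, xs):
    """Scalar RNN: h(t) = tanh(w * h(t-1) + u * x(t)), starting from h = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs  # hs[t + 1] is the state after input xs[t]

def loss(w, u, xs, target):
    """Squared error on the final state only."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def bptt_grad_w(w, u, xs, target):
    """dLoss/dw, accumulated backward over every time step."""
    hs = forward(w, u, xs)
    grad_w = 0.0
    dh = hs[-1] - target  # dLoss/dh at the final step
    for t in reversed(range(len(xs))):
        dpre = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh
        grad_w += dpre * hs[t]              # this step's contribution (w is shared)
        dh = dpre * w                       # pass the gradient to the previous state
    return grad_w

xs = [0.5, -0.3, 0.8, 0.1]
w, u, target = 0.7, 1.2, 0.4
analytic = bptt_grad_w(w, u, xs, target)
eps = 1e-6
numeric = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
print(analytic, numeric)  # the two estimates should agree closely
```

At every backward step the gradient is multiplied by `w * (1 - h**2)`. Over many steps this repeated product can shrink toward zero or grow without bound, which is one way to see why long sequences produce unstable gradients.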


**Forecasting a Time Series**

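Time series forecasting is a common application of RNNs. In this chapter, a synthetic univariate dataset is generated: each series is the sum of two sine waves with random frequencies and phase offsets, plus a small amount of uniform noise. The first 50 time steps of each series are the input, and the value at the next time step is the target.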


In [4]:
import numpy as np

In [5]:
def generate_time_series(batch_size, n_steps):
    # One random frequency and phase offset per wave and per series: each has shape (batch_size, 1)
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))   # wave 1
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20))  # + wave 2
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)    # + noise
    # Add a trailing feature axis: shape (batch_size, n_steps, 1)
    return series[..., np.newaxis].astype(np.float32)


In [6]:
n_steps = 50

# Generate dataset
series = generate_time_series(10000, n_steps + 1)
X_train, y_train = series[:7000, :n_steps], series[:7000, -1]
X_valid, y_valid = series[7000:9000, :n_steps], series[7000:9000, -1]
X_test, y_test = series[9000:, :n_steps], series[9000:, -1]

print(X_train.shape, y_train.shape)  # Check the data shapes

(7000, 50, 1) (7000, 1)


**Baseline Models**

Before using RNNs, baseline models are evaluated:
•	Naive forecasting, which predicts the last observed value.
•	Linear regression using a Dense layer, which significantly improves performance.
These baselines are crucial for evaluating whether more complex models actually add value.
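
As an illustration of the first baseline, the sketch below computes the mean squared error of naive forecasting with NumPy. It repeats the chapter's `generate_time_series` so that it runs on its own; the seed and the batch of 2,000 series are arbitrary choices for the example, so the printed error will not match any particular run of the notebook.

```python
import numpy as np

def generate_time_series(batch_size, n_steps):
    # Same generator as in the chapter: two sine waves plus uniform noise.
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)
    return series[..., np.newaxis].astype(np.float32)

np.random.seed(42)  # arbitrary seed, for repeatability only
n_steps = 50
series = generate_time_series(2000, n_steps + 1)
X_valid, y_valid = series[:, :n_steps], series[:, -1]

# Naive forecast: predict that the next value equals the last observed one.
y_pred = X_valid[:, -1]
naive_mse = float(np.mean((y_valid - y_pred) ** 2))
print(y_pred.shape, naive_mse)
```

Any trained model should beat this number on the same validation data; if it does not, the extra complexity is not paying off.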


**Implementing a Simple RNN**

In [10]:
import tensorflow as tf
from tensorflow import keras

# Build a simple RNN model
model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])

model.summary()




**Deep RNNs**

Stacking multiple recurrent layers forms a Deep RNN, allowing the model to learn more complex temporal patterns.


In [11]:
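model = keras.models.Sequential([
    # A recurrent layer that feeds another recurrent layer must return its full output sequence
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(1)  # returns only the last output: one forecast per series
])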


In [12]:
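model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20),  # returns only the last time step's output
    keras.layers.Dense(1)        # Dense output layer in place of a single-unit SimpleRNN
])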


**Forecasting Multiple Time Steps Ahead**

Two strategies are discussed:
1.	Predicting one step at a time and feeding predictions back into the model.
2.	Predicting multiple future steps simultaneously.
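For the second strategy, a sequence-to-vector model outputs all the future values at the last time step only. A sequence-to-sequence model instead forecasts the next values at every time step, so the loss includes a term for each step. More error gradients flow through the model, which tends to both stabilize and speed up training.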
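
The first strategy above can be sketched as a loop that appends each prediction to the input window. Here `predict_next` is a hypothetical stand-in for a trained model's prediction call (it simply extrapolates linearly from the last two values), and `forecast_iteratively` is an illustrative helper, not a function from the chapter.

```python
import numpy as np

def predict_next(window):
    """Hypothetical stand-in for a trained one-step model.

    window has shape (batch, n_steps, 1); the result has shape (batch, 1).
    """
    return 2.0 * window[:, -1] - window[:, -2]  # linear extrapolation

def forecast_iteratively(X, n_ahead, predict_fn):
    """Predict one step, append it to the inputs, and repeat n_ahead times."""
    n_steps = X.shape[1]
    for _ in range(n_ahead):
        # Feed the most recent n_steps values, which may include predictions.
        y_next = predict_fn(X[:, -n_steps:])
        X = np.concatenate([X, y_next[:, np.newaxis, :]], axis=1)
    return X[:, n_steps:]  # keep only the n_ahead forecasts

# A ramp 0, 1, ..., 49 as a single input series of shape (1, 50, 1).
X_new = np.arange(50, dtype=np.float64).reshape(1, 50, 1)
forecast = forecast_iteratively(X_new, 10, predict_next)
print(forecast.shape)  # (1, 10, 1)
```

With a real model, each prediction carries some error, and because later predictions are computed from earlier ones, those errors may accumulate over the forecast horizon.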


In [13]:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
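
Because the model above emits 10 values at every time step, its targets need the shape (batch, n_steps, 10): at each input step, the target is the vector of the next 10 values. A minimal sketch of building such targets follows; the random stand-in data and the variable names are assumptions for the example.

```python
import numpy as np

n_steps, n_ahead = 50, 10
rng = np.random.default_rng(0)
# Stand-in data laid out like the chapter's series: (batch, time, 1)
series = rng.random((1000, n_steps + n_ahead, 1)).astype(np.float32)

X = series[:, :n_steps]
Y = np.empty((series.shape[0], n_steps, n_ahead), dtype=np.float32)
for step_ahead in range(1, n_ahead + 1):
    # Column step_ahead - 1 holds the value step_ahead steps after each input step.
    Y[:, :, step_ahead - 1] = series[:, step_ahead:step_ahead + n_steps, 0]

print(X.shape, Y.shape)  # (1000, 50, 1) (1000, 50, 10)
```

At the last input step, the target vector is exactly the 10 values that follow the input window, which is what the model is ultimately asked to forecast.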


**Handling Long Sequences**

Long sequences introduce two major problems:
•	Unstable gradients (exploding or vanishing).
•	Short-term memory limitations.
Techniques such as gradient clipping, layer normalization, and dropout help stabilize training.
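
Gradient clipping, the first technique listed above, can be illustrated in a few lines of NumPy. This sketch rescales a list of gradient arrays so that their combined norm does not exceed a threshold; the helper name `clip_by_global_norm` is an assumption for the example, and in Keras clipping is normally requested through optimizer arguments such as `clipvalue` or `clipnorm` rather than written by hand.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their joint L2 norm is <= max_norm."""
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # joint norm is 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
print(norm_before, norm_after)
```

Scaling every gradient by the same factor keeps the update direction unchanged and only limits its size, which is why norm-based clipping is often preferred over clipping each component independently.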


**Layer Normalization in RNNs**

In [14]:
class LNSimpleRNNCell(keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        self.simple_rnn_cell = keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = keras.layers.LayerNormalization()
        self.activation = keras.activations.get(activation)

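    def call(self, inputs, states):
        # Linear combination of the inputs and the previous state (no activation yet)
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        # Normalize across the features, then apply the activation
        norm_outputs = self.activation(self.layer_norm(outputs))
        # In a simple RNN cell the output is also the new hidden state
        return norm_outputs, [norm_outputs]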
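
The cell above normalizes the linear output before applying the activation. What the normalization step computes can be sketched in NumPy as follows. The function `layer_norm` and its `gamma`/`beta` parameters are illustrative names, and `eps=1e-3` is chosen on the assumption that it mirrors the Keras default.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-3):
    """Normalize each row of x across its last (feature) axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(1)
units = 20
pre_activations = rng.normal(loc=3.0, scale=5.0, size=(4, units))
gamma, beta = np.ones(units), np.zeros(units)  # untrained scale and offset

normalized = layer_norm(pre_activations, gamma, beta)
outputs = np.tanh(normalized)  # activation applied after normalization
print(normalized.mean(axis=-1).round(3), normalized.std(axis=-1).round(3))
```

Unlike batch normalization, the statistics are computed per instance across the feature axis, so the operation behaves the same way at every time step and does not depend on the batch.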


**LSTM Cells**

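Long Short-Term Memory (LSTM) cells mitigate the short-term memory problem by maintaining two separate state vectors: a long-term state c(t) and a short-term state h(t). Three gates control the flow of information: the forget gate decides what to erase from the long-term state, the input gate decides what to add to it, and the output gate decides which part of it to read and output at the current time step.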
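
A single LSTM step can be sketched in NumPy to show how the gates act on the two states. This is a simplified illustration under stated assumptions: the stacked weight layout and the gate order (input, forget, candidate, output) are choices made for the example and are not taken from any particular library.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (n_inputs, 4 * units), U: (units, 4 * units), b: (4 * units,).
    Returns the new short-term state h and long-term state c.
    """
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)  # assumed gate order
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)            # candidate values
    c = f * c_prev + i * g    # forget part of the old memory, add new memory
    h = o * np.tanh(c)        # read out a filtered view of the long-term state
    return h, c

rng = np.random.default_rng(0)
n_inputs, units = 3, 4
W = rng.normal(scale=0.5, size=(n_inputs, 4 * units))
U = rng.normal(scale=0.5, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros((2, units))
c = np.zeros((2, units))
for x in rng.normal(size=(6, 2, n_inputs)):  # 6 time steps, batch of 2
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # (2, 4) (2, 4)
```

When the forget gate is close to 1 and the input gate close to 0, the long-term state passes through a step almost unchanged, which is what lets information survive across many time steps.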


In [15]:
model = keras.models.Sequential([
    keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])


**GRU Cells**

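Gated Recurrent Unit (GRU) cells simplify the LSTM architecture: the two state vectors are merged into a single vector h(t), a single gate controls both forgetting and storing, and there is no separate output gate. Despite having fewer parameters, GRUs often perform comparably to LSTMs.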
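
One GRU step can be sketched in NumPy in the same spirit. The parameter dictionary and its key names are assumptions for the example, and the equations follow one common textbook formulation; implementations differ in details such as where the reset gate is applied.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU step with a single state vector h."""
    z = sigmoid(x @ p["Wxz"] + h_prev @ p["Whz"] + p["bz"])        # update gate
    r = sigmoid(x @ p["Wxr"] + h_prev @ p["Whr"] + p["br"])        # reset gate
    g = np.tanh(x @ p["Wxg"] + (r * h_prev) @ p["Whg"] + p["bg"])  # candidate state
    # One gate does both jobs: where z keeps the old state, 1 - z blocks the new one.
    return z * h_prev + (1.0 - z) * g

rng = np.random.default_rng(0)
n_inputs, units = 3, 4
p = {}
for gate in "zrg":
    p["Wx" + gate] = rng.normal(scale=0.5, size=(n_inputs, units))
    p["Wh" + gate] = rng.normal(scale=0.5, size=(units, units))
    p["b" + gate] = np.zeros(units)

h = np.zeros((2, units))
for x in rng.normal(size=(5, 2, n_inputs)):  # 5 time steps, batch of 2
    h = gru_step(x, h, p)
print(h.shape)  # (2, 4)
```

The last line of `gru_step` shows the merged gate: the new state is a per-unit blend of the previous state and the candidate, so keeping old information and writing new information are coupled.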


In [16]:
model = keras.models.Sequential([
    keras.layers.GRU(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])


**Using 1D CNNs for Sequences**

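1D convolutional layers can extract local temporal patterns and, when used with a stride greater than 1, shorten the sequence (the layer below downsamples it by a factor of 2), making it easier for the recurrent layers that follow to learn long-term dependencies.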


In [17]:
model = keras.models.Sequential([
    keras.layers.Conv1D(filters=20, kernel_size=4, strides=2, padding="valid",
                        input_shape=[None, 1]),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
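
With `padding="valid"`, a kernel size of 4, and a stride of 2, the convolutional layer above shortens the sequence, so sequence-to-sequence targets have to be cropped and downsampled to match. The sketch below works through the arithmetic; the helper `conv1d_valid_length` and the zero-filled stand-in targets are assumptions for the example.

```python
import numpy as np

def conv1d_valid_length(n_steps, kernel_size, strides):
    """Output length of a 1D convolution with 'valid' padding."""
    return (n_steps - kernel_size) // strides + 1

n_steps, kernel_size, strides = 50, 4, 2
out_len = conv1d_valid_length(n_steps, kernel_size, strides)

# Stand-in targets shaped like sequence-to-sequence targets: (batch, n_steps, 10)
Y = np.zeros((32, n_steps, 10), dtype=np.float32)
# The first output sees input steps 0 to 3, so it aligns with the target at
# step 3; each following output moves forward by the stride.
Y_cropped = Y[:, kernel_size - 1::strides]
print(out_len, Y_cropped.shape)  # 24 (32, 24, 10)
```

If the targets are not cropped this way, the model's output length (24) and the target length (50) disagree and training fails with a shape error.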




**WaveNet Architecture**

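WaveNet stacks 1D convolutional layers with causal padding and doubles the dilation rate at each layer (1, 2, 4, 8, ...), so the receptive field grows exponentially with depth. This lets it model very long sequences efficiently without any recurrent layers.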


In [18]:
model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
for rate in (1, 2, 4, 8) * 2:
    model.add(keras.layers.Conv1D(filters=20, kernel_size=2, padding="causal",
                                  activation="relu", dilation_rate=rate))
model.add(keras.layers.Conv1D(filters=10, kernel_size=1))
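
How far back the stack above can see follows from the dilation rates. The sketch below computes the receptive field and checks it with a toy kernel-size-2 causal convolution in NumPy. Both helpers (`receptive_field`, `causal_dilated_conv`) and the all-ones weights are assumptions for illustration, with no activation or multiple filters.

```python
import numpy as np

def receptive_field(kernel_size, dilation_rates):
    """Number of input steps that can influence one output step."""
    return 1 + sum((kernel_size - 1) * rate for rate in dilation_rates)

def causal_dilated_conv(x, w, rate):
    """Kernel-size-2 causal convolution: y[t] = w[0] * x[t - rate] + w[1] * x[t]."""
    padded = np.concatenate([np.zeros(rate), x])  # pad on the left only
    return w[0] * padded[:-rate] + w[1] * x

rates = (1, 2, 4, 8) * 2
print(receptive_field(2, rates))  # 31

# Push an impulse at t = 0 through the stack and see how far it spreads.
signal = np.zeros(64)
signal[0] = 1.0
for rate in rates:
    signal = causal_dilated_conv(signal, np.array([1.0, 1.0]), rate)
print(np.count_nonzero(signal))
```

An input at step 0 reaches outputs at steps 0 through 30 and nothing later, matching the computed receptive field. Causal padding guarantees the reverse as well: no output depends on inputs from the future.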




**Conclusion**

Chapter 15 demonstrates how RNNs and CNNs can be used to process sequential data. Simple RNNs work for short sequences, but handling long-term dependencies usually calls for LSTM or GRU cells, or for convolutional architectures such as WaveNet. Recurrent and convolutional layers can also be combined, as in the Conv1D and GRU model, which often works well in practice.
