# Week 5 - Residual Connections and Layer Normalization

### 1. Introduction & Objectives

This notebook extends a neural network with two techniques that are standard in modern deep learning architectures such as the Transformer: residual connections and layer normalization. Residual connections give gradients a direct path around a layer, and layer normalization keeps activations at a consistent scale; both make training more stable. The primary objectives of this notebook are:

1. Understand the concept and implementation of residual connections in neural networks.
2. Learn how to effectively integrate layer normalization into a neural network model.
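
For reference, both ideas can be written compactly. In the post-norm arrangement used in this notebook (normalization applied after the residual sum), a sublayer $F$ with input $x$ produces the output $y$ below. Here $d$ is the number of features at each position, $\mu$ and $\sigma^2$ are the mean and variance over those features, $\epsilon$ is a small constant, and $\gamma$, $\beta$ are learned per-feature scale and shift vectors:

$$
\begin{aligned}
y &= \mathrm{LayerNorm}\big(x + F(x)\big) \\
\mathrm{LayerNorm}(h)_i &= \gamma_i \,\frac{h_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i,
\quad \mu = \frac{1}{d}\sum_{j=1}^{d} h_j,
\quad \sigma^2 = \frac{1}{d}\sum_{j=1}^{d} \left(h_j - \mu\right)^2
\end{aligned}
$$

In the model below, $F$ is the attention layer and $d = 128$ (the embedding size); Keras's `LayerNormalization` defaults to $\epsilon = 10^{-3}$.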


### 2. Data Understanding

In this notebook, we will use the [IMDB movie reviews dataset](https://keras.io/api/datasets/imdb/), a widely recognized benchmark for binary sentiment classification. The dataset consists of 50,000 reviews, evenly split into 25,000 for training and 25,000 for testing. Each review is labeled as either positive (1) or negative (0), making it ideal for binary classification tasks.

The reviews have been preprocessed and are represented as lists of word indexes (integers). Words are indexed by their overall frequency in the dataset, so a lower index means a more frequent word. By default, `imdb.load_data` reserves index 0 for padding, 1 for the start of a review, and 2 for out-of-vocabulary words, and shifts every word's frequency rank up by `index_from=3`. This indexing makes frequency-based filtering cheap: `num_words` limits the vocabulary to the most frequent words (10,000 in this notebook), and `skip_top` can additionally exclude overly common words, such as the top 20.
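
As a rough illustration of how these indexes are produced, the sketch below mimics what `imdb.load_data` does with its default arguments (`start_char=1`, `oov_char=2`, `index_from=3`): each frequency rank is shifted by 3, a start marker is prepended, and any index outside `[skip_top, num_words)` becomes the out-of-vocabulary marker. The functions `encode_review` and `decode_review` and the toy vocabulary are hypothetical; this is a simplified re-implementation for explanation, not the library code.

```python
def encode_review(ranks, num_words, skip_top=0,
                  start_char=1, oov_char=2, index_from=3):
    """Hypothetical re-implementation of load_data's index handling.

    `ranks` are 1-based frequency ranks (1 = most frequent word).
    """
    # Prepend the start marker and shift every rank past the reserved indexes.
    shifted = [start_char] + [r + index_from for r in ranks]
    # Replace indexes outside [skip_top, num_words) with the OOV marker.
    return [w if skip_top <= w < num_words else oov_char for w in shifted]


def decode_review(indexes, word_index, index_from=3):
    """Map stored indexes back to words; reserved or unknown ones become '?'."""
    # word_index maps word -> rank, like the dict returned by imdb.get_word_index().
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i, "?") for i in indexes)


toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}  # hypothetical vocabulary
encoded = encode_review([1, 2, 3, 4], num_words=10000)
print(encoded)                                  # [1, 4, 5, 6, 7]
print(decode_review(encoded, toy_word_index))   # ? the movie was great
```

Because of the shift, `num_words=10000` keeps stored indexes below 10,000, which corresponds to frequency ranks up to 9,996.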

#### 2.1 Importing Libraries and Loading the Data

We begin by importing the necessary libraries and loading the IMDB movie reviews dataset using the Keras API. The dataset is preprocessed and tokenized, allowing us to focus on model development and training.

```python
# Importing libraries from the public Keras API (keras.src.* is internal and may change between releases)
from keras.datasets import imdb
from keras.layers import (Input, Embedding, Attention, Add, LayerNormalization,
                          Dense, Dropout, GlobalAveragePooling1D)
from keras.models import Model
from keras.utils import pad_sequences
```



Next, we load the IMDB movie reviews dataset with the vocabulary capped by `num_words=10000`, then pad or truncate every review to 200 tokens so that all inputs share the same shape.

```python
max_features = 10000  # Number of most frequent words to consider
maxlen = 200  # Maximum length of sequences

# Load the IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to a fixed length
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
```



The data is successfully loaded and preprocessed, ready for use in training and evaluation. We can now proceed to the next steps of building and enhancing our neural network model.
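
`pad_sequences` defaults to `padding='pre'` and `truncating='pre'`, so shorter reviews get zeros at the front and longer reviews keep only their last `maxlen` tokens. The NumPy function `pad_pre` below is a hypothetical stand-in that reproduces those defaults for illustration; it is not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper: left-pad and left-truncate sequences to `maxlen`."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        tail = list(seq)[-maxlen:]        # 'pre' truncation: keep the last maxlen tokens
        if tail:
            out[row, -len(tail):] = tail  # 'pre' padding: zeros stay at the front
    return out

padded = pad_pre([[1, 4, 5], [1, 7, 8, 9, 10, 11]], maxlen=4)
print(padded.tolist())  # [[0, 1, 4, 5], [8, 9, 10, 11]]
```

Zero is a safe padding value here because `load_data` reserves index 0 and never assigns it to a word.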

### 3. Building the Model

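In this section, we construct the sentiment classifier. Each word index is embedded as a 128-dimensional vector, the embeddings pass through a dot-product self-attention layer, and the attention output is added back to its input (the residual connection) before layer normalization is applied to the sum. The normalized sequence is then averaged over its 200 positions and passed through two dense layers that output the probability that a review is positive.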
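
Called as `Attention()([embedding_layer, embedding_layer])`, Keras's dot-product `Attention` layer uses the embeddings as query, key (the key defaults to the value), and value, and with its default `use_scale=False` it has no trainable weights. A minimal NumPy sketch of that forward pass, followed by the residual addition that the next cell builds, might look like this. The function name `dot_product_self_attention` and the random inputs are illustrative assumptions, and masking and dropout are ignored.

```python
import numpy as np

def dot_product_self_attention(x):
    """Unscaled dot-product attention with query = key = value = x.

    x has shape (batch, seq_len, dim).
    """
    scores = x @ np.swapaxes(x, -1, -2)             # (batch, seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row of weights sums to 1
    return weights @ x                              # (batch, seq_len, dim)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2, 5, 8))   # toy batch: 2 reviews, 5 tokens, 8 features

attended = dot_product_self_attention(embeddings)
residual = embeddings + attended          # what the Add() layer computes
print(residual.shape)                     # (2, 5, 8)
```

The `Add()` layer only works because attention returns a tensor with the same shape as its input; the identity path also means the block still passes the original embeddings forward when the attention output contributes little.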

```python
# Build the model with residual connections and layer normalization
inputs = Input(shape=(maxlen,))
embedding_layer = Embedding(max_features, 128)(inputs)

# Dot-product self-attention (query = value = embeddings)
attention_output = Attention()([embedding_layer, embedding_layer])

# Residual connection around attention layer
residual_attention_output = Add()([embedding_layer, attention_output])

# Layer normalization after attention layer
normalized_output = LayerNormalization()(residual_attention_output)

# Average over the sequence, then classify with dense layers
x = GlobalAveragePooling1D()(normalized_output)
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(1, activation='sigmoid')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

# Summary of the model
model.summary()
```


The model is now defined: a residual connection wraps the attention layer, and layer normalization is applied to their sum. The next step is to compile and train it on the IMDB movie reviews dataset.
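
The `model.summary()` table is not reproduced in the output above, but the expected parameter count can be worked out by hand from the layer sizes. This sketch assumes Keras defaults: `Attention` with `use_scale=False` (no weights) and `LayerNormalization` with one scale vector and one offset vector over the 128 features.

```python
max_features, embed_dim, hidden = 10000, 128, 64

param_counts = {
    "Embedding": max_features * embed_dim,       # one 128-d vector per word index
    "Attention": 0,                              # dot-product scores, no weights when use_scale=False
    "Add": 0,
    "LayerNormalization": 2 * embed_dim,         # gamma and beta
    "GlobalAveragePooling1D": 0,
    "Dense(64)": embed_dim * hidden + hidden,    # kernel + bias
    "Dropout": 0,
    "Dense(1)": hidden * 1 + 1,                  # kernel + bias
}
total = sum(param_counts.values())
print(f"{total:,}")  # 1,288,577
```

Nearly all parameters sit in the embedding table; the residual connection adds none and layer normalization adds only 256.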

### 4. Compiling and Fitting the Model

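In this section, we compile and train the model, using binary cross-entropy as the loss function and the Adam optimizer. With `validation_split=0.2`, Keras holds out the last 20% of the 25,000 training reviews (5,000) for validation and trains on the remaining 20,000, so a batch size of 32 gives 625 steps per epoch.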
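
For a label $y \in \{0, 1\}$ and a predicted probability $p$ from the sigmoid output, binary cross-entropy is $-[y \log p + (1 - y)\log(1 - p)]$, averaged over the batch. The NumPy function below is a sketch of that formula; clipping with `eps=1e-7` is an assumption made to keep `log` finite, similar in spirit to the small-epsilon clipping Keras applies internally.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps clipping keeps log() finite."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# An uninformed model that predicts 0.5 everywhere scores ln 2, about 0.693.
print(round(binary_crossentropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]), 4))  # 0.6931
# Confident, correct predictions drive the loss toward 0.
print(round(binary_crossentropy([1, 0], [0.99, 0.01]), 4))  # 0.0101
```

This is why a freshly initialized model's loss starts close to 0.693 and falls as its predictions move away from 0.5 in the right direction.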

```python
# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)
```

```text
Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.6941 - loss: 0.5533 - val_accuracy: 0.8662 - val_loss: 0.3254
Epoch 2/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8883 - loss: 0.2755 - val_accuracy: 0.8684 - val_loss: 0.3101
Epoch 3/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9245 - loss: 0.1863 - val_accuracy: 0.8636 - val_loss: 0.3541
Epoch 4/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9538 - loss: 0.1277 - val_accuracy: 0.8586 - val_loss: 0.3809
Epoch 5/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9650 - loss: 0.0987 - val_accuracy: 0.8716 - val_loss: 0.4270
```


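The model is now compiled and trained, and the per-epoch metrics are stored in `history` for later analysis. The log also shows signs of overfitting: training loss keeps falling (from 0.5533 to 0.0987), while validation loss bottoms out at epoch 2 and rises through epoch 5, with validation accuracy staying between about 0.86 and 0.87. The next step is to evaluate the model on the test set to assess its effectiveness in sentiment classification.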
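
`model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch metric lists, keyed by names such as `loss`, `accuracy`, `val_loss`, and `val_accuracy`. The sketch below copies the validation values from the training log above into a plain dict, standing in for `history.history`, and finds the epoch with the lowest validation loss.

```python
# Validation metrics copied from the training log above; in the notebook,
# history.history would provide the same lists directly.
logged = {
    "val_loss": [0.3254, 0.3101, 0.3541, 0.3809, 0.4270],
    "val_accuracy": [0.8662, 0.8684, 0.8636, 0.8586, 0.8716],
}

val_loss = logged["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1  # 1-based epoch number
print(best_epoch, val_loss[best_epoch - 1])  # 2 0.3101
```

Keras's `EarlyStopping` callback (for example with `monitor='val_loss'` and `restore_best_weights=True`) can automate this choice during training; this notebook does not use it.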

### 5. Evaluating the Model

Finally, we will evaluate the performance of the neural network model on the test set of the IMDB movie reviews dataset. We will calculate the accuracy and loss metrics to assess the model's effectiveness in sentiment classification.
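
With a sigmoid output and `binary_crossentropy`, Keras resolves the `'accuracy'` metric to binary accuracy: a prediction counts as positive when its probability exceeds 0.5. `evaluate` also uses a default batch size of 32, so the 25,000 test reviews are processed in ceil(25,000 / 32) = 782 batches. A small sketch of both, using made-up labels and probabilities:

```python
import math

import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions whose thresholded class matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Hypothetical labels and sigmoid outputs for four reviews.
print(binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5

# Number of evaluation batches at the default batch size of 32.
print(math.ceil(25_000 / 32))  # 782
```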

```python
# Evaluate the model on the test set
loss, accuracy = model.evaluate(x_test, y_test)

print(f'Test Loss: {loss:.4f}, Test Accuracy: {accuracy * 100:.2f}%')
```

```text
782/782 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8614 - loss: 0.4606
Test Loss: 0.4673, Test Accuracy: 85.75%
```


The model reaches a test accuracy of **85.75%** and a test loss of **0.4673**, indicating that it performs well on the sentiment classification task. Because the notebook does not train a comparison model without the residual connection and layer normalization, this result shows that the architecture works but does not by itself measure how much those two components contribute.

### 6. Conclusion

In this notebook, we built a sentiment classifier for the IMDB movie reviews dataset in which a dot-product attention layer is wrapped in a residual connection and followed by layer normalization, reaching a test accuracy of **85.75%**. Residual connections and layer normalization are standard components of modern deep learning architectures such as the Transformer, where they help keep the training of deep networks stable.