In [1]:
from helping_file import *

# Residual Networks
Objectives: 

- Implement the basic building blocks of ResNets in a deep neural network using Keras
- Put together these building blocks to implement and train a state-of-the-art neural network for image classification
- Implement a skip connection in the network

## 1 - The Problem of Very Deep Neural Networks
### 🧠 Why Very Deep Networks Are Powerful — and Challenging

In recent years, the field has moved from relatively shallow architectures like *AlexNet* to networks with **hundreds of layers**. The motivation is clear:

> **Deeper networks** allow us to model **more complex functions** and learn **hierarchical representations**, from low-level features like edges to high-level abstractions.

However, as we deepen networks, we face a major challenge:

> ⚠️ **Vanishing gradients** during training make it difficult to optimize the earlier layers.

During backpropagation, the gradient is multiplied by a weight matrix (and an activation derivative) at every layer it passes through. Over many layers, this can cause:
- Gradients to **shrink exponentially** (*vanishing gradients*)
- Or, less often, to **explode** (*unstable training*)

The result?

> 🧩 Gradients in early layers become **too small to learn effectively**, and training slows or stalls.
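
To get a feel for the scale of the effect, here is a toy calculation in plain Python. It assumes, purely for illustration, that every layer scales the gradient magnitude by the same constant factor; in a real network the per-layer factor varies. The function name `gradient_scale` is made up for this sketch.

```python
# Toy model of backpropagation through a deep stack: assume each layer
# multiplies the gradient magnitude by the same constant factor.
def gradient_scale(per_layer_factor, num_layers):
    """Gradient magnitude at the first layer, relative to the last layer."""
    return per_layer_factor ** num_layers

for factor in (0.5, 0.9, 1.1):
    for depth in (10, 50):
        scale = gradient_scale(factor, depth)
        print(f"factor={factor}, depth={depth}: {scale:.3e}")
```

With a factor of 0.5, fifty layers shrink the gradient to 2^-50, which is below 1e-15. With a factor of 1.1, the same depth amplifies it more than a hundredfold.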

This motivates **ResNets**, whose **skip connections** help preserve gradient flow and make it possible to train very deep networks.


## 2 - Building a Residual Network
In ResNets, a "shortcut" or a "skip connection" allows the model to skip layers:
<img src="images/skip_connection_kiank.png" style="width:650px;height:200px;">
<caption><center> <u><font color='purple'><b>Figure 2</b></font></u><font color='purple'>: A ResNet block showing a skip connection</font> <br> </center></caption>
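
A minimal scalar sketch of why the shortcut helps: if a block computes y = F(x) + x, then dy/dx = F'(x) + 1, so the gradient keeps a term of 1 even when F'(x) is tiny. The function `residual_branch` and its small weight are invented for illustration, and the final ReLU of a real block is left out to keep the arithmetic visible.

```python
def residual_branch(x, w=1e-3):
    """Hypothetical main path F(x): a weak linear map followed by ReLU."""
    return max(0.0, w * x)

def plain_block(x):
    # No shortcut: the output is just F(x).
    return residual_branch(x)

def skip_block(x):
    # The shortcut adds the input back onto the main-path output.
    return residual_branch(x) + x

def numerical_grad(fn, x, eps=1e-6):
    """Central finite-difference approximation of d fn / d x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

x = 2.0
print("plain block gradient:", numerical_grad(plain_block, x))
print("skip block gradient: ", numerical_grad(skip_block, x))
```

The plain block passes back a gradient of about 0.001, while the block with the shortcut passes back about 1.001.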

### 2.1 - The Identity Block
<img src="images/idblock2_kiank.png" style="width:650px;height:150px;">
<caption><center> <u><font color='purple'><b>Figure 3</b></font></u><font color='purple'>: <b>Identity block.</b> Skip connection "skips over" 2 layers.</font> </center></caption>

In [2]:

# Dummy input data
np.random.seed(1)
tf.random.set_seed(2)

X1 = np.ones((1, 4, 4, 3)) * -1
X2 = np.ones((1, 4, 4, 3)) * 1
X3 = np.ones((1, 4, 4, 3)) * 3
X = np.concatenate((X1, X2, X3), axis=0).astype(np.float32)

# With training=False
print('\033[1mWith training=False\033[0m\n')
A3 = identity_block(X, f=2, filters=[4, 4, 3],
                    initializer=lambda seed=0: Constant(1.0),
                    training=False)
A3np = A3.numpy()
print(np.around(A3np[:, (0, -1), :, :].mean(axis=3), 5))
print(A3np[1, -1, 0, 0])

# With training=True
print('\n\033[1mWith training=True\033[0m\n')
A4 = identity_block(X, f=2, filters=[3, 3, 3],
                    initializer=lambda seed=0: Constant(1.0),
                    training=True)
A4np = A4.numpy()
print(np.around(A4np[:, (0, -1), :, :].mean(axis=3), 5))
print(A4np[1, -1, 0, 0])


With training=False

[[[  0.        0.        0.        0.     ]
  [  0.        0.        0.        0.     ]]

 [[192.71233 192.71233 192.71233  96.85616]
  [ 96.85616  96.85616  96.85616  48.92808]]

 [[578.13684 578.13684 578.13684 290.56848]
  [290.56848 290.56848 290.56848 146.78424]]]
96.85616

With training=True

[[[0.      0.      0.      0.     ]
  [0.      0.      0.      0.     ]]

 [[0.40739 0.40739 0.40739 0.40739]
  [0.40739 0.40739 0.40739 0.40739]]

 [[4.99991 4.99991 4.99991 3.25948]
  [3.25948 3.25948 3.25948 2.40739]]]
0.40739083


### 2.2 - The Convolutional Block
<img src="images/convblock_kiank.png" style="width:650px;height:150px;">
<caption><center> <u><font color='purple'><b>Figure 4</b></font></u><font color='purple'>: <b>Convolutional block</b></font> </center></caption>

### 🔄 ResNet Convolutional Block (When Dimensions Change)

In our deep network architectures, we use **convolutional blocks** when the **input and output dimensions differ**. These blocks allow us to **maintain residual connections** even when spatial or depth dimensions change.

In these blocks, **we include a CONV2D layer in the shortcut path**. This allows us to reshape the input to match the dimensions of the output from the main path. 

🧩 This is crucial for enabling the addition operation at the end of the block.

#### 🚧 Why we need a CONV layer in the shortcut
- Without matching dimensions, the shortcut \( x \) and the main path output couldn't be added.
- So we use a **1x1 convolution with stride \( s \)** on the shortcut to reduce or match dimensions.
- **No activation** is applied on the shortcut path: the convolution acts only as a learned **linear projection** of the input.
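
The shape argument can be checked with the standard output-size formula for a convolution with `"valid"` padding, floor((n - k) / s) + 1. The helper name `conv_output_size` and the 15x15 input size are illustrative choices, not values taken from the notebook's helper file.

```python
def conv_output_size(n, kernel, stride):
    """Spatial output size of a convolution with 'valid' padding."""
    return (n - kernel) // stride + 1

n, s = 15, 2

# The main path downsamples with a stride-s 1x1 convolution.
main_path = conv_output_size(n, kernel=1, stride=s)

# A 1x1 convolution with the same stride on the shortcut gives the same size.
shortcut_conv = conv_output_size(n, kernel=1, stride=s)

# An untouched (identity) shortcut would keep the input size.
identity_shortcut = n

print(main_path, shortcut_conv, identity_shortcut)
```

With `s = 2`, both convolved paths come out at 8x8, while an untouched shortcut would still be 15x15 and could not be added to the main path.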

---

### 🛠 Main Path Details

1. **First Component:**
   - CONV2D with \( F_1 \) filters, **(1,1)** kernel, stride \( (s,s) \), padding = `"valid"`
   - `BatchNorm` along channels
   - `ReLU` activation

2. **Second Component:**
   - CONV2D with \( F_2 \) filters, **(f,f)** kernel, stride \( (1,1) \), padding = `"same"`
   - `BatchNorm` along channels
   - `ReLU` activation

3. **Third Component:**
   - CONV2D with \( F_3 \) filters, **(1,1)** kernel, stride \( (1,1) \), padding = `"valid"`
   - `BatchNorm` (no `ReLU` here!)

---

### ✂️ Shortcut Path

- CONV2D with \( F_3 \) filters, **(1,1)** kernel, stride \( (s,s) \), padding = `"valid"`
- `BatchNorm` (no activation)

---

### ➕ Final Step

- We **add** the main path and shortcut outputs.
- Then apply **ReLU** to the result.

> This structure lets us build **deeper** networks while keeping training efficient and **mitigating vanishing gradients**.
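
The components listed above can be sketched with the Keras functional style as follows. This is not the notebook's own helper (that lives in `helping_file` and is not shown); the name `convolutional_block_sketch`, the default initializers, and the demo input shape are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def convolutional_block_sketch(X, f, filters, s=2, training=False):
    """Sketch of the convolutional block described above (channels-last)."""
    F1, F2, F3 = filters
    X_shortcut = X

    # Main path, first component: 1x1 conv with stride s, BatchNorm, ReLU
    X = layers.Conv2D(F1, kernel_size=1, strides=(s, s), padding="valid")(X)
    X = layers.BatchNormalization(axis=3)(X, training=training)
    X = layers.Activation("relu")(X)

    # Second component: fxf conv with stride 1 and "same" padding, BatchNorm, ReLU
    X = layers.Conv2D(F2, kernel_size=f, strides=(1, 1), padding="same")(X)
    X = layers.BatchNormalization(axis=3)(X, training=training)
    X = layers.Activation("relu")(X)

    # Third component: 1x1 conv with stride 1, BatchNorm, no ReLU
    X = layers.Conv2D(F3, kernel_size=1, strides=(1, 1), padding="valid")(X)
    X = layers.BatchNormalization(axis=3)(X, training=training)

    # Shortcut path: 1x1 conv with stride s and F3 filters, BatchNorm, no activation
    X_shortcut = layers.Conv2D(F3, kernel_size=1, strides=(s, s), padding="valid")(X_shortcut)
    X_shortcut = layers.BatchNormalization(axis=3)(X_shortcut, training=training)

    # Final step: add both paths, then apply ReLU
    X = layers.Add()([X, X_shortcut])
    X = layers.Activation("relu")(X)
    return X

# Demo on random data: a batch of 2 inputs of size 15x15 with 3 channels.
x = tf.constant(np.random.rand(2, 15, 15, 3).astype(np.float32))
y = convolutional_block_sketch(x, f=3, filters=[4, 4, 8], s=2)
print(y.shape)
```

With `s=2` and `F3 = 8`, the 15x15x3 input becomes 8x8x8 on both paths, so the addition is well defined.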


## 3 - Building Our First ResNet-50 Model

We now have all the necessary building blocks to construct a very deep **Residual Network**: **ResNet-50**. Below is a step-by-step breakdown of the architecture.
<img src="images/resnet_kiank.png" style="width:850px;height:150px;">
<caption><center> <u><font color='purple'><b>Figure 5</b></font></u><font color='purple'>: <b>ResNet-50 model</b></font> </center></caption>


📌 In the diagram:
- “ID BLOCK” = Identity block
- “ID BLOCK x3” = Stack 3 identity blocks sequentially

---

### 📐 ResNet-50 Architecture Details

#### 🔹 Input
- We start with **zero-padding** of the input with `(3, 3)` on height and width.

---

#### 🔸 Stage 1
- **Conv2D**: 64 filters of size `(7,7)`, stride = `(2,2)`
- **BatchNorm**: applied to the channels axis
- **MaxPooling**: size `(3,3)`, stride = `(2,2)`

---

#### 🔸 Stage 2
- **Convolutional block**: filters = `[64, 64, 256]`, kernel size `f = 3`, stride `s = 1`
- **2 Identity blocks**: filters = `[64, 64, 256]`, `f = 3`

---

#### 🔸 Stage 3
- **Convolutional block**: filters = `[128, 128, 512]`, `f = 3`, `s = 2`
- **3 Identity blocks**: filters = `[128, 128, 512]`, `f = 3`

---

#### 🔸 Stage 4
- **Convolutional block**: filters = `[256, 256, 1024]`, `f = 3`, `s = 2`
- **5 Identity blocks**: filters = `[256, 256, 1024]`, `f = 3`

---

#### 🔸 Stage 5
- **Convolutional block**: filters = `[512, 512, 2048]`, `f = 3`, `s = 2`
- **2 Identity blocks**: filters = `[512, 512, 2048]`, `f = 3`

---

#### 🔹 Final Layers
- **AveragePooling2D**: pool size = `(2,2)`
- **Flatten**
- **Dense (Fully Connected)**: output = `number of classes`, activation = `softmax`
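
To see how the spatial size shrinks through the stages listed above, the following sketch traces the height (equal to the width) of a square input. It assumes `"valid"` padding for the Stage 1 convolution and max-pooling, and an average-pooling stride equal to the pool size; these are the Keras defaults, but the text above does not state them. The helper names are illustrative.

```python
def valid_output_size(n, kernel, stride):
    """Output size of a convolution or pooling layer with 'valid' padding."""
    return (n - kernel) // stride + 1

def resnet50_spatial_trace(size):
    """Return (step, size) pairs for a square input of the given size."""
    trace = [("input", size)]
    size += 2 * 3                           # zero-padding of 3 on each side
    trace.append(("zero-pad", size))
    size = valid_output_size(size, 7, 2)    # Stage 1: 7x7 conv, stride 2
    trace.append(("stage 1 conv", size))
    size = valid_output_size(size, 3, 2)    # Stage 1: 3x3 max-pool, stride 2
    trace.append(("stage 1 pool", size))
    for stage, stride in ((2, 1), (3, 2), (4, 2), (5, 2)):
        # Only the convolutional block's stride-s 1x1 conv changes the size;
        # the identity blocks that follow preserve it.
        size = valid_output_size(size, 1, stride)
        trace.append((f"stage {stage}", size))
    size = valid_output_size(size, 2, 2)    # 2x2 average pooling
    trace.append(("avg-pool", size))
    return trace

for step, size in resnet50_spatial_trace(64):
    print(f"{step:>13}: {size} x {size}")
```

For a 64x64 input, the sizes under these assumptions are 70, 32, 15, 15, 8, 4, 2 and finally 1, so `Flatten` yields a vector of 2048 values (the Stage 5 channel count) for the dense layer.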

---

### 🧠 Why ResNet-50 Works So Well

By stacking **convolutional** and **identity blocks** with **shortcut connections**, we make it easy for each block to learn an identity mapping. This mitigates vanishing gradients and lets us train much deeper networks, such as ResNet-50.
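
As a cross-check on the stage list above, the "50" in the name can be recovered by counting layers. The counting convention used here is the usual one, but it is an assumption as far as this text goes: count only the convolutions on the main path plus the final dense layer, and ignore shortcut convolutions, batch normalization, and pooling.

```python
# Blocks per stage: one convolutional block plus the stated identity blocks.
blocks_per_stage = {2: 1 + 2, 3: 1 + 3, 4: 1 + 5, 5: 1 + 2}
convs_per_block = 3   # three CONV2D layers on each block's main path

stage1_conv = 1       # the 7x7 convolution in Stage 1
dense_layer = 1       # the final fully connected layer

total_blocks = sum(blocks_per_stage.values())
total_layers = stage1_conv + convs_per_block * total_blocks + dense_layer
print(total_blocks, total_layers)
```

The 16 blocks contribute 48 convolutions, and the Stage 1 convolution and the dense layer bring the total to 50.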



In [3]:
model = ResNet50(input_shape = (64, 64, 3), classes = 6)
# summary() prints the table itself and returns None, so it needs no print()
model.summary()


In [4]:
np.random.seed(1)
tf.random.set_seed(2)
opt = tf.keras.optimizers.Adam(learning_rate=0.00015)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

The model is now ready to be trained. The only thing we need now is a dataset!

Let's load our old friend, the SIGNS dataset.

<img src="images/signs_data_kiank.png" style="width:450px;height:250px;">
<caption><center> <u><font color='purple'><b>Figure 6</b></font></u><font color='purple'>: <b>SIGNS dataset</b></font> </center></caption>


In [5]:
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize image vectors
X_train = X_train_orig / 255.
X_test = X_test_orig / 255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T

print(f"number of training examples = {X_train.shape[0]}")
print(f"number of test examples = {X_test.shape[0]}")
print(f"X_train shape: {X_train.shape}")
print(f"Y_train shape: {Y_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"Y_test shape: {Y_test.shape}")

number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
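
`convert_to_one_hot` comes from the helper file and its body is not shown here. A minimal NumPy sketch of what such a helper typically does follows, assuming the original labels are integers stored in a `(1, m)` row; the function name `one_hot_sketch` and that input layout are assumptions.

```python
import numpy as np

def one_hot_sketch(labels, num_classes):
    """Return a (num_classes, m) one-hot matrix for integer labels."""
    labels = np.asarray(labels).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels].T

labels = np.array([[0, 2, 5, 2]])     # shape (1, 4)
Y = one_hot_sketch(labels, 6).T       # transpose to (m, num_classes), as in the cell above
print(Y.shape)
print(Y[1])
```

After the transpose, each row holds one example with a single 1 in the column of its class, which is the layout `categorical_crossentropy` expects.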


In [6]:
model.fit(X_train, Y_train, epochs = 10, batch_size = 32)

Epoch 1/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 42s 607ms/step - accuracy: 0.2934 - loss: 2.0947
Epoch 2/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 23s 677ms/step - accuracy: 0.5510 - loss: 1.2235
Epoch 3/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 25s 739ms/step - accuracy: 0.7618 - loss: 0.6352
Epoch 4/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 29s 867ms/step - accuracy: 0.8943 - loss: 0.2789
Epoch 5/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 38s 1s/step - accuracy: 0.9371 - loss: 0.1808
Epoch 6/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 31s 904ms/step - accuracy: 0.9423 - loss: 0.1517
Epoch 7/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 37s 1s/step - accuracy: 0.9552 - loss: 0.1361
Epoch 8/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 44s 1s/step - accuracy: 0.9536 - loss: 0.1469
Epoch 9/10
34/34 ━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x26b3b0b4ec0>
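
The `34/34` at the start of each log line is the number of batches per epoch. It follows from the training-set size and the batch size used above:

```python
import math

num_examples = 1080   # training examples reported by the loading cell
batch_size = 32       # value passed to model.fit

# The last batch is smaller when the batch size does not divide the dataset evenly.
steps_per_epoch = math.ceil(num_examples / batch_size)
last_batch_size = num_examples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch_size)
```

This gives 34 steps, with the final batch holding the remaining 24 examples.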

In [None]:
preds = model.evaluate(X_test, Y_test)
print(f"Loss = {preds[0]}")
print(f"Test Accuracy = {preds[1]}")

In [None]:
pre_trained_model = load_model('resnet50.h5')

In [None]:
preds = pre_trained_model.evaluate(X_test, Y_test)
print(f"Loss = {preds[0]}")
print(f"Test Accuracy = {preds[1]}")

**IMPORTANT**:

- Very deep "plain" networks are hard to train in practice, largely because of vanishing gradients.
- Skip connections help address the vanishing gradient problem. They also make it easy for a ResNet block to learn an identity function.
- There are two main types of blocks: the **identity block** and the **convolutional block**.
- Very deep Residual Networks are built by stacking these blocks together.