<a href="https://colab.research.google.com/github/Odima-dev/Data-Science-and-Machine-Learning/blob/main/ResNetandVGG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Creating my project folder
!mkdir -p "/content/drive/MyDrive/ResnetandVGGProject"
project_folder = "/content/drive/MyDrive/ResnetandVGGProject"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
%cd /content/drive/MyDrive/ResnetandVGGProject

/content/drive/MyDrive/ResnetandVGGProject


In [6]:
!pwd

/content/drive/MyDrive/ResnetandVGGProject


**Problem 1: Code review**

**A. The difference between this downloaded code and the former simple implementation of U-Net**

1. Pretrained Encoder (ResNet50)
  * Rather than trying to train an encoder on its own, it imports a pre-trained `ResNet50` pre-trained on `ImageNet` data and takes its intermediary feature maps (as skip connections).

  * The encoder layers are initially frozen, which decreases overfitting with small data.

2. Decoder Design
  * Custom decoder blocks `(decoder_block_simple, decoder_block_bottleneck)` are introduced to upsample and fuse encoder features with skip connections (like U-Net).

  * This enhances reuse of features and minimizes the training time.

3. Loss Functions
  * Utilises combined losses `(BCE + Dice)`, and optionally Lovasz hinge loss, as opposed to using Binary Cross-Entropy, which are more suitable when dealing with segmentation.

4. IoU-based Metric
  * Uses IoU metric `(Kaggle competition metric)` and it separately optimizes segmentation threshold to have a better leaderboard score.

5. Feature Engineering
  * Utilizes other derived channels `(create_depth_abs_channels)` to assist the model learning contextual spatial data (such as depth).

6. Cross-Validation & Threshold Optimization
  * Takes advantage of stratified k-fold to attain a more efficient train/validation split.
  * Trains and optimizes the segmentation threshold according to IoU to maximize it.

**B. How transfer learning is Done**

* It loads `ResNet50` with `weights='imagenet'` and `include_top=False`.

* Also, intermediate feature maps `(conv1, res2c_branch2c, res3d_branch2c, res4f_branch2c, res5c_branch2c)` are used as skip connections.

* Decoder learning occurs from scratch, whereas the parameters of encoder are frozen at the beginning (or partially unfrozen during for fine-tuning).

In [None]:
# Problem 2: Code rewriting

import tensorflow as tf
import keras.backend as K
from keras.applications import VGG16
from keras.layers import *
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Defining IoU metric
def get_iou_vector(A, B):
    batch_size = A.shape[0]
    metric = 0.0
    for batch in range(batch_size):
        t, p = A[batch], B[batch]
        true = np.sum(t)
        pred = np.sum(p)
        if true == 0:
            metric += (pred == 0)
            continue
        intersection = np.sum(t * p)
        union = true + pred - intersection
        iou = intersection / union
        iou = np.floor(max(0, (iou - 0.45) * 20)) / 10
        metric += iou
    metric /= batch_size
    return metric

def my_iou_metric(label, pred):
    return tf.py_function(get_iou_vector, [label, pred > 0.5], tf.float64)

# Loss Functions
from keras.losses import binary_crossentropy

def dice_loss(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = y_true_f * y_pred_f
    score = (2. * K.sum(intersection) + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1. - score

def bce_dice_loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

# Decoder Block (same as ResNet version)
def decoder_block_bottleneck(layer_name, block_name, num_filters=32, conv_dim=(3,3), dropout_frac=0.2):
    x_dec = Conv2D(num_filters, conv_dim, padding='same', name=f'{block_name}_conv1')(layer_name)
    x_dec = BatchNormalization(name=f'{block_name}_bn1')(x_dec)
    x_dec = PReLU(name=f'{block_name}_activation1')(x_dec)
    x_dec = Dropout(dropout_frac)(x_dec)

    x_dec2 = Conv2D(num_filters // 2, conv_dim, padding='same', name=f'{block_name}_conv2')(x_dec)
    x_dec2 = BatchNormalization(name=f'{block_name}_bn2')(x_dec2)
    x_dec2 = PReLU(name=f'{block_name}_activation2')(x_dec2)
    x_dec2 = Dropout(dropout_frac)(x_dec2)

    x_dec2 = Conv2D(num_filters, conv_dim, padding='same', name=f'{block_name}_conv3')(x_dec2)
    x_dec2 = BatchNormalization(name=f'{block_name}_bn3')(x_dec2)
    x_dec2 = PReLU(name=f'{block_name}_activation3')(x_dec2)
    x_dec2 = Dropout(dropout_frac)(x_dec2)

    x_dec2 = Add()([x_dec, x_dec2])
    return x_dec2

# U-Net with VGG16 Encoder
def unet_vgg(input_size=(224,224,3), decoder_block=decoder_block_bottleneck,
             weights='imagenet', loss_func='binary_crossentropy',
             metrics_list=[my_iou_metric], use_lovash=False):

    # Encoder
    base_model = VGG16(input_shape=input_size, include_top=False, weights=weights)
    encoder1 = base_model.get_layer('block1_conv2').output
    encoder2 = base_model.get_layer('block2_conv2').output
    encoder3 = base_model.get_layer('block3_conv3').output
    encoder4 = base_model.get_layer('block4_conv3').output
    encoder5 = base_model.get_layer('block5_conv3').output

    # Center
    center = decoder_block(encoder5, 'center', num_filters=512)
    concat5 = concatenate([center, encoder5], axis=-1)

    # Decoder path with skip connections
    decoder4 = decoder_block(concat5, 'decoder4', num_filters=256)
    concat4 = concatenate([UpSampling2D()(decoder4), encoder4], axis=-1)

    decoder3 = decoder_block(concat4, 'decoder3', num_filters=128)
    concat3 = concatenate([UpSampling2D()(decoder3), encoder3], axis=-1)

    decoder2 = decoder_block(concat3, 'decoder2', num_filters=64)
    concat2 = concatenate([UpSampling2D()(decoder2), encoder2], axis=-1)

    decoder1 = decoder_block(concat2, 'decoder1', num_filters=64)
    concat1 = concatenate([UpSampling2D()(decoder1), encoder1], axis=-1)

    # Final upsampling
    output = UpSampling2D()(concat1)
    output = decoder_block(output, 'decoder_output', num_filters=32)
    output = Conv2D(1, (1, 1), activation='sigmoid', name='prediction')(output)

    model = Model(base_model.input, output)
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss=loss_func,
                  metrics=metrics_list)
    return model

#  Building VGG Model
vgg_model = unet_vgg(
    input_size=(224,224,3),
    decoder_block=decoder_block_bottleneck,
    weights='imagenet',
    loss_func=bce_dice_loss,
    metrics_list=[my_iou_metric],
    use_lovash=False
)

print(vgg_model.summary())


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
[1m58889256/58889256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


None


In [None]:
# Problem 3: Learning / estimation

# Uploading TGS Salt Identification Challeng dataset
from google.colab import files
files.upload()

Saving depths.csv to depths.csv
Saving train.csv to train.csv
Saving train.zip to train.zip
Buffered data was truncated after reaching the output size limit.

In [None]:
# Unzipping train.zip file
import zipfile
import os

# Path to uploaded zip file
zip_path = "train.zip"
output_dir = "train"

# Creating output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Extracting files
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(output_dir)

print("Extraction complete. Files are in:", output_dir)

Extraction complete. Files are in: train


In [29]:
ls

depths.csv  train.csv  unet_resnet_best.weights.h5
[0m[01;34mtrain[0m/      train.zip  unet_vgg_best.weights.h5


In [30]:
# Importing Libraries
import numpy as np
import pandas as pd
import cv2
import gc
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from keras import backend as K
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.layers import *
from keras.models import Model
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg16 import VGG16
from keras.losses import binary_crossentropy

# Metrics and Loss
def dice_coef(y_true, y_pred):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2. * intersection + 1.) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + 1.)

def dice_loss(y_true, y_pred):
    return 1. - dice_coef(y_true, y_pred)

def bce_dice_loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

# IoU metric
def iou_metric_tf(y_true, y_pred):
    y_pred = tf.cast(y_pred > 0.5, tf.float32)
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    union = tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) - intersection
    return tf.math.divide_no_nan(intersection, union)

# Data Loading (128x128)
train = pd.read_csv("train.csv")
depth = pd.read_csv("depths.csv")
train = train.merge(depth, on='id', how='left')

# Loading images & masks (grayscale)
X_train = np.array([cv2.imread(f'train/images/{x}.png', 0) for x in train.id]) / 255.
y_train = np.array([cv2.imread(f'train/masks/{x}.png', 0) for x in train.id]) / 255.

# Resizing to 128x128 and adding 3 channels for encoder
X_resized = np.array([cv2.resize(np.stack([x]*3, axis=-1), (128,128)) for x in X_train])
y_resized = np.array([cv2.resize(x, (128,128)) for x in y_train])

# Stratified split
train['coverage'] = np.mean(y_train, axis=(1, 2))
def cov_to_class(val):
    for i in range(0, 11):
        if val * 10 <= i:
            return i
train['coverage_class'] = train.coverage.map(cov_to_class)
kfold = StratifiedKFold(n_splits=5, random_state=1337, shuffle=True)
for train_index, valid_index in kfold.split(train.id, train.coverage_class):
    X_tr, X_val = X_resized[train_index], X_resized[valid_index]
    y_tr, y_val = y_resized[train_index], y_resized[valid_index]
    break
y_tr = np.expand_dims(y_tr, axis=-1)
y_val = np.expand_dims(y_val, axis=-1)

print("Train:", X_tr.shape, y_tr.shape)
print("Validation:", X_val.shape, y_val.shape)

# Decoder Block
def decoder_block_bottleneck(layer_name, block_name, num_filters=32):
    x_dec = Conv2D(num_filters, (3,3), padding='same', name=f'{block_name}_conv1')(layer_name)
    x_dec = BatchNormalization(name=f'{block_name}_bn1')(x_dec)
    x_dec = Activation('relu')(x_dec)
    x_dec2 = Conv2D(num_filters, (3,3), padding='same', name=f'{block_name}_conv2')(x_dec)
    x_dec2 = BatchNormalization(name=f'{block_name}_bn2')(x_dec2)
    x_dec2 = Activation('relu')(x_dec2)
    return Add()([x_dec, x_dec2])

# UNet ResNet50 Encoder
def unet_resnet(input_size=(128,128,3), decoder_block=decoder_block_bottleneck,
                weights='imagenet', loss_func=bce_dice_loss, metrics_list=[iou_metric_tf]):
    base_model = ResNet50(input_shape=input_size, include_top=False, weights=weights)
    encoder1 = base_model.get_layer('conv1_relu').output
    encoder2 = base_model.get_layer('conv2_block3_out').output
    encoder3 = base_model.get_layer('conv3_block4_out').output
    encoder4 = base_model.get_layer('conv4_block6_out').output
    encoder5 = base_model.get_layer('conv5_block3_out').output

    center = decoder_block(encoder5, 'center', 512)
    concat5 = concatenate([center, encoder5], axis=-1)

    decoder4 = decoder_block(concat5, 'decoder4', 256)
    concat4 = concatenate([UpSampling2D()(decoder4), encoder4], axis=-1)

    decoder3 = decoder_block(concat4, 'decoder3', 128)
    concat3 = concatenate([UpSampling2D()(decoder3), encoder3], axis=-1)

    decoder2 = decoder_block(concat3, 'decoder2', 64)
    concat2 = concatenate([UpSampling2D()(decoder2), encoder2], axis=-1)

    decoder1 = decoder_block(concat2, 'decoder1', 32)
    concat1 = concatenate([UpSampling2D()(decoder1), encoder1], axis=-1)

    # Final upsampling and segmentation layer
    final_upsample = UpSampling2D()(concat1)
    output = Conv2D(1, (1, 1), activation='sigmoid')(final_upsample)

    model = Model(base_model.input, output)
    model.compile(optimizer='adam', loss=loss_func, metrics=metrics_list)
    return model


# UNet VGG16 Encoder
def unet_vgg(input_size=(128,128,3), decoder_block=decoder_block_bottleneck,
             weights='imagenet', loss_func=bce_dice_loss, metrics_list=[iou_metric_tf]):
    base_model = VGG16(input_shape=input_size, include_top=False, weights=weights)
    encoder1 = base_model.get_layer('block1_conv2').output
    encoder2 = base_model.get_layer('block2_conv2').output
    encoder3 = base_model.get_layer('block3_conv3').output
    encoder4 = base_model.get_layer('block4_conv3').output
    encoder5 = base_model.get_layer('block5_conv3').output

    center = decoder_block(encoder5, 'center', 512)
    concat5 = concatenate([center, encoder5], axis=-1)

    decoder4 = decoder_block(concat5, 'decoder4', 256)
    concat4 = concatenate([UpSampling2D()(decoder4), encoder4], axis=-1)

    decoder3 = decoder_block(concat4, 'decoder3', 128)
    concat3 = concatenate([UpSampling2D()(decoder3), encoder3], axis=-1)

    decoder2 = decoder_block(concat3, 'decoder2', 64)
    concat2 = concatenate([UpSampling2D()(decoder2), encoder2], axis=-1)

    decoder1 = decoder_block(concat2, 'decoder1', 32)
    concat1 = concatenate([UpSampling2D()(decoder1), encoder1], axis=-1)

    # Direct segmentation output (NO final upsampling)
    output = Conv2D(1, (1, 1), activation='sigmoid')(concat1)

    model = Model(base_model.input, output)
    model.compile(optimizer='adam', loss=loss_func, metrics=metrics_list)
    return model

Train: (3200, 128, 128, 3) (3200, 128, 128, 1)
Validation: (800, 128, 128, 3) (800, 128, 128, 1)


In [6]:
# Training ResNet50

# Callbacks
model_checkpoint_resnet = ModelCheckpoint(
    'unet_resnet_best.weights.h5', monitor='iou_metric_tf', mode='max',
    save_best_only=True, save_weights_only=True, verbose=1
)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1, min_lr=1e-5)

# Training
K.clear_session()
resnet_model = unet_resnet()
print("\nTraining ResNet50-based U-Net...\n")
resnet_model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=5, batch_size=8,
                 callbacks=[model_checkpoint_resnet, reduce_lr], verbose=1)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
[1m94765736/94765736[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 0us/step

Training ResNet50-based U-Net...

Epoch 1/5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3s/step - iou_metric_tf: 0.4849 - loss: 1.0152
Epoch 1: iou_metric_tf improved from -inf to 0.54392, saving model to unet_resnet_best.weights.h5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1494s[0m 4s/step - iou_metric_tf: 0.4850 - loss: 1.0146 - val_iou_metric_tf: 0.1074 - val_loss: 1.3181 - learning_rate: 0.0010
Epoch 2/5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3s/step - iou_metric_tf: 0.6218 - loss: 0.6214
Epoch 2: iou_metric_tf improved from 0.54392 to 0.62996, saving model to unet_resnet_best.weights.h5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1480s[0m 4s/step - iou_metric_tf: 0.6218 - lo

<keras.src.callbacks.history.History at 0x7fd9521b92d0>

In [9]:
# Clearing memory befor training VGG16 U-Net
import gc
K.clear_session()
gc.collect()

0

In [10]:
# Training VGG16

# Callbacks
model_checkpoint_vgg = ModelCheckpoint(
    'unet_vgg_best.weights.h5', monitor='iou_metric_tf', mode='max',
    save_best_only=True, save_weights_only=True, verbose=1
)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1, min_lr=1e-5)

# Training
K.clear_session()
vgg_model = unet_vgg()
print("\nTraining VGG16-based U-Net...\n")
vgg_model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=5, batch_size=8,
              callbacks=[model_checkpoint_vgg, reduce_lr], verbose=1)


Training VGG16-based U-Net...

Epoch 1/5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7s/step - iou_metric_tf: 0.4690 - loss: 0.9100
Epoch 1: iou_metric_tf improved from -inf to 0.51092, saving model to unet_vgg_best.weights.h5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2970s[0m 7s/step - iou_metric_tf: 0.4691 - loss: 0.9098 - val_iou_metric_tf: 0.5380 - val_loss: 0.7701 - learning_rate: 0.0010
Epoch 2/5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7s/step - iou_metric_tf: 0.5820 - loss: 0.7089
Epoch 2: iou_metric_tf improved from 0.51092 to 0.58154, saving model to unet_vgg_best.weights.h5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2972s[0m 7s/step - iou_metric_tf: 0.5820 - loss: 0.7089 - val_iou_metric_tf: 0.5725 - val_loss: 0.7038 - learning_rate: 0.0010
Epoch 3/5
[1m400/400[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7s/step - iou_metric_tf: 0.6099 - loss: 0.6558
Epoch 3: iou_metric_tf improved 

<keras.src.callbacks.history.History at 0x7906e85ba250>

In [33]:
# Evaluation and IoU Comparison

# Recreating Resnet model previously cleared from memory
K.clear_session()
resnet_model = unet_resnet()
resnet_model.load_weights("unet_resnet_best.weights.h5")


# Predictions
resnet_preds = resnet_model.predict(X_val, batch_size=8)
vgg_preds = vgg_model.predict(X_val, batch_size=8)

# Binarizing predictions at 0.5 threshold
resnet_bin = (resnet_preds > 0.5).astype(np.float32)
vgg_bin = (vgg_preds > 0.5).astype(np.float32)

# IoU scores (batch mean)
def iou_score(y_true, y_pred):
    ious = []
    for i in range(len(y_true)):
        intersection = np.sum(y_true[i] * y_pred[i])
        union = np.sum(y_true[i]) + np.sum(y_pred[i]) - intersection
        ious.append(intersection / union if union != 0 else 1.0)
    return np.mean(ious)

resnet_iou = iou_score(y_val, resnet_bin)
vgg_iou = iou_score(y_val, vgg_bin)

print(f"\nResNet IoU: {resnet_iou:.4f}")
print(f"VGG IoU:   {vgg_iou:.4f}")

  saveable.load_own_variables(weights_store.get(inner_path))


[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m67s[0m 632ms/step
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m181s[0m 2s/step

ResNet IoU: 0.6592
VGG IoU:   0.6648


Both architectures performed comparably, with VGG16 (0.6648) slightly outperforming ResNet50 (0.6592) in IoU performance for this dataset and configuration, suggesting slightly better segmentation performance