

Key models commonly employed for polyp segmentation tasks:

U-Net (Ronneberger et al., 2015)

  * Baseline encoder-decoder architecture.
  * Common benchmark model.
  * Known for robustness in biomedical image segmentation.

U-Net++ (Zhou et al., 2018)

  * Adds dense skip connections to improve accuracy.
  * Reduces semantic gap between encoder and decoder features.

ResUNet (Zhang et al., 2018)

  * Incorporates residual connections.
  * Helps improve gradient flow and training speed.

Attention U-Net (Oktay et al., 2018)

  * Uses attention gates on the skip connections to suppress irrelevant regions and emphasize the relevant ones.
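
To make the attention idea concrete, here is a minimal NumPy sketch of an additive attention gate in the spirit of Attention U-Net. It is a simplification, not the published implementation: the 1x1 convolutions are written as plain matrix products, the gating signal `g` is assumed to be already resized to the grid of the skip features `x`, and all names (`attention_gate`, `W_x`, `W_g`, `psi`) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, b, psi, b_psi):
    """Simplified additive attention gate (hypothetical helper).

    x:   skip-connection features, shape (H, W, C_x)
    g:   gating features, shape (H, W, C_g), assumed to share x's grid
    W_x: (C_x, C_int) and W_g: (C_g, C_int), 1x1 convolutions as matrices
    psi: (C_int,), projects to one attention logit per pixel
    """
    # Project both inputs to a common space, add them, apply ReLU.
    q = np.maximum(x @ W_x + g @ W_g + b, 0.0)
    # One coefficient per pixel in [0, 1].
    alpha = sigmoid(q @ psi + b_psi)
    # Scale the skip features by the attention coefficients.
    return x * alpha[..., None], alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
g = rng.normal(size=(4, 4, 16))
gated, alpha = attention_gate(
    x, g,
    W_x=rng.normal(size=(8, 4)),
    W_g=rng.normal(size=(16, 4)),
    b=np.zeros(4),
    psi=rng.normal(size=4),
    b_psi=0.0,
)
print(gated.shape, alpha.shape)
```

Pixels with a coefficient near 0 are largely removed from the skip connection before it reaches the decoder, which is how the gate focuses the network on relevant areas.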

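Key Metrics for Evaluation:
  * Dice Coefficient (Dice Similarity Coefficient): twice the overlap between the predicted mask A and the ground-truth mask B, divided by the sum of their sizes, 2|A ∩ B| / (|A| + |B|).
  * Intersection-over-Union (IoU or Jaccard Index): the overlap divided by the union, |A ∩ B| / |A ∪ B|.
  * Accuracy, Precision, Recall, and F1-score (for binary masks, pixel-wise F1 is identical to Dice).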
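
The overlap metrics listed above can be worked through on a tiny example. The sketch below is illustrative only: `overlap_metrics` is a hypothetical helper, it uses no smoothing term, and it assumes both masks contain at least one positive pixel so that no division by zero occurs.

```python
import numpy as np

def overlap_metrics(y_true, y_pred):
    """Return Dice, IoU, precision, and recall for two binary masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # pixels positive in both masks
    fp = np.sum(~y_true & y_pred)   # predicted positive, actually negative
    fn = np.sum(y_true & ~y_pred)   # missed positive pixels
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# A 2x2 ground-truth square and a prediction shifted one pixel to the right.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

m = overlap_metrics(truth, pred)
print(m)
```

Here two pixels overlap, two are false positives, and two are missed, so Dice is 4/8 = 0.5 while IoU is 2/6 ≈ 0.333. IoU is never larger than Dice for the same pair of masks.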


In [1]:
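import os

import cv2
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
# Import only the layers the model uses instead of a wildcard import.
from tensorflow.keras.layers import (
    Activation, Add, BatchNormalization, Concatenate, Conv2D,
    Conv2DTranspose, Input, MaxPooling2D,
)
from tensorflow.keras.models import Model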

In [2]:
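def conv_block(inputs, filters):
    """Residual block: two 3x3 convolutions plus a projected shortcut."""
    # Main path: conv -> batch norm -> ReLU -> conv -> batch norm.
    x = Conv2D(filters, 3, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    # Shortcut path: a 1x1 convolution matches the channel count of the main path.
    shortcut = Conv2D(filters, 1, padding='same')(inputs)
    shortcut = BatchNormalization()(shortcut)
    # Add the two paths, then apply the final activation.
    x = Add()([x, shortcut])
    x = Activation('relu')(x)
    return x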

In [3]:
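def encoder_block(inputs, filters):
    """Return the block output (kept for the skip connection) and its pooled version."""
    x = conv_block(inputs, filters)
    p = MaxPooling2D((2, 2))(x)  # halves height and width
    return x, p

def decoder_block(inputs, skip_features, filters):
    """Upsample by 2, concatenate the encoder skip features, then refine."""
    x = Conv2DTranspose(filters, (2, 2), strides=2, padding='same')(inputs)
    x = Concatenate()([x, skip_features])
    x = conv_block(x, filters)
    return x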

In [4]:
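def build_resunet(input_shape):
    """Assemble a 4-level U-Net whose blocks use residual shortcuts."""
    inputs = Input(input_shape)

    # Encoder: each level halves the resolution and doubles the filters.
    s1, p1 = encoder_block(inputs, 64)
    s2, p2 = encoder_block(p1, 128)
    s3, p3 = encoder_block(p2, 256)
    s4, p4 = encoder_block(p3, 512)

    # Bottleneck at 1/16 of the input resolution.
    b1 = conv_block(p4, 1024)

    # Decoder: upsample and fuse with the matching encoder features.
    d1 = decoder_block(b1, s4, 512)
    d2 = decoder_block(d1, s3, 256)
    d3 = decoder_block(d2, s2, 128)
    d4 = decoder_block(d3, s1, 64)

    # A single sigmoid channel gives the per-pixel polyp probability.
    outputs = Conv2D(1, 1, activation='sigmoid')(d4)

    model = Model(inputs, outputs, name="ResUNet")
    return model

model = build_resunet((256, 256, 3))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()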
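
The four pooling steps in the encoder constrain the input size. The short sketch below traces the spatial resolution through the encoder; `feature_map_sizes` is a hypothetical helper, not part of the notebook. It raises an error for sizes that are not divisible by 16, because an odd size is rounded down by pooling and the upsampled decoder features would then no longer match the skip features in `Concatenate`.

```python
def feature_map_sizes(size=256, depth=4):
    """Spatial size at each encoder level after repeated 2x2 max pooling."""
    sizes = [size]
    for level in range(depth):
        if sizes[-1] % 2 != 0:
            raise ValueError(
                f"size {sizes[-1]} at level {level} is odd; "
                f"use an input size divisible by {2 ** depth}"
            )
        sizes.append(sizes[-1] // 2)
    return sizes

filters = [64, 128, 256, 512, 1024]  # encoder levels plus bottleneck
for side, f in zip(feature_map_sizes(256), filters):
    print(f"{side} x {side} x {f}")
```

For the 256 x 256 input used here, the bottleneck operates on 16 x 16 feature maps with 1024 channels.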

In [5]:
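def load_data(image_dir, mask_dir, size=(256, 256), max_images=20):
    """Load up to max_images image/mask pairs, resized and scaled to [0, 1].

    Assumes each mask has the same file name as its image.
    """
    images, masks = [], []
    img_files = sorted(os.listdir(image_dir))[:max_images]
    for img_file in img_files:
        img_path = os.path.join(image_dir, img_file)
        mask_path = os.path.join(mask_dir, img_file)

        img = cv2.imread(img_path)  # 3-channel image in BGR order
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)

        # cv2.imread returns None for unreadable or missing files.
        if img is None or mask is None:
            print(f"Missing image or mask for file: {img_file}, skipping...")
            continue

        img = cv2.resize(img, size)
        # Nearest-neighbor resizing avoids introducing gray values at mask edges.
        mask = cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)
        mask = np.expand_dims(mask, axis=-1) / 255.

        images.append(img)
        masks.append(mask)

    return np.array(images)/255., np.array(masks)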
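
`load_data` assumes that every image has a mask with exactly the same file name and silently skips anything it cannot read. A quick check before loading can show which files would be dropped. The sketch below is only illustrative: `pair_files` and the example file names are hypothetical, and matching by file stem (name without extension) is an assumption that may not fit every dataset layout.

```python
from pathlib import Path

def pair_files(image_names, mask_names):
    """Pair images with masks by file stem; report images without a mask."""
    masks_by_stem = {Path(name).stem: name for name in mask_names}
    pairs, missing = [], []
    for name in sorted(image_names):
        if name.startswith("."):
            continue  # ignore hidden files such as .DS_Store
        mask = masks_by_stem.get(Path(name).stem)
        if mask is None:
            missing.append(name)
        else:
            pairs.append((name, mask))
    return pairs, missing

# Hypothetical directory listings for illustration.
images = ["348.png", "412.png", "88.png", ".DS_Store"]
masks = ["348.png", "412.tif"]

pairs, missing = pair_files(images, masks)
print("pairs:", pairs)
print("missing masks:", missing)
```

With real folders, the two lists could come from `os.listdir(image_dir)` and `os.listdir(mask_dir)`.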

In [8]:
!unzip PNG.zip

Archive:  PNG.zip
   creating: PNG/
  inflating: __MACOSX/._PNG          
  inflating: PNG/.DS_Store           
  inflating: __MACOSX/PNG/._.DS_Store  
   creating: PNG/Original/
  inflating: __MACOSX/PNG/._Original  
   creating: PNG/Ground Truth/
  inflating: __MACOSX/PNG/._Ground Truth  
  inflating: PNG/Original/348.png    
  inflating: __MACOSX/PNG/Original/._348.png  
  inflating: PNG/Original/412.png    
  inflating: __MACOSX/PNG/Original/._412.png  
  inflating: PNG/Original/374.png    
  inflating: __MACOSX/PNG/Original/._374.png  
  inflating: PNG/Original/360.png    
  inflating: __MACOSX/PNG/Original/._360.png  
  inflating: PNG/Original/406.png    
  inflating: __MACOSX/PNG/Original/._406.png  
  inflating: PNG/Original/176.png    
  inflating: __MACOSX/PNG/Original/._176.png  
  inflating: PNG/Original/88.png     
  inflating: __MACOSX/PNG/Original/._88.png  
  inflating: PNG/Original/610.png    
  inflating: __MACOSX/PNG/Original/._610.png  
  inflating: PNG/Original/604

In [6]:
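# Only 5 images are loaded; the 80/20 split leaves 4 for training and 1 for testing.
# The archive listing shows the mask folder as 'PNG/Ground Truth/' (with a space),
# so make sure the mask path below matches the folder name on disk.
X, y = load_data('PNG/Original/', 'PNG/GroundTruth/', max_images=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)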

In [7]:
# model.fit returns a History object with the per-epoch metrics, not a new model.
history = model.fit(X_train, y_train, epochs=20, batch_size=8, validation_data=(X_test, y_test))

Epoch 1/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 57s 57s/step - accuracy: 0.7201 - loss: 0.5870 - val_accuracy: 0.7742 - val_loss: 0.6626
Epoch 2/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 40s 40s/step - accuracy: 0.9158 - loss: 0.2941 - val_accuracy: 0.5778 - val_loss: 0.6881
Epoch 3/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 32s 32s/step - accuracy: 0.9287 - loss: 0.2078 - val_accuracy: 0.2767 - val_loss: 0.7687
Epoch 4/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 41s 41s/step - accuracy: 0.9393 - loss: 0.1778 - val_accuracy: 0.1894 - val_loss: 0.9682
Epoch 5/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 41s 41s/step - accuracy: 0.9594 - loss: 0.1160 - val_accuracy: 0.1953 - val_loss: 0.9667
Epoch 6/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 41s 41s/step - accuracy: 0.9465 - loss: 0.1294 - val_accuracy: 0.1503 - val_loss: 5.4927
Epoch 7/20
1/1 ━━━━━━━━━━━━━━━━━━

In [8]:
def dice_coef(y_true, y_pred, smooth=1):
    """Dice coefficient over all pixels; predictions are thresholded at 0.5.

    The smooth term avoids division by zero when both masks are empty.
    """
    y_true_f = y_true.flatten()
    y_pred_f = (y_pred.flatten() > 0.5).astype(np.float32)
    intersection = np.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

def iou_score(y_true, y_pred, smooth=1):
    """Intersection-over-Union over all pixels; predictions are thresholded at 0.5."""
    y_true_f = y_true.flatten()
    y_pred_f = (y_pred.flatten() > 0.5).astype(np.float32)
    intersection = np.sum(y_true_f * y_pred_f)
    union = np.sum(y_true_f) + np.sum(y_pred_f) - intersection
    return (intersection + smooth) / (union + smooth)

# Evaluate on the held-out images; both metrics are pooled over the whole test set.
preds = model.predict(X_test)
dice = dice_coef(y_test, preds)
iou = iou_score(y_test, preds)

print(f'Dice coefficient: {dice}')
print(f'IoU score: {iou}')

1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step
Dice coefficient: 0.2704760064590889
IoU score: 0.15639397831549728
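
As a sanity check on the two numbers printed above: without a smoothing term, Dice (D) and IoU (J) computed on the same masks are tied by J = D / (2 - D). The sketch below applies that identity to the reported Dice value. The helper name `iou_from_dice` is illustrative, and because the notebook's functions add `smooth=1`, only approximate agreement should be expected.

```python
def iou_from_dice(dice):
    """IoU implied by a Dice score on the same masks (no smoothing term)."""
    return dice / (2.0 - dice)

reported_dice = 0.2704760064590889
reported_iou = 0.15639397831549728

implied_iou = iou_from_dice(reported_dice)
print(f"implied IoU:  {implied_iou:.6f}")
print(f"reported IoU: {reported_iou:.6f}")
print(f"difference:   {abs(implied_iou - reported_iou):.2e}")
```

The implied and reported values agree closely, which suggests the two metric functions are consistent with each other; the small remaining gap comes from the smoothing term.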


PraNet (Fan et al., 2020)

  * Dice Coefficient: 0.899
  * IoU Score: 0.840

U-Net++ (Zhou et al., 2018)

  * Dice Coefficient: 0.794
  * IoU Score: 0.729

My model

  * Dice Coefficient: 0.270
  * IoU Score: 0.156

Training Data Size:

* PraNet and U-Net++ were trained on the full dataset (612 images), which supports robust generalization.
* My model used only 5 images in total (4 for training and 1 held out by the 80/20 split) because no GPU was available, which severely limits its capacity to generalize.
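
The arithmetic behind these sample counts can be sketched as follows. The helper `split_and_steps` is hypothetical, and it assumes the test count is rounded up, which is my understanding of how scikit-learn's `train_test_split` handles a fractional `test_size`.

```python
import math

def split_and_steps(n_samples, test_size, batch_size):
    """Return (train count, test count, training steps per epoch)."""
    n_test = math.ceil(n_samples * test_size)   # assumed rounding rule
    n_train = n_samples - n_test
    steps = math.ceil(n_train / batch_size)
    return n_train, n_test, steps

for n in (5, 612):
    n_train, n_test, steps = split_and_steps(n, test_size=0.2, batch_size=8)
    print(f"{n} images -> {n_train} train, {n_test} test, {steps} step(s) per epoch")
```

With 5 images, a single batch covers the whole training set, which matches the `1/1` step counter in the training log.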

Model Complexity and Regularization:
* PraNet (Parallel Reverse Attention Network) employs reverse attention modules, significantly boosting accuracy.
* U-Net++ integrates nested skip connections to reduce semantic gaps, improving results over standard U-Net.
* My model has ample capacity but lacks sufficient data, regularization, and augmentation, resulting in severe overfitting: in the training log, training accuracy climbs above 0.9 while validation accuracy falls below 0.2.
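
Augmentation is one of the missing pieces named above. For segmentation, the key point is that every geometric transform must be applied identically to the image and its mask. The following is a minimal NumPy sketch under that assumption; `augment_pair` is a hypothetical helper and was not used to produce the results in this notebook.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to image and mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1], mask[::-1]         # vertical flip
    k = int(rng.integers(0, 4))                       # number of quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

rng = np.random.default_rng(42)

# Synthetic example: a square "polyp" in the mask, copied into the image's first channel.
mask = np.zeros((256, 256, 1), dtype=np.float32)
mask[40:90, 100:180, 0] = 1.0
image = np.zeros((256, 256, 3), dtype=np.float32)
image[..., 0] = mask[..., 0]

aug_image, aug_mask = augment_pair(image, mask, rng)
print(aug_image.shape, aug_mask.shape)
print("still aligned:", np.array_equal(aug_image[..., 0], aug_mask[..., 0]))
```

Flips and quarter turns keep square inputs at 256 x 256, so the augmented pairs can be fed to the same model without further resizing.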

Training Duration:
* Both PraNet and U-Net++ used substantial training epochs (typically 50–200 epochs).
* My model trained for only 20 epochs due to time constraints and the lack of a GPU, barely enough for meaningful convergence.