Model Architecture: Hybrid U-Net for Microaneurysm Detection
Architecture Overview:
The Hybrid U-Net extends the traditional U-Net with additional components that improve feature learning and segmentation accuracy on difficult tasks such as microaneurysm detection in retinal fundus images. It keeps the U-Net encoder-decoder structure and adds attention mechanisms and residual connections to capture more complex patterns and refine the segmentation results.

Flow of Data within the Model:
The Hybrid U-Net architecture maintains the core structure of U-Net with a contracting path (encoder) and an expanding path (decoder), while introducing modifications that enhance feature extraction and spatial information retention.

Contracting Path (Encoder):
The encoder captures contextual information from the input image through a series of convolutional stages, as in the traditional U-Net. In the Hybrid U-Net design, each stage is a residual block rather than a plain pair of convolutions: two 3x3 convolutional layers with ReLU activations, plus a skip connection that adds the input of the block to its output. This structure mitigates the vanishing gradient problem and allows for deeper networks. Note that the `hybrid_unet_model` function in this notebook builds each stage from two plain 3x3 convolutions followed by 2x2 max pooling and does not include the residual addition.
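A residual block of the kind described above could be sketched in Keras as follows. This is an illustrative sketch rather than code from this notebook: the helper name `residual_block`, the 1x1 projection on the shortcut when the channel count changes, and the placement of the final ReLU after the addition are all assumptions.

```python
from tensorflow.keras.layers import Input, Conv2D, Add, Activation

def residual_block(x, filters):
    """Two 3x3 convolutions with a skip connection (hypothetical helper)."""
    shortcut = x
    # Assumption: project the shortcut with a 1x1 convolution when the channel
    # count changes, so the element-wise addition has matching shapes.
    if x.shape[-1] != filters:
        shortcut = Conv2D(filters, (1, 1), padding='same')(x)
    y = Conv2D(filters, (3, 3), activation='relu', padding='same')(x)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = Add()([shortcut, y])      # skip connection: block input + block output
    return Activation('relu')(y)  # assumption: ReLU applied after the addition

inputs = Input((256, 256, 3))
r1 = residual_block(inputs, 64)   # 3 -> 64 channels, shortcut is projected
r2 = residual_block(r1, 64)       # 64 -> 64 channels, identity shortcut
```

Because the shortcut carries the input forward unchanged (or through a single 1x1 projection), gradients have a short path back to earlier layers, which is the property the paragraph above relies on.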

Each encoder stage doubles the number of feature channels (64, 128, 256, 512, then 1024 at the bottleneck), while 2x2 max pooling halves the spatial dimensions between stages. Deeper stages therefore capture increasingly abstract, high-level features at lower resolution.
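The channel and resolution progression can be traced with a few lines of plain Python. The 256x256 input size and the 64 starting filters are taken from the code in this notebook; the helper `encoder_shapes` is a hypothetical name used only for this sketch.

```python
def encoder_shapes(height=256, width=256, base_filters=64, num_stages=4):
    """Return (name, height, width, channels) per encoder stage plus the bottleneck."""
    shapes = []
    filters = base_filters
    for stage in range(1, num_stages + 1):
        shapes.append((f"c{stage}", height, width, filters))
        height, width = height // 2, width // 2   # 2x2 max pooling halves H and W
        filters *= 2                              # the next stage doubles the channels
    shapes.append(("bottleneck", height, width, filters))
    return shapes

for name, h, w, c in encoder_shapes():
    print(f"{name}: {h}x{w}x{c}")
# c1: 256x256x64
# c2: 128x128x128
# c3: 64x64x256
# c4: 32x32x512
# bottleneck: 16x16x1024
```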

Bottleneck:
At the bottleneck, the model applies two 3x3 convolutions with ReLU activations, followed by an attention mechanism. In the code in this notebook, that mechanism is a transformer block: multi-head self-attention (8 heads) followed by a feed-forward network, each wrapped in a skip connection and layer normalization. Self-attention recalibrates the feature maps by letting every bottleneck position weight the features of every other position, which helps highlight subtle structures such as microaneurysms.
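To make the recalibration idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention over a flattened feature map. It is a simplification under stated assumptions: the weights are random rather than learned, the channel counts are toy values, and only one head is used.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (positions, channels) array."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])                  # pairwise similarity
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v, weights

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(16, 16, 32))   # toy stand-in for a 16x16 bottleneck map
tokens = feature_map.reshape(-1, 32)          # 16 * 16 = 256 spatial positions
w_q, w_k, w_v = (rng.normal(size=(32, 8)) * 0.1 for _ in range(3))

attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` sums to 1 and describes how strongly one spatial position draws on every other position when its features are recomputed.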

Expanding Path (Decoder):
The decoder restores the spatial resolution of the feature maps while reusing the high-level features captured by the encoder. Each decoder stage upsamples with a 2x2 transposed convolution (stride 2), concatenates the result with the encoder feature map of the same resolution through a skip connection, and refines the merged map with two 3x3 convolutions.
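The effect of each decoder stage on tensor shapes can be traced in plain Python. The bottleneck shape and filter counts below mirror the model code in this notebook; `decoder_shapes` is a hypothetical helper name.

```python
def decoder_shapes(bottleneck=(16, 16, 1024), stage_filters=(512, 256, 128, 64)):
    """Trace (upsampled, concatenated, refined) shapes for each decoder stage."""
    height, width, _ = bottleneck
    trace = []
    for filters in stage_filters:
        height, width = height * 2, width * 2          # stride-2 transposed convolution
        upsampled = (height, width, filters)
        # The matching encoder map has the same channel count, so concatenation doubles it
        merged = (height, width, filters + filters)
        refined = (height, width, filters)             # after two 3x3 convolutions
        trace.append((upsampled, merged, refined))
    return trace

for upsampled, merged, refined in decoder_shapes():
    print(upsampled, "->", merged, "->", refined)
```

The last stage ends at 256x256x64, the resolution of the input image, which is what the output layer expects.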

To further enhance performance, the Hybrid U-Net design places an attention gate at each skip connection. Each gate selectively filters the encoder's feature maps so that the decoder can focus on the most relevant information for segmentation. The `hybrid_unet_model` function in this notebook concatenates the encoder feature maps directly, without attention gates.
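An attention gate on a skip connection could look roughly like the Keras sketch below. It is a simplified additive gate written for illustration, not code from this notebook: the helper name `attention_gate`, the `inter_channels` value, and the assumption that the gating signal has already been upsampled to the resolution of the encoder map are all choices made for this example.

```python
from tensorflow.keras.layers import Input, Conv2D, Add, Activation, Multiply

def attention_gate(skip, gating, inter_channels):
    """Simplified additive attention gate (hypothetical helper).

    Assumes `skip` and `gating` have the same spatial size.
    """
    theta = Conv2D(inter_channels, (1, 1), padding='same')(skip)
    phi = Conv2D(inter_channels, (1, 1), padding='same')(gating)
    combined = Activation('relu')(Add()([theta, phi]))
    # One attention coefficient in (0, 1) per spatial position
    alpha = Conv2D(1, (1, 1), activation='sigmoid', padding='same')(combined)
    # The coefficients broadcast across channels and scale the encoder features
    return Multiply()([skip, alpha])

skip = Input((32, 32, 512))     # encoder feature map at 32x32 resolution
gating = Input((32, 32, 512))   # upsampled decoder feature map at the same resolution
gated = attention_gate(skip, gating, inter_channels=256)
```

In a decoder stage, `gated` would take the place of the raw encoder map in the concatenation, so positions with coefficients near zero contribute little to the merged features.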

Output Layer:
The final layer is a 1x1 convolution that reduces the output to a single channel, matching the one target class (microaneurysms). A sigmoid activation converts each pixel's value into a probability, producing a pixel-wise map that can be thresholded to obtain a binary mask of microaneurysm locations.
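The step from raw per-pixel scores to a binary map can be illustrated in NumPy. The 0.5 threshold and the helper name `to_binary_mask` are assumptions for this sketch; the notebook itself does not fix a threshold.

```python
import numpy as np

def to_binary_mask(logits, threshold=0.5):
    """Apply a sigmoid to raw scores, then threshold the probabilities."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, (probs >= threshold).astype(np.uint8)

logits = np.array([[-4.0, 0.2],
                   [3.0, -0.1]])
probs, mask = to_binary_mask(logits)
print(mask)
# [[0 1]
#  [1 0]]
```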

Choice of Optimizers and Metrics:
Optimizer:
The Adam optimizer was retained for the Hybrid U-Net. Adam keeps running estimates of the first and second moments of each parameter's gradient and scales every update accordingly, which typically speeds up convergence when training a more complex hybrid model. This per-parameter scaling is especially useful here, where convolutional and attention components can produce gradients of very different magnitudes. The code uses Adam with its default settings.
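The way Adam handles differing gradient scales can be seen in a single update step written in NumPy. This follows the textbook form of the update with the Keras default hyperparameters; it is a sketch for intuition, and `adam_step` is a hypothetical helper rather than the library's internal implementation.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update with bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.zeros(2)
grad = np.array([0.001, 100.0])                # gradients five orders of magnitude apart
new_param, m, v = adam_step(param, grad, m=np.zeros(2), v=np.zeros(2), t=1)
step = param - new_param
print(step)
```

On this first step both parameters move by roughly the learning rate, even though their gradients differ by a factor of 100,000.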

Loss Function:
Binary Cross-Entropy (BCE): The model continues to use Binary Cross-Entropy as the loss function. BCE is well suited to this per-pixel binary classification task: it penalizes each pixel according to how far the predicted microaneurysm probability is from the ground-truth label, and the loss is averaged over all pixels to guide the model towards accurate segmentation.
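As a small worked example, BCE can be computed by hand in NumPy. The clipping constant `eps` and the helper name `binary_cross_entropy` are assumptions for this sketch; the toy labels and predictions are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    per_pixel = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_pixel))

y_true = np.array([1.0, 0.0, 0.0, 0.0])        # one lesion pixel, three background pixels
confident = np.array([0.9, 0.1, 0.1, 0.1])     # close to the labels
unsure = np.array([0.5, 0.5, 0.5, 0.5])        # uninformative prediction

print(binary_cross_entropy(y_true, confident))  # -ln(0.9), about 0.105
print(binary_cross_entropy(y_true, unsure))     # ln(2), about 0.693
```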

In [2]:
import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
import glob


In [4]:
# Paths
dataset_path = "F:/Fyp/Preprocessing/1_Microaneurysms/Processed"
image_dir = os.path.join(dataset_path, "Images")
mask_dir = os.path.join(dataset_path, "Mask")

# Image dimensions
IMG_HEIGHT = 256
IMG_WIDTH = 256
IMG_CHANNELS = 3

In [7]:
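def load_images_and_masks(image_dir, mask_dir):
    # Sort both listings so that image i is paired with mask i
    image_paths = sorted(p for p in glob.glob(os.path.join(image_dir, "*")) if os.path.isfile(p))
    mask_paths = sorted(p for p in glob.glob(os.path.join(mask_dir, "*")) if os.path.isfile(p))
    if len(image_paths) != len(mask_paths):
        raise ValueError(f"Found {len(image_paths)} images but {len(mask_paths)} masks")

    images, masks = [], []
    for image_path, mask_path in zip(image_paths, mask_paths):
        img = cv2.imread(image_path)  # BGR, uint8
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        if img is None or mask is None:
            raise ValueError(f"Could not read {image_path} or {mask_path}")
        # Resize to the network input size; nearest-neighbour keeps mask values unblended
        images.append(cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT)))
        masks.append(cv2.resize(mask, (IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_NEAREST))

    # Scale to [0, 1] as float32 and add a channel dimension to the masks
    images = np.array(images, dtype=np.float32) / 255.0
    masks = np.expand_dims(np.array(masks, dtype=np.float32), axis=-1) / 255.0

    return images, masks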


In [1]:
from tensorflow.keras.layers import LayerNormalization, MultiHeadAttention, Dense, Add

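def transformer_block(x, num_heads, key_dim, ff_dim, rate=0.1):
    # NOTE: `rate` (dropout rate) is accepted but not applied; this block has no dropout.
    # Multi-head self-attention. For a 4D feature map, Keras attends over all
    # spatial positions (every axis except batch and channels).
    attn_output = MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)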
    attn_output = Add()([x, attn_output])  # Skip connection
    attn_output = LayerNormalization(epsilon=1e-6)(attn_output)

    # Feed Forward Network
    ffn_output = Dense(ff_dim, activation='relu')(attn_output)
    ffn_output = Dense(x.shape[-1])(ffn_output)
    ffn_output = Add()([attn_output, ffn_output])  # Skip connection
    ffn_output = LayerNormalization(epsilon=1e-6)(ffn_output)
    
    return ffn_output


In [5]:
def hybrid_unet_model(input_size=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)):
    inputs = Input(input_size)

    # Encoder: two 3x3 convolutions per stage, then 2x2 max pooling
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)

    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(p1)
    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(c2)
    p2 = MaxPooling2D((2, 2))(c2)

    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(p2)
    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(c3)
    p3 = MaxPooling2D((2, 2))(c3)

    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(p3)
    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(c4)
    p4 = MaxPooling2D((2, 2))(c4)

    # Bottleneck layer
    c5 = Conv2D(1024, (3, 3), activation='relu', padding='same')(p4)
    c5 = Conv2D(1024, (3, 3), activation='relu', padding='same')(c5)

    # Integrate Transformer Block
    c5_transformed = transformer_block(c5, num_heads=8, key_dim=64, ff_dim=1024)

    # Decoder: upsample, concatenate the matching encoder map, then two 3x3 convolutions
    u6 = Conv2DTranspose(512, (2, 2), strides=(2, 2), padding='same')(c5_transformed)
    u6 = concatenate([u6, c4])
    c6 = Conv2D(512, (3, 3), activation='relu', padding='same')(u6)
    c6 = Conv2D(512, (3, 3), activation='relu', padding='same')(c6)

    u7 = Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    c7 = Conv2D(256, (3, 3), activation='relu', padding='same')(u7)
    c7 = Conv2D(256, (3, 3), activation='relu', padding='same')(c7)

    u8 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    c8 = Conv2D(128, (3, 3), activation='relu', padding='same')(u8)
    c8 = Conv2D(128, (3, 3), activation='relu', padding='same')(c8)

    u9 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1])
    c9 = Conv2D(64, (3, 3), activation='relu', padding='same')(u9)
    c9 = Conv2D(64, (3, 3), activation='relu', padding='same')(c9)

    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)

    model = Model(inputs=[inputs], outputs=[outputs])
    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

    return model


In [8]:
# Load the data
images, masks = load_images_and_masks(image_dir, mask_dir)

# Split data into training and validation sets
train_images, val_images, train_masks, val_masks = train_test_split(images, masks, test_size=0.2, random_state=42)

In [9]:
# Data augmentation
data_gen_args = dict(rotation_range=15,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     shear_range=0.1,
                     zoom_range=0.1,
                     horizontal_flip=True,
                     fill_mode='nearest')

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Create generators
train_image_generator = image_datagen.flow(train_images, batch_size=16, seed=42)
train_mask_generator = mask_datagen.flow(train_masks, batch_size=16, seed=42)

# Create the combined dataset from the generators
def generator_to_dataset(image_gen, mask_gen):
    dataset = tf.data.Dataset.from_generator(
        lambda: zip(image_gen, mask_gen),
        # Batch dimension is None because the last batch of a pass can hold fewer than 16 samples
        output_signature=(
            tf.TensorSpec(shape=(None, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=tf.float32),
            tf.TensorSpec(shape=(None, IMG_HEIGHT, IMG_WIDTH, 1), dtype=tf.float32)
        )
    )
    return dataset

train_dataset = generator_to_dataset(train_image_generator, train_mask_generator)

# Apply prefetching to the dataset
train_dataset = train_dataset.repeat().prefetch(buffer_size=tf.data.AUTOTUNE)

In [10]:
# Create the hybrid U-Net model
hybrid_model = hybrid_unet_model()

# Train the model
history = hybrid_model.fit(train_dataset,
                    steps_per_epoch=len(train_images) // 16,
                    validation_data=(val_images, val_masks),
                    epochs=1)

# Save the trained model
hybrid_model.save('hybrid_unet_model.h5')

# Evaluate the model on the validation set
loss, accuracy = hybrid_model.evaluate(val_images, val_masks)
print(f"Validation Accuracy: {accuracy * 100:.2f}%")


20/20 ━━━━━━━━━━━━━━━━━━━━ 5481s 274s/step - accuracy: 0.8171 - loss: 0.2579 - val_accuracy: 0.9892 - val_loss: 0.0152
3/3 ━━━━━━━━━━━━━━━━━━━━ 121s 37s/step - accuracy: 0.9902 - loss: 0.0131
Validation Accuracy: 98.92%
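
Pixel accuracy should be read with care in this setting: microaneurysms usually cover a very small share of the image, so a high accuracy can largely reflect correctly classified background. An overlap metric such as the Dice coefficient is a common complement. The sketch below uses synthetic data only and makes no claim about the run reported above; `dice_coefficient`, `pixel_accuracy`, the 0.5 threshold, and the smoothing constant are assumptions for illustration.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, threshold=0.5, smooth=1e-6):
    """Dice overlap between a ground-truth mask and a thresholded prediction."""
    y_true = (y_true >= 0.5).astype(np.float64).ravel()
    y_pred = (y_pred >= threshold).astype(np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return float((2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth))

def pixel_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the label."""
    return float(np.mean((y_true >= 0.5) == (y_pred >= threshold)))

# Synthetic 100x100 mask in which 1% of the pixels are positive
mask = np.zeros((100, 100))
mask[:10, :10] = 1.0
all_background = np.zeros_like(mask)   # a prediction that never detects anything

print(pixel_accuracy(mask, all_background))    # 0.99
print(dice_coefficient(mask, all_background))  # close to 0
print(dice_coefficient(mask, mask))            # close to 1
```

A prediction that marks every pixel as background still scores 99% accuracy on this synthetic mask, while its Dice coefficient is near zero.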
