Ahmadinit/Semantic-Segmentation-Using-U-Net

Semantic Segmentation Using U-Net

Deep Learning Project

1. Problem Statement

Semantic segmentation is a fundamental problem in computer vision where the objective is to assign a class label to every pixel in an image. Unlike image classification, segmentation requires precise localization and boundary understanding.

In this project, the task is to segment images into three semantic classes:

  • Pet
  • Boundary
  • Background

This problem is challenging due to:

  • Fine object boundaries
  • Variations in pet shapes, colors, and poses
  • Class imbalance between foreground and background pixels

2. Dataset

The project uses the Oxford-IIIT Pet Dataset, which contains images of cats and dogs along with pixel-wise segmentation masks.

Mask Classes (after preprocessing):

  • 0 → Pet
  • 1 → Boundary
  • 2 → Background

The dataset is loaded using tensorflow_datasets (tfds) and split into training and testing sets.

3. Proposed Solution

To solve the segmentation problem, a pure U-Net architecture is implemented from scratch, without using any pretrained encoders.

U-Net is particularly well-suited for semantic segmentation because:

  • It captures context through downsampling (encoder)
  • It preserves spatial detail using skip connections
  • It performs well even on limited datasets

4. Data Preprocessing and Augmentation

4.1 Image Preprocessing

  • Images are resized to 128 × 128
  • Pixel values are normalized to the range [0, 1]
  • Masks are converted to integer labels suitable for sparse loss functions

4.2 Data Augmentation

  • Random horizontal flipping is applied during training
  • Helps improve generalization and reduce overfitting
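The two steps above can be combined into a single mapping function. A sketch is shown below; note that the raw TFDS masks use labels 1–3, so a common convention, assumed here, is to shift them to 0–2 (the repository's exact remapping may differ):

```python
import tensorflow as tf

IMG_SIZE = 128

def preprocess(sample, training=False):
    """Resize, normalize, and (optionally) augment one image/mask pair."""
    # Nearest-neighbour resizing keeps mask labels integral.
    image = tf.image.resize(sample['image'], (IMG_SIZE, IMG_SIZE))
    mask = tf.image.resize(sample['segmentation_mask'],
                           (IMG_SIZE, IMG_SIZE), method='nearest')
    # Normalize pixels to [0, 1]; shift TFDS mask labels {1, 2, 3} to {0, 1, 2}.
    image = tf.cast(image, tf.float32) / 255.0
    mask = tf.cast(mask, tf.int32) - 1
    # Random horizontal flip, applied jointly to image and mask during training.
    if training and tf.random.uniform(()) > 0.5:
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    return image, mask
```

Flipping the image and mask together is essential: augmenting only the image would silently corrupt the pixel-wise labels.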

5. Model Architecture (U-Net)

5.1 Encoder (Contracting Path)

The encoder consists of repeated blocks of:

  • Two 3×3 convolution layers
  • Batch normalization
  • ReLU activation
  • 2×2 max pooling for downsampling

Feature depth increases progressively: 64 → 128 → 256 → 512
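One encoder stage from the list above can be sketched as follows (function names are illustrative, not the repository's exact code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def encoder_block(x, filters):
    """One contracting stage: conv block, then 2x2 max pooling.

    Returns both the pre-pooling features (for the skip connection)
    and the pooled, downsampled output.
    """
    skip = conv_block(x, filters)
    down = layers.MaxPooling2D(2)(skip)
    return skip, down
```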

5.2 Bottleneck

The bottleneck captures high-level semantic features using:

  • Two convolution layers with 1024 filters

5.3 Decoder (Expanding Path)

The decoder restores spatial resolution using:

  • Transposed convolutions for upsampling
  • Skip connections from corresponding encoder layers
  • Double convolution blocks after concatenation

This structure allows precise localization of object boundaries.

5.4 Output Layer

A 1×1 convolution with softmax activation produces a probability map for each class:

Output Shape = (H, W, 3)
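Putting sections 5.1 through 5.4 together, one plausible way to assemble the full network in Keras is sketched below (a sketch consistent with the description above, not the repository's exact code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_shape=(128, 128, 3), num_classes=3):
    inputs = tf.keras.Input(input_shape)
    x = inputs
    # Encoder: feature depth 64 -> 128 -> 256 -> 512, halving resolution each stage.
    skips = []
    for f in (64, 128, 256, 512):
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    # Bottleneck: two convolutions with 1024 filters.
    x = conv_block(x, 1024)
    # Decoder: transposed-conv upsampling, skip concatenation, double conv block.
    for f, skip in zip((512, 256, 128, 64), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding='same')(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)
    # 1x1 convolution + softmax -> per-pixel class probabilities, shape (H, W, 3).
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
```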

6. Loss Function and Metrics

6.1 Loss Function

Sparse Categorical Cross-Entropy is used:

  • Suitable for multi-class pixel-wise classification
  • Efficient as masks are stored as integer labels

6.2 Evaluation Metrics

  • Mean Intersection over Union (Mean IoU)
  • Pixel Accuracy

Mean IoU is the primary metric for segmentation quality as it measures overlap between predicted and ground-truth regions.
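A sketch of how the loss and metrics might be wired up at compile time. Keras's built-in `MeanIoU` expects hard label predictions, so one common approach, assumed here, is a small subclass that argmaxes the softmax output first (the subclass and the 1x1-conv stand-in model are illustrative):

```python
import tensorflow as tf

class SparseMeanIoU(tf.keras.metrics.MeanIoU):
    """MeanIoU that accepts softmax probabilities by taking argmax first."""
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)

# Tiny stand-in for the U-Net built earlier, so the sketch runs on its own.
model = tf.keras.Sequential([
    tf.keras.Input((128, 128, 3)),
    tf.keras.layers.Conv2D(3, 1, activation='softmax'),
])
model.compile(
    optimizer='adam',
    # Sparse loss: masks stay as integer labels, no one-hot encoding needed.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[SparseMeanIoU(num_classes=3, name='mean_iou'), 'accuracy'],
)
```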

7. Training Pipeline

  • Optimizer: Adam
  • Batch size: 64
  • Epochs: 10

Dataset pipeline uses:

  • Caching
  • Shuffling
  • Batching
  • Prefetching for performance optimization
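The pipeline steps above can be sketched as follows (the shuffle buffer size is an assumption; `AUTOTUNE` lets TensorFlow pick the prefetch depth):

```python
import tensorflow as tf

BATCH_SIZE = 64
BUFFER_SIZE = 1000  # shuffle buffer size; an assumed value

def make_pipeline(ds, training=False):
    """Apply cache -> shuffle -> batch -> prefetch, per the steps above."""
    ds = ds.cache()          # keep decoded examples in memory after the first epoch
    if training:
        ds = ds.shuffle(BUFFER_SIZE)
    ds = ds.batch(BATCH_SIZE)
    return ds.prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
```

Ordering matters: caching before shuffling avoids re-decoding every epoch, and shuffling before batching ensures batches are re-mixed each epoch.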

8. Visualization and Results

8.1 Qualitative Results

The project includes visualization utilities that display:

  • Input image
  • Ground truth segmentation mask
  • Predicted segmentation mask

These are shown side-by-side, allowing direct visual comparison of model performance.
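A sketch of such a side-by-side display utility (the function name, figure layout, and file-saving behaviour are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs in scripts
import matplotlib.pyplot as plt

def display(image, true_mask, pred_probs, path='comparison.png'):
    """Save input image, ground-truth mask, and predicted mask side by side."""
    # Collapse per-class probabilities to a single label map per pixel.
    pred_mask = np.argmax(pred_probs, axis=-1)
    titles = ['Input Image', 'True Mask', 'Predicted Mask']
    panels = [image, true_mask, pred_mask]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, panel, title in zip(axes, panels, titles):
        ax.imshow(np.squeeze(panel))
        ax.set_title(title)
        ax.axis('off')
    fig.savefig(path)
    plt.close(fig)
```

For interactive use, `plt.show()` would replace the `savefig` call.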

8.2 Quantitative Results

After training, the model is evaluated on the test set and reports:

  • Final test loss
  • Mean IoU score
  • Pixel accuracy

These metrics provide an objective assessment of segmentation performance.

9. Tools and Technologies

  • Python
  • TensorFlow / Keras
  • TensorFlow Datasets (TFDS)
  • NumPy
  • Matplotlib

10. Conclusion

This project demonstrates the effectiveness of a scratch-built U-Net for multi-class semantic segmentation. By combining careful preprocessing, a well-structured encoder–decoder architecture, and appropriate evaluation metrics, the model achieves accurate segmentation of pets and their boundaries. The project provides a strong foundation for understanding pixel-level deep learning tasks.
