Deep Learning Project: U-Net for Pet Image Segmentation
Semantic segmentation is a fundamental problem in computer vision where the objective is to assign a class label to every pixel in an image. Unlike image classification, segmentation requires precise localization and boundary understanding.
In this project, the task is to segment images into three semantic classes:
- Pet
- Boundary
- Background
This problem is challenging due to:
- Fine object boundaries
- Variations in pet shapes, colors, and poses
- Class imbalance between foreground and background pixels
The project uses the Oxford-IIIT Pets Dataset, which contains images of cats and dogs along with pixel-wise segmentation masks.
Mask Classes (after preprocessing):
- 0 → Pet
- 1 → Boundary
- 2 → Background
The dataset is loaded using tensorflow_datasets (tfds) and split into training and testing sets.
To solve the segmentation problem, a pure U-Net architecture is implemented from scratch, without using any pretrained encoders.
U-Net is particularly well-suited for semantic segmentation because:
- It captures context through downsampling (encoder)
- It preserves spatial detail using skip connections
- It performs well even on limited datasets
The following preprocessing steps are applied:
- Images are resized to 128 × 128
- Pixel values are normalized to the range [0, 1]
- Masks are converted to integer labels suitable for sparse loss functions
- Random horizontal flipping is applied during training, which helps improve generalization and reduce overfitting
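The steps above can be sketched as a single preprocessing function. This is an eager-mode illustration, not the project's exact code: the function name, `IMG_SIZE` constant, and the assumption that raw masks are 1-indexed (so labels are shifted to start at zero) are all mine.

```python
import tensorflow as tf

IMG_SIZE = 128  # target resolution, per the preprocessing described above

def preprocess(image, mask, training=False):
    """Resize, normalize, and remap one (image, mask) pair.

    Sketch only: assumes raw masks use 1-indexed labels, as in the
    Oxford-IIIT Pets annotations, and remaps them to start at 0.
    """
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0  # normalize to [0, 1]
    # Nearest-neighbor resizing keeps mask values as valid class ids.
    mask = tf.image.resize(mask, (IMG_SIZE, IMG_SIZE), method="nearest")
    mask = tf.cast(mask, tf.int32) - 1          # e.g. {1, 2, 3} -> {0, 1, 2}
    if training:
        # Eager-mode sketch of augmentation; the same random flip must be
        # applied to both image and mask so they stay aligned.
        if tf.random.uniform(()) > 0.5:
            image = tf.image.flip_left_right(image)
            mask = tf.image.flip_left_right(mask)
    return image, mask
```

Inside a `tf.data` pipeline this would typically be applied via `dataset.map(...)`, with the flip rewritten to be graph-compatible.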
The encoder consists of repeated blocks of:
- Two 3×3 convolution layers
- Batch normalization
- ReLU activation
- 2×2 max pooling for downsampling
Feature depth increases progressively: 64 → 128 → 256 → 512
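A minimal Keras sketch of one encoder stage is below. The helper names (`conv_block`, `encoder_block`) and the `use_bias=False` choice (bias is redundant before batch normalization) are my assumptions, not necessarily the project's exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def encoder_block(x, filters):
    """One encoder stage: double convolution, then 2x2 max pooling.

    Returns both the pre-pooling tensor (kept as the skip connection)
    and the downsampled tensor passed to the next stage.
    """
    skip = conv_block(x, filters)
    down = layers.MaxPooling2D(2)(skip)
    return skip, down
```

Chaining four such stages with filters 64 → 128 → 256 → 512 reduces a 128 × 128 input to 8 × 8 before the bottleneck.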
The bottleneck captures high-level semantic features using:
- Two convolution layers with 1024 filters
The decoder restores spatial resolution using:
- Transposed convolutions for upsampling
- Skip connections from corresponding encoder layers
- Double convolution blocks after concatenation
This structure allows precise localization of object boundaries.
A 1×1 convolution with softmax activation produces a probability map for each class:
Output Shape = (H, W, 3)
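The decoder stage and output head described above can be sketched as follows. As before, helper names are illustrative assumptions; the kernel size of the transposed convolution (2, stride 2) is one common choice for exact 2× upsampling.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3  # Pet, Boundary, Background

def conv_block(x, filters):
    """Double 3x3 convolution with batch norm and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def decoder_block(x, skip, filters):
    """Upsample with a transposed convolution, concatenate the matching
    encoder skip tensor, then apply a double convolution block."""
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    return conv_block(x, filters)

def output_head(x):
    """1x1 convolution with softmax -> (H, W, NUM_CLASSES) probabilities."""
    return layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)
```

Each decoder stage doubles the spatial resolution, so four stages restore the bottleneck's 8 × 8 features back to 128 × 128 before the output head.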
Sparse Categorical Cross-Entropy is used as the loss function:
- Suitable for multi-class pixel-wise classification
- Efficient as masks are stored as integer labels
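To make the "integer labels" point concrete, here is the per-pixel computation sparse categorical cross-entropy performs, written out in NumPy (the function name is mine; in the project this is handled by Keras' built-in loss):

```python
import numpy as np

def sparse_categorical_ce(probs, labels):
    """Mean per-pixel cross-entropy from integer labels.

    probs:  (H, W, C) softmax output of the network
    labels: (H, W) integer class ids

    The key efficiency: no one-hot encoding of the mask is ever
    materialized; each pixel just indexes its true class's probability.
    """
    h, w = labels.shape
    # For every pixel, pick the predicted probability of its true class.
    true_class_probs = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(true_class_probs).mean())
```

A uniform 3-class prediction therefore yields a loss of exactly log 3 per pixel, a useful sanity check when training starts.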
Two evaluation metrics are tracked:
- Mean Intersection over Union (Mean IoU)
- Pixel Accuracy
Mean IoU is the primary metric for segmentation quality as it measures overlap between predicted and ground-truth regions.
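Both metrics are straightforward to define; a minimal NumPy sketch (function names are mine, and the handling of classes absent from both masks is one common convention):

```python
import numpy as np

def mean_iou(pred, true, num_classes=3):
    """Mean over classes of |pred ∩ true| / |pred ∪ true|."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (true == c))
        union = np.sum((pred == c) | (true == c))
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

def pixel_accuracy(pred, true):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float(np.mean(pred == true))
```

Note that pixel accuracy can look high even when a small class (e.g. Boundary) is segmented poorly, which is why Mean IoU is the primary metric here.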
Training configuration:
- Optimizer: Adam
- Batch size: 64
- Epochs: 10
The dataset pipeline uses:
- Caching
- Shuffling
- Batching
- Prefetching for performance optimization
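These four steps map directly onto `tf.data` calls; a possible arrangement (the function name and shuffle buffer size of 1000 are assumptions):

```python
import tensorflow as tf

def build_pipeline(dataset, batch_size=64, training=True):
    """Sketch of the input pipeline: cache -> shuffle -> batch -> prefetch.

    `dataset` is assumed to be a tf.data.Dataset of already-preprocessed
    (image, mask) pairs.
    """
    dataset = dataset.cache()              # keep decoded examples in memory
    if training:
        dataset = dataset.shuffle(1000)    # buffer size is an assumption
    dataset = dataset.batch(batch_size)
    # AUTOTUNE lets tf.data overlap data loading with training steps.
    return dataset.prefetch(tf.data.AUTOTUNE)
```

Caching before shuffling avoids re-decoding examples every epoch, while prefetching keeps the GPU fed during the next batch's preparation.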
The project includes visualization utilities that display:
- Input image
- Ground truth segmentation mask
- Predicted segmentation mask
These are shown side-by-side, allowing direct visual comparison of model performance.
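A side-by-side display of this kind is a few lines of Matplotlib; a minimal sketch (the function name and panel titles are mine):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for scripted use
import matplotlib.pyplot as plt
import numpy as np

def display(image, true_mask, pred_mask):
    """Show input image, ground-truth mask, and predicted mask in a row."""
    titles = ["Input Image", "True Mask", "Predicted Mask"]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, item, title in zip(axes, [image, true_mask, pred_mask], titles):
        ax.imshow(np.squeeze(item))  # drop the trailing channel dim of masks
        ax.set_title(title)
        ax.axis("off")
    return fig
```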
After training, the model is evaluated on the test set and reports:
- Final test loss
- Mean IoU score
- Pixel accuracy
These metrics provide an objective assessment of segmentation performance.
Technologies used:
- Python
- TensorFlow / Keras
- TensorFlow Datasets (TFDS)
- NumPy
- Matplotlib
This project demonstrates the effectiveness of a scratch-built U-Net for multi-class semantic segmentation. By combining careful preprocessing, a well-structured encoder–decoder architecture, and appropriate evaluation metrics, the model achieves accurate segmentation of pets and their boundaries. The project provides a strong foundation for understanding pixel-level deep learning tasks.