BidoCodeHub/dishware-instance-segmentation

Dishware Instance Segmentation

Instance segmentation of dishware in cluttered kitchen scenes using a fine-tuned Faster R-CNN with Feature Pyramid Network (FPN). Developed for APS360 (Applied Deep Learning) at the University of Toronto, Sep-Dec 2025.

Overview

Detecting and segmenting individual dishware items in cluttered kitchen environments is a challenging computer vision task with applications in robotics, smart kitchens, and assistive technology. This project fine-tunes a Faster R-CNN model with a ResNet-50 backbone to perform instance segmentation on plates, bowls, cups, and utensils. The model is benchmarked against a YOLOv8n baseline and nearly doubles its AP50 (0.270 vs. 0.144), with a smaller gain in mAP over IoU 0.50:0.95 (0.134 vs. 0.121).

Model Architecture

The model builds on Faster R-CNN with the following components:

  • Backbone: ResNet-50 with a Feature Pyramid Network (FPN) for multi-scale feature extraction.
  • Region Proposal Network (RPN): Custom anchor sizes of 16, 32, 64, 128, 256, and 512 pixels to accommodate dishware ranging from small utensils to large plates.
  • RoI Align: Precise region-of-interest alignment, avoiding the quantization artifacts of RoI Pooling.
  • Dual Prediction Heads: One branch for bounding box regression and classification, and a second branch for pixel-level mask generation (the mask head that Mask R-CNN adds on top of Faster R-CNN).
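
To make the anchor configuration concrete, the sketch below enumerates anchor box shapes for the six sizes listed above. The aspect ratios (0.5, 1.0, 2.0) are an assumption borrowed from common Faster R-CNN defaults, not something this README states, and `anchor_shapes` is a hypothetical helper rather than code from the notebook.

```python
import math

ANCHOR_SIZES = (16, 32, 64, 128, 256, 512)  # pixel sizes listed in this README
ASPECT_RATIOS = (0.5, 1.0, 2.0)             # assumption: height / width ratios


def anchor_shapes(sizes=ANCHOR_SIZES, ratios=ASPECT_RATIOS):
    """Return (width, height) pairs; every anchor keeps an area of size**2."""
    shapes = []
    for size in sizes:
        for ratio in ratios:
            height = size * math.sqrt(ratio)
            width = size / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


if __name__ == "__main__":
    for width, height in anchor_shapes():
        print(f"{width:7.1f} x {height:7.1f}")
```

Under these assumptions there are 18 shapes in total. The wide, short anchors are the ones most likely to match elongated utensils, while the square anchors suit plates and bowls seen from above.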

The baseline comparison model is YOLOv8n (nano), a lightweight single-stage detector.

Dataset

  • Source: 245 images curated from COCO, LVIS, and Open Images datasets.
  • Split: 70/15/15 stratified train/validation/test split.
  • Augmentation: Random horizontal flip, random crop, and brightness jitter applied during training.
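
A 70/15/15 stratified split can be sketched in plain Python as follows. This README does not say which attribute the split is stratified on, so the `label_of` callback (for example, the dominant dishware class in each image) is an assumption, and `stratified_split` is a hypothetical helper, not the notebook's implementation.

```python
import random
from collections import defaultdict


def stratified_split(items, label_of, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split items into train/val/test, keeping each label's share similar."""
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


if __name__ == "__main__":
    # Made-up labels for 245 images, purely for illustration.
    labels = ["plate"] * 100 + ["bowl"] * 70 + ["cup"] * 50 + ["utensil"] * 25
    images = list(enumerate(labels))
    train, val, test = stratified_split(images, label_of=lambda item: item[1])
    print(len(train), len(val), len(test))
```

Because 245 is not evenly divisible into 70/15/15, the subset sizes are only approximately in that ratio; per-class rounding decides where the leftover images land.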

Results

| Model | AP50 | mAP (0.50:0.95) |
|---|---|---|
| Faster R-CNN (ours) | 0.270 | 0.134 |
| YOLOv8n (baseline) | 0.144 | 0.121 |
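
As a quick check on the size of the gap, the relative gain over the baseline can be computed directly from the reported numbers. The `relative_gain` helper is introduced here only for illustration.

```python
# Scores as reported in the results above.
results = {
    "Faster R-CNN (ours)": {"AP50": 0.270, "mAP": 0.134},
    "YOLOv8n (baseline)": {"AP50": 0.144, "mAP": 0.121},
}


def relative_gain(ours, baseline):
    """Percentage improvement of `ours` over `baseline`."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    ours = results["Faster R-CNN (ours)"]
    base = results["YOLOv8n (baseline)"]
    for metric in ("AP50", "mAP"):
        gain = relative_gain(ours[metric], base[metric])
        print(f"{metric}: {gain:+.1f}% relative to the baseline")
```

This works out to roughly +87.5% on AP50 and +10.7% on mAP, so most of the advantage appears at the looser IoU threshold of 0.50.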

Key findings:

  • The model performs best on plates, bowls, and cups.
  • Small utensils remain challenging due to scale variation and occlusion.
  • On unseen data, the model produces an average of 13.17 detections per image at a confidence threshold of 0.60, indicating that it still makes confident predictions outside the training set.
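
The detections-per-image figure depends on how predictions are filtered by confidence. A minimal sketch of that calculation follows, assuming the threshold is inclusive (`score >= 0.60`); the scores are made up for illustration and `mean_detections_per_image` is a hypothetical helper.

```python
def mean_detections_per_image(scores_per_image, threshold=0.60):
    """Average number of detections per image whose score meets the threshold."""
    if not scores_per_image:
        return 0.0
    kept = [sum(1 for s in scores if s >= threshold) for scores in scores_per_image]
    return sum(kept) / len(kept)


# Made-up confidence scores for three images, purely illustrative.
example_scores = [
    [0.95, 0.81, 0.59, 0.30],  # two detections kept
    [0.72, 0.60],              # two kept (0.60 meets the threshold)
    [0.45],                    # none kept
]

if __name__ == "__main__":
    print(mean_detections_per_image(example_scores))
```

Raising the threshold lowers this average and lowering it raises it, so the count is only meaningful together with the threshold it was measured at.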

Getting Started

Google Colab (Recommended)

  1. Upload dishware_segmentation.ipynb to Google Colab.
  2. Select a GPU runtime (Runtime > Change runtime type > GPU).
  3. Run all cells sequentially.

Local Setup

# Clone the repository
git clone https://github.com/BidoCodeHub/dishware-instance-segmentation.git
cd dishware-instance-segmentation

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies (the notebook package provides the jupyter notebook command)
pip install torch torchvision matplotlib pycocotools notebook

Then open the notebook with Jupyter:

jupyter notebook dishware_segmentation.ipynb

Requirements:

  • Python 3.8+
  • PyTorch 1.12+
  • torchvision 0.13+
  • matplotlib
  • pycocotools

Usage

The notebook is organized into the following sections:

  1. Environment Setup - Install and import required libraries.
  2. Dataset Preparation - Download, preprocess, and augment the dishware images.
  3. Model Definition - Configure Faster R-CNN with custom anchors and FPN.
  4. Training - Fine-tune the model on the training set with validation monitoring.
  5. Evaluation - Compute AP50 and mAP metrics on the test set.
  6. Inference and Visualization - Run predictions on new images and visualize segmentation masks.
  7. Baseline Comparison - Train and evaluate YOLOv8n under the same conditions.
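
Both metrics computed in the evaluation step rest on intersection over union (IoU): AP50 counts a prediction as correct when its IoU with a ground-truth object is at least 0.50, while mAP (0.50:0.95) averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the box version of IoU; whether the notebook scores boxes or masks is not stated here, and in practice pycocotools performs the full matching and averaging.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds behind mAP (0.50:0.95).
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]

if __name__ == "__main__":
    truth = (0, 0, 10, 10)
    prediction = (2, 0, 12, 10)  # shifted 2 px to the right
    iou = box_iou(truth, prediction)
    passed = sum(iou >= t for t in IOU_THRESHOLDS)
    print(f"IoU = {iou:.3f}, counted as a match at {passed} of 10 thresholds")
```

In this example a 2-pixel shift on a 10-pixel box still matches at IoU 0.50 but fails at 0.70 and above, which illustrates why mAP (0.50:0.95) is usually much lower than AP50.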

Acknowledgments

This project was completed as part of APS360 (Applied Deep Learning) at the University of Toronto. We thank the course instructors and teaching assistants for their guidance.

The training data is sourced from the COCO, LVIS, and Open Images datasets.

License

This project is licensed under the MIT License. See LICENSE for details.
