Visual affordance segmentation identifies image regions of an object an agent can interact with. Existing methods re-use and adapt learning-based architectures for semantic segmentation to the affordance segmentation task and evaluate on small-size datasets. However, experimental setups are often not reproducible, thus leading to unfair and inconsistent comparisons. In this work, we benchmark these methods under a reproducible setup on two single-object scenarios, tabletop without occlusions and hand-held containers, to facilitate future comparisons. We include a version of a recent architecture, Mask2Former, re-trained for affordance segmentation, and show that this model is the best-performing on most testing sets of both scenarios. Our analysis shows that models are not robust to scale variations when object resolutions differ from those in the training set.
[arXiv] [webpage] [trained models]
- News
- Installation
- Running demo
- Trained models
- Training and testing data
- Contributing
- Credits
- Enquiries, Questions and Comments
- License
- 26 October 2024: Released code and weights of CNN, DRNAtt, AffNet, and Mask2Former, trained on unoccluded object setting (UMD)
- 26 September 2024: Released code and weights of ACANet, ACANet50, RN18U, DRNAtt, RN50F, Mask2Former, trained on hand-occluded object setting (CHOC-AFF)
- 04 September 2024: Pre-print available on arXiv at https://arxiv.org/abs/2409.01814
- 17 August 2024: Source code, models, and further details will be released in the coming weeks.
- 15 August 2024: Paper accepted at Twelfth International Workshop on Assistive Computer Vision and Robotics (ACVR), in conjunction with the 2024 European Conference on Computer Vision (ECCV).
The models were tested using the following setup:
- OS: Ubuntu 18.04.6 LTS
- Kernel version: 4.15.0-213-generic
- CPU: Intel® Core™ i7-9700K CPU @ 3.60GHz
- Cores: 8
- RAM: 32 GB
- GPU: NVIDIA GeForce RTX 2080 Ti
- Driver version: 510.108.03
- CUDA version: 11.6
- Python 3.8
- PyTorch 1.9.0
- Torchvision 0.10.0
- OpenCV 4.10.0.84
- Numpy 1.24.4
- Tqdm 4.66.5
# Create and activate conda environment
conda create -n affordance_segmentation python=3.8
conda activate affordance_segmentation
# Install libraries
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install opencv-python onnx-tool numpy tqdm scipy
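As a quick sanity check that the environment matches the versions listed above, you can run the following minimal Python snippet (it only assumes the libraries installed correctly):
# check that the core libraries import and match the versions listed above
import cv2
import numpy
import torch
import torchvision

print("PyTorch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("OpenCV:", cv2.__version__)
print("NumPy:", numpy.__version__)
print("CUDA available:", torch.cuda.is_available())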
Download model checkpoint ACANet.zip, and unzip it.
Use the images in the folder src/test_dir or try with your own images. The folder structure is DATA_DIR/rgb.
To run the model and visualise the output:
python src/demo.py --gpu_id=GPU_ID --model_name=MODEL_NAME --train_dataset=TRAIN_DATA --data_dir=DATA_DIR --checkpoint_path=CKPT_PATH --save_res=True --dest_dir=DEST_DIR
- Replace MODEL_NAME with ACANet
- DATA_DIR: directory where data are stored
- TRAIN_DATA: name of the training dataset
- CKPT_PATH: path to the .pth file
- DEST_DIR: path to the destination directory. This flag is considered only if you save the predictions (--save_res=True) or the overlay visualisation (--save_overlay=True). Results are automatically saved in DEST_DIR/pred, overlays in DEST_DIR/vis.
You can verify that the model reproduces the expected performance by running inference on the images provided in src/test_dir/rgb and checking that the outputs match the predictions in src/test_dir/pred.
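A minimal Python sketch of this check, assuming the predictions are saved as PNG files with matching filenames (the .png extension and the DEST_DIR placeholder are assumptions):
# compare the predictions produced by demo.py (saved in DEST_DIR/pred) against
# the reference predictions shipped in src/test_dir/pred
import glob
import os

import cv2
import numpy as np

ref_dir = "src/test_dir/pred"   # reference predictions provided in the repository
out_dir = "DEST_DIR/pred"       # replace with your own destination directory

for ref_path in sorted(glob.glob(os.path.join(ref_dir, "*.png"))):
    out_path = os.path.join(out_dir, os.path.basename(ref_path))
    ref = cv2.imread(ref_path, cv2.IMREAD_UNCHANGED)
    out = cv2.imread(out_path, cv2.IMREAD_UNCHANGED)
    same = out is not None and ref.shape == out.shape and np.array_equal(ref, out)
    print(os.path.basename(ref_path), "identical" if same else "DIFFERENT")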
Here is the list of available models trained on UMD or CHOC-AFF:
Model name | UMD | CHOC-AFF |
---|---|---|
CNN | link to zip | |
AffordanceNet | link to zip | |
ACANet | link to zip | |
ACANet50 | link to zip | |
RN50F | link to zip | |
RN18U | link to zip | |
DRNAtt | link to zip | link to zip |
Mask2Former | link to zip | link to zip |
Note
When testing the installation of a model, you might need to change the imports in the scripts.
To use Mask2Former model, please run the following commands:
# Install detectron2 library
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Access mask2former folder in repository
cd src/models/mask2former
# Clone code from Mask2Former repository
git clone https://github.com/facebookresearch/Mask2Former.git
# Compile
cd Mask2Former/mask2former/modeling/pixel_decoder/ops
sh make.sh
# Return to the main directory (aff-seg)
cd ../../../../../../../../
# Install required libraries
pip install timm
# Run script to load Mask2Former (expected output: "Model loaded correctly!!")
python src/models/mask2former/test_mask2former_load.py
Comment out line 194 in /src/models/mask2former/Mask2Former/mask2former/maskformer_model.py (images = [(x - self.pixel_mean) / self.pixel_std for x in images]), because we preprocess images in the dataloader.
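For reference, a minimal sketch of the equivalent normalisation applied in a dataloader; the ImageNet statistics below are the detectron2 defaults and are an assumption about the preprocessing used here:
import torch

# detectron2 default pixel statistics (assumption: the dataloader uses these values)
pixel_mean = torch.tensor([123.675, 116.280, 103.530]).view(3, 1, 1)
pixel_std = torch.tensor([58.395, 57.120, 57.375]).view(3, 1, 1)

def normalise(image_chw: torch.Tensor) -> torch.Tensor:
    # image_chw: float tensor of shape (3, H, W) with values in [0, 255]
    return (image_chw - pixel_mean) / pixel_std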
To use ResNet50FastFCN (RN50F) model, please run the following commands:
# Access resnet_fcn folder in repository
cd src/models/resnet_fcn
# Clone code from FastFCN repository
git clone https://github.com/wuhuikai/FastFCN.git
# Return to the main directory (aff-seg)
cd ../../../
- In aff-seg/src/models/resnet_fcn/FastFCN/encoding/models/encnet.py, replace line 11 (import encoding) with from ..nn import encoding
- In aff-seg/src/models/resnet_fcn/FastFCN/encoding/models/base.py, replace pretrained=True with pretrained=False in line 38 (the script tries to download the ResNet pretrained weights, but fails). In case you want to use the pretrained weights, download them from issue#86 and then modify root='~/.encoding/models' in line 27 to point at the folder with the downloaded checkpoint.
Run script to load RN50F (expected output: model statistics, with average inference time and standard deviation):
python src/models/resnet_fcn/test_resnet_fcn_load.py
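For context, a minimal sketch of how an average inference time and its standard deviation can be measured; this mirrors what the test_*_load.py scripts report, but it is not necessarily the repository's exact code, and the input resolution is an assumption:
import time

import torch

def time_model(model, input_shape=(1, 3, 480, 640), runs=100):
    # hypothetical helper: times forward passes of a model on random input
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    times = []
    with torch.no_grad():
        for _ in range(runs):
            if device == "cuda":
                torch.cuda.synchronize()
            start = time.time()
            model(x)
            if device == "cuda":
                torch.cuda.synchronize()
            times.append(time.time() - start)
    t = torch.tensor(times)
    print("avg: {:.4f} s, std: {:.4f} s".format(t.mean().item(), t.std().item()))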
To use DRNAtt model, please run the following commands:
# Access drnatt folder in repository
cd src/models/drnatt
# Clone code from DANet repository
git clone https://github.com/junfu1115/DANet.git
# Clone code from DRN repository
git clone https://github.com/fyu/drn.git
# Install required libraries
pip install ninja
# Return to the main directory (aff-seg)
cd ../../../
Comment out lines 12 and 13 in /DANet/encoding/__init__.py (from .version import __version__ and from . import nn, functions, parallel, utils, models, datasets, transforms).
Run script to check that the model is correctly installed (expected output: model statistics, with average inference time and standard deviation):
python src/models/drnatt/drn_att.py
To use AffordanceNet (AffNet), please run the following commands:
# Access affnet folder in repository
cd src/models/affnet
# Clone code from AffNetDR repository
git clone https://github.com/HuchieWuchie/affnetDR.git
# Return to the main directory (aff-seg)
cd ../../../
- Replace line 75 in /affNetDR/lib/roi_heads.py (mask_prob = x.sigmoid()) with mask_prob = x.softmax(dim=1) (see the illustration after this list).
- Replace line 77 in /affNetDR/lib/roi_heads.py with the commented lines 83 and 87.
- Replace the import torchvision._internally_replaced_utils with torchvision.models.utils in /affNetDR/lib/mask_rcnn.py (line 7) and /affNetDR/lib/faster_rcnn.py (line 8).
- Change line 148 in /affNetDR/lib/mask_rcnn.py to min_size=480, max_size=640.
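The change at line 75 presumably treats affordance classes as mutually exclusive per pixel, replacing the independent per-class probabilities of the standard Mask R-CNN mask head with a distribution over classes. A minimal illustration (tensor shapes are assumptions):
import torch

# x: mask logits of shape (num_rois, num_classes, H, W); the values below are assumptions
x = torch.randn(2, 3, 28, 28)

mask_prob_sigmoid = x.sigmoid()        # original: independent per-class probabilities
mask_prob_softmax = x.softmax(dim=1)   # replacement: probabilities sum to 1 across classes

# per-pixel probabilities now sum to 1 over the class dimension
print(mask_prob_softmax.sum(dim=1).allclose(torch.ones(2, 28, 28)))  # True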
Run script to check that the model is correctly installed (expected output: model loaded successfully!):
python src/models/affnet/test_affordancenet_load.py
To recreate the training and testing splits of the mixed-reality dataset:
- Download CHOC-AFF folders rgb, mask, annotations, affordance and unzip them in the preferred folder SRC_DIR.
- Run python src/utils/split_CHOC.py --src_dir=SRC_DIR --dst_dir=DST_DIR to split the data into training, validation, and testing sets. DST_DIR is the directory where the splits are saved.
- Run python src/utils/create_dataset_crops_CHOC.py --data_dir=DATA_DIR --save=True --dest_dir=DEST_DIR to perform the cropping window procedure described in the ACANet paper (see the sketch after this list). This script also performs the union between the arm mask and the affordance masks. DATA_DIR is the directory containing the rgb and affordance folders, e.g., DST_DIR/training following the naming used for the previous script. DEST_DIR is the destination directory where the cropped RGB images and segmentation masks are saved.
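A simplified sketch of the cropping and mask-union step, under the assumption that the arm mask is merged into the background of the affordance mask as an additional class; this is not the exact procedure of the ACANet paper, and the padding and class id are hypothetical:
import numpy as np

def crop_and_merge(rgb, affordance_mask, arm_mask, pad=30, arm_class=3):
    # merge the arm mask into the affordance mask as an extra class (assumed id)
    merged = affordance_mask.copy()
    merged[(arm_mask > 0) & (merged == 0)] = arm_class
    # crop around the union of the masks, with some padding
    ys, xs = np.nonzero(merged > 0)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, rgb.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, rgb.shape[1])
    return rgb[y0:y1, x0:x1], merged[y0:y1, x0:x1]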
To use the manually annotated data from CCM and HO-3D datasets:
- Download rgb and annotation files from https://doi.org/10.5281/zenodo.10708553 and unzip them in the preferred folder SRC_DIR.
- Run python src/utils/create_dataset_crops.py --data_dir=DATA_DIR --dataset_name=DATA_NAME --save=True --dest_dir=DEST_DIR to perform the cropping window procedure described in the ACANet paper. DATA_DIR is the directory containing the rgb and affordance folders. DATA_NAME is the dataset name (either CCM or HO3D). DEST_DIR is the destination directory where the cropped RGB images and segmentation masks are saved.
To recreate the training and testing splits of the UMD dataset:
- Download the UMD (tools) dataset and unzip it in $YOUR_DIRECTORY$.
- Run python src/utils/split_UMD.py --src_dir=SRC_DIR --file_path=FILE_PATH --save=True --dst_dir=DST_DIR to split the data into training and testing sets. SRC_DIR is the UMD source directory $YOUR_PATH$/part-affordance-dataset-tools/part-affordance-dataset/tools; FILE_PATH is the path to the UMD file specifying which split each object instance belongs to, i.e., $YOUR_PATH$/part-affordance-dataset-tools/part-affordance-dataset/category_split.txt; DST_DIR is the directory where the splits are saved. Training and testing folders are created automatically.
If you find an error, or if you want to suggest a new feature or a change, you can use the Issues tab to raise an issue with the appropriate label.
Complete and full updates can be found in CHANGELOG.md. The file follows the guidelines of https://keepachangelog.com/en/1.1.0/.
T. Apicella, A. Xompero, P. Gastaldo, A. Cavallaro, Segmenting Object Affordances: Reproducibility and Sensitivity to Scale, Proceedings of the European Conference on Computer Vision Workshops, Twelfth International Workshop on Assistive Computer Vision and Robotics (ACVR), Milan, Italy, 29 September 2024.
@InProceedings{Apicella2024ACVR_ECCVW,
title = {Segmenting Object Affordances: Reproducibility and Sensitivity to Scale},
author = {Apicella, T. and Xompero, A. and Gastaldo, P. and Cavallaro, A.},
booktitle = {Proceedings of the European Conference on Computer Vision Workshops},
note = {Twelfth International Workshop on Assistive Computer Vision and Robotics},
address = {Milan, Italy},
month = "29" # SEP,
year = {2024},
}
If you have any further enquiries, questions, or comments, or you would like to file a bug report or a feature request, please use the GitHub issue tracker.
This work is licensed under the MIT License. To view a copy of this license, see LICENSE.