Segmenting Object Affordances: Reproducibility and Sensitivity to Scale

Visual affordance segmentation identifies image regions of an object an agent can interact with. Existing methods re-use and adapt learning-based architectures for semantic segmentation to the affordance segmentation task and evaluate on small-size datasets. However, experimental setups are often not reproducible, thus leading to unfair and inconsistent comparisons. In this work, we benchmark these methods under a reproducible setup on two single-object scenarios, tabletop without occlusions and hand-held containers, to facilitate future comparisons. We include a version of a recent architecture, Mask2Former, re-trained for affordance segmentation and show that this model is the best-performing on most testing sets of both scenarios. Our analysis shows that models are not robust to scale variations when object resolutions differ from those in the training set.

[arXiv] [webpage] [trained models]

Table of Contents

  1. News
  2. Installation
    1. Setup specifics
    2. Requirements
    3. Instructions
  3. Running demo
  4. Trained models
  5. Training and testing data
    1. Hand-occluded object setting
    2. Unoccluded object setting
  6. Contributing
  7. Credits
  8. Enquiries, Questions and Comments
  9. License

News

  • 26 October 2024: Released code and weights of CNN, DRNAtt, AffNet, and Mask2Former, trained on the unoccluded object setting (UMD)
  • 26 September 2024: Released code and weights of ACANet, ACANet50, RN18U, DRNAtt, RN50F, and Mask2Former, trained on the hand-occluded object setting (CHOC-AFF)
  • 04 September 2024: Pre-print available on arXiv at https://arxiv.org/abs/2409.01814
  • 17 August 2024: Source code, models, and further details will be released in the coming weeks.
  • 15 August 2024: Paper accepted at Twelfth International Workshop on Assistive Computer Vision and Robotics (ACVR), in conjunction with the 2024 European Conference on Computer Vision (ECCV).

Installation

Setup specifics

Model testing was performed using the following setup:

  • OS: Ubuntu 18.04.6 LTS
  • Kernel version: 4.15.0-213-generic
  • CPU: Intel® Core™ i7-9700K CPU @ 3.60GHz
  • Cores: 8
  • RAM: 32 GB
  • GPU: NVIDIA GeForce RTX 2080 Ti
  • Driver version: 510.108.03
  • CUDA version: 11.6

Requirements

  • Python 3.8
  • PyTorch 1.9.0
  • Torchvision 0.10.0
  • OpenCV 4.10.0.84
  • Numpy 1.24.4
  • Tqdm 4.66.5

Instructions

# Create and activate conda environment
conda create -n affordance_segmentation python=3.8
conda activate affordance_segmentation
    
# Install libraries
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install opencv-python onnx-tool numpy tqdm scipy
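
To verify the environment, a quick sanity check (our suggestion, not part of the repository) is to print the installed library versions and confirm that the GPU is visible:

# Print library versions and CUDA availability
python -c "import torch, torchvision, cv2, numpy; print(torch.__version__, torchvision.__version__, cv2.__version__, numpy.__version__); print('CUDA available:', torch.cuda.is_available())"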

Running demo

Download model checkpoint ACANet.zip, and unzip it.

Use the images in the folder src/test_dir or try with your own images. The folder structure is DATA_DIR/rgb.

To run the model and visualise the output:

python src/demo.py --gpu_id=GPU_ID --model_name=MODEL_NAME --train_dataset=TRAIN_DATA --data_dir=DATA_DIR --checkpoint_path=CKPT_PATH --save_res=True --dest_dir=DEST_DIR
  • Replace MODEL_NAME with ACANet
  • DATA_DIR: directory where data are stored
  • TRAIN_DATA: name of the training dataset
  • CKPT_PATH: path to the .pth file
  • DEST_DIR: path to the destination directory. This argument is used only if you save the predictions (--save_res=True) or the overlay visualisation (--save_overlay=True). Predictions are automatically saved in DEST_DIR/pred, overlays in DEST_DIR/vis. An example invocation is shown below.
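
For example, a concrete invocation on the provided test images could look like the command below. The GPU id, the training-dataset name (CHOC-AFF), the checkpoint path inside the unzipped folder, and the destination directory are illustrative assumptions; adapt them to your setup.

# Illustrative example: GPU 0, provided test images, results saved under ./output
python src/demo.py --gpu_id=0 --model_name=ACANet --train_dataset=CHOC-AFF --data_dir=src/test_dir --checkpoint_path=ACANet/ACANet.pth --save_res=True --dest_dir=output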

You can check that the model reproduces the expected results by running inference on the images provided in src/test_dir/rgb and verifying that the output is the same as the reference predictions in src/test_dir/pred.
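
One way to do this comparison (a minimal sketch, assuming the predictions are saved as PNG files with matching filenames; adjust the paths and extension to your setup) is a byte-wise check of each file:

# Compare each reference prediction with the newly generated one (illustrative paths)
for f in src/test_dir/pred/*.png; do
    cmp -s "$f" "output/pred/$(basename "$f")" && echo "match: $(basename "$f")" || echo "differs: $(basename "$f")"
done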

Trained models

Here is the list of available models trained on UMD or CHOC-AFF:

Model name      UMD           CHOC-AFF
CNN             link to zip   -
AffordanceNet   link to zip   -
ACANet          -             link to zip
ACANet50        -             link to zip
RN50F           -             link to zip
RN18U           -             link to zip
DRNAtt          link to zip   link to zip
Mask2Former     link to zip   link to zip

Models installation

Note

When testing the installation of a model, you might need to change the imports in the scripts.

Mask2Former installation

To use the Mask2Former model, please run the following commands:

# Install detectron2 library
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Access mask2former folder in repository
cd src/models/mask2former

# Clone code from Mask2Former repository
git clone https://github.com/facebookresearch/Mask2Former.git 

# Compile 
cd Mask2Former/mask2former/modeling/pixel_decoder/ops
sh make.sh

# Return to the main directory (aff-seg)
cd ../../../../../../../../

# Install required libraries
pip install timm

# Run script to load Mask2Former (expected output: "Model loaded correctly!!")
python src/models/mask2former/test_mask2former_load.py

Comment out line 194 in src/models/mask2former/Mask2Former/mask2former/maskformer_model.py (images = [(x - self.pixel_mean) / self.pixel_std for x in images]), because images are already preprocessed in the dataloader.
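
If you prefer to script this change, a minimal sed sketch is shown below; it simply prefixes line 194 with a comment marker, so verify first that this line still contains the normalisation (the line number may shift with upstream changes to Mask2Former).

# Comment out the per-image normalisation (check the line number first)
sed -i '194s/^/# /' src/models/mask2former/Mask2Former/mask2former/maskformer_model.py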

ResNet50FCN installation

To use the ResNet50FastFCN (RN50F) model, please run the following commands:

# Access resnet_fcn folder in repository
cd src/models/resnet_fcn

# Clone code from FastFCN repository
git clone https://github.com/wuhuikai/FastFCN.git

# Return to the main directory (aff-seg)
cd ../../../
  • In aff-seg/src/models/resnet_fcn/FastFCN/encoding/models/encnet.py, replace line 11 import encoding with from ..nn import encoding.
  • In aff-seg/src/models/resnet_fcn/FastFCN/encoding/models/base.py, replace pretrained=True with pretrained=False in line 38 (the script tries to download the ResNet pretrained weights, but fails). In case you want to use the pretrained weights, download them from issue#86 and then modify root='~/.encoding/models' in line 27 to point at the folder with the downloaded checkpoint. A scripted sketch of these edits is shown after this list.
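
These two edits can also be scripted. The sed commands below are a sketch that assumes FastFCN was cloned into src/models/resnet_fcn as above and that the upstream line numbers have not changed; check both files before and after running them.

# Replace the import in encnet.py (line 11)
sed -i '11s/.*/from ..nn import encoding/' src/models/resnet_fcn/FastFCN/encoding/models/encnet.py

# Disable the pretrained-weights download in base.py (line 38)
sed -i '38s/pretrained=True/pretrained=False/' src/models/resnet_fcn/FastFCN/encoding/models/base.py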

Run script to load RN50F (expected output: model statistics, with average inference time and standard deviation):

python src/models/resnet_fcn/test_resnet_fcn_load.py

DRNAtt installation

To use the DRNAtt model, please run the following commands:

# Access drnatt folder in repository
cd src/models/drnatt

# Clone code from DANet repository
git clone https://github.com/junfu1115/DANet.git

# Clone code from DRN repository
git clone https://github.com/fyu/drn.git

# Install required libraries
pip install ninja

# Return to the main directory (aff-seg)
cd ../../../

Comment out lines 12 and 13 in DANet/encoding/__init__.py (from .version import __version__ and from . import nn, functions, parallel, utils, models, datasets, transforms).
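
A scripted version of this edit (a sketch, assuming DANet was cloned into src/models/drnatt as above and that the two imports are still on lines 12 and 13) is:

# Comment out the two imports in the DANet package init file
sed -i '12,13s/^/# /' src/models/drnatt/DANet/encoding/__init__.py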

Run script to check that the model is correctly installed (expected output: model statistics, with average inference time and standard deviation):

python src/models/drnatt/drn_att.py

AffordanceNet installation

To use AffordanceNet (AffNet), please run the following commands:

# Access affnet folder in repository
cd src/models/affnet

# Clone code from AffNetDR repository
git clone https://github.com/HuchieWuchie/affnetDR.git

# Return to the main directory (aff-seg)
cd ../../../
  • Replace line 75 in /affNetDR/lib/roi_heads.py (mask_prob = x.sigmoid()) with mask_prob = x.softmax(dim=1).
  • Replace line 77 in /affNetDR/lib/roi_heads.py with the commented lines 83 and 87.
  • Replace the imports of torchvision._internally_replaced_utils with torchvision.models.utils in /affNetDR/lib/mask_rcnn.py (line 7) and /affNetDR/lib/faster_rcnn.py (line 8).
  • Change line 148 in /affNetDR/lib/mask_rcnn.py to use min_size=480, max_size=640. A scripted sketch of some of these edits is shown after this list.
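
The softmax and import changes can be scripted. The commands below are a sketch that assumes affnetDR was cloned into src/models/affnet as above and that the upstream line numbers have not changed; the edits to line 77 and line 148 are easier to do by hand.

# Use softmax instead of sigmoid for the mask probabilities (line 75)
sed -i '75s/x.sigmoid()/x.softmax(dim=1)/' src/models/affnet/affnetDR/lib/roi_heads.py

# Point the utility imports at torchvision.models.utils
sed -i 's/torchvision._internally_replaced_utils/torchvision.models.utils/' src/models/affnet/affnetDR/lib/mask_rcnn.py src/models/affnet/affnetDR/lib/faster_rcnn.py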

Run script to check that the model is correctly installed (expected output: model loaded successfully!):

python src/models/affnet/test_affordancenet_load.py

Training and testing data

Hand-occluded object setting

To recreate the training and testing splits of the mixed-reality dataset:

  1. Download CHOC-AFF folders rgb, mask, annotations, affordance and unzip them in the preferred folder SRC_DIR.
  2. Run python src/utils/split_CHOC.py --src_dir=SRC_DIR --dst_dir=DST_DIR to split into training, validation and testing sets. DST_DIR is the directory where splits are saved.
  3. Run python src/utils/create_dataset_crops_CHOC.py --data_dir=DATA_DIR --save=True --dest_dir=DEST_DIR to perform the cropping window procedure described in the ACANet paper. This script also performs the union between the arm mask and the affordance masks. DATA_DIR is the directory containing the rgb and affordance folders, e.g. DST_DIR/training following the naming used for the previous script. DEST_DIR is the destination directory, where cropped RGB images and segmentation masks are saved. Example commands are shown after this list.
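
For example, assuming the CHOC-AFF folders were unzipped to /data/CHOC-AFF (all paths below are illustrative), steps 2 and 3 could look like:

# Step 2: create training/validation/testing splits
python src/utils/split_CHOC.py --src_dir=/data/CHOC-AFF --dst_dir=/data/CHOC-AFF-splits

# Step 3: crop the training split
python src/utils/create_dataset_crops_CHOC.py --data_dir=/data/CHOC-AFF-splits/training --save=True --dest_dir=/data/CHOC-AFF-crops/training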

To use the manually annotated data from CCM and HO-3D datasets:

  1. Download rgb and annotation files from https://doi.org/10.5281/zenodo.10708553 and unzip them in the preferred folder SRC_DIR.
  2. Run python src/utils/create_dataset_crops.py --data_dir=DATA_DIR --dataset_name=DATA_NAME --save=True --dest_dir=DEST_DIR to perform the cropping window procedure described in the ACANet paper. DATA_DIR is the directory containing the rgb and affordance folders. DATA_NAME is the dataset name (either CCM or HO3D). DEST_DIR is the destination directory, where cropped RGB images and segmentation masks are saved. An example command is shown after this list.
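
For example, assuming the CCM data were unzipped to /data/CCM (an illustrative path), the cropping step could look like:

# Crop the CCM images and masks
python src/utils/create_dataset_crops.py --data_dir=/data/CCM --dataset_name=CCM --save=True --dest_dir=/data/CCM-crops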

Unoccluded object setting

To recreate the training and testing splits of the UMD dataset:

  1. Download the UMD (tools) dataset and unzip it in $YOUR_DIRECTORY$.
  2. Run python src/utils/split_UMD.py --src_dir=SRC_DIR --file_path=FILE_PATH --save=True --dst_dir=DST_DIR to split into training and testing sets. SRC_DIR is the source directory of UMD, $YOUR_PATH$/part-affordance-dataset-tools/part-affordance-dataset/tools. FILE_PATH is the path to the UMD file listing which split each object instance belongs to, $YOUR_PATH$/part-affordance-dataset-tools/part-affordance-dataset/category_split.txt. DST_DIR is the directory where splits are saved; training and testing folders are created automatically. An example command is shown after this list.
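
For example, assuming the dataset was unzipped to /data (an illustrative path), the command could look like:

# Create the UMD training and testing splits
python src/utils/split_UMD.py --src_dir=/data/part-affordance-dataset-tools/part-affordance-dataset/tools --file_path=/data/part-affordance-dataset-tools/part-affordance-dataset/category_split.txt --save=True --dst_dir=/data/UMD-splits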

Contributing

If you find an error, or if you want to suggest a new feature or a change, you can use the issues tab to raise an issue with the appropriate label.

The complete list of updates can be found in CHANGELOG.md. The file follows the guidelines of https://keepachangelog.com/en/1.1.0/.

Credits

T. Apicella, A. Xompero, P. Gastaldo, A. Cavallaro, Segmenting Object Affordances: Reproducibility and Sensitivity to Scale, Proceedings of the European Conference on Computer Vision Workshops, Twelfth International Workshop on Assistive Computer Vision and Robotics (ACVR), Milan, Italy, 29 September 2024.

@InProceedings{Apicella2024ACVR_ECCVW,
  title     = {Segmenting Object Affordances: Reproducibility and Sensitivity to Scale},
  author    = {Apicella, T. and Xompero, A. and Gastaldo, P. and Cavallaro, A.},
  booktitle = {Proceedings of the European Conference on Computer Vision Workshops},
  note      = {Twelfth International Workshop on Assistive Computer Vision and Robotics},
  address   = {Milan, Italy},
  month     = "29" # sep,
  year      = {2024},
}

Enquiries, Questions and Comments

If you have any further enquiries, questions, or comments, or if you would like to file a bug report or a feature request, please use the GitHub issue tracker.

License

This work is licensed under the MIT License. To view a copy of this license, see LICENSE.
