PF-AAE: a particle filtering framework for 3D object pose tracking with RGB input

PF-AAE overview

This project implements PF-AAE, a framework to track the 3D pose of an object via particle filtering (PF) and augmented autoencoders (AAE). The filter iteratively estimates the posterior of the rotation matrix Rt given the RGB input y1:t. The prediction step employs a noise model in SO(3), while the correction step employs a measurement model based on the AAE architecture. A novel resampling strategy, called AAETL resampling, shows an improvement of the tracking performance when the object undergoes abrupt pose changes. It relies on AAETL, an augmented autoencoder trained with texture-less reconstruction objectives. The tracking procedure is carried out offline.

This work builds on the AugmentedAutoencoder repository, available here.

⚠️ Code availability disclaimer

The actual implementation of PF-AAE is under NDA. This repository contains only the baseline code of the work, namely the AugmentedAutoencoder repository. This README is included to provide an overview of PF-AAE and its functionalities. Therefore, the reported demos cannot be reproduced and some files may not be available. Feel free to reach out for further details about the work πŸš€.

Table of Contents

  • Installation
  • Augmented Autoencoders
  • PF-AAE architecture
  • Tracking experiments
  • Run a demo
  • Datasets
  • Code structure
  • Acknowledgments
  • License
  • References

Installation

  1. Install the code dependencies

```bash
pip install -r requirements.txt
```

  2. Install the code via pip

```bash
pip install --user .
```

  3. Create the workspace (the folder that collects AAE, AAETL, and PF-AAE data)

```bash
export AE_WORKSPACE_PATH=/path/to/aae_workspace
mkdir $AE_WORKSPACE_PATH
ae_init_workspace
```

  4. Check the content of the workspace

```
└── aae_workspace
    β”œβ”€β”€ cfg
    β”‚   β”œβ”€β”€ train_template_aae.cfg
    β”‚   └── train_template_aae_tl.cfg
    β”œβ”€β”€ cfg_eval
    β”œβ”€β”€ experiments
    └── tmp_datasets
```

Augmented Autoencoders

Augmented autoencoders (AAEs) are convolutional autoencoders trained to reconstruct the view of an object from an augmented version of it fed as input. Thus, they deliver an implicit representation of rotations in their latent space. For further details, refer to the original readme of the AugmentedAutoencoder repository.

In this framework, it is possible to train two kinds of AAEs:

  • AAE architecture: augmented autoencoder with textured reconstruction.
  • AAETL architecture: augmented autoencoder with texture-less reconstruction.

The former is more discriminative, while the latter maps views of the object that become symmetric once textures are removed to nearby points in the latent space.
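
To make the role of the latent space concrete, here is a minimal numpy sketch (not the repository's code; all names are hypothetical) of the implicit orientation lookup: the rotation assigned to a query crop is the one whose codebook latent code has maximal cosine similarity with the encoded crop.

```python
import numpy as np

def nearest_rotation(z_query, codebook_z, codebook_R):
    """Implicit orientation lookup: return the codebook rotation whose
    latent code has maximal cosine similarity with the query code."""
    z = z_query / np.linalg.norm(z_query)
    zc = codebook_z / np.linalg.norm(codebook_z, axis=1, keepdims=True)
    return codebook_R[np.argmax(zc @ z)]
```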

AAE architecture

AAE architecture

The image shows the AAE training procedure and the results obtained after 20000 training epochs.

AAETL architecture

AAE_TL architecture

The image shows the AAETL training procedure and the results obtained after 20000 training epochs.

Configuration file

The AAE architectures are defined via a .cfg configuration file. Examples can be found in auto_pose/ae/cfg or in the workspace, after its initialization.

The configuration file must define the path to the 3D model of the object (MODEL_PATH) and the path to a folder containing the images used to augment the training input (BACKGROUND_IMAGES_GLOB). The datasets used for the 3D models and the background images are reported in the section Datasets.

```
[Paths]
MODEL_PATH: /path/to/my_3d_model.ply
BACKGROUND_IMAGES_GLOB: /path/to/background/images/*.jpg
```

To enable the training of an AAE or an AAETL architecture, the TLESS_TARGET flag must be set as follows (choose one):

```
[Network]
TLESS_TARGET: False # for AAE training
TLESS_TARGET: True  # for AAE_TL training
```

For further details about the configuration files, refer to the original readme of the AugmentedAutoencoder repository.

Training and embedding

  1. Copy your configuration file my_autoencoder.cfg in the workspace

```bash
mkdir $AE_WORKSPACE_PATH/cfg/exp_group
cp path/to/your/my_autoencoder.cfg $AE_WORKSPACE_PATH/cfg/exp_group/my_autoencoder.cfg
```

  2. Train the architecture

```bash
ae_train exp_group/my_autoencoder
```

  3. Create the embedding (i.e., the codebook)

```bash
ae_embed exp_group/my_autoencoder
```

  4. Check the content of the workspace

```
└── aae_workspace
    β”œβ”€β”€ cfg
    β”‚   └── exp_group
    β”‚       └── my_autoencoder.cfg
    └── experiments
        └── exp_group
            └── my_autoencoder
                β”œβ”€β”€ checkpoints
                └── train_figures
```

PF-AAE architecture

PF-AAE is a particle filter that tracks the 3D pose of an object from a sequence of images. It iteratively estimates the posterior of the object rotation matrix Rt given the RGB observations y1:t.

This framework builds on the implementation of a particle filter offered by the pfilter repository.

PF-AAE update

PF-AAE schematic

The image shows one iteration of PF-AAE. The prediction step moves the particles using a noise model in SO(3) as the state evolution model. The correction step builds the measurement model with an AAE encoder and its latent space: the rendered particles are compared with the observation via the cosine similarity, and a Gaussian kernel is then applied as weighting function (not shown). As a resampling procedure, it is possible to combine systematic resampling and AAETL resampling (cf. the subsection AAETL resampling).

The implemented noise models are norm, unif-norm, and predict. The weighting function exposes a parameter gamma that controls how discriminative the system is. The resampling is performed when the effective number of particles falls below a threshold n_eff_threshold. For further details, refer to auto_pose/pf/pfilter_aae.py.
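
For illustration only, the following numpy/scipy sketch shows one plausible form of the norm prediction step and of the correction step; the exact parameterizations in auto_pose/pf/pfilter_aae.py may differ, and all names here are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def predict_norm(rotations, sigma, rng):
    # "norm" noise model (assumed form): left-perturb each particle by a
    # random rotation with Gaussian axis-angle noise in the tangent space
    noise = Rotation.from_rotvec(rng.normal(scale=sigma, size=(len(rotations), 3)))
    return noise * rotations  # rotations: stacked scipy Rotation of n particles

def correct(z_obs, z_particles, gamma):
    # correction step: cosine similarity between the latent code of the
    # observation and those of the rendered particles, followed by a Gaussian
    # kernel whose width (via gamma) controls how discriminative the filter is
    z = z_obs / np.linalg.norm(z_obs)
    zp = z_particles / np.linalg.norm(z_particles, axis=1, keepdims=True)
    sim = zp @ z
    w = np.exp(-(1.0 - sim) ** 2 / (2.0 * gamma ** 2))
    return w / w.sum()

def n_eff(weights):
    # effective number of particles; resampling triggers below n_eff_threshold
    return 1.0 / np.sum(weights ** 2)
```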

AAETL resampling

AAE_TL resampling

With an AAETL architecture trained on the same object as the AAE architecture employed in the filter, it is possible to use the AAETL resampling. At each iteration, the fraction aae_resampling_proportion of the particles with the lowest weights is replaced with particles uniformly sampled from the aae_resampling_knn nearest neighbors of the MAP estimate in the AAETL codebook. The remaining particles are resampled according to the systematic resampling procedure.

The codename of this resampling procedure is aae-tl. For comparison, the unif resampling is also implemented: it samples uniformly in SO(3) instead of from the AAETL codebook.
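
Below is a simplified sketch of how the aae-tl resampling could combine the two mechanisms; it is a reconstruction under assumptions (the actual implementation is under NDA), with systematic_resample being the standard textbook procedure.

```python
import numpy as np

def systematic_resample(weights, rng):
    # standard systematic resampling: one random offset, evenly spaced positions
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

def aae_tl_resample(particles, weights, z_map, codebook_z, codebook_R,
                    aae_resampling_proportion, aae_resampling_knn, rng):
    # Replace the lowest-weight fraction of particles with rotations drawn
    # uniformly from the kNN of the MAP latent code in the AAE_TL codebook;
    # resample the remaining particles systematically.
    n = len(weights)
    n_replace = int(aae_resampling_proportion * n)
    order = np.argsort(weights)                  # ascending: lowest weights first
    replace, keep = order[:n_replace], order[n_replace:]
    zc = codebook_z / np.linalg.norm(codebook_z, axis=1, keepdims=True)
    sims = zc @ (z_map / np.linalg.norm(z_map))  # cosine similarity to the MAP
    knn = np.argsort(-sims)[:aae_resampling_knn]
    particles[replace] = codebook_R[rng.choice(knn, size=n_replace)]
    w = weights[keep] / weights[keep].sum()
    particles[keep] = particles[keep][systematic_resample(w, rng)]
    return particles
```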

Tracking experiments

  1. Generate a sequence of views of the object

```bash
pf_generate_sequences exp_group/my_autoencoder \
    # sequence parameters (see below)
```

For the available parameters, refer to auto_pose/pf/pf_generate_sequences.py and the section Run a Demo.

  2. Start tracking of the sequence

```bash
pf_tracking_sequences -aae exp_group/my_autoencoder \
    # tracking parameters (see below)
```

For the available parameters, refer to auto_pose/pf/pf_tracking_sequences.py, auto_pose/pf/pfilter_aae.py, and the section Run a Demo.

  3. Check the results in the workspace

```
└── aae_workspace
    β”œβ”€β”€ experiments
    β”‚   └── exp_group
    β”‚       └── my_autoencoder
    β”‚           β”œβ”€β”€ filtering
    β”‚           β”‚   └── sequence_name
    β”‚           β”‚       └── pf_tracking_name
    β”‚           └── ...
    └── ...
```

In place of sequence_name and pf_tracking_name will appear two strings that identify, respectively, the generated sequence and the tracking experiment, along with their parameters.

Run a demo

  1. Edit the first two lines of demo/cfg/aae/cracker.cfg and demo/cfg/aae_tl/cracker.cfg. MODEL_PATH must be the path to the YCB cracker_box model, available in demo/obj_000002.ply. BACKGROUND_IMAGES_GLOB must be the path to the Pascal VOC2012 dataset, available here.

```
[Paths]
MODEL_PATH: /path/to/obj_000002.ply
BACKGROUND_IMAGES_GLOB: /path/to/voc12/VOCdevkit/VOC2012/JPEGImages/*.jpg
```

  2. Copy the configuration files in demo/cfg to the workspace

```bash
cp -r demo/cfg $AE_WORKSPACE_PATH
```

  3. Train and embed the AAE architecture for the YCB cracker_box

```bash
ae_train aae/cracker # NB: ~8 hours with a K40 GPU
ae_embed aae/cracker
```

  4. Train and embed the AAETL architecture for the YCB cracker_box

```bash
ae_train aae_tl/cracker # NB: ~8 hours with a K40 GPU
ae_embed aae_tl/cracker
```

  5. Generate a sequence with a backflip of the object at 4.2 seconds. Then, run 3 tracking experiments on it: one with AAETL resampling, one with uniform resampling, and one without either.

```bash
cd demo
./pf_aae_example.sh
```

  6. Check the results in $AE_WORKSPACE_PATH/experiments/aae/cracker/filtering

The following animation shows from left to right:

  • the input sequence (generated)
  • the output of the PF-AAE w/ AAETL resampling (rendered)
  • the output of the PF-AAE w/ uniform resampling (rendered)
  • the output of the PF-AAE w/o AAETL and uniform resampling (rendered)

PF-AAE demo results video

The following figure compares the ground truth 3D poses of the object in the input sequence with the ones estimated by the 3 tracking experiments. Poses are expressed with their axis-angle representations.
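
For reference, the axis-angle vector of a rotation matrix can be computed with scipy (a minimal sketch; R_t is a placeholder for an estimated rotation): its direction is the rotation axis and its norm is the rotation angle.

```python
import numpy as np
from scipy.spatial.transform import Rotation

R_t = np.eye(3)  # e.g., an estimated rotation matrix
axis_angle = Rotation.from_matrix(R_t).as_rotvec()
```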

PF-AAE demo results error

The following figures show the (rendered) particles of the filters when the backflip occurs. For each particle:

  • bottom left: the cosine similarity with the observation yt
  • bottom right: the weight of the particle

The border colors have the following meanings:

  • black: particles obtained with the prediction step
  • red: particles obtained with AAETL or uniform resampling
  • blue: MAP estimate that comes from a particle obtained with the prediction step
  • green: MAP estimate that comes from a particle obtained with AAETL or uniform resampling

Particles of PF-AAE w/ AAETL resampling when the backflip occurs (4.2 s):

PF-AAE demo results error

Particles of PF-AAE w/ uniform resampling when the backflip occurs (4.2 s):

PF-AAE demo results error

Datasets

This work has been tested with the following two datasets:

  • YCB_Video: used for the 3D models of the objects being tracked.
  • Pascal VOC 2012: used for the augmentation of the input images during training.

We use Pyrender + EGL for object rendering. Unlike the original AugmentedAutoencoder code, this renderer supports 3D models with textures. Please make sure that the mesh vertices are expressed in meters before launching the training procedure.
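
For instance, headless textured rendering with Pyrender + EGL looks roughly as follows (paths, camera placement, and resolution are illustrative, not the values used by auto_pose/renderer/renderer.py):

```python
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"  # select the EGL backend before importing pyrender

import numpy as np
import trimesh
import pyrender

# Load a textured mesh; vertices are assumed to be expressed in meters
mesh = pyrender.Mesh.from_trimesh(trimesh.load("/path/to/obj_000002.ply"))

scene = pyrender.Scene()
scene.add(mesh)

cam_pose = np.eye(4)
cam_pose[2, 3] = 0.5  # move the camera 0.5 m back along +z
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=128, viewport_height=128)
color, depth = renderer.render(scene)
renderer.delete()
```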

Code structure

The main changes from the original AugmentedAutoencoder code are in the following files:

```
β”œβ”€β”€ auto_pose
β”‚   β”œβ”€β”€ ae
β”‚   β”‚   β”œβ”€β”€ ae_latent_exploration.py
β”‚   β”‚   β”œβ”€β”€ ae_latent_study.py
β”‚   β”‚   β”œβ”€β”€ cfg
β”‚   β”‚   β”‚   β”œβ”€β”€ train_template_aae.cfg
β”‚   β”‚   β”‚   └── train_template_aae_tl.cfg
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ pf
β”‚   β”‚   β”œβ”€β”€ pf_generate_sequences.py
β”‚   β”‚   β”œβ”€β”€ pfilter_aae.py
β”‚   β”‚   β”œβ”€β”€ pfilter.py
β”‚   β”‚   β”œβ”€β”€ pf_tracking_sequences.py
β”‚   β”‚   └── utils.py
β”‚   β”œβ”€β”€ renderer
β”‚   β”‚   └── renderer.py
β”‚   └── ...
β”œβ”€β”€ scripts
β”‚   β”œβ”€β”€ ae_embedding
β”‚   β”œβ”€β”€ ae_latent_exploration
β”‚   β”œβ”€β”€ ae_latent_study
β”‚   β”œβ”€β”€ ae_training
β”‚   β”œβ”€β”€ pf_sequences
β”‚   └── pf_tracking
β”œβ”€β”€ setup.py
└── ...
```

We provide hereafter a brief overview of the code structure:

  • auto_pose/ae contains some new code to train and study the AAETL architecture, along with the original AAE.
  • auto_pose/pf contains the main code that implements the PF-AAE architecture.
  • auto_pose/renderer contains the interface with Pyrender, which supports textured models.
  • scripts contains some examples of shell scripts to configure and use the PF-AAE, AAE, AAETL architectures.

We refer to the code documentation for more details.

Acknowledgments

This project has been developed during my internship at the Istituto Italiano di Tecnologia (IIT), within the Humanoid Sensing and Perception group (HSP). I am sincerely thankful to my supervisors for all their support and suggestions to carry out the work.

License

This code is licensed under the MIT License; see the LICENSE file for more details.

References

[1] Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, and Rudolph Triebel, Implicit 3D Orientation Learning for 6D Object Detection from RGB Images, The European Conference on Computer Vision (ECCV), September 2018.

[2] Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Timothy Bretl, and Dieter Fox, PoseRBPF: A Rao–Blackwellized Particle Filter for 6-D Object Pose Tracking, 2019.

[3] Simo SΓ€rkkΓ€, Bayesian Filtering and Smoothing. Cambridge University Press, USA. 2013.
