This is the accompanying code and data for my master's project.
Reconstructing a 3D scene from a 2D image has applications in many fields, such as robotics and computer vision. My project focuses on one subtopic: estimating an object's 3D pose from a single 2D image. Because real training data is limited, many methods rely on synthetic, rendered images.
The method proposed here works solely on real images, exploiting similarities between different shape classes. It is based on [1]: from a single image, it predicts the closest shape in a model database and estimates the corresponding 3D rotation. This is done in two stages. In the first stage, a general embedding is learnt from the volumetric representations of the available shapes. In the second stage, an image is mapped to that embedding.
This achieves good results on the 'in the wild' dataset PASCAL3D+.
The proposed method also outputs an uncertainty value, which makes it easier to spot rotational ambiguities of the pose. In the future, this could, for example, help a robot assess its possible actions and so interact with its environment more reliably.
My implementation takes only a few hours to train on a single GPU, which makes it well suited to quickly testing new hypotheses.
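The retrieval step described above (picking the closest shape from the model database once an image has been mapped to the embedding) can be illustrated with a minimal sketch. This is not the code from this repository: the function name `retrieve_closest_shape`, the toy 2D embeddings, and the use of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def retrieve_closest_shape(image_embedding, shape_embeddings):
    """Return the index of the nearest shape embedding and all distances.

    Hypothetical helper: assumes Euclidean distance in the embedding space.
    """
    # Distance from the image embedding to every shape embedding (one per row).
    distances = np.linalg.norm(shape_embeddings - image_embedding, axis=1)
    return int(np.argmin(distances)), distances


# Toy database of three shapes embedded in 2D (illustrative values only).
shape_embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
image_embedding = np.array([0.9, 0.1])

idx, dists = retrieve_closest_shape(image_embedding, shape_embeddings)
print("closest shape index:", idx)
```

In this toy example the image embedding lies nearest to the second shape, so index 1 is returned.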
[1] Kyaw Zaw Lin, Weipeng Xu, Qianru Sun, Christian Theobalt, and Tat-Seng Chua. Learning a Disentangled Embedding for Monocular 3D Shape Retrieval and Pose Estimation. arXiv e-prints, page arXiv:1812.09899, December 2018.
- Download the repository
- Install all libraries and dependencies in a conda virtual environment
- Download and pre-process the images and annotations as described in Data
- Find a description of the available command line arguments by running
python3 implementation.py -h
The only required argument is an experiment name. It can be specified with -n.
So, for example, python3 implementation.py -n 'my_experiment' will run the code and save the results with the name 'my_experiment'.
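For readers unfamiliar with this pattern, a required `-n` argument could be declared with `argparse` roughly as follows. This is a sketch, not the parser in implementation.py: the long form `--name`, the description, and the help text are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with one required experiment-name argument."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate the pose estimation model (sketch)."
    )
    # "-n" comes from the README; the long option "--name" is an assumption.
    parser.add_argument(
        "-n", "--name", required=True,
        help="experiment name used when saving results",
    )
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["-n", "my_experiment"])
print(args.name)
```

Because the argument is marked `required=True`, `argparse` exits with a usage message if it is omitted.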
The dataset PASCAL3D+ can be found here. I am using version 1.1.
Pre-processing has two parts: a notebook for the images and annotations (pre_processing-images_and_annotations.ipynb) and a Python script for the 3D models (pre_processing_pointclouds.py).
The pointclouds are already part of this repository, but the images and annotations have to be created:
- Make sure that the PATHS at the top of pre_processing-images_and_annotations.ipynb are correct.
- Run the notebook
- Confirm that everything worked by checking the Data/numpy_data/ folder. It should contain one Test and one Train folder, each with a subfolder for every class. Each class subfolder should contain images, bboxes, shape_idxs, valid_im_ids, occluded, truncated and true_rots.
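The check in the last step can also be scripted. The sketch below is not part of the repository. It assumes that each expected entry exists inside a class folder either as a folder or as a file whose name without extension matches the entry (for example a hypothetical `images.npy`); adjust the matching if the notebook writes different file names.

```python
from pathlib import Path

# Entry names taken from the README; how they are stored on disk is an assumption.
EXPECTED_ITEMS = ["images", "bboxes", "shape_idxs", "valid_im_ids",
                  "occluded", "truncated", "true_rots"]


def find_missing(root):
    """Return the paths of expected entries that are missing under root."""
    root = Path(root)
    missing = []
    for split in ("Test", "Train"):
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(str(split_dir))
            continue
        # Every subfolder of a split is treated as one class.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Compare names without extension, so "images.npy" matches "images".
            present = {p.stem for p in class_dir.iterdir()}
            for item in EXPECTED_ITEMS:
                if item not in present:
                    missing.append(str(class_dir / item))
    return missing


missing = find_missing("Data/numpy_data")
print("All expected entries found." if not missing
      else "Missing:\n" + "\n".join(missing))
```

Run it from the repository root after the notebook has finished. An empty result means every class folder in both splits contains all seven entries.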
The pointcloud pre-processing was done with MeshLab. After it is installed (MeshLab Installation), change the paths in pre_processing_pointclouds.py accordingly.
An overview of the results and the corresponding parameter values can be found in trained_nets/results/. The notebook Display_results.ipynb displays them.