
Perspective Transformer Nets

Introduction

This is the TensorFlow implementation of the NIPS 2016 paper "Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision".

Re-implemented by Xinchen Yan, Arkanath Pathak, Jasmine Hsu, Honglak Lee

Reference: Original implementation in Torch

How to run this code

This implementation can be run locally or distributed across multiple machines/tasks. When running in a distributed fashion, you will need to set the task number flag for each task. Please refer to the original paper for parameter explanations and training details.

Installation

TensorFlow
- This code requires the latest open-source TensorFlow that you will need to build manually. The documentation provides the steps required for that.
Bazel
- Follow the instructions here.
- Alternatively, download Bazel from https://github.com/bazelbuild/bazel/releases for your system configuration.
- Check the Bazel version with this command: bazel version
matplotlib
- Follow the instructions here.
- You can use a package repository like pip.
scikit-image
- Follow the instructions here.
- You can use a package repository like pip.
PIL
- Install from here.

Dataset

This code requires the dataset to be in tfrecords format with the following features:

image
- Flattened list of images (float representations), one per viewpoint.
mask
- Flattened list of image masks (float representations), one per viewpoint.
vox
- Flattened list of voxels (float representations) for the object.
- This is needed for using vox loss and for prediction comparison.
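Because each feature is stored as a flat list of floats, it has to be reshaped after it is read back. The NumPy sketch below only illustrates that round trip. The view count, image size, channel counts, and voxel resolution are placeholder values chosen for the example, not values taken from this repository, and the row-major layout is likewise an assumption. The helper names are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the example small.
NUM_VIEWS = 4      # viewpoints per object
IMAGE_SIZE = 8     # image height and width in pixels
VOX_SIZE = 4       # voxel grid resolution per axis


def flatten_example(images, masks, voxels):
    """Flatten per-object arrays into the three float lists named above."""
    return {
        "image": images.astype(np.float32).ravel().tolist(),
        "mask": masks.astype(np.float32).ravel().tolist(),
        "vox": voxels.astype(np.float32).ravel().tolist(),
    }


def restore_example(features):
    """Reshape the flat lists back into arrays (row-major order assumed)."""
    images = np.asarray(features["image"], dtype=np.float32).reshape(
        NUM_VIEWS, IMAGE_SIZE, IMAGE_SIZE, 3)
    masks = np.asarray(features["mask"], dtype=np.float32).reshape(
        NUM_VIEWS, IMAGE_SIZE, IMAGE_SIZE, 1)
    voxels = np.asarray(features["vox"], dtype=np.float32).reshape(
        VOX_SIZE, VOX_SIZE, VOX_SIZE)
    return images, masks, voxels


# Synthetic data standing in for one object.
rng = np.random.default_rng(0)
images = rng.random((NUM_VIEWS, IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.float32)
masks = (rng.random((NUM_VIEWS, IMAGE_SIZE, IMAGE_SIZE, 1)) > 0.5).astype(np.float32)
voxels = (rng.random((VOX_SIZE, VOX_SIZE, VOX_SIZE)) > 0.5).astype(np.float32)

features = flatten_example(images, masks, voxels)
restored_images, restored_masks, restored_voxels = restore_example(features)
```

The dictionary keys match the feature names listed above. Writing the lists into an actual tfrecords file is left out of this sketch.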

You can download the ShapeNet Dataset in tfrecords format from here^*.

^* Disclaimer: This data is hosted personally by Arkanath Pathak for non-commercial research purposes. Please cite the ShapeNet paper in your works when using ShapeNet for non-commercial research purposes.

Pretraining: pretrain_rotator.py for each RNN step

$ bazel run -c opt :pretrain_rotator -- --step_size={} --init_model={}

Set init_model to the checkpoint path of the model trained at the previous step. You will also need to set the inp_dir flag to the directory where your data resides.
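Since pretraining is repeated once per RNN step, it can help to generate the command lines rather than type them by hand. The following is a minimal sketch that only builds and prints the commands and does not run them. The step sizes, checkpoint paths, and data directory are placeholders invented for illustration, and the helper name is hypothetical. Substitute the checkpoint paths that your own runs produce.

```python
import shlex


def pretrain_command(step_size, init_model, inp_dir):
    """Build the argument list for one pretraining step."""
    return [
        "bazel", "run", "-c", "opt", ":pretrain_rotator", "--",
        f"--step_size={step_size}",
        f"--init_model={init_model}",
        f"--inp_dir={inp_dir}",
    ]


# Placeholder schedule: (step_size, checkpoint from the previous step).
# None of these values come from the repository.
schedule = [
    (1, "/path/to/initial/checkpoint"),
    (2, "/path/to/step1/checkpoint"),
]
data_dir = "/path/to/tfrecords"  # placeholder for the inp_dir flag

commands = [pretrain_command(step, ckpt, data_dir) for step, ckpt in schedule]
for argv in commands:
    # shlex.quote keeps paths with spaces safe when pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each generated line has the same shape as the command shown above, with inp_dir added. You still run the lines one after another, waiting for each step to finish before starting the next.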

Training: train_ptn.py with last pretrained model.

$ bazel run -c opt :train_ptn -- --init_model={}

Example TensorBoard Visualizations

To compare visualizations across runs, make sure to set the model_name flag to a different value for each parameter setting.
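One way to keep model_name values distinct is to derive them from the settings themselves. The helper below is a hypothetical convenience and not part of this repository. The "learning_rate" key is likewise only an illustrative setting name.

```python
def make_model_name(prefix, settings):
    """Join sorted key/value pairs into a deterministic run name."""
    parts = [f"{key}-{settings[key]}" for key in sorted(settings)]
    return "_".join([prefix] + parts)


# Two runs that differ in one setting get two different names.
name_a = make_model_name("ptn", {"step_size": 1, "learning_rate": 0.001})
name_b = make_model_name("ptn", {"step_size": 2, "learning_rate": 0.001})
print(name_a)
print(name_b)
```

Sorting the keys means the same settings always produce the same name, regardless of the order in which they were written.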

This code adds summaries for each loss. For instance, these are the losses we encountered in distributed pretraining on the ShapeNet Chair dataset with 10 workers and 16 parameter servers:

After fine-tuning, you can expect images like this as "grid_vis" under Image summaries in TensorBoard. Here the third and fifth columns are the predicted masks and voxels, respectively, alongside their ground-truth values.

A similar image for a model trained on all ShapeNet categories (voxel visualizations might be skewed):
