VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

　An on going TF implementation on VoxNet to deal with 3D LiDAR pointcloud segmentation classification, refer to paper.

@inproceedings{Maturana2015VoxNet,
  title={VoxNet: A 3D Convolutional Neural Network for real-time object recognition},
  author={Maturana, Daniel and Scherer, Sebastian},
  booktitle={Ieee/rsj International Conference on Intelligent Robots and Systems},
  pages={922-928},
  year={2015},
}

Input Layer. This layer accepts a fixed-size grid of I×J×K (I=J=K=32) voxels, each value for each grid cell is updated depending on the occupancy model, resulted in the (−1, 1) range.
Convolutional Layers Conv(f, d, s).
- These layers accept four dimensional input volumes in which three of the dimensions are spatial, and the fourth contains the feature maps.
- The layer creates f feature maps by convolving the input with f learned filters of shape d × d × d × f', where d are the spatial dimensions and f' is the number of input feature maps.
  ==> conv3d(depth=d, height=d, width=d, in_channels=f', out_channels=f)
- Convolution can also be applied at a spatial stride s.
- Output shape: (I/J/K - d + 2*padding)/s + 1
- The output is passed through a leaky rectified nonlinearity unit (Leaky ReLU) with parameter 0.1. (激活函数)
Pooling Layers Pool(m).
- These layers downsample the input volume by a factor of by m along the spatial dimensions by replacing each m × m × m non-overlapping block of voxels with their maximum.
  ≈ conv3d(depth=m, height=m, width=m, in_channels=f, out_channels=m) & s=m
Fully Connected Layer FC(n).
- Fully connected layers have n output neurons. The output of each neuron is a learned linear combination of all the outputs from the previous layer, passed through a nonlinearity.
Output Layer.
- ReLUs save for the final output layer, where the number of outputs corresponds to the number of class labels K and a softmax nonlinearity is used to provide a probabilistic output.
VoxNet: Conv(32, 5, 2)→Conv(32, 3, 1)→Pool(2)→FC(128)→FC(K)

Dataset

Sydney Urban Object Dataset, short for SUOD
Other LiDAR PointCloud Dataset(not yet support though :D):
Stanford Track Collection
KITTI Object Recognition
Semantic 3D

Requirement

　Implemented and tested on Ubuntu 16.04 with Python 3.5 and Tensorflow 1.3.0.

Clone this repo

$ git clone https://github.com/Durant35/VoxNet --recurse-submodules

We name the root directory as $ROOT and if you forget to clone the python-pcl submodule:

$ git submodule update --init --recursive

Setup virtual environment with all requirements

$ mkvirtualenv --no-site-packages -p /usr/bin/python3.5 py3-1.3.0
$ cd $ROOT
$ workon py3-1.3.0
(py3-1.3.0) $ pip3 install -r requirements.txt

[option] python-pcl, or you can comment those pcl codes.

$ cd $ROOT
$ workon py3-1.3.0
(py3-1.3.0) $ pip3 install Cython
(py3-1.3.0) $ cd 3rdparty/python-pcl
(py3-1.3.0) $ python setup.py build_ext -i
(py3-1.3.0) $ python setup.py install
(py3-1.3.0) $ rm -rf *

Data pre-process

　Generate npy_generated/training/*.py from SUDO fold 1-3, npy_generated/testing/*.py from fold 4.

$ cd $ROOT
$ workon py3-1.3.0
(py3-1.3.0) $ python ./src/preprocess.py -h
usage: preprocess.py [-h] [--dataset_dir DATASET_DIR] [--fold FOLD]
                     [--viz [VIZ]] [--noviz] [--pcd [PCD]] [--nopcd]
                     [--npy_dir NPY_DIR] [--clear_cache [CLEAR_CACHE]]
                     [--noclear_cache] [--type TYPE]

optional arguments:
  -h, --help            show this help message and exit
  --dataset_dir DATASET_DIR
                        directory that stores the Sydney Urban Object Dataset,
                        short for SUOD.
  --fold FOLD           which fold, 0..3, for SUOD.
  --viz [VIZ]           visualize preprocess voxelization.
  --noviz
  --pcd [PCD]           save object point cloud as pcd.
  --nopcd
  --npy_dir NPY_DIR     directory to stores the SUOD preprocess results,
                        including occupancy grid and label.
  --clear_cache [CLEAR_CACHE]
                        clear previous generated preprocess results.
  --noclear_cache
  --type TYPE           type of SUOD preprocess results, training set or
                        testing set.
# prepare training set & testing set
(py3-1.3.0) $ python ./src/preprocess.py --clear_cache
(py3-1.3.0) $ python ./src/preprocess.py --fold 1
(py3-1.3.0) $ python ./src/preprocess.py --fold 2
(py3-1.3.0) $ python ./src/preprocess.py --fold 3 --type testing

Training/Validation

Run at once.

$ workon py3-1.3.0
(py3-1.3.0) $ ./scripts/train.sh
...
Start training...
...
INFO:tensorflow:loss = 0.004891032, step = 208 (7.759 sec)
INFO:tensorflow:Saving checkpoints for 214 into ./logs/model.ckpt.
INFO:tensorflow:Loss for final step: 0.007142953.
Finished training.
Start testing...
INFO:tensorflow:Starting evaluation at 2018-04-26-03:29:16
2018-04-26 11:29:16.387119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
INFO:tensorflow:Restoring parameters from ./logs/model.ckpt-214
INFO:tensorflow:Finished evaluation at 2018-04-26-03:29:16
INFO:tensorflow:Saving dict for global step 214: accuracy = 0.64666665, global_step = 214, loss = 2.4514768
Finished testing.
You can use Tensorboard to visualize the results by command 'tensorboard --logdir=./logs'.

Run after pre-process npy_generated/training/*.npy

$ cd $ROOT
$ workon py3-1.3.0
(py3-1.3.0) $ python ./src/train.py -h
usage: train.py [-h] [--log_dir LOG_DIR] [--npy_dir NPY_DIR]
                [--clear_log [CLEAR_LOG]] [--noclear_log]
                [--num_epochs NUM_EPOCHS] [--batch_size BATCH_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  --log_dir LOG_DIR     Directory for training logs, including training
                        summaries as well as training model checkpoint.
  --npy_dir NPY_DIR     directory to the preprocess training dataset.
  --clear_log [CLEAR_LOG]
                        force to clear old logs if exist.
  --noclear_log
  --num_epochs NUM_EPOCHS
                        The numbers of epochs for training, train over the
                        dataset about 8 times.
  --batch_size BATCH_SIZE
                        The numbers of training examples present in a single
                        batch for every training.
(py3-1.3.0) $ python ./src/train.py  --clear_log

Testing

$ cd $ROOT
$ workon py3-1.3.0
(py3-1.3.0) $ python ./src/eval.py -h
usage: eval.py [-h] [--model_dir MODEL_DIR] [--npy_dir NPY_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --model_dir MODEL_DIR
                        directory for training model checkpoint.
  --npy_dir NPY_DIR     directory to the preprocess training dataset.
# run on default configs.
(py3-1.3.0) $ python ./src/eval.py
...
Predicted: trunk, Ground Truth: trunk
Top 3 labels: trunk traffic_sign traffic_lights...
Predicted: pedestrian, Ground Truth: pedestrian
Top 3 labels: pedestrian traffic_sign car...
Predicted: pedestrian, Ground Truth: pedestrian
Top 3 labels: pedestrian traffic_sign traffic_lights...
Predicted: traffic_lights, Ground Truth: traffic_lights
Top 3 labels: traffic_lights trunk building...
INFO:tensorflow:Starting evaluation at 2018-04-28-10:09:34
2018-04-28 18:09:34.279983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
INFO:tensorflow:Restoring parameters from ./logs/model.ckpt-1381
2018-04-28 18:09:36.986620: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: FIFOQueue '_1_enqueue_input/fifo_queue' is closed and has insufficient elements (requested 128, current size 0)
	 [[Node: fifo_queue_DequeueUpTo = QueueDequeueUpToV2[component_types=[DT_INT64, DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](enqueue_input/fifo_queue, fifo_queue_DequeueUpTo/n)]]
INFO:tensorflow:Finished evaluation at 2018-04-28-10:09:37
INFO:tensorflow:Saving dict for global step 1381: accuracy = 0.64672804, global_step = 1381, loss = 2.816635
Finished testing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

Dataset

Requirement

Data pre-process

Training/Validation

Testing

Files

README.md

Latest commit

History

README.md

File metadata and controls

VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

Dataset

Requirement

Data pre-process

Training/Validation

Testing