Commit 3ecd0a5

Updated architecture and added training code.

ayanc committed Sep 7, 2016
1 parent d0eeaf9 commit 3ecd0a5
Showing 9 changed files with 1,536 additions and 34 deletions.
39 changes: 25 additions & 14 deletions README.md
@@ -1,20 +1,25 @@
# Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
Copyright (C) 2016, Authors.

This is a reference implementation of the algorithm described in the
paper, ["**Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions**"
*arXiv:1605.07081 [cs.CV]*](https://arxiv.org/abs/1605.07081). It is
being made available for non-commercial research use only. If you find
this code useful in your research, please consider citing the paper.
This is a reference implementation of the algorithm described in the paper:

Contact <ayanc@ttic.edu> with any questions.
Ayan Chakrabarti, Jingyu Shao, and Gregory Shakhnarovich, ["**Depth from
a Single Image by Harmonizing Overcomplete Local Network
Predictions**"](https://arxiv.org/abs/1605.07081), NIPS 2016.

It is being made available for non-commercial research use only. If you
find this code useful in your research, please consider citing the paper.

Please see the [project page][proj] and contact <ayanc@ttic.edu> with
any questions.

### Requirements

The inference code is in MATLAB and has no external Caffe dependencies.
The top directory contains the inference code. It is entirely in MATLAB
and has no external Caffe dependencies.

1. You can download our trained neural network model weights,
available as a .caffemodel.h5 file [here][model.h5].
available as a .caffemodel.h5 file from the [project page][proj].

2. This implementation requires a modern CUDA-capable high-memory GPU
(it has been tested on an NVIDIA Titan X), and a recent version of
@@ -25,20 +30,26 @@ The inference code is in MATLAB and has no external Caffe dependencies.
versions of MATLAB, this can be done by running `mexcuda
postMAP.cu`. Requires the CUDA toolkit with `nvcc` to be installed.

[model.h5]: http://www.ttic.edu/chakrabarti/mdepth/wts.caffemodel.h5
[proj]: http://www.ttic.edu/chakrabarti/mdepth/

### Usage

First, you will need to load the network weights from the model file as:
First, you will need to load the network weights from the model file
as:

```>>> net = load('/path/to/wts.caffemodel.h5');```
```>>> net = load('/path/to/mdepth.caffemodel.h5');```

Then given a floating-point RGB image `img`, normalized to `[0,1]`, estimate the corresponding depth map as:
Then given a floating-point RGB image `img`, normalized to `[0,1]`,
estimate the corresponding depth map as:

```>>> Z = mdepth(img,net);```

Note that we expect `img` to be of size `561x427`, which corresponds to the axis aligned crops in the NYU dataset where there is a valid depth map projection. You can recover these as: `img = imgOrig(45:471, 41:601, :)`.
Note that we expect `img` to be of size `561x427`, which corresponds
to the axis aligned crops in the NYU dataset where there is a valid
depth map projection. You can recover these as:
`img = imgOrig(45:471, 41:601, :)`.
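
Putting this together, a minimal end-to-end sketch (the image file name
is a hypothetical placeholder; `imgOrig` is assumed to be a full
480x640 NYUv2 color frame):

```
% Load network weights and estimate depth for one NYUv2 frame
net = load('/path/to/mdepth.caffemodel.h5');

imgOrig = imread('scene.png');                % hypothetical 480x640 frame
img = im2double(imgOrig(45:471, 41:601, :));  % crop to valid area, scale to [0,1]

Z = mdepth(img, net);                         % estimated depth map
imagesc(Z); axis image; colorbar;             % quick visualization
```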

### Training with Caffe

Training code will be released soon.
See the `training/` directory for code and instructions for training
your own network.
68 changes: 66 additions & 2 deletions doForward.m
@@ -11,6 +11,8 @@
%-- Ayan Chakrabarti <ayanc@ttic.edu>
function act = doForward(img,net)

glob = doVGG(img,net);

img = gpuArray(single(img));
act = img*2-1;

@@ -29,7 +31,7 @@

if i > 1
if size(act,3) < size(l{1},3)
act = cat(3,act,net.glob);
act = cat(3,act,glob); clear glob;
end;
end;

@@ -39,7 +41,65 @@
fprintf('\n');
act = reshape(act,[size(act,1) size(act,2) net.numk net.nbins]);

%%%%%%%%%%%%%%%%%%%%
% Do VGG forward pass
function glob = doVGG(img,net)

img = double(img);

% Crop the central 384x512 region for the VGG-19 global path
img = img(22:end-22,25:end-25,:);

% Convert to Caffe's expected input: width-major layout, BGR channel
% order, [0,255] range, with the ImageNet channel means subtracted
img = permute(img,[2 1 3]); img = img(:,:,end:-1:1);
img = img*255;
img = bsxfun(@minus,img, ...
       reshape([103.939 116.779 123.68],[1 1 3]));

act = gpuArray(single(img));


% Do all the conv layers
idx = 1;
for i = 1:length(net.vconvs)
for j = 1:net.vconvs(i)
fprintf('\r--- Layer %d,%d ',i,j);
l = net.vlayers{idx}; idx = idx+1;

pad = (size(l{1},1)-1)/2;
if pad > 0
act = padarray(act,[pad pad],0,'both');
end;
act = vConv(act,l{1},l{2},1,1);
end;
% 2x2 max pooling after each conv block
act0 = max(act(1:2:end,:,:),act(2:2:end,:,:));
act = max(act0(:,1:2:end,:),act0(:,2:2:end,:));
end;
fprintf('\n');

% Final 2x2 average pooling, then flatten to a vector
act0 = act(1:2:end,1:2:end,:)+act(1:2:end,2:2:end,:)+...
act(2:2:end,1:2:end,:)+act(2:2:end,2:2:end,:);
act = act0(:)/4;

% Two fully-connected layers: ReLU on the first, linear on the second
act = max(0,net.vgg_fc1{1}*act + net.vgg_fc1{2});

act = net.vgg_gfp{1}*act + net.vgg_gfp{2};

% Reshape the global feature vector into a coarse bw x bh grid
bw = net.gsz(1); bh = net.gsz(2);
fac = net.gsz(4); nUnits = net.gsz(3);

act = reshape(act,[bw bh nUnits]);
act = permute(act,[2 1 3]);

% Upsample each unit bilinearly by a factor of fac (log2(fac)
% doublings via interp2), then crop to the 427x561 output size
cx = (bw-1)*fac+1; cx = (cx-561)/2;
cy = (bh-1)*fac+1; cy = (cy-427)/2;

glob = zeros([427,561,nUnits],'single','gpuArray');
for i = 1:nUnits
us = interp2(act(:,:,i),log2(fac));
glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
end;


%%%%%%%%%%%%%%%%%%%%
% Conv layer forward
function out = vConv(in,wts,bias,dil,relu)

% Define a global variable MAX_SPACE to adjust memory usage.
@@ -52,6 +112,8 @@
[H,W,C] = size(in);
[K1,K2,~,C2] = size(wts);

wts = gpuArray(single(wts)); bias = gpuArray(single(bias));

% Check if it's simply a 1x1 conv
if K1 == 1 && K2 == 1
in = reshape(in,[H*W C]);
@@ -62,6 +124,7 @@
if relu == 1
out = max(0,out);
end;
clear wts bias
return
end;

@@ -92,4 +155,5 @@
out = reshape(out,[(H-K1eq+1) (W-K2eq+1) C2]);
if relu == 1
out = max(0,out);
end;
end;
clear wts bias
52 changes: 34 additions & 18 deletions loadModel.m
@@ -13,6 +13,7 @@
% Build struct with all details
net = struct;

% Get filters and bin centers
k = squeeze(h5read(mh5,'/data/derFilt/0'));
net.numk = size(k,3);
k = k(end:-1:1,end:-1:1,:);
@@ -27,6 +28,7 @@
scales = reshape(scales,[1 1 net.numk]);
net.k = bsxfun(@times,k,scales);

% Set up local path
net.layers = {}; rsize = 1;
for i = 1:length(layers)
l = layers{i};
@@ -41,35 +43,49 @@

net.rsize = rsize;

%Global tensor
tmp=h5read(mh5,'/data/gusamp/0');
fac = size(tmp,1); fac = (fac+1) * 4;
% Set up VGG-19 path

b_w = ceil(560/fac)+1;
b_h = ceil(426/fac)+1;

gfip = h5read(mh5,'/data/gfip0/0');
nUnits = prod(size(gfip))/b_w/b_h;
net.vconvs = [2 2 4 4 4];
net.vlayers = {};
for i = 1:length(net.vconvs)
for j = 1:net.vconvs(i)
w = h5read(mh5,sprintf('/data/conv%d_%d/0',i,j));
b = h5read(mh5,sprintf('/data/conv%d_%d/1',i,j));
net.vlayers{end+1} = {w,b};
end;
end;

gfip = reshape(gfip,[b_w b_h nUnits]);
gfip = permute(gfip,[2 1 3]);
w = h5read(mh5,'/data/vgg_fc1/0');
b = h5read(mh5,'/data/vgg_fc1/1');
net.vgg_fc1 = {w',b};

cx = (b_w-1)*fac+1; cx = (cx-561)/2;
cy = (b_h-1)*fac+1; cy = (cy-427)/2;
w = h5read(mh5,'/data/vgg_fc2/0');
b = h5read(mh5,'/data/vgg_fc2/1');
net.vgg_gfp = {w',b};

net.glob = zeros([427,561,nUnits],'single');
for i = 1:nUnits
us = interp2(gfip(:,:,i),log2(fac));
net.glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
end;
fac = 32;
bw = ceil(560/fac)+1; bh = ceil(426/fac)+1;
nUnits = length(b)/bw/bh;
net.gsz = [bw bh nUnits fac];

% Move everything to gpu
% (disabled branch: weights are instead moved to the GPU as needed
% at inference time)
if 1 > 2
for i = 1:length(net.layers)
net.layers{i}{1} = gpuArray(single(net.layers{i}{1}));
net.layers{i}{2} = gpuArray(single(net.layers{i}{2}));
end;
net.glob = gpuArray(single(net.glob));

for i = 1:length(net.vlayers)
net.vlayers{i}{1} = gpuArray(single(net.vlayers{i}{1}));
net.vlayers{i}{2} = gpuArray(single(net.vlayers{i}{2}));
end;

net.vgg_fc1{1} = gpuArray(single(net.vgg_fc1{1}));
net.vgg_fc1{2} = gpuArray(single(net.vgg_fc1{2}));

net.vgg_gfp{1} = gpuArray(single(net.vgg_gfp{1}));
net.vgg_gfp{2} = gpuArray(single(net.vgg_gfp{2}));
end;
%%%% Precompute things for consensus

%%% Choose regularizer
87 changes: 87 additions & 0 deletions training/README.md
@@ -0,0 +1,87 @@
# Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
Copyright (C) 2016, Authors.

This directory contains code and instructions for training the local prediction
network using the [Caffe](https://github.com/BVLC/caffe) framework.

The primary network definition is in the file `train.prototxt` in this directory.
In addition to the prediction network, we also use existing Caffe layers to
compute depth derivatives, and generate classification targets for each depth
map on the fly.

## Custom Layers

Our network employs two custom layers, included in the `layers/` sub-directory.

1. The first is simply a Python data layer in `layers/NYUdata.py`, which handles
loading training data from the NYUv2 dataset (details on how to prepare the
data are in the next section). Make sure you compile Caffe with Python layers
enabled, and place the above file in the current directory or somewhere
in your `PYTHONPATH`.

2. The second layer is the SoftMax + KL-Divergence loss layer. You will need to
compile this into Caffe: copy the header file `softmax_kld_loss_layer.hpp`
into the `include/caffe/layers/` directory of your Caffe distribution, and the
`softmax_kld_loss_layer.c*` files into the `src/caffe/layers/` directory.
Then run `make` to recompile Caffe (see the sketch after this list).
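
A minimal MATLAB sketch of step 2 (the Caffe checkout path is a
hypothetical placeholder, and we assume the `.c*` glob covers the `.cpp`
and `.cu` source files):

```
% Copy the custom loss layer into a Caffe checkout (path is a placeholder)
caffe_root = '/path/to/caffe';
copyfile('layers/softmax_kld_loss_layer.hpp', ...
fullfile(caffe_root,'include/caffe/layers/'));
copyfile('layers/softmax_kld_loss_layer.cpp', ...
fullfile(caffe_root,'src/caffe/layers/'));
copyfile('layers/softmax_kld_loss_layer.cu', ...
fullfile(caffe_root,'src/caffe/layers/'));
% Then run `make` in caffe_root to rebuild Caffe
```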

## Preparing NYUv2 Data

Download the RAW distribution and toolbox from the [NYUv2 depth dataset
page](http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html). Read
the documentation to figure out how to process the RAW data to
create aligned RGB-depth image pairs, and to *fill in* missing depth
values. Also, make sure you only use scenes corresponding to the training
set in the official train-test split.

For each scene, generate a pair of PNG files to store the RGB and depth data
respectively. These should be named with a common base name and different
suffixes: `_i.png` for the 8-bit 3-channel PNG corresponding to the
RGB image, and `_f.png` for a 16-bit 1-channel PNG corresponding to
depth. The depth PNG should be scaled so that the max UINT16 value
(2^16-1) corresponds to a depth of 10 meters.

All images should be of size 561x427, corresponding to the valid projection
area (you can use the `crop_image` function in the NYU toolbox). If you
decide to train on a different dataset, you might need to edit the data layer
and the network architecture to work with different resolution images.
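
A minimal MATLAB sketch of writing one such pair (assuming `rgb` is the
8-bit RGB image and `depth` the corresponding filled-in depth map in
meters, both already at the 561x427 valid-projection size; the base name
is hypothetical):

```
% Write one aligned RGB-depth training pair in the expected format
name = 'scene1_frame005';       % common base name (hypothetical)
imwrite(rgb, [name '_i.png']);  % 8-bit, 3-channel RGB

% Scale so the max UINT16 value (2^16-1) corresponds to 10 meters
d16 = uint16(round(depth/10 * double(intmax('uint16'))));
imwrite(d16, [name '_f.png'], 'BitDepth', 16);  % 16-bit, 1-channel depth
```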

Place all pairs you want to use in the same directory and, prior to calling
Caffe, set the environment variable `NYU_DATA_DIR` to its path, e.g., as
`export NYU_DATA_DIR=/pathto/nyu_data_dir`. Then, create a text file called
`train.txt` (and place it in the same directory from which you are calling Caffe).
Each line in this file should correspond to the common prefix for each scene. So,
if you have a line with `scene1_frame005`, then the data layer will read the
files:

```
/pathto/nyu_data_dir/scene1_frame005_i.png
/pathto/nyu_data_dir/scene1_frame005_f.png
```

for the image and depth data respectively.
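
A minimal MATLAB sketch for generating `train.txt` from such a directory
(assuming every RGB image ends in `_i.png`):

```
% Build train.txt by listing the common prefixes of all pairs
ddir = getenv('NYU_DATA_DIR');
files = dir(fullfile(ddir,'*_i.png'));
fid = fopen('train.txt','w');
for i = 1:length(files)
base = files(i).name(1:end-6);  % strip the '_i.png' suffix
fprintf(fid,'%s\n',base);
end;
fclose(fid);
```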


## Training

Use the provided `train.prototxt` file for the network definition, and create a
solver prototxt file with the settings described in the paper (momentum of 0.9,
no weight decay, and the paper's learning rate schedule).

When you begin training, provide the following option to caffe:

```
-weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
```

where `vgg19.caffemodel` is the pre-trained VGG-19 model from the Caffe model
zoo. `filters_init.caffemodel.h5` is provided in this directory, and initializes
the weights of the various layers in `train.prototxt` that compute depth
derivatives, generate mixture weights with respect to the various bins, perform
bilinear up-sampling of the scene features, etc. These layers have a learning
rate factor of 0, and will not change during training. However, they will be
saved with model snapshots, so you only need to provide the above option the
first time you start training.
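
For instance, assuming your solver definition is in a (hypothetically
named) file `solver.prototxt`, the first invocation might look like:

```
caffe train -solver solver.prototxt \
    -weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
```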

Please see the paper for more details, and contact <ayanc@ttic.edu> if you
still have any questions.
