
Make a matrix output and ground truth example (segmentation, sliding window detection, etc.) #1698

Closed
shelhamer opened this issue Jan 8, 2015 · 17 comments


@shelhamer
Member

Caffe is perfectly happy with models that make matrix outputs and learn from matrix ground truths for problems where the output and truth have spatial dimensions e.g. reconstruction / de-noising, pixelwise semantic segmentation, sliding window detection, and so forth. The forward and backward passes for these models follow directly from the definitions and Caffe has always been capable of computing these.

However, there isn't yet a bundled example and exactly how to accomplish this is confusing to many new users.
#189 is already solved technically by on-the-fly reshaping #594, instance-wise losses like SOFTMAX_LOSS, EUCLIDEAN_LOSS, SIGMOID_CROSS_ENTROPY_LOSS and so on, and proper data preparation. At the same time, this isn't immediately obvious from the documentation and examples so a walkthrough would do a lot of good.
#308 is technically redundant and does not mesh with the Caffe code but it was put to use in a standalone way and it's good that the code was made available to accompany the tech report.

@ivendrov

I would find this very useful. Echoing #197, a reference OverFeat-like model in Caffe would be fantastic.

@shelhamer
Member Author

To help in the meantime, check out this code sample for generating an LMDB in Python with custom data:

import caffe
import lmdb
import numpy as np
from PIL import Image

in_db = lmdb.open('image-lmdb', map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    # 'inputs' is assumed to be a list of image file paths
    for in_idx, in_ in enumerate(inputs):
        # load image:
        # - as np.uint8 {0, ..., 255}
        # - in BGR (switch from RGB)
        # - in Channel x Height x Width order (switch from H x W x C)
        im = np.array(Image.open(in_)) # or load whatever ndarray you need
        im = im[:,:,::-1]
        im = im.transpose((2,0,1))
        im_dat = caffe.io.array_to_datum(im)
        in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
in_db.close()

While this code makes an image DB, you can likewise make the ground truth DB from any scalar / vector / matrix data by calling caffe.io.array_to_datum. Data with fewer than 4 dimensions needs to be padded with singleton dimensions. Note that the indices are zero-padded to preserve their order: LMDB sorts keys lexicographically, so bare integers as strings will be disordered.
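
To make the singleton-dimension padding and zero-padded keys concrete, here is a numpy-only sketch (the shapes and item count are placeholders; the actual LMDB write would use caffe.io.array_to_datum as above):

```python
import numpy as np

# hypothetical 2-D ground truth, e.g. an H x W segmentation mask
label = np.zeros((4, 6), dtype=np.uint8)

# pad with a singleton channel dimension: H x W -> 1 x H x W,
# the 3-D shape that caffe.io.array_to_datum expects
label_3d = label[np.newaxis, :, :]
print(label_3d.shape)  # (1, 4, 6)

# zero-pad keys so LMDB's lexicographic key order matches insertion order
keys = ['{:0>10d}'.format(i) for i in range(12)]
print(keys[2], keys[11])  # 0000000002 0000000011 -- sorts in numeric order
assert sorted(keys) == keys
```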

@bhack
Contributor

bhack commented Mar 6, 2015

/cc @mtamburrano

@demondan

@shelhamer Sorry, I'm new to Caffe and can't fully understand this. Could you tell me the steps for dealing with matrix ground truth?

@ctrevino

ctrevino commented Jun 5, 2015

@shelhamer you sent this link through the caffe-users group, but it is still a little bit confusing. As far as I understand, 'inputs' should contain the locations of all the images, right?

There is a missing bracket in your code, it should be:

im = im.transpose((2,0,1))

@zizhaozhang

Hi,
How does the speed of this Python code compare with Caffe's convert_imageset tool?

@Linzert

Linzert commented Aug 1, 2015

@shelhamer Hi,
caffe.io.array_to_datum can only take three-dimensional data as input, but the ground truth is two-dimensional. What should I do to solve this?

@shelhamer
Member Author

@Linzert make a singleton dimension for the channels so the groundtruth is 1 x H x W. Please ask usage questions on the caffe-users group.

@Linzert

Linzert commented Aug 3, 2015

@shelhamer OK, I got it. Thanks very much.

@inferrna

@ctrevino @demondan
Based on this example https://github.com/BVLC/caffe/blob/master/examples/01-learning-lenet.ipynb, you just need to change the data-loading step like this:

label = L.Data(batch_size=99, backend=P.Data.LMDB, source='train_label', transform_param=dict(scale=1./255), ntop=1)
data = L.Data(batch_size=99, backend=P.Data.LMDB, source='train_data', transform_param=dict(scale=1./255), ntop=1)
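
For reference, those two NetSpec lines correspond roughly to prototxt like this (a sketch; the source paths, batch size, and scale are taken from the snippet above):

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  transform_param { scale: 0.00392156862745098 }  # 1/255
  data_param {
    source: "train_data"
    batch_size: 99
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  transform_param { scale: 0.00392156862745098 }  # 1/255
  data_param {
    source: "train_label"
    batch_size: 99
    backend: LMDB
  }
}
```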

@liyangliu

Hello @shelhamer,

Sorry to bother you. I am trying to fine-tune your fcn_alexnet on my own dataset, but when I use the GPU I encounter the following problem:

I1128 22:55:19.044678 358 parallel.cpp:395] GPUs pairs 0:1, 2:3, 0:2
I1128 22:55:19.319453 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:19.419453 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:22.330047 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:22.433482 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:25.011587 358 parallel.cpp:238] GPU 2 does not have p2p access to GPU 0
I1128 22:55:25.315258 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:25.440186 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:28.140528 358 parallel.cpp:423] Starting Optimization
I1128 22:55:28.141278 358 solver.cpp:293] Solving FCN-AlexNet-FD
I1128 22:55:28.141311 358 solver.cpp:294] Learning Rate Policy: fixed
I1128 22:55:28.141938 358 solver.cpp:346] Iteration 0, Testing net (#0)
F1128 22:55:28.424314 358 math_functions.cu:123] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
@ 0x7fa222b9212d google::LogMessage::Fail()
@ 0x7fa222b93fcd google::LogMessage::SendToLog()
@ 0x7fa222b91d48 google::LogMessage::Flush()
@ 0x7fa222b9482e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa2232c211a caffe::caffe_gpu_asum<>()
@ 0x7fa2232c4172 caffe::SoftmaxWithLossLayer<>::Forward_gpu()
@ 0x7fa2231d6db2 caffe::Net<>::ForwardFromTo()
@ 0x7fa2231d6ed7 caffe::Net<>::ForwardPrefilled()
@ 0x7fa2231c1ba6 caffe::Solver<>::Test()
@ 0x7fa2231c23be caffe::Solver<>::TestAll()
@ 0x7fa2231c250d caffe::Solver<>::Step()
@ 0x7fa2231c2f15 caffe::Solver<>::Solve()
@ 0x7fa2231ce35e caffe::P2PSync<>::run()
@ 0x40906e train()
@ 0x40671b main
@ 0x7fa222096a40 (unknown)
@ 0x406eb9 _start

I transformed my data to LMDB with your Python code on GitHub and use it as the source.

print("Creating Training Data LMDB File ..... ")
in_db = lmdb.open('../data/FDDB/8_2/TrainFDDB_Data_lmdb', map_size=map_size)
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs_data_train):
        print in_idx
        im = np.array(Image.open(in_))  # or load whatever ndarray you need
        Dtype = im.dtype
        if len(im.shape) == 2:  # replicate grayscale images across 3 channels
            (row, col) = im.shape
            im3 = np.zeros([row, col, 3], Dtype)
            for i in range(3):
                im3[:, :, i] = im
            im = im3
        im = im[:, :, ::-1]  # RGB -> BGR
        im = Image.fromarray(im)
        im = im.resize([Rheight, Rwidth], Image.ANTIALIAS)
        im = np.array(im, Dtype)
        im = im.transpose((2, 0, 1))  # H x W x C -> C x H x W
        im_dat = caffe.io.array_to_datum(im)
        in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
in_db.close()

print("Creating Training Label LMDB File ..... ")
in_db = lmdb.open('../data/FDDB/8_2/TrainFDDB_Label_lmdb', map_size=map_size)
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs_label_train):
        print in_idx
        # in_label = in_[:-40]+'SegmentationClass/'+in_[-15:-3]+'png'
        Dtype = 'uint8'
        L = np.array(Image.open(in_), Dtype)  # or load whatever ndarray you need
        Limg = Image.fromarray(L)
        Limg = Limg.resize([LabelHeight, LabelWidth], Image.NEAREST)  # resize the label to the required size
        L = np.array(Limg, Dtype)
        # L2 = np.zeros([LabelHeight, LabelWidth, 2], Dtype)
        # L2[:, :, 0] = L
        # L2[:, :, 1] = 1 - L
        L = L.reshape(L.shape[0], L.shape[1], 1)
        L = L.transpose((2, 0, 1))  # H x W x 1 -> 1 x H x W
        L_dat = caffe.io.array_to_datum(L)
        in_txn.put('{:0>10d}'.format(in_idx), L_dat.SerializeToString())
in_db.close()

layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 104.00699
    mean_value: 116.66877
    mean_value: 122.67892
  }
  data_param {
    source: "data/Train_Data_lmdb"
    batch_size: 20
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "data/Train_Label_lmdb"
    batch_size: 20
    backend: LMDB
  }
}

The same applies to the test phase.

I created 4 LMDBs: train data, test data, train label, and test label.
My label is a matrix (each element is 0 or 1, uint8, meaning yes or no), saved as PNG images.
The last convolution and deconvolution layers of my net have num_output = 2.

When I use CPU mode it works, but it's too slow. Could you please be so kind as to help me a little? Thanks very much.

I have been looking for the answer for a long time and can't figure it out. I really don't want to bother you, but I have no idea how else to leave you a message on GitHub, sorry.

Thank you.

@shelhamer
Member Author

@liyangliu

CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR is usually an out-of-memory error in disguise. Please follow up on caffe-users to learn more.

https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

@varunagrawal

To help anyone in the future with this, here is an example notebook:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/pascal-multilabel-with-datalayer.ipynb

@jpsquare

@varunagrawal the notebook example code is giving me an error in the lines(In [2]):

sys.path.append("pycaffe/layers") # the datalayers we will use are in this directory.
sys.path.append("pycaffe") # the tools file is in this folder
import tools #this contains some tools that we need

I cannot find the mentioned 'tools' file in the master branch.

@shelhamer
Member Author

Please see the fcn.berkeleyvision.org repo for master compatible FCNs, solver configurations, and scripts for learning, inference, and scoring. This includes how to define the net and python layers for loading inputs and labels which are both 3-D.

Closing, as I think the multilabel example #3471 and an FCN example as requested in #3890 will satisfy this issue. Please post on the caffe-users mailing list with further questions about modeling and usage for labels >1-D.

@ChiZhangRIT

@ctrevino What does 'inputs' look like? How do I put all the images into 'inputs'?

@AlexTS1980

For the label LMDB creation, just load each label with caffe.io.load_image("img.png", False), which returns an image (mask) of size H x W x 1. You'll just have to transpose it to 1 x H x W after that. Also note that this function loads images as RGB, not BGR.
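
A numpy-only sketch of that transpose (the 4 x 6 arrays here are stand-ins for what caffe.io.load_image would actually return):

```python
import numpy as np

# stand-in for caffe.io.load_image("img.png", False), which returns H x W x 1
mask = np.random.rand(4, 6, 1)

# H x W x 1 -> 1 x H x W, the C x H x W layout array_to_datum expects
mask_chw = mask.transpose((2, 0, 1))
print(mask_chw.shape)  # (1, 4, 6)

# for color images loaded as H x W x 3 RGB, flip channels to BGR first
rgb = np.random.rand(4, 6, 3)
bgr = rgb[:, :, ::-1]
assert np.array_equal(bgr[:, :, 0], rgb[:, :, 2])
```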
