How to use mxnet for image segmentation training? #337
Comments
The Caffe / Mocha.jl way of handling this is to allow "multiple dimensions" in the softmax loss layer. For example, the label could be (using Python's row-based ordering)
For image segmentation, I think this CRF-as-RNN Caffe code would be a very nice one to incorporate into MXNet, and it would serve as a cool demo. I might consider doing it when I can squeeze out some time, but that does not look likely very soon. So @tornadomeet, if you are working on segmentation, you could probably add this. |
BTW: just as a side comment since our current
The following are just some minor thoughts on the general design:
This essentially allows us to rename the label variable to whatever we want, as in the
|
@pluskid I agree with your idea that softmax should be solely a softmax transformation. |
Care is needed to make a truly valid softmax operator, though. Being able to backpropagate the right gradient through any composition after softmax requires the log-probability instead of the probability, for numerical stability. A better approach might be to provide a log-softmax operator that allows arbitrary composition, or simply to restrict softmax so that it can only be composed with SoftmaxLoss. The current decision was made so that things work well in the restricted case, and it should be changed. |
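To illustrate the numerical-stability point above, here is a minimal sketch in plain Python (not MXNet code). The function names `log_softmax`, `naive_log_softmax` and `naive_fails` are made up for this example. It compares a log-softmax computed with the usual max-subtraction trick against taking the log of a separately computed softmax.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    # log-sum-exp with the maximum subtracted, so exp() never overflows
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def naive_log_softmax(scores):
    """Softmax first, log second: breaks down for large-magnitude scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def naive_fails(scores):
    """Return True if the naive version raises on these scores."""
    try:
        naive_log_softmax(scores)
    except (OverflowError, ValueError):
        return True
    return False

scores = [1000.0, 0.0, -1000.0]
stable = log_softmax(scores)   # finite log-probabilities
broken = naive_fails(scores)   # the naive route overflows in exp(1000.0)
```

With scores this extreme the stable version still returns finite log-probabilities, while the naive version cannot even evaluate `exp(1000.0)`. For moderate, bounded scores the two agree closely.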
@tqchen We should refactor soon after we fix the RNN stuff, for two reasons: 1. compatibility with cuDNN; 2. better support for different losses on softmax. |
Hi all, there is quite a lot of work that needs to be done for the segmentation task. The multi-softmax loss for segmentation is only one issue. However, there are several other ops that need to be implemented:
The part of most concern is the UpSampling layer. It may not be easy to implement in the current mshadow framework. |
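For readers unfamiliar with what an up-sampling op has to do, here is a minimal plain-Python sketch of nearest-neighbour up-sampling and its backward pass. It is only an illustration: the function names are hypothetical, it is not mshadow or MXNet code, and FCN-style models often use a learned or bilinear deconvolution instead.

```python
def upsample_nearest(image, scale):
    """Forward: repeat every pixel of a 2-D list-of-lists scale x scale times."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out

def upsample_nearest_backward(grad_out, scale):
    """Backward: each input pixel sums the gradients of the outputs it produced."""
    height = len(grad_out) // scale
    width = len(grad_out[0]) // scale
    return [
        [
            sum(
                grad_out[i * scale + di][j * scale + dj]
                for di in range(scale)
                for dj in range(scale)
            )
            for j in range(width)
        ]
        for i in range(height)
    ]

up = upsample_nearest([[1, 2], [3, 4]], 2)
grad_in = upsample_nearest_backward([[1] * 4 for _ in range(4)], 2)
```

The backward pass is a sum over each `scale x scale` block, because every input pixel is copied into exactly that many output positions.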
|
@tqchen Sure, I'm glad to help on this, but let me get MXNet.jl into good shape first. We might need an official TODO list somewhere for the whole project so that we do not forget things. @winstywang I heard from a friend working on image segmentation saying that
I have not personally tried to implement and verify this statement, but I think it might be worth keeping in mind. If it is true, it will definitely save everybody a lot of extra work. |
@winstywang What do you mean by the IO part? If our softmax is extended, Caffe-style, to support evaluation at each pixel, then the whole pipeline is still single-input single-output (though multi-dimensional). It seems the current pipeline fits, or does the current image-records IO only support single-value labels? |
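A softmax loss evaluated at each pixel can be sketched in plain Python as below. This is an illustration only, not the MXNet or Caffe implementation. The layout `scores[class][y][x]` with an integer label map `labels[y][x]` is an assumption chosen for the example, and so is averaging the loss over pixels.

```python
import math

def pixelwise_softmax_loss(scores, labels):
    """Mean cross-entropy over all pixels.

    scores: list indexed as scores[c][y][x] (raw class scores per pixel)
    labels: list indexed as labels[y][x] (integer class index per pixel)
    """
    num_classes = len(scores)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            logits = [scores[c][y][x] for c in range(num_classes)]
            m = max(logits)
            # stable log-sum-exp over the class axis at this pixel
            lse = m + math.log(sum(math.exp(v - m) for v in logits))
            total += lse - logits[labels[y][x]]
    return total / (height * width)

# 3 classes on a 2x2 image with all-zero scores: every pixel is uniform
zero_scores = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
label_map = [[0, 1], [2, 0]]
loss = pixelwise_softmax_loss(zero_scores, label_map)
```

The input is still one array and the label is still one array; only their dimensionality differs from the single-label classification case.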
@tqchen I agree that combined softmax and logistic loss is much more numerically stable. The compromise in Caffe/Mocha.jl is to provide a In fact, doing them separately might not be as bad as we think in practice, if you do softmax carefully. I did some experiments quite a long time ago (scroll down the page for the last figure). It seems that the discrepancy is within a reasonable range for relatively bounded inputs even for |
Thanks all for the discussion, I learned a lot. |
The multi label softmax is merged in #387 |
@tqchen Yeah, I see it! Thanks. |
The current state of the art on the VOC 2012 segmentation competition leaderboard is the Deep Parsing Network [1]. Operations not yet implemented include padded convolution filter copy initialization, up-sampling, local convolution and channels block min pooling. [1] Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, and Xiaoou Tang. Semantic Image Segmentation via Deep Parsing Network. ICCV 2015. |
@HyeonwooNoh's entry in the PASCAL VOC segmentation benchmark, POSTECH_DeconvNet_CRF_VOC, achieved 74.8% average precision. In his implementation, the pixel-wise classification loss RedSoftmaxWithLossLayer, EltwiseAccuracyLayer and RedAccuracyLayer are related to segmentation. The C++ seg data layers would be better replaced by Python layers like BVLC/caffe#1698 (comment) and #554 (comment), which would be easier to implement. [1] Hyeonwoo Noh, Seunghoon Hong, Bohyung Han. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015. |
@futurely |
wow, the authors are here :) |
Is it possible to assign different weights to different ground-truth labels? The weight can be set to 1/f_i, where f_i is the frequency of the i-th label in the training set, to deal with the unbalanced number of training samples across classes. |
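The 1/f_i weighting described above can be sketched in plain Python as follows. The function names are hypothetical, and normalising the weighted loss by the sum of the weights is an assumption made for this example, not something specified in the thread.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight for class i is 1 / f_i, with f_i = count_i / total."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in counts.items()}

def weighted_nll(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs[k][c] is the predicted probability of class c for sample k.
    Normalising by the sum of weights is an assumption of this sketch.
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

train_labels = [0, 0, 0, 1]            # class 1 is rare
weights = inverse_frequency_weights(train_labels)
uniform = [[0.5, 0.5]] * len(train_labels)
loss = weighted_nll(uniform, train_labels, weights)
```

Here the rare class gets weight 4 and the common class gets 4/3, so each class contributes the same total weight to the loss.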
In "Learning Deconvolution Network for Semantic Segmentation", "for both datasets, we maintain the balance for the number of examples across classes by adding redundant examples for the classes with limited number of examples." |
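The quoted sentence does not say how the redundant examples were chosen. One simple way to balance classes by duplication is sketched below in plain Python, purely as an assumption: it treats each example as having a single class label and repeats minority-class examples in order until every class matches the largest one.

```python
from collections import defaultdict
from itertools import cycle, islice

def oversample_to_balance(examples):
    """examples: list of (item, label) pairs with sortable labels.

    Returns a new list in which every class has as many entries as the
    largest class, reusing minority-class examples in round-robin order.
    """
    by_class = defaultdict(list)
    for item, label in examples:
        by_class[label].append((item, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label in sorted(by_class):
        balanced.extend(islice(cycle(by_class[label]), target))
    return balanced

data = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]
balanced = oversample_to_balance(data)
```

In practice the result would be shuffled before training. The sketch keeps a deterministic order so the behaviour is easy to inspect.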
In more traditional machine learning algorithms, weighted samples are not unusual. But long-tail samples sometimes hurt performance in DL experiments [2]. It's doubtful whether duplicating them will be of any help. [2] Erjin Zhou, Zhimin Cao, Qi Yin: Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not? CoRR abs/1501.04690 (2015). |
As you can see on the PASCAL VOC 2012 segmentation leaderboard, most top-performing submissions used the Microsoft COCO dataset to further boost performance. The observation is very similar to what was discovered in "Naive-Deep Face Recognition": big data is more effective than complicated algorithms. |
I will work on the segmentation task starting next week. I think the first step is to implement the basic FCN model instead of the second model. |
@playerkk Class weighting is implemented by the authors of [3][4]. But the experimental result on the PASCAL VOC 2012 segmentation dataset is not very competitive with DeconvNet [1] and CRF-RNN [5], although its network architecture seems to be very similar to DeconvNet. It would be instructive to get some explanations from @alexgkendall. [3] Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680, 2015. http://arxiv.org/abs/1511.02680 |
Hi - Our work [3][4] differs from [1] in class weighting, as you pointed out. Another major difference is the much more efficient parametrisation in SegNet [4], which is an order of magnitude faster to run and can be trained end to end in one step. I believe DeConvNet [1] uses stage-wise training and multiple region proposals at inference time. We found class weighting to be very important for scene understanding tasks. If you are more interested in datasets such as SUN or CamVid then I'd recommend implementing it. Cheers. |
Thanks all for your replies. I am working on predicting pixel-wise labels, not necessarily image segmentation. The distributions of labels are highly unbalanced. Experimental results based on MatConvNet indicate that a weighted loss is quite helpful. MatConvNet, however, is very slow, so I am considering switching to another package. |
Cool, SegNet can do that and it is built on Caffe which is pretty fast - you might find this tutorial helpful. |
Once #640 is completed, most of the published image segmentation models that are based on FCN can be replicated without much difficulty in MXNet. Maybe most pre-trained models can even be imported directly. |
Actually, convolutional autoencoder networks don't have to use the pooling and unpooling layers. Simply stacking several conv layers and then a few deconv layers together is alright. Just for a little more fun, they can be arbitrarily interleaved! |
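A quick way to check that a stack of conv layers followed by deconv layers returns to the input resolution is to trace the spatial size through the standard output-size formulas (no dilation, no output padding). The kernel, stride and padding values below are illustrative choices, not taken from any model in this thread.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * pad + kernel

size = 64
trace = [size]
for _ in range(3):                      # three stride-2 conv layers
    size = conv_out(size, kernel=4, stride=2, pad=1)
    trace.append(size)
for _ in range(3):                      # three stride-2 deconv layers
    size = deconv_out(size, kernel=4, stride=2, pad=1)
    trace.append(size)
# trace: 64 -> 32 -> 16 -> 8 -> 16 -> 32 -> 64
```

With these settings each conv halves the resolution and each deconv doubles it, so no pooling or unpooling layer is needed to get back to the input size.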
@tornadomeet Congrats on making it work, I'm going to close this issue now. It would be great if you could PR the example back. To others, the example is available at https://github.com/tornadomeet/mxnet/tree/seg/example/fcn-xs |
@tornadomeet Hi, I'm confused about how to make my own training data. Does im2rec.py support making .rec files when the labels are images as well? How should its arguments be set? |
If my data resides in AWS S3, how should I modify the code? Passing s3://bucketname/... doesn't seem to work. |
In image segmentation, if one image has N pixels, the number of labels is also N (not one), so the Softmax operator in mxnet can't handle it (just my opinion).
I want to solve it as follows:
add a new operator (e.g., softmaxseg-inl.h, softmaxseg.cu, softmaxseg.cc) in the src/operator directory; SoftmaxSeg would handle the forward pass, backward pass and loss calculation for image segmentation.
change the format of the image list file to this: integer_image_index \t label.jpg \t data.jpg (each pixel in label.jpg stands for its class label).
change the code of iter_image_recordio.cc so that class ImageLabelMap can read label.jpg and store its values in label_.
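The list-file format proposed above (index, label image path and data image path, separated by tabs) could be read with a few lines of plain Python. This is only a sketch of the proposal: `parse_seg_list` is a hypothetical helper, not part of MXNet or im2rec.py, and the file names are made up.

```python
def parse_seg_list(lines):
    """Parse 'index<TAB>label_image<TAB>data_image' lines into tuples."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue                    # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, "
                f"got {len(fields)}"
            )
        index, label_path, data_path = fields
        records.append((int(index), label_path.strip(), data_path.strip()))
    return records

sample = [
    "0\tlabels/a.jpg\timages/a.jpg\n",
    "\n",
    "1\tlabels/b.jpg\timages/b.jpg",
]
records = parse_seg_list(sample)
```

One caveat about the proposed format: JPEG compression is lossy, so class indices stored as pixel values in a .jpg label image may be altered. A lossless format such as PNG is usually safer for label images.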
I am a beginner with mxnet. How should we use mxnet for image segmentation training? Can the approach above solve it, or is there a better solution?
Please give some advice, thanks.