
Improve example ssd #4225

Closed
1 of 6 tasks
zhreshold opened this issue Dec 13, 2016 · 20 comments

Comments

@zhreshold
Member

There are currently several issues with the SSD example. I'm posting here to track progress on improving this example in the nnvm branch.

  • Make sure the current example does converge. This is confirmed (@piiswrong).

  • Add test_score.py to allow automatic checks on future commits.

  • Write a new lr_scheduler, because the initial gradients are not stable since the current vgg16 model has no BatchNorm layer (a rough sketch follows this list).

  • Replace the data loading/augmentation functions with mx.image. After some experiments, I found this to be more important than packing images into a sequential file; it will make training faster with many GPUs.

  • Support rec files as input.

  • Make the caffemodel converter available.
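
To make the lr_scheduler item concrete, here is a minimal sketch of a linear warm-up followed by step decay, assuming the standard mx.lr_scheduler.LRScheduler interface; the warm-up length, learning rates and decay step are placeholders, not tuned values:

```python
import mxnet as mx

class WarmupScheduler(mx.lr_scheduler.LRScheduler):
    """Ramp the learning rate up from a small value, since the vgg16 base has no
    BatchNorm and the first updates are unstable, then fall back to step decay."""
    def __init__(self, warmup_steps=500, warmup_begin_lr=0.0001,
                 base_lr=0.004, step=40000, factor=0.1):
        super(WarmupScheduler, self).__init__(base_lr=base_lr)
        self.warmup_steps = warmup_steps
        self.warmup_begin_lr = warmup_begin_lr
        self.step = step
        self.factor = factor

    def __call__(self, num_update):
        if num_update < self.warmup_steps:
            # linear warm-up from warmup_begin_lr to base_lr
            frac = float(num_update) / self.warmup_steps
            return self.warmup_begin_lr + frac * (self.base_lr - self.warmup_begin_lr)
        # plain step decay afterwards
        return self.base_lr * (self.factor ** (num_update // self.step))
```

It would be passed to the optimizer through its lr_scheduler argument, just like the built-in FactorScheduler.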

Any suggestion is very welcome. I will keep this updated.

@piiswrong
Contributor

piiswrong commented Dec 13, 2016

Just to confirm: do you mean it converges on the nnvm branch?

Glad to know that you find mx.image useful. I'm planning to write some tutorials on that. Are you interested in jumping in?

@zhreshold
Member Author

Yes, I mean the nnvm branch. And I'm definitely interested in the tutorial, @piiswrong.

@howard0su
Contributor

howard0su commented Dec 13, 2016

More suggestions:

  1. Update SSD based on the updated paper. The paper reports a 5% mAP improvement.
    a) update the SSD model
    b) add color distortion
    I have some code changes here: https://github.com/howard0su/mxnet/tree/ssdv3_nnvm but didn't finish some other changes, like the negative mining change.

  2. Add an mAP calculation metric.
    This is very useful.

  3. Support other datasets like KITTI.

  4. Normalize the implementation of DataIter. We need a standard implementation for detection data input, so that we can leverage existing IO iterators and reuse the data augmenter code as well.

@zhreshold
Member Author

@howard0su Looks good, I will consider them carefully; 4 in particular is the one I'm thinking about. Detection problems should reuse the same basics, and that could benefit all existing and future projects.

@howard0su
Contributor

@zhreshold Can you propose a design? I can spare some time to help as well.

@zhreshold
Member Author

I think mx.image.ImageIter could be a very good starting point for unifying the DataIter interface for object detection problems.
The differences/difficulties are:

  1. Label width varies from image to image because the number of objects varies. This has to be solved by padding or special processing before loading the labels, so rec files must be prepared accordingly; I think it's better to unify this behavior across tasks.

  2. Data augmentations such as lighting/color jitter/color normalization can be reused from the current functions; however, anything related to spatial transforms must be handled differently: the augmenter must take in the label as well, since cropping/flipping the image will result in different labels.

  3. As a result of 2, the label format for object detection tasks should be fixed, so we can always reuse the augmenter functions. Essentially we need labels in a format like this for each image (see the sketch after this list):
    (im_width, im_height) - required for methods using non-fixed-size inputs (fast(er)-rcnn, etc.)
    (object_id, xmin, ymin, xmax, ymax) x N - proportional or absolute bounding boxes
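
An illustrative sketch of points 1–3; the -1 padding convention, the max_objects value and the function names are assumptions for the example, not anything settled in this thread:

```python
import mxnet as mx
import numpy as np

def pad_label(boxes, max_objects=56):
    """boxes: (N, 5) array of [object_id, xmin, ymin, xmax, ymax] with proportional
    coordinates in [0, 1]. Pad with -1 rows so every image has the same label width."""
    padded = np.full((max_objects, 5), -1.0, dtype=np.float32)
    padded[:boxes.shape[0], :] = boxes     # assumes N <= max_objects
    return padded

def flip_aug(src, label, p=0.5):
    """Spatial augmenter that must see the label: flip the image horizontally
    and mirror the x-coordinates of every non-padding box."""
    if np.random.rand() < p:
        src = mx.nd.flip(src, axis=1)            # (H, W, C): flip along width
        valid = label[:, 0] >= 0
        old_xmin = label[valid, 1].copy()
        label[valid, 1] = 1.0 - label[valid, 3]  # new xmin = 1 - old xmax
        label[valid, 3] = 1.0 - old_xmin         # new xmax = 1 - old xmin
    return src, label
```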

Just wondering if you guys ever had plans or ideas like this? @piiswrong @sxjscience @precedenceguo

@piiswrong
Contributor

  1. You can pack an array as the label into the rec file, and each record can have a different label length (see the sketch after this list).
  2. The crop func etc. return the transformed image along with the crop coordinates, so you can write a wrapper that transforms the label.
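
A small sketch of point 1 using mx.recordio; the file names and boxes are made up for illustration:

```python
import cv2
import numpy as np
import mxnet as mx

record = mx.recordio.MXRecordIO('train.rec', 'w')
img = cv2.imread('000001.jpg')

# flat, variable-length label: [object_id, xmin, ymin, xmax, ymax] per object
label = np.array([[15, 0.12, 0.30, 0.45, 0.88],
                  [ 7, 0.50, 0.10, 0.95, 0.60]], dtype=np.float32).flatten()

# pack_img stores the label array in the record header, so different records
# can carry different numbers of boxes
header = mx.recordio.IRHeader(flag=0, label=label, id=1, id2=0)
record.write(mx.recordio.pack_img(header, img, quality=95, img_fmt='.jpg'))
record.close()
```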

@howard0su
Contributor

Regarding data augmentations, are you proposing that the current spatial transform iterators also support "label" data as a vector of bounding boxes? Another possible solution is exposing the transform information through another output variable, and building an iterator that consumes those variables to transform the bounding boxes.

@ijkguo
Contributor

ijkguo commented Dec 19, 2016

The results we have now:

  • mx.image is useful for speeding up training IO

The problems:

  • how to pack detection images into rec
  • how to address label transformation

From weiliu-ssd we can learn:

  • lmdb can be used in detection -> rec is possible
  • complex augmentation is necessary in SSD -> maybe also useful for other methods -> generic detection IO is meaningful

@santoshmo
Contributor

Adding a deconvolutional module to the current SSD would help as well:
https://arxiv.org/pdf/1701.06659v1

It achieves 80.1 mAP on the VOC 2007 test set and 33.2 mAP on COCO without sacrificing too much speed. A simple modification of the existing feature extractor plus deconvolutional network operations added at the end of the architecture should improve the SSD.

(screenshot attached)
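
A rough sketch of the deconvolution-module idea: upsample a deep, coarse feature map and fuse it with a shallower, finer one before prediction. Filter counts and kernel sizes are placeholders, not the paper's exact configuration; the element-wise product fusion loosely follows the DSSD paper:

```python
import mxnet as mx

def deconv_fusion(deep, shallow, num_filter=256):
    """Fuse a coarse feature map (deep) with a finer one (shallow) of twice the
    spatial size, DSSD-style."""
    up = mx.sym.Deconvolution(data=deep, kernel=(2, 2), stride=(2, 2),
                              num_filter=num_filter)          # 2x upsampling
    up = mx.sym.BatchNorm(data=mx.sym.Convolution(data=up, kernel=(3, 3),
                                                  pad=(1, 1), num_filter=num_filter))
    lat = mx.sym.BatchNorm(data=mx.sym.Convolution(data=shallow, kernel=(3, 3),
                                                   pad=(1, 1), num_filter=num_filter))
    fused = up * lat                                          # element-wise product
    return mx.sym.Activation(data=fused, act_type='relu')
```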

@zhreshold
Member Author

I'm testing a new iterator that allows extensive data augmentation with better speed and a better API; after that I will try to write multiple symbols to cover many variations, including this one.

@piiswrong
Contributor

piiswrong commented Feb 2, 2017 via email

@zhreshold
Member Author

@piiswrong Can you have a look at this: https://github.com/zhreshold/mxnet/blob/ssd2/python/mxnet/image.py? It's not fully finished yet.
For rcnn, overriding ImageDetIter.reshape and ImageDetIter.next should be good enough for handling mini-batches.
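
For what it's worth, a rough sketch of that idea, assuming ImageDetIter exposes reshape(data_shape=...) and next() as in the linked image.py; the class name and candidate shapes are made up:

```python
import random
import mxnet as mx

class RandShapeDetIter(mx.image.ImageDetIter):
    """Pick a new input resolution per batch, e.g. for rcnn-style training."""
    def __init__(self, candidate_shapes, *args, **kwargs):
        super(RandShapeDetIter, self).__init__(*args, **kwargs)
        self.candidate_shapes = candidate_shapes   # e.g. [(3, 480, 480), (3, 600, 600)]

    def next(self):
        # switch the target data shape before producing the batch
        self.reshape(data_shape=random.choice(self.candidate_shapes))
        return super(RandShapeDetIter, self).next()
```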

@piiswrong
Contributor

piiswrong commented Feb 2, 2017

@andreaolgiati
Contributor

If I read this correctly, line 518 adds support for variable-length label lists. That's awesome!

@zhreshold
Member Author

@piiswrong
I still have concerns about the performance of the augmenters on the Python end. Here are some tests with the SSD example on a single GPU (Titan X) and a relatively weak CPU (E5, 4c/4t, 2.8 GHz):

(Benchmark table: throughput in sample/s for different on/off combinations of the cast, mean, bright, contrast, saturation, pca_noise, mirror, rand_crop and rand_pad augmenters, ranging from 16.9 sample/s with everything enabled up to 31.6 sample/s with most augmenters disabled.)

So the brightness + contrast + saturation + pca_noise augmenters can impact performance a lot, even more than random cropping and padding (for detection), which is a surprise to me.
I also tried threading, and it provides no gain, possibly due to the GIL.
Any suggestion?

@piiswrong
Contributor

piiswrong commented Feb 6, 2017 via email

@zhreshold
Member Author

Well, I forgot to do so 😓. Instant climb from 16 sample/s to 29.8 sample/s. Thanks!

@andreaolgiati
Contributor

It might take some work, but I'd also look into using multiprocessing. I have found the GIL to be a big pain in the past.
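
For example, a very rough sketch of that direction; the toy augmenter and data are placeholders, not code from this example:

```python
import multiprocessing as mp
import numpy as np

def augment(img):
    # stand-in for the expensive Python-side augmentation
    alpha = 1.0 + np.random.uniform(-0.3, 0.3)
    return np.clip(img * alpha, 0, 255)

if __name__ == '__main__':
    images = [np.random.randint(0, 255, (300, 300, 3)).astype(np.float32)
              for _ in range(32)]
    # worker processes each run their own interpreter, so the GIL no longer
    # serializes the augmentation work
    with mp.Pool(processes=4) as pool:
        batch = pool.map(augment, images)
```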

@zhreshold
Member Author

@andreaolgiati I was thinking about multiprocessing as well. However, if pushing the time-consuming work into the mxnet engine works well, there's no reason to do so.
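
To illustrate what pushing the work into the mxnet engine can look like: keep the augmenter in NDArray arithmetic, so the Python call only enqueues asynchronous operations on the backend engine instead of blocking in numpy. This is a hedged sketch modeled on a brightness/saturation jitter, not the exact code from the example:

```python
import random
import mxnet as mx

def color_jitter(src, brightness=0.3, saturation=0.3):
    """src: mx.nd.NDArray image of shape (H, W, C), float32."""
    alpha = 1.0 + random.uniform(-brightness, brightness)
    src = src * alpha                                     # queued on the engine, returns immediately

    coef = mx.nd.array([[[0.299, 0.587, 0.114]]])         # luminance weights
    gray = mx.nd.sum(src * coef, axis=2, keepdims=True)   # still asynchronous
    beta = 1.0 + random.uniform(-saturation, saturation)
    return src * beta + gray * (1.0 - beta)
```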
