
CCCP pooling layer #498

Closed
wants to merge 3 commits into from

Conversation

mavenlin
Contributor

This pull request adds a cascadable cross channel parametric (CCCP) pooling layer.
The output feature maps of this layer are a parametric recombination of the input feature maps.
It is used with a ReLU layer on top of a convolutional layer. Each patch of the convolution input is mapped to its feature vector in the output feature map through a nonlinear function (a multilayer perceptron).
The function is equivalent to a 1x1 convolution. However, if the convolution layer is used for this purpose, it performs an unnecessary im2col operation.
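The equivalence to 1x1 convolution can be sketched in a few lines of NumPy (a minimal illustration; the shapes and names here are hypothetical, not taken from the PR):

```python
import numpy as np

# A CCCP layer recombines the C input channels at every spatial position
# with a shared weight matrix -- exactly what a 1x1 convolution does.
C_in, C_out, H, W = 96, 64, 8, 8
x = np.random.randn(C_in, H, W)
weight = np.random.randn(C_out, C_in)
bias = np.random.randn(C_out)

# 1x1 convolution via an explicit loop over spatial positions
out_conv = np.empty((C_out, H, W))
for i in range(H):
    for j in range(W):
        out_conv[:, i, j] = weight @ x[:, i, j] + bias

# The same computation as a single matrix multiply over flattened pixels.
# No im2col buffer is needed, because each "patch" is just one pixel.
out_mm = (weight @ x.reshape(C_in, H * W) + bias[:, None]).reshape(C_out, H, W)

assert np.allclose(out_conv, out_mm)
```

Since the patch is a single pixel, im2col would only copy the input into an identical buffer, which is the overhead the dedicated layer avoids.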

The CCCP layer is used to implement the idea in this paper: Network In Network, which achieved the best performance on the CIFAR-10 and CIFAR-100 datasets, as shown here.

I chose the long name cascadable cross channel parametric pooling because I wanted the abbreviation to match Союз Советских Социалистических Республик (the Union of Soviet Socialist Republics), which is just cool.

Please don't merge yet, tests and examples will be added soon.

@kloudkl
Contributor

kloudkl commented Jun 17, 2014

@mavenlin, it's cool that you finally open sourced the algorithm. Why didn't you link to this PR in your blog post on CCCP pooling?

I heard from a colleague who has been collaborating with your team that you were not using it in models trained for production environments. Why? Is it too slow compared to vanilla convolution?

@mavenlin
Contributor Author

@kloudkl I've open-sourced it for quite a while in my own forks of caffe and cuda-convnet.
CCCP is not especially computation-intensive compared to vanilla convolution, because you can view it as a 1x1 convolution.
I guess what you heard is that it was not used in last year's ImageNet competition.
I'll be releasing a model for ImageNet that is only 29MB (without fully connected layers the model can be very compact) but performs slightly better (60% accuracy) than AlexNet (it takes about 4~5 days to train on a GTX Titan).

@mavenlin
Contributor Author

@shelhamer CCCP is ready to be merged
My small imagenet model has been uploaded to gist.
https://gist.github.com/mavenlin/d802a5849de39225bcc6

@shelhamer
Member

Great! Please format your model gist for the model zoo as done in Sergey's example:

https://gist.github.com/sergeyk/034c6ac3865563b69e60

It should have a readme.md with commit + gist info, a solver prototxt, and the model prototxt. Include a URL to the model weights in the readme.md front matter if you choose to redistribute them.


@mavenlin
Contributor Author

@shelhamer It seems gist puts the files in alphabetical order; that's why my readme.md file was placed after deploy.prototxt. I removed the deploy file and now it works.

@shelhamer
Member

@mavenlin rather than introduce a whole new layer for this special case of convolution, I have included an optimization in the Caffe convolution layer in #1118.

Once it is merged, please update your NIN definition to use CONV layers instead of CCCP, although of course you can keep the layer names to make their purpose clear. You can include the commit ID in the front matter then too.

Thanks for the inaugural contribution to the model zoo!
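For reference, a CCCP-style layer can be expressed with the stock convolution layer roughly like this (a sketch in the prototxt syntax of the time; the layer names, bottom/top blobs, and num_output here are illustrative, not taken from the NIN model):

```
layers {
  name: "cccp1"        # keep the CCCP-style name to make its purpose clear
  type: CONVOLUTION    # stock convolution layer instead of a dedicated CCCP type
  bottom: "conv1"
  top: "cccp1"
  convolution_param {
    num_output: 96     # illustrative channel count
    kernel_size: 1     # 1x1 kernel: cross-channel recombination, no spatial extent
  }
}
```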

@mavenlin
Contributor Author

@shelhamer this is reasonable. BTW, I wonder if cuDNN can do better here if the num dimension is also parallelized.

@shelhamer
Member

@mavenlin the num dimension is parallelized in our cuDNN integration. Once you swap CONV layers into your model prototxt in place of the CCCP layers, you can time the Caffe and cuDNN implementations by setting the engine flag in the convolution_param.

Please close the PR once the model is updated to signal it's ready for a try. Thanks.
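The engine can be selected per layer; a hypothetical fragment (the num_output value is illustrative):

```
convolution_param {
  num_output: 96
  kernel_size: 1
  engine: CUDNN    # or CAFFE to time the native implementation
}
```

Running `caffe time` on the model with each setting should then give a direct comparison of the two engines.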


@shelhamer
Member

@mavenlin note that you can include the deploy.prototxt too if you just make the model zoo link include the anchor for the readme.md. That is, link to the gist file url instead of only the gist.

@shelhamer
Member

@mavenlin please update your model gist to switch the CCCP type layers to CONV for BVLC/caffe compatibility. I know there is interest in using your model. Thanks!

@mavenlin
Contributor Author

mavenlin commented Oct 1, 2014

@shelhamer Sorry for leaving this pending for so long. I've updated the prototxt. I'm currently overseas; I'll update the model once I get back.

@mavenlin mavenlin closed this Oct 1, 2014
@mavenlin
Contributor Author

mavenlin commented Oct 1, 2014

model updated.

@shelhamer
Member

Awesome! Thanks for contributing the Network-in-Network model.


@ducha-aiki
Contributor

Actually, it is updated, but it looks like it's not working...
I tried to fine-tune it on PASCAL and got an error on the first cccp layer:
I1003 13:05:46.623013 31678 caffe.cpp:115] Finetuning from nin_imagenet.caffemodel
...
F1003 13:05:46.656553 31678 net.cpp:713] Check failed: target_blobs[j]->channels() == source_layer.blobs(j).channels() (1 vs. 96)

@ronghanghu
Member

@ducha-aiki It takes some extra hacking to update the model weights. I have done this myself previously; you may want to try my version: https://drive.google.com/folderview?id=0B0IedYUunOQINEFtUi1QNWVhVVU&usp=drive_web

@ducha-aiki
Contributor

@ronghanghu, thank you very much, I will try it. The funniest thing is that I couldn't train it from scratch, not for lack of a GPU, but because I had no free disk space to download ImageNet :(

@emasa

emasa commented Dec 20, 2014

Hi @mavenlin, I'm wondering what top-5 accuracy you get on ImageNet with the NIN model, with and without test-time augmentation? I haven't read about it so far. Thanks.

@ducha-aiki
Contributor

@emasa
Top-1 acc 0.5674
Top-5 acc 0.7953.
Single central crop.
See https://github.com/BVLC/caffe/wiki/Models-accuracy-on-ImageNet-2012-val
