
Generalize the network into graph of Blob nodes and Layer edges #166

Closed
kloudkl opened this issue Feb 26, 2014 · 5 comments

kloudkl (Contributor) commented Feb 26, 2014

Multiple factors motivate this proposal, including #57, #119, #129, the papers and code of Generative Stochastic Networks (GSN) [1, 2, 3, 4, 5], and the scene labeling paper using Recurrent Convolutional Neural Networks (RCNN) [6].

[Figure: proposed graph of Blob nodes and Layer edges]

In the new design, Blob becomes BlobNode and has source nodes and target nodes. Edges are represented by LayerEdge, which no longer contains bottom or top blobs. The nodes are independent of the processing layers. Both nodes and edges can be reused (data or weight parameters shared) according to the structure of the network, which in general will not be a linear stack of layers from bottom to top, but a true graph, like a social graph.
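A minimal C++ sketch of the proposed structure might look like the following. All names here (`BlobNode`, `LayerEdge`, `Connect`) are illustrative only, not actual Caffe classes:

```cpp
#include <string>
#include <vector>

struct LayerEdge;  // forward declaration

// A data node in the graph; it does not belong to any single layer.
struct BlobNode {
  std::string name;
  std::vector<LayerEdge*> sources;  // edges that produce this node
  std::vector<LayerEdge*> targets;  // edges that consume this node
};

// A processing edge; it references nodes instead of owning bottom/top blobs.
struct LayerEdge {
  std::string type;                // e.g. "conv", "relu"
  std::vector<BlobNode*> inputs;   // nodes read by this layer
  std::vector<BlobNode*> outputs;  // nodes written by this layer
};

// Wire an edge between input and output nodes, updating both directions
// so the graph can be traversed from either nodes or edges.
void Connect(LayerEdge* edge,
             std::vector<BlobNode*> in,
             std::vector<BlobNode*> out) {
  edge->inputs = in;
  edge->outputs = out;
  for (BlobNode* n : in)  n->targets.push_back(edge);
  for (BlobNode* n : out) n->sources.push_back(edge);
}
```

Because nodes keep their own source and target edge lists, a single `BlobNode` can feed several edges (data sharing), and a single `LayerEdge` can be referenced at multiple points in the graph (weight sharing), which is what a non-linear topology requires.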

[1] Yoshua Bengio, Éric Thibodeau-Laufer, Jason Yosinski. Deep Generative Stochastic Networks Trainable by Backprop. arXiv:1306.1091 [cs.LG]. 2013.
[2] Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent. Generalized Denoising Auto-Encoders as Generative Models. NIPS, 2013.
[3] Li Yao. Efficient implementation of Generative Stochastic Networks. https://github.com/yaoli/GSN. 2013.
[4] @lightcatcher. Generative Stochastic Network (GSN) model. lisa-lab/pylearn2#392. 2013.
[5] Jian Zhou, Olga Troyanskaya. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction. JMLR W&CP 32 (1) : 745–753, 2014.
[6] Pedro Pinheiro, Ronan Collobert. Recurrent Convolutional Neural Networks for Scene Labeling. JMLR W&CP 32 (1) : 82–90, 2014.

shelhamer (Member) commented

This is an excellent suggestion. Now that Caffe has learned to experiment with DAGs, weight sharing is a natural next generalization that we have been discussing in lab. Making a public issue like this will help focus the plan.

#57 and #119 must be addressed first. Development in earnest of #57 will start after March 7. #119 effectively doubles the size of a model that can be trained on a single GPU, but requires careful changes to the solver, so it should perhaps wait until after #57. Then this and #119 can be pursued.

Thanks, especially for the clarity of this proposal and the references.

kloudkl (Contributor, Author) commented Feb 26, 2014

Thanks for your support!

March 7 has been mentioned several times. What is the blocking factor until then?

shelhamer (Member) commented

The ECCV '14 paper submission deadline. We are a research group, after all, and there's only so much ☕

shelhamer (Member) commented

We've decided on an alternative design for weight sharing. A new param field will be added to layers that define their parameter blobs, and Net::Init() will have a preprocessing step that instantiates shared parameter blobs and shares them among layers as needed. Layers that do not share parameters will hold their blobs internally, as they do now.
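The described preprocessing step could be sketched roughly as follows. This is a hypothetical illustration of the idea (a name-keyed pool of parameter blobs resolved during net setup), not actual Caffe code; `LayerSpec`, `InitParams`, and the field names are invented for the example:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// A parameter blob; contents elided for the sketch.
struct Blob {
  std::vector<float> data;
};

// Minimal stand-in for a layer definition with an optional shared-param name.
struct LayerSpec {
  std::string name;
  std::string param;  // name of a shared parameter blob; empty = layer-local
};

// Resolve one parameter blob per layer. Layers naming the same "param"
// receive pointers to the same blob; the first use instantiates it.
// Layers with no param name keep a private blob, as today.
std::vector<std::shared_ptr<Blob>> InitParams(
    const std::vector<LayerSpec>& layers) {
  std::map<std::string, std::shared_ptr<Blob>> shared;
  std::vector<std::shared_ptr<Blob>> params;
  for (const LayerSpec& l : layers) {
    if (l.param.empty()) {
      params.push_back(std::make_shared<Blob>());  // layer-local blob
    } else {
      std::shared_ptr<Blob>& slot = shared[l.param];
      if (!slot) slot = std::make_shared<Blob>();  // first use instantiates
      params.push_back(slot);                      // later uses alias it
    }
  }
  return params;
}
```

The appeal of this design over the full graph rewrite is that it leaves the existing bottom/top layer interface untouched: sharing is resolved once at net construction, and unshared layers behave exactly as before.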

shelhamer (Member) commented

Closing as #500 will solve.

lukeyeager pushed a commit to lukeyeager/caffe that referenced this issue Jun 16, 2016