
Routing #28

Closed
egorkrash opened this issue Dec 1, 2017 · 7 comments

egorkrash commented Dec 1, 2017

As I understand it, you reset the coupling coefficients after each training sample (batch). Don't you think it would be better to keep their previous state and update them from there?

XifengGuo (Owner) commented

@egorkrash First, the paper resets the coupling coefficients after each sample. Second, I don't think the values should be kept. Each sample should have its own coefficients, so keeping them all would take too many resources. And at test time, the samples have no coefficients in memory, so their coefficients have to be initialized to uniform. For consistency, we shouldn't keep the coefficients of training samples during training either.


egorkrash commented Dec 1, 2017

@XifengGuo Sorry if I seem offensive. I'm just trying to fix my misunderstanding, and it would be great if you could help me do that.

Could you please point me to where the paper says the coupling coefficients should be reset after each training sample? And in what way do you mean that keeping them would cost too many resources? Also, should the routing algorithm run during testing?


XifengGuo commented Dec 1, 2017

@egorkrash
[image: the routing algorithm from the paper]
As shown in the routing algorithm, it considers only one sample and resets b_ij to 0 at every call. The routing algorithm is called in every forward pass. So whenever a training sample comes, the b_ij for this sample will evolve from 0 over some iterations. Even when the same sample comes in the next epoch, when the routing algorithm is called, its b_ij will again start from 0. To sum up, the coupling coefficients are reset at each training step.
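For illustration, here is a minimal NumPy sketch of one routing call in the spirit of the algorithm described above (this is not the repo's Keras implementation; the shapes — 1152 input capsules, 10 output capsules, 16-D vectors — and the helper names are assumptions for the example). The key point is that `b` is created inside the function, so the logits start from 0 on every forward pass:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Squashing non-linearity: shrinks each vector's length into [0, 1).
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing(u_hat, n_iters=3):
    # u_hat: prediction vectors, shape (num_in, num_out, dim) for ONE sample.
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # logits reset to 0 on EVERY call
    for _ in range(n_iters):
        # c_ij = softmax of b_i over the output capsules j
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)        # (num_out, dim)
        v = squash(s)
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)  # agreement update
    return v

rng = np.random.default_rng(0)
v = routing(rng.normal(scale=0.05, size=(1152, 10, 16)))
print(v.shape)  # (10, 16)
```

Because `b` is a local variable, nothing persists between calls, which matches the "reset at each training step" behavior described above.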

Each sample requires a b_ij of size 1152*10 = 11,520. With batch training at batch size 100, a batch holds 11,520*100 = 1,152,000 elements. Since we don't need to save the values of b_ij after routing, we can reuse the same 1,152,000 elements for every batch (they are reset to 0 at each training step anyway). However, if we wanted to keep track of b_ij for every sample in the training set, we would have to store 60,000 * 11,520 = 6.912e8 elements. At 4 bytes per float, that is about 2.76 GB just for the b_ijs.
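The arithmetic can be reproduced as a quick sanity check (the figures — MNIST's 60,000 training samples, 1152 primary capsules, 10 digit capsules, 4-byte floats — are taken from the discussion above):

```python
# Back-of-envelope memory cost of keeping one b_ij per training sample.
per_sample = 1152 * 10                 # 11,520 routing logits per sample
per_batch = per_sample * 100           # 1,152,000 for a batch of 100
whole_train_set = per_sample * 60_000  # one persistent b_ij per sample

print(per_sample)                      # 11520
print(per_batch)                       # 1152000
print(whole_train_set)                 # 691200000
print(whole_train_set * 4 / 1e9)       # ~2.76 GB at 4 bytes per element
```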

The routing algorithm certainly also runs in the testing phase. For each test sample, b_ij starts from 0, the same as in training.

You can get more perspectives on routing in #1.

egorkrash (Author) commented

@XifengGuo Thanks a lot for the explanation!
As for memory, I didn't suggest keeping b_ij in memory for each sample. I thought that updating them during training without zeroing them after each batch might work.
One thing still seems strange to me: why is it better to have a shared b_ij for all samples in a batch?

XifengGuo (Owner) commented

@egorkrash b_ij is not shared across the samples in a batch; each sample in the batch has its own b_ij.
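Concretely, in a batched implementation this just means the routing logits carry a leading batch dimension (shapes here are assumptions matching the numbers earlier in the thread, not the repo's exact tensor layout):

```python
import numpy as np

# Batch of 100 samples, 1152 primary capsules, 10 digit capsules.
batch, num_in, num_out = 100, 1152, 10
b = np.zeros((batch, num_in, num_out), dtype=np.float32)

# b[k] is the routing-logit matrix for sample k alone; nothing is
# shared across the batch dimension.
print(b.shape)   # (100, 1152, 10)
print(b.nbytes)  # 4608000 bytes at 4 bytes per float32 element
```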


ghost commented Jan 18, 2018

Thank you very much for your clear explanation!


jlevy44 commented Sep 13, 2019

Interesting discussion. Many of the tutorials I've seen for CapsNets do not keep a b_ij per sample, which is rather unfortunate.
