Incorporate information about earlier papers using three-way interactions #52
Roland Memisevic has worked on several papers that touch on the idea of gating and three-way interactions; we should ask him for feedback on that.

His recommendation is to read his review paper as well as the gated softmax paper.
"Gated Softmax Classification" uses bilinear transformations to represent the three-way interaction between an input (x), latent binary hidden variables (h), and output classes (y). To predict the pre-softmax score for y, you marginalize over all possible configurations of h, combining the scores given by a class-specific bilinear form: (h^T)(W_y)(x). Some nice math makes this exact marginalization tractable, and factoring the bilinear weights significantly reduces the number of parameters and improves regularization. The model gets good results on MNIST-like tasks.

The method seems like an interesting way to use bilinear transformations to condition computation, i.e. conditioning on latent variables (rather than on side information or the input itself). It seems worth a sentence/phrase/citation, but not much more at the moment. However, I will read the other papers you list here too; if I find a whole class of methods that condition on latent variables similarly, then we could consider adding a subsection on this kind of approach.

Let me know if you want me to make a pull request for the change into the article, or if it's easier for you to incorporate it yourself directly.
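For concreteness, here is a rough NumPy sketch of that exact marginalization (my own illustration, not code from the paper; the function names, the unfactored per-class weight tensor `W`, and the shapes are assumptions): since the hiddens are binary, sum_h exp(h^T W_y x) factorizes into prod_k (1 + exp((W_y x)_k)), so each log-score is just a sum of softplus terms.

```python
import numpy as np

def gated_softmax_logits(W, x):
    """Log unnormalized class scores with binary hiddens marginalized out.

    W: (num_classes, num_hidden, num_features) class-specific bilinear weights W_y
       (unfactored, for clarity; the paper also factors these to save parameters).
    x: (num_features,) input vector.

    sum_h exp(h^T W_y x) = prod_k (1 + exp((W_y x)_k)),
    so log-score(y) = sum_k softplus((W_y x)_k).
    """
    a = W @ x                                # (num_classes, num_hidden): (W_y x)_k
    return np.logaddexp(0.0, a).sum(axis=1)  # softplus per hidden unit, summed

def predict_proba(W, x):
    """Softmax over the marginalized log-scores."""
    logits = gated_softmax_logits(W, x)
    logits -= logits.max()                   # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

The factorized sum is what keeps inference tractable: it replaces a sum over 2^K hidden configurations with K softplus evaluations per class.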
It would be more convenient for me if you made the PR and I reviewed it. Thanks!

Okay, sounds good, I'll make one!
I made a few notes on Roland's review paper for the portions I've read so far; I'll incorporate them (along with my Gated Softmax notes) into a pull request once I finish reading the review. Here are the notes I have so far (more for myself than anything):
@ethanjperez as I mentioned to Florian in issue #92, I think we can avoid making the text heavier by integrating those citations into a bibliographic note in the appendix (see e.g. the CTC article and the relevant portion of its source code).
@vdumoulin Did we decide to leave the note on biological plausibility out? Roland's review paper has this interesting note: "From a biological point of view, multiplicative interactions may also be viewed as a conceptually simple approximation to more complex dendritic computations [60] than the common neuron abstraction used in practically all deep learning models."

[60] K. A. Archie and B. W. Mel, "A model for intradendritic computation of binocular disparity," Nature Neuroscience, vol. 3, no. 1, pp. 54–63, Jan. 2000.
I wouldn't feel comfortable defending that connection, as I'm not familiar enough with that literature. I think we should leave biological plausibility out.