Incorporate information about earlier papers using three-way interactions #52

Closed
vdumoulin opened this issue Nov 24, 2017 · 9 comments

Comments

@vdumoulin
Contributor
No description provided.

@vdumoulin
Contributor Author

Roland Memisevic worked on several papers that touch on the idea of gating and three-way interactions.

We should ask him for feedback on that.

@vdumoulin
Contributor Author

His recommendation is to read his review paper as well as the gated softmax paper.

@ethanjperez
Contributor

ethanjperez commented Jan 25, 2018

"Gated Softmax Classification" uses bilinear transformations to represent the three-way interaction between an input (x), latent binary hidden variables (h), and output classes (y). To predict the pre-softmax score of y, you marginalize over all possible combinations of h to combine the various pre-softmax scores given by a class-specific bilinear transformation: (h^T)(W_y)(x). You can use some nice math to do the exact calculation tractably, and you can factor the weights used for the bilinear transformations to significantly decrease the number of parameters and increase regularization. The model gets good results on MNIST-like tasks.

The method seems like an interesting way to use bilinear transformations to condition computation on latent variables (rather than on side information or self-information). It seems worth a sentence/phrase/citation, but not much more at the moment. However, I will also read the other papers you list here; if I find a whole class of methods that condition on latent variables similarly, then we could consider making a subsection on this kind of approach.

Let me know if you want me to make a pull request for this change to the article, or if it's easier for you to incorporate it yourself directly.

@vdumoulin
Contributor Author

It would be more convenient for me if you made the PR and I reviewed it. Thanks!

@ethanjperez
Contributor

Okay, sounds good! I'll make one.

@ethanjperez
Contributor

I made a few notes on Roland's review paper for the portions I've read so far; I'll incorporate them (along with my Gated Softmax notes) into a pull request once I finish reading the review. Here are the notes I have so far (more for myself than anything):

  • "The idea of using multiplicative interactions in vision was introduced about 30 years ago under the terms “mapping units” [1] and “dynamic mappings” [2]."
  • "Our analysis... predicts that the use of squaring non-linearities or multiplicative interactions will be essential in any tasks that involve representing relations."
  • Multiplicative interactions are useful in learning relationships as they are natural in identifying "matches" (think dot product of two feature vectors, logical AND/XNOR, (covariance matrix?) etc.). In contrast, additive interactions are perhaps more natural for carrying out different roles, such as content detection, feature aggregation, logical OR, etc. [Me: Perhaps using both lets you have the best of both worlds?]
  • Gated autoencoders [18], [19] reconstruct an input as a function of some conditioning input.
  • Gated Boltzmann Machine [16] is an RBM whose parameters (and energy function and normalization constant) are a function of some conditioning input. Samples are drawn from conditional distributions, and the model trains in a similar way to an RBM.
  • Emphasizes that the symmetry of multiplicative interactions allows multiple interpretations of what computation is happening (i.e. [latents conditioning transformation over another input] vs. [another input conditioning transformation over latents])
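
To make the gated autoencoder note concrete, here is a minimal numpy sketch of a factored gated autoencoder forward pass (the shapes, the sigmoid on the mapping units, and the names are my assumptions rather than anything taken verbatim from the review):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_autoencoder_forward(x, y, U, V, W):
    """One forward pass of a factored gated autoencoder.

    Reconstructs y conditioned on x through multiplicative (three-way)
    interactions between x, y, and the mapping units h.

    Assumed shapes: x (Dx,), y (Dy,), U (F, Dx), V (F, Dy), W (H, F).
    """
    fx = U @ x                      # factor responses for x
    fy = V @ y                      # factor responses for y
    h = sigmoid(W @ (fx * fy))      # mapping units detect "matches" between fx and fy
    y_hat = V.T @ ((W.T @ h) * fx)  # reconstruct y, gated by x through the factors
    return h, y_hat
```

Swapping the roles of x and y in the reconstruction line (x_hat = U.T @ ((W.T @ h) * fy)) is exactly the symmetry point from the last bullet: the same three-way weights let either input be read as gating a transformation of the other.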

@vdumoulin vdumoulin changed the title Are there older papers that use gating as a side-information fusion mechanism? Incorporate information about earlier papers using three-way interactions Mar 14, 2018
@vdumoulin
Contributor Author

@ethanjperez, as I mentioned to Florian in issue #92, I think we can avoid making the text heavier by integrating those citations into a bibliographic note in the appendix (see, e.g., the CTC article and the relevant portion of its source code).

@ethanjperez
Contributor

ethanjperez commented Apr 3, 2018

@vdumoulin Did we decide to leave the note on Biological Plausibility out? Roland's review paper has this interesting note: "From a biological point of view, multiplicative interactions may also be viewed as a conceptually simple approximation to more complex dendritic computations [60] than the common neuron abstraction used in practically all deep learning models."

[60] K. A. Archie and B. W. Mel, “A model for intradendritic computation of binocular disparity,” Nature Neuroscience, vol. 3, no. 1, pp. 54–63, Jan. 2000.

@vdumoulin
Contributor Author

I wouldn't feel comfortable defending that connection, as I'm not familiar enough with that literature. I think we should leave biological plausibility out.
