Archy's feedback #65

Closed · 18 of 20 tasks
vdumoulin opened this issue Dec 5, 2017 · 1 comment
vdumoulin commented Dec 5, 2017

  • (Archy) I found the maths in [the bilinear transformation] section a bit confusing, probably because I'm not very good at maths, but also because of the number of different nomenclatures flying around. Some visualizations will hopefully help (and look like they're in the pipeline).
  • We should discuss the connection to highway networks, which use sigmoidal gating (as in LSTMs) in the layers of a very deep network.
  • The LSTM and PixelCNN self-conditioning explanations are fragmented and list-like.
  • The phrase "Going backwards from the definition of each computation mechanism, we will now explain how they can be expressed in terms of generalized bilinear transformations." is a bit jarring.
  • Add a supporting example to "It could also be that an image needs to be processed in the context of a question being asked."
  • "Attention is a good example of one such principled approach: side information is used [...]" -> s/side/contextual
  • Add a supporting example to "For instance, in conditional decoder-based generative models, we would like to map a source of noise to model samples in a way that is class-aware."
  • The use of both "feature-wise affine transformations" and "feature-wise linear modulation" is confusing.
  • (Archy) I think [starting with concatenation] is very useful, i.e. "Let's think of the dumbest solution possible and examine why it doesn't work." I'd be tempted to move this further up the article, to give a simple concrete example of how you might incorporate contextual details. You can then explain the shortcomings and invoke FiLM as the solution. This will also help clear up some ambiguity for the naive reader as to whether you're considering this 'concatenate and forget' solution as an example of FiLM or not (it's not, right, because there is no modulation of existing features, you just add a bunch more?).
  • (Archy) I see we're working towards the definition of a FiLM that you gave earlier -- composed of a biasing and a scaling -- but that progress is not particularly limpid. Could you restructure to make it clear that you're talking about particular parts of the FiLM definition, i.e. have subtitles like "+β(z)" or "γ(z) ⋅ x" or something like that? It'd be helpful to more clearly embed the literature review in an exploration of the equation.
  • "Several variants of FiLM can be found in the literature." -> We need to make it clear that we've been discussing PARTS of the FiLM formulation, and now we're discussing models that use all of the bells and whistles.
  • "So far, the distinction between the FiLM generator and the FiLM-ed network has been rather clear, but it is not strictly necessary." -> To make it even more clear, we could spell it out here: "We've had one network which outputs parameters for the transformation, and these are applied to the layers of a second network."
  • "By feature-wise, we mean that scaling and shifting are applied element-wise, or in the case of convolutional networks, feature map-wise." -> (Archy says) This is a little confusing. Element-wise is a mathematical statement, but feature map-wise is a statement about what that unit of the network represents. We should explain the level of granularity in convolutional neural networks vs. fully-connected networks. In CNNs, a feature map is the same feature observed at different spatial locations.
  • Define gamma and beta before introducing them in the first equation of the article.
  • When discussing multiplicative interactions, it would be helpful to introduce the term 'conditional scaling' at this point rather than later, and to clarify the relationship to conditional biasing.
  • Add a "spoiler" sentence connecting CBN and FiLM when introducing CBN.
  • Merge the sentence "We can also use FiLM layers to condition a style transfer network on a chosen style image." into the next paragraph.
  • Should there be a sub-heading for self-conditioned models?
  • (Archy) Can you say something about how squeeze-and-excitation differs from the norm? i.e. all layers are conditioned on previous layers in a vanilla NN... I think this description is a little underspecified. Isn't SE more about allowing between-channel interactions than between-layer interactions?
  • Beef up the conclusion. What is it that our new formulation brings to the table? Why is thinking of these different things within a single family of bilinear transformations a useful exercise? Are there open questions we would like to see answered?
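To make the feature map-wise granularity point above concrete, here is a minimal sketch of a FiLM layer in PyTorch (illustrative only; the module and argument names are mine, not the article's). The generator maps the conditioning input z to one gamma and one beta per feature, and in the convolutional case these are broadcast over all spatial locations of each feature map:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Minimal FiLM layer: gamma(z) * x + beta(z), applied feature map-wise."""
    def __init__(self, num_features, conditioning_dim):
        super().__init__()
        # FiLM generator: maps the conditioning input z to one (gamma, beta)
        # pair per feature of the FiLM-ed network's layer.
        self.generator = nn.Linear(conditioning_dim, 2 * num_features)

    def forward(self, x, z):
        # x: (batch, num_features, height, width) convolutional feature maps
        # z: (batch, conditioning_dim) conditioning input
        gamma, beta = self.generator(z).chunk(2, dim=-1)
        # In a CNN, the same scale and shift is shared across all spatial
        # locations of a feature map (feature map-wise); in a fully-connected
        # network there is no spatial dimension, so this reduces to an
        # element-wise affine transformation.
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * x + beta
```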

archydeberker commented Dec 20, 2017

  • The section on LSTM/PixelCNN/WaveNet just before "A take on understanding FiLM-ed networks" still reads as somewhat fragmented.
  • "[...] are likely to share a good amount of computation when mapping from the abstract noise vector to the output image." Consider rephrasing "a good amount".
  • "In the interest of not being confusing to readers already familiar with these methods, we chose to stick to the nomenclature used in the original papers, but we do draw connections to the FiLM nomenclature where appropriate." Consider removing "being".
  • "In the visual domain, the ImageNet 2017 winning model [20] employs a self-conditioning scheme in the form of feature-wise sigmoidal gating as a way to condition a layer's activations on its previous layer." I'm still not completely sure that this description of squeeze-and-excitation is accurate -- did you revisit? (A minimal sketch of the block follows this list.)
  • "Going backwards from the definition of each computation mechanism, we will now explain how they can be expressed in terms of generalized bilinear transformations." Might be better as "Starting with the mathematical definition of each computation, we will now explain how they can be expressed in terms of generalized bilinear transformations."
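Regarding the squeeze-and-excitation question above, here is a minimal sketch of the block as I understand it (illustrative only; class and parameter names are mine). The gates are computed from the layer's own globally pooled activations, and the bottleneck MLP is what lets channels interact with one another before each feature map is rescaled, which supports the between-channel reading:

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Minimal squeeze-and-excitation block: channel-wise sigmoidal gating."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        # Squeeze: global average pooling collapses the spatial dimensions,
        # giving one summary statistic per feature map: (batch, channels).
        s = x.mean(dim=(2, 3))
        # Excitation: the bottleneck MLP mixes information across channels
        # before a sigmoid produces one gate per feature map, so the gating
        # is driven by between-channel interactions within the same layer.
        g = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        # Scale: each feature map is multiplied by its own gate.
        return x * g.unsqueeze(-1).unsqueeze(-1)
```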
