Review #2 #158

colah opened this issue Apr 21, 2018 · 0 comments

colah commented Apr 21, 2018

The following peer review was solicited as part of the Distill review process.

The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to review this article.

Conflicts of Interest: Reviewer disclosed no conflict of interest.

This article introduces feature-wise transformation layers, focusing on a specific type called the FiLM layer, proposed in Perez et al., 2018, and shows that this type of layer appears often in models from the recent literature (style transfer, question answering, classification, etc.).

Overall, the article explains the concept of a feature-wise transformation layer well in its introduction. It also covers a broad range of recent work that uses the layer and explains how each work conditions its network through the layer for its task, which should help people who want to apply the layer to their own tasks that require integrating multiple sources.

However, in terms of ‘understanding’ FiLM, I wanted to see more examples of each interpolation result in the section “Properties of FiLM-ed networks”.
The article shows only one example for each task; showing more examples, especially “failure cases” of the interpolation, would improve the reader's understanding of FiLM.
I am also concerned that the article lacks novel results.
The interpolation results were already explored in Huang et al. and Perez et al., and the t-SNE results were also explored in Perez et al.

Minor comments:
Regarding the embedding interpolation results for CLEVR, I feel the predicted answer probabilities are not very intuitive. For example, in the demonstration there is an interpolated embedding for which the model predicts “2” as the number of items in the picture, but I could not come up with an actual interpolated question, between “How many purple things are there” and “How many brown things are there”, that corresponds to this prediction. I would like to know why the authors think this is intuitive.
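
For readers less familiar with the setup behind this comment, the following is a minimal sketch of what interpolating a conditioning embedding in a FiLM-ed network could look like. It is not the authors' code. The embedding size, the random "question" vectors, and the linear generator weights `W_gamma` and `W_beta` are hypothetical stand-ins, and it assumes plain linear interpolation. Only the feature-wise affine form (a per-channel scale `gamma` and shift `beta`) follows the FiLM definition.

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise affine transform: scale and shift each channel."""
    # features: (channels, height, width); gamma, beta: (channels,)
    return gamma[:, None, None] * features + beta[:, None, None]

def interpolate(z_a, z_b, alpha):
    """Linear interpolation between two conditioning embeddings."""
    return (1.0 - alpha) * z_a + alpha * z_b

rng = np.random.default_rng(0)
emb_dim, channels = 8, 4

# Hypothetical stand-ins for the two question embeddings.
z_purple = rng.normal(size=emb_dim)
z_brown = rng.normal(size=emb_dim)

# Hypothetical linear FiLM generator: embedding -> (gamma, beta).
W_gamma = rng.normal(size=(channels, emb_dim))
W_beta = rng.normal(size=(channels, emb_dim))

# Hypothetical visual feature maps to be modulated.
features = rng.normal(size=(channels, 5, 5))

for alpha in (0.0, 0.5, 1.0):
    z = interpolate(z_purple, z_brown, alpha)
    gamma, beta = W_gamma @ z, W_beta @ z
    modulated = film(features, gamma, beta)
    print(alpha, modulated.shape)
```

An intermediate `z` is a point in embedding space and need not be the embedding of any real question, which is why a prediction made from it may have no natural-language counterpart.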
