The following peer review was solicited as part of the Distill review process.
The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer for taking the time to review this article.
Conflicts of Interest: Reviewer disclosed no conflict of interest.
This article introduces feature-wise transformation layers, with a particular focus on a specific type of feature-wise layer called the FiLM layer, proposed in Perez et al., 2018, and shows that this type of layer often appears in models used in recent literature (style transfer, question answering, classification, etc.).
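For concreteness, the feature-wise transformation the article is built around can be sketched as a per-channel affine modulation of activations, where the scale and shift are produced from a conditioning input. This is a minimal NumPy sketch under my own naming assumptions (`film`, `features`, `gamma`, `beta` are illustrative, not taken from the article):

```python
import numpy as np

def film(features, gamma, beta):
    """Minimal sketch of a FiLM-style feature-wise transformation.

    features: (batch, channels, height, width) activations
    gamma, beta: (batch, channels) conditioning-derived scale and shift,
                 applied identically at every spatial location.
    """
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Toy example: scale every channel by 2 and shift by 0.5.
x = np.ones((2, 3, 4, 4))
gamma = np.full((2, 3), 2.0)
beta = np.full((2, 3), 0.5)
out = film(x, gamma, beta)
# every activation becomes 2.0 * 1.0 + 0.5 = 2.5
```

In practice `gamma` and `beta` would be predicted by a small network from the conditioning source (e.g. a question embedding), which is what lets one stream of information modulate another.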
Overall, the article explains the concept of feature-wise transformation layers well in its introduction. The article also covers a broad range of recent works using such layers and explains how those works condition the network via feature-wise transformations for each task, which should help readers who want to apply the layer to their own tasks requiring the integration of multiple sources.
However, in terms of ‘understanding’ FiLM, I wanted to see more examples of each interpolation result in the section “Properties of FiLM-ed networks”.
The article shows only one example per task; showing more examples, especially “failure cases” of the interpolation, would promote the understanding of FiLM.
Also, I am concerned that there is a lack of novel results in this article.
The interpolation results were already explored in Huang et al. and Perez et al., and the t-SNE results were also explored in Perez et al.
Regarding the embedding interpolation results on CLEVR, I feel the predicted answer probabilities are not very intuitive. For example, in the demonstration, there is an interpolated embedding for which the model predicts “2” as the number of items in the picture, but I could not come up with an actual interpolated question, between “How many purple things are there” and “How many brown things are there”, that corresponds to this prediction. I would like to know why the authors think this is intuitive.
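To make my concern concrete: the interpolation in question is, as I understand it, a simple linear blend of two conditioning embeddings, and an intermediate point need not correspond to any well-formed question. A minimal sketch, assuming linear interpolation and toy two-dimensional embeddings (all names and values here are illustrative, not from the article):

```python
import numpy as np

def interpolate(e1, e2, alpha):
    # Linear interpolation between two conditioning embeddings;
    # alpha = 0 recovers e1, alpha = 1 recovers e2.
    return (1.0 - alpha) * e1 + alpha * e2

# Hypothetical embeddings of the two CLEVR questions.
e_purple = np.array([1.0, 0.0])  # "How many purple things are there"
e_brown = np.array([0.0, 1.0])   # "How many brown things are there"

mid = interpolate(e_purple, e_brown, 0.5)
# mid lies halfway between the two question embeddings,
# but there is no guarantee any natural-language question maps to it.
```

The midpoint is a valid vector to feed the FiLM-ed network, yet the answer the model produces there has no obvious corresponding question, which is why I find the claimed intuitiveness hard to assess.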