Anonymous review 3 #6
The following peer review was solicited as part of the Distill review process. Some points in this review were clarified by an editor after consulting the reviewer.
The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer for taking the time to review this article.
Conflicts of Interest: Reviewer disclosed no conflicts of interest.
Review Result for Feature Visualization in distill.pub
Feature visualization is one of the important techniques for understanding what neural networks have learned from image datasets. This paper focuses on optimization methods, discusses the major issues they raise, and explores common approaches to solving them.
This paper is well organized, and its logic is clear and easy to follow. It starts by explaining why one would use optimization to visualize features in neural networks rather than finding examples from the dataset. It then discusses how to achieve diversity with optimization, which overcomes the diversity problem to some extent. Next, it discusses the interaction between neurons, exploring how combinations of neurons work together to represent images in neural networks. Finally, it discusses how to improve the optimization process by adding different regularizations. I appreciate the authors' effort in giving readers a comprehensive overview of feature visualization, with a main focus on optimization methods. It would be good to add more description of the technical parts, such as the preconditioning and parameterization section. I also feel the title is a little broad, because there are many other feature visualization methods, such as input modification methods and deconvolutional methods.
I agree that optimization can isolate the causes of behavior from mere correlations and is more flexible than finding real examples in the dataset. However, besides the diversity issue mentioned in the paper, real examples from the dataset seem to be more interpretable than examples generated by optimization. Examples generated by optimization can sometimes be very hard for people to interpret on their own. I think the authors have noticed this, since they put dataset examples in the last column of the table in the section on the spectrum of regularization. I suggest that the authors add more arguments for the advantages of using optimization methods to visualize the features learned by neural networks.
Some other comments:
I personally feel that the writing of this paper is less formal than that of an academic paper; it reads more like a blog post. Overall, it is a comprehensive survey of optimization methods for feature visualization, but it does not propose new methods for feature visualization.
As an academic paper, I suggest that the authors systematically summarize their approaches and further improve this draft.
According to the criteria of distill.pub, and compared to other papers published on distill.pub, I think it can be accepted with some revisions.
Thank you for your high-quality feedback! We went through every sentence and have made numerous changes to the article based upon the review you provided. These can collectively be found in pull request #8.
On reviewing the section on preconditioning, we agree that we were trying to be very general, potentially at the expense of concreteness and approachability. We rewrote the section in 29e8b19 to be more explicit about how this technique works when applied to images. We also added footnotes going into more detail on the derivation of these techniques.
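The rewritten section itself is not reproduced in this thread. As a hedged illustration of the kind of preconditioning involved, the sketch below shows one common way to parameterize an image for optimization: learn its Fourier coefficients and scale each coefficient by roughly 1/frequency, so that optimization steps favor lower frequencies, closer to the spectrum of natural images, instead of high-frequency noise. The names `fourier_scale` and `image_from_spectrum`, the `decay_power` parameter, and the clamping of the zero-frequency term are illustrative assumptions rather than code from the article. Plain NumPy only shows the forward map; an actual optimizer would take gradients through it with an autodiff framework, and color decorrelation is omitted.

```python
import numpy as np

def fourier_scale(h, w, decay_power=1.0):
    """Per-frequency scale for an (h, w) real-FFT grid, roughly 1/frequency (assumed form)."""
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.rfftfreq(w)[None, :]  # horizontal frequencies, shape (1, w//2 + 1)
    freqs = np.sqrt(fx ** 2 + fy ** 2)
    # Clamp the DC term (frequency 0) so the scale stays finite.
    return 1.0 / np.maximum(freqs, 1.0 / max(h, w)) ** decay_power

def image_from_spectrum(spec_re, spec_im, h, w):
    """Map learnable spectrum parameters to a spatial image.

    spec_re, spec_im: arrays of shape (channels, h, w//2 + 1), the hypothetical
    parameters an optimizer would update.
    Returns a real array of shape (channels, h, w).
    """
    scale = fourier_scale(h, w)
    spectrum = (spec_re + 1j * spec_im) * scale[None, :, :]
    # Inverse real 2-D FFT over the spatial axes gives a real-valued image.
    return np.fft.irfft2(spectrum, s=(h, w), axes=(1, 2))

rng = np.random.default_rng(0)
h, w, c = 64, 64, 3
shape = (c, h, w // 2 + 1)
img = image_from_spectrum(rng.normal(size=shape), rng.normal(size=shape), h, w)
print(img.shape)  # (3, 64, 64)
```

Because the parameters are white noise in this basis while the resulting image has a 1/f-like spectrum, a gradient step on the parameters acts like a preconditioned step on the pixels.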
We have expanded the introduction section to more clearly position our work in relation to input modification and deconvolutional methods, which we believe belong to a second thread of research that the community is beginning to call "attribution" or "saliency maps".
We expanded our discussion of the value of optimization-based techniques over dataset-based techniques in understanding neural network behavior. We see optimization-based techniques as having a significant advantage when the input data distribution may change.
We use GoogLeNet trained on ImageNet.
We added a caption to the hero diagram mentioning both the model and the dataset it was trained on.
We try to make this clearer in multiple places:
We have added a footnote with the mathematical definition of our diversity term: cosine dissimilarity between the flattened Gram matrices.
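The footnote's definition can be sketched as follows, assuming the activations of a single layer arrive as a NumPy array of shape (batch, height, width, channels). The score is the negated mean pairwise cosine similarity between the batch elements' flattened Gram matrices, so adding it (with some weight) to the objective being maximized rewards dissimilar examples. The names `gram_matrix` and `diversity_score`, the averaging over pairs, and the `eps` guard are illustrative assumptions; the article's exact normalization and weighting may differ.

```python
import numpy as np

def gram_matrix(acts):
    """Channel-by-channel Gram matrix for one example's activations of shape (h, w, c)."""
    h, w, c = acts.shape
    flat = acts.reshape(h * w, c)
    return flat.T @ flat  # shape (c, c)

def diversity_score(batch_acts, eps=1e-8):
    """Negated mean pairwise cosine similarity of flattened Gram matrices.

    batch_acts: array of shape (batch, h, w, c) with batch >= 2.
    Higher values mean the batch elements are more dissimilar.
    """
    n = batch_acts.shape[0]
    if n < 2:
        raise ValueError("diversity needs at least two batch elements")
    grams = np.stack([gram_matrix(a).ravel() for a in batch_acts])
    # Normalize each flattened Gram matrix to unit length (eps avoids division by zero).
    unit = grams / (np.linalg.norm(grams, axis=1, keepdims=True) + eps)
    sims = unit @ unit.T                    # (n, n) cosine similarities
    off_diag = sims.sum() - np.trace(sims)  # drop each element's self-similarity
    return -off_diag / (n * (n - 1))

rng = np.random.default_rng(0)
acts = rng.random((4, 7, 7, 16))  # hypothetical activations for a batch of 4
print(diversity_score(acts))
```

Using Gram matrices rather than raw activations compares which channels co-occur, so two examples can differ in this sense even when their spatial layouts are similar.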
We have reworded all such occurrences and added hyperlinks to the concrete sections.
That is correct. We do not believe these phrasings hurt understanding and have decided to keep them.
You're absolutely correct that the style of this article, and of Distill more broadly, is different from traditional academic writing. We believe there is room for improvement in academic communication, and value in experimenting with it. We have discussed this idea in more depth in https://distill.pub/2017/research-debt/ .
Thanks for pointing them out, we have fixed those and proofread the article another time! :-)