The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.
The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer for taking the time to write such a thorough review.
This article presents a novel approach to visualizing the behavior of neural networks using the Grand Tour, a classic linear dimensionality reduction method. The authors argue that the Grand Tour provides a more intuitive way of interpreting these models than widely-used non-linear embedding methods such as t-SNE, because it follows the principle of data-visual correspondence. To explain how the approach works, the article presents multiple animated, interactive visualizations built around motivating examples from CNN models trained on three popular image datasets, including MNIST.
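(For context: the Grand Tour animates a high-dimensional dataset by smoothly rotating an orthogonal projection basis and rendering a low-dimensional "shadow" of the data at each frame. Below is a minimal sketch of that core idea, assuming a generic numpy/scipy implementation — the function name, parameters, and use of scipy.linalg.expm are mine for illustration, not the authors' actual code.)

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

def grand_tour_frames(data, n_frames=100, step=0.02, seed=0):
    """Yield a sequence of 2-D projections of `data` (n_points x d),
    produced by smoothly rotating a d-dimensional orthogonal basis
    and keeping the first two coordinates -- the core idea of the
    Grand Tour (illustrative sketch, not the article's implementation)."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    # A random skew-symmetric matrix generates a rotation in SO(d);
    # its matrix exponential is an orthogonal rotation matrix.
    a = rng.standard_normal((d, d))
    generator = (a - a.T) / 2
    rotation_step = expm(step * generator)  # small rotation per frame
    basis = np.eye(d)
    for _ in range(n_frames):
        basis = basis @ rotation_step
        yield data @ basis[:, :2]  # project onto the first two axes
```

Because each frame differs from the previous one by a small rotation, points move continuously between frames — this is the linearity and data-visual correspondence the review refers to, in contrast to non-linear embeddings like t-SNE where small changes in the data can move points unpredictably.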
I particularly like that the article uses the same intuitive examples (e.g., digit 1 being recognized at epoch 14) throughout the sections. Presenting a series of interactive visualizations built on the same examples helps readers follow the authors' explanations and understand why the Grand Tour method can be more powerful than other methods.
The figures are well designed and interactive. Many of them animate the visual representations, which works well for showing why the proposed approach can be preferable. They are easy to interpret and easy to interact with.
However, I wish the article provided more detailed guidance on when to use the Grand Tour and when to prefer other dimensionality reduction techniques. Since the Grand Tour has not previously been used in the context of interpreting neural networks, and the authors do not seem to claim that non-linear methods should be replaced with linear ones in every case, providing some practical guidance and discussing limitations would help researchers and practitioners benefit from the article in their own work. Discussing its limitations could also promote further research in this area.
Overall, the article is well structured, with a good balance between intuitive, visualization-driven explanations and the theory behind them. However, I think the presentation of the opening could be improved, especially by restructuring the subsections, or even simply renaming the Background section and/or its subsections. The Background section is not designed merely to provide "background" knowledge or related work; it introduces an important motivating example used throughout the article and also presents one of its crucial arguments.
There are many grammatical errors, a few of which I list below. I suggest the authors go over the article to fix them.
Distill employs a reviewer worksheet to help guide reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the meaning of the scale is consistently "higher is better," please read the explanations of our expectations for each score — we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: n/a