The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.
The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer, Qiqi Yan, for taking the time to write such a thorough review.
There have been tons of techniques developed in the literature for probing image networks to try to interpret how they operate. This article nicely categorizes those techniques into (1) feature visualization, (2) feature attribution, and (3) feature grouping, and shows that they can be combined into an interactive interface that lets users get a sense of the internals of an image network.
Overall, I definitely recommend accepting the article. The community would love this article, partly because it's a nice integration of techniques and partly because it is visually appealing. (I do wonder whether the article should warn that visualization can give a false sense of understanding.)
One challenging topic that I wish got more discussion, regarding these efforts to probe and visualize the internals of image networks, is:
Thank you for taking the time to write such a thoughtful review! We've responded to your points individually below.
As of 73e80b2, we have a section discussing the trustworthiness of these interfaces, and forward reference it from the introduction. We think this is an important direction for future work.
Over the last few days, we’ve added a lot of interactive captions to expose the kind of insights one can gain from our interfaces.
One example we find exciting is how the model "hallucinates" a tennis ball into the mouth of the Labrador retriever. At first glance, it seems surprising that the third most likely classification for the dog and cat image is a tennis ball. Digging deeper with the spatial attribution interface, however, we see that at "mixed4d" the lower snout, along with the background, is perceived as a snout holding a tennis ball. This in turn increases the probability of "tennis ball" and also of "granny smith apple". The interesting thing about this is that it seems to reveal a kind of entanglement of features: the model has entangled snouts with tennis balls, and then tennis balls with apples.
We don't see a way to describe something like this without reference to the model's internal abstractions. In some sense, one might ultimately care about the input-to-output mapping the network represents, but the space of mappings from images to labels is very large. Without reference to details of the procedure, it's unclear to us how one could describe things like this.
The interface is stable in the sense that the same interface can be used for multiple models. The grammar of how different building blocks compose is independent of the particular use case.
(There is an interesting research direction related to this, however, about how to highlight the similarities and differences between models.)
This is an important use of the term, but "interpretability" is increasingly the name for the field of techniques for understanding models. ("Transparency" seems to have died due to legal connotations.)
We totally agree! As of 73e80b2, there is a sentence about this in the introduction.
Done as of 7f7b4ee.
As of 90979c4, we've added a clarification that we use a specific technique for all examples in this article, and why we chose optimization-based feature visualization (to separate things that cause a feature to activate from mere correlations).
More generally, when we introduce feature visualization, we both link to our previous Distill article that conducts a systematic review, and cite a wide variety of techniques. We also explicitly call out using alternate feature vis techniques as a direction for future work.
Done as of 8043789.
We're using area, which has been clarified as of b343adc.
Done as of 3c4312b. We also provide notebooks with reference implementations.
In 3a6af47, we clarify that this is a technique used in some HCI communities. We think it’s a powerful technique and wanted to provide an initial exploration of how it applies to interpretability interfaces.