The following peer review was solicited as part of the Distill review process.
The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer, Dylan Cashman, for taking the time to write such a thorough review.
In this work, the authors explore three different kinds of recurrent models for autocompletion, primarily using two interactive visualizations. It isn't clear what the contribution of the work is, however, because the claims made in the title and introduction are not adequately matched by evidence found in the visualizations. The introduction suggests that the visualizations show how gradient magnitudes can be helpful in understanding the differences between short-term and long-term memories, but the two visualizations prominently display the inferences of the models. The second visualization does show gradient magnitude, or connectivity, but the text provides little guidance on how to interpret it, and the two outlined examples depend only on the inference, or expected predictions, of the models. Inspecting the inferences of models to compare them is not a novel idea, although the visualizations are very clear and seem like very useful tools for inspecting inference. In summary, the visualizations seem very useful, but it is unclear what the goal of the work is.
This submission is promising in two distinct ways. To begin with, it features some very compelling visualizations that are well-designed, responsive, and inviting to the reader. It also attempts to explain the difference between the memories of various recurrent models, which is not well understood empirically. However, its findings are not convincing, both because the examples given only analyze the inference probabilities of the models, which is not a novel technique, and because the models being inspected do not replicate the performance found in past studies. I recommend that the authors attempt to replicate the findings of the NLSTM on the autocomplete data, and then use the visualizations to interpret those models. Otherwise, it is impossible to attribute any artifact found in the visualization to either a poorly trained model or an issue with the visualization. If it is the case that the NLSTM results cannot be replicated on that dataset, the authors need to offer some suggestions of why that's the case. In fact, this might offer a compelling story about their visualization: the qualitative insights they gain from it lead them to conduct more quantitative experiments to understand why their NLSTM model is not learning long-term dependencies.
Some minor comments:
The figures should have more descriptive captions. For example, in the first figure, the reader doesn't know what the green highlights mean. It isn't necessary to describe in detail what connectivity is at this point. It is only necessary to provide a high-level description of how the reader should interpret what's going on in the figure.
In general, the figures are excellent, and the use of hypertext to set the state of the figures is very convenient for the reader.
The intro is fairly generic. It isn't compelling to the reader to describe memorization as "a challenge": Why is it challenging? Can you provide examples of why this is a critical issue? Towards the end of the introduction, there should be a high-level description of what the visualization does, and why it would offer any more insight than cross-entropy or accuracy. Many users of recurrent algorithms might also need to be convinced why an interactive visualization would even be useful; why can't this problem be solved by just printing out a metric? The answer has to do with grounding the user in their data, allowing them to rapidly test hypotheses on how the various models perform at sections of their training set, using a pleasing and inviting visualization.
It would be good to link to some literature on the vanishing gradient: Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International Conference on Machine Learning. 2013.
When the authors mention connectivity, it is unclear if this is a term that they have defined, or if it has previous usage in the field. This should be better explained. The authors should also explain, at a high level, why the gradient might be an interesting quantity to show the reader, and explain how to interpret it. In particular, the reader should understand what it means for a model to have some gradient propagating back to much earlier words; that the current prediction is a function of the prediction made at that time step.
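The interpretation asked for above can be made concrete with a toy example. The sketch below is not the authors' model; it uses a hypothetical one-dimensional linear recurrence h_t = w * h_{t-1} + x_t, for which the gradient of the final state with respect to an earlier input has the closed form d h_T / d x_t = w^(T-1-t). This is the quantity the "connectivity" visualization plots: when |w| < 1 it vanishes exponentially with distance, so visible gradient at much earlier words means the current prediction is still a function of those time steps.

```python
# Toy illustration of "connectivity" (gradient magnitude over time) for a
# hypothetical linear recurrence h_t = w * h_{t-1} + x_t. Here the gradient
# of the final hidden state h_T with respect to input x_t is exactly
# w ** (T - 1 - t), so with |w| < 1 it decays exponentially with distance.

def connectivity(w, T):
    """Gradient of h_T with respect to each input x_t, for t = 0 .. T-1."""
    return [w ** (T - 1 - t) for t in range(T)]

grads = connectivity(w=0.5, T=10)
print(grads[-1])  # most recent input: gradient 1.0
print(grads[0])   # earliest input: gradient 0.5**9, nearly vanished
```

A model that has learned long-term dependencies would show non-negligible gradient at distant positions, which is exactly what the reader should be taught to look for in the second visualization.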
In both the figure describing the recurrent units and the appendix, the simpler architectures should go first, with the NLSTM last, to allow the reader to understand the context in which the NLSTM was invented. The description of the LSTM and NLSTM given in the appendix seem mostly superfluous because this information is readily available in the listed references.
Grammar/spelling and word usage:
In the introduction, there is an incomplete sentence: "While quantitative comparisons are useful."
Some word misspellings: "gramma", "attemps", "enogth"
When describing how the models were trained, the authors note they trained for 7139 epochs for one complete run of the data. This actually means they trained for 7139 batches within a single epoch. This may actually be the source of some of the issues they had - it is likely that their models should train for many more epochs.
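The distinction is simple arithmetic: an epoch is one complete pass over the training set, and the number of batches per epoch is the dataset size divided by the batch size. The numbers below are purely hypothetical (the article does not state its dataset size or batch size); they are chosen only to show how 7139 batches can correspond to exactly one epoch.

```python
import math

# Hypothetical figures, NOT taken from the article under review: a dataset
# of 1,142,240 sequences with batch size 160 yields 7139 batches per pass.
dataset_size = 1_142_240  # assumed for illustration
batch_size = 160          # assumed for illustration

batches_per_epoch = math.ceil(dataset_size / batch_size)
print(batches_per_epoch)  # 7139 batches = one epoch, not 7139 epochs
```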
Distill employs a reviewer worksheet as a help for reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results
| Advancing the Dialogue | Score |
|---|---|
| How significant are these contributions? | 2/5 |
| Outstanding Communication | Score |
|---|---|
| Article Structure | 1/5 |
| Writing Style | 2/5 |
| Diagram & Interface Style | 4/5 |
| Impact of diagrams / interfaces / tools for thought? | 2/5 |
| Readability | 1/5 |
| Scientific Correctness & Integrity | Score |
|---|---|
| Are claims in the article well supported? | 2/5 |
| Does the article critically evaluate its limitations? How easily would a lay person understand them? | 2/5 |
| How easy would it be to replicate (or falsify) the results? | 2/5 |
| Does the article cite relevant work? | 2/5 |
| Does the article exhibit strong intellectual honesty and scientific hygiene? | 2/5 |