The following peer review was solicited as part of the Distill review process.
The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to Austin Huang for taking the time to review this article.
On Outstanding Communication:
For a topic in such great need of accessible explanations, I felt that the article's organization and writing could have been much improved.
More consideration needs to be put into the writing and into how topics are sequenced.
For example, under “Marginalization and Conditioning”, the high-level intuition comes after a detailed technical exposition, so a reader who could follow the technical exposition would not need the figures. Conversely, a reader who did not already have an intuition for the marginalization and conditioning illustrated in the figure would struggle to follow the last 3–4 paragraphs.
Other issues follow this pattern of being sequenced poorly, starting with the details and ending with the big picture. The section on the multivariate Gaussian spends two paragraphs discussing positive semidefiniteness and the diagonal/off-diagonal elements of the covariance, only then actually defining the covariance matrix, and ends with a visual of what a multivariate Gaussian looks like.
The section on the posterior distribution (arguably the crux of the model) does not read well. It is difficult to follow and is in need of a rewrite.
See section-by-section comments at the end of the review for a detailed discussion of specific communication issues.
On Scientific Correctness & Integrity:
Given that the article does not make scientific claims and primarily focuses on communicating the basic mathematical structure of a particular area of machine learning, most of these criteria don't directly apply.
This could be seen as one weakness of the article - that it doesn't take enough of a stance (a good contrast is the t-SNE article, another communication of ML methodology, which does take positions on its use rather than focusing strictly on the mathematical underpinnings). This article doesn't comment on, or help the reader reason about, correct or incorrect application of GPs; it only describes the mathematical machinery behind them.
On the topic of limitations, it would be nice if the article wrote about pitfalls / limitations of Gaussian Process models.
Detailed Section Comments:
This assumption will often be incorrect (in some circles, Gaussian processes are well known), but more importantly, it misses an opportunity to provide a stronger motivational hook for the article beyond "it's something you haven't heard of / rehearsing the basics is good".
The phrasing here feels more convoluted than it needs to be. Why introduce the overloaded regression terminology and then follow its initial use with an awkward definitional statement? The article could just use the definition in the first sentence and drop the "just a quick recap" sentence altogether.
Multivariate Gaussian Distributions
This section misses an opportunity to introduce a figure that illuminates the gap between the intuition of a Gaussian most people have with the way they're used in GPs.
One of the big stumbling blocks with GPs is a visual one - when students are introduced to Gaussians, they are used to looking at the distribution with each dimension mapped to a visual axis (like the two figures in this section). In GPs, the dimensions of a multivariate Gaussian are visually represented as vertical axes (in effect, sharing a single axis).
This section shows the former, the next section shows the latter, but we’re missing a visual that bridges and links the relationship between the two visual representations. There is an opportunity to provide an illustration that bridges their familiar representation of multivariate gaussian with one dimension per axis (limited between 1D to 3D examples), with the visual representation common to GPs (one dimension per horizontal slice) by showing the correspondence side-by-side in a single interactive visual.
What purpose does the "In general" clause serve?
"symetric" misspelled - should be "symmetric"
I believe this is an error - the diagonal of the covariance matrix corresponds to the variance for each random variable, not the standard deviation.
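A quick numerical sketch (not from the article) makes the distinction concrete: the diagonal of an empirical covariance matrix recovers the variances, while the standard deviations are their square roots.

```python
import numpy as np

# Illustrative sketch: the diagonal of a covariance matrix holds the
# variances of each variable, not the standard deviations.
rng = np.random.default_rng(0)
# Two correlated variables with std devs 2.0 and 0.5 (variances 4.0 and 0.25).
samples = rng.multivariate_normal(
    mean=[0.0, 0.0],
    cov=[[4.0, 0.3], [0.3, 0.25]],
    size=200_000,
)
cov = np.cov(samples, rowvar=False)
print(np.diag(cov))             # close to the variances [4.0, 0.25]
print(np.std(samples, axis=0))  # close to the standard deviations [2.0, 0.5]
```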
The placement of the controls for the bivariate normal figures are counter-intuitive.
The placement of the bottom handle, pointing downwards, implies that:
Neither of these intuitions turns out to be true of the control's effect on the distribution.
To add more confusion, the most positive correlation is obtained when the slider is at its most negative position and the most negative correlation is obtained when the slider is at its most positive position.
This sentence does not make much sense, unpacking the two clauses:
"Gaussian distributions are widely used to model the real world ... when the original distributions are unknown."
This notion of “assume Gaussian by default” reflects a common misuse of Gaussian distributions and is blamed for many modeling failures. The phrasing here implies that using Gaussians when the original distributions are unknown is an acceptable common practice, when it is in fact poor modeling practice.
"Gaussian distributions are widely used to model the real world ... in the context of the central limit theorem."
I can guess what this means (that Gaussian distributions are justified when the conditions of the central limit theorem hold), but the phrasing reads very awkwardly.
Use of “This” is ambiguous (does it refer to being closed, conditioning, or marginalization?). Replacing "This" with "Being closed under conditioning and marginalization" would reduce the ambiguity.
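To illustrate the closure property the passage is about, here is a small sketch (my own, not the article's): marginalizing a multivariate Gaussian just reads off sub-blocks of the mean and covariance, and conditioning applies the standard Gaussian conditioning formulas; both results are again Gaussian.

```python
import numpy as np

# Bivariate Gaussian N(mu, Sigma) over (X, Y).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Marginal of X: simply select the corresponding sub-blocks.
mu_x, var_x = mu[0], Sigma[0, 0]

# Conditional X | Y = y: standard Gaussian conditioning formulas.
y = 0.5
mu_x_given_y = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y - mu[1])
var_x_given_y = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

print(mu_x, var_x)                  # 1.0 2.0
print(mu_x_given_y, var_x_given_y)  # 2.2 1.36
```

Both operations return a (mean, variance) pair, i.e. another Gaussian, which is exactly what "closed under conditioning and marginalization" means.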
This section needs more elaboration as there are two key stumbling blocks for those coming from other backgrounds.
I don’t think it’s helpful to qualify the interpretation as "straightforward" or “not straightforward”, just explain the interpretation without the qualification.
I think it would be beneficial to show the visual first and then present marginalization and conditioning equations as elaborations on the visual.
This transition sentence is confusing, considering that continuous views of functions haven't been discussed. Up to now the main topic has been the multivariate Gaussian distribution, not continuous functions.
There's a bug in the figure here: the right-margin annotation doesn't get updated until after the slider is released. It should update as soon as the slider is touched.
This relationship could be made more intuitive visually. Specifically, what exactly is meant by banding around the diagonal and why does it confer stationarity?
The "click to start" functionality is awkward, why does the user need to start/pause at all? Why not just have the visual continually running by default?
This section probably needs the most work writing-wise.
This can be a harmful intuition, since noise parameters can allow the solution set of functions not to pass through the training points.
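A minimal sketch of this point, using a hypothetical helper of my own (not the article's code): with observation noise σₙ > 0, the GP posterior mean at the training inputs is pulled away from the training targets rather than interpolating them exactly.

```python
import numpy as np

def gp_posterior_mean_at_train(X, y, sigma_n):
    """Posterior mean of a zero-mean GP with an RBF kernel, evaluated
    at the training inputs themselves (illustrative helper)."""
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)
    # Noise variance sigma_n^2 is added to the diagonal of the kernel matrix.
    return K @ np.linalg.solve(K + sigma_n**2 * np.eye(len(X)), y)

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])

print(gp_posterior_mean_at_train(X, y, sigma_n=1e-6))  # ~ y: interpolates
print(gp_posterior_mean_at_train(X, y, sigma_n=0.5))   # pulled away from y
```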
I think I understand that the desire is to obtain a closed-form representation, but this is a very awkward choice of phrase. Random processes are often used to find fits to data (e.g., MCMC).
Very confusing what concept "it" is referring to here.
Distill employs a reviewer worksheet as a help for reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
The text was updated successfully, but these errors were encountered:
We very much agree with these points. In #13 we completely overhaul the interaction for the bivariate figures. Now the MVN can be controlled using its eigenvectors, which, in our opinion, feels more natural.
Additionally, we use SVD instead of Cholesky decomposition, which yields much nicer ellipses.
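For readers wondering why either factorization works, here is a sketch (assumptions mine, not code from the PR): any factor A with A Aᵀ = Σ maps standard normal samples to samples with covariance Σ; the SVD factor's columns additionally align with the ellipse's principal axes, which is what makes the drawn ellipses nicer.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Cholesky factor: L @ L.T == Sigma, but L is lower-triangular.
L = np.linalg.cholesky(Sigma)

# SVD factor: for symmetric PSD Sigma, Sigma = U diag(s) U.T, so
# A = U sqrt(diag(s)) also satisfies A @ A.T == Sigma, and the columns
# of U point along the ellipse's principal axes, with semi-axis
# lengths sqrt(s).
U, s, _ = np.linalg.svd(Sigma)
A = U @ np.diag(np.sqrt(s))

print(np.allclose(L @ L.T, Sigma))  # True
print(np.allclose(A @ A.T, Sigma))  # True
```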
This is a great idea! PR #15 adds such an interactive figure that should make this connection much clearer. It allows the user to explore the two different views (a sample from the distribution and the corresponding horizontal slice, for 2 dimensions).
PR #10 fixes the corresponding passages in the text.
PR #18 gives additional information about the noise model so that the function does not have to pass directly through the observations.
We have fixed these minor comments in PR #19.