Minor suggestions for improvement to JOSS paper #293

Closed · 11 tasks done · 6 comments

matt-graham commented Mar 26, 2024

Raising issue as part of openjournals/joss-reviews/issues/6372

Some minor comments / suggestions for improvement of the JOSS paper

A julia-language

  • Usually when referring to the language, Julia is capitalized, so I think it should be A Julia language here.

to constrain the free parameter distribution

  • I would say the free parameter distribution is a bit vague here and would suggest the prior parameter distribution instead, as what is being described is, I think, the prior distribution; 'prior parameter distribution' is also relatively self-explanatory even for readers unfamiliar with Bayesian inference jargon.

constrained joint parameter distribution

  • I think it is a bit unclear here what is being referred to by the joint parameter distribution. The posterior can be seen as a restriction of the joint distribution on parameters and observed variables given particular values (data) for the observed variables, but it seems like you might just be referring to the prior distribution on the parameters here? (See the note below.)
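
For concreteness (my notation, not the paper's): with prior $\pi(\theta)$ and likelihood $\pi(y \mid \theta)$,

$$\pi(\theta, y) = \pi(y \mid \theta)\,\pi(\theta), \qquad \pi(\theta \mid y) = \frac{\pi(\theta, y)}{\pi(y)} \propto \pi(y \mid \theta)\,\pi(\theta),$$

so the posterior is the joint distribution on $(\theta, y)$ conditioned on the observed data, which is distinct from the prior $\pi(\theta)$.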

statistical machine-learning emulators

  • I would say just statistical emulators is sufficient and this would be consistent with later usage in the paper.

when compared with traditional algorithms to draw samples from the joint parameter distribution

  • Again joint parameter distribution is unclear here - I would specifically state the posterior distribution on parameters if that is what you mean here. Also traditional algorithms is a bit vague here - it would be better to specifically state which algorithms you are referring to (presumably approaches like Markov chain Monte Carlo or sequential Monte Carlo running directly on the target distribution rather than an emulator).

variants of Random Walk Metropolis [@sherlock:2010]

  • The reference to Sherlock et al. (2010) seemed a little out of place to me here, as random walk Metropolis predates this (review) article. I would potentially either additionally cite Metropolis et al. (1953), "Equation of state calculations by fast computing machines", which proposed the Metropolis algorithm (using a random walk proposal, albeit not a Gaussian one), or possibly just omit a citation altogether. But I think this is quite a subjective point so feel free to ignore.

Unfortunately this task is intensely computationally expensive, commonly requiring over $10^5$ evaluations of the expensive computer code

  • I don't think this is an unreasonable claim, but ideally it could do with some justification via a reference to a specific example or algorithm.

that classically this has been unavailable

  • This doesn't quite read right to me - maybe an additional for is needed at the end?

Bayesian Markov Chain Monte Carlo -

  • Chain does not need to be capitalized, and it is unclear whether the Bayesian qualifier is needed here - Markov chain Monte Carlo methods can be applied to Bayesian inference problems, but also to sample from any distribution for which we can evaluate an (unnormalized) density function, and the algorithms used for the Bayesian case can typically be used in the more general setting.

MCMC

  • The acronym should be defined on first usage (Markov chain Monte Carlo is used in full previously but not linked to the short form).

State of the field


nluetts commented Mar 27, 2024

Here is some further feedback from my side, also in the context of the JOSS review openjournals/joss-reviews#6372

p. 1 L 20 "To accelerate and smooth this process" (and L36 "smoothed accelerated sampling problem")

  • It is not clear to me if "smooth this process" is meant as "makes the sampling easier/less time consuming" or literally, i.e. the emulator smooths when going from parameters to observations.

p. 1 L 36 (listing the sampling methods)

  • You do not list Gibbs sampling or HMC/NUTS. When familiarizing myself with MCMC (which I have used only a few times so far) I got the impression that Gibbs sampling and HMC/NUTS are kind of the "recommended" sampling methods at the moment. NUTS is also the default in Stan, if I am not mistaken, and Turing.jl also implements NUTS. Are (some of) these samplers covered by "preconditioned Crank-Nicolson" (I was unsure from a very brief look at the cited paper)? Or is there a reason, in the context of CES, to prefer the samplers you have included in your package?

p. 2 L53 "order of 10^2 evaluations of the code"

  • I would change "the code" to "the original computer model". This is what it was called in the summary in the beginning of the paper, and it makes it clearer that you get rid of those expensive model runs.

p. 2 L61 "ensemble Methods"

  • Methods with capital M.

p. 2 L68 "the goal ABC is"

  • "the goal of ABC is"?

p. 2 L65 and L72 "GpABC" vs "gpABC"

  • Should this be the same?

p. 2 L72
"ABC can be used in more general contexts than CES, but suffers greater
approximation error and more stringent assumptions, especially in
multi-dimensional problems."

  • Can you provide a reference for this statement? Or is this something that is immediately clear to a statistician? Does it not also depend on the chosen emulator in CES?

Figure 1

  • Rather a cosmetic problem: the legend says true amplitude = 6.99, but the text in the paper says 7.0 (this seems to be an artifact of the signal being made up of a finite number of samples, examples/Sinusoid/sinusoid_setup.jl:61).
    Ok, the legend is the mean and the text is the vertical shift, so perhaps one could consider it a property of the model (or of the finite number of "measurements"), but it is a bit confusing.

  • I think it would make more sense if the observed range (blue double arrow) would be centered with respect to the observed mean (blue dashed line). But your version can be compared to the true values more easily, so feel free to ignore.

Figure 4, caption: "... trained on the re-used calibration pairs"

  • The points (scatter plots) do not seem to be the same as in Figure 3. Should they be the same?

p. 5 L104 "histogram of the samples from (...?) is displayed in Figure 5."

  • Seems to be missing some words, e.g. "MCMC sampling".


odunbar commented Apr 3, 2024

Hi both - thanks for these comments.
I shall create a PR shortly to begin addressing them in the markdown. I have added check-boxes to the comments to aid in tracking them.

odunbar mentioned this issue Apr 3, 2024

odunbar commented Apr 3, 2024

@nluetts, a couple of responses to your questions:

  • By smoothing, we mean we really do regularize the problem (the statistical emulator mean function will be smoother than the forward map evaluations). I've tried to make this clearer.
  • RE other methods, we are actually getting a student to look at NUTS/HMC, so it is great that you raise this. The reason we did not initially highlight them is that they are derivative-based methods and so are not directly applicable to "original computer models" that do not have derivatives. However, it should be noted that (precisely due to the smoothing/regularizing property mentioned above) you should be able to regain the ability to use derivative-based methods; see the sketch after this list. If you are interested, we are also looking at "Barker proposals", which are claimed to overcome a lot of the headaches of choosing algorithm parameters for HMC and are remarkably simple to implement compared with NUTS. I shall include a note about the methods.
  • RE ABC, our initial investigations (performed by my predecessor) had trouble with these likelihood-free or direct likelihood emulators due to dimensionality issues. I shall investigate whether there are reports of these limitations for ABC a bit further and, of course, avoid over-claiming without evidence.
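
To illustrate the point about regaining derivative-based samplers, here is a minimal plain-Julia sketch (not the package's API; `emulated_logpost` is a hypothetical stand-in for a smooth emulator mean, and in practice the gradient would come from the emulator itself rather than finite differences):

```julia
# Illustrative only: once the expensive forward map is replaced by a smooth
# emulator of the log-posterior, derivative-based samplers such as MALA/HMC
# become usable again.

emulated_logpost(θ) = -0.5 * sum(abs2, θ .- 1.0)    # smooth toy surrogate

function grad_logpost(θ; h = 1e-5)                  # central finite differences
    g = similar(θ)
    for i in eachindex(θ)
        e = zeros(length(θ)); e[i] = 1.0
        g[i] = (emulated_logpost(θ .+ h .* e) - emulated_logpost(θ .- h .* e)) / (2h)
    end
    return g
end

# Metropolis-adjusted Langevin algorithm (MALA) targeting the emulated posterior
function mala(θ0; n_samples = 2000, ϵ = 0.1)
    θ = copy(θ0)
    samples = [copy(θ) for _ in 1:n_samples]
    for i in 1:n_samples
        μ  = θ .+ 0.5 * ϵ^2 .* grad_logpost(θ)      # Langevin drift uses the gradient
        θp = μ .+ ϵ .* randn(length(θ))
        μp = θp .+ 0.5 * ϵ^2 .* grad_logpost(θp)
        # Metropolis correction with the asymmetric Langevin proposal densities
        loga = emulated_logpost(θp) - emulated_logpost(θ) +
               (sum(abs2, θp .- μ) - sum(abs2, θ .- μp)) / (2 * ϵ^2)
        if log(rand()) < loga
            θ = θp
        end
        samples[i] = copy(θ)
    end
    return samples
end

chain = mala([0.0, 0.0])
```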


odunbar commented Apr 5, 2024

I believe we have addressed everything. Please close the issue if you are satisfied.


nluetts commented Apr 5, 2024

@odunbar thanks for clarifying, and thanks for the pointer to the "Barker proposal", which I did not know about before.

In this new sentence:

Two such examples, written in python, approximate the log-posterior distribution ...

you could capitalize Python.

@matt-graham the issue can be closed from my side 👍

odunbar mentioned this issue Apr 5, 2024
@matt-graham (Author)

Thanks @odunbar - this looks all good to me too.
