Global (but not large) changes #28

mangiapasta · 2017-12-26T20:11:01Z

Are we at the point in this document where we can do full read-throughs and suggest smaller changes to the content and structure of the document? I'm reading through the whole thing today and have suggestions for revisions to sections I've not yet worked on.

dmzuckerman · 2017-12-27T02:06:51Z

@mangiapasta I have just pushed some tweaks based on a full read-through. A few parts still need adjustment.

mangiapasta · 2017-12-27T19:28:44Z

After having read the full document, I wonder about our perspective on correlated data, especially for time-series that are often output as raw data. Overall I think the document does a good job of explaining that more "effectively uncorrelated data" implies better sampling and averages. On the other hand, I think that there is some misinformation / biased perspective about how problematic correlations actually are.

For example, in section 7.2, we state, "Both block averaging and autocorrelation analysis will produce different effective sample sizes...." I guess that technically this is true, but their respective arithmetic means are identical, so that the variances about the true mean are also identical. So the fact that we have different effective sample sizes is kind of a moot point. We also spend a lot of time talking about block-averaging. Again, not necessarily wrong (and many people use it), but the whole point of that analysis is basically to avoid time correlations. But we don't really say when / why one would want to do that, or if it is even necessary (which oftentimes it's not).

Overall, I think the community has fallen into the trap of believing that correlations are bad. To a certain extent, I agree that we should avoid them, e.g. when initializing multiple copies of a simulation so that we can get better sampling. But when we are dealing with raw data and time-series, correlations are a natural part of the dynamics and usually something that we actually want. Stat-mech basically tells us to expect stationary autocorrelations, so as a sanity check, it's nice to confirm that we got what we expected. Moreover, the autocorrelation analysis is going to give similar (if not identical) results as block-averaging done correctly, but the former has less choices to make.

This is all to say, I think we should make an effort to connect correlations to the underlying physics and dispel with what I see as a mythology about avoiding correlations. To that end, I think it makes sense to elevate the autocorrelation section relative to block-averaging. I also see latter as only being useful when data-storage is a severe problem, which (if others agree), we should perhaps say. I also think we should generally advocate for saving all raw data when possible and/or not preprocessing block-averages.

I realize others may feel differently, so I'm open to suggestions / discussion on this.

mangiapasta · 2017-12-27T19:43:12Z

More comments on structure and overall document.

I think the title should be changed to, "Best Practices for Sampling and Uncertainty Quantification in Molecular Simulations." I generally think of sampling as preceding UQ, and the paper is also structured that way.
The introduction and scope needs a bit more discussion. For example, I think the first two paragraphs of section 3 can go in the intro. I think we should also give a high-level overview of what the document recommends. For example, I think we should highlight the global picture of what best practice looks like, i.e. starting with careful planning, testing of adequate sampling, and then UQ. (This shouldn't step on the toes of the checklist though). If folks also agree, we should discuss where our recommendations deviate from common practice, e.g. that we should not avoid correlations in time-series data.

dmzuckerman · 2017-12-27T23:32:39Z

@mangiapasta thanks for thoughtful comments. Here are my thoughts:

I like the title the way it is because our main focus is indeed uncertainty quantification. Of course, we have to discuss sampling, but we certainly don't say too much about how to do it.
I agree with you that the introduction can be fleshed out along the lines you suggest. However, I don't think we should remove any text from Sec. 3. It's ok to repeat key points and we want our sections to be semi-free-standing.
Regarding correlations, I don't think we should change too much what we have. Of course, if there are genuinely misleading statements, we should correct them - e.g., if we suggest correlations can be avoided. But it IS true that correlations are the main reason why UQ is challenging for molecular simulation. I agree with you 100% that they are physical and in MD represent true dynamics - in fact, although it's not an accepted 'best practice', I am trying to advocate that people simply try to account for them in simulation analysis as I discuss a bit in a hot-off-the-press blog post ... which of course was motivated by our manuscript.

With all that being said, @mangiapasta why don't you make edits as you see fit to the intro ... and perhaps do anti-anti-correlation edits as a pull request?

mangiapasta · 2018-01-04T16:39:41Z

In the section on Pre-simulation sanity checks, I don't know what the following paragraph is actually saying:

"If you read this guide through \emph{before} performing a simulation, you will have a much better sense of the criteria applicable to your data -- and which indeed \emph{should} be applied by knowledgeable reviewers of your work. Thus we strongly advise understanding the concepts presented here as well as in related reviews \cite{Grossfield2009,JCGM:GUM2008}."

Specifically, what does "criteria applicable to data" mean? Should this be something like, "expected properties of your data" or something similar? Also, how are criteria applied by reviewers to work? Is the idea of this paragraph to say something about what decisions can be made on the basis of a given dataset?

Also, it seems to me that the content of that paragraph really apply to the document as a whole, not just the Pre-simulation checks. Am I missing something here? This paragraph feels like it should go elsewhere.

dmzuckerman · 2018-01-04T23:11:10Z

@mangiapasta thanks for catching that flabby writing - I am the guilty author.

I think what I was trying to say belongs in the planning section. I intended to communicate that authors should be aware of the issues involved with doing good uncertainty quantification - and also be aware that the protocols we suggest may not cast the most favorable light on data obtained from a poorly planned or inadequately sampled study.

I think one of the ambitions of the Livecoms journal is that reviewers (from any/all journals) will use the Livecoms Best Practices articles as implicit or explicit criteria in evaluating manuscripts. Thus, authors have a selfish interest to follow our suggestions to the extent they're used by reviewers. And I just thought that awareness of all this at the planning stage would be most beneficial.

I guess there's no reason some of the same things couldn't also be mentioned in the general introduction, though I'm not sure how much we can presume that reviewers (for other journals) will take our recommendations seriously.

If you care to revise along these lines, please feel free. If not, let me know and I'll revisit.

mangiapasta · 2018-01-05T01:04:56Z

Ahh, this makes much more sense now. Thanks for clarifying. I was slowly going through sections tagging things that didn't make sense to me. If you're okay with me editing, I'll try to rephrase a bit along the lines of your post just now. Alternatively if you want first crack at revising your own words, I'm fine with that too.

If you get a chance, let me know your thoughts on the intro. I added several paragraphs in an effort to foreshadow the document's overall structure and (hopefully) make the reader start thinking about issues that we raise.

dmzuckerman · 2018-01-05T02:16:27Z

@mangiapasta please edit, and I'll review. Thanks a lot. I should have some time tomorrow to go through whole doc.

dmzuckerman · 2018-01-09T16:06:52Z

@mangiapasta please let me know when you've had a chance to do this. I want to go over the doc after you're finished. Thanks!!

mangiapasta · 2018-01-09T19:19:05Z

Will do. I'm currently at a conference but should be able to get to it later this week.

…

________________________________ From: dmzuckerman <notifications@github.com> Sent: Tuesday, January 9, 2018 11:06:52 AM To: dmzuckerman/Sampling-Uncertainty Cc: Patrone, Paul (Fed); Mention Subject: Re: [dmzuckerman/Sampling-Uncertainty] Global (but not large) changes (#28) @mangiapasta<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmangiapasta&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=xzlvglDCysVdqbpQIx9OwALr3QKehEDNrFktel0k8d8%3D&reserved=0> please let me know when you've had a chance to do this. I want to go over the doc after you're finished. Thanks!! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdmzuckerman%2FSampling-Uncertainty%2Fissues%2F28%23issuecomment-356329400&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=Os9NlS1ArfylInqRCiyPRDdF9%2FD2sELi%2BF4furBEP5A%3D&reserved=0>, or mute the thread<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAd9eEb1nbNCiNmvQH0FRAXBiXFbU5sPKks5tI46cgaJpZM4RM-1b&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=rZuKYofEvcSmtCQDwlQcf0NtpHlaL2Qi6zK5E0mGvy4%3D&reserved=0>.

mangiapasta · 2018-01-10T20:50:14Z

Okay, I made some modest edits to the section. I tried to not remove anything that was there but only add and/or clarify ideas that you originally put in. I also added a few references. One is to a chapter on UQ for MD that I wrote with another NIST author (It's been accepted at Reviews in Computational Chemistry but is not yet published) and the other is an AIAA conference proceedings where we talk about issues related to time-series data and analysis thereof.

…

________________________________ From: dmzuckerman <notifications@github.com> Sent: Tuesday, January 9, 2018 11:06:52 AM To: dmzuckerman/Sampling-Uncertainty Cc: Patrone, Paul (Fed); Mention Subject: Re: [dmzuckerman/Sampling-Uncertainty] Global (but not large) changes (#28) @mangiapasta<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmangiapasta&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=xzlvglDCysVdqbpQIx9OwALr3QKehEDNrFktel0k8d8%3D&reserved=0> please let me know when you've had a chance to do this. I want to go over the doc after you're finished. Thanks!! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdmzuckerman%2FSampling-Uncertainty%2Fissues%2F28%23issuecomment-356329400&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=Os9NlS1ArfylInqRCiyPRDdF9%2FD2sELi%2BF4furBEP5A%3D&reserved=0>, or mute the thread<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAd9eEb1nbNCiNmvQH0FRAXBiXFbU5sPKks5tI46cgaJpZM4RM-1b&data=02%7C01%7Cpaul.patrone%40nist.gov%7C6c9fcbd4da7a4b66d77f08d5577b3806%7C2ab5d82fd8fa4797a93e054655c61dec%7C1%7C0%7C636511109095065764&sdata=rZuKYofEvcSmtCQDwlQcf0NtpHlaL2Qi6zK5E0mGvy4%3D&reserved=0>.

dmzuckerman · 2018-01-10T21:00:35Z

thanks!

mangiapasta · 2018-01-17T06:00:24Z

In the quick-and-dirty section, I left a big red note near the end. I'm still trying to hash out in my mind the distinction between two related concepts. Input from folks here would be useful. In that section, it seems to me that we start by discussion convergence a la Law of Large Numbers / Central Limit Theorem. That is, more sampling makes an estimator converge to its true value. This seems like convergence in a true mathematical sense, insofar as I can state in what sense the convergence occurs and roughly how fast.

We then pivot to ``convergence'' as characterized by overlap of error bars. This seems to me more like a value-judgement proxy for the first type of convergence. That is, given a certain amount of overlap in error bars, do I feel comfortable concluding that my estimator has converged "sufficiently" in the mathematical sense. I'm not aware of any sense in which overlap of error bars provides rigorous assessments of convergence (although I could be wrong here).

Anyways, I want to avoid the illusion that statistical / probabilistic conclusions are the same as the value judgements and "policy" decisions we make on the basis of information gleaned from statistics. To say that we're comfortable with the level of convergence is not a mathematical statement. I feel like we're veering a little close to conflating these ideas, however.

dmzuckerman · 2018-01-17T15:20:20Z

@mangiapasta thanks for, as always, thorough and thoughtful comments. Let us know when you're done going through what you want to do and then others can revise accordingly. Leaving comments in the manuscript is the best way to ensure things get addressed.

Regarding 'overlap of error bars' as indicative of convergence, I think you're referring to the combined clustering subsection written by @drroe . He can give his thoughts on that issue, but bear in mind this is the 'quick and dirty' (now qualitative, semi-quant) section. So I think the whole section should be read as providing a necessary-but-not-sufficient test. I think elsewhere in the paper we say that there is no absolute test for convergence.

That being said, clarifying the language in places where you find it could be read the wrong way would absolutely be helpful.

mangiapasta · 2018-01-17T15:35:11Z

Thanks for the heads up on authorship. I'm done editing that section now.

I agree that the section is really about necessary but not sufficient tests that should be easy to perform. My concern is more that I think there is some confusing (and possibly mathematically incorrect) language. More generally, I also think it's important to draw a clear distinction between mathematical statements (there is x-amount of overlap) and corresponding value judgements ("I'm okay with that amount of overlap," or "Uncertainties are too large; I need to revisit the simulations.").

At any rate, I'll wait back to hear from @drroe before changing anything else in that section.

mangiapasta · 2018-02-05T21:39:41Z

I've got to take a break for a few days and do some other work. I put in a conclusion section and went through the whole manuscript. I didn't get a chance to do much with the bootstrap section, and in some places I left comments littered throughout instead of making edits.

I rewrote large parts of the linear propagation section (sorry original author). Happy to discuss / reinsert some of the original text, but I found some incorrect statements in there and wanted to give a more complete description of how the process works.

dmzuckerman · 2018-02-06T02:02:34Z

Thank you @mangiapasta !
@dwsideriusNIST if you can finish checking on notation etc, that would be great.

dmzuckerman closed this as completed May 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Global (but not large) changes #28

Global (but not large) changes #28

mangiapasta commented Dec 26, 2017

dmzuckerman commented Dec 27, 2017

mangiapasta commented Dec 27, 2017

mangiapasta commented Dec 27, 2017

dmzuckerman commented Dec 27, 2017

mangiapasta commented Jan 4, 2018 •

edited

Loading

dmzuckerman commented Jan 4, 2018

mangiapasta commented Jan 5, 2018

dmzuckerman commented Jan 5, 2018

dmzuckerman commented Jan 9, 2018

mangiapasta commented Jan 9, 2018 via email

mangiapasta commented Jan 10, 2018 via email

dmzuckerman commented Jan 10, 2018

mangiapasta commented Jan 17, 2018

dmzuckerman commented Jan 17, 2018

mangiapasta commented Jan 17, 2018

mangiapasta commented Feb 5, 2018

dmzuckerman commented Feb 6, 2018

Global (but not large) changes #28

Global (but not large) changes #28

Comments

mangiapasta commented Dec 26, 2017

dmzuckerman commented Dec 27, 2017

mangiapasta commented Dec 27, 2017

mangiapasta commented Dec 27, 2017

dmzuckerman commented Dec 27, 2017

mangiapasta commented Jan 4, 2018 • edited Loading

dmzuckerman commented Jan 4, 2018

mangiapasta commented Jan 5, 2018

dmzuckerman commented Jan 5, 2018

dmzuckerman commented Jan 9, 2018

mangiapasta commented Jan 9, 2018 via email

mangiapasta commented Jan 10, 2018 via email

dmzuckerman commented Jan 10, 2018

mangiapasta commented Jan 17, 2018

dmzuckerman commented Jan 17, 2018

mangiapasta commented Jan 17, 2018

mangiapasta commented Feb 5, 2018

dmzuckerman commented Feb 6, 2018

mangiapasta commented Jan 4, 2018 •

edited

Loading