Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] FFT based equilibration detection. #112

Closed
wants to merge 9 commits into from

Conversation

kyleabeauchamp
Copy link
Collaborator

One issue is that some of the previous "tricks" are ineffective (e.g. fast=True), as the FFT-based approach calculates all lagtimes simultaneously.

For the search the discard region t, I've elected to just do binary search of the logarithmically-spaced lagtime grid.

@kyleabeauchamp
Copy link
Collaborator Author

Right now I'm using the correlation function code from statsmodels. We can backport their code to avoid an extra dependency, once we finalize the big picture roadmap here

@jchodera
Copy link
Member

What about the code for this contributed from @trendelkampschroer ?

@kyleabeauchamp
Copy link
Collaborator Author

It was a major refactoring of our current code, which would require more work than the current PR.

@jchodera
Copy link
Member

Also, we had discussed using only N ~ 50 origin time points to evaluate as potential time origins t0. Did you have any luck with that?

@jchodera
Copy link
Member

One issue is that some of the previous "tricks" are ineffective (e.g. fast=True), as the FFT-based approach calculates all lagtimes simultaneously.

Do you mean all lag times simultaneously, or time origins t0 simultaneously?

@kyleabeauchamp
Copy link
Collaborator Author

All lag times, not time origins. The time origin still needs to be searched manually.

@kyleabeauchamp
Copy link
Collaborator Author

Basically, the idea here is:

  1. Iteratate time origins t0 via binary search on logarithmically spaced values.
  2. For each t0, calculate C(t) for all lagtimes t via FFT.
  3. Find maximum value of t0 and repeat on a finergrid of t0.

@jchodera
Copy link
Member

It was a major refactoring of our current code, which would require more work than the current PR.

I thought we only needed to replace the statistical_inefficiency method and grab the dependencies (not refactored, just new methods) from @trendelkampschroer's timeseries.py. That would make everything downstream of statistical_inefficiency, including detect_equilibration, fast.

I do like the idea of the binary search, but note that the computed effective number of samples isn't necessarily concave in t0.

@jchodera
Copy link
Member

Also, I'm working on a short paper on the detect_equilibration. This could be a neat follow-up, or maybe we should combine...

@kyleabeauchamp
Copy link
Collaborator Author

The issue is that the current statistical inefficiency uses a heuristic for skipping lagtimes, which isn't as useful when you use the FFT approach, which gives you results for all lagtimes simultaneously.

Also, https://github.com/trendelkampschroer/pymbar/blob/master/src/pymbar/timeseries.py has lots of extra stuff in there.

@kyleabeauchamp
Copy link
Collaborator Author

I agree that our function is not concave, so my heuristic gives only a local maximum. However, we're already using heuristics, so it's not a deal-breaker.

@jchodera
Copy link
Member

I agree that FFT is better than the multiresolution approach I coded.

There's extra stuff in @trendelkampschroer's timeseries.py, but we only need statistical_inefficiency and its dependencies. And that will speed up all statistical inefficiency calculations, including detect_equilibration. I would prefer this to hacking FFT in for only detect_equilibration.

The statsmodel dependency isn't awful---@trendelkampschroer uses scipy instead. Not sure either is a big headache.

Let's definitely try out the binary search of t0 idea!

@kyleabeauchamp
Copy link
Collaborator Author

So what about the Newton's method code? His code for statistical inefficiency calls a Newton optimizer. That's a big change from the current code.

@kyleabeauchamp
Copy link
Collaborator Author

Also, there are no tests.

@kyleabeauchamp
Copy link
Collaborator Author

I chose statsmodels to calculate the ACF to avoid duplicating code that is maintained (https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/stattools.py#L362) and tested (https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/tests/test_stattools.py) elsewhere.

@jchodera
Copy link
Member

I'm happy with the statsmodel version, but would prefer this go into statistical_inefficiency (perhaps as a selectable method?) instead of just detect_equilibration. That way, all uses can benefit.

That sound reasonable?

@kyleabeauchamp
Copy link
Collaborator Author

It's just not possible for all uses to benefit, because the FFT approach only works for a single input observable. The current statisticalIneffiency(A, B) is more general. Also, the current statisticalIneffiency() has a Python while loop hard-coded, which we have avoided in the new code entirely.

How about I have statisticalIneffiency(A) call statisticalIneffiency_fft(A) when B is set to None? I think that will lead to the most readable code.

@jchodera
Copy link
Member

Sounds great!

@jchodera
Copy link
Member

I think there were still some planned changes here. Specifically, you were going to rework detect_equilibration() to use the normal statisticalInefficiency(), but statisticalInefficiency would use the FFT code when only one argument (A_t) was specified.

@kyleabeauchamp
Copy link
Collaborator Author

So I'm going to open an new PR for the FFT code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants