New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add self-diffusivity calculation #24
Conversation
Definitely want to discuss what is best for testing this, perhaps during our meeting on Thursday. I did have some ideas in my proposal, though none of them are as straightforward to put into practice as the tests written thus far. |
The step traj should have an analytically solvable integral, though what it is I am not sure. |
* Method is `sd()` in class `VelocityAutocorr`
29a3e4a
to
b7e2dd3
Compare
Is it really easily solvable analytically? I'm thinking about trying for a different numerical integration method e.g. |
Try making a few notebooks to play around in and we can go over it on when we meet soon. 👍 |
This should make it easier to integrate notebooks into review/version control if you want to do that. |
Thank you, I'll look into it! |
Added tests for both the simple windowed calculation and the FFT calculation but not sure if both are necessary - please let me know your thoughts. Using |
* Add `sd_odd()` using `scipy.integrate.simpson` * Add tests for `sd_odd()`
We could probably do without the non-FFT tests for self-diffusivity, since we test that already in the VACF tests. Am I good to remove them? |
@xhgchen I would check the full code path, testing is cheap but having undetected issues is expensive. |
* Change `self._velocity_array` to `self._velocities` for readability
Made the applicable changes @orionarcher requested in #25 here, namely changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to validate against existing MSD code.
) | ||
/ tdim_factor | ||
) | ||
# 7705160166.66 (exp) agrees with 7705162888.88 (act) to 6 sig figs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the self diffusivity? Seems very high. Try with the equivalent trajectory in positions space with the existing MSD module from MDAnalysis and see what you get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests have been added (though they are very rough). Will try to make the separate class next!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, we should also add something to plot the running integral of the ACF as is done in. Fig 12 of https://livecomsjournal.org/index.php/livecoms/article/view/v1i1e6324/937. The right function for this is probably scipy.integrate.cumtrapz
.
assert_almost_equal(v_simple.results.timeseries, poly, decimal=3) | ||
assert_almost_equal(v_fft.results.timeseries, poly, decimal=3) | ||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can separate all the tests that use this fixture into a separate class and then make the parameterization over the class rather than repeat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests would include the old ones for VACF too, right?
@@ -262,3 +261,68 @@ def plot_vacf(self, start=0, stop=0, step=1): | |||
self.times[start:stop:step], | |||
self.results.timeseries[start:stop:step], | |||
) | |||
|
|||
def sd(self, start=0, stop=0, step=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a green Kubo integral of the VACF so we should use the proper name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also will need citations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sd_gk()
sound good? I feel that self-diffusivity_greenkubo()
is too long. I thought we were doing citations after we finish the bulk of the implementations? Or would it be better to do them now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sd_gk
sounds good to me. Either is fine on citation, up to you, just make a note or issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a note in Issue #20.
`numpy.float64` | ||
The calculated self-diffusivity value for the analysis. | ||
""" | ||
stop = self.n_frames if stop == 0 else stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should raise an exception if .run hasn't already been called with some kind of cache variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sure where to start here. Do you think you'd be able to explain a bit more, or would you be good with setting up a meeting to go over this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep happy to meet.
Something like the following should work.
def __init__(self,)
self._run_called = False
...
def _conclude(self):
self._run_called = True
...
def sd_gk(self):
if not self._run_called:
raise RunTimeError("blah blah")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or self._has_run
?
Something like, "VelocityAutocorrelation.run() must be called before 'sd_gk'"
/ self.dim_fac | ||
) | ||
|
||
def sd_odd(self, start=0, stop=0, step=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably remove this IMHO as trapeziod or Gquad is the standard. Can keep it around for testing but I would make it a private method with a _
prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought Simpson's rule was better when appropriate? Here's a link to my source: https://math.dartmouth.edu/~m3cod/klbookLectures/406unit/trap.pdf
Is keeping the standard more important than having an option for better accuracy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need for an odd number of samples is a bit of a pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel having the option available is better than not providing it, but I am no expert. Is it really preferable to remove it rather than just make it clear that trapezoid is the standard/default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general you want to aim for the lowest possible API surface so that people don't get confused and do the wrong thing.
The idea being you should plan for users to get frustrated as soon as the first thing goes wrong, meaning you should take away the ability for them to annoy themselves if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If users do the wrong thing (use the method with an even number of samples), scipy.integrate.simpson
actually handles it well by default using the equations in Cartwright's paper for the last interval. On second thought, considering that it is pretty much the same as trapezoid but better, what are your thoughts on using it as the default instead?
Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simpson.html
* Create position trajectory corresponding to unit velocity trajectory * Write tests against MSD-based self-diffusivity calculation from the MDAnalysis MSD module (Einstein method)
* Track whether analysis is run with `self._run_called` * Raise exceptions in self-diffusivity and plotting functions if analysis not run * Add tests for new exception raises
* Parameterization occurs over class `TestAllDims` to test calculations over all possible dimensions
def step_vtraj_pos(NSTEP): | ||
x = np.arange(NSTEP).astype(np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the name here a bit more descriptive? Maybe unit_velocity_traj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this matter enough to be worth the effort? It's used in a lot of places, so the change would be a little messy.
`numpy.float64` | ||
The calculated self-diffusivity value for the analysis. | ||
""" | ||
stop = self.n_frames if stop == 0 else stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or self._has_run
?
Something like, "VelocityAutocorrelation.run() must be called before 'sd_gk'"
@@ -262,3 +266,78 @@ def plot_vacf(self, start=0, stop=0, step=1): | |||
self.times[start:stop:step], | |||
self.results.timeseries[start:stop:step], | |||
) | |||
|
|||
def sd_gk(self, start=0, stop=0, step=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nitpick, but generally I'm not in favor of suber abbeviated naming. sd_gk
means something to you now, as a developer, but it probably won't mean much to a new user. Maybe self_diffusivity_gk
or self_diffusivity_green_kubo
or diffusivity_green_kubo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll go with self_diffusivity_gk
then.
* Update units in plotting functions
* Make VACF plotting function and tests unitless
* Make `plot_running_integral()` docstring more clear
* Units are angstroms squared per picoseconds squared
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @xhgchen! Addressed all my comments. I would say merge and lets kick some more goals. 💯
Fixes #7
Changes made in this Pull Request:
sd()
in classVelocityAutocorr