Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add self-diffusivity calculation #24

Merged
merged 19 commits into from Aug 2, 2023
Merged

Conversation

xhgchen
Copy link
Collaborator

@xhgchen xhgchen commented Jul 11, 2023

Fixes #7

Changes made in this Pull Request:

  • Add self-diffusivity calculation method sd() in class VelocityAutocorr

@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 11, 2023

Definitely want to discuss what is best for testing this, perhaps during our meeting on Thursday. I did have some ideas in my proposal, though none of them are as straightforward to put into practice as the tests written thus far.

@codecov
Copy link

codecov bot commented Jul 11, 2023

Codecov Report

Merging #24 (3c66b5d) into main (bebcfa1) will not change coverage.
The diff coverage is 100.00%.

Additional details and impacted files

@hmacdope
Copy link
Member

The step traj should have an analytically solvable integral, though what it is I am not sure.

@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 12, 2023

Is it really easily solvable analytically? I'm thinking about trying for a different numerical integration method e.g. scipy.integrate.quad for the tests.

@hmacdope
Copy link
Member

Try making a few notebooks to play around in and we can go over it on when we meet soon. 👍

@orionarcher
Copy link
Collaborator

orionarcher commented Jul 13, 2023

@ReviewNB

This should make it easier to integrate notebooks into review/version control if you want to do that.

@xhgchen xhgchen added the enhancement New feature or request label Jul 14, 2023
@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 14, 2023

@ReviewNB

This should make it easier to integrate notebooks into review/version control if you want to do that.

Thank you, I'll look into it!

@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 18, 2023

Added tests for both the simple windowed calculation and the FFT calculation but not sure if both are necessary - please let me know your thoughts. Using scipy.integrate.simpson for the tests to start and changed NSTEP from 5000 to 5001 to work with that method.

* Add `sd_odd()` using `scipy.integrate.simpson`
* Add tests for `sd_odd()`
@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 19, 2023

We could probably do without the non-FFT tests for self-diffusivity, since we test that already in the VACF tests. Am I good to remove them?

@hmacdope
Copy link
Member

hmacdope commented Jul 19, 2023

@xhgchen I would check the full code path, testing is cheap but having undetected issues is expensive.

@xhgchen xhgchen marked this pull request as ready for review July 20, 2023 01:14
@xhgchen
Copy link
Collaborator Author

xhgchen commented Jul 20, 2023

Made the applicable changes @orionarcher requested in #25 here, namely changing _parse_dim_type() to a static method and removing the unnecessary type casting.

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to validate against existing MSD code.

)
/ tdim_factor
)
# 7705160166.66 (exp) agrees with 7705162888.88 (act) to 6 sig figs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the self diffusivity? Seems very high. Try with the equivalent trajectory in positions space with the existing MSD module from MDAnalysis and see what you get.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have been added (though they are very rough). Will try to make the separate class next!

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, we should also add something to plot the running integral of the ACF as is done in. Fig 12 of https://livecomsjournal.org/index.php/livecoms/article/view/v1i1e6324/937. The right function for this is probably scipy.integrate.cumtrapz.

assert_almost_equal(v_simple.results.timeseries, poly, decimal=3)
assert_almost_equal(v_fft.results.timeseries, poly, decimal=3)

@pytest.mark.parametrize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can separate all the tests that use this fixture into a separate class and then make the parameterization over the class rather than repeat

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests would include the old ones for VACF too, right?

@@ -262,3 +261,68 @@ def plot_vacf(self, start=0, stop=0, step=1):
self.times[start:stop:step],
self.results.timeseries[start:stop:step],
)

def sd(self, start=0, stop=0, step=1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a green Kubo integral of the VACF so we should use the proper name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also will need citations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sd_gk() sound good? I feel that self-diffusivity_greenkubo() is too long. I thought we were doing citations after we finish the bulk of the implementations? Or would it be better to do them now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sd_gk sounds good to me. Either is fine on citation, up to you, just make a note or issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a note in Issue #20.

`numpy.float64`
The calculated self-diffusivity value for the analysis.
"""
stop = self.n_frames if stop == 0 else stop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should raise an exception if .run hasn't already been called with some kind of cache variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure where to start here. Do you think you'd be able to explain a bit more, or would you be good with setting up a meeting to go over this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep happy to meet.

Something like the following should work.

def __init__(self,)
    self._run_called = False
   ...

   def _conclude(self):
   self._run_called = True
   ...

   def sd_gk(self):
    if not self._run_called:
        raise RunTimeError("blah blah")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or self._has_run?

Something like, "VelocityAutocorrelation.run() must be called before 'sd_gk'"

/ self.dim_fac
)

def sd_odd(self, start=0, stop=0, step=1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably remove this IMHO as trapeziod or Gquad is the standard. Can keep it around for testing but I would make it a private method with a _ prefix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought Simpson's rule was better when appropriate? Here's a link to my source: https://math.dartmouth.edu/~m3cod/klbookLectures/406unit/trap.pdf

Is keeping the standard more important than having an option for better accuracy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need for an odd number of samples is a bit of a pain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel having the option available is better than not providing it, but I am no expert. Is it really preferable to remove it rather than just make it clear that trapezoid is the standard/default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general you want to aim for the lowest possible API surface so that people don't get confused and do the wrong thing.

The idea being you should plan for users to get frustrated as soon as the first thing goes wrong, meaning you should take away the ability for them to annoy themselves if that makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users do the wrong thing (use the method with an even number of samples), scipy.integrate.simpson actually handles it well by default using the equations in Cartwright's paper for the last interval. On second thought, considering that it is pretty much the same as trapezoid but better, what are your thoughts on using it as the default instead?

Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simpson.html

* Create position trajectory corresponding to unit velocity
trajectory
* Write tests against MSD-based self-diffusivity calculation from
the MDAnalysis MSD module (Einstein method)
* Track whether analysis is run with `self._run_called`
* Raise exceptions in self-diffusivity and plotting functions if
analysis not run
* Add tests for new exception raises
* Parameterization occurs over class `TestAllDims` to test
calculations over all possible dimensions
Comment on lines +62 to +63
def step_vtraj_pos(NSTEP):
x = np.arange(NSTEP).astype(np.float64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the name here a bit more descriptive? Maybe unit_velocity_traj

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this matter enough to be worth the effort? It's used in a lot of places, so the change would be a little messy.

`numpy.float64`
The calculated self-diffusivity value for the analysis.
"""
stop = self.n_frames if stop == 0 else stop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or self._has_run?

Something like, "VelocityAutocorrelation.run() must be called before 'sd_gk'"

@@ -262,3 +266,78 @@ def plot_vacf(self, start=0, stop=0, step=1):
self.times[start:stop:step],
self.results.timeseries[start:stop:step],
)

def sd_gk(self, start=0, stop=0, step=1):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nitpick, but generally I'm not in favor of suber abbeviated naming. sd_gk means something to you now, as a developer, but it probably won't mean much to a new user. Maybe self_diffusivity_gk or self_diffusivity_green_kubo or diffusivity_green_kubo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll go with self_diffusivity_gk then.

* Update units in plotting functions
* Make VACF plotting function and tests unitless
* Make `plot_running_integral()` docstring more clear
* Units are angstroms squared per picoseconds squared
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @xhgchen! Addressed all my comments. I would say merge and lets kick some more goals. 💯

@xhgchen xhgchen merged commit 1211874 into MDAnalysis:main Aug 2, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create user-friendly self-diffusivity calculation workflow from VACF
3 participants