Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnalysisBase #43

Merged
merged 20 commits into from
Jul 14, 2021
Merged

AnalysisBase #43

merged 20 commits into from
Jul 14, 2021

Conversation

ojeda-e
Copy link
Member

@ojeda-e ojeda-e commented Jul 6, 2021

This PR is an initial suggestion to fix #41

  • Methods included in the initial analysis base:

  • __init__

  • _prepare

  • _single_frame

  • _conclude

  • Some additional comments

This first AnalysisBase was build under the following assumptions:

  1. The atoms of reference in AtomGroup remain the same during the n_frames of the trajectory.
  2. lipid translocation does not occur during the trajectory. (Although I am not considering bilayers, this reinforces 1)
  3. By introducing a grid of bigger size than the simulation box, the calculation of H and K may be jeopardized by the introduction of np.nans. This selection will certainly generate np.nans in the calculation of the gradient, which ultimately will affect the averaged grids. Hence, the attempt of warning the user (in __init__).
  4. The overall result is the averaged np.array of surface, K, and H over frames. (in _conclude)
  5. The calculation of surface, H, and K, is performed in every frame (in _single_frame).
  6. The respective values of surface, H and K and their normalized grids are initialized (in _prepare). They are a tuple of np.arrays where the first element contains the respective values, and the second element counts the elements when defined.
  7. To sum and normalize values, the _output function was introduced. (But I am not sure this is something people would agree with)

As discussed in #21 with @lilyminium, this analysis base will potentially replace core.py.

@codecov
Copy link

codecov bot commented Jul 6, 2021

Codecov Report

Merging #43 (9451b0c) into main (3e87eec) will increase coverage by 32.40%.
The diff coverage is 100.00%.

@ojeda-e ojeda-e mentioned this pull request Jul 6, 2021
4 tasks
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic start @ojeda-e! Your intuition is spot on for what should be done in the init/prepare/single_frame/conclude methods. I have some notes on the structure, and I wonder if we can avoid using counter arrays entirely by using np.nanmean.

membrane_curvature/base.py Outdated Show resolved Hide resolved

def _single_frame(self):

surface_ = derive_surface(n_x_bins=self.n_cells_x, n_y_bins=self.n_cells_y,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you give derive_surface the coordinates? I don't see self.n_cells_x, self.n_cells_y, self.max_width_x, self.max_width_y defined either?

def _conclude(self):

# Calculate average of np.array over trajectory normalized over counter
self.results.average_surface = self.results.surface[0] / self.results.surface[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use np.nanmean to average over your surfaces without needing to have these counter arrays?

For example, you could set up a z_surface array to collect all the surfaces you get from derive_surface without modification:

def _prepare(self):
    self.results.z_surface = np.full((self.n_frames, self.x_bins, self.y_bins), np.nan)

def _single_frame(self):
    self.results.z_surface[self._frame_index] = derive_surface(...)

and then finally average over the frames in _conclude:

def _conclude(self):
   self.results.average_z_surface = np.nanmean(self.results.z_surface, axis=0)

(and the same for the mean and gaussian surfaces).

self.pbc = pbc
self.y_bins = y_bins
self.x_bins = x_bins
self.x_range = x_range if x_range else (0, universe.dimensions[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like these defaults!

membrane_curvature/base.py Outdated Show resolved Hide resolved
import numpy as np


def __init__(self, universe, select,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need universe to get passed? Is select an AtomGroup or a string -- if it's an AtomGroup, you could just use universe = select.universe.

FYI most AnalysisBase classes use select as a string, and many accept either an AtomGroup or Universe for the first argument. To safeguard against the multiple possibilities, you could do this:

def __init__(self, universe, select="all", ..., **kwargs):
  super().__init__(universe.universe.trajectory, **kwargs)
  self.selection = universe.select_atoms(select)

membrane_curvature/base.py Outdated Show resolved Hide resolved
membrane_curvature/base.py Outdated Show resolved Hide resolved
@ojeda-e ojeda-e marked this pull request as draft July 6, 2021 04:58
self.results.gaussian = (cumulative, count)


def _output(self, value_per_frame, cumulative_output, counter):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to ask you something about this approach, @lilyminium . To be honest, I didn't see any other sensible option to accumulate the values per frame and count. Would you please share with me some alternatives? What do you think would be a good idea?
(Not that I am looking for a change ☺️ , more to consider alternatives I can't see by myself)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to accumulate the count per frame as an end goal? My first thought would be to collect all the surfaces in a 3D array: (n_frames, n_x_cells, n_y_cells) and then average over them with np.nanmean for the average surface.

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 7, 2021

Hi @lilyminium
I updated the AnalysisBase with the suggested changes and other minor improvements and corrections.
I also changed the position of the argument selection in the derive_surface function just for consistency.

@lilyminium
Copy link
Member

This design is looking really, really good @ojeda-e. I don't have much to say except that some lines are looking quite long so you may want to check that you're following PEP8. One trick could be to make get_z_surface a method in the class, since it uses attributes of self so often -- but that's minor.

I guess the proof is in the pudding -- would you like to start writing tests to see how it works? :-)

@lilyminium
Copy link
Member

Also great idea to switch the order of the arguments to derive_surface, the new order feels much more intuitive!

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 8, 2021

I guess the proof is in the pudding -- would you like to start writing tests to see how it works? :-)

Yes! :D I will open an issue for the respective tests and will work on them after solving pbc conditions in surface.py.

If you are ok with it, I will skip making get_z_surface as a method. At least temporarily. :)
I will push the final version briefly.

@ojeda-e ojeda-e marked this pull request as ready for review July 8, 2021 01:26
@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 8, 2021

@lilyminium re-reading your comment makes me wonder if you would prefer the tests in this PR or in a separate issue. Apologies I wrongly assumed it was a separate PR. What do you think works best?

@lilyminium
Copy link
Member

lilyminium commented Jul 8, 2021 via email

@ojeda-e ojeda-e marked this pull request as draft July 9, 2021 04:10
- sphinx

- MDAnalysis==2.0.0-dev0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to install it explicitly via pip:

Suggested change
- MDAnalysis==2.0.0-dev0
- pip:
- MDAnalysis==2.0.0-dev0

Copy link
Member

@IAlibay IAlibay Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ojeda-e
Because it's relying on the Results class, could you raise an issue as a reminder that MDA should get pinned to >=2.0.0 in setup.py? (I wouldn't do it now, but it'd be a good thing to do before the first release)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also +1 to @RMeli's comment. The beta needs to be installed via pip (swapping -dev0 with b0 should do the trick).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, @IAlibay, this PR demonstrates to be that the beta was worth the effort because it makes it a lot easier for packages to test against what 2.0 will look like.

@lilyminium
Copy link
Member

lilyminium commented Jul 9, 2021 via email

@RMeli
Copy link
Member

RMeli commented Jul 9, 2021

@lilyminium I had this feeling and I was looking for the announcement mentioning the beta package name... Do you know where I can find it (for future reference)? It is not with the usual blogs. Was it just on Twitter?

@IAlibay
Copy link
Member

IAlibay commented Jul 9, 2021

@lilyminium I had this feeling and I was looking for the announcement mentioning the beta package name... Do you know where I can find it (for future reference)? It is not with the usual blogs. Was it just on Twitter?

We never got around to it, we were going to announce it with the MDAnalysis workshop materials but unfortunately that got stalled.

edit: also yes, can confirm it's MDAnalysis==2.0.0b0

Long term we should have a CI runner that directly installs the dev version though.

@lilyminium
Copy link
Member

Sorry I didn't mean to remove those review requests... hazards of githubbing on phone 🤦‍♀️

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 10, 2021

Thanks, @lilyminium, @IAlibay, and @RMeli for your comments.

I just wanted to add an initial version of the tests with previously used dummy_arrays and a Universe with .gro and .xtc files. They passed, thanks again!

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments, overall looks good mainly needs documentation.

if (self.selection.n_residues) == 0:
raise ValueError("Invalid selection. Empty AtomGroup.")

# Raise if range doesn't cover entire dimensions of simulation box
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed some of the discussion on this.
Assuming the grid needs to cover the entire box, how do you deal with NPT-led box fluctuations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's something I didn't pay too much attention to (and I didn't clarify, my bad). One of the assumptions I had is that calculation of membrane curvature is performed in equilibrated systems. If the box shrinks or expands too much (one unit cell in the grid), the calculation wouldn't be accurate.

The reason why I introduced the restrictions (which I changed now to a warning message) is that (1) if the surface is derived from a portion of the membrane, the calculation of curvature in the edges will be inaccurate because the calculation of a gradient runs over adjacent values. On the other hand, (2) in the scenario in which the grid is too big, unnecessary np.nans will be introduced. It means that, again, the undefined adjacent values will have a negative impact on the calculation of curvature.

I don't see a straightforward solution for NPT systems unless the calculation returns a different grid per frame and then results must be normalized over grids of the same size, and I don't have a clear idea, at least right now, of how to keep track of such variation over frames. Do you have any suggestions here?

membrane_curvature/base.py Show resolved Hide resolved
pbc=True, **kwargs):

super().__init__(universe.universe.trajectory, **kwargs)
self.selection = universe.select_atoms(select)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make everything clearer to future developers I would rename this from self.selection to either self.ag or self.atomgroup. That way folks can look at the name and immediately know what type of object it is (also better matches other MDA analysis methods).

self.y_range = y_range if y_range else (0, universe.dimensions[1])

# Raise if selection doesn't exist
if (self.selection.n_residues) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing wrong with this technically, but it might be good to simplify it to just len(atomgroup) == 0 purely on the the basis that it's more descriptive of your intent / more widely used way of doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By renaming selection to ag, your suggestion reads better, so I updated it. Thanks.

@pep8speaks
Copy link

pep8speaks commented Jul 11, 2021

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 21:15: W291 trailing whitespace
Line 35:47: W291 trailing whitespace
Line 60:40: W291 trailing whitespace
Line 61:54: W291 trailing whitespace
Line 63:45: W291 trailing whitespace
Line 66:48: W291 trailing whitespace
Line 99:86: W291 trailing whitespace

Line 301:5: E303 too many blank lines (2)
Line 309:53: W292 no newline at end of file

Comment last updated at 2021-07-14 01:04:48 UTC

membrane_curvature/base.py Outdated Show resolved Hide resolved
@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 11, 2021

As suggested by @IAlibay, docstrings were added and suggested changes that improve the readability of the code. :)

@lilyminium
Copy link
Member

lilyminium commented Jul 12, 2021 via email

@ojeda-e ojeda-e marked this pull request as ready for review July 13, 2021 14:16
@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 13, 2021

This PR includes the AnalysisBase with respective docstrings and tests.
To limit the extent of this PR, the limitations pointed out by @IAlibay of possible NPT boxes will be in issue #47. PBC conditions will be addressed in issue #36

I would appreciate a final review here @IAlibay @orbeckst @fiona-naughton

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your AnalysisBase version looks very good — clean structure, good use of functions, well done!

I have a bunch of comments on documentation and testing. In addition to the inline comments I noted that your tests always set explicitly selection="all". You should also test that the implicit default works (just in case this gets changed) and you should also check that other selections work.

Again: well done!

membrane_curvature/base.py Outdated Show resolved Hide resolved
membrane_curvature/base.py Outdated Show resolved Hide resolved
membrane_curvature/base.py Show resolved Hide resolved
Number of bins in grid in the `x` dimension.
n_y_bins : int, optional. Default: '100'
Number of bins in grid in the `y` dimension.
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`)
x_range : tuple of (float, float), optional. Default: (0, `universe.dimensions[0]`)

Number of bins in grid in the `y` dimension.
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`)
Range of coordinates (min, max) in the `x` dimension.
y_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[1]`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
y_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[1]`)
y_range : tuple of (float, float), optional. Default: (0, `universe.dimensions[1]`)

membrane_curvature/base.py Outdated Show resolved Hide resolved
membrane_curvature/base.py Outdated Show resolved Hide resolved
membrane_curvature/base.py Outdated Show resolved Hide resolved
@@ -1,7 +1,7 @@
import numpy as np


def derive_surface(n_cells_x, n_cells_y, selection, max_width_x, max_width_y):
def derive_surface(selection, n_cells_x, n_cells_y, max_width_x, max_width_y):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • update the doc string
  • renaming selection -> atoms would be clearer

Comment on lines 233 to 234
MembraneCurvature(universe, select='name PO4', x_range=(0, 10))
MembraneCurvature(universe, select='name PO4', y_range=(0, 10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to test these two lines separately, otherwise you can get a passing test even if one of them does not raise the warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest.warns can also match part of the warnings string, always a good idea to test that the warning looks like what you expect it.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking fantastic. There's one required fix with the logger but I am sure you'll come up with a sensible solution there so I'll just approve.

Notes
-----
The derived surface and calculated curvatures are available in the
``.results``::atttributes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark up the results attribute:

Suggested change
``.results``::atttributes.
:attr:`results` attributes.

The derived surface and calculated curvatures are available in the
``.results``::atttributes.

The attribute :attr:`MembraneCurvature.z_surface`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this inside results? So

Suggested change
The attribute :attr:`MembraneCurvature.z_surface`
The attribute :attr:`MembraneCurvature.results.z_surface`

If you only want to display z_surface use the tilde

    The attribute :attr:`~MembraneCurvature.results.z_surface`

The attribute :attr:`MembraneCurvature.z_surface`
contains the derived surface averaged over the `n_frames` of the trajectory.

The attributes :attr:`MembraneCurvature.mean_curvature` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results?

contains the derived surface averaged over the `n_frames` of the trajectory.

The attributes :attr:`MembraneCurvature.mean_curvature` and
:attr:`MembraneCurvature.gaussian_curvature` contain the computed values of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results?

Average of the array elements in `gaussian_curvature`
Example
-----------
You can pass a universe containing your selection of reference:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs double colon to format as code

Suggested change
You can pass a universe containing your selection of reference:
You can pass a universe containing your selection of reference::

membrane_curvature/base.py Show resolved Hide resolved

import logging

logger = logging.getLogger("MDAnalysis.analysis.density")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name your logger differently ... something like

Suggested change
logger = logging.getLogger("MDAnalysis.analysis.density")
logger = logging.getLogger("MDAnalysis.MDAKit.membrane_curvature")

I started your logger's name with "MDAnalysis" so that output gets added to the default logger that can be started with

import MDAnalysis
MDAnalysis.start_logging()

and whose root logger is named "MDAnalysis". Hence any logger named "MDAnalysis.*" will also log under it.

I suggest "MDAnalysis.MDAKit" to define a namespace that all MDAKits could be using but we haven't really thought about this so this could definitely be subject to change. Maybe @IAlibay @lilyminium @richardjgowers @fiona-naughton have opinions.

In any case, for right now, pretty much anything is ok that's not "density" ;-).

@ojeda-e ojeda-e merged commit 19d8a24 into main Jul 14, 2021
@orbeckst
Copy link
Member

That was an important merged PR, well done!

@ojeda-e ojeda-e deleted the issue41 branch August 29, 2021 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AnalysisBase refactor
6 participants