Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added derive_surface and get_positions with tests. #40

Merged
merged 12 commits into from Jul 6, 2021
Merged

Added derive_surface and get_positions with tests. #40

merged 12 commits into from Jul 6, 2021

Conversation

ojeda-e
Copy link
Member

@ojeda-e ojeda-e commented Jun 29, 2021

Function core_fast_leaflets split into three functions:

  • get positions for each atom in the atom group for each frame.
  • identify the grid cell for each coordinate.
  • calculate the average z for the atom group.

Tests added:

  • test_get_positions using dummy coordinates for beads 0 to 8, all of them with z=10:
o ______ o _____ o _______ |
|   (6)  |  (7)   |   (8)  |
o ______ o _____ o _______ |
|   (3)  |  (4)   |   (5)  |
o _______o ______ o ______ |
|   (0)  |  (1)   |   (2)  |
o ______ o ______ o ______ |

Using the same number of beads in grids,

  • test_avg_unit_cell added for two systems.
  1. z values of z=10
  2. z values as below:
o ______ o _____ o _______ |
| (z=10) | (z=20) | (z=30) |
o ______ o _____ o _______ |
| (z=10) | (z=20) | (z=30) |
o _______o ______ o ______ |
| (z=10) | (z=20) | (z=30) |
o ______ o ______ o ______ |
  1. z values of z=10 and number of beads per unit cell as shown below:
o ____ o ____ o ___ |
|   2  |  1   |  1  |
o ____ o ____ o ___ |
|   1  |  2   |  1  |
o ____ o ____ o ___ |
|   1  |  1   |  2  |
o ____ o ____ o ___ |
  • test_derive_surface added for same dummy_coordinates as in test_get_positions.

@pep8speaks
Copy link

pep8speaks commented Jun 29, 2021

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 40:32: W291 trailing whitespace
Line 43:53: W291 trailing whitespace
Line 45:53: W291 trailing whitespace
Line 49:78: W291 trailing whitespace
Line 53:29: W291 trailing whitespace
Line 70:9: E741 ambiguous variable name 'l'
Line 92:29: W291 trailing whitespace

Comment last updated at 2021-07-05 03:21:02 UTC

@codecov
Copy link

codecov bot commented Jun 29, 2021

Codecov Report

Merging #40 (4567ed1) into main (97713b3) will decrease coverage by 1.33%.
The diff coverage is 88.00%.

@ojeda-e ojeda-e requested a review from lilyminium June 29, 2021 03:23
@ojeda-e ojeda-e linked an issue Jun 29, 2021 that may be closed by this pull request
3 tasks
@@ -24,66 +38,58 @@ def grid_map(coords, factor):
return index_grid_l, index_grid_m


def core_fast_leaflet(universe, z_Ref, n_cells, selection, max_width):
def derive_surface(n_cells, selection, max_width):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ojeda-e I'm looking forward to being able to use your membrane curvature tool! I'm not sure if this fits your purpose exactly, but you could use scipy.stats.binned_statistic_2d to generate a surface of all your beads in one go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean') to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ojeda-e, I think I had something different in mind when I requested that you only use numpy etc and break it down to basics. Could you have a go at writing the function with the input arguments I suggested? I haven't had a look at tests because I assume they'll change with the new function.

@p-j-smith had a good suggestion with the scipy function. When I said "numpy only" that was probably too restrictive, sorry; totally on me. For future reference, you should consider any package safe if it's something that MDAnalysis considers a "core dependency". I actually find it preferable to use numpy/scipy functions because they're usually written by people much smarter than me, who've thought about the problem longer than me -- this usually results in faster, better code.

If you try something like the code below, you might make this a very short function!

x, y, z = coordinates.T
scipy.stats.binned_statistic_2d(x, y, z, statistic="mean", bins=..., range=...)

Hi @lilyminium, the PR has a lot of different feedback at this point and I ended up following the suggestion from @richardjgowers which conflicts with this one. Could we defer this to an issue as it would require me to add a new dependency and can be done in parallel with other work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):

I don't think the feedback about the function signature contradicts @richardjgowers. You don't have to use scipy now if you don't want to -- the main gist is to have the main function work solely on numpy arrays of positions. I think you'll find this helpful for refactoring in #41; if you have a look at existing AnalysisBase classes, the important functions very rarely work on MDAnalysis Universes. Instead, the positions are extracted in _single_frame and these numpy arrays are used for the actual computation.

You've already done most of the work, to be honest. The simplest, most immediate conversion would be:

def derive_surface(n_cells, selection, max_width):
	coordinates = selection.positions
	return get_z_surface(coordinates, n_x_bins=n_cells, n_y_bins=n_cells,
					     x_range=(0, max_width), y_range=(0, max_width))

def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):
	z_ref = np.zeros((n_x_bins, n_y_bins))
	grid_z_coordinates = np.zeros((n_x_bins, n_y_bins))
    grid_norm_unit = np.zeros((n_x_bins, n_y_bins))

    x_factor = n_x_bins / (x_range[1] - x_range[0])
    y_factor = n_y_bins / (y_range[1] - y_range[0])
    x_coords, y_coords, z_coords = coordinates.T
    cell_x_floor = np.floor(x_coords * x_factor).astype(int)
    cell_y_floor = np.floor(y_coords * y_factor).astype(int)
    for l, m, z in zip(cell_x_floor, cell_y_floor, z_coords):
    	... # rest of current derive_surface

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wouldn't be afraid of adding a new dependency (especially from the MDAnalysis core dependencies). It's mostly a matter of adding the package name to:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just thought this was going to be a separate issue. In the function you suggest the number of x and y bins is different, which I thought was going to be addressed later in #35 (together with PBC issue #36).

Would you please confirm that is better to add this change here? @lilyminium
Another option is to use your suggested function but without independent x and y (i.e. square arrays) and then complete the work when I address #35 later on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO since you know that you want to address #35 in the future, there's no harm in designing your code towards that direction now. If you think it's easier to write and test a function that only takes get_z_surface(coordinates, n_bins=10, range=(0, 100)) and then modify all the tests you write here for the get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)) signature later, go for it -- but to me it sounds like extra work.

If you have no intention of testing non-square arrays in this PR, you could just pass the same number into n_x_bins and n_y_bins as shown in the derive_surface example above. Then when you want to add rectangular functionality, you don't need to modify the existing tests from this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick reply. I'll add it here then.

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ojeda-e, I think I had something different in mind when I requested that you only use numpy etc and break it down to basics. Could you have a go at writing the function with the input arguments I suggested? I haven't had a look at tests because I assume they'll change with the new function.

@p-j-smith had a good suggestion with the scipy function. When I said "numpy only" that was probably too restrictive, sorry; totally on me. For future reference, you should consider any package safe if it's something that MDAnalysis considers a "core dependency". I actually find it preferable to use numpy/scipy functions because they're usually written by people much smarter than me, who've thought about the problem longer than me -- this usually results in faster, better code.

If you try something like the code below, you might make this a very short function!

x, y, z = coordinates.T
scipy.stats.binned_statistic_2d(x, y, z, statistic="mean", bins=..., range=...)

@@ -24,66 +38,58 @@ def grid_map(coords, factor):
return index_grid_l, index_grid_m


def core_fast_leaflet(universe, z_Ref, n_cells, selection, max_width):
def derive_surface(n_cells, selection, max_width):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean') to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC

membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments, mainly towards using numpy array operations. Sorry, did not manage to review everything in the time I had.

membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 3, 2021

After reviews, functions get_positions and grid_map were replaced by single lines in the function derive_surface. Two tests were added for this function. One using previously used small gro file, and passing coordinates as np.arrays in mda.Universe.

@orbeckst
Copy link
Member

orbeckst commented Jul 3, 2021 via email

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 3, 2021

Thanks @orbeckst . Issue opened (#42), the line was changed to calculate with factor = n_cells / max_width only.

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 4, 2021

In this PR:

  • Function derive_surface slightly modified from suggestion by @lilyminium to consider different max_width in x and y dimensions. i.e. derive_surface(n_cells, selection, max_width_x, max_width_y).
  • Smoke test added for rectangular and square grid using pytest.mark.parametrize for get_z_surface.
  • Previous tests modified accordingly to changes from get_z_surface.

Different binning for x and y will be addressed in upcoming issue #35

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding get_z_surface @ojeda-e. This more modular code is much more understandable to me! I have some comments on test -- mostly, pytest.mark.parametrize is not necessary if you only have one test case, and assert_almost_equal means you don't need for loops. I also have some tips for using numpy.

membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/lib/mods.py Outdated Show resolved Hide resolved
membrane_curvature/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
membrane_curvature/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
membrane_curvature/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
membrane_curvature/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
membrane_curvature/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@ojeda-e ojeda-e merged commit 687d869 into main Jul 6, 2021
@ojeda-e ojeda-e deleted the issue39 branch August 29, 2021 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Break core_fast_leaflet into functions.
7 participants