Added `derive_surface` and `get_positions` with tests. #40

ojeda-e · 2021-06-29T03:19:13Z

Function core_fast_leaflet split into three functions:

get positions for each atom in the atom group for each frame.
identify the grid cell for each coordinate.
calculate the average z for the atom group.

Tests added:

test_get_positions using dummy coordinates for beads 0 to 8, all of them with z=10:

o ______ o _____ o _______ |
|   (6)  |  (7)   |   (8)  |
o ______ o _____ o _______ |
|   (3)  |  (4)   |   (5)  |
o _______o ______ o ______ |
|   (0)  |  (1)   |   (2)  |
o ______ o ______ o ______ |
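
For readers who want to reproduce the layout above, here is a minimal sketch of how such dummy coordinates could be built and mapped to grid cells. The box width of 30, the bead placement at cell centres, and the names `dummy_coordinates` and `bead_to_cell` are assumptions made for illustration; the actual test fixture may differ.

```python
import numpy as np

# Hypothetical layout: a 3 x 3 grid of unit cells, each 10 units wide,
# with one bead at the centre of every cell and z = 10 throughout.
n_cells, max_width = 3, 30.0
cell_width = max_width / n_cells
centres = (np.arange(n_cells) + 0.5) * cell_width  # 5, 15, 25

# Bead index runs along x first, so bead 0 is bottom-left and bead 8 top-right.
xx, yy = np.meshgrid(centres, centres)
dummy_coordinates = np.column_stack(
    [xx.ravel(), yy.ravel(), np.full(n_cells * n_cells, 10.0)]
)

# Map every bead to its (x, y) cell index.
factor = n_cells / max_width
cell_x = np.floor(dummy_coordinates[:, 0] * factor).astype(int)
cell_y = np.floor(dummy_coordinates[:, 1] * factor).astype(int)
bead_to_cell = list(zip(cell_x.tolist(), cell_y.tolist()))
```

With this layout, bead 1 falls in cell (1, 0) and bead 3 in cell (0, 1), matching the numbering in the diagram.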

test_avg_unit_cell added for two systems, both with the same number of beads in every grid cell:

all z values set to z=10
z values as below:

o ______ o _____ o _______ |
| (z=10) | (z=20) | (z=30) |
o ______ o _____ o _______ |
| (z=10) | (z=20) | (z=30) |
o _______o ______ o ______ |
| (z=10) | (z=20) | (z=30) |
o ______ o ______ o ______ |

z values of z=10 and number of beads per unit cell as shown below:

o ____ o ____ o ___ |
|   2  |  1   |  1  |
o ____ o ____ o ___ |
|   1  |  2   |  1  |
o ____ o ____ o ___ |
|   1  |  1   |  2  |
o ____ o ____ o ___ |
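
The point of the last grid is that the average z in a cell should not depend on how many beads land in it. A minimal sketch of that normalization, assuming the summed z values and the bead counts are kept in two separate arrays (the names `z_sum` and `z_mean` are illustrative, not taken from the PR):

```python
import numpy as np

# Bead counts per unit cell, written with the top row of the diagram first.
beads_per_cell = np.array([[2, 1, 1],
                           [1, 2, 1],
                           [1, 1, 2]])

# Every bead has z = 10, so the summed z in a cell is 10 * (number of beads).
z_sum = 10.0 * beads_per_cell

# Dividing the sum by the count recovers the mean; cells with no beads
# (none here) are left as NaN instead of triggering a division by zero.
z_mean = np.full(z_sum.shape, np.nan)
np.divide(z_sum, beads_per_cell, out=z_mean, where=beads_per_cell > 0)
```

Every entry of `z_mean` comes out as 10, whether the cell holds one bead or two.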

test_derive_surface added for the same dummy_coordinates as in test_get_positions.

pep8speaks · 2021-06-29T03:19:17Z

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file membrane_curvature/lib/mods.py:

Line 40:32: W291 trailing whitespace
Line 43:53: W291 trailing whitespace
Line 45:53: W291 trailing whitespace
Line 49:78: W291 trailing whitespace
Line 53:29: W291 trailing whitespace
Line 70:9: E741 ambiguous variable name 'l'
Line 92:29: W291 trailing whitespace

Comment last updated at 2021-07-05 03:21:02 UTC

codecov · 2021-06-29T03:21:25Z

Codecov Report

Merging #40 (4567ed1) into main (97713b3) will decrease coverage by 1.33%.
The diff coverage is 88.00%.

p-j-smith · 2021-06-29T18:16:23Z

membrane_curvature/lib/mods.py

@@ -24,66 +38,58 @@ def grid_map(coords, factor):
    return index_grid_l, index_grid_m


-def core_fast_leaflet(universe, z_Ref, n_cells, selection, max_width):
+def derive_surface(n_cells, selection, max_width):


Hi @ojeda-e I'm looking forward to being able to use your membrane curvature tool! I'm not sure if this fits your purpose exactly, but you could use scipy.stats.binned_statistic_2d to generate a surface of all your beads in one go.

Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean') to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC

Hi @ojeda-e, I think I had something different in mind when I requested that you only use numpy etc and break it down to basics. Could you have a go at writing the function with the input arguments I suggested? I haven't had a look at tests because I assume they'll change with the new function.

@p-j-smith had a good suggestion with the scipy function. When I said "numpy only" that was probably too restrictive, sorry; totally on me. For future reference, you should consider any package safe if it's something that MDAnalysis considers a "core dependency". I actually find it preferable to use numpy/scipy functions because they're usually written by people much smarter than me, who've thought about the problem longer than me -- this usually results in faster, better code.

If you try something like the code below, you might make this a very short function!

x, y, z = coordinates.T
scipy.stats.binned_statistic_2d(x, y, z, statistic="mean", bins=..., range=...)
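
To make the scipy suggestion concrete, here is a small self-contained sketch. The input coordinates, the 3 x 3 binning, and the 0 to 30 range are made up for illustration; periodic boundary conditions are not handled, so positions would need to be wrapped or unwrapped beforehand.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Hypothetical input: nine beads, one per cell of a 3 x 3 grid spanning
# 0-30 in x and y; z depends only on the x column (10, 20, 30).
centres = np.array([5.0, 15.0, 25.0])
xx, yy = np.meshgrid(centres, centres)
x, y = xx.ravel(), yy.ravel()
z = 10.0 * (np.floor(x / 10.0) + 1)

result = binned_statistic_2d(
    x, y, z,
    statistic="mean",
    bins=[3, 3],
    range=[[0.0, 30.0], [0.0, 30.0]],
)
surface = result.statistic  # shape (3, 3); surface[i, j] is x-bin i, y-bin j
```

With `statistic="mean"`, bins that receive no beads are reported as NaN, which is worth keeping in mind when the surface is later used to compute curvature.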

Hi @lilyminium, the PR has a lot of different feedback at this point and I ended up following the suggestion from @richardjgowers which conflicts with this one. Could we defer this to an issue as it would require me to add a new dependency and can be done in parallel with other work?

def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):

I don't think the feedback about the function signature contradicts @richardjgowers. You don't have to use scipy now if you don't want to -- the main gist is to have the main function work solely on numpy arrays of positions. I think you'll find this helpful for refactoring in #41; if you have a look at existing AnalysisBase classes, the important functions very rarely work on MDAnalysis Universes. Instead, the positions are extracted in _single_frame and these numpy arrays are used for the actual computation.

You've already done most of the work, to be honest. The simplest, most immediate conversion would be:

def derive_surface(n_cells, selection, max_width):
    coordinates = selection.positions
    return get_z_surface(coordinates, n_x_bins=n_cells, n_y_bins=n_cells,
                         x_range=(0, max_width), y_range=(0, max_width))


def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):
    z_ref = np.zeros((n_x_bins, n_y_bins))
    grid_z_coordinates = np.zeros((n_x_bins, n_y_bins))
    grid_norm_unit = np.zeros((n_x_bins, n_y_bins))

    x_factor = n_x_bins / (x_range[1] - x_range[0])
    y_factor = n_y_bins / (y_range[1] - y_range[0])

    x_coords, y_coords, z_coords = coordinates.T

    cell_x_floor = np.floor(x_coords * x_factor).astype(int)
    cell_y_floor = np.floor(y_coords * y_factor).astype(int)

    for l, m, z in zip(cell_x_floor, cell_y_floor, z_coords):
        ...  # rest of current derive_surface
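
For comparison, one possible numpy-only way to finish such a function without the Python loop is sketched below. This is not the code that was merged. It assumes beads outside the grid are skipped, it shifts coordinates by the lower bound of each range (the suggestion above does not), and it returns NaN for empty cells.

```python
import numpy as np


def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10,
                  x_range=(0, 100), y_range=(0, 100)):
    """Mean z per grid cell; cells without beads are NaN (sketch only)."""
    x_factor = n_x_bins / (x_range[1] - x_range[0])
    y_factor = n_y_bins / (y_range[1] - y_range[0])

    x_coords, y_coords, z_coords = np.asarray(coordinates, dtype=float).T
    # Shift by the lower bound so that ranges not starting at 0 also work.
    cell_x = np.floor((x_coords - x_range[0]) * x_factor).astype(int)
    cell_y = np.floor((y_coords - y_range[0]) * y_factor).astype(int)

    # Assumption: beads outside the grid are skipped rather than wrapped.
    inside = ((cell_x >= 0) & (cell_x < n_x_bins)
              & (cell_y >= 0) & (cell_y < n_y_bins))
    cell_x, cell_y, z_coords = cell_x[inside], cell_y[inside], z_coords[inside]

    z_sum = np.zeros((n_x_bins, n_y_bins))
    counts = np.zeros((n_x_bins, n_y_bins))
    # np.add.at accumulates correctly when several beads share a cell.
    np.add.at(z_sum, (cell_x, cell_y), z_coords)
    np.add.at(counts, (cell_x, cell_y), 1)

    surface = np.full((n_x_bins, n_y_bins), np.nan)
    np.divide(z_sum, counts, out=surface, where=counts > 0)
    return surface


# Two beads share cell (0, 0); a third sits alone in cell (2, 1).
coords = np.array([[5.0, 5.0, 10.0], [6.0, 4.0, 30.0], [25.0, 15.0, 20.0]])
surface = get_z_surface(coords, n_x_bins=3, n_y_bins=3,
                        x_range=(0, 30), y_range=(0, 30))
```

Plain fancy-index assignment such as `z_sum[cell_x, cell_y] += z_coords` would count only one bead per repeated cell, which is why `np.add.at` is used here.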

I also wouldn't be afraid of adding a new dependency (especially from the MDAnalysis core dependencies). It's mostly a matter of adding the package name to:

https://github.com/MDAnalysis/membrane-curvature/blob/main/devtools/conda-envs/test_env.yaml

https://github.com/MDAnalysis/membrane-curvature/blob/main/docs/requirements.yaml

Thanks, I just thought this was going to be a separate issue. In the function you suggest the number of x and y bins is different, which I thought was going to be addressed later in #35 (together with PBC issue #36).

Would you please confirm that it is better to add this change here? @lilyminium
Another option is to use your suggested function but without independent x and y (i.e. square arrays) and then complete the work when I address #35 later on?

IMO since you know that you want to address #35 in the future, there's no harm in designing your code towards that direction now. If you think it's easier to write and test a function that only takes get_z_surface(coordinates, n_bins=10, range=(0, 100)) and then modify all the tests you write here for the get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)) signature later, go for it -- but to me it sounds like extra work.

If you have no intention of testing non-square arrays in this PR, you could just pass the same number into n_x_bins and n_y_bins as shown in the derive_surface example above. Then when you want to add rectangular functionality, you don't need to modify the existing tests from this PR.

Thanks for the quick reply. I'll add it here then.

lilyminium · 2021-06-30T00:39:48Z

Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean') to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC

membrane_curvature/lib/mods.py

orbeckst

Small comments, mainly towards using numpy array operations. Sorry, did not manage to review everything in the time I had.

membrane_curvature/lib/mods.py

ojeda-e · 2021-07-03T19:12:11Z

After reviews, functions get_positions and grid_map were replaced by single lines in the function derive_surface. Two tests were added for this function: one using the previously used small gro file, and one passing coordinates as np.arrays in an mda.Universe.

orbeckst · 2021-07-03T22:22:08Z

If you leave the voodoo cast here, raise an issue to investigate further. From the amount of comments you got here you can see that nobody likes code that we don’t understand.

…

On 7/3/21 at 12:45, Estefania Barreto-Ojeda wrote:

> @ojeda-e commented on this pull request. In membrane_curvature/lib/mods.py:
>
> - grid_count_frames = np.zeros([n_cells, n_cells])
> + factor = np.float32(n_cells / max_width)
>
> Thanks for highlighting the docstrings, max_width is not int type, it's float. I'll update them. I don't see anything wrong with keeping it anyway. I'll leave it and will re-evaluate in a performance test.

ojeda-e · 2021-07-03T22:44:10Z

Thanks @orbeckst. Issue opened (#42). The line now computes factor = n_cells / max_width only, without the np.float32 cast.
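
As a side note on what the cast does numerically, here is a tiny sketch. The values of `n_cells` and `max_width` are arbitrary and chosen only for illustration; this says nothing about performance, which is what #42 is meant to investigate.

```python
import numpy as np

n_cells, max_width = 8, 30.0  # hypothetical values, chosen for illustration

factor = n_cells / max_width    # plain Python float (64-bit)
factor_32 = np.float32(factor)  # the cast that was removed

# The cast rounds the factor to roughly 7 significant digits.
difference = abs(float(factor_32) - factor)
```

The difference is tiny but non-zero, so a coordinate sitting almost exactly on a cell boundary could in principle be assigned to a different cell depending on whether the cast is present.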

ojeda-e · 2021-07-04T04:58:39Z

In this PR:

Function derive_surface slightly modified from the suggestion by @lilyminium to allow a different max_width in the x and y dimensions, i.e. derive_surface(n_cells, selection, max_width_x, max_width_y).
Smoke tests added for get_z_surface on rectangular and square grids, using pytest.mark.parametrize.
Previous tests modified to match the changes introduced by get_z_surface.

Different binning for x and y will be addressed in upcoming issue #35.

lilyminium

Thanks for adding get_z_surface @ojeda-e. This more modular code is much more understandable to me! I have some comments on the tests -- mostly, pytest.mark.parametrize is not necessary if you only have one test case, and assert_almost_equal means you don't need for loops. I also have some tips for using numpy.
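
To illustrate the point about loops, here is a minimal sketch of comparing two whole arrays in a single call. The arrays are stand-ins invented for this example, not values from the test suite.

```python
import numpy as np
from numpy.testing import assert_almost_equal

# Hypothetical expected and computed surfaces for a 3 x 3 grid.
expected = np.full((3, 3), 10.0)
computed = expected + 1e-9  # stand-in for the output of get_z_surface

# One call compares every element; no loop over cells is needed.
assert_almost_equal(computed, expected)
```

If any element differs by more than the default tolerance, the call raises an AssertionError that reports the mismatching values, which is usually more informative than a failure inside a hand-written loop.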

membrane_curvature/lib/mods.py

membrane_curvature/tests/test_mdakit_membcurv.py

lilyminium

Looks good to me!

ojeda-e requested a review from lilyminium June 29, 2021 03:23

ojeda-e linked an issue Jun 29, 2021 that may be closed by this pull request

Break core_fast_leaflet into functions. #39

Closed

3 tasks

p-j-smith reviewed Jun 29, 2021

lilyminium requested changes Jun 30, 2021

ojeda-e added 4 commits June 29, 2021 19:58

Added derive_surface and get_positions with tests.

c3eb126

Names changed and derive_surface test fixed.

b9d2bec

Typos fixed

ae67ca1

PEP8 fixed

dd2c690

richardjgowers reviewed Jun 30, 2021

orbeckst reviewed Jul 2, 2021

Added changes in refactoring with tests

ae7dacd

ojeda-e force-pushed the issue39 branch from dc7a145 to ae7dacd Compare July 3, 2021 18:41

Scaling factor fixed.

b0e7934

Updated derive_surface docstrings

1dd3e23

ojeda-e mentioned this pull request Jul 3, 2021

Investigate effect of np.float32 cast in code performance. #42

Closed

factor modified, np.float32 removed

c53d5dd

function get_z_surface added with tests.

90c1772

lilyminium requested changes Jul 5, 2021

ojeda-e added 3 commits July 4, 2021 20:14

Changes in tests and docstrings.

667144d

avg_unit_cell changed to normalized_grid. Tests updated.

5e28599

Changes in docstrings and tests using get_z_surface added

4567ed1

lilyminium approved these changes Jul 5, 2021

ojeda-e merged commit 687d869 into main Jul 6, 2021

ojeda-e deleted the issue39 branch August 29, 2021 20:12
