New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added derive_surface
and get_positions
with tests.
#40
Conversation
Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-07-05 03:21:02 UTC |
membrane_curvature/lib/mods.py
Outdated
@@ -24,66 +38,58 @@ def grid_map(coords, factor): | |||
return index_grid_l, index_grid_m | |||
|
|||
|
|||
def core_fast_leaflet(universe, z_Ref, n_cells, selection, max_width): | |||
def derive_surface(n_cells, selection, max_width): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ojeda-e I'm looking forward to being able to use your membrane curvature tool! I'm not sure if this fits your purpose exactly, but you could use scipy.stats.binned_statistic_2d
to generate a surface of all your beads in one go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean')
to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ojeda-e, I think I had something different in mind when I requested that you only use numpy etc and break it down to basics. Could you have a go at writing the function with the input arguments I suggested? I haven't had a look at tests because I assume they'll change with the new function.
@p-j-smith had a good suggestion with the scipy function. When I said "numpy only" that was probably too restrictive, sorry; totally on me. For future reference, you should consider any package safe if it's something that MDAnalysis considers a "core dependency". I actually find it preferable to use numpy/scipy functions because they're usually written by people much smarter than me, who've thought about the problem longer than me -- this usually results in faster, better code.
If you try something like the code below, you might make this a very short function!
x, y, z = coordinates.T scipy.stats.binned_statistic_2d(x, y, z, statistic="mean", bins=..., range=...)
Hi @lilyminium, the PR has a lot of different feedback at this point and I ended up following the suggestion from @richardjgowers which conflicts with this one. Could we defer this to an issue as it would require me to add a new dependency and can be done in parallel with other work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):
I don't think the feedback about the function signature contradicts @richardjgowers. You don't have to use scipy now if you don't want to -- the main gist is to have the main function work solely on numpy arrays of positions. I think you'll find this helpful for refactoring in #41; if you have a look at existing AnalysisBase classes, the important functions very rarely work on MDAnalysis Universes. Instead, the positions are extracted in _single_frame
and these numpy arrays are used for the actual computation.
You've already done most of the work, to be honest. The simplest, most immediate conversion would be:
def derive_surface(n_cells, selection, max_width):
coordinates = selection.positions
return get_z_surface(coordinates, n_x_bins=n_cells, n_y_bins=n_cells,
x_range=(0, max_width), y_range=(0, max_width))
def get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100)):
z_ref = np.zeros((n_x_bins, n_y_bins))
grid_z_coordinates = np.zeros((n_x_bins, n_y_bins))
grid_norm_unit = np.zeros((n_x_bins, n_y_bins))
x_factor = n_x_bins / (x_range[1] - x_range[0])
y_factor = n_y_bins / (y_range[1] - y_range[0])
x_coords, y_coords, z_coords = coordinates.T
cell_x_floor = np.floor(x_coords * x_factor).astype(int)
cell_y_floor = np.floor(y_coords * y_factor).astype(int)
for l, m, z in zip(cell_x_floor, cell_y_floor, z_coords):
... # rest of current derive_surface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wouldn't be afraid of adding a new dependency (especially from the MDAnalysis core dependencies). It's mostly a matter of adding the package name to:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I just thought this was going to be a separate issue. In the function you suggest the number of x and y bins is different, which I thought was going to be addressed later in #35 (together with PBC issue #36).
Would you please confirm that is better to add this change here? @lilyminium
Another option is to use your suggested function but without independent x
and y
(i.e. square arrays) and then complete the work when I address #35 later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO since you know that you want to address #35 in the future, there's no harm in designing your code towards that direction now. If you think it's easier to write and test a function that only takes get_z_surface(coordinates, n_bins=10, range=(0, 100))
and then modify all the tests you write here for the get_z_surface(coordinates, n_x_bins=10, n_y_bins=10, x_range=(0, 100), y_range=(0, 100))
signature later, go for it -- but to me it sounds like extra work.
If you have no intention of testing non-square arrays in this PR, you could just pass the same number into n_x_bins
and n_y_bins
as shown in the derive_surface
example above. Then when you want to add rectangular functionality, you don't need to modify the existing tests from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick reply. I'll add it here then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ojeda-e, I think I had something different in mind when I requested that you only use numpy etc and break it down to basics. Could you have a go at writing the function with the input arguments I suggested? I haven't had a look at tests because I assume they'll change with the new function.
@p-j-smith had a good suggestion with the scipy function. When I said "numpy only" that was probably too restrictive, sorry; totally on me. For future reference, you should consider any package safe if it's something that MDAnalysis considers a "core dependency". I actually find it preferable to use numpy/scipy functions because they're usually written by people much smarter than me, who've thought about the problem longer than me -- this usually results in faster, better code.
If you try something like the code below, you might make this a very short function!
x, y, z = coordinates.T
scipy.stats.binned_statistic_2d(x, y, z, statistic="mean", bins=..., range=...)
membrane_curvature/lib/mods.py
Outdated
@@ -24,66 +38,58 @@ def grid_map(coords, factor): | |||
return index_grid_l, index_grid_m | |||
|
|||
|
|||
def core_fast_leaflet(universe, z_Ref, n_cells, selection, max_width): | |||
def derive_surface(n_cells, selection, max_width): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion @p-j-smith. I love the simplicity of binned_statistic_2d(statistic='mean')
to get the mean z-values in one go! We might have to remember this for the refactor and remember to unwrap values for PBC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comments, mainly towards using numpy array operations. Sorry, did not manage to review everything in the time I had.
After reviews, functions |
If you leave the voodoo cast here, raise an issue to investigate further. From the amount of comments you got here you can see that nobody likes code that we don’t understand.
… Am 7/3/21 um 12:45 schrieb Estefania Barreto-Ojeda ***@***.***>:
@ojeda-e commented on this pull request.
In membrane_curvature/lib/mods.py:
>
- grid_count_frames = np.zeros([n_cells, n_cells])
+ factor = np.float32(n_cells / max_width)
Thanks for highlighting the docstrings, max_width is not int type, it's float. I'll update them.
I don't see anything wrong with keeping it anyway. I'll leave and will re-evaluate in a performance test.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
In this PR:
Different binning for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding get_z_surface
@ojeda-e. This more modular code is much more understandable to me! I have some comments on test -- mostly, pytest.mark.parametrize
is not necessary if you only have one test case, and assert_almost_equal
means you don't need for loops. I also have some tips for using numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Function
core_fast_leaflets
split into three functions:Tests added:
test_get_positions
using dummy coordinates for beads 0 to 8, all of them withz=10
:Using the same number of beads in grids,
test_avg_unit_cell
added for two systems.z=10
z=10
and number of beads per unit cell as shown below:test_derive_surface
added for samedummy_coordinates
as intest_get_positions
.