Initial refactor and test added [Fixes #33] #34

ojeda-e · 2021-06-24T22:48:10Z

Helps to fix #33.
In this first version of the refactored code, curvature is no longer calculated per leaflet. Instead, it is calculated for a surface defined by the selected atoms, which is more general.

Changes in this PR:

Surface derived from direct selection using MDAnalysis.
Deleted functions def_all_beads, curvature and core_fast.
Function grid_map to map coordinates to grid added.
grid_map takes np.array / tuple arguments instead of topologies.
MDTraj fully replaced by MDAnalysis.

As suggested here, tests with toy model using pytest.mark.parametrize added:

test_grid_map_small_9grid, using toy model in small grid of 9 lipids in a 3x3 grid, with x values of 0, 1, 2, and y values of 0, 1, 2.
test_grid_map_25grid, using toy model in small grid of 25 lipids in a 5x5 grid, with x values of 0, 1, 2, 3, 4 and y values of 0, 1, 2, 3, 4.

This PR may also fix

Write test def_all_beads #27 since function def_all_beads was deleted after replacing MDTraj by MDAnalysis.
Write test for core_fast in mods.py #28 since function core_fast was deleted after refactoring.
Refactor def_all_beads and add test using toy model. #32 since function def_all_beads was deleted after refactoring and replaced by direct selection using MDAnalysis.

Edit: This PR also fixes #16.

codecov · 2021-06-24T22:50:28Z

Codecov Report

Merging #34 (8569cf0) into main (bf6019b) will increase coverage by 9.39%.
The diff coverage is 73.33%.

orbeckst

Good progress! I have some general comments.

My main question is why there are so many changes to the tests in this PR. Once you start doing big refactoring like changing from mdtraj to mda, I would have expected the previous tests to be still in place and passing. (I might not be fully up-to-date on your changes to the tests but I'd still expect any changes to be tests before this PR.)

orbeckst · 2021-06-25T01:20:16Z

membrane_curvature/core.py

-    topology = md.load(grofile).topology
-
-    # 6. Populate universe with coordinates and trajectory
+    # 2. Populate universe with coordinates and trajectory


MDAnalysis is not restricted to GRO format files. I'd just call it topology and trajectory.

orbeckst · 2021-06-25T01:22:32Z

membrane_curvature/core.py

    u = mda.Universe(grofile, trjfile)

-    # 6.1 Set grid: Extract box dimension from MD sim,
+    # 3 Set grid: Extract box dimension from MD sim,
    # set grid max width, set number of unit cells
    box_size = u.dimensions[0]


This gives you Lx only. Do you assume square X-Y ?

Maybe at least

box_size = max(u.dimensions[0], u.dimensions[1])

If you assume orthorhombic boxes then consider just failing here if you encounter a triclinic box.

Yes, this is one of the limitations and one of the next issues to solve. Issue #35 added.
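
The orthorhombic check suggested above could look roughly like the sketch below. It assumes the box is given in the MDAnalysis `dimensions` layout `[lx, ly, lz, alpha, beta, gamma]`. The function name `get_xy_box_size` and the angle tolerance are hypothetical and not part of this PR.

```python
def get_xy_box_size(dimensions, angle_tol=1e-4):
    """Return the larger of Lx and Ly for an orthorhombic box.

    `dimensions` is assumed to follow the MDAnalysis layout
    [lx, ly, lz, alpha, beta, gamma], with angles in degrees.
    The name and the tolerance are illustrative assumptions.
    """
    lx, ly, _lz, alpha, beta, gamma = dimensions
    # Fail early on triclinic boxes instead of silently using Lx only.
    if any(abs(angle - 90.0) > angle_tol for angle in (alpha, beta, gamma)):
        raise ValueError("Only orthorhombic boxes are supported.")
    return max(lx, ly)


box_size = get_xy_box_size([120.0, 100.0, 80.0, 90.0, 90.0, 90.0])
```

Taking the maximum keeps every atom inside the grid for rectangular X-Y boxes, at the cost of leaving some cells empty along the shorter axis.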

orbeckst · 2021-06-25T01:25:45Z

membrane_curvature/lib/mods.py

+    x: float
+        Value of x coordinate
+    y: float
+        Value of y coordinate


Docs (x,y) do not seem to agree with function signature (coords, factor)

Yes, you are right and thanks for this catch. I'll update it.

orbeckst · 2021-06-25T01:28:35Z

membrane_curvature/lib/mods.py

+    index_grid_l = int(abs(coords[0]) * factor)
+    index_grid_m = int(abs(coords[1]) * factor)


Is coords a single coordinate (essentially a 3-tuple)?

If so then you should keep in mind for later that you can almost certainly gain quite a bit of performance by doing the grid indexing operation on all coordinates at once (there's code related to np.histogramdd that you can probably use... but not in this PR!)

Could you please add a test with negative coordinates? It might fail for now but that's ok.

+1 to @orbeckst's comment -- it would be good to start thinking about expected inputs and output. For example, the most common way that coordinates tend to be passed around is in an (N x 3) numpy array. Even without np.histogramdd you could then simplify this function to return np.abs(coords * factor).astype(int).

orbeckst · 2021-06-25T01:29:31Z

membrane_curvature/lib/mods.py

@@ -160,27 +54,24 @@ def core_fast_leaflet(z_Ref, leaflet, traj, jump, n_cells, lipid_types,

    grid_count = np.zeros([n_cells, n_cells])

-    for frame in range(0, traj.n_frames, jump):


What was the jump argument good for?

jump was to skip frames. I was using jump = 1 for all the tests so I decided it was not very useful at this point either.

ojeda-e · 2021-06-25T02:39:50Z

Thanks for your comments @orbeckst
There are two main reasons for all the changes introduced here.

The first and most relevant is that in the previous version of my code, topologies and trajectories from MDTraj were passed to three of the functions: def_all_beads, core_fast_leaflet, and core_fast.
With the change from MDTraj to MDAnalysis, the arguments of def_all_beads disappeared, and core_fast and core_fast_leaflet changed significantly.
The second reason is that in previous PRs (see PR test core_def_all_beads added. [Issue #27] #31 and test core_fast added [Issue #28] #30) the tests I was asked to provide used dummy coordinates, to be more readable. That is completely fine, but it only makes sense after refactoring.
As I mentioned above, in the previous version of the code, the arguments passed to three of the functions were MDTraj topologies and trajectories, which I can't build from numpy arrays. Only after this refactoring, with MDAnalysis in place, does that type of test make sense.
Additionally, since in the previous PRs I was asked to write tests using dummy coordinates, and the ones I provided with GRO files were getting too complicated to read, I simplified the tests in this new version of the code.

In the spirit of moving forward, I added changes in this PR that wrap up the previous limitations and allow me to progress. If further tests are needed, providing dummy coordinates will now be possible with this refactored version. I feel I haven't made progress because of the tests that were asked for previously, which were going to change no matter what with this PR.
If it's better, I can add the big-system tests for two of the functions, mean_curvature and gaussian_curvature, which are essentially the only two functions that remain the same after refactoring.

pep8speaks · 2021-06-25T03:34:53Z

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-25 03:37:02 UTC

lilyminium

This is a great start to refactoring, @ojeda-e -- the new grid_map tests are much more digestible!

I have a few notes on the refactor. I think it's worth spending some time thinking about how you want to break your MDTraj code and put it back together in an MDAnalysis format. You've swapped MDTraj trajectories for MDAnalysis universes in core_fast_leaflet, which is a good start, but IMO an iterative kind of refactor will constrain potential designs because of trying to be similar to the previous iteration.

Instead, this seems to be a good time to reduce complexity and start separating your functions into more modular ones, where each function does only one main thing. Then it will be easier to put them together in convenient ways as we try to work out the best design for the MembraneCurvature AnalysisBase class. (and easier to test!)

Currently core_fast_leaflet does three things:

gets positions for each atom in the atom group for each frame
identifies the grid cell for each coordinate (currently grid_map)
calculates the average z for the atom group

Could you please break this down into three functions, and write a new function that simply calls them (just for the regression tests?) I would be keen to have these as divorced from either MDTraj or MDAnalysis as possible and work only with numpy arrays, which are generally more versatile. The overall function could grab the positions from the AtomGroup and feed it into the functions instead.
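
As a rough illustration of that split, here is a numpy-only sketch. All function names are hypothetical, and two behaviours are assumptions rather than what the PR does: indices are clipped so atoms on the upper box edge stay inside the grid, and cells without atoms are left at 0 (mirroring the `if grid_count_frames[i, j] > 0` guard in the diff). Getting positions is left to the caller, who passes one (N, 3) array per frame.

```python
import numpy as np


def map_coordinates_to_cells(positions, factor, n_cells):
    """Map (N, 3) positions to integer (l, m) grid indices, shape (N, 2)."""
    cells = np.abs(positions[:, :2] * factor).astype(int)
    # Assumption: clip so atoms exactly on the upper edge stay in the grid.
    return np.clip(cells, 0, n_cells - 1)


def accumulate_z(positions, cells, n_cells):
    """Return per-cell sums of z and per-cell atom counts for one frame."""
    z_sum = np.zeros((n_cells, n_cells))
    count = np.zeros((n_cells, n_cells))
    # np.add.at accumulates correctly when several atoms share a cell.
    np.add.at(z_sum, (cells[:, 0], cells[:, 1]), positions[:, 2])
    np.add.at(count, (cells[:, 0], cells[:, 1]), 1)
    return z_sum, count


def average_z(z_sum, count):
    """Average z per cell; cells without atoms stay at 0."""
    z_mean = np.zeros_like(z_sum)
    np.divide(z_sum, count, out=z_mean, where=count > 0)
    return z_mean


def derive_average_surface(frames, n_cells, max_width):
    """Call the three steps for an iterable of (N, 3) position arrays."""
    factor = n_cells / max_width
    z_sum = np.zeros((n_cells, n_cells))
    count = np.zeros((n_cells, n_cells))
    for positions in frames:
        cells = map_coordinates_to_cells(positions, factor, n_cells)
        frame_sum, frame_count = accumulate_z(positions, cells, n_cells)
        z_sum += frame_sum
        count += frame_count
    return average_z(z_sum, count)


frames = [
    np.array([[0.5, 0.5, 10.0], [1.5, 0.5, 12.0]]),
    np.array([[0.5, 0.5, 14.0], [1.5, 0.5, 12.0]]),
]
surface = derive_average_surface(frames, n_cells=2, max_width=2)
```

With MDAnalysis, the `frames` argument could presumably be built by copying the AtomGroup's `positions` while iterating over the trajectory, which keeps the three helpers free of any MD library.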

In addition, some unofficial principles of design from personal experience:

Side-effecting functions

Currently core_fast_leaflet is a side-effecting function. This means that it modifies a variable, z_Ref, outside its local environment. In functional programming, where you write functions (as opposed to class/objects in object-oriented programming), it's generally good practice to avoid side-effecting functions. This StackOverflow post summarises it better than I could, but in general side-effect free functions are easier to test, easier to parallelize, and easier to cobble together into larger functions.

To write a side-effect free function, core_fast_leaflet would be given input arguments that it doesn't modify, and return an output value. So the test_core_fast_leaflets function could look like:

def test_core_fast_leaflets():
    n_cells, max_width = 3, 3
---    z_calc = np.zeros([n_cells, n_cells])
    u = mda.Universe(GRO_PO4_SMALL, XTC_PO4_SMALL)
    selection = u.select_atoms('index 0:3')
---    core_fast_leaflet(u, z_calc, n_cells, selection, max_width)
+++    z_calc = core_fast_leaflet(u, n_cells, selection, max_width)
    for z, z_test in zip(MEMBRANE_CURVATURE_DATA['grid']['small']['upper'], z_calc):
        print(z, z_test)
        assert_almost_equal(z, z_test)
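
The difference between the two styles can be seen in a small, self-contained sketch. The function names are hypothetical and unrelated to the package; the point is only that the first function changes the caller's array while the second does not.

```python
import numpy as np


def scale_in_place(grid, factor):
    """Side-effecting: modifies the caller's array and returns nothing."""
    grid *= factor


def scaled(grid, factor):
    """Side-effect free: leaves the input untouched, returns a new array."""
    return grid * factor


original = np.ones((2, 2))

result = scaled(original, 3.0)
unchanged_after_pure_call = bool((original == 1.0).all())

scale_in_place(original, 3.0)
changed_after_in_place_call = bool((original == 3.0).all())
```

A test of `scaled` only needs to inspect the return value, whereas a test of `scale_in_place` has to set up the array first and inspect it afterwards.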

The principle of least astonishment

The principle of least astonishment basically advises that you use your code to create user expectations, and then try your best to fulfil them. This comes in many forms, such as naming your variables and functions clearly (more on that below). It also means trying to follow the conventions of other packages as best you can. This is a bit nitpicky, but one example is the input arguments to np.zeros. The documentation actually specifies that the shape should be an integer or a tuple of integers. In most contexts, code that looks like np.func([x, y]) means that np.func is operating on a list of values that will be converted to an array. In contrast, shape is commonly specified as a tuple (x, y) so it's much less likely to be misunderstood.

The type of input arguments to np.zeros is of course a very small issue. I just bring it up now because as I said above, refactoring is a good time to start thinking about what kind of inputs and outputs you want your functions to take, as that is a key part of API design.
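
For illustration, both spellings below produce the same array, so this is purely about what a reader expects to see. The variable names are made up for the example.

```python
import numpy as np

n_cells = 3

grid_from_tuple = np.zeros((n_cells, n_cells))  # shape given as a tuple
grid_from_list = np.zeros([n_cells, n_cells])   # also accepted, same result

# In most other calls a list is read as data, not as a shape:
values = np.array([n_cells, n_cells])
```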

Even more nitpicky: parameter ordering is also part of designing a function. There isn't really much formal advice on how to order parameters in a function (although I found this Stackoverflow post). This is not important right now, but for example with core_fast_leaflet, as a user, I would be surprised at the ordering of universe, z_Ref, n_cells, selection, max_width. That's because I think of universes and AtomGroups as very similar things (in fact, you don't necessarily need universe as an argument, as you could get the universe = selection.universe). In addition, n_cells and max_width both relate to the same thing: factor. It feels more intuitive to me to call core_fast_leaflet(universe, selection, z_Ref, n_cells, max_width) and have those arguments grouped together.

Naming functions

Part of the principle of least astonishment is naming things clearly, so future developers (or you in 2 months!) can read your code with minimal effort. PEP8 has a whole section on naming conventions. In general, it's typically a good idea to have a verb in your function name. For example, I am not sure what grid_map does just by looking at its name. If it were more explicit and map were a verb, e.g. map_coordinates_to_grid, that's much more immediately obvious. It's up to you how verbose you would like to make it; other candidates could be map_coordinates_to_array, map_coordinates_to_grid_array, get_coordinate_cell.

The kind of verb can hint at the expected output, such as get_coordinate_cell above. For example, I would expect functions that start with is_ or has_ (e.g. has_atoms, is_negative) to return either True or False.

lilyminium · 2021-06-25T17:15:02Z

membrane_curvature/lib/mods.py

+    index_grid_l = int(abs(coords[0]) * factor)
+    index_grid_m = int(abs(coords[1]) * factor)


Could you please add a test with negative coordinates? It might fail for now but that's ok.
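
A minimal sketch of such a test follows. The `grid_map` here is a stand-in rebuilt from the two diff lines above, and returning a tuple of two indices is an assumption. The test only documents what the `abs()` call currently does; whether folding negative coordinates onto their mirror image is the desired behaviour is a separate question.

```python
def grid_map(coords, factor):
    """Stand-in for the PR's grid_map, rebuilt from the diff lines above."""
    index_grid_l = int(abs(coords[0]) * factor)
    index_grid_m = int(abs(coords[1]) * factor)
    return index_grid_l, index_grid_m  # assumed return value


def test_grid_map_negative_coordinates():
    factor = 1.0
    # With abs(), a negative coordinate lands in the same cell as its
    # mirror image on the positive side.
    assert grid_map((-1.0, -2.0), factor) == grid_map((1.0, 2.0), factor)
    assert grid_map((-1.0, -2.0), factor) == (1, 2)
```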

lilyminium · 2021-06-25T17:18:01Z

membrane_curvature/tests/test_mdakit_membcurv.py

+    # should map to
+    lambda xy: (xy[0], xy[1]),
+    5, 5),
+])


This parametrization is just testing one set of arguments, do you plan to add more? You could combine the 9-grid and the 25-grid into the one function with parametrize though.
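
One possible way to merge the two tests is sketched below. `grid_map` is a stand-in rebuilt from the diff in this PR (the tuple return value is an assumption), and `make_grid_coordinates` and the test name are hypothetical.

```python
import itertools

import pytest


def grid_map(coords, factor):
    """Stand-in for the PR's grid_map; the return value is assumed."""
    return int(abs(coords[0]) * factor), int(abs(coords[1]) * factor)


def make_grid_coordinates(n):
    """All (x, y) points with integer-valued x and y in range(n)."""
    return [(float(x), float(y))
            for x, y in itertools.product(range(n), repeat=2)]


# One test function covers both the 3x3 and the 5x5 toy grids.
@pytest.mark.parametrize("n_cells, max_width", [(3, 3), (5, 5)])
def test_grid_map_square_grids(n_cells, max_width):
    factor = n_cells / max_width
    for coord in make_grid_coordinates(n_cells):
        assert grid_map(coord, factor) == (int(coord[0]), int(coord[1]))
```

Adding another grid size then only requires one more tuple in the parameter list.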

lilyminium · 2021-06-25T17:21:30Z

membrane_curvature/lib/mods.py

+    index_grid_l = int(abs(coords[0]) * factor)
+    index_grid_m = int(abs(coords[1]) * factor)


+1 to @orbeckst's comment -- it would be good to start thinking about expected inputs and output. For example, the most common way that coordinates tend to be passed around is in an (N x 3) numpy array. Even without np.histogramdd you could then simplify this function to return np.abs(coords * factor).astype(int).
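
A sketch of that vectorized form is shown below. It assumes coordinates arrive as an (N, 3) array and keeps only the x and y columns, which is a small departure from the one-liner above. The name `map_coordinates_to_grid` is one of the candidates suggested in this review, not an existing function.

```python
import numpy as np


def map_coordinates_to_grid(coords, factor):
    """Vectorized grid mapping for an (N, 3) array; returns (N, 2) indices."""
    coords = np.asarray(coords, dtype=float)
    # astype(int) truncates toward zero, like int() in the scalar version.
    return np.abs(coords[:, :2] * factor).astype(int)


coords = np.array([[0.2, 1.7, 5.0],
                   [2.9, 0.1, 4.0],
                   [-1.2, 2.5, 6.0]])
cells = map_coordinates_to_grid(coords, factor=1.0)
```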

lilyminium · 2021-06-25T17:48:51Z

membrane_curvature/tests/test_mdakit_membcurv.py

+def test_grid_map_25grid(dummy_coordinates, test_mapper, n_cells, max_width):
+    factor = np.float32(n_cells / max_width)
+    for dummy_coord in dummy_coordinates:
+        assert test_mapper(dummy_coord) == grid_map(dummy_coord, factor)


I think that test_mapper is not strictly needed here -- I think you could do the below?

Suggested change

assert test_mapper(dummy_coord) == grid_map(dummy_coord, factor)

assert grid_map(dummy_coord, factor) == dummy_coord

In addition, could you please add a test where the output of grid_map is different from the input dummy_coord? Otherwise this will pass even with grid_map = lambda x, *args: x :)
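
For example, choosing n_cells and max_width so that factor is not 1 makes the expected indices differ from the inputs. This is only a sketch: `grid_map` is a stand-in rebuilt from the diff in this PR, with an assumed tuple return value, and the test name is hypothetical.

```python
def grid_map(coords, factor):
    """Stand-in for the PR's grid_map; the return value is assumed."""
    return int(abs(coords[0]) * factor), int(abs(coords[1]) * factor)


def test_grid_map_output_differs_from_input():
    n_cells, max_width = 2, 4
    factor = n_cells / max_width  # 0.5, so indices are not the coordinates
    assert grid_map((3.0, 1.0), factor) == (1, 0)
    assert grid_map((2.0, 2.0), factor) == (1, 1)

    # An identity function returns its input unchanged and would fail here.
    def identity(coords, *args):
        return coords

    assert identity((3.0, 1.0), factor) != (1, 0)
```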

lilyminium · 2021-06-25T18:50:25Z

membrane_curvature/lib/mods.py

-        if grid_count[i, j] > 0:
-            z_Ref[i, j] /= grid_count[i, j]
+        if grid_count_frames[i, j] > 0:
+            z_Ref[i, j] /= grid_count_frames[i, j]


I see that you're first creating an average surface before computing the curvature. On a scientific level, maybe we should consider treating each surface for each frame separately? Otherwise, if you have a membrane that undulates such that a patch has ~50% positive curvature and 50% negative curvature over the frames, the Gaussian curvature is ultimately computed on a flat surface?
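
A toy calculation makes the concern concrete. The sketch below uses the standard Gaussian curvature formula for a height field z(x, y) with finite differences from `np.gradient`. It is not the package's gaussian_curvature function, and the two-frame "trajectory" is invented for illustration.

```python
import numpy as np


def height_field_gaussian_curvature(z, spacing):
    """Gaussian curvature of a height field z(x, y) on a regular grid."""
    z_x = np.gradient(z, spacing, axis=0)
    z_y = np.gradient(z, spacing, axis=1)
    z_xx = np.gradient(z_x, spacing, axis=0)
    z_yy = np.gradient(z_y, spacing, axis=1)
    z_xy = np.gradient(z_x, spacing, axis=1)
    return (z_xx * z_yy - z_xy**2) / (1 + z_x**2 + z_y**2) ** 2


x = np.linspace(0.0, np.pi, 21)
spacing = x[1] - x[0]
bump = np.sin(x)[:, None] * np.sin(x)[None, :]
frames = [bump, -bump]  # the patch flips between two mirror-image shapes

# Curvature of the time-averaged surface: the average surface is flat.
k_of_average = height_field_gaussian_curvature(np.mean(frames, axis=0), spacing)

# Average of the per-frame curvatures: both frames have the same curvature.
average_of_k = np.mean(
    [height_field_gaussian_curvature(z, spacing) for z in frames], axis=0
)
```

Here `k_of_average` is zero everywhere, while `average_of_k` is clearly positive at the centre of the patch, so the two orders of averaging can give different answers.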

ojeda-e · 2021-06-25T21:30:51Z

Hi @lilyminium Lily, thanks for your review. Regarding this comment (and below)

Currently core_fast_leaflet does three things:

gets positions for each atom in the atom group for each frame

identifies the grid cell for each coordinate (currently grid_map)

calculates the average z for the atom group

Could you please break this down into three functions, and write a new function that simply calls them (just for the regression tests?) I would be keen to have these as divorced from either MDTraj or MDAnalysis as possible and work only with numpy arrays, which are generally more versatile. The overall function could grab the positions from the AtomGroup and feed it into the functions instead.

If it works for you, I'll add these remarks as new issues instead of adding them in this PR. The changes are getting a bit too long. Would that work?
If yes, then I'll be happy to submit a PR with the requested changes.

ojeda-e added 4 commits June 24, 2021 13:57

Test files of small system with 9 lipids

e09c2df

MDtraj replaced by MDAnalysis. Defintion of beads deleted.

3df5cdd

test for toy systems added. Deleted complex tests.

042e625

Pep 8 format

c9ea462

ojeda-e requested a review from orbeckst June 24, 2021 22:59

orbeckst reviewed Jun 25, 2021

ojeda-e mentioned this pull request Jun 25, 2021

Define grid size for different box shapes. #35

Closed

2 tasks

Requested changes added. Previous curvature tests added back.

0e4a100

pep8 fixed

8569cf0

ojeda-e requested a review from orbeckst June 25, 2021 17:30

lilyminium requested changes Jun 25, 2021

ojeda-e mentioned this pull request Jun 25, 2021

test core_def_all_beads added. [Issue #27] #31

Merged

2 tasks

ojeda-e merged commit 1675ea9 into main Jun 25, 2021

This was referenced Jun 29, 2021

Break core_fast_leaflet into functions. #39

Closed

Finish initial code refactor #37

Closed

lilyminium mentioned this pull request Jun 30, 2021

Added derive_surface and get_positions with tests. #40

Merged

6 tasks

ojeda-e deleted the issue33 branch August 29, 2021 20:12
