Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial test. [Fixes #10] #13

Merged
merged 24 commits into from Jun 16, 2021
Merged

Initial test. [Fixes #10] #13

merged 24 commits into from Jun 16, 2021

Conversation

ojeda-e
Copy link
Member

@ojeda-e ojeda-e commented Jun 11, 2021

Description

Initial test. Fixes [#10] This first version of the tests includes:

  • Fixtures for universe, reference beads and cloud of Z points.
    Test for lines (in core.py)
  • line 76
  • line 92
  • line 99
  • line 100
  • line 101
  • line 117-118
    Test for functions (in lib\mods.py):
  • line 391
  • line 417

Notes

  • Some fixtures added for future work (leaflets and lipid types)
  • The code currently uses mdtraj, therefore some fixtures are temporary and will be gone after refactoring.
  • Added a new profile that contains only beads of reference to derive the surfaces (GRO_PO4 in test_mdakit_mmebcurv.py).

@pep8speaks
Copy link

pep8speaks commented Jun 11, 2021

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 211:27: W291 trailing whitespace

Comment last updated at 2021-06-16 04:46:56 UTC

@ojeda-e ojeda-e mentioned this pull request Jun 11, 2021
5 tasks
@codecov
Copy link

codecov bot commented Jun 11, 2021

Codecov Report

Merging #13 (e1afb6c) into main (e18102e) will increase coverage by 24.00%.
The diff coverage is 100.00%.

@ojeda-e
Copy link
Member Author

ojeda-e commented Jun 11, 2021

Test are not failing here. 💪
Now I really need to solve the documentation part (issue #5 , PR #8 ).

@ojeda-e ojeda-e mentioned this pull request Jun 13, 2021
7 tasks
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting up the tests! I had a few questions about the Z-points and some notes about tests :)

Comment on lines 88 to 174
cell_l = int(abs(x) * factor)
cell_m = int(abs(y) * factor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I get nervous when I see coordinates being absoluted. Just checking that you're sure that this puts them in the right cell? In my mind, an x value of -0.5 for a box length of 10 nm should put it in the last cell (9.5-10), not the first (0-0.5).

Copy link
Member Author

@ojeda-e ojeda-e Jun 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can trigger a long conversation about the method I implemented, but let's try :)
I used absolute values because of two main reasons. 1) I am dismissing those few atoms that have -2.x and lower (is never higher than that, otherwise that would be reasonable.)
With that range, when negative, the bead will be very likely mapped to first or second unit cell.
In previous attempts I tried to generate a cells for those rare cases, but it messed up the grid because then it will introduce lots of np.nans, and that will drive the values of the surface pretty badly.
2) I am not considering PBD conditions in this method.

Despite these, I am pretty sure of the abs there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that range, when negative, the bead will be very likely mapped to first or second unit cell.

Could you not subtract the coordinates from the box size to support PBC? I think the MDAnalysis refactor will very likely include PBC so it's a good idea to start thinking about it now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks.

mdakit_membcurv/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
mdakit_membcurv/tests/test_mdakit_membcurv.py Outdated Show resolved Hide resolved
for i, j in it.product(range(n_cells), range(n_cells)):
if grid_2[i, j] > 0:
z_ref[i, j] += grid_1[i, j] / grid_2[i, j]
grid_count[i, j] += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a nice place to take advantage of masking in numpy arrays. Does the below (untested) snippet accomplish the same thing for you?

mask = grid_2 > 0

z_ref[mask] = grid_1[mask] / grid_2[mask]

However, I am a little confused as to the difference between grid_count and grid_2. Shouldn't they have the same values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my bad for putting it here. I'll try to explain. Initially the code (in main.py) has two grids to count how many beads lie in a given unit cell, one that is associated to the total number of beads over the full iteration of the n frames (grid_count), and one associated to the beads of each lipid type per frame (grid_2).
In the Z_cloud fixture, I removed the iteration over frames since I am extracting the cloud of Z points out of the .gro file containing PO4 beads only.

cell_m = int(abs(y) * factor)

try:
grid_1[cell_l, cell_m] += z
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you need to account for PBC with the z coordinate? With a box length of 10, the mean of a list of z-values [0.5, 9.5] should ideally be 0, not 5.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... interesting. I guess I could, but that system doesn't make sense in my mind. In that type of setups I'd say the simulation box is too small and the interactions of the images will affect the dynamic of the atoms in the simulation box itself. Maybe I am missing something here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of the scenario that the membrane is on the "bottom" of the box on the z-axis instead of the middle, i.e. around the origin -- maybe rare but very possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy. Yeah, I wasn't prepared for that type of escenarios. In that case I'll have to add PBC on z. Thanks :)


for i, j in it.product(range(n_cells), range(n_cells)):
if grid_count[i, j] > 0:
z_ref[i, j] /= grid_count[i, j]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just so I'm understanding this correctly, the z-ref coordinate is the sum of all the z-coordinates of the atoms in that cell, divided by the square of the count of atoms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

z_ref is the sum of all the z coordinates in that cell, divided by the total number of atoms that populated that cell (grid_count stores that number).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think you divide it by grid_2 already, which also stores the atom count?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point now. In this test I am not iterating over frames. I am actually extracting the surface and calculating H and K from the file with PO4 beads only as in getting one frame from the gro file. Very likely I missed or added something but can't see it now. I'll review it. Thanks.

mdakit_membcurv/tests/test_mdakit_membcurv.py Show resolved Hide resolved

def test_mdakit_membcurv_imported():
"""Sample test, will always pass so long as import statement worked"""
assert "mdakit_membcurv" in sys.modules


def test_output_folder(): # line 76, gone after refactoring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't so much test your code as the os library. Is it necessary?

Comment on lines 143 to 223
def test_box_size(universe): # line 99
box_size = universe.dimensions[0]
assert_almost_equal(box_size, 184.31013, decimal=5)


def test_max_width(universe): # line 100
max_width = universe.dimensions[0] * 0.1
assert max_width == 18.431013488769533


def test_n_cells(universe): # line 101
unit_width = 20
max_width = universe.dimensions[0] * 0.1
n_cells = math.ceil(max_width / unit_width * 10)
print(n_cells)
assert n_cells == 10.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are very fine-grained. test_box_size is mostly checking your assumptions of how MDAnalysis reads dimensions, which is fine, but test_max_width seems to be testing that division by 10 works as expected -- I think that is probably unnecessary.

In general the tests in this repository should be for your code. A good rule of thumb is that each test should test the output of a function that you wrote. So if you had a function called get_max_width, you could test that, but IMO it's out of scope to test MDAnalysis and math functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
Maybe I need more directions at this point. I remembered I asked @orbeckst about what part of the code should I test to start, considering I have a main.py and mods.py. I tried to test main, with the exception of the parse_arg part and some other redundant functions like curvature that I wrote just to made my life easier when I was using it for me.
I thought what I had in this version of the tests was checking some lines in main.py and some other is mods.py. Not all of them, because there are parts of the code that will be gone in the refactoring.
Maybe mentors can help giving me some lines to test? ☺️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add at least one test for every function that you think you will need in some form in the MDAnalysis refactor? Even if the form of the function changes when refactoring, we would want to check that the returned output for the given input does not.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress, please see comments inline.

@@ -5,7 +5,7 @@
Handles the primary functions
"""

from mods import *
from .lib.mods import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid blanket from x import * imports but be explicit about which functions you actually need here. It makes the code a lot easier to debug and to read. Furthermore, if you have two blanket imports, nobody (including future you) will likely have an idea where the functions really came from.

(I am not even a fan of from x import y and prefer to use import y and then use y.x to make clear where it comes from, or use a variation with submodule imports and aliasing like from matplotlib import pyplot as plt.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I'll leave it as absolute import in the current test as from mods import y bit I will keep in mind the change as you suggested in the refactoring process.

Location of datafiles for Membrane Curvature unit tests
=======================================================

MD simulations files stored in ``\\data`` sub-directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write it as

MD simulations files stored in ``data`` sub-directory.

and not start with a leading directory separator. (Also, I think convention is to use Unix forward slashes as directory separators in documentation... but I might be biased.)

Comment on lines 13 to 14
"GRO_", # Gromacs file of POPC POPE CHOL membrane
"XTC_" # Gromacs trajectory of 10 frames.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be more explicit and name them GRO_MIXED_MEMBRANE and XTC_MIXED_MEMBRANE instead of a trailing underscore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changed to a more explicit name.

import mdakit_membcurv
import pickle

from numpy.testing._private.utils import assert_array_almost_equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does

from numpy.testing import assert_array_almost_equal

not work? It would be simpler and not rely on an import that seems to not be guaranteed to exist in future versions. I'd avoid using anything starting with underscores as it is generally not part of the published API.

Btw, we typically use assert_almost_equal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... as you do below, so I think the above can just go.

import os
import mdtraj as md # This will be gone after refactoring
import itertools as it
from ..lib.mods import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit imports



@pytest.fixture()
def Z_cloud(ref_beads):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code reproducing some functionality in the main code? You are doing a fair amount of calculation to set up a reference value. Can you add some commentary here to make clear what happens?

This is even more important in the light of https://github.com/MDAnalysis/membrane-curvature/pull/13/files#r650475495 if there are concerns how PBC are treated here. If you are generated an incorrect reference and then test your code against it, you will ensure that your code is incorrect. I understand that at this stage you want to test that your code produces the values that it has been producing in the past. If during writing the tests you find potential bugs in your code then you have two options (in my opinion)

  1. write correct test and mark them XFAIL (i.e., you know your code will currently not produce the correct result) and fix later
  2. write correct tests and fix your code now

If things are not clear cut then at least add a comment in the tests and in your main code noting that this is something that needs to be looked into.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It partially does, yes. It is a short calculation to map the (x,y) to [i,j] of the beads in the grofile that contains PO4 beads only. The subsequent calculation of membrane curvature uses the grid I am returning here, which is a np.array, indexed according to number of cells in the grid, and which contains the average z values of the beads that populate such unit cell. I will add comment on the steps. It's the same way it is calculated in mods.py, line 291-303.
My original code doesn't consider PBC conditions of any kind, and I'm aware that's a potential risk. The calculation of membrane curvature in this approach has an exception for the atoms of coordinates that when mapped to a grid, they fall in an [i,j] pair bigger than the initial size of the box.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @orbeckst that I am worried this is checking for incorrect functionality. Instead of having this function and loading ref_beads from a file, could you simply define Z_cloud as the z_ref array?

The simplest way to do this would be to run the code yourself and copy the coordinates into here, then we can work up to making sure that the generation of z_ref is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could then also write a test for the lines of code that generate z_ref :)

dict2pickle(name, dict_)
unpickled = pickle.load(open(name + '.pickle', 'rb'))
assert dict_ == unpickled
os.remove(name + '.pickle')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests that create files should use the tmpdir fixture

def test(... tmpdir):
   with tmpdir.cwd():
        # create files etc
        do_test
        assert something

When the with block finishes, tmpdir and everything inside will be autodeleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I managed to miss this comment. Fixture added in last commit.

Comment on lines 235 to 237
def test_md_traj(): # line 108, gone after refactoring
traj = md.load(XTC_, top=GRO_)
assert traj.n_frames == 11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, just checks that your trajectory has the length that you think it should have. This does not check your code.

assert traj.n_frames == 11


def test_gaussian_curvature(Z_cloud): # line 391 of mods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important test as it checks your function.

@ojeda-e ojeda-e requested a review from lilyminium June 15, 2021 20:49
@ojeda-e
Copy link
Member Author

ojeda-e commented Jun 15, 2021

Hi @lilyminium and @orbeckst.
I submitted the requested changes and replied to some of the comments you made in the previous PR. Thanks.



@pytest.fixture()
def Z_cloud(ref_beads):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @orbeckst that I am worried this is checking for incorrect functionality. Instead of having this function and loading ref_beads from a file, could you simply define Z_cloud as the z_ref array?

The simplest way to do this would be to run the code yourself and copy the coordinates into here, then we can work up to making sure that the generation of z_ref is correct.

Comment on lines 247 to 253
def test_dict_to_pickle(output): # line 117-118
name = 'test_pickle_output'
dict_ = {'A': 1, 'B': 2, 'C': 3}
with output.as_cwd():
dict2pickle(name, dict_)
unpickled = pickle.load(open(name + '.pickle', 'rb'))
assert dict_ == unpickled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_dict_to_pickle(output): # line 117-118
name = 'test_pickle_output'
dict_ = {'A': 1, 'B': 2, 'C': 3}
with output.as_cwd():
dict2pickle(name, dict_)
unpickled = pickle.load(open(name + '.pickle', 'rb'))
assert dict_ == unpickled
def test_dict_to_pickle(tmpdir): # line 117-118
name = 'test_pickle_output'
dict_ = {'A': 1, 'B': 2, 'C': 3}
with tmpdir.as_cwd():
dict2pickle(name, dict_)
unpickled = pickle.load(open(name + '.pickle', 'rb'))
assert dict_ == unpickled

tmpdir is a fixture provided by pytest, so you don't have to make it yourself -- you could just use it directly here.

assert_almost_equal(box_size, 184.31013, decimal=5)


def test_n_cells(universe): # line 101
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_box_size, test_dict_to_pickle, test_md_topology don't test your code so they should get removed

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great initial start!

GSOC 2021 automation moved this from Review in progress to Reviewer approved Jun 16, 2021
@ojeda-e ojeda-e merged commit 7ebc22f into main Jun 16, 2021
GSOC 2021 automation moved this from Reviewer approved to Done Jun 16, 2021
@ojeda-e ojeda-e deleted the issue10 branch August 29, 2021 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants