New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial test. [Fixes #10] #13
Conversation
Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-06-16 04:46:56 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting up the tests! I had a few questions about the Z-points and some notes about tests :)
cell_l = int(abs(x) * factor) | ||
cell_m = int(abs(y) * factor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I get nervous when I see coordinates being abs
oluted. Just checking that you're sure that this puts them in the right cell? In my mind, an x value of -0.5 for a box length of 10 nm should put it in the last cell (9.5-10), not the first (0-0.5).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can trigger a long conversation about the method I implemented, but let's try :)
I used absolute values because of two main reasons. 1) I am dismissing those few atoms that have -2.x and lower (is never higher than that, otherwise that would be reasonable.)
With that range, when negative, the bead will be very likely mapped to first or second unit cell.
In previous attempts I tried to generate a cells for those rare cases, but it messed up the grid because then it will introduce lots of np.nans
, and that will drive the values of the surface pretty badly.
2) I am not considering PBD conditions in this method.
Despite these, I am pretty sure of the abs
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that range, when negative, the bead will be very likely mapped to first or second unit cell.
Could you not subtract the coordinates from the box size to support PBC? I think the MDAnalysis refactor will very likely include PBC so it's a good idea to start thinking about it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks.
for i, j in it.product(range(n_cells), range(n_cells)): | ||
if grid_2[i, j] > 0: | ||
z_ref[i, j] += grid_1[i, j] / grid_2[i, j] | ||
grid_count[i, j] += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a nice place to take advantage of masking in numpy arrays. Does the below (untested) snippet accomplish the same thing for you?
mask = grid_2 > 0
z_ref[mask] = grid_1[mask] / grid_2[mask]
However, I am a little confused as to the difference between grid_count
and grid_2
. Shouldn't they have the same values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my bad for putting it here. I'll try to explain. Initially the code (in main.py
) has two grids to count how many beads lie in a given unit cell, one that is associated to the total number of beads over the full iteration of the n frames (grid_count
), and one associated to the beads of each lipid type per frame (grid_2
).
In the Z_cloud
fixture, I removed the iteration over frames since I am extracting the cloud of Z points out of the .gro
file containing PO4 beads only.
cell_m = int(abs(y) * factor) | ||
|
||
try: | ||
grid_1[cell_l, cell_m] += z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you need to account for PBC with the z coordinate? With a box length of 10, the mean of a list of z-values [0.5, 9.5] should ideally be 0, not 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... interesting. I guess I could, but that system doesn't make sense in my mind. In that type of setups I'd say the simulation box is too small and the interactions of the images will affect the dynamic of the atoms in the simulation box itself. Maybe I am missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of the scenario that the membrane is on the "bottom" of the box on the z-axis instead of the middle, i.e. around the origin -- maybe rare but very possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy. Yeah, I wasn't prepared for that type of escenarios. In that case I'll have to add PBC on z
. Thanks :)
|
||
for i, j in it.product(range(n_cells), range(n_cells)): | ||
if grid_count[i, j] > 0: | ||
z_ref[i, j] /= grid_count[i, j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just so I'm understanding this correctly, the z-ref coordinate is the sum of all the z-coordinates of the atoms in that cell, divided by the square of the count of atoms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z_ref
is the sum of all the z
coordinates in that cell, divided by the total number of atoms that populated that cell (grid_count
stores that number).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think you divide it by grid_2
already, which also stores the atom count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point now. In this test I am not iterating over frames. I am actually extracting the surface and calculating H
and K
from the file with PO4 beads only as in getting one frame from the gro file. Very likely I missed or added something but can't see it now. I'll review it. Thanks.
|
||
def test_mdakit_membcurv_imported(): | ||
"""Sample test, will always pass so long as import statement worked""" | ||
assert "mdakit_membcurv" in sys.modules | ||
|
||
|
||
def test_output_folder(): # line 76, gone after refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't so much test your code as the os
library. Is it necessary?
def test_box_size(universe): # line 99 | ||
box_size = universe.dimensions[0] | ||
assert_almost_equal(box_size, 184.31013, decimal=5) | ||
|
||
|
||
def test_max_width(universe): # line 100 | ||
max_width = universe.dimensions[0] * 0.1 | ||
assert max_width == 18.431013488769533 | ||
|
||
|
||
def test_n_cells(universe): # line 101 | ||
unit_width = 20 | ||
max_width = universe.dimensions[0] * 0.1 | ||
n_cells = math.ceil(max_width / unit_width * 10) | ||
print(n_cells) | ||
assert n_cells == 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are very fine-grained. test_box_size
is mostly checking your assumptions of how MDAnalysis reads dimensions, which is fine, but test_max_width
seems to be testing that division by 10 works as expected -- I think that is probably unnecessary.
In general the tests in this repository should be for your code. A good rule of thumb is that each test should test the output of a function that you wrote. So if you had a function called get_max_width
, you could test that, but IMO it's out of scope to test MDAnalysis
and math
functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Maybe I need more directions at this point. I remembered I asked @orbeckst about what part of the code should I test to start, considering I have a main.py
and mods.py
. I tried to test main, with the exception of the parse_arg
part and some other redundant functions like curvature
that I wrote just to made my life easier when I was using it for me.
I thought what I had in this version of the tests was checking some lines in main.py
and some other is mods.py
. Not all of them, because there are parts of the code that will be gone in the refactoring.
Maybe mentors can help giving me some lines to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add at least one test for every function that you think you will need in some form in the MDAnalysis refactor? Even if the form of the function changes when refactoring, we would want to check that the returned output for the given input does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress, please see comments inline.
mdakit_membcurv/core.py
Outdated
@@ -5,7 +5,7 @@ | |||
Handles the primary functions | |||
""" | |||
|
|||
from mods import * | |||
from .lib.mods import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid blanket from x import *
imports but be explicit about which functions you actually need here. It makes the code a lot easier to debug and to read. Furthermore, if you have two blanket imports, nobody (including future you) will likely have an idea where the functions really came from.
(I am not even a fan of from x import y
and prefer to use import y
and then use y.x
to make clear where it comes from, or use a variation with submodule imports and aliasing like from matplotlib import pyplot as plt
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I'll leave it as absolute import in the current test as from mods import y
bit I will keep in mind the change as you suggested in the refactoring process.
mdakit_membcurv/tests/datafiles.py
Outdated
Location of datafiles for Membrane Curvature unit tests | ||
======================================================= | ||
|
||
MD simulations files stored in ``\\data`` sub-directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd write it as
MD simulations files stored in ``data`` sub-directory.
and not start with a leading directory separator. (Also, I think convention is to use Unix forward slashes as directory separators in documentation... but I might be biased.)
mdakit_membcurv/tests/datafiles.py
Outdated
"GRO_", # Gromacs file of POPC POPE CHOL membrane | ||
"XTC_" # Gromacs trajectory of 10 frames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe be more explicit and name them GRO_MIXED_MEMBRANE
and XTC_MIXED_MEMBRANE
instead of a trailing underscore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed to a more explicit name.
import mdakit_membcurv | ||
import pickle | ||
|
||
from numpy.testing._private.utils import assert_array_almost_equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
from numpy.testing import assert_array_almost_equal
not work? It would be simpler and not rely on an import that seems to not be guaranteed to exist in future versions. I'd avoid using anything starting with underscores as it is generally not part of the published API.
Btw, we typically use assert_almost_equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... as you do below, so I think the above can just go.
import os | ||
import mdtraj as md # This will be gone after refactoring | ||
import itertools as it | ||
from ..lib.mods import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit imports
|
||
|
||
@pytest.fixture() | ||
def Z_cloud(ref_beads): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code reproducing some functionality in the main code? You are doing a fair amount of calculation to set up a reference value. Can you add some commentary here to make clear what happens?
This is even more important in the light of https://github.com/MDAnalysis/membrane-curvature/pull/13/files#r650475495 if there are concerns how PBC are treated here. If you are generated an incorrect reference and then test your code against it, you will ensure that your code is incorrect. I understand that at this stage you want to test that your code produces the values that it has been producing in the past. If during writing the tests you find potential bugs in your code then you have two options (in my opinion)
- write correct test and mark them XFAIL (i.e., you know your code will currently not produce the correct result) and fix later
- write correct tests and fix your code now
If things are not clear cut then at least add a comment in the tests and in your main code noting that this is something that needs to be looked into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It partially does, yes. It is a short calculation to map the (x,y)
to [i,j]
of the beads in the grofile that contains PO4
beads only. The subsequent calculation of membrane curvature uses the grid I am returning here, which is a np.array
, indexed according to number of cells in the grid, and which contains the average z
values of the beads that populate such unit cell. I will add comment on the steps. It's the same way it is calculated in mods.py
, line 291-303.
My original code doesn't consider PBC conditions of any kind, and I'm aware that's a potential risk. The calculation of membrane curvature in this approach has an exception for the atoms of coordinates that when mapped to a grid, they fall in an [i,j]
pair bigger than the initial size of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @orbeckst that I am worried this is checking for incorrect functionality. Instead of having this function and loading ref_beads from a file, could you simply define Z_cloud as the z_ref array?
The simplest way to do this would be to run the code yourself and copy the coordinates into here, then we can work up to making sure that the generation of z_ref is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could then also write a test for the lines of code that generate z_ref
:)
dict2pickle(name, dict_) | ||
unpickled = pickle.load(open(name + '.pickle', 'rb')) | ||
assert dict_ == unpickled | ||
os.remove(name + '.pickle') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests that create files should use the tmpdir
fixture
def test(... tmpdir):
with tmpdir.cwd():
# create files etc
do_test
assert something
When the with
block finishes, tmpdir
and everything inside will be autodeleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I managed to miss this comment. Fixture added in last commit.
def test_md_traj(): # line 108, gone after refactoring | ||
traj = md.load(XTC_, top=GRO_) | ||
assert traj.n_frames == 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, just checks that your trajectory has the length that you think it should have. This does not check your code.
assert traj.n_frames == 11 | ||
|
||
|
||
def test_gaussian_curvature(Z_cloud): # line 391 of mods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important test as it checks your function.
…only. Fixtures fixed.
Hi @lilyminium and @orbeckst. |
|
||
|
||
@pytest.fixture() | ||
def Z_cloud(ref_beads): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @orbeckst that I am worried this is checking for incorrect functionality. Instead of having this function and loading ref_beads from a file, could you simply define Z_cloud as the z_ref array?
The simplest way to do this would be to run the code yourself and copy the coordinates into here, then we can work up to making sure that the generation of z_ref is correct.
def test_dict_to_pickle(output): # line 117-118 | ||
name = 'test_pickle_output' | ||
dict_ = {'A': 1, 'B': 2, 'C': 3} | ||
with output.as_cwd(): | ||
dict2pickle(name, dict_) | ||
unpickled = pickle.load(open(name + '.pickle', 'rb')) | ||
assert dict_ == unpickled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_dict_to_pickle(output): # line 117-118 | |
name = 'test_pickle_output' | |
dict_ = {'A': 1, 'B': 2, 'C': 3} | |
with output.as_cwd(): | |
dict2pickle(name, dict_) | |
unpickled = pickle.load(open(name + '.pickle', 'rb')) | |
assert dict_ == unpickled | |
def test_dict_to_pickle(tmpdir): # line 117-118 | |
name = 'test_pickle_output' | |
dict_ = {'A': 1, 'B': 2, 'C': 3} | |
with tmpdir.as_cwd(): | |
dict2pickle(name, dict_) | |
unpickled = pickle.load(open(name + '.pickle', 'rb')) | |
assert dict_ == unpickled |
tmpdir
is a fixture provided by pytest, so you don't have to make it yourself -- you could just use it directly here.
assert_almost_equal(box_size, 184.31013, decimal=5) | ||
|
||
|
||
def test_n_cells(universe): # line 101 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_box_size, test_dict_to_pickle, test_md_topology
don't test your code so they should get removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great initial start!
Description
Initial test. Fixes [#10] This first version of the tests includes:
Test for lines (in
core.py
)Test for functions (in
lib\mods.py
):Notes
mdtraj
, therefore some fixtures are temporary and will be gone after refactoring.GRO_PO4
intest_mdakit_mmebcurv.py
).