
Rewrite make_whole in Cython #1965

Merged
merged 3 commits into develop from cpp_is_contiguous on Jul 6, 2018

Conversation

@richardjgowers (Member) commented Jun 30, 2018

The performance of make_whole was raised in #1961; benchmarking it showed that the check that the molecule is traversable via bonds was taking a significant amount of the time. This rewrites make_whole in Cython and also adds support for triclinic unit cells.

package/setup.py Outdated
include_dirs=include_dirs,
define_macros=define_macros,
extra_compile_args=extra_compile_args)
extra_compile_args=[a for a in extra_compile_args
if not a.startswith('-std')])
richardjgowers (Member Author):

So C++ didn't play well with the -std=c99 flag. What does this flag do? And does removing it break anything?

Member:

This flag sets the C standard we are using. Yeah, C++ doesn't have this standard. Having an explicit standard for the C code is nice. I did a PR a while ago with some small clean-ups in the C code, and I would like to keep the explicit standard requirement.
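To illustrate the change being discussed, here is a hedged sketch of the flag handling: the C extensions keep an explicit -std=c99, while the C++ extension reuses the same argument list with any -std flag filtered out. The variable names are illustrative, not the actual setup.py ones.

```python
# Illustrative compile arguments for the C extensions (names are made up).
c_args = ['-O3', '-funroll-loops', '-std=c99']

# C++ compilers reject -std=c99, so strip any -std flag before reusing
# the list for a C++ extension, as done in this PR.
cpp_args = [a for a in c_args if not a.startswith('-std')]

print(cpp_args)  # ['-O3', '-funroll-loops']
```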

Member:

We should also set the C++ standard we use. I'll have a look at the function you are using and give you my advice after reading up on the docs. The behavior has some subtle changes depending on the standard version used.

Member:

None of the functions I used make use of modern C++ features. If we want compatibility with very old compilers (GCC 4.8 and earlier, and therefore almost all Linux installations out there), we can go for the 2003 standard. Generally I prefer the C++11/14 standards, aka modern C++, but we don't make use of any of their features just yet.


@cython.boundscheck(False)
@cython.wraparound(False)
def _is_contiguous(int[:] atoms, int[:, :] bonds, int start):
Member:

This needs a docstring! I don't see why this function should be hidden.
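For readers following the thread, here is a hedged pure-Python sketch of what _is_contiguous checks, a breadth-first walk over the bond graph. This is illustrative only, not the actual Cython implementation, and it shows the kind of docstring being asked for.

```python
from collections import deque

def is_contiguous(atoms, bonds, start):
    """Return True if every atom in ``atoms`` is reachable from ``start``
    by traversing ``bonds`` (an iterable of (i, j) atom-index pairs).

    Only bonds with both endpoints inside ``atoms`` are followed.
    """
    atoms = set(atoms)
    adjacency = {a: set() for a in atoms}
    for x, y in bonds:
        # only add bonds if both atoms are in the atoms set
        if x in atoms and y in atoms:
            adjacency[x].add(y)
            adjacency[y].add(x)
    seen = set()
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom not in seen:
            seen.add(atom)
            queue.extend(adjacency[atom] - seen)
    return seen == atoms
```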


output = intset()

for val in a:
    if b.count(val) != 1:
Member:

There is no need to be defensive here. The C++ set container's insert will simply return the existing element if it is already there: https://en.cppreference.com/w/cpp/container/set/insert

Doing both the count and the insert doubles the computation cost. Because sets don't hold duplicates, the count will always be either 0 or 1.

Member:

Actually, go directly for the std version of this algorithm. It scales better: linear instead of O(a.size() * log(b.size())).

https://en.cppreference.com/w/cpp/algorithm/set_difference

from libcpp.algorithm cimport set_difference
from libcpp.iterator cimport inserter


cdef intset difference(intset a, intset b):
    output = intset()
    set_difference(a.begin(), a.end(), b.begin(), b.end(),
                   inserter(output, output.begin()))
    return output

Member:

This will be a bit more complicated because the STL functions aren't exposed by Cython -.-

x = bonds[i, 0]
y = bonds[i, 1]
# only add bonds if both atoms are in atoms set
if total.count(x):
Member:

Combine into one if statement: if total.count(x) and total.count(y).

N = total.size()

nloops = 0
while seen.size() < N:
Member:

while seen.size() < N and nloops < N:

@@ -51,7 +53,7 @@ def unique_int_1d(np.ndarray[np.int64_t, ndim=1] values):
cdef int i = 0
cdef int j = 0
cdef int n_values = values.shape[0]
cdef np.ndarray[np.int64_t, ndim=1] result = np.empty(n_values, dtype=np.int64)
cdef long[:] result = np.empty(n_values, dtype=np.int64)
Member:

You should continue to use the np.int64_t type to ensure the size of the type is independent of the platform. The size of a long can vary.
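To illustrate the point (a standalone check, not MDAnalysis code): np.int64 is always 8 bytes, while the width of a C long is platform-dependent, for example 4 bytes on 64-bit Windows.

```python
import ctypes
import numpy as np

int64_size = np.dtype(np.int64).itemsize  # fixed-width type: always 8 bytes
long_size = ctypes.sizeof(ctypes.c_long)  # platform-dependent: 4 or 8 bytes

print(int64_size, long_size)
```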

@kain88-de (Member):

I addressed my own comments already. If this works correctly, it could give another little boost in performance, but this should be checked!

@jbarnoud (Contributor) commented Jul 1, 2018

Ping #1961

@kain88-de (Member):

> I also used memoryviews in unique_int_1d, they're generally faster

They might be faster. It's always worth checking the C code Cython generates. In this case I assume it did not make a huge difference, especially since we have to cast the memoryview back into a numpy array.

@coveralls commented Jul 1, 2018

Coverage Status

Coverage decreased (-0.03%) to 89.933% when pulling 9ea6328 on cpp_is_contiguous into 22dfcdb on develop.

@kain88-de (Member):

If C++14 isn't available on macOS, give C++11 a try. That will also work here.

@richardjgowers (Member Author):

@kain88-de thanks for the help, I've hardly ever used C++!

@kain88-de (Member):

Well, in the end I didn't add much. I just fixed the unique_id code you broke by accident and cleaned up the code, that's it. OSX means we can't use nice things -.-

@kain88-de (Member):

By the way @richardjgowers, how much faster is this version?

@richardjgowers (Member Author):

@kain88-de the Cython version of check_contiguous was about 2x faster than pure Python; we were mostly just doing set operations to check connectivity, and Python's sets are good.

I've reworked make_whole instead. Previously we checked that a solution could be found (_is_contiguous) and then found the solution, whereas we can just try to do the solution (which implicitly does the check anyway). So _is_contiguous has been melted into make_whole.
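For readers following along, here is a hedged pure-Python sketch of the "just try the solution" idea for an orthorhombic box: traverse the bond graph from a reference atom, shift each newly reached atom to the minimum image of its bonded partner, and fail if the walk cannot reach every atom. Function and variable names are illustrative, not the actual Cython code.

```python
from collections import deque

import numpy as np

def make_whole_sketch(positions, bonds, box, ref=0):
    """Walk the bond graph from ``ref``, moving each newly reached atom to
    the minimum image of its bonded partner; raise if the walk cannot
    reach every atom (i.e. the group is not contiguous)."""
    newpos = np.array(positions, dtype=np.float64)
    n = len(newpos)
    adjacency = {i: [] for i in range(n)}
    for x, y in bonds:
        adjacency[x].append(y)
        adjacency[y].append(x)
    seen = {ref}
    queue = deque([ref])
    while queue:
        atom = queue.popleft()
        for other in adjacency[atom]:
            if other in seen:
                continue
            vec = newpos[other] - newpos[atom]
            vec -= box * np.rint(vec / box)  # orthorhombic minimum image
            newpos[other] = newpos[atom] + vec
            seen.add(other)
            queue.append(other)
    if len(seen) != n:  # the traversal doubles as the contiguity check
        raise ValueError("AtomGroup was not contiguous from bonds")
    return newpos
```

Note that the contiguity check falls out of the traversal for free, which is why the separate _is_contiguous pass could be dropped.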


oldpos = atomgroup.positions
newpos = np.zeros((oldpos.shape[0], 3), dtype=np.float32)

richardjgowers (Member Author):

So I checked the C code generated for this; between here and the return at the bottom it's pure C :)

vec[i] = oldpos[other, i] - oldpos[atom, i]
# Apply periodic boundary conditions to this vector
if ortho:
    apply_pbc_ortho(&vec[0], &box[0])
richardjgowers (Member Author):

So here, ideally we'd use the minimum_image functions from calc_distances.h, but there are type mismatches I need to figure out.

ortho = True
box = atomgroup.dimensions
for i in range(3):
    if box[i + 3] != 90.0:
richardjgowers (Member Author):

Should I use something like fabs(box[i + 3] - 90) > SOMETHING_SMALL here?
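A tolerance comparison would avoid misclassifying nearly-orthogonal boxes stored with limited float precision. A hedged sketch of such a check (the function name and tolerance value are assumptions, not a project convention):

```python
import math

def box_is_orthogonal(dimensions, tol=1e-5):
    """dimensions is [lx, ly, lz, alpha, beta, gamma]; compare the three
    angles to 90 degrees within a small tolerance instead of exactly."""
    return all(math.fabs(angle - 90.0) <= tol for angle in dimensions[3:6])
```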

else:
    from .mdamath import triclinic_vectors
    # forgive me code Jesus for I have sinned
    B = triclinic_vectors(box)
richardjgowers (Member Author):

Does anyone know how we'd turn a (3, 3) numpy array into the coordinate[3] type we need for the pbc_triclinic below?

Member:

see libmdaxdr
<coordinate*>&box[0, 0]

Member:

Oh, do the typecasting on the B, of course.

Member:

The line takes the pointer to the first element of box, i.e. box[0, 0], and casts it to a pointer of type coordinate.
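The cast only reinterprets existing memory; nothing is copied. A rough Python analogue of this idea, using numpy to view a flat contiguous buffer as rows of three (standing in for the C-level pointer cast, not equivalent to it):

```python
import numpy as np

flat = np.arange(9, dtype=np.float32)  # 9 floats in one contiguous buffer
rows = flat.reshape(3, 3)              # same memory viewed as 3 rows of 3

rows_is_view = rows.base is flat       # True: no data was copied
print(rows_is_view)
```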

def test_zero_box_size(self, universe, ag):
    universe.dimensions = [0., 0., 0., 90., 90., 90.]
    with pytest.raises(ValueError):
        mdamath.make_whole(ag)

def test_too_small_box_size(self, universe, ag):
Contributor:

Why remove that test?

richardjgowers (Member Author):

I don't think it's likely to happen; you either have box dimensions or you don't. The chance of an incorrect box being supplied is low.

richardjgowers (Member Author):

We could keep the check and the test, but it would make the algorithm slower because you need to calculate all bond lengths. I thought it was too unlikely a case to slow things down for.

@richardjgowers (Member Author):

Ok thanks to @davidercruz we've got a fullerene test too

Should be good to go now

@jbarnoud jbarnoud requested a review from davidercruz July 3, 2018 07:06
@richardjgowers richardjgowers changed the title Cpp is contiguous Rewrite make_whole in Cython Jul 3, 2018
@richardjgowers (Member Author):

@MDAnalysis/coredevs reviews please :)

@jbarnoud (Contributor) commented Jul 3, 2018 via email

@jbarnoud (Contributor) left a review:

Mostly questions. The code looks sane. I am a bit annoyed that the precision of the tests needed to be reduced, though.

return np.array(result)


cdef intset difference(intset a, intset b):
Contributor:

This does not look like the most efficient way to do it (see https://en.cppreference.com/w/cpp/algorithm/set_difference), but it should do for now.

richardjgowers (Member Author):

I think @kain88-de tried to use this, but it's not available on OSX or something.

Member:

We can rewrite the algorithm ourselves. It assumes that the containers are sorted (which they are for C++11 and later). Using the std algorithm is difficult on Linux: I did not get the compiler to resolve the template. This could get better when we update to use the new conda compilers.
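For reference, the std::set_difference merge can be written by hand in a few lines. Here is a hedged Python sketch over sorted inputs, linear in the combined sizes (the function name is hypothetical):

```python
def sorted_difference(a, b):
    """Elements of sorted sequence ``a`` not in sorted sequence ``b``,
    computed in a single merge pass (what std::set_difference does)."""
    out = []
    i = j = 0
    while i < len(a):
        if j >= len(b) or a[i] < b[j]:
            out.append(a[i])  # a[i] cannot appear later in b
            i += 1
        elif a[i] > b[j]:
            j += 1            # advance b until it catches up
        else:
            i += 1            # common element: skip it in both
            j += 1
    return out
```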

"You can set dimensions using 'atomgroup.dimensions='")

ortho = True
for i in range(3):
@jbarnoud (Contributor) commented Jul 6, 2018:

Why not range(3, 6) instead of doing index gymnastics?

if ortho:
    minimum_image(&vec[0], &box[0], &inverse_box[0])
else:
    minimum_image_triclinic(&vec[0], <coordinate*>&tri_box[0])
Contributor:

Couldn't tri_box be cast only once, before the loops?

richardjgowers (Member Author):

Does casting take any time? I thought it was more an instruction to the compiler than a copy of data.

Contributor:

I do not expect any cost for the cast; though, I do not see why tri_box could not directly be the right type. Nevertheless, it is really not a big deal.

raise ValueError("Reference atom not in atomgroup")

# Check all of atomgroup is accessible from ref
if not _is_contiguous(atomgroup, ref):
Contributor:

This test is not there in the new version. From what I understand, a non-contiguous atom group will trigger the test at the very end of the new version, but the error message is not super clear about the reason, and it fails late instead of early.

richardjgowers (Member Author):

So when benchmarking this I found it takes a non-negligible amount of time. Because we also implicitly do this check when applying the algorithm, it made more sense not to check beforehand and just try. We only apply the position change if the algorithm worked, so it's defensive in that respect. But yes, this means that failing because of a non-contiguous group is slower, although succeeding is faster.

Contributor:

OK. Good for me.

@@ -305,13 +282,13 @@ def test_solve_1(self, universe, ag):

assert_array_almost_equal(universe.atoms[:4].positions, refpos)
assert_array_almost_equal(universe.atoms[4].position,
np.array([110.0, 50.0, 0.0]))
np.array([110.0, 50.0, 0.0]), decimal=3)
Contributor:

This looks like a drastic decrease in precision, doesn't it? Could it be that make_whole needs to work on doubles?
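For context on the achievable precision (a standalone numpy check, not project code): positions here are float32, and the spacing between representable float32 values near 110 is about 8e-6, so agreement to roughly five decimal places is about the best one can expect at that magnitude.

```python
import numpy as np

# Distance from 110.0 to the next representable float32 value:
ulp = np.spacing(np.float32(110.0))
print(ulp)  # ~7.6e-06, i.e. about 5 reliable decimal places here
```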

lib.mdamath.make_whole now supports triclinic boxes

renamed from new to avoid cpp name conflict in calc_distances.h

added fullerene test case for make_whole
@richardjgowers (Member Author):

@jbarnoud I bumped precision up to decimal=5 and it still passes (hopefully)

@jbarnoud jbarnoud merged commit 01e2511 into develop Jul 6, 2018
@jbarnoud jbarnoud deleted the cpp_is_contiguous branch July 6, 2018 17:12
@jbarnoud jbarnoud added this to the 0.19.0 milestone Jul 6, 2018
@jbarnoud jbarnoud self-assigned this Jul 6, 2018