Use atom slice #35
The Travis error is a py-3.4 error that has existed for a few weeks now. It looks like something goes wrong in the pandas install. That's not on you (it exists independently of this PR), but you're more than welcome to investigate! I haven't had time.

On the CodeClimate stuff, I'm somewhat ambivalent. Things that better fit the current CodeClimate rules (relaxed from the defaults) are better, but I'm willing to make exceptions. Maybe try making

Finally, don't worry about serializing
Alright, I benchmarked everything in
Even for this trajectory without waters, using atom_slice performs better for almost everything in the example, except for the

Seeing this I would argue that
@dwhswenson This is ready for some benchmarking. I don't expect the working of the code to change, I only need to write extra documentation/tests and update the example.
And on a 4 core desktop
This was done using the following script:

    from contact_map import ContactFrequency
    import mdtraj as md
    import timeit

    # Report the size of the system and of each selection
    traj = md.load('54.xtc', top='54.gro')
    print("length traj: {} frames".format(len(traj)))
    top = traj.topology
    c = top.select('element C')
    o = top.select('element O')
    print("Carbons: {}".format(len(c)))
    print("Oxygens: {}".format(len(o)))
    print("Combined selections: {}".format(len(c) + len(o)))
    print("All atoms: {}".format(top.n_atoms))


    class Test(object):
        def __init__(self, skip=1):
            # Keep only every `skip`-th frame
            self.traj = md.load('54.xtc', top='54.gro')[::skip]
            self.top = self.traj.topology
            self.c = self.top.select('element C')
            self.o = self.top.select('element O')

        def use_atom_slice_cs(self):
            ContactFrequency._class_use_atom_slice = True
            _ = ContactFrequency(self.traj, query=self.c, haystack=self.c)

        def use_atom_slice_os(self):
            ContactFrequency._class_use_atom_slice = True
            _ = ContactFrequency(self.traj, query=self.c, haystack=self.o)

        def use_no_atom_slice_cs(self):
            ContactFrequency._class_use_atom_slice = False
            _ = ContactFrequency(self.traj, query=self.c, haystack=self.c)

        def use_no_atom_slice_os(self):
            ContactFrequency._class_use_atom_slice = False
            _ = ContactFrequency(self.traj, query=self.c, haystack=self.o)


    # Time each variant once for an increasing number of frames
    for skip in [1000, 100, 10, 1]:
        test = Test(skip)
        frames = len(test.traj)
        print("\n {} frames".format(frames))
        print("carbon query and haystack")
        print("with atom_slice: {}".format(timeit.Timer(test.use_atom_slice_cs).timeit(number=1)))
        print("no atom_slice: {}".format(timeit.Timer(test.use_no_atom_slice_cs).timeit(number=1)))
        print("carbon query and oxygen haystack")
        print("with atom_slice: {}".format(timeit.Timer(test.use_atom_slice_os).timeit(number=1)))
        print("no atom_slice: {}".format(timeit.Timer(test.use_no_atom_slice_os).timeit(number=1)))

I do not have a straightforward way of providing you with the .xtc and .gro files, as they are 100 MB (when tarred) and the figshare login seems to be down. Could you try it on one of your systems to see how it holds up?
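
Single-run timings (`number=1`) can be noisy on a shared desktop. One possible variation, which is not part of the script above, is to repeat each measurement and keep the fastest run. The sketch below uses a stand-in `workload` function (hypothetical, standing in for a `ContactFrequency` call) so that it runs without the trajectory files:

```python
import timeit


def workload():
    # Hypothetical stand-in for a ContactFrequency call;
    # swap in the real benchmark method when the data is available.
    return sum(i * i for i in range(10000))


# Run the workload once per measurement, five measurements in total,
# and keep the fastest one to reduce the influence of background load.
timings = timeit.repeat(workload, repeat=5, number=1)
best = min(timings)
print("best of {}: {:.6f} s".format(len(timings), best))
```

Taking the minimum rather than the mean is the usual recommendation for `timeit`, since slower runs mostly reflect interference from other processes.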
Alright, I am pretty confident it behaves properly now. Every single contactmap/frequency in the example notebook (which already has all the waters cut out) is at least as fast, or a lot faster when query and haystack are both defined. This should be the worst real-world case. You can also check the performance by checking out this branch and running the example notebook. To deactivate the use of atom_slice, add a cell after the import cell with:

    ContactMap._class_use_atom_slice = False
    ContactFrequency._class_use_atom_slice = False
    ContactDifference._class_use_atom_slice = False
@dwhswenson this is ready for a review and a merge.
Coverage falls due to an error on the last line. EDIT: And it bumped up again to 100%, strange...
Looks mostly good. One other change, aside from the comments above: In the main example, you didn't run the stuff under the "Use a different cutoff". I know they take a while, but could you re-run that before this goes into master?
Overall, it seems to me that what we really want here is to create two different IndexManager classes: one that works without atom slicing and one that works with atom slicing. This would move all the code you now have for that out of the ContactObject class, essentially abstracting the stuff we have now. This would also get rid of a lot of the if/else blocks you have.
But we'll merge it in like this now, and try that as part of a later update.
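
A rough sketch of what that split could look like is given below. This is only an illustration under assumptions: the class names `IndexManager` and `AtomSliceIndexManager`, the `to_real` method, and the `make_index_manager` factory are hypothetical and do not exist in contact_map as of this PR.

```python
class IndexManager(object):
    """Hypothetical manager without atom slicing: indices are used as given."""

    def __init__(self, query, haystack):
        self.query = list(query)
        self.haystack = list(haystack)

    def to_real(self, idx):
        # Without slicing, internal and real atom indices coincide.
        return idx


class AtomSliceIndexManager(IndexManager):
    """Hypothetical manager with atom slicing: atoms are renumbered from 0."""

    def __init__(self, query, haystack):
        # Only atoms in query or haystack survive the slice.
        self.used_atoms = sorted(set(query) | set(haystack))
        real_to_sliced = {real: i for i, real in enumerate(self.used_atoms)}
        self.query = [real_to_sliced[i] for i in query]
        self.haystack = [real_to_sliced[i] for i in haystack]

    def to_real(self, idx):
        # Map an index in the sliced trajectory back to the original atom.
        return self.used_atoms[idx]


def make_index_manager(query, haystack, use_atom_slice):
    # The single remaining branch: pick the manager once, up front.
    cls = AtomSliceIndexManager if use_atom_slice else IndexManager
    return cls(query, haystack)


plain = make_index_manager([4, 10], [7, 25], use_atom_slice=False)
sliced = make_index_manager([4, 10], [7, 25], use_atom_slice=True)
```

With this layout the calling code would ask the manager for `query`, `haystack`, and `to_real` without checking whether slicing is active, which is where the if/else blocks would go away.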
@sroet : Is there anything left to do on this? I had been misreading GitHub's status on my review comments (not noticing that they were outdated) and didn't realize you'd already dealt with pretty much everything. The only (tiny) thing left is that my style for aligning a multiline statement, where the linebreak comes immediately after the opening punctuation, is this:

    dict_comprehension = {
        key: function_that_creates_value(key)
        for key in some_long_name_here
    }

(similar to "one true brace" in C). You're still one indent level too far in. (I can understand why my comment wasn't clear.) Obviously not a big deal, but either you update it or I will. You might as well get credit for the commit (and keep the git-blame for the line!)

Go ahead and update that in a (really fast) commit, and let me know if there's anything else you're cleaning up in this before I merge. Also, note that your GitHub email wasn't set (probably in gitconfig on whatever computer you were working on). Please make sure that's set to take credit for future contributions!
@dwhswenson Addressed the last comment, and fixed the email.
This uses atom_slice to cut down a trajectory before computing the neighborlist. This should speed up contactmap/frequency calculations for real systems.
The code is not yet completely implemented; so far it is only tested at one point and has not been benchmarked.
TODO:
EDIT: found the first bug already (only converting the lists, not the numpy arrays)
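
The index bookkeeping that atom slicing requires can be sketched as follows. This is a minimal standalone illustration, not the code in this PR: the helper name `build_slice_maps` is hypothetical. The point is that once the trajectory is cut down to the atoms in query and haystack, the remaining atoms are renumbered from 0, so indices have to be translated going in and contacts translated back coming out.

```python
def build_slice_maps(query, haystack):
    """Return (used_atoms, real_to_sliced, sliced_to_real).

    Hypothetical helper for illustration only.
    """
    # Sorted union of all atoms taking part in the calculation; a list
    # like this is what one would pass to mdtraj's Trajectory.atom_slice.
    used_atoms = sorted(set(query) | set(haystack))
    real_to_sliced = {real: i for i, real in enumerate(used_atoms)}
    sliced_to_real = {i: real for real, i in real_to_sliced.items()}
    return used_atoms, real_to_sliced, sliced_to_real


query = [10, 4, 7]
haystack = [7, 25, 31]
used_atoms, real_to_sliced, sliced_to_real = build_slice_maps(query, haystack)

# Indices to use inside the sliced trajectory
sliced_query = [real_to_sliced[i] for i in query]
sliced_haystack = [real_to_sliced[i] for i in haystack]

# A contact found between sliced atoms 1 and 4 is mapped back
# to the original atom numbering before it is reported.
sliced_contact = (1, 4)
real_contact = tuple(sliced_to_real[i] for i in sliced_contact)
```

The speed-up comes from the neighbor search only having to consider the atoms in `used_atoms` instead of every atom in the system.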