Improve the performance of ParmEd converter. (Fix #3028) #3029
The original code used `list.index()` to look up atom indices. Each call scans the list linearly, so looking up every atom costs O(N^2) overall. This commit converts the list into a dictionary that maps atom objects to their indices, which makes each lookup O(1) on average and improves the overall performance.
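A minimal sketch of the two lookup strategies, assuming each atom object appears only once in the list and is hashable by identity (Python's default for plain objects). `Atom`, `indices_via_list` and `indices_via_dict` are hypothetical names for illustration, not ParmEd or MDAnalysis API:

```python
class Atom:
    """Hypothetical stand-in for a ParmEd atom.

    Plain Python objects hash and compare by identity, so they can be
    used directly as dictionary keys.
    """

    def __init__(self, name):
        self.name = name


def indices_via_list(all_atoms, selected):
    # Original approach: each list.index() call scans all_atoms linearly,
    # so M lookups over N atoms cost O(M * N), i.e. O(N^2) when M ~ N.
    return [all_atoms.index(atom) for atom in selected]


def indices_via_dict(all_atoms, selected):
    # Patched approach: build the atom -> index map once in O(N),
    # then each lookup is an average O(1) hash-table access.
    atom_to_index = {atom: i for i, atom in enumerate(all_atoms)}
    return [atom_to_index[atom] for atom in selected]


atoms = [Atom(f"A{i}") for i in range(5)]
selected = [atoms[3], atoms[0], atoms[4]]
print(indices_via_list(atoms, selected))  # [3, 0, 4]
print(indices_via_dict(atoms, selected))  # [3, 0, 4]
```

Both functions return the same indices as long as no atom object appears twice; with duplicates, `list.index()` returns the first position while the dictionary keeps the last one.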
Here is my simple benchmark script (saved as load.py), which converts a system with 10234 atoms, 3407 residues and 10233 bonds:
Before this commit, by running
After this commit, I got:
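This is not the load.py script from the PR. As a rough, hypothetical illustration of how the two lookup strategies scale, the sketch below times them on plain Python objects standing in for atoms; `time_lookups` is a made-up helper, and actual timings depend on the machine:

```python
import timeit


def time_lookups(n_atoms, repeats=3):
    # Plain objects stand in for atoms; they are hashable by identity.
    atoms = [object() for _ in range(n_atoms)]

    def via_list():
        # O(N) scan per lookup -> O(N^2) for all atoms
        return [atoms.index(a) for a in atoms]

    def via_dict():
        # O(N) to build the map, then O(1) average per lookup
        lookup = {a: i for i, a in enumerate(atoms)}
        return [lookup[a] for a in atoms]

    # Both strategies must agree before their timings are compared.
    assert via_list() == via_dict()
    t_list = min(timeit.repeat(via_list, number=1, repeat=repeats))
    t_dict = min(timeit.repeat(via_dict, number=1, repeat=repeats))
    return t_list, t_dict


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        t_list, t_dict = time_lookups(n)
        print(f"{n:>6} atoms: list.index {t_list:.4f} s, dict {t_dict:.6f} s")
```

Doubling the atom count should roughly quadruple the `list.index()` time but only roughly double the dictionary time.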
I am new to contributing code to MDAnalysis. Do I need to update the CHANGELOG and tests accordingly?
Codecov Report
@@ Coverage Diff @@
## develop #3029 +/- ##
========================================
Coverage 93.09% 93.09%
========================================
Files 186 186
Lines 24665 24666 +1
Branches 3195 3196 +1
========================================
+ Hits 22961 22962 +1
Misses 1656 1656
Partials 48 48
Thanks for working on this @HanatoK !
Yes please. I think the current tests cover this change, but please do add an extra test if you feel it may be necessary. For the most part it looks good to me, but I've pinged @lilyminium, who definitely has a better handle on how the ParmEd converter works. A related thought here (mostly for @lilyminium but also others, given it's a general converter thing): we removed timesteps as arguments to the writers in #2754, so would it also be worth considering this here? I.e. was the option to include ts as an argument purely "because writers did it"?
Please also add yourself to
Yeah, we should remove it, though it can be done in a different PR :-)
Looks great, a neat patch for a great speed-up. Thank you!
Should we backport this simple fix to 1.x?
Just the one very small change on my end. Thanks!
package/CHANGELOG
@@ -107,6 +108,7 @@ Enhancements
'protein' selection (#2751 PR #2755)
* Added an RDKit converter that works for any input with all hydrogens
explicit in the topology (Issue #2468, PR #2775)
* Improved performance of the ParmEd converter (Issue #3028, PR #3029)
Entries are ordered newest first :)
edit: I can't read and didn't see it was already in Enhancements, my apologies. If you can just put it at the top of the list that'd be great!
Thanks :) I'll merge once CI returns green.
…DAnalysis#3029) Fixes MDAnalysis#3028 * Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
Fixes #3028