Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the performance of ParmEd converter. (Fix #3028) #3029

Merged
merged 4 commits into from
Nov 19, 2020

Conversation

HanatoK
Copy link
Contributor

@HanatoK HanatoK commented Nov 17, 2020

The origin code used index() of list to lookup the atom indices, which
is nearly O(N^2) when iterating all atoms. This commit converts the list
to a dictionary mapping the atom objects to indices, and hence improves
the overall performance.

Fixes #3028

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

The origin code used index() of list to lookup the atom indices, which
is nearly O(N^2) when iterating all atoms. This commit converts the list
to a dictionary mapping the atom objects to indices, and hence improves
the overall performance.
@pep8speaks
Copy link

pep8speaks commented Nov 17, 2020

Hello @HanatoK! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-19 01:12:26 UTC

@HanatoK
Copy link
Contributor Author

HanatoK commented Nov 17, 2020

Here's my simple benchmark code (saved as load.py) of converting a system that has 10234 atoms, 3407 residues and 10233 bonds:

import MDAnalysis as mda
import parmed as pmd
prm = pmd.load_file('alad.parm7', 'alad.rst7')
print(len(prm.bonds))
u = mda.Universe(prm)
prot = u.select_atoms('all')
prm_u = prot.convert_to('PARMED')
prm_u.write_pdb('converted.pdb')
prm_u.write_psf('converted.psf')
print(prm_u)

Before this commit, by running time python3 ./load.py, I got:

<Structure 10234 atoms; 3407 residues; 10233 bonds; PBC (orthogonal); parametrized>
python3 ./load.py  50.02s user 0.44s system 101% cpu 49.865 total

After this commit, I got:

<Structure 10234 atoms; 3407 residues; 10233 bonds; PBC (orthogonal); parametrized>
python3 ./load.py  2.41s user 0.44s system 130% cpu 2.181 total

I am new to contribute code to MDAnalysis. Do I need to update CHANGELOG and tests accordingly?

@codecov
Copy link

codecov bot commented Nov 17, 2020

Codecov Report

Merging #3029 (72c94e0) into develop (e9d0e88) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3029   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24665    24666    +1     
  Branches      3195     3196    +1     
========================================
+ Hits         22961    22962    +1     
  Misses        1656     1656           
  Partials        48       48           
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/ParmEd.py 91.01% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9d0e88...72c94e0. Read the comment docs.

@IAlibay
Copy link
Member

IAlibay commented Nov 17, 2020

Thanks for working on this @HanatoK !

I am new to contribute code to MDAnalysis. Do I need to update CHANGELOG and tests accordingly?

Yes please. I think the current tests cover this change, but please do add an extra test if you feel it may be necessary.

For the most part it looks good to me, but I've pinged @lilyminium who definitely has a better handle of how the ParmEd converter works.

A related thought here (mostly for @lilyminium but also others given it's a general converter thing), we removed timesteps as arguments to the writers in #2754, would it also be worth considering this here? I.e. was the option to include ts as an argument purely "because writers did it"?

@IAlibay
Copy link
Member

IAlibay commented Nov 17, 2020

Please also add yourself to AUTHORS :)

@lilyminium
Copy link
Member

I.e. was the option to include ts as an argument purely "because writers did it"?

Yeah we should remove it, it can be in a different PR though :-)

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, a neat patch for a great speed-up. Thank you!

@orbeckst
Copy link
Member

Should we backport this simple fix to 1.x ?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one very small change on my end. Thanks!

@@ -107,6 +108,7 @@ Enhancements
'protein' selection (#2751 PR #2755)
* Added an RDKit converter that works for any input with all hydrogens
explicit in the topology (Issue #2468, PR #2775)
* Improved performance of the ParmEd converter (Issue #3028, PR #3029)
Copy link
Member

@IAlibay IAlibay Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entries are ordered newer first :)

edit: I can't read and didn't see it was already in enhancements my apologies, if you can just put it at the top of the list that'd be great!

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :) I'll merge once CI returns green.

@IAlibay IAlibay merged commit 4040405 into MDAnalysis:develop Nov 19, 2020
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 8, 2020
…DAnalysis#3029)

Fixes MDAnalysis#3028

* Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 8, 2020
…DAnalysis#3029)

Fixes MDAnalysis#3028

* Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 9, 2020
…DAnalysis#3029)

Fixes MDAnalysis#3028

* Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 14, 2020
…DAnalysis#3029)

Fixes MDAnalysis#3028

* Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
…DAnalysis#3029)

Fixes MDAnalysis#3028 

## Work done in this PR
* Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

convert_to('PARMED') super slow
6 participants