Improve the performance of ParmEd converter. (Fix #3028) #3029

HanatoK · 2020-11-17T14:46:28Z

The original code used list.index() to look up the atom indices, which
makes the conversion nearly O(N^2) when iterating over all atoms. This
commit replaces the list with a dictionary that maps the atom objects to
their indices, which improves the overall performance.
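
As a rough illustration of the idea (this is not the converter code itself; the `Atom` class and both function names below are hypothetical stand-ins), the two lookup strategies can be sketched as follows. It assumes each atom object appears only once in the list and is hashable.

```python
# Hypothetical stand-in for an atom object; NOT the MDAnalysis or ParmEd class.
class Atom:
    def __init__(self, name):
        self.name = name


def bond_indices_with_list_index(atoms, bonds):
    """Resolve each bonded atom with list.index(): a linear scan per lookup."""
    return [(atoms.index(a), atoms.index(b)) for a, b in bonds]


def bond_indices_with_dict(atoms, bonds):
    """Build an atom -> index mapping once, then use constant-time lookups."""
    index_of = {atom: i for i, atom in enumerate(atoms)}
    return [(index_of[a], index_of[b]) for a, b in bonds]


atoms = [Atom(f"A{i}") for i in range(1000)]
# A linear chain: atom i is bonded to atom i + 1.
bonds = [(atoms[i], atoms[i + 1]) for i in range(len(atoms) - 1)]

slow = bond_indices_with_list_index(atoms, bonds)
fast = bond_indices_with_dict(atoms, bonds)
assert slow == fast  # same result, different cost
```

Both functions return the same indices when every atom is unique. If an object appeared twice, list.index() would return its first position while the dictionary would keep its last, so the swap is only a drop-in replacement under the uniqueness assumption.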

Fixes #3028

Changes made in this Pull Request:

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?


pep8speaks · 2020-11-17T14:46:31Z

Hello @HanatoK! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-19 01:12:26 UTC

HanatoK · 2020-11-17T14:52:54Z

Here's my simple benchmark script (saved as load.py), which converts a system with 10234 atoms, 3407 residues and 10233 bonds:

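import MDAnalysis as mda
import parmed as pmd
# Load the Amber topology and coordinates with ParmEd
prm = pmd.load_file('alad.parm7', 'alad.rst7')
print(len(prm.bonds))
# Wrap the ParmEd structure in an MDAnalysis Universe
u = mda.Universe(prm)
prot = u.select_atoms('all')
# Convert back to a ParmEd structure (the slow step reported in #3028)
prm_u = prot.convert_to('PARMED')
# Write out the converted structure
prm_u.write_pdb('converted.pdb')
prm_u.write_psf('converted.psf')
print(prm_u)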

Before this commit, by running time python3 ./load.py, I got:

<Structure 10234 atoms; 3407 residues; 10233 bonds; PBC (orthogonal); parametrized>
python3 ./load.py  50.02s user 0.44s system 101% cpu 49.865 total

After this commit, I got:

<Structure 10234 atoms; 3407 residues; 10233 bonds; PBC (orthogonal); parametrized>
python3 ./load.py  2.41s user 0.44s system 130% cpu 2.181 total
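
The scaling behind these timings can also be explored without any input files. The following is a minimal synthetic sketch, not a benchmark of the converter: `time_call`, `lookup_all_with_list` and `lookup_all_with_dict` are hypothetical helper names, and the absolute numbers will depend on the machine.

```python
import time


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call of func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def lookup_all_with_list(items):
    """One linear scan per item, so quadratic work overall."""
    return [items.index(x) for x in items]


def lookup_all_with_dict(items):
    """One pass to build the mapping, then constant-time lookups."""
    position = {x: i for i, x in enumerate(items)}
    return [position[x] for x in items]


timings = {}
for n in (2000, 4000):
    items = [object() for _ in range(n)]
    timings[n] = (
        time_call(lookup_all_with_list, items),
        time_call(lookup_all_with_dict, items),
    )
    print(n, timings[n])
```

Doubling the number of items should roughly quadruple the list-based time while the dictionary-based time grows roughly linearly, although timer noise can blur this at small sizes.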

I am new to contributing code to MDAnalysis. Do I need to update CHANGELOG and tests accordingly?

codecov · 2020-11-17T16:26:44Z

Codecov Report

Merging #3029 (72c94e0) into develop (e9d0e88) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #3029   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24665    24666    +1     
  Branches      3195     3196    +1     
========================================
+ Hits         22961    22962    +1     
  Misses        1656     1656           
  Partials        48       48

Impacted Files	Coverage Δ
package/MDAnalysis/coordinates/ParmEd.py	`91.01% <100.00%> (+0.05%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

IAlibay · 2020-11-17T16:35:09Z

Thanks for working on this @HanatoK !

I am new to contributing code to MDAnalysis. Do I need to update CHANGELOG and tests accordingly?

Yes please. I think the current tests cover this change, but please do add an extra test if you feel it may be necessary.

For the most part it looks good to me, but I've pinged @lilyminium, who definitely has a better handle on how the ParmEd converter works.

A related thought here (mostly for @lilyminium, but also for others, given it's a general converter thing): we removed timesteps as arguments to the writers in #2754. Would it also be worth considering this here? I.e. was the option to include ts as an argument purely "because writers did it"?

IAlibay · 2020-11-17T17:27:36Z

Please also add yourself to AUTHORS :)

lilyminium · 2020-11-18T05:24:39Z

I.e. was the option to include ts as an argument purely "because writers did it"?

Yeah we should remove it, it can be in a different PR though :-)

lilyminium

Looks great, a neat patch for a great speed-up. Thank you!

orbeckst · 2020-11-18T18:33:00Z

Should we backport this simple fix to 1.x?

IAlibay

Just the one very small change on my end. Thanks!

IAlibay · 2020-11-18T18:41:14Z

package/CHANGELOG

@@ -107,6 +108,7 @@ Enhancements
    'protein' selection (#2751 PR #2755)
  * Added an RDKit converter that works for any input with all hydrogens
    explicit in the topology (Issue #2468, PR #2775)
+  * Improved performance of the ParmEd converter (Issue #3028, PR #3029)


Entries are ordered newer first :)

edit: I can't read and didn't see it was already in Enhancements, my apologies. If you can just put it at the top of the list, that'd be great!

IAlibay

Thanks :) I'll merge once CI returns green.

…DAnalysis#3029) Fixes MDAnalysis#3028 * Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.

…DAnalysis#3029) Fixes MDAnalysis#3028 ## Work done in this PR * Improves the performance of the ParmEd converter by using a dictionary lookup for the atomgroup to universe index mapping.

Fix a PEP 8 warning in the previous commit.

ec12676

IAlibay requested a review from lilyminium November 17, 2020 14:58

Update AUTHORS and CHANGELOG.

7d36d8b

lilyminium approved these changes Nov 18, 2020


orbeckst assigned IAlibay and lilyminium Nov 18, 2020

IAlibay requested changes Nov 18, 2020


IAlibay mentioned this pull request Nov 18, 2020

Remove support for ts as an input to the ParmedConverter #3031

Closed

Update CHANGELOG.

72c94e0

IAlibay approved these changes Nov 19, 2020


IAlibay merged commit 4040405 into MDAnalysis:develop Nov 19, 2020

IAlibay mentioned this pull request Nov 19, 2020

convert_to('PARMED') super slow #3028

Closed

fiona-naughton added enhancement Component-Converters labels Sep 26, 2023
