Make the TPR parser a bit faster (#2804)

* Make the TPR parser a bit faster

  A function in the TPR parser calls list.pop thousands of times, which is slow. This commit avoids that expensive call.

  Taking the TPR from https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer, the parsing time on my computer goes from 25.6s to 9.39s. On a more pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for #2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
MDAnalysis · Jul 3, 2020 · 2e2672c
1 parent fe65603

Showing 2 changed files with 8 additions and 9 deletions.
diff --git a/package/CHANGELOG b/package/CHANGELOG
@@ -13,7 +13,7 @@ The rules for this file:
   * release numbers follow "Semantic Versioning" http://semver.org
 
 ------------------------------------------------------------------------------
-??/??/?? richardjgowers, IAlibay, orbeckst, tylerjereddy
+??/??/?? richardjgowers, IAlibay, orbeckst, tylerjereddy, jbarnoud
 
   * 1.0.1
 
@@ -22,6 +22,8 @@ Fixes
   * pip installation only requests Python 2.7-compatible packages (#2736)
   * Testsuite does not use any more matplotlib.use('agg') (#2191)
 
+Enhancements
+  * Improved performances when parsing TPR files (PR #2804)
 
 
 06/09/20 richardjgowers, kain88-de, lilyminium, p-j-smith, bdice, joaomcteixeira,
@@ -229,7 +231,7 @@ Deprecations
   * Writer.write_next_timestep is deprecated, use write() instead (remove in 2.0)
   * Writer.write(Timestep) is deprecated, use either a Universe or AtomGroup
 
->>>>>>> develop
+
 09/05/19 IAlibay, richardjgowers
 
   * 0.20.1

diff --git a/package/MDAnalysis/topology/tpr/obj.py b/package/MDAnalysis/topology/tpr/obj.py
@@ -129,10 +129,7 @@ def __init__(self, name, long_name, natoms):
         self.natoms = natoms
 
     def process(self, atom_ndx):
-        while atom_ndx:
-            # format for all info: (type, [atom1, atom2, ...])
-            # yield atom_ndx.pop(0), [atom_ndx.pop(0) for i in range(self.natoms)]
-
-            # but currently only [atom1, atom2, ...] is interested
-            atom_ndx.pop(0)
-            yield [atom_ndx.pop(0) for i in range(self.natoms)]
+        # The format for all record is (type, atom1, atom2, ...)
+        # but we are only interested in the atoms.
+        for cursor in range(0, len(atom_ndx), self.natoms + 1):
+            yield atom_ndx[cursor + 1: cursor + 1 + self.natoms]
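
The change to `process` can be illustrated outside the parser. Each `list.pop(0)` shifts every remaining element, so draining a long list from the front costs quadratic time overall, while stepping a cursor and slicing touches each element once. The sketch below is a standalone illustration, not the MDAnalysis code itself: the function names `process_pop` and `process_slice` and the sample data are made up, and `natoms` is passed as an argument instead of being read from `self`.

```python
def process_pop(atom_ndx, natoms):
    """Old approach: consume the list from the front (empties atom_ndx)."""
    while atom_ndx:
        atom_ndx.pop(0)  # discard the interaction type
        yield [atom_ndx.pop(0) for _ in range(natoms)]


def process_slice(atom_ndx, natoms):
    """New approach: step a cursor over the records and slice out the atoms."""
    for cursor in range(0, len(atom_ndx), natoms + 1):
        yield atom_ndx[cursor + 1: cursor + 1 + natoms]


# Hypothetical flat record list: (type, atom1, atom2) repeated three times.
records = [7, 0, 1, 7, 1, 2, 9, 2, 3]

# The pop-based version gets a copy because it empties its input.
old = list(process_pop(list(records), natoms=2))
new = list(process_slice(records, natoms=2))
assert old == new
```

Two behavioural differences follow from the rewrite, in this sketch at least: the slicing version leaves the input list intact, whereas the pop-based version empties it, and on a truncated final record the slicing version yields a shorter list where the pop-based version raises `IndexError`.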