Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the TPR parser a bit faster #2804

Merged
merged 3 commits into from Jul 2, 2020
Merged

Conversation

jbarnoud
Copy link
Contributor

@jbarnoud jbarnoud commented Jun 29, 2020

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

Changes made in this Pull Request:

PR Checklist

- [ ] Tests?
- [ ] Docs?

  • CHANGELOG updated?
  • Issue raised/referenced?

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.
Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbarnoud looks solid, thanks!

@codecov
Copy link

codecov bot commented Jun 29, 2020

Codecov Report

Merging #2804 into develop will decrease coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2804      +/-   ##
===========================================
- Coverage    92.22%   92.08%   -0.14%     
===========================================
  Files          184      183       -1     
  Lines        24141    23670     -471     
  Branches      3123     3083      -40     
===========================================
- Hits         22263    21797     -466     
+ Misses        1813     1808       -5     
  Partials        65       65              
Impacted Files Coverage Δ
package/MDAnalysis/topology/tpr/obj.py 96.72% <100.00%> (-0.06%) ⬇️
package/MDAnalysis/auxiliary/base.py 90.75% <0.00%> (-0.57%) ⬇️
package/MDAnalysis/coordinates/chain.py 91.94% <0.00%> (-0.37%) ⬇️
package/MDAnalysis/coordinates/base.py 93.88% <0.00%> (-0.33%) ⬇️
package/MDAnalysis/coordinates/TRZ.py 88.16% <0.00%> (-0.27%) ⬇️
package/MDAnalysis/coordinates/GSD.py 88.63% <0.00%> (-0.26%) ⬇️
package/MDAnalysis/coordinates/chemfiles.py 88.12% <0.00%> (-0.22%) ⬇️
package/MDAnalysis/coordinates/INPCRD.py 93.33% <0.00%> (-0.22%) ⬇️
package/MDAnalysis/coordinates/GMS.py 92.30% <0.00%> (-0.16%) ⬇️
package/MDAnalysis/coordinates/TXYZ.py 93.18% <0.00%> (-0.16%) ⬇️
... and 38 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8314f7d...2fc42b3. Read the comment docs.

@orbeckst
Copy link
Member

Could be backported to 1.0.1 via PR #2798.

@richardjgowers richardjgowers merged commit 61e236d into develop Jul 2, 2020
@richardjgowers richardjgowers deleted the slightly-faster-tpr branch July 2, 2020 17:16
orbeckst pushed a commit that referenced this pull request Jul 3, 2020
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for #2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
orbeckst pushed a commit that referenced this pull request Jul 12, 2020
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for #2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Make the TPR parser a bit faster

A function in the TPR parser calls list.pop thousands of times, which is
slow. This commit avoids that exansive call.

Taking the TPR from
https://github.com/bioexcel/covid_modelling_simulation_data/tree/master/spike_protein/full_spike/trimer
the parsing time on my computer goes from 25.6s to 9.39s. On a more
pathological TPR file, it goes from 3 minutes to about 6s.

* Update changelog for MDAnalysis#2804

Co-authored-by: Richard Gowers <richardjgowers@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants