Performance of MDAnalysis Readers #1694

Open
kain88-de opened this Issue Oct 31, 2017 · 0 comments

kain88-de commented Oct 31, 2017

I was wondering recently what performance penalty we pay for our reader infrastructure. Since we now have the low-level libraries for DCD and XTC, this is actually easy to test. I did a short test for the DCD reader (gist with all measurements).

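import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD  # small test system

u = mda.Universe(PSF, DCD)
for ts in u.trajectory:  # full reader: fills a Timestep per frame
    pass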

This takes about 10 ms on my laptop.

from MDAnalysis.lib.formats.libdcd import DCDFile

dcd = DCDFile(DCD)  # low-level file object, no Timestep involved
for frame in dcd:
    pass

This takes only 2 ms, so the reader layer costs us roughly a factor of five and there is a lot of room for improvement.
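To make this kind of comparison repeatable, the two loops can be wrapped in functions and timed with the standard-library `timeit` module. The sketch below is only a harness: `overhead_ratio`, `ToyFrame` and the toy loops are hypothetical stand-ins, not MDAnalysis code, and taking the best of several repeats is my choice for reducing noise.

```python
import timeit
from typing import Callable


def overhead_ratio(wrapped_loop: Callable[[], None],
                   raw_loop: Callable[[], None],
                   number: int = 10, repeat: int = 5) -> float:
    """Best-of-`repeat` time of wrapped_loop divided by that of raw_loop.

    Taking the minimum over repeats filters out noise from other processes.
    """
    wrapped = min(timeit.repeat(wrapped_loop, number=number, repeat=repeat))
    raw = min(timeit.repeat(raw_loop, number=number, repeat=repeat))
    return wrapped / raw


# Toy stand-ins (hypothetical, not MDAnalysis code): a "raw" reader that
# yields tuples and a wrapper that builds a small object for every frame.
FRAMES = [(float(i), float(i), float(i)) for i in range(1000)]


class ToyFrame:
    def __init__(self, xyz):
        self.positions = list(xyz)  # copy the frame, like filling a Timestep


def raw_loop():
    for frame in FRAMES:
        pass


def wrapped_loop():
    for frame in FRAMES:
        ToyFrame(frame)


ratio = overhead_ratio(wrapped_loop, raw_loop)
print(f"wrapper overhead: {ratio:.1f}x")
```

For the measurement above, `raw_loop` would contain the `for frame in dcd` loop and `wrapped_loop` the `for ts in u.trajectory` loop, assuming each loop starts again from the first frame when it is re-run.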

Using a profiler (IPython's %prun, which profiles per function rather than per line) it's easy to see that _frame_to_ts and from_timestep account for most of the additional time. Below is the whole output from the profiler.

I think we can speed up _frame_to_ts for DCD by (1) sniffing the unit-cell (dimension) format once instead of for every frame, and (2) moving the conversion functions into libdcd as separate module-level functions.

For from_timestep this is not so easy. Here we do a lot of copying (which is good for many use cases!) and some if-else checks. Speeding up this function will likely be much harder and require more careful analysis. On the plus side, this would benefit all readers.
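The profile below lists copy.deepcopy and ndarray.copy among the top entries. One generic way to keep the safety of copying while avoiding a fresh allocation per frame is to copy into a preallocated buffer with `numpy.copyto`. This is only a sketch of that pattern with a hypothetical `ToyTimestep` class, not a description of what from_timestep currently does.

```python
import numpy as np


class ToyTimestep:
    """Hypothetical minimal stand-in for a reader's Timestep (not MDAnalysis code).

    The positions array is allocated once; each new frame is copied into
    it, so the user still gets data independent of the reader's own frame
    buffer, but without allocating a fresh array for every frame.
    """

    def __init__(self, n_atoms: int):
        self.positions = np.empty((n_atoms, 3), dtype=np.float32)
        self.frame = -1

    def fill(self, xyz: np.ndarray, frame: int) -> "ToyTimestep":
        # In-place copy into the existing buffer instead of xyz.copy(),
        # which would allocate a new array on every call.
        np.copyto(self.positions, xyz)
        self.frame = frame
        return self


ts = ToyTimestep(n_atoms=3)
buffer_id = id(ts.positions)
raw = np.arange(9, dtype=np.float32).reshape(3, 3)  # pretend raw frame
ts.fill(raw, frame=0)
raw[:] = -1.0  # the low-level reader overwrites its buffer for the next frame...
print(ts.positions[0])  # ...but the timestep kept its own copy: [0. 1. 2.]
```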

Profiler output

       11509 function calls (10914 primitive calls) in 0.016 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       99    0.002    0.000    0.002    0.000 {method 'read' of 'MDAnalysis.lib.formats.libdcd.DCDFile' objects}
       99    0.002    0.000    0.006    0.000 DCD.py:210(_frame_to_ts)
      198    0.001    0.000    0.001    0.000 base.py:575(positions)
       99    0.001    0.000    0.006    0.000 base.py:277(from_timestep)
   693/99    0.001    0.000    0.002    0.000 copy.py:132(deepcopy)
       99    0.001    0.000    0.001    0.000 {method 'copy' of 'numpy.ndarray' objects}
      198    0.001    0.000    0.001    0.000 {method 'reduce' of 'numpy.ufunc' objects}
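Outside IPython, a comparable table can be produced with the standard-library `cProfile` and `pstats` modules; sorting by `"tottime"` gives the same "Ordered by: internal time" view as the %prun output above. `profile_loop` and `toy_loop` are hypothetical helpers for illustration.

```python
import cProfile
import io
import pstats


def profile_loop(loop, n_lines: int = 7) -> str:
    """Run loop() under cProfile; return the report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    loop()
    profiler.disable()
    out = io.StringIO()
    # Sort by time spent inside each function itself, then print the top rows.
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(n_lines)
    return out.getvalue()


# Toy loop (hypothetical) standing in for `for ts in u.trajectory: pass`.
def toy_loop():
    frames = [[float(i)] * 3 for i in range(1000)]
    for frame in frames:
        sum(frame)


report = profile_loop(toy_loop)
print(report)
```

To reproduce the table above, the `for ts in u.trajectory` loop would go inside the function passed to `profile_loop`.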