Use multiprocessing to speed up offline compression #353

Closed

Contributor

pindia commented Jan 5, 2013

For large projects, offline compression starts to take a while because there are many templates, each with many files, to process. Since each template is compiled independently, we can use multiprocessing to compile them in parallel.

This patch adds a --processes option to the compress management command that instructs it to use multiple processes.
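To sketch the idea (this is an illustrative outline only, not the actual patch; `compile_template` is a hypothetical stand-in for the per-template work the command does):

```python
# Minimal sketch of the approach, not the actual patch: compile each
# template in a worker process instead of sequentially.
import multiprocessing

def compile_template(template_path):
    # Hypothetical stand-in for the per-template work the compress
    # command performs (parsing, running filters, writing output).
    ...

def compress_offline(template_paths, processes=1):
    if processes > 1:
        pool = multiprocessing.Pool(processes=processes)
        try:
            return pool.map(compile_template, template_paths)
        finally:
            pool.close()
            pool.join()
    return [compile_template(path) for path in template_paths]
```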

I'm submitting this request before writing tests and documentation for the new behavior to see if this is something that you'd be interested in merging. If so, I'm happy to do both.

Owner

diox commented Jan 5, 2013

Hi,

I like the idea of having offline compression run faster :) Have you run any benchmarks? My gut instinct tells me that the biggest performance bottleneck when running the compress command is disk I/O (walking through the template dir and reading template files), so I'm curious about the gains you see with your patch; if there are solid gains, I'd be interested in merging this. On the machine I'm using to write this, which has an SSD, compress is so fast I can't test reliably, even with hundreds of templates.

Note: compressor still supports Python 2.5, which doesn't have multiprocessing. Maybe you could deactivate multiprocessing support entirely if the multiprocessing import fails (and we could add multiprocessing as an optional dependency for people using 2.5).
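Something along these lines would do it (a sketch, not real compressor code):

```python
# Sketch: fall back gracefully when multiprocessing is not importable
# (e.g. stock Python 2.5 without the backported module installed).
try:
    import multiprocessing
except ImportError:
    multiprocessing = None

def parallel_compression_available():
    return multiprocessing is not None
```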

Contributor

pindia commented Jan 5, 2013

I use a lot of filters that are implemented in pure Python and am caching external filters (#346), so a lot of the time is spent in Python. On my 4-core machine with an SSD, the time taken for my project is reduced from 1m16s to 24s.

Owner

diox commented Jan 5, 2013

Ah, right, thanks for the benchmark, it makes sense.

diox commented on f61f99b Jan 5, 2013

Looks OK (maybe warn the user if they try to use multiprocessing but don't have multiprocessing support?), but you forgot to remove the `import multiprocessing` at the top of the file, so the Travis build is still failing :-)
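For the warning, something like this would be enough (a sketch only; the option name matches the patch, everything else is illustrative):

```python
# Sketch: if more than one process is requested but multiprocessing
# could not be imported, warn and continue with a single process.
import warnings

try:
    import multiprocessing
except ImportError:
    multiprocessing = None

def resolve_process_count(requested):
    if requested > 1 and multiprocessing is None:
        warnings.warn("multiprocessing is not available; "
                      "compressing with a single process instead")
        return 1
    return requested
```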

Owner

diox commented Jan 5, 2013

Minus the comment I made on the commit, it looks fine to me. Can you add tests and documentation? Thanks!

About Python 2.5 support and tests: cc'ing @jezdez, who might disagree, but here is what I think we should do:

Writing tests for both cases (with and without multiprocessing) on 2.5 could be tricky. I think the simplest solution is to run the 2.5 tests without the multiprocessing module, and to note in the documentation that the option might work on Python 2.5 with the backported module installed, but isn't officially supported because it isn't tested.

What do you think?

Contributor

pindia commented Jan 6, 2013

What about this approach: add another test method to OfflineTestCaseMixin so the tests always run both with and without multiprocessing, skipping the multiprocessing test if the module is not available?

I'm still thinking about how to verify that multiprocessing was actually used in the appropriate tests.
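Roughly this shape (a sketch of the proposed test layout, assuming `unittest.skipIf` is available, i.e. Python 2.7+ or unittest2; the method names are illustrative):

```python
# Sketch: run each offline test both without and with multiprocessing,
# skipping the multiprocessing variant when the module is unavailable.
import unittest

try:
    import multiprocessing
except ImportError:
    multiprocessing = None

class OfflineTestCaseMixin(object):
    def _compress(self, processes=1):
        # Stand-in for invoking the compress management command and
        # asserting on its output.
        ...

    def test_offline(self):
        self._compress(processes=1)

    @unittest.skipIf(multiprocessing is None,
                     "multiprocessing is not available")
    def test_offline_multiprocessing(self):
        self._compress(processes=2)
```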

Owner

diox commented Jan 6, 2013

Yup, that's a good idea, but (and I wasn't clear on that, my fault) my main concern was the Travis builds. We could add another target, but I am not sure it's worth the hassle.

+1. Can't wait for this :)

Hey, what's the status of this? Any chance it's going to land soon?

Owner

diox commented May 25, 2014

We don't care about Python 2.5 anymore, so I'd be happy to merge this pull request if it's rebased against the current develop branch.

Contributor

karyon commented Apr 21, 2016

Superseded by #737, which looks nicer and is less out of date :)

karyon closed this Apr 21, 2016
