
Make tests faster #778

Merged: 8 commits merged into matplotlib:master on Aug 5, 2012

Conversation

@mdboom (Member) commented Apr 17, 2012

This is just a placeholder for some work that would be nice to get done.

  1. The baseline images are converted from PDF and SVG to PNG on every run. These could be cached between runs to save a good portion of the runtime.

  2. Look into running tests in parallel. Nose supports this, but it wasn't working well last time I tried.

  3. Maybe we could use lower resolution PNGs and convert the PDFs and SVGs at lower resolution to make things faster. This would also hopefully gloss over some of the text-related false positives we see on a regular basis.

Any other ideas? Let's brainstorm!

@ivanov (Member) commented Mar 20, 2012

This sounds great. A year ago we made the tests run faster, but since then I think the fine-grained nature of the text/mathtext tests has made them pretty slow again (most of the time is spent in test_mathtext on my machine).

There's also the issue (related to item 1) of calling Inkscape multiple times; I think we've looked into this before, leaving a process running and batching those calls together.

It would also be nice to limit the testing to one of PDF, SVG, or PNG. I do this by hand to speed things up.

@astrofrog (Contributor) commented

If (and this is a big if) we can make freetype behave predictably (as we've started to do in #779, though more work is needed), wouldn't we expect the SVG files to be identical? In that case we could just compare the contents of the files rather than converting them to PNG (and the same for PDF).

@mdboom (Member, Author) commented Mar 22, 2012

The freetype issue is totally orthogonal to SVG and PDF, since the freetype renderer is not used by the vector backends. The SVG backend only uses freetype's TrueType file parser to get the glyph curves, which come directly from the font file and shouldn't be affected by the freetype version. The PDF backend doesn't even use that -- it uses a library called ttconv.

The conversion of SVG and PDF to PNG will be necessary for the foreseeable future.

Since both PDF and SVG use dictionaries and hash-based ids to some extent, the ordering of the elements changes between runs. The backends are constantly tweaked to fix bugs, improve efficiency, etc., which can change the structure of the file in radical ways while (if all goes well) not affecting the final result. The only way to compare final results objectively is to render them to bitmaps and compare in that space.

@mdboom (Member, Author) commented Apr 17, 2012

Here's the first chunk of changes to make the tests faster. As these things tend to be finicky across systems, I'd appreciate as much testing as possible.

  1. The conversions from PDF and SVG to PNG are cached. By default, the cache lives in ~/.matplotlib/test-cache, but the location can be configured (as usual) with the MPLCONFIGDIR environment variable. The files are named after the MD5 hash of the source file (see the first sketch after this list), so caching works and remains beneficial even when flipping between different git branches. For this reason, only the baseline images are cached, since they have a fixed MD5 hash. Test result images cannot be cached, because the ordering of their elements may change between identical runs at the whim of dictionary ordering, which would make the size of the cache explode. Still, I get about a 33% overall speedup with this technique.

  2. For a smaller gain, the image histogram used for comparison is computed with bincount when Numpy >= 1.6 is available (the first version with a minlength argument to bincount); see the second sketch below. The histogram was the largest time sink after ghostscript and inkscape when running the tests.

  3. If available, the test suite uses the Python bindings to the Ghostscript API (python-ghostscript; pip install ghostscript to get them). This avoids firing off a separate process and reinitializing Ghostscript for every file. The speedup is pretty modest, unfortunately, but it may still be worth leaving in. A sketch follows the list.

  4. Lastly, and unrelated to speed, this changes the file-naming scheme: rather than expected-foo.png, we now have foo-expected.png. When the files are displayed in alphabetical order in a thumbnail viewer (such as gthumb), the related files are grouped together, which makes scanning for problems more efficient.
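A minimal sketch of the cache naming in item 1 (the function name is hypothetical, not the actual helper from this PR):

```python
import hashlib
import os

def cached_conversion_path(baseline_path, cache_dir, newext='png'):
    # Name the cached PNG after the MD5 hash of the source file, so the
    # same baseline PDF/SVG maps to the same cache entry regardless of
    # which git branch is checked out.
    with open(baseline_path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return os.path.join(cache_dir, digest + '.' + newext)
```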
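A second sketch, for the bincount swap in item 2: on small integer data the two calls agree, but bincount skips histogram's generic binning machinery.

```python
import numpy as np

# Synthetic 8-bit "image" data, for illustration only.
pixels = np.random.randint(0, 256, size=(480, 640)).ravel()

counts_hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
counts_fast = np.bincount(pixels, minlength=256)  # NumPy >= 1.6

assert (counts_hist == counts_fast).all()
```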
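And for item 3, an in-process Ghostscript call would look roughly like this, assuming the python-ghostscript Ghostscript(*args) entry point (some versions of the bindings expect bytes rather than str arguments, and the file names here are made up):

```python
import ghostscript  # pip install ghostscript

# The argument list mirrors a normal gs command line, including the
# dummy argv[0], but runs inside the current process.
args = ['gs', '-q', '-dNOPAUSE', '-dBATCH',
        '-sDEVICE=png16m', '-r100',
        '-sOutputFile=baseline_pdf.png', 'baseline.pdf']
ghostscript.Ghostscript(*args)
```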

@jkseppan (Member) commented May 5, 2012

It might be useful to keep ghostscript and inkscape running as long-lived processes. For ghostscript, you'd start it with a command line like

gs -sDEVICE=png16m -dNOPAUSE -sOutputFile=page%d.png

and send input lines like

(foo/bar/baz.pdf) run
(another/path/to/file.pdf) run

Trying this from the command line, it seems to work, but I don't know whether there are subtle problems where, e.g., one PDF file causes a font to be loaded that affects the rendering of the next file. Error recovery is another big question. I think you'd want to use something like pexpect to wait for the gs prompts, so you know when the latest file has been completely written. (A sketch follows below.)

Inkscape has a --shell option that allows you to send it input lines like "input.svg --export-png=output.png".
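A rough sketch of the persistent-Ghostscript idea, naively treating the "GS>" prompt as the completion signal (pexpect, as suggested above, would be sturdier, and error recovery is omitted; the PDF paths are the placeholder ones from the example input lines):

```python
import subprocess

# One long-lived gs process converts many PDFs, instead of one gs
# invocation per file.
gs = subprocess.Popen(
    ['gs', '-sDEVICE=png16m', '-dNOPAUSE', '-sOutputFile=page%d.png'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def wait_for_prompt(proc):
    # Read stdout until the interactive "GS>" prompt appears, signalling
    # that the previous input line has been fully processed.
    buf = b''
    while not buf.endswith(b'GS>'):
        buf += proc.stdout.read(1)

wait_for_prompt(gs)  # consume the startup banner and first prompt
for pdf in ['foo/bar/baz.pdf', 'another/path/to/file.pdf']:
    gs.stdin.write(('(%s) run\n' % pdf).encode())
    gs.stdin.flush()
    wait_for_prompt(gs)

gs.stdin.close()
gs.wait()
```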

@jkseppan (Member) commented May 5, 2012

Our tests are mainly end-to-end regression tests: you run a script and check that the output stays the same as before. That's useful for noticing regressions, but tests of smaller scope would be faster to run. We could, for example, set up a test suite that includes predefined sequences of backend function calls and checks that the output for each sequence stays the same; that would allow skipping the frontend code when you are working on a specific backend. Likewise, we could have a stub backend that only records the calls made by the frontend code and compares that sequence to a predefined "correct" sequence for a given test case.
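A toy sketch of that stub-backend idea (the class and names are hypothetical, not code from this PR): wrap a real renderer so every frontend call is logged, then compare the log against a known-good sequence.

```python
class RecordingRenderer:
    """Record every drawing call the frontend makes, so the call
    sequence can be compared to a known-good sequence without
    rendering any pixels."""

    def __init__(self, real_renderer):
        self._real = real_renderer
        self.calls = []

    def __getattr__(self, name):
        attr = getattr(self._real, name)
        if not callable(attr):
            return attr
        def recorded(*args, **kwargs):
            self.calls.append(name)  # e.g. 'draw_path', 'draw_text'
            return attr(*args, **kwargs)
        return recorded
```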

A related thought: if the freetype version differences are problematic, we could make most test cases not include any text output (disable tick labels, etc), and have a smaller number of cases that specifically test text rendering, and include a larger set of correct answers for just those tests.

@jkseppan (Member) commented May 5, 2012

Hey, it shouldn't be too hard to make a checksummer that's aware of PDF syntax, tolerates trivial changes like reordering of dictionary elements, and ignores things like the CreationDate timestamp. That might speed up the usual case quite a bit.
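As a crude illustration of the idea (a sketch only; a real PDF-syntax-aware checksummer would also canonicalize dictionary ordering and object ids, not just the timestamps):

```python
import hashlib
import re

def tolerant_pdf_hash(path):
    # Hash the PDF bytes with the CreationDate/ModDate timestamps
    # blanked out, so two otherwise-identical files compare equal.
    data = open(path, 'rb').read()
    data = re.sub(rb'/(CreationDate|ModDate)\s*\(.*?\)', b'', data)
    return hashlib.md5(data).hexdigest()
```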

@mdboom (Member, Author) commented May 7, 2012

@jkseppan: These are some good ideas.

I implemented this for Inkscape around a year ago but ultimately reverted it. Inkscape leaked memory with each render, so it needed to be restarted periodically, and I wasn't terribly comfortable inventing a heuristic for how often to do that. See 9086ef1 for the original kick at that can. It may be that current versions of Inkscape are less leaky, and it's also possible that my implementation could be improved. I can't find my old numbers, but I don't recall it making a huge improvement in runtime; then again, the memory leaks may have been causing problems on a memory-constrained machine.

I haven't attempted a similar approach with ghostscript, but it's nice to know it can be done.

I think removing text except where it is necessary is a good idea. There are cases where the ticking itself is being tested and we want to make sure the numbers and layout of the ticks are correct, but for most tests the ticking is superfluous.

@mdboom (Member, Author) commented May 7, 2012

I have an initial commit that converts all of the PDFs to PNGs using the same gs process. The speedup isn't terribly dramatic: it went from 4:30 to 4:15 wall-clock time on my machine. It would be great if this could be tested more widely -- I have no idea how robust it is to gs crashing, etc.

@mdboom (Member, Author) commented May 8, 2012

Rebased on current master

@jkseppan (Member) commented

Tests pass on a Macbook Pro running OS X 10.6.8, Python 2.7.2, GPL Ghostscript 8.71:

Ran 1090 tests in 615.499s

OK (KNOWNFAIL=2)

@jkseppan (Member) commented

I tried this out on Amazon EC2, comparing the runtime of python -c 'import matplotlib; matplotlib.test(1)' at this commit ("after") against the same commit with e062691 reverted ("before"). I used an Amazon Linux AMI with Python 2.6, installed the various prerequisites, and compiled matplotlib from git. I couldn't find a version of Inkscape for that distribution, so the numbers don't include SVG conversion.

On a small instance (single core, pretty slow), "before" took 702 seconds and "after" took 706 seconds, so it was slightly slower but probably within measurement noise.

On a high-cpu medium instance (dual core, each supposedly 2.5 times faster than the small instance), "before" took 452 seconds and "after" took 357 seconds. Just to be sure, I ran "before" again, and it took 446 seconds, so I don't think it was a fluke.

Based on this, the change looks worthwhile on dual-core machines but doesn't help on single-core ones. Perhaps the overhead of starting a new process is similar to that of the context switches on a single core.

@pelson (Member) commented May 23, 2012

> Since both PDF and SVG use dictionaries and hash-based ids to some extent, the ordering of the elements changes between runs.

Depending on how much overhead the conversion and storage of PDF & SVG to PNG adds, we could always use an OrderedDict on Python >= 2.7 (there are plenty of OrderedDict recipes for earlier versions) to give the SVG/PDF output a consistent ordering.
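As a minimal sketch of that idea (illustrative only, not code from the PR): serializing attributes from an OrderedDict pins the output bytes, where a plain dict (before Python 3.7) could emit them in any order.

```python
from collections import OrderedDict  # stdlib since Python 2.7

attrs = OrderedDict([('x', '0'), ('y', '0'),
                     ('width', '100'), ('height', '50')])
rect = '<rect %s/>' % ' '.join('%s="%s"' % kv for kv in attrs.items())
print(rect)  # <rect x="0" y="0" width="100" height="50"/>
```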

…ff" toward the end of the filename so files are more appropriately grouped when viewed in file managers and thumbnail tools (eg. gthumb)
….6. Continue to use histogram for earlier Numpy versions.
…ache the baseline images, since the test results are never the same twice given different dictionary orderings in the file.
… removes titles and ticks from a figure before comparison. Use this throughout to make more of the tests less dependent on text differences.
@pelson (Member) commented Jul 5, 2012

@mdboom: What kind of state is this in? I'm prepared to go through these changes if it's going to save us all time in the long run.

@mdboom (Member, Author) commented Aug 3, 2012

@jkseppan: The real advantage of this PR is on subsequent runs, since it caches the results of the conversions. For a real speed comparison, you should run the test suite twice per configuration (cleaning ~/.matplotlib before each pair of runs).

@pelson: I think this is good to merge. It would be nice to get some predictable ordering in SVG and PDF so that the PNG conversions of the test results (rather than just the baseline images) could also be cached, but there's no reason that can't be done as a follow-up in a separate PR.

@@ -127,7 +172,7 @@ def comparable_formats():
     on this system.'''
     return ['png'] + converter.keys()
 
-def convert(filename):
+def convert(filename, cache):

Some high-level comments about cache wouldn't go amiss here.

@pelson (Member) commented Aug 3, 2012

On the whole I agree with removing the text from some of the test results. However, I wonder if the text in the symlog2 test should remain. (There may be more cases as I go through them.)

@pelson (Member) commented Aug 3, 2012

I've got to the stage of reviewing the actual test_*.py files, which I will try to do tomorrow, but I have no problems with the rest of the changes.

@pelson (Member) commented Aug 5, 2012

@mdboom: I'm going to merge this while it is still fresh (and mergeable). For some reason GitHub isn't showing all the diffs (the test files, for instance), so I had to read the diffs on my clone. There are no show-stoppers, but I do think it is worth addressing some of my comments, either on this PR or in a separate one (if they need code).

Good stuff.

pelson added a commit that referenced this pull request on Aug 5, 2012
pelson merged commit 1456f05 into matplotlib:master on Aug 5, 2012
mdboom mentioned this pull request on Aug 6, 2012
mdboom deleted the tests-faster branch on March 3, 2015