Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failing tests on maintenance branch #779

Merged
merged 5 commits into from Mar 22, 2012
Merged

Conversation

mdboom
Copy link
Member

@mdboom mdboom commented Mar 20, 2012

Fix failing tests -- many of them were broken due to changes in the snapping algorithm that made the axes off by one pixel. Others were broken due to changes in text alignment. Also fixes a bug in the SVG backend's Gouraud shading.

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

I just accidentally ran the tests on a remote box w/ no X11 connection and got a ton of failures because the test import triggered my backend import (which was GTKAgg). The traceback is below. The question is: would it do any harm to call:

use('agg') 

as the first line of matplotlib.test, eg before importing nose here: https://github.com/matplotlib/matplotlib/blob/v1.1.x/lib/matplotlib/__init__.py#L995

Michael, if you think this is a good idea maybe we should just tack it on to this PR

here is the traceback:
johnh@lettuce:mpl_test> python -c 'import matplotlib as m; m.test()'
/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/init.py:57: GtkWarning: could not open display
warnings.warn(str(e), _gtk.Warning)
/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk.py:49: GtkWarning: IA__gdk_cursor_new_for_display: assertion `GDK_IS_DISPLAY (display)' failed

ERROR: Failure: RuntimeError (could not create GdkCursor object)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/loader.py", line 379, in loadTestsFromName
module = resolve_name(addr.module)
File "/usr/lib/python2.7/site-packages/nose/util.py", line 321, in resolve_name
module = __import__('.'.join(parts_copy))
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/tests/test_dates.py", line 3, in <module>
from matplotlib.testing.decorators import image_comparison, knownfailureif, cleanup
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/testing/decorators.py", line 8, in <module>
from matplotlib import pyplot as plt
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 95, in <module>
new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/__init__.py", line 25, in pylab_setup
globals(),locals(),[backend_name])
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtkagg.py", line 10, in <module>
from matplotlib.backends.backend_gtk import gtk, FigureManagerGTK, FigureCanvasGTK,\
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk.py", line 49, in <module>
cursors.MOVE          : gdk.Cursor(gdk.FLEUR),
RuntimeError: could not create GdkCursor object

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

@jdh2358: Sure -- that works for me. Added to this PR.

@astrofrog
Copy link
Contributor

I also had issues where the tests would try and open many interactive windows, so I'd be in favor of explicitly setting the backend for the tests to 'Agg'.

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

I'm still getting 50 test failures and I haven't checked all of them but the ones I've spot checked appear to be the dreaded font differences. This can be ignored for now, but we need to put our heads together and come up with a system where we are not overriding each other's baseline images because of local configuration differences. One approach would be to get some racked hosting and give all developers access to a common build environment for testing. Another would be to standardize on a freetype version and config.

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

@jdh2358: I'd like to experiment with this font issue a little bit. Can you send me one image with lots of mismatched text? I'm going to see if I can somehow reproduce your output by playing with Freetype's various flags.

@astrofrog
Copy link
Contributor

@mdboom - I'm getting lots of test failures because PIL isn't installed, but I wasn't getting this with the master branch, because I think the PIL dependence was removed in the mean time. I'll create a new environment with PIL to test this pull request.

@astrofrog
Copy link
Contributor

@jdh2358 - regarding font differences, I like the option to explicitly state that the tests should pass with a specific version of freetype. If you know which versions of freetype cause issues, one could even mark the relevant tests as known fails (for those 'bad' freetype versions). This requires knowing the freetype version in the tests though.

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

@astrofrog: Yes, PIL is required for 1.1.x, but not for master.

The problem with relying on a particular version of freetype is that it affects virtually every test -- we'll have a basically useless
test suite if that's the case. And I don't want to force all developers to build their own freetype if I can avoid it. I'm still looking into ways that the freetype renderer may be forced to behave a certain way.

@astrofrog
Copy link
Contributor

I've tested the branch in this PR with various Python and Numpy versions, and it does fix some of the test_axes tests:

matplotlib.tests.test_axes.test_marker_edges.test
matplotlib.tests.test_axes.test_markevery.test
matplotlib.tests.test_axes.test_polar_annotations.test
matplotlib.tests.test_axes.test_polar_rmin.test
matplotlib.tests.test_axes.test_polar_theta_position.test
matplotlib.tests.test_axes.test_polar_wrap.test
matplotlib.tests.test_axes.test_polar_wrap.test
matplotlib.tests.test_axes.test_symlog.test

There are still some tests failing, but looking at the diff images, this seems to be due to fonts (the residuals for the other test_axes tests now don't have a frame).

The log for the build/install/tests is here: http://db.tt/iNBIScaA

The new test images including residuals is here: http://db.tt/tvbk02A0

So the bottom line is that this is an improvement (67 instead of 77 failures), and I think most of the remaining issues are font-based (but if you look at the residuals, you'll see some exceptions).

I agree that it would be better to figure out how to force freetype to render a certain way, rather than requiring a specific version.

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

I'm not finding anything useful to fix the freetype version difference problem. There are some other more blunt options, like converting text to rectangles when in test mode or something, but that brings its own set of problems.

I'd still lean toward merging this before the release, despite its problems, with a note that the baseline images were generated with freetype 13.1.7 and any other versions may cause problems with the tests.

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

Yes, I'm in favor of merging too rather than worrying about this long-standing issue right now. This is something we can work on ahead of the next release. @astrofrog, could you tell us which test failures you are seeing that don't appear font related?

@astrofrog
Copy link
Contributor

I agree that this should be merged without waiting for a font-related fix. If we know which freetype version should work, we can at least set up the continuous integration with that version. I'll try and do that on my side.

Tests that look like they are not just font related:

matplotlib.tests.test_axes.test_canonical.test (frame issue)
matplotlib.tests.test_text.test_font_styles.test (frame issue)
matplotlib.tests.test_axes.test_basic_annotate.test (frame issue)
matplotlib.tests.test_axes.test_shaped_data.test (frame and markers issues)
matplotlib.tests.test_axes.test_single_point.test (frame and markers issues)

The images are all in the tar file I linked to previously.

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

I can confirm that canonical, shaped_data, font_styles and single_point have changed (I am not seeing any images for test_basic_annotate at all). They did not trigger an error on my system because the diff was too small. They look like the difference is from #695 because there is a 1 pixel shift in the entire axes bounding box and tick markers. My suggestion is to add the new baseline image for these tests after you have had a chance to confirm that the new look correct to you Michael.

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

The version of freetype I'm using to generate the images is 13.1.7. The version John is using (which has different results) is 12.2.6. Perhaps anything in the 13 series is close enough... I'm not sure how to determine except by experimentation.

This is messy. In this pull request, I only updated the images that were failing for me -- which were primarily related to the frame snapping offset differences. There are some bonafide frame snapping offset differences still in there, but they don't cause the threshold to be tripped unless, as on @astrofrog's machine, a different version of freetype is being used. I'm thinking the best thing to do may be to update all the pngs to my output set (I've already manually verified that everything is close enough to correct). It would then be nice to lower the threshold to be more sensitive to changes once we can assume that freetype differences don't need to be accounted for. Better yet, to not have a threshold but to expect a perfect match would be even better...

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

Agree on pushing all of your result images as the new baseline since you have manually verified them and the diffs we are seeing are wither font related or pixel shifting changes where the new result images are the correct ones.

@astrofrog
Copy link
Contributor

I had freetype 14.1.8. I'm just setting up my Jenkins environment with 13.1.7 so I can get rid of the font errors and hopefully be able to focus on the real failures. I should be able to report back in a couple of hours.

By the way, out of interest, what are all the known failures? Are they all font-related?

@astrofrog
Copy link
Contributor

Side note: did you see that Github allows you to show the actual difference image between the old and new images in your commit? (default is just side by side, but difference is an option) Very cool!

@WeatherGod
Copy link
Member

I performed the tests on Linux with freetype 13.1.7 (before commit 152fb09), and I had two Known fails and 1 other failure. The failure was with test_axes.test_pcolormesh with the svg output.

@astrofrog
Copy link
Contributor

Is there any way that the libpng version could affect things? Which version of libpng are you using? I tried running the tests building Matplotlib with freetype 13.1.7 (instead of 14.1.8), and I'm still getting the same label-related failures. It almost looks more like an anti-aliasing difference than differences in actual font/characters. In most cases it seems the characters are the same, but the differences are in the faint anti-aliased part of the font, so the difference images show a 'fuzz' around each character.

@WeatherGod
Copy link
Member

Looks like the baseline image for pcolormesh_svg.png is missing the gourond shading example (it is blank) meanwhile, the generated gourond shading for svg does not look as smooth as it is for the png output

@WeatherGod
Copy link
Member

obviously, I can't spell.... gouraud shading

@astrofrog
Copy link
Contributor

By the way, is it normal there are so many known failures on MacOS X?

FAILED (KNOWNFAIL=526, failures=67)

(this is with freetype 13.1.7)

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 20, 2012

@astrofrog , you have so many known fails because you don't have the requirements to test one or more of the PDF, PS or SVG backends. the test infrastructure will check for a dependency for a given image type, and if it is not there, register it as a knownfail.

http://matplotlib.sourceforge.net/devel/coding_guide.html#running-the-tests

@astrofrog
Copy link
Contributor

@jdh2358 - ah, that makes sense, thanks! I'm running these in a clean environment, so I'll try and add the requirements for the other tests.

Regarding the font issue, I'm fairly convinced it's an anti-aliasing issue (with the fonts) - is there any way that freetype on Mac might include a different anti-aliasing algorithm than on Linux?

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

@WeatherGod: I neglected to include the fixed pcolormesh.svg baseline image. It's normal that the gouraud shading in SVG is not as smooth as the other backends -- it has to be "faked" since it isn't native to the format.

As for differences in freetype, it may be that it is configured differently. This is the ftconfig.h for my machine (it is the prebuilt package for Fedora 16):

http://db.tt/I5hfjXke

Seeing how that compares with your ftconfig.h may offer some clues.

See also this:

http://www.freetype.org/freetype2/docs/ft2faq.html#builds-differences

@astrofrog
Copy link
Contributor

My ftconfig.h file is pretty long... http://db.tt/9OPuGJeH

Though according to the second link you sent - maybe we should compare ftoption.h? Mine is: http://db.tt/6Jo7a8wH

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

Ah, indeed. The correct file to compare is ftoption.h. Mine is here:

http://db.tt/pZgy6q52

@mdboom
Copy link
Member Author

mdboom commented Mar 20, 2012

Hmmm... and they're identical. Perhaps there is some sort of gamma correction or colorspace conversion going on the PNG reading/writing. (It should otherwise be lossless). As an experiment -- can you run the simple_plot.py example and save it in rgba format (this is just a raw dump of the image buffer). I've put mine up here.

http://db.tt/t8ma7qVt

If there are no differences here, but there are when saved as png we can rule out freetype and point the finger at png (though it seems like a longshot).

@astrofrog
Copy link
Contributor

Ok, so I did some further tests with freetype_test, and it seems some other characters do have changes in the monochromatic version at other versions of freetype. For example, character 'y' (which I think is causing some of the above failures) changes at 2.3.0, 2.3.9, 2.4.4, and 2.4.5. This is why we are still not getting consistent results across all versions.

@astrofrog
Copy link
Contributor

Anyway, it seems like the current pull request already provides significant improvement compared to before, so maybe we should just merge this (after marking the above tests as known fails) for this candidate release, and we can do a more exhaustive test of freetype varying the characters to see where all the transitions are, and whether we can fix any of them?

@jdh2358
Copy link
Collaborator

jdh2358 commented Mar 22, 2012

This is the last issue to close before I burn the 1.1.1 release candidate and put a tarball up for building and testing, so Michael, fire at will when you are ready to merge this sucker. If you want to wait another day and try stamping out the remaining brush fires, I have no problem delaying.

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

I can hopefully polish this a little more tonight... save the bigger ideas for another time.

@astrofrog
Copy link
Contributor

@mdboom - should we start a new issue to keep track of more findings? I've done some more research into which freetype versions are causing issues, but I don't want to continue flooding this PR with information at this stage.

@astrofrog
Copy link
Contributor

I won't go into much detail here, but basically 2.4.5 is the freetype version where the most characters changed. Other versions where things change are 2.3.0, 2.3.10, 2.4.0, 2.4.4, and 2.4.5. But the bottom line of my testing is that things are stable from 2.4.5 onwards (for monochromatic font rendering), which is the most important for now.

Once there is a new issue to keep track of future progress on this topic, I will post more detailed findings, and we can then see if any of the changes can be dealt with in the calls to the freetype library.

@astrofrog astrofrog mentioned this pull request Mar 22, 2012
@astrofrog
Copy link
Contributor

Final comment from me: on MacOS 10.7, all the tests pass (with freetype 2.4.9), apart from the failures in #788, but I think those must be Jenkins-specific.

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

@jenshnielsen: That missing 'i' is troubling, but I wonder if it's due to mono rendering rounding error or in fact not reading the character at all. Does the 'i' appear in the corresponding SVG and/or PDF in your output?

@jenshnielsen: How are the delaunay tests failing? Font-only or something else?

@astrofrog: I've marked those 4 failing tests as needing freetype version 2.4.5 through 2.4.9. That should resolve this for now and then (barring jenshnielsen's issues) should resolve this PR.

@astrofrog
Copy link
Contributor

All right, now everything seems fine!

Linux with freetype 2.3.9:

OK (KNOWNFAIL=11)

Linux with freetype 2.2.1:

FAILED (KNOWNFAIL=273, errors=1)

but the one error is just a gs error to do with pcolormesh, but this might just be due to the version of gs.

MacOS 10.7 with freetype 2.4.9:

FAILED (KNOWNFAIL=2, errors=2)

but these are just the inkscape errors in #787 (I think that I managed to fix the font_style errors from #788 - more later in that issue)

So this PR looks good to me!

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

I missed earlier on that @jenshnielsen pointed out that test_delaunay and test_legend are not part of the standard set of tests. I usually test directly with nose at the commandline (for various reasons) so I didn't notice that. How do those test sets fare font-wise? @astrofrog: Would you mind testing those with one of your ancient freetypes?

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

I'm going to mark the 4 delaunay tests @jenshnielsen as known fail for old freetypes.

@astrofrog
Copy link
Contributor

@mdboom - how do I activate these tests?

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

@astrofrog: They should be activated by commit f22547c

@mdboom
Copy link
Member Author

mdboom commented Mar 22, 2012

The problem running tests remotely should now also be fixed.

@astrofrog
Copy link
Contributor

Linux with freetype 2.2.1:

FAILED (KNOWNFAIL=280, errors=1)

(no inkscape, and 1 error due to a ghostscript error, which does seem like a genuine error, but I don't think it's related to this pull request, so will open a new issue for that if it persists)

Linux with freetype 2.3.9:

OK (KNOWNFAIL=16)

MacOS 10.7:

FAILED (KNOWNFAIL=2, errors=3)

(errors are inkscape as usual)

@astrofrog
Copy link
Contributor

Looks ready to merge!

mdboom added a commit that referenced this pull request Mar 22, 2012
Fix failing tests on maintenance branch
@mdboom mdboom merged commit 5846304 into matplotlib:v1.1.x Mar 22, 2012
@astrofrog
Copy link
Contributor

(forgot to mention - the tests do work remotely now!)

@ivanov
Copy link
Member

ivanov commented Mar 23, 2012

I just want to reiterate: EPIC EFFORT @mdboom & @astrofrog!!! job well done, I can't believe all these tests are passing now!

@astrofrog
Copy link
Contributor

For info, I've opened a new ticket (#792) to keep track of progress with the freetype issue moving forward (though no immediate action is required - just thought I'd let you know in case you want to sign up for notifications on that issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants