ENH: added test on number of large differences between images #2730
Conversation
Added a test in compare_images that counts the number of pixels that differ by more than some threshold. If the number of pixels exceeding this threshold is greater than some maximum, the test fails. This is to address the observation (see #2707) that some of the image tests lost half of their axes labels but were still passing the RMS test.
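The proposed check can be sketched roughly as follows (a minimal illustration, not the actual patch; the function name, the 8-bit RGB array inputs, and the default `excursion_threshold` value are all assumptions):

```python
import numpy as np

def count_large_differences(expected, actual, excursion_threshold=20):
    """Count pixels whose difference exceeds excursion_threshold.

    `expected` and `actual` are uint8 RGB arrays of the same shape.
    (Hypothetical helper, not matplotlib's actual implementation.)
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(expected.astype(np.int16) - actual.astype(np.int16))
    # A pixel counts as a large difference if any channel exceeds the threshold.
    return int(np.count_nonzero(diff.max(axis=-1) > excursion_threshold))

expected = np.zeros((4, 4, 3), dtype=np.uint8)
actual = expected.copy()
actual[0, 0] = 255   # one grossly different pixel (e.g. a missing label)
actual[1, 1] = 10    # one slightly different pixel (e.g. antialiasing noise)
print(count_large_differences(expected, actual))  # 1
```

The test would then fail when the returned count exceeds some maximum, catching gross errors such as missing axes labels even when the overall RMS stays small.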
Should we also set the excursion_threshold on some of the tests that were problematically not failing, to let Travis give this code a once-over?
At the time of @mgiuca-google/@mgiuca's addition of the RMS calculation I made sure we had some good testing of the functionality; I'd encourage the same here. Some unit tests would be valuable, I think. @tacaswell, in terms of implementation, I think I'd like to see two distinct functions: one for RMS and another for the pixelwise thresholded difference count. I'm more than happy for the two functions to be called by `compare_images`. I'm trying to think of a use case for the actual threshold value (different shades of grey, maybe); perhaps a simple non-zero difference count is just as valuable? On the whole, great work! 😄
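The two-function split suggested here might look something like the following sketch (the names `calculate_rms` and `count_differences`, and their signatures, are hypothetical):

```python
import numpy as np

def calculate_rms(expected, actual):
    """Root-mean-square difference over all pixels and channels."""
    diff = expected.astype(np.float64) - actual.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def count_differences(expected, actual, threshold=0):
    """Count pixels differing by more than `threshold` in any channel.

    With the default threshold of 0 this is the plain non-zero
    difference count suggested above.
    """
    diff = np.abs(expected.astype(np.int16) - actual.astype(np.int16))
    return int(np.count_nonzero(diff.max(axis=-1) > threshold))

a = np.zeros((2, 2, 3), dtype=np.uint8)
b = a.copy()
b[0, 0] = 3          # every channel of one pixel differs by 3
print(calculate_rms(a, b))      # sqrt(27 / 12) = 1.5
print(count_differences(a, b))  # 1
```

Keeping the two checks separate lets each test choose the comparison that matters to it, and both can be unit-tested in isolation against small synthetic arrays like the ones above.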
Discussed this with @mdboom. Going to hold off on this work until we can find a way to standardize the image tests on a fixed version of freetype.
To summarize the discussion: if we could have an optional build mode that downloads and uses a specific version of freetype, then we could regenerate all of the images with that, and then Travis-CI and anyone who cares about text matching would just use that special build and we could move the comparison threshold down to 0. But that's just a completely untested theory...
My aspiration is to implement unit tests on the resulting graphics contexts, and then to implement backend-specific unit tests for context -> renderer. I think we could then add some fuzzy logic for text locations etc. if need be. Do you know of a reason why that wouldn't be feasible?
I'm not sure I follow what you mean by testing the resulting graphics contexts. You're saying capture the stream of backend calls and test against some known values? That might make sense for some things, but not in the general case. Many changes to matplotlib alter the sequencing or content of the backend graphics context while having no net effect on the resulting image. That's why the tests are designed to be as end-to-end as possible; for example, we don't compare PDF output directly, but rather the result of rendering that PDF output. That said, testing in smaller pieces and at lower levels is of course extremely valuable, and we don't have enough of it. I just see that as an adjunct to what we already are doing, not a replacement for it.
Added a test in compare_images to count the number of pixels which are different by some threshold. If the number of pixels which have a difference which exceeds this threshold is greater than some maximum, the test fails. This is to address the observation (see #2707) that some of the image tests lost half of their axes labels, but were still passing the RMS test.