ENH : added test on number of large differences between images #2730

Closed
wants to merge 1 commit

Conversation

tacaswell
Member

Added a test in compare_images to count the number of pixels that
differ by more than some threshold. If the number of pixels whose
difference exceeds this threshold is greater than some maximum, the
comparison fails.

This is to address the observation (see #2707) that some of the image
tests lost half of their axes labels but were still passing the RMS
test.

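A minimal sketch of the idea being described (illustrative only; the function and parameter names here are hypothetical, not the PR's actual code), assuming both images have been loaded as equal-shape numpy arrays:

```python
import numpy as np

def count_excursions(expected, actual, pixel_threshold=10):
    """Count pixels whose absolute difference exceeds pixel_threshold."""
    diff = np.abs(expected.astype(np.int32) - actual.astype(np.int32))
    return int(np.count_nonzero(diff > pixel_threshold))

def images_close(expected, actual, pixel_threshold=10, max_excursions=0):
    """Pass only if at most max_excursions pixels differ by more than
    pixel_threshold, in addition to whatever RMS check is applied."""
    return count_excursions(expected, actual, pixel_threshold) <= max_excursions
```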
@mdboom
Member

mdboom commented Jan 15, 2014

This is actually very similar to the original image comparison metric that @ivanov (I believe) implemented, before it was replaced by @mgiuca. That's not to say this is a regression; I think the data bears out that both metrics are ultimately useful.

@mdboom
Member

mdboom commented Jan 15, 2014

Should we also set the excursion_threshold on some tests that were problematically not failing here, to let Travis give this code a once-over?

@pelson
Member

pelson commented Jan 15, 2014

When @mgiuca-google / @mgiuca added the RMS calculation, I made sure we had some good testing of the functionality. I'd encourage the same here - some unit tests would be valuable, I think.

@tacaswell - in terms of implementation, I think I'd like to see two distinct functions - one for the RMS and another for the pixelwise thresholded difference count. I'm more than happy for the two functions to be called by compare_images, but from an external point of view, having that separation would be a good thing IMHO (not to mention it makes unit testing easier).

I'm trying to think of a use case for the actual threshold value (different shades of grey, maybe) - perhaps a plain non-zero difference count is just as valuable?

On the whole - great work! 😄
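As a sketch of the kind of unit-level tests being suggested (assuming matplotlib's existing calculate_rms helper in matplotlib.testing.compare; the thresholded count is done inline with numpy here rather than through a dedicated function):

```python
import numpy as np
from matplotlib.testing.compare import calculate_rms

def test_identical_images_have_zero_rms_and_no_excursions():
    expected = np.zeros((8, 8, 3), dtype=float)
    actual = expected.copy()
    assert calculate_rms(expected, actual) == 0
    assert np.count_nonzero(np.abs(expected - actual) > 10) == 0

def test_single_large_excursion_is_counted():
    expected = np.zeros((8, 8, 3), dtype=float)
    actual = expected.copy()
    actual[0, 0, 0] = 255.0  # one badly wrong pixel
    assert np.count_nonzero(np.abs(expected - actual) > 10) == 1
```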

@tacaswell
Member Author

Discussed this with @mdboom. Going to hold off on work until we can find a way to standardize the image tests on a fixed version of freetype.

@tacaswell tacaswell closed this Jan 24, 2014
@mdboom
Member

mdboom commented Jan 24, 2014

To summarize the discussion: if we could have an optional build mode that downloads and uses a specific version of freetype, then we could regenerate all of the images with that, and then Travis-CI and anyone who cares about text matching would just use that special build and we could move the comparison threshold down to 0. But that's just a completely untested theory...
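For illustration, this is roughly what "moving the comparison threshold down to 0" looks like at the API level with matplotlib's compare_images (the file paths below are made up):

```python
from matplotlib.testing.compare import compare_images

# compare_images returns None when the images match within `tol`,
# and an error message describing the difference otherwise.
err = compare_images('expected.png', 'actual.png', tol=0)
assert err is None, err
```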

@pelson
Member

pelson commented Jan 24, 2014

> If we could have an optional build mode that downloads and uses a specific version of freetype, then we could regenerate all of the images with that, and then Travis-CI and anyone who cares about text matching would just use that special build and we could move the comparison threshold down to 0. But that's just a completely untested theory...

My aspiration is to implement unit tests on the resulting graphics contexts, and then to implement backend-specific unit tests for context -> renderer. I think we could then add some fuzzy logic for text locations etc. if need be. Do you know of a reason why that wouldn't be feasible?

@mdboom
Member

mdboom commented Jan 24, 2014

I'm not sure I follow what you mean by testing the resulting graphics contexts? You're saying capture the stream of backend calls and test against some known values? That might make sense for some things, but not in the general case. There are many changes that get made to matplotlib that change the sequencing or content of the backend graphics context but that are meant to have no net effect on the resulting image. That's why the tests are designed to be as end-to-end as possible, and, for example, we don't compare PDF output but the results of rendering that PDF output.

That said, of course testing in smaller pieces and at lower levels is extremely valuable and we don't have enough of it. I just see that as an adjunct to what we're already doing, not a replacement for it.
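A rough sketch of what "capturing the stream of backend calls" could look like - spying on the Agg renderer's draw methods while a figure is rendered. This is only an illustration of the idea, not an existing matplotlib test helper:

```python
from unittest import mock

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import RendererAgg

calls = []

def make_spy(name, original):
    # Record the call, then defer to the real renderer method.
    def spy(self, *args, **kwargs):
        calls.append(name)
        return original(self, *args, **kwargs)
    return spy

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label='line')
ax.legend()

with mock.patch.object(RendererAgg, 'draw_path',
                       make_spy('draw_path', RendererAgg.draw_path)), \
     mock.patch.object(RendererAgg, 'draw_text',
                       make_spy('draw_text', RendererAgg.draw_text)):
    fig.canvas.draw()

# `calls` now holds the ordered draw_path/draw_text calls made while
# rendering; a lower-level test could compare it against expectations.
print(calls[:10])
```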
