
BUG: Fix broken test result TestAtlanticProfiles #671

Merged 1 commit into SciTools:master on Aug 15, 2013

Conversation

@cpelley commented Aug 2, 2013

This commit increases the graphic tolerance for the tests. A visual
comparison of the failing output has been made, confirming that it is
correct.

@ajdawson (Member) commented Aug 6, 2013

This changes the default tolerance for all image tests though, doesn't it? Is that desirable? Would it be better just to override the tolerance parameter for this particular test?

When I created the Atlantic profiles example I tested it on two machines both using matplotlib v1.2.0 and it passed on each (although I created it on one of them so it is bound to pass there). Since the example tests don't run on Travis it is particularly difficult to decide if a custom tolerance is required when creating an example. Did you just run this on your own machine? I'd be interested in the version of matplotlib used because I'm not really sure why the images would be so different.
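For illustration, a per-test override could look something like the sketch below, which calls matplotlib's compare_images directly with an explicit tolerance rather than changing _DEFAULT_IMAGE_TOLERANCE globally. The file paths and the chosen tolerance value are placeholders, and Iris's own graphic-checking helper may expose this differently.

# Sketch only: compare a freshly rendered image against the stored reference
# with a tolerance chosen for this one test. Paths and the tolerance value
# are placeholders, not taken from the Iris test suite.
from matplotlib.testing.compare import compare_images

err = compare_images('expected/atlantic_profiles.png',
                     'result/atlantic_profiles.png',
                     tol=10.0)  # per-call tolerance; units depend on the matplotlib version
if err is not None:
    raise AssertionError(err)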

@esc24 (Member) commented Aug 8, 2013

I agree with you @ajdawson. @cpelley's not around until next week. I'm sure he'll respond then.

@cpelley (Author) commented Aug 12, 2013

I'm using matplotlib v1.2.0 and running these tests on my own machine.
Yes, this will change the tolerance for all images. The existing tolerance encompasses the existing unittests (prior to the one you added); that number was chosen on exactly the same basis as the one in this PR. In reality, this number will differ between the various graphical unittests we have, i.e. by the argument above, all images should have their own tolerance, not just this one.

Since this number cannot be predicted (we cannot know how hardware/software changes might affect the result), as you have found, what is important is encapsulating the set of tests with a suitable tolerance to reduce the occurrence of such test failures in future.

@ajdawson (Member):

I was just concerned because _DEFAULT_IMAGE_TOLERANCE is used for all tests that check graphics, not just the example tests; if the tolerance is too high we might start missing things that are genuine failures.

@cpelley (Author) commented Aug 12, 2013

...missing things that are genuine failures.

@ajdawson I understand your concern; however, my point is that the currently set tolerance may already be doing this. The value already in use was chosen on exactly the same basis as the one in this PR, making the new value no less valid than the old one. The problem you describe lies with how the graphics unittests are implemented, not, I think, with the change in this PR.

To restate, the tolerance is not calculated, cannot be predicted and differences are also subject to problem-specific variation.
If specifying a particular tolerance on a specific problem is truly necessary, then this should be done on ALL graphical unittests.

In answer to your concern, one possibility would be to write a set of unittests that describe the boundary between those differences which are acceptable and those which are not. However, writing such an exhaustive set of unittests may be infeasible, i.e. trying to encompass all types of differences.
Alternatively, a better approach for graphical unittests in general would be to have a number which describes the distribution of these differences, as opposed to just a single number (a rough sketch of this idea follows). In any case, I still believe these ideas are out of scope of this PR.
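As a rough illustration of the "distribution rather than a single number" idea, the sketch below summarises the per-pixel differences with percentiles alongside an RMS value. It assumes the expected and result images have already been loaded as equal-sized arrays (e.g. via matplotlib.image.imread); the function name and the reported statistics are hypothetical, not part of the Iris test suite.

# Sketch: summarise per-pixel image differences as a distribution rather
# than a single RMS value. Assumes 'expected' and 'actual' are equal-sized
# numpy arrays; the function name is hypothetical.
import numpy as np

def diff_summary(expected, actual):
    diff = np.abs(expected.astype(float) - actual.astype(float))
    return {
        'rms': float(np.sqrt(np.mean(diff ** 2))),
        'p50': float(np.percentile(diff, 50)),
        'p99': float(np.percentile(diff, 99)),
        'max': float(diff.max()),
        'frac_changed': float(np.mean(diff > 0)),
    }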

@ajdawson (Member):

To restate, the tolerance is not calculated, cannot be predicted and differences are also subject to problem-specific variation.

Yeah, I understand the issue and that it is a deeply rooted problem. I don't necessarily disagree with the change in this PR, I just wanted to make certain that it has been thought through and discussed properly.

In any case, I still believe these ideas are out of scope of this PR.

I agree.

This commit increases the graphic tolerance for this specific test.
A visual comparison of the test failure has been made, confirming
that it is correct.

@cpelley (Author) commented Aug 14, 2013

I have updated my commit to use a problem-specific tolerance for this test, as requested.
I have raised a separate issue, #692, to look into this problem in general.

Travis failure is not related to these changes, refer to: #688

Ready for review.

esc24 added a commit that referenced this pull request on Aug 15, 2013: BUG: Fix broken test result TestAtlanticProfiles
@esc24 merged commit 7834706 into SciTools:master on Aug 15, 2013

@pelson (Member) commented Aug 21, 2013

@ajdawson - just checking, did you originally generate this file with mpl 1.3?

@ajdawson (Member):

I'm pretty much certain it was with 1.2, the version I use inside my iris development virtualenv.

@pelson (Member) commented Aug 21, 2013

OK, thanks. It's just that I came across this issue separately and assumed it had been a v1.3.0 issue.

Ah, actually, could it be you're using v1.2.1? I fixed a pixel difference between v1.2.0 and v1.2.1 which could be having a similar impact.

Cheers!

@ajdawson (Member):

I'm using v1.2.0, just checked.

@ajdawson (Member):

I guess I could have messed up somehow, but the image compares fine on another machine I have that uses v1.2.0 of matplotlib...

@ajdawson (Member):

@pelson - could it be something to do with the behaviour of tight_layout? I'm not familiar with how that works.

@pelson (Member) commented Aug 22, 2013

could it be something to do with the behaviour of tight_layout? I'm not familiar with how that works.

Not sure tbh. If I get a chance, I'll have a go at reproducing the image exactly and then using git blame on mpl to figure out what feature has caused it.

@ajdawson (Member):

I've just discovered something worrying... on my system at least, the graphics test does not detect the difference between the correct version of the first plot and the same plot upside down! This is crazy because it moves the plot lines and the y-axis labels. I tested this by commenting out plt.gca().invert_yaxis(), and the test still passes. A quick check of the result images from the test shows that they are indeed different:

[Screenshots: expected test_atlantic_profiles image alongside the result image produced by the test]

... but no test failure.

I thought something in my system/virtualenv might be broken, so I tried modifying the plot in other ways (changed the color of the text in the top x-axis) and the test fails as expected. Can you think of any reason this may be happening? I fiddled with the tolerance and it made no difference... There is something mighty strange going on here!

@ajdawson (Member):

I fiddled with the tolerance and it made no difference

This is not the full story. I only reset the tolerance to the test suite default.

I tested again with a smaller tolerance and was able to get the test to fail, but, get this, only at a tolerance of 0.00043. I'm struggling to believe that these plots can't be distinguished at a tolerance higher than that... The very existence of the discussion I'm writing in suggests this cannot be correct!

@pelson (Member) commented Aug 22, 2013

There is something mighty strange going on here!

I think this is fixed by matplotlib/matplotlib#1291 - if I remember correctly, the RMS was being computed incorrectly...
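For reference, the intended metric is just the root-mean-square of the per-pixel differences, something like the sketch below (not matplotlib's actual implementation). With a correctly computed RMS, a vertically flipped plot moves many pixels and should produce a value well above any sensible tolerance.

# Sketch of a straightforward RMS image difference (not matplotlib's code).
# 'expected' and 'actual' are equal-sized numpy arrays of pixel values.
import numpy as np

def rms_difference(expected, actual):
    diff = expected.astype(float) - actual.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))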

@ajdawson (Member):

@pelson do you know which released version of matplotlib is that fix included in?

@pelson (Member) commented Aug 22, 2013

Just learnt something new about git (from matplotlib clone):

$> git tag --contains f5d86b1
v1.3.0rc1
v1.3.0rc2
v1.3.0rc3
v1.3.0rc4

Don't know how much I trust it, since v1.3.0 isn't actually listed there, but it looks like the fix may be in v1.3.0...

EDIT: I need to do a fetch before I make statements like that 😉. It is in v1.3.0.
