
BUG: Fix broken test result TestAtlanticProfiles #671

Merged 1 commit into SciTools:master on Aug 15, 2013

Conversation

@cpelley commented Aug 2, 2013

This commit increases the graphic tolerance for the tests. A visual
comparison of the failing output has been made, confirming that it is
correct.

@ajdawson (Member) commented Aug 6, 2013

This changes the default tolerance for all image tests though, doesn't it? Is that desirable? Would it be better just to override the tolerance parameter for this particular test?

When I created the Atlantic profiles example I tested it on two machines both using matplotlib v1.2.0 and it passed on each (although I created it on one of them so it is bound to pass there). Since the example tests don't run on Travis it is particularly difficult to decide if a custom tolerance is required when creating an example. Did you just run this on your own machine? I'd be interested in the version of matplotlib used because I'm not really sure why the images would be so different.
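For illustration, a per-test override could look something like the sketch below, which calls matplotlib's compare_images directly with an explicit tolerance rather than changing _DEFAULT_IMAGE_TOLERANCE globally. The file paths and the chosen tolerance value are placeholders, and Iris's own graphic-checking helper may expose this differently.

# Sketch only: compare a freshly rendered image against the stored reference
# with a tolerance chosen for this one test. Paths and the tolerance value
# are placeholders, not taken from the Iris test suite.
from matplotlib.testing.compare import compare_images

err = compare_images('expected/atlantic_profiles.png',
                     'result/atlantic_profiles.png',
                     tol=10.0)  # per-call tolerance; units depend on the matplotlib version
if err is not None:
    raise AssertionError(err)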

@esc24 (Member) commented Aug 8, 2013

I agree with you @ajdawson. @cpelley's not around until next week. I'm sure he'll respond then.

@cpelley (Author) commented Aug 12, 2013

I'm using matplotlib v1.2.0 and running these tests on my own machine.
Yes, this will change the tolerance for all images. The existing tolerance encompasses the existing unittests (prior to the one you added); that number was chosen on exactly the same basis as the one in this PR. In reality, this number will differ between the various graphical unittests we have, i.e. by the argument above, all images should have their own tolerance, not just this one.

Since this number cannot be predicted (we cannot know how hardware/software changes might affect the result), as you have found, what is important is encapsulating the set of tests with a suitable tolerance to reduce the occurrence of such test failures in future.

@ajdawson (Member):

I was just concerned because _DEFAULT_IMAGE_TOLERANCE is used for all tests that check graphics, not just the example tests; if the tolerance is too high we might start missing things that are genuine failures.

@cpelley (Author) commented Aug 12, 2013

...missing things that are genuine failures.

@ajdawson I understand your concern; however, my point is that the currently set tolerance may already be doing this. The value already in use was chosen on exactly the same basis as the one in this PR, making the new value no less valid than the old one. The problem you describe lies with how the graphics unittests are implemented, not, I think, with the change in this PR.

To restate, the tolerance is not calculated, cannot be predicted and differences are also subject to problem-specific variation.
If specifying a particular tolerance on a specific problem is truly necessary, then this should be done on ALL graphical unittests.

In answer to your concern, one possibility would be to write a set of unittests that describe the boundary between those differences which are acceptable and those which are not. However, writing such an exhaustive set of unittests may be infeasible, i.e. trying to encompass all types of differences.
Alternatively, a better approach for graphical unittests in general would be to have a number which describes the distribution of these differences, as opposed to just a single number (a rough sketch of this idea follows). In any case, I still believe these ideas are out of scope of this PR.
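As a rough illustration of the "distribution rather than a single number" idea, the sketch below summarises the per-pixel differences with percentiles alongside an RMS value. It assumes the expected and result images have already been loaded as equal-sized arrays (e.g. via matplotlib.image.imread); the function name and the reported statistics are hypothetical, not part of the Iris test suite.

# Sketch: summarise per-pixel image differences as a distribution rather
# than a single RMS value. Assumes 'expected' and 'actual' are equal-sized
# numpy arrays; the function name is hypothetical.
import numpy as np

def diff_summary(expected, actual):
    diff = np.abs(expected.astype(float) - actual.astype(float))
    return {
        'rms': float(np.sqrt(np.mean(diff ** 2))),
        'p50': float(np.percentile(diff, 50)),
        'p99': float(np.percentile(diff, 99)),
        'max': float(diff.max()),
        'frac_changed': float(np.mean(diff > 0)),
    }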

@ajdawson (Member):

To restate, the tolerance is not calculated, cannot be predicted and differences are also subject to problem-specific variation.

Yeah, I understand the issue and that it is a deeply rooted problem. I don't necessarily disagree with the change in this PR, I just wanted to make certain that it has been thought through and discussed properly.

In any case, I still believe these ideas are out of scope of this PR.

I agree.

This commit increases the graphic tolerance for this specific test.
A visual comparison of the test failure has been made, confirming
that it is correct.

@cpelley (Author) commented Aug 14, 2013

I have updated my commit to use a problem-specific tolerance for this test, as requested.
I have raised a separate issue, #692, to look into this problem in general.

Travis failure is not related to these changes, refer to: #688

Ready for review.

esc24 added a commit that referenced this pull request on Aug 15, 2013: BUG: Fix broken test result TestAtlanticProfiles
@esc24 merged commit 7834706 into SciTools:master on Aug 15, 2013

@pelson (Member) commented Aug 21, 2013

@ajdawson - just checking, did you originally generate this file with mpl 1.3?

@ajdawson (Member):

I'm pretty much certain it was with 1.2, the version I use inside my iris development virtualenv.

@pelson (Member) commented Aug 21, 2013

OK, thanks. It's just that I came across this issue separately and assumed it had been a v1.3.0 issue.

Ah, actually, could it be you're using v1.2.1? I fixed a pixel difference between v1.2.0 and v1.2.1 which could be having a similar impact.

Cheers!

@ajdawson (Member):

I'm using v1.2.0, just checked.

@ajdawson (Member):

I guess I could have messed up somehow, but the image compares fine on another machine I have that uses v1.2.0 of matplotlib...

@ajdawson (Member):

@pelson - could it be something to do with the behaviour of tight_layout? I'm not familiar with how that works.

@pelson (Member) commented Aug 22, 2013

could it be something to do with the behaviour of tight_layout? I'm not familiar with how that works.

Not sure tbh. If I get a chance, I'll have a go at reproducing the image exactly and then using git blame on mpl to figure out what feature has caused it.

@ajdawson (Member):

I've just discovered something worrying... on my system at least, the graphics test does not detect the difference between the correct version of the first plot and the same plot upside down! This is crazy because it moves the plot lines and the y-axis labels. I tested this by commenting out plt.gca().invert_yaxis(), and the test still passes. A quick check of the result images from the test shows that they are indeed different:

[Screenshots: expected test_atlantic_profiles image alongside the result image produced by the test]

... but no test failure.

I thought something in my system/virtualenv might be broken, so I tried modifying the plot in other ways (changed the color of the text in the top x-axis) and the test fails as expected. Can you think of any reason this may be happening? I fiddled with the tolerance and it made no difference... There is something mighty strange going on here!

@ajdawson (Member):

I fiddled with the tolerance and it made no difference

This is not the full story. I only reset the tolerance to the test suite default.

I tested again with a smaller tolerance and was able to get the test to fail, but, get this, only at a tolerance of 0.00043. I'm struggling to believe that these plots can't be distinguished at a tolerance higher than that... The very existence of the discussion I'm writing in suggests this cannot be correct!

@pelson (Member) commented Aug 22, 2013

There is something mighty strange going on here!

I think this is fixed by matplotlib/matplotlib#1291 - if I remember correctly, the RMS was being computed incorrectly...
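For reference, the intended metric is just the root-mean-square of the per-pixel differences, something like the sketch below (not matplotlib's actual implementation). With a correctly computed RMS, a vertically flipped plot moves many pixels and should produce a value well above any sensible tolerance.

# Sketch of a straightforward RMS image difference (not matplotlib's code).
# 'expected' and 'actual' are equal-sized numpy arrays of pixel values.
import numpy as np

def rms_difference(expected, actual):
    diff = expected.astype(float) - actual.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))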

@ajdawson (Member):

@pelson do you know which released version of matplotlib is that fix included in?

@pelson (Member) commented Aug 22, 2013

Just learnt something new about git (from matplotlib clone):

$> git tag --contains f5d86b1
v1.3.0rc1
v1.3.0rc2
v1.3.0rc3
v1.3.0rc4

Don't know how much I trust it, since v1.3.0 isn't actually listed there, but it looks like the fix may be in v1.3.0...

EDIT: I need to do a fetch before I make statements like that 😉. It is in v1.3.0.
