Graphics test change to rehash every time #4640
Conversation
Note that not all tests marked
Good point - have changed the to-do
Or we could be really radical and drop both. Edit: see also discussion at #4179
Why do we spend so much time checking if it exists then? :/ (see iris/lib/iris/tests/__init__.py, lines 47 to 52 and lines 58 to 74 at ebc2039)
I'm pretty sure I've done this when I've put Iris on a non-internetted machine and wanted to check the install was basically functional? Probably not a huge extra effort to grab the test data though
Just so you know... offline discussions have been developing the concept of 'core Iris' to the point that Matplotlib becomes an optional dependency - only some use cases include visualisation. At the same time, the impending inclusion of more optional dependencies for Mesh handling - each of which brings a sizeable dependency stack - has led to more granular tests being on the table again:
Of course if this became a reality then we'd probably need a more 'global' way of skipping optional dependencies, rather than the decorators we have now.
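As a rough illustration, one such 'global' approach could be an import-availability skip applied per test or per class. This is a minimal sketch assuming unittest-style tests; `skip_without` and the test names here are hypothetical, not Iris's actual API:

```python
import importlib.util
import unittest


def skip_without(module_name):
    """Skip the decorated test when an optional dependency is missing."""
    available = importlib.util.find_spec(module_name) is not None
    return unittest.skipIf(
        not available, f"optional dependency {module_name!r} is not installed"
    )


@skip_without("matplotlib")
class TestPlotting(unittest.TestCase):
    def test_plot(self):
        import matplotlib.pyplot as plt  # safe: skipped when unavailable

        plt.figure()
```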
N.B. #4465 refers
@wjbenfold I really love your thinking and the way you've approached this, it seems like we're on the cusp of a much more robust graphical workflow for testing, but I have one high level recommendation for change, and one objection.
Firstly, given the accompanying PR SciTools/iris-test-data#71, we could take this opportunity to completely rationalise the proliferation of reference images that we care about. The genius game changer that you've tapped into there is that we version the iris-test-data repo. Therefore, there is now no reason to explicitly maintain several baseline images for a graphical test; only keep the latest baseline image i.e., one test = one image (we still need to support graphical tests that generate multiple images from multiple sub-tests within that one test). Previously we'd explicitly maintained baseline image history within the imagerepo.json file, but now we don't need to do that as the history of the baseline images per test is implicitly recorded for free by GH versioning the iris-test-data repo through time. All we need to do now is also lock down the version of the test data that is valid for the tests, along with the underlying software dependencies that we're collectively running testing with, and I think this will make a much more maintainable workflow. (As an aside, I've got thoughts on how we can automate the availability and verification of test data based on versioning - but that's for another time)
Secondly, I completely understand your motivation to dump the imagerepo.json file and calculate the expected perceptual image hash at runtime. However, this will make graphical testing slower (which for me, is neither here nor there really, it shouldn't incur a massive additional cost, so whatevz really) BUT, more importantly, we're now exposing ourselves to not knowing when the expected perceptual image hash of a baseline image changes. That for me is a bit concerning, and after a bit of thought I'm not in favour of your proposed approach (happy to be convinced otherwise though). I think it's important for us to know exactly when the hash of the baseline image has unexpectedly changed and appropriately manage the fallout (and that typically manifests itself through a graphical test failure due to a larger Hamming distance with the associated actual result image hash)... but this fallout will be massively easier to manage given your lovely proposal to version the baseline images. So, that said, I'd prefer that we maintained a single pre-computed expected perceptual image hash associated with each test (and each sub-test within a test) - IMHO there is still important value in doing that.
Anyways, happy to discuss
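A minimal sketch of the pre-computed-hash check described above, assuming the imagehash and Pillow APIs; the helper name, tolerance and message are illustrative, not Iris's actual implementation:

```python
import imagehash
from PIL import Image

HAMMING_TOLERANCE = 2  # illustrative tolerance, not Iris's actual setting


def assert_graphic_matches(result_png, expected_hex):
    # Perceptual hash of the freshly generated result image.
    actual = imagehash.phash(Image.open(result_png))
    # Rebuild the stored expected hash from its hex digest.
    expected = imagehash.hex_to_hash(expected_hex)
    # Subtracting two ImageHash objects gives their Hamming distance.
    distance = actual - expected
    assert distance <= HAMMING_TOLERANCE, (
        f"result image differs from baseline hash by {distance} bits"
    )
```

With a stored expected hash, a test failure pinpoints exactly when a dependency change has shifted a baseline hash, which is the visibility being argued for here.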
@@ -34,7 +34,7 @@ dependencies:
   - filelock
   - imagehash >=4.0
   - nose
-  - pillow <7
+  - pillow
@wjbenfold We no longer need pillow as an explicit top level dependency. It'll be pulled in by imagehash, and also other packages such as matplotlib and cartopy.
        with open(data_path, "wb") as fh:
            fh.writelines(gz_fh)

    return data_path
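For orientation, the snippet above is the tail of a pattern along these lines: a gzipped reference file is decompressed into a plain file on disk and its path handed back for the test to load. A hedged sketch with illustrative names, not the actual Iris helper:

```python
import gzip
import shutil


def unpack_reference(gz_path, data_path):
    # Decompress the gzipped reference file to disk, then return the
    # path so the calling test can read the plain data file.
    with gzip.open(gz_path, "rb") as gz_fh, open(data_path, "wb") as fh:
        shutil.copyfileobj(gz_fh, fh)
    return data_path
```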
@wjbenfold Nice, thanks for aligning get_data_path with get_result_path 👍
They have the same use case and yet through the mists of time one was kept as a function whilst the other as a static method. At least now they are defined in the same namespace and used in the same way 😄
    finally:
        plt.close()
    graphics.check_graphic(self)
@wjbenfold Interesting 🤔
I understand the motivation to corral all the graphic related infrastructure into one module 👍
However this one line is kinda strongly hinting that the definition of check_graphic belongs within the class as a method, and not as a method simply wrapping a function. Also, the graphics.check_graphic function isn't called directly from anywhere else in the codebase (that I can see, so correct me if I'm wrong here), which would bolster the argument for detaching it and making it a function.
I'd personally rather go one way or the other and not hybrid like this, which seems like an artificial indirection (again, although I understand why you've done it).
import unittest

import numpy as np
@wjbenfold Might be good to define an __all__ as well?
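For reference, a minimal sketch of the suggestion; check_graphic is the only name this thread confirms as public, so any further entries would be guesswork:

```python
# Declares the module's public names: controls what
# "from iris.tests.graphics import *" exports and documents the API surface.
__all__ = ["check_graphic"]
```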
@bjlittle To refer back to what the imagehash was in the past, it sounds like a little json as a repository of our imagehashes might just do the job... Said
@wjbenfold I'd recommend keeping state in that JSON file.
For me the goal here is to reduce the maintenance burden when imagehash/pillow causes changes, not for graphical tests to silently continue passing by adaptively changing the expected result on the fly. In hindsight, our historical mistake was not versioning the image repo to preserve known baseline snapshots, and also attempting to cope with such dependency changes through a maze of soft links, which compounds the problem. We can certainly avoid all that, given your approach, but for me it's important that we still know when a change in dependencies causes graphical test failures, and in order to do that we need the hashes of the baseline images captured; that way we know exactly what we're aiming for. I think this is a minor change, but makes a big difference.
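A minimal sketch of such a JSON hash repository, with an illustrative layout and test id; the actual file name and schema would be whatever Iris settles on:

```python
import json


def load_expected_hashes(path):
    # Maps a unique test id to the hex phash of its baseline image, e.g.
    # {"iris.tests.test_plot.TestContour.test_simple.0": "fa5e5e3a3ec58d87"}
    with open(path) as fh:
        return json.load(fh)
```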
Replaced by #4759
🚀 Pull Request
Description
This changes the graphics testing setup to re-phash baseline images (found in the test data under test_data/images) every time the tests run, checking them against the test results.
I've also pulled out the graphics testing infrastructure into a subfolder of iris/tests.
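For contrast with the pre-computed approach discussed above, a minimal sketch of what "re-phash every time" amounts to, assuming the imagehash and Pillow APIs; the function name, paths and tolerance are illustrative:

```python
import imagehash
from PIL import Image


def hashes_match(baseline_png, result_png, tolerance=2):
    # Hash the versioned baseline image and the freshly drawn result at
    # test time, so no pre-computed hash file needs maintaining.
    expected = imagehash.phash(Image.open(baseline_png))
    actual = imagehash.phash(Image.open(result_png))
    # ImageHash subtraction gives the Hamming distance between the hashes.
    return (expected - actual) <= tolerance
```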
Feedback requested on:
- repodata.json: I've removed repodata.json, so I'm checking that the new file is the same as any existing ones. This could probably be improved, but I'm not sure what's better to test against.

Known to-dos:
- Consult Iris pull request check list