
Graphics test change to rehash every time #4640

Closed

Conversation

wjbenfold
Contributor

@wjbenfold wjbenfold commented Mar 15, 2022

🚀 Pull Request

Description

This changes the graphics testing setup to re-phash the images (found in the test data under test_data/images) every time they are checked against the test results.

I've also pulled out the graphics testing infrastructure into a subfolder of iris/tests.
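For illustration, a minimal sketch of what "re-phash every time" means in practice, using imagehash and PIL; the paths, hash size and tolerance below are hypothetical, not the values used in the actual test infrastructure:

# A minimal sketch of the "re-phash every time" check: hash the baseline
# image from the test data and the freshly generated result, then compare.
# File paths, hash size and tolerance here are illustrative only.
import imagehash
from PIL import Image

HAMMING_TOLERANCE = 2  # hypothetical maximum acceptable bit difference


def assert_graphic_matches(result_png, baseline_png):
    expected = imagehash.phash(Image.open(baseline_png), hash_size=16)
    actual = imagehash.phash(Image.open(result_png), hash_size=16)
    distance = expected - actual  # ImageHash subtraction gives the Hamming distance
    assert distance <= HAMMING_TOLERANCE, (
        f"result differs from baseline by {distance} bits "
        f"(tolerance {HAMMING_TOLERANCE})"
    )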

Feedback requested on:

  • Overall approach
  • Structure of directories / files
  • The current implementation of idiff checks that the phash isn't already in repodata.json. I've removed repodata.json, so instead I check whether the new file is the same as any of the existing ones. This could probably be improved, but I'm not sure what would be better to test against.

Known to-dos:

  • Attach decorators to skip tests that use reference images when test data is missing
  • Documentation (specifically developers guide) update
  • What's new

Consult Iris pull request check list

@rcomer
Member

rcomer commented Mar 28, 2022

Update decorators so that tests marked skip_plot are also skipped if there's no test data

Note that not all tests marked @skip_plot use a reference image. E.g. none of these, I think: https://github.com/SciTools/iris/blob/main/lib/iris/tests/unit/plot/test_contourf.py

@wjbenfold
Contributor Author

Note that not all tests marked @skip_plot use a reference image. E.g. none of these, I think: https://github.com/SciTools/iris/blob/main/lib/iris/tests/unit/plot/test_contourf.py

Good point - have changed the to-do

@rcomer
Member

rcomer commented Mar 28, 2022

Or we could be really radical and drop both @skip_plot and @skip_data altogether. We already removed the “minimal” version of the tests from the CI at #4218. Matplotlib is listed as a core dependency so it seems unlikely that anyone would run the tests without it. We could also just insist that if you’re going to run the tests, you need the test data. I’m not sure under what circumstances you’d know it was safe to only use the tests that don’t need the test data.

Edit: see also discussion at #4179

@wjbenfold
Contributor Author

wjbenfold commented Mar 28, 2022

Matplotlib is listed as a core dependency

Why do we spend so much time checking if it exists then? :/

iris/setup.cfg

Lines 47 to 52 in ebc2039

install_requires =
    cartopy>=0.20
    cf-units>=3
    cftime>=1.5.0
    dask[array]>=2
    matplotlib

# Test for availability of matplotlib.
# (And remove matplotlib as an iris.tests dependency.)
try:
    import matplotlib

    # Override any user settings e.g. from matplotlibrc file.
    matplotlib.rcdefaults()
    # Set backend *after* rcdefaults, as we don't want that overridden (#3846).
    matplotlib.use("agg")
    # Standardise the figure size across matplotlib versions.
    # This permits matplotlib png image comparison.
    matplotlib.rcParams["figure.figsize"] = [8.0, 6.0]
    import matplotlib.pyplot as plt
except ImportError:
    MPL_AVAILABLE = False
else:
    MPL_AVAILABLE = True
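(For context, a skip_plot-style decorator is then built on this flag, roughly as in the sketch below using unittest.skipIf; this is an approximation, not necessarily the verbatim definition in iris.tests.)

# Sketch of how a skip_plot-style decorator can hang off the MPL_AVAILABLE
# flag computed above (approximate, not the exact iris.tests definition).
import unittest

skip_plot = unittest.skipIf(
    not MPL_AVAILABLE,
    'Test(s) require "matplotlib", which is not available.',
)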

I’m not sure under what circumstances you’d know it was safe to only use the tests that don’t need the test data.

I'm pretty sure I've done this when I've put Iris on a machine without internet access and wanted to check the install was basically functional. Probably not a huge extra effort to grab the test data, though.

@trexfeathers
Contributor

Just so you know... offline discussions have been developing the concept of 'core Iris' to the point that Matplotlib becomes an optional dependency - only some use cases include visualisation.

At the same time, the impending inclusion of more optional dependencies for Mesh handling - each of which brings a sizeable dependency stack - has led to more granular tests being on the table again:

  • Iris tests with core dependencies only
    Avoids spending too much resource on environment construction
  • 1 or more test sessions of Iris working with its optional dependencies
    More resource on environment construction, but way fewer tests to complete

Of course if this became a reality then we'd probably need a more 'global' way of skipping optional dependencies, rather than the decorators we have now.

@wjbenfold wjbenfold marked this pull request as ready for review April 8, 2022 16:09
@pp-mo
Member

pp-mo commented Apr 13, 2022

N.B. #4465 refers

Member

@bjlittle bjlittle left a comment


@wjbenfold I really love your thinking and the way you've approached this; it seems like we're on the cusp of a much more robust graphical testing workflow. But I have one high-level recommendation for change, and one objection.

Firstly, given the accompanying PR SciTools/iris-test-data#71, we could take this opportunity to completely rationalise the proliferation of reference images that we care about. The genius game changer that you've tapped into there is that we version the iris-test-data repo. Therefore, there is now no reason to explicitly maintain several baseline images for a graphical test; only keep the latest baseline image, i.e., one test = one image (we still need to support graphical tests that generate multiple images from multiple sub-tests within that one test). Previously we'd explicitly maintained baseline image history within the imagerepo.json file, but now we don't need to do that as the history of the baseline images per test is implicitly recorded for free by GH versioning the iris-test-data repo through time. All we need to do now is also lock down the version of the test data that is valid for the tests, along with the underlying software dependencies that we're collectively running the tests with, and I think this will make for a much more maintainable workflow. (As an aside, I've got thoughts on how we can automate the availability and verification of test data based on versioning - but that's for another time)

Secondly, I completely understand your motivation to dump the imagerepo.json file, and calculate the expected perceptual image hash at runtime. However, this will make graphical testing slower (which for me, is neither here nor there really, it shouldn't incur a massive additional cost, so whatevz really) BUT, more importantly, we're now exposing ourselves to not knowing when the expected perceptual image hash of a baseline image changes. That for me is a bit concerning, and after a bit of thought I'm not in favour of your proposed approach (happy to be convinced otherwise though). I think it's important for us to know exactly when the hash of the baseline image has unexpectedly changed and appropriately manage the fallout (and that typically manifests itself through a graphical test failure due to a larger Hamming distance with the associated actual result image hash)... but this fallout will be massively easier to manage given your lovely proposal to version the baseline images. So, that said, I'd prefer that we maintained a single pre-computed expected perceptual image hash associated with each test (and each sub-test within a test) - IMHO there is still important value in doing that.

Anyways, happy to discuss

@@ -34,7 +34,7 @@ dependencies:
- filelock
- imagehash >=4.0
- nose
- pillow <7
- pillow
Member


@wjbenfold We no longer need pillow as an explicit top level dependency.

It'll be pulled in by imagehash, and also by other packages such as matplotlib and cartopy.

with open(data_path, "wb") as fh:
    fh.writelines(gz_fh)

return data_path
Member


@wjbenfold Nice, thanks for aligning get_data_path with get_result_path 👍

They have the same use case and yet, through the mists of time, one was kept as a function whilst the other became a static method. At least now they are defined in the same namespace and used in the same way 😄


finally:
    plt.close()
graphics.check_graphic(self)
Member


@wjbenfold Interesting 🤔

I understand the motivation to corral all the graphic related infrastructure into one module 👍

However, this one line is kinda strongly hinting that the definition of check_graphic belongs within the class as a method and not as a method simply wrapping a function. Also, the graphics.check_graphic function isn't called directly from anywhere else in the codebase (that I can see, so correct me if I'm wrong here), which would bolster the argument of detaching it and making it a function.

I'd personally rather go one way or the other and not hybrid like this, which seems like an artificial indirection (again, although I understand why you've done it).

import unittest

import numpy as np

Member


@wjbenfold Might be good to define a __all__ as well?

@wjbenfold
Contributor Author

@bjlittle To be able to refer back to what the imagehash was in the past, it sounds like a little JSON file as a repository of our imagehashes might just do the job...

Said imagerepo.json (or another name if leaving it the same will cause confusion) would conveniently allow us to run the plotting tests even in the absence of the test data (though in such a state we wouldn't be robust to the version of imagehash/pillow having changed). Do you think that's a feature to add, or shall we just update it so that the change to an imagehash shows up in the diff but never read from it?

@bjlittle
Member

@wjbenfold I'd recommend keeping state in json. In fact we're headed in that direction when we adopt pytest-mpl.

For me the goal here is to reduce the maintenance burden when imagehash/pillow causes changes, not for graphical tests to silently continue passing by adaptively changing the expected result on the fly. In hindsight, our historical mistake was not versioning the image repo to preserve known baseline snapshots, and also attempting to cope with such dependency changes through a maze of soft links, which compounds the problem.

We can certainly avoid all that, given your approach, but for me it's important that we still know when a change in dependencies causes graphical test failures, and in order to do that we need the hashes of the baseline images captured; that way we know exactly what we're aiming for.

I think this is a minor change, but makes a big difference.
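For concreteness, a sketch of what keeping the state in JSON could look like: one pre-computed expected hash per test, read from a small JSON file and compared by Hamming distance at runtime. The file name, key scheme and tolerance below are all hypothetical.

# Hypothetical sketch: one pre-computed expected phash per test, stored in a
# small JSON file, compared against the freshly rendered result at runtime.
import json
from pathlib import Path

import imagehash
from PIL import Image

REPO = Path("imagerepo.json")  # e.g. {"iris.tests.test_plot.TestFoo.test_bar": "fa03..."}
HAMMING_TOLERANCE = 2  # hypothetical


def check_against_repo(test_id, result_png):
    expected = imagehash.hex_to_hash(json.loads(REPO.read_text())[test_id])
    actual = imagehash.phash(Image.open(result_png), hash_size=16)
    distance = expected - actual
    assert distance <= HAMMING_TOLERANCE, (
        f"{test_id}: hash differs from the recorded baseline by {distance} bits; "
        "either the plot or a dependency (imagehash/pillow/matplotlib) has changed"
    )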

@wjbenfold
Contributor Author

Replaced by #4759

@wjbenfold wjbenfold closed this May 20, 2022