Graphics test change to rehash every time #4640
Conversation
Note that not all tests marked
Good point - have changed the to-do
Or we could be really radical and drop both. Edit: see also discussion at #4179
Why do we spend so much time checking if it exists then? :/ (see iris/lib/iris/tests/__init__.py, lines 47 to 52 and lines 58 to 74 at ebc2039)
I'm pretty sure I've done this when I've put Iris on a non-internetted machine and wanted to check the install was basically functional? Probably not a huge extra effort to grab the test data though
Just so you know... offline discussions have been developing the concept of 'core Iris' to the point that Matplotlib becomes an optional dependency - only some use cases include visualisation. At the same time, the impending inclusion of more optional dependencies for Mesh handling - each of which brings a sizeable dependency stack - has led to more granular tests being on the table again:
Of course if this became a reality then we'd probably need a more 'global' way of skipping optional dependencies, rather than the decorators we have now.
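As a rough illustration, one such 'global' approach could be an import-availability skip applied per test or per class. This is a minimal sketch assuming unittest-style tests; `skip_without` and the test names here are hypothetical, not Iris's actual API:

```python
import importlib.util
import unittest


def skip_without(module_name):
    """Skip the decorated test when an optional dependency is missing."""
    available = importlib.util.find_spec(module_name) is not None
    return unittest.skipIf(
        not available, f"optional dependency {module_name!r} is not installed"
    )


@skip_without("matplotlib")
class TestPlotting(unittest.TestCase):
    def test_plot(self):
        import matplotlib.pyplot as plt  # safe: skipped when unavailable

        plt.figure()
```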
N.B. #4465 refers
@wjbenfold I really love your thinking and the way you've approached this, it seems like we're on the cusp of a much more robust graphical workflow for testing, but I have one high level recommendation for change, and one objection.
Firstly, given the accompanying PR SciTools/iris-test-data#71, we could take this opportunity to completely rationalise the proliferation of reference images that we care about. The genius game changer that you've tapped into there is that we version the iris-test-data repo. Therefore, there is now no reason to explicitly maintain several baseline images for a graphical test; only keep the latest baseline image i.e., one test = one image (we still need to support graphical tests that generate multiple images from multiple sub-tests within that one test). Previously we'd explicitly maintained baseline image history within the imagerepo.json file, but now we don't need to do that as the history of the baseline images per test is implicitly recorded for free by GH versioning the iris-test-data repo through time. All we need to do now is also lock down the version of the test data that is valid for the tests, along with the underlying software dependencies that we're collectively running testing with, and I think this will make a much more maintainable workflow. (As an aside, I've got thoughts on how we can automate the availability and verification of test data based on versioning - but that's for another time)
Secondly, I completely understand your motivation to dump the imagerepo.json file and calculate the expected perceptual image hash at runtime. However, this will make graphical testing slower (which for me, is neither here nor there really, it shouldn't incur a massive additional cost, so whatevz really) BUT, more importantly, we're now exposing ourselves to not knowing when the expected perceptual image hash of a baseline image changes. That for me is a bit concerning, and after a bit of thought I'm not in favour of your proposed approach (happy to be convinced otherwise though). I think it's important for us to know exactly when the hash of the baseline image has unexpectedly changed and appropriately manage the fallout (and that typically manifests itself through a graphical test failure due to a larger Hamming distance with the associated actual result image hash)... but this fallout will be massively easier to manage given your lovely proposal to version the baseline images. So, that said, I'd prefer that we maintained a single pre-computed expected perceptual image hash associated with each test (and each sub-test within a test) - IMHO there is still important value in doing that.
Anyways, happy to discuss
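A minimal sketch of the pre-computed-hash check described above, assuming the imagehash and Pillow APIs; the helper name, tolerance and message are illustrative, not Iris's actual implementation:

```python
import imagehash
from PIL import Image

HAMMING_TOLERANCE = 2  # illustrative tolerance, not Iris's actual setting


def assert_graphic_matches(result_png, expected_hex):
    # Perceptual hash of the freshly generated result image.
    actual = imagehash.phash(Image.open(result_png))
    # Rebuild the stored expected hash from its hex digest.
    expected = imagehash.hex_to_hash(expected_hex)
    # Subtracting two ImageHash objects gives their Hamming distance.
    distance = actual - expected
    assert distance <= HAMMING_TOLERANCE, (
        f"result image differs from baseline hash by {distance} bits"
    )
```

With a stored expected hash, a test failure pinpoints exactly when a dependency change has shifted a baseline hash, which is the visibility being argued for here.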
@@ -34,7 +34,7 @@ dependencies:
   - filelock
   - imagehash >=4.0
   - nose
-  - pillow <7
+  - pillow
@wjbenfold We no longer need pillow as an explicit top level dependency. It'll be pulled in by imagehash, and also other packages such as matplotlib and cartopy.
        with open(data_path, "wb") as fh:
            fh.writelines(gz_fh)

    return data_path
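For orientation, the snippet above is the tail of a pattern along these lines: a gzipped reference file is decompressed into a plain file on disk and its path handed back for the test to load. A hedged sketch with illustrative names, not the actual Iris helper:

```python
import gzip
import shutil


def unpack_reference(gz_path, data_path):
    # Decompress the gzipped reference file to disk, then return the
    # path so the calling test can read the plain data file.
    with gzip.open(gz_path, "rb") as gz_fh, open(data_path, "wb") as fh:
        shutil.copyfileobj(gz_fh, fh)
    return data_path
```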
@wjbenfold Nice, thanks for aligning get_data_path with get_result_path 👍
They have the same use case and yet through the mists of time one was kept as a function whilst the other as a static method. At least now they are defined in the same namespace and used in the same way 😄
    finally:
        plt.close()
    graphics.check_graphic(self)
@wjbenfold Interesting 🤔
I understand the motivation to corral all the graphic related infrastructure into one module 👍
However this one line is kinda strongly hinting that the definition of check_graphic belongs within the class as a method, and not as a method simply wrapping a function. Also, the graphics.check_graphic function isn't called directly from anywhere else in the codebase (that I can see, so correct me if I'm wrong here), which would bolster the argument for detaching it and making it a function.
I'd personally rather go one way or the other and not hybrid like this, which seems like an artificial indirection (again, although I understand why you've done it).
import unittest

import numpy as np
@wjbenfold Might be good to define an __all__ as well?
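For reference, a minimal sketch of the suggestion; check_graphic is the only name this thread confirms as public, so any further entries would be guesswork:

```python
# Declares the module's public names: controls what
# "from iris.tests.graphics import *" exports and documents the API surface.
__all__ = ["check_graphic"]
```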
@bjlittle To refer back to what the imagehash was in the past, it sounds like a little json as a repository of our imagehashes might just do the job... Said
@wjbenfold I'd recommend keeping state in that JSON file.
For me the goal here is to reduce the maintenance burden when imagehash/pillow causes changes, not for graphical tests to silently continue passing by adaptively changing the expected result on the fly. In hindsight, our historical mistake was not versioning the image repo to preserve known baseline snapshots, and also attempting to cope with such dependency changes through a maze of soft links, which compounds the problem. We can certainly avoid all that, given your approach, but for me it's important that we still know when a change in dependencies causes graphical test failures, and in order to do that we need the hashes of the baseline images captured; that way we know exactly what we're aiming for. I think this is a minor change, but makes a big difference.
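A minimal sketch of such a JSON hash repository, with an illustrative layout and test id; the actual file name and schema would be whatever Iris settles on:

```python
import json


def load_expected_hashes(path):
    # Maps a unique test id to the hex phash of its baseline image, e.g.
    # {"iris.tests.test_plot.TestContour.test_simple.0": "fa5e5e3a3ec58d87"}
    with open(path) as fh:
        return json.load(fh)
```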
Replaced by #4759
🚀 Pull Request
Description
This changes the graphics testing setup to re-phash baseline images (found in the test data under test_data/images) every time the tests run, checking them against the test results.
I've also pulled out the graphics testing infrastructure into a subfolder of iris/tests.
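For contrast with the pre-computed approach discussed above, a minimal sketch of what "re-phash every time" amounts to, assuming the imagehash and Pillow APIs; the function name, paths and tolerance are illustrative:

```python
import imagehash
from PIL import Image


def hashes_match(baseline_png, result_png, tolerance=2):
    # Hash the versioned baseline image and the freshly drawn result at
    # test time, so no pre-computed hash file needs maintaining.
    expected = imagehash.phash(Image.open(baseline_png))
    actual = imagehash.phash(Image.open(result_png))
    # ImageHash subtraction gives the Hamming distance between the hashes.
    return (expected - actual) <= tolerance
```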
Feedback requested on:
- repodata.json: I've removed repodata.json, so I'm checking that the new file is the same as any existing ones. This could probably be improved, but I'm not sure what's better to test against.

Known to-dos:
- Consult Iris pull request check list