Replace pyke #4170

pp-mo · 2021-06-02T13:16:13Z

🚀 Pull Request

Description

Eventually, will address #3415

Technical details of approach

I started by writing a drop-in replacement approach to replace Pyke rules with so-called 'action' routines.
- at present it purposely mimics the structure of existing Pyke rules, plus the three-stage translation process built into
  iris.fileformats.netcdf._load_cube , i.e. broadly assert facts; run rules; complete cube
- it does that by providing a new thin wrapper object which can mimic the Pyke "engine".
- for now, I have retained the current approach to storing Iris info on the 'engine' (example), and also mimicked the storage of intermediate 'facts' there too.
- some of that could well simplify in future, but I'm aiming to first ensure equivalence with the exisiting behaviour
- N.B. currently incomplete, have only so far done the 'major' rules (grid-mapping and dim-coords).
I am also creating some targetted testing for this ...
- the choice of tests is heavily informed by the structure of the existing rules
- the area tested, essentially what is currently "the Pyke rules bit", is not maybe that obvious a separate distinct category ..
  - .. but it should be an appropriate starting point for testing the 'behavioural equivalence'
  - I intend eventually to test for each code-branch of the rules behaviour
- ... so, again, in future we could simplify/rework this

Current progress + status summary

Done

actions + tests for
- grid-mappings + dim-coords
- time-coords
- hybrid vertical coords

Todo:

some areas are still unaddressed - lacking both actions implementation + tests
known problems
1. already observed that actions do not match original behaviour for some of the grid-mapping/dimcoords areas
  - ( intend to complete the test coverage before addressing this )
2. missing parts of grid-mapping/dimcoords tests :
  - still not exercising the following rules at all:
    - fc_build_auxiliary_coordinate_latitude
    - fc_build_auxiliary_coordinate_latitude_rotated
    - fc_build_auxiliary_coordinate_longitude
    - fc_build_auxiliary_coordinate_longitude_rotated
3. odd extras : Ukmo-attributes ; label-coords ; cell-measures; ancillary-variables
Later (intended) stages :
1. get all functionality tested + working with existing (Pyke) code
2. add actions missing for TODO features
3. get all tests pasing with actions implementation
4. check that all Iris tests run with temporary shim : do all netcdf loads both ways + check results are the same
5. remove the Pyke implementation (+dependency), and tidy away all the scaffolding

NOTE: I'm assuming that we will have enough confidence to "simply replace", which means

everything goes via "new route"
completely removed the old Pyke-based rules (and dependency)
don't need, or have, any 'Future'-style switch for user to revert to 'old' mechanics

Consult Iris pull request check list

pp-mo · 2021-06-02T13:37:55Z

@trexfeathers if you can find a moment, I'd be interested to know if you can work through what's going on here ?

Especially, I'd like to think that once we're confident that behaviour is preserved, we can subsequently remove some of the scaffolding which is the legacy of "how it used to be done".

So, I'm hoping at some point to run all exisiting Iris tests, with changes to run any netcdf loading via both routes, and check for identical results in all cases.
With that, and given ~full codebranch testing (with the tests I'm still writing), I hope we can feel confident to simply replace the "old way" with the "new" (and drop the Pyke dependency).

The 'engine mimic' approach I'm using should enable this extensive equivalence testing, and also to focus on just replacing the 'rules' with 'actions'.
But obviously that is going to leave a good deal of unnecessary complication in place.
I'm hoping I can complete the main effort by next week, and hopefully there will also be some time to introduce some subsequent simplification (i.e. remove the worst vestiges of the 'old way').

pp-mo · 2021-06-02T17:24:52Z

Note: at this point [i.e. commit https://github.com//pull/4170/commits/5f18f2369b8ccc9e6844825dbf2af356abcfbf83], removing the skip on the non-Pyke testing shows quite a few anomalies.
I'm going to ignore that for now, while aiming to complete test coverage.

TBH, writing this test suite is taking rather longer than I had thought.

DPeterK · 2021-06-03T08:58:29Z

@pp-mo this code might be of interest to the conversation here: https://github.com/informatics-lab/tiledb_netcdf/blob/master/nctotdb/data_model.py#L11. I had a go at writing a non-Pyke NetCDF Dataset "understander" recently, too... It functions, but isn't full-featured, as I thnk there are some cf-netcdf elements it still ignores. Might provide some food for thought

pp-mo · 2021-06-04T08:44:18Z

@DPeterK might be of interest to the conversation

Thanks! 💐 Interesting, but it goes a long way beyond what I'm trying to do here, as a lot of what it does, like classifying variables and identifying which are coords etc, is for us handled in iris.fileformats.cf. Which I'm purposely not touching at this point.

At some point, we should probably re-consider our division of the task into the cf.py file-structure analysis and "rules", but that's not what this is about (if I can help it!).

Another goal would be to split out the CF "understanding" from Iris, and make it a more generally usable solutuon. I think that would be really useful, but it means that the output of the 'build' process must be in some Iris-agnostic form.
I think a suitable "common currency" for that could be derived from that of iris.fileformats.rules, but again it's very much a future project.

lib/iris/fileformats/_nc_load_rules/actions.py

lib/iris/fileformats/_nc_load_rules/helpers.py

pp-mo · 2021-06-07T09:38:01Z

👀 @trexfeathers @stephenpascoe @bjlittle
Just fixed the issue header : updated account of "progress, status + plans".

pp-mo · 2021-06-08T11:36:49Z

(Updated status in header, again)

DPeterK · 2021-06-08T12:36:12Z

it goes a long way beyond what I'm trying to do here

Sorry! I mistook the scope of this particular bit of work (or misremembered what the pyke rules do!)

Another goal would be to split out the CF "understanding" from Iris

Doing this would be super helpful! 💯

pp-mo · 2021-06-08T20:32:51Z

So, I believe the tests now at least exercise all the Pyke rules.
Hopefully they also cover all the rule activation patterns -- well I hope the important ones, at least.
Awaiting your views on that @stephenworsley !

Next up: I will progressively extend + debug the 'actions', so that they match rules behaviour in all the tests.

pp-mo · 2021-06-08T20:35:14Z

N.B. the test failure is yet another link-test failure 😞
But we can just ignore that : all unit tests are passing OK.

rcomer · 2021-06-08T20:39:20Z

@pp-mo I am pretty sure that link has been fixed in master, so should pass if you rebase 👍

lib/iris/tests/unit/fileformats/netcdf/load_cube/load_cube__activate/test__grid_mappings.py

pp-mo · 2021-06-09T09:04:57Z

Rebased onto master, to pick up link-check fix 🤞

pp-mo · 2021-06-09T16:54:50Z

Ok, I broke some of the other tests there by turning on the "compare all results pyke + non-pyke" option.
But I'll fix those too shortly.

pp-mo · 2021-06-09T17:59:43Z

That's it !
I've now added support for all the rules behaviour, and it all seems to all work.
At present, the rules tests are running everything through both routes + checking that results compare.
😁

pp-mo · 2021-06-09T22:51:48Z

(
~~I have also managed now to run all Iris tests with the new mechanism replacing pyke + no further problems emerged,~~
~~i.e. no existing tests failed when using the non-pyke mechanism.~~
Whoops, spoke too soon : When test data is provided, there are some problems. Will investigate
)

Looking forward, I think this looks very hopeful for us to entirely remove Pyke within this PR 🚀 !

We should get the new code + tests reviewed as-is,
Once we're happy that nothing is missing, I should be able to simplify this branch + remove Pyke altogether.

Regarding the future of the new tests, I was intending to provide two TestCase classes for every set of tests, one to run against the old mechanism and one against the new (by setting the class use_pyke switch).
However, I realise now that we just don't need that : Anyone can easily run tests, manually, with either mechanism.
So, once we have prior confidence in all these tests, we can just switch over + be happy as long as they still pass.
So the tests can be usefully simplified from what we now have, and I will aim to do that going forward, along with whatever other changes may be needed.

stephenworsley · 2021-06-10T10:03:19Z

lib/iris/tests/unit/fileformats/netcdf/load_cube/load_cube__activate/__init__.py

+        iris.fileformats.netcdf.DEBUG = self.debug
+
+        # Call the main translation function to load a single cube.
+        def load_single_cube(engine):


Would this make more sense to be named load_specific_cube? It seems like _load_cube is already loading a 'single' cube (i.e. the result is just one cube), but this function if different in that it has already specified which cube it will load.

I thin I may just call this 'translate_cube'.
It is only needed to encapsulate the "call _load_cube and record formula info" operation, as we need to do that twice.

pp-mo · 2021-06-10T16:56:18Z

Added actions for formulae (aka hybrid coords, factories). 3e15750

I think this is what was wrong!
Not quite sure how I missed doing that.
I still need to confirm that this works for existing tests, so this is maybe just a preview

pp-mo · 2021-06-10T23:41:38Z

Ok, this time it's real :
I have now tested running all Iris tests (with the test-data), and a small hack to send all netcdf loads through the new mechanism.
FYI those temporary hacks are shown in another draft PR (= this code + one extra commit)
-- that shows nearly everything passing : it just upsets some tests that check the warnings output

stephenworsley · 2021-06-11T13:59:55Z

lib/iris/fileformats/_nc_load_rules/actions.py

+    rule_name = "fc_formula_type"
+    (var_name,) = formula_root_fact
+    cf_var = engine.cf_var.cf_group[var_name]
+    # var.standard_name is a formula type (or we should never get here).


I believe this comment should refer to cf_var.standard_name.

stephenworsley · 2021-06-11T14:46:15Z

lib/iris/fileformats/_nc_load_rules/actions.py

+    # Must run AFTER formula root identification.
+    (termvar_name, rootvar_name, term_name) = formula_term_fact
+    # The rootname is implicit :  have only one per cube
+    # TODO: change when we adopt cf-1.7 advanced grid-mping syntax


I'm guessing "grid-mping" should be "grid-mapping"

stephenworsley · 2021-06-15T10:09:07Z

lib/iris/tests/unit/fileformats/netcdf/load_cube/load_cube__activate/__init__.py

+
+            # First zap cube-data, as masked data does not compare well.
+            def unmask_cube(cube):
+                # preserve the original, we're going to realise..


This comment could be clearer, I can't make out what "we're going to realise" is refering to.
Also, capitalise the first letter for consistency.

…code.

…ared dim.

…ncluding pyke == nonpyke checks.

pp-mo · 2021-06-23T13:05:50Z

Rebased, now checking we are okay with all CI checks.

Next I will be pulling in code from #4198, which removes Pyke and simplifies all the testing, but has not yet got more recent test additions from here.

pp-mo · 2021-06-23T17:30:30Z

Basically, #4198 now replaces this.
But this may remain a useful reference for the scope, while it was still testing against both mechanisms

pp-mo requested a review from trexfeathers June 2, 2021 13:24

stephenworsley self-requested a review June 4, 2021 09:44

stephenworsley reviewed Jun 7, 2021

View reviewed changes

lib/iris/fileformats/_nc_load_rules/actions.py Outdated Show resolved Hide resolved

stephenworsley reviewed Jun 7, 2021

View reviewed changes

lib/iris/fileformats/_nc_load_rules/helpers.py Outdated Show resolved Hide resolved

trexfeathers removed their request for review June 8, 2021 08:34

pp-mo force-pushed the replace_pyke branch from aed541f to f861318 Compare June 8, 2021 11:40

stephenworsley reviewed Jun 9, 2021

View reviewed changes

lib/iris/tests/unit/fileformats/netcdf/load_cube/load_cube__activate/test__grid_mappings.py Outdated Show resolved Hide resolved

stephenworsley reviewed Jun 9, 2021

View reviewed changes

lib/iris/tests/unit/fileformats/netcdf/load_cube/load_cube__activate/test__grid_mappings.py Outdated Show resolved Hide resolved

pp-mo force-pushed the replace_pyke branch from 4b1bee2 to 85752ca Compare June 9, 2021 09:02

stephenworsley reviewed Jun 10, 2021

View reviewed changes

pp-mo mentioned this pull request Jun 10, 2021

Replace pyke testinghacks #4188

Closed

stephenworsley reviewed Jun 11, 2021

View reviewed changes

stephenworsley reviewed Jun 15, 2021

View reviewed changes

pp-mo added 24 commits June 23, 2021 14:02

Move grid-mapping-specific content out of common load_cube__activate …

28626d0

…code.

Add tests for time rules.

e5f0794

Simplify and remove unused keys; check coord classes; test dim+aux sh…

8206910

…ared dim.

Simpler 'Opts' implementation.

aa2f90c

Tidy testing classes a bit.

6e05f1b

Tests for hybrid vertical coords.

96c1fce

Small review changes.

e22d658

Regularise per-test records of rules triggered.

df47b55

Added tests for auxiliary lat+lon coords.

02d22b2

Tests for remaining miscellaneous rules.

8fcf87f

Review: fix typos.

9107a09

Add testing option to compare pyke and nonpyke loads.

8fd89f4

Fixes to grid-mapping/dimcoord actions: passing all compare-tests.

f99378a

Actions for remaining 'miscellaneous' behaviour: All tests passing, i…

a306e8a

…ncluding pyke == nonpyke checks.

Improved comments in actions routines.

fb7a154

Added actions for formulae (aka hybrid coords, factories).

8cc39cd

Tiny fix.

fc4f334

Small review changes.

302857f

Added basic tests for Engine class.

0a07635

Raise actual warning for unrecognised formulae.

2142391

Remove unreachable warning code.

d7859df

Remove unreachable rule-fail code.

db13546

Testcase for multiple hybrid coordinates.

2c4ba36

Fix rotated-aux-latitude test.

fbdbfa4

pp-mo force-pushed the replace_pyke branch from 30eaa09 to fbdbfa4 Compare June 23, 2021 13:03

Isort fixes.

f49c600

pp-mo closed this Jun 23, 2021

pp-mo deleted the replace_pyke branch March 18, 2022 15:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace pyke #4170

Replace pyke #4170

pp-mo commented Jun 2, 2021 •

edited

Loading

pp-mo commented Jun 2, 2021 •

edited

Loading

pp-mo commented Jun 2, 2021

DPeterK commented Jun 3, 2021

pp-mo commented Jun 4, 2021 •

edited

Loading

pp-mo commented Jun 7, 2021 •

edited

Loading

pp-mo commented Jun 8, 2021

DPeterK commented Jun 8, 2021

pp-mo commented Jun 8, 2021 •

edited

Loading

pp-mo commented Jun 8, 2021 •

edited

Loading

rcomer commented Jun 8, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021 •

edited

Loading

stephenworsley Jun 10, 2021

pp-mo Jun 14, 2021

pp-mo commented Jun 10, 2021

pp-mo commented Jun 10, 2021 •

edited

Loading

stephenworsley Jun 11, 2021

stephenworsley Jun 11, 2021

stephenworsley Jun 15, 2021

pp-mo commented Jun 23, 2021

pp-mo commented Jun 23, 2021 •

edited

Loading

Replace pyke #4170

Replace pyke #4170

Conversation

pp-mo commented Jun 2, 2021 • edited Loading

🚀 Pull Request

Description

Technical details of approach

Current progress + status summary

pp-mo commented Jun 2, 2021 • edited Loading

pp-mo commented Jun 2, 2021

DPeterK commented Jun 3, 2021

pp-mo commented Jun 4, 2021 • edited Loading

pp-mo commented Jun 7, 2021 • edited Loading

pp-mo commented Jun 8, 2021

DPeterK commented Jun 8, 2021

pp-mo commented Jun 8, 2021 • edited Loading

pp-mo commented Jun 8, 2021 • edited Loading

rcomer commented Jun 8, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021

pp-mo commented Jun 9, 2021 • edited Loading

stephenworsley Jun 10, 2021

Choose a reason for hiding this comment

pp-mo Jun 14, 2021

Choose a reason for hiding this comment

pp-mo commented Jun 10, 2021

pp-mo commented Jun 10, 2021 • edited Loading

stephenworsley Jun 11, 2021

Choose a reason for hiding this comment

stephenworsley Jun 11, 2021

Choose a reason for hiding this comment

stephenworsley Jun 15, 2021

Choose a reason for hiding this comment

pp-mo commented Jun 23, 2021

pp-mo commented Jun 23, 2021 • edited Loading

pp-mo commented Jun 2, 2021 •

edited

Loading

pp-mo commented Jun 2, 2021 •

edited

Loading

pp-mo commented Jun 4, 2021 •

edited

Loading

pp-mo commented Jun 7, 2021 •

edited

Loading

pp-mo commented Jun 8, 2021 •

edited

Loading

pp-mo commented Jun 8, 2021 •

edited

Loading

pp-mo commented Jun 9, 2021 •

edited

Loading

pp-mo commented Jun 10, 2021 •

edited

Loading

pp-mo commented Jun 23, 2021 •

edited

Loading