Extend Cube Metadata NetCDF zlib compression #6552

Merged: 5 commits merged into SciTools:main on Jul 14, 2025

Conversation

@bjlittle (Member) commented Jul 8, 2025

🚀 Pull Request

Description

Extend NetCDF zlib compression serialization to also include auxiliary coordinates and ancillary variables that have the same shape as the associated parent cube data payload.

Does not apply to metadata with string dtypes.

The complevel, fletcher32, shuffle and zlib kwargs applied to the cube data payload will also be applied to the above applicable metadata.
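
For illustration, a minimal sketch of a save call that exercises this behaviour (the cube construction and names here are invented; the keywords are the existing Iris netCDF saver kwargs):

```python
import numpy as np

import iris
from iris.coords import AncillaryVariable
from iris.cube import Cube

# A hypothetical cube with a same-shape, non-string ancillary variable.
data = np.random.random((180, 360)).astype(np.float32)
cube = Cube(data, long_name="air_temperature", units="K")
flags = AncillaryVariable(
    np.zeros(data.shape, dtype=np.int8), long_name="status_flag"
)
cube.add_ancillary_variable(flags, data_dims=(0, 1))

# These kwargs compress the cube data payload; with this change they are
# now also applied to same-shape, non-string auxiliary coordinates and
# ancillary variables, such as "status_flag" above.
iris.save(
    cube, "compressed.nc", zlib=True, complevel=4, shuffle=True, fletcher32=True
)
```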

Closes #6539

Ping @pasc85


Consult Iris pull request check list


Add any of the below labels to trigger actions on this PR:

  • benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts

@bjlittle bjlittle marked this pull request as draft July 8, 2025 08:25
codecov bot commented Jul 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.90%. Comparing base (e15c079) to head (86499ce).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6552   +/-   ##
=======================================
  Coverage   89.89%   89.90%           
=======================================
  Files          90       90           
  Lines       24146    24153    +7     
  Branches     4493     4496    +3     
=======================================
+ Hits        21707    21714    +7     
  Misses       1679     1679           
  Partials      760      760           


@bjlittle bjlittle marked this pull request as ready for review July 8, 2025 10:33
@HGWright HGWright requested a review from pp-mo July 9, 2025 09:07
@pasc85 commented Jul 9, 2025

I tested the new feature, and that went really well! The notebook I used works as follows:

  • load a diagnostic on pressure levels that has status flag information
  • three samples: one from a big model (182M entries) and two from a smaller model (7.8M)
  • cf. the __original files below
  • remove the status flag data and add it again, either as an ancillary variable or
    as an auxiliary coordinate, and either as an int8 or as a float32
  • there's also an option where the status flag information isn't added back, for comparison
  • so, 5 modes in total: data_only, anc_float, anc_int, aux_float, aux_int (sketched below)
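
A rough sketch of what that workflow might look like (the file names, the `status_flag` identifier, and the compression settings are my assumptions, not the actual notebook):

```python
import numpy as np

import iris
from iris.coords import AncillaryVariable, AuxCoord

cube = iris.load_cube("sample__original.nc")  # hypothetical input file

# Detach the status flag payload so it can be re-added in different forms.
flags = cube.ancillary_variable("status_flag")  # assumed variable name
flag_data = flags.data
cube.remove_ancillary_variable(flags)
dims = tuple(range(cube.ndim))

modes = [
    ("anc_int", np.int8), ("anc_float", np.float32),
    ("aux_int", np.int8), ("aux_float", np.float32),
]
for mode, dtype in modes:
    out = cube.copy()
    payload = flag_data.astype(dtype)
    if mode.startswith("anc"):
        out.add_ancillary_variable(
            AncillaryVariable(payload, long_name="status_flag"), data_dims=dims
        )
    else:
        out.add_aux_coord(AuxCoord(payload, long_name="status_flag"), data_dims=dims)
    iris.save(out, f"{mode}_{out.data.size}.nc", zlib=True, complevel=4)

# The "data_only" mode: save without re-attaching the flags at all.
iris.save(cube, f"data_only_{cube.data.size}.nc", zlib=True, complevel=4)
```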

The output below shows the size of the produced files; there are two sections:

  • "before" the changes: has been run with my current stage development environment
  • "after" the changes: run with minimal environment with @bjlittle's branch pip-installed in it

The numbers in the file names below are the number of entries in the cube, which is useful for
checking the change in file size when the data type is changed (only in the "before" files).

[screenshot: listing of the produced file names and sizes for the "before" and "after" runs]

The results are very good:

  • the "after" cubes including status_flag are not much larger than the data_only cubes
  • ... and they are much smaller than the "before" versions
  • int8 is preferable to float32
  • no difference between using an ancillary variable or an auxiliary coordinate
  • neither the cube data nor the status flag data have changed

So, this is great -- thank you very much!

@bjlittle (Member, Author) commented Jul 9, 2025

@pasc85 Brilliant!

Thanks for taking the time and effort to test this and also share the comprehensive results 🥇

Looks like this change is a good win for the community, and your testing gives us confidence that it's fit for purpose 👍

@pp-mo (Member) commented Jul 11, 2025

> I tested the new feature, and that went really well! The notebook I used works as follows:

This is really useful.
Just for completeness though, can you explain what compression settings you applied?
If the script is short, it would be great to just share it.

Apart from general helpfulness, my reason was ...
I was wondering whether to raise the possible criticism that the same settings might not apply equally well to ancils/aux-coords of different dtypes as to the main data.
And I noted your point that the float32 status flag did not compress as well as the byte one.


The thing is, this PR is a great quick win, but I have a bit of a hidden agenda!
I think it would ultimately be much neater and more appropriate to replace our keyword controls for compression, packing and fill-value with a context-manager approach, as we have done with loading controls -- notably the load chunking control, which allows us to apply different chunking to different file variables.
That type of separate management makes the primary method API tidier, and it is easier to apply more detailed context.
Whereas, as we have it, the settings apply per cube being saved, so they must be used for all components of the cube.
(BTW, another great benefit is that it avoids passing settings down through all intermediate routines, as seen in the changes here!)
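
Purely to illustrate the shape of that idea, a hypothetical sketch modelled on the existing CHUNK_CONTROL loading context manager (none of these saving names exist in Iris today):

```python
import iris

# Hypothetical API, named here only for illustration: per-variable settings
# would travel via context state instead of being threaded through every
# intermediate saver routine as keyword arguments.
from iris.fileformats.netcdf.saver import COMPRESSION_CONTROL  # does not exist

cube = iris.load_cube("input.nc")  # assumed input file

with COMPRESSION_CONTROL.set("air_temperature", zlib=True, complevel=4), \
     COMPRESSION_CONTROL.set("status_flag", zlib=True, complevel=9, shuffle=True):
    # Each file variable gets its own compression settings, rather than one
    # set of kwargs applied to the cube and all of its components alike.
    iris.save(cube, "output.nc")
```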

@bjlittle what do you think of this?
I don't think it can fairly be raised as a criticism of this PR, as that would be far too much of a breaking change to introduce at this point. But it does show up the limitations. Perhaps we can bank the idea somewhere else.

@bjlittle (Member, Author) commented Jul 11, 2025

@pp-mo Great idea, but there's no major benefit here given the effort required for this particular use case. I'd be keen to bank this pull-request "as is" without straying off the path too far.

> Apart from general helpfulness, my reason was ... I was wondering whether to raise the possible criticism that the same settings might not apply equally well to ancils/aux-coords of different dtypes as to the main data. And I noted your point that the float32 status flag did not compress as well as the byte one.

As it stands we're getting a roughly 95%-or-better improvement in compression footprint ... further optimisations would be marginal.

Also, this type of compression is only (easily) applicable to NetCDF and not to other data formats; we're getting a lot for free from the netCDF4 package.

As a compromise, why not raise a separate dedicated issue capturing your thoughts, @pp-mo, and we can start a conversation there? Perhaps we can agree on a way forward as a separate concern and target Iris 3.14?

@pp-mo (Member) requested changes Jul 11, 2025

The code looks really neat, but I'm not sure about the testing yet

@bjlittle (Member, Author)

@pp-mo Tweaked the non-compression tests to do as I intended, nice spot 👀

Also added some comments to clarify 👍

@pp-mo (Member) commented Jul 11, 2025

> @pp-mo Great idea, but there's no major benefit here given the effort required for this particular use case. I'd be keen to bank this pull-request "as is" without straying off the path too far.

Absolutely

> Also, this type of compression is only (easily) applicable to NetCDF and not to other data formats; we're getting a lot for free from the netCDF4 package.

Yes, and in fact the criticism applies more directly to the 'packing' arguments, which aren't covered here.

> As a compromise, why not raise a separate dedicated issue capturing your thoughts, @pp-mo, and we can start a conversation there? Perhaps we can agree on a way forward as a separate concern and target Iris 3.14?

If anything, given the breaking change nature, perhaps more of an Iris 4 thing?

@bjlittle (Member, Author) commented Jul 11, 2025

> If anything, given the breaking change nature, perhaps more of an Iris 4 thing?

You're proposing not to merge this and to leave @pasc85 blocked for at least another 5 months waiting for Iris 3.14? Really? 😬 Or have I misunderstood?

I wouldn't see what you're proposing beyond this PR as a breaking change, certainly not in the API. Additionally, we could still support the compression behaviour proposed here as a minimal default moving forwards.

Also, I'm talking about netCDF compression here and not straying into the minefield of packing, netCDF or otherwise. Anyway, criticism accepted, and overdue, but out of scope for this PR 😜

@pp-mo (Member) commented Jul 14, 2025

> I'm talking about netCDF compression here and not straying into the minefield of packing, netCDF or otherwise

Ah OK, that wasn't really a clear distinction to me, but point taken.


> I wouldn't see what you're proposing beyond this PR as a breaking change, certainly not in the API. Additionally, we could still support the compression behaviour proposed here as a minimal default moving forwards.

Well, I was suggesting we might replace the save keywords with a context manager, which I think is pretty breaking. I'm not sure we could combine the two effectively, since part of the point would be the ability to compress cube components differently from the cube itself. So, I thought of it as a bit of a future wishlist item really.

@pp-mo pp-mo dismissed their stale review July 14, 2025 11:19

All good now

@pp-mo pp-mo enabled auto-merge (squash) July 14, 2025 11:19
@pp-mo (Member) left a review

All good now.

@pp-mo pp-mo merged commit f8030d0 into SciTools:main Jul 14, 2025
21 checks passed
Successfully merging this pull request may close these issues.

Extend Cube.data compression to coordinates (#6539)