-
Notifications
You must be signed in to change notification settings - Fork 297
Extend Cube Metadata NetCDF zlib compression #6552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6552 +/- ##
=======================================
Coverage 89.89% 89.90%
=======================================
Files 90 90
Lines 24146 24153 +7
Branches 4493 4496 +3
=======================================
+ Hits 21707 21714 +7
Misses 1679 1679
Partials 760 760 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I tested the new feature, and that went really well! The notebook I used works as follows:
The output below shows the size of the produced files; there are two sections:
The numbers in the file names below are the number of entries in the cube, which is useful for ![]() The results are very good:
So, this is great -- thank you very much! |
@pasc85 Brilliant! Thanks for taking the time and effort to test this and also share the comprehensive results 🥇 Looks like this this change is a good win for the community, and your testing give us confidence that it's fit for purpose 👍 |
6c031e0
to
3d92777
Compare
This is really useful. Apart from general helpfulness, my reason was ... The thing is, this PR is a great quick win, but I have a bit of a hidden agenda !! @bjlittle what do you think of this ? |
@pp-mo Great idea, but there's no major benefit here given the effort required for this particular use case. I'd be keen to bank this pull-request "as is" without straying off the path too far.
As it stands we're getting ~>95% improvement in compression footprint ... further optimisations would be minimal. Also, this type of compression is only applicable to NetCDF and not other data formats (easily); we're getting a lot for free from the As a compromise why not raise a separate dedicated issue capturing your thoughts @pp-mo, and we can start a conversation there? Perhaps we can agree on a way forward as a separate concern and target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks really neat, but I'm not sure about the testing yet
@pp-mo Tweaked the non-compression tests to do as I intended, nice spot 👀 Also added some comments to clarify 👍 |
Absolutely
Yes, and in fact the criticism applies more directly to the 'packing' arguments, which aren't covered here.
If anything, given the breaking change nature, perhaps more of an Iris 4 thing? |
You're proposing to not merge this and leave @pasc85 blocked for at least another 5 months to wait from I wouldn't see what you're proposing beyond this PR as a breaking change, certainly not in the API. Additionally we could still support the compression behaviour proposed here moving forwards as a minimal default. Also, I'm talking about netCDF compression here and not straying into the minefield of packing, netCDF or otherwise. Anyways, criticism accepted and overdue but out of scope for this PR 😜 |
a08105e
to
20ca1bb
Compare
20ca1bb
to
bea3580
Compare
Ah ok, wasn't really a clear distinction to me, but point taken.
Well, I was suggesting we might replace the save keywords with a context manager, which I think is pretty breaking. I'm not sure we could combine the two effectively, since part of the point would be to be able to compress cube components differently to the cube. So, I thought a bit future wishlist really. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good now.
🚀 Pull Request
Description
Extend NetCDF
zlib
compression serialization to also include auxiliary coordinates and ancillary variables that have the same shape as the associated parent cube data payload.Does not apply to metadata with string dtypes.
The
complevel
,fletcher32
,shuffle
andzlib
kwargs applied to the cube data payload will also be applied to the above applicable metadata.Closes #6539
Ping @pasc85
Consult Iris pull request check list
Add any of the below labels to trigger actions on this PR: