Allow data to be omitted from netCDF files during cfdm.write #222

Merged: 12 commits, Oct 31, 2022


davidhassell
Contributor

Fixes #221

@davidhassell davidhassell added this to the Next release milestone Oct 28, 2022
@davidhassell davidhassell marked this pull request as draft October 28, 2022 09:23
@davidhassell davidhassell marked this pull request as ready for review October 28, 2022 09:32
@davidhassell
Contributor Author

OK - I've finished fiddling - good to review, now, thanks.

@codecov

codecov bot commented Oct 28, 2022

Codecov Report

Merging #222 (e298008) into master (46cf194) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head e298008 differs from pull request most recent head f580ea5. Consider uploading reports for the commit f580ea5 to get more accurate results

@@            Coverage Diff             @@
##           master     #222      +/-   ##
==========================================
+ Coverage   87.66%   87.67%   +0.01%     
==========================================
  Files         123      123              
  Lines       12631    12641      +10     
==========================================
+ Hits        11072    11082      +10     
  Misses       1559     1559              
Flag Coverage Δ
unittests 87.67% <100.00%> (+0.01%) ⬆️


Impacted Files Coverage Δ
cfdm/read_write/write.py 100.00% <ø> (ø)
cfdm/cfdmimplementation.py 83.05% <100.00%> (+0.07%) ⬆️
cfdm/read_write/netcdf/netcdfwrite.py 86.96% <100.00%> (+0.08%) ⬆️


Member

@sadielbartholomew sadielbartholomew left a comment

This generally looks great and implements the feature request in an elegant way. Apart from one suggestion relating to testing (see in-line), I have nothing to fault or suggest. However, I noticed a few things while reviewing that I want to check are expected, in case they are issues that require some tweaking.

Namely, to investigate the new keyword I wrote out all of the example fields with omit_data="all" set, and eyeballed the dump on each field when read back in to compare with the original.

Everything looked good: the re-read fields show masked data, e.g. [[--, ..., --]] instead of [[1.0, ..., 8.0]], and the metadata is otherwise reported as identical, with the following exceptions. All of these differences were noticed in the 6th and 7th fields:

  • example field 6: the Auxiliary coordinate: ncvar%z construct gets re-labelled as Auxiliary coordinate: altitude, which corresponds to the standard name of the bounds, but I don't think there should be a change in the label like that? Explicitly I see:

    - Auxiliary coordinate: ncvar%z
    + Auxiliary coordinate: altitude
        Geometry: polygon
        Bounds:axis = 'Z'
        Bounds:standard_name = 'altitude'
        Bounds:units = 'm'
        Bounds:Data(cf_role=timeseries_id(2), 3, 4) = [[[1.0, ..., --]]] m
        Interior Ring:Data(cf_role=timeseries_id(2), 3) = [[0, ..., --]]

    where I am also wondering why the first few Data points remain for the bounds and interior ring, i.e. why they don't change and show as [[--, ..., --]] like the other data that gets omitted?

  • example field 7: the _FillValue = -1073741824.0 changes slightly to _FillValue = -1073741800.0, is that expected and/or harmless?

  • generally, is it expected that datetimes are reported after omit_data and re-reading as e.g. Data(time(1)) = [0] (i.e. does this indicate an under-the-hood issue, not just a representation quirk)? Also, multiple datetimes sometimes have a 0 entry, e.g. Data(time(3)) = [--, 0, --] gregorian (from the Dimension coordinate: time construct of example field 7), though sometimes not, e.g. Bounds:Data(time(3), 2) = [[--, ..., --]] gregorian from the same construct.
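
The check described above (write each example field with omit_data="all", read it back, and compare dumps) could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used in the review: dump_diff and compare_example_fields are hypothetical helper names, and it assumes a cfdm version that provides cfdm.example_fields() and the omit_data keyword added by this PR.

```python
import difflib
import os
import tempfile


def dump_diff(before, after):
    """Return only the lines that differ between two dump strings.

    Removed lines are prefixed with "- " and added lines with "+ ".
    """
    return [
        line
        for line in difflib.ndiff(before.splitlines(), after.splitlines())
        if line.startswith(("- ", "+ "))
    ]


def compare_example_fields(omit_data="all"):
    """Write each example field without data, re-read it, and diff the dumps.

    Hypothetical helper: cfdm is imported lazily so that dump_diff can be
    used on plain strings even when cfdm is not installed.
    """
    import cfdm

    report = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for n, f in enumerate(cfdm.example_fields()):
            path = os.path.join(tmpdir, f"example_{n}.nc")
            cfdm.write(f, path, omit_data=omit_data)
            g = cfdm.read(path)[0]
            # Dump while the temporary file still exists
            report[n] = dump_diff(f.dump(display=False), g.dump(display=False))
    return report
```

Calling compare_example_fields() returns a dictionary keyed by example field number; lines showing masked values such as [[--, ..., --]] are the expected differences, and anything else is worth a closer look.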

cfdm/read_write/write.py
cfdm/test/test_read_write.py
@davidhassell
Contributor Author

where I am also wondering why the first few Data points remain for the bounds and interior ring, i.e. why they don't change and show as [[--, ..., --]] like the other data that gets omitted?

e0d61cb passes on the omission to the interior ring

@davidhassell
Contributor Author

Data(time(1)) = [0]?

This one's an unrelated bug. If we look at the actual array we get:

>>> t = g.construct('time')
>>> t.data
<Data(3): [--, 0, --] gregorian>   # Wrong
>>> t.data.array 
masked_array(data=[--, --, --],    # Right
             mask=[ True,  True,  True],
       fill_value=1e+20,
            dtype=float64)

So the underlying data is correct, but the str (which is called by the repr) representation is wrong. Let's raise another issue for that.

@davidhassell
Contributor Author

davidhassell commented Oct 28, 2022

example field 7: the _FillValue = -1073741824.0 changes slightly to _FillValue = -1073741800.0, is that expected and/or harmless?

Unexpected, but harmless for now! It happens through a write/read cycle at v1.10.0.0, too, so isn't related to omit_data. Might be a rounding concern (I also looked at some double precision cases). One for another issue, too.
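
One possible explanation for the rounding, offered only as a hypothesis and not established in this thread: the two values are indistinguishable at 32-bit precision, and the value printed after the round trip is the original kept to 8 significant digits. The sketch below uses only the standard library; as_float32 is a hypothetical helper, and whether the variable in example field 7 is actually stored as a 32-bit float is an assumption.

```python
import struct

original = -1073741824.0  # value reported before the write/read cycle (-2**30)
observed = -1073741800.0  # value reported after the write/read cycle


def as_float32(x):
    """Round a Python float to the nearest 32-bit float and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Keeping only 8 significant digits turns the original into the observed value
eight_digits = float(f"{original:.8g}")

# Both values round to the same 32-bit float, so nothing is lost at that precision
same_float32 = as_float32(original) == as_float32(observed)

print(eight_digits == observed, same_float32)
```

If this is the mechanism, the change would be harmless for 32-bit data; it would not by itself explain any differences seen in the double precision cases mentioned above.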

@davidhassell
Contributor Author

example field 6: the Auxiliary coordinate: ncvar%z construct gets re-labelled as Auxiliary coordinate: altitude,

Again, unrelated (thankfully). If you inspect f and g after the following, where f is read from disk rather than created ab initio, all is fine:

>>> import cfdm
>>> f = cfdm.example_field(6)
>>> cfdm.write(f, 'delme1.nc')
>>> cfdm.write(f, 'delme2.nc', omit_data='all')
>>> f = cfdm.read('delme1.nc')[0]
>>> g = cfdm.read('delme2.nc')[0]

I'll have a look in the example field definition to see if it's obvious why "altitude" doesn't appear in the first place ...

@sadielbartholomew
Member

Thanks David for drilling down to investigate all of the facets. Always good to catch a bug even if it isn't immediately related 😄

So as far as I can tell from your recent comments, #222 (comment) is the only issue directly related to this PR, and you have put in a fix, is that right? In which case I just need to re-review with the new commit and then we're good to go?

@davidhassell
Contributor Author

davidhassell commented Oct 28, 2022

example field 6: the Auxiliary coordinate: ncvar%z construct gets re-labelled as Auxiliary coordinate: altitude,

This turns out to be a feature! It's to do with the unusual case where a coordinate construct has bounds but no data. In this case we shouldn't be assigning an ncvar to the coordinates (which is what example 6 was doing). This commit 3ae74fe makes things consistent.

@davidhassell
Contributor Author

Can we also do a quick dump on the re-read field g

I've added to the test:

        # Check that a dump works
        g.dump(display=False)

Is that what you meant?

@sadielbartholomew
Member

sadielbartholomew commented Oct 28, 2022

Is that what you meant?

Yes, precisely, just test that it doesn't fail miserably due to some facet of missing data.
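
For context, a standalone smoke test along these lines might look like the sketch below. It is not the test in cfdm/test/test_read_write.py: the class name, the choice of example field 0, and the temporary file name are all assumptions, and the test is skipped when cfdm is not installed.

```python
import importlib.util
import os
import tempfile
import unittest

# Only run the test when cfdm can be imported
HAVE_CFDM = importlib.util.find_spec("cfdm") is not None


@unittest.skipUnless(HAVE_CFDM, "cfdm is not installed")
class OmitDataDumpTest(unittest.TestCase):
    """Hypothetical smoke test: dumping a field written without data works."""

    def test_dump_after_omit_data(self):
        import cfdm

        f = cfdm.example_field(0)
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "omit_data.nc")
            cfdm.write(f, path, omit_data="all")
            g = cfdm.read(path)[0]
            # Check that a dump works and returns a string rather than raising
            self.assertIsInstance(g.dump(display=False), str)
```

The assertion is deliberately weak: it only confirms that dump does not raise on a field whose data were omitted, which is the behaviour discussed here.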

@davidhassell
Contributor Author

Great - are we good to go on this? (notwithstanding the two new issues that have been spawned :))

@davidhassell
Contributor Author

(sorry - I missed #222 (comment), but I think we've ended up in the same place!)

@sadielbartholomew
Member

(sorry - I missed #222 (comment), but I think we've ended up in the same place!)

No worries! I also missed your comments afterwards so my apologies too.

I'm just taking a final look and will let you know in a moment.

Member

@sadielbartholomew sadielbartholomew left a comment

All questions I raised in first review have been answered and problems emerging from those have either been addressed by new commits that I've since reviewed, or delegated to (intended) new issues (or identified as a feature! 😆):

... notwithstanding the two new issues that have been spawned :) ...

so we just need to open these two (#222 (comment) and #222 (comment)), possibly just by linking to your comments. I can't see any issues referenced or on the issue tracker, so I assume they haven't been created yet. Shall I do that or would you like to?

All good, though! Thanks. Please merge.

@davidhassell
Contributor Author

Thanks, Sadie - Happy for you to raise new issues :) (but just ping me otherwise)

@davidhassell davidhassell merged commit 4d3f599 into NCAS-CMS:master Oct 31, 2022
@davidhassell davidhassell deleted the write-no-data branch November 14, 2022 08:55
@davidhassell davidhassell added the netCDF write Relating to writing netCDF datasets label Nov 15, 2023