Data which is averaged over some area should have cell methods, and not just rely on names #3

Open
bnlawrence opened this issue Nov 3, 2022 · 3 comments
Labels
output-cf-compliance Relates to the CF compliance of the output

Comments

@bnlawrence
Contributor

The following piece of code finds all the files which have the standard_name of surface_temperature and the long_name of OPEN SEA SURFACE TEMP AFTER TIMESTEP. (This is from the one-file-per-field output, but I don't think that's relevant to the problem.)

>>> files = index['surface_temperature:OPEN SEA SURFACE TEMP AFTER TIMESTEP']
>>> print(files)
['1m_0h__m01s00i507_2_195001-195001.nc', '1m_12h__m01s00i507_6_195001-195001.nc', 
   '1m_15h__m01s00i507_7_195001-195001.nc', '1m_18h__m01s00i507_8_195001-195001.nc', 
   '1m_21h__m01s00i507_9_195001-195001.nc', '1m_3h__m01s00i507_3_195001-195001.nc', 
   '1m_6h__m01s00i507_4_195001-195001.nc', '1m_9h__m01s00i507_5_195001-195001.nc', 
   '1m__m01s00i507_195001-195001.nc', '3h__m01s00i507_10_19500101-19500110.nc', 
   '3h__m01s00i507_10_19500111-19500120.nc', '3h__m01s00i507_10_19500121-19500130.nc']
>>> flds = index.get_fields('surface_temperature:OPEN SEA SURFACE TEMP AFTER TIMESTEP',
...     'u-cn134-1fpf/19500101T0000Z/')
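The index object in this snippet comes from the author's own tooling, and its implementation isn't shown. As a minimal hypothetical sketch, assuming it is simply a dict from 'standard_name:long_name' keys to sorted file names (build_index and the hard-coded records are illustrative, not the real code), it could be built like this:

```python
from collections import defaultdict


def build_index(records):
    """Map 'standard_name:long_name' keys to sorted lists of file names.

    records: iterable of (filename, standard_name, long_name) tuples. How
    these are read from the netCDF files is left out of this sketch.
    """
    index = defaultdict(list)
    for filename, standard_name, long_name in records:
        index[f"{standard_name}:{long_name}"].append(filename)
    # Sort each file list so lookups are deterministic
    return {key: sorted(names) for key, names in index.items()}


LONG_NAME = "OPEN SEA SURFACE TEMP AFTER TIMESTEP"
records = [
    ("3h__m01s00i507_10_19500101-19500110.nc", "surface_temperature", LONG_NAME),
    ("1m__m01s00i507_195001-195001.nc", "surface_temperature", LONG_NAME),
    ("1m_0h__m01s00i507_2_195001-195001.nc", "surface_temperature", LONG_NAME),
]
index = build_index(records)
print(index["surface_temperature:" + LONG_NAME])
```

Plain string sorting happens to reproduce the order of the listing above, since '_' sorts after the digits.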

The code then aggregates these files into the two CF fields that are really in play:

>>> print(flds)
[<CF Field: surface_temperature(time(241), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(8), latitude(324), longitude(432)) K>]

These two fields are:

Field: surface_temperature (ncvar%m01s00i507_2)
-----------------------------------------------
Data            : surface_temperature(time(8), latitude(324), longitude(432)) K
Cell methods    : time(8): point within days time(8): mean over days
Dimension coords: latitude(324) = [-89.72222137451172, ..., 89.72222137451172] degrees_north
                : longitude(432) = [0.4166666567325592, ..., 359.5833435058594] degrees_east
Auxiliary coords: time(time(8)) = [1950-01-16 00:00:00, ..., 1950-01-16 21:00:00] 360_day

Field: surface_temperature (ncvar%m01s00i507_10)
------------------------------------------------
Data            : surface_temperature(time(241), latitude(324), longitude(432)) K
Cell methods    : time(241): mean (interval: 900 s)
Dimension coords: latitude(324) = [-89.72222137451172, ..., 89.72222137451172] degrees_north
                : longitude(432) = [0.4166666567325592, ..., 359.5833435058594] degrees_east
Auxiliary coords: time(time(241)) = [1950-01-01 01:30:00, ..., 1950-01-30 22:30:00] 360_day

In both cases there should be a cell method, conforming to the relevant part of the CF conventions, that records the area restriction (here, open sea) instead of relying on the long name alone. All long names should be checked for such area averaging and the appropriate cell methods added.
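A rough sketch of such a check, assuming the area restriction can be spotted from phrases in the long name. The phrase table, the labels and the helper names (area_qualifier, flag_unqualified) are hypothetical, and the labels would still need checking against the CF area-type table before being used in a real cell method:

```python
import re

# Hypothetical phrase -> area label table; labels need checking against CF.
AREA_PHRASES = {
    "OPEN SEA": "open_sea",
    "SEA ICE": "sea_ice",
    "LAND": "land",
}


def area_qualifier(long_name):
    """Return the area label implied by a long name, or None."""
    upper = long_name.upper()
    for phrase, label in AREA_PHRASES.items():
        # Word boundaries stop e.g. "LAND" matching inside "ISLAND"
        if re.search(rf"\b{re.escape(phrase)}\b", upper):
            return label
    return None


def flag_unqualified(fields):
    """Return (long_name, label) for fields whose cell methods lack 'where'."""
    flagged = []
    for long_name, cell_methods in fields:
        label = area_qualifier(long_name)
        if label is not None and not re.search(r"\bwhere\b", cell_methods):
            flagged.append((long_name, label))
    return flagged


# The two fields above: same long name, neither has an area cell method
fields = [
    ("OPEN SEA SURFACE TEMP AFTER TIMESTEP",
     "time: point within days time: mean over days"),
    ("OPEN SEA SURFACE TEMP AFTER TIMESTEP", "time: mean (interval: 900 s)"),
]
print(flag_unqualified(fields))
```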

@bnlawrence bnlawrence added the output-cf-compliance Relates to the CF compliance of the output label Nov 3, 2022
@bnlawrence bnlawrence added this to the Pre-Production-Data-Checks milestone Nov 3, 2022
@bnlawrence bnlawrence changed the title Data which is averaged over some area should have cell methods, and not just rely on standard names Data which is averaged over some area should have cell methods, and not just rely on names Nov 3, 2022
@grenville

grenville commented Nov 4, 2022

STASH_to_CF.txt might already provide this, or could be modified to do so. For m01s00i507 and its neighbours it has:

0!506!surface_temperature!K!where_land!
0!507!surface_temperature!K!where_open_sea!
0!508!surface_temperature!K!where_sea_ice!
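If the fifth '!'-separated column is indeed an area qualifier, a cell method could be generated from it. This is a hypothetical sketch: the column meanings, the m01 model prefix and the choice of mean as the area method are all assumptions, not the file's documented format.

```python
def parse_stash_to_cf(line):
    """Split one '!'-separated STASH_to_CF.txt entry into named parts.

    Assumed column order: section, item, standard_name, units, qualifier.
    """
    section, item, standard_name, units, qualifier = line.split("!")[:5]
    return {
        # Assumes model 1, matching identifiers like m01s00i507
        "stash": f"m01s{int(section):02d}i{int(item):03d}",
        "standard_name": standard_name,
        "units": units,
        "qualifier": qualifier,
    }


def area_cell_method(entry, method="mean"):
    """Turn a 'where_<type>' qualifier into 'area: <method> where <type>'."""
    qualifier = entry["qualifier"]
    if not qualifier.startswith("where_"):
        return None
    return f"area: {method} where {qualifier[len('where_'):]}"


for line in [
    "0!506!surface_temperature!K!where_land!",
    "0!507!surface_temperature!K!where_open_sea!",
    "0!508!surface_temperature!K!where_sea_ice!",
]:
    entry = parse_stash_to_cf(line)
    print(entry["stash"], area_cell_method(entry))
```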

@davidhassell

Hi, this aggregation behaviour is a feature! It happens because the time coordinates are encoded as auxiliary coordinates rather than dimension coordinates.

The solution is to fix the netCDF files, which is already on my list :)

The aggregation rules are different for axes without dimension coordinates, because dimension coordinates carry more restrictions (e.g. they must be strictly monotonically increasing or decreasing).
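The strict-monotonicity requirement is easy to state in code. A small stand-alone sketch (is_strictly_monotonic is a hypothetical helper, not part of cf-python):

```python
def is_strictly_monotonic(values):
    """True if each value is strictly greater (or strictly less) than the last."""
    steps = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in steps)
    decreasing = all(a > b for a, b in steps)
    return increasing or decreasing


print(is_strictly_monotonic([0, 3, 6, 9]))   # strictly increasing
print(is_strictly_monotonic([9, 6, 3, 0]))   # strictly decreasing
print(is_strictly_monotonic([0, 3, 3, 6]))   # repeated value
print(is_strictly_monotonic([0, 12, 3, 6]))  # out of order
```

Auxiliary coordinates have no such constraint, which is why the two kinds of axis are aggregated differently.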

When I move the time coordinates from auxiliary coordinates to dimension coordinates:

>>> import cf
>>> # Read each file as a separate field, with no aggregation
>>> f = cf.read(['1m_0h__m01s00i507_2_195001-195001.nc', '1m_12h__m01s00i507_6_195001-195001.nc',
...       '1m_15h__m01s00i507_7_195001-195001.nc', '1m_18h__m01s00i507_8_195001-195001.nc',
...       '1m_21h__m01s00i507_9_195001-195001.nc', '1m_3h__m01s00i507_3_195001-195001.nc',
...       '1m_6h__m01s00i507_4_195001-195001.nc', '1m_9h__m01s00i507_5_195001-195001.nc',
...       '1m__m01s00i507_195001-195001.nc', '3h__m01s00i507_10_19500101-19500110.nc',
...       '3h__m01s00i507_10_19500111-19500120.nc', '3h__m01s00i507_10_19500121-19500130.nc'],
...      aggregate=False)
...
>>> for i in f:
...     # Find the time axis, then remove its auxiliary coordinate
...     axis_t = i.domain_axis('T', key=True)
...     aux_t = i.del_construct('T')
...     # Put the same values back as a dimension coordinate on that axis
...     i.set_construct(cf.DimensionCoordinate(source=aux_t), axes=axis_t)
... 

and then aggregate, I get the expected result:

>>> cf.aggregate(f, verbose=2)
Unaggregatable 'surface_temperature' fields have been output: 'time' dimension coordinate ranges overlap: [869400.0, 1722600.0], [1296000.0, 1296000.0]
[<CF Field: surface_temperature(time(1), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(8), latitude(324), longitude(432)) K>]

This is perhaps not the three fields you might have imagined: when faced with an unresolvable ambiguity, the aggregation can't do anything along that axis. The ambiguity here is that one field (the monthly mean) has a time range that overlaps ambiguously with those of the short-period fields.
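The overlap reported in the message above can be checked by hand: two closed ranges overlap when the larger of their start points is no greater than the smaller of their end points. A minimal sketch using the values from the message (ranges_overlap is a hypothetical helper, not cf-python's internal test):

```python
def ranges_overlap(a, b):
    """True if closed ranges a = (start, end) and b = (start, end) share a point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Values copied from the aggregation message above
print(ranges_overlap((869400.0, 1722600.0), (1296000.0, 1296000.0)))
```

The single time value of one field lies inside the range of the other, so there is no unambiguous way to order them along the time axis.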

The fix for that is to not include the monthly mean in the same aggregation command.
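One way to do that, sketched under the assumption (taken from the file names listed earlier) that the monthly mean is the only file whose name starts with 1m__, is to split the file list before reading. split_monthly_mean is hypothetical; each resulting list could then be passed to its own cf.read call, which is left out here so the sketch runs without the data.

```python
def split_monthly_mean(files, prefix="1m__"):
    """Return (files starting with prefix, all other files), keeping order."""
    monthly = [name for name in files if name.startswith(prefix)]
    others = [name for name in files if not name.startswith(prefix)]
    return monthly, others


files = [
    "1m_0h__m01s00i507_2_195001-195001.nc",
    "1m__m01s00i507_195001-195001.nc",
    "3h__m01s00i507_10_19500101-19500110.nc",
]
monthly, others = split_monthly_mean(files)
print(monthly)
print(others)
```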

@davidhassell

I've just remembered that you can use the equal keyword of cf.aggregate to separate fields by property. For example, to group fields that have common values of their interval_write and interval_operation properties, you could pass it through the aggregate keyword of cf.read:

>>> f = cf.read('*.nc', aggregate={'equal': ['interval_write', 'interval_operation']}) 

This will then correctly aggregate the monthly means and daily means in one aggregation call.

That said, there is a trivial little bug in the current code that stops this from working, but I've fixed it for the next release of cf-python (i.e. very soon).
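Roughly, equal stops fields from being combined unless their listed properties match, which amounts to partitioning the fields before aggregating them. The sketch below shows only that partitioning idea on plain dicts; the property values are made up, and group_by_properties is a hypothetical helper, not cf-python's implementation.

```python
from collections import defaultdict


def group_by_properties(items, keys):
    """Group property dicts by their values for `keys` (missing -> None)."""
    groups = defaultdict(list)
    for item in items:
        signature = tuple(item.get(key) for key in keys)
        groups[signature].append(item)
    return dict(groups)


# Made-up property values standing in for real field properties
items = [
    {"name": "monthly mean", "interval_write": "1 month", "interval_operation": "900 s"},
    {"name": "3-hourly, part 1", "interval_write": "3 h", "interval_operation": "900 s"},
    {"name": "3-hourly, part 2", "interval_write": "3 h", "interval_operation": "900 s"},
]
groups = group_by_properties(items, ["interval_write", "interval_operation"])
for signature, members in groups.items():
    print(signature, [m["name"] for m in members])
```

Aggregation then only has to resolve each group on its own, so the monthly mean never competes with the 3-hourly fields for the same stretch of the time axis.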
