Data which is averaged over some area should have cell methods, and not just rely on names #3

bnlawrence · 2022-11-03T15:28:34Z

The following code finds all the files which have the standard_name surface_temperature and the long_name OPEN SEA SURFACE TEMP AFTER TIMESTEP. (This is from the one-file-per-field output, but I don't think that's relevant to the problem.)

>>> files = index['surface_temperature:OPEN SEA SURFACE TEMP AFTER TIMESTEP']
>>> print(files)
['1m_0h__m01s00i507_2_195001-195001.nc', '1m_12h__m01s00i507_6_195001-195001.nc', 
   '1m_15h__m01s00i507_7_195001-195001.nc', '1m_18h__m01s00i507_8_195001-195001.nc', 
   '1m_21h__m01s00i507_9_195001-195001.nc', '1m_3h__m01s00i507_3_195001-195001.nc', 
   '1m_6h__m01s00i507_4_195001-195001.nc', '1m_9h__m01s00i507_5_195001-195001.nc', 
   '1m__m01s00i507_195001-195001.nc', '3h__m01s00i507_10_19500101-19500110.nc', 
   '3h__m01s00i507_10_19500111-19500120.nc', '3h__m01s00i507_10_19500121-19500130.nc']
>>> flds = index.get_fields('surface_temperature:OPEN SEA SURFACE TEMP AFTER TIMESTEP',
...     'u-cn134-1fpf/19500101T0000Z/')

It then aggregates them into the two CF fields that are really in play:

>>> print(flds)
[<CF Field: surface_temperature(time(241), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(8), latitude(324), longitude(432)) K>]

These two fields are:

Field: surface_temperature (ncvar%m01s00i507_2)
-----------------------------------------------
Data            : surface_temperature(time(8), latitude(324), longitude(432)) K
Cell methods    : time(8): point within days time(8): mean over days
Dimension coords: latitude(324) = [-89.72222137451172, ..., 89.72222137451172] degrees_north
                : longitude(432) = [0.4166666567325592, ..., 359.5833435058594] degrees_east
Auxiliary coords: time(time(8)) = [1950-01-16 00:00:00, ..., 1950-01-16 21:00:00] 360_day

Field: surface_temperature (ncvar%m01s00i507_10)
------------------------------------------------
Data            : surface_temperature(time(241), latitude(324), longitude(432)) K
Cell methods    : time(241): mean (interval: 900 s)
Dimension coords: latitude(324) = [-89.72222137451172, ..., 89.72222137451172] degrees_north
                : longitude(432) = [0.4166666567325592, ..., 359.5833435058594] degrees_east
Auxiliary coords: time(time(241)) = [1950-01-01 01:30:00, ..., 1950-01-30 22:30:00] 360_day

In both cases there should also be a cell method, conforming to the relevant part of the CF conventions, describing the averaging over open sea, rather than relying on the long name to convey it. All long names should be checked for such averaging and the appropriate cell methods added.
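As a rough illustration of checking long names for such averaging, the sketch below scans a long name for surface-type phrases and suggests a cell method of the CF form "area: mean where <area_type>". The phrase table AREA_PHRASES and the function suggest_area_cell_method are hypothetical, and pairing OPEN SEA with the ice_free_sea area type is an assumption that would need checking against the CF area type table.

```python
import re

# Hypothetical table mapping phrases found in UM long names to CF area
# types. These pairings are assumptions and must be checked against the
# CF area type table before use.
AREA_PHRASES = {
    "OPEN SEA": "ice_free_sea",
    "SEA ICE": "sea_ice",
    "LAND": "land",
}


def suggest_area_cell_method(long_name):
    """Return an 'area: mean where <area_type>' string if the long name
    mentions one of the phrases above, otherwise None."""
    text = long_name.upper()
    for phrase, area_type in AREA_PHRASES.items():
        # Whole-word match, so that e.g. 'LAND' does not match 'ISLAND'
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return f"area: mean where {area_type}"
    return None


print(suggest_area_cell_method("OPEN SEA SURFACE TEMP AFTER TIMESTEP"))
```

A field whose long name matched could then be given the suggested cell method; this sketch only produces the string and does not modify any field.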


grenville · 2022-11-04T09:56:22Z

STASH_to_CF.txt might provide this, or could be modified to do so. The entries for m01s00i507 and its neighbours are:

0!506!surface_temperature!K!where_land!
0!507!surface_temperature!K!where_open_sea!
0!508!surface_temperature!K!where_sea_ice!
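If STASH_to_CF.txt were used as the source, each entry could be turned into a cell method along these lines. This is a minimal sketch assuming that the "!"-separated layout is section, item, standard name, units and condition (inferred from the three entries above), that the model is always m01, and that the text after "where_" can be used as the area type. That last assumption is unlikely to hold in general (open_sea, for example, may not be a valid CF area type), so a translation table would probably be needed. The function name parse_stash_line is hypothetical.

```python
def parse_stash_line(line):
    """Split one STASH_to_CF.txt entry and derive a cell method from its
    condition column.

    Assumed layout: section!item!standard_name!units!condition!
    """
    section, item, standard_name, units, condition = line.strip().split("!")[:5]
    # Assumption: all entries refer to model m01
    stash = f"m01s{int(section):02d}i{int(item):03d}"
    cell_method = None
    if condition.startswith("where_"):
        # Assumption: the text after 'where_' is used verbatim as the area type
        cell_method = "area: mean where " + condition[len("where_"):]
    return stash, standard_name, units, cell_method


for entry in ["0!506!surface_temperature!K!where_land!",
              "0!507!surface_temperature!K!where_open_sea!",
              "0!508!surface_temperature!K!where_sea_ice!"]:
    print(parse_stash_line(entry))
```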

davidhassell · 2022-11-07T14:46:23Z

Hi - this is a feature! It's because the time coordinates are encoded as auxiliary coordinates, rather than dimension coordinates.

The solution is to fix the netCDF files, which is already on my list :)

The aggregation rules are different for axes without dimension coordinates, because the dimension coordinates have more restrictions (e.g. dimension coordinates must be strictly monotonically [in|de]creasing).
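The monotonicity restriction is simple to state in code. The helper below (a hypothetical name, not part of cf-python) tests whether a sequence of coordinate values is strictly increasing or strictly decreasing; repeated or out-of-order values fail, which is why values that are acceptable as auxiliary coordinates cannot always be promoted to dimension coordinates. The example times are made up.

```python
def is_strictly_monotonic(values):
    """Return True if values are strictly increasing or strictly decreasing."""
    steps = list(zip(values, values[1:]))
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)


# Made-up 3-hourly times in seconds
print(is_strictly_monotonic([5400.0, 16200.0, 27000.0]))  # True
print(is_strictly_monotonic([5400.0, 16200.0, 16200.0]))  # False
```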

When I move the time coordinates from auxiliary coordinates to dimension coordinates:

>>> import cf
>>> f = cf.read(['1m_0h__m01s00i507_2_195001-195001.nc', '1m_12h__m01s00i507_6_195001-195001.nc',
...       '1m_15h__m01s00i507_7_195001-195001.nc', '1m_18h__m01s00i507_8_195001-195001.nc',
...       '1m_21h__m01s00i507_9_195001-195001.nc', '1m_3h__m01s00i507_3_195001-195001.nc',
...       '1m_6h__m01s00i507_4_195001-195001.nc', '1m_9h__m01s00i507_5_195001-195001.nc',
...       '1m__m01s00i507_195001-195001.nc', '3h__m01s00i507_10_19500101-19500110.nc',
...       '3h__m01s00i507_10_19500111-19500120.nc', '3h__m01s00i507_10_19500121-19500130.nc'],
...      aggregate=False)
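>>> for i in f:
...     # Key of the domain axis spanned by the time (T) coordinates
...     axis_t = i.domain_axis('T', key=True)
...     # Remove the time auxiliary coordinate, keeping the returned construct
...     aux_t = i.del_construct('T')
...     # Re-insert it as a dimension coordinate on the same axis
...     i.set_construct(cf.DimensionCoordinate(source=aux_t), axes=axis_t)
...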

and then aggregate, I get the expected result:

>>> cf.aggregate(f, verbose=2)
Unaggregatable 'surface_temperature' fields have been output: 'time' dimension coordinate ranges overlap: [869400.0, 1722600.0], [1296000.0, 1296000.0]
[<CF Field: surface_temperature(time(1), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(80), latitude(324), longitude(432)) K>,
 <CF Field: surface_temperature(time(8), latitude(324), longitude(432)) K>]

This is perhaps not the three fields that you might have imagined: when faced with an unresolvable ambiguity, aggregation can't do anything along that axis. The ambiguity here is that one field (the monthly mean) has a time range that overlaps with those of the shorter-period fields.
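The ambiguity can be checked by hand. The ranges in the message above appear to be in seconds since 1950-01-01 (1296000 s is 15 days, matching the 1950-01-16 00:00 time of the monthly mean), though that is an inference rather than something stated in the message. A closed-interval overlap test, written as a hypothetical helper, shows that the monthly-mean point lies inside the range of one of the 3-hourly files:

```python
def ranges_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Ranges taken from the cf.aggregate message above
three_hourly = (869400.0, 1722600.0)
monthly_mean = (1296000.0, 1296000.0)

print(ranges_overlap(three_hourly, monthly_mean))  # True
```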

The solution, then, is not to include the monthly mean in the same aggregation command.
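One way to do that is to split the file list before reading. The sketch below assumes, from the file names listed earlier, that the monthly mean is the only file whose name starts with "1m__" (the other 1m files carry an hour tag such as "1m_0h_"); that naming rule is an inference, not a documented convention.

```python
files = [
    "1m_0h__m01s00i507_2_195001-195001.nc", "1m_12h__m01s00i507_6_195001-195001.nc",
    "1m_15h__m01s00i507_7_195001-195001.nc", "1m_18h__m01s00i507_8_195001-195001.nc",
    "1m_21h__m01s00i507_9_195001-195001.nc", "1m_3h__m01s00i507_3_195001-195001.nc",
    "1m_6h__m01s00i507_4_195001-195001.nc", "1m_9h__m01s00i507_5_195001-195001.nc",
    "1m__m01s00i507_195001-195001.nc", "3h__m01s00i507_10_19500101-19500110.nc",
    "3h__m01s00i507_10_19500111-19500120.nc", "3h__m01s00i507_10_19500121-19500130.nc",
]

# Assumed naming rule: the plain monthly mean has no hour tag after '1m_'
monthly_mean = [name for name in files if name.startswith("1m__")]
others = [name for name in files if not name.startswith("1m__")]

print(monthly_mean)  # ['1m__m01s00i507_195001-195001.nc']
print(len(others))   # 11
```

The two lists could then be read and aggregated separately, so that the monthly mean is never compared with the 3-hourly fields.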

davidhassell · 2022-11-15T12:28:50Z

I've just remembered that you can use the equal keyword of cf.aggregate to separate fields by property. E.g. to group fields which have common values of their interval_write and interval_operation properties, you could do:

>>> f = cf.read('*.nc', aggregate={'equal': ['interval_write', 'interval_operation']})

This will then correctly aggregate the monthly means and daily means in one aggregation call.

That said, there is a trivial little bug in the code at the moment that stops this working - but I've fixed it for the next release of cf-python (i.e. very soon).
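Conceptually, the equal keyword means that fields whose named properties differ are never aggregated together, which amounts to partitioning the fields by those property values. The sketch below imitates only that partitioning step in plain Python; it is not how cf-python implements it, the fields are stand-in dictionaries, and the property values are hypothetical.

```python
from collections import defaultdict


def group_by_properties(fields, names):
    """Group field-like dicts by their values of the named properties.

    Fields that lack a property are grouped under the value None.
    """
    groups = defaultdict(list)
    for field in fields:
        key = tuple(field.get(name) for name in names)
        groups[key].append(field)
    return dict(groups)


# Hypothetical stand-ins for fields; the property values are illustrative only
fields = [
    {"file": "1m__m01s00i507_195001-195001.nc",
     "interval_write": "1 month", "interval_operation": "15 min"},
    {"file": "3h__m01s00i507_10_19500101-19500110.nc",
     "interval_write": "3 h", "interval_operation": "15 min"},
    {"file": "3h__m01s00i507_10_19500111-19500120.nc",
     "interval_write": "3 h", "interval_operation": "15 min"},
]

groups = group_by_properties(fields, ["interval_write", "interval_operation"])
for key, members in groups.items():
    print(key, [f["file"] for f in members])
```

Each group can then be aggregated independently, so fields written at different intervals are never candidates for joining.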

bnlawrence added the output-cf-compliance Relates to the CF compliance of the output label Nov 3, 2022

bnlawrence added this to the Pre-Production-Data-Checks milestone Nov 3, 2022

bnlawrence changed the title from "Data which is averaged over some area should have cell methods, and not just rely on standard names" to "Data which is averaged over some area should have cell methods, and not just rely on names" Nov 3, 2022

bnlawrence mentioned this issue Nov 4, 2022

cf aggregation not working properly: cf-python issue or metadata issue? #9 (open)
