Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: fixes for all datasets of specific variable #428

Closed
schlunma opened this issue Jan 14, 2020 · 9 comments
Closed

Feature request: fixes for all datasets of specific variable #428

schlunma opened this issue Jan 14, 2020 · 9 comments
Assignees
Labels
enhancement New feature or request fix for dataset Related to dataset-specific fix files

Comments

@schlunma
Copy link
Contributor

For some variables, it would be nice to apply a common fix for all datasets of that specific variable.

For example, for the 3D cloud fraction field cl, most models use a atmosphere_hybrid_sigma_pressure_coordinate (see also #59). Reading those is not a problem for most models, but since iris only allows one variable per cube, the resulting cubes after loading the file contain more than one cube. Thus, the tool outputs this warning

2020-01-14 13:57:18,501 UTC [33290] WARNING Found variable cl in CMIP6:MPI-ESM1-2-HR, but there were other present in the file. Those extra variables are usually metadata (cell area, latitude descriptions) that was not saved properly. It is possible that errors appear further on because of this. 
Full list of cubes encountered: 0: vertical coordinate formula term: b(k+1/2) / (1) (atmosphere_hybrid_sigma_pressure_coordinate: 95; -- : 2)
1: vertical coordinate formula term: ap(k+1/2) / (Pa) (atmosphere_hybrid_sigma_pressure_coordinate: 95; -- : 2)
2: cloud_area_fraction_in_atmosphere_layer / (%) (time: 60; atmosphere_hybrid_sigma_pressure_coordinate: 95; latitude: 192; longitude: 384)
3: Surface Air Pressure / (Pa)         (time: 60; latitude: 192; longitude: 384)

for every model. I don't want to write a fix for every model for that, so it would be nice to handle that in a smarter way.

I'm pretty sure we don't have that feature yet, but please correct me if I'm wrong.

@schlunma schlunma added enhancement New feature or request fix for dataset Related to dataset-specific fix files labels Jan 14, 2020
@valeriupredoi
Copy link
Contributor

smells to me like a very initial preprocessor step? 🍺

@zklaus
Copy link
Contributor

zklaus commented Jan 14, 2020

@bjlittle explains here the problem. If the netcdf file was correct, there would not be several variables and everything would work out just as expected. The problem is really in fixing erroneous metadata.

Having said that, I argued in the past that it would be nice to have a more flexible way of applying fixes since indeed we are frequently using the same fixes for different datasets, also across models.

A crutch is to reuse the fix classes ala:

cesm2.py

class Tas(Fix):
    pass  # contains the actual fix

cesm2_waccm.py

from cesm2 import Tas as BaseTas

Tas = BaseTas

I would still prefer an approach where the fixing code is separate from its application with a matching along the lines of our normal {dataset:...} specification. But this I leave with those responsible for the fixing infrastructure.

@schlunma
Copy link
Contributor Author

@bjlittle explains here the problem. If the netcdf file was correct, there would not be several variables and everything would work out just as expected. The problem is really in fixing erroneous metadata.

I don't think this is correct. Loading the corresponding iris sample cube with

import iris

path = iris.sample_data_path('hybrid_height.nc')
cube = iris.load(path)

print(cube)

gives this

0: air_potential_temperature / (K)     (model_level_number: 15; grid_latitude: 100; grid_longitude: 100)
1: surface_altitude / (m)              (grid_latitude: 100; grid_longitude: 100)

ncdump of that cube gives

netcdf hybrid_height {
dimensions:
        model_level_number = UNLIMITED ; // (15 currently)
        grid_latitude = 100 ;
        grid_longitude = 100 ;
        bnds = 2 ;
variables:
        float air_potential_temperature(model_level_number, grid_latitude, grid_longitude) ;
                air_potential_temperature:standard_name = "air_potential_temperature" ;
                air_potential_temperature:units = "K" ;
                air_potential_temperature:ukmo__um_stash_source = "m01s00i004" ;
                air_potential_temperature:source = "Data from Met Office Unified Model 7.04" ;
                air_potential_temperature:grid_mapping = "rotated_latitude_longitude" ;
                air_potential_temperature:coordinates = "forecast_period forecast_reference_time level_height sigma surface_altitude time" ;
        int rotated_latitude_longitude ;
                rotated_latitude_longitude:grid_mapping_name = "rotated_latitude_longitude" ;
                rotated_latitude_longitude:longitude_of_prime_meridian = 0. ;
                rotated_latitude_longitude:semi_major_axis = 6371229. ;
                rotated_latitude_longitude:semi_minor_axis = 6371229. ;
                rotated_latitude_longitude:grid_north_pole_latitude = 37.5 ;
                rotated_latitude_longitude:grid_north_pole_longitude = 177.5 ;
                rotated_latitude_longitude:north_pole_grid_longitude = 0. ;
        int model_level_number(model_level_number) ;
                model_level_number:axis = "Z" ;
                model_level_number:units = "1" ;
                model_level_number:standard_name = "model_level_number" ;
                model_level_number:positive = "up" ;
        float grid_latitude(grid_latitude) ;
                grid_latitude:axis = "Y" ;
                grid_latitude:bounds = "grid_latitude_bnds" ;
                grid_latitude:units = "degrees" ;
                grid_latitude:standard_name = "grid_latitude" ;
        float grid_latitude_bnds(grid_latitude, bnds) ;
        float grid_longitude(grid_longitude) ;
                grid_longitude:axis = "X" ;
                grid_longitude:bounds = "grid_longitude_bnds" ;
                grid_longitude:units = "degrees" ;
                grid_longitude:standard_name = "grid_longitude" ;
        float grid_longitude_bnds(grid_longitude, bnds) ;
        double forecast_period ;
                forecast_period:units = "hours" ;
                forecast_period:standard_name = "forecast_period" ;
        double forecast_reference_time ;
                forecast_reference_time:units = "hours since 1970-01-01 00:00:00" ;
                forecast_reference_time:standard_name = "forecast_reference_time" ;
                forecast_reference_time:calendar = "gregorian" ;
        float level_height(model_level_number) ;
                level_height:bounds = "level_height_bnds" ;
                level_height:axis = "Z" ;
                level_height:formula_terms = "a: level_height b: sigma orog: surface_altitude" ;
                level_height:units = "m" ;
                level_height:standard_name = "atmosphere_hybrid_height_coordinate" ;
                level_height:long_name = "level_height" ;
                level_height:positive = "up" ;
        float level_height_bnds(model_level_number, bnds) ;
        float sigma(model_level_number) ;
                sigma:bounds = "sigma_bnds" ;
                sigma:units = "1" ;
                sigma:long_name = "sigma" ;
        float sigma_bnds(model_level_number, bnds) ;
        float surface_altitude(grid_latitude, grid_longitude) ;
                surface_altitude:units = "m" ;
                surface_altitude:standard_name = "surface_altitude" ;
                surface_altitude:ukmo__um_stash_source = "m01s00i033" ;
                surface_altitude:source = "Data from Met Office Unified Model 7.04" ;
        double time ;
                time:units = "hours since 1970-01-01 00:00:00" ;
                time:standard_name = "time" ;
                time:calendar = "gregorian" ;

// global attributes:
                :Conventions = "CF-1.5" ;
}

surface_altitude is comparable to the Surface Air Pressure coordinate I mentioned in this issue. Regarding the vertical coordinate formula term: ap(k+1/2), I think you're right, iris cannot determine the bounds of the resulting coordinate, but the coordinate points were read correctly. Maybe @bjlittle can comment on that?

@zklaus
Copy link
Contributor

zklaus commented Jan 14, 2020

There might be a bug in iris, but I think we really need to tackle this for specific example files of interest to determine the best course of action. In the hybrid_heights.nc example, I don't think sigma and surface_altitude should appear in the coordinates attribute of the data variable, but also atmosphere_hybrid_height_coordinate is less common than the pressure variant and might have received less testing.

Maybe let's look at the CMIP6:MPI-ESM1-2-HR file from above (which table and timeslice is this from?) and see if corrected metadata leads to the desired outcome.

@schlunma
Copy link
Contributor Author

It's this dataset:

{dataset: MPI-ESM1-2-HR, project: CMIP6, mip: Amon, short_name: cl, exp: historical, grid: gn,  ensemble: r1i1p1f1, start_year: 1980, end_year: 2004}

@schlunma
Copy link
Contributor Author

schlunma commented Jan 14, 2020

I encountered a more serious problem during the concatenation. After the concatenation step, the resulting cube does not have a derived coordinate anymore, even though the input cubes all had them. I think this is a bug of iris, I will open an issue there.

Until now, we can only read datasets where one single file covers the whole time range.

EDIT: this was already discovered back in 2017: SciTools/iris#2478

@zklaus
Copy link
Contributor

zklaus commented Jan 14, 2020

Ok, so it seems that the appearance of both surface_pressure in the MPI example and surface_altitude in the hybrid_heights.nc example are actually intentional (see discussion in SciTools/iris#1159 and SciTools/iris#1070) and were actually a feature request. As such, we might consider applying a default constraint on loading to only get the variable we want.

The MPI file has some more problems, namely ap, b need a bounds attribute and ps should get a standard_name.

I haven't looked at the killing of derived coordinates in concatenation at all. That will have to wait until tomorrow.

@schlunma
Copy link
Contributor Author

All right. I think I've encountered another bug (that appears when Cube.aggregated_by() is used), see SciTools/iris#3637.

@mattiarighi mattiarighi added this to To do in High priority issues via automation May 25, 2020
@schlunma
Copy link
Contributor Author

With the possibility to add fixes for all variables of a mip table (e.g., class Amon(Fix): ...) and to re-use fixes in other datasets with simple imports (e.g., from .cmip5.canesm2 import Cl as BaseCl; Cl = BaseCl) I don't think this is necessary anymore.

Feel free to re-open if necessary.

High priority issues automation moved this from To do to Done Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fix for dataset Related to dataset-specific fix files
Projects
Development

No branches or pull requests

5 participants