Problems with repeated standard_names in file #132

marceloandrioni · 2020-09-06T16:17:46Z

Hello,

I was trying to use ncWMS2 to visualize current data from CMEMS but there were some problems with the automatically created layers -group, -mag and -dir.

The dataset that gave me problems is global-analysis-forecast-phy-001-024-hourly-merged-uv. It has 4 "types" of currents: tidal currents (utide:vtide), Stokes drift (vsdx:vsdy), Eulerian currents (uo:vo) and total currents (utotal:vtotal), the last being the sum of all the others.

I think the cause of the problem is that several of these variables use the same standard_name attribute:

uo:standard_name = "eastward_sea_water_velocity" ;
vo:standard_name = "northward_sea_water_velocity" ;

vsdx:standard_name = "sea_surface_wave_stokes_drift_x_velocity" ;
vsdy:standard_name = "sea_surface_wave_stokes_drift_y_velocity" ;

utide:standard_name = "eastward_sea_water_velocity" ;
vtide:standard_name = "northward_sea_water_velocity" ;

utotal:standard_name = "eastward_sea_water_velocity" ;
vtotal:standard_name = "northward_sea_water_velocity" ;

As far as I know, there is no rule in the CF Conventions against multiple variables in the same file using the same standard_name, so I would say that the file is semantically correct.
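To make the repetition explicit, here is a minimal Python sketch that groups the variables by standard_name and reports the names used more than once. The dictionary is copied by hand from the attributes listed above; the function name `find_repeated` is only illustrative and is not part of edal/ncWMS.

```python
from collections import defaultdict

# standard_name attributes copied by hand from the listing above
standard_names = {
    "uo": "eastward_sea_water_velocity",
    "vo": "northward_sea_water_velocity",
    "vsdx": "sea_surface_wave_stokes_drift_x_velocity",
    "vsdy": "sea_surface_wave_stokes_drift_y_velocity",
    "utide": "eastward_sea_water_velocity",
    "vtide": "northward_sea_water_velocity",
    "utotal": "eastward_sea_water_velocity",
    "vtotal": "northward_sea_water_velocity",
}


def find_repeated(names):
    """Return {standard_name: [variables]} for names shared by 2+ variables."""
    groups = defaultdict(list)
    for variable, standard_name in names.items():
        groups[standard_name].append(variable)
    return {name: vs for name, vs in groups.items() if len(vs) > 1}


for name, variables in find_repeated(standard_names).items():
    print(name, variables)
```

Only the two Stokes drift names are unique, which is consistent with that pair being the only one grouped correctly.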

As shown in the image below, only 2 velocity groups were created instead of 4. Besides, only the Stokes drift group (vsdx:vsdy) is correct; the second group combined the u component of the tide (utide) with the v component of the total current (vtotal), resulting in incorrect values.

I tried to think of a solution for this problem but could not find a definitive answer. Maybe an extra step in the code could check for repeated standard_names and, if any are found, try to match the u:v variables using some string similarity between the variable names or long_names, e.g.:
uo, utide and utotal are all eastward_sea_water_velocity, but utide is more similar to vtide than to vo or vtotal, so the pair should be utide:vtide.
But I am just guessing here, since I am not even sure whether there is a string similarity library for Java (Python and JS have one).
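As a rough illustration of the idea, here is a minimal Python sketch using `difflib.SequenceMatcher` from the standard library. It is purely illustrative and not taken from the edal/ncWMS code; the function name `pair_components` and the choice of trying every assignment are my own assumptions, and the brute-force search is only reasonable for a handful of variables.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Similarity ratio between two strings, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def pair_components(u_names, v_names):
    """Pair every u variable with a distinct v variable.

    Tries every possible assignment and keeps the one with the highest
    total similarity.
    """
    best = max(
        permutations(v_names, len(u_names)),
        key=lambda vs: sum(similarity(u, v) for u, v in zip(u_names, vs)),
    )
    return dict(zip(u_names, best))


# Variables sharing eastward_/northward_sea_water_velocity in the CMEMS file
pairs = pair_components(["uo", "utide", "utotal"], ["vo", "vtide", "vtotal"])
print(pairs)  # {'uo': 'vo', 'utide': 'vtide', 'utotal': 'vtotal'}
```

Scoring the whole assignment rather than picking the best match for each u variable independently avoids two u variables claiming the same v variable.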

Also, I don't think this problem is restricted to this specific dataset. Another example would be ERA5 data from ECMWF, where a file could have wind components at 10 m and 100 m height that would also use the same standard_names, as there are no height-specific standard_names.

The file I used in the test is here.

Thank you very much.


jonblower · 2020-09-07T10:28:52Z

Hi - thanks very much for this report and your detailed description, which is very helpful indeed. I think your analysis is correct. The solution you propose sounds reasonable, but I'll leave @guygriffiths to comment on how easy this may be to implement.

On a side note, I think you're right that the CF conventions technically permit duplicated standard names, although personally I think these should be strongly discouraged in favour of creating new, more specific, standard names. For example, there could be eastward_sea_water_velocity_due_to_tide or something like that. The purpose of the standard name is to distinguish between fields, and it would be odd if a dataset recorded exactly the same field twice.

However, your point about ERA5 is also valid - if the standard name does not contain the height coordinate then they cannot be distinguished. I know there have been a lot of conversations in the CF community about how to handle this, but I don't know what the latest thinking is.

In any case, I know that you're not able to change the CMEMS data so it's useful to find a practical solution.

Do the fields in the CMEMS data have different long_names?

marceloandrioni · 2020-09-07T15:50:17Z

Hello @jonblower, thank you for the prompt reply. Answering your question:

Do the fields in the CMEMS data have different long_names?

Yes, I forgot to include the long_names in my question, but all the variables in the file have different long_names. I think the "string similarity" check could be applied to the variable names (e.g. uo) and/or the long_names (e.g. "Eastward Eulerian velocity (Navier-Stokes current)") to try to match the u:v pairs, but that is only my suggestion. I have no idea how feasible it would be to do this in the edal/ncWMS codebase, as I have basically no Java knowledge.

uo:long_name = "Eastward Eulerian velocity (Navier-Stokes current)" ;
uo:standard_name = "eastward_sea_water_velocity" ;

vo:long_name = "Northward Eulerian velocity (Navier-Stokes current)" ;
vo:standard_name = "northward_sea_water_velocity" ;

vsdx:long_name = "Eastward wave-induced velocity (Stokes drift)" ;
vsdx:standard_name = "sea_surface_wave_stokes_drift_x_velocity" ;

vsdy:long_name = "Northward wave-induced velocity (Stokes drift)" ;
vsdy:standard_name = "sea_surface_wave_stokes_drift_y_velocity" ;

utide:long_name = "Eastward tide-induced velocity (Tide current)" ;
utide:standard_name = "eastward_sea_water_velocity" ;

vtide:long_name = "Northward tide-induced velocity (Tide current)" ;
vtide:standard_name = "northward_sea_water_velocity" ;

utotal:long_name = "Eastward total velocity (Eulerian + Waves + Tide)" ;
utotal:standard_name = "eastward_sea_water_velocity" ;

vtotal:long_name = "Northward total velocity (Eulerian + Waves + Tide) " ;
vtotal:standard_name = "northward_sea_water_velocity" ;
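For these particular long_names, even a simpler check than string similarity would be enough: dropping the leading "Eastward"/"Northward" word leaves identical text for each pair. Below is a minimal Python sketch of that, with the long_names copied from the listing above (including the trailing space in vtotal). The helper names are hypothetical, and other datasets may not follow this naming pattern.

```python
import re

# long_name attributes copied by hand from the listing above
long_names = {
    "uo": "Eastward Eulerian velocity (Navier-Stokes current)",
    "vo": "Northward Eulerian velocity (Navier-Stokes current)",
    "vsdx": "Eastward wave-induced velocity (Stokes drift)",
    "vsdy": "Northward wave-induced velocity (Stokes drift)",
    "utide": "Eastward tide-induced velocity (Tide current)",
    "vtide": "Northward tide-induced velocity (Tide current)",
    "utotal": "Eastward total velocity (Eulerian + Waves + Tide)",
    "vtotal": "Northward total velocity (Eulerian + Waves + Tide) ",
}


def component_key(long_name):
    """Strip whitespace and the leading direction word from a long_name."""
    return re.sub(r"^(Eastward|Northward)\s+", "", long_name.strip())


def pair_by_long_name(names):
    """Match eastward and northward variables whose remaining text is equal."""
    east, north = {}, {}
    for variable, long_name in names.items():
        side = east if long_name.strip().startswith("Eastward") else north
        side[component_key(long_name)] = variable
    return {east[key]: north[key] for key in east if key in north}


print(pair_by_long_name(long_names))
```

With this file, all four u:v pairs are recovered, including the Stokes drift pair.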

Regarding the tidal variables specifically, I actually suggested adding the standard_names

eastward_sea_water_velocity_due_to_tides
northward_sea_water_velocity_due_to_tides

on the cf-metadata-request@cgd.ucar.edu mailing list (before the discussion board migrated to GitHub), and they were approved and incorporated into the standard_names table starting with Version 70. But this happened after CMEMS released the dataset in question, so they still use the old general names (_sea_water_velocity) for tidal currents.

Regarding repeated standard_names in files, I agree with you that the ideal solution would be individual standard_names, but I think completely avoiding repeated standard_names in a file is difficult because the Standard Names Guideline says:

Surfaces which are defined using a coordinate value (e.g. height of 1.5 m) are indicated by a single-valued coordinate variable, not by the standard name.

So, using the ERA5 example again, the model can have an output file with the 3D wind field in the original vertical coordinates and several other fields interpolated to z levels (e.g. 10 m, 50 m, 100 m), all having the same standard_name (e.g. wind_speed).
Also, considering the CMEMS dataset, creating new standard_names for "sum" variables, like utotal being the sum of tides, Stokes drift and Eulerian currents, could cause a proliferation of standard_names, something that the CF committee wants to avoid according to some discussions I saw on the GitHub page.
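For the ERA5-style case, the single-valued height coordinate mentioned in the guideline could itself be used to tell the pairs apart. The Python sketch below assumes entirely hypothetical metadata (the variable names, the heights and the `pair_by_height` helper are made up for illustration, and a real file would need to be read with a NetCDF library):

```python
from collections import defaultdict

# Hypothetical metadata: variable -> (standard_name, height in metres taken
# from a single-valued coordinate variable). Not read from a real file.
variables = {
    "u10": ("eastward_wind", 10.0),
    "v10": ("northward_wind", 10.0),
    "u100": ("eastward_wind", 100.0),
    "v100": ("northward_wind", 100.0),
}


def pair_by_height(metadata):
    """Group variables by height, then pair eastward with northward."""
    by_height = defaultdict(dict)
    for variable, (standard_name, height) in metadata.items():
        by_height[height][standard_name] = variable
    return {
        height: (names["eastward_wind"], names["northward_wind"])
        for height, names in by_height.items()
        if {"eastward_wind", "northward_wind"} <= names.keys()
    }


print(pair_by_height(variables))
# {10.0: ('u10', 'v10'), 100.0: ('u100', 'v100')}
```

This would not help with the CMEMS file, where all the repeated variables share the same coordinates, so some name-based matching would still be needed there.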

Thank you.

guygriffiths · 2020-09-18T10:59:58Z

@marceloandrioni Thanks for reporting this. I've fixed it in commit 1c76ba7. It's not released yet, so you'll need to build it from the develop branch to use it. I can probably do a 1.5 release next week if you're unable to build from develop. I used string similarity on the variable names, and tested it with your data file, where it worked fine.

marceloandrioni · 2020-09-18T13:50:31Z

Hi @guygriffiths, thank you very much for this fix. I am glad the option of using string similarity worked. I am not really sure how to build from the develop branch so I think I will wait for the war file of the v1.5 next week.
Thank you.

guygriffiths · 2020-09-23T14:26:33Z

Fix released https://github.com/Reading-eScience-Centre/ncwms/releases/tag/ncwms-2.5.0

marceloandrioni · 2020-09-23T14:42:34Z

Hello @guygriffiths, I was trying to deploy the new version (ncWMS v2.5), but even after removing $HOME/.ncWMS2 to clear the cache, the application fails to start, with this Tomcat message:
FAIL - Application at context path [/ncWMS2] could not be started

I am using jdk1.8.0_261 and apache-tomcat-8.5.57 on Linux x64. The release notes say that it is JDK 11 compatible, but is JDK 11 mandatory from now on?

Thank you.

guygriffiths · 2020-09-23T14:48:04Z

Ah yes, apologies, I should have made that clear. Supporting both Java 8 and Java 11 (which are the only two long-term support releases at the moment) turned out to be much more of a headache than I'd first anticipated, so I went with Java 11 for this release.

marceloandrioni · 2020-09-23T14:51:21Z

Got it, I will try with JDK 11 then. Thank you.

marceloandrioni · 2020-09-23T15:48:03Z

Just for the record: with ncWMS2 v2.5 running on Oracle jdk-11.0.8 and apache-tomcat-9.0.38, everything worked perfectly. The several groups of u:v components were correctly recognized. Thank you @guygriffiths.

guygriffiths closed this as completed Sep 23, 2020

marceloandrioni mentioned this issue Sep 24, 2020

Version 2.5.0 not running on Java 8 Reading-eScience-Centre/ncwms#70

Closed
