TRAC #155: Invalid "id" values in CF Standard Name aliases #132
Running an XML schema check on the CF standard name list, I found the following issues. They are minor because they relate to aliases, not to the standard name definitions:
There are spurious spaces in these ids:
The standard name surface_carbon_dioxide_mole_flux has two aliases, surface_upward_mole_flux_of_carbon_dioxide and surface_downward_mole_flux_of_carbon_dioxide. This is intended: the definitions of the two newer names indicate that the deprecated name was too imprecise. The problem here is that the XSD schema does not allow two aliases with the same id. Having unique id values for each element is useful, so I suggest we change the schema and the document to replace
This looks sensible to me. Alison's comment would be useful on this one too. Changing the standard names rectifies a defect, I agree, but I think that changing the schema should be treated as a proposal for substantial change to the convention.
Sorry for not getting to this ticket sooner!
I'm not sure I agree with changing the ids with "spurious spaces". The problem is that when the names were first published they did accidentally contain spaces, and the aliases were introduced to correct the mistake (in the same way as we would do for a simple spelling mistake). The versions of the names containing spaces had been around for quite a long time before they were noticed. "rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc" appeared in versions 28 - 36 of the standard name table, spanning a period of 18 months in 2015-16. The other four appeared in versions 8 - 10, spanning a 7-month period in 2008. It is possible that during those periods data files were written containing the erroneous names. To avoid invalidating such files I thought it was better to use aliases rather than just quietly delete the problem! I could of course simply delete the aliases if that is generally felt to be acceptable, but that would mean treating typos involving spaces differently from any other minor error that might crop up in standard names.
Regarding the other alias that points to two current names, this again was done to avoid possibly invalidating existing data files. The original name, surface_carbon_dioxide_mole_flux, contained no indication of sign convention and this was felt not to be satisfactory. That particular name dates back to pre-version 1 of the standard name table and the aliases weren't introduced until version 15, a gap spanning at least 2006 to 2010. Data files could have been written during that period using either upwards positive or downwards positive as a sign convention, and both would have been valid CF at the time. I support the idea of changing the schema to make this use of aliases valid. Such a use case was probably not envisaged when the schema was created, but the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid.
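
For illustration, the two conditions discussed in this ticket (alias ids containing whitespace, and one alias id mapping to more than one current name) could be detected with a small script along the following lines. This is only a sketch and not the schema check that was actually run. It assumes each alias is written as an `<alias id="...">` element with an `<entry_id>` child. The sample table, the `entry_id` targets shown for the name containing a space, and the function name `check_alias_ids` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt of a standard name table. The element layout
# (<alias id="..."> containing <entry_id>) is an assumption for this sketch.
SAMPLE = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc">
    <entry_id>rate_of_hydroxyl_radical_destruction_due_to_reaction_with_nmvoc</entry_id>
  </alias>
</standard_name_table>"""


def check_alias_ids(xml_text):
    """Return (ids containing whitespace, ids mapped to more than one name)."""
    root = ET.fromstring(xml_text)
    targets = defaultdict(list)
    for alias in root.iter("alias"):
        alias_id = alias.get("id", "")
        # Collect every current name this alias id points to.
        targets[alias_id].extend(
            (entry.text or "").strip() for entry in alias.findall("entry_id")
        )
    with_spaces = sorted(a for a in targets if any(c.isspace() for c in a))
    duplicated = {a: names for a, names in targets.items() if len(names) > 1}
    return with_spaces, duplicated


if __name__ == "__main__":
    with_spaces, duplicated = check_alias_ids(SAMPLE)
    print("ids containing whitespace:", with_spaces)
    print("ids with more than one target:", duplicated)
```

A check like this only reports the two conditions. Whether they should be treated as errors is exactly the schema question under discussion here.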
I agree with "the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid", but I do not agree that the original meaning of the data has been preserved by aliasing it to two identifiers.
Anyone who used the original identifier undoubtedly had one of those two identifiers in mind, but we have not clarified the intended meaning through this process. I'm sorry I missed this topic first time around, and it isn't worth getting up in arms about, but the original term has a clearly different meaning and application than either of its referenced replacements.