Naming Conventions and periods (.) #256
Comments
Many users who work with CF-compliant data like to adopt the variable names found in the netCDF files as the names for their variables in their computer codes (e.g., if specific humidity has the name "hus" in the file, they might use that in their codes when manipulating that variable). Some languages forbid the use of a period in variable names, so this would be one argument against allowing periods in the variable names of CF-compliant files (e.g., if specific humidity were allowed to be named "spec.hum", that name could not be used in their codes). |
Thanks @taylor13! That is an argument that hadn't come to mind, and may be why the dash (-) is also excluded though it is allowed in netCDF. As a particular example, Python has the same naming conventions as adopted by the CF community. Any thoughts on whether the addition of periods to the allowed characters in identifiers has any chance of getting accepted given the above argument or any others not yet accounted for? |
This reminds me of a very similar discussion in cf-conventions/#237. In addition to @taylor13's argument against allowing In my 'previous life' as an avid Matlab user I, too, was parsing netCDF variable names into program variables, and had to invent my own rules for illegal characters like I think that the question is whether CF should maintain the current restriction, which in practice means that data producers will have to translate reasonably natural names to conform with CF restrictions (as indicated in the initial post), or if CF should relax the current restriction in allowed characters to align with netCDF rules. Finally, as was concluded in cf-conventions/#237, it would be useful to clarify what a letter is in the sentence
Something for this year's workshop, ping @ethanrd, @davidhassell ? |
Thanks for that link to the earlier discussion! It looks like the earlier conclusions were that relaxing the restriction might be possible, but people wanted to see actual use cases. I hadn't actually picked up on the "should" vs. "must/shall" usage in the naming conventions section which, based on my understanding of the terms, means products are technically already CF conforming even if variable or group names include periods. |
I would in fact propose to relax the character restrictions for CF names considerably since these restrictions limit the usability of the convention. I will soon have to propose names for the concentrations of PCBs, so we are looking at names of the type 2,2',3,3',4,4',5,5'-octachlorobiphenyl mass concentration. The commas and quotation marks in this name are essential to denote the chemical, so they can't be replaced. PCBs and brominated flame retardants are clearly a relevant area of atmospheric research and need to have a place in the CF naming convention. To limit the character set of a vocabulary to meet the needs of programming languages is rather outdated. Programming languages should serve the use cases, not limit them. |
Dear Sebastian @sval-dev and all Thanks for this useful discussion and the references to the previous explorations of the same issue.
Best wishes Jonathan |
When I see words like "should" and "must" I almost always assume that they are RFC2119, i.e. "should" means that you can ignore that particular requirement if there is a good reason and it is not spec breaking. CF itself doesn't have this sort of requirement level statement, though it might be a good idea for some future version. I'm willing to take a crack at starting this effort if desired. My thoughts about the security implications of variable names still apply. We should absolutely not consider the valid symbol name restrictions in programming languages to be constraining for CF netCDF variable names. |
A couple of comments/questions regarding rather different aspects of this discussion: @sval-dev, @JonathanGregory: The link you both give to the rules for Python identifiers (also referred to as names) includes, for Python 3, a range of Unicode characters. For example both
And at least
Again, I think that it is useful to clarify what CF means by the word "letter". And if the sentence above is used then we need to make reference to Unicode to explain what the "U+0001" etc. means. I concur with @DocOtak, and @sval-dev as it seems, regarding the interpretation of "should" etc., mainly because RFC2119 is increasingly becoming the de facto standard outside the original context of the "Internet Community". It has been referred to at least a couple of times before in CF conversations (here and here). And as @JonathanGregory writes, the language for expressing requirements has been discussed many times. I think that it would be absolutely brilliant if you, Andrew, would be willing to make a good start at introducing RFC2119 into CF. Then I hope others would join in to support; at least I am willing. I also agree with @DocOtak and @zklaus (here and here) regarding the potentially very serious problems of eval-uating netCDF variable names into program variables. |
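The difference between Python 3 identifiers and an ASCII-only reading of "letter" can be sketched as follows. This is only an illustration: the helper name `is_ascii_identifier` is hypothetical, and treating "letter" as ASCII-only is an assumption, since that is exactly the point CF has not yet clarified.

```python
def is_ascii_identifier(name: str) -> bool:
    """Hypothetical helper: a Python identifier restricted to ASCII characters."""
    return name.isascii() and name.isidentifier()


# Example names; "hus" and "spec.hum" come from this thread, the others are made up.
for name in ["hus", "spec.hum", "température", "π"]:
    # str.isidentifier() follows the Python 3 rules, which admit Unicode letters.
    print(f"{name!r}: python3={name.isidentifier()}, ascii_only={is_ascii_identifier(name)}")
```

Note that Python also accepts a leading underscore, so even the ASCII-only check is not identical to a rule that requires names to begin with a letter.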
Dear @DocOtak and @larsbarring Thanks a lot for offering to have a go at making the language consistent for recommendations and requirements in the CF document, Andrew. That is very generous and helpful. Like @larsbarring, I would be willing to help. A definite source of guidance is the conformance document, which clearly distinguishes requirements and recommendations, where a requirement is something that must be done, and a recommendation is something which isn't compulsory but is advisable. (I understand a prohibition to be a requirement not to do something, and a deprecation as a recommendation not to do it.) Not all requirements and recommendations are stated there, because it includes only those which can be checked automatically, and in other cases it might be unclear whether the text means a requirement or a recommendation. We may have to discuss such cases to clarify them. I think we may also have things which are "strongly recommended" or other such words, but that doesn't imply a third category. They're just recommendations. Thanks for clarifying the point about the Python characters, Lars. Yes, I was referring to the ASCII range. Sorry to be sloppy. Best wishes Jonathan |
@sval-dev in your initial post you mention that you would like to have variable names like For the record, I am not a fan of assigning too detailed a meaning to variable names. As has been pointed out before, this should instead go into standard names and other metadata. At the same time I think that we should seriously (and carefully) consider allowing selected additional characters. But could this discussion continue over in cf-conventions/#237 with the intention to clarify the character set allowed for variable names? |
Thanks @markusfiebig, @JonathanGregory, @DocOtak and @larsbarring for your thoughts! On the comma-containing chemical names, it is useful to have another illustrative use case! I think the goal of adding metadata to help product discoverability and understanding is certainly worthwhile. In terms of driving the adoption of CF conventions, it also seems like it would be ideal if we could layer CF conventions onto existing legal netCDF4 products (e.g. like those that might have periods in their dataset names) while minimizing changes to those products (e.g. by forcing the modification of dataset names). Just for reference, here is what netCDF4 has:
I also agree with DocOtak and larsbarring that the common understanding of "should" and "must" from the RFC is a good thing to adopt if that isn't already the meaning imparted when those words are used here. On the python naming restriction, I was indeed referring to the more restrictive section outlined. On the example naming with slashes, I included the slashes because that is how netCDF4 groups (starting from the root group at "/") are referenced in my software. Here is an example of reading a file in Python with the netCDF4 Python package:
Note the above purposely preserves the ability to have variable names (really keys here) refer to a name with periods in it by using a dictionary. There is nothing that stops us from using 2p5 above instead, but that was not the preferred usage and so wasn't what ended up being adopted. Looking forward to further discussion in #237! |
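The dictionary approach described in this comment can be sketched without any netCDF library. The snippet below is not the code from the comment: the `variables` dictionary is a hypothetical stand-in for data read from a file, and its values are made up.

```python
# Hypothetical stand-in for variables read from a file; the values are made up.
variables = {
    "PM_2.5_Total": [12.1, 9.8, 15.3],
    "PM_10_Total": [30.4, 27.5, 41.0],
}

# The file's name is used verbatim as a dictionary key, so the period is harmless.
pm25 = variables["PM_2.5_Total"]

# The same name could not be used as a bare Python variable name.
can_be_identifier = {name: name.isidentifier() for name in variables}
print(can_be_identifier)
```

Keeping the names as keys means the program never has to translate or evaluate a file's variable names as code.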
I am closing this now:
|
The linked example MAIA_L4_GFPM_20180101T000000Z_FB_NOM_R01_USA-Boston_F01_VSIM01p01p01p01.nc file does indeed require a (free) NASA Earthdata login. |
In section 2.3, "Naming Conventions" from CF Conventions 1.10, it states:
However, not mentioned above is that the netCDF interface also allows the use of the period character (.), as well as some others not mentioned by CF Conventions.
For our data products, we think the period is more meaningful than the alternative underscore representations.
In particular consider the following two variable names:
/PM_2.5_Total
/PM_10_Total
The alternative formulation of
/PM_2_5_Total
, in consultation with our user community, has been evaluated as less clear than the first construction and was not preferred.
Is there some reason why a period is not allowed in either CF Conventions or COARDS, from which CF Conventions are derived?
What obstacles might we face if we tried to get in a change request to allow periods in the naming conventions of the CF Conventions?
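For reference, the names above can be checked against the section 2.3 recommendation (begin with a letter; use only letters, digits, and underscores) with a short sketch. The function name `follows_cf_recommendation` is hypothetical, and reading "letter" as an ASCII letter is an assumption rather than something the conventions state.

```python
import re

# Assumption: "letter" means an ASCII letter; CF 1.10 section 2.3 does not spell this out.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def follows_cf_recommendation(name: str) -> bool:
    """Hypothetical check: starts with a letter, then only letters, digits, underscores."""
    return CF_NAME.fullmatch(name) is not None


# Names from this post, without the leading "/" that denotes the root group.
for name in ["PM_2.5_Total", "PM_10_Total", "PM_2_5_Total"]:
    print(name, follows_cf_recommendation(name))
```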