Clarify the set of characters allowed in netCDF object names #32

Open
ethanrd opened this issue Nov 9, 2020 · 2 comments

@ethanrd

ethanrd commented Nov 9, 2020

The NUG "Permitted Characters in NetCDF Names" section includes this sentence:

Names that have trailing space characters are also not permitted.

Some have read this (e.g., here) as implying that internal space characters are allowed.

I'm pretty sure that was not the intent. So we should clarify.


Also, just before the "trailing space characters" sentence (and in a few other places, like the nc-3 file format BNF) it says that UTF-8 encoded Unicode characters are allowed. I think we might want to be a bit more restrictive, perhaps using Unicode character categories to specify any restrictions. [The use of Unicode character categories was also suggested in this comment in the same discussion mentioned above.]
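To make the Unicode character category idea concrete, here is a minimal sketch using Python's standard `unicodedata` module. The function name `disallowed_chars` and the default policy of rejecting every category starting with "C" (control, format, unassigned, and so on) are assumptions for illustration only; neither the NUG nor this discussion defines such a rule.

```python
import unicodedata


def disallowed_chars(name, banned_prefixes=("C",)):
    """Return the characters of `name` whose Unicode general category
    starts with one of `banned_prefixes`.

    Hypothetical policy: the default bans the "C" (Other) categories,
    e.g. Cc (control) and Cf (format). This is an illustration, not a
    rule taken from the NUG.
    """
    return [
        ch
        for ch in name
        if unicodedata.category(ch).startswith(tuple(banned_prefixes))
    ]


# An ordinary space is category "Zs", so it passes the default policy.
print(disallowed_chars("sea surface"))
# A zero-width space (U+200B) is category "Cf", so it is flagged.
print([hex(ord(c)) for c in disallowed_chars("temp\u200bvalue")])
```

A stricter policy could pass `banned_prefixes=("C", "Z")`, which would also flag every separator, including the ordinary space.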

@Dave-Allured

@ethanrd, internal spaces are explicitly part of the allowed character set for object names in the netCDF-3 classic file format spec, in appendix B of the current NUG. Space is ASCII \x20, which is included in the regular expression for internal characters. If the intent were to exclude spaces, then the first exclusion range would be \x00-\x20, not \x00-\x1F.

name = nelems  namestring
            // Names a dimension, variable, or attribute.
            // Names should match the regular expression
            // ([a-zA-Z0-9_]|{MUTF8})([^\x00-\x1F/\x7F-\xFF]|{MUTF8})*

Also, a little lower down, there are the special1 and special2 enumerated subsets of allowed internal ASCII special characters. Space character is explicit at the start of the second set. (Note, typos in several other positions cleaned up here, for clarity.)

special2 = ' ' | '!' | '"' | '#' | '$' | '%' | '&' | ''' |
           '(' | ')' | '*' | ',' | ':' | ';' | '<' | '=' |
           '>' | '?' | '[' | '\' | ']' | '^' | '`' | '{' |
           '|' | '}' | '~'

Given these precise definitions, there is no doubt that the original intent was to allow internal spaces.
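As a quick check of the reading above, the regular expression can be approximated in Python. This is a sketch under stated assumptions: it works on `str` rather than raw bytes, and it treats `{MUTF8}` as "any non-ASCII code point" without verifying encoding or normalization. The function names are hypothetical and are not part of any netCDF library.

```python
import re

# First character: ASCII letter, digit, underscore, or (assumed stand-in
# for {MUTF8}) any non-ASCII code point.
_FIRST = r"(?:[a-zA-Z0-9_]|[^\x00-\x7F])"
# Later characters: printable ASCII \x20-\x7E except '/', or non-ASCII.
_REST = r"(?:[\x20-\x2E\x30-\x7E]|[^\x00-\x7F])"
_NAME_RE = re.compile(_FIRST + _REST + "*")


def matches_name_regex(name):
    """True if `name` matches the approximated regular expression."""
    return _NAME_RE.fullmatch(name) is not None


def is_permitted_name(name):
    """Regex match plus the NUG sentence forbidding trailing spaces."""
    return matches_name_regex(name) and not name.endswith(" ")


for candidate in ["sea surface temp", " leading", "trailing ", "a/b"]:
    print(repr(candidate), matches_name_regex(candidate),
          is_permitted_name(candidate))
```

Note that the regular expression alone accepts a trailing space; only the separate NUG sentence rules it out, which is why the sketch keeps the two checks apart.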

@ethanrd

ethanrd commented Nov 23, 2020

@Dave-Allured - Well, interesting. I stand corrected. Thanks for your closer look. I guess the netCDF format and library are quite permissive (which makes sense now that I think on it again) and it is up to conventions like CF to place more stringent limits on the characters allowed if they so desire.

Are there advisories or guidance the NUG should give on maximizing interoperability? Maybe in the Best Practices page.

@WardF, @DennisHeimbigner, @lesserwhirls, @dopplershift - Thoughts?
