Properly handle unicode values passed to cmor.axis in Python #616
…alue used when copying string axis values
@mauzey1 I wonder if explicitly decoding to unicode, e.g. string.decode('utf-8'), would be a good way to force all types?
@durack1
I could make the conversion more explicit by using
@mauzey1 my comment was just that. When generating HTML from the contributed content for the CMIP6_CVs, I found all manner of weird non-standard (and non-UTF-8) characters sneaking in (presumably from folks copying characters out of Word or some other rich-text software). The parsers weren't expecting these characters and consequently barfed. I think the same is true of CMOR, so catching non-UTF-8 characters and throwing an explicit error (if a decode function can't handle things) would be my more bulletproof preference.
@mauzey1 happy to green light your contribution, it looks good, but was just wanting to make the statement about non-UTF-8 chars for the record
@durack1 After some experimenting with unicode, I decided to change the code to explicitly convert the strings to UTF-8.
@mauzey1 I think that was a wise tweak; hopefully it catches those fringe cases.
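For the record, the "throw an explicit error on non-UTF-8 input" idea discussed above could look roughly like the sketch below. This is only an illustration in plain Python 3, not the code in this PR: the helper name `to_utf8_bytes` is hypothetical, and it assumes values arrive as either `str` or `bytes`.

```python
def to_utf8_bytes(value):
    """Return `value` as UTF-8 bytes, or raise an explicit error.

    Hypothetical helper for illustration only; not part of CMOR.
    """
    if isinstance(value, bytes):
        try:
            # Decode purely to validate; the original bytes are returned.
            value.decode("utf-8")
        except UnicodeDecodeError as err:
            raise ValueError(
                f"value {value!r} is not valid UTF-8: {err}"
            ) from err
        return value
    if isinstance(value, str):
        try:
            # Strict encoding rejects lone surrogates and similar debris.
            return value.encode("utf-8")
        except UnicodeEncodeError as err:
            raise ValueError(
                f"value {value!r} cannot be encoded as UTF-8: {err}"
            ) from err
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


# Example: Windows-1252 "smart quotes" pasted from a word processor
# are rejected instead of being passed along silently.
try:
    to_utf8_bytes(b"\x93quoted\x94")
except ValueError as err:
    print("rejected:", err)
```

Raising `ValueError` with the offending value in the message keeps the failure close to the input that caused it, which is the behaviour asked for above.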
Fixes #612
When string values are passed to cmor.axis, they are treated as a NumPy array of type numpy.unicode_.
From https://numpy.org/doc/stable/reference/arrays.dtypes.html
The C code has been modified to properly copy string values from a NumPy array into a C array to be processed by cmor_axis.
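As a rough Python-side picture of what that copy involves (the real work happens in the C extension, which is not reproduced here), the sketch below shows a list of strings becoming a fixed-width NumPy unicode array and then being packed into NUL-padded UTF-8 slots, similar in spirit to a C array of strings. The function name `pack_axis_strings` and the packing layout are assumptions made for illustration, not CMOR's actual implementation.

```python
import numpy as np


def pack_axis_strings(values):
    """Pack 1-D string values into fixed-width, NUL-padded UTF-8 slots.

    Hypothetical illustration only; the layout is an assumption and
    does not mirror CMOR's internal buffers.
    """
    arr = np.asarray(values)
    # A list of Python str becomes a fixed-width unicode array (kind "U").
    if arr.ndim != 1 or arr.dtype.kind != "U":
        raise TypeError("expected a 1-D sequence of unicode strings")
    # Encode explicitly: UTF-8 byte length can exceed the character count.
    encoded = [s.encode("utf-8") for s in arr.tolist()]
    # One extra byte per slot for the terminating NUL a C string needs.
    width = max(len(b) for b in encoded) + 1
    buffer = b"".join(b.ljust(width, b"\0") for b in encoded)
    return buffer, width


buffer, width = pack_axis_strings(["land", "sea_ice", "océan"])
print(width, len(buffer))
```

Sizing each slot from the encoded byte length rather than the character count matters for non-ASCII values: "océan" is five characters but six bytes in UTF-8.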