Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode string to UTF-8 to prevent issues where ASCII fails to decode characters #31

Merged
merged 2 commits into from May 18, 2022

Conversation

moisespr123
Copy link
Contributor

Some Python environments are not able to decode special characters used in this print statement. Workarounds includes adding or changing some python variables or modifying an environment to default to UTF-8. After adding .encode("utf-8"), these workarounds are not needed anymore, as it seems only this line was causing UnicodeEncodeError.

@PythonCHB
Copy link
Contributor

Rankly, I think any system that can't print full Unicode without errors these days is misconfigured -- are there really still modern properly configured systems where this is an issue?

But in any case, this would mean that everyone would see the bytstring instead of the unit, for example, instead of:

'µm'

they'd get:

b'\xc2\xb5m'

Which partially defeats the point of printing it at all :-)

If you really think this is an issue, then we could wrap the print, somethign like:

try:
    print(f"Adding: {unit_type}: {n}")
except UnicodeEncodeError:
    print(f"Adding: {unit_type}: {n}".encode("utf-8"))

@moisespr123
Copy link
Contributor Author

Hi, thanks for the explanation.

Unfortunately, we needed to modify this to allow a specific environment to be able to successfully import this library and run. I was unaware of the bytestring output when using .encode('utf-8'), since we are just importing this library as part of PyGnome where it seemed to stop because of this line with the UnicodeEncodeError. Modifying the line made it run.

I updated the PR to surround this line in a try/except block.

I do agree this issue should not happen on modern systems :-)

@ChrisBarker-NOAA
Copy link
Contributor

Ah -- I see.

We had the same problem on operational servers a few years back. It turns out that our Docker images were "minimal", and thus configured with the default "C locale", which is asci only.

It turns out that PYthon added "utf-8" mode to address this (and similar) issues:

https://peps.python.org/pep-0540/

I highly encourage you to look into that -- Python is Unicode native, a system that will error out with a write to stdout is not stable. (it could be a log message, or ....)

I will merge this for now, because there's not reason not too, but now that I think about it, the issue here is probably that nucos shouldn't be using an print statements on import anyway -- that was probably put ther (by me) as a debugging tool, and I forgot to remove it.

I'll take a more careful look now.

@ChrisBarker-NOAA ChrisBarker-NOAA merged commit 0e367e2 into NOAA-ORR-ERD:master May 18, 2022
@ChrisBarker-NOAA
Copy link
Contributor

PS: thanks for the report!

@ChrisBarker-NOAA
Copy link
Contributor

NOTE: updated releases on gitHub, PyPi and conda-forge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants