Encode string to UTF-8 to prevent issues where ASCII fails to decode characters #31

moisespr123 · 2022-05-17T11:49:29Z

Some Python environments are not able to encode the special characters used in this print statement. Workarounds include adding or changing some Python variables or modifying the environment to default to UTF-8. After adding .encode("utf-8"), these workarounds are no longer needed, as it seems only this line was causing the UnicodeEncodeError.

PythonCHB · 2022-05-18T06:16:17Z

Frankly, I think any system that can't print full Unicode without errors these days is misconfigured -- are there really still modern, properly configured systems where this is an issue?

But in any case, this would mean that everyone would see the bytestring instead of the unit. For example, instead of:

'µm'

they'd get:

b'\xc2\xb5m'

Which partially defeats the point of printing it at all :-)

If you really think this is an issue, then we could wrap the print, something like:

try:
    print(f"Adding: {unit_type}: {n}")
except UnicodeEncodeError:
    print(f"Adding: {unit_type}: {n}".encode("utf-8"))
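
For anyone who wants to check how that fallback behaves without access to a misconfigured machine, here is a minimal sketch that simulates an ASCII-only stdout with an in-memory stream. The helper name `safe_print` and the sample message are made up for illustration; they are not part of nucos.

```python
import io


def safe_print(text, stream):
    """Print text to stream; fall back to the UTF-8 bytes repr if encoding fails.

    Hypothetical helper, mirroring the try/except suggested above.
    """
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        # The stream cannot encode the text, so print the escaped bytestring.
        print(text.encode("utf-8"), file=stream)


# Simulate a stdout that only accepts ASCII, as under the plain "C" locale.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii")

safe_print("Adding: length: µm", ascii_stream)
ascii_stream.flush()

# What ended up on the simulated stdout, without the trailing newline.
written = raw.getvalue().decode("ascii").strip()
```

On a stream that can encode the text, the first print succeeds and the unit is shown unchanged, so only the ASCII-only environments see the escaped form.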

moisespr123 · 2022-05-18T11:53:22Z

Hi, thanks for the explanation.

Unfortunately, we needed to modify this to allow a specific environment to be able to successfully import this library and run. I was unaware of the bytestring output when using .encode('utf-8'), since we are just importing this library as part of PyGnome where it seemed to stop because of this line with the UnicodeEncodeError. Modifying the line made it run.

I updated the PR to surround this line in a try/except block.

I do agree this issue should not happen on modern systems :-)

ChrisBarker-NOAA · 2022-05-18T16:49:26Z

Ah -- I see.

We had the same problem on operational servers a few years back. It turns out that our Docker images were "minimal", and thus configured with the default "C locale", which is ASCII-only.

It turns out that Python added "UTF-8 mode" to address this (and similar) issues:

https://peps.python.org/pep-0540/

I highly encourage you to look into that -- Python is Unicode native, and a system that errors out on a write to stdout is not stable (it could be a log message, or ...).
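
A quick way to see what a given environment is doing is to inspect the interpreter itself. This is only a sketch, assuming Python 3.7 or later (where `sys.flags.utf8_mode` exists); the function name `describe_stdout_encoding` is made up for illustration.

```python
import sys


def describe_stdout_encoding():
    """Report whether UTF-8 mode is on and which encoding stdout uses."""
    return {
        # True when Python was started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # May be None if stdout was replaced by an object without an encoding.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


info = describe_stdout_encoding()
print(info)
```

If `stdout_encoding` comes back as an ASCII variant, starting the interpreter with `PYTHONUTF8=1` in the environment (or `python -X utf8`) should switch it to UTF-8 without touching any code.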

I will merge this for now, because there's no reason not to, but now that I think about it, the issue here is probably that nucos shouldn't be using any print statements on import anyway -- that was probably put there (by me) as a debugging tool, and I forgot to remove it.
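
One possible shape for that, sketched under assumptions: route the message through the logging module at DEBUG level, so nothing is written to stdout at import time unless a user opts in. The logger name, the `register_unit` function and the registry layout are all hypothetical and are not the actual nucos code.

```python
import logging

# Hypothetical logger name; nothing is emitted unless the user configures logging.
logger = logging.getLogger("nucos_example")


def register_unit(unit_type, name, registry):
    """Record a unit name under its unit type and log it at DEBUG level."""
    registry.setdefault(unit_type, []).append(name)
    # Lazy %-formatting: the message is only built if DEBUG is enabled.
    logger.debug("Adding: %s: %s", unit_type, name)


registry = {}
register_unit("length", "µm", registry)
```

Anyone who wants the old output back can call `logging.basicConfig(level=logging.DEBUG)` before importing; everyone else gets a silent import.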

I'll take a more careful look now.

ChrisBarker-NOAA · 2022-05-18T16:49:48Z

PS: thanks for the report!

ChrisBarker-NOAA · 2022-05-18T22:17:53Z

NOTE: updated releases on GitHub, PyPI and conda-forge

Encode string to UTF-8 to prevent issues where ASCII fails to decode …

d99f0fa

…characters.

Wrap in Try/Except UnicodeEncodeError.

725c72b

ChrisBarker-NOAA merged commit 0e367e2 into NOAA-ORR-ERD:master May 18, 2022
