"UnicodeEncodeError: 'ascii' codec can't encode character" on peru sync #136

Closed
oryband opened this Issue Oct 20, 2015 · 19 comments

oryband commented Oct 20, 2015

Using peru in a docker image, with python3 and git packages installed via virtualenv yields the following result:

(peru)root@7ccee9c73fb8:~/# peru sync
Exception in callback None
handle: <TimerHandle cancelled when=85353.65284133301>
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/events.py", line 120, in _run
    self._callback(*self._args)
  File "/root/peru/lib/python3.4/site-packages/peru/display.py", line 158, in _draw
    self.output.write('\u250c' if len(self._job_slots) > 1 else '\u2576')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2576' in position 0: ordinal not in range(128)

# this error is repeated multiple times, i truncated for easier reading

oconnor663 Oct 20, 2015

Member

I think this error is happening when peru wants to print fancy unicode bracket characters, like this:

┌ a
├ b
└ c

The immediate workaround is to run peru with -q/--quiet or -v/--verbose, so that it stops trying to be fancy. Quiet is also the default if peru is running with its stdout redirected to a file.

It sounds like your terminal is configured in ASCII mode, without Unicode support. If you run

python3 -c "import sys; print(sys.stdout.encoding)"

does it say ASCII?

If you run

python3 -c "print('日本語')"

does it fail with the same error?
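
For anyone who wants to reproduce the failure without touching their terminal or locale settings, here is a minimal sketch that writes the same character to an in-memory stream. The helper name `try_write` is made up for this illustration and is not part of peru.

```python
import io


def try_write(text, encoding):
    """Write `text` to an in-memory text stream that uses `encoding`.

    Returns the encoded bytes on success, or the UnicodeEncodeError
    that the write raised.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as error:
        return error
    return raw.getvalue()


# An ASCII stream cannot represent the box-drawing character from the traceback.
print(repr(try_write("\u2576", "ascii")))
# A UTF-8 stream encodes it without complaint.
print(repr(try_write("\u2576", "utf-8")))
```

The ASCII case returns the same `UnicodeEncodeError` type as in the traceback above, which suggests the problem is the stream's encoding rather than anything specific to asyncio or peru's display code.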

oryband Oct 21, 2015

i'm using urxvt, which supports unicode. however:

root@dddde7beacea:~/# python3 -c "import sys; print(sys.stdout.encoding)"
ANSI_X3.4-1968
root@dddde7beacea:~/# python3 -c "print('')"  # couldn't copy the unicode chars from here

root@dddde7beacea:~/#

oconnor663 Oct 21, 2015

Member

Oops, I should've predicted copy/paste wouldn't work there. I bet you can repro the error with this:

python3 -c "print('\u65e5\u672c\u8a9e')"

It looks like ANSI_X3.4-1968 is what Python falls back to when the system locale is bad. It's what I see on my own system if I run LANG="foo" python3 -c "import sys; print(sys.stdout.encoding)". (Normally without messing with LANG I see UTF-8.) Is it possible you have a misconfigured /etc/locale.conf or something like that? Or was your system intentionally set up to be ASCII-only?

Ultimately I'm not sure what the Right Thing for peru to do is here. It would be easy enough to avoid printing our own unicode brackets when a terminal is in ASCII mode, but we'd still have to worry about unicode in git URLs, for example. Do you know if this is something that other Python tools explicitly handle? Maybe there's a hack we can use to tell Python to just emit UTF8 anyway?
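
A rough sketch of the "avoid our own unicode brackets in ASCII mode" idea, purely as an illustration. The names `is_ascii_encoding`, `pick_brackets`, and the ASCII stand-in glyphs are assumptions made up here, not peru's code. It relies on `codecs.lookup` resolving `ANSI_X3.4-1968` to Python's `ascii` codec.

```python
import codecs

# The brackets peru's fancy display uses, plus hypothetical ASCII stand-ins.
UNICODE_BRACKETS = {"top": "\u250c", "middle": "\u251c", "bottom": "\u2514"}
ASCII_BRACKETS = {"top": "/", "middle": "|", "bottom": "\\"}


def is_ascii_encoding(name):
    """Return True if `name` is ASCII or one of its aliases."""
    if not name:
        return True  # no declared encoding: assume the worst
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        return True  # unknown encoding: also fall back to ASCII


def pick_brackets(encoding_name):
    """Choose a bracket set that the given output encoding can represent."""
    if is_ascii_encoding(encoding_name):
        return ASCII_BRACKETS
    return UNICODE_BRACKETS


print(is_ascii_encoding("ANSI_X3.4-1968"))
print(pick_brackets("UTF-8")["top"])
```

As noted above, this would only cover characters peru prints itself; non-ASCII text coming from elsewhere (git URLs, for example) would still fail on an ASCII stream.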

oconnor663 Oct 21, 2015

Member

Could you please let me know what docker image you're using? I could take a look at the locale configs myself.

Aw heck, I just tried it on the standard Ubuntu image and totally reproduced this. Here's a guy who ran into the same thing: http://jaredmarkell.com/docker-and-locales/

oconnor663 Oct 21, 2015

Member

It looks like exporting PYTHONIOENCODING="utf8" is another workaround for this, which you might prefer over --verbose or --quiet if you expect a person to be watching the terminal output from your Docker container.
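
One way to check the effect of PYTHONIOENCODING without changing your shell is to start a child interpreter with the variable set. This is only a sketch; `child_output` is a hypothetical helper name.

```python
import codecs
import os
import subprocess
import sys


def child_output(code, **extra_env):
    """Run `code` in a fresh Python interpreter and return its raw stdout bytes.

    `extra_env` is added on top of the current environment.
    """
    env = dict(os.environ, **extra_env)
    return subprocess.check_output([sys.executable, "-c", code], env=env)


# Ask the child which encoding it picked for stdout.
reported = child_output(
    "import sys; sys.stdout.write(sys.stdout.encoding)",
    PYTHONIOENCODING="utf8",
).decode("ascii")
print(codecs.lookup(reported).name)

# The child can now write a box-drawing character; it arrives as UTF-8 bytes.
data = child_output(
    "import sys; sys.stdout.write('\\u250c')",
    PYTHONIOENCODING="utf8",
)
print(data == "\u250c".encode("utf-8"))
```

The variable has to be set before the interpreter starts, which is why the sketch passes it through the child's environment instead of assigning to `os.environ` in the running process.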

oryband Nov 1, 2015

@oconnor663 maybe this should be set internally in peru? is that even possible? asking the user to do that manually feels wrong. If not possible, at least add this to the docs.

oryband Nov 1, 2015

adding a link to that article is also a good idea

oconnor663 Nov 1, 2015

Member

Agreed, I think fixing it under the covers would be the right thing to do. I'm not sure how easy that will be though, because I think this setting might get locked in early in interpreter startup, before any of our code has a chance to run? If we have to do some hack where we replace the global stdout and stderr pipes, I'd be worried that'll break some random system. I need to do some googling.

oconnor663 added a commit that referenced this issue Nov 2, 2015

force utf8 mode when sys.stdout.encoding is ASCII
In some environments (particularly Docker), Python tends to start up
with the locale set to ASCII. That means trying to print unicode
characters raises an exception, like in our fancy display. Rather than
requiring the user to explicitly set PYTHONIOENCODING=utf8, we
explicitly rewrap stdout and stderr in UTF8 file objects.

I'm a little worried that this will break something down the line...

#136
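
For readers who have not seen the rewrapping trick, the commit message above describes something along these lines. This is a sketch of the general technique using `io.TextIOWrapper`, not the code from the commit; the function names are invented for the example.

```python
import codecs
import io
import sys


def rewrap_as_utf8(stream):
    """Return a UTF-8 text stream over the same binary buffer as `stream`.

    Streams that are not ASCII (or that have no known encoding, such as
    io.StringIO) are returned unchanged.
    """
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        return stream
    try:
        if codecs.lookup(encoding).name != "ascii":
            return stream
    except LookupError:
        return stream
    # Read settings before detaching; the old wrapper is unusable afterwards.
    line_buffering = stream.line_buffering
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding="utf-8", line_buffering=line_buffering)


def force_utf8_stdio():
    """Replace sys.stdout and sys.stderr when they start out as ASCII."""
    sys.stdout = rewrap_as_utf8(sys.stdout)
    sys.stderr = rewrap_as_utf8(sys.stderr)
```

A program would call `force_utf8_stdio()` once at startup, before anything is printed. Any code that kept a reference to the original `sys.stdout` object would be left holding a detached stream, which is one way a hack like this could break something down the line.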

oconnor663 Nov 2, 2015

Member

@oryband could you please confirm that that change fixes the issue for you? The easiest way is to checkout that commit in your container and then run pip install . from the root of the repo.

oryband Nov 2, 2015

actually, i can't. i already avoided this bug
i think i reproduced it by using a bad peru.yaml file with a bad imports section
when executing peru sync, i saw this bug. don't remember why

i also already added the locale fix to the docker image i was running this from, so this also bypasses the problem

sorry :\

oconnor663 Nov 2, 2015

Member

No worries. I think the most common way someone would hit this would be to do a peru sync inside an interactive docker shell, so that we default to fancy formatting and then choke on the fancy box-drawing characters. This fix seems to work for at least that case, so I'm guessing it works for the rest.

oryband Nov 3, 2015

very well then. if you feel this has been resolved, feel free to close this issue. 💯

oconnor663 Nov 5, 2015

Member

Note to self: The default Windows terminal encoding seems to be cp437, which also causes these problems. We might need to handle more than just ASCII mode...
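
One possible way to handle more than just ASCII mode is to test whether the output encoding can represent the characters at all, instead of comparing encoding names. A small sketch follows; `can_encode` is a hypothetical helper, not something peru provides.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# The characters from the fancy display and from the original traceback.
FANCY = "\u250c\u251c\u2514\u2576"

for name in ("ascii", "cp437", "utf-8"):
    print(name, can_encode(FANCY, name))
```

cp437 is an interesting case because it does contain some box-drawing characters such as U+250C, but not U+2576 from the traceback, so a check by encoding name alone would not be enough.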

oryband Nov 15, 2015

@oconnor663 this might also be a solution (please verify):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

oconnor663 added a commit that referenced this issue Nov 16, 2015

force utf8 mode when sys.stdout.encoding is ASCII
In some environments (particularly Docker), Python tends to start up
with the locale set to ASCII. That means trying to print unicode
characters raises an exception, like in our fancy display. Rather than
requiring the user to explicitly set PYTHONIOENCODING=utf8, we
rewrap stdout and stderr in UTF8 file objects.

I'm a little worried that this will break something down the line...

#136

oconnor663 Nov 16, 2015

Member

That fix looks like it's Python-2-only I'm afraid.

Since I don't have any better ideas, I've gone ahead and landed the hack above (a8072aa) and pushed release 0.2.6. Hopefully this won't break anything!
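
To illustrate the Python-2-only remark: under Python 3 neither the `reload` builtin nor `sys.setdefaultencoding` exists, and the default string encoding is already UTF-8. A quick check, written as a sketch for Python 3 only:

```python
import builtins
import sys

# Both pieces of the suggested snippet are missing in Python 3.
has_reload_builtin = hasattr(builtins, "reload")
has_setter = hasattr(sys, "setdefaultencoding")

# The default str/bytes conversion encoding is fixed in Python 3.
default_encoding = sys.getdefaultencoding()

print(has_reload_builtin, has_setter, default_encoding)
```

Note that `sys.getdefaultencoding()` is a separate setting from `sys.stdout.encoding`, which is the value that was ASCII in this issue.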

oconnor663 closed this Nov 16, 2015

oconnor663 Nov 16, 2015

Member

Thanks for your help throughout all this, by the way!

oryband Nov 16, 2015

happy to help
