UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' #115

Closed
jcalonso opened this issue Apr 10, 2016 · 14 comments
@jcalonso

I'm getting the following error with the latest version of the CLI tool (0.5.2):

jcalonso@my-server:/media/data# b2 sync myData b2:myBucket/myData
Traceback (most recent call last):
  File "/usr/local/bin/b2", line 9, in <module>
    load_entry_point('b2==0.5.2', 'console_scripts', 'b2')()
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 782, in main
    exit_status = ct.run_command(decoded_argv)
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 258, in run_command
    return self.sync(args)
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 736, in sync
    self._print("+ %s" % filename)
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 292, in _print
    print(*args, file=self.stdout)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' in position 44: ordinal not in range(128)

@bwbeach
Contributor

bwbeach commented Apr 10, 2016

Thanks for the bug report. I'll be working on sync this week, and will get this fixed.

@bwbeach
Contributor

bwbeach commented Apr 10, 2016

Python uses sys.stdout.encoding to decide how to translate unicode to bytes when printing to stdout. My guess is that your encoding is set to 'ascii', which means that it won't be able to print unicode.

My environment has 'utf-8' as the default encoding for sys.stdout, and it works for me.

I'm not sure what the right thing to do is. When the encoding is unicode-capable, things should work. When it's not, what should the code do?
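
One possible answer to the question above, offered as a minimal sketch and not as what b2 actually does: encode with an error handler, so that characters the target encoding cannot represent are escaped instead of raising. The helper name `safe_encode` is hypothetical, and the example uses Python 3 syntax.

```python
def safe_encode(text, encoding):
    """Encode text, escaping any character the codec cannot represent."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # 'backslashreplace' emits an escape such as \xb4 instead of raising.
        return text.encode(encoding, errors="backslashreplace")


name = "file\u00b4name"  # contains U+00B4, the character from the traceback
print(safe_encode(name, "ascii"))
print(safe_encode(name, "utf-8"))
```

The trade-off is that the printed name no longer matches the real file name exactly, so this suits progress output better than anything that is parsed later.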

@bwbeach
Contributor

bwbeach commented Apr 11, 2016

@jcalonso - I think that you could get around this by setting LANG to use an encoding that can print the characters in your file names. Here's a quick experiment I ran:

$ echo $LANG
en_US.UTF-8
$ python
Python 2.7.11 (default, Mar  7 2016, 13:29:38) 
>>> import sys ; sys.stdout.encoding
'UTF-8'
>>> print u'\u81ea'
自
>>> 
$ export LANG=en_US.ASCII
$ python
Python 2.7.11 (default, Mar  7 2016, 13:29:38) 
>>> import sys ; sys.stdout.encoding
'US-ASCII'
>>> print u'\u81ea'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u81ea' in position 0: ordinal not in range(128)

Any suggestions for what the b2 command should do when it can't print the file names?

@ppolewicz
Collaborator

We should catch this and return an appropriate error message, like "your terminal cannot output Unicode in its current configuration. You can fix it by..."
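
The suggestion above could be sketched roughly as follows. This is an illustration under assumptions, not the fix that was committed: `print_or_explain` is a hypothetical helper, the message wording is invented, and the code uses Python 3 syntax. An in-memory ASCII stream stands in for a terminal that cannot display Unicode.

```python
import io
import sys


def print_or_explain(text, stream=None):
    """Write text to stream; on an encoding failure, report a readable error.

    Hypothetical helper, not part of the b2 code base.
    Returns True if the text was written, False otherwise.
    """
    stream = stream if stream is not None else sys.stdout
    try:
        stream.write(text + "\n")
        return True
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "unknown"
        # ascii() keeps the message itself ASCII-only, so reporting cannot fail too.
        sys.stderr.write(
            "ERROR: output encoding %r cannot display %s. "
            "Try a UTF-8 locale (for example LANG=en_US.UTF-8) or "
            "PYTHONIOENCODING=utf-8.\n" % (encoding, ascii(text))
        )
        return False


# Simulate an ASCII-only terminal with an in-memory stream.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
ok = print_or_explain("+ caf\u00e9.txt", ascii_stream)
```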

@jcalonso
Author

To add to the information above: my terminal's LANG is set to en_GB.UTF-8.

@bwbeach
Contributor

bwbeach commented Apr 12, 2016

That's interesting. I wonder what your sys.stdin.encoding is.

@ppolewicz
Collaborator

@jcalonso please tell us what happens when you start python and type this in:

import sys
print sys.stdin.encoding

@jcalonso
Author

$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.stdin.encoding
UTF-8
>>>

@bwbeach
Contributor

bwbeach commented Apr 21, 2016

When I test on Windows, I get this:

>>> import sys
>>> sys.stdout.encoding
'cp437'

Apparently this is normal on Windows.

@bwbeach
Contributor

bwbeach commented Apr 21, 2016

This test case exhibits the problem on Mac:

PYTHONIOENCODING=ascii python -m b2 ls <bucketName>
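
The same effect can be reproduced without b2 or a bucket. The sketch below (Python 3.7+, an assumption) starts a child interpreter with `PYTHONIOENCODING=ascii` and asks it to print the character from the earlier experiment. It only demonstrates the environment variable's effect and is not the project's test case.

```python
import os
import subprocess
import sys

# Force the child's stdout to ASCII, as in the command above.
env = dict(os.environ, PYTHONIOENCODING="ascii")
result = subprocess.run(
    [sys.executable, "-c", "print('\\u81ea')"],
    env=env,
    capture_output=True,
)

# The child exits with a non-zero status and a traceback on stderr.
print(result.returncode)
print(result.stderr.decode("ascii", errors="replace").splitlines()[-1])
```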

bwbeach added a commit that referenced this issue Apr 21, 2016
@seb2411

seb2411 commented Jul 7, 2016

I seem to have the same problem with version 0.5.6.

b2 sync /backup b2://mybucket/path
Traceback (most recent call last):
  File "/usr/local/bin/b2", line 9, in <module>
    load_entry_point('b2==0.5.6', 'console_scripts', 'b2')()
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 873, in main
    exit_status = ct.run_command(decoded_argv)
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 806, in run_command
    self._print_stderr('ERROR: %s' % (str(e),))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2011' in position 25: ordinal not in range(128)

print sys.stdin.encoding:

Python 2.7.11+ (default, Apr 17 2016, 14:00:29) 
[GCC 5.3.1 20160413] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.stdin.encoding
UTF-8

@abeusher

I experienced this and solved the problem by removing dashes/hyphens from my bucket names. Try doing this with a bucket name that is 100% alphanumeric.

@RX14

RX14 commented Sep 18, 2016

I'm on backblaze-b2 version 0.6.2.

$ locale
LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_TELEPHONE=en_GB.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_IDENTIFICATION=en_GB.UTF-8
LC_ALL=
$ backblaze-b2 sync --delete --replaceNewer /foo b2://bucket-with-dashes/
Traceback (most recent call last):
  File "/usr/bin/backblaze-b2", line 11, in <module>
    load_entry_point('b2==0.6.2', 'console_scripts', 'b2')()
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 879, in main
    exit_status = ct.run_command(decoded_argv)
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 811, in run_command
    self._print_stderr('ERROR: %s' % (str(e),))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2011' in position 23: ordinal not in range(128)

Removing the hyphens fixes the error, but this isn't an acceptable workaround. Please reopen.

@ppolewicz
Collaborator

OK, this is actually #266: the user copied the bucket name from the web interface, where it was displayed in a malformed manner. No actual hyphens were used; instead, \u2011 (NON-BREAKING HYPHEN) was passed to the CLI. We've fixed the exceptions and asked the cloud team to stop using NON-BREAKING HYPHEN in their UI.
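
A pasted name can be checked for look-alike characters before it reaches the CLI. The snippet below is a hedged sketch: `describe_non_ascii` is a hypothetical helper, not part of b2, and the bucket name is made up for illustration. It lists every non-ASCII character with its position and Unicode name, printing only ASCII so it works on any terminal.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Illustrative name pasted with U+2011 in place of the ASCII '-'.
pasted = "bucket\u2011with\u2011dashes"
for index, ch, name in describe_non_ascii(pasted):
    print("position %d: U+%04X %s" % (index, ord(ch), name))
```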
