Unicode decode error [backport to 1.4.x] #3594

Merged
merged 2 commits into matplotlib:master on Sep 30, 2014

wernerfb
Contributor

Fixes the following exception on Windows with Python 2.x when the user's home folder name contains non-ASCII characters such as umlauts.
twcbsrc.app_base - ERROR - app_base.pyo - 292 - Traceback (most recent call last):
File "twcbsrc\controllers\app_cb.pyo", line 403, in onTBStats
File "importlib\__init__.pyo", line 37, in import_module
File "zipextimporter.pyo", line 82, in load_module
File "twcbsrc\controllers\app_stats.pyo", line 29, in <module>
File "zipextimporter.pyo", line 82, in load_module
File "twcbsrc\views\statistics.pyo", line 19, in <module>
File "zipextimporter.pyo", line 82, in load_module
File "matplotlib\__init__.pyo", line 1048, in <module>
File "matplotlib\__init__.pyo", line 897, in rc_params
File "matplotlib\__init__.pyo", line 759, in matplotlib_fname
File "matplotlib\__init__.pyo", line 630, in get_configdir
File "matplotlib\__init__.pyo", line 555, in get_xdg_config_dir
File "matplotlib\__init__.pyo", line 323, in wrapper
File "matplotlib\__init__.pyo", line 509, in _get_home
File "ntpath.pyo", line 310, in expanduser
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 10: invalid start byte
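
The final error can be reproduced in isolation. The sketch below assumes a hypothetical home directory `C:\Users\Jörg` (not taken from the report) stored as Windows-1252 bytes, where `ö` is the single byte 0xf6. The helper `try_decode` is made up for illustration. Decoding those bytes as UTF-8 fails at the same offset as in the traceback, while decoding with the matching code page succeeds.

```python
# Hypothetical home directory as raw bytes in Windows-1252 ("ö" is byte 0xf6).
raw_home = b"C:\\Users\\J\xf6rg"


def try_decode(raw, encoding):
    """Return (decoded_text, None) on success or (None, error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as UTF-8 fails; err.start is the offset of the offending byte.
text, err = try_decode(raw_home, "utf-8")
print(err.start if err else None)  # prints 10

# Decoding with the encoding the bytes are actually in succeeds.
text, err = try_decode(raw_home, "cp1252")
print(repr(text))
```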

See also: #3532

tacaswell added this to the v1.4.1 milestone Sep 29, 2014
@@ -523,7 +523,7 @@ def _get_home():
http://mail.python.org/pipermail/python-list/2005-February/325395.html
"""
try:
- path = os.path.expanduser("~")
+ path = os.path.expanduser(b"~").decode(sys.getfilesystemencoding())
Member

I am a tad confused why this goes through a byte string

Contributor Author

I got the idea from a comment in this thread:
https://stackoverflow.com/questions/23888120/an-alternative-to-os-path-expanduser

If I understand correctly what is happening: because I pass in b"~",
expanduser returns a byte string, which can then be decoded using the
file system encoding. If u"~" or just "~" is used, expanduser converts
using the default encoding (sys.getdefaultencoding()).

Hopefully someone with more Python know-how than me can confirm that
this is a valid approach, at least on Windows.

@mdboom
Member

mdboom commented Sep 29, 2014

See this bug: http://bugs.python.org/issue13207

I just pulled up the code in ntpath.py, and I think I understand why this works. When passed a Unicode string, expanduser gets the user's path from an environment variable (as bytes, because that is what environment variables are in the old Windows API that Python 2.x uses). When it tries to concatenate that with the original Unicode string, it decodes using the default encoding (quite often ASCII or UTF-8) rather than the file system encoding. So essentially, passing a byte string is a hack that lets us do the decoding ourselves, outside of the expanduser function where it is broken.
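
A rough way to see this, as a sketch only: Python 2 implicitly decodes a byte string with the default encoding when it is joined to a Unicode string. The helper `py2_style_concat` below is hypothetical and merely imitates that behaviour with an explicit decode, so it also runs on Python 3. The path and the `cp1252` encoding are assumptions chosen for illustration.

```python
def py2_style_concat(text, raw, default_encoding="ascii"):
    """Imitate Python 2's `unicode + str`: decode the bytes with the
    default encoding, then concatenate."""
    return text + raw.decode(default_encoding)


# Hypothetical value of a home-directory environment variable, as bytes
# in the Windows-1252 code page ("ö" is byte 0xf6).
raw_home = b"C:\\Users\\J\xf6rg"

# Roughly what happens inside expanduser(u"~"): the implicit decode fails.
try:
    py2_style_concat(u"", raw_home)
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# The workaround: keep everything as bytes, then decode explicitly with the
# encoding the bytes are really in (assumed cp1252 here; the patch uses
# sys.getfilesystemencoding()).
explicit = raw_home.decode("cp1252")

print(implicit_ok, explicit == u"C:\\Users\\J\xf6rg")
```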

This PR shouldn't be merged exactly as-is, though. This special workaround should only occur on Windows, and only on Python 2.x.

On POSIX, os.path.expanduser(b'~') always returns a UTF-8 string rather than one in the file system encoding, so there this would break if the file system encoding is not UTF-8 (rare, but possible).

On Python 3, os.path.expanduser(u"~") reportedly just works, so we shouldn't use this workaround.

Other than that, looks good to me. Thanks for getting to the bottom of this.
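
One way the restriction described above could look, as a minimal sketch and not necessarily the code that was merged: the function name `get_home_dir` is hypothetical, and the version check uses `sys.version_info` only to keep the example free of third-party imports.

```python
import os
import sys


def get_home_dir():
    """Return the user's home directory as text (hypothetical helper)."""
    if sys.version_info[0] == 2 and sys.platform == "win32":
        # Python 2 on Windows: expand as bytes, then decode with the file
        # system encoding to avoid the implicit default-encoding decode.
        return os.path.expanduser(b"~").decode(sys.getfilesystemencoding())
    # Everywhere else, the plain call is used unchanged.
    return os.path.expanduser("~")


print(get_home_dir())
```

Keeping the byte-string path behind both checks leaves POSIX and Python 3 on the unmodified call.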

@wernerfb
Contributor Author

Hi Thomas,
I have already made the changes Michael recommended. Is there something else to revise?

tacaswell changed the title Unicode decode error Unicode decode error [backport to 1.4.x] Sep 30, 2014
@tacaswell
Member

Looks good to me. I am going to leave this open for a bit longer to get feedback from any Windows devs or Linux users who have Unicode in their user names.

cc @Tillsten, @jbmohler

@WeatherGod
Member

Perhaps we should ask some people on the scipy and/or numpy list? They
might have a bit more non-English reach than we do?

@jenshnielsen
Member

Or someone from IPython?

@jbmohler
Contributor

This is consistent with a fix I submitted to PySide (getfilesystemencoding needed on win32 Python 2) and it looks correct to me here. I tried to test this a bit on my machine, but I have not managed to log in successfully with a user name containing an "ä" on my English Win7.

@wernerfb
Contributor Author

To enter an 'ä' on an English keyboard, hold 'alt' and enter '0228' on the numpad, or install a 'de' keyboard.
ö = alt 0246
ü = alt 0252
and some French ones would be:
ç = alt 0231
è = alt 0232
â = alt 0226
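
These numbers match the characters' byte values in the Windows-1252 code page, which is also why `ö` shows up as byte 0xf6 (246) in the traceback. A quick check, assuming `cp1252` is the relevant ANSI code page on the machine:

```python
# Alt codes as listed above, mapped to the characters they should produce.
alt_codes = {228: u"ä", 246: u"ö", 252: u"ü", 231: u"ç", 232: u"è", 226: u"â"}

for code, expected in sorted(alt_codes.items()):
    # Decode the single byte with Windows-1252 and compare.
    decoded = bytes([code]).decode("cp1252")
    print(code, hex(code), decoded, decoded == expected)
```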

@jbmohler
Contributor

Thanks @wernerfb; I had actually figured that part out. After walking away from my computer and coming back, I realized my problem. It was much more basic (I was logging onto my domain rather than the local machine); sigh. Now I can review.

I can confirm that with a "C:\Users\säm" user profile and without this PR, I get the UnicodeDecodeError as above. With this PR included, the matplotlib import works. "+1" from me.

tacaswell added a commit that referenced this pull request Sep 30, 2014
BUG : Unicode decode error on with unicode username on py2+win
tacaswell merged commit 95a9ef8 into matplotlib:master Sep 30, 2014
@jenshnielsen
Member

Should this be cherry-picked onto 1.4.x?

@tacaswell
Member

I am about to push it up; I got distracted by noticing that someone (turns out it was me) had deleted the v1.4.0-doc branch, and by fixing it.

@tacaswell
Member

cherry picked as df4e10a

7 participants