BUG: in font_manager, handle unicode output from fontconfig #2431

efiring · 2013-09-17T20:40:02Z

OS X uses utf-8 filesystem encoding, so the bytes returned by
fontconfig need to be interpreted as unicode using the system
encoding.

mdboom · 2013-09-18T01:35:43Z

Was this in response to a real bug, or just theoretical?

I'm not sure this is the right fix. I know I probably introduced this bug a few years ago, but I think the best thing to do here is to keep everything as byte strings -- that's the only thing that's guaranteed to pass through and round-trip on POSIX platforms (i.e. the file encoding may be utf-8 on a particular platform, but it's defined as being just byte strings). More importantly, the FT2Font extension can't open files using a Unicode object as a file name (because it's just C underneath). That's actually a hard problem to solve -- the only surefire way would be to open the file in Python and pass the file handle to the C extension, but freetype doesn't make it terribly easy to then use it (though it is possible). If we keep the filenames as byte strings throughout, things are a lot simpler.

I'll probably need to make an alternative PR to better explain what I'm suggesting, but in the meantime, I thought I'd confirm that you don't have a real case where this was broken and is now fixed by this change.
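
To illustrate the round-trip concern raised in this comment, here is a minimal sketch in Python 3 syntax (the thread itself concerns Python 2.7). The path below is a made-up Latin-1 encoded name, not one from this report, and both helper names are hypothetical. It shows that a strict UTF-8 decode can fail on a legal POSIX filename, while a `surrogateescape` decode (Python 3 only) gives back the original bytes.

```python
# Hypothetical POSIX filename whose bytes are Latin-1, not valid UTF-8.
raw_name = b"/tmp/fonts/caf\xe9.ttf"


def strict_utf8_fails(name):
    """Return True if the bytes cannot be decoded as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def roundtrip(name):
    """Decode and re-encode with surrogateescape, which preserves undecodable bytes."""
    text = name.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


print(strict_utf8_fails(raw_name))
print(roundtrip(raw_name) == raw_name)
```

Keeping the name as bytes from end to end avoids the question entirely, which is the simplification argued for above.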

efiring · 2013-09-18T07:35:10Z

@mdboom, It's a real bug on my system, and this fixes it--at least enough that I could proceed. I was trying to do some work after foolishly updating, and tripped over two successive bugs. This one was fixable, the other was not but was restricted to the macosx backend. I reported that one--a crash--on matplotlib-devel.

The exception was raised by the line following the one I changed, calling the split() method on output which had been run through str(), so it would not have been bytes on Python 3 anyway. The exception was raised because split() was trying to convert the string using ascii, and the Chinese characters in some font name on my system were tripping it up. (There are several such names; I don't know whether they are standard with Mountain Lion, or came in via NeoOffice or something like that.)

Here is an example line from my fontconfig output:
/Library/Fonts/雅痞-繁.otf:

And here is an illustration of how to trigger the same error in ipython with python 2.7:

In [11]: x = b"/Library/Fonts/雅痞-繁.otf:"

In [12]: x
Out[12]: '/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:'

In [13]: x.split('\n')
Out[13]: ['/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:']

In [14]: from __future__ import unicode_literals

In [15]: x.split('\n')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-15-30d0e68606dd> in <module>()
----> 1 x.split('\n')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 15: ordinal not in range(128)
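
The approach taken in this PR, decoding the fontconfig bytes before splitting them, can be sketched roughly as follows. This is an illustration in Python 3 syntax, not the actual patch: the function name `split_fontconfig_output`, its `encoding` parameter, and the second sample entry are assumptions.

```python
import sys


def split_fontconfig_output(raw, encoding=None):
    """Decode fontconfig-style bytes, then split them into non-empty lines."""
    if encoding is None:
        # Assumption: the paths are encoded in the filesystem encoding.
        encoding = sys.getfilesystemencoding()
    text = raw.decode(encoding)
    return [line for line in text.split("\n") if line]


# Sample bytes modeled on the line quoted above, plus a made-up ASCII entry.
sample = "/Library/Fonts/雅痞-繁.otf:\n/Library/Fonts/Example.ttf:\n".encode("utf-8")
lines = split_fontconfig_output(sample, encoding="utf-8")
print(len(lines))
```

Because the text is decoded explicitly, the later `split()` never falls back to an implicit ASCII conversion.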

efiring · 2013-09-18T07:39:36Z

@mdboom, maybe the solution here is to convert back to bytes after splitting the output into a list of strings? Or to make sure that conversion occurs before anything is passed into the ft2font extension?
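
One way to picture the first suggestion, as a sketch only: decode for the string handling, then encode each path back to bytes before it reaches C code. `os.fsdecode` and `os.fsencode` are Python 3 functions and are not available on the Python 2.7 discussed here. The helper name `font_paths_as_bytes` and the trailing-colon handling are assumptions based on the sample line quoted earlier.

```python
import os


def font_paths_as_bytes(raw):
    """Split fontconfig-style output as text, then return each path as bytes."""
    text = os.fsdecode(raw)
    paths = []
    for line in text.split("\n"):
        # Assumption: the path is everything before the first colon.
        path = line.partition(":")[0]
        if path:
            paths.append(os.fsencode(path))
    return paths


sample = b"/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:\n"
print(font_paths_as_bytes(sample))
```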

…ntconfig in the filesystemencoding. Fix a number of buglets that did not allow fonts with non-ascii filenames to be opened. This involves taking advantage of code already in FT2Font to use unicode filenames, and encoding the filenames before passing into ttconv.

BUG: in font_manager, handle unicode output from fontconfig

f7bb92f

OS X uses utf-8 filesystem encoding, so the bytes returned by fontconfig need to be interpreted as unicode using the system encoding.

mdboom mentioned this pull request Sep 18, 2013

Handle Unicode font filenames correctly/Fix crashing MacOSX backend #2433

Merged

efiring closed this Sep 18, 2013

anntzer mentioned this pull request Jun 25, 2014

On Windows, matplotlib fails to load fonts when installed to a folder with non-ascii path #3148

Closed

efiring deleted the font_list_encoding branch February 18, 2015 00:19
