BUG: in font_manager, handle unicode output from fontconfig #2431

Closed
wants to merge 1 commit into from


efiring
Member

@efiring efiring commented Sep 17, 2013

OS X uses utf-8 filesystem encoding, so the bytes returned by
fontconfig need to be interpreted as unicode using the system
encoding.

@mdboom
Member

mdboom commented Sep 18, 2013

Was this in response to a real bug, or just theoretical?

I'm not sure this is the right fix. I know I probably introduced this bug a few years ago, but I think the best thing to do here is to keep everything as byte strings -- that's the only thing that's guaranteed to pass through and roundtrip on POSIX platforms (i.e. the file encoding may be utf-8 on a particular platform, but it's defined as being just byte strings). More importantly, the FT2Font extension can't open files using a Unicode object as a file name (because it's just C underneath). That's actually a hard problem to solve -- the only surefire way would be to open the file in Python and pass the file handle to the C extension, but freetype doesn't make it terribly easy to then use it (though it is possible). If we keep the filenames as byte strings throughout, things are a lot simpler.
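
The bytes-only approach described above can be illustrated with a minimal sketch. It is written for Python 3 (the `surrogateescape` error handler does not exist on Python 2.7), the sample bytes are made up, and the `path:` line shape is borrowed from the fontconfig example quoted later in this thread. It is not matplotlib's code.

```python
# Made-up bytes in the shape fontconfig might emit. The \xff byte is
# deliberately not valid UTF-8, standing in for an arbitrary POSIX file name.
raw = b"/Library/Fonts/\xe9\x9b\x85.otf:\n/tmp/odd\xff.ttf:\n"

# Keeping everything as bytes: the separators are bytes too, so nothing
# is ever decoded and no codec can fail.
byte_paths = [line.split(b":")[0] for line in raw.split(b"\n") if line]

# A strict decode can fail on such a name...
try:
    byte_paths[1].decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...whereas surrogateescape maps undecodable bytes to lone surrogates,
# so encoding with the same handler restores the original bytes.
as_text = byte_paths[1].decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")
```

On POSIX, Python 3's `os.fsdecode` and `os.fsencode` apply the same error handler with the filesystem encoding, which is one way to get text for display while still being able to recover the exact bytes.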

I'll probably need to make an alternative PR to better explain what I'm suggesting, but in the meantime, I thought I'd confirm that you don't have a real case where this was broken and now fixed by this change.

@efiring
Member Author

efiring commented Sep 18, 2013

@mdboom, It's a real bug on my system, and this fixes it--at least enough that I could proceed. I was trying to do some work after foolishly updating, and tripped over two successive bugs. This one was fixable, the other was not but was restricted to the macosx backend. I reported that one--a crash--on matplotlib-devel.

The exception was raised by the line following the one I changed, calling the split() method on output which had been run through str(), so it would not have been bytes on Python 3 anyway. The exception was raised because split() was trying to convert the string using ascii, and the Chinese characters in some font name on my system were tripping it up. (There are several such names; I don't know whether they are standard with Mountain Lion, or came in via NeoOffice or something like that.)

Here is an example line from my fontconfig output:
/Library/Fonts/雅痞-繁.otf:

And here is an illustration of how to trigger the same error in ipython with python 2.7:

In [11]: x = b"/Library/Fonts/雅痞-繁.otf:"

In [12]: x
Out[12]: '/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:'

In [13]: x.split('\n')
Out[13]: ['/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:']

In [14]: from __future__ import unicode_literals

In [15]: x.split('\n')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-15-30d0e68606dd> in <module>()
----> 1 x.split('\n')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 15: ordinal not in range(128)
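
For comparison, here is a minimal sketch of the decode-before-split idea behind this pull request, written for Python 3. `parse_fontconfig_output` is a hypothetical name, not matplotlib's function, and UTF-8 is passed explicitly in the usage lines to mirror the OS X case; by default the sketch assumes the filesystem encoding.

```python
import sys


def parse_fontconfig_output(raw, encoding=None):
    """Hypothetical helper: turn fc-list style bytes into a list of path strings."""
    if encoding is None:
        # Assumption: fontconfig paths use the filesystem encoding.
        encoding = sys.getfilesystemencoding()
    text = raw.decode(encoding)
    # Each line has the shape "<path>:", as in the example line above.
    return [line.split(":")[0] for line in text.split("\n") if line]


# The same bytes as in the IPython session, plus a trailing newline.
raw = b"/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:\n"
paths = parse_fontconfig_output(raw, encoding="utf-8")
print(paths)
```

Because the bytes are decoded explicitly, no implicit ASCII conversion is left for `split()` to attempt. On Python 3 the mixed call `raw.split("\n")` would raise a `TypeError` rather than a `UnicodeDecodeError`, so the explicit decode is needed there as well.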

@efiring
Member Author

efiring commented Sep 18, 2013

@mdboom, maybe the solution here is to convert back to bytes after splitting the output into a list of strings? Or to make sure that conversion occurs before anything is passed into the ft2font extension?
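
The second option, encoding only at the boundary with the extension, might look like the sketch below. `encode_for_extension` is a hypothetical helper, not part of matplotlib or ft2font, and the assumption that the C side wants the filesystem encoding is exactly that, an assumption.

```python
import sys


def encode_for_extension(path, encoding=None):
    """Hypothetical boundary helper: return `path` as bytes for a C-level open."""
    if isinstance(path, bytes):
        return path  # already bytes; pass through untouched
    if encoding is None:
        # Assumption: the C side expects the filesystem encoding.
        encoding = sys.getfilesystemencoding()
    return path.encode(encoding)


raw = b"/Library/Fonts/\xe9\x9b\x85\xe7\x97\x9e-\xe7\xb9\x81.otf:\n"
# Decode once so string operations such as split() work on text...
names = [line.split(":")[0] for line in raw.decode("utf-8").split("\n") if line]
# ...then encode again only where a byte string is required.
encoded = [encode_for_extension(name, encoding="utf-8") for name in names]
```

The trade-off is that every call into the extension has to go through the helper, whereas keeping bytes throughout needs no conversion at all.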

mdboom added a commit to mdboom/matplotlib that referenced this pull request Sep 18, 2013
…ntconfig in

the filesystemencoding.  Fix a number of buglets that did not allow
fonts with non-ascii filenames to be opened.  This involves taking
advantage of code already in FT2Font to use unicode filenames, and
encoding the filenames before passing into ttconv.
@efiring efiring closed this Sep 18, 2013
@efiring efiring deleted the font_list_encoding branch February 18, 2015 00:19