Handle Unicode font filenames correctly/Fix crashing MacOSX backend #2433

mdboom · 2013-09-18T14:21:24Z

This is an alternative to #2431.

First, when handling the output of fontconfig (fc-match and fc-list), the bulk of the output is in ascii, so it is maintained, parsed and handled as bytes. Only once the filename part of the output is extracted do we decode it using sys.getfilesystemencoding().
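A minimal sketch of the bytes-first idea described above, not the code in this PR. The helper name `extract_font_path` is hypothetical, and the sample line assumes fontconfig's default `path: family:style=...` layout.

```python
import sys


def extract_font_path(line, encoding=None):
    """Return the decoded file path from one line of fontconfig-style output.

    The line stays a byte string while it is being split; only the
    leading path component is decoded.
    """
    if encoding is None:
        # Assumption: paths are emitted in the filesystem encoding.
        encoding = sys.getfilesystemencoding()
    # Everything before the first colon is treated as the path.
    raw_path = line.split(b":", 1)[0].strip()
    return raw_path.decode(encoding)


sample = "/fonts/Dej\u00e0Vu.ttf: DejaVu Sans:style=Book".encode("utf-8")
print(extract_font_path(sample, "utf-8"))
```

Splitting on the first colon is exactly what breaks when the path itself contains a colon, which is the concern raised in the review.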

Behind that bug, if you actually try to use a font file with non-ascii characters in the path:

- Every time we passed the path to FT2Font, we were calling str() on it. This was required when FT2Font did not handle Unicode paths, prior to f4adec. However, then and now, it just serves to raise an exception when provided with a non-ascii path.
- Within FT2Font, that unicode string was encoded for the purpose of displaying error messages. This could fail both on encoding and displaying it. Encoding it using "unicode_escape" ensures that we'll have something printable in any environment.
- ttconv does not handle Unicode filenames. ttconv is in the process of being eliminated for MEP14, so I didn't want to invest a lot of effort fixing that. Instead, since it is only called from two places, the path is encoded back to filesystemencoding on the Python side before calling into it. This doesn't change the API to ttconv.
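The two encoding steps mentioned above can be sketched in Python as follows. Both helper names are hypothetical; the actual changes live in the FT2Font C++ code and at the ttconv call sites.

```python
import sys


def printable_path(path):
    """Render a text path as ASCII so it can be shown in any environment."""
    # "unicode_escape" replaces every non-ASCII character with a backslash
    # escape, so the result can always be decoded as ASCII.
    return path.encode("unicode_escape").decode("ascii")


def to_filesystem_bytes(path, encoding=None):
    """Encode a text path for an API that only accepts byte strings."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return path.encode(encoding)


print(printable_path("/fonts/\u5b57\u4f53.ttf"))
print(to_filesystem_bytes("/fonts/\u00e9.ttf", "utf-8"))
```

Note that "unicode_escape" also doubles any literal backslash, so the escaped form is for display only and should not be used to reopen the file.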

This will require a manual backport to 1.3.x if we want that -- things are different enough there that this won't really work or apply.

This approach is not necessarily "best practice" for POSIX platforms, which is to keep paths as byte strings everywhere. I tried that first, but there are too many assumptions throughout the code that file paths are unicode (particularly post f4adec with the move to six and from __future__ import unicode_literals). The downside of decoding is pretty minor -- if a path contains a byte sequence that does not decode using the filesystemencoding, such as an invalid utf-8 sequence, it will not be openable by matplotlib. I'd consider that a pathological, even if technically legal, case, so I don't think we should worry too much about it. Keeping the paths as unicode objects does make the processing of them much easier, particularly with six and Python 3.
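To make the pathological case concrete, here is a small sketch with a hypothetical helper, `try_decode_path`. The byte string below is a legal POSIX filename, but it is not valid utf-8.

```python
def try_decode_path(raw_path, encoding):
    """Return the decoded path, or None if the bytes do not decode."""
    try:
        return raw_path.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xE9 is a complete character in Latin-1, but in utf-8 it starts a
# multi-byte sequence that "." cannot continue.
raw = b"/fonts/caf\xe9.ttf"
print(try_decode_path(raw, "utf-8"))
print(try_decode_path(raw, "latin-1"))
```

Under the approach in this PR, a path in the first situation is simply not usable from matplotlib.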

…ntconfig in the filesystemencoding. Fix a number of buglets that did not allow fonts with non-ascii filenames to be opened. This involves taking advantage of code already in FT2Font to use unicode filenames, and encoding the filenames before passing into ttconv.

efiring · 2013-09-18T17:07:02Z

lib/matplotlib/font_manager.py

+        # The bulk of the output from fc-list is ascii, so we keep the
+        # result in bytes and parse it as bytes, until we extract the
+        # filename, which is in sys.filesystemencoding().
+        for line in output.split(b'\n'):


Can b'\n' or b':' be part of a bytestring representation of a unicode filename itself? If so, this will fail. I don't see the rationale for doing all this splitting on bytestrings instead of converting to unicode, doing the string-based processing, and then converting back to bytes before returning from this function.

mdboom · 2013-09-18

The need to do this is because if the file system encoding is something that isn't byte-based, such as utf-16be on Windows, then you have a combination of the ascii output from fontconfig with the embedded filename path in utf-16.

It is possible that '\n' could be in the filename, but again, that's kind of a degenerate case. It would not appear unexpectedly inside a utf-8 multi-byte sequence (since utf-8 sets the high bit on every byte of a multi-byte sequence, so ascii control characters only ever represent themselves). It could appear in utf-16 as an upper byte, I suppose.

The colon is indeed an issue, however.
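The delimiter concerns in this thread can be checked directly. This is an illustrative sketch; `contains_delimiter_byte` is a hypothetical helper and the sample characters are chosen only because of their code points.

```python
def contains_delimiter_byte(text, encoding, delimiter):
    """Report whether the encoded form of text contains the delimiter byte."""
    return delimiter in text.encode(encoding)


# U+0A05 has 0x0A (newline) as its upper byte in utf-16-be.
print(contains_delimiter_byte("\u0a05", "utf-16-be", b"\n"))
# In utf-8 every byte of a multi-byte sequence is >= 0x80.
print(contains_delimiter_byte("\u0a05", "utf-8", b"\n"))
# A literal colon in a path survives in any ASCII-compatible encoding.
print(contains_delimiter_byte("/fonts/a:b.ttf", "utf-8", b":"))
```

So splitting byte output on b'\n' is safe for utf-8 paths apart from a literal newline in the name, while splitting on b':' is unsafe whenever the path contains a colon.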

Looking at the fc-list manpage further, I think we can provide it with a custom output format that would only contain the filepath (which is the only part we care about anyway). That would simplify this...

I have a new solution to this using the --format argument to fontconfig.
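A rough sketch of what the --format approach might look like, assuming fc-list is on the PATH and accepts a `%{file}` format element. The function names are hypothetical and this is not necessarily the exact code in the PR.

```python
import subprocess
import sys


def parse_font_file_list(raw_output, encoding=None):
    """Decode newline-separated font paths, skipping undecodable ones."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    paths = []
    for raw_path in raw_output.split(b"\n"):
        if not raw_path:
            continue
        try:
            paths.append(raw_path.decode(encoding))
        except UnicodeDecodeError:
            # Skip a file if its filename cannot be decoded.
            continue
    return paths


def list_fontconfig_files():
    """Ask fc-list for file paths only; return [] if it is unavailable."""
    try:
        # The doubled backslash passes a literal "\n" to fontconfig, which
        # is assumed to expand it to a newline in the format string.
        raw_output = subprocess.check_output(
            ["fc-list", "--format=%{file}\\n"])
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_font_file_list(raw_output)
```

With only the path on each line there is no colon-delimited field to split, so a colon inside a filename no longer matters.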

mdboom · 2013-09-18T18:09:57Z

Travis is very unhappy -- looking into it.

mdehoon · 2013-09-20T02:04:41Z

The fix for the Mac OS X backend in src/_macosx.m looks fine. We may want to raise an Exception if the result of setfont is NULL, just to be sure that we don't get a hard crash if for some reason setfont fails to find the font.

mdboom · 2013-09-20T12:53:43Z

@mdehoon: I'll add the setfont NULL check. Thanks for the tip.

mdboom · 2013-09-23T12:57:10Z

I have added the NULL check on setfont.

mdehoon · 2013-09-24T04:00:14Z

That looks fine to me. You may want to consider skipping the call to CGContextSelectFont(cr, name, size, kCGEncodingMacRoman) if font is NULL.

mdboom added 2 commits September 18, 2013 10:05

Just skip a file if its filename can not be decoded

cdf87de

efiring reviewed Sep 18, 2013

mdboom added 2 commits September 18, 2013 13:12

Fix MacOSX backend post-six transition

895ed32

Use the --format argument to fontconfig so we don't have to parse anything complex ourselves.

8e52475

Fix failing mathtext tests.

71e41e9

dmcdougall mentioned this pull request Sep 23, 2013

_macosx.so crash in build using Xcode 5 #2451

Closed

Better error handling with font can not be loaded by macosx backend

01ce972

Skip CGContextSelectFont if font lookup fails

e2a49e0

mdboom added a commit that referenced this pull request Sep 24, 2013

Merge pull request #2433 from mdboom/unicode-font-filenames

b060f7c

Handle Unicode font filenames correctly/Fix crashing MacOSX backend

mdboom merged commit b060f7c into matplotlib:master Sep 24, 2013

mdboom deleted the unicode-font-filenames branch August 7, 2014 13:53

This was referenced Nov 7, 2019

Note minimum supported version for fontconfig. #15626

Merged

Cannot use many system fonts in matplotlib #15625

Closed
