Font Unicode Coverage #3
Comments
That sounds really useful. It looks like the infrastructure in the Android platform could be expanded to do this. The Android implementation opens up the TTF files and pulls out the font name; information on which characters are supported is also held inside the font files themselves (see https://unix.stackexchange.com/questions/247108/how-to-find-out-which-unicode-codepoints-are-defined-in-a-ttf-file for a Python example, although our Android implementation is in Java). I'd have to poke around a little more to see how to do this on iOS, since we don't deal directly with font files there. I'm currently working towards a beta of our mobile project (adapt-it-mobile), but might be able to poke at this once that's out the door. Alternatively, if you want to implement it yourself, you could just send a pull request and I can fold in the changes. You would be adding a method to the
Erik, I have not yet done any plugin development, but I will have to learn after a while. I am hoping that you find this worth doing. Gary
If you can access the cmap from the fonts you want to use, that would be a first step toward determining whether the font supports a given character set.

Some background you need to understand: the term "the cmap" can be slightly ambiguous. In terms of the binary format of font files (which, at the top level of organization, is a series of data structures called tables), the term refers to the table that has the tag "cmap". So if you are reading the binary font file, you first have to locate the cmap table and then the relevant subtable within it. Once you have the relevant subtable, you have a character-to-glyph mapping. Be aware that there are several different formats this mapping can take; for the rest of this I'll assume the most common one, namely a mapping from Unicode character code (USV) to glyph index (i.e., the index into the font's glyph data).

At this point you will be able to ask the question: for a given USV, does this cmap return a non-zero glyph index? The glyph with index zero is the "not defined" glyph, so if a USV maps to glyph zero you can say that character isn't supported by that font. I suspect most algorithms that try to determine character support of a font stop at this point and use the zero/non-zero cmap result to indicate whether the character is not-supported/supported. This may be all you need to do in your situation; certainly it would eliminate most of the "square box" issues mentioned by the OP.

In reality, however, if you get a non-zero glyph index, about the best you can say is that the character is likely to be supported by the font, and getting a more precise answer is harder. If the indicated glyph has no "ink", i.e., no outline, it could be intentional because the glyph represents a whitespace character, but it might be that the font author just never implemented the outline. Even if it has an outline, does it really behave properly?
For example, suppose the USV is that of a diacritic: does the OpenType or Graphite logic in the font correctly position the glyph over all the base characters that it needs to for the language in question? (And for that matter, do you know which bases need to take that diacritic for that language?) As another example, many scripts require specific shapes of a character in specific contexts (e.g., Arabic contextual shaping): does the font logic select the correct glyph in all the needed contexts for the given language?

Even if the font logic is correct, does the rendering library your app uses know to use that logic? If the character is relatively new to Unicode, the shaping logic in your support libraries may not have been updated to know this character's properties, and so might not know that it needs positioning or contextual shaping, etc. (This is particularly problematic with OpenType logic; less so, but not entirely absent, for Graphite logic.)

As I said above, it may be that in your context "in the cmap or not" is a sufficient test, but I wanted you to understand that the full answer about which characters a font supports is more complex than that.
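For anyone who wants to try the basic zero/non-zero cmap test described above, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the plugin's implementation: the function names (find_table, unicode_subtable, glyph_index, is_probably_supported) are hypothetical, the order in which Unicode subtables are preferred is my own choice, only subtable formats 4 and 12 are handled, and font collections (.ttc) or compressed containers such as WOFF are not handled at all.

```python
import struct

def find_table(font: bytes, tag: bytes) -> int:
    """Return the byte offset of the top-level table with the given 4-byte tag."""
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        rec_tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        if rec_tag == tag:
            return offset
    raise ValueError(f"table {tag!r} not found")

def unicode_subtable(font: bytes) -> int:
    """Return the offset of a Unicode cmap subtable (assumed preference order)."""
    cmap = find_table(font, b"cmap")
    num_records = struct.unpack_from(">H", font, cmap + 2)[0]
    found = {}
    for i in range(num_records):
        platform, encoding, offset = struct.unpack_from(
            ">HHI", font, cmap + 4 + 8 * i)
        found[(platform, encoding)] = cmap + offset
    # Assumption: prefer full-repertoire subtables over BMP-only ones.
    for key in [(3, 10), (0, 4), (3, 1), (0, 3)]:
        if key in found:
            return found[key]
    raise ValueError("no Unicode cmap subtable found")

def _lookup_format4(font: bytes, sub: int, usv: int) -> int:
    """Format 4: segment mapping, covers the Basic Multilingual Plane only."""
    if usv > 0xFFFF:
        return 0
    seg_count = struct.unpack_from(">H", font, sub + 6)[0] // 2
    end_codes = sub + 14
    start_codes = end_codes + 2 * seg_count + 2   # skip reservedPad
    id_deltas = start_codes + 2 * seg_count
    id_range_offsets = id_deltas + 2 * seg_count
    for i in range(seg_count):
        end = struct.unpack_from(">H", font, end_codes + 2 * i)[0]
        if usv > end:
            continue
        start = struct.unpack_from(">H", font, start_codes + 2 * i)[0]
        if usv < start:
            return 0
        delta = struct.unpack_from(">h", font, id_deltas + 2 * i)[0]
        ro_pos = id_range_offsets + 2 * i
        range_offset = struct.unpack_from(">H", font, ro_pos)[0]
        if range_offset == 0:
            return (usv + delta) & 0xFFFF
        # The offset is relative to the position of the idRangeOffset entry.
        glyph = struct.unpack_from(
            ">H", font, ro_pos + range_offset + 2 * (usv - start))[0]
        return (glyph + delta) & 0xFFFF if glyph else 0
    return 0

def _lookup_format12(font: bytes, sub: int, usv: int) -> int:
    """Format 12: sequential groups, covers the full Unicode range."""
    num_groups = struct.unpack_from(">I", font, sub + 12)[0]
    for i in range(num_groups):
        start, end, start_glyph = struct.unpack_from(
            ">III", font, sub + 16 + 12 * i)
        if start <= usv <= end:
            return start_glyph + (usv - start)
    return 0

def glyph_index(font: bytes, usv: int) -> int:
    """Return the glyph index the cmap gives for a USV (0 means "not defined")."""
    sub = unicode_subtable(font)
    fmt = struct.unpack_from(">H", font, sub)[0]
    if fmt == 4:
        return _lookup_format4(font, sub, usv)
    if fmt == 12:
        return _lookup_format12(font, sub, usv)
    raise NotImplementedError(f"cmap subtable format {fmt} not handled in this sketch")

def is_probably_supported(font: bytes, usv: int) -> bool:
    """Non-zero glyph index: likely supported, with the caveats discussed above."""
    return glyph_index(font, usv) != 0
```

Usage would be along the lines of reading the file with open(path, "rb").read() and calling is_probably_supported(data, 0x0E01). A True result only means the cmap maps the character to some glyph other than glyph zero; it says nothing about outlines, positioning, or shaping.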
This is a feature request, not a bug report. We would like to know the Unicode coverage of each font that is on a device. Our very international app sometimes displays text for which the device has no glyph (it displays as boxes). If we could test a device for the ranges of Unicode characters that each font supports, we would be able to drop from that device the languages that would not display. I am hoping this information is available in the font file and could be provided along with the name.
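To make the request concrete, here is a rough Python sketch of what an app could do with per-font coverage data if the plugin exposed it. Everything here is an assumption for illustration: to_ranges and displayable_languages are hypothetical names, the set of supported code points is assumed to come from the font's cmap, and language_samples is a hypothetical table of sample text per language supplied by the app.

```python
def to_ranges(codepoints):
    """Collapse code points into sorted, inclusive (start, end) ranges."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current range by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges

def displayable_languages(font_codepoints, language_samples):
    """Return the languages whose sample text is fully covered by one font.

    language_samples maps a language name to sample text (hypothetical data).
    Whitespace is ignored, since it needs no visible glyph.
    """
    supported = set(font_codepoints)
    return [
        name
        for name, sample in language_samples.items()
        if all(ord(ch) in supported for ch in sample if not ch.isspace())
    ]
```

For example, displayable_languages(range(0x20, 0x7F), {"English": "Hello", "Greek": "Γειά"}) keeps only "English". Note that this checks one font at a time; a device with several fonts may still display a language through another font.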