[RT 57248] Cyrillic letters #7
The current situation is:

1: TTF does not appear to be missing any characters, including the three listed.

2a: The characters overlap because PDF::Builder::Resource::Font::CoreFont::[fontname].pm falls back to its "missingwidth" value of 250, which is as little as a quarter of what is needed. Only the standard Latin-1 glyphs and their widths are listed; everything else is "missing". Possibly this could be fixed by extending the [fontname].pm glyph and width tables, but that would be quite a bit of work. Alternatively, instead of using fixed .pm files, PDF::Builder could read the local copy of the core font files. Reading the local core font files for metrics and embedding the fonts (see #80) would ensure that all glyphs are always properly rendered.

2b: Core fonts do not support UTF-8, only single-byte encodings at this time. UTF-8 support for core and Type1 fonts would certainly be desirable, but I don't know if it's feasible to add it (see #81). To access core font glyphs outside of Latin-1, consider using automap() to break up the font into multiple planes, each holding up to 256 characters. However, this still does not appear to give correct character widths (020_corefonts uses automap()).
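To make the "missingwidth" effect in 2a concrete, here is a minimal sketch of the arithmetic. It is a simplified model written in Python, not PDF::Builder code. The Latin entries use Helvetica's standard widths in 1/1000 em; the "true" Cyrillic widths are hypothetical values chosen only for illustration, and the helper names `advance` and `shortfall` are made up for this example.

```python
# Simplified model of a core-font width table: glyphs without an explicit
# entry fall back to one shared "missing width".
MISSING_WIDTH = 250  # fallback value cited in the comment (1/1000 em units)

# Latin entries follow Helvetica's standard metrics.
LATIN1_WIDTHS = {"A": 667, "a": 556, " ": 278}

# HYPOTHETICAL widths for the three reported glyphs, for illustration only.
HYPOTHETICAL_TRUE_WIDTHS = {"Э": 718, "Я": 722, "э": 510}


def advance(text, table, missing=MISSING_WIDTH):
    """Total horizontal advance of text, in 1/1000 em units."""
    return sum(table.get(ch, missing) for ch in text)


def shortfall(text):
    """How much too little the Latin-1-only table advances for text."""
    full_table = {**LATIN1_WIDTHS, **HYPOTHETICAL_TRUE_WIDTHS}
    return advance(text, full_table) - advance(text, LATIN1_WIDTHS)


if __name__ == "__main__":
    print(advance("Aa", LATIN1_WIDTHS))    # Latin-1 glyphs use real widths
    print(advance("ЭЯэ", LATIN1_WIDTHS))   # every glyph falls back to 250
    print(shortfall("ЭЯэ"))                # units of overprinting
```

Under these assumed widths, each Cyrillic glyph advances the text position by only about a third to a half of what it should, so the next glyph is drawn on top of it.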
Update to RT 57248:
I think the best resolution of this is to switch to TTF (ttfont) rather than using core fonts.
The missing widths problem is more general than just Cyrillic. It appears that even if the encoding is supported by Perl, only those characters appearing in Latin-1 (ISO-8859-1) get proper widths; everything else gets "missingwidth".

If the original source font is TTF, possibly we could cheat and read widths from the font file. However, since the font is not embedded, the font file on the Reader's machine will always be used, so there is no guarantee that the widths will match. The same goes for extending the glyph width list to handle all supported Perl single-byte encodings.

It looks like the only real solution is to use TTF fonts instead of Core Fonts if you want to use non-Latin-1 characters. Possibly allow only Latin-1/ISO-8859-1 encoding for Core Fonts? How many applications would this break? How about giving a warning for all other encodings, with a switch to shut off this warning?
Corefonts already give the correct glyph names in the PDF file, so all that appears to be needed is to add all the missing glyph widths to the [typeface].pm file. Added. This appears to fix the problems described in the ticket, so I'm closing it.
Thu May 06 02:28:32 2010 kuzvesov [...] list.ru Ticket created
Subject: Cyrillic letters
afii10047 (uppercase 'Э')
afii10049 (uppercase 'Я')
afii10095 (lowercase 'э')
are not displayed when using TrueType fonts. I tried different encodings (CP1251, UTF8) with the same result.
Perl version v5.10.1 built for MSWin32-x86-multi-thread
Binary build 1007 [291969] provided by ActiveState
Operating system Windows Vista Home Premium, Service Pack 1 (ver. 6.0.6001)
Subject: test-utf8.pdf
Subject: [rt.cpan.org #57248]
Date: Mon, 15 Feb 2016 16:40:51 -0500
To: bug-PDF-API2 [...] rt.cpan.org
From: Phil M Perry
I modified the example text file to display x40 through xFF for both TrueType and Core fonts. I ran it for CP1251 (Cyrillic), CP1252 (Latin 1), CP1253 (Greek), and CP1254 (Turkish). This is Windows XP SP3, PDF::API2 2.025, Adobe Reader 11.0.08. All four character sets have some variety of MS "Smart Quotes" in the x80 - x9F range. I have not yet tried UTF-8 encoded text.
In all cases, the TTF displays perfectly, even the unassigned characters in the Smart Quotes range. The three Cyrillic characters reported missing in the original bug report are present and in the right place. All the CoreFont displays have problems with the unassigned characters in the Smart Quotes range: they still display the empty box, but evidently with a near-zero width, so that the following character mostly overprints it.
Core Font only problems:
CP1251: All Cyrillic and possibly some other characters print correctly, but apparently advance only about 33% of their width and are overprinted by the following characters.
CP1252: The unassigned characters in the Smart Quotes range get overprinted, but the rest of the Latin-1 characters look OK.
CP1253: The Greek letters behave just like the Cyrillic letters in 1251.
CP1254: The Turkish letters behave just like the Latin-1 letters in 1252.
The bottom line is that TTF looks OK from here (at least for CP125x encoding), but Core Fonts have trouble with unassigned ("box") characters and non-Latin characters, where the characters look OK, but the text location is not advanced far enough and we get overprinting. Perhaps the font data (especially character width) isn't being read correctly? Since it works for (e.g.) CP1252, it seems odd that it would fail for non-Latin sets (note that Turkish is Latin). That would imply that the font files themselves are defective or non-standard in some way.
test-cp1251.pdf
test-utf8.pdf
57248.zip