Text handling oddities #193
Comments
I learned something else today about fonts. While it's true that Linux and similar systems place their fonts in all sorts of locations, Windows isn't as pure as I thought it was. When you add a new font, say, by dragging and dropping a .ttf file into \Windows\Fonts, there's no guarantee that it will end up there! Its name will also often be changed. This matters because you need the exact font path and file name in order to use a TrueType font.

To find out where your TTF or OTF file ended up, if you don't see an obvious entry in \Windows\Fonts, look in \Users\XXXX\AppData\Local\Microsoft\Windows\Fonts, where XXXX is the user you were signed on as when you installed the font. Even then, you may not be done, as the name may have been changed to something unrecognizable. You may need to look at Windows' mapping of font name to file name.

In the command shell (command line), or whatever equivalent you like to use, enter "regedit" to bring up the registry editor. For the top level, choose (click on) either HKEY_LOCAL_MACHINE (for global font settings, in \Windows\Fonts) or HKEY_CURRENT_USER (for fonts installed by whoever is currently signed on, in \Users\XXXX\AppData...). From there, both have the same path: SOFTWARE > Microsoft > Windows NT > CurrentVersion > Fonts. This should bring up a listing of all the installed fonts, giving the full name (e.g. "Papyrus Regular") and the actual file name ("PAPYRUS.TTF"). For instance, I just installed a blackletter "Gothic" font, English Towne Medium. It ended up in the \Users\Phil... directory as EnglishTowne.ttf.

You don't need to change anything in the registry, just look. You can change things there, including hiding/showing the font, if you care to get into that. Anyway, this should give you the information you need to get the proper path and file name for TTF fonts you install (and even those that come with Windows). Other font types don't seem to jump through these hoops.
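For anyone who would rather script this lookup than click through regedit, here is a minimal sketch. It is in Python only because its standard library ships a registry module (`winreg`); it is not PDF::Builder code. It reads the same Fonts key from both hives. The helper names `resolve_font_file` and `list_installed_fonts` are made up for illustration. It also assumes that a bare file name in the registry is relative to the global Fonts directory while a per-user entry carries a full path, which is not a documented guarantee.

```python
import ntpath
import os

# Same registry path under both HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER.
FONTS_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts"


def resolve_font_file(value, fonts_dir):
    """Turn a registry value into a full path (hypothetical helper).

    Assumption: a value that is already an absolute path is used as-is,
    and a bare file name is taken to live in fonts_dir.
    """
    if ntpath.isabs(value):
        return value
    return ntpath.join(fonts_dir, value)


def list_installed_fonts():
    """Map each registered font name to its file path. Windows only."""
    import winreg  # imported here so the module still loads elsewhere

    fonts_dir = ntpath.join(os.environ.get("WINDIR", r"C:\Windows"), "Fonts")
    fonts = {}
    for hive in (winreg.HKEY_LOCAL_MACHINE, winreg.HKEY_CURRENT_USER):
        try:
            key = winreg.OpenKey(hive, FONTS_KEY)
        except OSError:
            continue  # this hive has no Fonts key
        with key:
            value_count = winreg.QueryInfoKey(key)[1]
            for index in range(value_count):
                name, value, _value_type = winreg.EnumValue(key, index)
                fonts[name] = resolve_font_file(str(value), fonts_dir)
    return fonts
```

On a Windows machine, calling `list_installed_fonts()` and searching the keys for part of the font's display name should turn up the file to hand to `ttfont()`. The code only reads the registry, in keeping with "just look".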
At some point, this should probably go into the Credit: much of this information came from https://superuser.com/questions/1658678/detect-path-of-font-on-windows
As discussed on the Adobe Support Community site, this is a matter of the encoding the PDF creator uses for the font in question:
Thus, if you want to use the Tw instruction to manipulate the spacing between words, you have to use an encoding for your font which uses the single-byte 32 character code for the space glyph.
Regarding the Tw/wordspace issue, follow along on here. The bottom line (so far) is that there is no way when glyph IDs are used for TrueType fonts that it will ever support I will have to think about adding a hack to split up a
I don't know why you keep insisting (here and on the Adobe forum) that I am using a multibyte character encoding for the text. I'm not. The original "space" is a single byte x20. For TTF support in PDF::Builder, the Reader is presented with a list of glyph IDs, which will vary by the particular font being used. A 'space' (x20) may end up as 0003 in one font file and 00b7 in another. If the Reader is searching for an actual byte of x20, it ain't gonna find it. This is a limitation of the Reader implementation, in that it doesn't go looking for inkless glyphs (spaces) when presented with a glyph ID list rather than a text string (where a space is x20). My complaint is that I don't see this limitation documented, except in a very roundabout way.
You misunderstand what the PDF specification means when it talks about multibyte character codes. It is not talking about the character encoding you use in your application before you transform text strings into content streams. It doesn't care what encoding you use in your application code. What it talks about is what you eventually store in the strings (literal or hexadecimal) in the content streams. And as you use Identity-H as the font encoding, you store double-byte codes there. With this misunderstanding cleared up, the excerpt from the specification I quoted above requires a PDF viewer to operate like Adobe Acrobat does in this regard, and it does so in a clear way.
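To make the distinction concrete, here is a small sketch (Python, independent of PDF::Builder) of what ends up in the content stream in each case. The glyph IDs in `GLYPH_IDS` are invented for illustration, since real ones depend on the font file, and the function names are hypothetical. `count_tw_targets` encodes the rule as described above for these two cases only: with single-byte codes, a code of 32 is a word space; with Identity-H every code is two bytes, so nothing qualifies.

```python
# Hypothetical glyph IDs; a real font assigns its own.
GLYPH_IDS = {"a": 0x0044, " ": 0x0003, "b": 0x0045}


def simple_font_codes(text):
    """Single-byte encoding: one code per character (Latin-1 here)."""
    return list(text.encode("latin-1"))


def identity_h_codes(text, glyph_ids):
    """Identity-H: one two-byte code (the glyph ID) per character."""
    return [glyph_ids[ch] for ch in text]


def hex_string(codes, code_width):
    """Write codes as a PDF hex string, code_width bytes per code."""
    digits = code_width * 2
    return "<" + "".join(f"{code:0{digits}X}" for code in codes) + ">"


def count_tw_targets(codes, code_width):
    """Count the codes Tw would widen: single-byte codes equal to 32."""
    if code_width != 1:
        return 0  # two-byte codes never count, even when a byte is 0x20
    return sum(1 for code in codes if code == 32)


simple = simple_font_codes("a b")
cid = identity_h_codes("a b", GLYPH_IDS)
print(hex_string(simple, 1), count_tw_targets(simple, 1))
print(hex_string(cid, 2), count_tw_targets(cid, 2))
```

The application-side text is the same "a b" in both cases. Only the stored codes differ, and only the single-byte form contains a code 32 for Tw to act on.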
I have updated PDF::Builder to honor the Tw setting when using a TrueType font. This will hit CPAN with the 3.026 release. It splits out x20 ASCII spaces and gives them their own kerning, to adjust their width. Note that |
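As a rough picture of what "splitting out spaces and giving them their own kerning" can look like in the content stream, here is a sketch in Python. It is not the PDF::Builder 3.026 implementation; the function name and the glyph IDs are made up. It relies on the TJ operator's convention that a number in the array is in thousandths of a text space unit and is subtracted from the current position, so adding `wordspace` units at a given font size takes an adjustment of `-wordspace * 1000 / font_size`. Horizontal scaling is ignored.

```python
def tj_with_wordspace(text, glyph_ids, wordspace, font_size):
    """Build a TJ array that widens each space by `wordspace` units.

    glyph_ids maps characters to two-byte glyph IDs (hypothetical values).
    """
    # TJ numbers are subtracted, so extra space is a negative number.
    adjust = -wordspace * 1000.0 / font_size
    parts = []
    run = []
    for ch in text:
        run.append(f"{glyph_ids[ch]:04X}")
        if ch == " ":
            # Close the run right after the space glyph, then nudge.
            parts.append("<" + "".join(run) + ">")
            parts.append(f"{adjust:g}")
            run = []
    if run:
        parts.append("<" + "".join(run) + ">")
    return "[" + " ".join(parts) + "] TJ"


# Hypothetical glyph IDs for a three-character example.
print(tj_with_wordspace("a b", {"a": 0x44, " ": 0x03, "b": 0x45}, 3, 12))
```

Because the adjustment is keyed on the x20 character in the source text rather than on any byte in the stream, it works no matter which glyph ID the font uses for its space.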
I just ran across something odd with TrueType fonts ($pdf->ttfont(...)). It appears that word spacing ($text->wordspace(n)) is ignored for TrueType fonts. The PDF::Builder call itself seems to work OK, leaving a Tw command in the stream. However, Adobe Acrobat Reader seems to ignore the Tw command -- I need to find some other readers to test on. The character spacing command Tc ($text->charspace(n)) appears to work properly for TrueType fonts.

I tested with corefonts and psfonts (Type 1 fonts) and they both work properly with both word and character spacing. I wonder if the problem is that Tw is implemented to look for an ASCII space (x20) only, to adjust its size, and misses the boat on the glyph ID hex codes used with ttfonts? Certainly, the hex code for a space glyph can vary widely!
I need to find out if this is something peculiar to Adobe, or if it's widespread. Either way, the wordspace() method's limitation will have to be documented. I haven't checked yet to see if the order of commands matters.
Add: A workaround for this, assuming it isn't a bug in PDF::Builder itself, would be to output words individually, using some multiplier on the actual space width:
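The idea, sketched in Python rather than as PDF::Builder calls (this is a stand-in for illustration, not a snippet from the library): work out an x position for each word, advancing by the word's width plus the space width times a multiplier, then position and output each word separately. `place_words` and the `advance_width` callback are hypothetical names. In real use the callback would be whatever the library offers for measuring a string in the current font and size.

```python
def place_words(line, advance_width, x, space_multiplier=1.0):
    """Return (x, word) pairs, stretching each gap by space_multiplier.

    advance_width is a caller-supplied function giving a string's width
    in the current font and size (hypothetical callback).
    """
    gap = advance_width(" ") * space_multiplier
    placed = []
    for word in line.split():
        placed.append((x, word))
        x += advance_width(word) + gap
    return placed


# Toy metric for the example: every character is 6 units wide.
def toy_width(text):
    return 6.0 * len(text)


for x_pos, word in place_words("ab cd", toy_width, 72.0, space_multiplier=2.0):
    print(x_pos, word)
```

Each pair would then become one "move to x, show word" step, so the space glyph itself is never emitted and the reader's handling of Tw no longer matters.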
There might be more elegant ways, if I think about it for a bit. And of course, it could be a loop to split up a single run of words and spaces, or even a built-in method to do this. Something like this may need to be added to all the text output methods, including column(). I'd appreciate hearing from others if they've also seen this problem, and suggestions on what to do about it. Is there a mechanism for reporting this to Adobe? The Reader might not know which glyph corresponds to a space, but it could potentially see a character with no ink (not just x20) and apply a multiplier to it if Tw is in use.