Fixed issues introduced with Apache FOP: #352

armin-weber · 2020-04-14T10:43:57Z

When updating to the latest OpenPDF version, we observed that adopting Apache FOP led to some unexpected changes in behavior. With the code changes in this pull request I've tried to restore the previous behavior, or at least to mitigate the effects. This pull request addresses the following two issues:

* With Apache FOP, if a glyph was not found in a font, the character '#' was inserted in its place. This differs from the previous behavior, where nothing was changed, and it is especially annoying when a string contains '\n' or similar characters. Therefore, I've restored the previous behavior.
* Apache FOP supports ligatures. Because ligatures are not used in the same way in every language, Apache FOP asks for the language before substituting anything. The current default in OpenPDF is the language "dflt", which just means: use ligatures wherever possible. I've not restored the previous default in this case (because using ligatures is fine), but made the language configurable per document as a kind of work-around. I know that more would be desirable here (e.g. specifying the language of each phrase), but I can't spend more time on this point at the moment (especially because I'm currently investigating more cases of changed behavior).
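
To illustrate why the language matters for ligature substitution, here is a minimal, self-contained sketch. It is not the OpenPDF or Apache FOP code: the class `LigatureLanguageSketch`, its methods, and the tiny ligature tables are hypothetical, and the substitution works on Unicode characters rather than on glyph ids as a real shaping engine would. The tags "dflt" and "TRK" follow OpenType naming, and the assumption that Turkish avoids the "fi" ligatures is only an example of a language-specific rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of OpenPDF or Apache FOP.
final class LigatureLanguageSketch {

    // Returns an assumed ligature table for the given language tag.
    // Longer sequences come first so that "ffi" wins over "ff" and "fi".
    static Map<String, String> ligaturesFor(String language) {
        boolean turkish = "TRK".equals(language);
        Map<String, String> table = new LinkedHashMap<>();
        if (!turkish) {
            table.put("ffi", "\uFB03"); // assumed to be skipped for Turkish
        }
        table.put("ff", "\uFB00");
        table.put("fl", "\uFB02");
        if (!turkish) {
            table.put("fi", "\uFB01"); // assumed to be skipped for Turkish
        }
        return table;
    }

    // Applies every ligature allowed for the language, in table order.
    static String substitute(String text, String language) {
        String result = text;
        for (Map.Entry<String, String> entry : ligaturesFor(language).entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

With "dflt" every ligature in the table is applied, while a language-specific tag narrows the set, which is roughly the effect a per-document language setting is meant to give.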

I hope you are ok with these changes.

* Skip unknown characters again, instead of printing a '#'.
* Introduce a document language in order to make it configurable which ligatures should be used by Apache FOP.
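
The difference between skipping unknown characters and printing a '#' can be sketched as follows. This is a simplified, hypothetical model rather than the actual change: the class `MissingGlyphSketch`, the enum `MissingGlyphPolicy`, and the idea of representing a font as a set of supported code points are assumptions made purely for illustration.

```java
import java.util.Set;

// Hypothetical illustration only; not part of OpenPDF or Apache FOP.
final class MissingGlyphSketch {

    // Assumed policy names for the two behaviors described above.
    enum MissingGlyphPolicy { SKIP, SUBSTITUTE_HASH }

    // Returns the characters that would end up in the output for a font
    // that only has glyphs for the given code points.
    static String renderable(String text, Set<Integer> supportedCodePoints,
                             MissingGlyphPolicy policy) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (supportedCodePoints.contains(cp)) {
                out.appendCodePoint(cp);
            } else if (policy == MissingGlyphPolicy.SUBSTITUTE_HASH) {
                out.append('#'); // placeholder for the missing glyph
            }
            // SKIP: emit nothing for a code point without a glyph
        });
        return out.toString();
    }
}
```

For a font that only covers 'a' and 'b', the string "a\nb" comes out as "ab" under `SKIP` and as "a#b" under `SUBSTITUTE_HASH`, which is why line breaks and similar control characters made the '#' substitution so visible.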

sonarcloud · 2020-04-14T10:44:24Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells

No Coverage information
0.0% Duplication

codecracker2014 · 2020-04-14T12:58:58Z

Is there a language auto-detect feature in FOP? The other changes related to the default character seem fine.

armin-weber · 2020-04-14T13:27:59Z

Hi @codecracker2014, I'm not an expert on FOP, but I couldn't find any auto-detection. It would be very hard to detect the language at this point anyway, because the text available here often consists of only a few characters or even just numbers. But looking at the text of the whole document may be wrong, too, if multiple languages are used.

So, I can't see an easy solution. I propose the above changes because they make nothing worse than it is, while still offering a work-around. But, to be honest, I could also live with an option to disable FOP usage.

Fixed issues introduced with Apache FOP:

5c39f1d

* Skip unknown characters again, instead of printing a '#'.
* Introduce a document language in order to make it configurable which ligatures should be used by Apache FOP.

andreasrosdal merged commit 68d59af into LibrePDF:master Apr 14, 2020

armin-weber deleted the fix-issues-introduced-with-apache-fop branch April 22, 2020 07:52

armin-weber mentioned this pull request Apr 22, 2020

Re-use the already set document language in new PdfWriter instances. #357

Merged
