PDF Conversion Problems #9

Closed
tajmone opened this issue Aug 30, 2018 · 8 comments
Labels
📖 Alan Manual Issues relating to "The Alan Language Manual" 👑 PDF Format Issues with conversion to PDF format 🔨 Asciidoctor PDF Tool: Asciidoctor PDF backend ♻️ help wanted Collaborative help is needed 💀 format porting issues Cross-format problems (ADoc, HTML, PDF, etc.) ⭐ footnotes Topic: Footnotes and/or endnotes

Comments

Collaborator

tajmone commented Aug 30, 2018

UPDATED: Asciidoctor-fopub is now being used in the project!

We need to work out a reliable way to convert the documentation to PDF.

So far, I've experimented with the following methods, but stumbled into problems that make them unsuitable for the task at hand:

If anyone has advice on the best way to convert the Alan documents to PDF while preserving the styles of the original ODT/PDF documents, any suggestions and help are much appreciated!

@tajmone tajmone added ♻️ help wanted Collaborative help is needed 💀 format porting issues Cross-format problems (ADoc, HTML, PDF, etc.) 📖 Alan Manual Issues relating to "The Alan Language Manual" labels Aug 30, 2018
Collaborator Author

tajmone commented Sep 1, 2018

Asciidoctor PDF Problems

The footnotes in Table 1 are not rendered correctly using Asciidoctor PDF:

Table 1 Footnotes in PDF Manual

This is a known limitation of Asciidoctor PDF, currently being discussed in Issues #73, #85 and #927.

Also, entries in the Index are sorted asciibetically (i.e., by character code), so the same word with different casing appears in separate places in the list: within each Index letter section, all uppercase entries are listed first. (See #928 for more details.)
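The difference between the two orderings can be reproduced with the POSIX `sort` utility. This is only an illustration with made-up Index terms, not how Asciidoctor PDF sorts its Index internally: in the C locale `sort` compares raw byte values, so every uppercase letter (0x41 to 0x5A) sorts before every lowercase one (0x61 to 0x7A), while `-f` folds case before comparing.

```shell
#!/bin/sh
# Illustration only: two ways of ordering the same (hypothetical) Index terms.

# Byte-wise ("asciibetical") order: uppercase letters sort before lowercase ones.
asciibetical_sort() {
  LC_ALL=C sort
}

# Case-insensitive order: -f folds lowercase to uppercase before comparing,
# so "Actor" and "actor" end up next to each other.
case_insensitive_sort() {
  LC_ALL=C sort -f
}

printf 'actor\nAttributes\nActor\n' | asciibetical_sort
# Actor, Attributes, actor: the two spellings of "actor" are split apart

printf 'actor\nAttributes\nActor\n' | case_insensitive_sort
# "Actor" and "actor" first (adjacent), then Attributes
```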

Collaborator Author

tajmone commented Sep 1, 2018

Pandoc Problems

Here is a list of the problems I've stumbled upon while experimenting with pandoc conversions. It looks like pandoc might not be the right tool for the task at hand, but I might be missing some advanced options I'm not aware of that could fix these problems.

It seems to me that the root of the problem is that pandoc always processes the input document to build an AST from it, and that some unsupported elements are always lost during that process; i.e., regardless of the input/output format, the issue lies with pandoc's AST.

If anyone has some ideas to contribute, please do!

Asciidoctor to DocBook » pandoc to ODT

I had thought of using pandoc to create an ODT version of the documents, with DocBook as an intermediate format and custom templates to preserve/reconstruct the styles of the original Manual.

This approach is simple and it allows using an ODT reference document as a template for styling:

--reference-doc=FILE
    Use the specified file as a style reference in producing a docx or ODT file.

For best results, the reference ODT should be a modified version of an ODT produced using pandoc. The contents of the reference ODT are ignored, but its stylesheets are used in the new ODT. [...]

To produce a custom "reference.odt", first get a copy of the default "reference.odt":

pandoc --print-default-data-file reference.odt > custom-reference.odt 

Then open "custom-reference.odt" in LibreOffice, modify the styles as you wish, and save the file.

The problem here is that when pandoc parses the DocBook file to build the document AST, some elements of the original ADoc are lost (e.g., custom roles), which flattens out many custom styles in the final ODT document. Therefore, using a custom reference ODT won't solve the problem: paragraphs and blocks that were assigned different roles in the original ADoc (roles which are preserved in the DocBook file) all end up with the same style in the converted ODT.
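For reference, the two-step pipeline described above could be scripted roughly as follows. This is a sketch, not the script used in the project: the function name and file names are hypothetical placeholders, and it assumes `asciidoctor` and pandoc 2.x (which uses `--reference-doc`) are on the `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: ADoc -> DocBook (Asciidoctor) -> ODT (pandoc).
# Usage: adoc_to_odt manual.adoc custom-reference.odt
adoc_to_odt() {
  src="$1"                 # AsciiDoc source, e.g. manual.adoc
  ref="$2"                 # reference ODT whose styles pandoc should reuse
  base="${src%.adoc}"
  # Step 1: the DocBook 5 backend keeps custom ADoc roles as role="..." attributes.
  asciidoctor -b docbook5 -o "$base.xml" "$src" || return 1
  # Step 2: pandoc parses the DocBook into its AST (where those roles get lost)
  # and writes an ODT using the styles found in the reference document.
  pandoc -f docbook -t odt --reference-doc="$ref" -o "$base.odt" "$base.xml"
}
```

Inspecting the intermediate `.xml` file shows the `role` attributes still in place; they only disappear once pandoc has built its AST, which is why the reference ODT cannot restyle those blocks.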

Pandoc to PDF via LaTeX

Pandoc could also be used to convert directly from DocBook to PDF, but it would rely on third-party tools (TeX-based by default) to do so:

By default, pandoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed (see --pdf-engine below).

Alternatively, pandoc can use ConTeXt, pdfroff, or any of the following HTML/CSS-to-PDF-engines, to create a PDF: wkhtmltopdf, weasyprint or prince.

Therefore, I don't see any advantage in using pandoc over converting to a TeX-based format directly with Asciidoctor. Chances are that we'd have less control over styling with pandoc, or even lose some elements.
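For completeness, choosing the PDF engine in pandoc would look roughly like this. It is a sketch with a hypothetical function name and placeholder file names; each engine has to be installed separately, and `pdflatex` is pandoc's default when no engine is given.

```shell
#!/bin/sh
# Hypothetical helper: DocBook -> PDF through pandoc with a chosen PDF engine.
# Usage: docbook_to_pdf_pandoc manual.xml xelatex
#        docbook_to_pdf_pandoc manual.xml weasyprint
docbook_to_pdf_pandoc() {
  src="$1"
  engine="${2:-pdflatex}"          # pandoc's default LaTeX engine
  out="${src%.xml}.pdf"
  case "$engine" in
    wkhtmltopdf|weasyprint|prince)
      # HTML/CSS-based engines: render through HTML first.
      pandoc -f docbook -t html5 --pdf-engine="$engine" -o "$out" "$src" ;;
    *)
      # LaTeX-based engines (pdflatex, xelatex, lualatex, ...).
      pandoc -f docbook --pdf-engine="$engine" -o "$out" "$src" ;;
  esac
}
```

Both routes still go through pandoc's AST, so the loss of custom roles described earlier applies here as well.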

Collaborator Author

tajmone commented Sep 4, 2018

Asciidoctor-fopub

Today I managed to set up and experiment with asciidoctor-fopub, and I'm very happy with the results!

Setup

Setting it up turned out to be more problematic than expected, because the package hasn't been updated in the last 3 years: there were some issues with Java JDK 10 (I had to install and use JDK 8), and a few tricks were required for some dependencies. But having gone through them (and having written them down), I should be able to provide some notes on how to set up this package without going totally berserk and throwing the PC out of the window.

Apart from these small problems, setting up asciidoctor-fopub is quite straightforward: it only requires cloning the repo (or downloading it as a Zip archive), and the tool downloads all the required dependencies by itself.

Custom Styling

This tool provides a true DocBook to PDF toolchain, with all the XSL stylesheet presets ready to use. I took my time to read some DocBook XSL documentation and experiment with the style templates, and it turns out that customizing the document styles isn't too hard and that we have full control over the styles in the final PDF. That's good, because the standard AsciiDoc templates are nothing special, and for PDF there are only three of them (unlike HTML, which has more).

So, I'm very optimistic. There are a few issues to work on (special characters not rendering properly, most likely due to font settings), but I've already managed to customize the look and feel of Alan and BNF code blocks, adding different background colors to them, and other tweaks.

Syntax Highlighting (via XSLHL)

This package also provides templates for the XSL syntax highlighter, which seems quite straightforward to use: I only need to find a syntax definition for BNF and to create a custom syntax for Alan, so that it can highlight the code correctly. The syntax definitions of this highlighter are similar in some respects to Notepad++ or Vim syntaxes (i.e., simple RegEx definitions of the various syntax elements), so they are not too complex to set up.

It looks like we'll be using Highlight for syntax highlighting the HTML version of the Manual, and another Alan XSLHL syntax for the PDF version.

Inclusion in Alan-Docs

Now I just need some time to put my test files in order, clean them up, and then add them to the project... then we shall see the first PDF draft in the repository.

Let's cross fingers that we don't come across some strange and unexpected problems that prevent us from creating the PDF Manual — so far, footnotes and the Index look fine, so I don't expect many surprises really. Asciidoctor was built with DocBook in mind, so this is a solid backend that we're about to use.

Nevertheless, it might require some serious reading of the DocBook XSL specs and references if I want to understand how to tweak all the various options (there are really lots of them!). Some options are conditional, so the same stylesheet can produce variations (e.g., color or black and white, A4 page size or smaller, etc.), which can be controlled via command line options (aka "XSL params").
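asciidoctor-fopub wraps the standard DocBook XSL FO stylesheets and Apache FOP, so the effect of XSL params can be seen by running those tools directly. The sketch below illustrates the idea and is not how fopub invokes them internally: `DOCBOOK_XSL` is an assumed path to a local copy of the DocBook XSL stylesheets, the function name is hypothetical, and `paper.type` and `shade.verbatim` are two of the stock DocBook XSL parameters.

```shell
#!/bin/sh
# Hypothetical helper: DocBook -> XSL-FO (xsltproc) -> PDF (Apache FOP).
# DOCBOOK_XSL must point at a local DocBook XSL distribution (assumption).
# Usage: DOCBOOK_XSL=/path/to/docbook-xsl docbook_to_pdf_fo manual.xml A4
docbook_to_pdf_fo() {
  src="$1"
  paper="${2:-A4}"                 # e.g. A4 or USletter
  fo="${src%.xml}.fo"
  # Same stylesheet, different output: each --stringparam/--param sets one XSL param.
  xsltproc --stringparam paper.type "$paper" \
           --param shade.verbatim 1 \
           -o "$fo" "$DOCBOOK_XSL/fo/docbook.xsl" "$src" || return 1
  # FOP renders the intermediate XSL-FO file to PDF.
  fop -fo "$fo" -pdf "${src%.xml}.pdf"
}
```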

Collaborator Author

tajmone commented Sep 4, 2018

PDF Toolchain Now in Project!

@thoni56,

I've finally managed to produce a first decent PDF draft of the Alan Manual, now included in the project:

I've also added some customized DocBook XSL stylesheets; even though there is still lots of work to do on them, I've already managed to add different background colors to Alan examples and BNF rules, and to give them custom border thickness and radius.

When I find time, I'll add some notes on installing asciidoctor-fopub.

Contributor

thoni56 commented Sep 4, 2018

Not too shabby!

I've just skimmed through the first third of it, but I think that simple styling looks great! Styling "game play" is next?

I also found that in Acrobat Reader DC the image for the class-hierarchy was blank and Acrobat complained about some error on the same page (and that I should contact the author ;-). But PDF Annotator had no problem showing the image. Nor did Firefox, Chrome and Edge (except that Edge adds this yellowy, Microsoftish, background to both images, probably based on transparency or image type or something).

But, again, great work!

Collaborator Author

tajmone commented Sep 4, 2018

I think the images problem is due to the online optimizer I used for the PNG files: while it reduces size drastically, sometimes not all image viewers show the results correctly.

In my version of Adobe Reader they both show correctly, but in any case I'm going to replace them with SVG images. For the time being, I just captured a screenshot from the original PDF file, but I'd like to recreate all images.

I've recreated the predefined classes diagram using plain-text mermaid syntax inside an ADoc file, but I thought that including it in the actual manual sources would only add more dependencies (the Asciidoctor Diagram extension, plus mermaid). So, I could just add the plain mermaid source to the assets sources folder, and include the converted image in the docs, so users don't have to install mermaid and other diagram tools (mermaid CLI alone takes up more than 120 MB because it installs Chromium!).
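Keeping only the diagram source under the assets folder and committing the rendered image could look like this with the mermaid CLI (`mmdc`). The function name and paths are hypothetical placeholders; only whoever regenerates the image needs mermaid installed.

```shell
#!/bin/sh
# Hypothetical helper: render a mermaid source (kept under assets) to an image
# that is committed to the repo, so readers building the docs don't need mermaid.
# Usage: render_diagram assets/classes.mmd images/classes.svg
render_diagram() {
  src="$1"
  out="$2"
  mmdc -i "$src" -o "$out"     # mmdc renders the diagram via headless Chromium
}
```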

But optimizing the PNGs probably wasn't a good idea, except for browser usage (if size dropped by more than 30% without quality loss, it must have removed lots of metadata and/or used non-standard PNG features).

Styling "game play" is next?

Definitely, but I need to read the DocBook specs to understand how AsciiDoc roles and elements are mapped in DocBook format, and which attributes I should target. The documentation is all there, it's just huge. I'm sure I'll end up falling in love with DocBook; it's pretty powerful stuff.

tajmone added a commit that referenced this issue Sep 5, 2018
Carry on cleaning Ch 3, up to 3.7.
Replace PNG images with non optimized PNGs (@thoni56 mentioned that some PDF
readers can't render the images; See #9)
Collaborator Author

tajmone commented Sep 5, 2018

Changed PNG Images: Do they Show in PDF Now?

@thoni56,

I've replaced all the PNG images with their non-optimized versions. Could you please tell me if they now show up correctly in the PDF reader that reported an error with the previous images:

I just wanted to understand if the problem was traceable back to the optimization tool I have used, or if it has to do with the PDF conversion backend.

Thanks!

Contributor

thoni56 commented Sep 5, 2018

Check. Yes, Adobe Reader DC could render these images.

@tajmone tajmone closed this as completed Sep 5, 2018
@tajmone tajmone added 🔨 Asciidoctor PDF Tool: Asciidoctor PDF backend 👑 PDF Format Issues with conversion to PDF format ⭐ footnotes Topic: Footnotes and/or endnotes labels Apr 6, 2021

2 participants