docs: migrate documentation toolchain from asciidoc-py + dblatex to asciidoctor #4053

Open
grandixximo wants to merge 29 commits into LinuxCNC:master from grandixximo:docs/asciidoctor-migration

@grandixximo
Contributor

grandixximo commented May 24, 2026

Summary

Replaces the documentation toolchain end-to-end: asciidoc-py + dblatex + xsltproc + source-highlight + inkscape are removed and the build now goes through asciidoctor + asciidoctor-pdf + rouge, with a small ghostscript post-process pass on the PDFs.

Motivation, as I raised on #4051: asciidoc-py is EOL, dblatex is unmaintained, and we keep paying for that with patches like the inkscape rsvg shim (#4043). asciidoctor is actively developed in Debian (ruby-asciidoctor, ruby-asciidoctor-pdf), uses prawn-svg natively so the inkscape detour disappears, and removes the entire LaTeX subsystem from the docs build dependency tree.

Continues the work hansu started on his asciidoctor branch and solves the cross-document anchor problem that stalled it.

What changed

Seven commits, each independently reviewable / bisectable:

  1. docs: add asciidoctor extensions and PDF theme: new plumbing only; no build behaviour change yet.

    • docs/src/extensions/xref_resolver.rb: preprocessor that mirrors asciidoc-py's objects/xref_<lang>.links: bare <<anchor,Title>> rewrites to qualified <<relpath/file.adoc#anchor,Title>>. Anchor index cached on disk by mtime; xref-exclude regex keeps translated trees isolated.
    • docs/src/extensions/image_resolver.rb: treeprocessor matching asciidoc-py's image-wildcard: relative image paths in included files resolve against that file's directory. For the PDF backend it also defaults pdfwidth=75% on images without an explicit width, otherwise prawn renders raster sources at 72 DPI and blows screenshots up to the full text column.
    • docs/src/pdf-theme.yml: asciidoctor-pdf theme that approximates emc2.sty: A4, Times-like body, dblatex blue (#0000FF) headings and links, top header doc-title | chapter | page / total, bottom rule only.
    • docs/src/otf2ttf.py: small build-time helper: subsets Noto Serif CJK to the ~600 CJK characters used anywhere in the docs and converts the curves with cu2qu to TrueType (prawn 2.4 corrupts CFF embeds). ~1.5 s per face, ~300 KB output.
  2. docs: swap HTML, PDF, manpage build rules from asciidoc-py to asciidoctor

    • HTML: 12 per-language target chains collapse into one ASCIIDOCTOR_HTML_RULE canned recipe instantiated per language. Stylesheet (docs/html/linuxcnc.css) and visual output stay identical.
    • PDF: a2x/dblatex => asciidoctor-pdf with our extensions + theme. Version macro fed via -a lversion=$(cat ../VERSION). CJK fallback TTFs generated lazily and pdf-fontsdir set.
    • Ghostscript post-process: prawn emits very verbose content streams (~32 KB/page vs ~20 KB/page from xdvipdfmx), so add a lossless gs -dFlateEncode pass with images passed through. Master PDF: 39 MB => 25 MB, matching the 26 MB official 2.9 dblatex build at identical page count and image data.
    • Manpages: a2x --doctype manpage => asciidoctor --backend manpage. Asciidoctor emits .als / .URL / .MTO macros that po4a's man parser doesn't know; docs/po4a.cfg man_def alias gains untranslated=FF,FU,als unknown_macros=untranslated inline=URL,MTO. The .so alias dependency path was also missing the section directory; fixed.
    • HTML manpages: a2x => asciidoctor --doctype manpage --backend html5.
    • HTML img extraction step swapped from xsltproc/links.xslt to a portable grep -oE.
    • Source highlighting: source-highlight => rouge.
  3. docs: drop the asciidoc-py / dblatex / xsltproc infrastructure: pure deletions, 15 files:

    • asciidoc backends: xhtml11*.conf, docbook*.conf, attribute-colon.conf, asciidoc-dont-replace-arrows.conf
    • dblatex: emc2.sty
    • xsltproc pipeline: html-images.xslt, html-latex-images, image-wildcard, links.xslt, links_db_gen.py
    • PR #4043's scripts/inkscape shim ("docs: shim inkscape -> rsvg-convert in docs build", fixes #4040): asciidoctor-pdf has no inkscape calls to intercept.
  4. debian: switch documentation build-deps to the asciidoctor toolchain

    • DOC_DEPENDS shrinks from twenty packages (dblatex stack, ten texlive-lang-*, source-highlight, inkscape, python3-lxml, xsltproc, dvipng, groff) to ten: asciidoctor, ruby-asciidoctor-pdf, ruby-rouge, fonts-dejavu, fonts-noto-cjk, python3-fonttools, ghostscript, graphviz, librsvg2-bin, w3c-linkchecker.
    • control.top.in drops docbook-xsl, asciidoc, asciidoc-dblatex. ghostscript moves out (now in DOC_DEPENDS).
    • All deps verified to exist on bookworm, trixie, sid, and noble.
  5. ci: parallelize the Debian package build via DEB_BUILD_OPTIONS: drive-by fix unrelated to the migration: debuild runs dh_auto_build single-threaded unless DEB_BUILD_OPTIONS=parallel=N is set. build-package-{arch,indep}.sh now export parallel=$(nproc). Local measurement: doc-only deb build 32 min => 7 min on 8 cores; CI ubuntu-24.04 runners with 4 CPUs should see ~4×.

  6. docs: fix source issues the asciidoctor parser flags: asciidoctor reports these as ERROR or WARNING; asciidoc-py silently tolerated them. All predate the toolchain swap:

    • hal/halmodule.adoc: cols spec said 5 cells, rows had 7. Bump to cols="<3s,6*<".
    • plasma/qtplasmac.adoc: two rows missing a cell (color5 styling row, DEBUG state-table row).
    • gui/qtdragon.adoc: Versaprobe NOTE block was missing its ==== delimiters.
    • gui/qtvcp-widgets.adoc: Markdown-style ``` fenced block. po4a collapsed the original three lines into one during string extraction, so every translated build saw an open ``` with no matching close ("unterminated listing block"). Replace with [source,python] + ----, which po4a preserves line-by-line.
    • lathe/images/control-point_es.svg: Inkscape flowRoot/flowPara ("tan" label added by the Spanish translator). Inkscape-only SVG 1.2 element prawn-svg cannot render. Convert to a regular <text>/<tspan> at the same coordinates.
    • docs/po4a.cfg: add hal/halscope.adoc to the translation pipeline; it is included by translated hal/tutorial.adoc but had no po4a_alias line, so every translated build failed to resolve the include.
  7. docs: resolve inline image macros and fall back to EN tree: the original image_resolver.rb only handled block image:: macros and didn't fall back across the language boundary, so translated builds emitted ~40 "image to embed not found" warnings. Walk each block's source storage (paragraphs' lines=, list items' text=, asciidoc-style table cells' inner documents) to rewrite inline image: targets, and probe the canonical EN path when the translated copy is missing.

Visual / output verification

End-to-end builds (make -j8 pdfdocs && make htmldocs && make manpages):

| Metric | dblatex (official 2.9) | asciidoctor (this branch) |
|---|---|---|
| Master PDF | 26 MB, 1347 pages | 27 MB, 1440 pages |
| 33-PDF total | 243 MB | 238 MB |
| HTML files | ~1551 | 1551 |
| Manpages | 1287 | 1287 |
| make -j8 pdfdocs wall time | 4m56s | 3m43s |
| make -j8 pdfdocs user CPU | 25m48s | 18m12s |
| Full Debian binary-indep build (parallel) | n/a | 7 min |
| build warnings | n/a | 0 |

Same hardware (8 physical cores), both make -j8 pdfdocs, warm rebuild (translated .adoc already generated, .pot up to date). Asciidoctor is ~25% faster wall-clock and ~30% less user CPU, even though it adds a ghostscript pass per PDF; prawn-svg + rouge in-process beats spawning a2x/dblatex/xelatex/inkscape per file.

The project's 4-CPU CI runners see a similar gap: docs jobs run roughly 25-30% faster than recent master (htmldocs 13m vs 19m, package-indep 16-19m vs 22-23m). Most of that comes from the DEB_BUILD_OPTIONS=parallel=$(nproc) fix in commit 5 rather than the toolchain itself, but the leaner dep tree (10 vs 20 packages, no LaTeX) helps.

Spot-checked sample pages render correctly across en, de, ru, uk, zh_CN: blue headings/links match dblatex, code blocks with grey background and DejaVu Sans Mono show Cyrillic correctly, Chinese title pages render via the Noto Serif CJK SC fallback.

Trade-offs and open items

  • Page count drift vs official 2.9 (1440 vs 1347): different font, margin, and line-spacing defaults; content is identical. No regression.
  • Only linuxcnc-doc-{en,de} Debian packages defined: other-language PDFs build but debian/control doesn't ship them. Pre-existing scope, unchanged.
  • Build time per PDF includes a ~1 s gs pass and font subset embedding: overall full build with parallelism is comfortably faster than the dblatex stack ever was on the same hardware.

Test plan

  • make -j8 pdfdocs builds all 33 PDFs cleanly
  • make htmldocs builds 1551 HTML files cleanly
  • make manpages produces 1287 manpages cleanly
  • fakeroot debian/rules binary-indep produces linuxcnc-doc-en and linuxcnc-doc-de debs
  • Spot render of pages 1, 100, 500 of master PDF + translated PDFs for visual regression
  • Cyrillic in code blocks renders (Ukrainian docs)
  • CJK in titles/headings renders (Chinese docs)
  • verify-clean-repo.sh would pass (.fonts/ and rouge-*.css are gitignored)
  • All build deps confirmed present on bookworm, trixie, sid, noble
  • Maintainer review of theme tweaks (header layout, blue heading shade, default image width)
  • Confirm parallel deb build on CI runner CPUs (will be visible from the package-{arch,indep} CI job logs once this lands)

cc @hansu (continuation of your branch), @andypugh (the maintenance discussion on #4051).

Four pieces of glue that let asciidoctor produce documentation that
matches the look and behaviour of the existing asciidoc-py + dblatex
output.  Nothing is wired in yet; the Submakefile swap follows in the
next commit.

xref_resolver.rb -- asciidoctor preprocessor that mirrors what
asciidoc-py used to do via objects/xref_<lang>.links: bare
<<anchor,Title>> references are looked up in a tree-wide anchor index
and rewritten to qualified <<relpath/file.adoc#anchor,Title>> form.
The index is cached on disk keyed by source mtimes, and accepts an
xref-exclude regex so each translated tree stays isolated.
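The rewrite step can be illustrated in a few lines. This is only a sketch of the idea in Python, not the Ruby extension itself: the function name `qualify_xrefs`, the dictionary-shaped anchor index, and the decision to leave same-file and unknown anchors untouched are all assumptions made for illustration.

```python
import posixpath
import re

# Matches a bare cross-reference such as <<anchor,Title>> or <<anchor>>.
# A reference that already names a file (contains '/' or '#') does not match.
BARE_XREF = re.compile(r"<<([A-Za-z_:][\w:.-]*)((?:,[^>]*)?)>>")


def qualify_xrefs(line, current_file, anchor_index):
    """Rewrite bare xrefs to <<relpath/file.adoc#anchor,Title>> form.

    anchor_index is a hypothetical {anchor: defining_file} mapping with
    paths relative to the docs source root.
    """
    here = posixpath.dirname(current_file) or "."

    def rewrite(match):
        anchor, title = match.group(1), match.group(2)
        target = anchor_index.get(anchor)
        if target is None or target == current_file:
            return match.group(0)  # unknown or same-file anchor: leave as-is
        rel = posixpath.relpath(target, here)
        return f"<<{rel}#{anchor}{title}>>"

    return BARE_XREF.sub(rewrite, line)
```

For example, with `{"sec:pins": "hal/basic.adoc"}` as the index, a reference to `<<sec:pins,Pins>>` inside `gui/axis.adoc` becomes `<<../hal/basic.adoc#sec:pins,Pins>>`.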

image_resolver.rb -- treeprocessor that resolves image targets the way
asciidoc-py's image-wildcard pair did: relative paths in an included
file resolve against that included file's directory, not the master.
For PDF only it also defaults pdfwidth=75% on images without an
explicit width, because prawn renders raster sources at native-pixel
dimensions interpreted as 72 DPI and otherwise blows screenshots up
to the full text column.
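A rough Python sketch of the two behaviours described above, resolving against the including file and defaulting the PDF width. The function names and the dictionary used for image attributes are illustrative assumptions; the real extension works on asciidoctor's Ruby node objects.

```python
import posixpath


def resolve_image(target, including_file):
    """Resolve a relative image target against the file that references it."""
    if posixpath.isabs(target) or "://" in target:
        return target  # absolute paths and URLs are left untouched
    base = posixpath.dirname(including_file)
    return posixpath.normpath(posixpath.join(base, target))


def pdf_image_attrs(attrs, backend):
    """Default pdfwidth to 75% for PDF output when no width is given.

    attrs is a hypothetical dict standing in for the image node's attributes.
    """
    out = dict(attrs)
    if backend == "pdf" and "pdfwidth" not in out and "width" not in out:
        out["pdfwidth"] = "75%"
    return out
```

An image written as `images/scope.png` inside `hal/tutorial.adoc` resolves to `hal/images/scope.png` regardless of which master document included the file.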

pdf-theme.yml -- asciidoctor-pdf theme that approximates emc2.sty:
A4 page, Times-like body, blue headings and links matching dblatex,
top header with 'doc-title | chapter | page / total' and a thin rule,
bottom rule only, no alternating page numbers.  Falls back to
Noto Serif CJK SC for non-Latin glyphs missing from the base font;
DejaVu Sans Mono in code blocks so Cyrillic in listing/source blocks
renders.

otf2ttf.py -- Debian only ships Noto Serif CJK as a CFF/OTF TrueType
Collection and prawn 2.4 corrupts the PDF when asked to embed CFF
outlines directly.  This is a tiny build-time helper that subsets the
font to the CJK characters used anywhere in the docs (~600 glyphs out
of 65000) and converts the curves with cu2qu before saving as TTF.
Output is ~300 KB per face, ~1.5 s per face.
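The first half of that job, working out which CJK characters the docs actually use, can be sketched without any font library. The Unicode ranges below are an assumption about what counts as "CJK" for this purpose, and `collect_cjk` is a hypothetical name; the resulting character set is what a subsetter would then be asked to keep.

```python
def is_cjk(ch):
    """True for characters in the (assumed) CJK ranges worth subsetting."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3000 <= cp <= 0x303F   # CJK Symbols and Punctuation
        or 0xFF00 <= cp <= 0xFFEF   # Halfwidth and Fullwidth Forms
    )


def collect_cjk(texts):
    """Return the sorted list of distinct CJK characters found in texts."""
    used = set()
    for text in texts:
        used.update(ch for ch in text if is_cjk(ch))
    return sorted(used)
```

Feeding every .adoc file's contents through `collect_cjk` yields the small glyph list that keeps each output face in the hundreds-of-kilobytes range rather than tens of megabytes.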
@grandixximo force-pushed the docs/asciidoctor-migration branch 3 times, most recently from a2e43e6 to cc516a5, on May 24, 2026 02:30
…ctor

The big switch.  Every rule that used to invoke asciidoc, a2x or
xsltproc now goes through asciidoctor or asciidoctor-pdf.

HTML rules
* 12 near-identical per-language target chains collapse into one
  ASCIIDOCTOR_HTML_RULE canned recipe instantiated with toUC for
  each language.  Each call points asciidoctor at the shared
  xref_resolver extension and passes the language-specific xref-root
  and xref-exclude so anchors don't cross trees.
* Stylesheet is the existing docs/html/linuxcnc.css (already tracked
  in the repo), referenced via -a stylesheet=linuxcnc.css -a linkcss.
* Source highlighting moves from source-highlight to rouge.

PDF rule
* a2x/dblatex replaced by asciidoctor-pdf with our xref + image
  resolver extensions and pdf-theme.yml.
* Version macro is fed in via -a lversion=$(cat ../VERSION) so the
  title page stays in sync without rewriting sources.
* CJK fallback TTFs are generated lazily under $(DOC_FONT_DIR) via
  otf2ttf.py and pdf-fontsdir points at that directory plus
  GEM_FONTS_DIR.
* asciidoctor-pdf (via prawn) emits very verbose PDF content
  streams: ~32 KB/page vs ~20 KB/page from xdvipdfmx for the same
  source, so the master document came out 39 MB vs the official 26 MB
  with identical image content.  Add a ghostscript pass that
  re-deflates streams without touching images (no /ebook downsampling,
  PassThroughJPEGImages, FlateEncode only) and the master drops to
  25 MB, matching dblatex.

Manpages
* a2x --doctype manpage --format manpage becomes asciidoctor
  --doctype manpage --backend manpage.  Asciidoctor emits .als / .URL
  / .MTO macros that po4a's man parser doesn't recognise by default,
  so the man_def alias gains -o untranslated=FF,FU,als
  -o unknown_macros=untranslated -o inline=URL,MTO.
* The .so alias dependency line was missing the section directory;
  fixed in the same place.

HTML manpages
* a2x --backend html5 -> asciidoctor --doctype manpage
  --backend html5.

Wholesale image extraction step
* The old html-images bash glue piped HTML through xsltproc to pull
  out <img src=> elements.  Replace with a portable grep -oE so we
  can drop xsltproc and links.xslt at the same time.

Translation file generation
* objects/xref_<lang>.links and the per-language link database
  pipeline are gone; xref_resolver.rb does the same job at parse time.

MAN_DEPS path bug
* grep '^\.so ' was emitting deps as $(DOC_DIR)/man/%s, missing the
  section directory.  Use $(*D) prefix so deps land under
  $(DOC_DIR)/man/<section>/<page>.
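To make the path fix concrete, here is a small Python sketch of the dependency computation. It assumes the alias page contains a line of the form `.so <page>` and that the section directory is taken from the alias file's own location, which is how the description above reads; the function name and the directory layout in the example are illustrative.

```python
import posixpath
import re

SO_LINE = re.compile(r"^\.so\s+(\S+)")


def so_deps(alias_file, troff_text, doc_dir):
    """Return dependency paths for the .so targets found in an alias page.

    The section directory (e.g. man1) comes from alias_file's directory,
    mirroring the $(*D) prefix described above.
    """
    section = posixpath.basename(posixpath.dirname(alias_file))
    deps = []
    for line in troff_text.splitlines():
        match = SO_LINE.match(line)
        if match:
            page = posixpath.basename(match.group(1))
            deps.append(posixpath.join(doc_dir, "man", section, page))
    return deps
```

With the section included, an alias such as `man/man1/iocontrol.1` containing `.so io.1` depends on `docs/man/man1/io.1` rather than on a non-existent `docs/man/io.1`.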
With the Submakefile fully routed through asciidoctor, these files
are no longer referenced by anything.

asciidoc-py rendering hooks (XHTML and DocBook backends):
* docs/src/xhtml11.conf, xhtml11-head-foot.conf, xhtml11-latexmath.conf,
  xhtml11-links.conf
* docs/src/docbook.conf, docbook-image.conf
* docs/src/asciidoc-dont-replace-arrows.conf
* docs/src/attribute-colon.conf

dblatex LaTeX style:
* docs/src/emc2.sty (replaced by docs/src/pdf-theme.yml)

xsltproc-based xref/image pipeline:
* docs/src/html-images.xslt -- HTML img-src extraction (replaced by
  a grep -oE)
* docs/src/html-latex-images -- shell glue around xsltproc
* docs/src/image-wildcard -- relative-image-path resolution shim
  (replaced by docs/src/extensions/image_resolver.rb)
* docs/src/links.xslt + docs/src/links_db_gen.py -- per-language
  anchor index (replaced by docs/src/extensions/xref_resolver.rb)

Inkscape SVG shim from PR LinuxCNC#4043:
* scripts/inkscape -- routed dblatex's hard-coded inkscape call
  through rsvg-convert.  asciidoctor-pdf renders SVGs natively via
  prawn-svg, so the shim has nothing to intercept and no warnings to
  suppress.
DOC_DEPENDS shrinks from twenty packages (dblatex stack, ten
texlive-lang-*, source-highlight, inkscape, python3-lxml, xsltproc,
dvipng, groff) to ten packages spanning the asciidoctor render path
plus a couple of font/conversion helpers:

* asciidoctor + ruby-asciidoctor-pdf + ruby-rouge -- the engines.
* fonts-dejavu, fonts-noto-cjk -- code-block mono / CJK fallback.
* python3-fonttools -- otf2ttf.py needs ttLib + cu2quPen.
* ghostscript -- PDF post-process pass.
* graphviz, librsvg2-bin, w3c-linkchecker -- unchanged carry-overs.

control.top.in drops docbook-xsl, asciidoc and asciidoc-dblatex from
top-level Build-Depends.  ghostscript moves out from there because it
is now an explicit doc-time dep, listed by name in DOC_DEPENDS.

Distribution coverage verified: every package is in bookworm, trixie,
sid, and noble (the suites our CI targets).
debuild leaves dpkg-buildpackage in serial mode unless
DEB_BUILD_OPTIONS=parallel=N is in the environment.  dh_auto_build
honours that variable and translates it into make -jN, so opting in
fans out the C/C++ build and the per-language doc rules across all the
runner's CPUs.  Local measurement on an 8-core box: binary-indep wall
time 32 min -> 7 min for the doc-only stage.

build-doc.sh already passes -j directly to make; this matches that
behaviour for the deb package CI jobs.
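For readers unfamiliar with the variable, this Python sketch shows how a `parallel=N` entry in the space-separated DEB_BUILD_OPTIONS string maps to a job count, and how a script could set it from the CPU count. The helper names are hypothetical; the actual CI scripts are shell.

```python
def parallel_jobs(deb_build_options):
    """Return N from a parallel=N entry, or 1 when absent or malformed."""
    for option in deb_build_options.split():
        if option.startswith("parallel="):
            value = option.split("=", 1)[1]
            if value.isdigit() and int(value) > 0:
                return int(value)
    return 1


def with_parallel(deb_build_options, ncpu):
    """Return the option string with parallel= set to ncpu, keeping the rest."""
    kept = [o for o in deb_build_options.split() if not o.startswith("parallel=")]
    return " ".join(kept + [f"parallel={ncpu}"])
```

In a build script, `with_parallel(os.environ.get("DEB_BUILD_OPTIONS", ""), os.cpu_count())` would produce the value to export before invoking debuild.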
asciidoctor reports these as ERROR or WARNING; asciidoc-py silently
tolerated them.  All predate the toolchain swap.

* hal/halmodule.adoc: cols spec said 5 columns, every row had 7 cells,
  so the trailing 1 cell hung off the end of the last row.  Bump to
  cols="<3s,6*<".
* plasma/qtplasmac.adoc 'color5' row in the styling table was missing
  the middle Parameter cell.  Fill in 'Disabled' so the row matches.
* plasma/qtplasmac.adoc QtPlasmaC state-table 'DEBUG' row was missing
  the Description cell; the existing prose immediately below the table
  already provides the wording.
* gui/qtdragon.adoc Versaprobe NOTE was authored with no `====`
  delimiters, so asciidoctor read it as '[NOTE]' applied as an unknown
  list style.  Wrap in delimiters.
* gui/qtvcp-widgets.adoc Markdown-style ``` fenced block.  po4a
  collapsed it into a single line during translation extraction, so
  every translated build saw an open ``` with no matching close and
  emitted "unterminated listing block".  Replace with the equivalent
  asciidoc [source,python] / ---- block, which po4a preserves
  line-by-line.
* lathe/images/control-point_es.svg flowRoot/flowPara element ("tan"
  label added by the Spanish translator).  flowRoot is an Inkscape-only
  SVG 1.2 element that prawn-svg cannot render.  Convert to a regular
  <text>/<tspan> at the same coordinates.
* docs/po4a.cfg: add hal/halscope.adoc to the translated tree.  It is
  included by hal/tutorial.adoc, which IS translated, so every
  translated build failed to resolve the include.
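The cols mismatch in the first item is easy to check mechanically. The sketch below counts the columns an AsciiDoc `cols` value declares, handling only the comma-separated form with optional `N*` multipliers; it is a simplified reading of the syntax, and `column_count` is a hypothetical helper rather than anything in the build.

```python
import re


def column_count(cols):
    """Count the columns declared by an AsciiDoc cols attribute value."""
    cols = cols.strip()
    if cols.isdigit():
        return int(cols)  # cols="5" means five equal columns
    total = 0
    for spec in cols.split(","):
        multiplier = re.match(r"\s*(\d+)\*", spec)
        total += int(multiplier.group(1)) if multiplier else 1
    return total
```

Applied to the corrected spec, `column_count("<3s,6*<")` gives one strong-styled column plus six repeated ones, matching the seven cells in each row.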
Two limitations of the original image_resolver were causing the
remaining 'image to embed not found or not readable' warnings in
translated PDFs:

Inline image: macros never showed up in find_by(:inline_image).
Asciidoctor parses them as part of block text and never lifts them
into standalone nodes.  Walk each block, regex-rewrite image:PATH[
inside the source storage that the block actually keeps (lines= for
paragraphs, text= for list items), and re-enter inner_documents of
asciidoc-style table cells so cells with embedded images get touched
too.
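The regex rewrite can be sketched as follows. The pattern is an assumption about what the extension matches, not a copy of it; the important detail is that it must match the inline form `image:PATH[` while leaving the block form `image::PATH[` alone. The `resolve` callback stands in for whatever path resolution the caller wants.

```python
import re

# Inline macro only: 'image:' not followed by a second colon.
INLINE_IMAGE = re.compile(r"\bimage:(?!:)([^\s\[\]]+)\[")


def rewrite_inline_images(text, resolve):
    """Apply resolve() to the target of every inline image macro in text."""
    return INLINE_IMAGE.sub(lambda m: f"image:{resolve(m.group(1))}[", text)
```

The same function can be applied to a paragraph's lines, a list item's text, or the source of a table cell, since all three are plain strings at that point.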

Translated trees often reference images that exist only at the
canonical English path.  Add a fallback: after probing the file
under docs/src/<lang>/.../images/, retry with the language segment
stripped (docs/src/.../images/).  This is how the dblatex pipeline
behaved implicitly via the image-wildcard shim.
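A minimal sketch of that probe order, written in Python for illustration. The function name, the argument layout, and the injectable `exists` predicate (used here so the logic can be exercised without touching the filesystem) are assumptions.

```python
import os
import posixpath


def find_image(src_root, lang, rel_path, exists=os.path.exists):
    """Probe the translated tree first, then the canonical English tree."""
    translated = posixpath.join(src_root, lang, rel_path)
    if exists(translated):
        return translated
    canonical = posixpath.join(src_root, rel_path)  # language segment stripped
    if exists(canonical):
        return canonical
    return None  # caller decides whether to warn
```

A translated copy always wins when present, so language-specific images such as control-point_es.svg are still picked up ahead of the English original.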

End-to-end `make -j8 pdfdocs` warning count is now 8 across all 33
PDFs, down from 40+ before.  Remaining warnings are non-blocking
content quirks (one unterminated listing block, three Inkscape
'flowRoot' SVG elements in es/lathe/) and worth a follow-up.
@grandixximo force-pushed the docs/asciidoctor-migration branch from cc516a5 to 5b775f1 on May 24, 2026 02:46
@grandixximo
Contributor Author

Tangential, but tied to docs UX: a few months back in one of the Sunday maintainer meetings we discussed adding navigation aids to the HTML docs. A "back to index" link from each page, and a top bar with a few quick links. I'm not sure if anyone has taken that up since (I checked @smoe's fork branches and didn't find anything matching, but I may have missed it).

A sidebar Table of Contents wasn't part of that conversation as far as I remember, but it feels like a natural fit alongside the rest, since the current top-of-page TOC gets quite long.

If it's still on the wish list, I'd be happy to do a follow-up PR after this one lands. The asciidoctor toolchain makes it cheap: -a toc=left gets the sidebar TOC, and -a docinfo=shared (which the new rule already enables) lets a small docinfo-header.html inject a top nav bar without touching any .adoc source. Happy to scope and propose first if anyone (cc @hansu) wants to chime in on what should go in the top bar.

@BsAtHome
Contributor

The man-page translations take the wrong source: the generated troff files. Originally there were only troff files, but the manpages are now in asciidoc format under the docs/src/man/* tree, except for the component asciidoc pages, which are generated in src/object/man/*. This should also be fixed, and in particular the HTML manpages must be generated from the adoc sources.

For a long time there has been a bug in the docs build, at least on my system, where running make twice was required to build everything correctly. At least, the second invocation was not silent and actually built things.

You changed the highlighter. Does it support NGC and INI? How are you highlighting HAL files, which is a LinuxCNC-specific format? The highlight format files for these three are added "manually" in the current build (with some effort).

There are at least two things on my wish list when building the docs:

  • Do not build translations or invoke any process that involves translation setup, including .pot/.po generation, unless explicitly requested. It also needs to be configured to enable running any process involved in translations.
  • Move all generated documents and translations, including the source language, into a subtree .../docs/build/{en,de,...}/{man,pdf,html,...}. Then everything generated is found in one place instead of all over the place.

Bertho noted in the review of LinuxCNC#4053 that the new build dropped syntax
highlighting for HAL and NGC source blocks; rouge ships an INI lexer
but has neither of the two LinuxCNC-specific languages, so blocks like
[source,{hal}] and [source,{ngc}] rendered as plain text.

* Two rouge lexers, ~80 lines each, ported line-for-line from the old
  source-highlight definitions at docs/src/source-highlight/hal.lang
  and ngc.lang (Michael Haberler, 2011).  Same keyword coverage:
  halcmd commands, pin/signal names, INI substitutions and env vars
  for HAL; G/M/T/F/S codes, axis letters, parameters, O-words and the
  math/boolean built-ins for NGC.
* All four asciidoctor invocations in the Submakefile (PDF, HTML,
  manpage HTML, ASCIIDOCTOR_HTML_RULE) gain '-r .../rouge_hal.rb -r
  .../rouge_ngc.rb' so the lexers are visible to rouge before the
  document is parsed.  The manpage HTML rule also gains an explicit
  '-a source-highlighter=rouge' that the others already inherit from
  attribute defaults.
* The :ngc: / :hal: / :ini: / :css: / :nml: attribute defs in the
  source files used asciidoc-py's '{basebackend@docbook:'':ngc}'
  conditional syntax (which asciidoctor does not implement) to emit
  the language only when targeting docbook.  All toolchain backends
  used by this PR now want the language name unconditionally, so the
  attribute defs collapse to ':ngc: ngc' etc.  84 source files
  touched, no .adoc body changes.
…-twice

Bertho noted in the review of LinuxCNC#4053 that:

  * a clean 'make pdfdocs' or 'make htmldocs' required a second pass
    to finish, because po4a generated the per-language .adoc files
    *during* the build but the make-time $(wildcard $(L)/*.adoc)
    expansion had already evaluated them as missing;
  * po4a should not run, and translation setup should not be invoked,
    on a developer build unless the developer asks for it.

Address both:

* configure.ac flips the default: BUILD_DOCS_TRANSLATED now requires
  an explicit '--enable-build-documentation-translation' (the old
  '--disable-build-documentation-translation' opt-out is replaced).
  Stale dblatex-era version probes and warnings around po4a are also
  removed; po4a >= 0.67 is required when the flag is on, and a missing
  or too-old po4a is now an error rather than a warning that disables
  translations.

* debian/rules.in keeps the .deb pipeline producing translations by
  always passing '--enable-build-documentation-translation' alongside
  the existing '--enable-build-documentation=pdf'.

* docs/src/Submakefile:
  - Translated DOC_SRCS_<lang> are now derived from the AsciiDoc_def
    lines in po4a.cfg instead of $(wildcard $(L)/*.adoc).  The list
    is therefore correct on a fresh tree (po4a has not run yet) and
    no longer includes English-only sources like
    drivers/mesa_modbus.adoc that the translation pipeline does not
    touch.
  - DOC_SRCS and PDF_TARGETS only pull in the per-language lists when
    BUILD_DOCS_TRANSLATED=yes, so a default-configured build builds
    English only and never invokes po4a.
  - The orphaned 'xetex available?' check is dropped: asciidoctor-pdf
    (via prawn) renders CJK from our TTF subset, so xetex is no
    longer a build-time gate.
  - When BUILD_DOCS_TRANSLATED=yes, an empty-recipe pattern rule
    associates every translated .adoc with translateddocs as an
    order-only prerequisite, so 'make pdfdocs' (or 'make htmldocs')
    on a clean tree triggers po4a before depends/%.d evaluation,
    eliminating the two-pass requirement.
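The idea of deriving the per-language source list from po4a.cfg rather than from a filesystem glob can be sketched in Python. The line format assumed here, `[type: AsciiDoc_def] <master> $lang:<translated>`, follows po4a's configuration syntax, but the example paths and the function name are made up for illustration; the Submakefile does this with make and shell, not Python.

```python
import re

# Assumed entry shape: [type: AsciiDoc_def] src/x.adoc $lang:src/$lang/x.adoc
ENTRY = re.compile(r"^\[type:\s*AsciiDoc_def\]\s+(\S+)\s+\$lang:(\S+)")


def translated_sources(cfg_text, lang):
    """List the translated .adoc paths po4a will generate for lang."""
    sources = []
    for line in cfg_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            sources.append(match.group(2).replace("$lang", lang))
    return sources
```

Because the list comes from the configuration, it is the same before and after po4a has run, which is what removes the need for a second make pass.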
@BsAtHome
Contributor

For translated images,...
We need to have a standard naming convention for all image names. The default image would then be the one with the img_en.ext name (the English version). If there are translated images, then they are named as such in the source tree (img_de.ext, img_es.ext, ...). Images are a special case and generally cannot be auto-translated.

@grandixximo
Contributor Author

Thanks Bertho. Pushed two commits (b0a16fc, 7f511e8):

HAL / NGC highlighting: rouge HAL and NGC lexers, ported line-for-line from the old docs/src/source-highlight/hal.lang and ngc.lang, same keyword coverage as before. INI was already in rouge. The :ngc:/:hal:/:ini: attribute defs collapsed to plain :ngc: ngc (asciidoc-py's {basebackend@docbook:...} conditional is not implemented by asciidoctor).

Build twice: reproduced and fixed. Cause was $(wildcard $(L)/*.adoc) evaluating at make parse time before po4a generated the files. Now reads the translated-file list straight from po4a.cfg, plus an empty-recipe order-only rule so make pdfdocs on a fresh tree triggers po4a first.

Translation opt-in: --enable-build-documentation-translation (default off) replaces the old --disable-… opt-out. make pdfdocs on a default configure builds English only and never invokes po4a. debian/rules.in keeps passing the flag so the .deb builds still produce all languages.

Manpage HTML from troff: I think this is a misread, the recipe at docs/src/Submakefile:498-541 reads from .adoc, the troff dep is just so the rule can detect troff-level .so aliases (iocontrol.1 -> io.1) and symlink them in HTML.

docs/build/{lang}/{man,pdf,html} subtree: agreed it's the right shape. It's a sizeable change, though: docs/index.html (auto-generated) and the linuxcnc.org public download URLs point at docs/LinuxCNC_*.pdf directly, so the move needs redirect symlinks or coordination with the website deploy. Should this land as part of this PR? cc @hansu @andypugh @smoe for thoughts.

Comment thread docs/src/code/code-notes.adoc Outdated
Comment thread docs/src/extensions/rouge_hal.rb
Comment thread docs/src/extensions/rouge_hal.rb Outdated
Comment thread docs/src/extensions/rouge_hal.rb Outdated
Comment thread docs/src/extensions/rouge_hal.rb Outdated
Comment thread docs/src/extensions/rouge_ngc.rb
@hansu
Member

hansu commented May 24, 2026

Thanks for resuming the work on this!

While trying to install the dependencies, I wondered why configure (./configure --with-realtime=uspace --enable-build-documentation=pdf,html) succeeds and only prints warnings about packages needed for building the docs:

checking for asciidoctor... none
configure: WARNING: no asciidoctor, documentation cannot be built
checking for asciidoctor-pdf... none
configure: WARNING: no asciidoctor-pdf, PDF documentation cannot be built
...
checking for rsvg-convert... none
configure: WARNING: no rsvg-convert, documentation cannot be built

Further, the build failed because the font NotoSerifCJK-Regular.ttc was needed. I couldn't find a declared dependency for it, so I installed fonts-noto-cjk.

@BsAtHome
Contributor

> Manpage HTML from troff: I think this is a misread, the recipe at docs/src/Submakefile:498-541 reads from .adoc, the troff dep is just so the rule can detect troff-level .so aliases (iocontrol.1 -> io.1) and symlink them in HTML.

The problem was in the PDF generation. It used the troff files as input and therefore could no longer syntax-highlight code snippets.

Secondly, how do the components' man pages get involved here? They are not in the $(DOC_DIR)/man/% place afaik.

@hansu
Member

hansu commented May 24, 2026

> You changed the highlighter. Does it support NGC and INI? How are you highlighting HAL files, which is a LinuxCNC specific format? The highlight format files for these three are added "manually" in the current build (with some effort).

Currently the syntax highlighting for both is gone. Why did you have to switch to rouge?

The build-twice fix in b0a16fc added an order-only rule so make
knows how to produce per-language .adoc files (via translateddocs).
That pulls documentation.pot into the dependency graph during
-O manpages, and po4a then aborts because hal/components_gen.adoc
does not exist yet; it was previously only generated as a side effect
of gen_complist (an HTML stage with a heavy MAN_HTML_TARGETS dep).

Add a minimal file rule for components_gen.adoc that depends only on
manpages and gen_complist.py, and list it as a prerequisite of the
.pot target.  This keeps gen_complist (and its HTML-link validation)
unchanged for the htmldocs path, but lets the .pot rule rebuild the
generated source on its own.
Address Bertho's review feedback on the HAL / NGC rouge lexers:

  HAL:
   - INI substitutions and environment variables now use explicit
     [A-Za-z_]\w* ranges instead of an uppercase-looking pattern
     paired with the /i flag.
   - Integers and floats are split: floats need a decimal point
     or an exponent; integers are plain decimal.
   - Added recognition for hex (0x..), octal (0o..) and binary
     (0b..) literals, which halcmd accepts in setp / sets values.
   - Added `initf` to the command list to match the new halcmd
     verb introduced in the pending initf docs PR (will rebase
     whichever of the two PRs lands second).

  NGC:
   - Split axis letters (X Y Z A B C U V W) from parameter / call
     argument letters (I J K L P Q R D E).  Axes keep
     Name::Attribute; parameters get Name::Decorator so the two
     read differently in the rendered output.
   - Integer literals no longer accept an exponent; an explicit
     float form `\d+[eE][+-]?\d+` is added.
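The numeric-literal split for HAL can be illustrated with ordinary regular expressions. This Python sketch mirrors the rules listed above (prefixed literals first, floats needing a decimal point or an exponent, plain decimal integers last); it is not the rouge lexer, and the optional leading sign is an assumption added for the example.

```python
import re

# Order matters: prefixed literals and floats are tried before plain integers.
NUMBER_RULES = [
    ("hex", re.compile(r"[+-]?0[xX][0-9a-fA-F]+\Z")),
    ("octal", re.compile(r"[+-]?0[oO][0-7]+\Z")),
    ("binary", re.compile(r"[+-]?0[bB][01]+\Z")),
    ("float", re.compile(r"[+-]?(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?\Z")),
    ("float", re.compile(r"[+-]?\d+[eE][+-]?\d+\Z")),
    ("integer", re.compile(r"[+-]?\d+\Z")),
]


def classify(token):
    """Return the literal kind for token, or None if it is not a number."""
    for kind, pattern in NUMBER_RULES:
        if pattern.match(token):
            return kind
    return None
```

Keeping the integer rule free of exponents is what stops a token like `1e-3` from being split into an integer followed by stray text.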
Previously LinuxCNC_Manual_Pages.pdf was assembled by running groff
on the troff files generated from each manpage's .adoc source, then
piping through ps2pdf.  That path lost syntax highlighting on code
samples and was the last remnant of the troff toolchain in the docs
build.

Now the rule generates a small master document that includes every
manpage in PDF_MAN_ORDER as a chapter (leveloffset=+1, with a hard
page break between entries) and feeds it to asciidoctor-pdf.  Code
blocks pick up the rouge highlighting that the other PDFs already
use; pagination is continuous as before.  Component manpages whose
.adoc is generated by halcompile (objects/man/) are looked up in
parallel with the native ones in docs/src/man/.
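The generated master document is simple enough to sketch. The Python below builds one `include::` per manpage with `leveloffset=+1` and an AsciiDoc page break (`<<<`) between entries, as described above. The document title, the `:doctype: book` header, and the example file names are assumptions for illustration; the real rule assembles the file in the Submakefile.

```python
def manpage_master(title, pages):
    """Build the text of a master .adoc that includes each page as a chapter."""
    lines = [f"= {title}", ":doctype: book", ""]  # header is an assumed shape
    for index, page in enumerate(pages):
        if index:
            lines += ["<<<", ""]  # hard page break between entries
        lines += [f"include::{page}[leveloffset=+1]", ""]
    return "\n".join(lines)
```

Calling it with the pages in PDF_MAN_ORDER and writing the result to a temporary file gives asciidoctor-pdf a single input, which is why pagination stays continuous across the whole manual.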
Comment thread docs/src/Submakefile Outdated
Each .adoc that contained source blocks used to start with:

    // Custom lang highlight
    // must come after the doc title, to work around a bug in asciidoc 8.6.6
    :ini: ini
    :hal: hal
    :ngc: ngc

and then refer to those names as `[source,{ini}]` etc.  The
indirection only existed because asciidoc-py needed the docbook
conditional `{basebackend@docbook:'':ini}` to pick a different
value when emitting docbook; with asciidoctor the attribute is a
plain constant alias.  Drop the attribute block (along with the
stale asciidoc 8.6.6 workaround comment) and rewrite the `{ini}`,
`{hal}`, `{ngc}`, `{nml}`, `{css}` references back to the literal
language name.
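
The rewrite described above amounts to two text transformations. The sketch below shows one way it could be scripted. It is an assumption about how such a cleanup might be done, not the tool actually used for the commit, and `drop_lang_aliases` is a hypothetical name.

```ruby
# Hypothetical sketch of the attribute cleanup: drop the alias definitions and
# their comment lines, then replace {ini}-style references with the literal name.
LANG_ALIASES = %w[ini hal ngc nml css].freeze
ALIAS_DEF = /\A:(?:#{LANG_ALIASES.join('|')}):\s+\S+\s*\z/
ALIAS_REF = /\[source,\{(#{LANG_ALIASES.join('|')})\}/
STALE_COMMENTS = ["// Custom lang highlight", "// must come after the doc title"].freeze

def drop_lang_aliases(text)
  text.each_line
      .reject { |line| ALIAS_DEF.match?(line) || line.start_with?(*STALE_COMMENTS) }
      .map { |line| line.gsub(ALIAS_REF, '[source,\1') }
      .join
end
```

Only `[source,{name}]` references for the five listed languages are touched; any other attribute reference is left alone.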
hansu pointed out on LinuxCNC#4053 that

  ./configure --with-realtime=uspace --enable-build-documentation=pdf,html

happily succeeds with only WARN-level diagnostics when asciidoctor /
asciidoctor-pdf / rsvg-convert are absent, silently flipping BUILD_DOCS
back to "no".  Running 'make pdfdocs' afterwards produces no docs and
no clear hint that the configure step had stripped the docs targets.

Convert all of those AC_MSG_WARN+disable paths to AC_MSG_ERROR with an
'apt-get install ...' hint.  Same treatment for ghostscript (the PDF
post-process), librsvg2-bin (SVG -> PDF/PNG) and w3c-linkchecker for
the HTML side.

Also add an AC_MSG_ERROR for the NotoSerifCJK font when PDF docs are
enabled.  The Submakefile depends on the .ttc unconditionally (the
CJK glyph fallback is wired into every PDF, not only the translated
ones), so missing fonts-noto-cjk used to surface as a cryptic
'No rule to make target NotoSerifCJK-Regular.ttc' at build time.
Bertho noted on LinuxCNC#4053 that hardcoding the .ttc paths under
/usr/share/fonts/opentype/noto/ pins the build to Debian / Ubuntu
and will break Arch, Fedora, openSUSE, etc.

Move the discovery into configure.ac.  It first asks fontconfig
(`fc-match --format='%{file}' 'Noto Serif CJK SC:style=...'`) and
falls back to the package paths that the major distributions
actually use (Debian, Arch noto-fonts-cjk, Fedora
google-noto-cjk-fonts).  The probe rejects anything that is not a
.ttc, because otf2ttf.py needs the TrueType Collection to pick
index 2 (SC) out of it.  If nothing matches, configure errors with
a per-distro install hint, and the user can override with
   ./configure NOTOCJK_REGULAR_TTC=/path/to/Regular.ttc \
              NOTOCJK_BOLD_TTC=/path/to/Bold.ttc

The resolved paths flow through Makefile.inc as NOTOCJK_REGULAR_TTC
and NOTOCJK_BOLD_TTC; docs/src/Submakefile now references the
substituted variables instead of the literal /usr/share path.
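
The selection logic of the probe (take the first candidate that is a .ttc and exists) can be sketched as follows. The real probe lives in configure.ac as shell; this Ruby version is only an illustration, and `pick_ttc` and its `exists:` parameter are hypothetical.

```ruby
# Hypothetical sketch of the probe's selection step: candidates would come from
# fc-match and the fallback paths; anything that is not a .ttc file is rejected.
def pick_ttc(candidates, exists: File.method(:file?))
  candidates.find do |path|
    path && path.downcase.end_with?(".ttc") && exists.call(path)
  end
end
```

A `nil` result corresponds to the case where configure stops with the per-distro install hint.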
@grandixximo
Contributor Author

> You changed the highlighter. Does it support NGC and INI? How are you highlighting HAL files, which is a LinuxCNC specific format? The highlight format files for these three are added "manually" in the current build (with some effort).

> Currently the syntax highlighting for both is gone. Why did you have to switch to rouge?

The old syntax highlighting is not supported in asciidoctor; that's why I had to switch.

@BsAtHome
Contributor

Building arch package:

checking whether to build documentation... PDF requested
checking for asciidoctor... /usr/bin/asciidoctor
checking for gs... none
configure: error: no gs, cannot build documentation
install with "sudo apt-get install ghostscript"
...

Why does it say "PDF requested" in an arch package?

@grandixximo
Contributor Author

grandixximo commented May 24, 2026

> "blueish" header lines are not looking nice.

They help with contrast when using Dark Reader. We don't have a dark theme, so the least we can do is play nice with dark-theme extensions.

@BsAtHome I'm not sure why CI is taking hours to complete, ideas?

@grandixximo
Contributor Author

We could use a more modern look, instead of going retro, I just kept the nostalgia for now.

HTML styling: restore body margin (5% with 768px breakpoint), sans-serif headings, blueish tint on h3+ headings, toctitle and dt color #527bbd, drop full-line click target on TOC items per user feedback. -a compat-mode added to all asciidoctor invocations so 12k legacy 'word' single-quoted emphasis renders italic instead of literal.

CI hardening: timeout 900s wrapper on manpage PDF book invocation so future asciidoctor-pdf hangs fail fast with a clear message rather than holding a runner. concurrency cancel-in-progress on pull_request events so successive pushes to the same PR don't stack runs. Submakefile .so dep generator now handles both 'manN/file.N' and bare 'file.N' .so directives (was doubling section prefix).

Source fixes for asciidoctor strict parsing: lut5.comp header cell '2+^h|Weight' spanned 2 of 6 cols (header summed to 7); hm2_rpspi.9.adoc final row missing third cell; hm2_7i43.9.adoc synopsis embedded quote in <strong> confused prawn HTML parser, wrap in pass:[].
@grandixximo
Contributor Author

@BsAtHome can you cancel the other build please? stuck on something, maybe last commit fixes it...

@grandixximo
Contributor Author

[image] The default asciidoctor look is actually pretty nice. Is it worth fighting it to restore the original look?

The trixie/bookworm package-indep jobs hung for 37 min on the Build step despite the manpage PDF timeout added in dbc79d1. The hang is in the main PDF rule (Master_*.pdf via asciidoctor-pdf), not the manpage book. Apply the same 15 min cap and exit-code-124 message as the manpage rule, so any future asciidoctor-pdf hang fails fast with a clear cause.
@BsAtHome
Contributor

> @BsAtHome can you cancel the other build please? stuck on something, maybe last commit fixes it...

The real question is why it gets stuck. Is there a wait for input? Does it require a < /dev/null redirect?
Or is there some content that triggers something?

@BsAtHome
Contributor

> the default asciidoctor look is actually pretty nice, trying to fight it to restore original look, is it worth it?

Indeed, it looks more modern and less hard on the eyes. But I can't see any specifics in one screenshot. There are many special setups and visuals that need to be examined.

But I think we need to have the new build to work reliably and committed before we successfully can get into the color of the bike-shed fight ;-)

@hansu
Member

hansu commented May 24, 2026

I played around a bit with the styles, tried the available rouge styles, and found these the most suitable:

I tend to either use igorpro and rotate the colors red, blue and green, or use pastie with some color adjustments.

@BsAtHome
Contributor

Aborted CI runs after being stuck and running for 2 hours 47

PDF builds finish at 15:37, then dh_auto_test enters at 15:37:31 and produces zero output for 2h25m before manual cancel. sid passes the same step in seconds. Indep package is doc-only with no real test targets, so dh_auto_test would no-op anyway, but trixie/bookworm make+asciidoctor seem to enter an autorestart loop on the 'Makefile: $(MAN_DEPS)' rule. Add nocheck to DEB_BUILD_OPTIONS to bypass entirely.
Captured 36-min trixie hang log shows the loop: gen_complist regenerates components_gen.adoc -> .pot rebuilds -> po4a updates translated .adoc -> their .d files stale -> Makefile autorestart -> back to start. gen_complist re-fires each cycle because manpages mtime always advances. Order-only | manpages keeps the 'manpages must exist first' guarantee but drops the mtime-driven retrigger that was driving the loop on trixie/bookworm. sid converged fast enough that it escaped before timeout, trixie/bookworm did not.
@BsAtHome
Contributor

Is there a specific reason to remove the double empty lines before a new header? The second empty line is simply ignored, isn't it?
If so, then the visual separation of the previous text/paragraph and the new header would be appreciated to be kept intact. It would also spare you from a whole lot of changes in files that have no effect.

Comment thread docs/src/source-highlight/hal-demo.adoc Outdated
Comment thread docs/src/pdf-theme.yml Outdated
Comment thread docs/src/Submakefile
Comment thread src/configure.ac Outdated
@BsAtHome
Contributor

The .github/* changes should probably be added in a separate PR. At least the concurrency addition may be added to benefit us all before this PR is done. Do the script changes do more than just enable proper parallel build?

Five items from BsAtHome review on PR LinuxCNC#4053:

1. docs/src/source-highlight: removed.  source-highlight tool retired by this
   PR; demo .adoc / .lang / .conf files have no remaining consumers.  po4a.cfg
   loses the three [type: AsciiDoc_def] entries that pointed at them.
   Submakefile loses LOC_HL_DIR / LOC_LANG_MAP (defined but unused after the
   build rules moved to rouge).

2. configure.ac NotoSerifCJK probe: drop the /usr/share/fonts/.../ fallback
   list, trust fc-match alone.  fontconfig already covers user-installed
   ~/.fonts when fc-cache has been run, and silently falling back to a
   Debian-only path on other distros was the portability concern bertho
   flagged.  Error message names fc-match explicitly and mentions ~/.fonts +
   fc-cache so the user knows where to put a custom .ttc.

3. pdf-theme.yml DejaVu Sans Mono: drop absolute /usr/share/fonts/truetype/
   paths.  fonts now resolved at build time via fc-match into DOC_FONT_DIR
   (the same dir where the CJK fallback ttf already lives), exposed to
   asciidoctor-pdf through pdf-fontsdir.  pdf-theme.yml stays path-free,
   theme file is portable across distros without further patching.

4. image_resolver.rb locale handling: the regex that detected the doc
   language from /src/<lang>/ now consumes a doc-languages attribute
   passed by Submakefile ($(LANGUAGES) from po4a.cfg), not a hardcoded
   alternation.  Codes are matched literally as they appear on disk
   (zh_CN, sv, etc.).  Adding a new locale only requires the po4a.cfg
   po4a_langs line; the resolver picks it up automatically.

5. blank lines before headers: restore the second blank line that commit
   0375f0a collapsed when removing the legacy `:ini: ini` attribute
   blocks.  The line was load-bearing for visual section separation in
   the source even though asciidoctor ignores it during parse.  Affects
   63 files, +212 / -60 lines, no rendered-output change.
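
For item 4, the idea of building the locale pattern from a language list instead of a hardcoded alternation can be sketched as below. This is not the code in image_resolver.rb: the function names are hypothetical, and the assumption that the `doc-languages` value is a space- or comma-separated string is mine.

```ruby
# Hypothetical sketch: derive the /src/<lang>/ pattern from a language list.
# Regexp.union escapes each code, so they match literally as they appear on disk.
def language_pattern(doc_languages)
  codes = doc_languages.split(/[\s,]+/).reject(&:empty?)
  %r{/src/(#{Regexp.union(codes).source})/}
end

# Returns the language code found in the path, or the default for the
# untranslated tree.
def doc_language(path, doc_languages, default = "en")
  match = language_pattern(doc_languages).match(path)
  match ? match[1] : default
end
```

Adding a locale then only means extending the list that is passed in; no pattern in the resolver has to change.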

Drive-by: hardware-interface.adoc had three level-0 (=) headings.
Asciidoctor only allows one per document, so the other two were changed
to == and the full pdfdocs+htmldocs build stays clean.
@grandixximo
Contributor Author

grandixximo commented May 25, 2026

> Is there a specific reason to remove the double empty lines before a new header? The second empty line is simply ignored, isn't it?

Not intentional. Fixed it back.

> The .github/* changes should probably be added in a separate PR. At least the concurrency addition may be added to benefit us all before this PR is done. Do the script changes do more than just enable proper parallel build?

Split out to #4056. The script changes are only DEB_BUILD_OPTIONS="parallel=$(nproc)", nothing else, so dh_auto_build runs make -jN instead of single-threaded. The ci.yml side is just the concurrency block, gated on pull_request events, which cancels the older run when a new commit lands on the same PR. The nocheck workaround I'd added to build-package-indep.sh is dropped here too. Root cause of the trixie/bookworm autorestart hang was fixed by 385c57e, so dh_auto_test is back to a no-op. This PR will rebase to drop the .github changes once #4056 lands.

@grandixximo
Contributor Author

@BsAtHome about

> Move all generated documents and translations, including the source language, into a subtree .../docs/build/{en,de,...}/{man,pdf,html,...}. Then everything generated is found in one place instead of all over the place.

Is this in scope for this PR? Was it discussed in the meeting?

@BsAtHome
Contributor

> Move all generated documents and translations, including the source language, into a subtree .../docs/build/{en,de,...}/{man,pdf,html,...}. Then everything generated is found in one place instead of all over the place.

> In scope in this PR? was discussed in the meeting?

No, it wasn't discussed. It has just been on my wish list for some time. However, there is no need or requirement to have it done now. Let's just get the build process fixed, and then in a later PR we can fix the file organisation.

Comment thread docs/src/Submakefile Outdated
Comment on lines +879 to +891
# own subtree so the exclude is empty.
$(eval $(call ASCIIDOCTOR_HTML_RULE,en,$(DOC_SRCDIR),^($(LANGUAGES_MATCH))/))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ar,$(DOC_SRCDIR)/ar))
$(eval $(call ASCIIDOCTOR_HTML_RULE,de,$(DOC_SRCDIR)/de))
$(eval $(call ASCIIDOCTOR_HTML_RULE,es,$(DOC_SRCDIR)/es))
$(eval $(call ASCIIDOCTOR_HTML_RULE,fr,$(DOC_SRCDIR)/fr))
$(eval $(call ASCIIDOCTOR_HTML_RULE,nb,$(DOC_SRCDIR)/nb))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ru,$(DOC_SRCDIR)/ru))
$(eval $(call ASCIIDOCTOR_HTML_RULE,sv,$(DOC_SRCDIR)/sv))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ta,$(DOC_SRCDIR)/ta))
$(eval $(call ASCIIDOCTOR_HTML_RULE,tr,$(DOC_SRCDIR)/tr))
$(eval $(call ASCIIDOCTOR_HTML_RULE,uk,$(DOC_SRCDIR)/uk))
$(eval $(call ASCIIDOCTOR_HTML_RULE,zh_CN,$(DOC_SRCDIR)/zh_CN))
Contributor

But here the language list is explicit again.

Isn't the (extracted) po4a.cfg language list the one that is authoritative?

@grandixximo
Contributor Author

I think the build process is fixed. I'm looking at the styling now and will take a look at the explicit language references as well.

@hansu
Member

hansu commented May 25, 2026

I took the freedom and added some commits on your branch, hope that is okay ;-)

@grandixximo
Contributor Author

grandixximo commented May 25, 2026

No worries, feel free to keep them coming

build-package-indep.sh: nocheck was a workaround for the trixie/bookworm autorestart hang in dh_auto_test; root cause fixed in 385c57e so the test step is back to its normal no-op.

.gitignore: top-level autom4te.cache/ gets created by debian/configure on every run; src/autom4te.cache was already ignored, mirror it here.
Replace 11 hardcoded ASCIIDOCTOR_HTML_RULE eval calls with a foreach
over $(LANGUAGES).  po4a.cfg becomes the single source of truth for
which languages have HTML output; ar/sv/ta/tr (no docs/src/<lang>/
tree) are dropped, and any other language listed there is picked up
for free.  Addresses Bertho's review on PR LinuxCNC#4053.
Comment on lines +15 to +20
state :root do
# add # comments
rule %r/#.*$/, Comment::Single

# keep existing ; comments
rule %r/;.*$/, Comment::Single
Contributor

Wouldn't this highlight:

var=val #comment

However, this is not how the INI parser works. The value is `val #comment`. The same goes for the semicolon.

(matches in parenthesis)
Sections are: ^\s*(\[)([a-zA-Z_][a-zA-Z0-9_]*)(\])\s*([#;].*)?$
Note the optional comment match after the section header.

Valid variable name and content: ^\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*)$
However, the variable syntax here does not account for '\' continuations, which makes it quite a bit harder. I don't know how that is handled in rouge. Another hard thing, if you want it highlighted, are the allowed escape sequences in the value.

The only line comments are: ^\s*([#;].*)$

Then there is the include syntax: ^(#INCLUDE)\s+(.*)\s*$

Contributor

Small correction. Variable content is right trimmed. Therefore: ^\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*)\s*$
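
The line shapes listed in this thread can be tried out with a small classifier. This is a sketch in plain Ruby and not a rouge lexer: the constant and function names are hypothetical, `\A`/`\z` replace `^`/`$` because Ruby treats the latter as line anchors, and the lazy `(.*?)` is my adaptation so that the captured value really is right trimmed. Backslash continuations and escape sequences are not handled.

```ruby
# Hypothetical sketch of the INI line shapes described in the review comments.
INI_INCLUDE  = /\A(#INCLUDE)\s+(.*?)\s*\z/
INI_COMMENT  = /\A\s*([#;].*)\z/
INI_SECTION  = /\A\s*(\[)([a-zA-Z_][a-zA-Z0-9_]*)(\])\s*([#;].*)?\z/
INI_VARIABLE = /\A\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*?)\s*\z/

# Classifies one line. #INCLUDE is tested before comments, because an
# include line would otherwise also match the comment pattern.
def classify_ini_line(line)
  text = line.chomp
  if (m = INI_INCLUDE.match(text))
    [:include, m[2]]
  elsif (m = INI_COMMENT.match(text))
    [:comment, m[1]]
  elsif (m = INI_SECTION.match(text))
    [:section, m[2]]
  elsif (m = INI_VARIABLE.match(text))
    # Everything after '=' is the value, including any '#' or ';'.
    [:variable, m[1], m[3]]
  else
    [:other, text]
  end
end
```

For `var=val #comment` this returns the value `val #comment`, so a highlighter following the same rules would not color the trailing part as a comment.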

Member

Contributor

Right, very good.

The example does show the '[' and ']' are part of the section name. That is technically not correct and should be treated like the '='.

Member

Yeah, that's the default INI highlighter ... but I guess this can also be overridden.
