docs: migrate documentation toolchain from asciidoc-py + dblatex to asciidoctor#4053
grandixximo wants to merge 29 commits.
Three pieces of glue that let asciidoctor produce documentation that matches the look and behaviour of the existing asciidoc-py + dblatex output. Nothing is wired in yet; the Submakefile swap follows in the next commit.

xref_resolver.rb -- asciidoctor preprocessor that mirrors what asciidoc-py used to do via objects/xref_<lang>.links: bare <<anchor,Title>> references are looked up in a tree-wide anchor index and rewritten to qualified <<relpath/file.adoc#anchor,Title>> form. The index is cached on disk keyed by source mtimes, and accepts an xref-exclude regex so each translated tree stays isolated.

image_resolver.rb -- treeprocessor that resolves image targets the way asciidoc-py's image-wildcard pair did: relative paths in an included file resolve against that included file's directory, not the master. For PDF only it also defaults pdfwidth=75% on images without an explicit width, because prawn renders raster sources at native-pixel dimensions interpreted as 72 DPI and otherwise blows screenshots up to the full text column.

pdf-theme.yml -- asciidoctor-pdf theme that approximates emc2.sty: A4 page, Times-like body, blue headings and links matching dblatex, top header with 'doc-title | chapter | page / total' and a thin rule, bottom rule only, no alternating page numbers. Falls back to Noto Serif CJK SC for non-Latin glyphs missing from the base font; DejaVu Sans Mono in code blocks so Cyrillic in listing/source blocks renders.

otf2ttf.py -- Debian only ships Noto Serif CJK as a CFF/OTF TrueType Collection and prawn 2.4 corrupts the PDF when asked to embed CFF outlines directly. This is a tiny build-time helper that subsets the font to the CJK characters used anywhere in the docs (~600 glyphs out of 65000) and converts the curves with cu2qu before saving as TTF. Output is ~300 KB per face, ~1.5 s per face.
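For readers who want to see the xref rewrite in isolation, here is a minimal Python sketch of the idea. It is not the extension itself (that is a Ruby preprocessor), and the index layout (anchor id mapped to the defining file), the anchor character set and the function name `qualify_xrefs` are assumptions made for illustration.

```python
import os
import re

# A bare xref: <<anchor>> or <<anchor,Title>>. Targets that already contain
# '/' or '#' (i.e. qualified references) do not match and are left alone.
BARE_XREF = re.compile(r"<<([A-Za-z_:][\w:.-]*)(,[^>]*)?>>")


def qualify_xrefs(line, current_file, anchor_index):
    """Rewrite bare xrefs in `line` to <<relpath/file.adoc#anchor,Title>>.

    `anchor_index` maps an anchor id to the path of the file defining it.
    This layout is hypothetical; the real on-disk index is not shown here.
    """
    here = os.path.dirname(current_file)

    def repl(match):
        anchor, title = match.group(1), match.group(2) or ""
        target = anchor_index.get(anchor)
        if target is None or target == current_file:
            return match.group(0)  # unknown or same-file anchor: keep as-is
        rel = os.path.relpath(target, here).replace(os.sep, "/")
        return f"<<{rel}#{anchor}{title}>>"

    return BARE_XREF.sub(repl, line)
```

An xref-exclude regex would simply filter which files are allowed into `anchor_index` before the rewrite runs.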
a2e43e6 to
cc516a5
…ctor

The big switch. Every rule that used to invoke asciidoc, a2x or xsltproc now goes through asciidoctor or asciidoctor-pdf.

HTML rules
* 12 near-identical per-language target chains collapse into one ASCIIDOCTOR_HTML_RULE canned recipe instantiated with toUC for each language. Each call points asciidoctor at the shared xref_resolver extension and passes the language-specific xref-root and xref-exclude so anchors don't cross trees.
* Stylesheet is the existing docs/html/linuxcnc.css (already tracked in the repo), referenced via -a stylesheet=linuxcnc.css -a linkcss.
* Source highlighting moves from source-highlight to rouge.

PDF rule
* a2x/dblatex replaced by asciidoctor-pdf with our xref + image resolver extensions and pdf-theme.yml.
* Version macro is fed in via -a lversion=$(cat ../VERSION) so the title page stays in sync without rewriting sources.
* CJK fallback TTFs are generated lazily under $(DOC_FONT_DIR) via otf2ttf.py and pdf-fontsdir points at that directory plus GEM_FONTS_DIR.
* asciidoctor-pdf (via prawn) emits very verbose PDF content streams: ~32 KB/page vs ~20 KB/page from xdvipdfmx for the same source, so the master document came out 39 MB vs the official 26 MB with identical image content. Add a ghostscript pass that re-deflates streams without touching images (no /ebook downsampling, PassThroughJPEGImages, FlateEncode only) and the master drops to 25 MB, matching dblatex.

Manpages
* a2x --doctype manpage --format manpage becomes asciidoctor --doctype manpage --backend manpage. Asciidoctor emits .als / .URL / .MTO macros that po4a's man parser doesn't recognise by default, so the man_def alias gains -o untranslated=FF,FU,als -o unknown_macros=untranslated -o inline=URL,MTO.
* The .so alias dependency line was missing the section directory; fixed in the same place.

HTML manpages
* a2x --backend html5 -> asciidoctor --doctype manpage --backend html5.

Wholesale image extraction step
* The old html-images bash glue piped HTML through xsltproc to pull out <img src=> elements. Replace with a portable grep -oE so we can drop xsltproc and links.xslt at the same time.

Translation file generation
* objects/xref_<lang>.links and the per-language link database pipeline are gone; xref_resolver.rb does the same job at parse time.

MAN_DEPS path bug
* grep '^\.so ' was emitting deps as $(DOC_DIR)/man/%s, missing the section directory. Use $(*D) prefix so deps land under $(DOC_DIR)/man/<section>/<page>.
With the Submakefile fully routed through asciidoctor, these files are no longer referenced by anything. asciidoc-py rendering hooks (XHTML and DocBook backends): * docs/src/xhtml11.conf, xhtml11-head-foot.conf, xhtml11-latexmath.conf, xhtml11-links.conf * docs/src/docbook.conf, docbook-image.conf * docs/src/asciidoc-dont-replace-arrows.conf * docs/src/attribute-colon.conf dblatex LaTeX style: * docs/src/emc2.sty (replaced by docs/src/pdf-theme.yml) xsltproc-based xref/image pipeline: * docs/src/html-images.xslt -- HTML img-src extraction (replaced by a grep -oE) * docs/src/html-latex-images -- shell glue around xsltproc * docs/src/image-wildcard -- relative-image-path resolution shim (replaced by docs/src/extensions/image_resolver.rb) * docs/src/links.xslt + docs/src/links_db_gen.py -- per-language anchor index (replaced by docs/src/extensions/xref_resolver.rb) Inkscape SVG shim from PR LinuxCNC#4043: * scripts/inkscape -- routed dblatex's hard-coded inkscape call through rsvg-convert. asciidoctor-pdf renders SVGs natively via prawn-svg, so the shim has nothing to intercept and no warnings to suppress.
DOC_DEPENDS shrinks from twenty packages (dblatex stack, ten texlive-lang-*, source-highlight, inkscape, python3-lxml, xsltproc, dvipng, groff) to ten packages spanning the asciidoctor render path plus a couple of font/conversion helpers: * asciidoctor + ruby-asciidoctor-pdf + ruby-rouge -- the engines. * fonts-dejavu, fonts-noto-cjk -- code-block mono / CJK fallback. * python3-fonttools -- otf2ttf.py needs ttLib + cu2quPen. * ghostscript -- PDF post-process pass. * graphviz, librsvg2-bin, w3c-linkchecker -- unchanged carry-overs. control.top.in drops docbook-xsl, asciidoc and asciidoc-dblatex from top-level Build-Depends. ghostscript moves out from there because it is now an explicit doc-time dep, listed by name in DOC_DEPENDS. Distribution coverage verified: every package is in bookworm, trixie, sid, and noble (the suites our CI targets).
debuild leaves dpkg-buildpackage in serial mode unless DEB_BUILD_OPTIONS=parallel=N is in the environment. dh_auto_build honours that variable and translates it into make -jN, so opting in fans out the C/C++ build and the per-language doc rules across all the runner's CPUs. Local measurement on an 8-core box: binary-indep wall time 32 min -> 7 min for the doc-only stage. build-doc.sh already passes -j directly to make; this matches that behaviour for the deb package CI jobs.
asciidoctor reports these as ERROR or WARNING; asciidoc-py silently
tolerated them. All predate the toolchain swap.
* hal/halmodule.adoc: cols spec said 5 columns, every row had 7 cells,
so the trailing 1 cell hung off the end of the last row. Bump to
cols="<3s,6*<".
* plasma/qtplasmac.adoc 'color5' row in the styling table was missing
the middle Parameter cell. Fill in 'Disabled' so the row matches.
* plasma/qtplasmac.adoc QtPlasmaC state-table 'DEBUG' row was missing
the Description cell; the existing prose immediately below the table
already provides the wording.
* gui/qtdragon.adoc Versaprobe NOTE was authored with no `====`
delimiters, so asciidoctor read it as '[NOTE]' applied as an unknown
list style. Wrap in delimiters.
* gui/qtvcp-widgets.adoc Markdown-style ``` fenced block. po4a
collapsed it into a single line during translation extraction, so
every translated build saw an open ``` with no matching close and
emitted "unterminated listing block". Replace with the equivalent
asciidoc [source,python] / ---- block, which po4a preserves
line-by-line.
* lathe/images/control-point_es.svg flowRoot/flowPara element ("tan"
label added by the Spanish translator). flowRoot is an Inkscape-only
SVG 1.2 element that prawn-svg cannot render. Convert to a regular
<text>/<tspan> at the same coordinates.
* docs/po4a.cfg: add hal/halscope.adoc to the translated tree. It is
included by hal/tutorial.adoc, which IS translated, so every
translated build failed to resolve the include.
Two limitations of the original image_resolver were causing the remaining 'image to embed not found or not readable' warnings in translated PDFs: Inline image: macros never showed up in find_by(:inline_image). Asciidoctor parses them as part of block text and never lifts them into standalone nodes. Walk each block, regex-rewrite image:PATH[ inside the source storage that the block actually keeps (lines= for paragraphs, text= for list items), and re-enter inner_documents of asciidoc-style table cells so cells with embedded images get touched too. Translated trees often reference images that exist only at the canonical English path. Add a fallback: after probing the file under docs/src/<lang>/.../images/, retry with the language segment stripped (docs/src/.../images/). This is how the dblatex pipeline behaved implicitly via the image-wildcard shim. End-to-end `make -j8 pdfdocs` warning count is now 8 across all 33 PDFs, down from 40+ before. Remaining warnings are non-blocking content quirks (one unterminated listing block, three Inkscape 'flowRoot' SVG elements in es/lathe/) and worth a follow-up.
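The lookup order described above (translated path first, then the same path with the language segment stripped) can be sketched as follows. This is an illustrative Python sketch, not the Ruby code in image_resolver.rb; the function and parameter names are made up for the example.

```python
import os


def image_candidates(target, including_dir, src_root, languages):
    """Return the paths to probe, in order, for an image target.

    1. The target resolved against the directory of the file that
       contains the image macro.
    2. If that path lies under <src_root>/<lang>/, the same path with
       the language segment removed (the canonical English location).
    """
    first = os.path.normpath(os.path.join(including_dir, target))
    candidates = [first]
    parts = os.path.relpath(first, src_root).split(os.sep)
    if parts and parts[0] in languages:
        candidates.append(os.path.join(src_root, *parts[1:]))
    return candidates


def resolve_image(target, including_dir, src_root, languages,
                  exists=os.path.exists):
    """Return the first candidate that exists, or None if none does."""
    for path in image_candidates(target, including_dir, src_root, languages):
        if exists(path):
            return path
    return None
```

The `exists` parameter is only there so the probe order can be exercised without touching the filesystem.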
cc516a5 to
5b775f1
|
Tangential, but tied to docs UX: a few months back in one of the Sunday maintainer meetings we discussed adding navigation aids to the HTML docs. A "back to index" link from each page, and a top bar with a few quick links. I'm not sure if anyone has taken that up since (I checked @smoe's fork branches and didn't find anything matching, but I may have missed it). A sidebar Table of Contents wasn't part of that conversation as far as I remember, but it feels like a natural fit alongside the rest, the current top-of-page TOC gets quite long. If it's still on the wish list, I'd be happy to do a follow-up PR after this one lands. The asciidoctor toolchain makes it cheap: |
|
The man-page translations take the wrong source from the generated troff files. There were originally only troff files, but the manpages are now in asciidoc format under the docs/src/man/* tree, except for the component asciidoc pages, which are generated in src/object/man/*. This should also be fixed, and especially the HTML manpages must be generated from the adoc sources.

There has been, for a long time at least on my system, a bug in the docs build that running make twice was required to build everything correctly. At least, the second invocation was not silent and actually made stuff.

You changed the highlighter. Does it support NGC and INI? How are you highlighting HAL files, which is a LinuxCNC-specific format? The highlight format files for these three are added "manually" in the current build (with some effort).

There are at least two things on my wish list when building the docs:
|
Bertho noted in the review of LinuxCNC#4053 that the new build dropped syntax highlighting for HAL and NGC source blocks; rouge ships an INI lexer but has neither of the two LinuxCNC-specific languages, so blocks like [source,{hal}] and [source,{ngc}] rendered as plain text. * Two rouge lexers, ~80 lines each, ported line-for-line from the old source-highlight definitions at docs/src/source-highlight/hal.lang and ngc.lang (Michael Haberler, 2011). Same keyword coverage: halcmd commands, pin/signal names, INI substitutions and env vars for HAL; G/M/T/F/S codes, axis letters, parameters, O-words and the math/boolean built-ins for NGC. * All four asciidoctor invocations in the Submakefile (PDF, HTML, manpage HTML, ASCIIDOCTOR_HTML_RULE) gain '-r .../rouge_hal.rb -r .../rouge_ngc.rb' so the lexers are visible to rouge before the document is parsed. The manpage HTML rule also gains an explicit '-a source-highlighter=rouge' that the others already inherit from attribute defaults. * The :ngc: / :hal: / :ini: / :css: / :nml: attribute defs in the source files used asciidoc-py's '{basebackend@docbook:'':ngc}' conditional syntax (which asciidoctor does not implement) to emit the language only when targeting docbook. All toolchain backends used by this PR now want the language name unconditionally, so the attribute defs collapse to ':ngc: ngc' etc. 84 source files touched, no .adoc body changes.
…-twice

Bertho noted in the review of LinuxCNC#4053 that:
* a clean 'make pdfdocs' or 'make htmldocs' required a second pass to finish, because po4a generated the per-language .adoc files *during* the build but the make-time $(wildcard $(L)/*.adoc) expansion had already evaluated them as missing;
* po4a should not run, and translation setup should not be invoked, on a developer build unless the developer asks for it.

Address both:
* configure.ac flips the default: BUILD_DOCS_TRANSLATED now requires an explicit '--enable-build-documentation-translation' (the old '--disable-build-documentation-translation' opt-out is replaced). Stale dblatex-era version probes and warnings around po4a are also removed; po4a >= 0.67 is required when the flag is on, and a missing or too-old po4a now errors instead of warning-and-disabling.
* debian/rules.in keeps the .deb pipeline producing translations by always passing '--enable-build-documentation-translation' alongside the existing '--enable-build-documentation=pdf'.
* docs/src/Submakefile:
  - Translated DOC_SRCS_<lang> are now derived from the AsciiDoc_def lines in po4a.cfg instead of $(wildcard $(L)/*.adoc). The list is therefore correct on a fresh tree (po4a has not run yet) and no longer includes English-only sources like drivers/mesa_modbus.adoc that the translation pipeline does not touch.
  - DOC_SRCS and PDF_TARGETS only pull in the per-language lists when BUILD_DOCS_TRANSLATED=yes, so a default-configured build builds English only and never invokes po4a.
  - The orphaned 'xetex available?' check is dropped: prawn-svg in asciidoctor-pdf renders CJK from our TTF subset, xetex is no longer a build-time gate.
  - When BUILD_DOCS_TRANSLATED=yes, an empty-recipe pattern rule associates every translated .adoc with translateddocs as an order-only prerequisite, so 'make pdfdocs' (or 'make htmldocs') on a clean tree triggers po4a before depends/%.d evaluation, eliminating the two-pass requirement.
|
For translated images,... |
|
Thanks Bertho. Pushed two commits (b0a16fc, 7f511e8): HAL / NGC highlighting: rouge HAL and NGC lexers, ported line-for-line from the old Build twice: reproduced and fixed. Cause was Translation opt-in: Manpage HTML from troff: I think this is a misread, the recipe at
|
|
Thanks for resuming the work on this! While trying to install the dependencies I wonder why the configure ( Further the build failed when the font NotoSerifCJK-Regular.ttc was needed and I couldn't find the dependency for that, so I installed |
The problem was in the PDF generation. It used the troff files as input and doing so could no longer syntax highlight code snippets. Secondly, how do the components' man pages get involved here? They are not in the |
Currently the syntax highlighting for both is gone. Why did you have to switch to rouge? |
The build-twice fix in b0a16fc added an order-only rule so make knows how to produce per-language .adoc files (via translateddocs). That pulls documentation.pot into the dependency graph during -O manpages, and po4a then aborts because hal/components_gen.adoc does not exist yet; it was previously only generated as a side effect of gen_complist (an HTML stage with a heavy MAN_HTML_TARGETS dep). Add a minimal file rule for components_gen.adoc that depends only on manpages and gen_complist.py, and list it as a prerequisite of the .pot target. This keeps gen_complist (and its HTML-link validation) unchanged for the htmldocs path, but lets the .pot rule rebuild the generated source on its own.
Address Bertho's review feedback on the HAL / NGC rouge lexers:
HAL:
- INI substitutions and environment variables now use explicit
[A-Za-z_]\w* ranges instead of an uppercase-looking pattern
paired with the /i flag.
- Integers and floats are split: floats need a decimal point
or an exponent; integers are plain decimal.
- Added recognition for hex (0x..), octal (0o..) and binary
(0b..) literals, which halcmd accepts in setp / sets values.
- Added `initf` to the command list to match the new halcmd
verb introduced in the pending initf docs PR (will rebase
whichever of the two PRs lands second).
NGC:
- Split axis letters (X Y Z A B C U V W) from parameter / call
argument letters (I J K L P Q R D E). Axes keep
Name::Attribute; parameters get Name::Decorator so the two
read differently in the rendered output.
- Integer literals no longer accept an exponent; an explicit
float form `\d+[eE][+-]?\d+` is added.
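The numeric-literal split described in this commit can be illustrated with a small Python classifier. This is a sketch of the rules as worded above, not the rouge lexer code; the exact patterns (optional sign, upper-case prefixes such as 0X) are assumptions.

```python
import re

# Prefixed integer forms, which the commit message says halcmd accepts.
HEX = re.compile(r"0[xX][0-9a-fA-F]+\Z")
OCT = re.compile(r"0[oO][0-7]+\Z")
BIN = re.compile(r"0[bB][01]+\Z")
# A float needs a decimal point (optionally followed by an exponent)
# or, lacking a point, a mandatory exponent.
FLOAT = re.compile(
    r"[+-]?(?:(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)\Z"
)
# An integer is plain decimal digits: no point, no exponent.
INT = re.compile(r"[+-]?\d+\Z")

_RULES = (("hex", HEX), ("oct", OCT), ("bin", BIN),
          ("float", FLOAT), ("int", INT))


def classify_number(token):
    """Return the literal class of `token`, or 'other' if it is no number."""
    for name, pattern in _RULES:
        if pattern.match(token):
            return name
    return "other"
```

The prefixed forms are tried first so that `0x10` is never read as the integer `0` followed by text.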
Previously LinuxCNC_Manual_Pages.pdf was assembled by running groff on the troff files generated from each manpage's .adoc source, then piping through ps2pdf. That path lost syntax highlighting on code samples and was the last remnant of the troff toolchain in the docs build. Now the rule generates a small master document that includes every manpage in PDF_MAN_ORDER as a chapter (leveloffset=+1, with a hard page break between entries) and feeds it to asciidoctor-pdf. Code blocks pick up the rouge highlighting that the other PDFs already use; pagination is continuous as before. Component manpages whose .adoc is generated by halcompile (objects/man/) are looked up in parallel with the native ones in docs/src/man/.
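As a rough illustration of the generated master document, the Python sketch below builds the include list with `leveloffset=+1` and a page break (`<<<`) between entries. The real rule lives in the Submakefile; the title, the `:doctype: book` header and the example paths are placeholders, not values taken from the build.

```python
def manpage_master(title, manpage_paths):
    """Build the text of a master .adoc that includes each manpage.

    Every manpage is pulled in one heading level down (leveloffset=+1),
    and a hard page break separates consecutive entries.
    """
    lines = [f"= {title}", ":doctype: book", ""]
    for index, path in enumerate(manpage_paths):
        if index > 0:
            lines += ["<<<", ""]  # AsciiDoc page break between manpages
        lines += [f"include::{path}[leveloffset=+1]", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical ordering: one native page, one generated component page.
    print(manpage_master("Manual Pages", [
        "docs/src/man/man1/example.1.adoc",
        "objects/man/man9/example.9.adoc",
    ]))
```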
Each .adoc that contained source blocks used to start with:
// Custom lang highlight
// must come after the doc title, to work around a bug in asciidoc 8.6.6
:ini: ini
:hal: hal
:ngc: ngc
and then refer to those names as `[source,{ini}]` etc. The
indirection only existed because asciidoc-py needed the docbook
conditional `{basebackend@docbook:'':ini}` to pick a different
value when emitting docbook; with asciidoctor the attribute is a
plain constant alias. Drop the attribute block (along with the
stale asciidoc 8.6.6 workaround comment) and rewrite the `{ini}`,
`{hal}`, `{ngc}`, `{nml}`, `{css}` references back to the literal
language name.
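The mechanical part of this rewrite is small enough to sketch. The Python below is a simplified illustration, not the script used for the commit: it drops the attribute definitions and rewrites `[source,{name}]` block attributes, and it assumes one definition per line. Removing the two comment lines and any other `{name}` uses is left out.

```python
import re

LANG_ATTRS = ("ini", "hal", "ngc", "nml", "css")
_NAMES = "|".join(LANG_ATTRS)

# ':ini: ini' style attribute definitions (the whole line is dropped).
ATTR_DEF = re.compile(r"^:(%s):.*$" % _NAMES)
# '[source,{ini}]' or '[source,{ini},more]' block attribute lines.
SOURCE_ATTR = re.compile(r"^\[source,\{(%s)\}(.*)\]$" % _NAMES)


def inline_language_names(text):
    """Drop the alias definitions and use literal language names."""
    out = []
    for line in text.split("\n"):
        if ATTR_DEF.match(line):
            continue
        out.append(SOURCE_ATTR.sub(r"[source,\1\2]", line))
    return "\n".join(out)
```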
hansu pointed out on LinuxCNC#4053 that ./configure --with-realtime=uspace --enable-build-documentation=pdf,html happily succeeds with only WARN-level diagnostics when asciidoctor / asciidoctor-pdf / rsvg-convert are absent, silently flipping BUILD_DOCS back to "no". Running 'make pdfdocs' afterwards produces no docs and no clear hint that the configure step had stripped the docs targets. Convert all of those AC_MSG_WARN+disable paths to AC_MSG_ERROR with an 'apt-get install ...' hint. Same treatment for ghostscript (the PDF post-process), librsvg2-bin (SVG -> PDF/PNG) and w3c-linkchecker for the HTML side. Also add an AC_MSG_ERROR for the NotoSerifCJK font when PDF docs are enabled. The Submakefile depends on the .ttc unconditionally (the CJK glyph fallback is wired into every PDF, not only the translated ones), so missing fonts-noto-cjk used to surface as a cryptic 'No rule to make target NotoSerifCJK-Regular.ttc' at build time.
Bertho noted on LinuxCNC#4053 that hardcoding the .ttc paths under /usr/share/fonts/opentype/noto/ pins the build to Debian / Ubuntu and will break Arch, Fedora, openSUSE, etc. Move the discovery into configure.ac. It first asks fontconfig (`fc-match --format='%{file}' 'Noto Serif CJK SC:style=...'`) and falls back to the package paths that the major distributions actually use (Debian, Arch noto-fonts-cjk, Fedora google-noto-cjk-fonts). The probe rejects anything that is not a .ttc, because otf2ttf.py needs the TrueType Collection to pick index 2 (SC) out of it. If nothing matches, configure errors with a per-distro install hint, and the user can override with ./configure NOTOCJK_REGULAR_TTC=/path/to/Regular.ttc \ NOTOCJK_BOLD_TTC=/path/to/Bold.ttc The resolved paths flow through Makefile.inc as NOTOCJK_REGULAR_TTC and NOTOCJK_BOLD_TTC; docs/src/Submakefile now references the substituted variables instead of the literal /usr/share path.
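The probe logic (ask fontconfig, reject anything that is not a .ttc, allow an explicit override) can be sketched in Python as below. The real check is shell inside configure.ac; the function names, the error text and the injectable `run` parameter are illustrative assumptions, and the style value passed by the caller is a placeholder.

```python
import subprocess


def fc_match_file(pattern, run=subprocess.run):
    """Return the font file fontconfig picks for `pattern`, or ''."""
    try:
        result = run(["fc-match", "--format=%{file}", pattern],
                     capture_output=True, text=True, check=False)
    except OSError:
        return ""  # fc-match is not installed
    return result.stdout.strip() if result.returncode == 0 else ""


def find_cjk_ttc(style, override=None, run=subprocess.run):
    """Locate the Noto Serif CJK collection for `style`.

    An explicit override (like the NOTOCJK_*_TTC configure variables)
    wins; otherwise fontconfig is asked. Anything that is not a .ttc is
    rejected, because the converter needs the collection to pick a face.
    """
    path = override or fc_match_file(f"Noto Serif CJK SC:style={style}", run)
    if not path.lower().endswith(".ttc"):
        raise SystemExit(
            f"no Noto Serif CJK .ttc found for style {style!r} (got {path!r})"
        )
    return path
```

fontconfig returns its best match even when the requested family is missing, which is why the extension check matters: a fallback such as a DejaVu .ttf must count as "not found".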
The old syntax highlighting is not supported in asciidoctor; that's why I had to switch. |
|
Building arch package: Why does it say "PDF requested" in an arch package? |
They help with contrast when using Dark Reader. We don't have a dark theme, so the least we can do is play nice with dark-theme extensions. @BsAtHome I'm not sure why CI is taking hours to complete, any ideas? |
|
We could use a more modern look, instead of going retro, I just kept the nostalgia for now. |
HTML styling: restore body margin (5% with 768px breakpoint), sans-serif headings, blueish tint on h3+ headings, toctitle and dt color #527bbd, drop full-line click target on TOC items per user feedback. -a compat-mode added to all asciidoctor invocations so 12k legacy 'word' single-quoted emphasis renders italic instead of literal. CI hardening: timeout 900s wrapper on manpage PDF book invocation so future asciidoctor-pdf hangs fail fast with a clear message rather than holding a runner. concurrency cancel-in-progress on pull_request events so successive pushes to the same PR don't stack runs. Submakefile .so dep generator now handles both 'manN/file.N' and bare 'file.N' .so directives (was doubling section prefix). Source fixes for asciidoctor strict parsing: lut5.comp header cell '2+^h|Weight' spanned 2 of 6 cols (header summed to 7); hm2_rpspi.9.adoc final row missing third cell; hm2_7i43.9.adoc synopsis embedded quote in <strong> confused prawn HTML parser, wrap in pass:[].
|
@BsAtHome can you cancel the other build please? stuck on something, maybe last commit fixes it... |
trixie/bookworm package-indep hung 37 min on Build step despite the manpage PDF timeout added in dbc79d1. Hang is in main PDF rule (Master_*.pdf via asciidoctor-pdf), not the manpage book. Same 15 min cap and exit-code-124 message as the manpage rule, so any future asciidoctor-pdf hang fails fast with clear cause.
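The fail-fast behaviour of such a cap is easy to reproduce outside make. The sketch below is a Python analogue of wrapping a command in coreutils `timeout`, which reports exit status 124 when the limit is hit. It only illustrates the idea; the Submakefile uses the shell wrapper, and the helper name and message text are made up.

```python
import subprocess
import sys


def run_with_cap(cmd, seconds):
    """Run `cmd`, killing it after `seconds`.

    Returns the command's exit status, or 124 if the cap was reached
    (the same status the coreutils `timeout` tool reports).
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        print(f"timed out after {seconds}s: {' '.join(cmd)}", file=sys.stderr)
        return 124
```

A caller can then distinguish "the renderer failed" from "the renderer hung" by checking for 124 and printing a dedicated hint.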
The real question is why it gets stuck. Is there a wait for input? Does it require a |
Indeed, it looks more modern and less hard on the eyes. But I can't see any specifics in one screenshot. There are many special setups and visuals that need to be examined. But I think we need to have the new build to work reliably and committed before we successfully can get into the color of the bike-shed fight ;-) |
|
Aborted CI runs after being stuck and running for 2 hours 47 |
PDF builds finish at 15:37, then dh_auto_test enters at 15:37:31 and produces zero output for 2h25m before manual cancel. sid passes the same step in seconds. Indep package is doc-only with no real test targets, so dh_auto_test would no-op anyway, but trixie/bookworm make+asciidoctor seem to enter an autorestart loop on the 'Makefile: $(MAN_DEPS)' rule. Add nocheck to DEB_BUILD_OPTIONS to bypass entirely.
Captured 36-min trixie hang log shows the loop: gen_complist regenerates components_gen.adoc -> .pot rebuilds -> po4a updates translated .adoc -> their .d files stale -> Makefile autorestart -> back to start. gen_complist re-fires each cycle because manpages mtime always advances. Order-only | manpages keeps the 'manpages must exist first' guarantee but drops the mtime-driven retrigger that was driving the loop on trixie/bookworm. sid converged fast enough that it escaped before timeout, trixie/bookworm did not.
|
Is there a specific reason to remove the double empty lines before a new header? The second empty line is simply ignored, isn't it? |
|
The .github/* changes should probably be added in a separate PR. At least the concurrency addition may be added to benefit us all before this PR is done. Do the script changes do more than just enable proper parallel build? |
Five items from BsAtHome review on PR LinuxCNC#4053:

1. docs/src/source-highlight: removed. source-highlight tool retired by this PR; demo .adoc / .lang / .conf files have no remaining consumers. po4a.cfg loses the three [type: AsciiDoc_def] entries that pointed at them. Submakefile loses LOC_HL_DIR / LOC_LANG_MAP (defined but unused after the build rules moved to rouge).
2. configure.ac NotoSerifCJK probe: drop the /usr/share/fonts/.../ fallback list, trust fc-match alone. fontconfig already covers user-installed ~/.fonts when fc-cache has been run, and silently falling back to a Debian-only path on other distros was the portability concern Bertho flagged. Error message names fc-match explicitly and mentions ~/.fonts + fc-cache so the user knows where to put a custom .ttc.
3. pdf-theme.yml DejaVu Sans Mono: drop absolute /usr/share/fonts/truetype/ paths. Fonts are now resolved at build time via fc-match into DOC_FONT_DIR (the same dir where the CJK fallback ttf already lives), exposed to asciidoctor-pdf through pdf-fontsdir. pdf-theme.yml stays path-free, so the theme file is portable across distros without further patching.
4. image_resolver.rb locale handling: the regex that detected the doc language from /src/<lang>/ now consumes a doc-languages attribute passed by Submakefile ($(LANGUAGES) from po4a.cfg), not a hardcoded alternation. Codes are matched literally as they appear on disk (zh_CN, sv, etc.). Adding a new locale only requires the po4a.cfg po4a_langs line; the resolver picks it up automatically.
5. blank lines before headers: restore the second blank line that commit 0375f0a collapsed when removing the legacy `:ini: ini` attribute blocks. The line was load-bearing for visual section separation in the source even though asciidoctor ignores it during parse. Affects 63 files, +212 / -60 lines, no rendered-output change.

Drive-by: hardware-interface.adoc had three level-0 (=) headings. Asciidoctor only allows one per document; two repunctuated to == so the full pdfdocs+htmldocs build stays clean.
Not intentional. Fixed it back
Split out to #4056. The script changes are only |
|
@BsAtHome about
Is that in scope for this PR? Was it discussed in the meeting? |
No, it wasn't discussed. It has just been on my wish list for some time. However, there is no need or requirement to have it done now. Let's just have the build process fixed, and then in a later PR we can fix file organisation. |
# own subtree so the exclude is empty.
$(eval $(call ASCIIDOCTOR_HTML_RULE,en,$(DOC_SRCDIR),^($(LANGUAGES_MATCH))/))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ar,$(DOC_SRCDIR)/ar))
$(eval $(call ASCIIDOCTOR_HTML_RULE,de,$(DOC_SRCDIR)/de))
$(eval $(call ASCIIDOCTOR_HTML_RULE,es,$(DOC_SRCDIR)/es))
$(eval $(call ASCIIDOCTOR_HTML_RULE,fr,$(DOC_SRCDIR)/fr))
$(eval $(call ASCIIDOCTOR_HTML_RULE,nb,$(DOC_SRCDIR)/nb))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ru,$(DOC_SRCDIR)/ru))
$(eval $(call ASCIIDOCTOR_HTML_RULE,sv,$(DOC_SRCDIR)/sv))
$(eval $(call ASCIIDOCTOR_HTML_RULE,ta,$(DOC_SRCDIR)/ta))
$(eval $(call ASCIIDOCTOR_HTML_RULE,tr,$(DOC_SRCDIR)/tr))
$(eval $(call ASCIIDOCTOR_HTML_RULE,uk,$(DOC_SRCDIR)/uk))
$(eval $(call ASCIIDOCTOR_HTML_RULE,zh_CN,$(DOC_SRCDIR)/zh_CN))
But here the language list is explicit again.
Isn't the (extracted) po4a.cfg language list the one that is authoritative?
|
I think build process is fixed, looking at the styling now, will take a look at the explicit language references as well... |
|
I took the freedom and added some commits on your branch, hope that is okay ;-) |
|
No worries, feel free to keep them coming |
build-package-indep.sh: nocheck was a workaround for the trixie/bookworm autorestart hang in dh_auto_test; root cause fixed in 385c57e so the test step is back to its normal no-op. .gitignore: top-level autom4te.cache/ gets created by debian/configure on every run; src/autom4te.cache was already ignored, mirror it here.
Replace 11 hardcoded ASCIIDOCTOR_HTML_RULE eval calls with a foreach over $(LANGUAGES). po4a.cfg becomes the single source of truth for which languages have HTML output; ar/sv/ta/tr (no docs/src/<lang>/ tree) are dropped, others added free. Addresses bertho review on PR LinuxCNC#4053.
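To make the "single source of truth" idea concrete, here is a small Python sketch that reads the language list from a po4a.cfg-style text and prints the per-language rule instantiations that used to be written out by hand. In the PR this is done inside make with a foreach over $(LANGUAGES); the parser below is only an illustration and assumes the languages sit on one `[po4a_langs]` line.

```python
import re

PO4A_LANGS = re.compile(r"^\s*\[po4a_langs\]\s+(.*)$")


def po4a_languages(cfg_text):
    """Return the language codes listed on the [po4a_langs] line."""
    for line in cfg_text.splitlines():
        match = PO4A_LANGS.match(line)
        if match:
            return match.group(1).split()
    return []


def html_rule_calls(cfg_text, src_dir="$(DOC_SRCDIR)"):
    """Render one ASCIIDOCTOR_HTML_RULE instantiation per language."""
    return [
        f"$(eval $(call ASCIIDOCTOR_HTML_RULE,{lang},{src_dir}/{lang}))"
        for lang in po4a_languages(cfg_text)
    ]


if __name__ == "__main__":
    sample = "[po4a_langs] de es fr zh_CN\n"  # made-up example list
    print("\n".join(html_rule_calls(sample)))
```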
state :root do
  # add # comments
  rule %r/#.*$/, Comment::Single

  # keep existing ; comments
  rule %r/;.*$/, Comment::Single
Wouldn't this highlight:
var=val #comment
However, this is not how the ini parser works. the value is val #comment. Same for semi-colon.
(matches in parenthesis)
Sections are: ^\s*(\[)([a-zA-Z_][a-zA-Z0-9_]*)(\])\s*([#;].*)?$
Note the optional comment match after the section header.
Valid variable name and content: ^\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*)$
However, the variable syntax here does not account for '\' continuations, which makes it quite a bit harder. I don't know how that is handled in rouge. Another hard thing, if you want it highlighted, are the allowed escape sequences in the value.
The only line comments are: ^\s*([#;].*)$
Then there is the include syntax: ^(#INCLUDE)\s+(.*)\s*$
Small correction. Variable content is right trimmed. Therefore: ^\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*)\s*$
Wouldn't this highlight:
var=val #comment
Right, very good.
The example does show the '[' and ']' are part of the section name. That is technically not correct and should be treated like the '='.
Yeah, that's the default INI highlighter ... but I guess this can also be overridden.
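The line rules listed in this thread can be tried out with a short Python classifier. This is a sketch for experimenting with the regexes, not a rouge lexer. It checks `#INCLUDE` before plain comments, uses a non-greedy value group so the right trim actually takes effect (a small deviation from the patterns as written), and ignores '\' continuations and escape sequences.

```python
import re

INCLUDE = re.compile(r"^(#INCLUDE)\s+(.*?)\s*$")
COMMENT = re.compile(r"^\s*([#;].*)$")
SECTION = re.compile(r"^\s*(\[)([a-zA-Z_][a-zA-Z0-9_]*)(\])\s*([#;].*)?$")
# Non-greedy (.*?) so trailing whitespace is left out of the value.
VARIABLE = re.compile(r"^\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=)\s*(.*?)\s*$")


def classify_ini_line(line):
    """Classify one INI line; returns a tuple starting with the kind."""
    match = INCLUDE.match(line)
    if match:
        return ("include", match.group(2))
    if COMMENT.match(line):
        return ("comment", line.strip())
    match = SECTION.match(line)
    if match:
        return ("section", match.group(2))
    match = VARIABLE.match(line)
    if match:
        # Everything after '=' is the value, including any '#' or ';'.
        return ("variable", match.group(1), match.group(3))
    return ("other", line)
```

With these rules `var=val #comment` is a variable whose value is `val #comment`, which is the behaviour the review asks the highlighter to reflect.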

Summary
Replaces the documentation toolchain end-to-end: asciidoc-py + dblatex + xsltproc + source-highlight + inkscape are removed and the build now goes through
asciidoctor+asciidoctor-pdf+rouge, with a small ghostscript post-process pass on the PDFs.Motivation, as I raised on #4051: asciidoc-py is EOL, dblatex is unmaintained, and we keep paying for that with patches like the inkscape rsvg shim (#4043). asciidoctor is actively developed in Debian (
ruby-asciidoctor,ruby-asciidoctor-pdf), uses prawn-svg natively so the inkscape detour disappears, and removes the entire LaTeX subsystem from the docs build dependency tree.Continues the work hansu started on his asciidoctor branch and solves the cross-document anchor problem that stalled it.
What changed
Seven commits, each independently reviewable / bisectable:
1. docs: add asciidoctor extensions and PDF theme: new plumbing only; no build behaviour change yet.
   - `docs/src/extensions/xref_resolver.rb`: preprocessor that mirrors asciidoc-py's `objects/xref_<lang>.links`: bare `<<anchor,Title>>` rewrites to qualified `<<relpath/file.adoc#anchor,Title>>`. Anchor index cached on disk by mtime; `xref-exclude` regex keeps translated trees isolated.
   - `docs/src/extensions/image_resolver.rb`: treeprocessor matching asciidoc-py's `image-wildcard`: relative image paths in included files resolve against that file's directory. For the PDF backend it also defaults `pdfwidth=75%` on images without an explicit width, otherwise prawn renders raster sources at 72 DPI and blows screenshots up to the full text column.
   - `docs/src/pdf-theme.yml`: asciidoctor-pdf theme that approximates `emc2.sty`: A4, Times-like body, dblatex blue (#0000FF) headings and links, top header `doc-title | chapter | page / total`, bottom rule only.
   - `docs/src/otf2ttf.py`: small build-time helper: subsets Noto Serif CJK to the ~600 CJK characters used anywhere in the docs and converts the curves with `cu2qu` to TrueType (prawn 2.4 corrupts CFF embeds). ~1.5 s per face, ~300 KB output.
2. docs: swap HTML, PDF, manpage build rules from asciidoc-py to asciidoctor
   - `ASCIIDOCTOR_HTML_RULE` canned recipe instantiated per language. Stylesheet (`docs/html/linuxcnc.css`) and visual output stay identical.
   - `-a lversion=$(cat ../VERSION)`. CJK fallback TTFs generated lazily and `pdf-fontsdir` set.
   - `gs -dFlateEncode` pass with images passed through. Master PDF: 39 MB => 25 MB, matching the 26 MB official 2.9 dblatex build at identical page count and image data.
   - `a2x --doctype manpage` => `asciidoctor --backend manpage`. Asciidoctor emits `.als / .URL / .MTO` macros that po4a's man parser doesn't know; `docs/po4a.cfg` man_def alias gains `untranslated=FF,FU,als unknown_macros=untranslated inline=URL,MTO`. The `.so` alias dependency path was also missing the section directory; fixed.
   - `asciidoctor --doctype manpage --backend html5`.
   - `grep -oE`.
3. docs: drop the asciidoc-py / dblatex / xsltproc infrastructure: pure deletions, 15 files:
   - `xhtml11*.conf`, `docbook*.conf`, `attribute-colon.conf`, `asciidoc-dont-replace-arrows.conf`
   - `emc2.sty`
   - `html-images.xslt`, `html-latex-images`, `image-wildcard`, `links.xslt`, `links_db_gen.py`
   - `scripts/inkscape` shim: asciidoctor-pdf has no inkscape calls to intercept.
4. debian: switch documentation build-deps to the asciidoctor toolchain
   - `DOC_DEPENDS` shrinks from twenty packages (dblatex stack, ten `texlive-lang-*`, source-highlight, inkscape, python3-lxml, xsltproc, dvipng, groff) to ten: asciidoctor, ruby-asciidoctor-pdf, ruby-rouge, fonts-dejavu, fonts-noto-cjk, python3-fonttools, ghostscript, graphviz, librsvg2-bin, w3c-linkchecker.
   - `control.top.in` drops `docbook-xsl`, `asciidoc`, `asciidoc-dblatex`. ghostscript moves out (now in DOC_DEPENDS).
5. ci: parallelize the Debian package build via DEB_BUILD_OPTIONS: drive-by fix unrelated to the migration: `debuild` runs `dh_auto_build` single-threaded unless `DEB_BUILD_OPTIONS=parallel=N` is set. `build-package-{arch,indep}.sh` now export `parallel=$(nproc)`. Local measurement: doc-only deb build 32 min => 7 min on 8 cores; CI ubuntu-24.04 runners with 4 CPUs should see ~4×.
6. docs: fix source issues the asciidoctor parser flags: asciidoctor reports these as ERROR or WARNING; asciidoc-py silently tolerated them. All predate the toolchain swap:
   - `hal/halmodule.adoc`: cols spec said 5 cells, rows had 7. Bump to `cols="<3s,6*<"`.
   - `plasma/qtplasmac.adoc`: two rows missing a cell (color5 styling row, DEBUG state-table row).
   - `gui/qtdragon.adoc`: Versaprobe NOTE block was missing its `====` delimiters.
   - `gui/qtvcp-widgets.adoc`: Markdown-style ``` fenced block. po4a collapsed the original three lines into one during string extraction, so every translated build saw an open ``` with no matching close ("unterminated listing block"). Replace with `[source,python]` + `----`, which po4a preserves line-by-line.
   - `lathe/images/control-point_es.svg`: Inkscape `flowRoot`/`flowPara` ("tan" label added by the Spanish translator). Inkscape-only SVG 1.2 element prawn-svg cannot render. Convert to a regular `<text>`/`<tspan>` at the same coordinates.
   - `docs/po4a.cfg`: add `hal/halscope.adoc` to the translation pipeline; it is included by translated `hal/tutorial.adoc` but had no `po4a_alias` line, so every translated build failed to resolve the include.
7. docs: resolve inline image macros and fall back to EN tree: the original `image_resolver.rb` only handled block `image::` macros and didn't fall back across the language boundary, so translated builds emitted ~40 "image to embed not found" warnings. Walk each block's source storage (paragraphs' `lines=`, list items' `text=`, asciidoc-style table cells' inner documents) to rewrite inline `image:` targets, and probe the canonical EN path when the translated copy is missing.

Visual / output verification
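Before the build numbers, a quick way to reason about the xref rewriting from the first commit: the sketch below is a minimal Python model of the behaviour described for `xref_resolver.rb`, not the Ruby extension itself. The function names, the `exclude` parameter, the made-up file names, and the choice to express the target relative to the referencing file are all assumptions made for illustration; the on-disk mtime cache is left out.

```python
import posixpath
import re

# [[anchor]] or [[anchor,Label]] block anchors.
ANCHOR_RE = re.compile(r"\[\[([A-Za-z_:][\w:.-]*)(?:,[^\]]*)?\]\]")
# Bare <<anchor>> or <<anchor,Title>>; targets that already contain '#' are
# treated as qualified and left alone.
XREF_RE = re.compile(r"<<([^,>#]+)(,[^>]*)?>>")


def build_anchor_index(sources, exclude=None):
    """Map each anchor id to the file that defines it.

    sources: dict of relative path -> file text (hypothetical input shape).
    exclude: optional regex; matching paths are skipped, which is how this
             sketch models keeping translated trees isolated.
    """
    skip = re.compile(exclude) if exclude else None
    index = {}
    for path in sorted(sources):
        if skip and skip.search(path):
            continue
        for match in ANCHOR_RE.finditer(sources[path]):
            index.setdefault(match.group(1), path)  # first definition wins
    return index


def qualify_xrefs(text, current_path, index):
    """Rewrite bare xrefs whose anchor lives in another file."""
    here = posixpath.dirname(current_path) or "."

    def rewrite(match):
        anchor = match.group(1).strip()
        target = index.get(anchor)
        if target is None or target == current_path:
            return match.group(0)  # unknown or same-file anchor: unchanged
        relpath = posixpath.relpath(target, start=here)
        return f"<<{relpath}#{anchor}{match.group(2) or ''}>>"

    return XREF_RE.sub(rewrite, text)


# Made-up three-file tree: an English source, its German translation, and a
# page that refers to the anchor without naming the file.
docs = {
    "hal/basic.adoc": "[[sec:hal-basics]]\n== HAL Basics\n",
    "de/hal/basic.adoc": "[[sec:hal-basics]]\n== HAL Grundlagen\n",
    "gui/axis.adoc": "See <<sec:hal-basics,HAL Basics>>.\n",
}
index = build_anchor_index(docs, exclude=r"^de/")
print(qualify_xrefs(docs["gui/axis.adoc"], "gui/axis.adoc", index), end="")
# prints: See <<../hal/basic.adoc#sec:hal-basics,HAL Basics>>.
```

Excluding `de/` while indexing is what stops the English page from resolving to the German copy of the same anchor; a per-language build would presumably invert the pattern.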
End-to-end builds (`make -j8 pdfdocs && make htmldocs && make manpages`):

- `make -j8 pdfdocs` wall time
- `make -j8 pdfdocs` user CPU
- `binary-indep` build (parallel)

Same hardware (8 physical cores), both `make -j8 pdfdocs`, warm rebuild (translated `.adoc` already generated, `.pot` up to date). Asciidoctor is ~25% faster wall-clock and ~30% less user CPU, even though it adds a ghostscript pass per PDF; prawn-svg + rouge in-process beats spawning a2x/dblatex/xelatex/inkscape per file.

The project's 4-CPU CI runners see a similar gap: docs jobs run roughly 25-30% faster than recent master (htmldocs 13m vs 19m, package-indep 16-19m vs 22-23m). Most of that comes from the `DEB_BUILD_OPTIONS=parallel=$(nproc)` fix in commit 5 rather than the toolchain itself, but the leaner dep tree (10 vs 20 packages, no LaTeX) helps.

Spot-checked sample pages render correctly across en, de, ru, uk, zh_CN: blue headings/links match dblatex, code blocks with grey background and DejaVu Sans Mono show Cyrillic correctly, Chinese title pages render via the Noto Serif CJK SC fallback.
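For readers who want to compare the quoted figures on a common scale, here is a small arithmetic sketch. It assumes "faster" means a reduction in wall time; the helper names are hypothetical, and only numbers stated in this description are used.

```python
def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


# Figures quoted in this description.
print(f"htmldocs CI job, 19 min -> 13 min: {reduction_pct(19, 13):.1f}% less wall time")
print(f"doc-only deb build, 32 min -> 7 min: {speedup(32, 7):.1f}x")
print(f"master PDF, 39 MB -> 25 MB: {reduction_pct(39, 25):.1f}% smaller")
# prints 31.6% less wall time, 4.6x, and 35.9% smaller respectively
```

The package-indep ranges (16-19m vs 22-23m) are left out because the description does not say which runs pair up.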
Trade-offs and open items
- `linuxcnc-doc-{en,de}` Debian packages defined: other-language PDFs build but `debian/control` doesn't ship them. Pre-existing scope, unchanged.

Test plan

- `make -j8 pdfdocs` builds all 33 PDFs cleanly
- `make htmldocs` builds 1551 HTML files cleanly
- `make manpages` produces 1287 manpages cleanly
- `fakeroot debian/rules binary-indep` produces `linuxcnc-doc-en` and `linuxcnc-doc-de` debs
- `verify-clean-repo.sh` would pass (`.fonts/` and `rouge-*.css` are gitignored)

cc @hansu (continuation of your branch), @andypugh (the maintenance discussion on #4051).