
DocBookVsReStructuredText

Vic edited this page Feb 14, 2023 · 3 revisions

I started using DocBook for the Fityk manual in 2002, after reading ESR's DocBook Demystification HOWTO. AFAIR, at that time the main alternatives were LaTeX, GNU Texinfo and HTML. I maintained the manual (20-30 pages) in DocBook format for 7 years.

DocBook was a pain to write, read and process (especially math and figures). In 2009 I considered a few lightweight alternatives:

  • In the zoo of wiki and forum markups, Markdown is an elegant and popular one, but rather limited and not extensible.
  • There is an effort to create a universal, unified markup language "to be used across different wikis", called Creole. I really hope it succeeds, but like Markdown, it's not really suitable for my manual.
  • Someone working for OLPC wrote a draft of CrossMark, an extensible markup language based on Markdown and intended to be used by children, but the idea was abandoned.
  • Doxygen. I use it to generate docs from code, but it's not very useful for a stand-alone manual.
  • reST. Although I prefer the syntax of Markdown, reST is extensible and, thanks to Sphinx, makes it easy to generate good-looking HTML and PDF. Disadvantages:
    • no nested inline markup,
    • ugly syntax of links.

In the end I converted the manual to reStructuredText, and a few months later I can see that this was a good move.

Why I do not like DocBook

heavy weight

DocBook is a heavy-weight markup language.

Two representative examples from the DocBook documentation:

<para>
You can exit from GNU Emacs with
<menuchoice>
  <shortcut>
    <keycombo><keysym>C-x</keysym><keysym>C-c</keysym></keycombo>
  </shortcut>
  <guimenu>Files</guimenu>
  <guimenuitem>Exit Emacs</guimenuitem>
</menuchoice>.
</para>
<para>
The <acronym>IRQ</acronym> of the <hardware>SCSI Controller</hardware>
can be set to 7, 11, or 15.  The factory default setting is 7.
</para>

This makes the documentation source hard to read and write.

You waste a lot of time on tagging, and the only difference it makes is that the source is uglier and harder to maintain. There are a lot of tags that don't change the rendering. The tags don't help with searching in any real-world scenario, and probably never will.
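To put a rough number on that overhead, here is a small Python sketch of my own (not part of any DocBook tool chain). It parses the second example above with the standard-library xml.etree.ElementTree module and measures what fraction of the source characters are tags rather than text. The function name markup_overhead is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The second DocBook example from above, verbatim.
SOURCE = """<para>
The <acronym>IRQ</acronym> of the <hardware>SCSI Controller</hardware>
can be set to 7, 11, or 15.  The factory default setting is 7.
</para>"""


def markup_overhead(source: str) -> float:
    """Return the fraction of characters in source that are markup, not text."""
    root = ET.fromstring(source)
    text = "".join(root.itertext())  # all character data, with the tags stripped
    return 1 - len(text) / len(source)


if __name__ == "__main__":
    print(f"{markup_overhead(SOURCE):.0%} of the source is tags")
```

For the IRQ example a little over a third of the source characters are tags, and that is before any indentation of nested elements is added.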

In my experience, the most painful part of writing DocBook is embedding mathematical formulas and figures, and trying to make them look decent in both HTML and PDF.

During the seven years of maintaining the Fityk manual, only one user bothered to edit the XML. He corrected my English and rewrote parts of the text. I feel bad that he had to read between the ugly tags and that the markup wasted his time too.

417 elements

DocBook has hundreds of tags (elements), most of them very specific, yet often none of them is appropriate.

One example: Fityk has a mini-language. The commands of this mini-language are neither a <command> (which is meant for an executable program) nor a <function> ("function or subroutine, as in a programming language").

Let's quote an email from gnome-doc-list (http://www.mail-archive.com/gnome-doc-list@gnome.org/msg01529.html):

You know how, in DocBook, you have to hunt through 50 or so inline elements, and then there are maybe five that sort of closely match what you're trying to mark up? Or you have to use systemitem because there's not a specific element for your needs, but then that feels dirty because there are very specific elements for other things. Let's stop the insanity.

(The GNOME folks are designing their own XML format for documentation.)

The tag craziness is what drives DocBook. Browsing the list of all elements takes a lot of time, and although every new DocBook version comes with even more elements, it will never have everything you need.

It is possible to extend DocBook yourself, but it takes time, and the extended DocBook is no longer really DocBook. The tools you use may not understand it. This is not the way to go.

semantic markup hype

ESR's HOWTO calls DocBook structural markup, as opposed to presentational markup. That's like the difference between the HTML tags <em> and <i>.

In DocBook the tree of elements is explicit:

   <section>
    <title>My Title</title>
    <para>
     first paragraph
    </para>
    <para>
     second paragraph
    </para>
   </section>

It is equivalent to this in LaTeX:

   \section{My Title}

   first paragraph

   second paragraph

In this case the mapping is 1:1. There are also many examples of N:1 mapping (multiple semantic tags corresponding to one presentation): for instance, <emphasis>, <firstterm>, <replaceable> and <biblioentry/title> are all rendered in italic. (Users who want italic text for any other reason will probably abuse <emphasis>, because it is the most generic tag.)
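The 1:1 and N:1 mappings can be made concrete with a toy converter. The Python sketch below is hypothetical (the real DocBook XSL stylesheets work very differently, and only the few elements named above are supported): each element is looked up in a small table of LaTeX templates, so <section>, <title> and <para> each get their own counterpart, while <emphasis>, <firstterm> and <replaceable> all collapse into \emph.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DocBook -> LaTeX templates; "%s" is replaced by the content.
# Several semantic inline elements share one presentation (N:1).
INLINE = {
    "emphasis": r"\emph{%s}",
    "firstterm": r"\emph{%s}",
    "replaceable": r"\emph{%s}",
}
# Block elements map one-to-one onto LaTeX constructs (1:1).
BLOCK = {
    "title": r"\section{%s}",
    "para": "%s",  # a LaTeX paragraph is just text separated by blank lines
}


def inline(elem: ET.Element) -> str:
    """Return elem's content with inline children converted, whitespace collapsed."""
    parts = [elem.text or ""]
    for child in elem:
        # Unknown inline tags are dropped, keeping only their content.
        parts.append(INLINE.get(child.tag, "%s") % inline(child))
        parts.append(child.tail or "")
    return re.sub(r"\s+", " ", "".join(parts))


def convert(elem: ET.Element) -> str:
    """Convert a tiny DocBook subset to LaTeX (LaTeX special characters are not escaped)."""
    if elem.tag == "section":
        # Convert children in order (title first); the indentation between them
        # is ignored and blank lines separate the resulting blocks.
        return "\n\n".join(convert(child) for child in elem)
    if elem.tag in BLOCK:
        return BLOCK[elem.tag] % inline(elem).strip()
    raise ValueError(f"unsupported element: {elem.tag}")


SECTION = """<section>
 <title>My Title</title>
 <para>
  first paragraph
 </para>
 <para>
  second paragraph
 </para>
</section>"""

PARA = ("<para>A <emphasis>b</emphasis>, <firstterm>c</firstterm> "
        "and <replaceable>d</replaceable>.</para>")

if __name__ == "__main__":
    print(convert(ET.fromstring(SECTION)))
    print(convert(ET.fromstring(PARA)))
```

Applied to the section example, it produces the LaTeX shown earlier. The second input shows three different semantic tags yielding identical output, so the distinction between them never reaches the reader of the rendered page.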

But it is not true that semantics and presentation are separated. Elements such as lists, tables and sidebars specify the look rather than the meaning. Look at this example:

<para>
Here is the same <sgmltag>SimpleList</sgmltag> rendered horizontally with
three columns:
<simplelist type='horiz' columns='3'>
<member>A</member>
<member>B</member>
<member>C</member>
<member>D</member>
<member>E</member>
<member>F</member>
<member>G</member>
</simplelist>
</para>

It is clearly presentation markup. Like all popular markup languages, DocBook contains both structural (semantic) and presentation information. Obviously, the former dominates, but adding more semantics is rarely worth the hassle.
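The attributes type='horiz' and columns='3' say nothing about what A to G mean; they only decide where each member is placed. The following Python sketch spells out that placement rule, assuming the DocBook definitions as I understand them (horiz fills rows left to right, vert fills columns top to bottom). The function name layout and the defaults used for missing attributes are my own assumptions, and inline lists are not handled.

```python
import math
import xml.etree.ElementTree as ET

SIMPLELIST = """<simplelist type='horiz' columns='3'>
<member>A</member>
<member>B</member>
<member>C</member>
<member>D</member>
<member>E</member>
<member>F</member>
<member>G</member>
</simplelist>"""


def layout(simplelist: ET.Element) -> list[list[str]]:
    """Arrange simplelist members into table rows according to type and columns."""
    members = [m.text or "" for m in simplelist.findall("member")]
    cols = int(simplelist.get("columns", "1"))  # assumed default: one column
    kind = simplelist.get("type", "vert")       # assumed default: vert
    if kind == "horiz":
        # Row-major: fill the first row left to right, then the next row.
        return [members[i:i + cols] for i in range(0, len(members), cols)]
    if kind == "vert":
        # Column-major: fill the first column top to bottom, then the next.
        nrows = math.ceil(len(members) / cols)
        return [members[r::nrows] for r in range(nrows)]
    raise ValueError(f"type={kind!r} is not handled in this sketch")


if __name__ == "__main__":
    root = ET.fromstring(SIMPLELIST)
    for row in layout(root):
        print(" ".join(row))
```

With type='horiz' the rows are A B C, D E F, G; switching the attribute to vert gives A D G, B E, C F. The members and their meaning stay the same; only the page layout changes.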

Developers of Boost libraries use BoostBook (an extended DocBook) as their documentation format. BoostBook docs are no easier to maintain than DocBook docs, so some of the developers use a lightweight markup language named QuickBook. The presentation-oriented QuickBook is converted to "semantic" BoostBook, which in turn is converted to (presentation) HTML and PDF.

ESR also wrote in the HOWTO:

DocBook might help get us to a world in which all the documentation on your open-source operating system is one rich, searchable, cross-indexed and hyperlinked database.

Even if you believe this, for now it might be better to focus on the documentation itself, and if your resources are limited, it is better to keep your markup simple.