
## Introduction

Initially this was meant as an email, but then it grew out of control…

## Greetings

Hello everybody!

I am writing to confirm that the documentation effort is neither dormant nor over. I am investigating all the documentation formats used to document programs out there. There are plenty, each with its pros and cons. The effort is tougher than I imagined, but I do not despair and neither should you! Please forgive me for having been painfully slow in searching for and testing these tools: almost all of them were new to me. During this search I have learned a lot and discovered a very interesting world, and I have the KiCad team to thank for this.

## Status

This is the current status of my efforts:

1) I have successfully converted the Cvpcb manual from odt to asciidoc, rest and markdown (markdown with some limitations). The converted versions have a nice automatically generated TOC. In the meantime I discovered that old odt files, presumably converted from StarOffice files at some point, can contain images in "strange" formats like "StarView MetaFile". Presumably other doc files contain the same kind of images. No problem: they are easily converted into png, though through a manual procedure.

2) I have successfully created the nationalized .po files from the source docs.

3) I have successfully created outputs by merging the translations with the reference documents.

So far, then, we only have to decide which lightweight markup format best fits our documentation needs. This is a great start to compare them all:

in particular:

Let’s start the comparison (easier to read first):

| markup | easy to read | footnotes | tables | indexes | translation | books generation |
|---|---|---|---|---|---|---|
| asciidoc | best | yes | yes | yes | using po4a | using docbook (direct) |
| markdown | good | no/yes | no/yes | no (yes) | using po4a | using docbook |
| rest | good | yes | yes | yes | native | using docbook (direct) |

## My choice

Bear in mind that probably these are very personal considerations. I am aware of it.

### The format I prefer: asciidoc

"AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions." (from Wikipedia)

Pros:

• one of the cleanest formats

• sufficiently powerful: it supports tables, table formatting, indexes and references

• plenty of tools available: pandoc (Haskell), asciidoc (Python), asciidoctor (Ruby)

• supported on websites like GitHub (just write a README.adoc); support elsewhere is planned (e.g. gitbook)

• actively developed tools

• in the future there will probably be a direct pdf conversion tool, without the burden of going through docbook

• there is even a nice asciidoc GUI editor, AsciidocFX: http://www.asciidocfx.com/

Cons:

• direct conversion tools to or from asciidoc are missing for some formats. For example, I converted ODT into rest and then rest into asciidoc. Since rest is almost equally powerful and has an unambiguous syntax (rest is a real standard), the rest to asciidoc conversion (through the wonderful pandoc) is flawless. This is an (incomplete) list of such indirect routes:

• odt → asciidoc:

• odt (odt2rest)→ rest

• rest (pandoc)→ asciidoc

• asciidoc → rest:

• asciidoc → html

• html(pandoc)→rest

### Second choice: rest

Pros:

• almost complete: it supports table/index/reference/etc.

• plenty of tools available: pandoc (Haskell), sphinx (Python), docutils (Python)

• actively developed tools (all)

• widely used: it is the standard for Python documentation, CMake, etc.

• you can (through pandoc) convert almost anything to rest (apart from asciidoc) and rest into anything (with pandoc, sphinx or docutils)

• many (Python) web platforms use rest as an internal format for texts. See Zope/Plone, Trac, etc.

• nice GUI editor, ReText, with realtime preview

Cons:

• some formatting options are missing:

• table formatting: table dimensions, column dimensions, cell justification, etc.

• you cannot have text that is both bold and italic

• you cannot decide where to put the toc and other things

• the syntax is (arguably) not immediate, though still easy. See links, comments and the like.

• support for i18n is native but more complex to handle than po4a

• i18n support in sphinx is still a work in progress and currently has some quirks that will hopefully be resolved in the next sphinx version (wrong position of the intermediate (.mo) files, fuzzy strings not handled, no i18n options in the Makefile, and so on)

### What I suggest to avoid (and why)

#### markdown

pros:

• widely used today, probably due to its simplicity

• easier to integrate with documentation tools that already produce markdown output, as doxygen does

• the GitHub-flavoured format is simple and powerful enough to easily handle all documentation needs, such as tables (with cell justification), implicit links, and so on.

cons:

• despite its popularity I do not think it would be a good choice, for these reasons:

• it is not a standard: just as an example, pandoc supports some 5 incompatible markdown "flavours":

• markdown: pandoc's own markdown superset format

• markdown_github: github superset format

• markdown_mmd: multimarkdown superset format

• markdown_phpextra: php superset format

• markdown_strict: the lowest common denominator format

• the common markdown subset is too poor to be used for anything apart from html pages. That is its aim, and its usefulness ends there. See http://en.wikipedia.org/wiki/Markdown#Standardization. The original markdown converter is considered abandonware. Its development cycle lasted one year, nine years ago.

• its syntax is easier than rest but arguably not easier than asciidoc: see links, images and tables for some examples

#### txt2tags

A wonderful, small and powerful piece of software.

pros:

• one single portable python executable;

• exportable to many formats, and thanks to pandoc even more;

• completeness: the format should have all the characteristics needed for a considerably complex and complete documentation task;

• diffusion. There are essentially two implementations:

• txt2tags: its native executable, whose development seems to have stopped in 2010. This is not always a bad thing: if the format is fairly complete, its obsolescence is a guarantee of stability;

• pandoc: its powerful capabilities greatly extend the txt2tags output formats

cons:

• future: the format must stand the test of time. If the format is the result of a one-person effort, as in this case, I am afraid it is doomed to extinction. On the other hand, as said previously, this is also a good thing; it depends on which side you look at it from.

#### textile

Very interesting project with a standard, easy and powerful markup reference. It is very widespread and embedded in many web platforms. It has many implementations in various languages, and it is a pity I haven’t found any way to internationalize it. It is an alternative to markdown, as some tools/libraries support both.

pros:

• widespread in many web libraries and web apps

• exportable to many formats thanks to pandoc

cons:

• somewhat less used than other more popular and substantially equivalent tools

• not easy to handle i18n

TO COMPLETE

#### sisu

Another interesting and powerful software project with a comprehensive format specification.

pros:

• complete as docbook or more

• translatable with the aid of po4a

cons:

• no tools to automatically convert documents from other formats. No support from pandoc.

TO COMPLETE

## To sum up

• the only documentation standard in the full sense is docbook. Almost every new format or conversion tool refers to docbook in one way or another. This is logical, since docbook-xml is derived directly from SGML, which was the reference in the publishing field for many years. The problem is that docbook is easy to produce with automatic tools but not easy for humans to write, even with the aid of intelligent editors like emacs or eclipse;

• odt is easy to write using Libre/OpenOffice, but not to maintain and translate;

• so we have to switch from odt to some other documentation tool, chosen from among similar tools that have the minimal characteristics we need, such as:

• easier than docbook

• complete i.e. with many features like tables, indexes, toc, etc.

• standard

• more than one implementation (i.e. tools) of the standard

• easily translatable (i.e. automatic strings extractions and merge)

• tools actively developed

• I am getting acquainted with asciidoc. This document is in fact fully asciidoc compliant; try it yourself: copy and paste the exact text of this mail and type these commands:

asciidoc this-text.adoc    #convert into html
a2x -f pdf this-text.adoc  #convert into pdf
a2x -f epub this-text.adoc #convert into epub

## Creating the outputs

### Common conversion

To test my experiments I started by converting the easiest KiCad document: cvpcb. The easiest way I found to do this was to use odt2sphinx, an odt to rest converter (which was useful for the rest tests too) found here.

I simply ran:

odt2sphinx cvpcb.odt

This produced the file index.rst plus the images in the images folder; I renamed index.rst to cvpcb.rst. Some of these images were in the obscure, obsolete StarView Metafile format. Unoconv does not work well here because it converts the entire A4 page with the image inside, so I converted the images into png format manually in this way:

2. copy & pasted into GIMP

3. exported into PNG

4. search and replace of all references of .svm files into .png in the rest file
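The last step lends itself to a small script. The following is only a minimal sketch, assuming the rest file refers to the images by file name with an `.svm` extension and that the png files keep the same base names; the function name `svm_refs_to_png` is made up for illustration.

```sh
# Hypothetical helper: rewrite every ".svm" image reference in a rest file
# to ".png". It does not convert the images themselves.
svm_refs_to_png() {
    file=$1
    # Write to a temporary file first so the original is replaced only on success.
    sed 's/\.svm/.png/g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It would be called as `svm_refs_to_png cvpcb.rst`. The pattern is deliberately simple and would also touch any other text containing `.svm`, so the result is worth a quick review.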

The resulting rest files are full of small errors. The script adds spaces randomly, adds unwanted image parameters and does not recognize the headings, but… the results are very easily corrected manually and with some sed scripting.

This process could probably be improved by exporting the odt to html first and then extracting the embedded images into external png images via some script.

Anyway, once you have a correct rest file with external images, it is easy to convert it into asciidoc or markdown with the wonderful pandoc:

pandoc -f rst -t asciidoc cvpcb.rst -o cvpcb.adoc
pandoc -f rst -t markdown cvpcb.rst -o cvpcb.md

### Asciidoc output

As seen above, to create the output files I simply have to do this:

asciidoc cvpcb.adoc    #convert into html
a2x -f pdf cvpcb.adoc  #convert into pdf
a2x -f epub cvpcb.adoc #convert into epub

### Rest output

To create output from rest files the best tool to use is sphinx. Bundled with the sphinx distribution there is a nice auto-configuration tool called sphinx-quickstart. Just run this utility to create a configuration file, conf.py, and a Makefile that automates the document output generation. Once done, to create html simply type:

make -e html

or

make -e SPHINXOPTS="-D html_logo=images/kicad_logo.png" html

Similarly to create the other outputs:

make -e SPHINXOPTS="-D latex_logo=images/kicad_logo.png -D latex_paper_size=a4" latexpdf
make -e SPHINXOPTS="-D epub_cover=$$'images/kicad_logo.png', ''$$" epub

## Internationalization

This is one of the most useful things that this document format conversion will bring: easy internationalization of all documentation. Different tools take different approaches.

### asciidoc and markdown

The tools that use these formats are not able to handle internationalization directly but there is a beautiful little utility by Debian: po4a

These are the source formats supported by the current po4a version, 0.45:

po4a-gettextize --help-format
List of valid formats:
- asciidoc: AsciiDoc format.
- dia: uncompressed Dia diagrams.
- docbook: DocBook XML.
- guide: Gentoo Linux's XML documentation format.
- ini: INI format.
- kernelhelp: Help messages of each kernel compilation option.
- latex: LaTeX format.
- man: Good old manual page format.
- pod: Perl Online Documentation format.
- sgml: either DebianDoc or DocBook DTD.
- texinfo: The info page format.
- text: simple text document.
- wml: WML documents.
- xhtml: XHTML documents.
- xml: generic XML documents (see also docbook).

Markdown is not listed but is supported. See http://po4a.alioth.debian.org/man/man3/Locale::Po4a::Text.3pm.php

Please note that if you want to use po4a with Asciidoc you should use at least version 0.45 of po4a, or specify the "text" format as for markdown. As of version 0.45, the asciidoc option of the "text" filter is deprecated.

Working directories are usually specified in the po4a.cfg config file (see man po4a), but in the examples that follow I have done without it for clarity.
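For reference, such a config file could look roughly like the one written by the sketch below. It assumes the file layout used in the examples that follow (cvpcb.adoc as master, po/ for the catalogs, cvpcb_LANG.adoc for the translated masters) and uses Italian and French purely as example languages; the helper name `write_po4a_cfg` is hypothetical, and the exact syntax accepted by a given po4a version should be checked in man po4a.

```sh
# Hypothetical helper: write a minimal po4a.cfg for the layout used in this
# document. Languages and paths are example values, not project settings.
write_po4a_cfg() {
    cat > "${1:-po4a.cfg}" <<'EOF'
[po4a_langs] it fr
[po4a_paths] po/cvpcb.pot $lang:po/$lang.po
[type: asciidoc] cvpcb.adoc $lang:cvpcb_$lang.adoc opt:"-k 0"
EOF
}
```

With such a file in place, a single `po4a -M utf-8 po4a.cfg` run should take the place of the separate gettextize, updatepo and translate invocations.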

The process of internationalization is done in different steps.

#### Step 1: string template extraction

For asciidoc:

po4a-gettextize -f asciidoc -M utf-8 -m cvpcb.adoc -p po/cvpcb.pot

or, for markdown:

po4a-gettextize -f text -o markdown -M utf-8 -m cvpcb.md -p po/cvpcb.pot

#### Step 2: translation

Copy the template into our nationalized version:

cp po/cvpcb.pot po/it.po

and use the gettext editor you like:

emacs po/it.po
poedit po/it.po

Keep in mind that screenshot images should be localized too. I suggest creating per-language image directories such as:

images
images-es
images-fr
images-it

In this way untranslated images fall back to the English images. po4a correctly translates image references to enable the fallback.
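To see which images still rely on the English fallback, a small check can help. This is a sketch under the directory naming suggested above (`images` and `images-LANG`); the function name `untranslated_images` is invented for the example.

```sh
# Hypothetical helper: list the images that have no localized copy yet and
# therefore fall back to the English version in "images".
untranslated_images() {
    lang=$1
    for img in images/*; do
        [ -f "$img" ] || continue          # skip subdirectories and empty globs
        name=${img##*/}                    # strip the leading "images/"
        [ -f "images-$lang/$name" ] || echo "$name"
    done
}
```

Running `untranslated_images it` from the document directory prints one file name per line.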

#### Step 3: produce internationalized master documents

po4a-translate -f asciidoc -M utf-8 -m cvpcb.adoc -p po/it.po -k 0 -l cvpcb_it.adoc

#### Step 4: produce all kinds of internationalized output formats

asciidoc -a lang=it cvpcb_it.adoc    #convert into html
a2x -a lang=it -f pdf cvpcb_it.adoc  #convert into pdf
a2x -a lang=it -f epub cvpcb_it.adoc #convert into epub

#### Step 5: update translations

With the following command the .po file will be updated automatically.

po4a-updatepo -f asciidoc -m cvpcb.adoc -p po/it.po

#### Step 6: loop

repeat from step 2
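With more than one language the loop becomes repetitive, so it may be worth wrapping steps 3 and 5 in a script. The sketch below reuses the commands shown above unchanged; the language list, the function names and the `DRY_RUN` switch are assumptions added for illustration.

```sh
# Languages to process (example values, not taken from the project).
LANGS=${LANGS:-"it fr"}

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Step 5 for every language: refresh the .po files after the master changed.
refresh_po() {
    for lang in $LANGS; do
        run po4a-updatepo -f asciidoc -m cvpcb.adoc -p "po/$lang.po"
    done
}

# Step 3 for every language: rebuild the translated master documents.
build_masters() {
    for lang in $LANGS; do
        run po4a-translate -f asciidoc -M utf-8 -m cvpcb.adoc \
            -p "po/$lang.po" -k 0 -l "cvpcb_$lang.adoc"
    done
}
```

A typical round would be `refresh_po`, then translating the new strings by hand, then `build_masters`. Setting `DRY_RUN=1` first shows the commands without executing them.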

### rest (sphinx)

The sphinx software suite contains all the tools needed to handle i18n. Below is a step-by-step guide to obtaining a localized document with sphinx.

#### Step 1: string template extraction

Extract the template containing the messages to be translated. This creates source/catalog/docname.pot:

sphinx-build -b gettext -d build/doctrees source source/catalog

#### Step 2: adding the languages to the configuration file

Add the following variable assignment to conf.py:

locale_dirs = ['locale/'] # path is example but recommended

This is needed because (perhaps due to a bug?) I have not found a working way to set the variable directly on the sphinx-intl command line.

#### Step 3: creation/update of the localized strings

sphinx-intl -c source/conf.py update -p source/catalog -d source/locale -l it

Note: due to an unresolved bug, fuzzy strings are not yet handled. It is suggested to manage fuzzy strings with this command:

msgmerge -U source/locale/it/LC_MESSAGES/cvpcb.po source/catalog/$DOCNAME.pot
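When there are several documents, the msgmerge workaround has to be repeated for each template. A minimal sketch, assuming the source/catalog and source/locale layout used in these steps; the function name `merge_catalogs` and the `DRY_RUN` switch are hypothetical.

```sh
# Hypothetical helper: merge every template in source/catalog into the
# matching .po file of one language, so that msgmerge marks fuzzy strings.
merge_catalogs() {
    lang=$1
    for pot in source/catalog/*.pot; do
        [ -f "$pot" ] || continue
        name=$(basename "$pot" .pot)
        po="source/locale/$lang/LC_MESSAGES/$name.po"
        # Only update catalogs that already exist for this language.
        if [ -f "$po" ]; then
            # With DRY_RUN set, the command is printed instead of executed.
            ${DRY_RUN:+echo} msgmerge -U "$po" "$pot"
        fi
    done
}
```

It would be called as `merge_catalogs it` from the project root.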

#### Step 4: translate with the preferred .po files editor

poedit source/locale/it/LC_MESSAGES/cvpcb.po
emacs source/locale/it/LC_MESSAGES/cvpcb.po

#### Step 5: stats about localized strings

sphinx-intl -c source/conf.py stat -d source/locale -l it

#### Step 6: compilation of the translated strings files (.mo)

sphinx-intl -c source/conf.py build -d source/locale

#### Step 7: Build nationalized documents

for html:

sphinx-build -a -b html -d build/doctrees source build/html
sphinx-build -a -b html -d build/doctrees -D language=it source build/html-it
sphinx-build -a -b html -d build/doctrees -D language=fr source build/html-fr

for pdf:

sphinx-build -a -b latex -d build/doctrees -D language='it' source build/latex-it
make -C build/latex-it all-pdf

for epub:

sphinx-build -a -b epub -d build/doctrees -D language='it' source build/epub-it
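The per-language builds above differ only in the builder and the language code, so they can be generated in a loop. This is a sketch that reuses the same sphinx-build options and output directory naming; the function name `build_localized` and the `DRY_RUN` switch are assumptions.

```sh
# Hypothetical helper: run one sphinx builder for each given language,
# writing to build/<builder>-<lang> as in the commands above.
build_localized() {
    builder=$1
    shift
    for lang in "$@"; do
        # With DRY_RUN set, the command is printed instead of executed.
        ${DRY_RUN:+echo} sphinx-build -a -b "$builder" -d build/doctrees \
            -D language="$lang" source "build/$builder-$lang"
    done
}
```

For example, `build_localized html it fr` followed by `build_localized epub it fr`. The latex builder would still need the extra make step to produce the pdf.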

## Tools install

### asciidoc

• po4a (for asciidoc and markdown i18n)

sudo apt-get/yum install po4a libunicode-linebreak-perl

• latex (for pdf generation)

sudo apt-get/yum install dblatex texlive-lang-Your_Language_Here

• asciidoc

sudo apt-get/yum install asciidoc
sudo apt-get/yum install source-highlight

• asciidoctor

sudo apt-get/yum install asciidoctor

or better (i.e. to get a more up-to-date version):

sudo gem install asciidoctor

### rest

• docutils

sudo apt-get/yum install docutils

• sphinx

sudo apt-get/yum install python-sphinx

or better (i.e. to get a more up-to-date version):

sudo easy_install sphinx

and then:

sudo easy_install sphinx-intl

## Notes

1. I found cover images a little tricky. For example, with sphinx you have to specify the same cover image in a different way for every output format (epub, html, pdf). This is not a big problem but it is annoying. It is due to the fact that some formats like pdf or epub are usually produced via docbook. There are some exceptions:

1. for asciidoc there is one promising project, asciidoctor-pdf, that will hopefully be able to produce pdf directly, but it is experimental and unfortunately not yet able to include images. With asciidoc I have found a hack (thanks to John Beard) for generating nice covers with images as logos for pdf and epub outputs:

• for pdf:

a2x -f pdf --dblatex-opts "-P latex.output.revhistory=0 -P doc.publisher.show=0 -s pdf-cover-dblatex.sty" document.adoc

where the content of pdf-cover-dblatex.sty is:

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{asciidoc-dblatex}[2012/10/24 AsciiDoc DocBook Style]

%% Just use the original package and pass the options.
\RequirePackageWithOptions{docbook}

% custom cover page
\def\DBKcover{
\thispagestyle{empty}
\begin{center}
\includegraphics[width=1\textwidth]{images/logo.png} \\
\vspace*{1in}
\bfseries
\sffamily
{\Huge \DBKtitle \\[1ex]\large ~~~ \\}
\vspace*{2.1in}
{\color{blue} \Huge ~~~ \\ \huge ~~~ \\}
\vspace*{2.1in}
{\Large\DBKdate \\}
\end{center}
\vfill
}

• and for epub:

a2x -f epub -a docinfo document.adoc

where the content of document-docinfo.xml is:

<mediaobject role="cover">
<imageobject>
<imagedata fileref="images/logo.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>DOCTITLE</phrase>
</textobject>
</mediaobject>

DOCTITLE should be substituted with the real title, for example with a simple sed script like this:

sed -e 's/DOCTITLE/Real title/' document-docinfo.xml

2. for rest, I have recently discovered rst2pdf as a direct pdf converter, but its development seems to have been at a standstill since 2012.
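Returning to the epub cover in the asciidoc case: the sed command shown there prints its result to standard output and leaves document-docinfo.xml unchanged. One way to keep the placeholder reusable is to store it in a separate template and generate the docinfo file from it. The sketch below assumes a template named docinfo-template.xml, which is a made-up name, as is the function `make_docinfo`.

```sh
# Hypothetical helper: generate document-docinfo.xml from a template that
# still contains the DOCTITLE placeholder.
# Assumes the title contains no "/" or "&", which are special to sed.
make_docinfo() {
    title=$1
    sed -e "s/DOCTITLE/$title/" docinfo-template.xml > document-docinfo.xml
}
```

For example, `make_docinfo "Cvpcb manual"` followed by the a2x epub command shown above.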