Carsten Bormann edited this page Mar 8, 2022 · 4 revisions

Common Pitfalls

Triggering Markdown Syntax Involuntarily

kramdown quite often interprets text that is meant as literal document content as markup. Usually, the remedy is to add a backslash (\).

An opening square bracket ([) is interpreted as starting a link. In most cases, this just causes a warning, but to silence the warning, use a backslash (\[).

Underscores delimit emphasis (italics or bold) when used in pairs, even if they occur within a word. Again, backslash to the rescue (\_). Asterisks cause similar problems (prevent with \*). (Backslashes are not needed if the underscore/asterisk is surrounded by spaces.)

The worst offender is probably the pipe (vertical bar) symbol: |. Kramdown is rather aggressive in interpreting this as a table cell boundary, except in a code block or a code span. (This has been recognized as a bug in kramdown, but is hard to fix in a backward-compatible way.) A common trick is to enter a broken bar (¦, U+00A6) instead, which xml2rfc then re-interprets as a vertical bar.

If you want to start a paragraph with something that looks like a list item marker, you need to escape it. This is done by escaping the period in what might look like an ordered list or the list item marker in what would look like an unordered list:

1984\. It was great

\- others say that, too!
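The escaping rule above can be sketched as a small helper. This is a minimal illustration, not part of kramdown or kramdown-rfc: the name `escape_list_marker` is hypothetical, and the sketch assumes that a list marker is either digits followed by a period and whitespace, or one of `-`, `+`, `*` followed by whitespace, starting in the first column.

```python
import re

# Hypothetical helper for illustration; not part of kramdown or kramdown-rfc.
# Assumption: an ordered marker is digits + "." + whitespace, an unordered
# marker is one of "-", "+", "*" + whitespace, both starting in column one.
_ORDERED = re.compile(r"^(\d+)\.(?=\s)")
_UNORDERED = re.compile(r"^([-+*])(?=\s)")


def escape_list_marker(paragraph: str) -> str:
    """Backslash-escape a leading list item marker, if there is one."""
    # Ordered case: escape the period, as in "1984\. It was great".
    escaped = _ORDERED.sub(r"\1\\.", paragraph, count=1)
    if escaped != paragraph:
        return escaped
    # Unordered case: escape the marker itself, as in "\- others say ...".
    return _UNORDERED.sub(r"\\\1", paragraph, count=1)


print(escape_list_marker("1984. It was great"))
print(escape_list_marker("- others say that, too!"))
```

Text such as "3.14 is pi" is left alone, because the period is not followed by whitespace and so does not look like a list marker under the assumption above.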

Not letting your IALs snuggle up

IALs (inline attribute lists) are used to convey attributes that xml2rfc needs but that are not provided by basic markdown. E.g., the title for some sourcecode:

     a = 1;
     b = 2
{: title="Assigning to two variables"}

Trailing IALs usually need to immediately follow the construct they are providing attributes to, on a new line. This doesn't work:

     a = 1;
     b = 2

{: title="Assigning to two variables"}

Such a "loose" IAL is instead assigned to the item that follows it. If that is a section heading, the section will have two titles, which is too much for xml2rfc; more generally, the attributes will land on elements that didn't expect them and you will get cryptic messages.

Remember: trailing IALs want to snuggle up.

Reference Pitfalls

kramdown-rfc provides a highly automated mechanism for including bibliographic references (called citations here). Sometimes the automation can get in the way, or a conflict can turn up with xml2rfc's XML-based rules.

Reference Anchors

Like all anchors, reference anchors (the short names that occur in the square brackets at the point of citation and in the references sections) need to be XML names in xml2rfc. As it is generally unwise to use characters outside ASCII in xml2rfc, this limits the first character of a reference anchor to uppercase or lowercase ASCII letters (A-Z, a-z) and underscores (_); subsequent characters can also be ASCII digits (0-9), ASCII minus (-), and dots (.). kramdown-rfc has some feeble mechanisms to insert a leading _ where an anchor starts with a digit, but this is not well-debugged territory (most document authors would rather choose a different anchor than rely on this cop-out). Note that colons are also excluded, although they are popular in some standards references.
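The character rules above translate directly into a regular expression. The following is a minimal sketch for checking anchors before using them; the names `ANCHOR_RE` and `is_safe_anchor` are made up for this example and are not part of kramdown-rfc or xml2rfc.

```python
import re

# Hypothetical check that mirrors the rule described in the text:
# first character: ASCII letter or underscore;
# later characters: ASCII letters, digits, underscore, dot, or minus.
ANCHOR_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_safe_anchor(anchor: str) -> bool:
    """Return True if the whole anchor follows the ASCII-only rule."""
    return ANCHOR_RE.fullmatch(anchor) is not None


for candidate in ["RFC8610", "I-D.ietf-core-coap", "3GPP.23.501", "ISO:9001"]:
    print(candidate, is_safe_anchor(candidate))
```

The last two candidates fail the check: one starts with a digit, the other contains a colon.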

kramdown-rfc treats reference anchors that match certain patterns as subject to automation. This includes anchors that start with rfc, bcp, or std (lower or upper case), and anchors that start with a sequence of two or more uppercase characters, numbers, or dashes followed by a dot, e.g., I-D., IANA., IEEE., W3C. The prefixes are currently limited to this (potentially growing) set:

  • BCP, RFC, STD as mentioned above, and I-D -- generated automatically out of IETF datatracker and RFC-editor bibliographic information,
  • IANA, generated automatically out of IANA registry names,
  • DOI, generated automatically out of data supplied by dx.doi.org, and
  • 3GPP and SDO-3GPP, ANSI, CCITT, FIPS, IEEE, ISO, ITU, NIST, OASIS, PKCS, W3C -- note that many of these point to collections of references that are not currently being updated; this, and the level of overall coverage of other SDOs, is likely to change in the course of 2021.

Where automation is not desired (i.e., the reference is specified manually), choose anchors that do not follow these conventions.
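To check whether a manually specified anchor might collide with the automation, the conventions described above can be approximated as follows. This is a sketch based on a literal reading of this page, not on kramdown-rfc's actual code; `DOTTED_PREFIXES` and `looks_automated` are hypothetical names, and the prefix set is copied from the list above and may be out of date.

```python
import re

# Prefix set copied from the list on this page (assumption: may have grown).
DOTTED_PREFIXES = {
    "BCP", "RFC", "STD", "I-D", "IANA", "DOI",
    "3GPP", "SDO-3GPP", "ANSI", "CCITT", "FIPS", "IEEE",
    "ISO", "ITU", "NIST", "OASIS", "PKCS", "W3C",
}

# Two or more uppercase characters, digits, or dashes, followed by a dot.
_DOTTED = re.compile(r"([A-Z0-9-]{2,})\.")


def looks_automated(anchor: str) -> bool:
    """Guess whether an anchor follows the conventions that trigger automation."""
    # Anchors starting with rfc, bcp, or std, in lower or upper case.
    if anchor[:3].lower() in ("rfc", "bcp", "std"):
        return True
    # Anchors starting with a known prefix followed by a dot.
    match = _DOTTED.match(anchor)
    return match is not None and match.group(1) in DOTTED_PREFIXES


for candidate in ["RFC8610", "I-D.ietf-core-coap", "IEEE.802.15.4", "COSE-ALG"]:
    print(candidate, looks_automated(candidate))
```

Because the sketch takes the "starts with rfc, bcp, or std" wording literally, it also flags an anchor such as `STDLIB`; the real tool may be more selective.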

Interaction with Markdown editors and processors

VSCode

VSCode seems to need a line of three dots (...) at the end of the front matter YAML, before the abstract separator.

GitHub

To show a table, GitHub needs pipe symbols (|) throughout, with no + signs where horizontal and vertical rules cross.

Trailing Whitespace

Giving a meaning to trailing whitespace is one of the big blunders that John Gruber made when defining markdown.

Kramdown-rfc plays along, but the whole thing is so error-prone that Martin Thomson in his widely used I-D-template created a commit hook that prevents committing changes that introduce trailing whitespace.

That may be a reasonable thing to do, but it creates an interesting failure mode: somebody manages to sneak in a commit that does introduce trailing whitespace. Anyone who then edits the document as updated by that commit cannot commit their changes, because the commit hook rejects an edit that still contains the trailing whitespace from the previous commit.

Of course, modern editors have easy ways to get rid of trailing whitespace (e.g., M-x delete-trailing-whitespace RET in Emacs), but the next committer...

  1. first has to diagnose the problem,
  2. needs to find that command that they may not need too often,
  3. litters their PR with all those whitespace changes because they'd lose the changes in the working copy otherwise.

I don’t know a way to avoid trailing whitespace that doesn’t have that problem, or I would already have suggested that change to Martin.

TL;DR, yes, best avoid committing trailing whitespace.
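For the diagnosis step mentioned above, a small script can list the offending lines before a commit is attempted. This is a minimal sketch and not the commit hook from the I-D-template; the function name `find_trailing_whitespace` is made up for this example, and "trailing whitespace" is assumed to mean spaces or tabs at the end of a line.

```python
def find_trailing_whitespace(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


sample = "title: Example\nabstract line  \n\tindented\t\nclean line"
print(find_trailing_whitespace(sample))
```

The sketch only reports line numbers and does not rewrite the file, so the committer can decide whether to clean up the whitespace in a separate change.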

Further Reading