invalid xml in docbook output #4539

ciklysta · 2024-01-24T11:05:16Z

When rendering this code:

[#the_link]**something #something# something**

the DocBook output is invalid XML (see the xreflabel attribute):

<simpara><anchor xml:id="the_link" xreflabel="something <emphasis role="marked">something</emphasis> something"/><emphasis role="strong">something <emphasis role="marked">something</emphasis> something</emphasis></simpara>

the html5 output is correct:

<p><strong id="the_link">something <mark>something</mark> something</strong></p>

The proposed behavior is to put only the paragraph's text content into the xreflabel attribute. An alternative is to discard sub-elements completely.

mojavelinux · 2024-01-24T12:41:12Z

This is a known limitation of the DocBook backend/converter (and has always been this way since even before Asciidoctor). The reftext needs to be specified explicitly if the reftext would otherwise contain formatting.

[[the_link,something something something]]*something #something# something*

This issue is FAR more complicated to address than you think. In your original example, here's what the converter sees after converting the strong text:

<anchor xml:id="the_link" xreflabel="something #something# something"/><emphasis role="strong">something #something# something</emphasis>

The emphasis hasn't yet been converted, so there are no XML tags to clean. And, in fact, the converter already does clean XML tags if they are there. See https://github.com/asciidoctor/asciidoctor/blob/main/lib/asciidoctor/converter/docbook5.rb#L643-L645
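
To illustrate why tag cleaning does not help here, the following is a minimal standalone sketch. The `TAG_RX` pattern and the `strip_tags` helper are assumptions made for illustration, not the converter's actual code (the linked lines show that). It only demonstrates that removing XML tags is a no-op on text whose inline markup has not been converted yet.

```ruby
# Hypothetical approximation of XML tag stripping; not Asciidoctor's own code.
TAG_RX = /<[^>]+>/

# Remove XML tags, then collapse any doubled spaces left behind.
def strip_tags text
  return text unless text.include? '<'
  text.gsub(TAG_RX, '').squeeze(' ').strip
end

# Reftext as it would look if the inline markup had already been converted.
converted = 'something <emphasis role="marked">something</emphasis> something'
# Reftext as the converter sees it at this point: raw AsciiDoc marks, no tags.
unconverted = 'something #something# something'

puts strip_tags(converted)   # => something something something
puts strip_tags(unconverted) # => something #something# something
```

The second call returns its input unchanged, so the `#` marks are still in the reftext when it is written to the attribute.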

Ultimately, there's nothing more that core can do at this point in Asciidoctor until #61 is resolved (which we're working on in the AsciiDoc Language specification).

However, if you want to change this behavior, you can extend the DocBook converter and sanitize the reftext more aggressively, perhaps by looking for AsciiDoc markup and stripping it out.

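require 'asciidoctor'

# Extend the registered DocBook converter and override how the attributes are built.
class MyDocBookConverter < (Asciidoctor::Converter.for 'docbook')
  register_for 'docbook'

  def common_attributes id, role = nil, reftext = nil
    # Swap in sanitized reftext before the parent method writes xreflabel.
    if reftext
      reftext = "replacement text goes here"
    end
    # A bare super forwards the current values of id, role, and reftext.
    super
  end
end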
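
As a rough sketch of what "stripping out AsciiDoc markup" might look like, the helper below removes XML tags and paired inline formatting marks from a string. The name `plain_reftext` and both patterns are hypothetical, and the approach is a heuristic: it only knows about `*`, `_`, backtick, and `#` pairs, and it can misfire on text such as `snake_case_name`.

```ruby
# Hypothetical patterns; a heuristic, not a full AsciiDoc inline parser.
TAG_PATTERN = /<[^>]+>/
# A formatting mark (single or doubled), content with no space at either
# edge, then the same mark again.
MARK_PATTERN = /(\*\*|__|``|\#\#|[*_`\#])(\S(?:.*?\S)?)\1/

def plain_reftext text
  result = text.gsub(TAG_PATTERN, '')
  # Repeat until nothing changes so nested marks are unwrapped too.
  loop do
    stripped = result.gsub(MARK_PATTERN, '\2')
    break if stripped == result
    result = stripped
  end
  result.squeeze(' ').strip
end

puts plain_reftext('something #something# something')
# => something something something
puts plain_reftext('*a #b# c*')
# => a b c
```

Inside the overridden `common_attributes`, the placeholder assignment could then become `reftext = plain_reftext reftext`, assuming the helper is defined where the converter class can reach it.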

mojavelinux · 2024-01-24T20:25:08Z

The best we could hope for right now is to just leave the xreflabel attribute off for this kind of anchor. That works in dblatex (it will still use the adjacent element as the reftext), but not Apache FOP, which uses the title from the parent element. The reason we added xreflabel here was to make it more precise and consistent. But there is a risk, as you have observed.
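
The "leave it off" option can be sketched in isolation as follows. This is a standalone simulation, not the converter's code: `anchor_tag` and `SUSPECT_RX` are hypothetical names, and the check for leftover markup is a deliberately crude assumption.

```ruby
# Hypothetical check: any tag delimiter or formatting mark makes the reftext suspect.
SUSPECT_RX = /[<>*_`#]/

# Build an anchor element, omitting xreflabel when the reftext looks like it
# still contains markup.
def anchor_tag id, reftext = nil
  if reftext.nil? || reftext.match?(SUSPECT_RX)
    %(<anchor xml:id="#{id}"/>)
  else
    escaped = reftext.gsub('"', '&quot;')
    %(<anchor xml:id="#{id}" xreflabel="#{escaped}"/>)
  end
end

puts anchor_tag('the_link', 'something #something# something')
# => <anchor xml:id="the_link"/>
puts anchor_tag('the_link', 'something something something')
# => <anchor xml:id="the_link" xreflabel="something something something"/>
```

The trade-off is the one described above: the output stays well-formed, but the processing toolchain then decides what text an xref to this anchor displays.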

mojavelinux added this to the support milestone Jan 24, 2024
