invalid xml in docbook output #4539

Open
ciklysta opened this issue Jan 24, 2024 · 2 comments

@ciklysta

When rendering this code:

[#the_link]**something #something# something**

the DocBook output is invalid XML (in the xreflabel attribute):

<simpara><anchor xml:id="the_link" xreflabel="something <emphasis role="marked">something</emphasis> something"/><emphasis role="strong">something <emphasis role="marked">something</emphasis> something</emphasis></simpara>

The HTML5 output is correct:

<p><strong id="the_link">something <mark>something</mark> something</strong></p>

The proposed behavior is to put only the text content of the paragraph into the xreflabel attribute. An alternative is to discard sub-elements completely.

mojavelinux added this to the support milestone Jan 24, 2024
@mojavelinux
Member

This is a known limitation of the DocBook backend/converter (and it has been this way since before Asciidoctor). If the reftext would otherwise contain formatting, it needs to be specified explicitly.

[[the_link,something something something]]*something #something# something*

This issue is FAR more complicated to address than you think. In your original example, here's what the converter sees after converting the strong text:

<anchor xml:id="the_link" xreflabel="something #something# something"/><emphasis role="strong">something #something# something</emphasis>

The emphasis hasn't yet been converted, so there are no XML tags to clean. And, in fact, the converter already does clean XML tags if they are there. See https://github.com/asciidoctor/asciidoctor/blob/main/lib/asciidoctor/converter/docbook5.rb#L643-L645
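
To illustrate why tag cleaning does not help here, below is a minimal sketch in plain Ruby. The regex and the strip_xml_tags helper are hypothetical stand-ins for illustration, not the converter's actual code. It compares the text the reporter sees in the final output with the text the converter has in hand when it builds the xreflabel.

```ruby
# Hypothetical stand-in for a tag cleanup step; the regex is an assumption,
# not the pattern used by the DocBook converter.
TAG_RX = /<[^>]+>/

def strip_xml_tags(text)
  text.gsub(TAG_RX, '').squeeze(' ').strip
end

# Text as it appears in the final output: tags are present, so stripping works.
late = 'something <emphasis role="marked">something</emphasis> something'

# Text as the converter sees it when building xreflabel: the inner markup is
# still AsciiDoc, so there is nothing for a tag cleaner to remove.
early = 'something #something# something'

puts strip_xml_tags(late)
puts strip_xml_tags(early)
```

The second string passes through unchanged, and the # marks are only converted to emphasis tags afterwards, which is how they end up inside the attribute value.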

Ultimately, there's nothing more that core can do at this point in Asciidoctor until #61 is resolved (which we're working on as part of the AsciiDoc Language specification).

However, if you want to change this behavior, you can extend the DocBook converter and sanitize the reftext more aggressively, perhaps by looking for AsciiDoc markup and stripping it out.

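# Extend the registered DocBook converter and take over the 'docbook' backend.
class MyDocBookConverter < (Asciidoctor::Converter.for 'docbook')
  register_for 'docbook'

  def common_attributes id, role = nil, reftext = nil
    if reftext
      # Replace or sanitize the reftext before it is written to xreflabel.
      reftext = "replacement text goes here"
    end
    # Bare super forwards the current values of id, role, and reftext.
    super
  end
end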
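
As one possible way to do the "stripping it out" part, here is a rough sketch of a sanitizer in plain Ruby. The plain_reftext helper and its regexes are hypothetical; they cover only a few common inline formatting marks (strong, emphasis, mark, monospace), not the full AsciiDoc inline syntax, and they may misfire on text that uses these characters literally.

```ruby
# All names below are hypothetical; this is a heuristic, not an AsciiDoc parser.
XML_TAG_RX = /<[^>]+>/
# Unconstrained marks: **x**, __x__, ##x##, ``x``
UNCONSTRAINED_RX = /(\*\*|__|##|``)(.+?)\1/
# Constrained marks: *x*, _x_, #x#, `x` (not adjacent to a word character)
CONSTRAINED_RX = /(?<![[:alnum:]])([*_#`])(\S|\S.*?\S)\1(?![[:alnum:]])/

def plain_reftext(text)
  result = text.gsub(XML_TAG_RX, '')
  # Repeat until nothing changes so nested marks are unwrapped too.
  loop do
    stripped = result.gsub(UNCONSTRAINED_RX, '\2').gsub(CONSTRAINED_RX, '\2')
    break if stripped == result
    result = stripped
  end
  # Escape double quotes for use in an attribute value, then tidy whitespace.
  result.gsub('"', '&quot;').squeeze(' ').strip
end

puts plain_reftext('something #something# something')
```

In the converter extension above, the placeholder assignment could then become something like reftext = plain_reftext(reftext), assuming the helper is defined where the converter class can call it.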

@mojavelinux
Member

The best we could hope for right now is to just leave the xreflabel attribute off for this kind of anchor. That works in dblatex (it will still use the adjacent element as the reftext), but not Apache FOP, which uses the title from the parent element. The reason we added xreflabel here was to make it more precise and consistent. But there is a risk, as you have observed.
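
A rough sketch of that fallback in plain Ruby, for illustration only: the anchor_tag helper and the SUSPECT_RX check are hypothetical and do not reflect how the converter builds its output. The idea is to emit xreflabel only when the reftext shows no sign of tags or inline formatting marks.

```ruby
# Hypothetical heuristic: any of these characters suggests leftover markup.
SUSPECT_RX = /[<>*_#`]/

# Builds an anchor element, leaving xreflabel off when the reftext looks unsafe.
def anchor_tag(id, reftext = nil)
  attrs = %( xml:id="#{id}")
  if reftext && !reftext.match?(SUSPECT_RX)
    attrs += %( xreflabel="#{reftext.gsub('"', '&quot;')}")
  end
  "<anchor#{attrs}/>"
end

puts anchor_tag('the_link', 'something #something# something')
puts anchor_tag('the_link', 'something something something')
```

The trade-off is the one described above: the output stays well-formed, but the label that a cross reference resolves to then depends on the downstream toolchain.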
