Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XHTML: Bad nested <a> elements in 1.79.* #24

Closed
rlpowell opened this issue Feb 1, 2017 · 12 comments
Closed

XHTML: Bad nested <a> elements in 1.79.* #24

rlpowell opened this issue Feb 1, 2017 · 12 comments

Comments

@rlpowell
Copy link

rlpowell commented Feb 1, 2017

Given the input of:

   <foreignphrase xml:lang="jbo"><indexterm type="lojban-word"><primary role="valsi">bridi</primary></indexterm><glossterm role="valsi" linkend="valsi-bridi">bridi</glossterm></foreignphrase>

(a pattern my book uses literally thousands of times), in 1.78.1 this turned into

   <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="idm47493134773216" class="indexterm"></a><a class="glossterm" href="#valsi-bridi"><em class="glossterm">bridi</em></a></em></span>

and in 1.79.1 it turns into:

   <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="idm47807852677856" class="indexterm"></a><a class="glossterm" href="#valsi-bridi"><em class="glossterm"><a class="glossterm" href="#valsi-bridi" title="bridi">bridi</a></em></a></em></span>

Look at the closing tags to see the problem.

This causes EPUB generation to fail, as it doesn't like nested <a> elements (and rightly so).

To repro:

1.  Grab http://vrici.lojban.org/~rlpowell/media/public/bad_a_test.xml
2.  Grab  https://sourceforge.net/projects/docbook/files/docbook-xsl/1.79.1/docbook-xsl-1.79.1.zip/download and  https://sourceforge.net/projects/docbook/files/docbook-xsl/1.78.1/docbook-xsl-1.78.1.zip/download ; unzip both
3.  xmlto -vv -x docbook-xsl-1.78.1/xhtml/docbook.xsl -o bad_nochunks/ xhtml-nochunks bad_a_test.xml
4.  mv bad_nochunks/bad_a_test.html bad_nochunks/bad_a_test-1.78.1.html
5.  xmlto -vv -x docbook-xsl-1.79.1/xhtml/docbook.xsl -o bad_nochunks/ xhtml-nochunks bad_a_test.xml
6.  mv bad_nochunks/bad_a_test.html bad_nochunks/bad_a_test-1.79.1.html
7.  diff bad_nochunks/bad_a_test-1.78.1.html  bad_nochunks/bad_a_test-1.79.1.html

To make it even more clear you can do:

 diff <(sed -r 's/idm[0-9]+//g' bad_nochunks/bad_a_test-1.78.1.html)  <(sed -r 's/idm[0-9]+//g' bad_nochunks/bad_a_test-1.79.1.html)

The key bit is:

<       <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="" class="indexterm"></a><a class="glossterm" href="#valsi-bridi"><em class="glossterm">bridi</em></a></em></span>, and the central part of speech is the
<       <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="" class="indexterm"></a><a class="glossterm" href="#valsi-selbri"><em class="glossterm">selbri</em></a></em></span>. Logicians refer to the things thus related as
---
>       <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="" class="indexterm"></a><a class="glossterm" href="#valsi-bridi"><em class="glossterm"><a class="glossterm" href="#valsi-bridi" title="bridi">bridi</a></em></a></em></span>, and the central part of speech is the
>       <span xml:lang="jbo" class="foreignphrase" lang="jbo"><em xml:lang="jbo" class="foreignphrase" lang="jbo"><a id="" class="indexterm"></a><a class="glossterm" href="#valsi-selbri"><em class="glossterm"><a class="glossterm" href="#valsi-selbri" title="selbri">selbri</a></em></a></em></span>. Logicians refer to the things thus related as
@fiskr
Copy link

fiskr commented Feb 1, 2017

@rlpowell when did this start happening?

@rlpowell
Copy link
Author

rlpowell commented Feb 2, 2017

If you follow my repro steps, I am comparing 1.78.1 with 1.79.1, as pulled by zip from sourceforge. I believe those are adjacent releases.

@rlpowell
Copy link
Author

Any ideas here? This seems like a pretty bad new bug, no?

@fiskr
Copy link

fiskr commented Mar 25, 2017

@rlpowell - I have no idea what caused it. Looking through the log, as best I can tell there isn't much of a history in this repo, and nothing obvious is to blame here. I am hoping one of the actual devs of this repo could help.

Do you know whether it's still happening?

@rlhamilton
Copy link
Contributor

This is an interesting one. I was able to reproduce your results using xsltproc (which is the xslt processor used by xmlto). When I ran the same test using Saxon, I got a different result with 1.79.1. Here it is:

<span xml:lang="jbo" class="foreignphrase"><em xml:lang="jbo" class="foreignphrase"><a id="d5e13" class="indexterm"/><a class="glossterm" href="#valsi-bridi"><em class="glossterm"/></a><a class="glossterm" href="#valsi-bridi" title="bridi">bridi</a><a class="glossterm" href="#valsi-bridi"><em class="glossterm"/></a></em></span

With xsltproc, you get the nested link; with Saxon, there are two empty <a class="glossterm" href="#valsi-bridi"><em class="glossterm/></a> elements on either side of the real glossterm link, but they don't nest. My guess, before diving too deep into the stylesheets, is that Saxon is getting it right (though the stylesheets shouldn't create those empty elements), and xsltproc is trying to optimize and gets it wrong; but I could be wrong, too:-).

Clearly there's a problem with the stylesheets. If you have the option to use Saxon, that might be a stopgap. But I'll leave this as an open issues for the developers to look at.

@fiskr
Copy link

fiskr commented Mar 26, 2017

@rlhamilton thank you for looking at this. If there is a problem with the stylesheets, which dev might be the best to ask? I can try to fix this myself.

@bobstayton
Copy link
Contributor

bobstayton commented Mar 26, 2017

I traced the problem through the stylesheets, and it is a bug introduced in 1.79.1. In response to issue #1359 on SourceForge, all the inline *.charseq templates were changed to process the content with simple.xlink when the $content template param was passed. Because the glossterm template already generates <a>, this nests a second link for that element.

The solution is to change the call to inline.italicseq in the match="glossterm" template in inline.xsl from:
<xsl:call-template name="inline.italicseq">
<xsl:with-param name="content" select="$content"/>
</xsl:call-template>
to:
<xsl:call-template name="inline.italicseq">
<xsl:with-param name="contentwithlink" select="$content"/>
</xsl:call-template>

That prevents the nested linking. Other elements that produce their own link should also have this change. If this bug is kept open I will work to integrate these changes for the next release.

I will also provide my usual suggestion that since you are using DocBook 5 XML files that you should use the namespaced version of the stylesheets (docbook-xsl-ns-1.79.1 instead of docbook-xsl-1.79.1). That will prevent double processing of the file to strip the namespace and prevent potential problems with working with an internal nodeset instead of the content directly. 8^)

@fiskr
Copy link

fiskr commented Mar 26, 2017

Would it be OK if I make a pull request?

@rlhamilton
Copy link
Contributor

Bob is the main developer, so I think this is already on his todo list. That said, I'm sure he'll let you know if he'd like you to create a pull request.

Thanks, Bob, for finding the problem.

@worldmind
Copy link

worldmind commented Apr 17, 2022

Hello, can somebody say in which version it's fixed?
I use

Package: docbook-xsl              
Version: 1.79.2+dfsg-1

on Ubuntu 21.10 and have similar issue

btw, thank you for great tools!

@rlhamilton
Copy link
Contributor

I don't think there has been a formal release since that bug was fixed. However, it has been fixed in recent snapshots, which you can find here: https://github.com/docbook/xslt10-stylesheets/releases

@worldmind
Copy link

Thank you, actually I tried, but had some issues with import path, so just created python script for fix )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants