Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix index link when term has subterms #3320; fix index-see-also #3314 #3373

Merged
merged 6 commits into from
Oct 10, 2019

Conversation

robander
Copy link
Member

@robander robander commented Oct 3, 2019

Signed-off-by: Robert D Anderson robander@us.ibm.com


name: Pull request
about: Propose changes to fix an issue or implement a new feature


Description

Current PDF output will drop all instances of an index term from the index, if even one instance of the term has a sub-term. This is done silently, so adding one secondary or tertiary term can erase any number of page links for the primary term.

Doing this ignores what the actual markup says, and might not even be the original intent (some of the code design is unclear).

Some indexing tools follow a practice that when there are child terms, the author should not index the primary; however, our code should respect the markup and leave that restriction to local rules.

The update here adds a variable $index.allow-link-with-subterm which defaults to true(); if set to false() it will ignore the links on the parent term as we do today. However, that should not be the default.

Motivation and Context

Fixes #3320
Fixes #3314

How Has This Been Tested?

Tested with the content in 3320; using the default true() setting keeps the expected links, while setting to false() removes them.

Type of Changes

  • Bug fix (non-breaking change which fixes an issue)

Signed-off-by: Robert D Anderson <robander@us.ibm.com>
@robander robander added bug plugin/pdf Issue related to PDF plug-in priority/medium Medium (or unknown) priority issue index Issue related to index sorting or rendering labels Oct 3, 2019
@robander robander self-assigned this Oct 3, 2019
@robander
Copy link
Member Author

robander commented Oct 3, 2019

Not yet ready for review - still need to remove any page link for the primary term when it does specify a secondary.

Signed-off-by: Robert D Anderson <robander@us.ibm.com>
@robander
Copy link
Member Author

robander commented Oct 4, 2019

Updated and added a better / more complete test case; tested for AH and FOP output.
3320.zip

To validate output when publishing 3320.ditamap:

  • EVERYWHERE is only a primary term, should have page links. (FOP doesn't remove duplicates so I get one page twice.)
  • Issue3320 should have one page link because used as primary once; it should have 2 sub-terms each with a page link
  • Only subterms should not have any page links, only subterms
  • PrimaryA should have two page links and one child with a page link
  • PrimaryB should have one page link and one child with a page link
  • SeeALSOTest should have a page link, with the See also
  • SeeTest should not have a page link

@robander robander changed the title Keep index links when term has subterms #3320 Fix index link when term has subterms #3320; fix index-see-also #3314 Oct 5, 2019
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
@robander
Copy link
Member Author

robander commented Oct 5, 2019

Adding to this pull request a fix for #3314.

  • If a term has more than one index-see-also, current processing ignores all but the first. Latest commit fixes that issue by preserving all see-also instances, separating them with a comma. If the two "see also" references are to IMPORTANT and MORE IMPORTANT, current processing will generate "See also IMPORTANT". With this fix, it will generate "See also IMPORTANT, MORE IMPORTANT"`. Each see-also target is linked correctly.
  • If a term has more than one index-see, current processing ignores all but the first. Latest commit fixes that issue by preserving all see instances, separating them with a comma. If the two "see" references are to IMPORTANT and MORE IMPORTANT, current processing will generate "See IMPORTANT" after the term. With this fix, it will generate "See IMPORTANT, MORE IMPORTANT". Each see-also target is linked correctly. A better long term fix would be to convert one or all of these to see-also but it didn't seem like it was worth going that far (particularly having to handle the case of a term with both index-seeandindex-see-also`.

Attaching test cases for #3314.
3314.zip

Copy link
Member

@infotexture infotexture left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing with FOP

  • EVERYWHERE is only a primary term, should have page links. (FOP doesn't remove duplicates so I get one page twice.) ✅
  • Issue3320 should have one page link because used as primary once; it should have 2 sub-terms each with a page link

❌ 3 page links (sub-term page links also appear in parent)

  • Only subterms should not have any page links, only subterms ✅
  • PrimaryA should have two page links and one child with a page link

❌ 3 page links (sub-term page link also appears in parent)

  • PrimaryB should have one page link and one child with a page link

❌ I get PrimaryC with 2 page links (sub-term page link also appears in parent)

  • SeeALSOTest should have a page link, with the See also ✅
  • SeeTest should not have a page link ✅

@infotexture
Copy link
Member

Output files from FOP test above:

@infotexture
Copy link
Member

The update here adds a variable $index.allow-link-with-subterm which defaults to true(); if set to false() it will ignore the links on the parent term as we do today. However, that should not be the default.

@robander Would users be able to adjust this default by passing a parameter on the CLI, or would changing this require an XSLT customization plug-in?

@lief-erickson With this change, the current docs index in dita-ot/docs#264 would show a large number of (redundant) page numbers for certain parent entries:

docs-index_pr-264_pr-3373

@lief-erickson
Copy link
Contributor

@infotexture, that screen capture certainly exposes the duplicate page number issue in FOP, doesn't it?

@robander
Copy link
Member Author

robander commented Oct 7, 2019

@infotexture not sure why you are getting page links on the primary term when the secondary is indexed. That's not what I see so I need to double check if something wasn't checked in.

As currently defined this is not a CLI option - to me, this feels way too specific an option to expose on the CLI. Especially for this case -- the only reason to suppress them is if you have a rule that primary terms cannot be indexed when secondary terms are present, but in that case, I think our default should still be "Do what the markup says and if you don't like it, you can update the markup per your local rules"

And the duplicate page numbers -- yeah, that has always been the case with FOP output, just wasn't exposed for us in this case because we ignored all of those entries.

@@ -116,6 +116,9 @@ See the accompanying LICENSE file for applicable license.
</xsl:variable>
<xsl:if test="contains($isNormalChilds,'true ')">
<xsl:apply-templates select="." mode="make-index-ref">
<xsl:with-param name="idxs" select="if ($index.allow-link-with-subterm and exists(key('index-leaves',@value)))
then (opentopic-index:refID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
then (opentopic-index:refID)
then opentopic-index:refID

<xsl:key name="index-leaves"
match="opentopic-index:index.entry
[empty(opentopic-index:index.entry|opentopic-index:see-childs)]
[not(ancestor::opentopic-index:index.group)]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[not(ancestor::opentopic-index:index.group)]"
[empty(ancestor::opentopic-index:index.group)]"

@@ -220,16 +228,16 @@ See the accompanying LICENSE file for applicable license.

<xsl:template match="opentopic-index:index.entry[not(opentopic-index:index.entry)]" mode="index-postprocess" priority="1">
<xsl:variable name="page-setting" select=" (ancestor-or-self::opentopic-index:index.entry/@no-page | ancestor-or-self::opentopic-index:index.entry/@start-page)[last()]"/>
<xsl:variable name="isNoPage" select=" $page-setting = 'true' and name($page-setting) = 'no-page' "/>
<xsl:variable name="isNoPage" select=" $page-setting = 'true' and name($page-setting) = 'no-page' "/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<xsl:variable name="isNoPage" select=" $page-setting = 'true' and name($page-setting) = 'no-page' "/>
<xsl:variable name="isNoPage" select="$page-setting = 'true' and name($page-setting) = 'no-page'"/>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really work? Because

<xsl:variable name="page-setting"
  select="(ancestor-or-self::opentopic-index:index.entry/@no-page |
           ancestor-or-self::opentopic-index:index.entry/@start-page)
          [last()]"/>

type is (@no-page | @start-page)? and the name() function will never return no-page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly - it looks like no, for the reason you describe. I didn't touch that code and haven't tested it. (Didn't even examine it while working on the rest of this.) I'm guessing it's related to page ranges, which I think have other open issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the spacing on the variable declaration but would rather leave this alone until I have a test case that would demonstrate whether it works or not - I think you're right that it shouldn't work, but ...

<xsl:apply-templates select="opentopic-index:index.entry[1]" mode="get-see-value"/>
</fo:basic-link>
<xsl:for-each select="opentopic-index:index.entry">
<xsl:if test="not(position() eq 1)">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<xsl:if test="not(position() eq 1)">
<xsl:if test="position() ne 1">

<xsl:apply-templates select="opentopic-index:index.entry[1]" mode="get-see-value"/>
</fo:basic-link>
<xsl:for-each select="opentopic-index:index.entry">
<xsl:if test="not(position() eq 1)">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<xsl:if test="not(position() eq 1)">
<xsl:if test="position() ne 1">

@@ -298,8 +316,10 @@ See the accompanying LICENSE file for applicable license.
<xsl:template match="opentopic-index:index.entry" mode="get-see-value">
<fo:inline>
<xsl:apply-templates select="opentopic-index:formatted-value/node()"/>
<xsl:text> </xsl:text>
<xsl:apply-templates select="opentopic-index:index.entry[1]" mode="get-see-value"/>
<xsl:if test="opentopic-index:index.entry[1]">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<xsl:if test="opentopic-index:index.entry[1]">
<xsl:if test="exists(opentopic-index:index.entry)">

IMO using exists() is more readable.

<xsl:apply-templates select="opentopic-index:index.entry[1]" mode="get-see-value"/>
</fo:basic-link>
<xsl:for-each select="opentopic-index:index.entry">
<xsl:if test="not(position() eq 1)">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<xsl:if test="not(position() eq 1)">
<xsl:if test="position() ne 1">

@@ -371,6 +396,9 @@ See the accompanying LICENSE file for applicable license.
</xsl:variable>
<xsl:if test="contains($isNormalChilds,'true ')">
<xsl:apply-templates select="." mode="make-index-ref">
<xsl:with-param name="idxs" select="if ($index.allow-link-with-subterm and exists(key('index-leaves',@value)))
then (opentopic-index:refID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
then (opentopic-index:refID)
then opentopic-index:refID

@robander
Copy link
Member Author

robander commented Oct 10, 2019

@infotexture testing again with code built from github and I reproduced the results you got with FOP -- investigating. [Edit: fixed with next commit.]

Robert D Anderson added 2 commits October 9, 2019 21:11
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
Signed-off-by: Robert D Anderson <robander@us.ibm.com>
@lief-erickson
Copy link
Contributor

Tested with FOP:

  • Using the DITA-OT User Guide 1393f62b. The dropped see-also terms are present. That is fixed.

I am seeing primary entries that have sub-terms show page numbers. Several examples: Ant, dita command, DITAVAL, and many more. This is good.

In the case of DITAVAL, which is one example of several, the page numbers shown after the primary term are also shown after a sub-term. After some testing it appears as though the <index-see-also> markup is contributing a page number to the primary term. This doesn't seem correct to me.

Both of these entries below contribute a page locator to the primary DITAVAL entry (even the second one). I would expect no page locator and the term (clickable) in the "see also" list (for the relevant primary term).

<indexterm>DITAVAL<index-see-also>filters</index-see-also></indexterm>
<indexterm>profiling<index-see-also>DITAVAL</index-see-also></indexterm>

  • Using a private test document that showed results similar to the DITA-OT User Guide.

@robander
Copy link
Member Author

robander commented Oct 10, 2019

@lief-erickson That was not my understanding of index-see-also, based on this bit from the spec: http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part1-base/langRef/base/index-see-also.html

In addition to its "see also" redirection, an <index-see-also> functions as a pointwise index term, thereby typically generating a page reference as well as the "see also" indication.

As I've understood the specification and the implementation -- "see also" means "Index this point, and also put out a pointer to other terms that may be useful"; "index-see" does not put out a point, and is purely a redirect.

EDIT: wait I misunderstood part of your comment. With these two terms, I would expect the first to contribute a page link to DITAVAL, but not the second:
<indexterm>DITAVAL<index-see-also>filters</index-see-also></indexterm>
<indexterm>profiling<index-see-also>DITAVAL</index-see-also></indexterm>

@lief-erickson
Copy link
Contributor

lief-erickson commented Oct 10, 2019

@robander, you're right. The spec does indeed say otherwise. It's not my expectation, but the output does match the spec.

Edit: it appears there's wiggle room because of the word "typically." I would typically expect no pager numbers, but apparently someone else might.

Yes, even the second entry adds a page locator, which I found definitely odd. The first one I could possibly understand, but not the second.

jelovirt
jelovirt previously approved these changes Oct 10, 2019
@robander
Copy link
Member Author

@lief-erickson thanks for testing, and highlighting that issue. I've updated my test case to add this in the first topic:
<indexterm>ChildA</indexterm>
and these in the second topic:
<indexterm>ChildB<index-see-also>ChildA</index-see-also></indexterm>
<indexterm>ChildBSee<index-see>ChildA</index-see></indexterm>

As you noted -- both of those entries in the second topic incorrectly added page links to "ChildA", going to that second topic. I've got an update ready so that only the entry in the first topic creates a page link to "ChildA".

Signed-off-by: Robert D Anderson <robander@us.ibm.com>
@robander robander merged commit d529c79 into dita-ot:develop Oct 10, 2019
@robander robander deleted the hotfix/3320 branch October 10, 2019 14:10
@robander robander added this to the 3.4 milestone Oct 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug index Issue related to index sorting or rendering plugin/pdf Issue related to PDF plug-in priority/medium Medium (or unknown) priority issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index entries missing if child entries exist PDF index <index-see-also> and <index-see> issues
4 participants