Xref does not support slash in the id #1540

Open
jmini opened this issue Oct 23, 2015 · 16 comments

@jmini

jmini commented Oct 23, 2015

It seems that slashes (/) are not supported in cross-reference IDs (the key used by an xref).

This does not work:

Goto <<img/sunset>>

== Images

[[img/sunset, Sunset Figure]]
.A sunset image
image::sunset.jpg[scaledwidth=75%]

This works:

Goto <<img-sunset>>

== Images

[[img-sunset, Sunset Figure]]
.A sunset image
image::sunset.jpg[scaledwidth=75%]

The only difference between the two snippets is the id:
img-sunset works and img/sunset does not work.

When a slash is used, more than just the xref breaks:

  • The image block is not interpreted correctly in the PDF output (and the "Goto" link behaves as if the anchor does not exist, which I think is acceptable).
  • The whole line containing the xref link ("Goto ...") is missing from the HTML output, and no anchor is defined near the image in the HTML source.

Tested with version 1.5.2 (used via Maven).

@mojavelinux
Member

That's because, as far as I understand, / is an illegal character for an ID. We implement a stricter version of the XML ID rules (i.e., NCName).

Here's the regexp we use specifically:

[\p{Alpha}:_][\p{Word}:.-]*

...which is perhaps easier to read in its non-Unicode-aware variant (note that \p{Word} includes the underscore):

[a-zA-Z:_][a-zA-Z0-9_:.-]*

Forward slash is not an allowed character.
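The ASCII variant can be tried out directly. This is a minimal Python sketch (not Asciidoctor's Ruby code) that checks whole candidate IDs with `re.fullmatch`; the function name `is_valid_id` is made up for the example, and the pattern is the ASCII approximation with `_` allowed after the first character.

```python
import re

# ASCII-only approximation of the ID rule quoted above:
# first char is a letter, ':' or '_'; the rest are letters, digits, '_', ':', '.', '-'.
ID_ASCII = re.compile(r"[a-zA-Z:_][a-zA-Z0-9_:.-]*")

def is_valid_id(candidate: str) -> bool:
    """Return True only if the entire string matches the ID pattern."""
    return ID_ASCII.fullmatch(candidate) is not None

for candidate in ["img-sunset", "img/sunset", "2nd-figure", "ns:part_1.2"]:
    print(f"{candidate!r}: {is_valid_id(candidate)}")
```

Here `img-sunset` and `ns:part_1.2` pass, while `img/sunset` (slash) and `2nd-figure` (leading digit) are rejected.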

@mojavelinux
Member

I'd also say that these rules for IDs are pretty universal. So much so that it would cause a lot of problems for converters if we were not this strict.

@mojavelinux
Member

Keep in mind that \p{Alpha} and \p{Word} match alphabetic and word characters, respectively, in all the world's languages (as defined by Unicode). So the following is a valid ID:

[[img-café]]
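Python's standard `re` module has no `\p{Alpha}`/`\p{Word}` classes, but `\w` is Unicode-aware for `str` patterns, so a rough approximation is possible. This sketch assumes `[^\W\d]` is close enough to "letter or underscore" for illustration; it is not an exact equivalent of the Ruby pattern.

```python
import re

# Rough Unicode-aware approximation of [\p{Alpha}:_][\p{Word}:.-]*.
# [^\W\d] = word characters that are not decimal digits (mostly letters and '_');
# this is close to, but not identical to, \p{Alpha} plus '_'.
ID_UNICODE = re.compile(r"(?:[^\W\d]|:)[\w:.-]*")

def is_valid_unicode_id(candidate: str) -> bool:
    return ID_UNICODE.fullmatch(candidate) is not None

print(is_valid_unicode_id("img-café"))    # True: 'é' is a word character
print(is_valid_unicode_id("img/sunset"))  # False: '/' is still rejected
```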

mojavelinux added this to the support milestone Oct 23, 2015
mojavelinux changed the title from "Xref does not support splash in the id" to "Xref does not support slash in the id" Nov 6, 2015
@mojavelinux
Member

Is this reasoning acceptable?

@jmini
Author

jmini commented Nov 7, 2015

This makes sense... I wanted to add a sentence in 29.1. Defining an Anchor.

In the HTML backend case, just dropping the complete line without any error is not really nice.

@mojavelinux
Member

I wanted to add a sentence in 29.1. Defining an Anchor.

Definitely. We should be very explicit about what a valid ID is and what to keep in mind when selecting characters. What might help is a "good examples" vs. "bad examples" kind of thing; it gets the point across well.

just dropping the complete line without any error is not really nice.

I think it just leaves it unparsed, doesn't it?

@jmini
Author

jmini commented Nov 8, 2015

I think it just leaves it unparsed, doesn't it?

OK, this is true; I had something else in mind. The image block is broken, and therefore the xref links switch to "anchor not found" mode.

@cirosantilli

Hmm, / appears to be allowed in HTML5 (https://stackoverflow.com/questions/70579/what-are-valid-values-for-the-id-attribute-in-html), and I want to use it as a workaround for #3148.

If the user explicitly requests a given ID, wouldn't it be better to just honor it?

E.g., this would be a good way to form IDs when nesting smaller topics under big subjects:

[[big-subject]]
== Big subject

[[big-subject/small-topic]]
=== Small topic

[[big]]
== Big

[[big/subject-small-topic]]
=== subject small topic
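The value of the slash in this scheme is that it keeps the boundary between subject and topic unambiguous. As a hypothetical illustration (the `flatten` helper is invented for this sketch), replacing the slash with a hyphen to satisfy the strict rule makes the two nested IDs above collide:

```python
def flatten(anchor_id: str) -> str:
    """Hypothetical sanitizer: replace '/' with '-' to satisfy the strict ID rule."""
    return anchor_id.replace("/", "-")

nested_ids = ["big-subject/small-topic", "big/subject-small-topic"]
flattened = [flatten(i) for i in nested_ids]
print(flattened)
# Both become 'big-subject-small-topic', so the two sections can no longer be told apart.
print(len(set(flattened)) == len(nested_ids))  # False
```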

@cirosantilli

cirosantilli commented Jul 27, 2019

The same goes for allowing the ID to start with a digit, which appears to be legal in HTML5. I also just found this related ticket: #3307
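For comparison, the HTML standard only requires an `id` value to be non-empty and free of ASCII whitespace (uniqueness aside). This minimal Python sketch puts that check next to the stricter ASCII rule discussed earlier; the function names are made up for the example.

```python
import re

# ASCII whitespace as defined by the HTML standard: TAB, LF, FF, CR, SPACE.
HTML_ASCII_WHITESPACE = set("\t\n\f\r ")

# ASCII approximation of the stricter XML-style rule discussed in this thread.
STRICT_ID = re.compile(r"[a-zA-Z:_][a-zA-Z0-9_:.-]*")

def is_html5_id(value: str) -> bool:
    """Non-empty and contains no ASCII whitespace (uniqueness is not checked)."""
    return bool(value) and not any(ch in HTML_ASCII_WHITESPACE for ch in value)

def is_strict_id(value: str) -> bool:
    return STRICT_ID.fullmatch(value) is not None

for value in ["big-subject/small-topic", "3-steps", "has space"]:
    print(f"{value!r}: html5={is_html5_id(value)} strict={is_strict_id(value)}")
```

Both the slashed ID and the digit-leading ID pass the HTML check but fail the strict one; only the value with a space fails both.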

@mojavelinux
Copy link
Member

mojavelinux commented Jul 27, 2019

The reason for the restricted syntax is historical. It's also because it ensures a valid ID for DocBook (which requires that the ID be a valid XML name). Changing it would diverge from AsciiDoc Python (which I'm not necessarily opposed to).

If you want to lift all restrictions today, you can use the [#id] syntax.

See #2777 (comment) for more context.

@cirosantilli
Copy link

Cool, thanks for mentioning the workaround Dan, I didn't know that one.

@mojavelinux
Copy link
Member

I did some thinking about this, and I think what counts as a valid ID should really be up to a validator or the user, not the processor. The one risk with changing the matcher is that it could end up matching lines it didn't previously match. Now, given that the ID is enclosed in double square brackets, I think that risk is pretty low. But we still have to put some boundaries on what we're matching. I think a slash is reasonable, as is a leading number. But we have to think hard about what we allow beyond that.

@cirosantilli
Copy link

Thanks for looking into this! Yes, definitely, I understand that all those decisions are hard to make. I wonder if anything would break if we just forbade the ] character.

@elextr

elextr commented Jul 27, 2019

I would strongly recommend also forbidding line breaks, so that if the user makes a mistake with the closing ], the matcher doesn't gobble up a large part of the document as an ID.

@mojavelinux
Copy link
Member

mojavelinux commented Jul 28, 2019

I wonder if anything would break if we just forbade the ] character.

That would go way too far, IMO. Of course, we're already line-based, so it would still be restricted to a single line. But when you think about what it could match, such as a paragraph that starts with [[ and ends with ]] with lots going on in between, you begin to realize how much it could potentially match. So it always has to be restricted. But I think we can loosen our definition of an ID value beyond what XML mandates.
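To make the trade-off concrete, here is a rough Python sketch of three anchor matchers. The patterns are illustrative assumptions, not Asciidoctor's actual regular expressions: a strict one following the ID rule above, a naive one that only forbids `]`, and a loosened one that adds `/` and leading digits but still uses an explicit character class. Since Asciidoctor works line by line, the multi-line sample only matters if a matcher is ever applied to a wider span of text.

```python
import re

# Illustrative patterns only; they are not Asciidoctor's real matchers.
STRICT = re.compile(r"\[\[([A-Za-z:_][\w:.-]*)(?:, *([^\]\n]+))?\]\]")
NAIVE = re.compile(r"\[\[([^\]]+)\]\]")  # anything except ']', even newlines
LOOSENED = re.compile(r"\[\[([\w:./-]+)(?:, *([^\]\n]+))?\]\]")

samples = [
    "[[img/sunset, Sunset Figure]]",
    "[[2nd-step]]",
    "[[oops, missing bracket\nNext paragraph]] text",
]

for text in samples:
    for name, pattern in [("strict", STRICT), ("naive", NAIVE), ("loosened", LOOSENED)]:
        m = pattern.search(text)
        captured = repr(m.group(1)) if m else "no match"
        print(f"{name:8} {text[:20]!r:26} -> {captured}")
```

The naive pattern captures the reference text as part of the ID on the first sample and runs across the line break on the third, while the loosened pattern accepts `img/sunset` and `2nd-step` but still refuses the broken anchor.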

@philipp2100

philipp2100 commented Mar 21, 2022

It seems that some work has been done since then; for example, ":" and "_" are now allowed in IDs (assuming they really weren't supported before).
However, "/" (and also ".") still do not work, even though the documentation suggests they should, so I think at least the documentation should be changed.

$ id=a.b; printf '%s\n' "[#$id]" 'mytext' '' "xref:#$id[reference] from same file" >included.adoc
$ asciidoctor included.adoc
$ grep -B1 mytext included.html
<div id="a" class="paragraph b">
<p>mytext</p>
$
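The output above is consistent with how the block attribute shorthand treats `.`: in `[#id.role%option]`, `#` introduces the ID, `.` a role and `%` an option, so `[#a.b]` is read as ID `a` plus role `b`. The following is a deliberately simplified, hypothetical Python parser (not Asciidoctor's implementation; it ignores quoting and escaping) that reproduces that split:

```python
import re

def parse_shorthand(attrlist: str) -> dict:
    """Hypothetical, simplified parser for '#id.role%option' shorthand.

    Each '#', '.' or '%' starts a new segment; quoting and escaping are ignored.
    """
    result = {"id": None, "roles": [], "options": []}
    for marker, value in re.findall(r"([#.%])([^#.%]*)", attrlist):
        if marker == "#":
            result["id"] = value
        elif marker == ".":
            result["roles"].append(value)
        else:
            result["options"].append(value)
    return result

print(parse_shorthand("#a.b"))
# {'id': 'a', 'roles': ['b'], 'options': []}
```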

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants