
Character references in link definition labels #616

Open
wooorm opened this issue Oct 1, 2019 · 5 comments

Comments

@wooorm
Contributor

wooorm commented Oct 1, 2019

  • Character references are allowed everywhere, except in fenced code, indented code, or code spans
  • They represent their resolved character, not syntax

Example 318 even shows them in link definition destinations and titles.

But, the following does not resolve into a link:

[©]: example.com

[©][]

I interpret the spec as saying that it should resolve, but the dingus does not resolve it.
This may be a bug in the dingus implementation rather than in the spec.
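A minimal Python sketch of the mismatch, assuming a hypothetical `normalize_label` helper (not taken from commonmark.js or any other implementation). With `decode_refs=False` it compares labels as raw strings, so a label written as `&copy;` and one written as `©` differ. With `decode_refs=True` it first decodes named, decimal, and hexadecimal character references, which is one possible reading of the behavior argued for here. Named references are looked up in Python's `html.entities.html5` table; replacing U+0000, surrogates, and out-of-range code points with U+FFFD follows the spec's rule for invalid numeric references.

```python
import re
from html.entities import html5

# Whitespace as defined by the CommonMark spec: space, tab, LF, VT, FF, CR.
_WS = " \t\n\v\f\r"
_WS_RUN = re.compile(r"[ \t\n\v\f\r]+")

# Hexadecimal (&#xA9;), decimal (&#169;) and named (&copy;) character references.
_CHAR_REF = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]{0,31}));"
)


def _decode_char_ref(m: re.Match) -> str:
    hex_digits, dec_digits, name = m.groups()
    if name is not None:
        # Unknown entity names are left as literal text.
        return html5.get(name + ";", m.group(0))
    code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    # U+0000, surrogates and out-of-range code points become U+FFFD.
    if code == 0 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
        return "\uFFFD"
    return chr(code)


def normalize_label(label: str, decode_refs: bool = False) -> str:
    """Hypothetical helper: normalize a bracketed link label such as "[foo]"."""
    inner = label[1:-1]  # strip the opening and closing brackets
    if decode_refs:
        inner = _CHAR_REF.sub(_decode_char_ref, inner)
    # Case fold, trim, and collapse internal whitespace runs to one space.
    return _WS_RUN.sub(" ", inner.casefold().strip(_WS))


# Raw-string matching: "&copy;" and "©" are different strings, so no match.
print(normalize_label("[&copy;]") == normalize_label("[©]"))  # False
# Decoding references first makes the two labels equal.
print(normalize_label("[&copy;]", True) == normalize_label("[©]", True))  # True
```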

@Crissov
Contributor

Crissov commented Oct 2, 2019

I agree that, to meet author expectation or intuition, character references of all kinds should be normalized in link labels (and elsewhere), especially since letter case is being ignored. Unfortunately, only a single implementation, Maruku, does it this way, although most CM-conformant parsers (and Pandoc) will happily convert any HTML entities to plain characters on output.

One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, strip leading and trailing whitespace and collapse consecutive internal whitespace to a single space.

Note that matching is performed on normalized strings, not parsed inline content. So the following does not match, even though the labels define equivalent inline content:

Example 541

[bar][foo\!]

[foo!]: /url
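The label normalization quoted above can be sketched in Python as follows. `normalize_label` and `labels_match` are hypothetical names, and Python's `str.casefold()` stands in for the Unicode case fold. Because matching compares normalized strings rather than parsed inlines, the two labels of Example 541 stay distinct.

```python
import re

# CommonMark whitespace characters: space, tab, LF, line tabulation, FF, CR.
_WS = " \t\n\v\f\r"
_WS_RUN = re.compile(r"[ \t\n\v\f\r]+")


def normalize_label(label: str) -> str:
    """Hypothetical helper following the quoted normalization steps."""
    inner = label[1:-1]               # strip the opening and closing brackets
    folded = inner.casefold()         # Unicode case fold
    trimmed = folded.strip(_WS)       # strip leading and trailing whitespace
    return _WS_RUN.sub(" ", trimmed)  # collapse internal whitespace runs


def labels_match(a: str, b: str) -> bool:
    return normalize_label(a) == normalize_label(b)


# Example 541: the strings differ, so the labels do not match,
# even though both would parse to the inline content "foo!".
print(labels_match("[foo\\!]", "[foo!]"))        # False
print(labels_match("[Foo  BAR]", "[foo\nbar]"))  # True
```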

The rules for the link text are the same as with inline links.

An inline link […]
character references in the destination will be parsed into the corresponding Unicode code points, as usual.

character references are recognized in any context besides code spans or code blocks, including URLs, link titles, […]

link label […]
The contents of the first link label are parsed as inlines, which are used as the link’s text.

The link text may contain inline content: [Example 526]

@mgeier
Contributor

mgeier commented Oct 2, 2019

This might be related to #572.

@wooorm
Contributor Author

wooorm commented Jul 4, 2020

Btw, I think this should be true for character escapes too:

[©]: a.com
[\!]: b.com

Both should link: [©], [!]

Yields:

Both should link: [©], [!]
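A minimal sketch of extending the same idea to backslash escapes, assuming a hypothetical `normalize_label_decoded` helper. Escapes and character references are decoded in a single left-to-right pass, because in CommonMark `\&copy;` is an escaped `&` followed by literal `copy;`, not a character reference; decoding one construct before the other would get that case wrong.

```python
import re
from html.entities import html5

# Whitespace as defined by the CommonMark spec: space, tab, LF, VT, FF, CR.
_WS = " \t\n\v\f\r"
_WS_RUN = re.compile(r"[ \t\n\v\f\r]+")

# One pattern for both constructs, scanned left to right in a single pass:
#   group 1: backslash escape of an ASCII punctuation character
#   groups 2-4: hexadecimal, decimal, or named character reference
_ESCAPE_OR_REF = re.compile(
    r"\\([!-/:-@\[-`{-~])"
    r"|&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]{0,31}));"
)


def _decode(m: re.Match) -> str:
    escaped, hex_digits, dec_digits, name = m.groups()
    if escaped is not None:
        return escaped  # "\!" -> "!"
    if name is not None:
        return html5.get(name + ";", m.group(0))  # unknown names stay literal
    code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    # U+0000, surrogates and out-of-range code points become U+FFFD.
    if code == 0 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
        return "\uFFFD"
    return chr(code)


def normalize_label_decoded(label: str) -> str:
    """Hypothetical helper: decode escapes and references, then normalize."""
    inner = _ESCAPE_OR_REF.sub(_decode, label[1:-1])
    return _WS_RUN.sub(" ", inner.casefold().strip(_WS))


print(normalize_label_decoded("[&copy;]") == normalize_label_decoded("[©]"))  # True
print(normalize_label_decoded("[\\!]") == normalize_label_decoded("[!]"))  # True
print(normalize_label_decoded("[\\&copy;]"))  # &copy;
```

Since `re.sub` does not rescan its own replacements, `&amp;copy;` also decodes to the literal text `&copy;` rather than to `©`.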

@wooorm
Contributor Author

wooorm commented Jul 4, 2020

@jgm, is this something you agree with? If so, I can create a PR to clarify the docs.

@vassudanagunta

I don't think this needs clarification of the docs so much as a bug report against CommonMark.js.

That said, every single Markdown implementation but one fails this test.


4 participants