Parse link references without knowledge of definitions #702

chrisjsewell · 2022-04-10T01:09:54Z

Heya, I would like to understand the rationale behind https://spec.commonmark.org/0.30/#example-568

[foo][bar][baz]

[baz]: /url

<p>[foo]<a href="/url">bar</a></p>

This means that neither users nor parsers can "understand" [foo][bar][baz] in isolation, but only after all definitions within the document have been identified.

Particularly for parsers (such as markdown-it and remark), this necessitates a bunch of extra complexity to run a "pre-parse" before one can actually parse the document in full.
In turn, it precludes any kind of streamed or incremental parsing, and makes it impossible to write a good regex-based syntax highlighter (such as TextMate grammars).
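
To make the dependency concrete, here is a rough sketch of how resolution has to work today. It is a simplification written for this thread, not the algorithm of markdown-it, remark, or the reference implementation: `resolve` and `BRACKET` are hypothetical names, and label normalization, HTML escaping, nesting, and inline links are all left out. The point is only that the same inline text is split differently depending on which definitions were found elsewhere in the document.

```python
import re

# Matches one bracketed span with no nested brackets (a simplification).
BRACKET = re.compile(r"\[([^\[\]]+)\]")


def resolve(inline, definitions):
    """Resolve reference links in `inline`; `definitions` maps label -> URL
    and must already hold every definition found in the whole document."""
    out = []
    pos = 0
    while pos < len(inline):
        first = BRACKET.match(inline, pos)
        if first is None:
            # Ordinary character: copy it through.
            out.append(inline[pos])
            pos += 1
            continue
        second = BRACKET.match(inline, first.end())
        if second is not None and second.group(1) in definitions:
            # Full reference: [text][label] where the label is defined.
            url = definitions[second.group(1)]
            out.append(f'<a href="{url}">{first.group(1)}</a>')
            pos = second.end()
        elif second is None and first.group(1) in definitions:
            # Shortcut reference: [label] not followed by another label.
            url = definitions[first.group(1)]
            out.append(f'<a href="{url}">{first.group(1)}</a>')
            pos = first.end()
        else:
            # No matching definition: the brackets stay literal text.
            out.append(first.group(0))
            pos = first.end()
    return "".join(out)


with_definition = resolve("[foo][bar][baz]", {"baz": "/url"})
without_definition = resolve("[foo][bar][baz]", {})
```

With `[baz]` defined, this gives the Example 568 result (`[foo]` literal, `bar` linked). With no definitions, all three brackets stay literal. Nothing in the inline text alone tells you which of the two you will get.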

I feel the output of this example should be:

<p>[foo][bar]<a href="/url">baz</a></p>

or even just

<p><a href="/url">baz</a></p>

i.e.

There would be a full parse, during which both [foo][bar] and [baz] are captured as link references in the AST.
During this parse, all definitions are also captured.
On conversion to HTML, when encountering the [foo][bar] link reference, with no matched definition, it would be output in its raw (encoded) format, or even just omitted.
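
As a minimal sketch of those three steps (everything here is hypothetical and written only for illustration: the `Text` and `LinkRef` node types, the `parse` and `render` functions, and the simplification that every non-blank line is its own paragraph):

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class LinkRef:
    raw: str    # original source text, kept for the fallback
    text: str   # link text
    label: str  # label to look up at render time


DEFINITION = re.compile(r"^\[([^\[\]]+)\]:\s*(\S+)\s*$")
# [text][label], [text][] or a bare [label]; no nested brackets.
REFERENCE = re.compile(r"\[([^\[\]]+)\](?:\[([^\[\]]*)\])?")


def normalize(label):
    # Rough label normalization: case-fold and collapse whitespace.
    return " ".join(label.casefold().split())


def parse(source):
    """Single pass: collect definitions and link-reference nodes together,
    without using the definitions to decide what counts as a reference."""
    definitions, paragraphs = {}, []
    for line in source.splitlines():
        match = DEFINITION.match(line)
        if match:
            definitions.setdefault(normalize(match.group(1)), match.group(2))
            continue
        if not line.strip():
            continue
        nodes, pos = [], 0
        for ref in REFERENCE.finditer(line):
            if ref.start() > pos:
                nodes.append(Text(line[pos:ref.start()]))
            text = ref.group(1)
            nodes.append(LinkRef(ref.group(0), text, ref.group(2) or text))
            pos = ref.end()
        if pos < len(line):
            nodes.append(Text(line[pos:]))
        paragraphs.append(nodes)
    return paragraphs, definitions


def render(paragraphs, definitions):
    """Resolve references only now; unmatched ones fall back to raw text."""
    out = []
    for nodes in paragraphs:
        parts = []
        for node in nodes:
            if isinstance(node, Text):
                parts.append(html.escape(node.value))
                continue
            url = definitions.get(normalize(node.label))
            if url is None:
                parts.append(html.escape(node.raw))
            else:
                parts.append(
                    f'<a href="{html.escape(url)}">{html.escape(node.text)}</a>'
                )
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)


result = render(*parse("[foo][bar][baz]\n\n[baz]: /url"))
```

Here `[foo][bar]` is always paired greedily from the left, whatever the definitions say, so `result` is the first output suggested above rather than the one in the spec. Omitting unresolved references instead would be a one-line change in `render`.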

Is there any rationale to Example 568 that I am missing?

In fact, the syntax highlighting here on GitHub demonstrates exactly the problem, in that it cannot "work out" what is a link reference, and incorrectly highlights [foo].


jgm · 2022-04-10T02:42:58Z

This is exactly the point I made here: https://johnmacfarlane.net/beyond-markdown.html#reference-links
It is one of a number of things I would have done differently if we were not constrained by compatibility with existing markdown behavior.

jgm · 2022-04-10T02:49:19Z

Oh, and as I say in the article: changing things so that links can be recognized without parsing the whole document means no more "shortcut" links, e.g. [foo]. (Unless you want to recognize everything of that form as a link, which then requires escaping of every literal [ character.) I think many markdownists would regard this as a heavy cost.

chrisjsewell · 2022-04-10T02:57:57Z

Thanks for the link @jgm, that's really interesting, and glad to know that I wasn't completely alone in feeling this 😅

> if we were not constrained by compatibility with existing markdown behavior.

> I think many markdownists would regard this as a heavy cost.

So, I guess my question would be: do we have to forever be constrained by legacy, or is there any world where this could have some form of spec compliance? 😬

> Let me be clear up front that I’m not suggesting any change in the goals of the Commonmark project. If these reflections lead to anything, it should probably be an entirely new project under a new name.

Did you ever look into getting any "consensus" over your proposals?

chrisjsewell mentioned this issue Apr 13, 2022

Better link syntax for cross-references executablebooks/MyST-Parser#548

Open

chrisjsewell mentioned this issue Dec 5, 2022

Include directive: Allow separate includes for reference links and reference definition executablebooks/MyST-Parser#517

Open

chrisjsewell mentioned this issue Jun 7, 2023

Centralise indented code block checks markdown-it/markdown-it#936

Open

chrisjsewell mentioned this issue May 12, 2024

Add warning for invalid footnotes reference (and unused definitions) executablebooks/MyST-Parser#930

Open
