Parse link references without knowledge of definitions #702

Open
chrisjsewell opened this issue Apr 10, 2022 · 3 comments

@chrisjsewell

Heya, I would like to understand the rationale behind https://spec.commonmark.org/0.30/#example-568

[foo][bar][baz]

[baz]: /url

<p>[foo]<a href="/url">bar</a></p>

This means that neither users nor parsers can "understand" [foo][bar][baz] in isolation, only after all definitions within the document have been identified.

For parsers in particular (such as markdown-it and remark), this adds considerable complexity: they must run a "pre-parse" to collect definitions before they can actually parse the document in full.
In turn, it precludes any kind of streamed or incremental parsing, and makes it impossible to write a good regex-based syntax highlighter (such as a TextMate grammar).
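
To make the dependency concrete, here is a rough Python sketch of how the current rules resolve a run of adjacent bracket groups. It is not the real CommonMark algorithm: `resolve` is a hypothetical helper, it only handles input made of adjacent `[...]` groups, and label normalization is reduced to case folding. The point is only that the same inline text gives different results depending on which definitions exist elsewhere in the document.

```python
import re

BRACKET = re.compile(r"\[([^\[\]]+)\]")

def resolve(text, definitions):
    """Resolve adjacent [..] groups left to right, given ALL definitions up front.

    Simplified assumption: a group followed by another group is only tried as a
    full reference [text][label]; a trailing group is tried as a shortcut [label].
    """
    groups = BRACKET.findall(text)
    out = []
    i = 0
    while i < len(groups):
        if i + 1 < len(groups):
            label = groups[i + 1].casefold()
            if label in definitions:
                # Full reference: this group is the text, the next is the label.
                out.append(f'<a href="{definitions[label]}">{groups[i]}</a>')
                i += 2
            else:
                # Followed by an undefined label: this group stays literal text.
                out.append(f"[{groups[i]}]")
                i += 1
        else:
            label = groups[i].casefold()
            if label in definitions:
                # Shortcut reference: the group is both text and label.
                out.append(f'<a href="{definitions[label]}">{groups[i]}</a>')
            else:
                out.append(f"[{groups[i]}]")
            i += 1
    return "".join(out)

only_baz = resolve("[foo][bar][baz]", {"baz": "/url"})
bar_and_baz = resolve("[foo][bar][baz]", {"baz": "/url1", "bar": "/url2"})
```

With only `[baz]` defined, `[foo]` ends up as literal text and `bar` becomes the link text; once `[bar]` is also defined, the grouping of the very same line changes.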

I feel the output of this example should be:

<p>[foo][bar]<a href="/url">baz</a></p>

or even just

<p><a href="/url">baz</a></p>

i.e.

  1. There would be a full parse, during which both [foo][bar] and [baz] are captured as link references in the AST.
  2. During this parse, all definitions are also captured.
  3. On conversion to HTML, when encountering the [foo][bar] link reference, with no matched definition, it would be output in its raw (encoded) format, or even just omitted.
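
The three steps above could be sketched roughly as follows in Python. Everything here is hypothetical and only for illustration: `LinkRef`, `parse_inline` and `render` are made-up names, the tokenizer only understands `[text][label]` and `[label]`, and definitions are passed in as a plain dict rather than collected from the document.

```python
import re
from dataclasses import dataclass
from html import escape

@dataclass
class LinkRef:
    text: str   # link text
    label: str  # reference label (same as text for a shortcut reference)
    raw: str    # original source, kept for unresolved references

# [text][label] if a second group follows directly, otherwise a shortcut [label].
TOKEN = re.compile(r"\[([^\[\]]+)\](?:\[([^\[\]]+)\])?")

def parse_inline(source):
    """Step 1: parse without consulting any definitions."""
    nodes, pos = [], 0
    for m in TOKEN.finditer(source):
        if m.start() > pos:
            nodes.append(source[pos:m.start()])
        nodes.append(LinkRef(m.group(1), m.group(2) or m.group(1), m.group(0)))
        pos = m.end()
    if pos < len(source):
        nodes.append(source[pos:])
    return nodes

def render(nodes, definitions, drop_unresolved=False):
    """Step 3: resolve references only when converting to HTML."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(escape(node))
        elif node.label.casefold() in definitions:
            url = definitions[node.label.casefold()]
            parts.append(f'<a href="{escape(url)}">{escape(node.text)}</a>')
        elif not drop_unresolved:
            # No matching definition: emit the raw (encoded) source text.
            parts.append(escape(node.raw))
    return "<p>" + "".join(parts) + "</p>"

nodes = parse_inline("[foo][bar][baz]")
definitions = {"baz": "/url"}  # step 2, assumed to be collected during the same parse
html_raw = render(nodes, definitions)
html_dropped = render(nodes, definitions, drop_unresolved=True)
```

Because `parse_inline` never looks at the definitions, the AST for a line is fixed as soon as the line has been read; only `render` needs the full set.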

Is there any rationale to Example 568 that I am missing?


In fact, the syntax highlighting, here on GitHub, demonstrates exactly the problem, in that it cannot "work out" what is a link reference, and incorrectly highlights [foo]:

(image: screenshot of GitHub's syntax highlighting of the example)

@jgm
Member

jgm commented Apr 10, 2022

This is exactly the point I made here: https://johnmacfarlane.net/beyond-markdown.html#reference-links
It is one of a number of things I would have done differently if we were not constrained by compatibility with existing markdown behavior.

@jgm
Member

jgm commented Apr 10, 2022

Oh, and as I say in the article: changing things so that links can be recognized without parsing the whole document means no more "shortcut" links, e.g. [foo]. (Unless you want to recognize everything of that form as a link, which then requires escaping of every literal [ character.) I think many markdownists would regard this as a heavy cost.
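
As a small illustration of that cost, here is a sketch in Python of the "recognize everything of that form" option. The rule and the `shortcut_candidates` helper are hypothetical, not part of any spec: every unescaped `[text]` is treated as a shortcut reference, with no definitions consulted.

```python
import re

# Hypothetical definition-free rule: any [text] not preceded by a backslash
# is a shortcut reference.
SHORTCUT = re.compile(r"(?<!\\)\[([^\[\]\\]+)\]")

def shortcut_candidates(text):
    """Return the labels that this rule would treat as link references."""
    return SHORTCUT.findall(text)

unescaped = shortcut_candidates("see [foo] and array[0]")
escaped = shortcut_candidates(r"see [foo] and array\[0]")
```

In the first call the literal `[0]` is picked up as a reference alongside `[foo]`; only the escaped form in the second call leaves it alone.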

@chrisjsewell
Author

chrisjsewell commented Apr 10, 2022

Thanks for the link @jgm, that's really interesting, and I'm glad to know that I wasn't completely alone in feeling this 😅

> if we were not constrained by compatibility with existing markdown behavior.

> I think many markdownists would regard this as a heavy cost.

So I guess my question would be: do we have to be constrained by legacy forever, or is there any world where this could have some form of spec compliance? 😬

> Let me be clear up front that I’m not suggesting any change in the goals of the Commonmark project. If these reflections lead to anything, it should probably be an entirely new project under a new name.

Did you ever look into getting any "consensus" over your proposals?
