LinkReferenceDefinition followed by setText with blank line #395

Open
xoofx opened this Issue Feb 19, 2016 · 4 comments

Contributor

xoofx commented Feb 19, 2016

How should the following be parsed?

[foo]: 
  /url
  "title"
Test
====

[foo]

While implementing my parser, I generate a LinkReferenceDefinition and a proper setext heading. But the default CommonMark implementation treats the setext heading as a plain paragraph.

My implementation outputs:

<h1>Test</h1>
<a href="/url" title="title">foo</a>

but cmark generates this:

<h1>[foo]:
/url
&quot;title&quot;
Test</h1>
<p>[foo]</p>
Member

jgm commented Feb 19, 2016

Actually, current cmark gives:

[foo]: /url "title" Test

[foo]

The spec is currently unclear about this. It says that a link
reference definition cannot interrupt a paragraph, but it doesn't
say anything about whether a paragraph or another block element,
like a header, can directly follow it, without intervening blank space.

The parsers use the following method: when they've accumulated
a bunch of lines of paragraph text, they first try to see if
the initial lines can be parsed as link reference definitions.
Anything that remains is still a paragraph. But in this case
the whole thing is being taken as a multiline setext header,
so the parsing for reference link defs doesn't occur.

Member

jgm commented Oct 28, 2016

I think this is really an implementation bug rather than a problem with the spec.
The spec says that the lines that form the content of a setext header must be such that, in the absence of the header line, they'd form a paragraph. That's not the case in this example; they'd form a link definition + a paragraph.

So the implementation needs to be fixed; we need to resolve link references before generating a setext header (and check that there's still some residual paragraph content left).
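
A minimal sketch of that fix, in Python rather than cmark's C. The names `LINK_REF_DEF`, `strip_link_ref_defs`, `close_with_setext_underline` and `Block` are hypothetical, and the regular expression covers only a simplified subset of link reference definitions (bare destinations, quoted titles, at most one line ending between parts). It strips leading definitions from the accumulated lines first, and only produces a heading if some paragraph content is left.

```python
import re
from typing import NamedTuple, Optional

# Simplified link reference definition; NOT the full CommonMark grammar.
LINK_REF_DEF = re.compile(
    r"""
    [ ]{0,3}\[(?P<label>[^\[\]]+)\]:     # up to 3 spaces, then [label]:
    [ \t]*\n?[ \t]*                      # whitespace with at most one line ending
    (?P<dest>\S+)                        # destination (bare form only)
    (?:[ \t]*\n?[ \t]*                   # optional title, possibly on the next line
       (?P<title>"[^"\n]*"|'[^'\n]*'))?
    [ \t]*(?:\n|$)                       # nothing else may follow on the line
    """,
    re.VERBOSE,
)


class Block(NamedTuple):
    kind: str
    text: str


def strip_link_ref_defs(text: str, refs: dict) -> str:
    """Peel leading definitions off `text`, record them in `refs`, return the rest."""
    while True:
        m = LINK_REF_DEF.match(text)
        if m is None:
            return text
        title = m.group("title")
        # The first definition of a label wins; lower() stands in for real
        # label normalization.
        refs.setdefault(
            m.group("label").lower(),
            (m.group("dest"), title[1:-1] if title else None),
        )
        text = text[m.end():]


def close_with_setext_underline(lines, underline: str, refs: dict) -> Optional[Block]:
    """Return a heading block, or None if no paragraph content is left."""
    residual = strip_link_ref_defs("\n".join(lines), refs).strip()
    if not residual:
        return None  # the caller must decide what to do with the underline
    level = 1 if underline.lstrip().startswith("=") else 2
    return Block(f"h{level}", residual)


refs = {}
heading = close_with_setext_underline(
    ["[foo]: ", "  /url", '  "title"', "Test"], "====", refs
)
print(heading, refs)
```

With the input from the first comment, this yields a level-1 heading containing only `Test` and a definition for `foo`. What should happen when nothing is left over is deliberately left open here.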

Contributor

xoofx commented Oct 29, 2016

So the implementation needs to be fixed; we need to resolve link references before generating a setext header (and check that there's still some residual paragraph content left).

Yes, that's what I'm doing in markdig.

Contributor

mgeier commented May 17, 2017

I think this is related to #461.

Not only do link reference definitions have to be stripped; after that, leading and trailing whitespace has to be stripped as well.
Only if something is left after that should a setext header or paragraph be created.

There is still a problem, though:

[a]: b
======

If there is nothing left to turn this into a setext header, what's supposed to happen with the line of equals signs?

markdig seems to swallow them: http://johnmacfarlane.net/babelmark2/?normalize=1&text=%5Ba%5D%3A+b%0A%3D%3D%3D%3D%3D%3D.
@xoofx This doesn't sound right to me.

The problem is that checking for the link reference definition should IMHO be "undoable".
This is how I imagine a parser could "think" about the above input:

  • Does the line [a]: b create a block? -> No. Do nothing, but keep the line somewhere.
  • Does the line ====== create a block? -> Possibly, if the preceding lines look like inline content.
    • But wait! We also have to check for link reference definitions!
    • It happens that the previous line is a valid link reference definition.
    • But wait! No inline content is left, so this isn't a setext header after all!
    • We should check whether the line ====== creates some other block.
    • But for that to make sense, we have to undo the parsing of the link reference definition and keep the line [a]: b for later use, right?
    • It turns out that ====== doesn't create a block in this case, so let's do nothing but keep the line somewhere (together with the un-parsed line [a]: b from before).
  • End of input. We should check if we kept some lines and handle them now.
    • Check (again) for link reference definition. We found one, let's keep it!
    • After stripping leading and trailing whitespace, there is still one line of ====== left, so we should put it into a paragraph!
  • Done.
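
The steps above can be sketched as a small line loop. This is only an illustration under stated assumptions: `parse`, `strip_defs`, `LINK_REF_DEF` and `SETEXT_UNDERLINE` are hypothetical names, the definition pattern is a simplified subset of the real grammar, and no other block types (lists, thematic breaks, and so on) are handled. The "undo" is modelled by parsing definitions into a scratch table that is committed only if a heading is actually produced.

```python
import re

# Simplified link reference definition; NOT the full CommonMark grammar.
LINK_REF_DEF = re.compile(
    r"""
    [ ]{0,3}\[(?P<label>[^\[\]]+)\]:     # up to 3 spaces, then [label]:
    [ \t]*\n?[ \t]*                      # whitespace with at most one line ending
    (?P<dest>\S+)                        # destination (bare form only)
    (?:[ \t]*\n?[ \t]*                   # optional title, possibly on the next line
       (?P<title>"[^"\n]*"|'[^'\n]*'))?
    [ \t]*(?:\n|$)                       # nothing else may follow on the line
    """,
    re.VERBOSE,
)
SETEXT_UNDERLINE = re.compile(r"[ ]{0,3}(=+|-+)[ \t]*$")


def strip_defs(text, refs):
    """Peel leading definitions off `text`, record them in `refs`, return the rest."""
    while True:
        m = LINK_REF_DEF.match(text)
        if m is None:
            return text
        title = m.group("title")
        refs.setdefault(m.group("label").lower(),
                        (m.group("dest"), title[1:-1] if title else None))
        text = text[m.end():]


def parse(source):
    refs, blocks = {}, []
    pending = []  # lines kept "somewhere" until we know what they are

    def flush_paragraph():
        # Blank line or end of input: check (again) for definitions, then
        # put whatever non-whitespace text remains into a paragraph.
        text = strip_defs("\n".join(pending), refs).strip()
        pending.clear()
        if text:
            blocks.append(("p", text))

    for line in source.split("\n"):
        if not line.strip():
            flush_paragraph()
            continue
        m = SETEXT_UNDERLINE.match(line)
        if m and pending:
            trial = {}  # scratch table, so the definition parse can be undone
            residual = strip_defs("\n".join(pending), trial).strip()
            if residual:
                for label, target in trial.items():
                    refs.setdefault(label, target)  # commit the definitions
                pending.clear()
                level = 1 if m.group(1)[0] == "=" else 2
                blocks.append((f"h{level}", residual))
                continue
            # No inline content left: drop `trial` and keep the line as text.
        pending.append(line)
    flush_paragraph()
    return blocks, refs


print(parse("[a]: b\n======"))
```

For `[a]: b` followed by `======`, this sketch keeps the definition and emits the equals signs as a paragraph instead of swallowing them. For the example that opened the issue, it produces a heading `Test` followed by a paragraph `[foo]`.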

Does this sound reasonable?

Checking if the line ====== creates some other block (other than a setext header) might look superfluous, but I think it is nicer for potential extensions to keep this step.

Also, it would seem strange to me if the parser creates a link reference while parsing something that in the end turns out to be no match.

@jgm jgm added this to the 0.29 milestone Aug 25, 2018
