Add clarifying tests about HTML and block quotes #738
Hmm, this doesn’t have anything to do with block quotes, from what I understand? What’s going on in these two cases is that start condition 6 for HTML blocks (https://spec.commonmark.org/0.30/#html-blocks) matches. Some examples:

<div
> a

<di>
> b

<di
> c

The same happens for “containers” other than block quotes:

<div
* d

Or say headings:

<div
# e?
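To make the start-condition point concrete, here is a minimal Python sketch of a check for HTML block start condition 6. It is not taken from any CommonMark implementation: the tag list is deliberately abbreviated, and the names `CONDITION_6_TAGS` and `matches_condition_6` are made up for illustration.

```python
import re

# Abbreviated, illustrative subset of the tag names listed under
# start condition 6 in the spec (the real list is much longer).
CONDITION_6_TAGS = ["address", "blockquote", "div", "h1", "p", "table", "ul"]

# "<" or "</", one of the tag names, then a space, a tab, the end of
# the line, ">" or "/>". Up to three spaces of indentation are allowed.
CONDITION_6 = re.compile(
    r"^ {0,3}</?(?:%s)(?:[ \t]|/?>|$)" % "|".join(CONDITION_6_TAGS),
    re.IGNORECASE,
)


def matches_condition_6(line):
    """Return True if a single line could start an HTML block (kind 6)."""
    return CONDITION_6.match(line) is not None


for line in ["<div", "<di>", "<di"]:
    print(repr(line), matches_condition_6(line))
```

Only `<div` starts an HTML block here, so the `> a` line after it stays inside that block. With `<di>` and `<di` no HTML block starts, and the following `>` line is free to open a block quote.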
It's true that the <a
ping>
Right, but that’s something else. The inline rules/algos have nothing to do with how (block quotes and) HTML interact.
Which section of the spec do you think these test cases belong in, then?
From a quick glance, this seems close to the cases of https://spec.commonmark.org/0.30/#example-156 and 157? It has “emphasis” currently, and could also have say an ATX heading and a block quote?
@wooorm Okay, I sorted the examples under the HTML Block and Raw HTML sections.
I am pretty much in favor of this, I think it’s good to have!
spec.txt
<p><a
b></p>
````````````````````````````````
Not sure this one is needed; it can never be a block quote. But I'm not against it. I wonder what other folks think.
You're right. Example 614 already covers this case.
<p><a</p>
<blockquote>
</blockquote>
````````````````````````````````
It might be nice here to not have an empty block quote, but to actually have a word there? I don’t think many folks will want to write empty block quotes?
Sure. Let's do that.
A block quote can prevent a line from being parsed as inline HTML,
even though line breaks are allowed in tags:
Maybe something along the lines of “Block quotes, and other blocks such as headings or lists, take precedence over inline things:”? My intent here is to show that this isn’t about block quotes per se. Do you think something along those lines makes sense?
The trouble with bringing up other block structures is that their syntax isn't part of a valid HTML tag.
An attribute name consists of an ASCII letter, _, or :, followed by zero or more ASCII letters, digits, _, ., :, or -.
...
An open tag consists of a < character, a tag name, zero or more attributes, optional spaces, tabs, and up to one line ending, an optional / character, and a > character.
The weird thing about the spec is that the open tag syntax sounds like it allows <a\n>, but there's an entirely separate part of the specification (the rules around paragraph interruption) that means you can't actually write that. For an analogous, apparent ambiguity to show up, you'd have to be able to take the below template:
<a
SOMETHING>
... and find a string to fill in for SOMETHING that is both a valid HTML attribute and makes the line a valid instance of some block structure.
Since HTML attributes can't start with #, *, -, `, ~, or ASCII digits, that means it can't be ATX headings, bulleted lists, numbered lists, or fenced code blocks.
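The argument above can be checked mechanically. The following is a small Python sketch, assuming only the attribute-name rule quoted from the spec; the names `ATTRIBUTE_NAME`, `BLOCK_MARKER_STARTS`, and `can_start_attribute_name` are hypothetical and exist only for this illustration.

```python
import re

# Attribute name as quoted above: an ASCII letter, "_" or ":", followed
# by zero or more ASCII letters, digits, "_", ".", ":" or "-".
ATTRIBUTE_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")

# First characters of the block structures mentioned in the comment.
BLOCK_MARKER_STARTS = {
    "#": "ATX heading",
    "*": "bulleted list",
    "-": "bulleted list",
    "`": "fenced code block",
    "~": "fenced code block",
    "1": "numbered list",
}


def can_start_attribute_name(text):
    """Return True if an attribute name can begin at the start of text."""
    return ATTRIBUTE_NAME.match(text) is not None


for marker, block in BLOCK_MARKER_STARTS.items():
    print(repr(marker), block, can_start_attribute_name(marker))
```

None of the listed markers can begin an attribute name, which is why the second line of the `<a` / `SOMETHING>` template cannot be one of those block structures while the tag remains valid.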
I think this is a very good point: in the case of HTML blocks under start condition 7, line endings are not supported, even though the productions for open tags and closing tags could have them. 🤔
<xxx
yyy>
…turns into:
<p><xxx
yyy></p>
Importantly, the text for start condition 7 starts with “line begins with” and ends with “followed by the end of the line”, so I think the intent there is to say that it is all on one line. But it is probably good to add that all of this has to be on one line?
Maybe something like this?:
Start condition: line begins with a complete open tag (with any tag name other than pre, script, style, or textarea) or a complete closing tag, followed by zero or more spaces and tabs, **all on a single line,** followed by the end of the line.
I wonder what @jgm or others think!
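As a rough illustration of the "all on a single line" reading, here is a Python sketch that applies a simplified open tag pattern in two ways: once to the whole text (as inline raw HTML would see it) and once line by line (as a block start condition would). The regular expressions are an approximation of the spec's productions rather than any implementation's code. They allow more than one line ending inside a tag and ignore the rule that condition 7 cannot interrupt a paragraph. All names are made up for this example.

```python
import re

TAG_NAME = r"[A-Za-z][A-Za-z0-9-]*"
ATTR_NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"
ATTR_VALUE = r"""(?:[^\s"'=<>`]+|'[^']*'|"[^"]*")"""
ATTRIBUTE = rf"(?:\s+{ATTR_NAME}(?:\s*=\s*{ATTR_VALUE})?)"

# Simplified productions; \s also matches a line ending.
OPEN_TAG = rf"<{TAG_NAME}{ATTRIBUTE}*\s*/?>"
CLOSING_TAG = rf"</{TAG_NAME}\s*>"

# Condition 7 is tested against one line at a time.
CONDITION_7 = re.compile(rf"^ {{0,3}}(?:{OPEN_TAG}|{CLOSING_TAG})[ \t]*$")
TAG_START = re.compile(rf"^ {{0,3}}</?({TAG_NAME})")
EXCLUDED = {"pre", "script", "style", "textarea"}


def matches_condition_7(line):
    """Return True if a single line satisfies the sketch of condition 7."""
    if CONDITION_7.match(line) is None:
        return False
    name = TAG_START.match(line).group(1).lower()
    return name not in EXCLUDED


text = "<xxx\nyyy>"
print(bool(re.fullmatch(OPEN_TAG, text)))  # tag grammar over the whole text
print([matches_condition_7(line) for line in text.split("\n")])  # per line
```

The whole text matches the open tag pattern, but neither line on its own is a complete tag, so no HTML block starts and the input ends up as a paragraph containing inline HTML.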
I wouldn't object to adding "all on a single line" if you think it resolves a doubt or ambiguity.
Maybe something along the lines of “Block quotes, and other blocks such as headings or lists, take precedence over inline things:”? My intent here is to show that this isn’t about block quotes per se. Do you think something along those lines makes sense?
See the beginning of section 3.1, which makes this general assertion about the priority of block-level parsing over inline.
.
<div
>
````````````````````````````````
Maybe fill this “block quote” with something? I have the same comment on the 3rd new example too
This result makes sense, since inline HTML being inline implies that it is parsed after block quotes, while block HTML being copy-and-pasteable implies that it should eat Markdown syntax like block quotes. However, pulldown-cmark got this wrong, and apparently so do MD4C, markdown-it, and parsedown, according to [Babelmark 3].
[Babelmark 3]: https://babelmark.github.io/?text=%3Ca%0A%3E%0A%0A%3Cdiv%0A%3E
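For anyone who wants to see the exact input behind that Babelmark link without opening it, the `text` query parameter can be decoded with the Python standard library. This is just a convenience sketch; `babelmark_input` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

BABELMARK_URL = (
    "https://babelmark.github.io/?text=%3Ca%0A%3E%0A%0A%3Cdiv%0A%3E"
)


def babelmark_input(url):
    """Return the percent-decoded Markdown source from the 'text' parameter."""
    return parse_qs(urlsplit(url).query)["text"][0]


print(babelmark_input(BABELMARK_URL))
```

The decoded source is the two cases from this patch, separated by a blank line: `<a` followed by `>`, then `<div` followed by `>`.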
Are there any additional blocking concerns for this patch?
Looks good to me!