Question about attribute names #714

AndrewLester · 2022-06-16T23:09:02Z

From the spec, an attribute name must start with an ASCII letter, _, or :.

The way this is implemented can lead to different results in parsing (as far as I can tell, though I may be wrong). Take two p tags with invalid attributes named --foo, one inline and one as a block:

Inline: https://spec.commonmark.org/dingus/?text=inline%3A%20%3Cp%20--q%3D%22hi%22%20%2F%3E
Block: https://spec.commonmark.org/dingus/?text=%3Cp%20--q%3D%22hi%22%20%2F%3E

The inline example parses as text, presumably(?) because the attribute name isn't valid. The block example, on the other hand, parses as an HTML block with the attribute intact.

Regardless of all this, I was just wondering why attribute names can only start with those specific characters.

The text was updated successfully, but these errors were encountered:

jgm · 2022-06-17T00:38:20Z

Block HTML rules are a bit different, yes -- that's partly because we want to parse line-by-line, but may not have a complete tag on a line.

Regardless of all this, I was just wondering why attribute names can only start with those specific characters.

I honestly don't remember, but presumably we were targeting the old XML rules?
Also, if anything goes, there is more room for ambiguity.
Is there a specific need to relax this?

AndrewLester · 2022-06-17T05:06:22Z

Also, if anything goes, there is more room for ambiguity.

That's a good point, and I probably should have just started with the reason I was asking.

Essentially, I've been using an MDX binding for the frontend JS framework Svelte, and one useful Svelte feature is passing CSS variables to components. Using components in Svelte is just like using a tag in HTML, and passing CSS variables is like adding an attribute. It looks like so:

<Component --var="val" />

Since the spec doesn't support attributes starting with -, the parser doesn't output an HTML block for this code (if it's inline) when that's what is necessary for the binding to work.

I'm not too experienced with writing such bindings, so there may be a better way to go about fixing this in the binding itself. That said, after looking at this sentence in the spec,

Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.

It seems like it would make sense to support more forms of attribute names. Perhaps names starting with -.

wooorm · 2022-06-18T20:37:17Z

This isn’t MDX btw. This is a different language.

I don’t understand what CommonMark has to do with this. You are using mdsvex, which has its own parser, which parses this. It is expected that other parsers don’t support mdsvex’s custom additions.

AndrewLester · 2022-06-19T03:27:12Z

I don’t understand what CommonMark has to do with this. You are using mdsvex, which has its own parser, which parses this. It is expected that other parsers don’t support mdsvex’s custom additions.

That's a great point, and yes MDsveX does have its own parser, which is built upon the unified js pipeline and incorporates remark-parse. I might be mistaken, but remark-parse uses some other libraries that follow this spec? When remark-parse passes over the code it gives a text node for what would be nice to have as HTML, just like in the demo. Nice to have meaning, if it parsed as HTML, MDsveX wouldn't have to make any changes. Nor would any other parsers that may deal with a language incorporating such syntax (which is actually just valid HTML). Let me know if my explanation here isn't great, as again I'm not too experienced with this.

wooorm · 2022-06-19T15:39:25Z

remark-parse changed a bunch. It now indeed uses a set of lower-level packages (micromark) to deal with parsing. As far as I am aware, MDsveX is on an older version where all of that was still done in `remark-parse.

The problem, compared to MDX, is that remark itself doesn’t understand MDsveX, and it can’t, because the syntax extensions are not made as a plugin.
remark can understand MDX, because it is available with a plugin: remark-mdx. Similar to how GFM can be understood by using remark-gfm.

Now, I am not sure your problems are solved by changing commonmark. I presume that there are many more changes needed. Probably changes that other people that do use commonmark but don’t use mdsvex wouldn’t agree with.

There is definitely something to say for improved HTML parsing here. Though there are downsides with relaxing things as John points out above.
But, relaxing these rules may not solve your root problem.
And, 100% supporting HTML is really complex and involves a lot of code, making every markdown parser out there like double in size!

AndrewLester · 2022-06-19T19:17:01Z

Now, I am not sure your problems are solved by changing commonmark. I presume that there are many more changes needed. Probably changes that other people that do use commonmark but don’t use mdsvex wouldn’t agree with.

From what I can tell, my specific issue is fixed by just making the small change to the spec/implementations I've noted above. I tested on MDsveX with its current remark-parse@8.0.2 with the small change, and on the latest version of remark-parse and making a micromark change. I also tested micromark with such a change and all tests pass.

There is definitely something to say for improved HTML parsing here. Though there are downsides with relaxing things as John points out above. But, relaxing these rules may not solve your root problem. And, 100% supporting HTML is really complex and involves a lot of code, making every markdown parser out there like double in size!

That's a good point, and I wouldn't argue for supporting 100% of the HTML spec. That would certainly take a lot of work on other people's parts because I wouldn't know where to start. This change, however, would fix an issue of mine, is quite a small addition (from what I can tell), and would reduce work on other people's parts (MDsveX, and anyone else trying to do use attributes starting with -).

remark-parse changed a bunch. It now indeed uses a set of lower-level packages (micromark) to deal with parsing. As far as I am aware, MDsveX is on an older version where all of that was still done in `remark-parse.

To me, this seems to be the primary blocker for any work here being valuable. MDsveX will need to update a lot to even obtain any changes to micromark, and in doing so it could just create one of those remark plugins anyway. Thank you for your explanation on this system by the way, as I didn't grasp it completely.

I guess all of this boils down to a tradeoff here, if I've even convinced you that changing the spec to support attribute names starting with - is worth it. Would MDsveX creating its own remark plugin or just updating from remark-parse 8.0.2 to 10.x be the better choice? The former seems like a lot more work to me, but may have future benefits. The latter might be simple, but would require the specific micromark/spec change to have any value.

To avoid dragging on this issue any further, I'll close it since my question was certainly answered. If there's reason to change the spec another issue can be opened. Thank you both for your time and advice.

wooorm · 2022-06-19T19:38:28Z

I do recommend that mdsvex update to modern remark and through that to switch to micromark. micromark exists a) to support commonmark completely, which the 10 year old parser did not, b) to support JSX and ESM completely, which MDX 1 did not but MDX 2 can due to micromark.

By the way, you could use MDX itself with svelte: https://mdxjs.com/docs/getting-started/#jsx. I still don’t really grasp your root problem so I am not sure how to best help you with what you want.

Adding this change, --foo, does not make MDX itself obsolete: MDX intentionally breaks with markdown in several ways to add more things, and instead of having “loose” HTML rules that are in commonmark and you suggest changing, it actually does parse JSX completely.

I am personally in favor of improving alignment of HTML in markdown here (see e.g., #713). But I think that should be disconnected from custom markdown flavors and be solely focussed on improving HTML support while not breaking folks.

AndrewLester closed this as completed Jun 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about attribute names #714

Question about attribute names #714

AndrewLester commented Jun 16, 2022 •

edited

Loading

jgm commented Jun 17, 2022

AndrewLester commented Jun 17, 2022 •

edited

Loading

wooorm commented Jun 18, 2022

AndrewLester commented Jun 19, 2022

wooorm commented Jun 19, 2022

AndrewLester commented Jun 19, 2022

wooorm commented Jun 19, 2022

Question about attribute names #714

Question about attribute names #714

Comments

AndrewLester commented Jun 16, 2022 • edited Loading

jgm commented Jun 17, 2022

AndrewLester commented Jun 17, 2022 • edited Loading

wooorm commented Jun 18, 2022

AndrewLester commented Jun 19, 2022

wooorm commented Jun 19, 2022

AndrewLester commented Jun 19, 2022

wooorm commented Jun 19, 2022

AndrewLester commented Jun 16, 2022 •

edited

Loading

AndrewLester commented Jun 17, 2022 •

edited

Loading