Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about attribute names #714

Closed
AndrewLester opened this issue Jun 16, 2022 · 7 comments
Closed

Question about attribute names #714

AndrewLester opened this issue Jun 16, 2022 · 7 comments

Comments

@AndrewLester
Copy link

AndrewLester commented Jun 16, 2022

From the spec, an attribute name must start with an ASCII letter, _, or :.

The way this is implemented can lead to different results in parsing (as far as I can tell, though I may be wrong). Take two p tags with invalid attributes named --foo, one inline and one as a block:

Inline: https://spec.commonmark.org/dingus/?text=inline%3A%20%3Cp%20--q%3D%22hi%22%20%2F%3E
Block: https://spec.commonmark.org/dingus/?text=%3Cp%20--q%3D%22hi%22%20%2F%3E

The inline example parses as text, presumably(?) because the attribute name isn't valid. The block example, on the other hand, parses as an HTML block with the attribute intact.

Regardless of all this, I was just wondering why attribute names can only start with those specific characters.

@jgm
Copy link
Member

jgm commented Jun 17, 2022

Block HTML rules are a bit different, yes -- that's partly because we want to parse line-by-line, but may not have a complete tag on a line.

Regardless of all this, I was just wondering why attribute names can only start with those specific characters.

I honestly don't remember, but presumably we were targeting the old XML rules?
Also, if anything goes, there is more room for ambiguity.
Is there a specific need to relax this?

@AndrewLester
Copy link
Author

AndrewLester commented Jun 17, 2022

Also, if anything goes, there is more room for ambiguity.

That's a good point, and I probably should have just started with the reason I was asking.

Essentially, I've been using an MDX binding for the frontend JS framework Svelte, and one useful Svelte feature is passing CSS variables to components. Using components in Svelte is just like using a tag in HTML, and passing CSS variables is like adding an attribute. It looks like so:

<Component --var="val" />

Since the spec doesn't support attributes starting with -, the parser doesn't output an HTML block for this code (if it's inline) when that's what is necessary for the binding to work.

I'm not too experienced with writing such bindings, so there may be a better way to go about fixing this in the binding itself. That said, after looking at this sentence in the spec,

Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.

It seems like it would make sense to support more forms of attribute names. Perhaps names starting with -.

@wooorm
Copy link
Contributor

wooorm commented Jun 18, 2022

This isn’t MDX btw. This is a different language.

I don’t understand what CommonMark has to do with this. You are using mdsvex, which has its own parser, which parses this. It is expected that other parsers don’t support mdsvex’s custom additions.

@AndrewLester
Copy link
Author

I don’t understand what CommonMark has to do with this. You are using mdsvex, which has its own parser, which parses this. It is expected that other parsers don’t support mdsvex’s custom additions.

That's a great point, and yes MDsveX does have its own parser, which is built upon the unified js pipeline and incorporates remark-parse. I might be mistaken, but remark-parse uses some other libraries that follow this spec? When remark-parse passes over the code it gives a text node for what would be nice to have as HTML, just like in the demo. Nice to have meaning, if it parsed as HTML, MDsveX wouldn't have to make any changes. Nor would any other parsers that may deal with a language incorporating such syntax (which is actually just valid HTML). Let me know if my explanation here isn't great, as again I'm not too experienced with this.

@wooorm
Copy link
Contributor

wooorm commented Jun 19, 2022

remark-parse changed a bunch. It now indeed uses a set of lower-level packages (micromark) to deal with parsing. As far as I am aware, MDsveX is on an older version where all of that was still done in `remark-parse.

The problem, compared to MDX, is that remark itself doesn’t understand MDsveX, and it can’t, because the syntax extensions are not made as a plugin.
remark can understand MDX, because it is available with a plugin: remark-mdx. Similar to how GFM can be understood by using remark-gfm.

Now, I am not sure your problems are solved by changing commonmark. I presume that there are many more changes needed. Probably changes that other people that do use commonmark but don’t use mdsvex wouldn’t agree with.

There is definitely something to say for improved HTML parsing here. Though there are downsides with relaxing things as John points out above.
But, relaxing these rules may not solve your root problem.
And, 100% supporting HTML is really complex and involves a lot of code, making every markdown parser out there like double in size!

@AndrewLester
Copy link
Author

Now, I am not sure your problems are solved by changing commonmark. I presume that there are many more changes needed. Probably changes that other people that do use commonmark but don’t use mdsvex wouldn’t agree with.

From what I can tell, my specific issue is fixed by just making the small change to the spec/implementations I've noted above. I tested on MDsveX with its current remark-parse@8.0.2 with the small change, and on the latest version of remark-parse and making a micromark change. I also tested micromark with such a change and all tests pass.

There is definitely something to say for improved HTML parsing here. Though there are downsides with relaxing things as John points out above. But, relaxing these rules may not solve your root problem. And, 100% supporting HTML is really complex and involves a lot of code, making every markdown parser out there like double in size!

That's a good point, and I wouldn't argue for supporting 100% of the HTML spec. That would certainly take a lot of work on other people's parts because I wouldn't know where to start. This change, however, would fix an issue of mine, is quite a small addition (from what I can tell), and would reduce work on other people's parts (MDsveX, and anyone else trying to do use attributes starting with -).

remark-parse changed a bunch. It now indeed uses a set of lower-level packages (micromark) to deal with parsing. As far as I am aware, MDsveX is on an older version where all of that was still done in `remark-parse.

To me, this seems to be the primary blocker for any work here being valuable. MDsveX will need to update a lot to even obtain any changes to micromark, and in doing so it could just create one of those remark plugins anyway. Thank you for your explanation on this system by the way, as I didn't grasp it completely.

I guess all of this boils down to a tradeoff here, if I've even convinced you that changing the spec to support attribute names starting with - is worth it. Would MDsveX creating its own remark plugin or just updating from remark-parse 8.0.2 to 10.x be the better choice? The former seems like a lot more work to me, but may have future benefits. The latter might be simple, but would require the specific micromark/spec change to have any value.

To avoid dragging on this issue any further, I'll close it since my question was certainly answered. If there's reason to change the spec another issue can be opened. Thank you both for your time and advice.

@wooorm
Copy link
Contributor

wooorm commented Jun 19, 2022

I do recommend that mdsvex update to modern remark and through that to switch to micromark. micromark exists a) to support commonmark completely, which the 10 year old parser did not, b) to support JSX and ESM completely, which MDX 1 did not but MDX 2 can due to micromark.

By the way, you could use MDX itself with svelte: https://mdxjs.com/docs/getting-started/#jsx. I still don’t really grasp your root problem so I am not sure how to best help you with what you want.

Adding this change, --foo, does not make MDX itself obsolete: MDX intentionally breaks with markdown in several ways to add more things, and instead of having “loose” HTML rules that are in commonmark and you suggest changing, it actually does parse JSX completely.

I am personally in favor of improving alignment of HTML in markdown here (see e.g., #713). But I think that should be disconnected from custom markdown flavors and be solely focussed on improving HTML support while not breaking folks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants