Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

attr_list breaks if the value contains braces, requires HTML-encoding them #1390

Closed
iBug opened this issue Oct 12, 2023 · 16 comments · Fixed by #1414
Closed

attr_list breaks if the value contains braces, requires HTML-encoding them #1390

iBug opened this issue Oct 12, 2023 · 16 comments · Fixed by #1414
Labels
someday-maybe Approved low priority request.

Comments

@iBug
Copy link

iBug commented Oct 12, 2023

This issue originated from squidfunk/mkdocs-material#6177 where I was exploring the "override copied text" feature.

The following code block breaks Markdown parsing:

``` { .c data-copy="int main() { return 0; }" }
Try copying me for some C code
```

It doesn't quite make sense to me as everything is already inside double quotes.

Instead, HTML-encoding the braces (and you can add newlines) solves the problem:

``` { .c data-copy="int main() {
  return 0;
&#125" }
Try copying me for some C code
```

Using HTML-encoding is not intuitive to most users. Maybe this could get changed to Python escapes?

@waylan
Copy link
Member

waylan commented Oct 16, 2023

Interesting issue. When the feature was implemented, we did not foresee any sort of formatted text being passed in as a value, which explains why you need to jump through hoops to make it work. For example, we might expect a class name or a value for an HTML title attribute. In either case, there would be no markup. Although, I suppose a title attribute could conceivably contain curly braces. I'll have to think about this.

Regarding the specific feature of mkdocs-material; I find it very strange that the feature even exists. I can't imagine any scenario where anyone would want that specific feature. Of course, use cases should not be limited by my imagination (or lack thereof), but it does give me pause. I will certainly not be adding specific support for newlines or other advanced formatting. The only thing I would consider here at all is to maybe allow curly braces within quoted values.

The tricky thing about this is that the initial regex does not parse the keys and values. It simply grabs the entire string (between the curly braces) and then passes that on to a separate key/value parser. We currently grab that string with \{(?P<attrs>[^\}\n]*)\} which does not allow any closing curly braces within the string, We would need to change that portion of the regex to be more greedy and match to the last closing curly brace on the line (maybe \{(?P<attrs>[^\n]*)\} or similar).

I'm curious how Superfences handles this scenario. Any input from @facelessuser or @squidfunk is welcome.

@waylan waylan added the needs-decision A decision needs to be made regarding request. label Oct 16, 2023
@squidfunk
Copy link

Thanks for the evaluation, @waylan.

Regarding the specific feature of mkdocs-material; I find it very strange that the feature even exists. I can't imagine any scenario where anyone would want that specific feature. Of course, use cases should not be limited by my imagination (or lack thereof), but it does give me pause. I will certainly not be adding specific support for newlines or other advanced formatting. The only thing I would consider here at all is to maybe allow curly braces within quoted values.

A user wanted to explicitly set the text for the clipboard button, which we implemented in squidfunk/mkdocs-material#6086. We consider this to be highly experimental and are considering removing this functionality again, if it turns out to be complicated for users to use. If you follow the issue, you can learn that the use case is that some users have comments in their code blocks which they don't want to be copied.

If this turns out to be too involved to implement, or it can only be partially solved (e.g. curly braces are okay, but everything else needs to be escaped) we will remove the feature again, since we don't consider the necessity for escape everything with HTML entities to be very user-friendly. It's too prone to errors.

@facelessuser
Copy link
Collaborator

SuperFences uses attr_lists if enabled, to handle parsing code block headers of this sort. I don't personally ever use anything complicated that requires { or } in my attributes, nor have I had a need so far. I am generally indifferent as I don't use the data-copy feature that Mkdocs Material has implemented either.

With that said, I can see how and why someone would desire sane handling. I do think it is possible to craft a solution that could consume strings properly. But I also would probably not care if it was decided { and } handling wasn't fixed. Again, I don't stress the attribute system in this way currently. I guess I can't rule out the possibility that one day I may run into such an issue, but if I did, since HTML escapes can navigate me around the issue, I'd probably be okay for the 1 or 2 times I may ever need it.

@waylan
Copy link
Member

waylan commented Oct 16, 2023

@squidfunk thanks for the explanation. I commented on your issue with a suggestion for a different way to solve your problem. Whether you go that route or another is your choice. But the best you can hope for from us here is to allow curly braces in values. All other limitations will remain.

@facelessuser the one question I was hoping you would answer you didn't. The attr_list parser is not the issue, The issue is that fenced_code doesn't pass the curly braces on to attr_list. I was wondering if maybe superfences wasn't so restrictive.

@facelessuser
Copy link
Collaborator

@waylan It passes whatever is in the attribute to attr_list. This is no different than what using attr_list on a header does.

If use the following, attr_list fails to parse the attributes and they are not applied.

# Test {data-test="{}"}

When using Superfences, such an attribute would cause parsing to fail as well. If fenced_code handles it special, we do not currently model that behavior.

@iBug
Copy link
Author

iBug commented Oct 16, 2023

If I were to fix this, this seems sufficient:

-((\{(?P<attrs>[^\}\n]*)\})|
+((\{(?P<attrs>[^\n]*)\})|

As it is currently written, the only thing that may come after the attr_list on the opening line is hl_lines, which I would not consider a candidate for bringing braces. Removing } from that character class allows greedy matching while staying on the opening line. Any ideas?

@waylan
Copy link
Member

waylan commented Oct 17, 2023

@facelessuser the attr_list parser handles curly braces just fine.

>>> from markdown.extensions.attr_list import get_attrs
>>> get_attrs('data-test="{}"')
[('data-test', '{}')]

Like fenced_code, headers fail because the regex which grabs the attribute list is failing, not the attr_list parser itself. And that is why I ask what Superfences behavior is. It's possible that the method you use to identify the list before passing to the list parser works differently and this is a nonissue.

@facelessuser
Copy link
Collaborator

@waylan I was comparing the actual processing of the values in Markdown:

import markdown

MD = R"""
## test {data-test="{}"}
"""

print(markdown.markdown(MD, extensions=['attr_list']))
<h2>test {data-test="{}"}</h2>

In general, we were being consistent with how attr_list was handled everywhere else, but yes, we could adjust our own pattern to deviate. If get_attrs handles them fine, we could possibly relax our code patterns.

@facelessuser
Copy link
Collaborator

I think as far as SuperFences is concerned, we have a clear starting indicator (```), so we can probably just relax the pattern.

For normal inline patterns, attr_list would need to have a more precise pattern to avoid gobbling a whole line. Again, I don't know if attr_list plans to fix this for other cases, but at least as far as code blocks are concerned, this may be an easy fix for SuperFences.

In the worst case, if we find undesirable examples of it being too greedy, we could craft a more explicit rule for ourselves.

@facelessuser
Copy link
Collaborator

The change has been made in SuperFences.

@waylan
Copy link
Member

waylan commented Oct 17, 2023

For normal inline patterns, attr_list would need to have a more precise pattern to avoid gobbling a whole line. Again, I don't know if attr_list plans to fix this for other cases, but at least as far as code blocks are concerned, this may be an easy fix

This is what gives me pause. While it is an easy fix for fenced code blocks, it introduces an inconsistency with all other attr_lists. And to adjust the others would be a much more involved process.

@facelessuser
Copy link
Collaborator

Yep, it is definitely more involved. While I think possible, and maybe even doable as a single (much more complex) pattern, it is not nearly as simple.

@squidfunk
Copy link

FYI, we removed this feature from our documentation. The current implementation is too fragile and will likely lead to many problems when being used by non-technical users. We don't feel comfortable providing support for it at the current stage.

@facelessuser
Copy link
Collaborator

Well, regardless, the next version of pymdown-extensions will handle braces better in code block attributes.

@waylan
Copy link
Member

waylan commented Oct 18, 2023

@squidfunk thanks for the update. Given the fact that (1) there is no officially supported need for it, (2) the proposed modification would introduce a inconsistency between attr_lists on fenced code blocks compared to all other elements, and (3) Superfences provides a working solution for those who really want it, I'm inclined to not make a change to fenced code blocks at this time.

That said, if someone were to submit a PR which consistently altered the attr_list behavior across all elements to allow curly braces, I would be wiling to give it consideration. Therefore, I'm not going to close this but will give it the [someday-maybe] label.

@waylan waylan added someday-maybe Approved low priority request. and removed needs-decision A decision needs to be made regarding request. labels Oct 18, 2023
@oprypin
Copy link
Contributor

oprypin commented Nov 11, 2023

That said, if someone were to submit a PR which consistently altered the attr_list behavior across all elements to allow curly braces, I would be wiling to give it consideration.

I believe it satisfies this requirement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
someday-maybe Approved low priority request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants