[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66

schlichtanders · 2023-11-09T16:26:55Z

better example

it turned out the actual confusion is better grasped by different examples
see #66 (comment)

original example

given the following text

html_str = "<bond def=\"mu\" unique_id=\"zOnak3/Y4WZl\"><input type='range' min='1' max='5' value='3'/><script>const input_el = currentScript.previousElementSibling; const output_el = currentScript.nextElementSibling; const displays = [\"-3\", \"-1\", \"0\", \"1\", \"3\"]; input_el.addEventListener(\"input\", () => {output_el.value = displays[input_el.valueAsNumber - 1]})</script><output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output></bond>"

it works perfectly in HTML, but CommonMark exchanges symbols such that the script does no longer work

"<p><bond def=\"mu\" unique_id=\"zOnak3/Y4WZl\"><input type='range' min='1' max='5' value='3'/><script>const input_el = currentScript.previousElementSibling; const output_el = currentScript.nextElementSibling; const displays = [&quot;-3&quot;, &quot;-1&quot;, &quot;0&quot;, &quot;1&quot;, &quot;3&quot;]; input_el.addEventListener(&quot;input&quot;, () =&gt; {output_el.value = displays[input_el.valueAsNumber - 1]});</script><output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output></bond></p> "

EDIT: the above does not parse because the html tag <bond> is non-standard. It is confusing example. Better look at #66 (comment)

I am using the following parser

import CommonMark
parser = CommonMark.Parser()
CommonMark.enable!(parser, CommonMark.DollarMathRule())
CommonMark.enable!(parser, CommonMark.TableRule())
parser(html_str)

Motivation

I am using CommonMark to create Pluto interface for Python. There the most intuitive way for interpolation is to interpolate valid html text into the plain markdown string. Hence it would be really really great, if <script> tags could be supported (they appear already in PlutoUI.Slider for instance)

The text was updated successfully, but these errors were encountered:

MichaelHatherly · 2023-11-09T16:41:44Z

CommonMark (the spec) has got some specific rules for handling of blocks of HTML embedded in markdown, which is what I'vve followed in this implementation. I'd suggest splitting the embedded HTML over several lines, e.g.

julia> html_str_2 = """
       <bond def="mu" unique_id="zOnak3/Y4WZl">
       <input type='range' min='1' max='5' value='3'/>
       <script>
       const input_el = currentScript.previousElementSibling;
       const output_el = currentScript.nextElementSibling;
       const displays = ["-3", "-1", "0", "1", "3"];
       input_el.addEventListener("input", () => {output_el.value = displays[input_el.valueAsNumber - 1]})
       </script>
       <output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output>
       </bond>
       """
"<bond def=\"mu\" unique_id=\"zOnak3/Y4WZl\">\n<input type='range' min='1' max='5' value='3'/>\n<script>\nconst input_el = currentScript.previousElementSibling;\nconst output_el = currentScript.nextElementSibling;\nconst displays = [\"-3\", \"-1\", \"0\", \"1\", \"3\"];\ninput_el.addEventListener(\"input\", () => {output_el.value = displays[input_el.valueAsNumber - 1]})\n</script>\n<output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output>\n</bond>\n"

julia> parser(html_str_2)
 <bond def="mu" unique_id="zOnak3/Y4WZl">
 <input type='range' min='1' max='5' value='3'/>
 <script>
 const input_el = currentScript.previousElementSibling;
 const output_el = currentScript.nextElementSibling;
 const displays = ["-3", "-1", "0", "1", "3"];
 input_el.addEventListener("input", () => {output_el.value = displays[input_el.valueAsNumber - 1]})
 </script>
 <output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output>
 </bond>


julia> html(ans)
"<bond def=\"mu\" unique_id=\"zOnak3/Y4WZl\">\n<input type='range' min='1' max='5' value='3'/>\n<script>\nconst input_el = currentScript.previousElementSibling;\nconst output_el = currentScript.nextElementSibling;\nconst displays = [\"-3\", \"-1\", \"0\", \"1\", \"3\"];\ninput_el.addEventListener(\"input\", () => {output_el.value = displays[input_el.valueAsNumber - 1]})\n</script>\n<output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output>\n</bond>\n"

julia> print(ans)
<bond def="mu" unique_id="zOnak3/Y4WZl">
<input type='range' min='1' max='5' value='3'/>
<script>
const input_el = currentScript.previousElementSibling;
const output_el = currentScript.nextElementSibling;
const displays = ["-3", "-1", "0", "1", "3"];
input_el.addEventListener("input", () => {output_el.value = displays[input_el.valueAsNumber - 1]})
</script>
<output style=' font-family: system-ui; font-size: 15px; margin-left: 3px; transform: translateY(-4px); display: inline-block;'>0</output>
</bond>

which, I think, might be closer to what you want.

schlichtanders · 2023-11-09T16:43:44Z

This is surprising indeed!

I just made everything into a single line to be able to interpolate it inside a Markdown Table (which will obviously break with newlines)

MichaelHatherly · 2023-11-09T16:47:48Z

Yeah, behaviour is looking pretty consistent with https://spec.commonmark.org/dingus/, so there's not much that we can really change in this package since it appears (at least from what I've tried) to be the same as the spec.

I just made everything into a single line to be able to interpolate it inside a Markdown Table (which will obviously break with newlines)

Hmm, yeah, tricky. There's really just a limit to the complexity that commonmark can really handle before it ends up not really being commonmark any more.

MichaelHatherly · 2023-11-09T16:48:36Z

https://spec.commonmark.org/0.30/#html-blocks for the link to the HTML block spec that is causing this.

MichaelHatherly · 2023-11-09T16:50:14Z

which will obviously break with newlines

Only solution I see really is multiline cells for markdown tables. I don't really have the need for it, but I would accept any PRs that can manage to implement it as a parser plugin, which I reckon is probably doable, if a bit complex.

schlichtanders · 2023-11-09T17:00:00Z

Multiline cells would not work, as they also would get destroyed by a interpolated newline.

I still do not understand why this is not treated as html from CommonMark: Is it because inside markdown tables html tags are never intrpreted as html, because the entire line does not start with an html tag?

schlichtanders · 2023-11-09T17:05:50Z

My current understanding of the specs:

html leaf blocks need to start with the tag and end with the tag (and a new line sometimes)
leaf blocks can be nested inside container blocks
markdown table is not specified inside CommonMark, but I guess it is best understood as a container block and not a leaf block

Hence every cell in a markdown table is a leaf block (what else should it be?). But then, the html in the cell should be recognized as html, shouldn't it?

schlichtanders · 2023-11-09T17:09:34Z

I just tried wrapping the above into <div></div> and indeed CommonMark does intrprete everything correctly.

but inside the markdown table it breaks

schlichtanders · 2023-11-09T17:29:16Z

Here new minimal examples:

Parsed Correctly

"<div> => </div>"

is parsed as

"<div> => </div> "

Parsed Wrongly (?)

"""
| title    |
| -------- |
|<div> => </div>|
"""

is parsed as

"<table><thead><tr><th align=\"left\">title</th></tr></thead><tbody><tr><td align=\"left\"><div> =&gt; </div></td></tr></tbody></table>"

the > is different

MichaelHatherly · 2023-11-09T17:50:14Z

The table parser was modeled on the github_markdown one from pandoc, so best check whether that behaviour you're seeing there matches, or not, with it. https://pandoc.org/try/

schlichtanders · 2023-11-09T17:59:05Z

As you probably already know, the pandoc github_markdown parser has the same unfortunate behaviour.

Summary:

CommonMark allows for markdown table cells to behave like leaf blocks
intuitively it makes sense that markdown table cells behave like leaf blocks (this interpretation seems most apt when considering CommonMark spec overall)
currently markdown table cells do not behave like leaf blocks (because the parser is modelled after pandoc, which has this behaviour)

Hence I would argue that it makes sense to treat markdown table cells as leaf blocks in CommonMark.jl

schlichtanders · 2023-11-09T18:00:56Z

title
=>

Github markdown seems different from pandocs implementation - you can inspect this very webpage - it shows => and no >. (also the div is rendered and not escaped)

MichaelHatherly · 2023-11-09T18:09:36Z

Feel free to submit a PR that adjusts that behaviour. I'm unlikely to get around to it myself since I'm not really in need of embedded HTML inside table cells myself.

schlichtanders · 2023-11-09T18:25:17Z

I just tested about 4 other python markdown parsers - and they all have the same problem... probably they all copy pandoc's implementation. Hence I have no easy workaround 😅

I indeed may find resources to create the PullRequest soon. Let's see.

I am glad you agree that this would be a good addition.
Looking forward to contribute.

schlichtanders · 2023-11-10T10:24:53Z

@MichaelHatherly a short question - the code mentions that emphasis and links necesarily need to be determined before the TableCells are build up.

Do you know why this is a *must*?

# Low priority since this *must* happen after nested structure of emphasis and
# links is determined. 100 should do fine.
inline_modifier(rule::TableRule) = Rule(100) do parser, block

MichaelHatherly · 2023-11-10T10:38:56Z

In the case where there are | characters inside of emphasis (or other inline syntax), since it needs to ignore those otherwise a | inside of them would be treated as a column separator which isn't the behaviour that is needed.

schlichtanders · 2023-11-10T14:33:29Z

Thank you for the explanation.

I read through the code and understood that a leaf block in CommonMark is kind of hardcoded to be multiline

leaf blocks consume full lines, sometimes skipping the start (like in lists) but never skipping the end

Hence I gave up 🙂 and instead build a tiny workaround which is much easier to maintain and should be enough for my case currently

using CommonMark
using Crayons

struct HtmlFragmentInlineRule end
struct HtmlFragmentInline <: CommonMark.AbstractInline end
function parse_html_fragment(parser::CommonMark.InlineParser, block::CommonMark.Node)
    m = CommonMark.consume(parser, match(r"<>.*</>", parser))
    m === nothing && return false
    node = CommonMark.Node(HtmlFragmentInline())
    node.literal = @views m.match[begin+length("<>"):end-length("</>")]
    CommonMark.append_child(block, node)
    return true
end
CommonMark.inline_rule(::HtmlFragmentInlineRule) = CommonMark.Rule(parse_html_fragment, 1.5, "<")

function CommonMark.write_term(::HtmlFragmentInline, render, node, enter)
    style = crayon"dark_gray"
    CommonMark.print_literal(render, style)
    CommonMark.push_inline!(render, style)
    CommonMark.print_literal(render, node.literal)
    CommonMark.pop_inline!(render)
    CommonMark.print_literal(render, inv(style))
end
CommonMark.write_html(::HtmlFragmentInline, r, n, ent) = CommonMark.literal(r, r.format.safe ? "<!-- raw HTML omitted -->" : n.literal)
CommonMark.write_latex(::HtmlFragmentInline, w, node, ent) = nothing
CommonMark.write_markdown(::HtmlFragmentInline, w, node, ent) = CommonMark.literal(w, node.literal)

with this I can now do

using CommonMark
parser = CommonMark.Parser()
CommonMark.enable!(parser, CommonMark.TableRule())
CommonMark.enable!(parser, HtmlFragmentInlineRule())

ast = parser("""
| title    |
| -------- |
|<><div> => </div></>|
""")
html(ast)

and it returns the html unchanged

"<table><thead><tr><th align=\"left\">title</th></tr></thead><tbody><tr><td align=\"left\"><div> => </div></td></tr></tbody></table>"

schlichtanders · 2023-11-10T14:34:57Z

@MichaelHatherly do you think this is worth adding to the extensions?

MichaelHatherly · 2023-11-10T14:39:14Z

and instead build a tiny workaround which is much easier to maintain and should be enough for my case currently

Glad you've found a solution.

do you think this is worth adding to the extensions?

For the time being I would suggest not making it an official extension and see how it evolves for your usecase. As I'm sure you can tell this package does not change much now so you're not going to run into any issues using some of those internals you needed. Let's just see whether it turns out to be something that a lot of users end up asking for and re-evaluate it in a while.

MichaelHatherly · 2023-11-10T14:40:41Z

m.match[begin+length("<>"):end-length("</>")]

Curious whether using a regex group rather than this work?

schlichtanders · 2023-11-10T14:49:14Z

m.match[begin+length("<>"):end-length("</>")]

Curious whether using a regex group rather than this work?

I was cautious not to interfere with the matching and consumption logic which is one reason why I am not using regex groups.
Do you think regex groups would have an advantage?

MichaelHatherly · 2023-11-10T14:54:49Z

I was just curious :)

schlichtanders · 2023-11-10T14:56:04Z

~~I guess they work to and are slightly faster :)~~

it does not work :D
the consume will return a string as a match and not the regex match itself. The string contains everything (as it probably should).

So the substring seems like the easiest solution

schlichtanders changed the title ~~[Bug?] CommonMark changes text inside <script> tags~~ [Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks Nov 9, 2023

schlichtanders mentioned this issue Nov 10, 2023

End html input elements with "/>" JuliaPluto/PlutoUI.jl#276

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66

[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66

schlichtanders commented Nov 9, 2023 •

edited

Loading

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023 •

edited

Loading

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023

schlichtanders commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023 •

edited

Loading

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023 •

edited

Loading

[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66

[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66

Comments

schlichtanders commented Nov 9, 2023 • edited Loading

better example

original example

Motivation

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

Parsed Correctly

Parsed Wrongly (?)

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 9, 2023 • edited Loading

MichaelHatherly commented Nov 9, 2023

schlichtanders commented Nov 9, 2023

schlichtanders commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023

schlichtanders commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023 • edited Loading

MichaelHatherly commented Nov 10, 2023

schlichtanders commented Nov 10, 2023 • edited Loading

schlichtanders commented Nov 9, 2023 •

edited

Loading

schlichtanders commented Nov 9, 2023 •

edited

Loading

schlichtanders commented Nov 10, 2023 •

edited

Loading

schlichtanders commented Nov 10, 2023 •

edited

Loading