[Bug?] CommonMark Markdown Table cells are not CommonMark leaf blocks #66
Comments
CommonMark (the spec) has some specific rules for handling blocks of HTML embedded in Markdown, which is what I've followed in this implementation. I'd suggest splitting the embedded HTML over several lines, e.g.
which, I think, might be closer to what you want. |
This is surprising indeed! I just made everything into a single line to be able to interpolate it inside a Markdown Table (which will obviously break with newlines) |
Yeah, behaviour is looking pretty consistent with https://spec.commonmark.org/dingus/, so there's not much that we can really change in this package since it appears (at least from what I've tried) to be the same as the spec.
Hmm, yeah, tricky. There's really just a limit to the complexity that commonmark can really handle before it ends up not really being commonmark any more. |
https://spec.commonmark.org/0.30/#html-blocks for the link to the HTML block spec that is causing this. |
Only solution I see really is multiline cells for markdown tables. I don't really have the need for it, but I would accept any PRs that can manage to implement it as a parser plugin, which I reckon is probably doable, if a bit complex. |
Multiline cells would not work, as they would also get destroyed by an interpolated newline. I still do not understand why CommonMark does not treat this as HTML: is it because HTML tags inside Markdown tables are never interpreted as HTML, since the line as a whole does not start with an HTML tag? |
My current understanding of the specs:
Hence every cell in a markdown table is a leaf block (what else should it be?). But then, the html in the cell should be recognized as html, shouldn't it? |
I just tried wrapping the above into but inside the markdown table it breaks |
Here are new minimal examples:
Parsed Correctly
is parsed as
Parsed Wrongly (?)
is parsed as
the |
The table parser was modeled on the |
As you probably already know, the pandoc github_markdown parser has the same unfortunate behaviour. Summary:
Hence I would argue that it makes sense to treat markdown table cells as leaf blocks in CommonMark.jl |
GitHub Markdown seems different from pandoc's implementation - you can inspect this very webpage - it shows |
Feel free to submit a PR that adjusts that behaviour. I'm unlikely to get around to it myself since I'm not really in need of embedded HTML inside table cells myself. |
I just tested about 4 other Python Markdown parsers - and they all have the same problem... probably they all copy pandoc's implementation. Hence I have no easy workaround 😅 I may indeed find resources to create the pull request soon. Let's see. I am glad you agree that this would be a good addition. |
@MichaelHatherly a short question - the code mentions that emphasis and links necessarily need to be determined before the TableCells are built up. Do you know why this is a
# Low priority since this *must* happen after nested structure of emphasis and
# links is determined. 100 should do fine.
inline_modifier(rule::TableRule) = Rule(100) do parser, block |
In the case where there are |
Thank you for the explanation. I read through the code and understood that a leaf block in CommonMark is kind of hardcoded to be multiline.
Hence I gave up 🙂 and instead built a tiny workaround which is much easier to maintain and should be enough for my case currently
using CommonMark
using Crayons

struct HtmlFragmentInlineRule end
struct HtmlFragmentInline <: CommonMark.AbstractInline end

# Inline parser hook: consume `<>...</>` and store the payload as a raw fragment.
function parse_html_fragment(parser::CommonMark.InlineParser, block::CommonMark.Node)
    m = CommonMark.consume(parser, match(r"<>.*</>", parser))
    m === nothing && return false
    node = CommonMark.Node(HtmlFragmentInline())
    # Strip the `<>` and `</>` delimiters from the matched text.
    node.literal = @views m.match[begin+length("<>"):end-length("</>")]
    CommonMark.append_child(block, node)
    return true
end

CommonMark.inline_rule(::HtmlFragmentInlineRule) = CommonMark.Rule(parse_html_fragment, 1.5, "<")

# Terminal output: print the fragment in dark gray.
function CommonMark.write_term(::HtmlFragmentInline, render, node, enter)
    style = crayon"dark_gray"
    CommonMark.print_literal(render, style)
    CommonMark.push_inline!(render, style)
    CommonMark.print_literal(render, node.literal)
    CommonMark.pop_inline!(render)
    CommonMark.print_literal(render, inv(style))
end

# HTML output: emit the fragment verbatim unless the renderer is in safe mode.
CommonMark.write_html(::HtmlFragmentInline, r, n, ent) = CommonMark.literal(r, r.format.safe ? "<!-- raw HTML omitted -->" : n.literal)
CommonMark.write_latex(::HtmlFragmentInline, w, node, ent) = nothing
CommonMark.write_markdown(::HtmlFragmentInline, w, node, ent) = CommonMark.literal(w, node.literal)

With this I can now do
using CommonMark
parser = CommonMark.Parser()
CommonMark.enable!(parser, CommonMark.TableRule())
CommonMark.enable!(parser, HtmlFragmentInlineRule())
ast = parser("""
| title |
| -------- |
|<><div> => </div></>|
""")
html(ast)
and it returns the HTML unchanged
|
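One thing that may be worth keeping in mind about the `r"<>.*</>"` pattern used in the workaround: `.*` is greedy. The sketch below only shows how the regex behaves on a plain string with Julia's built-in `match`; the sample row is hypothetical and not taken from the thread, and whether this matters inside CommonMark.jl depends on what text the inline parser actually sees, which the sketch does not exercise.

```julia
# Hypothetical row containing two fragments (an assumption for illustration).
row = "|<>a</>|<>b</>|"

# Greedy: runs from the first `<>` to the last `</>` in the string.
greedy = match(r"<>.*</>", row).match

# Lazy: stops at the first closing `</>`.
lazy = match(r"<>.*?</>", row).match

println(greedy)
println(lazy)
```

If more than one fragment per row is ever needed, the lazy quantifier `.*?` might be the safer choice.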
@MichaelHatherly do you think this is worth adding to the extensions? |
Glad you've found a solution.
For the time being I would suggest not making it an official extension and see how it evolves for your usecase. As I'm sure you can tell this package does not change much now so you're not going to run into any issues using some of those internals you needed. Let's just see whether it turns out to be something that a lot of users end up asking for and re-evaluate it in a while. |
Curious whether using a regex group rather than this would work? |
I was cautious not to interfere with the matching and consumption logic which is one reason why I am not using regex groups. |
I was just curious :) |
it does not work :D So the substring seems like the easiest solution |
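For reference, here is a minimal sketch of the two variants discussed above, run on a plain string with Julia's built-in `match` rather than through `CommonMark.consume`. On a plain string both give the same payload; the thread reports that the group variant did not work inside the parser, which this sketch does not reproduce or explain. The sample input is an assumption, and `@views` is omitted for simplicity.

```julia
# Hypothetical sample input using the `<>...</>` delimiters from the workaround.
line = "|<><div> => </div></>|"

# Variant 1: match the whole fragment, then slice off the delimiters.
m = match(r"<>.*</>", line)
payload_slice = m.match[begin+length("<>"):end-length("</>")]

# Variant 2: let a capture group select the payload directly.
g = match(r"<>(.*)</>", line)
payload_group = g.captures[1]

println(payload_slice)
println(payload_group)
```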
better example
it turned out the actual confusion is better grasped by different examples
see #66 (comment)
original example
given the following text
it works perfectly in HTML, but CommonMark exchanges symbols such that the script no longer works
EDIT: the above does not parse because the HTML tag <bond> is non-standard. It is a confusing example. Better look at #66 (comment)
I am using the following parser
Motivation
I am using CommonMark to create a Pluto interface for Python. There, the most intuitive way to interpolate is to insert valid HTML text into the plain Markdown string. Hence it would be really great if <script> tags could be supported (they already appear in PlutoUI.Slider, for instance)