Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra newlines between HTML blocks should be preserved #16

Open
tavianator opened this issue Jul 17, 2020 · 3 comments
Open

Extra newlines between HTML blocks should be preserved #16

tavianator opened this issue Jul 17, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@tavianator
Copy link
Contributor

tavianator commented Jul 17, 2020

stupicat turns this:

<p>
foo
</p>

<script>
let bar;

let baz;
</script>

into this:

<p>
foo
</p>
<script>
let bar;

let baz;
</script>

But apparently without the extra newline, pulldown-cmark doesn't think the <script> tag is its own block:

$ pulldown-cmark <foo.md
<p>
foo
</p>
<script>
let bar;

let baz;
</script>
$ cargo run --example stupicat -- foo.md | pulldown-cmark 
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/examples/stupicat foo.md`
<p>
foo
</p>
<script>
let bar;
<p>let baz;
</script></p>

Obviously the tags in the middle of the <script> mangle everything.

tavianator added a commit to tavianator/tavianator.com that referenced this issue Jul 18, 2020
@Byron Byron added the bug Something isn't working label Aug 2, 2020
@Byron
Copy link
Owner

Byron commented Aug 2, 2020

Thanks a lot for letting me know!
Here are some notes in case somebody wants to pick this issue up. pulldown-cmark isn't really supporting round trips, that is it degenerates information so it's hard to turn that into something that will parse exactly the same. Thus it's likely that fixing this will require additional state to be kept to detect such patterns and do something differently, possibly breaking something else in the process.
They are various unit tests and integration tests (via stupicat) which can be used to verify it's still working as expected, or adjust the expectations, after all, even what's there is certainly not perfect for more complex markdown.

@tavianator
Copy link
Contributor Author

Yeah the event streams are quite hard to tell apart:

$ echo '<p>\nfoo\n</p>\n<script>\na\n\nb\n</script>' | pulldown-cmark -e
0..4: Html(Borrowed("<p>\n"))
4..8: Html(Borrowed("foo\n"))
8..13: Html(Borrowed("</p>\n"))
13..22: Html(Borrowed("<script>\n"))
22..24: Html(Borrowed("a\n"))
25..37: Start(Paragraph)
25..26: Text(Borrowed("b"))
26..27: SoftBreak
27..36: Html(Borrowed("</script>"))
25..37: End(Paragraph)
EOF
$ echo '<p>\nfoo\n</p>\n\n<script>\na\n\nb\n</script>' | pulldown-cmark -e
0..4: Html(Borrowed("<p>\n"))
4..8: Html(Borrowed("foo\n"))
8..13: Html(Borrowed("</p>\n"))
14..23: Html(Borrowed("<script>\n"))
23..25: Html(Borrowed("a\n"))
25..26: Html(Borrowed("\n"))
26..28: Html(Borrowed("b\n"))
28..38: Html(Borrowed("</script>\n"))
EOF

There's no difference until it actually encounters the blank line at position 25. That's potentially a lot of state to have to buffer. Maybe it's better to fix this in pulldown-cmark itself somehow?

@Byron
Copy link
Owner

Byron commented Aug 5, 2020

Yes, that would be optimal. My guess is that this one has the potential for a bigger conversation, as it really asks the question if pulldown-cmark should consider degenerating information to be a bug.

If the answer is negative, we can only try to fix things here with more state and generally, complication, which I am to some extend OK with as long as I don't have to write it :D.

If the answer is positive, I feel that the to-cmark part would have to become part of pulldown-cmark anyway in order to be able to test that properly and protect from regressions.
The latter certainly seems like something worth checking with them… .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants