Merging repeated elements without any non-whitespace content between them #270
Comments
Hi @zumoshi, yes this is a bit of a tricky one, and has been discussed in #123, in particular in this comment: #123 (comment)
Perhaps I should've made two separate issues, one for the newline and one for merging tags. I don't think it's the same issue, though. The discussion you linked to had problems figuring out the ideal Markdown output, since no output existed that would have recreated the input HTML. Ignoring the first part for now, I think the example I provided is actually a bug. Turndown's demo converts the following:
<strong>
a
<br>
</strong>
b
into:
**a
**b
and giving that to CommonMark's demo produces:
<p>**a<br />
**b</p>
which doesn't involve bolding at all. Not all Markdown implementations do this, of course, but if you want to be compatible with CommonMark, see section 6.4, example 347: a closing delimiter that is preceded by whitespace does not close the emphasis.
The same goes for the starting delimiter of most tags. You would need to push the whitespace outside and put the delimiters right before/after the first/last non-whitespace character. (Unless I'm mistaken and the default rule set is not based on CommonMark, and I messed up something in the config of the demo to get this non-standard output.)
If I understand correctly, I think it might be related to the other issue, which discusses I wonder if the most pragmatic approach would be to not convert inline elements with
According to CommonMark spec 6.9, there are two ways to generate output that results in a line break for the input I gave:
**a**  
b
(note the spaces after the first line)
**a**\
b
This is related to the issue you mentioned, since using this method double
However, my issue is not with how the newlines are handled, but rather with the placement of delimiters. If you look at the spec links I gave in my last message, it explicitly says there should be no whitespace (including a newline) right after the starting delimiter or right before the ending delimiter. Turndown's generated code for that input, on the other hand, puts a newline as the last character inside the bold section (i.e. a newline before the closing delimiter).
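To make the delimiter placement concrete, here is a minimal sketch of the idea in plain JavaScript. The `wrapWithDelimiter` helper is hypothetical and is not part of Turndown's API; it only illustrates moving leading and trailing whitespace outside the delimiters, assuming the converted inner content is already available as a string.

```javascript
// Hypothetical helper for illustration only; not a Turndown function.
function wrapWithDelimiter(content, delimiter) {
  // Split the content into leading whitespace, core text, trailing whitespace.
  const [, leading, core, trailing] = /^(\s*)([\s\S]*?)(\s*)$/.exec(content);
  // Whitespace-only content cannot carry emphasis, so return it unchanged.
  if (core === '') return content;
  // Delimiters hug the first and last non-whitespace characters.
  return leading + delimiter + core + delimiter + trailing;
}

// The line break (two spaces plus newline) ends up outside the bold span.
const markdown = wrapWithDelimiter('a  \n', '**') + 'b';
console.log(JSON.stringify(markdown));
```

With the whitespace pushed outside, the result has the same shape as the first of the two forms above, so a CommonMark parser should see valid strong emphasis followed by a hard line break.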
In this case, the two are linked. The Markdown examples given both result in the following when converted to HTML:
<strong>a</strong><br />
b
… which is not the same as the original:
<strong>a<br /></strong>
b
Turndown handles I think there are a few possibilities for solving this issue:
I would argue they are. Whitespace characters can't be bold. So does it really make any difference if a newline or space is inside or outside a Right now the output doesn't result in a strong tag at all, I would prefer the
I don't think it is Turndown's responsibility to fix up poorly generated HTML. You may wish to parse the HTML string yourself, manipulate the HTML into your required structure, then pass the DOM tree to Turndown.
It has come up multiple times now that people want/need their HTML fixed before conversion. It may be useful to recommend a different utility that shakes out and properly rearranges HTML tag nesting, and provide a way to attach it to this one?
@spirograph 👍 do you know of any libraries that will do this?
Many WYSIWYG HTML editors leave a lot of artifacts, repeated (spammed) elements being one example. Currently, Turndown converts the following:
to
which is not wrong, but
**a b**
would've been preferred. The actual code examples are not so short and are spammed with repeated tags that make the output less readable. A similar issue is the general handling of whitespace. For example:
is converted to:
while the ideal output would've been:
I'm not sure how complex it would be to make these changes, but generally ignoring whitespace between multiple tags of the same kind, and pushing whitespace from the beginning and end to the outside of tags, would increase the quality of the output for my main use case a lot.
Thanks.
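As a rough illustration of the merging part, the sketch below collapses adjacent runs of the same inline tag in an HTML string before it is handed to a converter. `mergeAdjacentTags` is a hypothetical pre-processing helper, not something Turndown provides, and the input string is made up for the example. It assumes simple tags without attributes; a robust cleanup would walk the DOM instead of using regular expressions.

```javascript
// Hypothetical pre-processing helper; it is not part of Turndown.
// Assumes attribute-free tags. The default tag list is an arbitrary choice.
function mergeAdjacentTags(html, tagNames = ['strong', 'em', 'b', 'i']) {
  let result = html;
  for (const tag of tagNames) {
    // A closing tag, optional whitespace, then the same opening tag.
    const boundary = new RegExp(`</${tag}>(\\s*)<${tag}>`, 'gi');
    // Drop the tag pair but keep the whitespace so words stay separated.
    result = result.replace(boundary, '$1');
  }
  return result;
}

// Made-up input resembling WYSIWYG editor output.
const cleaned = mergeAdjacentTags('<strong>a</strong> <strong>b</strong>');
console.log(cleaned);
```

Converting the cleaned string should then yield a single bold run such as `**a b**` rather than one run per repeated tag.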