Preserving empty line nodes #264

Open
ijioio opened this issue Jul 29, 2022 · 7 comments


@ijioio

ijioio commented Jul 29, 2022

I am using the commonmark library to generate emails. It is a really nice, easy-to-use library. It has both HtmlRenderer for the rich email body and TextContentRenderer for the plain-text email body, which looks like a perfect fit. The person sending an email writes it in Markdown, then the content is processed by both renderers and embedded in the email as the plain-text and HTML bodies. The problem is that the parser discards line breaks (empty lines).

Example:

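import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;
import org.commonmark.renderer.text.TextContentRenderer;

Parser parser = Parser.builder().build();
Node document = parser.parse("foo\n\nbar\n\nbaz");

HtmlRenderer htmlRenderer = HtmlRenderer.builder().build();
TextContentRenderer textRenderer = TextContentRenderer.builder().build();

String html = htmlRenderer.render(document);
String text = textRenderer.render(document);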

This results in the following HTML:

<p>foo</p>
<p>bar</p>
<p>baz</p>

and the following plain text:

foo
bar
baz

The parser discards the empty lines completely. For HTML this is acceptable, because each block is wrapped in a paragraph, but for the plain-text renderer it is critical, as the text loses its formatting.

What I expect is that the parser represents empty lines as separate nodes that renderers can process differently: the HTML renderer can skip them, while the plain-text renderer appends them to the output:

foo

bar

baz

I have not dug very deep into the sources, so I am not sure whether there is a way to do this in the current state...
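
For illustration, the difference between the current and the expected plain-text output can be sketched without the library. This is only a stand-in: it splits the source on empty lines instead of parsing Markdown, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ParagraphJoinSketch {
    // Stand-in for parsing: split the source wherever one or more empty lines occur.
    // This is NOT how commonmark parses paragraphs; it only models the example input.
    static List<String> paragraphs(String source) {
        return Arrays.asList(source.split("\\n\\s*\\n"));
    }

    // Models the output reported in this issue: one newline between paragraphs.
    static String compact(String source) {
        return String.join("\n", paragraphs(source));
    }

    // Models the requested output: an empty line between paragraphs.
    static String separated(String source) {
        return String.join("\n\n", paragraphs(source));
    }

    public static void main(String[] args) {
        String source = "foo\n\nbar\n\nbaz";
        System.out.println(compact(source));
        System.out.println("---");
        System.out.println(separated(source));
    }
}
```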

@robinst
Collaborator

robinst commented Oct 18, 2022

Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:

foo

bar

baz

You would be happy?

I'm honestly not sure why it renders it this way at the moment:

foo
bar
baz

@JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

@robinst
Collaborator

robinst commented Oct 18, 2022

@ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

@JinneeJ
Contributor

JinneeJ commented Oct 24, 2022

> @JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

@robinst I don't remember exactly 😅 I guess it didn't occur to me that we might need to add extra empty lines for paragraphs. But anyway, keeping backward compatibility would be great.

@ijioio
Author

ijioio commented Jan 15, 2023

> Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:
>
> foo
>
> bar
>
> baz
>
> You would be happy?

Yes, exactly! I just want plain-text emails to keep some of the logical structure of the original text.

@ijioio
Author

ijioio commented Jan 15, 2023

> @ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

Thank you for the suggestion. I am not sure it is a universal solution, though. I will never know in advance in which CommonMark syntax block these line breaks will be used. Does that imply I need to override all possible cases?

@mattrob33

+1 for the original feature request of actually preserving empty lines. As a specific use case, I'm using commonmark to build a sort of rich text editor using markdown as the data, which requires maintaining a mapping between the literal and rendered text. So, e.g., in

An _italic_ text.

when the user places the cursor at index 5 in the rendered text, this actually maps to index 6 in the markup text, to account for the first underscore.

Unfortunately this breaks with newlines since they are simply discarded, so I have no way of knowing how many newlines the user entered.
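
For illustration, a minimal sketch of the index mapping described above. It assumes, as a simplification that is not CommonMark's real emphasis rule, that every underscore is a delimiter removed from the rendered text. The class and method names are hypothetical, and nothing here calls commonmark.

```java
import java.util.ArrayList;
import java.util.List;

final class EmphasisIndexMap {
    // Rendered text under the simplifying assumption that every '_' is a delimiter.
    static String render(String markup) {
        return markup.replace("_", "");
    }

    // map[i] is the index in the markup of the character shown at rendered index i.
    static int[] renderedToMarkup(String markup) {
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < markup.length(); i++) {
            if (markup.charAt(i) != '_') {
                map.add(i); // this markup character survives into the rendered text
            }
        }
        return map.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String markup = "An _italic_ text.";
        int[] map = renderedToMarkup(markup);
        System.out.println(render(markup)); // An italic text.
        System.out.println(map[5]);         // 6
    }
}
```

The same table cannot be built for newlines from the rendered output alone, because the number of empty lines in the markup is not recoverable once they are dropped.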

@robinst
Collaborator

robinst commented Sep 18, 2024

@mattrob33 I think that's a different use case from the original reporter's, which is about the TextContentRenderer. Could you create a new issue with your use case please? It sounds like you want the empty lines to be represented in the AST (Node) somehow? You might be able to do something with source spans, see includeSourceSpans on Parser.Builder?
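
A rough sketch of how the source-span idea could recover the number of empty lines, assuming the first and last source line of each block are known. The line indexes below are hard-coded stand-ins for what source spans would report; the BlockLines type and the method name are hypothetical, and the sketch does not call commonmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlankLineCounter {
    // Hypothetical stand-in for a block's source span: 0-based first and last source line.
    static final class BlockLines {
        final int firstLine;
        final int lastLine;

        BlockLines(int firstLine, int lastLine) {
            this.firstLine = firstLine;
            this.lastLine = lastLine;
        }
    }

    // Number of empty lines between each pair of consecutive blocks.
    static List<Integer> blankLinesBetween(List<BlockLines> blocks) {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 1; i < blocks.size(); i++) {
            gaps.add(blocks.get(i).firstLine - blocks.get(i - 1).lastLine - 1);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Models "foo\n\nbar\n\n\nbaz": foo on line 0, bar on line 2, baz on line 5.
        List<BlockLines> blocks = Arrays.asList(
                new BlockLines(0, 0), new BlockLines(2, 2), new BlockLines(5, 5));
        System.out.println(blankLinesBetween(blocks)); // [1, 2]
    }
}
```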
