DocumentParser drops whitespaces at the beginning of a line #301

Open
panpeter opened this issue Oct 20, 2023 · 2 comments

When DocumentParser is used with no enabled block types, Text nodes do not include the whitespace at the beginning of a line.

For example, when we use a parser with no enabled block types and the input is:

- text 1
  - text 2

the expected result is a document containing two Text nodes:

  • Text("- text 1")
  • Text("  - text 2")

but the literal of the second Text node is "- text 2", without the leading whitespace.

To illustrate, here is a sample test that fails:

public class ParserTest {
    
    ...
    
    @Test
    public void noBlockTypes() {
        String given = "- text 1\n  - text 2";
        Parser parser = Parser.builder().enabledBlockTypes(Collections.<Class<? extends Block>>emptySet()).build();
        Node document = parser.parse(given);

        Node child = document.getFirstChild();
        assertThat(child, instanceOf(Paragraph.class));

        child = child.getFirstChild();
        assertThat(child, instanceOf(Text.class));
        assertEquals("- text 1", ((Text) child).getLiteral());

        child = child.getNext();
        assertThat(child, instanceOf(SoftLineBreak.class));

        child = child.getNext();
        assertThat(child, instanceOf(Text.class));
        assertEquals("  - text 2", ((Text) child).getLiteral());
    }
}
panpeter added the bug label Oct 20, 2023

robinst (Collaborator) commented Feb 8, 2024

The reason for this is the paragraph parser. The spec says that leading whitespace is skipped: https://spec.commonmark.org/0.31.2/#example-222

Not sure how we would handle it. We can't add the leading whitespace to the literal of Text nodes as that would change rendering for existing code, but maybe we could add it as another attribute.

Note that you should be able to work around this limitation by checking the source spans of the text (see includeSourceSpans on Parser.Builder).
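As a rough illustration of that workaround, here is a minimal sketch of how the dropped indentation could be recovered once the line index, column index, and length of a Text node are known. It uses only the JDK: the `Span` class is a hypothetical stand-in for the position data a source span carries, and the values in `main` are assumed for the example input rather than taken from an actual parser run.

```java
public class LeadingWhitespace {

    /** Hypothetical stand-in for a source span: 0-based line and column, plus length. */
    static final class Span {
        final int lineIndex;
        final int columnIndex;
        final int length;

        Span(int lineIndex, int columnIndex, int length) {
            this.lineIndex = lineIndex;
            this.columnIndex = columnIndex;
            this.length = length;
        }
    }

    /** Returns the source line the span is on (assumes "\n" line endings). */
    static String lineOf(String source, Span span) {
        return source.split("\n", -1)[span.lineIndex];
    }

    /**
     * Returns the whitespace between the start of the line and the span,
     * or "" if anything other than whitespace precedes the span.
     */
    static String leadingWhitespace(String source, Span span) {
        String prefix = lineOf(source, span).substring(0, span.columnIndex);
        return prefix.trim().isEmpty() ? prefix : "";
    }

    /** Returns the spanned text with the dropped leading whitespace put back. */
    static String literalWithIndent(String source, Span span) {
        String line = lineOf(source, span);
        String text = line.substring(span.columnIndex, span.columnIndex + span.length);
        return leadingWhitespace(source, span) + text;
    }

    public static void main(String[] args) {
        String source = "- text 1\n  - text 2";
        // Assumed position of the second Text node: line 1, column 2, length 8.
        Span second = new Span(1, 2, 8);
        System.out.println("[" + literalWithIndent(source, second) + "]");
    }
}
```

In real code the three numbers would come from the source spans attached to each Text node after enabling includeSourceSpans for inlines on Parser.Builder. Whether the reported column starts at the first non-whitespace character is worth verifying against the library version in use.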

robinst (Collaborator) commented Mar 9, 2024

See also #290 (comment)
