Should the whitespaces before backslash hard line break be removed? #724

Open
chenzhiguang opened this issue Oct 19, 2022 · 6 comments

@chenzhiguang

In other words, should the backslash and the whitespace preceding it be parsed together as a single hard line break, or should the preceding whitespace be left alone so that only the backslash is parsed as the hard line break?

For example, should

a  \
b

into

a<br />b

or

a <br />b

The hard line break made of two or more spaces is clear: all of the whitespace before the line ending represents the hard line break. But I couldn't find anything in the spec about how whitespace before a backslash hard line break should be handled.

@wooorm
Contributor

wooorm commented Oct 19, 2022

As far as I am aware, this is not explained anywhere in the spec. I don’t think it needs to be. In my mind, it’s similar to putting anything else at the end of a line, such as:

a  &amp;
b

->

<p>a  &amp;
b</p>

Or:

a  b
c

->

<p>a  b
c</p>

@chenzhiguang
Author

But it still matters in some situations. For example:

  1. Hard line break made of two or more spaces, with 4 spaces before the line ending:
a    
b

Output to AST:

[
  {
    "type": "paragraph",
    "start": {
      "line": 0,
      "column": 0,
      "offset": 0
    },
    "end": {
      "line": 1,
      "column": 1,
      "offset": 7
    },
    "children": [
      {
        "text": "a",
        "start": {
          "line": 0,
          "column": 0,
          "offset": 0
        },
        "end": {
          "line": 0,
          "column": 1,
          "offset": 1
        }
      },
      {
        "type": "hardLineBreak",
        "start": {
          "line": 0,
          "column": 1,
          "offset": 1
        },
        "end": {
          "line": 0,
          "column": 5,
          "offset": 5
        },
        "markers": [
          {
            "start": {
              "line": 0,
              "column": 1,
              "offset": 1
            },
            "end": {
              "line": 0,
              "column": 5,
              "offset": 5
            },
            "text": "    "
          }
        ]
      },
      {
        "text": "b",
        "start": {
          "line": 1,
          "column": 0,
          "offset": 6
        },
        "end": {
          "line": 1,
          "column": 1,
          "offset": 7
        }
      }
    ]
  }
]

Offsets 1 through 4 fall inside the hardLineBreak marker.
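
To make the offsets above concrete, here is a minimal Python sketch of how a parser might locate the trailing-space marker on a single line. The function name `trailing_space_break` is hypothetical and not part of any real library, and the sketch ignores context rules such as the break not applying on the last line of a paragraph.

```python
def trailing_space_break(line: str):
    """Return (start, end) offsets of a trailing-space hard break marker, or None.

    `line` is one line without its line ending; `end` is exclusive,
    matching the start/end convention used in the AST above.
    """
    stripped = line.rstrip(" ")
    run = len(line) - len(stripped)
    # A hard break needs at least two trailing spaces after some content.
    if run < 2 or not stripped:
        return None
    return (len(stripped), len(line))


span = trailing_space_break("a    ")  # "a" followed by 4 spaces
```

For the line used in the example, `span` covers the whole run of spaces, so every space belongs to the marker and the text node before it holds only "a".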

  2. Backslash hard line break, with the backslash preceded by 3 spaces:
a   \
b

If we do not count these preceding spaces as part of the hard line break, the AST output will be:

[
  {
    "type": "paragraph",
    "start": {
      "line": 0,
      "column": 0,
      "offset": 0
    },
    "end": {
      "line": 1,
      "column": 1,
      "offset": 7
    },
    "children": [
      {
        "text": "a   ",
        "start": {
          "line": 0,
          "column": 0,
          "offset": 0
        },
        "end": {
          "line": 0,
          "column": 4,
          "offset": 4
        }
      },
      {
        "type": "hardLineBreak",
        "start": {
          "line": 0,
          "column": 4,
          "offset": 4
        },
        "end": {
          "line": 0,
          "column": 5,
          "offset": 5
        },
        "markers": [
          {
            "start": {
              "line": 0,
              "column": 4,
              "offset": 4
            },
            "end": {
              "line": 0,
              "column": 5,
              "offset": 5
            },
            "text": "\\"
          }
        ]
      },
      {
        "text": "b",
        "start": {
          "line": 1,
          "column": 0,
          "offset": 6
        },
        "end": {
          "line": 1,
          "column": 1,
          "offset": 7
        }
      }
    ]
  }
]

This way, only offset 4 (the backslash itself) falls inside the hard line break marker.

There might be no difference when rendered to HTML, but in a Markdown editor it might matter.
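
The two candidate interpretations can be compared side by side with a small Python sketch. The function `backslash_break_span` and its `include_preceding_spaces` flag are hypothetical names chosen for illustration. The sketch assumes the line is ordinary paragraph text (no code spans or other constructs that would change how the backslash is read) and treats a trailing backslash that is itself escaped as not being a break.

```python
def backslash_break_span(line: str, include_preceding_spaces: bool):
    """Return (start, end) offsets of a backslash hard break marker, or None.

    `line` is one line without its line ending; `end` is exclusive.
    """
    trailing = len(line) - len(line.rstrip("\\"))
    # Zero backslashes, or an even run where the last one is escaped.
    if trailing % 2 == 0:
        return None
    end = len(line)
    start = end - 1
    if include_preceding_spaces:
        # Option 1: fold the spaces before the backslash into the marker.
        while start > 0 and line[start - 1] == " ":
            start -= 1
    return (start, end)


line = "a   \\"  # "a", 3 spaces, then one backslash
backslash_only = backslash_break_span(line, include_preceding_spaces=False)
with_spaces = backslash_break_span(line, include_preceding_spaces=True)
```

With the flag off, the marker covers only the backslash and the spaces stay in the preceding text node. With the flag on, the marker starts right after "a", which mirrors the span of the trailing-space case.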

@wooorm
Contributor

wooorm commented Oct 19, 2022

If you have a problem with an AST, this is not the place to report it. This spec does not define ASTs.

@chenzhiguang
Author

I meant that I do not have a clear specification to follow when parsing the backslash hard line break into an AST. Whether or not the spaces before the backslash are counted as part of the hard line break produces different ASTs.

@wooorm
Contributor

wooorm commented Oct 19, 2022

I think this is a problem in your AST, and unrelated to this specification.
a) I believe the tool generating the AST should remove the whitespace: the text should be 'a', not 'a '.
b) The tool generating markdown from the AST should prefer hard breaks with a backslash too: they’re clearer and have a higher chance of working now that CommonMark is basically everywhere, as editor tooling will typically remove trailing whitespace.
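
As a rough illustration of points a) and b), here is a Python sketch over a toy node list shaped like the AST earlier in this thread (text nodes carry a "text" key, break nodes carry "type": "hardLineBreak"). Both function names are hypothetical, this is not the mdast API, and positional data is ignored for brevity.

```python
def strip_before_breaks(children):
    """Trim trailing spaces from any text node that directly precedes a break."""
    result = []
    for index, node in enumerate(children):
        nxt = children[index + 1] if index + 1 < len(children) else None
        if "text" in node and nxt is not None and nxt.get("type") == "hardLineBreak":
            # Copy the node so the input list is left untouched.
            node = {**node, "text": node["text"].rstrip(" ")}
        result.append(node)
    return result


def to_markdown(children):
    """Serialize text and break nodes, always writing breaks as a backslash."""
    parts = []
    for node in children:
        if node.get("type") == "hardLineBreak":
            parts.append("\\\n")
        else:
            parts.append(node["text"])
    return "".join(parts)


children = [{"text": "a   "}, {"type": "hardLineBreak"}, {"text": "b"}]
cleaned = strip_before_breaks(children)
markdown = to_markdown(cleaned)
```

Here the text node before the break ends up as "a", and the serialized output uses a backslash break regardless of which break style the source used.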

I don’t believe there is anything that has to happen in this project.

If you’re interested in AST tools that do generate such an AST, and AST tools that do serialize with backslashes, you might find my projects mdast, mdast-util-from-markdown, and mdast-util-to-markdown useful.

@chenzhiguang
Author

Thanks a lot! Yep, the trailing whitespace should always be removed unless there is a specific reason to keep it.

This is my project dart_markdown, a Markdown to AST parser, which is definitely inspired by your mdast.
