Non-spaced tokens with emphasis seems not parsed as expected #488

Open
gfx opened this issue Aug 10, 2017 · 9 comments

@gfx

gfx commented Aug 10, 2017

We have introduced CommonMark to our web service and found that emphasis does not work as expected in some cases where it appears inside non-spaced tokens and wraps a link. Non-spaced tokens are important for agglutinative languages like Japanese, so I think it is a spec bug.

Example

Expect:

foo**bar**baz

foo**[bar](#)**baz

to be:

foo<strong>bar</strong>baz

foo<strong><a href="#">bar</a></strong>baz

but got:

foo<strong>bar</strong>baz # OK

foo**<a href="#">bar</a>**baz # NG: ** are not parsed as emphasis

Environment

  • CommonMarker v0.16.8
  • commonmark.js v0.28.1
    • e.g. echo 'foo**[bar](#)**baz' | commonmark -- /dev/stdin
gfx changed the title from "Non-spaced tokens with emphasis seems storange" to "Non-spaced tokens with emphasis seems not parsed as expected" on Aug 10, 2017
@aidantwoods
Contributor

cc @iology, we've had a discussion that briefly touched on some potential issues with spaces in Chinese text when using CommonMark emphasis, in case you have any thoughts here.

@gfx
Author

gfx commented Aug 17, 2017

Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.

babelmark2 foo**[bar](#)**baz

I'll try to make a patch to fix it.

@mity

mity commented Aug 17, 2017

> I'll try to make a patch to fix it.

Well, it could make sense for the left-flanking and right-flanking rules not to take a punctuation character into account if that character was consumed by a link or another Markdown syntax construct. But that could be relatively difficult to implement.

> Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.

But it seems most of them are not that clever, just simplistic: punctuation evidently has no impact on left-flanking and right-flanking runs for (most of) them. See foo**+**baz.
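For reference, the flanking rules under discussion can be sketched as follows. This is a minimal Python sketch based on the spec's definitions of left-flanking and right-flanking delimiter runs, not code from any of the parsers mentioned. The helper names (`is_whitespace`, `is_punctuation`, `flanking`, `delimiter_runs`, `classify`) are hypothetical, punctuation is approximated as ASCII punctuation plus the Unicode `P*` categories, and only `*` runs are scanned, with no handling of escapes, code spans, or links.

```python
import string
import unicodedata


def is_whitespace(ch):
    # The start and end of the line count as whitespace (ch is None there).
    return ch is None or ch.isspace()


def is_punctuation(ch):
    # Assumption: ASCII punctuation plus Unicode P* categories.
    if ch is None:
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def flanking(text, start, end):
    """Classify the delimiter run text[start:end] as (left, right) flanking."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not is_whitespace(after) and (
        not is_punctuation(after)
        or is_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before)
        or is_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def delimiter_runs(text):
    """Yield (start, end) for each run of '*' in text (no escape handling)."""
    i = 0
    while i < len(text):
        if text[i] == "*":
            j = i
            while j < len(text) and text[j] == "*":
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def classify(text):
    """Return the (left, right) flags of every '*' run, in order."""
    return [flanking(text, s, e) for s, e in delimiter_runs(text)]


print(classify("foo**bar**baz"))       # [(True, True), (True, True)]
print(classify("foo**[bar](#)**baz"))  # [(False, True), (True, False)]
print(classify("foo**+**baz"))         # [(False, True), (True, False)]
```

In this sketch the first run in foo**[bar](#)**baz is right-flanking only and the second is left-flanking only, so there is no opener before a closer. foo**+**baz gets the same flags, which is why it separates parsers that apply the punctuation rule from those that do not.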

@jgm
Member

jgm commented Mar 25, 2018

One possibility would be to say that a delimiter run that is immediately to the left of an open parenthesis, bracket, or brace is automatically left-flanking, and a delimiter run that is immediately to the right of a close parenthesis, bracket, or brace is automatically right-flanking.

Any thoughts about this proposal?
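One way to read the proposal is as an override layered on top of the existing flanking test. The sketch below is only an illustration under that assumption: `spec_flanking` and `proposed_flanking` are hypothetical names, each takes the characters immediately before and after a delimiter run (`None` for the start or end of the line), and punctuation is approximated as ASCII punctuation plus the Unicode `P*` categories.

```python
import string
import unicodedata

OPENERS = "([{"   # open parenthesis, bracket, brace
CLOSERS = ")]}"   # close parenthesis, bracket, brace


def _ws(ch):
    # Start and end of line (None) count as whitespace.
    return ch is None or ch.isspace()


def _punct(ch):
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def spec_flanking(before, after):
    """(left, right) flanking flags under the current rules."""
    left = not _ws(after) and (not _punct(after) or _ws(before) or _punct(before))
    right = not _ws(before) and (not _punct(before) or _ws(after) or _punct(after))
    return left, right


def proposed_flanking(before, after):
    """Current rules plus the proposed override for brackets."""
    left, right = spec_flanking(before, after)
    if after is not None and after in OPENERS:
        left = True    # run sits immediately to the left of an opener
    if before is not None and before in CLOSERS:
        right = True   # run sits immediately to the right of a closer
    return left, right


# The two runs in foo**[bar](#)**baz: ("o", "[") and (")", "b").
print(spec_flanking("o", "["), proposed_flanking("o", "["))
# (False, True) (True, True)
print(spec_flanking(")", "b"), proposed_flanking(")", "b"))
# (True, False) (True, True)
```

With the override, both runs in the reported example become left- and right-flanking, so the first can open and the second can close.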

@mity

mity commented Mar 25, 2018

> Any thoughts about this proposal?

The proposal would "fix" foo**[bar](#)**baz and "break" foo[**bar**](#)baz instead. So the overall score would be the same, at the cost of yet another rule to implement.

@jgm
Member

jgm commented Mar 26, 2018 via email

@mity

mity commented Mar 26, 2018

Oops. You are right. Still I am not sure about it.

Although it is an artificial example, consider trying to make xxx bold in xxx(. Note that escaping can help here only if the ( does not form, e.g., the start of a link.

Also, the proposal does not help in situations where the punctuation character itself does not imply opening or closing per se, yet within the context of Markdown syntax parsing it does:

**&sum;**x

or:

foo**![bar](#)**baz

@mity

mity commented Mar 26, 2018

Also, returning to the original report, I might just as well want to make the text span before and/or after the link bold:

**foo**[bar](#)baz
foo[bar](#)**baz**
**foo**[bar](#)**baz**

@jgm
Member

jgm commented Mar 26, 2018

I don't think my proposal creates any problems for

**foo**[bar](#)baz

or these others. With current spec rules, the first ** in this example is left-flanking but not right-flanking, and the second ** is right-flanking but not left-flanking. With the change I suggested, the second ** would be BOTH right-flanking and left-flanking. So you'd still get boldface here.

The example

**&sum;**x

is not handled by my proposal, but I'm less worried about this kind of case. The example

foo**![bar](#)**baz

is also not handled. This, too, seems like something that will come up much more rarely than a boldface link. (Why would you boldface an image anyway?) So I'm less concerned about it.

Of course, we could handle all these cases if we wanted to, by looking at more context than just the immediately preceding and following characters, but that makes parsing more complicated and maybe less efficient.
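The flanking claims in this comment can be checked mechanically. The following is a rough Python sketch, not an implementation from any CommonMark parser: `flanking` and `star_runs` are hypothetical helpers, the `proposed` flag applies the bracket override suggested above, punctuation is approximated as ASCII punctuation plus the Unicode `P*` categories, and only the immediately preceding and following characters are inspected (no escapes, entities, or link parsing).

```python
import string
import unicodedata


def _ws(ch):
    # Start and end of line (None) count as whitespace.
    return ch is None or ch.isspace()


def _punct(ch):
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def flanking(before, after, proposed=False):
    """(left, right) flags; proposed=True adds the bracket override."""
    left = not _ws(after) and (not _punct(after) or _ws(before) or _punct(before))
    right = not _ws(before) and (not _punct(before) or _ws(after) or _punct(after))
    if proposed:
        left = left or (after is not None and after in "([{")
        right = right or (before is not None and before in ")]}")
    return left, right


def star_runs(text, proposed=False):
    """Return (left, right) flags for every run of '*' in text, in order."""
    flags = []
    i = 0
    while i < len(text):
        if text[i] != "*":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == "*":
            j += 1
        before = text[i - 1] if i > 0 else None
        after = text[j] if j < len(text) else None
        flags.append(flanking(before, after, proposed))
        i = j
    return flags


samples = (
    "**foo**[bar](#)baz",
    "foo[bar](#)**baz**",
    "**&sum;**x",
    "foo**![bar](#)**baz",
)
for sample in samples:
    # For '*', left-flanking means "can open" and right-flanking "can close".
    print(sample, star_runs(sample), star_runs(sample, proposed=True))
```

In this sketch the first two samples have an opener followed by a closer under both rule sets. In `**&sum;**x` the second run is not right-flanking either way, and in foo**![bar](#)**baz the first run is not left-flanking either way, which matches the two cases described as not handled.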
