Emphasis intersection bug? #475

Open
aidantwoods opened this Issue May 13, 2017 · 9 comments

@aidantwoods
Contributor

aidantwoods commented May 13, 2017

Given the following markdown:

**strong* still strong**

The online reference parser gives this output:

<p><em><em>strong</em> still strong</em>*</p>

However, using rule 15 (just below http://spec.commonmark.org/0.27/#can-open-emphasis)

  15. When two potential emphasis or strong emphasis spans overlap, so that the second begins before the first ends and ends after the first ends, the first takes precedence. Thus, for example, *foo _bar* baz_ is parsed as <em>foo _bar</em> baz_ rather than *foo <em>bar* baz</em>.

I think the output should be this

<p><strong>strong* still strong</strong></p>

Instead, the parser is behaving more like it would when faced with rule 16

  16. When there are two potential emphasis or strong emphasis spans with the same closing delimiter, the shorter one (the one that opens later) takes precedence. Thus, for example, **foo **bar baz** is parsed as **foo <strong>bar baz</strong> rather than <strong>foo **bar baz</strong>.

Even though these emphasis and strong emphasis spans do not have the same closing delimiter (so this rule should not apply).


Note that I am assuming that the phrase same closing delimiter (which is not formally defined) is referring to a delimiter run as categorised by its starting position in the string (this holds for the example given).
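
For reference, the reported output can be reproduced with a small sketch of the delimiter-stack procedure described in the spec's parsing-strategy appendix. This is a simplified illustration under stated assumptions, not the reference implementation: it handles only `*` runs, ignores `_`, backslash escapes and every other inline construct, and uses ASCII punctuation for the flanking tests. The names `tokenize`, `process_emphasis` and `render_emphasis` are hypothetical.

```python
import string


def _is_ws(ch):
    # Start/end of input ("") counts as whitespace for flanking purposes.
    return ch == "" or ch.isspace()


def _is_punct(ch):
    # Assumption: ASCII punctuation only.
    return ch != "" and ch in string.punctuation


def tokenize(text):
    """Split text into plain strings and '*' delimiter-run records."""
    nodes, i = [], 0
    while i < len(text):
        j = i
        if text[i] == "*":
            while j < len(text) and text[j] == "*":
                j += 1
            before = text[i - 1] if i > 0 else ""
            after = text[j] if j < len(text) else ""
            left = not _is_ws(after) and (
                not _is_punct(after) or _is_ws(before) or _is_punct(before))
            right = not _is_ws(before) and (
                not _is_punct(before) or _is_ws(after) or _is_punct(after))
            # For '*', can-open == left-flanking, can-close == right-flanking.
            nodes.append({"n": j - i, "open": left, "close": right})
        else:
            while j < len(text) and text[j] != "*":
                j += 1
            nodes.append(text[i:j])
        i = j
    return nodes


def _render(node):
    # Unused delimiters fall back to literal asterisks.
    return node if isinstance(node, str) else "*" * node["n"]


def process_emphasis(nodes):
    """Pair each closer with the nearest preceding opener, left to right."""
    i = 0
    while i < len(nodes):
        closer = nodes[i]
        if not (isinstance(closer, dict) and closer["close"] and closer["n"] > 0):
            i += 1
            continue
        k = i - 1
        while k >= 0 and not (
                isinstance(nodes[k], dict) and nodes[k]["open"] and nodes[k]["n"] > 0):
            k -= 1
        if k < 0:  # no opener available for this closer
            i += 1
            continue
        opener = nodes[k]
        use = 2 if opener["n"] >= 2 and closer["n"] >= 2 else 1
        tag = "strong" if use == 2 else "em"
        inner = "".join(_render(x) for x in nodes[k + 1:i])
        opener["n"] -= use
        closer["n"] -= use
        nodes[k + 1:i] = [f"<{tag}>{inner}</{tag}>"]
        i = k + 2  # revisit the same closer; it may have delimiters left
    return "".join(_render(x) for x in nodes)


def render_emphasis(text):
    return process_emphasis(tokenize(text))


print(render_emphasis("**strong* still strong**"))
# <em><em>strong</em> still strong</em>*
```

Under these assumptions, the single `*` is the first closer encountered and pairs with one asterisk of the opening `**`. The final `**` can then only match the one remaining opening asterisk, which yields the second `<em>` and leaves one literal `*` over.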

@aidantwoods

Contributor

aidantwoods commented May 13, 2017

Apologies, correction: it appears that rule 15 does not apply because the given example does not overlap in the way specified.

In which case rule 13 should be applied as far as I can tell:

  13. The number of nestings should be minimized. Thus, for example, an interpretation <strong>...</strong> is always preferred to <em><em>...</em></em>.

In any case,

<p><strong>strong* still strong</strong></p>

is preferred by this rule too.
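
To make the comparison concrete, here is a minimal sketch that measures nesting for the two candidate outputs. It assumes that "number of nestings" means the maximum depth of `<em>`/`<strong>` elements, which is only one possible reading since the spec does not define the phrase; `max_emphasis_depth` is a hypothetical helper.

```python
import re


def max_emphasis_depth(html):
    """Return the deepest nesting level of <em>/<strong> tags in html."""
    depth = deepest = 0
    for closing, _tag in re.findall(r"<(/?)(em|strong)>", html):
        if closing:
            depth -= 1
        else:
            depth += 1
            deepest = max(deepest, depth)
    return deepest


expected = "<p><strong>strong* still strong</strong></p>"
reference = "<p><em><em>strong</em> still strong</em>*</p>"

print(max_emphasis_depth(expected))   # 1
print(max_emphasis_depth(reference))  # 2
```

On this reading the `<strong>` interpretation nests less deeply, so rule 13 would favour it.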

@aidantwoods

Contributor

aidantwoods commented Jun 2, 2017

Most implementations seem to agree with this: http://johnmacfarlane.net/babelmark2/?text=**strong*+still+strong**

(though they don't appear on the page for me, can get some results by going through the network responses):
[Screenshots of the Babelmark network responses, taken 2017-06-02]

@jgm

Member

jgm commented Mar 25, 2018

Although I agree that the interpretation of this case is unexpected, I've come to appreciate that it's not possible to give a spec for emphasis that gives "intuitive" results in every case. The best we can do is to minimize the unintuitive cases, and the principles in the spec (now fairly complex) have been motivated by consideration of a large number of cases.

Maybe there's a way to modify the principles to get the "intuitive" result in your case without delivering other unintuitive results elsewhere, and without making it impossible to parse emphasis efficiently. We're very open to suggestions there. But otherwise we may have to accept that this is a case where you need to backslash-escape the asterisk. Not a big deal.

@jgm

Member

jgm commented Mar 25, 2018

So I'll close this, but feel free to re-open if you have a specific proposal.

@jgm jgm closed this Mar 25, 2018

@aidantwoods

Contributor

aidantwoods commented Mar 25, 2018

My concern, I suppose, isn't that it's an unintuitive result, rather that the result given by the reference parser contradicts the spec as far as I can tell?
I wonder if perhaps we could extract the decision made by the reference parser so that the behaviour here could be formalised.

I don't much mind which result we pick (I would perhaps lean toward the one that I say is "expected" but it doesn't matter so much), rather I think it is important that the result is defined :)

(I'm painfully aware of how complex emphasis parsing is already)

@aidantwoods

Contributor

aidantwoods commented Mar 25, 2018

(Btw apparently I can't re-open the issue if you closed it o.0 – behaviour is news to me)

@jgm jgm reopened this Mar 26, 2018

@jgm

Member

jgm commented Mar 26, 2018

Sorry, in skimming this I saw the point about rule 15 not applying, but missed the point about rule 13 applying.... I'll re-open.

@jgm

Member

jgm commented Mar 26, 2018

Rule 13 is a bit vague; perhaps if we try to be more precise about what is meant by "minimize nesting", we can bring the spec in conformity with the way the reference parsers treat this case. E.g. maybe we could just say:

  13. In cases of ambiguity, an interpretation <strong>...</strong> is always preferred to <em><em>...</em></em>.

@aidantwoods

Contributor

aidantwoods commented Mar 26, 2018

I think if you remove the phrase "minimise nesting" then this case becomes undefined. The difference between what I've called the expected result and the reference parser's isn't a matter of picking <strong>...</strong> over <em><em>...</em></em>; rather, it is picking <strong>...*...</strong> over <em><em>...</em>...</em>*. Without this rule I'm not sure how to choose between these results?
i.e. choosing between these results does minimise nesting, but it's a little more than just picking <strong>...</strong> over <em><em>...</em></em>, because there are positional changes in the structure that come with that choice (i.e. different *s are used).

Just to be clear here, which result are we aiming for? :)

If we're aiming for the result that the reference parser currently gives (<em>s), then I think rule 13 needs to change to what you've said to allow for this result to exist – but I think there should be an additional rule that makes clear why this result occurs (i.e. we need to make it a possible result, and then specify why to pick it over the other one).

If on the other hand we are aiming for the "expected result" (<strong>s), then I think this is only a parser bug.

@jgm jgm added this to the 0.29 milestone Aug 25, 2018
