Too eager parsing of italic markup #668

munen · 2021-05-08T12:55:21Z

*** TODO organice parser error: Foo/Bar/Baz
Foo/Bar/Baz

Is shown as:

schoettl · 2021-05-08T17:15:40Z

I think that's a known bug and applies to all kinds of markup. I seem to remember that, when I worked on the regex parser, I decided it was too hard to fix and postponed it to the future Clojure parser :) But maybe a fresh look at the regex parser allows you to fix it.

schoettl · 2021-05-08T17:20:25Z

See also #94 and #198

munen · 2021-05-08T19:53:54Z

I don’t think it’s a duplicate. The other issues are about the parser choking on lines containing multiple inline markup statements.

This issue is much simpler: Foo/Bar/Baz is just text. It shouldn’t be italicized at all^^

schoettl · 2021-05-08T20:05:33Z

Yep, the issues linked above are only related. It's probably the same regex.

Anyway, the problem applies to all kinds of markup, e.g. Foo+Bar+Baz

munen · 2021-05-09T06:53:38Z

You are correct, it does apply to all kinds of inline markup. I've written a test and looked at the regexp, but I don't understand why it doesn't work as written at the moment(;

munen · 2021-05-17T10:37:31Z

Worse example with only one slash:

“organice supports parsing and preserving the minimum/maximum range timestamps.” becomes:

lechten · 2022-12-03T10:14:35Z

I did not look at the proper Org syntax, but what if we just required a whitespace before inline markup?

Thus, replace [*/~=_+] with \s[*/~=_+]?

munen · 2022-12-03T10:57:52Z

@lechten Your proposal fixes some terms like foo/bar and foo/bar/baz, but it breaks terms like *bold*. I tried extending your proposal with [^\s][*/~=_+], but that's not working as expected. I must be making a trivial mistake here^^
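To make the trade-off concrete, here is a minimal Python sketch. NAIVE and WITH_SPACE are simplified, hypothetical stand-ins, not organice's actual regexp, and the line-start failure shown is only one way a required \s can break *bold*.

```python
import re

# Simplified stand-ins for illustration only; not organice's actual regexp.
MARKERS = r"[*/~=_+]"
# Naive: any marker, some text, then the same marker again.
NAIVE = re.compile(rf"({MARKERS})(.+?)\1")
# Proposal: the opening marker must be preceded by whitespace.
WITH_SPACE = re.compile(rf"\s({MARKERS})(.+?)\1")


def marked_up(pattern, text):
    """Return the text the pattern treats as marked up, or None."""
    match = pattern.search(text)
    return match.group(2) if match else None


print(marked_up(NAIVE, "Foo/Bar/Baz"))       # Bar  (the bug from this issue)
print(marked_up(WITH_SPACE, "Foo/Bar/Baz"))  # None (fixed)
print(marked_up(WITH_SPACE, "some *bold*"))  # bold
print(marked_up(WITH_SPACE, "*bold*"))       # None (no whitespace at line start)
```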

munen · 2022-12-03T11:05:58Z

Partial update: The ^ negates the \s, of course(; But [\s^] and [\^\s] also don't work.

lechten · 2022-12-03T11:11:06Z

The ^ does not work as an anchor inside square brackets (there it either negates the class, when it comes first, or is just another character). We might add (^|\s), but this adds another group, which seems to ask for lots of subsequent changes...
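A quick check of the three variants discussed so far, as a sketch. It uses Python's re module; JavaScript regexps treat these particular constructs the same way.

```python
import re

# [^\s] is a negated class: any single non-whitespace character.
print(re.findall(r"[^\s]", "a b"))     # ['a', 'b']

# [\s^] is whitespace OR a literal caret; it never anchors to the line start.
print(re.findall(r"[\s^]", "a ^b"))    # [' ', '^']
print(re.match(r"[\s^]\*", "*bold*"))  # None

# (^|\s) does anchor, but it adds a capturing group and shifts group numbering.
m = re.match(r"(^|\s)([*/~=_+])", "*bold*")
print(m.groups())                      # ('', '*')
```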

munen · 2022-12-03T11:12:31Z

hihi, we're looking at the same options currently. I'm checking if there's a way for a match group to basically be ignored. But that's probably against the whole idea of match groups.

munen · 2022-12-03T11:13:53Z

There is a way! ((?:\s|^)?[*/~=_+]) looks promising.

munen · 2022-12-03T11:15:38Z

Looked promising and passes all tests, but it breaks foo/bar/baz, again-_-
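For the record, a small Python sketch of why the optional prefix does not help. The pattern is copied from the comment above; everything else is illustration only.

```python
import re

# The pattern from the comment above, reproduced verbatim.
OPTIONAL_PREFIX = re.compile(r"((?:\s|^)?[*/~=_+])")

# (?:...) groups without capturing, so there is still exactly one match group.
print(OPTIONAL_PREFIX.groups)                  # 1

# But the trailing ? lets the prefix match nothing, so mid-word slashes still match.
print(OPTIONAL_PREFIX.findall("foo/bar/baz"))  # ['/', '/']
```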

munen · 2022-12-03T11:19:04Z

((?:^|\s+)[*/~=_+]) is the most promising atm, but it fails *bold*. (There are blanks before the *bold* term; GH just opts not to render them.)

munen · 2022-12-03T11:36:10Z

I think ((?:^|\s+)[*/~=_+]) would actually be a good solution. However, the match group directly before it, ([\s({'"]?), also contains a \s, which interferes in the combined ([\s({'"]?)((?:^|\s+)[*/~=_+])...

Welcome to regexp based parsing hell(;
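One plausible way the two groups get in each other's way, sketched in Python. Only the two groups quoted above are used here; the rest of organice's regexp is not reproduced, so the real failure may look different.

```python
import re

# Only the two groups quoted above; the remainder of the real regexp is omitted.
COMBINED = re.compile(r"""([\s({'"]?)((?:^|\s+)[*/~=_+])""")

# With one leading blank, the blank lands in the marker group, not the prefix group.
print(COMBINED.match(" *bold*").groups())  # ('', ' *')

# A prefix such as "(" is consumed by the first group, after which the second
# group still demands line start or whitespace, so nothing matches at all.
print(COMBINED.search("(*bold*)"))         # None
```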

As suggested in issue 200ok-ch#668, markup should only be used at "word" boundaries. Thus, make the previous prefix non-optional and add "^". Also, in response to the examples given in PR 200ok-ch#910, allow marking up single characters.
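A rough Python sketch of the idea in that commit message, not the regexp from PR #910. The BOUNDED pattern, its named groups, and the set of characters allowed after the closing marker are assumptions made for illustration; only the prefix class comes from this thread.

```python
import re

# Hypothetical sketch; NOT the regexp from PR #910.
# Prefix (required): line start or one character from the thread's prefix class.
# Body: one non-space character, optionally followed by more text that ends in
# a non-space character, so single-character markup such as /b/ is allowed.
# Suffix (assumed): whitespace, closing bracket, quote, punctuation, or line end.
BOUNDED = re.compile(
    r"""(?:^|[\s({'"])(?P<marker>[*/~=_+])(?P<body>\S(?:.*?\S)?)(?P=marker)(?=[\s)}'".,;:!?-]|$)"""
)


def marked_up(text):
    """Return the marked-up text found by the sketch pattern, or None."""
    match = BOUNDED.search(text)
    return match.group("body") if match else None


print(marked_up("Foo/Bar/Baz"))            # None
print(marked_up("minimum/maximum range"))  # None
print(marked_up("*bold*"))                 # bold
print(marked_up("some *bold* text"))       # bold
print(marked_up("a /b/ c"))                # b
```

The required prefix is what keeps Foo/Bar/Baz and minimum/maximum plain, while including ^ in it keeps markup at the start of a line working.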

lechten · 2022-12-03T15:34:48Z

I hope that PR #910 fixes this now (one more commit not mentioning this, sorry).

munen · 2022-12-03T16:44:37Z

Closed by the amazing work of @lechten in #910 🎆 🎊

munen added the bug and parser labels on May 8, 2021

munen mentioned this issue on May 9, 2021: WIP test: Too eager parsing of italic markup #669 (open)

This was referenced on May 24, 2022:
As soon as our new BNF-based parser (separate repo) is ready, replace the parse code #18 (open)
Toggling checkbox deletes marked-up text #796 (closed)

munen closed this as completed on Dec 3, 2022
