Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too eager parsing of italic markup #668

Closed
munen opened this issue May 8, 2021 · 17 comments
Closed

Too eager parsing of italic markup #668

munen opened this issue May 8, 2021 · 17 comments
Labels
bug Something isn't working parser

Comments

@munen
Copy link
Collaborator

munen commented May 8, 2021

*** TODO organice parser error: Foo/Bar/Baz
Foo/Bar/Baz

Is shown as:

image

@munen munen added bug Something isn't working parser labels May 8, 2021
@schoettl
Copy link
Collaborator

schoettl commented May 8, 2021

I think that's a known bug and applies for all kind of markup. I think I remember that, once I worked on the regex parser, I decided that it's too hard to fix and postponed it to the future clojure parser :) But maybe a fresh look on the regex parser allows you to fix it.

@schoettl
Copy link
Collaborator

schoettl commented May 8, 2021

See also #94 and #198

@munen
Copy link
Collaborator Author

munen commented May 8, 2021

I don’t think it’s a duplicate. The other issues are about the parser choking on lines containing multiple inline markup statements.

This issue is much more simple: Foo/Bar/Baz is just text. It shouldn’t be italicized at all^^

@schoettl
Copy link
Collaborator

schoettl commented May 8, 2021

Yep, the issues linked above are only related. It's probably the same regex.

Anyway, the problem applies to all kind of markup, e.g. Foo+Bar+Baz

@munen
Copy link
Collaborator Author

munen commented May 9, 2021

You are correct, it does apply to all kind of inline markup. I've written a test and looked at the Regexp. But I don't understand why it doesn't work as written at the moment(;

@munen
Copy link
Collaborator Author

munen commented May 17, 2021

Worse example with only one slash:

“organice supports parsing and preserving the minimum/maximum range timestamps.” becomes:

C144C10B-A184-45F9-92DB-725D3927B7FE

@lechten
Copy link
Contributor

lechten commented Dec 3, 2022

I did not look at the proper Org syntax, but what if we just required a whitespace before inline markup?

Thus, replace [*/~=_+] with \s[*/~=_+]?

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

@lechten Your proposal fixes some terms like foo/bar and foo/bar/baz, but it breaks terms like *bold*. I tried adding to your proposal with [^\s][*/~=_+], but that's not working as expected. I must make a trivial mistake here^^

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

Partial update: The ^ negates the \s, of course(; But [\s^] and [\^\s] also don't work.

@lechten
Copy link
Contributor

lechten commented Dec 3, 2022

The ^ does not work inside square brackets (where it is just another character). We might add (^|\s) but this adds another group, which seems to ask for lots of subsequent changes...

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

hihi, we're looking at the same options currently. I'm checking if there's a way for a match group to basically be ignored. But that's probably against the whole idea of match groups.

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

There is a way! ((?:\s|^)?[*/~=_+]) looks promising.

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

Looked promising and passes all tests, but it breaks foo/bar/baz, again-_-

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

((?:^|\s+)[*/~=_+]) is most promising atm, but it fails *bold*. (There's blanks before the *bold* term, GH just opts to not render them).

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

I think ((?:^|\s+)[*/~=_+]) would actually be a good solution, however the match group just direct before (([\s({'"]?)((?:^|\s+)[*/~=_+]) also has a \s which interferes...

Welcome to regexp based parsing hell(;

lechten added a commit to lechten/organice that referenced this issue Dec 3, 2022
As suggested in issue 200ok-ch#668, markup should only be used at "word"
boundaries.  Thus, make the previous prefix non-optional and add "^".

Also, in response to the examples given in PR 200ok-ch#910, allow to mark up
single characters.
@lechten
Copy link
Contributor

lechten commented Dec 3, 2022

I hope that PR #910 fixes this now (one more commit not mentioning this, sorry).

@munen
Copy link
Collaborator Author

munen commented Dec 3, 2022

Closed by the amazing work of @lechten in #910 🎆 🎊

@munen munen closed this as completed Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parser
Projects
None yet
Development

No branches or pull requests

3 participants