Problems with parsing emphasis/style markup #12
Comments
@munen, @schoettl I acknowledge that Org-mode might have syntactic elements that cannot be properly parsed by EBNF/PEG. While the project's goal is to have as much of the Org-mode syntax as possible formalized in an EBNF/PEG grammar, we need an alternative (more pragmatic) approach to provide a full-featured parser. Some of the issues mentioned here can be implemented in the grammar, while others might need to be deferred to the |
Here I laid out what the code for the transformation could look like: #15 For parsing multi-line styles, a second transformation step would be needed. (Not part of this PR.) |
Even single-line styles have severe problems in EBNF. I want to check whether it is reasonable to move all styling (emphasis and verbatim, as the spec calls them) into the transformations. An advantage of this approach is that we can reuse the logic and regexes from Org mode. |
👍 |
That could actually prove to be a major benefit. As long as Emacs is not using |
Note to self:
It would be great if instaparse had a way to get the original, unparsed text along with the parse tree,
Regarding 2.: instaparse has a built-in way to get position/location meta information from the parse tree! Even if the parse tree looks like it only holds the parsed data, the So, if we have the original input text, it's no problem to apply emphasis regexes on the original line. We do have all position information about elements parsed via EBNF. |
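A minimal sketch of that idea, written in Python for illustration rather than in the project's Clojure. It assumes each parsed node carries start and end offsets into the original input; the `Node` class, the tag names, and both helper functions are hypothetical and not part of instaparse or this project.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical parse-tree node that only keeps a tag and its source span."""
    tag: str
    start: int  # offset of the first character in the original input
    end: int    # offset one past the last character


def raw_text(source, node):
    """Recover the unparsed text of a node by slicing the original input."""
    return source[node.start:node.end]


def lines_for_styling(source, nodes, tag="content-line"):
    """Collect the original text of every node that may contain style markup."""
    return [raw_text(source, n) for n in nodes if n.tag == tag]


source = "* Headline\nSome *bold* text.\n"
nodes = [Node("headline", 0, 10), Node("content-line", 11, 28)]
print(lines_for_styling(source, nodes))
```

With the original line recovered this way, the emphasis regexes can run in the transformation step instead of in the grammar.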
It's been 4 years, and this issue seems to be the last unfinished todo item in
|
The project hasn't been very active since then and no one has worked on this specific problem. I guess it's not the only gap. The checklist in the README may be missing some less common Org features. There is also a lot of room for improvement on the transformation side (the step of converting the instaparse parse result into a more meaningful data structure). For example, joining the lines of a paragraph, which would be a requirement for parsing style markup in the transformation step. I'm personally more concerned about #56, which is why I don't invest much time. |
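As a rough illustration of the line-joining step mentioned above, here is a sketch in Python (not the project's Clojure). It assumes a paragraph is simply a run of consecutive non-blank lines, which is a simplification of Org's real paragraph rules; `join_paragraph_lines` is a hypothetical helper.

```python
def join_paragraph_lines(lines):
    """Group consecutive non-blank lines into paragraphs, joined by one space.

    Assumption: a blank line is the only paragraph separator.
    """
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


print(join_paragraph_lines(["This is *bold", "text* here.", "", "Next."]))
```

Once the lines are joined, markup that spans a line break sits on one string, so a single-line emphasis regex can find it.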
Problems with the thankless and recursive nature of emphasis markup [/*_+] are documented in #9. A summary (originally written in German):
One syntax element in Org mode is similar to Markdown:
This is /italic/ text.
(* = bold, _ = underlined, + = strike-through, ...)
The task now is to parse this text.
The difficulties are these:
Org mode itself apparently resolves this differently during export: not with a BNF parser but programmatically, in particular with a regex that matches not only the italic text but also the character before and after it.
Here, however, that does not work so easily. First, I cannot specify a regex for the symbol text-kursiv (because of the recursion). Second, text-kursiv cannot know whether it is preceded by a space. The BNF in use supports look-ahead but not look-behind. Finally, the regex for text-normal is difficult because it has to stop at exactly the right place: sometimes after a space when a / follows, sometimes without any additional criterion when a [ follows (link or footnote).
See also org-emph-re in Emacs.
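To make the regex approach concrete, here is a sketch in Python of a pattern that checks the character before and after the emphasized text. It is a simplified approximation, not the exact Emacs org-emph-re: the sets of allowed preceding and following characters are assumptions modelled on Org's defaults, only the four markers named in this issue are covered, and nested markup is not handled. The name `find_emphasis` is hypothetical.

```python
import re

# Simplified approximation, NOT the exact Emacs org-emph-re.
# Assumed characters allowed directly before / after an emphasis span.
PRE = r"""\s\-('"{"""
POST = r"""\s\-.,:!?;'")}\[\\"""
STYLES = {"*": "bold", "/": "italic", "_": "underline", "+": "strike-through"}

EMPH_RE = re.compile(
    rf"(?<![^{PRE}])"          # look-behind: start of line or an allowed character
    r"(?P<marker>[*/_+])"      # opening marker
    r"(?P<body>\S|\S.*?\S)"    # body must not start or end with whitespace
    r"(?P=marker)"             # closing marker must equal the opening one
    rf"(?![^{POST}])"          # look-ahead: end of line or an allowed character
)


def find_emphasis(line):
    """Return (style, body, start, end) for each emphasis span in one line."""
    return [
        (STYLES[m.group("marker")], m.group("body"), m.start(), m.end())
        for m in EMPH_RE.finditer(line)
    ]


print(find_emphasis("This is /italic/ text."))
print(find_emphasis("see /usr/bin/ls"))
```

The look-behind is the part the grammar cannot express. Because neither assertion consumes a character, adjacent spans such as `*bold* /it/` can share the separating space, and a path like `/usr/bin/ls` is rejected since no closing `/` is followed by an allowed character.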