Try parsing semantic elements instead of line-based #11

Merged
merged 12 commits into from
May 13, 2021

schoettl (Collaborator) commented May 1, 2020

In this branch, I work on the higher level syntax according to https://orgmode.org/worg/dev/org-syntax.html

Specifically, I want to find out whether we can move away from line-based parsing towards more semantic blocks, called "elements". The orgmode parser used for export is also called org-element.el.

The spec says that most elements of the syntax are not context-free, and the categories for these elements are

“Greater elements”, “elements”, and “objects”

Greater elements are e.g. #+BEGIN_EXAMPLE blocks. Some of these blocks contain raw text (EXAMPLE, SRC, COMMENT, ...), others can contain formatted text (CENTER, QUOTE, ...). Hence, it's better to parse in a context-aware way: treat the multi-line content of an EXAMPLE block as raw text, but parse the content of a CENTER block as formatted text.

Also, paragraphs, multi-line footnote definitions, lists, tables, and property drawers are probably better parsed as units rather than line by line.
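The idea above can be illustrated with a small Python sketch. This is not the project's instaparse grammar, just a minimal model of element-based parsing under stated assumptions: the function name `parse_elements`, the tuple-based output shape, and the two block sets are made up for illustration, and nested blocks of the same name are not handled.

```python
import re

# Assumption for this sketch: which block kinds hold raw text.
RAW_BLOCKS = {"EXAMPLE", "SRC", "COMMENT"}   # contents kept verbatim
# Any other block (CENTER, QUOTE, ...) is treated as formatted text
# and its contents are parsed again.

BEGIN = re.compile(r"^#\+BEGIN_(\w+)", re.IGNORECASE)


def parse_elements(lines):
    """Group lines into elements instead of classifying each line alone."""
    elements = []
    paragraph = []
    i = 0

    def flush():
        # Emit the paragraph collected so far as a single unit.
        if paragraph:
            elements.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    while i < len(lines):
        line = lines[i].strip()
        match = BEGIN.match(line)
        if match:
            flush()
            name = match.group(1).upper()
            end_marker = f"#+END_{name}"
            # Scan forward to the matching end line (or end of input).
            j = i + 1
            while j < len(lines) and lines[j].strip().upper() != end_marker:
                j += 1
            body = lines[i + 1:j]
            if name in RAW_BLOCKS:
                # Context: raw block, so markup inside is NOT interpreted.
                elements.append((name.lower() + "-block", "\n".join(body)))
            else:
                # Context: formatted block, so parse the contents recursively.
                elements.append((name.lower() + "-block", parse_elements(body)))
            i = j + 1
        elif not line:
            flush()  # a blank line ends the current paragraph
            i += 1
        else:
            paragraph.append(line)
            i += 1
    flush()
    return elements
```

The same line (`*not bold*`, or even a blank line) ends up in different element types depending on the enclosing block, which is what a purely line-based grammar cannot express.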

schoettl changed the title from "Try parsing semantic elements instead of line-based" to "[WIP] Try parsing semantic elements instead of line-based" on May 1, 2020
munen (Contributor) commented May 1, 2020

Best of luck with this approach, the described rationale seems absolutely reasonable to me! 👍

schoettl (Collaborator, Author) commented May 6, 2020

Note to self: I thought tests passed before rebase at bf7104c ...

munen (Contributor) commented May 13, 2020

@schoettl The tests were crashing on this branch because of what I think is a typo. I took the liberty of fixing it in this commit. If it was intentional, I apologize for stepping on your toes^^

@@ -151,7 +151,7 @@ keyword-value = anything-but-newline

 (* TODO allow empty properties with or without trailing space *)
 (* TODO looks like node-property-line also parses :END: *)
-node-property-line = ! <':END:> <':'> node-property-name [node-property-plus] <':'> ( <' '> node-property-value | [<' '>] ) <eol>
+node-property-line = ! <':END:'> <':'> node-property-name [node-property-plus] <':'> ( <' '> node-property-value | [<' '>] ) <eol>
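
For readers unfamiliar with the `!` operator in the rule above: in instaparse it is a negative lookahead, so the rule refuses to match a line that starts with `:END:`. A rough regex approximation in Python shows the same guard. This is only a sketch and not the project's code; the pattern, the group names, and the helper `parse_node_property_line` are assumptions made for illustration.

```python
import re

# Approximation of the node-property-line rule as a regex (illustrative only).
NODE_PROPERTY_LINE = re.compile(
    r"^(?!:END:)"            # negative lookahead, like `! <':END:'>`
    r":(?P<name>[^\s:+]+)"   # property name between the colons
    r"(?P<plus>\+)?"         # optional `+` (node-property-plus)
    r":"
    r"(?: (?P<value>.+)| ?)" # either " value" or an optional trailing space
    r"$"
)


def parse_node_property_line(line):
    """Return (name, plus, value), or None if the line is not a property."""
    match = NODE_PROPERTY_LINE.match(line)
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("plus") is not None,
        match.group("value") or "",
    )
```

Without the lookahead, `:END:` would be accepted as a property named `END` with an empty value, which is the situation the TODO comment in the grammar describes.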
schoettl (Collaborator, Author) commented

Nice, this was a typo. Thanks!

munen (Contributor) commented May 13, 2020

You're welcome. Good work on the PR so far!

NB: I've asked my business partner for feedback on how to work with semantic elements and he's writing something up as I type this^^

schoettl (Collaborator, Author) commented May 9, 2021

Hi @munen @branch14 ,

I have some changes I'd like to merge:

  • Rename token head-line to headline (better now than later)
  • Add some elements to EBNF (with tests), including horizontal-rule, comment-line, TODO keyword
  • Refactor and document some tokens in EBNF, including drawers, property drawers
  • Fix <word> regex (there was a $ inside [...])
  • Use insta/failure? instead of map? in tests
  • Add tests
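
As background for the `<word>` regex item in the list above: inside a character class `[...]`, `$` is a literal dollar sign and not an end-of-input anchor. The actual regex from the grammar is not shown in this conversation, so the patterns below are hypothetical and only demonstrate the general behaviour, here with Python's `re` module.

```python
import re

# Hypothetical patterns, not the ones from the grammar.
DOLLAR_IN_CLASS = re.compile(r"[a$]")   # `$` is a literal character here
DOLLAR_AS_ANCHOR = re.compile(r"a$")    # `$` anchors at the end of input here


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None
```

For example, `first_match(DOLLAR_IN_CLASS, "cost: $5")` finds the dollar sign itself, while `first_match(DOLLAR_AS_ANCHOR, "bandana!")` finds nothing because the input does not end in `a`.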

My TODOs after merge:

schoettl changed the title from "[WIP] Try parsing semantic elements instead of line-based" to "Try parsing semantic elements instead of line-based" on May 9, 2021
branch14 (Member) commented

@schoettl This looks great!

I was initially wary of this PR, mainly because of its title. A previous attempt of mine to move from line-based parsing to semantic blocks failed miserably.

I just read through your changes and I'm totally happy with where this is going. 😄

Definitely LGTM! 👍

schoettl (Collaborator, Author) commented

> I was initially wary of this PR, mainly because of its title. A previous attempt of mine to move from line-based parsing to semantic blocks failed miserably.

Thanks! Right, the title of this PR is wrong. I haven't switched to parsing semantic blocks yet, but I prepared parts of it :) I'll try again in a new PR.

schoettl merged commit 9744f6b into 200ok-ch:master on May 13, 2021
munen (Contributor) commented May 13, 2021

@schoettl I'm late to the party, but I don't want to miss the chance to say that this PR looks great. Good work and solid merge! 🙏

3 participants