Try parsing semantic elements instead of line-based #11

schoettl · 2020-05-01T14:36:06Z

In this branch, I'm working on the higher-level syntax described in https://orgmode.org/worg/dev/org-syntax.html

Specifically, I want to find out whether we can move away from line-based parsing towards more semantic blocks, called "elements". The Org mode parser used for export is also called org-element.el.

The spec says that most elements of the syntax are not context-free, and it classifies them into three categories: “greater elements”, “elements”, and “objects”.

Greater elements are, for example, #+BEGIN_EXAMPLE blocks. Some of these blocks contain raw text (EXAMPLE, SRC, COMMENT, ...), others can contain formatted text (CENTER, QUOTE, ...). Hence, it's better to parse in a context-aware way: keep the multi-line content of an EXAMPLE block as raw text, but parse the content of a CENTER block as formatted text.
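The difference between raw and formatted blocks can be sketched roughly as follows. This is a minimal illustration in Python, not the project's instaparse grammar; the function name `parse_blocks`, the tuple shapes, and the choice to treat every block name outside `RAW_BLOCKS` as formatted are assumptions made for the example.

```python
import re

# Block names whose content is kept verbatim (subset named in the comment above).
# Every other block name is treated as "formatted" here, which is an assumption.
RAW_BLOCKS = {"EXAMPLE", "SRC", "COMMENT"}

BEGIN_RE = re.compile(r"^[ \t]*#\+BEGIN_(\S+)", re.IGNORECASE)


def parse_blocks(text):
    """Group lines into block elements instead of emitting one token per line."""
    elements = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        m = BEGIN_RE.match(lines[i])
        if not m:
            elements.append(("line", lines[i]))
            i += 1
            continue
        name = m.group(1).upper()
        end_re = re.compile(
            r"^[ \t]*#\+END_" + re.escape(name) + r"[ \t]*$", re.IGNORECASE
        )
        # Look for the matching #+END_<name> line.
        j = i + 1
        while j < len(lines) and not end_re.match(lines[j]):
            j += 1
        if j == len(lines):
            # Unterminated block: treat the begin line as an ordinary line.
            elements.append(("line", lines[i]))
            i += 1
            continue
        body = "\n".join(lines[i + 1:j])
        if name in RAW_BLOCKS:
            # Raw content: keep the text untouched, markup included.
            elements.append(("block", name, "raw", body))
        else:
            # Formatted content: parse the body again with the same rules.
            elements.append(("block", name, "parsed", parse_blocks(body)))
        i = j + 1
    return elements
```

With this sketch, `*not bold*` inside an EXAMPLE block stays a plain string, while the body of a CENTER block is handed back to the parser, so nested blocks or (in a fuller version) inline markup would be recognized there.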

Also, paragraphs, multi-line footnote definitions, lists, tables, and property drawers are probably better parsed as units than line by line.

I also want to implement comment lines: https://orgmode.org/manual/Comment-Lines.html

munen · 2020-05-01T19:12:33Z

Best of luck with this approach, the described rationale seems absolutely reasonable to me! 👍

schoettl · 2020-05-06T22:14:27Z

Note to self: I thought tests passed before rebase at bf7104c ...

munen · 2020-05-13T07:13:00Z

@schoettl The tests were crashing on this branch because of (what I think is) a typo. I took the liberty of fixing the typo in this commit. If it was not accidental, I apologize for stepping on your toes^^

schoettl · 2020-05-13T07:18:28Z

resources/org.ebnf

@@ -151,7 +151,7 @@ keyword-value = anything-but-newline
 (* TODO allow empty properties with or without trailing space *)
 (* TODO looks like node-property-line also parses :END: *)
-node-property-line = ! <':END:> <':'> node-property-name [node-property-plus] <':'> ( <' '> node-property-value | [<' '>] ) <eol>
+node-property-line = ! <':END:'> <':'> node-property-name [node-property-plus] <':'> ( <' '> node-property-value | [<' '>] ) <eol>


Nice, this was a typo. Thanks!
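For readers unfamiliar with the `!` prefix in the rule above: it is a negative lookahead, so the rule only applies to lines that do not start with `:END:`. A rough Python regex analogue is sketched below. It is not the project's grammar; the name pattern `[^\s:+]+` is an assumption, since the definition of `node-property-name` is not shown in this thread, and `parse_node_property` is a hypothetical helper.

```python
import re

NODE_PROPERTY_LINE = re.compile(
    r"(?!:END:)"              # negative lookahead, like `! <':END:'>` in the rule
    r":(?P<name>[^\s:+]+)"    # property name (assumed character set)
    r"(?P<plus>\+)?"          # optional '+' suffix
    r":"
    r"(?: (?P<value>.+)| ?)"  # either ' value' or an optional trailing space
)


def parse_node_property(line):
    """Return the parts of a node property line, or None if it is not one."""
    m = NODE_PROPERTY_LINE.fullmatch(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "plus": m.group("plus") is not None,
        "value": m.group("value"),
    }
```

Without the lookahead, this pattern would accept `:END:` as a property named `END` with an empty value; with it, `:END:` is rejected and stays available as the drawer terminator.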

munen · 2020-05-13

You're welcome. Good work on the PR so far!

NB: I've asked my business partner for feedback on how to work with semantic elements, and he's writing something up as I type this^^

schoettl · 2021-05-09T21:26:02Z

Hi @munen @branch14,

I have some changes I'd like to merge:

- Rename token head-line to headline (better now than later)
- Add some elements to the EBNF (with tests), including horizontal-rule, comment-line, and TODO keyword
- Refactor and document some tokens in the EBNF, including drawers and property drawers
- Fix the <word> regex (there was a $ inside [...])
- Use insta/failure? instead of map? in tests
- Add tests
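To illustrate the `<word>` regex item in the list above: inside a character class, `$` is a literal dollar sign rather than an end-of-line anchor. The sketch below shows the effect in Python; Java regular expressions, which instaparse uses, treat `$` inside `[...]` the same way. Both patterns are hypothetical, since the actual regex from org.ebnf is not shown in this thread.

```python
import re

# Hypothetical patterns for illustration only.
buggy = re.compile(r"[^ \n$]+")  # '$' inside [...] excludes a literal dollar sign
fixed = re.compile(r"[^ \n]+")   # a word is a run of non-space, non-newline characters


def first_word(pattern, s):
    """Return the word matched at the start of s, or None."""
    m = pattern.match(s)
    return m.group(0) if m else None
```

With these patterns, `first_word(buggy, "US$100 later")` stops at the dollar sign and returns `"US"`, while `first_word(fixed, "US$100 later")` returns `"US$100"`.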

My TODOs after merge:

- I also added and then deleted (in cacf5db) a bunch of tests for semantic parsing of whole blocks. I'll reintroduce them in a new PR.
- I'll also create a separate PR for parsing content-line as more than just .* (see "Parser will not parse styles within sections", #26).

branch14 · 2021-05-10T09:10:13Z

@schoettl This looks great!

I was initially wary of this PR, mainly because of its title. A previous attempt of mine to move from line-based parsing to semantic blocks failed miserably.

I just read through your changes and I'm totally happy with where this is going. 😄

Definitely LGTM! 👍

schoettl · 2021-05-10T21:10:18Z

> I was initially wary of this PR, mainly because of its title. A previous attempt of mine to move from line-based parsing to semantic blocks failed miserably.

Thanks! Right, the title of this PR is wrong. I haven't switched to parsing semantic blocks yet, but I prepared parts of it :) I'll try again in a new PR.

munen · 2021-05-13T08:40:21Z

@schoettl I'm late to the party, but I don't want to miss the chance to say that this PR looks great. Good work and solid merge! 🙏

schoettl changed the title ~~Try parsing semantic elements instead of line-based~~ [WIP] Try parsing semantic elements instead of line-based May 1, 2020

This was referenced May 1, 2020

Parse content text that can contain style, links, footnotes, timestamps, ... #9

Merged

node-properties are not parsing correctly #6

Closed

branch14 mentioned this pull request May 13, 2020

[wip] Add: first draft of parsing basic structure elements #7

Closed

munen force-pushed the master branch from 836aced to 6b988ba Compare January 4, 2021 16:03

schoettl mentioned this pull request Jan 6, 2021

Fix drawer-end-line not parsed #25

Merged

schoettl added 12 commits May 9, 2021 11:28

Rename head-line to headline, add support for todo keyword, fix/add t…

8d4413d

…ests

Implement horizontal rules

55df592

Implement comment lines

b7943e1

Blocks, tests, and bugfix

18bcf7b

Implement drawers and property drawers

cf1535b

Add TODO comments

7100245

Fix bug in EBNF

38ea73f

Fix places that use head-line instead of headline

4180661

Allow line-based parsing of various blocks

e148af3

Remove tests for semantic parsing of blocks temporarily

cacf5db

They cannot work as long as the grammar defines line-based parsing

Fix tests to work with line-based parsing

c36a26e

Remove invalid comment

50a338f

schoettl changed the title ~~[WIP] Try parsing semantic elements instead of line-based~~ Try parsing semantic elements instead of line-based May 9, 2021

schoettl merged commit 9744f6b into 200ok-ch:master May 13, 2021

schoettl mentioned this pull request May 15, 2021

Parse semantic blocks where appropriate #32

Open

9 tasks
