Parse content text that can contain style, links, footnotes, timestamps, ... #9

schoettl · 2020-04-25T18:53:37Z

text is any orgmode text that can contain style, links, footnotes, timestamps, ...

It can be a full line or part of a line (e.g. in title, lists, property values, tables, ...)

test/org_parser/parser_test.cljc

schoettl · 2020-04-26T20:23:19Z

Maybe I should mention why I put so much effort into file links compared to all other forms of links.

File links, specifically with location information (::), are most likely interpreted by the app using org-parser – in contrast to most other URL forms which are just started with the appropriate external app (or understood specifically in Emacs orgmode).

Also, organice might use the file links and internal links in the future, so it's best to implement detailed parsing directly here in EBNF.
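
To illustrate what an app would do with the location part of a file link, here is a minimal Python sketch. It is not the EBNF from this PR: the function names are hypothetical, and splitting at the first "::" is an assumption made for the example. The location kinds follow the search options described in the Org manual (line number, *headline, #custom-id, /regexp/, plain text search).

```python
def split_file_link(link):
    """Split 'file:PATH::LOCATION' into (path, location).

    Hypothetical helper: splits at the first '::' (an assumption),
    so single colons in the path are left alone.
    """
    if link.startswith("file:"):
        link = link[len("file:"):]
    path, sep, location = link.partition("::")
    return path, (location if sep else None)


def classify_location(location):
    """Name the kind of search option a location string represents."""
    if location is None:
        return "none"
    if location.isdigit():
        return "line-number"
    if location.startswith("*"):
        return "headline"
    if location.startswith("#"):
        return "custom-id"
    if len(location) >= 2 and location.startswith("/") and location.endswith("/"):
        return "regexp"
    return "text-search"


path, location = split_file_link("file:notes/todo.org::*Inbox")
kind = classify_location(location)
```

A path containing a single colon, such as file:C:/docs/a.org, passes through unchanged with no location.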

munen · 2020-04-27T15:57:08Z

@schoettl I'm super stoked about your internal links implementation. That'll be a great addition to organice at some point 😄

schoettl · 2020-04-27T16:08:18Z

BNF for styled text and links must be adapted to match orgmodes syntax but the new tests are good so far.

schoettl · 2020-04-28T07:18:16Z

Note: Reference implementation for emphasizing text (style):

Font in Emacs: https://github.com/emacs-mirror/emacs/blob/763df4bc171c9742cf04b029ddfe495d52461bcf/lisp/org/org.el#L5048
LaTeX export: https://lists.gnu.org/archive/html/emacs-orgmode/2010-08/msg00885.html
org.el -> org-emph-re

(insert org-emph-re)
delete enclosing quotes, CR, \\, \ before [{(|)}]
([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?(?:.*?){0,1}[^ ])\3)([- .,:!?;'\")}\[]|$)
https://www.debuggex.com/
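
To make the capture groups concrete, here is a small Python sketch that compiles the cleaned-up regex above unchanged and names its groups. The helper name and the dictionary keys are made up for illustration. Group 1 is the character before the emphasis, group 3 the marker, group 4 the inner text, and group 5 the character after it.

```python
import re

# The cleaned-up org-emph-re from above, copied as-is.
ORG_EMPH_RE = re.compile(
    r"""([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?(?:.*?){0,1}[^ ])\3)([- .,:!?;'\")}\[]|$)"""
)


def emphasis_groups(line):
    """Return the interesting capture groups of the first match, or None."""
    m = ORG_EMPH_RE.search(line)
    if m is None:
        return None
    return {
        "pre": m.group(1),     # char before the marker ('' at line start)
        "marker": m.group(3),  # one of * / _ +
        "inner": m.group(4),   # text between the markers
        "post": m.group(5),    # char after the marker ('' at line end)
    }


plain = emphasis_groups("the *fox* jumps")
glued = emphasis_groups("a*b*c")
nested = emphasis_groups("/*strong and italic*/")
```

In "a*b*c" nothing matches because the first * is not preceded by an allowed character. In the nested case the inner text is "*strong and italic*", so it would have to be parsed again to find the bold part.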

Problem: orgmode (at least in export) supports multi-line emphasis; we can't because parsing is line-based.

We don't use this regex directly because we have an EBNF parser. Probably, I have to

use capture group 1 as the look-ahead stop criterion when parsing text-normal
use capture group 5 as the look-ahead stop criterion when regex-parsing the inner text; or better: not as a look-ahead in the regex but as a separate symbol & <thing-after-emph>

I will see how it behaves with recursion and what will happen when a match does not work (backtracking?).

Not parsing multi-line emphasis correctly wouldn't be too bad if the line is parsed as text-normal instead.

Ref card revealing some details:
https://github.com/fniessen/refcard-org-mode

More:
https://orgmode.org/manual/Markup-for-Rich-Contents.html#Markup-for-Rich-Contents

And tables!

/*strong and italic*/ should also work.

Implement sub- and superscripts: https://orgmode.org/manual/Subscripts-and-Superscripts.html

munen · 2020-04-29T14:14:12Z

Just a comment at a random time: you’re on 🔥, there’s already loads of amazing stuff in this PR!

Sorry for the non-helpful comment, but I had to say it out loud(;

schoettl · 2020-04-29T14:45:15Z

Problem with parsing styled text:

I first have to parse the preceding non-styled normal text. But I can't just stop parsing normal text before any style delimiter, because I have to make sure the delimiter is preceded by a space (e.g. the *fox* <- space before first *).

If I stop parsing only before delimiters preceded by space (using look-ahead), I cannot parse links, footnotes, or <http://example.com> if they are not preceded by a space :(

Combinations of the styled text delimiters and link/footnote delimiters do not work, e.g.

[^<]*|[^/]*
([^<]|[^/])*

(possibly with look-aheads)

They cannot work because the first one eats up styled text and the second one matches everything!
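
Both failure modes can be reproduced with plain regexes. The following Python sketch uses the two patterns from above without look-aheads; the sample line is made up for the demonstration.

```python
import re

line = "see /italic/ and <http://example.com>"

# Pattern 1: the first alternative already succeeds, so it runs up to
# the '<' and swallows the styled text on the way.
first = re.compile(r"[^<]*|[^/]*")
eaten = first.match(line).group(0)

# Pattern 2: every character is either "not <" or "not /",
# so the repetition matches the whole line.
second = re.compile(r"([^<]|[^/])*")
matches_everything = second.fullmatch(line) is not None
```

Here eaten is "see /italic/ and ", which includes the styled text, and matches_everything is True.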

I'm close to giving up with styled text. Edit: Not before I've asked some experts...

org.el works differently: It has a regex (org-emph-re and org-verbatim-re) that matches the styled text and the character before and after. The BNF parser cannot do this. It can look ahead but not backwards, i.e. it cannot check the character before the styled text.
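
For comparison, here is a rough Python sketch of the regex-driven approach: the match itself includes the character before and after the styled text, so no look-behind is needed. This is only an illustration of the idea, not how org.el or this PR is implemented. The function name and the segment tuples are made up, and the regex is the cleaned-up org-emph-re quoted earlier in this thread.

```python
import re

EMPH_RE = re.compile(
    r"""([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?(?:.*?){0,1}[^ ])\3)([- .,:!?;'\")}\[]|$)"""
)


def split_styled(line):
    """Split a line into ('text', s) and ('styled', marker, inner) segments."""
    segments = []
    emitted = 0  # end of the last segment already emitted
    pos = 0      # where the next search starts
    while True:
        m = EMPH_RE.search(line, pos)
        if m is None:
            break
        if m.start(2) > emitted:
            # Normal text, including the char that precedes the marker.
            segments.append(("text", line[emitted:m.start(2)]))
        segments.append(("styled", m.group(3), m.group(4)))
        emitted = m.end(2)
        # Resume at the char after the emphasis so that it can also
        # serve as the "char before" of the next styled span.
        pos = m.start(5)
    if emitted < len(line):
        segments.append(("text", line[emitted:]))
    return segments


result = split_styled("the *fox* and /dog/ run")
```

The inner text of a styled segment is returned as a plain string here; nested markup such as /*strong and italic*/ would need another pass over it.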

ox-latex.el: I don't know yet how the Org-to-LaTeX exporter works; I couldn't find the place where it parses styled text. Maybe it's in org-element.el, e.g. org-element-bold-parser. Anyway, it uses the org-emph-re regex, which I cannot use in a BNF parser.

schoettl · 2020-05-01T14:51:49Z

wait for feedback on above parser problem
compare regular links with spec
implement targets and radio targets
implement {{{macros()}}}
implement \entitys (\LaTeX stuff is out of scope of this PR)
line breaks \\ (see also paragraphs in #11, "Try parsing semantic elements instead of line-based")
diary timestamps
what else?[1]

[1] https://orgmode.org/manual/Markup-for-Rich-Contents.html#Markup-for-Rich-Contents

schoettl · 2020-05-03T22:04:30Z

Hi @munen,

I think this PR is ready for review and merge.

There are a lot of good and stable things in it.

The remaining problems are mostly related to the ungrateful and recursive nature of the emphasis markup [/*_=~+]. I'll open an issue for that.

Anyway, it should be a solid base for improvements.

I'll rebase #11 after the merge.

munen · 2020-05-05T15:35:54Z

Amazing stuff, @schoettl, as always!

Will happily merge! 🙏 🙇

schoettl added 4 commits April 25, 2020 21:02

Add TODOs

861c419

Add literal lines (:), tests, comments, and placeholders

193633e

Add support for "external links" i.e. URLs

ef9a266

Implement interal links, fix comments

6414a05

munen reviewed Apr 26, 2020

test/org_parser/parser_test.cljc (outdated)

schoettl added 2 commits April 26, 2020 20:40

Add support for am/pm and tests

c429bb4

Add "Project State" section to README

bd347da

Add rough parsing of styled text and text links

16c587c

schoettl changed the title ~~[WIP] Parse content text that can contain markup, links, footnotes, timestamps, ...~~ [WIP] Parse content text that can contain style, links, footnotes, timestamps, ... Apr 27, 2020

Update README

9358d4c

schoettl added 8 commits April 27, 2020 18:15

Fix typos

0f46702

Implement backslash escapes in links

92539d0

Styled text can contain text

3bdd9b4

Fix misunderstanding: www.example.com is an internal, not a web link

0269896

Formatting

a372396

Allow colons in filename, behave like orgmode

9130d8c

Implement footnotes, tests, and fix bug in footnote-line

6da507d

Support text as property value

07f1be9

Add tests

e5326a6

schoettl added 5 commits April 29, 2020 19:07

Implement subscript and superscript with tests

2796ed9

Fix orgmode syntax (was markdown)

d044b29

Fix README

06097b6

Fix BNF for two kind of links

12ba4bb

Add currently failing tests

5263de0

schoettl added 12 commits May 1, 2020 18:16

Implement targets and radio targets

87e61bc

Fix super/subscript

92caf19

Fix fixed-width lines (was literal line)

a179365

Fix syntax for regular links, add id links

fafe559

Implement macros

2bef241

Implement character entities

8b3f20b

Bugfix: close comment

a40a307

Add TODO comment

8969341

Add doc

d43fac4

Implement diary timestamps

1927a74

Implement line breaks

bed3852

Fix verbatim text markup

bf7104c

schoettl changed the title ~~[WIP] Parse content text that can contain style, links, footnotes, timestamps, ...~~ Parse content text that can contain style, links, footnotes, timestamps, ... May 3, 2020

schoettl mentioned this pull request May 3, 2020

Problems with parsing emphasis/style markup #12

Open

chore: orgmode -> Org mode

223c04c

munen merged commit 2cae43c into 200ok-ch:master May 5, 2020
