
Parse content text that can contain style, links, footnotes, timestamps, ... #9

Merged: 35 commits, merged on May 5, 2020

schoettl (Collaborator) commented Apr 25, 2020

text is any Org mode text that can contain style, links, footnotes, timestamps, ...

It can be a full line or part of a line (e.g. in titles, lists, property values, tables, ...).

schoettl (Collaborator, Author) commented:

Maybe I should mention why I put so much more effort into file links than into all other forms of links.

File links, especially those with location information (::), are the links most likely to be interpreted by the app that uses org-parser itself, in contrast to most other URL forms, which are simply opened with the appropriate external app (or are handled specially by Emacs Org mode).

Also, organice might use file links and internal links in the future, so it's best to implement detailed parsing directly here in the EBNF.
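To illustrate what "detailed parsing" of a file link involves, here is a minimal Python sketch (not the org-parser EBNF) that splits a bracket link such as [[file:notes.org::*Heading][desc]] into path, search option, and description. The search-option kinds follow the file-link search options listed in the Org manual (line number, *headline, #custom-id, /regexp/, plain text); the function name, the regex, and the returned dict layout are hypothetical, and escaped brackets are not handled.

```python
import re

# Hypothetical pattern: [[file:PATH::OPTION][DESCRIPTION]], where the
# ::OPTION and [DESCRIPTION] parts are optional. Escaping is ignored.
FILE_LINK_RE = re.compile(
    r"\[\[file:(?P<path>[^\]]*?)(?:::(?P<search>[^\]]*))?\]"
    r"(?:\[(?P<desc>[^\]]*)\])?\]"
)


def parse_file_link(text):
    """Return a dict describing the first file link in text, or None."""
    m = FILE_LINK_RE.search(text)
    if m is None:
        return None
    search = m.group("search")
    if search is None:
        kind = None
    elif search.isdigit():
        kind = "line"          # file:notes.org::42
    elif search.startswith("*"):
        kind = "headline"      # file:notes.org::*Heading
    elif search.startswith("#"):
        kind = "custom-id"     # file:notes.org::#my-id
    elif len(search) > 1 and search.startswith("/") and search.endswith("/"):
        kind = "regexp"        # file:notes.org::/regexp/
    else:
        kind = "text"          # file:notes.org::some words
    return {
        "path": m.group("path"),
        "search": search,
        "search_kind": kind,
        "description": m.group("desc"),
    }
```

For example, parse_file_link("See [[file:notes.org::*Heading][desc]].") yields path "notes.org", search "*Heading" of kind "headline", and description "desc".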

munen (Contributor) commented Apr 27, 2020

@schoettl I'm super stoked about your internal links implementation. That'll be a great addition to organice at some point 😄

schoettl changed the title from "[WIP] Parse content text that can contain markup, links, footnotes, timestamps, ..." to "[WIP] Parse content text that can contain style, links, footnotes, timestamps, ..." on Apr 27, 2020

schoettl (Collaborator, Author) commented:

The BNF for styled text and links must be adapted to match Org mode's syntax, but the new tests look good so far.

schoettl (Collaborator, Author) commented Apr 28, 2020

Note: Reference implementation for emphasizing text (style):

  1. (insert org-emph-re)
  2. delete enclosing quotes, CR, \\, \ before [{(|)}]
  3. ([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?(?:.*?){0,1}[^ ])\3)([- .,:!?;'\")}\[]|$)
  4. https://www.debuggex.com/
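As an alternative to an online tester, the cleaned-up regex from step 3 can be explored with Python's re module. This is only a quick sketch for inspecting the capture groups, not something org-parser uses: group 1 is the character before the markup, group 3 the marker, group 4 the inner text, and group 5 the character after.

```python
import re

# The cleaned-up org-emph-re from step 3, unchanged.
ORG_EMPH_RE = re.compile(
    r"([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?(?:.*?){0,1}[^ ])\3)([- .,:!?;'\")}\[]|$)"
)

m = ORG_EMPH_RE.search("the *quick* fox")
# 1: char before, 2: whole emphasis, 3: marker, 4: inner text, 5: char after
print([m.group(i) for i in range(1, 6)])  # [' ', '*quick*', '*', 'quick', ' ']

# No match: the * in "a*b*c" is not preceded by a space or one of - ( ' " {
print(ORG_EMPH_RE.search("a*b*c"))  # None
```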

Problem: Org mode (at least in export) supports multi-line emphasis; we can't, because our parsing is line-based.

We don't use this regex directly because we have an EBNF parser. I probably have to

  • use capture group 1 as a look-ahead stop criterion when parsing text-normal
  • use capture group 5 as a look-ahead stop criterion when regex-parsing the inner text; or better: not as a look-ahead in the regex but as a separate symbol & <thing-after-emph>

I'll see how it behaves with recursion and what happens when a match fails (backtracking?).

Not parsing multi-line emphasis correctly wouldn't be too bad, as long as the line is parsed as text-normal instead.
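The plan above (capture groups as stop criteria, recursion into the inner text, text-normal as the fallback) can be tried out with a small regex-driven scanner before it is turned into grammar rules. This is a hypothetical Python sketch, not org-parser code: parse_text and the (style, children) tuples are made up for illustration, and group 5 of the regex is rewritten as a real look-ahead so the character after an emphasis can serve as the character before the next one.

```python
import re

# org-emph-re from step 3 with two changes (assumptions of this sketch):
# group 5 became a look-ahead, so the character after the emphasis is not
# consumed, and the no-op (?:.*?){0,1} left over from step 2 was dropped.
EMPH_RE = re.compile(
    r"([- ('\"{]|^)(([*/_+])([^ ]|[^ ].*?[^ ])\3)(?=[- .,:!?;'\")}\[]|$)"
)
STYLE = {"*": "bold", "/": "italic", "_": "underline", "+": "strike-through"}


def parse_text(line):
    """Split a line into plain strings and (style, children) nodes.

    Text that does not form a complete emphasis on this line (e.g. one whose
    closing marker is on the next line) simply stays plain text.
    """
    nodes = []
    pos = 0
    for m in EMPH_RE.finditer(line):
        if m.start(2) > pos:
            # Plain text up to the marker, including the pre-character (group 1).
            nodes.append(line[pos:m.start(2)])
        # Recurse into the inner text (group 4) to find nested emphasis.
        nodes.append((STYLE[m.group(3)], parse_text(m.group(4))))
        pos = m.end(2)
    if pos < len(line):
        nodes.append(line[pos:])
    return nodes
```

For example, parse_text("/*strong and italic*/") returns [("italic", [("bold", ["strong and italic"])])], while a line with an unclosed marker such as "an *unclosed emphasis" comes back as a single plain string.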

Ref card revealing some details:
https://github.com/fniessen/refcard-org-mode

More:
https://orgmode.org/manual/Markup-for-Rich-Contents.html#Markup-for-Rich-Contents

And tables!

/*strong and italic*/ should also work.

Implement sub- and superscripts: https://orgmode.org/manual/Subscripts-and-Superscripts.html
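A possible starting point for sub- and superscripts, sketched in Python under a simplifying assumption: only the braced forms x_{...} and x^{...} are recognized (roughly the behavior Org gives with the export option ^:{}), because the unbraced forms have more involved rules. The pattern and the find_scripts helper are hypothetical.

```python
import re

# Braced form only: a non-space base character, then _ or ^, then {...}.
# Nested braces are not supported in this sketch.
SUBSUP_RE = re.compile(r"(?<=\S)([_^])\{([^{}]*)\}")


def find_scripts(text):
    """Return (kind, content) pairs for braced sub- and superscripts."""
    kinds = {"_": "subscript", "^": "superscript"}
    return [(kinds[m.group(1)], m.group(2)) for m in SUBSUP_RE.finditer(text)]
```

With this, find_scripts("x_{i}^{2} and a_b") finds the subscript "i" and the superscript "2" and ignores the unbraced a_b.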

munen (Contributor) commented Apr 29, 2020

Just a comment at a random time: you’re on 🔥, there’s already loads of amazing stuff in this PR!

Sorry for the non-helpful comment, but I had to say it out loud ;)

schoettl (Collaborator, Author) commented Apr 29, 2020

Problem with parsing styled text:

I first have to parse the preceding non-styled normal text. But I can't just stop parsing normal text before any style delimiter, because I have to make sure the delimiter is preceded by a space (e.g. in "the *fox*" there is a space before the first *).

If I stop parsing only before delimiters that are preceded by a space (using look-ahead), I cannot parse links, footnotes, or <http://example.com> when they are not preceded by a space :(

Combining the styled-text delimiters with the link/footnote delimiters does not work, e.g.

1) [^<]*|[^/]*
2) ([^<]|[^/])*

(possibly with look-aheads)

They cannot work because 1) eats up styled text and 2) matches everything!
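Both failure modes are easy to reproduce with an ordinary regex engine. This Python snippet only illustrates the reasoning in the comment; it is not code from org-parser.

```python
import re

line = "see /this/ and <http://example.com>"

# 1) The first alternative stops only at "<", so it swallows "/this/".
m1 = re.match(r"[^<]*|[^/]*", line)
print(repr(m1.group()))  # 'see /this/ and '

# 2) Every character is either "not <" or "not /", so this matches anything.
m2 = re.fullmatch(r"([^<]|[^/])*", line)
print(m2 is not None)  # True
```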

I'm close to giving up on styled text. Edit: Not before I've asked some experts...

org.el works differently: it has regexes (org-emph-re and org-verbatim-re) that match the styled text together with the character before and after it. The BNF parser cannot do this: it can look ahead but not behind, i.e. it cannot check the character before the styled text.

ox-latex.el: I don't know yet how the Org-to-LaTeX exporter works; I couldn't find the place where it parses styled text. Maybe it's in org-element.el, e.g. org-element-bold-parser. Anyway, it uses the org-emph-re regex, which I cannot use in the BNF parser.
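For comparison, a regex engine with look-behind can express the "preceded by" condition without consuming the character, which is exactly what the grammar lacks. The Python pattern below is an adaptation of org-emph-re written for this sketch, not taken from org.el: the negative look-behind (?<![^...]) succeeds at the start of the line or after one of the allowed pre-characters, so the emphasis match starts exactly at the marker and everything before it is text-normal.

```python
import re

# (?<![^- ('"{]) = "not preceded by a character outside the pre set",
# i.e. at line start or after one of: - space ( ' " {
EMPH_AT_RE = re.compile(
    r"(?<![^- ('\"{])([*/_+])([^ ]|[^ ].*?[^ ])\1(?=[- .,:!?;'\")}\[]|$)"
)

line = "the *fox* and a*b*c"
spans = [m.span() for m in EMPH_AT_RE.finditer(line)]
print(spans)  # [(4, 9)]: a*b*c is rejected because of the "a" before "*"
print(repr(line[:spans[0][0]]))  # 'the ' is the text-normal part
```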

schoettl (Collaborator, Author) commented May 1, 2020

  • wait for feedback on the parser problem above
  • compare regular links with the spec
  • implement targets and radio targets
  • implement {{{macros()}}}
  • implement \entities (\LaTeX stuff is out of scope of this PR)
  • line breaks \\ (see also #11 "Try parsing semantic elements instead of line-based" regarding paragraphs)
  • diary timestamps
  • what else?[1]

[1] https://orgmode.org/manual/Markup-for-Rich-Contents.html#Markup-for-Rich-Contents
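A rough Python sketch of patterns for some of the items above (radio targets, targets, macros, entities, line breaks). The patterns are simplified assumptions loosely based on the Org syntax description; the tokens helper and its (kind, value) output are hypothetical, entity names are not validated against org-entities, and escaped commas in macro arguments are not handled.

```python
import re

# Alternatives are tried in order, so <<<radio>>> wins over <<target>>
# and a trailing \\ (line break) wins over a \name entity.
TOKEN_RE = re.compile(
    r"(?P<radio><<<(?P<radio_text>[^<>\s](?:[^<>\n]*[^<>\s])?)>>>)"
    r"|(?P<target><<(?P<target_text>[^<>\s](?:[^<>\n]*[^<>\s])?)>>)"
    r"|(?P<macro>\{\{\{(?P<macro_name>[A-Za-z][-\w]*)(?:\((?P<macro_args>.*?)\))?\}\}\})"
    r"|(?P<linebreak>\\\\[ \t]*$)"
    r"|(?P<entity>\\(?P<entity_name>[A-Za-z]+)(?:\{\})?)"
)


def tokens(line):
    """Return (kind, value) pairs for the constructs found in one line."""
    found = []
    for m in TOKEN_RE.finditer(line):
        if m.group("radio"):
            found.append(("radio-target", m.group("radio_text")))
        elif m.group("target"):
            found.append(("target", m.group("target_text")))
        elif m.group("macro"):
            args = m.group("macro_args")
            # Naive split on commas; escaped commas (\,) are not handled.
            arg_list = [a.strip() for a in args.split(",")] if args else []
            found.append(("macro", (m.group("macro_name"), arg_list)))
        elif m.group("linebreak"):
            found.append(("line-break", None))
        else:
            found.append(("entity", m.group("entity_name")))
    return found
```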

schoettl (Collaborator, Author) commented May 3, 2020

Hi @munen,

I think this PR is ready for review and merge.

There are a lot of good and stable things in it.

The remaining problems are mostly related to the ungrateful and recursive nature of the emphasis markup [/*_=~+]. I'll open an issue for that.

Anyway, it should be a solid base for improvements.

I'll rebase #11 after the merge.

schoettl changed the title from "[WIP] Parse content text that can contain style, links, footnotes, timestamps, ..." to "Parse content text that can contain style, links, footnotes, timestamps, ..." on May 3, 2020

munen (Contributor) commented May 5, 2020

Amazing stuff, @schoettl, as always!

Will happily merge! 🙏 🙇

munen merged commit 2cae43c into 200ok-ch:master on May 5, 2020