Parse nested links #5

tangjeff0 · 2020-04-27T15:01:10Z

Instaparse is the parsing library Athens and Roam use.

The code for Athens's parser can be found in src/cljs/athens/parser.cljs

I watched these videos to learn about EBNF Syntax, the parsing syntax used by Instaparse.

This blog shows an example of instaparse that might be helpful for the link parsing.

There are many useful test cases for parsing at http://localhost:3000/#/page/fxi42G2zA, which is the page [[Nested [[Links]]]].

Nested links


tangjeff0 · 2020-05-01T18:22:47Z

Also see this commit for example hiccup that has a TODO checkbox, a link, and a nested link.

tangjeff0 · 2020-05-04T15:08:38Z

How much can be reused from existing parsing Instaparse libraries? Especially for formatting.

jeroenvandijk · 2020-05-04T15:18:21Z

> Also see this commit for example hiccup that has a TODO checkbox, a link, and a nested link.

@tangjeff0 I would be interested to see the datoms belonging to this markdown / example hiccup. It would help me to get the bigger picture. Do you have this mapping somewhere?

tangjeff0 · 2020-05-04T15:29:58Z

@jeroenvandijk The markdown/hiccup comes directly from pure text, so the directly relevant datom would be

{:block/string "{{[[TODO]]}} work on [[Deep Transclusions]] and [[Nested [[Links]]]]"}

If you are running Athens locally, there are many more examples on the "Nested [[Links]]" page: http://localhost:3000/#/page/fxi42G2zA.

That said, it's a great question of how we can see which datoms are written/read directly during development. re-frame-10x doesn't play super well with re-posh.

anshbansal · 2020-05-04T15:47:11Z

Need some help in case anybody knows this

I am trying to replace the anchor tag with a span that has an on-click handler.

So from this:

[:a {:href (rfee/href :page {:id bid})} title]

to this:

[:span {:on-click (fn [e]
                    (rfee/href :page {:id bid}))}
 title]

rfee is reitit.frontend.easy

The reason I am trying to do this is because I want to handle nested links in the text for which I wanted to make changes

The anchor tag works but the on-click is not working

I am checking the event handlers which are registered in the browser on the span and I don't see any handler being registered

I was looking at https://purelyfunctional.tv/guide/re-frame-building-blocks/#introduction so the syntax seems correct

jeroenvandijk · 2020-05-04T16:13:59Z

Not sure, looks correct. What if you try [:span {:on-click (fn [e] (js/alert "works") )} "click me"]? that doesn't work?

anshbansal · 2020-05-04T16:27:09Z

> Not sure, looks correct. What if you try [:span {:on-click (fn [e] (js/alert "works") )} "click me"]? that doesn't work?

that worked. Will check the docs. Maybe I am not calling the function properly. I'll go back and spend some time learning the framework first.

anshbansal · 2020-05-04T16:43:10Z

figured it out. had to use push-state. Seems the href is for actual anchor tags. The docs are not super clear on it. Made metosin/reitit#393 to understand this for future reference
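A minimal sketch of that fix, assuming the router has already been started with reitit.frontend.easy/start! and has a :page route that takes an :id path parameter, as in the snippets above. href only builds and returns the URL string, so calling it inside a click handler navigates nowhere, while push-state performs the navigation. The component name page-link is made up for illustration.

```clojure
(ns example.nav
  (:require [reitit.frontend.easy :as rfee]))

(defn page-link
  "Hypothetical Reagent component: a span that navigates on click.
  Assumes a :page route with an :id path parameter exists."
  [bid title]
  [:span {:on-click (fn [_e]
                      ;; push-state changes the browser location;
                      ;; href would only return the URL as a string.
                      (rfee/push-state :page {:id bid}))}
   title])
```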

teodorlu · 2020-05-08T23:14:18Z

I see that Deep Transclusions has been checked off in the original post. Does that mean deep transclusions are working? Do we have an example of that?

I've been playing with the parser in the REPL, and I can't seem to get it to parse any examples I can come up with. I'm planning to look into this over the weekend.

teodorlu · 2020-05-08T23:17:05Z

Looks like the parser code isn't used in any outside modules as of now.

tangjeff0 · 2020-05-08T23:27:01Z

Deep transclusions at http://localhost:3000/#/page/UxbY48ffJ.

Lots of generic test cases at http://localhost:3000/#/page/fxi42G2zA

roryokane · 2020-05-27T23:31:53Z

I’ve started looking into this. (Edit: written before this issue was closed).

Both parsing and not parsing the link content

One of the complexities of this feature is that the syntax for the nested links is included in the parent link. For example, with [[Nested [[Links]]]], the outermost link is to a page named Nested [[Links]]. And according to my tests, Roam treats this as a different page from one titled Nested Links. It’s the same for other nested formatting like [[Nested **bold**]]: formatting is significant to the link.

This makes parsing nested links challenging because you need to both parse and not parse the interior of links. You need to turn [[Links]] into a data structure representing a link to Links, but you also need to know that that part of the text should contribute [[Links]], with square brackets, to the parent link’s target page title.

Transforming to an intermediate AST

As a first step, I think the idea I was considering, an intermediate AST before transforming into Hiccup, would make this feature easier to write and read. After parsing this markup with Instaparse:

[[Nested [[Links]]]]

The first stage of the AST can focus on how to turn this:

[:link "Nested " [:link "Links"]]

into this:

[:link {:destination "Nested [[Links]]"} "Nested " [:link "Links"]]

(Using Hiccup-like structure for the AST because it seems natural, and it’s also powerful enough to represent the CommonMark AST visible in CommonMark’s demo.)

And the second stage can turn that into [:a {:href …} …] or whatever DOM structure fits best in the web app without concerning itself with extracting the link destination.
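As a rough sketch of what that second stage might look like, assuming the first stage has already attached :destination: the function below walks the AST and emits nested spans. The name ast->hiccup, the "page-link" class, and the :data-destination attribute are all made up for illustration; the real DOM structure could differ.

```clojure
(defn ast->hiccup
  "Stage 2 sketch: turn :link nodes into spans, leave strings alone.
  Reads :destination from the attribute map if the node has one."
  [node]
  (cond
    (string? node) node

    (= :link (first node))
    (let [[_ & more] node
          attrs      (when (map? (first more)) (first more))
          children   (if attrs (rest more) more)]
      (into [:span {:class "page-link"
                    :data-destination (:destination attrs)}]
            (map ast->hiccup children)))

    ;; any other node type: keep the tag, recurse into the children
    :else (into [(first node)] (map ast->hiccup (rest node)))))

(ast->hiccup [:link {:destination "Nested [[Links]]"}
              "Nested "
              [:link {:destination "Links"} "Links"]])
```

Because this stage only reads :destination, it does not need to know how the destination was computed.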

Ways to do the transformation

I see two possible ways to perform this transformation: remembering the source text or reconstructing the source text.

Remembering the source text

As I found with the following tests (I don’t have the output right now, sorry), Instaparse includes information about the source of parsed blocks in its parse tree:

(binding [*print-meta* true]
  (prn (block-parser "[[link]]")))
(prn (meta (block-parser "[[link]]")))

The metadata of each parsed node from Instaparse contains the start and end index of that node within the original string (under the namespaced keys :instaparse.gll/start-index and :instaparse.gll/end-index, also returned as a pair by insta/span). That could be used with subs to extract the contents of the link from the source text.
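A small sketch of the "remembering" approach, under some assumptions: the metadata keys are the namespaced :instaparse.gll/start-index and :instaparse.gll/end-index that Instaparse attaches, a link node's span covers its surrounding [[ and ]], and the nodes here are built by hand with with-meta as a stand-in for real parser output so the snippet runs without the Athens grammar. The helper names node-span and link-destination are hypothetical.

```clojure
(defn node-span
  "Return [start end] for a parsed node, read from its metadata."
  [node]
  (let [m (meta node)]
    [(:instaparse.gll/start-index m) (:instaparse.gll/end-index m)]))

(defn link-destination
  "Slice the link's inner text out of the original source string.
  Assumes the node's span includes the [[ and ]] delimiters."
  [source link-node]
  (let [[start end] (node-span link-node)]
    (subs source (+ start 2) (- end 2))))

(def source "see [[Nested [[Links]]]]")

;; Hand-built stand-ins for what the parser would return.
(def inner-link
  (with-meta [:link "Links"]
    {:instaparse.gll/start-index 13 :instaparse.gll/end-index 22}))

(def outer-link
  (with-meta [:link "Nested " inner-link]
    {:instaparse.gll/start-index 4 :instaparse.gll/end-index 24}))

(link-destination source outer-link) ;; => "Nested [[Links]]"
(link-destination source inner-link) ;; => "Links"
```

The trade-off is that the original string has to be carried alongside the tree for as long as destinations may need to be computed.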

Reconstructing the source text

We could reconstruct the source text by printing the AST back to a string. So when a link is transformed, it would look at the AST inside it, print it (with recursive rules for every type of AST node), then set that text as the destination of the link.
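A sketch of the "reconstructing" approach, assuming a tiny AST with only strings, :link, and :bold nodes. The function names and the set of node types are illustrative, not the actual Athens grammar. Unlike the target shape shown earlier, this version also attaches a destination to the inner link.

```clojure
(ns example.reconstruct
  (:require [clojure.string :as str]))

(declare node->source)

(defn children->source
  "Print a sequence of child nodes back to source text."
  [children]
  (str/join (map node->source children)))

(defn node->source
  "Literal printer with one rule per node type."
  [node]
  (if (string? node)
    node
    (let [[tag & more] node
          ;; skip an attribute map if one is already present
          children (if (map? (first more)) (rest more) more)]
      (case tag
        :link (str "[[" (children->source children) "]]")
        :bold (str "**" (children->source children) "**")
        (throw (ex-info "Unknown node type" {:tag tag}))))))

(defn add-destinations
  "Stage 1 sketch: give every :link a :destination built by
  printing its children. Expects nodes without attribute maps."
  [node]
  (if (string? node)
    node
    (let [[tag & children] node]
      (if (= tag :link)
        (into [:link {:destination (children->source children)}]
              (map add-destinations children))
        (into [tag] (map add-destinations children))))))

(add-destinations [:link "Nested " [:link "Links"]])
;; => [:link {:destination "Nested [[Links]]"}
;;     "Nested " [:link {:destination "Links"} "Links"]]
```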

This solution feels more elegant in that you don’t have to keep a reference to the original string. It’s also possible that we’ll need tree-printing capability eventually anyway. But this solution would also add complexity in some other ways.

One source of complexity with this solution is the challenge of matching the original syntax exactly. If we define links as pointing to the page with those exact characters between [[ ]], then this printing must be literal printing, not pretty-printing, or links might break due to syntax differences that don’t affect the visual formatting. (I’m thinking how in Markdown, both _foo_ and *foo* are italic. I’m not sure if Roam syntax has any such differences right now, but I bet Athens will have differences like those at some point.)

Alternatively, pretty-printing would be an easy solution if we define all links that parse to the same AST as pointing to the same page. That would require making the link-rendering code normalize all links by pretty-printing them, which is also not that hard. But that might add complexity to the user’s mental model of links. Would it make users confused or happy if they wrote two links with different syntax but the same visual formatting, and found that they linked to the same page?

I’m still deciding which approach to the transformation would be better: remembering the source text or reconstructing the source text. There’s also another difficulty in implementing this feature I’ve noticed: you can’t nest <a> tags in HTML. I might write more about possible solutions for that later.

tangjeff0 · 2020-05-28T15:52:30Z

Re-opened this and other parser issues, @roryokane. Just wanted to focus on the high-level feature issues yesterday while setting up the new project board.

Your explanation for why parsing nested links is spot on: we have to parse and not parse at the same time.

I'm not sure whether reconstructing or remembering the source text is the better option. Hope you can give us insight there.

And, yes, you cannot use <a> tags. Roam entirely uses span for their links. See this output hiccup: https://github.com/athensresearch/athens/blob/master/data/nested-link.hiccup.


tangjeff0 added the enhancement label Apr 27, 2020

tangjeff0 changed the title ~~Nested links and deep transclusions with Instaparser~~ Parse nested links and deep transclusions with Instaparser Apr 27, 2020

tangjeff0 added the help wanted label Apr 28, 2020

tangjeff0 mentioned this issue May 4, 2020

Setup unit/generative testing #15

Closed

anshbansal mentioned this issue May 4, 2020

[WIP] athens-5 changes to parser and some tests #16

Closed

tangjeff0 assigned anshbansal May 4, 2020

tangjeff0 mentioned this issue May 6, 2020

Parse and render md images #25

Closed

anshbansal removed their assignment May 8, 2020

tangjeff0 added instaparse and removed feat labels May 15, 2020

tangjeff0 changed the title ~~Parse nested links and deep transclusions with Instaparser~~ Parse nested links May 15, 2020

tangjeff0 mentioned this issue May 27, 2020

Parser #94

Closed

tangjeff0 removed ⭐️⭐️ medium labels May 27, 2020

tangjeff0 closed this as completed May 27, 2020

tangjeff0 reopened this May 28, 2020

shanberg referenced this issue in shanberg/athens Jun 20, 2020

Merge pull request #5 from shanberg/bullet-styles

aa26ab9

feat(blocks): faster styles for dragging bullet

This was referenced Jul 13, 2020

fix(parser, blocks): fixed incorrect parser outputs and added non-existent pages auto-create #252

Merged

feat(blocks): nested link support #254

Closed

tangjeff0 closed this as completed in #252 Jul 14, 2020

juniusfree referenced this issue in juniusfree/athens Jun 29, 2021

Merge pull request #5 from sawhney17/patch-1

894927b

Fix typo in readme
