Parse nested links #5

Closed
tangjeff0 opened this issue Apr 27, 2020 · 13 comments · Fixed by #252

Comments

@tangjeff0
Collaborator

tangjeff0 commented Apr 27, 2020

Instaparse is the parsing library Athens and Roam use.

The code for Athens's parser can be found in src/cljs/athens/parser.cljs

I watched these videos to learn about EBNF syntax, the grammar notation used by Instaparse.

This blog post shows an example of Instaparse that might be helpful for link parsing.

There are many useful test cases for parsing at http://localhost:3000/#/page/fxi42G2zA, which is the page [[Nested [[Links]]]].

Nested links


tangjeff0 changed the title from "Nested links and deep transclusions with Instaparser" to "Parse nested links and deep transclusions with Instaparser" on Apr 27, 2020
@tangjeff0
Collaborator Author

tangjeff0 commented May 1, 2020

Also see this commit for example hiccup that has a TODO checkbox, a link, and a nested link.

@tangjeff0
Collaborator Author

How much can be reused from existing Instaparse-based parsing libraries, especially for formatting?

@jeroenvandijk
Contributor

Also see this commit for example hiccup that has a TODO checkbox, a link, and a nested link.

@tangjeff0 I would be interested to see the datoms belonging to this markdown / example hiccup. It would help me to get the bigger picture. Do you have this mapping somewhere?

@tangjeff0
Collaborator Author

tangjeff0 commented May 4, 2020

@jeroenvandijk The markdown/hiccup comes directly from plain text, so the only directly relevant datom would be:

{:block/string "{{[[TODO]]}} work on [[Deep Transclusions]] and [[Nested [[Links]]]]"}
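To make concrete what a nested-link parser has to recover from that string, here is a minimal pure-Clojure sketch (not Athens's parser; `page-refs` is a hypothetical helper) that collects every page reference using a stack of opening-bracket positions. Outer titles keep the brackets of the links nested inside them, which is the part that makes nesting tricky.

```clojure
(defn page-refs
  "Returns the title of every [[...]] reference in `s`, nested ones
  included, in the order their closing brackets appear. Outer titles
  keep the brackets of the links nested inside them."
  [s]
  (let [n (count s)
        pair-at? (fn [i ch] (and (< (inc i) n)
                                 (= ch (nth s i))
                                 (= ch (nth s (inc i)))))]
    (loop [i 0, opens [], titles []]
      (cond
        ;; End of input; any unmatched [[ is simply ignored.
        (>= i n) titles
        ;; Opening [[: push the index where this title starts.
        (pair-at? i \[) (recur (+ i 2) (conj opens (+ i 2)) titles)
        ;; Closing ]] with an open link: pop it and record its raw title.
        (and (pair-at? i \]) (seq opens))
        (recur (+ i 2) (pop opens) (conj titles (subs s (peek opens) i)))
        :else (recur (inc i) opens titles)))))

(page-refs "{{[[TODO]]}} work on [[Deep Transclusions]] and [[Nested [[Links]]]]")
;; => ["TODO" "Deep Transclusions" "Links" "Nested [[Links]]"]
```

A stack is enough to pair brackets, but it only yields flat strings; building the nested tree is still the grammar's job.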

If you are running Athens locally, there are many more examples on the "Nested [[Links]]" page: http://localhost:3000/#/page/fxi42G2zA.

That said, it's a great question how we can see which datoms are written and read during development; re-frame-10x doesn't play super well with re-posh.

@anshbansal

I need some help, in case anybody knows this.

I am trying to replace clicking an anchor tag with an on-click handler on a span. That is, change this:

[:a {:href (rfee/href :page {:id bid})} title]

to this:

[:span {:on-click (fn [e]
                    (rfee/href :page {:id bid}))}
 title]

rfee is reitit.frontend.easy

I am trying to do this because I want to handle nested links in the text, which is why I wanted to make this change.

The anchor tag works, but the on-click does not.

When I check the event handlers registered on the span in the browser, I don't see any handler.

I was following https://purelyfunctional.tv/guide/re-frame-building-blocks/#introduction, so the syntax seems correct.

@jeroenvandijk
Contributor

Not sure; it looks correct. What if you try [:span {:on-click (fn [e] (js/alert "works"))} "click me"]? Does that work?

@anshbansal

That worked. I'll check the docs; maybe I am not calling the function properly. I'll go back and spend some time learning the framework first.

@anshbansal

anshbansal commented May 4, 2020

Figured it out: I had to use push-state. It seems href is meant for actual anchor tags, since it only builds the URL string. The docs are not super clear on this. I opened metosin/reitit#393 to clarify this for future reference.
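For reference, the change amounts to swapping `rfee/href`, which only returns a URL string (so calling it inside a handler builds the string and discards it), for `rfee/push-state`, which actually navigates. A minimal sketch: `page-link` is a hypothetical component name, and the navigation function is passed in so the snippet runs without a browser; in the app it would be `rfee/push-state`.

```clojure
(defn page-link
  "Hiccup for a clickable page title. `navigate!` is expected to be
  reitit.frontend.easy/push-state in the app; it is a parameter here so
  the sketch can run (and be tested) without a router or browser."
  [navigate! bid title]
  [:span {:on-click (fn [_e]
                      ;; push-state changes the route; href would only
                      ;; build a URL string and throw it away.
                      (navigate! :page {:id bid}))}
   title])

;; In the app: (page-link rfee/push-state bid title)
```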

anshbansal removed their assignment on May 8, 2020
@teodorlu

teodorlu commented May 8, 2020

I see that Deep Transclusions has been checked off in the original post. Does that mean deep transclusions are working? Do we have an example of that?

I've been playing with the parser in the REPL, and I can't seem to get it to parse any examples I can come up with. I'm planning to look into this over the weekend.

@teodorlu

teodorlu commented May 8, 2020

Looks like the parser code isn't used in any outside modules as of now.

@tangjeff0
Collaborator Author

Deep transclusions at http://localhost:3000/#/page/UxbY48ffJ.

Lots of generic test cases at http://localhost:3000/#/page/fxi42G2zA

tangjeff0 added the instaparse label and removed the feat label on May 15, 2020
tangjeff0 changed the title from "Parse nested links and deep transclusions with Instaparser" to "Parse nested links" on May 15, 2020
tangjeff0 mentioned this issue on May 27, 2020
@roryokane
Contributor

roryokane commented May 27, 2020

I’ve started looking into this. (Edit: written before this issue was closed).

Both parsing and not parsing the link content

One of the complexities of this feature is that the syntax for the nested links is included in the parent link. For example, with [[Nested [[Links]]]], the outermost link is to a page named Nested [[Links]]. And according to my tests, Roam treats this as a different page from one titled Nested Links. It’s the same for other nested formatting like [[Nested **bold**]]: formatting is significant to the link.

This makes parsing nested links challenging because you need to both parse and not parse the interior of links. You need to turn [[Links]] into a data structure representing a link to Links, but you also need to know that that part of the text should contribute [[Links]], with square brackets, to the parent link’s target page title.
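To make the "parse" half concrete, here is a hand-written recursive-descent parser that handles only text and [[links]]. It is a stand-in for an Instaparse grammar, not Athens's actual parser, and all names in it are hypothetical. Its output has the nested [:link ...] shape, and it shows the problem: the literal `[[Links]]` that the parent's title needs is no longer anywhere in the tree.

```clojure
(declare parse-inlines)

(defn- at?
  "True if `token` occurs in `s` starting at index `i`."
  [s i token]
  (= token (subs s i (min (count s) (+ i (count token))))))

(defn- parse-link
  "Parses a link whose [[ starts at `i`. Returns [node next-index],
  or nil if the link is never closed."
  [s i]
  (let [[children j] (parse-inlines s (+ i 2) true)]
    (when (at? s j "]]")
      [(into [:link] children) (+ j 2)])))

(defn- parse-inlines
  "Parses text and links from `i` until end of input, or until ]] when
  `in-link?`. Returns [children stop-index]."
  [s i in-link?]
  (loop [i i, text "", children []]
    (let [with-text (fn [cs] (if (= "" text) cs (conj cs text)))]
      (cond
        (>= i (count s)) [(with-text children) i]
        (and in-link? (at? s i "]]")) [(with-text children) i]
        (at? s i "[[")
        (if-let [[node j] (parse-link s i)]
          (recur j "" (conj (with-text children) node))
          ;; Unclosed [[ : keep it as literal text.
          (recur (+ i 2) (str text "[[") children))
        :else (recur (inc i) (str text (nth s i)) children)))))

(defn parse-block
  "Returns the top-level nodes of a block string."
  [s]
  (first (parse-inlines s 0 false)))

(parse-block "[[Nested [[Links]]]]")
;; => [[:link "Nested " [:link "Links"]]]
```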

Transforming to an intermediate AST

As a first step, I think the idea I was considering, an intermediate AST before transforming into Hiccup, would make this feature easier to write and read. After parsing this markup with Instaparse:

[[Nested [[Links]]]]

The first stage of the transformation can focus on turning this:

[:link "Nested " [:link "Links"]]

into this:

[:link {:destination "Nested [[Links]]"} "Nested " [:link "Links"]]

(Using Hiccup-like structure for the AST because it seems natural, and it’s also powerful enough to represent the CommonMark AST visible in CommonMark’s demo.)

And the second stage can turn that into [:a {:href …} …] or whatever DOM structure fits best in the web app without concerning itself with extracting the link destination.

Ways to do the transformation

I see two possible ways to perform this transformation: remembering the source text or reconstructing the source text.

Remembering the source text

As I found with the following tests (I don’t have the output right now, sorry), Instaparse includes information about the source of parsed blocks in its parse tree:

(binding [*print-meta* true]
  (prn (block-parser "[[link]]")))
(prn (meta (block-parser "[[link]]")))

The metadata of each node Instaparse parses contains the start and end indices of that node within the original string (under the keys :instaparse.gll/start-index and :instaparse.gll/end-index; instaparse.core/span returns them as a [start end] pair). That could be used with subs to extract the contents of the link from the source text.
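A sketch of the "remembering" approach. It assumes Instaparse's span metadata keys (:instaparse.gll/start-index and :instaparse.gll/end-index, the ones instaparse.core/span reads) and assumes the link rule's span covers its own [[ and ]], since hidden tokens are still consumed by the rule. To keep it runnable without Instaparse, the tree is built by hand with that metadata attached; `link-destination`, `annotate-links-from-source`, and `with-span` are hypothetical names.

```clojure
(defn link-destination
  "Text between a link node's [[ and ]], read back from the source using
  the node's span metadata, so nested links keep their brackets."
  [text node]
  (let [{start :instaparse.gll/start-index
         end   :instaparse.gll/end-index} (meta node)]
    (subs text (+ start 2) (- end 2))))

(defn annotate-links-from-source
  "Adds {:destination ...} to every :link node. Reads metadata from the
  original children, because rebuilt vectors do not carry it over."
  [text node]
  (cond
    (and (vector? node) (= :link (first node)))
    (into [:link {:destination (link-destination text node)}]
          (map #(annotate-links-from-source text %) (rest node)))
    (vector? node)
    (into [(first node)] (map #(annotate-links-from-source text %) (rest node)))
    :else node))

;; Hand-built stand-in for what the parser would return for this block.
(def block-text "[[Nested [[Links]]]]")

(defn- with-span [node start end]
  (with-meta node {:instaparse.gll/start-index start
                   :instaparse.gll/end-index   end}))

(def tree
  (with-span [:link "Nested " (with-span [:link "Links"] 9 18)] 0 20))

(annotate-links-from-source block-text tree)
;; => [:link {:destination "Nested [[Links]]"}
;;     "Nested "
;;     [:link {:destination "Links"} "Links"]]
```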

Reconstructing the source text

We could reconstruct the source text by printing the AST back to a string. So when a link is transformed, it would look at the AST inside it, print it (with recursive rules for every type of AST node), then set that text as the destination of the link.
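A sketch of the "reconstructing" approach: a printer with one rule per node type, whose output becomes the link's destination. The :bold node type and its ** syntax are assumptions for illustration, and `unparse` and `annotate-links-by-unparsing` are hypothetical names. Every node type the grammar can produce would need its own rule, and for links to stay stable each rule has to print the node exactly as it was written.

```clojure
(defn unparse
  "Prints an AST node back to markup, with one rule per node type."
  [node]
  (if (string? node)
    node
    (let [inner (apply str (map unparse (rest node)))]
      (case (first node)
        :link (str "[[" inner "]]")
        :bold (str "**" inner "**")
        (throw (ex-info "No unparse rule for node type" {:node node}))))))

(defn annotate-links-by-unparsing
  "Adds {:destination ...} to every :link node by printing its children."
  [node]
  (if (string? node)
    node
    (let [children (map annotate-links-by-unparsing (rest node))]
      (if (= :link (first node))
        ;; Print the original children: the printer has no rule for the
        ;; attribute maps that annotation adds.
        (into [:link {:destination (apply str (map unparse (rest node)))}] children)
        (into [(first node)] children)))))

(annotate-links-by-unparsing [:link "Nested " [:link "Links"]])
;; => [:link {:destination "Nested [[Links]]"}
;;     "Nested "
;;     [:link {:destination "Links"} "Links"]]
```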

This solution feels more elegant in that you don’t have to keep a reference to the original string. It’s also possible that we’ll need tree-printing capability eventually anyway. But this solution would also add complexity in some other ways.

One source of complexity with this solution is the challenge of matching the original syntax exactly. If we define links as pointing to the page with those exact characters between [[ ]], then this printing must be literal printing, not pretty-printing, or links might break due to syntax differences that don’t affect the visual formatting. (I’m thinking how in Markdown, both _foo_ and *foo* are italic. I’m not sure if Roam syntax has any such differences right now, but I bet Athens will have differences like those at some point.)

Alternatively, pretty-printing would be an easy solution if we define all links that parse to the same AST as pointing to the same page. That would require making the link-rendering code normalize all links by pretty-printing them, which is also not that hard. But that might add complexity to the user’s mental model of links. Would it make users confused or happy if they wrote two links with different syntax but the same visual formatting, and found that they linked to the same page?


I’m still deciding which approach to the transformation would be better – remembering the source text or reconstructing the source text. There’s also another difficulty in implementing this feature I’ve noticed: you can’t nest a tags in HTML. I might write more about possible solutions for that later.

tangjeff0 reopened this on May 28, 2020
@tangjeff0
Collaborator Author

@roryokane, I re-opened this and the other parser issues. I just wanted to focus on the high-level feature issues yesterday while setting up the new project board.

Your explanation of why parsing nested links is hard is spot on: we have to parse and not parse at the same time.

I'm not sure whether reconstructing or remembering the source text is the better option. Hope you can give us insight there.

And yes, you cannot nest a tags. Roam uses span exclusively for its links. See this output hiccup: https://github.com/athensresearch/athens/blob/master/data/nested-link.hiccup.
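A sketch of a second stage that renders the annotated AST with nested spans instead of a tags. The class names and the data-destination attribute are hypothetical, not taken from Roam's or Athens's actual output; only the nesting structure is the point.

```clojure
(defn render
  "Turns an annotated AST node into Hiccup. Links become nested :span
  elements, since an <a> may not contain another <a>."
  [node]
  (cond
    (string? node) node

    (= :link (first node))
    (let [[_ {:keys [destination]} & children] node
          bracket (fn [s] [:span {:class "link-bracket"} s])]
      (into [:span {:class "page-link" :data-destination destination}
             (bracket "[[")]
            (concat (map render children) [(bracket "]]")])))

    :else (into [:span] (map render (rest node)))))

(render [:link {:destination "Nested [[Links]]"}
         "Nested "
         [:link {:destination "Links"} "Links"]])
;; => [:span {:class "page-link" :data-destination "Nested [[Links]]"}
;;     [:span {:class "link-bracket"} "[["]
;;     "Nested "
;;     [:span {:class "page-link" :data-destination "Links"}
;;      [:span {:class "link-bracket"} "[["]
;;      "Links"
;;      [:span {:class "link-bracket"} "]]"]]
;;     [:span {:class "link-bracket"} "]]"]]
```

If each page-link span gets its own click handler, the inner one should call .stopPropagation on the event; otherwise a click on [[Links]] bubbles up and also triggers the handler for Nested [[Links]].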

shanberg referenced this issue in shanberg/athens Jun 20, 2020
feat(blocks): faster styles for dragging bullet