AST for the content #159

Andrew-Morozko · 2024-04-19T11:18:20Z

Right now each content block produces markdown strings, which requires a lot of string manipulation. For example, the title block replaces newlines in its value, otherwise, the title would be broken:

# Tiltle with
newline is half-broken

We should use an AST structure closely resembling markdown and use it instead:

func Title(size int, text string) ContentNode
func Text(text string) ContentNode
func List(kind ListKind, items [] ContentNode) ContentNode
// And so on

And use it to avoid repeated validation:

List(Unordered, []ContentNode{Text("list item, but it can contain\n * and not be split into two")})

This also decouples us from markdown, making it possible for data sinks to render a document in a custom format (html, latex, or, of course, markdown ...) from our ContentNode structure. Also, this makes the content more easily inspectable and traversable for meta-content providers such as toc.

I considered using goldmark's AST instead of creating our own, but their AST types are assumed to be valid markdown (since they usually come out of their markdown parser), while ours need to be validated/adjusted before we can pass them as a valid markdown.

Since our AST would be based on markdown it would be straightforward to implement ContentNode -> goldmark/ast.Node converter, after which we can use goldmark-markdown to render goldmark AST back into markdown, built-in renderer to produce HTML or goldmark-latex to produce LaTeX code.

The text was updated successfully, but these errors were encountered:

traut · 2024-04-22T08:58:42Z

+1 on using goldmark's AST, seems like a nice data model (better than plain Markdown strings) we can build upon

dobarx · 2024-04-22T09:44:02Z

@traut
-1 on using goldmark's AST. Like @Andrew-Morozko mentioned their AST is strictly based on markdown. If we used their AST directly it's even more complicated than using markdown strings all together and would be harder to extend with nodes that are specific for other outputs, like html or pdf.

traut · 2024-04-22T09:47:55Z

Hmm, okay. Do you have any suggestions on what data model to use here for the content trees? I would love to use an existing one instead of building one, and if it comes from the lib we'll be using -- it's a bonus.

Markdown strings on their own are quite versatile, being a serialized version of a data model -- whatever the plugin returns, we can parse and adjust.

dobarx · 2024-04-22T10:51:06Z

Yes, markdown strings are versatile but it creates a problem that we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

Using existing library AST for this might be also tricky since we would need to encode/decode this tree using protobufs and it would be pretty hard to extend it since the focus for that tree is just one output - Markdown. But that doesn't work well since we also have a concept of section which doesn't render into a specific markdown element but can effect markdown content inside.

If we want to extend it then it's best just to make a copy of their AST tree nodes that we could convert to goldmark's AST when rendering markdown. We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

traut · 2024-04-22T11:04:58Z

we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

I like that this is a plugin's problem — Fabric expects a valid Markdown string, so the plugin can do whatever it wants to produce it. It does limit us, though, and is a ticking trouble :) It's too flexible, so the validation of the content is difficult (what if the plugin returns a huge, complex Markdown that will mess up the whole document?) but I'm not too worried about that just yet.

We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

I agree, I think you are right. Our model can be more straightforward and should not be tied to Markdown explicitly -- we're going to support other serializations anyway. On that note, we can use our template data model as an inspiration for our content data model, adopting a concept of block with a content type. Even if it's just to wrap a markdown strings into nodes with metadata - it will still be very useful, I think

Andrew-Morozko self-assigned this Apr 19, 2024

traut added this to the v0.6 milestone Apr 22, 2024

traut added the enhancement New feature or request label Apr 22, 2024

traut modified the milestones: v0.6, v0.5 May 12, 2024

This was referenced May 12, 2024

Support content selection using meta tags #99

Open

Use unique anchors in ToC links #164

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST for the content #159

AST for the content #159

Andrew-Morozko commented Apr 19, 2024

traut commented Apr 22, 2024

dobarx commented Apr 22, 2024 •

edited

traut commented Apr 22, 2024 •

edited

dobarx commented Apr 22, 2024 •

edited

traut commented Apr 22, 2024

AST for the content #159

AST for the content #159

Comments

Andrew-Morozko commented Apr 19, 2024

traut commented Apr 22, 2024

dobarx commented Apr 22, 2024 • edited

traut commented Apr 22, 2024 • edited

dobarx commented Apr 22, 2024 • edited

traut commented Apr 22, 2024

dobarx commented Apr 22, 2024 •

edited

traut commented Apr 22, 2024 •

edited

dobarx commented Apr 22, 2024 •

edited