Refactor interpreter into working over a separate HIR and AST #175

CosmicHorrorDev · 2023-11-23T23:55:35Z

Having been working with the interpreter more it's clear that the current design is pretty painful to work with in almost every way. It currently works by doing a pass over the html5ever tokenstream that stores a pretty minimal amount of state while computing the elements for output on the fly. In theory this is very efficient because we can stream our output and we retain a pretty minimal amount of past state, but it also translates to a soup of conditionals that are incredibly difficult to refactor without regressing other behavior.

This also expresses itself where most of the bugs for things lie in not properly keeping track of enough history. Take for instance this markdown snippet

1. First list item

    Some nested text

2. Second list item

This should render as follows:

First list item

Some nested text
Second list item

but the actual HTML for this is pretty complex

<ol>
    <li>
        <p>First list item</p>
        <p>Some nested text</p>
    </li>
    <li>
        <p>Second list item</p>
    </li>
</ol>

There's quite a lot involved in rendering this right: you've got to keep track of which list item you're in, the first element in the list item (<li>) gets the line with the list prefix while the rest get indented further on separate lines, etc. Unfortunately this shows as we do a pretty sloppy job of rendering most things that are nested even semi-complexly

I think the better way forward would be to first convert the raw HTML to an HIR (higher intermediate representation or HTML intermediate representation 😁) which is just a nice full structured format describing the things we care about in the HTML doc (basically a more complete version of Element). Then from there we can more easily do the analysis we need to correctly render things above into our output AST which represents how thing are laid out in inlyne's eyes

The text was updated successfully, but these errors were encountered:

kokoISnoTarget · 2024-03-24T14:36:17Z

So what things would we need to do? (Because this sounds like a major rewrite)
And what would the HIR contain?

CosmicHorrorDev · 2024-03-24T16:17:47Z

So for starters the AST would likely remain a VecDeque<crate::Element>. Each element in the HIR would contain all of the crate::interpreter::html::Attrs that we recognize along with the inner runs of text and other elements. For something like

<p>In a paragraph <a href="https://example.org">https://example.org</a></p>

would get represented as something like

use crate::interpreter::html::{attr::Align, Attr, TagName};

enum HirOrText {
    Hir(Hir),
    Text(String),
}

vec!Hir {
    tag: TagName::Paragraph,
    attrs: vec![],
    contents: vec![
        HirOrText::Text("In a paragraph ".into()),
        HirOrText::Hir(Hir {
            tag: TagName::Anchor,
            attrs: vec![Attr::Href("https://example.org".into())],
            contents: vec![HirOrText::Text("https'//example.org".into())]
        }),
        ...
    ],
}

That would be the main change as the existing code for the AST can be re-written to run over the HIR without having to change much of the logic there (although it opens the door to more freely change things in the future)

nicoburns · 2024-04-29T16:18:21Z

You may be interested in the architecture of https://github.com/DioxusLabs/blitz which parses HTML into a dom-like structure before doing style then layout then rendering passes. It's a more ambitious project as we aim to actually render arbitrary HTML (including CSS but excluding JavaScript), but it might make sense for inlyne too.

(once Blitz is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)

CosmicHorrorDev · 2024-04-29T18:56:40Z

@kokoISnoTarget is currently working on things with #318, so I'll leave the immediate direction of where things go up to them, but yes I am very interested in sharing more of the nitty gritty HTML bits

kokoISnoTarget · 2024-04-29T19:23:44Z

(once stylo-dioxus is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)

It isn't ready, so nothing I would decide on now. But sounds promising.

nicoburns · 2024-06-13T23:28:05Z

We've just landed inline / inline-block support in Blitz (https://github.com/DioxusLabs/blitz). It's still quite early / immature, and we don't have feature parity with inlyne (missing things like text selection, table layout, and hot reloading) but it's now in a state that inlyne devs could look into using / contributing to / building on it if you were interested. This is a screenshot of blitz rendering it's own README (the markdown example in the repository):

We have a full HTML style/layout engine so this example is using comrak to compile the README.md file to HTML and applying a stylesheet from https://github.com/sindresorhus/github-markdown-css (theming is just CSS).

CosmicHorrorDev · 2024-06-14T00:32:04Z

Thanks for the update! Things are looking very nice

I think my only big concern with migrating to using something like blitz is that I don't know if there will be a nice layer of abstraction to describe the parts that inlyne specifically needs in terms of both 1) the low-level of control we need, and 2) the loss of control over a large chunk of our dependencies

Both of those are pretty crucial to trying to maintain being a lightweight markdown viewer (although that's still more of a future goal than the current state). Maybe it's better at this point for us to steal different things from each other as features land. It's also totally possible that blitz's development will far outpace ours and maybe in the future it'd be silly for us to not use it 🤷

CosmicHorrorDev added C-enhancement Category: New feature or request C-refactor Category: Reworking an existing feature labels Nov 23, 2023

CosmicHorrorDev mentioned this issue Nov 24, 2023

Add tests for elements nested in a list item #176

Merged

CosmicHorrorDev mentioned this issue Feb 11, 2024

Add support for theme-specific picture+source image variants #231

Closed

CosmicHorrorDev added the A-interpreter Area: Changes impacting the HTML interpreter label Feb 12, 2024

kokoISnoTarget mentioned this issue Apr 14, 2024

Add hir #318

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor interpreter into working over a separate HIR and AST #175

Refactor interpreter into working over a separate HIR and AST #175

CosmicHorrorDev commented Nov 23, 2023

kokoISnoTarget commented Mar 24, 2024 •

edited

Loading

CosmicHorrorDev commented Mar 24, 2024 •

edited

Loading

nicoburns commented Apr 29, 2024 •

edited

Loading

CosmicHorrorDev commented Apr 29, 2024

kokoISnoTarget commented Apr 29, 2024 •

edited

Loading

nicoburns commented Jun 13, 2024 •

edited

Loading

CosmicHorrorDev commented Jun 14, 2024

Refactor interpreter into working over a separate HIR and AST #175

Refactor interpreter into working over a separate HIR and AST #175

Comments

CosmicHorrorDev commented Nov 23, 2023

kokoISnoTarget commented Mar 24, 2024 • edited Loading

CosmicHorrorDev commented Mar 24, 2024 • edited Loading

nicoburns commented Apr 29, 2024 • edited Loading

CosmicHorrorDev commented Apr 29, 2024

kokoISnoTarget commented Apr 29, 2024 • edited Loading

nicoburns commented Jun 13, 2024 • edited Loading

CosmicHorrorDev commented Jun 14, 2024

kokoISnoTarget commented Mar 24, 2024 •

edited

Loading

CosmicHorrorDev commented Mar 24, 2024 •

edited

Loading

nicoburns commented Apr 29, 2024 •

edited

Loading

kokoISnoTarget commented Apr 29, 2024 •

edited

Loading

nicoburns commented Jun 13, 2024 •

edited

Loading