Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor interpreter into working over a separate HIR and AST #175

Open
CosmicHorrorDev opened this issue Nov 23, 2023 · 7 comments
Open
Labels
A-interpreter Area: Changes impacting the HTML interpreter C-enhancement Category: New feature or request C-refactor Category: Reworking an existing feature

Comments

@CosmicHorrorDev
Copy link
Collaborator

Having been working with the interpreter more it's clear that the current design is pretty painful to work with in almost every way. It currently works by doing a pass over the html5ever tokenstream that stores a pretty minimal amount of state while computing the elements for output on the fly. In theory this is very efficient because we can stream our output and we retain a pretty minimal amount of past state, but it also translates to a soup of conditionals that are incredibly difficult to refactor without regressing other behavior.

This also expresses itself where most of the bugs for things lie in not properly keeping track of enough history. Take for instance this markdown snippet

1. First list item

    Some nested text

2. Second list item

This should render as follows:

  1. First list item

    Some nested text

  2. Second list item

but the actual HTML for this is pretty complex

<ol>
    <li>
        <p>First list item</p>
        <p>Some nested text</p>
    </li>
    <li>
        <p>Second list item</p>
    </li>
</ol>

There's quite a lot involved in rendering this right: you've got to keep track of which list item you're in, the first element in the list item (<li>) gets the line with the list prefix while the rest get indented further on separate lines, etc. Unfortunately this shows as we do a pretty sloppy job of rendering most things that are nested even semi-complexly

I think the better way forward would be to first convert the raw HTML to an HIR (higher intermediate representation or HTML intermediate representation 😁) which is just a nice full structured format describing the things we care about in the HTML doc (basically a more complete version of Element). Then from there we can more easily do the analysis we need to correctly render things above into our output AST which represents how thing are laid out in inlyne's eyes

@CosmicHorrorDev CosmicHorrorDev added C-enhancement Category: New feature or request C-refactor Category: Reworking an existing feature labels Nov 23, 2023
@CosmicHorrorDev CosmicHorrorDev added the A-interpreter Area: Changes impacting the HTML interpreter label Feb 12, 2024
@kokoISnoTarget
Copy link
Contributor

kokoISnoTarget commented Mar 24, 2024

So what things would we need to do? (Because this sounds like a major rewrite)
And what would the HIR contain?

@CosmicHorrorDev
Copy link
Collaborator Author

CosmicHorrorDev commented Mar 24, 2024

So for starters the AST would likely remain a VecDeque<crate::Element>. Each element in the HIR would contain all of the crate::interpreter::html::Attrs that we recognize along with the inner runs of text and other elements. For something like

<p>In a paragraph <a href="https://example.org">https://example.org</a></p>

would get represented as something like

use crate::interpreter::html::{attr::Align, Attr, TagName};

enum HirOrText {
    Hir(Hir),
    Text(String),
}

vec!Hir {
    tag: TagName::Paragraph,
    attrs: vec![],
    contents: vec![
        HirOrText::Text("In a paragraph ".into()),
        HirOrText::Hir(Hir {
            tag: TagName::Anchor,
            attrs: vec![Attr::Href("https://example.org".into())],
            contents: vec![HirOrText::Text("https'//example.org".into())]
        }),
        ...
    ],
}

That would be the main change as the existing code for the AST can be re-written to run over the HIR without having to change much of the logic there (although it opens the door to more freely change things in the future)

@nicoburns
Copy link
Contributor

nicoburns commented Apr 29, 2024

You may be interested in the architecture of https://github.com/DioxusLabs/blitz which parses HTML into a dom-like structure before doing style then layout then rendering passes. It's a more ambitious project as we aim to actually render arbitrary HTML (including CSS but excluding JavaScript), but it might make sense for inlyne too.

(once Blitz is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)

@CosmicHorrorDev
Copy link
Collaborator Author

@kokoISnoTarget is currently working on things with #318, so I'll leave the immediate direction of where things go up to them, but yes I am very interested in sharing more of the nitty gritty HTML bits

@kokoISnoTarget
Copy link
Contributor

kokoISnoTarget commented Apr 29, 2024

(once stylo-dioxus is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)

It isn't ready, so nothing I would decide on now. But sounds promising.

@nicoburns
Copy link
Contributor

nicoburns commented Jun 13, 2024

We've just landed inline / inline-block support in Blitz (https://github.com/DioxusLabs/blitz). It's still quite early / immature, and we don't have feature parity with inlyne (missing things like text selection, table layout, and hot reloading) but it's now in a state that inlyne devs could look into using / contributing to / building on it if you were interested. This is a screenshot of blitz rendering it's own README (the markdown example in the repository):

Screenshot 2024-06-12 at 16 16 06

We have a full HTML style/layout engine so this example is using comrak to compile the README.md file to HTML and applying a stylesheet from https://github.com/sindresorhus/github-markdown-css (theming is just CSS).

@CosmicHorrorDev
Copy link
Collaborator Author

Thanks for the update! Things are looking very nice

I think my only big concern with migrating to using something like blitz is that I don't know if there will be a nice layer of abstraction to describe the parts that inlyne specifically needs in terms of both 1) the low-level of control we need, and 2) the loss of control over a large chunk of our dependencies

Both of those are pretty crucial to trying to maintain being a lightweight markdown viewer (although that's still more of a future goal than the current state). Maybe it's better at this point for us to steal different things from each other as features land. It's also totally possible that blitz's development will far outpace ours and maybe in the future it'd be silly for us to not use it 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interpreter Area: Changes impacting the HTML interpreter C-enhancement Category: New feature or request C-refactor Category: Reworking an existing feature
Projects
None yet
Development

No branches or pull requests

3 participants