
Design - Compile the runtime Lexer to WebAssembly #32

Open
ericvergnaud opened this issue Feb 13, 2024 · 16 comments

Comments

@ericvergnaud
Contributor

ericvergnaud commented Feb 13, 2024

Beyond tooling issues, we also need to deal with paradigms that cannot work with WebAssembly.

In the current runtime, the Lexer is an abstract class, and the actual generated XXXLexer inherits from it.
This paradigm won't work with WebAssembly, especially not across language targets.

Looking at a generated XXXLexer, it doesn't provide behavior, rather it provides data that the runtime Lexer will use.
Therefore an idea that comes to mind is to evolve the design as follows:

  • the generated XXXLexer becomes a standalone class that:
    • provides data to a runtime Lexer instance
    • forwards calls such as nextToken to that Lexer instance
  • the data itself sits in a LexerData record (data class in Kotlin)
  • the runtime Lexer becomes a concrete class that requires a LexerData record when instantiated
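A minimal Kotlin sketch of this split, under stated assumptions: the names `LexerData`, `Lexer`, `XXXLexer` and `Token` follow the proposal, but the fields are hypothetical, and the `tokenTypes` literal table is a toy stand-in for the serialized ATN a real record would carry. The point is only the shape: data in a record, behavior in one concrete runtime class, and a generated class that forwards instead of inheriting.

```kotlin
// Hypothetical sketch of the proposed design; not the actual runtime API.

/** Pure data emitted by the tool for one grammar. */
data class LexerData(
    val grammarFileName: String,
    val ruleNames: List<String>,
    // Toy stand-in for the serialized ATN: literal text -> token type.
    val tokenTypes: Map<String, Int>,
)

data class Token(val type: Int, val text: String)

/** Concrete runtime lexer: all behavior lives here, driven by a LexerData record. */
class Lexer(private val data: LexerData, private val input: String) {
    private var pos = 0

    fun nextToken(): Token {
        // Skip whitespace (toy behavior standing in for ATN simulation).
        while (pos < input.length && input[pos].isWhitespace()) pos++
        if (pos >= input.length) return Token(EOF, "<EOF>")
        // Longest literal that matches at the current position.
        val match = data.tokenTypes.keys
            .filter { input.startsWith(it, pos) }
            .maxByOrNull { it.length }
            ?: error("No token matches at offset $pos")
        pos += match.length
        return Token(data.tokenTypes.getValue(match), match)
    }

    companion object {
        const val EOF = -1
    }
}

/** Generated class: no inheritance, just its data plus forwarding calls. */
class XXXLexer(input: String) {
    private val delegate = Lexer(DATA, input)

    fun nextToken(): Token = delegate.nextToken()

    companion object {
        val DATA = LexerData(
            grammarFileName = "XXX.g4",
            ruleNames = listOf("LET", "EQ", "SEMI"),
            tokenTypes = mapOf("let" to 1, "=" to 2, ";" to 3),
        )
    }
}
```

For example, `XXXLexer("let = ;")` yields tokens of type 1, 2 and 3, then `Lexer.EOF`. Because `XXXLexer` only holds a reference to a `Lexer`, nothing requires the runtime class to exist as a subclassable type in the generated code's language.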

My plan is to first make the above work in Kotlin, then compile to Wasm.
Your comments on the proposed design are welcome.

@ericvergnaud
Contributor Author

(and more generally, I plan to learn from the lexer migration, and avoid big mistakes when migrating the parser)

@ftomassetti
Collaborator

To me it seems like a plan that makes sense, and it is great to start seeing experiments with WASM!

@lppedd
Collaborator

lppedd commented Feb 13, 2024

> This paradigm won't work with WebAssembly, especially not across language targets.

I think this depends on your goals when exposing the WASM module to target languages.
Which parts of the lexer or parser do you want to expose?

@ericvergnaud
Contributor Author

ericvergnaud commented Feb 13, 2024

As you know, Wasm doesn't contain classes, only globals and functions.
Antlr5 needs to connect 3 objects:

  1. a wasm runtime lexer/parser (our code)
  2. a wasm generated lexer/parser (generated from the grammar)
  3. a host language wrapper (generated by the target add-on)

The idea that a class in 2 can derive from a non-class in 1 sounds impossible to me.
Rather it will call functions from 1 and provide callbacks to 1.
Similarly, 3 can and will have bindings to call into 2, but it can't derive from 1 or 2, because to achieve that it would need the ancestor class to genuinely exist in the host language.

So I'm not sure what you mean by what is being exposed. What am I missing that would make it possible to use inheritance across 1, 2 and 3?
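To make "call functions from 1 and provide callbacks to 1" concrete, here is a Kotlin sketch of a class-free boundary, under stated assumptions: the runtime (1) exposes only top-level functions and opaque `Int` handles, and the generated code (2) passes behavior in as a callback instead of overriding a base-class method. The functions `lexerCreate`, `lexerNextToken`, `lexerDestroy` and the class `GeneratedLexer` are hypothetical; they run as plain Kotlin here and are not tied to any specific Wasm export mechanism.

```kotlin
// Hypothetical sketch: no class crosses the 1 <-> 2 boundary, only functions,
// handles and plain values.

private class LexerState(val input: String, val isTokenChar: (Char) -> Boolean) {
    var pos = 0
}

// Runtime-side table of live lexers keyed by handle (a global, not a hierarchy).
private val liveLexers = mutableMapOf<Int, LexerState>()
private var nextHandle = 1

/** Runtime function: create a lexer and return an opaque handle. */
fun lexerCreate(input: String, isTokenChar: (Char) -> Boolean): Int {
    val handle = nextHandle++
    liveLexers[handle] = LexerState(input, isTokenChar)
    return handle
}

/** Runtime function: next run of accepted characters, or null at end of input. */
fun lexerNextToken(handle: Int): String? {
    val s = liveLexers[handle] ?: error("Unknown lexer handle $handle")
    // Skip characters the generated callback does not accept.
    while (s.pos < s.input.length && !s.isTokenChar(s.input[s.pos])) s.pos++
    if (s.pos >= s.input.length) return null
    val start = s.pos
    while (s.pos < s.input.length && s.isTokenChar(s.input[s.pos])) s.pos++
    return s.input.substring(start, s.pos)
}

/** Runtime function: release the state behind a handle. */
fun lexerDestroy(handle: Int) {
    liveLexers.remove(handle)
}

// Generated code (2): owns a handle, forwards calls, supplies a callback.
class GeneratedLexer(input: String) {
    private val handle = lexerCreate(input) { it.isLetterOrDigit() }

    fun nextToken(): String? = lexerNextToken(handle)

    fun close() = lexerDestroy(handle)
}
```

A host wrapper (3) would hold a `GeneratedLexer` (or just its handle) the same way, so no layer needs to derive from another.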

@lppedd
Collaborator

lppedd commented Feb 13, 2024

Ahhh, I get what you mean now.
But wait, what you want to do is have:

  • a WASM module (.wasm binary) for the runtime
  • a WASM module for the generated grammar
  • a target language wrapper over the generated grammar module

Is that correct?
If it is, I just don't see why we would want two separate modules. Wasn't the idea to have an all-in-one bundle?

> What am I missing that would make it possible to use inheritance across 1, 2 and 3

I'd leave out 3, and focus on 1 and 2, if my understanding of what you want to do is indeed correct.
For that to be possible, I guess you'd need the WASM component model.

However, the way Kotlin will implement the component model isn't decided yet, as far as I know.
Thus, it's also impossible to know how we will be able to expose definitions from the WASM module, and what the limitations will be.

@KvanTTT
Member

KvanTTT commented Feb 13, 2024

Can't we start by migrating to Gradle and running all tests with Kotlin Wasm? Is that possible?

@lppedd
Collaborator

lppedd commented Feb 13, 2024

I would do that first, yeah. We can discuss it in tomorrow's call if needed.

@ftomassetti
Copy link
Collaborator

> Looking at a generated XXXLexer, it doesn't provide behavior,

Does this mean no actions or predicates?

@ericvergnaud
Contributor Author

ericvergnaud commented Feb 14, 2024

> > Looking at a generated XXXLexer, it doesn't provide behavior,
>
> Does this mean no actions or predicates?

No, these would be invoked via callbacks rather than inlined. We might treat actions and predicates written in Kotlin differently since these could be inlined, but we're not there yet... (it's an optimization and we shouldn't optimize first)
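A minimal Kotlin sketch of "invoked via callbacks rather than inlined", under stated assumptions: the interface `LexerCallbacks` and the classes `CallbackRuntime` and `XXXLexerCallbacks` are hypothetical. The `(ruleIndex, actionIndex)` and `(ruleIndex, predIndex)` pairs mirror what ANTLR 4 generated lexers already dispatch on in their `action` and `sempred` overrides; here that dispatch moves behind an interface the runtime calls.

```kotlin
// Hypothetical sketch: user code stays on the generated/host side and the
// runtime only sees indices.

interface LexerCallbacks {
    fun action(ruleIndex: Int, actionIndex: Int)
    fun sempred(ruleIndex: Int, predIndex: Int): Boolean
}

/** Toy runtime step: check the predicate (if any), then fire the action (if any). */
class CallbackRuntime(private val callbacks: LexerCallbacks) {
    fun acceptRule(ruleIndex: Int, predIndex: Int?, actionIndex: Int?): Boolean {
        if (predIndex != null && !callbacks.sempred(ruleIndex, predIndex)) return false
        if (actionIndex != null) callbacks.action(ruleIndex, actionIndex)
        return true
    }
}

/** What a generated lexer might supply: one entry per action/predicate in the grammar. */
class XXXLexerCallbacks(private val allowKeywords: Boolean) : LexerCallbacks {
    val log = mutableListOf<String>()

    override fun action(ruleIndex: Int, actionIndex: Int) {
        if (ruleIndex == 0 && actionIndex == 0) log.add("saw keyword")
    }

    override fun sempred(ruleIndex: Int, predIndex: Int): Boolean =
        if (ruleIndex == 0 && predIndex == 0) allowKeywords else true
}
```

Inlining Kotlin actions later would be an optimization of this path, not a change to the interface.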

@ericvergnaud
Contributor Author

> Can't we start by migrating to Gradle and running all tests with Kotlin Wasm? Is that possible?

I'm not sold on this approach because, due to I/O concerns, it would require using WASI, which we're not looking to support (there is no WASI for the web). I'd rather get as close as possible to our target architecture before enabling Wasm compilation. But as suggested, we can discuss it later today.

@ericvergnaud
Contributor Author

> Ahhh, I get what you mean now. But wait, what you want to do is have:
>
>   • a WASM module (.wasm binary) for the runtime
>   • a WASM module for the generated grammar
>   • a target language wrapper over the generated grammar module
>
> Is that correct? If it is, I just don't see why we would want two separate modules. Wasn't the idea to have an all-in-one bundle?

No, the idea is to have a reusable wasm runtime. I can see 3 benefits:

  • faster build time (we don't recompile the full runtime on each grammar change)
  • shared module for deployments that support multiple grammars
  • sticks to a proven and clear separation of concerns
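The "shared module for deployments that support multiple grammars" point can be illustrated with a small Kotlin sketch, under stated assumptions: `SharedRuntime` stands in for the single runtime module and `GrammarData` for what each generated grammar module contributes; both names and the keyword-only "grammar" are hypothetical. One copy of the runtime code serves every registered grammar, and adding a grammar adds only data.

```kotlin
// Hypothetical sketch: one runtime, many grammars, each grammar being pure data.

data class GrammarData(val name: String, val keywords: Set<String>)

object SharedRuntime {
    private val grammars = mutableMapOf<String, GrammarData>()

    /** Each generated grammar module registers its data once, at load time. */
    fun register(data: GrammarData) {
        grammars[data.name] = data
    }

    /** Same runtime code for every grammar: classify whitespace-separated words. */
    fun tokenize(grammar: String, input: String): List<Pair<String, String>> {
        val data = grammars[grammar] ?: error("Grammar '$grammar' not registered")
        return input.split(Regex("\\s+"))
            .filter { it.isNotEmpty() }
            .map { word -> (if (word in data.keywords) "KEYWORD" else "ID") to word }
    }
}
```

Changing one grammar only changes its `GrammarData`, which is the build-time benefit listed above.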

> What am I missing that would make it possible to use inheritance across 1, 2 and 3

> I'd leave out 3, and focus on 1 and 2, if my understanding of what you want to do is indeed correct. For that to be possible, I guess you'd need the WASM component model.

How do you run 1 and 2 without 3?

> However, the way Kotlin will implement the component model isn't decided yet, as far as I know. Thus, it's also impossible to know how we will be able to expose definitions from the WASM module, and what the limitations will be.

Yes, in the end state we should rely on the component model. Given the rather slow pace at which things get done, though, we could rely on wasm-merge in the short term (Binaryen's tool for merging two or more modules into one).

@KvanTTT
Member

KvanTTT commented Feb 14, 2024

> I'm not sold on this approach because, due to I/O concerns, it would require using WASI, which we're not looking to support (there is no WASI for the web).

If I understand correctly, Strumenta's antlr-kotlin currently supports the Kotlin Wasm target (I see a wasmJsMain directory there).

@ftomassetti
Collaborator

> > I'm not sold on this approach because, due to I/O concerns, it would require using WASI, which we're not looking to support (there is no WASI for the web).
>
> If I understand correctly, Strumenta's antlr-kotlin currently supports the Kotlin Wasm target (I see a wasmJsMain directory there).

Yes, my understanding is that Kotlin has two Wasm targets: one (wasmJs) generating Wasm plus a JS wrapper, intended for running in the browser, and a second (wasmWasi) producing Wasm for WASI. Both should be supported by the Kotlin target for ANTLR 4.

@lppedd please correct me if I am wrong

@lppedd
Collaborator

lppedd commented Feb 14, 2024

Yes, Strumenta's repository supports both WASM targets (but overall, it supports *all* Kotlin targets).

> due to I/O concerns

It really depends on what I/O we are talking about. If I/O is moved out of the test infrastructure (or done at the very beginning, before handing off to WASM), targeting wasmJs shouldn't be an issue.
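A minimal Kotlin sketch of "moving I/O out", under stated assumptions: `countTokens` and `runTestCase` are hypothetical names, and the token count is a toy stand-in for real lexing. The code meant for Wasm only ever sees a `String`; the harness performs the read (on the JVM that could be something like `{ File(path).readText() }`) and injects it, so the pure part needs no WASI.

```kotlin
// Hypothetical sketch: keep file and stream access out of the code compiled to Wasm.

/** Pure core: touches no files or streams, so nothing here needs WASI. */
fun countTokens(input: String): Int =
    input.split(Regex("\\s+")).count { it.isNotEmpty() }

/** Host-side harness: performs the I/O via [readInput], then hands plain data to the core. */
fun runTestCase(readInput: () -> String, expected: Int): Boolean =
    countTokens(readInput()) == expected
```

Passing the reader as a lambda keeps the harness itself platform-neutral; only the lambda supplied on a given target does real I/O.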

@ericvergnaud
Contributor Author

Moving I/O out is indeed one of the preliminary activities required.
Another one is to define an API that will be exposed by the runtime and the generated parser.

@DavidGregory084

> > Looking at a generated XXXLexer, it doesn't provide behavior,
> >
> > Does this mean no actions or predicates?
>
> No, these would be invoked via callbacks rather than inlined. We might treat actions and predicates written in Kotlin differently since these could be inlined, but we're not there yet... (it's an optimization and we shouldn't optimize first)

For actions and predicates, I would suggest looking at the approach used in templating languages like Handlebars.

One of the things that makes Handlebars so portable is that the "helpers" are provided externally, so there is no host language syntax leaking into the template.

I'm aware that the functionality in #51 does not follow this approach at all, but a major release like v5 seems like a good time to change the direction on actions and predicates.
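A Kotlin sketch of what Handlebars-style helpers could look like for predicates, under stated assumptions: the grammar would refer to a predicate only by name (e.g. `isKeyword`), and each host registers an implementation. `HelperRegistry`, `registerPredicate` and `evalPredicate` are hypothetical; this is not an existing ANTLR feature.

```kotlin
// Hypothetical sketch: named helpers supplied by the host, so no host-language
// syntax appears in the grammar itself.

class HelperRegistry {
    private val predicates = mutableMapOf<String, (String) -> Boolean>()

    /** Host side: bind an implementation to a helper name used in the grammar. */
    fun registerPredicate(name: String, impl: (String) -> Boolean) {
        predicates[name] = impl
    }

    /** Runtime side: evaluate a named predicate; fail loudly if the host forgot it. */
    fun evalPredicate(name: String, tokenText: String): Boolean =
        predicates[name]?.invoke(tokenText)
            ?: error("No helper registered for predicate '$name'")
}
```

The same grammar could then run unchanged on any target, with each target registering its own `isKeyword`.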

5 participants