suggestion: typescript compilable to WASM #59

Closed
ericprud opened this issue Nov 6, 2020 · 6 comments

Comments

@ericprud

ericprud commented Nov 6, 2020

cause it would go like stink.
I would be happy to pitch in; I'm motivated by some stuff that takes a minute to parse, which hampers the user experience.

@GerHobbelt
Owner

🤔 you want to compile typescript to WASM? But isn't that similar to saying you want to compile JavaScript to WASM, or am I reading this incorrectly?

(Aside: jison (jison-gho) is still on the backburner, but I'm getting a little time to fiddle with it again in a month or so, so might do another release if I get enough done. Taking on more would not be possible before 2021 though, I expect. 😢 )

@ericprud
Author

ericprud commented Nov 8, 2020

Because JavaScript isn't type-safe, the interpreter spends a lot of time testing types. You basically have to compile that work into a WASM target. AssemblyScript ("a strict variant of TypeScript" that makes typing non-optional)* is type-safe so the WASM target can dispense with that cruft.

The notion I had in mind was not to port Jison to TypeScript, though that could make some maintenance easier, but instead to write the generated parser and state tables in AssemblyScript -- basically a little printf manipulation. That could be used like any other javascript with typescript annotations (mild benefits: enforce type-safe invocation, IDE hints), or compiled to WASM.
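To make that concrete, here is a rough sketch of what a generated parser with integer state tables might look like. It is plain TypeScript, not real Jison output: the toy grammar (`S -> ( S ) | x`), the table layout, and every name (`ACTION`, `GOTO_S`, `RULE_LEN`, `parse`) are made up for illustration. The `i32` alias only stands in for AssemblyScript's native `i32` type, and I haven't run this through the AssemblyScript compiler.

```typescript
// Stand-in for AssemblyScript's built-in i32; in plain TypeScript it is just a number.
type i32 = number;

// Terminal ids: '(' = 0, ')' = 1, 'x' = 2, EOF = 3.
const NUM_TERMINALS: i32 = 4;
const ACCEPT: i32 = 127;

// Hand-built tables for the toy grammar  S -> ( S ) | x.
// Positive n = shift to state n, negative -r = reduce by rule r, 0 = syntax error.
const ACTION = new Int8Array([
  //  (   )   x  EOF
      2,  0,  3,  0,       // state 0
      0,  0,  0, ACCEPT,   // state 1
      2,  0,  3,  0,       // state 2
      0, -2,  0, -2,       // state 3: reduce S -> x
      0,  5,  0,  0,       // state 4
      0, -1,  0, -1,       // state 5: reduce S -> ( S )
]);
// Goto on the single nonterminal S, indexed by state.
const GOTO_S = new Int8Array([1, 0, 4, 0, 0, 0]);
// Right-hand-side length per rule number (index 0 unused).
const RULE_LEN = new Int8Array([0, 3, 1]);

// Map each character to a terminal id (-1 for unknown) and append EOF.
function tokenize(input: string): Int8Array {
  const out = new Int8Array(input.length + 1);
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    out[i] = c === 40 ? 0 : c === 41 ? 1 : c === 120 ? 2 : -1;
  }
  out[input.length] = 3;
  return out;
}

// Table-driven shift/reduce loop over a preallocated integer state stack.
function parse(input: string): boolean {
  const tokens = tokenize(input);
  const stack = new Int32Array(tokens.length + 1);
  let sp: i32 = 0; // stack[0] is the start state 0
  let pos: i32 = 0;
  for (;;) {
    const tok: i32 = tokens[pos];
    if (tok < 0) return false; // unknown character
    const act: i32 = ACTION[stack[sp] * NUM_TERMINALS + tok];
    if (act === ACCEPT) return true;
    if (act === 0) return false;
    if (act > 0) {
      stack[++sp] = act; // shift
      pos++;
    } else {
      sp -= RULE_LEN[-act]; // pop the rule's right-hand side
      const next: i32 = GOTO_S[stack[sp]];
      stack[++sp] = next; // goto on S
    }
  }
}
```

With these tables, `parse("((x))")` accepts and `parse("(x")` rejects. The point is only that everything on the hot path is a typed array of small integers, with no semantic actions or error callbacks, which is the part that seems easiest to emit from a code generator.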

I don't know how much of a performance increase that would buy; maybe a lot 'cause you're type-safe and closer to the metal, maybe not a lot 'cause consistent tables of integers are highly-optimized in JavaScript interpreters. At any rate, it would probably produce a smaller image in WASM. A way to test would be to hand-port some parser (I'd use ShEx and test with FHIR) and benchmark it in JavaScript and WASM.
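A minimal harness for that comparison could look like the sketch below. `medianMs` and `ParseFn` are names I made up; the idea is just to wrap the JavaScript parser and the WASM export in the same function shape and time both on the same input. `Date.now()` only has millisecond resolution, so this assumes inputs that take many milliseconds to parse.

```typescript
// Any parser entry point that takes the source text; the result is ignored.
type ParseFn = (input: string) => unknown;

// Run `parse` a few times to warm up the JIT, then return the median
// wall-clock time in milliseconds over `runs` timed calls.
function medianMs(parse: ParseFn, input: string, runs: number = 21, warmup: number = 5): number {
  if (runs < 1) throw new RangeError("runs must be at least 1");
  for (let i = 0; i < warmup; i++) parse(input);
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    parse(input);
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Hypothetical usage, assuming jsParse and wasmParse wrap the two builds:
//   const js = medianMs(jsParse, bigDocument);
//   const wasm = medianMs(wasmParse, bigDocument);
//   console.log(`JS ${js} ms, WASM ${wasm} ms`);
```

Using the median rather than the mean keeps one GC pause or a cold first call from skewing the result.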

Another option would be TurboScript, which is "based on" TypeScript without memory management (AFAIK) and a restricted set of types (which appear to line up well with what you need to build and navigate state tables and stacks). In principle, it could compile to C and JavaScript, as well as WASM, but that's not yet implemented, and the project seems to be on ice, so probably not a good source language.

* I'm not sure if "AssemblyScript" is the strict sublanguage of TypeScript or if AssemblyScript is the tool which compiles an unnamed strict sublanguage of TypeScript; I've used it above assuming the former meaning.

@MaxGraey

MaxGraey commented Nov 8, 2020

TurboScript has been deprecated for a very long time.

@GerHobbelt
Owner

GerHobbelt commented Nov 16, 2020

I can see the usefulness of TypeScript for a project like this (doing it is another matter -- time and all that) but the jison kernel isn't WASMable as it is (or so I guess): it's not just the tables, but also the error handling during lexing / parsing, and there are callbacks, etc. in there which I wouldn't know how to 'do' in WASM -- this is from reading through WASM introductions and a bit of spec scanning, plus old asm.js stuff my brain picked up a long time ago and distorted.

Anyway, if you want to have a look at viability, there are two main subjects to consider (at least for me):

  • do we want this to support everything, or is this for a stripped-down parser where several kernel features have been taken out by the generator (like jison-gho does these days: if you don't use error handling/reporting then jison-gho SHOULD remove all error handling code from the kernel), ending with an 'optimized' parser with a minimal feature set?
  • the above then would go together with a kernel inspection to find out about roadblocks and hard-to-translate parts in the kernel. The lexer and parser kernel codes in bleeding edge have been separated out into separate JS files to make life a little easier for me:
    • lexer kernel code is in packages/jison-lex/jison-lexer-kernel.js
    • lexer kernel error "class" definition is in packages/jison-lex/jison-lexer-error-code.js
    • the above two chunks would end up in any generated lexer. jison-gho doesn't do any serious feature stripping in the lexer YET.
    • parser kernel code is in lib/jison-parser-kernel.js -- this one is postprocessed by a regex-based "intelligent" (koff koff koff) feature stripper in the code generator.
    • the parser error "class" is in lib/jison-parser-error-code.js
    • the parser default parseError() function is in lib/jison-parser-parseError-function.js
    • jison --main will also inject a default "main()" code chunk to run the parser as if it were a (very simple) CLI app: this code is in lib/jison-parser-commonJsMain-function.js -- this code is not performance critical so not a concern for WASM in my opinion, just mentioning it for completeness.
    • helper APIs, etc. which are included in every parser class are located in lib/jison-parser-API-section1.js with a single line const API = which is there to make eslint et al happy and stripped off before being dumped in a generated parser.
    • all the above stuff is included in more-or-less stripped-down form in a generated parser+lexer output by jison-gho.
      Have a look at a generated parser and you'll recognize the various chunks in there coming from these files.
    • the kernel codes would be a primary concern for viability research when "porting" this to WASM.
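To illustrate the general idea of a regex-based feature stripper (not jison-gho's actual implementation, which lives in the code generator), here is a toy sketch. The `// #if name` / `// #endif` marker comments and the `stripFeatures` function are invented for this example; the real kernel files do not necessarily use markers like these.

```typescript
// Remove (or keep) marked regions of a kernel source template.
// A region starts at a line `// #if <feature>` and ends at the next `// #endif` line.
// Regions whose feature is in `enabled` keep their body; others vanish entirely.
// Nested regions are not supported in this sketch.
function stripFeatures(kernel: string, enabled: Set<string>): string {
  const block = /^[ \t]*\/\/ #if (\w+)\n([\s\S]*?)^[ \t]*\/\/ #endif\n?/gm;
  return kernel.replace(block, (_match: string, feature: string, body: string) =>
    enabled.has(feature) ? body : ""
  );
}

// Hypothetical kernel fragment with one optional feature.
const kernelTemplate =
  "function parse() {\n" +
  "  // #if errorRecovery\n" +
  "  recover();\n" +
  "  // #endif\n" +
  "  return run();\n" +
  "}\n";

// A grammar that never uses error recovery gets the minimal kernel.
const minimalKernel = stripFeatures(kernelTemplate, new Set<string>());
// A grammar that does use it keeps the body, minus the marker lines.
const fullKernel = stripFeatures(kernelTemplate, new Set<string>(["errorRecovery"]));
```

For a WASM port, a stripped kernel like `minimalKernel` would be the natural first target, since it has the fewest callbacks and error paths to translate.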

Anyway, that's how jison-gho is "organized". Of course I'd love it if you would use jison-gho, but another odd thought of mine about WASM: wouldn't it be "easier" (for arbitrary levels of "easy" -- probably also non-trivial to get going, but anyway) if you instead took the C language route to WASM via GNU bison (and maybe flex, which is a DFA-based lexer instead of jison's regex-list based approach) and compiled their output to WASM using LLVM and Emscripten or what-is-the-process-for-that-these-days?

^^^ Just a rough thought, not hampered by any experience with this C-to-WASM stuff, so only riding what I recall of blogs I read about it over the years. 🤭

GerHobbelt added a commit that referenced this issue Nov 16, 2020
- cleanup lib/jison.js Unicode blob in comments
- more work done on lib/jison.js as part of the ES6 migration
- re-add previously removed test ist/sollwert files
- fix jison kernel patcher code for yydebug removal in lib/jison.js
- added patch script to patch the line numbers printed as part of test and assertion failure reports
- adjust the parser generator code to not use `eval()` (which doesn't pass 'let var' variables to the calling scope) but use our own function()-based exec scaffolding instead (borrowed from the lexer test code, where this was already used)
- added test example for issue #59 (grammar is referenced in the comments there)
- fix faulty variable name reference in a parser kernel yydebug statement (due to earlier overzealous ES6 migration cleanup removing the name mapping there)
- fix infinite loop (continue+break interplay) in lib/jison.js
- fix tests/parser/api.js test case: this jison release **requires** yylloc instances to have a valid `range[]` array. (If you want quite different location tracking, one of the required changes would be to specify your own copy_yylloc_native() method via the parser options.)
@ericprud
Author

ericprud commented Nov 20, 2020

Yeah, I think that compiling Bison output to WASM with LLVM would give you a pretty optimal executable. I was thinking more about the code path where lazy people (specifically, me) have existing JS libraries that they want to shove into the WASM box.

Many thanks for the thorough analysis!

@GerHobbelt
Owner

👍

3 participants