Research supporting WASM for custom code as part of a physical plan #40
Comments
This sounds like an amazing idea. This broadens the range of target languages drastically. With sandboxed execution, no less. |
I know a lot about wasm! I've written a language that compiles to wasm, a CLI that executes wasm, wasm libraries, etc. I'd love to offer my skill set in any way, even if just for tossing ideas against the wall. Perhaps a good question beyond just executing wasm might be: what type of API are you trying to expose to wasm modules? |
Another question might be the methodology for distributing WASM assets (or any other assets) needed by all the compute nodes for their compute graphs. |
The idea would be to execute code as part of a distributed computation against data, so processing rows, columns, batches, or partitions of data. In SQL the equivalent would be User Defined Functions (UDF) or User Defined Aggregate Functions (UDAF). |
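To make the UDF/UDAF idea a bit more concrete, here is a minimal sketch of what a host-side interface over batches might look like. Everything in it is hypothetical: the trait names `ScalarUdf` and `AggregateUdf`, the `f64`-only columns, and the `Double`/`Mean` examples are assumptions for illustration, not Ballista's API. A WASM-backed implementation would presumably sit behind traits like these.

```rust
/// Hypothetical scalar UDF: produces one output value per input row of a batch.
pub trait ScalarUdf {
    fn name(&self) -> &str;
    fn eval_batch(&self, column: &[f64]) -> Vec<f64>;
}

/// Hypothetical UDAF: builds partial state per batch or partition,
/// then merges the partial states into a final result.
pub trait AggregateUdf {
    type State: Default;
    fn update(&self, state: &mut Self::State, column: &[f64]);
    fn merge(&self, state: &mut Self::State, other: Self::State);
    fn finish(&self, state: Self::State) -> Option<f64>;
}

/// Example scalar UDF (assumption): doubles every value.
pub struct Double;

impl ScalarUdf for Double {
    fn name(&self) -> &str {
        "double"
    }

    fn eval_batch(&self, column: &[f64]) -> Vec<f64> {
        column.iter().map(|v| v * 2.0).collect()
    }
}

/// Example UDAF (assumption): arithmetic mean, with (sum, count) as state.
pub struct Mean;

impl AggregateUdf for Mean {
    type State = (f64, u64);

    fn update(&self, state: &mut Self::State, column: &[f64]) {
        state.0 += column.iter().sum::<f64>();
        state.1 += column.len() as u64;
    }

    fn merge(&self, state: &mut Self::State, other: Self::State) {
        state.0 += other.0;
        state.1 += other.1;
    }

    fn finish(&self, state: Self::State) -> Option<f64> {
        // No rows seen means there is no mean to report.
        if state.1 == 0 {
            None
        } else {
            Some(state.0 / state.1 as f64)
        }
    }
}
```

The split between `update` and `merge` is what would let each partition aggregate locally before the partial states are combined on another node.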
Greetings. I'm John. Fascinating idea. It sounds like a SQL injection invitation! But would we lose the benefit of executing native code in the nodes if we side-loaded a wasm execution engine? For parallelism, Yew supports web workers, but I can't see any reason it would be limited to the WebAssembly runtimes. Wasmer would be the 'simple' choice: https://github.com/wasmerio/wasmer-rust-example Wasmtime opens up quite a few (future) possibilities. Cranelift is also interesting, maybe the most interesting. So, concretely, it would optionally receive a parameter (or set of parameters), so the surface of the API would be tightly coupled with the logical plan? Where would the hook go? |
Hi @blittable and thanks for the info. I need to learn more about WASM but I'm hoping we wouldn't need external processes. The logical plan is used directly for execution in the current PoC but we're going to be moving to a physical plan and much more extensible execution plans - you can take a look at https://github.com/andygrove/ballista/pull/59 to see the direction this is going in. |
I'm also wondering if Ballista could just generate WASM for the full execution plan, kinda like how Apache Spark uses whole stage code generation now (generating JVM bytecode in that case). |
Maybe 'engine' was overstating it. The Rust runtime doesn't run wasm directly, so a library would be required. There are independent runtimes that could manage the execution. The payload would be bytes in the protobuf message? |
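If the module did travel as raw bytes inside a message, an executor could at least sanity-check the header before handing it to a runtime. The sketch below assumes a hypothetical `WasmUdfPayload` struct standing in for whatever the protobuf message would decode to; the only non-hypothetical part is the WebAssembly binary header itself (the `\0asm` magic followed by a little-endian version, currently 1). It does not validate or execute the module.

```rust
/// Hypothetical decoded message: a function name plus the compiled module bytes.
pub struct WasmUdfPayload {
    pub function_name: String,
    pub module_bytes: Vec<u8>,
}

/// WebAssembly binary magic number: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];

#[derive(Debug, PartialEq)]
pub enum PayloadError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u32),
}

/// Checks only the 8-byte header; full validation is left to the wasm runtime.
pub fn check_header(payload: &WasmUdfPayload) -> Result<(), PayloadError> {
    let bytes = &payload.module_bytes;
    if bytes.len() < 8 {
        return Err(PayloadError::TooShort);
    }
    if !bytes.starts_with(&WASM_MAGIC) {
        return Err(PayloadError::BadMagic);
    }
    // The version field is a little-endian u32 directly after the magic.
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    if version != 1 {
        return Err(PayloadError::UnsupportedVersion(version));
    }
    Ok(())
}
```

A cheap check like this would let a node reject an obviously corrupt payload early, before any runtime is involved.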
Related article: https://hacks.mozilla.org/2019/08/webassembly-interface-types/ |
Also potentially related: https://medium.com/wasmer/announcing-the-first-postgres-extension-to-run-webassembly-561af2cfcb1 |
As a user of Ballista, I would like the ability to execute arbitrary code as part of my distributed job. I want the ability to use multiple languages depending on my requirements (perhaps there are third party libraries I want to use).
WASM seems like a good potential choice for this?