Research supporting WASM for custom code as part of a physical plan #40

andygrove · 2019-07-21T15:34:52Z

As a user of Ballista, I would like the ability to execute arbitrary code as part of my distributed job. I want the ability to use multiple languages depending on my requirements (perhaps there are third-party libraries I want to use).

WASM seems like a good potential choice for this?

rrichardson · 2019-07-21T18:07:04Z

This sounds like an amazing idea. This broadens the range of target languages drastically. With sandboxed execution, no less.

richardanaya · 2019-07-22T00:33:58Z

I know a lot about wasm! I've written a language that compiles to wasm, a CLI that executes wasm, wasm libraries, etc. I'd love to offer my skill set in any way, even if it's just tossing ideas against the wall.

Beyond just executing wasm, a good question might be: what type of API are you trying to expose to wasm modules?
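One common answer in core WebAssembly (before interface types): exported functions can only take and return numbers (i32, i64, f32, f64), so a batch of data is usually passed through the module's linear memory as a pointer plus a length. Below is a minimal sketch of what the guest side of such an API might look like, written in Rust so it could be compiled to `wasm32-unknown-unknown`. The exported names `alloc_f64`, `free_f64` and `scale_column` are hypothetical, not an agreed Ballista interface.

```rust
// Hypothetical guest-side ABI for a column-at-a-time UDF.
// The host allocates a buffer inside the module's linear memory, copies a
// column of f64 values into it, then calls an export with (pointer, length).
// When built as a wasm `cdylib`, each function would also need `#[no_mangle]`
// (`#[unsafe(no_mangle)]` in the 2024 edition) so the host can find it by name.

/// Allocate a zeroed buffer for `len` f64 values; the host takes ownership.
pub extern "C" fn alloc_f64(len: usize) -> *mut f64 {
    let buf: Box<[f64]> = vec![0.0; len].into_boxed_slice();
    Box::into_raw(buf) as *mut f64
}

/// Free a buffer returned by `alloc_f64`; `len` must match the allocation.
pub unsafe extern "C" fn free_f64(ptr: *mut f64, len: usize) {
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))) };
}

/// Example UDF: multiply every value in the column by `factor`, in place.
pub unsafe extern "C" fn scale_column(ptr: *mut f64, len: usize, factor: f64) {
    let column = unsafe { std::slice::from_raw_parts_mut(ptr, len) };
    for v in column.iter_mut() {
        *v *= factor;
    }
}
```

On the host side, the same sequence (allocate, copy in, call, copy out, free) would go through whichever runtime's memory and function-call API is chosen.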

richardanaya · 2019-07-22T00:39:21Z

Another question might be what methodology to use for distributing the WASM assets (or any other assets) that all the compute nodes need for their compute graphs.
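One possible approach, sketched under the assumption that a job's plan refers to modules by a content hash rather than embedding them: each executor keeps a local cache keyed by that hash and only fetches bytes it has not already seen. The `ModuleCache` type and the `fetch` callback are hypothetical, and FNV-1a stands in for a cryptographic hash such as SHA-256 only to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: deterministic across processes, unlike std's DefaultHasher.
/// A production system would use a cryptographic hash (e.g. SHA-256) instead.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical per-executor cache of WASM modules, keyed by content hash.
pub struct ModuleCache {
    modules: HashMap<u64, Vec<u8>>,
    fetches: usize, // number of times the remote source was contacted
}

impl ModuleCache {
    pub fn new() -> Self {
        ModuleCache { modules: HashMap::new(), fetches: 0 }
    }

    pub fn fetch_count(&self) -> usize {
        self.fetches
    }

    /// Return the module for `hash`, calling `fetch` only on a cache miss and
    /// rejecting bytes whose hash does not match what the plan asked for.
    pub fn get_or_fetch<F>(&mut self, hash: u64, fetch: F) -> Result<&[u8], String>
    where
        F: FnOnce() -> Vec<u8>,
    {
        if !self.modules.contains_key(&hash) {
            let bytes = fetch();
            self.fetches += 1;
            if fnv1a_64(&bytes) != hash {
                return Err(format!("hash mismatch for module {:016x}", hash));
            }
            self.modules.insert(hash, bytes);
        }
        Ok(self.modules[&hash].as_slice())
    }
}
```

Addressing modules by content also means a plan that reuses the same UDF across many tasks ships only the small hash, and each node downloads the bytes at most once.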

andygrove · 2019-07-22T01:11:38Z

The idea would be to execute code as part of a distributed computation against data, so processing rows, columns, batches, or partitions of data.

In SQL the equivalent would be User Defined Functions (UDF) or User Defined Aggregate Functions (UDAF).
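To make the UDF/UDAF distinction concrete, here is a minimal sketch of what batch-oriented interfaces could look like. The trait names, the use of `&[f64]` as a stand-in for an Arrow column, and the example functions are all assumptions for illustration, not Ballista's API; a WASM-backed implementation would forward these calls into a module.

```rust
/// Hypothetical scalar UDF: one output value per input row, evaluated a batch at a time.
pub trait ScalarUdf {
    fn evaluate(&self, column: &[f64]) -> Vec<f64>;
}

/// Hypothetical aggregate UDF (UDAF): folds many batches into a single value.
/// `merge` lets partial results computed on different partitions be combined.
pub trait AggregateUdf {
    fn update(&mut self, column: &[f64]);
    fn merge(&mut self, other: &Self);
    fn finish(&self) -> f64;
}

/// Example scalar UDF: convert Celsius to Fahrenheit.
pub struct CelsiusToFahrenheit;

impl ScalarUdf for CelsiusToFahrenheit {
    fn evaluate(&self, column: &[f64]) -> Vec<f64> {
        column.iter().map(|c| c * 9.0 / 5.0 + 32.0).collect()
    }
}

/// Example UDAF: arithmetic mean, kept as (sum, count) so partials can merge.
#[derive(Default)]
pub struct Mean {
    sum: f64,
    count: u64,
}

impl AggregateUdf for Mean {
    fn update(&mut self, column: &[f64]) {
        self.sum += column.iter().sum::<f64>();
        self.count += column.len() as u64;
    }
    fn merge(&mut self, other: &Self) {
        self.sum += other.sum;
        self.count += other.count;
    }
    fn finish(&self) -> f64 {
        if self.count == 0 { f64::NAN } else { self.sum / self.count as f64 }
    }
}
```

The `merge` step matters in a distributed setting: each executor aggregates its own partitions, and only the small partial states travel over the network.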

blittable · 2019-07-31T18:19:18Z

Greetings. I'm John.

Fascinating idea. It sounds like an invitation to SQL injection! But would we lose the benefit of executing native code on the nodes if we side-loaded a wasm execution engine?

For parallelism, Yew supports web workers, but I can't see any reason it would be limited to the WebAssembly runtimes.
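On the Ballista side, parallelism need not come from inside the module at all: each data partition can be processed on its own thread with its own instance. WASM instances carry per-instance state such as linear memory, so giving every worker a fresh instance avoids sharing it. A stdlib-only sketch, where the hypothetical `make_instance` closure stands in for instantiating a module:

```rust
use std::thread;

/// Process each partition on its own scoped thread (requires Rust 1.63+).
/// `make_instance` is a hypothetical factory standing in for "instantiate the
/// WASM module"; every thread gets a fresh instance, so no state is shared.
pub fn run_partitions<F, U>(partitions: &[Vec<f64>], make_instance: F) -> Vec<Vec<f64>>
where
    F: Fn() -> U + Sync,
    U: FnMut(f64) -> f64,
{
    thread::scope(|s| {
        let handles: Vec<_> = partitions
            .iter()
            .map(|partition| {
                let make_instance = &make_instance;
                s.spawn(move || {
                    let mut udf = make_instance();
                    partition.iter().map(|&v| udf(v)).collect::<Vec<f64>>()
                })
            })
            .collect();
        // Join in partition order so the output order matches the input order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```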

Wasmer would be the 'simple' choice: https://github.com/wasmerio/wasmer-rust-example

Wasmtime opens up quite a few (future) possibilities.

Cranelift is also interesting. Maybe the most interesting.

So, concretely,

It would optionally receive a parameter (or set of parameters), so would the surface of the API be tightly coupled with the logical plan? Where would the hook go?

```rust
// Excerpt: match arms returning each logical plan node's schema
LogicalPlan::Projection { schema, .. } => &schema,
LogicalPlan::Selection { input, .. } => input.schema(),
LogicalPlan::Aggregate { schema, .. } => &schema,
LogicalPlan::Sort { schema, .. } => &schema,
LogicalPlan::Limit { schema, .. } => &schema,
LogicalPlan::CreateExternalTable { schema, .. } => &schema,
```
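One hypothetical shape for the hook is a dedicated plan node that wraps its input and names a function inside a module. The toy enum below follows the same pattern as the `schema()` arms above, but `Schema` is simplified to a list of column names, and the `WasmUdf` variant and its fields are assumptions, not a variant that exists in Ballista.

```rust
/// Simplified schema: just column names (stand-in for an Arrow schema).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

/// Toy logical plan with a hypothetical WASM UDF node.
pub enum LogicalPlan {
    Scan { schema: Schema },
    Selection { input: Box<LogicalPlan> },
    WasmUdf {
        input: Box<LogicalPlan>,
        module_hash: u64, // content hash identifying the module bytes
        function: String, // name of the exported function to call
        schema: Schema,   // output schema declared by the UDF
    },
}

impl LogicalPlan {
    pub fn schema(&self) -> &Schema {
        match self {
            LogicalPlan::Scan { schema } => schema,
            // A filter does not change columns, so it reports its input's schema.
            LogicalPlan::Selection { input } => input.schema(),
            // The UDF node declares its own output schema, like a projection.
            LogicalPlan::WasmUdf { schema, .. } => schema,
        }
    }
}
```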

andygrove · 2019-07-31T21:09:35Z

Hi @blittable, and thanks for the info. I need to learn more about WASM, but I'm hoping we wouldn't need external processes.

The logical plan is used directly for execution in the current PoC, but we're going to be moving to a physical plan and much more extensible execution plans. You can take a look at https://github.com/andygrove/ballista/pull/59 to see the direction this is going in.
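With trait-based execution plans, a WASM UDF could become just another operator that pulls batches from its child. A minimal sketch, assuming a hypothetical `ExecutionPlan` trait that yields batches of `f64` and a `MapExec` operator whose boxed function stands in for a call into a WASM instance; these names are illustrative and not taken from the linked PR.

```rust
/// A batch of values from a single column (stand-in for an Arrow RecordBatch).
pub type Batch = Vec<f64>;

/// Hypothetical physical operator: produces an iterator of batches.
pub trait ExecutionPlan {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_>;
}

/// Leaf operator: serves pre-loaded in-memory batches.
pub struct MemoryExec {
    pub batches: Vec<Batch>,
}

impl ExecutionPlan for MemoryExec {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(self.batches.iter().cloned())
    }
}

/// Applies a batch-level function to every batch from its input.
/// In a WASM-backed version, `func` would copy the batch into the module and call it.
pub struct MapExec {
    pub input: Box<dyn ExecutionPlan>,
    pub func: Box<dyn Fn(&[f64]) -> Batch>,
}

impl ExecutionPlan for MapExec {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(self.input.execute().map(move |batch| (self.func)(&batch)))
    }
}
```

Because the operator only sees batches, the planner would not need to know whether `func` is native Rust or a WASM call.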

andygrove · 2019-07-31T21:10:16Z

I'm also wondering if Ballista could just generate WASM for the full execution plan, kind of like how Apache Spark now uses whole-stage code generation (generating JVM bytecode in that case).
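The idea behind whole-stage code generation is to fuse a chain of operators into one tight loop instead of passing intermediate results between them. A small sketch of the difference for a filter followed by a projection; here the "generated" code is just a hand-written fused loop, whereas Spark emits JVM bytecode and the suggestion above would emit WASM.

```rust
/// Operator-at-a-time: each operator materializes a full intermediate vector.
pub fn filter_then_project(input: &[i64]) -> Vec<i64> {
    let filtered: Vec<i64> = input.iter().copied().filter(|v| v % 2 == 0).collect();
    filtered.iter().map(|v| v * 10).collect()
}

/// Fused ("whole-stage") version: one pass, no intermediate buffer.
/// A code generator would emit a loop like this for the whole pipeline.
pub fn fused(input: &[i64]) -> Vec<i64> {
    let mut out = Vec::new();
    for &v in input {
        if v % 2 == 0 {
            out.push(v * 10);
        }
    }
    out
}
```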

blittable · 2019-07-31T23:48:33Z

Maybe 'engine' was overstating it. The Rust runtime doesn't run wasm directly, so a library would be required. There are independent runtimes that could manage the execution.

The payload would be bytes in the protobuf message?
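If the module travels as raw bytes in a protobuf `bytes` field, a receiving executor could sanity-check the payload before instantiating it. Every WebAssembly binary module starts with the 4-byte magic `\0asm` followed by a 4-byte little-endian version number (currently 1). The function name `looks_like_wasm_module` is hypothetical; a real runtime performs full validation when loading, so this check only rejects obviously wrong payloads early.

```rust
/// WebAssembly binary magic number: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];
/// Binary format version 1, encoded as a little-endian u32.
const WASM_VERSION_1: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Cheap pre-check on bytes received over the wire (e.g. a protobuf `bytes` field).
pub fn looks_like_wasm_module(payload: &[u8]) -> bool {
    payload.len() >= 8 && payload[0..4] == WASM_MAGIC && payload[4..8] == WASM_VERSION_1
}
```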

andygrove · 2019-08-21T17:00:50Z

Related article: https://hacks.mozilla.org/2019/08/webassembly-interface-types/

sd2k · 2019-08-29T21:54:02Z

Also potentially related: https://medium.com/wasmer/announcing-the-first-postgres-extension-to-run-webassembly-561af2cfcb1

andygrove added this to the 2.0 milestone Mar 18, 2020

andygrove added the design label Feb 6, 2021

andygrove removed this from the 2.0.0 - a.k. milestone Feb 6, 2021

andygrove added this to the 1.0.0 milestone Feb 21, 2021

andygrove closed this as completed Apr 5, 2021
