This repository has been archived by the owner on Apr 20, 2021. It is now read-only.

Research supporting WASM for custom code as part of a physical plan #40

Closed
andygrove opened this issue Jul 21, 2019 · 10 comments

@andygrove
Collaborator

andygrove commented Jul 21, 2019

As a user of Ballista, I would like the ability to execute arbitrary code as part of my distributed job. I want the ability to use multiple languages depending on my requirements (perhaps there are third party libraries I want to use).

WASM seems like a good potential choice for this?

@rrichardson
Contributor

This sounds like an amazing idea. This broadens the range of target languages drastically. With sandboxed execution, no less.

@richardanaya

richardanaya commented Jul 22, 2019

I know a lot about wasm! I've written a language that compiles to wasm, a CLI that executes wasm, wasm libraries, etc. I'd love to offer my skillset in any way, even if just for tossing ideas against the wall.

Perhaps a good question beyond just executing wasm might be, what type of API are you trying to expose to wasm modules?

@richardanaya

Another question might be what sort of methodology to use for distributing the WASM assets (or other assets) that all the compute nodes need for their compute graphs.

@andygrove
Collaborator Author

The idea would be to execute code as part of a distributed computation against data, so processing rows, columns, batches, or partitions of data.

In SQL the equivalent would be User Defined Functions (UDF) or User Defined Aggregate Functions (UDAF).
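To make the UDF/UDAF analogy concrete, here is a minimal sketch of what such hooks could look like over a batch of values. Everything in it is an assumption for illustration: `Int64Column`, `ScalarUdf`, `AggregateUdf`, `Double`, and `Sum` are hypothetical names, not Ballista APIs, and a real engine would presumably operate on Arrow arrays rather than a `Vec`.

```rust
/// Hypothetical column of nullable 64-bit integers (stand-in for an Arrow array).
pub type Int64Column = Vec<Option<i64>>;

/// Hypothetical scalar UDF: maps an input column to an output column of equal length.
pub trait ScalarUdf {
    fn name(&self) -> &str;
    fn evaluate(&self, input: &Int64Column) -> Int64Column;
}

/// Hypothetical UDAF: folds batches into an accumulator, then yields one value.
pub trait AggregateUdf {
    fn name(&self) -> &str;
    fn update(&mut self, batch: &Int64Column);
    fn finish(&self) -> Option<i64>;
}

/// Example scalar UDF: doubles every non-null value, passing nulls through.
pub struct Double;

impl ScalarUdf for Double {
    fn name(&self) -> &str {
        "double"
    }

    fn evaluate(&self, input: &Int64Column) -> Int64Column {
        input.iter().map(|v| v.map(|x| x * 2)).collect()
    }
}

/// Example UDAF: sums non-null values; the result is None if no value was seen.
#[derive(Default)]
pub struct Sum {
    total: Option<i64>,
}

impl AggregateUdf for Sum {
    fn name(&self) -> &str {
        "sum"
    }

    fn update(&mut self, batch: &Int64Column) {
        // `flatten` skips the None (null) entries.
        for v in batch.iter().flatten() {
            self.total = Some(self.total.unwrap_or(0) + v);
        }
    }

    fn finish(&self) -> Option<i64> {
        self.total
    }
}
```

A WASM-backed implementation would sit behind the same kind of trait, with `evaluate` or `update` calling into the guest module instead of running native code.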

@blittable

Greetings. I'm John.

Fascinating idea. It sounds like a SQL injection invitation! But, would we lose the benefit of executing native code in the nodes if we side-loaded a wasm execution engine?

For parallelism, Yew supports web workers, but I can't see any reason that would be limited to the WebAssembly runtimes.

Wasmer would be the 'simple' choice: https://github.com/wasmerio/wasmer-rust-example

Wasmtime opens up quite a few (future) possibilities.

Cranelift is also interesting. Maybe the most interesting.

So, concretely,

It would optionally receive a parameter (or set of parameters), so the surface of the api would be tightly-coupled with the logical plan? Where would the hook go?

            LogicalPlan::Projection { schema, .. } => &schema,
            LogicalPlan::Selection { input, .. } => input.schema(),
            LogicalPlan::Aggregate { schema, .. } => &schema,
            LogicalPlan::Sort { schema, .. } => &schema,
            LogicalPlan::Limit { schema, .. } => &schema,
            LogicalPlan::CreateExternalTable { schema, .. } => &schema,
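One possible answer to the hook question, sketched under heavy assumptions: a dedicated plan node that carries the module and the name of the exported function. The `WasmUdf` variant, its fields, and the simplified `Schema` type below are all hypothetical and do not exist in Ballista; only the other variant names come from the match arms above.

```rust
/// Simplified stand-in for a schema: just the column names (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

/// Trimmed-down plan with one extra, hypothetical node for custom code.
pub enum LogicalPlan {
    Projection { schema: Schema, input: Box<LogicalPlan> },
    Selection { input: Box<LogicalPlan> },
    CreateExternalTable { schema: Schema },
    /// Hypothetical hook: run `function` from the WASM `module` over `input`.
    WasmUdf {
        module: Vec<u8>,
        function: String,
        schema: Schema,
        input: Box<LogicalPlan>,
    },
}

impl LogicalPlan {
    /// Output schema of a node, mirroring the match arms quoted above.
    pub fn schema(&self) -> &Schema {
        match self {
            LogicalPlan::Projection { schema, .. } => schema,
            LogicalPlan::Selection { input, .. } => input.schema(),
            LogicalPlan::CreateExternalTable { schema, .. } => schema,
            // The custom node declares its own output schema up front.
            LogicalPlan::WasmUdf { schema, .. } => schema,
        }
    }
}
```

With a node like this, the API surface is coupled to the plan only through the declared input and output schemas, not through the internals of the module.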

@andygrove
Collaborator Author

Hi @blittable and thanks for the info. I need to learn more about WASM but I'm hoping we wouldn't need external processes.

The logical plan is used directly for execution in the current PoC but we're going to be moving to a physical plan and much more extensible execution plans - you can take a look at https://github.com/andygrove/ballista/pull/59 to see the direction this is going in.

@andygrove
Collaborator Author

I'm also wondering if Ballista could just generate WASM for the full execution plan, kinda like how Apache Spark uses whole stage code generation now (generating JVM bytecode in that case).
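As a toy illustration of the idea (not how Spark or Ballista actually does it), the sketch below walks a tiny expression tree and emits WebAssembly text format (WAT) for it. The `Expr` type and `compile_to_wat` function are hypothetical, and the output is only text: it would still need to be assembled to a binary and loaded by a runtime.

```rust
/// Hypothetical expression tree over i64 inputs.
pub enum Expr {
    /// Index of an input column, passed to the function as a parameter.
    Column(u32),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Emit stack-machine instructions in post-order: operands first, then the operator.
fn emit(expr: &Expr, out: &mut Vec<String>) {
    match expr {
        Expr::Column(i) => out.push(format!("local.get {}", i)),
        Expr::Literal(v) => out.push(format!("i64.const {}", v)),
        Expr::Add(l, r) => {
            emit(l, out);
            emit(r, out);
            out.push("i64.add".to_string());
        }
        Expr::Mul(l, r) => {
            emit(l, out);
            emit(r, out);
            out.push("i64.mul".to_string());
        }
    }
}

/// Wrap the instructions in a module exporting one function, "eval",
/// that takes `num_columns` i64 parameters and returns an i64.
pub fn compile_to_wat(expr: &Expr, num_columns: usize) -> String {
    let params = vec!["i64"; num_columns].join(" ");
    let mut body = Vec::new();
    emit(expr, &mut body);
    format!(
        "(module (func (export \"eval\") (param {}) (result i64) {}))",
        params,
        body.join(" ")
    )
}
```

For `a + b * 2` with two input columns, this produces a single exported function whose body is `local.get 0 local.get 1 i64.const 2 i64.mul i64.add`. A whole-stage version would fuse several operators into one such function instead of compiling a single expression.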

@blittable

Maybe 'engine' was overstating it. The Rust runtime doesn't run WASM directly, so a library would be required. There are independent runtimes that could manage the execution.

The payload would be bytes in the protobuf message?
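If the module did travel as raw bytes, the receiving node could at least sanity-check the header before handing it to a runtime. A minimal sketch, assuming a hypothetical `WasmPayload` struct as a plain-Rust stand-in for the protobuf message (the field names are made up). The one hard fact it relies on is that a WebAssembly binary begins with the magic bytes `\0asm` followed by a 4-byte version, currently 1.

```rust
/// Magic bytes at the start of every WebAssembly binary: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];
/// Binary format version 1, encoded as a little-endian u32.
const WASM_VERSION: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Hypothetical stand-in for the protobuf message carrying custom code.
pub struct WasmPayload {
    /// Name of the exported function to call (assumption).
    pub function: String,
    /// Raw bytes of the compiled module.
    pub module: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum PayloadError {
    TooShort,
    BadMagic,
    UnsupportedVersion,
}

/// Cheap header check only; it does not validate the rest of the module.
pub fn check_header(payload: &WasmPayload) -> Result<(), PayloadError> {
    let bytes = &payload.module;
    if bytes.len() < 8 {
        return Err(PayloadError::TooShort);
    }
    if bytes[..4] != WASM_MAGIC[..] {
        return Err(PayloadError::BadMagic);
    }
    if bytes[4..8] != WASM_VERSION[..] {
        return Err(PayloadError::UnsupportedVersion);
    }
    Ok(())
}
```

Full validation and sandboxing would still be the job of whichever runtime library ends up executing the module.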

@andygrove
Collaborator Author

Related article: https://hacks.mozilla.org/2019/08/webassembly-interface-types/

@sd2k
Contributor

sd2k commented Aug 29, 2019

@andygrove andygrove added this to the 2.0 milestone Mar 18, 2020
@andygrove andygrove removed this from the 2.0.0 - a.k. milestone Feb 6, 2021
@andygrove andygrove added this to the 1.0.0 milestone Feb 21, 2021

5 participants