-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test crate to compile DataFusion with wasm-pack #7633
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonmmease -- I tried out the instructions and they worked great!
The CI run also looks good: https://github.com/apache/arrow-datafusion/actions/runs/6284314634/job/17065561427?pr=7633
Though it does take a non trivial amount of time 🤔
[INFO]: Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: License key is set in Cargo.toml but no LICENSE file(s) were found; Please add the LICENSE file(s) to your project directory
[INFO]: :-) Done in 11m 48s
[INFO]: :-) Your wasm pkg is ready to publish at /__w/arrow-datafusion/arrow-datafusion/datafusion/wasmtest/pkg.
Is there anything I should do to make sure wasmtest doesn't get published during releases?
This is probably best answered by @andygrove -- the instructions for doing so are here https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#publish-on-cratesio and since involve publishing each crate separately I think things will be ok.
I wonder if we could move datafusion/wasmtest
to datafusion-examples/wasmtest
? That would make this both easier to discover for others who wanted to use DataFusion in WASM as well as make it clearer this shouldn't be distributed with DataFusion
We could move this as a follow on PR
- `datafusion-physical-expr` | ||
- `datafusion-sql` | ||
|
||
The difficulty with getting the remaining DataFusion crates compiled to WASM is that they have non-optional dependencies on the [`parquet`](https://docs.rs/crate/parquet/) crate with its default features enabled. Several of the default parquet crate features require native dependencies that are not compatible with WASM, in particular the `lz4` and `zstd` features. If we can arrange our feature flags to make it possible to depend on parquet with these features disabled, then it should be possible to compile the core `datafusion` crate to WASM as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tustvold do you have any thoughts about finagling the parquet crate's dependencies so it can compile, by default, on wasm? Should we perhaps change datafusion to disable the parquet
default features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC it is the compression codecs that have issues with WASM, disabling these by default I think would be surprising for users. Further I'm not sure how useful parquet support would be given that only InMemory object_store is supported on WASM, although I may have some time to look into this over the next couple of days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think we'd want DataFusion's default build to disable the default parquet features, but if we could arrange things so that depending on the datafusion
core crate with default-features=false
would either remove the parquet dependency all together, or disable the default parquet features, then I think we could get things at least compiling for wasm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if running DataFusion in the browser via WASM would be a good feature to add to the documentation Perhaps https://arrow.apache.org/datafusion/user-guide/introduction.html#use-cases or a section in Also, Maybe it would be worth adding a ticket to track getting all of datafusion running in the browser 🤔 |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Thanks for the review and for trying it out @alamb! I think moving the wasmtest crate to an example makes sense. I'll take a look at doing that later today.
One thing I can do is change
I tried disabling the default parquet features locally, and I did get things compiling. But unfortunately I wasn't able to successfully construct a It would be great to get to the point of being able to use all of DataFusion in the browser, similarly to DuckDB-Wasm. But my sense is that there's still a fair bit of work required. |
I don't see a clear way to do this after all. The I added |
2 min looks good 😍 I plan to merge this later this afternoon after I file a follow on ticket about proper WASM support to capture some information from this PR. Thanks @jonmmease and @tustvold |
- `datafusion-physical-expr` | ||
- `datafusion-sql` | ||
|
||
The difficulty with getting the remaining DataFusion crates compiled to WASM is that they have non-optional dependencies on the [`parquet`](https://docs.rs/crate/parquet/) crate with its default features enabled. Several of the default parquet crate features require native dependencies that are not compatible with WASM, in particular the `lz4` and `zstd` features. If we can arrange our feature flags to make it possible to depend on parquet with these features disabled, then it should be possible to compile the core `datafusion` crate to WASM as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @jonmmease |
* Add test crate to compile DataFusion with wasm-pack * Add datafusion-wasm-app to exercise wasm build * prettier * Add license headers * tomlfmt * tomlfmt * Update datafusion/wasmtest/README.md Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * --dev for faster build --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #177
As a follow on to #7625 and inspired by @alamb comment in #7624 (comment), this PR adds a
wasmtest
crate that depends on the DataFusion crates that are current WASM-compatible and useswasm-pack
to compile it. This wasm-pack build is added as a GitHub action to make sure that we don't inadvertently break the WASM compatibility of crates that are currently compatible.I also added a
datafusion/wasmtest/datafusion-wasm-app
directory containing a small standalone app that loads and runs thewasmtest
crate. This isn't used in CI, but makes it easy to play with WASM support during development.I also added some notes to the README in
wasmtest
regarding the difficulties I've run into when trying to get the remaining DataFusion crates compiling to WASM. In particular, the native dependencies of the parquet crate seem to be the issue.I put the
wasmtest
crate indatafusion/wasmtest
since I saw thesqllogictest
crate was in there and this seemed like a related idea, but happy to move it somewhere else.Is there anything I should do to make sure
wasmtest
doesn't get published during releases?