dask in pyscript #6257

Open
martindurant opened this issue May 2, 2022 · 5 comments
Comments

@martindurant
Member

For those who didn't hear the news, pyscript is a new project from Anaconda to run a complete CPython runtime in the browser via wasm. I will post the PyCon keynote when it becomes available. There are already impressive demos interacting with existing pydata tools (numpy, pandas) as well as hooks into JS display tools (d3...) without any server at all.

One thing pyscript does not have is a good data story, because the browser environment doesn't support (TCP) "sockets", only HTTP(S) as provided by the browser runtime and limited by CORS. That means that fsspec has a really hard time doing anything useful.

On the other hand, dask is able to talk to a distributed cluster over websockets - this already works today. My question is, is there any interest in pushing dask forward as an in-browser, async-mode client? Interestingly, this would enable full fsspec operation by doing all operations via a worker. From the point of view of Coiled (or others that run a remote scheduler), it would get around the tricky/jhub issue of where to run the python kernel.

Progress would require a complex build chain to wasm-ify dask's dependencies, but I think that so long as we steer away from explicit socket-level stuff by relying on websockets, it should be doable. (Note that the current Python websocket stack relies on Python's builtin httplib/sockets, so it would need rewriting to use the browser's internal JS-facing interface.)
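For reference, the websocket connection mentioned above can be sketched from an ordinary (non-browser) Python process roughly as follows. This is a minimal sketch, assuming a scheduler that was started listening on the `ws` protocol; `ws_address` and `submit_remote` are hypothetical helper names, and the host is a placeholder.

```python
def ws_address(host, port=8786, secure=False):
    """Build a scheduler address for the websocket comm (hypothetical helper)."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}"


async def submit_remote(address):
    """Run a trivial task on a remote cluster using the async-mode client."""
    # Imported lazily so this module still loads where distributed is absent.
    from distributed import Client

    async with Client(address, asynchronous=True) as client:
        future = client.submit(sum, [1, 2, 3])
        return await future
```

Something like `asyncio.run(submit_remote(ws_address("scheduler.example.com")))` would drive it from a script. In the browser the same client code would still need the comm layer rewritten on top of the JS websocket, as noted above.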

@mrocklin
Member

mrocklin commented May 2, 2022 via email

@martindurant
Member Author

> we'll need to find some real and installable websockets library that plays well with wasm

I believe it is as simple as wrapping the JS native one in python, ideally with the same interface as the existing websocket used by the ws:// comm. The pyscript examples already show (async) http fetching using this method in lieu of python builtins.
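A rough illustration of what such a wrapper could look like, as a sketch only: `BrowserWebSocket` is a hypothetical name, and the real interface expected by the `ws://` comm is not reproduced here. The wrapper assumes the underlying object exposes `send`, `close` and an assignable `onmessage` callback, as the browser `WebSocket` does; `EchoSocket` is a stand-in so the example runs outside a browser.

```python
import asyncio
from types import SimpleNamespace


class BrowserWebSocket:
    """Hypothetical async facade over a browser-style WebSocket object."""

    def __init__(self, raw):
        # `raw` only needs `send`, `close` and an assignable `onmessage`.
        # In Pyodide it might come from `js.WebSocket.new(url)` (assumption,
        # not exercised here).
        self._raw = raw
        self._incoming = asyncio.Queue()
        raw.onmessage = self._on_message

    def _on_message(self, event):
        # Browser message events carry the payload on `.data`.
        self._incoming.put_nowait(event.data)

    async def recv(self):
        # Wait until the callback has queued a message.
        return await self._incoming.get()

    def send(self, payload):
        self._raw.send(payload)

    def close(self):
        self._raw.close()


class EchoSocket:
    """Stand-in for the JS object: echoes every sent payload back."""

    onmessage = None
    closed = False

    def send(self, payload):
        self.onmessage(SimpleNamespace(data=payload))

    def close(self):
        self.closed = True


async def demo():
    ws = BrowserWebSocket(EchoSocket())
    ws.send("ping")
    reply = await ws.recv()
    ws.close()
    return reply
```

The callback-to-queue bridge is the main design choice: the browser API is event-driven, while the Python side wants an awaitable `recv()`. A real implementation would also have to handle open/close/error events and convert binary payloads.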

@martindurant
Member Author

I should have answered this bit too:

> but don't know yet how useful it will be

From Coiled's and dask's POV in general, it removes the need for the python kernel as a separate piece of the infrastructure. So you can run those nice panel rendering apps or other front-end stuff with just a remote dask cluster, no jupyter/lab/hub, etc. Of course, those technologies come with nice benefits too (persistent file storage!), but this model is much simpler. Now anyone with credentials can kick off compute without any local python installation at all, and also no always-on hub thing; also, the browser is really good at storing local state and auth information.

@martindurant
Member Author

REF pyodide/pyodide#574

@jakirkham
Member

Also related ( dask/dask#7764 )

jsignell pushed a commit to dask/dask that referenced this issue Jun 20, 2022
This is a small step towards #7764 and dask/distributed#6257. It's basically just defensively importing `threading` and `multiprocessing` and defaulting to the synchronous scheduler if those fail. So this is currently mostly useful for demos and training around the dask collections API. But it *does* work.

This is distinct from actually getting a `distributed.Client` working and talking to a remote cluster, which will require some actual networking work.
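The defensive-import pattern described in that commit message can be sketched as below. This is an illustration under assumptions, not the code from the commit: `choose_scheduler` is a hypothetical helper, and only the scheduler names `"threads"` and `"synchronous"` are taken from dask.

```python
import importlib


def choose_scheduler(required=("threading", "multiprocessing")):
    """Pick a dask scheduler name based on which modules can be imported."""
    for name in required:
        try:
            importlib.import_module(name)
        except ImportError:
            # Restricted runtimes (e.g. wasm builds) may lack these modules,
            # so fall back to running everything in the calling thread.
            return "synchronous"
    return "threads"
```

The result could then be passed along as, for example, `x.compute(scheduler=choose_scheduler())`.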