@mrocklin, I just came across #84, having tried dask on Kubernetes this Wednesday with a colleague, and found one thing that I'm not sure whether to consider a limitation or a feature (in general).
I noticed that the containers that Kubernetes is managing have to match the environment of the Jupyter notebook; otherwise, code from the notebook will not execute on the remote workers. This poses a problem for the use case of "JLab on laptop, execute on remote cluster", I think.
I see the problem showing up in two places.
In the first case, problems arise because I actively develop custom source code in a custom Python package per project. This I consider to be good practice, because it means I can reuse my code across notebooks and set up proper unit tests for things that need to be unit tested. However, if I develop on my laptop and don't work in a container, then changes that are local are not reflected in the remote workers. The alternative, then, is to work inside a container, but that means rebuilding the container and shipping it to the remote workers each time I make a change to the code base and want to test-drive it on remote workers.
In the second case, if I decide on the fly that I need a new package and install it in my conda environment, code using the new package will not execute on the remote workers unless I rebuild and re-ship the container image to the workers.
I'm wondering if it might be possible to just package up every function that is used (and their dependencies) and submit them to the dask cluster, rather than requiring the worker containers to be identical to the Jupyter server's compute environment? Or am I missing something from my mental model in thinking this?
The environments don't need to be identical, but any function you serialize on your client's side will need to deserialize on the worker. So if you've installed scikit-learn version X on your client and call a function defined in that library, then scikit-learn version X will also have to be on the worker when we deserialize that function. Moving around libraries like that is out of scope for Dask.
But, for example, Jupyter itself doesn't need to be on the workers, unless you plan to send along a Jupyter function.
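The constraint described above follows from how Python pickles functions: library functions are serialized *by reference* (module path plus name), not by value, so the receiving side must be able to import the same module. A minimal stdlib-only sketch of this (using `json.dumps` as a stand-in for a scikit-learn function):

```python
import pickle

# Pickle stores library functions by reference (module + qualified name),
# not by copying their code. Here json.dumps stands in for any
# third-party function you might submit to a dask worker.
from json import dumps

payload = pickle.dumps(dumps)

# The payload contains the module and attribute names, not the bytecode:
assert b"json" in payload and b"dumps" in payload

# Deserializing re-imports json.dumps on the receiving side. On a worker
# without that module (or with an incompatible version), this step would
# raise ModuleNotFoundError or AttributeError instead.
restored = pickle.loads(payload)
assert restored is dumps
```

This is why the worker needs the same libraries (at compatible versions) as the client, but nothing more: anything you never serialize, such as Jupyter itself, never has to round-trip through this import.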