Can we submit functions without requiring containers to have exact same environment? #159

Closed
ericmjl opened this issue Jun 28, 2019 · 2 comments

Comments

@ericmjl

ericmjl commented Jun 28, 2019

@mrocklin, I just came across #84 after trying Dask on Kubernetes with a colleague this Wednesday, and I ran into one thing that I'm not sure whether to consider a limitation or a feature.

I noticed that the containers Kubernetes manages have to match the environment of the Jupyter notebook; otherwise, code from the notebook will not execute on the remote workers. I think this poses a problem for the "JLab on laptop, execute on remote cluster" use case.

I see the problem showing up in two places.

In the first case, problems arise because I actively develop custom source code in a per-project Python package. I consider this good practice, because it lets me reuse code across notebooks and write proper unit tests for the things that need them. However, if I develop on my laptop and don't work in a container, local changes are not reflected on the remote workers. The alternative is to work inside a container, but that means rebuilding the container and shipping it to the remote workers every time I change the code base and want to test-drive it on them.

In the second case, if I decide on the fly that I need a new package and install it into my conda environment, code that uses the new package will not execute on the remote workers until I rebuild and re-ship the container image to them.

I'm wondering if it might be possible to package up every function that is used (and its dependencies) and submit them to the Dask cluster, rather than requiring the worker containers to be identical to the Jupyter server's compute environment. Or am I missing something in my mental model here?
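To make the failure mode concrete, here is a minimal sketch of the setup I mean; the package name `my_project`, the `featurize` function, and the `worker-spec.yaml` file are hypothetical stand-ins, not code from this issue:

```python
# Hypothetical sketch of the failing setup: a function imported from a local,
# actively developed package is submitted to a dask-kubernetes cluster whose
# worker image was built before the latest local edits (or lacks the package).
from dask.distributed import Client
from dask_kubernetes import KubeCluster

from my_project.features import featurize  # local package, edited on the laptop

cluster = KubeCluster.from_yaml("worker-spec.yaml")  # workers run a fixed image
client = Client(cluster)

# The task is pickled by reference to my_project.features.featurize, so the
# worker has to `import my_project` when it deserializes the task. If the
# container image doesn't have my_project (or has a stale copy), this fails
# with ModuleNotFoundError or runs the old code.
future = client.submit(featurize, "data.csv")
print(future.result())
```

(As far as I understand, functions defined interactively in the notebook are shipped by value, but anything imported from a package is looked up by module name on the worker, which is why the image has to carry the same code.)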

@mrocklin
Member

The environments don't need to be identical, but any function you serialize on the client side will need to deserialize on the worker. So if you've installed scikit-learn version X on your client and call a function defined in that library, then scikit-learn version X will also have to be on the worker when we deserialize that function. Moving libraries around like that is out of scope for Dask.

But, for example, Jupyter itself doesn't need to be on the workers, unless you plan to send along a Jupyter function.
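Roughly, something like this (the scheduler address and the example function below are placeholders for illustration, not code from this issue):

```python
# Illustrative sketch: the task below is serialized on the client and
# deserialized on the worker, so scikit-learn must be installed (at a
# compatible version) on both sides, while Jupyter is never referenced
# and so isn't needed on the workers.
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder scheduler address

def fit_and_score(X, y):
    # Imported inside the task, so the worker needs scikit-learn to run this.
    from sklearn.linear_model import LogisticRegression
    return LogisticRegression().fit(X, y).score(X, y)

future = client.submit(fit_and_score, [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(future.result())
```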

@ericmjl
Author

ericmjl commented Jun 28, 2019

Ok, thanks for clarifying, @mrocklin! Going to close this issue.

ericmjl closed this as completed Jun 28, 2019