Support a Revise workflow on workers #84

Open
kolia opened this issue Aug 6, 2021 · 2 comments
Comments

@kolia
Contributor

kolia commented Aug 6, 2021

devspace sync is an easy local install that establishes a 2-way sync between local folders and folders in containers running in k8s. julia_pod uses this to sync the current Julia project folder with its equivalent in the running container, making it possible to update code locally and use Revise against a Julia REPL running in k8s.

This convenient development workflow breaks down when using K8sClusterManagers to spin up workers, because workers see separate file systems.

Because of this, it is currently recommended to always test Distributed code using local Distributed procs first before running on k8s.

If devspace sync is installed, we could set up one or more syncs between relevant local and worker folders for each worker as it is spun up, maybe defaulting to the current Julia project folder. This would make it possible to use Revise and develop directly against code that uses workers.
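As a rough illustration of the "one or more syncs per worker, defaulting to the project folder" idea, here is a minimal Python sketch that only plans which folder pairs would be synced to which worker pod. The names `SyncSpec` and `plan_syncs`, and the example paths, are hypothetical and not part of julia_pod, K8sClusterManagers, or devspace; actually starting the syncs is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SyncSpec:
    """One planned 2-way sync between a local folder and a folder in a pod (hypothetical type)."""
    pod: str
    local_path: str
    container_path: str


def plan_syncs(
    pods: Iterable[str],
    project_dir: str,
    container_project_dir: str,
    extra: Iterable[Tuple[str, str]] = (),
) -> List[SyncSpec]:
    """Return one SyncSpec per (pod, folder pair).

    The project folder pair is always included first (the proposed default);
    `extra` holds any additional (local, container) folder pairs.
    """
    pairs = [(project_dir, container_project_dir), *extra]
    return [SyncSpec(pod, local, remote) for pod in pods for local, remote in pairs]
```

Each resulting `SyncSpec` would then be handed to whatever actually launches a sync for that pod.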

@omus
Member

omus commented Aug 6, 2021

Overall I agree with the premise that it would be good to be able to support a Revise-based workflow without having to fall back on local Distributed processes.

However, I think there are probably better options than chaining devspace sync processes on your local system and on the manager pod. One such option is keeping the devspace sync processes on the local system and syncing multiple pods at once.

The tricky part here is that not all pods will be available at the time of the initial sync. We could probably just have a background process which monitors for new pods based upon a selector. When new pods are found it can start syncing to them as well.
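A minimal Python sketch of such a background monitor, under stated assumptions: pods are listed with `kubectl get pods -l <selector> -o json`, and the functions that start and stop a sync are passed in by the caller, so no particular devspace command line is assumed. `list_running_pods`, `SyncReconciler`, and `watch` are hypothetical names for illustration only.

```python
import json
import subprocess
import time
from typing import Callable, Dict, Iterable, List, Set


def list_running_pods(selector: str, namespace: str = "default") -> Set[str]:
    """Return the names of Running pods that match a label selector, using kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    items = json.loads(out)["items"]
    return {p["metadata"]["name"] for p in items if p["status"].get("phase") == "Running"}


class SyncReconciler:
    """Keeps exactly one sync per currently running pod."""

    def __init__(self, start_sync: Callable[[str], object], stop_sync: Callable[[object], None]):
        self._start = start_sync   # caller-supplied: starts a sync for a pod, returns a handle
        self._stop = stop_sync     # caller-supplied: stops a sync given its handle
        self.active: Dict[str, object] = {}

    def reconcile(self, pods: Iterable[str]) -> List[str]:
        """Stop syncs for vanished pods, start syncs for new ones; return the newly synced pods."""
        current = set(pods)
        for gone in sorted(set(self.active) - current):
            self._stop(self.active.pop(gone))
        started = sorted(current - set(self.active))
        for pod in started:
            self.active[pod] = self._start(pod)
        return started


def watch(selector: str, reconciler: SyncReconciler, interval: float = 5.0) -> None:
    """Poll for pods forever; the polling interval is an arbitrary choice."""
    while True:
        reconciler.reconcile(list_running_pods(selector))
        time.sleep(interval)
```

Polling keeps the sketch simple; a real tool might prefer the Kubernetes watch API so that new workers are picked up without delay.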

Some other alternatives which may be worth investigating:

  • Use NFS to share storage among the manager and worker pods
  • Make Revise.jl distributed-aware. It should be possible to have Revise issue the replacement calls on all processes. This option would work on any Julia cluster.

@omus changed the title from "devspace sync" to "Support a Revise workflow on workers" on Aug 6, 2021
@kolia
Contributor Author

kolia commented Aug 6, 2021

> One such option is keeping the devspace sync processes on the local system and syncing multiple pods at once.

That's what I meant; as far as I can tell, devspace is only designed to be run from your local system. Users would probably only turn this syncing on while developing with a small number of workers, and then switch it off when running things on many workers.

> The tricky part here is that not all pods will be available at the time of the initial sync. We could probably just have a background process which monitors for new pods based upon a selector. When new pods are found it can start syncing to them as well.

If we do it this way, it could be a little standalone, non-Julia-specific tool, and the sync part could be removed from julia_pod, since users could use this tool for syncing instead.

> Some other alternatives which may be worth investigating:
>
>   • Use NFS to share storage among the manager and worker pods

Shared volumes seem to be tied to cloud providers; I haven't seen a cloud-provider-agnostic way to do this.

>   • Make Revise.jl distributed-aware. It should be possible to have Revise issue the replacement calls on all processes. This option would work on any Julia cluster.

That would be nice, but I wouldn't know where to begin.
