Some progress with future.clustermq #3

Open
michaelmayer2 opened this issue Feb 11, 2022 · 4 comments

@michaelmayer2

michaelmayer2 commented Feb 11, 2022

I have been experimenting with future.clustermq lately. I can now use it to launch more than one worker on Slurm, which is great, but I am stuck at https://github.com/michaelmayer2/future.clustermq/blob/master/R/ClusterMQFuture-class.R#L233

workers$receive_data() reports token: "not set", after which the code calls $workers$send_common_data(). This eventually leads to success=NULL, and the code stops.

I would be grateful for any pointers on how to transfer the token to the workers. That would at least get me to a state where the workers are up and running and ready to take work.

Thanks in advance,

Michael.

@michaelmayer2

In a debugging session I can see the Slurm jobs running:

me@future.clustermq$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            5512_1       all  cmq6336       mm  R       0:04      1 all-st-rstudio-1
            5512_2       all  cmq6336       mm  R       0:04      1 all-st-rstudio-1

The workers also report everything correctly except the token:

Browse[1]> workers$receive_data()
$id
[1] "WORKER_READY"

$auth
[1] "tlfzv"

$pkgver
[1] ‘0.8.95.3’

$token
[1] "not set"

@mschubert

mschubert commented Feb 12, 2022

A token of "not set" simply means that no common data has been transferred to the worker yet.

When using the worker API, your response to id="WORKER_READY", token="not set" should be w$send_common_data(), where w is your worker object. The common data needs to be set first with w$set_common_data().

Does that help?

(Note that this will likely change with clustermq 0.9; the worker API is needlessly complicated.)
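To make the handshake described above concrete, here is a minimal, language-neutral model of the master-side loop, written in Python rather than R. It is a sketch under stated assumptions: the message fields id and token mirror the receive_data() output shown earlier in this thread, but the FakeWorkers class, its method signatures, and the dispatch() helper are hypothetical stand-ins, not clustermq's API. It only illustrates the ordering: set the common data first, answer each token="not set" worker by sending it the common data, and send work only to workers that already hold the token.

```python
from collections import deque


class FakeWorkers:
    """Hypothetical stand-in for a worker pool (not clustermq's R API)."""

    def __init__(self, n_workers):
        self.common = None  # (token, data) once set_common_data() has been called
        self.sent = []      # everything the master sent, kept for inspection
        # Freshly started workers hold no common data, so their token is "not set".
        self.inbox = deque(
            {"id": "WORKER_READY", "token": "not set"} for _ in range(n_workers)
        )

    def set_common_data(self, token, data):
        self.common = (token, data)

    def send_common_data(self):
        if self.common is None:
            raise RuntimeError("set_common_data() must be called first")
        self.sent.append("common_data")
        # The worker stores the data and reports ready again, now with the token.
        self.inbox.append({"id": "WORKER_READY", "token": self.common[0]})

    def send_call(self, task):
        self.sent.append(("call", task))
        # The worker finishes the task and reports ready for the next one.
        self.inbox.append({"id": "WORKER_READY", "token": self.common[0]})

    def receive_data(self):
        return self.inbox.popleft()


def dispatch(workers, tasks):
    """Answer WORKER_READY messages until every task has been handed out."""
    pending = deque(tasks)
    while pending:
        msg = workers.receive_data()
        if msg["id"] != "WORKER_READY":
            continue  # other message types are out of scope for this sketch
        if msg["token"] == "not set":
            workers.send_common_data()  # worker lacks common data: send it first
        else:
            workers.send_call(pending.popleft())  # worker is ready for real work
```

For example, with two workers and three tasks, the loop sends the common data twice (once per worker) before any of the three calls. In this model, calling send_common_data() before set_common_data() raises an error, reflecting the requirement that the common data be set first.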

@michaelmayer2
Copy link
Author

Thanks for that input, which was very helpful.

Not so much for the w$send_common_data() part (which is already in the future.clustermq code), but for the hint about API changes. It seems that the most recent version of clustermq is not compatible with the most recent version of future.clustermq. If I downgrade clustermq to 0.8.8 and add pkgs=character(0) to the data list() at line 222 of R/ClusterMQFuture-class.R, the futures start as expected without any further changes.

Scalability is still an issue, but the same is true of PSOCK-based cluster futures. I hope there will eventually be a way to get nested parallelization, where each clustermq future runs an array task that internally uses ZeroMQ for efficient communication, so that the per-future overhead becomes less significant. I have not yet figured out whether this is possible right now.

I also have not yet figured out whether there is a way to specify resources such as n_jobs, cores and memory when using templates like the one at https://github.com/mschubert/clustermq/blob/master/inst/SLURM.tmpl
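The templates mentioned above use {{ key | default }} placeholders; for example, the clustermq Slurm template sets the memory per CPU with a placeholder along the lines of {{ memory | 4096 }}. As far as I know, clustermq fills these from values passed through a template list (e.g. template=list(memory=..., cores=...) in Q() or workers()), but check the documentation of the installed version. The Python sketch below re-implements only the placeholder substitution, to show how supplied values override defaults and why a placeholder without a default must be supplied. The three template lines are an illustrative excerpt, not the verbatim file, and fill_template() is a hypothetical helper, not clustermq's own function.

```python
import re

# Matches "{{ key }}" or "{{ key | default }}", allowing spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*(?:\|\s*([^}]*?)\s*)?\}\}")

# Illustrative excerpt modeled on a Slurm template; not the verbatim file.
SLURM_TEMPLATE = """#SBATCH --array=1-{{ n_jobs }}
#SBATCH --mem-per-cpu={{ memory | 4096 }}
#SBATCH --cpus-per-task={{ cores | 1 }}"""


def fill_template(template, values):
    """Replace each placeholder with a supplied value, else its default."""

    def substitute(match):
        key, default = match.group(1), match.group(2)
        if key in values:
            return str(values[key])
        if default is not None:
            return default
        raise KeyError(f"no value supplied for required placeholder {key!r}")

    return PLACEHOLDER.sub(substitute, template)


print(fill_template(SLURM_TEMPLATE, {"n_jobs": 2, "cores": 4}))
```

Here memory falls back to its default of 4096 while cores is overridden to 4; omitting n_jobs, which has no default, raises a KeyError.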

@kkmann

kkmann commented Dec 14, 2022

Very interesting! Nested futures with a clustermq backend would be an attractive alternative to future.batchtools for scaling targets pipelines.


3 participants