Set cpus-per-tasks in sbatch script #36
Conversation
I agree with not renaming the parameter, since for people less familiar with SLURM, "cpus_per_node" is clearer than "cpus_per_task" (it's not clear whether the task applies to one node or all nodes). Could you update the documentation in slurm_apply.R before we merge the pull request? I would suggest changing the current description of the cpus_per_node parameter:

#' @param cpus_per_node The number of CPUs per node on the cluster; determines how
#' many processes are run in parallel per node.

to:

Also, this part of the details section:
I added the requested documentation. There were also some other roxygen changes that had not been applied, so I put the Rd updates in a separate commit.
Can someone please accept this pull request so this fix gets into the release version? I am getting hit by this as well.
BTW, I don't want to sound like a snob, but the rslurm "nodes" are effectively SLURM's "tasks", which are different from SLURM/cluster "nodes" (a node is a physical computer; a task is a piece of work run on that computer), so I have to admit it sounds confusing. I had to run a few rslurm examples to figure out that an rslurm "node" is a SLURM "task", not a SLURM node. For those familiar with SLURM, calling the rslurm "nodes" "tasks" would make much more sense. Then you'd ask for N "tasks", meaning N R workers would run (N SLURM job array jobs), and it would make sense to use cpus_per_task to say how many CPU cores each R worker gets. cpus_per_node refers to the physical CPU core count of that piece of hardware. MC
@mcuma The motivation for creating this package, and in particular the slurm_apply function, was to automatically split a parallel task across multiple nodes of a SLURM cluster where MPI was not supported, so R had to "manually" split up the task with slurm_apply and regroup the pieces with get_slurm_out. For example, if someone wanted to use 4 nodes with 8 CPUs each, the strategy was to split the initial task into 4 (so it could occupy 4 nodes) and parallelize maximally within each node (hence the cpus_per_node argument). I understand that rslurm has gained much wider use now, and there may be cases where someone wants to split up a task for reasons other than submitting it to different nodes that don't communicate. However, changing the argument names now raises the question of backwards compatibility, as the OP said.
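For concreteness, a minimal sketch of that "4 nodes with 8 CPUs each" workflow might look like the following. The toy function, job name, and parameter data frame are placeholders made up for illustration; the slurm_apply / get_slurm_out / cleanup_files calls follow the package interface discussed in this thread.

```r
library(rslurm)

# Toy function: each row of `params` is one independent piece of work
sq <- function(x) x^2
params <- data.frame(x = 1:1000)

# Split the work into 4 chunks (one per rslurm "node", i.e. one job array task)
# and let each chunk parallelize over up to 8 CPUs within its node
sjob <- slurm_apply(sq, params, jobname = "split_demo",
                    nodes = 4, cpus_per_node = 8)

# Regroup the pieces once the job array has finished
res <- get_slurm_out(sjob, outtype = "table")
cleanup_files(sjob)
```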
Fixes #33

I did not rename `cpus_per_node`. While there is a discrepancy with the SLURM option name, I like the easier-to-understand option pair `nodes` and `cpus_per_node`. In addition, it's backwards compatible. If you like it, I could add a comment to the documentation about `cpus_per_node` and `#SBATCH --cpus-per-task`.
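If it helps reviewers, here is a hedged sketch of how the mapping from `cpus_per_node` to `#SBATCH --cpus-per-task` could be checked from R without submitting a job. It assumes the generated scripts land in the usual `_rslurm_<jobname>` directory as `submit.sh`; the directive mentioned in the comment is an expectation based on this PR, not captured output.

```r
library(rslurm)

# Generate the job scripts without submitting them
sjob <- slurm_apply(function(x) x + 1, data.frame(x = 1:10),
                    jobname = "check_fix", nodes = 2, cpus_per_node = 4,
                    submit = FALSE)

# With this change, the generated script should contain a line such as
# "#SBATCH --cpus-per-task=4" (expected directive; adjust the path if your
# rslurm version writes the scripts elsewhere)
grep("cpus-per-task", readLines("_rslurm_check_fix/submit.sh"), value = TRUE)

cleanup_files(sjob)
```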