Set cpus-per-tasks in sbatch script #36

Merged 4 commits into earthlab:master on Nov 8, 2019
Conversation

sebschub (Contributor)

Fixes #33

I did not rename cpus_per_node. While there is a discrepancy with the SLURM option name, I prefer the easier-to-understand option pair nodes and cpus_per_node. In addition, it's backwards compatible.
If you like this approach, I could add a note to the documentation about the mapping between cpus_per_node and #SBATCH --cpus-per-task.
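
For illustration, a submission script generated with cpus_per_node = 8 might then carry a header along these lines (a rough sketch only; the array range, job name, and output pattern are placeholder values, not rslurm's actual template):

#!/bin/bash
#SBATCH --array=0-3                # one array task per rslurm "node"
#SBATCH --cpus-per-task=8          # filled in from cpus_per_node
#SBATCH --job-name=example_job
#SBATCH --output=slurm_%a.out
Rscript --vanilla slurm_run.R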

pmarchand1 (Collaborator)

I agree with not renaming the parameter, since for people less familiar with SLURM, "cpus_per_node" is clearer than "cpus_per_task" (it's not clear if the task applies to one node or all nodes).

Could you update the documentation in slurm_apply.R before we merge the pull request?

I would suggest changing the current description of the cpus_per_node parameter:

#' @param cpus_per_node The number of CPUs per node on the cluster; determines how
#'   many processes are run in parallel per node.

to:

#' @param cpus_per_node The number of CPUs requested per node, i.e. how many
#'   processes to run in parallel per node; this is mapped to the
#'   "cpus-per-task" SLURM parameter.

Also, this part of the details section:

"The "array", "job-name", "nodes" and "output" options are already determined by \code{slurm_apply} and should not be manually set."

should be modified to add "cpus-per-task" to that list, as sketched below.
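
The resulting details line would then read roughly as follows (a sketch of the suggested change, not the committed wording):

#' The "array", "cpus-per-task", "job-name", "nodes" and "output" options
#' are already determined by \code{slurm_apply} and should not be manually
#' set.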

sebschub (Contributor, PR author)

I added the requested documentation. There were also some other roxygen changes that had not yet been applied, so I put the .Rd updates in a separate commit.

mcuma commented Oct 1, 2019

Can someone please accept this pull request so this fix gets into the release version? I am getting hit by this as well.
Thanks.

mcuma commented Oct 1, 2019

BTW, I don't want to sound like a snob, but the rslurm "nodes" are effectively SLURM "tasks", which are different from SLURM/cluster "nodes" (a node is a physical computer; a task is a piece of work run on that computer), so I have to admit the naming is confusing. I had to run a few rslurm examples to figure out that an rslurm "node" is a SLURM "task", not a SLURM node.

For those familiar with SLURM, calling the rslurm "nodes" "tasks" would make much more sense. Then you'd ask for N "tasks", meaning N R workers (N SLURM job-array jobs), and it would then make sense to use cpus_per_task to say how many CPU cores each R worker gets; cpus_per_node would refer to the physical CPU core count of that piece of hardware. A hypothetical call under this naming is sketched below.
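
Under that proposed naming, a call might look like this (a hypothetical signature shown purely for illustration; the actual rslurm arguments are nodes and cpus_per_node):

sjob <- slurm_apply(f, params, jobname = "example",
                    tasks = 4,          # hypothetical: N job-array tasks, i.e. N R workers
                    cpus_per_task = 2)  # hypothetical: CPU cores per R worker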

MC

pmarchand1 (Collaborator) commented Oct 2, 2019

@mcuma The motivation for creating this package, and in particular the slurm_apply function, was to automatically split a parallel task across multiple nodes of a SLURM cluster where MPI was not supported: R had to "manually" split up the task with slurm_apply and regroup the pieces with get_slurm_out. For example, if someone wanted to use 4 nodes with 8 CPUs each, the strategy was to split the initial task into 4 pieces (so it could occupy 4 nodes) and parallelize maximally within each node (hence the cpus_per_node argument), as in the sketch below.
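
With rslurm's actual arguments, that 4-node, 8-CPU strategy reads roughly as follows (a minimal sketch: f and params are placeholder inputs, and outtype = "raw" is just one of the collection options):

library(rslurm)

# Placeholder long-running function and parameter grid
f <- function(x) mean(rnorm(1e6, mean = x))
params <- data.frame(x = 1:100)

# Split the work into 4 chunks (one job-array task per "node"),
# each chunk running up to 8 processes in parallel
sjob <- slurm_apply(f, params, jobname = "example",
                    nodes = 4, cpus_per_node = 8)

# Once the array jobs finish, collect and recombine the pieces
res <- get_slurm_out(sjob, outtype = "raw")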

I understand that rslurm has gained much wider use now, and there may be cases where someone wants to split up a task for reasons other than submitting it to different nodes that don't communicate. However, changing the argument names now raises the question of backwards compatibility, as the OP said.

qdread merged commit 68a3a92 into earthlab:master on Nov 8, 2019
Merging this pull request may close the linked issue: No specification on cpus-per-task in the submit.sh