Fix issue where job steps wouldn't run if the first node was full
In a multi-node job it was possible to reach a state where CPUs were still available for steps to use, but new steps would not launch. For example, if each node has 2 cores and 1 thread per core and this job is submitted:

    sbatch -N2 --ntasks-per-node=2 --mem=1000 job.bash

And job.bash contains the following:

    for i in {1..4}
    do
        srun --exact --mem=100 -N1 -c1 -n1 sleep 60 &
    done
    wait

In this case, two steps would run on the first node and one step on the second node, but the fourth step would not run until the first step completed, even though an available task and CPU remained on the second node in the allocation.

Why does this happen? If the step requests a number of CPUs <= the number of nodes, then when _pick_step_nodes() calls _pick_step_nodes_cpus():

    node_tmp = _pick_step_nodes_cpus(job_ptr, nodes_avail,
                                     nodes_needed, cpus_needed,
                                     usable_cpu_cnt);

it simply returns the first N nodes from the nodes_avail bitmap, where N is the number of nodes that the step requested. In this example job, all the CPUs on the first node are allocated, but the first node remains in the nodes_avail bitmap, so _pick_step_nodes_cpus() selects it and adds it to the nodes_picked bitmap. Right after that, _pick_step_nodes() counts the CPUs available on the nodes in the nodes_picked bitmap and gets 0.

The fix is to remove fully allocated nodes from the nodes_avail bitmap. On its own, though, this creates a new problem: once all the nodes are fully allocated and another valid step request arrives, the incorrect error ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE would be returned, when the correct error is ESLURM_NODES_BUSY. So we also increment job_blocked_nodes for each node that has no available CPUs.
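Below is a minimal, self-contained sketch of the shape of the fix, illustrative rather than the actual patch: plain bool arrays stand in for Slurm's bitmaps, pick_first_avail() is a hypothetical stand-in for the first-N-nodes selection in _pick_step_nodes_cpus(), and NODE_CNT and the usable_cpu_cnt values are made-up numbers chosen to match the example above.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative sketch only, not the actual Slurm code. */

    #define NODE_CNT 2

    /* Mimics the flawed selection: take the first node still present in
     * nodes_avail, ignoring whether it has any usable CPUs left. */
    static int pick_first_avail(const bool nodes_avail[])
    {
        for (int i = 0; i < NODE_CNT; i++)
            if (nodes_avail[i])
                return i;
        return -1;    /* no node available */
    }

    int main(void)
    {
        /* After three sleep steps start: node 0 has no free CPUs. */
        int usable_cpu_cnt[NODE_CNT] = { 0, 1 };
        bool nodes_avail[NODE_CNT] = { true, true };
        int job_blocked_nodes = 0;

        printf("before fix, step 4 lands on node %d (0 usable CPUs)\n",
               pick_first_avail(nodes_avail));

        /* The fix: drop fully allocated nodes from nodes_avail, and
         * count each as blocked so a request that still cannot run
         * reports ESLURM_NODES_BUSY rather than
         * ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE. */
        for (int i = 0; i < NODE_CNT; i++) {
            if (nodes_avail[i] && usable_cpu_cnt[i] == 0) {
                nodes_avail[i] = false;
                job_blocked_nodes++;
            }
        }

        printf("after fix, step 4 lands on node %d (job_blocked_nodes=%d)\n",
               pick_first_avail(nodes_avail), job_blocked_nodes);
        return 0;
    }

Clearing full nodes before selection keeps the cheap first-N-nodes pick correct, while the blocked-node count preserves the distinction between "busy right now" and "can never fit".

Bug 11357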