Revert "Remove conf dependence in xcpuinfo_abs_to_mac()"
This reverts commit 86495bd.

Commit 86495bd made task/cgroup get CPU topology info from hwloc
instead of the config (task/cgroup uses xcpuinfo_abs_to_mac()). This made
task/cgroup and task/affinity get out of sync when ThreadsPerCore was
configured to 1 but was actually 2 in the hardware. Since task/affinity
was still getting CPU topology information from the conf (dist_tasks.c
-> _get_avail_map()), this resulted in situations where task/affinity
was trying to bind to CPUs in the incorrect cgroup, resulting in
task_p_pre_launch errors.

Bug 9244, 10613.
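
To illustrate the mismatch described above, here is a minimal sketch in C. It is a simplified model, not Slurm code: `cores_to_cpus()` is a hypothetical helper, CPUs are assumed to be numbered `core * threads + thread`, and the block map is ignored. Expanding the same core once with the configured ThreadsPerCore (1) and once with the hardware value (2) yields disjoint CPU sets, which is the kind of disagreement that can leave task/affinity binding to a CPU outside the cgroup.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Slurm): expand a bitmask of abstract
 * core ids into a bitmask of abstract CPU ids, assuming each core owns
 * `threads` consecutive CPUs numbered core * threads + thread.
 */
uint64_t cores_to_cpus(uint64_t core_mask, int total_cores, int threads)
{
	uint64_t cpu_mask = 0;

	/* Keep every shift below 64 bits. */
	assert(total_cores > 0 && threads > 0);
	assert(total_cores * threads <= 64);

	for (int core = 0; core < total_cores; core++) {
		if (!(core_mask & (UINT64_C(1) << core)))
			continue;
		for (int t = 0; t < threads; t++)
			cpu_mask |= UINT64_C(1) << (core * threads + t);
	}
	return cpu_mask;
}
```

For core 1 on a two-core node, `cores_to_cpus(0x2, 2, 1)` (configured view) selects CPU 1, while `cores_to_cpus(0x2, 2, 2)` (hardware view) selects CPUs 2 and 3, so the two views share no CPU.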
hintron authored and wickberg committed Jan 15, 2021
1 parent c51f641 commit 25d74cc
Showing 2 changed files with 11 additions and 11 deletions.
2 changes: 2 additions & 0 deletions NEWS
@@ -36,6 +36,8 @@ documents those changes that are of interest to users and administrators.
     calling prec_extra.
  -- Cleanup all tracked jobacct tasks when extern step child process finishes.
  -- slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
+ -- Fix regression causing task/affinity and task/cgroup to be out of sync when
+    configured ThreadsPerCore is different than the physical threads per core.

 * Changes in Slurm 20.11.2
 ==========================
20 changes: 9 additions & 11 deletions src/slurmd/common/xcpuinfo.c
@@ -995,19 +995,17 @@ xcpuinfo_fini(void)
  */
 int xcpuinfo_abs_to_mac(char *lrange, char **prange)
 {
-	uint16_t total_cores, total_cpus;
+	static int total_cores = -1, total_cpus = -1;
 	bitstr_t* absmap = NULL;
 	bitstr_t* macmap = NULL;
 	int icore, ithread;
 	int absid, macid;
 	int rc = SLURM_SUCCESS;

-	/* init internal data if not already done */
-	if (xcpuinfo_init() != XCPUINFO_SUCCESS)
-		return SLURM_ERROR;
-
-	total_cores = sockets * cores;
-	total_cpus = block_map_size;
+	if (total_cores == -1) {
+		total_cores = conf->sockets * conf->cores;
+		total_cpus = conf->block_map_size;
+	}

 	/* allocate bitmap */
 	absmap = bit_alloc(total_cores);
@@ -1024,14 +1022,14 @@ int xcpuinfo_abs_to_mac(char *lrange, char **prange)
 		goto end_it;
 	}

-	/* mapping abstract id to machine id using block_map */
+	/* mapping abstract id to machine id using conf->block_map */
 	for (icore = 0; icore < total_cores; icore++) {
 		if (bit_test(absmap, icore)) {
-			for (ithread = 0; ithread < threads; ithread++) {
-				absid = (icore * threads) + ithread;
+			for (ithread = 0; ithread < conf->threads; ithread++) {
+				absid = icore * conf->threads + ithread;
 				absid %= total_cpus;

-				macid = block_map[absid];
+				macid = conf->block_map[absid];
 				macid %= total_cpus;

 				bit_set(macmap, macid);