binding to certain cores returning invalid on 5x5_amazon resolution #5

Open
glemieux opened this issue Mar 14, 2024 · 2 comments

@glemieux

This issue was discovered after CTSM updated its ccs_config_cesm version to ccs_config_cesm0.0.92 (ESCOMP/CTSM#2416). Since then, CTSM test cases using the 5x5_amazon resolution fail to run with the following error:

cesm.log

dec0417.hsn.de.hpc.ucar.edu 4: <65-65> is invalid
dec0417.hsn.de.hpc.ucar.edu 4: libnuma: Warning: cpu argument 65-65 is out of range
dec0417.hsn.de.hpc.ucar.edu 4:
dec0417.hsn.de.hpc.ucar.edu 4: usage: numactl [--all | -a] [--balancing | -b] [--interleave= | -i <nodes>]
dec0417.hsn.de.hpc.ucar.edu 4:                [--preferred= | -p <node>] [--physcpubind= | -C <cpus>]
dec0417.hsn.de.hpc.ucar.edu 4:                [--cpunodebind= | -N <nodes>] [--membind= | -m <nodes>]
dec0417.hsn.de.hpc.ucar.edu 4:                [--localalloc | -l] command args ...
dec0417.hsn.de.hpc.ucar.edu 4:        numactl [--show | -s]
dec0417.hsn.de.hpc.ucar.edu 4:        numactl [--hardware | -H]
dec0417.hsn.de.hpc.ucar.edu 4:        numactl [--length | -L <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
dec0417.hsn.de.hpc.ucar.edu 4:                [--strict | -t]
dec0417.hsn.de.hpc.ucar.edu 4:                [--shmid | -I <id>] --shm | -S <shmkeyfile>
dec0417.hsn.de.hpc.ucar.edu 4:                [--shmid | -I <id>] --file | -f <tmpfsfile>
dec0417.hsn.de.hpc.ucar.edu 4:                [--huge | -u] [--touch | -T]
dec0417.hsn.de.hpc.ucar.edu 4:                memory policy [--dump | -d] [--dump-nodes | -D]
dec0417.hsn.de.hpc.ucar.edu 4:
dec0417.hsn.de.hpc.ucar.edu 4: memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
dec0417.hsn.de.hpc.ucar.edu 4: <nodes> is a comma delimited list of node numbers or A-B ranges or all.
dec0417.hsn.de.hpc.ucar.edu 4: Instead of a number a node can also be:
dec0417.hsn.de.hpc.ucar.edu 4:   netdev:DEV the node connected to network device DEV
dec0417.hsn.de.hpc.ucar.edu 4:   file:PATH  the node the block device of path is connected to
dec0417.hsn.de.hpc.ucar.edu 4:   ip:HOST    the node of the network device host routes through
dec0417.hsn.de.hpc.ucar.edu 4:   block:PATH the node of block device path
dec0417.hsn.de.hpc.ucar.edu 4:   pci:[seg:]bus:dev[:func] The node of a PCI device
dec0417.hsn.de.hpc.ucar.edu 4: <cpus> is a comma delimited list of cpu numbers or A-B ranges or all
dec0417.hsn.de.hpc.ucar.edu 4: all ranges can be inverted with !
dec0417.hsn.de.hpc.ucar.edu 4: all numbers and ranges can be made cpuset-relative with +
dec0417.hsn.de.hpc.ucar.edu 4: the old --cpubind argument is deprecated.
dec0417.hsn.de.hpc.ucar.edu 4: use --cpunodebind or --physcpubind instead
dec0417.hsn.de.hpc.ucar.edu 4: use --balancing | -b to enable Linux kernel NUMA balancing
dec0417.hsn.de.hpc.ucar.edu 4: for the process if it is supported by kernel
dec0417.hsn.de.hpc.ucar.edu 4: <length> can have g (GB), m (MB) or k (KB) suffixes
dec0417.hsn.de.hpc.ucar.edu 3: <64-64> is invalid
dec0417.hsn.de.hpc.ucar.edu 3: libnuma: Warning: cpu argument 64-64 is out of range
dec0417.hsn.de.hpc.ucar.edu 3:
dec0417.hsn.de.hpc.ucar.edu 3: usage: numactl [--all | -a] [--balancing | -b] [--interleave= | -i <nodes>]
dec0417.hsn.de.hpc.ucar.edu 3:                [--preferred= | -p <node>] [--physcpubind= | -C <cpus>]
dec0417.hsn.de.hpc.ucar.edu 3:                [--cpunodebind= | -N <nodes>] [--membind= | -m <nodes>]
dec0417.hsn.de.hpc.ucar.edu 3:                [--localalloc | -l] command args ...
dec0417.hsn.de.hpc.ucar.edu 3:        numactl [--show | -s]
dec0417.hsn.de.hpc.ucar.edu 3:        numactl [--hardware | -H]
dec0417.hsn.de.hpc.ucar.edu 3:        numactl [--length | -L <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
dec0417.hsn.de.hpc.ucar.edu 3:                [--strict | -t]
dec0417.hsn.de.hpc.ucar.edu 3:                [--shmid | -I <id>] --shm | -S <shmkeyfile>
dec0417.hsn.de.hpc.ucar.edu 3:                [--shmid | -I <id>] --file | -f <tmpfsfile>
dec0417.hsn.de.hpc.ucar.edu 3:                [--huge | -u] [--touch | -T]
dec0417.hsn.de.hpc.ucar.edu 3:                memory policy [--dump | -d] [--dump-nodes | -D]
dec0417.hsn.de.hpc.ucar.edu 3:
dec0417.hsn.de.hpc.ucar.edu 3: memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
dec0417.hsn.de.hpc.ucar.edu 3: <nodes> is a comma delimited list of node numbers or A-B ranges or all.
dec0417.hsn.de.hpc.ucar.edu 3: Instead of a number a node can also be:
dec0417.hsn.de.hpc.ucar.edu 3:   netdev:DEV the node connected to network device DEV
dec0417.hsn.de.hpc.ucar.edu 3:   file:PATH  the node the block device of path is connected to
dec0417.hsn.de.hpc.ucar.edu 3:   ip:HOST    the node of the network device host routes through
dec0417.hsn.de.hpc.ucar.edu 3:   block:PATH the node of block device path
dec0417.hsn.de.hpc.ucar.edu 3:   pci:[seg:]bus:dev[:func] The node of a PCI device
dec0417.hsn.de.hpc.ucar.edu 3: <cpus> is a comma delimited list of cpu numbers or A-B ranges or all
dec0417.hsn.de.hpc.ucar.edu 3: all ranges can be inverted with !
dec0417.hsn.de.hpc.ucar.edu 3: all numbers and ranges can be made cpuset-relative with +
dec0417.hsn.de.hpc.ucar.edu 3: the old --cpubind argument is deprecated.
dec0417.hsn.de.hpc.ucar.edu 3: use --cpunodebind or --physcpubind instead
dec0417.hsn.de.hpc.ucar.edu 3: use --balancing | -b to enable Linux kernel NUMA balancing
dec0417.hsn.de.hpc.ucar.edu 3: for the process if it is supported by kernel
dec0417.hsn.de.hpc.ucar.edu 3: <length> can have g (GB), m (MB) or k (KB) suffixes
dec0417.hsn.de.hpc.ucar.edu: rank 3 exited with code 1
dec0417.hsn.de.hpc.ucar.edu: rank 0 died from signal 15

mpibind.log

Chunk info
  1:ncpus=5:mpiprocs=5:ompthreads=1:mem=230GB:Qlist=cpu:ngpus=0
-- -- -- --
MPI exec line:
  mpiexec --label --line-buffer -n 5 -ppn 5 --cpu-bind none -env OMP_NUM_THREADS=1 /glade/u/apps/opt/mpitools/mpibind/cpu_bind /glade/derecho/scratch/glemieux/ctsm-tests/tests_mpi-nonserial-check-clm_hillslope-dev173/SMS_D_Ld5.5x5_amazon.I1850Clm51Bgc.derecho_gnu.clm-HillslopeC.mpi-nonserial-check-clm_hillslope-dev173/bld/cesm.exe 
-- -- -- --
Binding Report:
rank: 0, cores: 0-0
rank: 1, cores: 1-1
rank: 3, cores: 64-64
rank: 4, cores: 65-65
@roryck
Collaborator

roryck commented Mar 15, 2024

Hello Gregory,

Thanks for reporting this. This is actually due to the PBS select line: because you are requesting only 5 CPUs, PBS creates a Linux cgroup with only 5 CPUs, all on the first socket. The mpibind script tries to bind processes across both sockets to give your job full memory bandwidth, but core numbers greater than 4 don't exist in the PBS cgroup, hence this failure. To get your case running right away, try rerunning with 128 CPUs and 5 MPI ranks in the select line, e.g. with something similar to:

#PBS -l select=1:ncpus=128:mpiprocs=5:ompthreads=1:mem=230GB

In general, regardless of how many CPUs you intend to use, you should always request all 128 CPUs on a Derecho node so that you have access to full memory performance.
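
To illustrate why ranks 3 and 4 fail while ranks 0 and 1 succeed, here is a minimal Python sketch. It is not the mpibind implementation: `plan_bindings` and `out_of_cgroup` are hypothetical names, and the even split of ranks over 2 sockets of 64 cores each is an assumption inferred from the binding report above.

```python
def plan_bindings(n_ranks, sockets=2, cores_per_socket=64):
    """Hypothetical plan: spread ranks evenly over sockets, one core per rank.

    Assumes 2 sockets x 64 cores (inferred from the binding report);
    the real mpibind script may distribute ranks differently.
    """
    plan = {}
    rank = 0
    base, extra = divmod(n_ranks, sockets)
    for s in range(sockets):
        # The first `extra` sockets take one additional rank.
        count = base + (1 if s < extra else 0)
        for i in range(count):
            plan[rank] = s * cores_per_socket + i
            rank += 1
    return plan


def out_of_cgroup(plan, allowed):
    """Return the rank -> core entries whose core is not in the allowed set."""
    return {r: c for r, c in plan.items() if c not in allowed}


plan = plan_bindings(5)

# ncpus=5: the cgroup holds only cores 0-4, so the second-socket cores are missing.
print(out_of_cgroup(plan, set(range(5))))

# ncpus=128: every planned core exists, so nothing is reported.
print(out_of_cgroup(plan, set(range(128))))
```

Under these assumptions, the 5-CPU request leaves ranks 3 and 4 pointing at cores 64 and 65, which matches the `numactl` errors in cesm.log.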

On the mpibind side, I'll add some code to catch this type of request, and exit gracefully with a more meaningful error message.
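
A rough sketch of what such a check could look like, written in Python for illustration only. The function names and the error wording are hypothetical, and the actual mpibind script may be implemented quite differently. The idea is to compare each planned core range against the CPUs the job is actually allowed to use before handing the range to `numactl`.

```python
import os


def parse_core_range(spec):
    """Parse a core spec such as '64-64' or '3' into a set of core IDs."""
    lo, _, hi = spec.partition("-")
    return set(range(int(lo), int(hi or lo) + 1))


def binding_error(rank, spec, allowed):
    """Return a readable error message if `spec` leaves the allowed CPU set, else None."""
    missing = sorted(parse_core_range(spec) - set(allowed))
    if not missing:
        return None
    return (
        f"rank {rank}: cores {spec} are not available to this job "
        f"(missing: {missing}); request the full node in the PBS select line"
    )


def allowed_cpus():
    """CPUs this process may run on (Linux only; reflects cgroup/cpuset limits)."""
    return os.sched_getaffinity(0)


# Example with the values from the binding report and a 5-CPU cgroup (cores 0-4).
for rank, spec in [(0, "0-0"), (1, "1-1"), (3, "64-64"), (4, "65-65")]:
    message = binding_error(rank, spec, range(5))
    if message:
        print(message)
```

In a real launcher, `allowed_cpus()` would replace the hard-coded `range(5)`, and the job would exit with the message instead of invoking `numactl` with an out-of-range core.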

Thanks again for the report.

@glemieux
Author

Thanks for the detailed explanation @roryck.

@jedwards4b, should I add this as an issue to ccs_config_cesm for an update to config_batch.xml?
