Application works with UCX_LOG_LEVEL=info (or more verbose levels), but hangs otherwise #9532

Open
bedroge opened this issue Dec 5, 2023 · 6 comments

bedroge commented Dec 5, 2023

Describe the bug

I'm running into a weird issue on one particular system where importing the Python interface of waLBerla, which I compiled from source using EasyBuild, hangs:

# hangs
mpirun -np 1 python -c "import waLBerla"

Then I found that disabling the UCX PML solved the issue:

# works
mpirun --mca pml ^ucx -np 1 python -c "import waLBerla"

So I tried again with UCX and some more debugging output, but then suddenly it works:

# works
UCX_LOG_LEVEL=DEBUG mpirun -np 1 python -c "import waLBerla"

# works too
UCX_LOG_LEVEL=INFO mpirun -np 1 python -c "import waLBerla"

# hangs
UCX_LOG_LEVEL=DIAG mpirun -np 1 python -c "import waLBerla"
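
The comparison above can be scripted so that every log level is tried with the same timeout. The following is a minimal Python sketch, not part of the original report: `run_with_log_level` and `sweep` are hypothetical helper names, the 60-second timeout is an arbitrary choice, and only the log levels tried in this issue are listed. It assumes a POSIX system, because it kills the whole process group on timeout.

```python
import os
import signal
import subprocess
import sys


def run_with_log_level(cmd, level=None, timeout=60.0):
    """Run cmd with UCX_LOG_LEVEL set to level (unset if level is None).

    Returns "ok" (exit code 0), "failed" (non-zero exit) or "hang" (timeout).
    """
    env = dict(os.environ)
    env.pop("UCX_LOG_LEVEL", None)
    if level is not None:
        env["UCX_LOG_LEVEL"] = level
    proc = subprocess.Popen(
        cmd,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own process group, so children can be killed too
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the launcher and its children
        except ProcessLookupError:
            pass  # the group disappeared in the meantime
        proc.wait()
        return "hang"
    return "ok" if returncode == 0 else "failed"


def sweep(cmd, levels=(None, "diag", "info", "debug"), timeout=60.0):
    """Try cmd once per log level and map each level to its outcome."""
    return {
        level or "(unset)": run_with_log_level(cmd, level, timeout)
        for level in levels
    }
```

Going by the results above, `sweep(["mpirun", "-np", "1", "python", "-c", "import waLBerla"])` would be expected to report `hang` for the unset and `diag` cases and `ok` for `info` and `debug` on the affected system.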

I've tried it with both of the following sets of dependencies:

Currently Loaded Modules:
  1) GCCcore/12.2.0
  2) zlib/1.2.12-GCCcore-12.2.0
  3) binutils/2.39-GCCcore-12.2.0
  4) bzip2/1.0.8-GCCcore-12.2.0
  5) ncurses/6.3-GCCcore-12.2.0
  6) libreadline/8.2-GCCcore-12.2.0
  7) Tcl/8.6.12-GCCcore-12.2.0
  8) SQLite/3.39.4-GCCcore-12.2.0
  9) XZ/5.2.7-GCCcore-12.2.0
 10) GMP/6.2.1-GCCcore-12.2.0
 11) libffi/3.4.4-GCCcore-12.2.0
 12) OpenSSL/1.1
 13) Python/3.10.8-GCCcore-12.2.0
 14) pybind11/2.10.3-GCCcore-12.2.0
 15) GCC/12.2.0
 16) OpenBLAS/0.3.21-GCC-12.2.0
 17) FlexiBLAS/3.2.1-GCC-12.2.0
 18) FFTW/3.3.10-GCC-12.2.0
 19) gfbf/2022b
 20) SciPy-bundle/2023.02-gfbf-2022b
 21) numactl/2.0.16-GCCcore-12.2.0
 22) libxml2/2.10.3-GCCcore-12.2.0
 23) libpciaccess/0.17-GCCcore-12.2.0
 24) hwloc/2.8.0-GCCcore-12.2.0
 25) libevent/2.1.12-GCCcore-12.2.0
 26) UCX/1.13.1-GCCcore-12.2.0
 27) libfabric/1.16.1-GCCcore-12.2.0
 28) PMIx/4.2.2-GCCcore-12.2.0
 29) UCC/1.1.0-GCCcore-12.2.0
 30) OpenMPI/4.1.4-GCC-12.2.0
 31) gompi/2022b
 32) gzip/1.12-GCCcore-12.2.0
 33) lz4/1.9.4-GCCcore-12.2.0
 34) zstd/1.5.2-GCCcore-12.2.0
 35) ICU/72.1-GCCcore-12.2.0
 36) Boost.MPI/1.77.0-gompi-2022b

and with some slightly newer versions:

Currently Loaded Modules:
  1) OpenSSL/1.1
  2) GCC/12.3.0
  3) OpenBLAS/0.3.23-GCC-12.3.0
  4) FlexiBLAS/3.3.1-GCC-12.3.0
  5) FFTW/3.3.10-GCC-12.3.0
  6) gfbf/2023a
  7) binutils/2.40-GCCcore-12.3.0
  8) bzip2/1.0.8-GCCcore-12.3.0
  9) ncurses/6.4-GCCcore-12.3.0
 10) libreadline/8.2-GCCcore-12.3.0
 11) zlib/1.2.13-GCCcore-12.3.0
 12) Tcl/8.6.13-GCCcore-12.3.0
 13) SQLite/3.42.0-GCCcore-12.3.0
 14) XZ/5.4.2-GCCcore-12.3.0
 15) libffi/3.4.4-GCCcore-12.3.0
 16) cffi/1.15.1-GCCcore-12.3.0
 17) cryptography/41.0.1-GCCcore-12.3.0
 18) virtualenv/20.23.1-GCCcore-12.3.0
 19) Python-bundle-PyPI/2023.06-GCCcore-12.3.0
 20) GCCcore/12.3.0
 21) Python/3.11.3-GCCcore-12.3.0
 22) pybind11/2.11.1-GCCcore-12.3.0
 23) SciPy-bundle/2023.07-gfbf-2023a
 24) numactl/2.0.16-GCCcore-12.3.0
 25) libxml2/2.11.4-GCCcore-12.3.0
 26) libpciaccess/0.17-GCCcore-12.3.0
 27) hwloc/2.9.1-GCCcore-12.3.0
 28) libevent/2.1.12-GCCcore-12.3.0
 29) UCX/1.14.1-GCCcore-12.3.0
 30) libfabric/1.18.0-GCCcore-12.3.0
 31) PMIx/4.2.4-GCCcore-12.3.0
 32) UCC/1.2.0-GCCcore-12.3.0
 33) OpenMPI/4.1.5-GCC-12.3.0
 34) gompi/2023a
 35) gzip/1.12-GCCcore-12.3.0
 36) lz4/1.9.4-GCCcore-12.3.0
 37) zstd/1.5.5-GCCcore-12.3.0
 38) ICU/73.2-GCCcore-12.3.0
 39) Boost.MPI/1.79.0-gompi-2023a

I still see the same issue with UCX/1.15.0.

These are the last lines of strace output for a run that hangs:

read(28, "", 20)                        = 0
close(28)                               = 0
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=26, events=POLLIN}], 6, 0) = 0 (Timeout)
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x5), ...}) = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x5), ...}) = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
getpgrp()                               = 643343
ioctl(0, TIOCGPGRP, [643343])           = 0
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=26, events=POLLIN}, {fd=0, events=POLLIN}], 7, -1) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\1\0\0\0\0\0\0\0", 8)          = 8
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=26, events=POLLIN}, {fd=0, events=POLLIN}], 7, 0) = 0 (Timeout)
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=26, events=POLLIN}, {fd=0, events=POLLIN}], 7, -1

I'm not sure how to get more information, because increasing the verbosity makes the hang disappear. I've included the output of a (successful) run with UCX_LOG_LEVEL=data at the bottom of this issue.

Steps to Reproduce

  • Command line: mpirun -np 1 python -c "import waLBerla"
  • UCX version used (from github branch XX or release YY) + UCX configure flags (can be checked by ucx_info -v)
    • 1.13.1, 1.14.1, 1.15.0
  • Any UCX environment variables used: none or only UCX_LOG_LEVEL

Setup and versions

  • OS version (e.g Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)

    • cat /etc/issue or cat /etc/redhat-release + uname -a
    • Rocky Linux 8.5, Linux login1 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • For RDMA/IB/RoCE related issues:

    • Driver version:
      • rpm -q rdma-core: rdma-core-35.0-1.el8.x86_64
      • rpm -q libibverbs: libibverbs-35.0-1.el8.x86_64
    • HW information from ibstat or ibv_devinfo -vv command: ibstat is available, but there's no Infiniband, hence no output from the command

Additional information (depending on the issue)

  • OpenMPI version: 4.1.4 and 4.1.5
  • Output of ucx_info -d to show transports and devices recognized by UCX:
# Memory domain: self
#     Component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#         memory types: host (access,reg,cache)
#
#      Transport: self
#         Device: memory
#           Type: loopback
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 6911.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8K
#             am_bcopy: <= 8K
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: tcp
#     Component: tcp
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#         memory types: host (access,reg,cache)
#
#      Transport: tcp
#         Device: lo
#           Type: network
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.91/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 18 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#      Transport: tcp
#         Device: eth0
#           Type: network
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.32/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 0
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
#      max_conn_priv: 2064 bytes
#
# Memory domain: sysv
#     Component: sysv
#             allocate: unlimited
#           remote key: 12 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: sysv
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: posix
#     Component: posix
#             allocate: <= 16173312K
#           remote key: 24 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: posix
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
# < failed to open connection manager rdmacm >
#
# Memory domain: cma
#     Component: cma
#             register: unlimited, cost: 9 nsec
#         memory types: host (access,reg,cache)
#
#      Transport: cma
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: peer failure, ep_check
#
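
To cross-check this listing against the `[5 mds 6 tls]` entry in the debug log further down, the memory domains and transport/device pairs can be counted mechanically. This is a small Python sketch that assumes the `ucx_info -d` text layout shown above; `list_transports` is a hypothetical helper, and `SAMPLE` is an abbreviated excerpt of the output rather than the full listing.

```python
import re

# Abbreviated excerpt of the `ucx_info -d` output above (not the full listing).
SAMPLE = """\
# Memory domain: self
#     Component: self
#      Transport: self
#         Device: memory
#  System device: <unknown>
# Memory domain: tcp
#      Transport: tcp
#         Device: lo
#      device priority: 1
#      Transport: tcp
#         Device: eth0
"""


def list_transports(ucx_info_output):
    """Return (memory_domains, [(transport, device), ...]) parsed from the text."""
    domains = []
    pairs = []
    transport = None
    for line in ucx_info_output.splitlines():
        match = re.match(r"#\s*Memory domain:\s*(\S+)", line)
        if match:
            domains.append(match.group(1))
            continue
        match = re.match(r"#\s*Transport:\s*(\S+)", line)
        if match:
            transport = match.group(1)
            continue
        # Only the capitalised "Device:" line names the device; "System device:"
        # and the lowercase "device priority:" lines do not match this pattern.
        match = re.match(r"#\s*Device:\s*(\S+)", line)
        if match and transport is not None:
            pairs.append((transport, match.group(1)))
            transport = None
    return domains, pairs


domains, pairs = list_transports(SAMPLE)
print(len(domains), "memory domains:", domains)
print(len(pairs), "transport/device pairs:", pairs)
```

Applied to the full output above, it should yield five memory domains (self, tcp, sysv, posix, cma) and six transport/device pairs, which is consistent with the counts in the debug log.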


  • Configure result - config.log
  • Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data"
    With UCX_LOG_LEVEL=data things work fine, but here it is anyway:
[1701777539.029420] [login1:643305:0]           debug.c:1146 UCX  DEBUG using signal stack 0x7fe1dcc9b000 size 141824
[1701777539.029493] [login1:643305:0]             cpu.c:233  UCX  DEBUG CPU does not support invariant TSC, using fallback timer
[1701777539.029518] [login1:643305:0]            init.c:118  UCX  DEBUG /project/boegelbot/Rocky8/haswell/software/UCX/1.13.1-GCCcore-12.2.0/lib/libucs.so.0 loaded at 0x7fe1dce15000
[1701777539.029541] [login1:643305:0]            init.c:120  UCX  DEBUG cmd line: python -c import waLBerla 
[1701777539.029555] [login1:643305:0]          module.c:72   UCX  DEBUG ucs library path: /project/boegelbot/Rocky8/haswell/software/UCX/1.13.1-GCCcore-12.2.0/lib/libucs.so.0
[1701777539.029560] [login1:643305:0]          module.c:282  UCX  DEBUG loading modules for ucs
[1701777539.030590] [login1:643305:0]            time.c:22   UCX  DEBUG arch clock frequency: 1000000.00 Hz
[1701777539.030660] [login1:643305:0]     ucp_context.c:1849 UCX  INFO  Version 1.13.1 (loaded from /project/boegelbot/Rocky8/haswell/software/UCX/1.13.1-GCCcore-12.2.0/lib/libucp.so.0)
[1701777539.030672] [login1:643305:0]     ucp_context.c:1624 UCX  DEBUG estimated number of endpoints is 1
[1701777539.030675] [login1:643305:0]     ucp_context.c:1631 UCX  DEBUG estimated number of endpoints per node is 1
[1701777539.030682] [login1:643305:0]     ucp_context.c:1638 UCX  DEBUG estimated bcopy bandwidth is 6081740800.000000
[1701777539.030702] [login1:643305:0]     ucp_context.c:1705 UCX  DEBUG allocation method[0] is md 'sysv'
[1701777539.030709] [login1:643305:0]     ucp_context.c:1705 UCX  DEBUG allocation method[1] is md 'posix'
[1701777539.030716] [login1:643305:0]     ucp_context.c:1717 UCX  DEBUG allocation method[2] is 'huge'
[1701777539.030722] [login1:643305:0]     ucp_context.c:1717 UCX  DEBUG allocation method[3] is 'thp'
[1701777539.030725] [login1:643305:0]     ucp_context.c:1705 UCX  DEBUG allocation method[4] is md '*'
[1701777539.030732] [login1:643305:0]     ucp_context.c:1717 UCX  DEBUG allocation method[5] is 'mmap'
[1701777539.030734] [login1:643305:0]     ucp_context.c:1717 UCX  DEBUG allocation method[6] is 'heap'
[1701777539.030751] [login1:643305:0]          module.c:282  UCX  DEBUG loading modules for uct
[1701777539.033561] [login1:643305:0]          module.c:282  UCX  DEBUG loading modules for uct_ib
[1701777539.033640] [login1:643305:0]           ib_md.c:1195 UCX  DEBUG Failed to get IB device list, assuming no devices are present
[1701777539.034009] [login1:643305:0]           ib_md.c:1195 UCX  DEBUG Failed to get IB device list, assuming no devices are present
[1701777539.034080] [login1:643305:0]           mpool.c:98   UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1701777539.037397] [login1:643305:0]           async.c:230  UCX  DEBUG added async handler 0x954bf0 [id=23 ref 1] ucs_rcache_invalidate_handler() to hash
[1701777539.037532] [login1:643305:0]           async.c:508  UCX  DEBUG listening to async event fd 23 events 0x1 mode thread_spinlock
[1701777539.037600] [login1:643305:0]          module.c:282  UCX  DEBUG loading modules for ucm
[1701777539.037632] [login1:643305:0]     ucp_context.c:1913 UCX  DEBUG created ucp context 0x95dcb0 0x95dcb0 [5 mds 6 tls] features 0x1 tl bitmap 0x3f 0x0
[1701777539.044183] [login1:643305:0]           async.c:155  UCX  DEBUG removed async handler 0x954bf0 [id=23 ref 1] ucs_rcache_invalidate_handler() from hash
[1701777539.044199] [login1:643305:0]           async.c:561  UCX  DEBUG removing async handler 0x954bf0 [id=23 ref 1] ucs_rcache_invalidate_handler()
[1701777539.044270] [login1:643305:0]           async.c:170  UCX  DEBUG release async handler 0x954bf0 [id=23 ref 0] ucs_rcache_invalidate_handler()
[1701777539.044283] [login1:643305:0]         pgtable.c:618  UCX  DEBUG purge empty page table
[1701777539.044293] [login1:643305:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
@bedroge bedroge added the Bug label Dec 5, 2023

bedroge commented Dec 5, 2023

I forgot to mention that the same version of waLBerla works fine on this system (regardless of UCX_LOG_LEVEL) when using an even older version of the compiler toolchain:

Currently Loaded Modules:
  1) GCCcore/10.3.0
  2) zlib/1.2.11-GCCcore-10.3.0
  3) binutils/2.36.1-GCCcore-10.3.0
  4) GCC/10.3.0
  5) numactl/2.0.14-GCCcore-10.3.0
  6) XZ/5.2.5-GCCcore-10.3.0
  7) libxml2/2.9.10-GCCcore-10.3.0
  8) libpciaccess/0.16-GCCcore-10.3.0
  9) hwloc/2.4.1-GCCcore-10.3.0
 10) OpenSSL/1.1
 11) libevent/2.1.12-GCCcore-10.3.0
 12) UCX/1.10.0-GCCcore-10.3.0
 13) libfabric/1.12.1-GCCcore-10.3.0
 14) PMIx/3.2.3-GCCcore-10.3.0
 15) OpenMPI/4.1.1-GCC-10.3.0
 16) OpenBLAS/0.3.15-GCC-10.3.0
 17) FlexiBLAS/3.0.4-GCC-10.3.0
 18) gompi/2021a
 19) FFTW/3.3.9-gompi-2021a
 20) ScaLAPACK/2.1.0-gompi-2021a-fb
 21) foss/2021a
 22) bzip2/1.0.8-GCCcore-10.3.0
 23) ncurses/6.2-GCCcore-10.3.0
 24) libreadline/8.1-GCCcore-10.3.0
 25) Tcl/8.6.11-GCCcore-10.3.0
 26) SQLite/3.35.4-GCCcore-10.3.0
 27) GMP/6.2.1-GCCcore-10.3.0
 28) libffi/3.3-GCCcore-10.3.0
 29) Python/3.9.5-GCCcore-10.3.0
 30) pybind11/2.6.2-GCCcore-10.3.0
 31) gzip/1.10-GCCcore-10.3.0
 32) lz4/1.9.3-GCCcore-10.3.0
 33) zstd/1.4.9-GCCcore-10.3.0
 34) ICU/69.1-GCCcore-10.3.0
 35) Boost.MPI/1.76.0-gompi-2021a
 36) SciPy-bundle/2021.05-foss-2021a


yosefe commented Dec 5, 2023

@bedroge can you please attach to the hanging process with gdb and post the backtrace of the hang? (The gdb command is "thread apply all backtrace".)
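
The same backtrace can also be collected non-interactively. The sketch below is only one way to do it and is not taken from this thread: it assumes `gdb` is on the PATH and that the caller is allowed to attach to the process, and the helper names `gdb_backtrace_command` and `collect_backtrace` are made up for illustration. Batch mode also disables gdb's pager, so the output is not interrupted by paging prompts.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps every thread's backtrace and exits."""
    return [
        "gdb",
        "-p", str(pid),  # attach to the running process
        "-batch",        # run the given commands, then exit (no pager)
        "-ex", "thread apply all backtrace",
    ]


def collect_backtrace(pid, timeout=120):
    """Run gdb against pid and return its combined output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout
```

The PID to pass is that of the hanging python process, not of mpirun.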


bedroge commented Dec 6, 2023

Sure! Here it is (for the python process):

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7f5467989700 (LWP 903569)):
#0  0x00007f547e24775d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f547e240b44 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f547efcd19f in tls_get_addr_tail.isra () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f547efd3cdc in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#4  0x00007f546c27fe8a in ucs_log_set_thread_name (format=format@entry=0x7f546c2930a8 "a") at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/log.c:576
#5  0x00007f546c26f85e in ucs_async_thread_func (arg=0xf64c90) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/thread.c:108
#6  0x00007f547e23e17a in start_thread () from /lib64/libpthread.so.0
#7  0x00007f547dd69dc3 in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f546ea21700 (LWP 903566)):
#0  0x00007f547dd6a0f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f546f4a54b3 in epoll_dispatch () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#2  0x00007f546f49bc95 in event_base_loop () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#3  0x00007f546ead67e1 in progress_engine () from /project/boegelbot/Rocky8/haswell/software/PMIx/4.2.2-GCCcore-12.2.0/lib/libpmix.so.2
#4  0x00007f547e23e17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f547dd69dc3 in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f546f44b700 (LWP 903565)):
#0  0x00007f547dd5ea41 in poll () from /lib64/libc.so.6
#1  0x00007f546f4a4825 in poll_dispatch () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#2  0x00007f546f49bc95 in event_base_loop () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#3  0x00007f546f8f854e in progress_engine () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libopen-pal.so.40
#4  0x00007f547e23e17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f547dd69dc3 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f547f1d7740 (LWP 903564)):
#0  0x00007f547e23f66d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1  0x00007f546c26f579 in ucs_async_thread_stop () at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/thread.c:257
#2  0x00007f546c26f7de in ucs_async_thread_remove_event_fd (async=<optimized out>, event_fd=<optimized out>) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/thread.c:353
#3  0x00007f546c26d595 in ucs_async_remove_handler (id=<optimized out>, is_sync=is_sync@entry=1) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/async.c:567
#4  0x00007f546c28239a in ucs_rcache_global_list_remove (rcache=0xbe3a70) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/rcache.c:1193
#5  0x00007f546c2830eb in ucs_rcache_t_cleanup (self=0xbe3a70) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/rcache.c:1331
#6  0x00007f546c28dcae in ucs_class_call_cleanup_chain (cls=cls@entry=0x7f546c2a8620 <ucs_rcache_t_class>, obj=obj@entry=0xbe3a70, limit=limit@entry=-1)
    at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/class.c:56
#7  0x00007f546c2841f8 in ucs_rcache_destroy (self=0xbe3a70) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucs/rcache.c:1358
#8  0x00007f546c31add1 in ucp_mem_rcache_cleanup (context=<optimized out>) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucp/ucp_mm.c:1048
#9  0x00007f546c307afb in ucp_cleanup (context=0xf5e630) at /tmp/boegelbot/UCX/1.13.1/GCCcore-12.2.0/ucx-1.13.1/src/ucp/ucp_context.c:1938
#10 0x00007f546c3a3265 in mca_pml_ucx_close () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/openmpi/mca_pml_ucx.so
#11 0x00007f546c3a5719 in mca_pml_ucx_component_close () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/openmpi/mca_pml_ucx.so
#12 0x00007f546f9148d9 in mca_base_component_close () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libopen-pal.so.40
#13 0x00007f546f914965 in mca_base_components_close () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libopen-pal.so.40
#14 0x00007f546fc64de4 in mca_pml_base_select () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libmpi.so.40
#15 0x00007f546fc70da8 in ompi_mpi_init () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libmpi.so.40
#16 0x00007f546fc14c04 in PMPI_Init () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/libmpi.so.40
#17 0x00007f5470584fa3 in walberla::mpi::MPIManager::initializeMPI(int*, char***, bool) ()
   from /home/bedroge/easybuildinstall/software/waLBerla/6.1-foss-2022b/pythonmodule/waLBerla/walberla_cpp.cpython-310-x86_64-linux-gnu.so
#18 0x00007f5470834bdb in walberla::python_coupling::initWalberlaForPythonModule() ()
   from /home/bedroge/easybuildinstall/software/waLBerla/6.1-foss-2022b/pythonmodule/waLBerla/walberla_cpp.cpython-310-x86_64-linux-gnu.so
#19 0x00007f54703de732 in InitObject::InitObject() ()
   from /home/bedroge/easybuildinstall/software/waLBerla/6.1-foss-2022b/pythonmodule/waLBerla/walberla_cpp.cpython-310-x86_64-linux-gnu.so
#20 0x00007f547038663a in _GLOBAL__sub_I_PythonModule.cpp ()
   from /home/bedroge/easybuildinstall/software/waLBerla/6.1-foss-2022b/pythonmodule/waLBerla/walberla_cpp.cpython-310-x86_64-linux-gnu.so
#21 0x00007f547efca8ba in call_init.part () from /lib64/ld-linux-x86-64.so.2
#22 0x00007f547efca9ba in _dl_init () from /lib64/ld-linux-x86-64.so.2
#23 0x00007f547dda530c in _dl_catch_exception () from /lib64/libc.so.6
#24 0x00007f547efcee8e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#25 0x00007f547dda52b4 in _dl_catch_exception () from /lib64/libc.so.6
#26 0x00007f547efce6b1 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#27 0x00007f547e7d91ea in dlopen_doit () from /lib64/libdl.so.2
#28 0x00007f547dda52b4 in _dl_catch_exception () from /lib64/libc.so.6
#29 0x00007f547dda5373 in _dl_catch_error () from /lib64/libc.so.6
#30 0x00007f547e7d9969 in _dlerror_run () from /lib64/libdl.so.2
#31 0x00007f547e7d928a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#32 0x00007f547edd9aae in _PyImport_FindSharedFuncptr (prefix=0x7f547ee6b42f "PyInit", shortname=0x7f5470c4b2f0 "walberla_cpp", 
    pathname=0x7f5470c7f540 "/home/bedroge/easybuildinstall/software/waLBerla/6.1-foss-2022b/pythonmodule/waLBerla/walberla_cpp.cpython-310-x86_64-linux-gnu.so", fp=0x0)
    at Modules/transmogrify.h:100
#33 0x00007f547edd89d3 in _PyImport_LoadDynamicModuleWithSpec (fp=<optimized out>, spec=0x7f5470cabbb0) at ./Python/pycore_hashtable.h:137
#34 _imp_create_dynamic_impl (module=<optimized out>, file=<optimized out>, spec=0x7f5470cabbb0) at Objects/pylifecycle.c:2049
#35 _imp_create_dynamic (module=<optimized out>, args=<optimized out>, nargs=<optimized out>) at Modules/fastsearch.h:330
#36 0x00007f547ed4414a in cfunction_vectorcall_FASTCALL (func=0x7f547f1858a0, args=0x7f5470c4bd78, nargsf=<optimized out>, kwnames=<optimized out>) at ./Python/pycore_bitutils.h:430
#37 0x00007f547ed3bcf0 in _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4277
#38 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#39 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#40 0x00007f547ed398b9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f1005c8, callable=0x7f547f139510, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#41 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f1005c8, callable=0x7f547f139510) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#42 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdb640, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#43 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4181
#44 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#45 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#46 0x00007f547ed38dff in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f190838, callable=0x7f547f1ba950, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#47 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f190838, callable=0x7f547f1ba950) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#48 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdb8a0, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#49 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4198
#50 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#51 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#52 0x00007f547ed38ab4 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f139ea0, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#53 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f139ea0) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#54 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdbb00, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#55 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4213
#56 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#57 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#58 0x00007f547ed38ab4 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13a0e0, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#59 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13a0e0) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#60 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdbd60, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#61 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4213
#62 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#63 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#64 0x00007f547ed38ab4 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13b2e0, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#65 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13b2e0) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#66 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdbfc0, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#67 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4213
#68 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#69 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#70 0x00007f547ed437cb in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffcecdc150, callable=0x7f547f13b370, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:99
#71 object_vacall (tstate=0x9823c0, base=<optimized out>, callable=0x7f547f13b370, vargs=0x7fffcecdc1e0) at ./Modules/abstract.h:734
#72 0x00007f547ed4ef08 in _PyObject_CallMethodIdObjArgs (obj=0x0, name=<optimized out>) at ./Modules/abstract.h:825
#73 0x00007f547ed4e7da in import_find_and_load (abs_name=0x7f5470ac9d40, abs_name@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=0x9823c0, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/pylifecycle.c:1521
#74 PyImport_ImportModuleLevelObject (name=0x7f5470ad4370, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f5470c4b7f0, level=1) at Objects/pylifecycle.c:1622
#75 0x00007f547ed3c068 in import_name (level=0x7f547f0d80f0, fromlist=0x7f5470c4b7f0, name=0x7f5470ad4370, f=<optimized out>, tstate=<optimized out>) at Objects/ceval_gil.h:6016
#76 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:3695
#77 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#78 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#79 0x00007f547edad249 in PyEval_EvalCode (co=0x7f5470c7ddc0, globals=0x7f5470c9d940, locals=0x7f5470c9d940) at Objects/ceval_gil.h:1134
#80 0x00007f547edb4497 in builtin_exec_impl (module=<optimized out>, locals=0x7f5470c9d940, globals=0x7f5470c9d940, source=0x7f5470c7ddc0) at Python/getplatform.c:1003
#81 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/cellobject.c:371
#82 0x00007f547ed4414a in cfunction_vectorcall_FASTCALL (func=0x7f547f170e00, args=0x7f5470cb7458, nargsf=<optimized out>, kwnames=<optimized out>) at ./Python/pycore_bitutils.h:430
#83 0x00007f547ed3bcf0 in _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4277
#84 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#85 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=3, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#86 0x00007f547ed398b9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f19f648, callable=0x7f547f139510, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#87 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f19f648, callable=0x7f547f139510) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#88 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdca20, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#89 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4181
#90 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#91 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#92 0x00007f547ed38dff in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f1960c0, callable=0x7f547f1b9a20, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#93 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f547f1960c0, callable=0x7f547f1b9a20) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#94 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdcc80, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#95 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4198
#96 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#97 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#98 0x00007f547ed38ab4 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13a0e0, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#99 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13a0e0) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#100 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdcee0, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#101 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4213
#102 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#103 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#104 0x00007f547ed38ab4 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13b2e0, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#105 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7f547f13b2e0) at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:123
#106 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffcecdd140, tstate=<optimized out>) at Objects/ceval_gil.h:5891
#107 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:4213
#108 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#109 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#110 0x00007f547ed437cb in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffcecdd2d0, callable=0x7f547f13b370, tstate=0x9823c0)
    at /tmp/boegelbot/Python/3.10.8/GCCcore-12.2.0/Python-3.10.8/abstract.c:99
#111 object_vacall (tstate=0x9823c0, base=<optimized out>, callable=0x7f547f13b370, vargs=0x7fffcecdd360) at ./Modules/abstract.h:734
#112 0x00007f547ed4ef08 in _PyObject_CallMethodIdObjArgs (obj=0x0, name=<optimized out>) at ./Modules/abstract.h:825
#113 0x00007f547ed4e7da in import_find_and_load (abs_name=0x7f5470c46370, abs_name@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=0x9823c0, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/pylifecycle.c:1521
#114 PyImport_ImportModuleLevelObject (name=0x7f5470c46370, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f547efa5ae0 <_Py_NoneStruct>, level=0)
    at Objects/pylifecycle.c:1622
#115 0x00007f547ed3c068 in import_name (level=0x7f547f0d80d0, fromlist=0x7f547efa5ae0 <_Py_NoneStruct>, name=0x7f5470c46370, f=<optimized out>, tstate=<optimized out>)
    at Objects/ceval_gil.h:6016
#116 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Objects/ceval_gil.h:3695
#117 0x00007f547ed379cb in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/marshal.c:46
#118 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kwnames=<optimized out>) at Objects/ceval_gil.h:5065
#119 0x00007f547edad249 in PyEval_EvalCode (co=0x7f5470befe10, globals=0x7f5470c00100, locals=0x7f5470c00100) at Objects/ceval_gil.h:1134
#120 0x00007f547edbd9e3 in run_eval_code_obj (tstate=0x9823c0, co=0x7f5470befe10, globals=0x7f5470c00100, locals=0x7f5470c00100) at Modules/find.h:1291
#121 0x00007f547edb96ea in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f5470c00100, locals=0x7f5470c00100, flags=<optimized out>, arena=<optimized out>)
    at Modules/find.h:1312
#122 0x00007f547edb17cd in PyRun_StringFlags (str=<optimized out>, start=257, globals=0x7f5470c00100, locals=0x7f5470c00100, flags=0x7fffcecdd8a0) at Modules/find.h:1183
#123 0x00007f547edb172c in PyRun_SimpleStringFlags (command=0x7f5470c12a10 "import waLBerla\n", flags=0x7fffcecdd8a0) at Modules/find.h:503
#124 0x00007f547edc9c7b in pymain_run_command (command=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/fileutils.c:248
#125 pymain_run_python (exitcode=0x7fffcecdd894) at Objects/fileutils.c:578
#126 Py_RunMain () at Objects/fileutils.c:666
#127 0x00007f547ed9ff67 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Objects/fileutils.c:720
#128 0x00007f547dc90493 in __libc_start_main () from /lib64/libc.so.6
#129 0x000000000040106e in _start ()

And for the mpirun process itself:

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7f91a7de2700 (LWP 903563)):
#0  0x00007f91a919c29f in select () from /lib64/libc.so.6
#1  0x00007f91a7deba80 in listen_thread () from /project/boegelbot/Rocky8/haswell/software/OpenMPI/4.1.4-GCC-12.2.0/lib/openmpi/mca_oob_tcp.so
#2  0x00007f91a947517a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f91a91a4dc3 in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f91a8608700 (LWP 903562)):
#0  0x00007f91a919c29f in select () from /lib64/libc.so.6
#1  0x00007f91a8f9e087 in listen_thread () from /project/boegelbot/Rocky8/haswell/software/PMIx/4.2.2-GCCcore-12.2.0/lib/libpmix.so.2
#2  0x00007f91a947517a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f91a91a4dc3 in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f91a8e13700 (LWP 903561)):
#0  0x00007f91a91a50f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f91a9a354b3 in epoll_dispatch () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#2  0x00007f91a9a2bc95 in event_base_loop () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#3  0x00007f91a8ec17e1 in progress_engine () from /project/boegelbot/Rocky8/haswell/software/PMIx/4.2.2-GCCcore-12.2.0/lib/libpmix.so.2
#4  0x00007f91a947517a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f91a91a4dc3 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f91a90a3740 (LWP 903560)):
#0  0x00007f91a9199a41 in poll () from /lib64/libc.so.6
#1  0x00007f91a9a34825 in poll_dispatch () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#2  0x00007f91a9a2bc95 in event_base_loop () from /project/boegelbot/Rocky8/haswell/software/libevent/2.1.12-GCCcore-12.2.0/lib/libevent_core-2.1.so.7
#3  0x0000000000401399 in orterun ()
#4  0x00007f91a90cb493 in __libc_start_main () from /lib64/libc.so.6
#5  0x000000000040113e in _start ()

@yosefe
Contributor

yosefe commented Dec 6, 2023

A quick search shows it could be this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514

@bedroge
Author

bedroge commented Dec 12, 2023

> A quick search shows it could be this issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514

Thanks, I looked into this a bit, though I'm not sure I completely understood that issue. I tried recompiling OpenBLAS with USE_TLS=0, but that didn't make a difference. Or did you mean that it could be a similar issue, but between glibc and UCX? Is there any way I can test this?

By the way, I also tried the same compiler toolchain with an older UCX version (1.10.0), and that worked fine.

@yosefe
Contributor

yosefe commented Dec 12, 2023

> > A quick search shows it could be this issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514
>
> Thanks, I looked into this a bit, and I'm not sure if I completely understood that issue. But I tried recompiling OpenBLAS with USE_TLS=0, but that didn't make a difference. Or did you mean that it could be a similar issue, but between glibc and UCX? Is there any way I can test this somehow?
>
> By the way, I also tried to use the same compiler toolchain but with an older UCX version (1.10.0), and that did work fine.

It seems like an issue between glibc and UCX: a deadlock between one thread reading a TLS value and another thread calling dlclose(). dlclose() takes the TLS lock and then calls the UCX destructor, which tries to stop a thread that is reading TLS and is stuck waiting on that same TLS lock.
One workaround I can think of is for the main thread, when spawning the async thread, to wait until that thread has got past the point of reading the TLS value before returning to the main flow and allowing dlclose() to happen.
UCX version 1.10.0 did not use TLS.
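
The lock ordering described above can be modelled in a few lines of Python. This is only an illustrative sketch, not UCX or glibc code: `tls_lock` stands in for the lock that dlclose() holds, `async_worker` for the UCX async thread, and the `simulate` function and its `wait_for_tls_read` flag are hypothetical names made up for this example. Timeouts are used so that the model reports the deadlock instead of actually hanging.

```python
import threading


def simulate(wait_for_tls_read: bool, timeout: float = 0.5) -> bool:
    """Return True if the modelled dlclose() completes, False if it would deadlock.

    Everything here is a stand-in (assumption): tls_lock models the lock taken
    by dlclose(), async_worker models the async thread's first TLS read.
    """
    tls_lock = threading.Lock()
    lock_held = threading.Event()       # main thread is inside "dlclose()"
    past_tls_read = threading.Event()   # worker has finished its first TLS read
    stop_requested = threading.Event()  # "destructor" asked the worker to stop

    def async_worker() -> None:
        if not wait_for_tls_read:
            # Force the unlucky interleaving: dlclose() grabs the lock first.
            lock_held.wait()
        # First TLS access is modelled as needing the lock. A real thread
        # would block here indefinitely; the timeout keeps the model finite.
        if not tls_lock.acquire(timeout=timeout):
            return
        tls_lock.release()
        past_tls_read.set()
        stop_requested.wait()  # normal operation until told to stop

    worker = threading.Thread(target=async_worker, daemon=True)
    worker.start()

    if wait_for_tls_read:
        # Proposed workaround: do not return to the main flow until the
        # worker is past the point where it reads TLS.
        past_tls_read.wait()

    # Modelled dlclose(): take the lock, then run the "destructor", which
    # asks the worker to stop and waits for it while still holding the lock.
    with tls_lock:
        lock_held.set()
        stop_requested.set()
        worker.join(timeout=timeout / 2)
        completed = not worker.is_alive()

    worker.join()  # lock released, so the worker can always finish now
    return completed


if __name__ == "__main__":
    print("without workaround, dlclose completes:", simulate(False))
    print("with workaround, dlclose completes:", simulate(True))
```

Without the handshake the worker is still waiting for the lock when the destructor tries to join it, so the join cannot succeed while the lock is held. With the handshake the worker has already passed the TLS read, so it only needs the stop request to exit.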
