This repository was archived by the owner on Apr 2, 2025. It is now read-only.

hpcrun problem #4

@shuraG

Hi,

I have a WRF build that I run with OpenMPI. I want to profile and trace the MPI processes, so I want to use HPCToolkit. However, when I run hpcrun ./wrf.exe, I get the following error:

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------

Also, the process prints the following log, and the message keeps repeating over and over:

[cluster:30154] *** Process received signal ***
[cluster:30154] Signal: Segmentation fault (11)
[cluster:30154] Signal code: Address not mapped (1)
[cluster:30154] Failing at address: 0x8
[cluster:30154] [ 0] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/ext-libs/libmonitor.so(+0x69a5)[0x2b372f3299a5]
[cluster:30154] [ 1] /lib64/libpthread.so.0[0x30aa40f500]
[cluster:30154] [ 2] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(+0x2b202)[0x2b372f0f4202]
[cluster:30154] [ 3] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(hpcrun_loadmap_map+0x1ba)[0x2b372f0dd6aa]
[cluster:30154] [ 4] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(fnbounds_ensure_mapped_dso+0xa4)[0x2b372f0eec04]
[cluster:30154] [ 5] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(+0x2649b)[0x2b372f0ef49b]
[cluster:30154] [ 6] /lib64/libc.so.6(dl_iterate_phdr+0xf6)[0x30a9925596]
[cluster:30154] [ 7] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(hpcrun_dlopen+0xac)[0x2b372f0eeefc]
[cluster:30154] [ 8] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/libhpcrun.so(monitor_dlopen+0x8b)[0x2b372f0daacb]
[cluster:30154] [ 9] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/ext-libs/libmonitor.so(dlopen+0xaa)[0x2b372f3277b3]
[cluster:30154] [10] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(+0x62c90)[0x2b372f826c90]
[cluster:30154] [11] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(mca_base_component_repository_open+0x1c7)[0x2b372f80b927]
[cluster:30154] [12] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(mca_base_component_find+0x29a)[0x2b372f80af8a]
[cluster:30154] [13] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(mca_base_framework_components_register+0x2a)[0x2b372f81489a]
[cluster:30154] [14] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(mca_base_framework_register+0x1e0)[0x2b372f814d00]
[cluster:30154] [15] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-pal.so.20(mca_base_framework_open+0x1f)[0x2b372f814ebf]
[cluster:30154] [16] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/openmpi/mca_ess_hnp.so(+0x3935)[0x2b3730f63935]
[cluster:30154] [17] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-rte.so.20(orte_init+0x253)[0x2b372f55a613]
[cluster:30154] [18] /home/brayme.guaman/Build_WRF/LIBRARIES/openmpi/lib/libopen-rte.so.20(orte_daemon+0x420)[0x2b372f5779d0]
[cluster:30154] [19] orted[0x400816]
[cluster:30154] [20] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/ext-libs/libmonitor.so(monitor_main+0xe8)[0x2b372f3312c0]
[cluster:30154] [21] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30a981ecdd]
[cluster:30154] [22] /home/brayme.guaman/Build_HPCToolkit/hpctoolkit/BUILD/lib/hpctoolkit/ext-libs/libmonitor.so(__libc_start_main+0x1c1)[0x2b372f33149e]
[cluster:30154] [23] orted[0x4006e9]
[cluster:30154] *** End of error message ***

Can anyone guide me, please?

Cheers.
