Segfault on Stampede #1149

Closed
Finomnis opened this Issue Jun 7, 2014 · 7 comments

Contributor

Finomnis commented Jun 7, 2014

I am trying to build and run the simplest_hello_world HPX example on the Stampede cluster. It works fine as long as I run it without command-line arguments. However, as soon as I add any command-line argument (even a simple '-h'), the example segfaults.

These are the modules loaded throughout the entire process:

  1) TACC-paths      4) cluster       7) cuda/5.5          10) mvapich2/2.0b
  2) Linux           5) TACC          8) intel/14.0.1.106  11) python/2.7.6
  3) cluster-paths   6) cmake/2.8.9   9) boost/1.51.0

The entire process takes place on one of the dev nodes, to which I connect with:

srun -p development      \
     -t 01:00:00         \
     -n 1                \
     -N 1                \
     --pty               \
     bash

This is my build process:

     CC=$ICC_BIN/icc ./autogen.sh --prefix=$INSTALL_PATH
     make dist -k -j16
     make install -k -j16
  cmake                                                       \
        -DHPX_NO_INSTALL=On                                   \
        -DCMAKE_AR=$ICC_BIN/xiar                              \
        -DCMAKE_CXX_COMPILER=$ICC_BIN/icpc                    \
        -DCMAKE_C_COMPILER=$ICC_BIN/icc                       \
        -DCMAKE_BUILD_TYPE=Debug                              \
        -DBOOST_ROOT=$TACC_BOOST_DIR                          \
        -DHPX_HAVE_PARCELPORT_MPI=True                        \
        -DHPX_MALLOC="jemalloc"                               \
        -DJEMALLOC_ROOT=$JEMALLOC_INSTALL_PATH                \
        -Wdev                                                 \
        $HPX_REPOSITORY_PATH
  • build hpx with
     make -k -j16
     make -k -j16 examples
  • running simplest_hello_world should work:
     ./build/bin/simplest_hello_world
  • running the same program with command line arguments segfaults:
     ./build/bin/simplest_hello_world --help

The same problem exists with every command line argument I tried.

I then ran it through gdb to get the backtraces:
https://gist.github.com/Finomnis/209319bc051a18df5d13
https://gist.github.com/Finomnis/2bc97b48265b61fbfa63

Any ideas?

hkaiser added this to the 0.9.9 milestone Jun 7, 2014

Member

hkaiser commented Jun 7, 2014

You said you loaded Boost V1.51, but the backtraces refer to Boost V1.55. I'm pretty sure that's causing your segfaults.

Contributor

Finomnis commented Jun 7, 2014

Oh, yeah, my mistake.
I used Boost V1.55 for a while because I thought the old Boost library was causing the problem, but I have since reverted to V1.51.
I can create a new backtrace with the correct Boost version.
Sadly, the Boost version is not what causes the segfault, though.

Contributor

Finomnis commented Jun 7, 2014

Member

hkaiser commented Jun 7, 2014

Thanks for this update. It looks like it's failing at a different place now. However, I'm still convinced that this is caused by a versioning mixup. There simply is no other explanation for this behavior. Would you mind thoroughly checking which versions of Boost were used at compile time and at runtime? Were these versions of Boost compiled with the same compiler?
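
One way to run that check, as a rough sketch: Boost shared libraries usually carry their version in the file name (for example libboost_system.so.1.55.0), so filtering the `ldd` output of the failing binary shows which versions are actually resolved at runtime. The helper name `boost_versions` is made up for illustration, and the binary path is the one from the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print the distinct
# Boost versions encoded in the shared library file names.
boost_versions() {
    grep -o 'libboost_[A-Za-z0-9_]*\.so\.[0-9][0-9.]*' \
        | sed 's/.*\.so\.//' \
        | sort -u
}

# Usage (run where the binary was built; adjust the path as needed):
#   ldd ./build/bin/simplest_hello_world | boost_versions
```

If this prints more than one version, or a version other than the one the loaded module provides, the build and the runtime environment disagree. Running `grep -i boost CMakeCache.txt` in the build directory should show which Boost paths were picked up at configure time.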

Member

hkaiser commented Jun 7, 2014

One more thing: I don't think Intel 14 relies on the libc from gcc 4.4.7; that does not sound right.

Contributor

Finomnis commented Jun 7, 2014

I did not compile Boost myself; it's a prebuilt version from the cluster environment. Therefore I don't really know which compiler they used ...
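
For a prebuilt library, the compiler can often be guessed from the identification strings that toolchains leave in the ELF `.comment` section. The following is only a sketch under that assumption: the helper name `compiler_idents` and the library path in the usage line are placeholders, and the patterns cover only the common GCC, Intel and clang banners.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of `readelf -p .comment <lib>`
# (or `strings -a <lib>`) on stdin and keep only lines that look like
# compiler identification banners.
compiler_idents() {
    grep -E 'GCC: \(|Intel\(R\).*Compiler|clang version' \
        | sed 's/^ *\[ *[0-9a-f]*\] *//' \
        | sort -u
}

# Usage (the library path is a placeholder, not taken from the cluster):
#   readelf -p .comment /path/to/libboost_system.so | compiler_idents
```

Treat the result as a hint rather than proof: a `GCC:` line can also come from system startup objects linked into a library that was otherwise built with another compiler, and stripped libraries may carry no banner at all.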

Contributor

Finomnis commented Jun 7, 2014

The only thing I built myself was jemalloc.
I just switched my config to:

  1) TACC-paths      5) TACC                     9) gcc/4.7.1
  2) Linux           6) cmake/2.8.9             10) boost/1.51.0
  3) cluster-paths   7) cuda/5.5                11) mvapich2/1.9a2
  4) cluster         8) python/2.7.3-epd-7.3.2

and it seems to be working now.
Still, I can hardly believe that the default Intel compiler configuration on an Intel cluster is faulty...

Finomnis closed this Jun 7, 2014
