Bug when executing on Intel MIC #1280

Closed
dmarce1 opened this issue Sep 29, 2014 · 26 comments

Comments

@dmarce1
Member

dmarce1 commented Sep 29, 2014

When executing my code (https://github.com/dmarce1/xtree) on a single one of SuperMIC's Phi co-processors, I get the following error:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<hpx::exception> >'
  what():  failed to set thread affinity mask (0x0000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111) for cpuset 0x000fffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffef: HPX(kernel_error)
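
The cpuset in the message above is hard to read at a glance. As a rough aid, the sketch below counts how many processing units it names, assuming the string follows hwloc's usual textual form of comma-separated 32-bit hexadecimal words with the most significant word first. The function name `count_cpuset_bits` is made up for this illustration and is not part of HPX or hwloc.

```shell
#!/bin/sh
# Count the set bits in a cpuset string such as "0x000fffff,0xffffffef".
# Assumption: comma-separated 32-bit hex words, most significant first.
count_cpuset_bits() {
    total=0
    old_ifs=$IFS
    IFS=,
    for word in $1; do
        value=$(($word))              # hex word -> integer
        while [ "$value" -gt 0 ]; do
            total=$((total + (value & 1)))
            value=$((value >> 1))
        done
    done
    IFS=$old_ifs
    echo "$total"
}

# The cpuset copied from the error message above.
cpuset='0x000fffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffef'
count_cpuset_bits "$cpuset"   # prints 243
```

The lowest word, 0xffffffef, has bit 4 cleared, so the requested mask covers every listed processing unit except number 4.
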
@hkaiser
Member

hkaiser commented Sep 29, 2014

What command line options have you used to run your application?

@hkaiser
Member

hkaiser commented Sep 29, 2014

Also, what version of HWLOC do you use on the Phi?

@dmarce1
Member Author

dmarce1 commented Sep 30, 2014

I tried it with no options, I tried it with -t1, and I tried it with -t(other than one). Same result each time.

HWLOC is version 1.9.1

@hkaiser
Member

hkaiser commented Sep 30, 2014

Thanks. Could you provide us with your configuration and build command lines and logs for this, please? I assume this is on Stampede?

@dmarce1
Member Author

dmarce1 commented Sep 30, 2014

It's on SuperMIC.

Here is the configuration log:
https://gist.github.com/dmarce1/5a2036679bf61f136155

The config.sh file I am using reads:


if [ -f CMakeCache.txt ]; then
    rm -rf CMakeCache.txt
fi

SOURCE=/work/dmarce1/builds/hpx
INSTALL=/home/dmarce1/mic

LOG=config_$(date "+%Y.%m.%d_%H.%M.%S").log
LOGLEVEL="WARN"

cmake -DCMAKE_CXX_COMPILER=/home/dmarce1/intel/bin/icpc \
    -DCMAKE_BUILD_TYPE=Release \
    -DHPX_THREAD_GUARD_PAGE=OFF \
    -DCMAKE_INSTALL_PREFIX="${INSTALL}" \
    -DHPX_HAVE_PARCELPORT_MPI=ON \
    -DHPX_CMAKE_LOGLEVEL="${LOGLEVEL}" \
    -DBOOST_ROOT=$HOME/mic \
    -DHWLOC_ROOT=$HOME/mic \
    -DTBBMALLOC_ROOT=/home/dmarce1/intel/composer_xe_2015/tbb/lib/mic \
    -DHPX_NATIVE_MIC=On \
    -DCMAKE_CXX_FLAGS="-mmic -std=c++11 -w" \
    -DHPX_MALLOC=tbbmalloc \
    -DHPX_BUILD_EXAMPLES=OFF \
    -DHPX_BUILD_TESTS=OFF \
    -Wdev ${SOURCE} 2>&1 | tee ${LOG}

ln -fs $LOG config_latest.log


./config.sh
make -j20
make install

I'm using the latest HPX (as of yesterday), Boost 1.55.0, Intel 14 with GCC 4.9.0 headers, and the Intel MPI library.

@hkaiser
Member

hkaiser commented Sep 30, 2014

I added some additional logging output (master branch). Could you please recompile, run your application with --hpx:debug-hpx-log=log.txt, and send the resulting log.txt as well?

@hkaiser
Member

hkaiser commented Sep 30, 2014

Also, our build system has changed quite a bit recently. For up-to-date instructions on how to build HPX for the Phi, please see: http://stellar-group.github.io/hpx/docs/html/hpx/manual/build_system/building_hpx/build_recipes.html#hpx.manual.build_system.building_hpx.build_recipes.intel_mic_installation

@dmarce1
Member Author

dmarce1 commented Sep 30, 2014

I get this when I try to build now - at the very end of the build:

/work/dmarce1/builds/hpx/src/components/iostreams/component_module.cpp:(.text+0x8ca): undefined reference to `hpx::lcos::future<hpx::naming::id_type> hpx::iostreams::detail::create_ostream<hpx::iostreams::detail::cout_tag>(hpx::iostreams::detail::cout_tag)'

Cmake script.sh
https://gist.github.com/dmarce1/2fb723e3af9bccfc5a94

config log:

https://gist.github.com/dmarce1/74c71159f0037b91d67e

(things didn't work using the XeonPhi.cmake file, so I copied everything over manually)

@hkaiser
Member

hkaiser commented Sep 30, 2014

Are you sure you have the iostreams component as a dependency of your project?

@dmarce1
Member Author

dmarce1 commented Sep 30, 2014

How do I make sure of that?

@hkaiser
Member

hkaiser commented Sep 30, 2014

Your application has to link with $HPX_PREFIX/lib/hpx/libiostreams.so.
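
As a rough illustration of that advice, the helper below prints linker flags that point at lib/hpx under the HPX install prefix. It is only a sketch: the function name `hpx_iostreams_ldflags`, the placeholder prefix, and the added rpath flag are assumptions, and a real link line would also need the other HPX libraries.

```shell
#!/bin/sh
# Print linker flags for libiostreams.so under <prefix>/lib/hpx.
# Only the library location comes from the comment above; the rpath
# flag is an assumption so the library can be found at run time.
hpx_iostreams_ldflags() {
    prefix=$1
    libdir="$prefix/lib/hpx"
    printf '%s\n' "-L$libdir -liostreams -Wl,-rpath,$libdir"
}

# Hypothetical usage (my_app.o and the prefix are placeholders):
#   icpc -mmic my_app.o $(hpx_iostreams_ldflags /home/user/hpx) -o my_app
hpx_iostreams_ldflags "${HPX_PREFIX:-/home/user/hpx}"
```
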

@dmarce1
Member Author

dmarce1 commented Oct 1, 2014

I'm getting the link error during the HPX build, not the application build.

@hkaiser
Member

hkaiser commented Oct 1, 2014

I committed a change to the iostreams module which should prevent the linker problem. Please try again.

hkaiser added a commit that referenced this issue Oct 1, 2014
@dmarce1
Member Author

dmarce1 commented Oct 1, 2014

Got it to run. Here is the log.txt file:

https://gist.github.com/dmarce1/f0ea8bc8e17524133ed3

@dmarce1
Member Author

dmarce1 commented Oct 6, 2014

I am getting the exact same error on Stampede.

@hkaiser
Member

hkaiser commented Oct 6, 2014

That's unexpected as others have run HPX on Stampede. It might be a problem with HWLOC, though. Could you try using an older version (1.8.x)?

@dmarce1
Member Author

dmarce1 commented Oct 7, 2014

I just tried with hwloc 1.8.1 on SuperMIC - same results.

@sithhell
Member

sithhell commented Oct 8, 2014

What version of Boost are you using?

@dmarce1
Member Author

dmarce1 commented Oct 8, 2014

1.55.0

@sithhell
Member

sithhell commented Oct 8, 2014

OK, this is strange. I can't get HPX built with Boost 1.55 or 1.56 on Stampede... How did you manage to do that?

@dmarce1
Member Author

dmarce1 commented Oct 8, 2014

I used an evaluation version of Intel 15, which uses GCC 4.9; maybe that's the difference. My application code won't compile with Intel 14 or earlier.

@hkaiser
Member

hkaiser commented Oct 11, 2014

Thomas told me yesterday that he is not able to reproduce your issues as it does run for him on Stampede. He will give you access to his prebuilt versions of boost and HPX for you to try with your application.

@sithhell
Member

I cannot reproduce this error when using Intel 15 together with GCC 4.8, Boost 1.56, and hwloc 1.10.

@dmarce1
Member Author

dmarce1 commented Oct 14, 2014

I am able to get my code to execute if I build with HPX_WITH_HWLOC=OFF.

When I try to build HPX using hwloc built as per the link above, I get an error saying I need to rebuild hwloc with -fPIC. When I do this, I am able to build HPX, but then when I run my application I get the error above.

@dmarce1
Member Author

dmarce1 commented Oct 14, 2014

I can get things to work with hwloc (and MPI) now on SuperMIC. I believe the difference is that I was using --host=k1om instead of --host=x86_64-k1om-linux. I need to recompile everything to make sure, but things do appear to be working.
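
For anyone retracing this, the sketch below prints, without running it, a configure line for hwloc that uses the full host triplet. Only the --host value, -mmic, and -fPIC come from this thread. The function name `hwloc_configure_cmd`, CC=icc, and the prefix argument are assumptions. The short form is rejected purely to mirror the tentative diagnosis above, not because it is known to be invalid in general.

```shell
#!/bin/sh
# Dry run: print an hwloc configure command for a native Xeon Phi build.
# Assumptions: CC=icc and the install prefix passed as $1.
hwloc_configure_cmd() {
    prefix=$1
    host=${2:-x86_64-k1om-linux}
    case $host in
        *-*-*) ;;   # looks like a full cpu-vendor-os triplet
        *)
            echo "refusing short host '$host'; expected e.g. x86_64-k1om-linux" >&2
            return 1
            ;;
    esac
    printf '%s\n' "./configure --host=$host --prefix=$prefix CC=icc CFLAGS=\"-mmic -fPIC\""
}

hwloc_configure_cmd "$HOME/mic"
```

Passing `k1om` as the second argument makes the function fail with a message instead of printing a command.
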

@dmarce1 dmarce1 closed this as completed Oct 14, 2014
@hkaiser
Member

hkaiser commented Oct 14, 2014

Thanks for letting us know. This is a really subtle problem. We should make a note of that.
