Negative entry in reference count table #1031

Closed
eschnett opened this issue Dec 10, 2013 · 9 comments

@eschnett
Contributor

After the recent flurry of commits to HPX I thought I should give it a try again. When running on 2 processes with 6 threads each, I received the following segfault. Are you interested in a more complete bug report?

{stack-trace}: 13 frames:
0x7f616fab7490 : hpx::util::backtrace::backtrace(unsigned long) + 0x80 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fab7646 : hpx::util::trace_on_new_stack() + 0x1e in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fab3a46 : hpx::detail::backtrace() + 0x18 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fab7a75 : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x92 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fab80a0 : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x38 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fb80562 : hpx::agas::server::primary_namespace::decrement_sweep(std::list<boost::fusion::vector3<hpx::agas::gva, hpx::naming::gid_type, hpx::naming::gid_type>, std::allocator<boost::fusion::vector3<hpx::agas::gva, hpx::naming::gid_type, hpx::naming::gid_type> > >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, long, hpx::error_code&) + 0x744 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fb7e356 : hpx::agas::server::primary_namespace::change_credit_non_blocking(hpx::agas::request const&, hpx::error_code&) + 0x1cc in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fb7a6ce : hpx::agas::server::primary_namespace::service(hpx::agas::request const&, hpx::error_code&) + 0x2c8 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fbecd17 : hpx::agas::server::primary_namespace::remote_service(hpx::agas::request const&) + 0x2f in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fbf507d : hpx::util::detail::vtable::type<hpx::util::detail::bound<hpx::actions::base_result_action1<hpx::agas::response (hpx::agas::server::primary_namespace::*)(hpx::agas::request const&), &hpx::agas::server::primary_namespace::remote_service, hpx::actions::result_action1<hpx::agas::response (hpx::agas::server::primary_namespace::*)(hpx::agas::request const&), &hpx::agas::server::primary_namespace::remote_service, hpx::actions::detail::this_type> >::thread_function, hpx::util::tuple<unsigned long, hpx::agas::request, void, void, void, void, void, void> >, hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), void, void>::invoke(void*, hpx::threads::thread_state_ex_enum&&) + 0x41a in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fb12ad8 : hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>::operator()() + 0x10e in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
0x7f616fb0f5f9 : void hpx::util::coroutines::detail::lx::trampoline<hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator> >(hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>
) + 0x18 in /xfs1/eschnetter/compute/hpx/lib/hpx/libhpxd.so.0
{locality-id}: 1
{hostname}: 127.0.0.1:7911:1
{process-id}: 7619
{function}: primary_namespace::decrement_sweep
{file}: /xfs1/eschnetter/compute/src/hpx/src/runtime/agas/server/primary_namespace_server.cpp
{line}: 909
{os-thread}: 4, worker-thread#4
{thread-id}: 00007f6173f40230
{thread-description}:
{config}:
HPX_HAVE_NATIVE_TLS=ON
HPX_HAVE_STACKTRACES=ON
HPX_HAVE_COMPRESSION_BZIP2=OFF
HPX_HAVE_COMPRESSION_SNAPPY=OFF
HPX_HAVE_COMPRESSION_ZLIB=OFF
HPX_HAVE_PARCEL_COALESCING=ON
HPX_HAVE_PARCELPORT_SHMEM=OFF
HPX_HAVE_PARCELPORT_IBVERBS=OFF
HPX_HAVE_VERIFY_LOCKS=OFF
HPX_HAVE_HWLOC=ON
HPX_HAVE_ITTNOTIFY=OFF
HPX_LIMIT=4
HPX_ACTION_ARGUMENT_LIMIT=5
HPX_COMPONENT_CREATE_ARGUMENT_LIMIT=5
HPX_FUNCTION_ARGUMENT_LIMIT=8
HPX_LOCK_LIMIT=5
HPX_TUPLE_LIMIT=8
HPX_WAIT_ARGUMENT_LIMIT=5
HPX_PARCEL_MAX_CONNECTIONS=512
HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
HPX_PREFIX=/xfs1/eschnetter/compute/hpx
{version}: V0.9.8-trunk (AGAS: V3.0), Git: 6465ba6
{boost}: V1.49.0
{build-type}: debug
{date}: Dec 10 2013 15:00:55
{platform}: linux
{compiler}: GNU C++ version 4.7.2
{stdlib}: GNU libstdc++ version 20120920
{what}: negative entry in reference count table, lower({0000000200000001, 0000000000001066}) upper({0000000200000001, 0000000000001066}), count(-247): HPX(invalid_data)
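For readers unfamiliar with this failure mode, the following is a minimal sketch of what a "negative entry in reference count table" check amounts to. It is not HPX's `primary_namespace` code: the class `refcount_table`, its member names, and the use of a plain 64-bit integer as the key are all hypothetical stand-ins chosen for illustration. The idea is only that a decrement request which would drive a count below zero (more credits returned than were ever handed out) is rejected as invalid data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a reference count ("credit") table keyed by a
// global id. This is NOT the HPX implementation, only an illustration.
class refcount_table {
public:
    // Add `credits` references to `id`.
    void increment(std::uint64_t id, std::int64_t credits) {
        assert(credits > 0);
        counts_[id] += credits;
    }

    // Remove `credits` references from `id`. Returns true when the count
    // reaches zero (the entry is erased). Throws if the result would be
    // negative, i.e. more decrements than increments were seen for `id`.
    bool decrement(std::uint64_t id, std::int64_t credits) {
        assert(credits > 0);
        std::int64_t const updated = count(id) - credits;
        if (updated < 0) {
            throw std::runtime_error(
                "negative entry in reference count table, count(" +
                std::to_string(updated) + ")");
        }
        if (updated == 0) {
            counts_.erase(id);
            return true;
        }
        counts_[id] = updated;
        return false;
    }

    // Current count for `id`; ids without an entry count as zero.
    std::int64_t count(std::uint64_t id) const {
        auto const it = counts_.find(id);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<std::uint64_t, std::int64_t> counts_;
};
```

In this sketch, a decrement arriving for an id whose entry is missing or already exhausted produces the same kind of message as the `{what}` line above. In a distributed setting such a state can arise when decrement and increment requests for the same id are processed out of order, which is one plausible reading of the report but is not confirmed by the trace itself.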

@hkaiser
Member

hkaiser commented Dec 10, 2013

Yes, please. This is probably caused by the new code which I wrote to fix the other issue. I must have overlooked something.... :/

@eschnett
Contributor Author

I placed a simplified version of my source code into the git repository at https://bitbucket.org/eschnett/block-matrix/branch/issue-1031.

I build HPX with

cmake -DCMAKE_BUILD_TYPE=Debug -DHPX_HAVE_PARCELPORT_MPI=ON \
    -DHPX_MALLOC=system \
    -DCMAKE_INSTALL_PREFIX=$HPX_DIR $(pwd)/../hpx

and my application with

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_Fortran_COMPILER=gfortran \
    -DHPX_ROOT=$HPX_DIR

I run my application via

mpirun -x HPX_HAVE_PARCELPORT_TCPIP=0 -x MKL_NUM_THREADS=1 -x OPENBLAS_MAIN_FREE=1 \
    -x OPENBLAS_NUM_THREADS=1 -np 2 ./bin/block_matrix --hpx:1:pu-offset=12 --hpx:affinity=core \
    --hpx:threads=6 --hpx:pu-step=2 --hpx:print-bind --hpx:debug-hpx-log=block_matrix.log --nsize=200 \
    --nblocks=12

My application runs for a few seconds (less than a minute), and then reports an error similar to the one described above.

@hkaiser
Member

hkaiser commented Dec 11, 2013

Erik, your error output above points to the git version 6465ba6, which is 3 months old. May I ask you to verify that you are running against the newest version from the master branch? It might just be a stale entry in the CMakeCache.txt after all, but I would like to make sure it's not the cause of the issues you're seeing now.

@ghost ghost assigned hkaiser Dec 11, 2013
@eschnett
Contributor Author

My HPX version is definitely up to date. For example, I now need to use make_ready_future instead of simply calling future's constructor with a value.
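As an illustration of the API change mentioned here: in HPX the call is `hpx::make_ready_future(value)`. The sketch below shows the same idea using only the C++ standard library so that it compiles without HPX. The helper names `make_ready_future_sketch` and `is_ready` are hypothetical and are not part of HPX or the standard library.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <type_traits>
#include <utility>

// Hypothetical standard-library stand-in for hpx::make_ready_future:
// returns a future that already holds `value`.
template <typename T>
std::future<typename std::decay<T>::type> make_ready_future_sketch(T&& value) {
    std::promise<typename std::decay<T>::type> p;
    p.set_value(std::forward<T>(value));  // the shared state becomes ready here
    auto f = p.get_future();
    assert(f.valid());
    return f;
}

// Returns true if the future's value can be retrieved without blocking.
template <typename T>
bool is_ready(std::future<T> const& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A caller would write `auto f = make_ready_future_sketch(42);` where older code constructed the future directly from the value.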

I copied my HPX source tree to a different machine, excluding the .git directory. I now see that an old version of the .git directory is still there, and is inconsistent with the source tree. I assume this is where HPX incorrectly picked up its version number.

My source tree is at 01a9973.

@hkaiser
Member

hkaiser commented Dec 11, 2013

Sure. I was concerned you had a stale binary lying around, as the version number HPX is reporting is compiled into the lib. I'll try to reproduce your problem today.

@hkaiser
Member

hkaiser commented Dec 11, 2013

Also, could you send the log created by such a failing run to me offline, please?

@eschnett
Contributor Author

Sent via email.

@eschnett
Contributor Author

Email didn't work; please look at https://www.dropbox.com/sh/e7b08nhynt6v5w7/frLVWow2VS instead.

@hkaiser
Member

hkaiser commented Jan 29, 2014

This issue should be fixed by the recent merge of the fixing_incref and fixing_1021 branches. Please try again and report back to us if you still have problems.

@hkaiser hkaiser closed this as completed Jan 29, 2014