Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in dealii::Threads::Task<...>::TaskData::~TaskData() with libc++ #15478

Closed
tamiko opened this issue Jun 25, 2023 · 6 comments · Fixed by #15492
Closed

Segmentation fault in dealii::Threads::Task<...>::TaskData::~TaskData() with libc++ #15478

tamiko opened this issue Jun 25, 2023 · 6 comments · Fixed by #15492

Comments

@tamiko
Copy link
Member

tamiko commented Jun 25, 2023

There are two peculiarly failing tests on the Clang-16 libc++ regression tester variant:

#0  0x00007fffb52b3cc0 in pthread_mutex_lock () from /usr/lib64/libc.so.6
#1  0x00007fffb778b116 in std::__1::mutex::lock() () from /usr/lib64/libc++.so.1
#2  0x00007fffb7788f00 in std::__1::__assoc_sub_state::wait() () from /usr/lib64/libc++.so.1
#3  0x00005555556d68d9 in std::__1::future<int>::wait[abi:v160006]() const (this=0x611000002740) at /usr/include/c++/v1/future:1087
#4  dealii::Threads::Task<int>::TaskData::wait (this=<optimized out>) at /srv/testsuite/dealii/include/deal.II/base/thread_management.h:1399
#5  0x00005555556d665d in dealii::Threads::Task<int>::TaskData::~TaskData (this=0x18) at /srv/testsuite/dealii/include/deal.II/base/thread_management.h:1350
#6  std::__1::__shared_ptr_emplace<dealii::Threads::Task<int>::TaskData, std::__1::allocator<dealii::Threads::Task<int>::TaskData> >::__on_zero_shared (this=0x611000002700)
    at /usr/include/c++/v1/__memory/shared_ptr.h:308
#7  0x00005555556cce23 in std::__1::__shared_count::__release_shared[abi:v160006]() (this=0x611000002700) at /usr/include/c++/v1/__memory/shared_ptr.h:157
#8  std::__1::__shared_weak_count::__release_shared[abi:v160006]() (this=0x611000002700) at /usr/include/c++/v1/__memory/shared_ptr.h:198
#9  std::__1::shared_ptr<dealii::Threads::Task<int>::TaskData>::~shared_ptr[abi:v160006]() (this=0x7fffa8414070) at /usr/include/c++/v1/__memory/shared_ptr.h:745
#10 dealii::Threads::Task<int>::~Task (this=0x7fffa8414070) at /srv/testsuite/dealii/include/deal.II/base/thread_management.h:983
#11 main () at /srv/testsuite/dealii/tests/multithreading/task_01_exception_02.cc:52

The code surrounding this issue is quite tricky because it relies on implementation details not guaranteed by the standard. BUT, the std::future in question [1] should still be alive when this lock is taken...

I will set this issue to high priority for the time being - simply because I think this is important to figure out for the release (and once this is fixed, we should have zero failing tests for the clang-16 libc++ regression tester. At least apart from a small amount of test failures triggered by bugs in external libraries.)

[1] https://github.com/dealii/dealii/blob/master/include/deal.II/base/thread_management.h#L1427

In reference to #15383

@kronbichler
Copy link
Member

Thanks for the investigation. As I've not written the code, could you please elaborate on

The code surrounding this issue is quite tricky because it relies on implementation details not guaranteed by the standard. BUT, the std::future in question [1] should still be alive when this lock is taken...

More precisely, where are you seeing the reliance on implementation details, where one could start to look? Is it the fact that std::promise goes out of scope when the task gets ready in

task_data->task_group.run(
[function_object,
promise =
std::shared_ptr<std::promise<RT>>(std::move(promise))]() {
try
{
internal::evaluate_and_set_promise(function_object, *promise);
}
catch (...)
{
try
{
// store anything thrown in the promise
promise->set_exception(std::current_exception());
}
catch (...)
{
// set_exception() may throw too. But ignore this on
// the task.
}
}
});
- but that exception mechanism looks similar to what the C++ page suggests: https://en.cppreference.com/w/cpp/thread/promise/set_exception

@tamiko
Copy link
Member Author

tamiko commented Jun 26, 2023

@kronbichler The difference that I can see between the cppreference example und our implementation is the fact that we move the promise into a lambda for the run() function. So maybe that's ultimately the issue and the difference between the libstdc++ and libc++ implementations. Namely that when accessing the effect (result or exception) of the promise via a future the promise object must still be alive somewhere.

But apart from that - I don't know what I was writing any more. It was very late.

@kronbichler
Copy link
Member

Yes, this was my recollection as well. @bangerth you originally wrote the code in question, any comment on how to proceed? I would suggest to try to keep the std::promise alive next to the future and see if it helps. But I admit I am not qualified here and just guessing what might help.

@bangerth
Copy link
Member

The thing is that the promise variable isn't an object, but is declared like this:

          std::unique_ptr<std::promise<RT>> promise =
            std::make_unique<std::promise<RT>>();

So when in the lambda declaration we do

            [function_object,
             promise =
               std::shared_ptr<std::promise<RT>>(std::move(promise))]() {

we don't actually move the object, we just transfer ownership from a std::unique_ptr to a std::shared_ptr. This is option 13 here: https://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr I see no bug with this code.

@bangerth
Copy link
Member

The error happens in the destructor of the TaskData object calling std::future::wait(). The future is created by the promise we talked about above:
https://github.com/dealii/dealii/blob/master/include/deal.II/base/thread_management.h#L1006-L1009
We properly move the object here because std::future cannot be copied. The TaskData constructor then again correctly moves the object into its own member variable:
https://github.com/dealii/dealii/blob/master/include/deal.II/base/thread_management.h#L1285-L1288
I don't see anything wrong with this approach.

@bangerth
Copy link
Member

But I think I see what's happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants