
Implementing Workstream::run() with Taskflow (no coloring) #17119

Merged: 1 commit merged into dealii:master on Aug 25, 2024

Conversation

@RyanMoulday (Contributor)

No description provided.

@tjhei (Member) commented Jun 11, 2024:

Thank you, Ryan.

FYI @bangerth (this is part 1 of several)

@tjhei (Member) commented Jun 12, 2024:

The test mpi/mesh_worker_02 is failing. We will need to investigate.

@tjhei (Member) commented Jun 12, 2024:

> The test mpi/mesh_worker_02 is failing. We will need to investigate.

Okay, the old test made the assumption that cells are visited in order. Fixed.

@tjhei (Member) commented Jun 13, 2024:

Can you reindent, @RyanMoulday?

@tjhei (Member) commented Jun 16, 2024:

We seem to be hitting some random test failures/timeouts. I think we need to investigate before we can merge this.

@tjhei (Member) commented Jun 18, 2024:

This is waiting on #17131.

@bangerth (Member) left a review:

Good start. I think it would be nice if we had a way to re-use copier objects in the same way as you are re-using scratch objects. What does the TBB implementation do in this regard?

Comment on lines -244 to +246
-  Utilities::MPI::MPI_InitFinalize mpi_initialization(
-    argc, argv, testing_max_num_threads());
-  MPILogInitAll log;
+  // Disable multithreading so that text output order is consistent
+  Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);
+  MPILogInitAll log;
Member:

Why is this necessary? What is different between the TBB and the TaskFlow implementation that makes it necessary to make this change?

Contributor Author:

Right now we don't support chunking in the TaskFlow implementation. This particular test (and the others changed) relies on the output being generated in a specific order, because it compares the output line by line. I believe this order is not explicitly guaranteed by the TBB implementation either; it just happens to be consistent because, with the default grain size, the work gets chunked in a way that runs sequentially.

The outputs for the TaskFlow implementation are correct; they just appear in a different order. I was advised that this would be the sensible change to make, since right now I don't think this test actually even runs on more than one thread per MPI rank with the TBB implementation.
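For illustration, here is a minimal standalone sketch (not from the PR; the loop bounds and grain size are made up) of the effect described above: with tbb::simple_partitioner, a range whose size does not exceed the grain size is executed as a single chunk on one thread, so the per-item output comes out in order.

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstddef>
#include <iostream>

int main()
{
  const std::size_t n_cells    = 6; // fewer cells than the grain size below
  const std::size_t grain_size = 8; // roughly WorkStream's default chunk_size

  // Because n_cells <= grain_size, simple_partitioner never splits the
  // range: the whole loop runs as one chunk, sequentially and in order.
  tbb::parallel_for(
    tbb::blocked_range<std::size_t>(0, n_cells, grain_size),
    [](const tbb::blocked_range<std::size_t> &r) {
      for (std::size_t i = r.begin(); i != r.end(); ++i)
        std::cout << "cell " << i << '\n';
    },
    tbb::simple_partitioner());
}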

Member:

I'm not sure that's true, but moreover I think that this essentially disables the test and that would be a shame. Can you show in which ways the output might differ depending on thread placement of tasks? Currently, the output of the whole test is this:

DEAL:0::DoFHandler ndofs=2
DEAL:0::*** 1. CELLS ***
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=0
DEAL:0::* own_cells=1 ghost_cells=0 own_faces=0 faces_to_ghost=0
DEAL:0::C 0_0:
DEAL:0::* own_cells=0 ghost_cells=1 own_faces=0 faces_to_ghost=0
DEAL:0::C 1_0:
DEAL:0::* own_cells=1 ghost_cells=1 own_faces=0 faces_to_ghost=0
DEAL:0::C 0_0:
DEAL:0::C 1_0:
DEAL:0::*** 2. FACES ***
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=2 faces_to_ghost=0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::F cell1 = 0_0: face = 1 cell2 = 1_0: face2 = 0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2
DEAL:0::F cell1 = 0_0: face = 1 cell2 = 1_0: face2 = 0

Which lines are unreliably ordered?

Contributor Author:

Sorry to respond with the output of a different test, but I just happen to have this one on hand.

Here is the expected output that TBB generates for mesh_worker_02:

DEAL:0::DoFHandler ndofs=7
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::F cell1 = 0_1:1 face = 3 cell2 = 0_1:3 face2 = 2
DEAL:0::F cell1 = 0_1:2 face = 1 cell2 = 0_1:3 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 1 cell2 = 0_2:01 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 3 cell2 = 0_2:02 face2 = 2
DEAL:0::F cell1 = 0_2:01 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 3 cell2 = 0_2:03 face2 = 2
DEAL:0::F cell1 = 0_2:02 face = 1 cell2 = 0_2:03 face2 = 0
DEAL:0::F cell1 = 0_2:02 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::F cell1 = 0_2:03 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:03 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2

And TaskFlow:

DEAL:0::DoFHandler ndofs=7
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::F cell1 = 0_1:1 face = 3 cell2 = 0_1:3 face2 = 2
DEAL:0::F cell1 = 0_1:2 face = 1 cell2 = 0_1:3 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 1 cell2 = 0_2:01 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 3 cell2 = 0_2:03 face2 = 2
DEAL:0::F cell1 = 0_2:00 face = 3 cell2 = 0_2:02 face2 = 2
DEAL:0::F cell1 = 0_2:03 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:02 face = 1 cell2 = 0_2:03 face2 = 0
DEAL:0::F cell1 = 0_2:03 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::F cell1 = 0_2:02 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2

Member:

Basically, the tests produce text output in the worker, which can be scheduled concurrently. The test used to work because the default chunk size with TBB is larger than the number of cells in this test. The proposed change makes sure the test will run sequentially.
I think this is okay because the purpose of the test is not parallel computing, but checking that the mesh worker visits the correct cells and faces.

Member:

I see, thank you!

Member:

I think that the change here is still awkward, because we bypass all of the interesting stuff if we have no threads, which is what happens here. On the other hand, we do test WorkStream pretty heavily in many other tests, so that might not matter.

Comment on lines 1147 to 1348
-# ifdef DEAL_II_WITH_TBB
+# ifdef DEAL_II_WITH_TASKFLOW
if (static_cast<const std::function<void(const CopyData &)> &>(copier))
{
// If we have a copier, run the algorithm:
internal::taskflow_no_coloring::run(begin,
end,
worker,
copier,
sample_scratch_data,
sample_copy_data,
queue_length,
chunk_size);
}
else
{
// There is no copier function. in this case, we have an
// embarrassingly parallel problem where we can
// essentially apply parallel_for. because parallel_for
// requires subdividing the range for which operator- is
// necessary between iterators, it is often inefficient to
// apply it directly to cell ranges and similar iterator
// types for which operator- is expensive or, in fact,
// nonexistent. rather, in that case, we simply copy the
// iterators into a large array and use operator- on
// iterators to this array of iterators.
//
// instead of duplicating code, this is essentially the
// same situation we have in the colored implementation below, so we
// just defer to that place
std::vector<std::vector<Iterator>> all_iterators(1);
for (Iterator p = begin; p != end; ++p)
all_iterators[0].push_back(p);

run(all_iterators,
worker,
copier,
sample_scratch_data,
sample_copy_data,
queue_length,
chunk_size);
}

// exit this function to not run the sequential version below:
return;
# elif defined(DEAL_II_WITH_TBB)
Member:

The only place that's different between the two implementations is the if block. I think it might be nicer to write this as

#if defined(DEAL_II_WITH_TBB) || defined(DEAL_II_WITH_TASKFLOW)
   if (have copier)
   {
#    if defined(DEAL_II_WITH_TASKFLOW)
        new code
#    elif defined(DEAL_II_WITH_TBB)
        old code
#   endif
    }
  else
    {
       ... no change in this part, just forward to the other version of run() ...
    }
#endif

Comment on lines 676 to 688
template <typename Worker,
typename Copier,
typename Iterator,
typename ScratchData,
typename CopyData>

/**
* The last two arguments in this function are for chunking support which
* currently does not exist but ideally will later. For now they are
* ignored but still here to permit existing programs to function
*/
void
run(const Iterator &begin,
Member:

Put the comment above the declaration (which starts with template). In general, start by saying what the function does, rather than with a comment about its arguments.

*/
void
run(const Iterator &begin,
const typename identity<Iterator>::type &end,
Member:

Use std_cxx20::type_identity_t<Iterator> as elsewhere in this file.

Comment on lines 694 to 695
const unsigned int queue_length = 2 * MultithreadInfo::n_threads(),
const unsigned int chunk_size = 8)
Member:

You'll get warnings about unused arguments if you don't un-name these arguments:

Suggested change:

-    const unsigned int queue_length = 2 * MultithreadInfo::n_threads(),
-    const unsigned int chunk_size = 8)
+    const unsigned int /*queue_length*/ = 2 * MultithreadInfo::n_threads(),
+    const unsigned int /*chunk_size*/ = 8)

Comment on lines 737 to 719
// This is used to connect each worker to its copier as communication
// between tasks is not supported.
unsigned int idx = 0;
Member:

In general, starting a sentence with "This" is awkward because it's unclear what "this" refers to. Did you mean "The following variable..."? If so, however, I don't understand what the comment is saying. How are workers using this variable to make this connection?

Comment on lines 741 to 737
std::vector<std::unique_ptr<CopyData>> copy_datas;

for (Iterator i = begin; i != end; ++i, ++idx)
{
copy_datas.emplace_back();
// Create a worker task.
auto worker_task =
taskflow
.emplace([it = i,
idx,
&data,
&sample_scratch_data,
&sample_copy_data,
&copy_datas,
Member:

This is not going to be thread-safe. You are taking a reference to copy_datas, which you access via copy_datas[idx] below from the worker thread, but at the same time the thread that spawns all of these tasks is modifying copy_datas in line 745.

(Or perhaps what you are saying is that you are only creating the task objects here, not yet running them, and all of the accesses once the tasks are running only ever modify a single entry of the vector. That is ok, but it would be nice to describe this in a comment.)
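To make the create-then-run distinction concrete, here is a minimal standalone sketch (toy element type; not the PR's code) of the pattern such a comment would describe: the vector is only mutated during single-threaded graph construction, and each task, once the graph runs, touches only its own slot.

#include <taskflow/taskflow.hpp>
#include <memory>
#include <vector>

int main()
{
  tf::Taskflow taskflow;
  tf::Executor executor;

  std::vector<std::unique_ptr<int>> copy_datas;

  for (unsigned int idx = 0; idx < 10; ++idx)
    {
      // Graph construction phase: single-threaded, so growing the vector
      // here cannot race with anything.
      copy_datas.emplace_back();

      // The task is only *created* here; it does not run yet.
      taskflow.emplace([idx, &copy_datas]() {
        // Execution phase: each task accesses only its own entry, and the
        // vector is no longer being resized.
        copy_datas[idx] = std::make_unique<int>(idx);
      });
    }

  // Only now do the tasks actually execute.
  executor.run(taskflow).wait();
}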


using ScratchDataList = std::list<ScratchDataObjects>;

Threads::ThreadLocalStorage<ScratchDataList> data;
Member:

Perhaps give this a better name, like scratch_data_list?

// Ensure that only one copy task can run at a time.
if (!last_copier.empty())
last_copier.precede(copier_task);
last_copier = copier_task;
Member:

What are the semantics of copying tasks? Does the copy point to the same task, or is it a separate object? This may be useful to document at this place.

Member:

I think this would still be good to address.

Member:

@RyanMoulday Can you add a comment above this assignment that says:

// Keep a handle to the last copier. Tasks in taskflow are basically handles to internally stored data, so this does not perform a copy:
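For reference, a minimal standalone sketch (toy lambdas; assuming the bundled Taskflow 3.x API) of the chaining pattern and the handle semantics that comment describes:

#include <taskflow/taskflow.hpp>
#include <iostream>

int main()
{
  tf::Taskflow taskflow;
  tf::Executor executor;

  tf::Task last_copier; // empty handle: last_copier.empty() == true

  for (int i = 0; i < 4; ++i)
    {
      tf::Task copier_task =
        taskflow.emplace([i]() { std::cout << "copier " << i << '\n'; });

      // Chain each copier after the previous one so that only one copier
      // can run at a time, and they run in order.
      if (!last_copier.empty())
        last_copier.precede(copier_task);

      // tf::Task is a handle to graph-internal data; this copies the
      // handle, not the underlying task.
      last_copier = copier_task;
    }

  executor.run(taskflow).wait(); // prints copier 0..3 in order
}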

@RyanMoulday (Contributor Author):

> Good start. I think it would be nice if we had a way to re-use copier objects in the same way as you are re-using scratch objects. What does the TBB implementation do in this regard?

I think this is a limitation of the current static tasking idea, where the task graph is generated before we start running anything. Each copy task is a totally separate task that only knows it needs to run after its worker task (and after all previous copy tasks), but it cannot receive any information from its worker task. As a result, the copy task is usually handled by a thread that did not handle the worker task, so a thread-local copy object doesn't work, and many more copy objects may exist than scratch objects.

I imagine the correct way to get around this is a parallel pipeline implementation, which is what TBB uses right now. TaskFlow also has its own parallel pipeline which could be used. The structures are pretty similar, so this may be feasible with just slight edits to the TBB implementation if we want to go that route (the current code would basically be discarded).
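For context, here is a minimal hedged sketch of what Taskflow's pipeline API looks like (toy stages standing in for the worker and copier; illustrative only, not a proposed WorkStream design):

#include <taskflow/taskflow.hpp>
#include <taskflow/algorithm/pipeline.hpp>
#include <array>
#include <iostream>

int main()
{
  tf::Taskflow taskflow;
  tf::Executor executor;

  const std::size_t num_lines = 4;  // tokens in flight, like queue_length
  const std::size_t num_items = 10; // e.g. the number of cells
  std::array<int, 4> buffer;        // one slot per pipeline line

  tf::Pipeline pipeline(
    num_lines,
    // Serial stage: hands out work items, stops after num_items tokens.
    tf::Pipe{tf::PipeType::SERIAL,
             [&](tf::Pipeflow &pf) {
               if (pf.token() == num_items)
                 pf.stop();
               else
                 buffer[pf.line()] = static_cast<int>(pf.token());
             }},
    // Parallel stage: the "worker"; may run concurrently on several lines.
    tf::Pipe{tf::PipeType::PARALLEL,
             [&](tf::Pipeflow &pf) { buffer[pf.line()] *= 2; }},
    // Serial stage: the "copier"; processes one token at a time, in order.
    tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow &pf) {
               std::cout << buffer[pf.line()] << '\n';
             }});

  taskflow.composed_of(pipeline);
  executor.run(taskflow).wait();
}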

@tjhei (Member) commented Jun 21, 2024:

@RyanMoulday Can you please also rebase to the most current master? This should fix the nedelec tests.

@bangerth There are some shortcomings in the current approach (no reuse of Copy objects, no chunking, storing a large vector of pointers, etc.). Nevertheless, Ryan measured the same or better performance than with TBB. I was encouraged by the results and thought it might be worthwhile to get these initial versions merged and later think about optimizing the implementation. Thoughts?

@bangerth (Member):

If there is a commitment to address some of these issues in future patches, I'm ok with an incremental approach.

I do think that it would be interesting to try out the TaskFlow pipeline implementation. Having a non-pipeline implementation already merged might make for a nice baseline.

@tjhei (Member) commented Jun 24, 2024:

Ryan, is this ready for another round of reviews?

@RyanMoulday (Contributor Author):

Yes, I have made the requested changes. The only remaining open item is the tests and what we will do with them.

@tjhei (Member) commented Jun 26, 2024:

> I do think that it would be interesting to try out the TaskFlow pipeline implementation

I started looking into the tf::Pipeline and I have to admit that it is very hard to understand. We will give it a try, though.

@bangerth (Member) commented Jun 26, 2024 via email.

Comment on lines 172 to 203
template <typename Iterator, typename ScratchData, typename CopyData>
struct ScratchDataObject
{
std::unique_ptr<ScratchData> scratch_data;
bool currently_in_use;

/**
* Default constructor.
*/
ScratchDataObject()
: currently_in_use(false)
{}

ScratchDataObject(std::unique_ptr<ScratchData> &&p, const bool in_use)
: scratch_data(std::move(p))
, currently_in_use(in_use)
{}

ScratchDataObject(ScratchData *p, const bool in_use)
: scratch_data(p)
, currently_in_use(in_use)
{}

// Provide a copy constructor that actually doesn't copy the
// internal state. This makes handling ScratchAndCopyDataObjects
// easier to handle with STL containers.
ScratchDataObject(const ScratchDataObject &)
: currently_in_use(false)
{}

ScratchDataObject(ScratchDataObject &&o) noexcept = default;
};
Member:

On second reading, I don't see why this class has Iterator and CopyData as template arguments. They do not seem to be used.

Comment on lines 721 to 733
unsigned int idx = 0;

std::vector<std::unique_ptr<CopyData>> copy_datas;

// Generate a static task graph. Here we generate a task for each cell
// that will be worked on. The tasks are not executed until all of them
// are created, this code runs sequentially.
for (Iterator i = begin; i != end; ++i, ++idx)
{
copy_datas.emplace_back();
// Create a worker task.
auto worker_task =
taskflow
.emplace([it = i,
idx,
Member:

This is not wrong, but it made me wonder whether there is a good reason to capture the iterator i via a local renamed variable it whereas you just capture idx with its old name. Doing the same thing in different ways is a good way to confuse the reader, so my preference would be to use one style or the other.

Member:

Agreed. @RyanMoulday Can you please rename i to it in the for loop above and then just capture it?



@tjhei (Member) commented Jun 26, 2024:

> I think that the change here is still awkward

You should tell the original author about the bad test design (printing from a worker task! Edit: this was me, btw :-) ). But I think this is acceptable, as the test did not run in parallel before either, and it is not meant to test that workstream runs in parallel.

@RyanMoulday (Contributor Author):

I've gone ahead and made the most recent requested changes.

@RyanMoulday (Contributor Author):

This grid/intergrid_constraints.debug timeout seems to be a recurring issue. I am unable to recreate it when running the test suite on my own device, so investigating the cause is difficult. The test seems to be for compute_intergrid_weights, and within that function the helper compute_intergrid_weights_2 makes a workstream call, so perhaps the issue is here somewhere.

@tjhei (Member) commented Jun 27, 2024:

> …makes a workstream call, so perhaps the issue is here somewhere.

It does not fail on master, so I think this is related to your changes here. The test might already be slow to begin with, but we will need to check.

@RyanMoulday (Contributor Author):

> > …makes a workstream call, so perhaps the issue is here somewhere.
>
> It does not fail on master, so I think this is related to your changes here. The test might already be slow to begin with, but we will need to check.

It's definitely related to these changes, but I was trying to figure out where this test actually invokes the new code, and that was the main place where I could easily see it happening.

The test is not remarkably slow when it passes: on the runs where it did pass, it took 15-30 seconds. The timeout is 1200 seconds, so there is definitely something weird going on.

@tjhei (Member) commented Jun 27, 2024:

Agreed. When it passes on your branch it only takes a few seconds: https://ci.tjhei.info/job/dealii-serial/job/PR-17119/13/testReport/projectroot.tests.grid.intergrid_constraintsdebug/

Do we have a deadlock in this test? Does it pass every time you run it on your computer? (You can cd into tests/grid/intergrid_constraints.debug and run the binary there manually; try to run it 10 times, for example.)

@tjhei (Member) commented Jun 28, 2024:

Exactly, unless there is a bug within taskflow.

@tjhei (Member) commented Jun 30, 2024:

Ryan and I figured out that the hang disappears if we disable the taskflow async() calls that we use for background tasks. It seems we might be hitting a bug in the taskflow scheduler.

We could move forward with this PR if we disable the async() functionality. Otherwise, we would need to a) make a small program that can reproduce the hang, b) get it fixed in taskflow (assuming it is indeed a bug), c) wait for the next taskflow release, and d) then merge this PR.

@tjhei (Member) commented Jul 5, 2024:

@bangerth Any thoughts on the current situation? How sure are you that

if (MultithreadInfo::get_taskflow_executor().this_worker_id() >= 0)
MultithreadInfo::get_taskflow_executor().corun_until([this]() {
return (future.wait_for(std::chrono::seconds(0)) ==
std::future_status::ready);
});
else

is correct and cannot cause an issue?

@tjhei (Member) commented Jul 8, 2024:

I looked at the hang one more time, and I observe the following behavior. Thread 1 waits for a workstream to finish running:

#7  0x000071204b8aa0df in dealii::WorkStream::internal::taskflow_no_coloring::run<>...
executor.run(taskflow).wait();

Threads 2-20 belong to TBB and are idle.

Thread 21 seems to be an idle taskflow worker:

#5  0x000071204c8485af in tf::Notifier::_park (this=0x5b245d96a138, w=0x5b245d959330)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/notifier.hpp:266
#6  0x000071204c8482d1 in tf::Notifier::commit_wait (this=0x5b245d96a138, w=0x5b245d959330)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/notifier.hpp:144
#7  0x000071204c84c0ce in tf::Executor::_wait_for_task (this=0x5b245d96a000, worker=..., t=@0x71200f3f8660: 0x0)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1364
#8  0x000071204c84b9f9 in tf::Executor::_spawn(unsigned long)::{lambda()#1}::operator()() const (__closure=0x5b245d95caf8)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1212
#9  0x000071204c85d0ae in std::__invoke_impl<void, tf::Executor::_spawn(unsigned long)::{lambda()#1}>(std::__invoke_other, tf::Executor::_spawn(unsigned long)::{lambda()#1}&&) (__f=...) at /usr/include/c++/13/bits/invoke.h:61

Threads 22 and 23 are taskflow workers stuck waiting for a mutex; both are running tasks in the workstream:

#6  0x0000712043ea33cc in std::lock_guard<std::mutex>::lock_guard (this=0x71200fdf7b80, __m=...)
    at /usr/include/c++/13/bits/std_mutex.h:249
#7  0x00007120463d08db in dealii::FESystem<2, 2>::get_prolongation_matrix (this=0x5b245d994390, child=0, 
    refinement_case=...) at /ssd/deal-git/source/fe/fe_system.cc:940
...
#15 0x000071204b8be110 in std::_Function_handler<void(), dealii::WorkStream::internal::taskflow_no_coloring::run<d

I don't understand why the thread that holds the lock is not making any progress. Does that mean taskflow interrupted the task while it was holding the lock?

@bangerth (Member) commented Jul 8, 2024 via email.

@bangerth (Member):

This is confusing me as well. If you have two threads that are stuck waiting for a mutex, who is currently holding the mutex? It must be one of the other threads, but it's not clear from what you show which one that would be. In any of the other threads, is one of the higher frames also in FESystem::get_prolongation_matrix(), somewhere in the middle? That would suggest that that thread holds the mutex, called a function that itself spawns tasks, and that these tasks cannot be scheduled because there are other tasks currently waiting for the mutex.

(Separately, I think you can configure with TaskFlow and without TBB. In that case, you don't have to deal with all of the TBB worker threads.)

@bangerth (Member):

In fact, it's enough if any of the threads in a higher frame is somewhere in the middle of FESystem::get_prolongation_matrix(). This may be one of the other threads, but it can also be one of the ones currently blocked.

@tjhei (Member) commented Jul 12, 2024:

> This may be one of the other threads, but it can also be one of the ones currently blocked.

Yes, you are indeed correct; I didn't read far enough down. The workstream function indirectly calls get_prolongation_matrix(), which waits for tasks to complete, and taskflow uses that wait to corun another workstream task, which then hangs indefinitely:

#0  futex_wait (private=0, expected=2, futex_word=0x556572f106b0) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x556572f106b0, private=0) at ./nptl/lowlevellock.c:49
#2  0x000071a9874a00f1 in lll_mutex_lock_optimized (mutex=0x556572f106b0) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x556572f106b0) at ./nptl/pthread_mutex_lock.c:93
#4  0x000071a99847cacb in __gthread_mutex_lock (__mutex=0x556572f106b0)
    at /usr/include/x86_64-linux-gnu/c++/13/bits/gthr-default.h:749
#5  0x000071a99847cbf6 in std::mutex::lock (this=0x556572f106b0) at /usr/include/c++/13/bits/std_mutex.h:113
#6  0x000071a9984a33cc in std::lock_guard<std::mutex>::lock_guard (this=0x71a968bf67c0, __m=...)
    at /usr/include/c++/13/bits/std_mutex.h:249
#7  0x000071a99a9a7127 in dealii::FESystem<1, 1>::get_prolongation_matrix (this=0x556572f101c0, child=0, 
    refinement_case=...) at /ssd/deal-git/source/fe/fe_system.cc:940
#8  0x000071a99fb51397 in dealii::internal::process_by_interpolation<1, 1, false, dealii::Vector<double>, double>(dealii::DoFCellAccessor<1, 1, false> const&, dealii::Vector<double> const&, dealii::Vector<double>&, unsigned short, std::function<void (dealii::DoFCellAccessor<1, 1, false> const&, dealii::Vector<double> const&, dealii::Vector<double>&)> const&) (
    cell=..., local_values=..., values=..., fe_index_=65535, processor=...)
    at /ssd/deal-git/source/dofs/dof_accessor_set.cc:252
#9  0x000071a99fb24af9 in dealii::DoFCellAccessor<1, 1, false>::set_dof_values_by_interpolation<dealii::Vector<double>, double> (this=0x71a968bf6d50, local_values=..., values=..., fe_index_=65535, perform_check=false)
    at /ssd/deal-git/source/dofs/dof_accessor_set.cc:273
#10 0x000071a99fea2803 in dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_3<1, 1> (cell=..., 
    copy_data=..., coarse_component=5, coarse_fe=..., coarse_to_fine_grid_map=..., 
    parameter_dofs=std::vector of length 3, capacity 3 = {...}) at /ssd/deal-git/source/dofs/dof_tools_constraints.cc:4013
#11 0x000071a99fe9d856 in operator() (__closure=0x7ffceb9ba050, cell=..., scratch_data=..., copy_data=...)
    at /ssd/deal-git/source/dofs/dof_tools_constraints.cc:4171
#12 0x000071a99fea6d6f in operator() (__closure=0x556572ef7380) at /ssd/deal-git/include/deal.II/base/work_stream.h:770
#13 0x000071a99febfa8c in std::__invoke_impl<void, dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#14 0x000071a99febe750 in std::__invoke_r<void, dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/13/bits/invoke.h:111
#15 0x000071a99febdbac in std::_Function_handler<void(), dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/13/bits/std_function.h:290
#16 0x000071a9983cf950 in std::function<void ()>::operator()() const (this=0x556572effd58)
    at /usr/include/c++/13/bits/std_function.h:591
#17 0x000071a99a8898c6 in tf::Executor::_invoke_static_task (this=0x556572ef1000, worker=..., node=0x556572effca0)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1758
#18 0x000071a99a888be5 in tf::Executor::_invoke (this=0x556572ef1000, worker=..., node=0x556572effca0)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1547
#19 0x000071a99a8cee0a in tf::Executor::_corun_until<dealii::Threads::Task<void>::TaskData::wait()::{lambda()#1}>(tf::Worker&, dealii::Threads::Task<void>::TaskData::wait()::{lambda()#1}&&) (this=0x556572ef1000, w=..., stop_predicate=...)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1267
#20 0x000071a99a8cc0f9 in tf::Executor::corun_until<dealii::Threads::Task<void>::TaskData::wait()::{lambda()#1}>(dealii::Threads::Task<void>::TaskData::wait()::{lambda()#1}&&) (this=0x556572ef1000, predicate=...)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:2085
#21 0x000071a99a8c8cc4 in dealii::Threads::Task<void>::TaskData::wait (this=0x71a8f802f710)
    at /ssd/deal-git/include/deal.II/base/thread_management.h:963
#22 0x000071a99a8c6434 in dealii::Threads::Task<void>::join (this=0x71a8f801ce80)
    at /ssd/deal-git/include/deal.II/base/thread_management.h:744
#23 0x000071a99a8c081d in dealii::Threads::TaskGroup<void>::join_all (this=0x71a968bf7570)
    at /ssd/deal-git/include/deal.II/base/thread_management.h:1439
#24 0x000071a99bc306aa in dealii::FETools::compute_embedding_matrices<1, double, 1> (fe=..., 
    matrices=std::vector of length 1, capacity 1 = {...}, isotropic_only=true, threshold=9.9999999999999998e-13)
    at /ssd/deal-git/include/deal.II/fe/fe_tools.templates.h:1894
#25 0x000071a99a5c8cb3 in dealii::FE_DGQ<1, 1>::get_prolongation_matrix (this=0x71a8f8002af0, child=0, 
    refinement_case=...) at /ssd/deal-git/source/fe/fe_dgq.cc:458
#26 0x000071a99a9a7284 in dealii::FESystem<1, 1>::get_prolongation_matrix (this=0x556572f101c0, child=0, 
    refinement_case=...) at /ssd/deal-git/source/fe/fe_system.cc:951
#27 0x000071a99fb51397 in dealii::internal::process_by_interpolation<1, 1, false, dealii::Vector<double>, double>(dealii::DoFCellAccessor<1, 1, false> const&, dealii::Vector<double> const&, dealii::Vector<double>&, unsigned short, std::function<void (dealii::DoFCellAccessor<1, 1, false> const&, dealii::Vector<double> const&, dealii::Vector<double>&)> const&) (
    cell=..., local_values=..., values=..., fe_index_=65535, processor=...)
    at /ssd/deal-git/source/dofs/dof_accessor_set.cc:252
#28 0x000071a99fb24af9 in dealii::DoFCellAccessor<1, 1, false>::set_dof_values_by_interpolation<dealii::Vector<double>, double> (this=0x71a968bf8110, local_values=..., values=..., fe_index_=65535, perform_check=false)
    at /ssd/deal-git/source/dofs/dof_accessor_set.cc:273
#29 0x000071a99fea2803 in dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_3<1, 1> (cell=..., 
    copy_data=..., coarse_component=5, coarse_fe=..., coarse_to_fine_grid_map=..., 
    parameter_dofs=std::vector of length 3, capacity 3 = {...}) at /ssd/deal-git/source/dofs/dof_tools_constraints.cc:4013
#30 0x000071a99fe9d856 in operator() (__closure=0x7ffceb9ba050, cell=..., scratch_data=..., copy_data=...)
    at /ssd/deal-git/source/dofs/dof_tools_constraints.cc:4171
#31 0x000071a99fea6d6f in operator() (__closure=0x556572f0fb50) at /ssd/deal-git/include/deal.II/base/work_stream.h:770
#32 0x000071a99febfa8c in std::__invoke_impl<void, dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#33 0x000071a99febe750 in std::__invoke_r<void, dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/13/bits/invoke.h:111
#34 0x000071a99febdbac in std::_Function_handler<void(), dealii::WorkStream::internal::taskflow_no_coloring::run<dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >, dealii::DoFTools::internal::Assembler::Scratch, dealii::DoFTools::internal::Assembler::CopyData<1, 1> >(const dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> >&, dealii::std_cxx20::type_identity_t<dealii::TriaActiveIterator<dealii::DoFCellAccessor<1, 1, false> > >&, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFHandler<1, 1>::active_cell_iterator&, const dealii::DoFTools::internal::Assembler::Scratch&, dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, dealii::DoFTools::internal::(anonymous namespace)::compute_intergrid_weights_2<1, 1>(const dealii::DoFHandler<1, 1>&, unsigned int, const dealii::InterGridMap<dealii::DoFHandler<1, 1> >&, const std::vector<dealii::Vector<double> >&, const std::vector<unsigned int>&, std::vector<std::map<unsigned int, float> >&)::<lambda(const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&)>, const dealii::DoFTools::internal::Assembler::Scratch&, const dealii::DoFTools::internal::Assembler::CopyData<1, 1>&, unsigned int, unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/13/bits/std_function.h:290
#35 0x000071a9983cf950 in std::function<void ()>::operator()() const (this=0x556572effa70)
    at /usr/include/c++/13/bits/std_function.h:591
#36 0x000071a99a8898c6 in tf::Executor::_invoke_static_task (this=0x556572ef1000, worker=..., node=0x556572eff9b8)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1758
#37 0x000071a99a888be5 in tf::Executor::_invoke (this=0x556572ef1000, worker=..., node=0x556572eff9b8)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1547
#38 0x000071a9a0e4bee6 in tf::Executor::_exploit_task (this=0x556572ef1000, w=..., t=@0x71a968bf8660: 0x556572eff9b8)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1319
#39 0x000071a9a0e4b9db in tf::Executor::_spawn(unsigned long)::{lambda()#1}::operator()() const (__closure=0x556572ee3938)
    at /ssd/deal-git/bundled/taskflow-3.7.0/taskflow/core/executor.hpp:1209
#40 0x000071a9a0e5d0ae in std::__invoke_impl<void, tf::Executor::_spawn(unsigned long)::{lambda()#1}>(std::__invoke_other, tf::Executor::_spawn(unsigned long)::{lambda()#1}&&) (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#41 0x000071a9a0e5d069 in std::__invoke<tf::Executor::_spawn(unsigned long)::{lambda()#1}>(tf::Executor::_spawn(unsigned long)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/13/bits/invoke.h:96
#42 0x000071a9a0e5d00a in std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned long)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x556572ee3938) at /usr/include/c++/13/bits/std_thread.h:292
#43 0x000071a9a0e5cfb4 in std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned long)::{lambda()#1}> >::operator()() (this=0x556572ee3938) at /usr/include/c++/13/bits/std_thread.h:299
#44 0x000071a9a0e5cf76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<tf::Executor::_spawn(unsigned long)::{lambda()#1}> > >::_M_run() (this=0x556572ee3930) at /usr/include/c++/13/bits/std_thread.h:244
#45 0x000071a9878eabb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#46 0x000071a98749ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#47 0x000071a987529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@bangerth (Member):

Such is the peril with tasks: You can't spawn new tasks in a region you protect with a mutex. So the bug is here:

#25 0x000071a99a5c8cb3 in dealii::FE_DGQ<1, 1>::get_prolongation_matrix (this=0x71a8f8002af0, child=0, 
    refinement_case=...) at /ssd/deal-git/source/fe/fe_dgq.cc:458
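To make the failure mode concrete, here is a minimal sketch (toy code, not deal.II's; it can hang, so treat it as an illustration rather than something to run) of the pattern diagnosed above: hold a mutex, then join a child task, and let the join corun another task that needs the same mutex.

#include <taskflow/taskflow.hpp>
#include <mutex>

std::mutex cache_mutex; // stands in for the matrix-cache mutex in FESystem

int main()
{
  tf::Executor executor;
  tf::Taskflow taskflow;

  for (int i = 0; i < 8; ++i)
    taskflow.emplace([&executor]() {
      // Like get_prolongation_matrix(): take the lock...
      std::lock_guard<std::mutex> lock(cache_mutex);

      // ...then spawn and join a child task while still holding it.
      tf::Taskflow child;
      child.emplace([]() { /* compute_embedding_matrices() stand-in */ });

      // corun() keeps this worker executing *other* pending tasks while it
      // waits. If it picks up one of the sibling tasks above, that task
      // blocks on cache_mutex, which is held further down this very stack:
      // the worker can never get back to finishing the child. Deadlock.
      executor.corun(child);
    });

  executor.run(taskflow).wait();
}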

@bangerth (Member):

Ugh, what a rats' nest :-(

@tjhei (Member) commented Jul 21, 2024:

@RyanMoulday can you please rebase to the current master branch?

@RyanMoulday force-pushed the Taskflow-Workstream branch 2 times, most recently from 086657c to e7144c0 on July 21, 2024.
@tjhei (Member) commented Jul 28, 2024:

This might be good to go now that we have a workaround for the deadlock. @RyanMoulday, is there anything else missing, do you think?

@RyanMoulday (Contributor Author):

The only thing left I can think of is to set up a guard within the workstream implementation, similar to the one that exists in the tasking module, where we corun if we are already within a taskflow task. I think it's theoretically possible that, if a workstream were to spawn another workstream directly, we could lock up, since the calling thread doesn't participate. I'm not sure having a workstream directly within a workstream is something we ever do, though.
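For illustration, a hypothetical version of such a guard (mirroring the corun_until() snippet quoted earlier; run_taskflow_graph is a made-up name, not proposed API):

#include <deal.II/base/multithread_info.h>
#include <taskflow/taskflow.hpp>

void run_taskflow_graph(tf::Taskflow &taskflow)
{
  tf::Executor &executor = dealii::MultithreadInfo::get_taskflow_executor();

  if (executor.this_worker_id() >= 0)
    // We are already inside a taskflow task: keep this worker executing
    // tasks instead of blocking it, so a nested workstream cannot lock up.
    executor.corun(taskflow);
  else
    // Ordinary external thread: safe to block until the graph finishes.
    executor.run(taskflow).wait();
}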

@tjhei (Member) commented Aug 25, 2024:

@bangerth ?

@bangerth merged commit 2cebb55 into dealii:master on Aug 25, 2024. 16 checks passed.
@bangerth (Member):

Let's get this merged. As mentioned above, I think that there are a number of things we should improve (chunking, re-use of objects), which I hope will get addressed in follow-up PRs. But we have a bit of time until the next release.

@bangerth (Member):

@RyanMoulday Would you mind opening an issue that lists the things we still want to improve?
