Implementing Workstream::run() with Taskflow (no coloring) #17119
Conversation
Thank you, Ryan. FYI @bangerth (this is part 1 of several)
f998fdb to 05d5f85 (Compare)
The test mpi/mesh_worker_02 is failing. We will need to investigate.
Okay, the old test made the assumption that cells are visited in order. Fixed.
Can you reindent, @RyanMoulday?
d05c8e0 to ecc709c (Compare)
We seem to be hitting some random test failures/timeouts. I think we need to investigate before we can merge this.
This is waiting on #17131.
Good start. I think it would be nice if we had a way to re-use copier objects in the same way as you are re-using scratch objects. What does the TBB implementation do in this regard?
Utilities::MPI::MPI_InitFinalize mpi_initialization(
  argc, argv, testing_max_num_threads());
MPILogInitAll log;
// Disable multithreading so that text output order is consistent
Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);
MPILogInitAll log;
Why is this necessary? What is different between the TBB and the TaskFlow implementation that makes it necessary to make this change?
Right now we don't support chunking in the TaskFlow implementation. This particular test (and the others changed) relies on the output being generated in a specific order as it compares line by line. I believe this order is not explicitly assured by the TBB implementation either but just happens to be consistent because this gets chunked to run sequentially with the default grain size in the TBB implementation.
The outputs for the TaskFlow implementation are correct; they just appear in a different order. I was advised that this would be the sensible change to make, as right now I don't think this test actually even runs on more than 1 thread per MPI rank with the TBB implementation.
I'm not sure that's true, but moreover I think that this essentially disables the test and that would be a shame. Can you show in which ways the output might differ depending on thread placement of tasks? Currently, the output of the whole test is this:
DEAL:0::DoFHandler ndofs=2
DEAL:0::*** 1. CELLS ***
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=0
DEAL:0::* own_cells=1 ghost_cells=0 own_faces=0 faces_to_ghost=0
DEAL:0::C 0_0:
DEAL:0::* own_cells=0 ghost_cells=1 own_faces=0 faces_to_ghost=0
DEAL:0::C 1_0:
DEAL:0::* own_cells=1 ghost_cells=1 own_faces=0 faces_to_ghost=0
DEAL:0::C 0_0:
DEAL:0::C 1_0:
DEAL:0::*** 2. FACES ***
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=2 faces_to_ghost=0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::F cell1 = 0_0: face = 1 cell2 = 1_0: face2 = 0
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2
DEAL:0::F cell1 = 0_0: face = 1 cell2 = 1_0: face2 = 0
Which lines are unreliably ordered?
Sorry to respond with the output of a different test but I just happen to have this one on hand.
Here is the expected output which TBB generates for Mesh_Worker_02:
DEAL:0::DoFHandler ndofs=7
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::F cell1 = 0_1:1 face = 3 cell2 = 0_1:3 face2 = 2
DEAL:0::F cell1 = 0_1:2 face = 1 cell2 = 0_1:3 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 1 cell2 = 0_2:01 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 3 cell2 = 0_2:02 face2 = 2
DEAL:0::F cell1 = 0_2:01 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 3 cell2 = 0_2:03 face2 = 2
DEAL:0::F cell1 = 0_2:02 face = 1 cell2 = 0_2:03 face2 = 0
DEAL:0::F cell1 = 0_2:02 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::F cell1 = 0_2:03 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:03 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2
And TaskFlow:
DEAL:0::DoFHandler ndofs=7
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=1 faces_to_ghost=0
DEAL:0::F cell1 = 0_1:1 face = 3 cell2 = 0_1:3 face2 = 2
DEAL:0::F cell1 = 0_1:2 face = 1 cell2 = 0_1:3 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:00 face = 1 cell2 = 0_2:01 face2 = 0
DEAL:0::F cell1 = 0_2:01 face = 3 cell2 = 0_2:03 face2 = 2
DEAL:0::F cell1 = 0_2:00 face = 3 cell2 = 0_2:02 face2 = 2
DEAL:0::F cell1 = 0_2:03 face = 1 cell2 = 0_1:1 face2 = 0
DEAL:0::F cell1 = 0_2:02 face = 1 cell2 = 0_2:03 face2 = 0
DEAL:0::F cell1 = 0_2:03 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::F cell1 = 0_2:02 face = 3 cell2 = 0_1:2 face2 = 2
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=1
DEAL:0::* own_cells=0 ghost_cells=0 own_faces=0 faces_to_ghost=2
Basically, the test produces text output in the worker, which can be scheduled concurrently. The test used to work because the default chunk size with TBB is larger than the number of cells in this test. The proposed change makes sure the test will run sequentially.
I think this is okay because the purpose of the test is not parallel computing but checking that the mesh worker visits the correct cells and faces.
I see, thank you!
I think that the change here is still awkward because it bypasses all of the interesting stuff if we have no threads, which is what happens here. On the other hand, we do test WorkStream pretty heavily in many other tests, so that might not matter.
include/deal.II/base/work_stream.h
Outdated
# ifdef DEAL_II_WITH_TBB
# ifdef DEAL_II_WITH_TASKFLOW
        if (static_cast<const std::function<void(const CopyData &)> &>(copier))
          {
            // If we have a copier, run the algorithm:
            internal::taskflow_no_coloring::run(begin,
                                                end,
                                                worker,
                                                copier,
                                                sample_scratch_data,
                                                sample_copy_data,
                                                queue_length,
                                                chunk_size);
          }
        else
          {
            // There is no copier function. In this case, we have an
            // embarrassingly parallel problem where we can
            // essentially apply parallel_for. Because parallel_for
            // requires subdividing the range, for which operator-
            // between iterators is necessary, it is often inefficient
            // to apply it directly to cell ranges and similar iterator
            // types for which operator- is expensive or, in fact,
            // nonexistent. Rather, in that case, we simply copy the
            // iterators into a large array and use operator- on
            // iterators to this array of iterators.
            //
            // Instead of duplicating code, this is essentially the
            // same situation we have in the colored implementation
            // below, so we just defer to that place.
            std::vector<std::vector<Iterator>> all_iterators(1);
            for (Iterator p = begin; p != end; ++p)
              all_iterators[0].push_back(p);

            run(all_iterators,
                worker,
                copier,
                sample_scratch_data,
                sample_copy_data,
                queue_length,
                chunk_size);
          }

        // exit this function to not run the sequential version below:
        return;
# elif defined(DEAL_II_WITH_TBB)
The only place that's different between the two implementations is the if block. I think it might be nicer to write this as
#if defined(DEAL_II_WITH_TBB) || defined(DEAL_II_WITH_TASKFLOW)
if (have copier)
{
# if defined(DEAL_II_WITH_TASKFLOW)
   new code
# elif defined(DEAL_II_WITH_TBB)
old code
# endif
}
else
{
... no change in this part, just forward to the other version of run() ...
}
#endif
include/deal.II/base/work_stream.h
Outdated
  template <typename Worker,
            typename Copier,
            typename Iterator,
            typename ScratchData,
            typename CopyData>

  /**
   * The last two arguments in this function are for chunking support, which
   * currently does not exist but ideally will later. For now they are
   * ignored but are still here to permit existing programs to function.
   */
  void
  run(const Iterator &begin,
Put the comment above the declaration (which starts with template). In general, start by saying what the function does, rather than with a comment about its arguments.
include/deal.II/base/work_stream.h
Outdated
   */
  void
  run(const Iterator &begin,
      const typename identity<Iterator>::type &end,
Use std_cxx20::type_identity_t<Iterator> as elsewhere in this file.
include/deal.II/base/work_stream.h
Outdated
      const unsigned int queue_length = 2 * MultithreadInfo::n_threads(),
      const unsigned int chunk_size = 8)
You'll get warnings about unused arguments if you don't un-name these arguments:
      const unsigned int /*queue_length*/ = 2 * MultithreadInfo::n_threads(),
      const unsigned int /*chunk_size*/ = 8)
include/deal.II/base/work_stream.h
Outdated
      // This is used to connect each worker to its copier, as communication
      // between tasks is not supported.
      unsigned int idx = 0;
In general, starting a sentence with "This" is awkward because it's unclear what "this" refers to. Did you mean "The following variable..."? If so, however, I don't understand what the comment is saying. How are workers using this variable to make this connection?
include/deal.II/base/work_stream.h
Outdated
      std::vector<std::unique_ptr<CopyData>> copy_datas;

      for (Iterator i = begin; i != end; ++i, ++idx)
        {
          copy_datas.emplace_back();
          // Create a worker task.
          auto worker_task =
            taskflow
              .emplace([it = i,
                        idx,
                        &data,
                        &sample_scratch_data,
                        &sample_copy_data,
                        &copy_datas,
This is not going to be thread-safe. You are taking a reference to copy_datas, which you access via copy_datas[idx] below from the worker thread, but at the same time the thread that spawns all of these tasks is modifying copy_datas in line 745.
(Or perhaps what you are saying is that you are only creating the task objects here, not yet running them, and all of the accesses once the tasks are running only ever modify a single entry of the vector. That is ok, but it would be nice to describe this in a comment.)
include/deal.II/base/work_stream.h
Outdated
      using ScratchDataList = std::list<ScratchDataObjects>;

      Threads::ThreadLocalStorage<ScratchDataList> data;
Perhaps give this a better name, like scratch_data_list?
      // Ensure that only one copy task can run at a time.
      if (!last_copier.empty())
        last_copier.precede(copier_task);
      last_copier = copier_task;
What are the semantics of copying tasks? Does the copy point to the same task, or is it a separate object? This may be useful to document at this place.
I think this would still be good to address.
@RyanMoulday Can you add a comment above this assignment that says:
// Keep a handle to the last copier. Tasks in taskflow are basically handles to internally stored data, so this does not perform a copy:
I think this is a limitation of the current static tasking idea, where the task graph is generated before we start running anything. Each copy task is a totally separate task that only knows it needs to run after its worker task (and after all previous copy tasks) but cannot receive any information from its worker task. As a result, the copy task is usually handled by a thread that did not handle the worker task, so a thread-local copy object doesn't work, and many more copy objects may exist than scratch objects.
I imagine the correct way to get around this is the parallel pipeline implementation, which is what TBB uses right now. TaskFlow also has its own parallel pipeline, which could be used. The structures are pretty similar, so this may be feasible with just slight edits to the TBB implementation if we want to go that route (this current code would basically be discarded).
@RyanMoulday Can you please also rebase to the most current master? This should fix the nedelec tests.
@bangerth There are some shortcomings in the current approach (no reuse of Copy objects, no chunking, storing a large vector of pointers, etc.). Nevertheless, Ryan measured the same or better performance than TBB. I was encouraged by the results and thought it might be worthwhile to get these initial versions merged and later think about optimizing the implementation. Thoughts?
ecc709c to 79d0fc7 (Compare)
If there is a commitment to address some of these issues in future patches, I'm okay with an incremental approach. I do think that it would be interesting to try out the TaskFlow pipeline implementation. Having a non-pipeline implementation already merged might make for a nice baseline.
Ryan, is this ready for another round of reviews?
Yes, I have made the requested changes. The only outstanding item is the tests and what we will do with them.
I started looking into the tf::Pipeline and I have to admit that it is very hard to understand. We will give it a try, though.
On 6/26/24 16:15, Timo Heister wrote:
I started looking into the tf::Pipeline and I have to admit that it is
very hard to understand. We will give it a try, though.
I see you already found my open issue about documentation. It's good to
know that it's not just me then...
include/deal.II/base/work_stream.h
Outdated
    template <typename Iterator, typename ScratchData, typename CopyData>
    struct ScratchDataObject
    {
      std::unique_ptr<ScratchData> scratch_data;
      bool                         currently_in_use;

      /**
       * Default constructor.
       */
      ScratchDataObject()
        : currently_in_use(false)
      {}

      ScratchDataObject(std::unique_ptr<ScratchData> &&p, const bool in_use)
        : scratch_data(std::move(p))
        , currently_in_use(in_use)
      {}

      ScratchDataObject(ScratchData *p, const bool in_use)
        : scratch_data(p)
        , currently_in_use(in_use)
      {}

      // Provide a copy constructor that actually doesn't copy the
      // internal state. This makes handling ScratchAndCopyDataObjects
      // easier to handle with STL containers.
      ScratchDataObject(const ScratchDataObject &)
        : currently_in_use(false)
      {}

      ScratchDataObject(ScratchDataObject &&o) noexcept = default;
    };
On second reading, I don't see why this class has Iterator and CopyData as template arguments. They do not seem to be used.
include/deal.II/base/work_stream.h
Outdated
      unsigned int idx = 0;

      std::vector<std::unique_ptr<CopyData>> copy_datas;

      // Generate a static task graph. Here we generate a task for each cell
      // that will be worked on. The tasks are not executed until all of them
      // are created; this code runs sequentially.
      for (Iterator i = begin; i != end; ++i, ++idx)
        {
          copy_datas.emplace_back();
          // Create a worker task.
          auto worker_task =
            taskflow
              .emplace([it = i,
                        idx,
This is not wrong, but it made me wonder whether there is a good reason to capture the iterator i via a local renamed variable it, whereas you just capture idx with its old name. Doing the same thing in different ways is a good way to confuse the reader, so my preference would be to use one style or the other.
Agreed. @RyanMoulday Can you please rename i to it in the for loop above and then just capture it?
      // Ensure that only one copy task can run at a time.
      if (!last_copier.empty())
        last_copier.precede(copier_task);
      last_copier = copier_task;
I think this would still be good to address.
Utilities::MPI::MPI_InitFinalize mpi_initialization(
  argc, argv, testing_max_num_threads());
MPILogInitAll log;
// Disable multithreading so that text output order is consistent
Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);
MPILogInitAll log;
I think that the change here is still awkward because it bypasses all of the interesting stuff if we have no threads, which is what happens here. On the other hand, we do test WorkStream pretty heavily in many other tests, so that might not matter.
You should tell the original author about the bad test design (printing from a worker task! edit: this was me, btw :-) ). But I think this is acceptable, as the test did not run in parallel before either, and it is not made to test that workstream runs in parallel.
I've gone ahead and made the most recent requested changes.
This grid/intergrid_constraints.debug timeout seems to be a recurring issue. I am unable to recreate it when running the test suite on my own device, so investigating the cause is difficult. The test seems to be for compute_intergrid_weights, and within that function the helper function compute_intergrid_weights_2 makes a workstream call, so perhaps the issue is somewhere in there.
It does not fail on master, so I think this is related to your changes here. The test might already be slow to begin with, but we will need to check.
It's definitely related to these changes, but I was trying to figure out where this test actually invokes the new code, and that was the main place I could easily see it happening. The test is not remarkably slow when it passes; on the runs where it did pass it took 15-30 seconds. The timeout is 1200 seconds, so there is definitely something weird going on.
Agreed. When it passes on your branch it only takes a few seconds: https://ci.tjhei.info/job/dealii-serial/job/PR-17119/13/testReport/projectroot.tests.grid.intergrid_constraintsdebug/ Do we have a deadlock in this test? Does it pass every time you run it on your computer? (you can
Exactly, except if there is a bug within taskflow.
Ryan and I figured out that the hang disappears if we disable the taskflow async() calls that we use for background tasks. This seems like we might be hitting a bug in the taskflow scheduler. We could move forward with this PR if we disable the async() functionality. Otherwise, we would need to a) make a small program that can reproduce the hang, b) get it fixed in taskflow (assuming it is indeed a bug), c) wait for the next taskflow release, and d) then merge this PR.
@bangerth Any thoughts on the current situation? How sure are you that dealii/include/deal.II/base/thread_management.h Lines 960 to 965 in d25bc7b is correct and cannot cause an issue?
I looked at the hang one more time and I observe the following behavior:
Threads 2-20 belong to TBB and are idle. Thread 21 seems to be an idle taskflow worker:
Threads 22/23 are taskflow workers stuck waiting for a mutex; both are tasks in the workstream:
I don't understand why the thread that holds the lock is not making any progress. Does that mean taskflow interrupted the task while it is holding the lock?
On 7/5/24 03:41, Timo Heister wrote:
@bangerth <https://github.com/bangerth> Any thoughts on the current situation?
How sure are you that
https://github.com/dealii/dealii/blob/d25bc7b6a69a8d4204cab10b88acbeded04ee040/include/deal.II/base/thread_management.h#L960-L965 <https://github.com/dealii/dealii/blob/d25bc7b6a69a8d4204cab10b88acbeded04ee040/include/deal.II/base/thread_management.h#L960-L965>
is correct and can not cause an issue?
Reasonably sure -- the best I can say about TaskFlow. I've been running with
TaskFlow tasks for weeks locally and have not had problems with it. My best
guess is that the problem is with this patch here.
This is confusing me as well. If you have two threads that are stuck waiting for a mutex, who is currently holding the mutex? It must be one of the other threads, but it's not clear from what you show which one that would be. In any of the other threads, is one of the higher frames also in
(Separately, I think you can configure with TaskFlow and without TBB. In that case, you don't have to deal with all of the TBB worker threads.)
In fact, it's enough if any of the threads in a higher frame is somewhere in the middle of
Yes, you are indeed correct. I didn't read far enough down. The workstream function indirectly calls get_prolongation_matrix(), which waits for tasks to complete, which taskflow uses to corun another task of the workstream that then hangs indefinitely:
Such is the peril with tasks: you can't spawn new tasks in a region you protect with a mutex. So the bug is here:
Ugh, what a rats' nest :-(
@RyanMoulday can you please rebase to the current master branch?
086657c to e7144c0 (Compare)
e7144c0 to e14716c (Compare)
This might be good to go now that we have a workaround for the deadlock. @RyanMoulday is there anything else missing, do you think?
The only thing left I can think of is setting up a guard within the workstream implementation similar to the one in the tasking module, where we corun if we are already within a taskflow task. I think it's theoretically possible that if a workstream were to generate another workstream directly, we could lock up, since the calling thread doesn't participate. Not sure if having a workstream directly within a workstream is something we ever do, though.
Let's get this merged. As mentioned above, I think there are a number of things we should improve (chunking, re-use of objects), which I hope will get addressed in follow-up PRs. But we have a bit of time until the next release.
@RyanMoulday Would you mind opening an issue that lists the things we still want to improve?