Fix poor attempt at thread safety #331

luca-heltai · 2014-12-15T16:40:21Z

From bange...@gmail.com on August 22, 2014 14:29:52

FE_DGQ and FE_Nedelec use a scheme whereby they defer construction of embedding and/or restriction matrices until they are in fact needed. The guard code looks like this:

{
  // initialization upon first request
  if (this->prolongation[refinement_case-1][child].n() == 0)
    {
      Threads::Mutex::ScopedLock lock(this->mutex);

      // if matrix got updated while waiting for the lock
      if (this->prolongation[refinement_case-1][child].n() ==
          this->dofs_per_cell)
        return this->prolongation[refinement_case-1][child];
   ...
  this_nonconst.prolongation[refinement_case-1].swap(isotropic_matrices.back());
}

The problem with this code is that the swap operation is not atomic: the first or second "if" may see a matrix that already has its size set but not the content yet. Or, worse, that doesn't have the size set yet, but has the elements already -- in which case it will be recomputed and the first thread to use the matrix may see temporary values while the second one recomputes the matrices.

The only way out is storing a boolean for each matrix indicating whether it is computed already, testing the boolean at the beginning and setting it at the very end.

Original issue: http://code.google.com/p/dealii/issues/detail?id=248

The text was updated successfully, but these errors were encountered:

luca-heltai · 2014-12-15T16:40:23Z

From kronbich...@gmail.com on August 22, 2014 12:20:58

Since this code was written by me (and is in FE_Q_Base, FE_DGQ, FESystem and now also FE_Nedelec), let me add a few comments (in particular since I had thought about a similar solution back then):

From a computer's perpective, I do not see a difference in testing for matrix.n() (which affects the last four bytes to be exchanged by prolongation.swap(), which again is an atomic operation) and making the matrices actually a pair of the matrices and a boolean flag, i.e., the very next byte. In particular, the content, i.e., the pointers to the start and end of the data array of the matrix, are swapped before that. So, the first case (size set but content not) cannot happen. I don't know the memory ordering models in all CPU brands, but at least on x86-64 the stores will be in program order and thus uncritical. In 7 out of 8 cases, the bool and the last matrix will even be on the same cache line and trigger the same cache coherency protocol inside the CPU (read from a cache line that is in L1 of the core that has just done the modifications).
The second case (size not set but elements ready) is uncritical because of a second test for matrix.n() == dofs_per_cell. This call is necessary because we wait for the mutex at the beginning of the section (and I could actually verify that this is necessary). But then we do not recompute, see the return statement above. The very same code would be necessary for the boolean.
Of course, item 1 assumes details on the implementation of the swap operation of TableBase (i.e., size indices are swapped after data pointers) and thus somewhat more brittle.

So from what I can judge correctness and performance in the current implementation is the same and it boils down to decide on whether we want simple data structures (current design) or to avoid assumptions on implementation details. I preferred the simplicity back then but really do not insist on that variant.

Cc: kronbich...@gmail.com

luca-heltai · 2014-12-15T16:40:28Z

From bange...@gmail.com on August 22, 2014 13:38:58

Fair enough, thanks for the explanation. I didn't mean to imply that the design in completely broken. I think it works in practice but relies on assumptions that may not be completely sound -- like you suggest, it depends on knowledge of how exactly the swap operation works.

For the Nedelec element, there is also the issue that (if I read the code correctly, I don't have it here right now) we don't actually swap in the matrix but first resize it and later fill it. That's definitely fishy.

My goal with this report was just to have it recorded somewhere that it is something we may want to look at at one point. Your explanations are definitely appreciated!

luca-heltai added Bug imported labels Dec 15, 2014

tamiko removed the imported label Jul 15, 2015

tamiko removed the Priority-Medium label Aug 25, 2015

bangerth self-assigned this Jun 16, 2023

drwells mentioned this issue Nov 7, 2023

FiniteElement: Implement lazy restriction/prolongation via dealii::Lazy<> #16204

Draft

bangerth mentioned this issue Nov 8, 2023

Access to FE* restriction and prolongation matrices is not thread-safe. #16169

Closed

bangerth added this to the Release 9.6 milestone Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix poor attempt at thread safety #331

Fix poor attempt at thread safety #331

luca-heltai commented Dec 15, 2014 •

edited by masterleinad

luca-heltai commented Dec 15, 2014

luca-heltai commented Dec 15, 2014

Fix poor attempt at thread safety #331

Fix poor attempt at thread safety #331

Comments

luca-heltai commented Dec 15, 2014 • edited by masterleinad

luca-heltai commented Dec 15, 2014

luca-heltai commented Dec 15, 2014

luca-heltai commented Dec 15, 2014 •

edited by masterleinad