Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix poor attempt at thread safety #331

Open
luca-heltai opened this issue Dec 15, 2014 · 2 comments
Open

Fix poor attempt at thread safety #331

luca-heltai opened this issue Dec 15, 2014 · 2 comments
Assignees
Labels
Milestone

Comments

@luca-heltai
Copy link
Member

luca-heltai commented Dec 15, 2014

From bange...@gmail.com on August 22, 2014 14:29:52

FE_DGQ and FE_Nedelec use a scheme whereby they defer construction of embedding and/or restriction matrices until they are in fact needed. The guard code looks like this:

{
  // initialization upon first request
  if (this->prolongation[refinement_case-1][child].n() == 0)
    {
      Threads::Mutex::ScopedLock lock(this->mutex);

      // if matrix got updated while waiting for the lock
      if (this->prolongation[refinement_case-1][child].n() ==
          this->dofs_per_cell)
        return this->prolongation[refinement_case-1][child];
   ...
  this_nonconst.prolongation[refinement_case-1].swap(isotropic_matrices.back());
}

The problem with this code is that the swap operation is not atomic: the first or second "if" may see a matrix that already has its size set but not the content yet. Or, worse, that doesn't have the size set yet, but has the elements already -- in which case it will be recomputed and the first thread to use the matrix may see temporary values while the second one recomputes the matrices.

The only way out is storing a boolean for each matrix indicating whether it is computed already, testing the boolean at the beginning and setting it at the very end.

Original issue: http://code.google.com/p/dealii/issues/detail?id=248

@luca-heltai
Copy link
Member Author

From kronbich...@gmail.com on August 22, 2014 12:20:58

Since this code was written by me (and is in FE_Q_Base, FE_DGQ, FESystem and now also FE_Nedelec), let me add a few comments (in particular since I had thought about a similar solution back then):

  1. From a computer's perpective, I do not see a difference in testing for matrix.n() (which affects the last four bytes to be exchanged by prolongation.swap(), which again is an atomic operation) and making the matrices actually a pair of the matrices and a boolean flag, i.e., the very next byte. In particular, the content, i.e., the pointers to the start and end of the data array of the matrix, are swapped before that. So, the first case (size set but content not) cannot happen. I don't know the memory ordering models in all CPU brands, but at least on x86-64 the stores will be in program order and thus uncritical. In 7 out of 8 cases, the bool and the last matrix will even be on the same cache line and trigger the same cache coherency protocol inside the CPU (read from a cache line that is in L1 of the core that has just done the modifications).
  2. The second case (size not set but elements ready) is uncritical because of a second test for matrix.n() == dofs_per_cell. This call is necessary because we wait for the mutex at the beginning of the section (and I could actually verify that this is necessary). But then we do not recompute, see the return statement above. The very same code would be necessary for the boolean.
  3. Of course, item 1 assumes details on the implementation of the swap operation of TableBase (i.e., size indices are swapped after data pointers) and thus somewhat more brittle.

So from what I can judge correctness and performance in the current implementation is the same and it boils down to decide on whether we want simple data structures (current design) or to avoid assumptions on implementation details. I preferred the simplicity back then but really do not insist on that variant.

Cc: kronbich...@gmail.com

@luca-heltai
Copy link
Member Author

From bange...@gmail.com on August 22, 2014 13:38:58

Fair enough, thanks for the explanation. I didn't mean to imply that the design in completely broken. I think it works in practice but relies on assumptions that may not be completely sound -- like you suggest, it depends on knowledge of how exactly the swap operation works.

For the Nedelec element, there is also the issue that (if I read the code correctly, I don't have it here right now) we don't actually swap in the matrix but first resize it and later fill it. That's definitely fishy.

My goal with this report was just to have it recorded somewhere that it is something we may want to look at at one point. Your explanations are definitely appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants