MatrixFree diagonal: Avoid repeated computations and binary search #15147
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a follow-up to #15135, fixing some additional performance problems:
reinit()
function, we used to merge two sets of constraints, one containing generic constraints and the other hanging-node constraints (using the internal representation ofMatrixFree
introduced in Optimize hanging-node constraints in MatrixFree #12560) by going through all dofs and looking up on both lists by binary searches. That is not efficient as it isO(n log(n))
in the number of local dofs, with possibly many branch mispredictions. It is much better to use one list as the "base" object and only check (by simple array lookup) whether the list also contains information there. Using this setup, we also avoid somestd::sort()
calls.std::map
to avoid re-computing the resulting matrix of hanging node interpolation. However, we through away thehelper
object every timeMatrixFree::cell_loop
calls back into thecell_operation
, which can be quite frequent because we use this to interleave loops. Instead, we should keep one such object through the entire loop by creating it outside the loop. There are two complications here, one regarding the possibly threaded execution ofcell_loop
(resolved byThreadLocalStorage<Helper>
) and one regarding the fact that hp adaptivity might involve different interpolations. In the latter, I still need to throw away previous results as soon as I detect a new polynomial degree.