AffineConstraints: remove the block sparsity constraint function. #14398
I generally agree with this patch. However, let me note that the original intent of the duplicated code was to gain performance. I wrote this function (or at least the related matrix functions) with the goal to pull out the computation of the index within a I still support this because it simplifies our code base and because I believe that the block sparsity pattern creation is not a performance-critical component in most codes, and I would often step into the non-block sparsity patterns when it matters. That said, we should hear other opinions on this.
Do any of our performance tests set up block matrices? It would be interesting to see a performance comparison.
We make use of BSP in ASPECT, so it would be good to know how much of a regression this PR would bring.
I modified the test to run with a lot more mesh refinement and I see, in callgrind, 66.7 million instructions on master and 81.5 million instructions with this for We may be able to make up some of the shortfall later on when I get rid of all the
Is there a basic ASPECT benchmark worth looking at? I'd be happy to give it a try.
This function predates BlockSparsityPatternBase<T>::add_entries(), which does the same thing (essentially just updating the individual blocks after computing offsets). For good measure I added a test which verifies that we get identical output with constraints and a DoF mask table before and after the switch.
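To illustrate what "updating the individual blocks after computing offsets" can look like, here is a minimal, library-independent sketch. It is not the deal.II implementation: the helper names `global_to_local` and `split_columns_by_block`, and the `block_starts` layout (first global index of each block plus a final entry equal to the total size), are assumptions made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of deal.II. block_starts holds the first
// global index of each block plus one final entry equal to the total size,
// e.g. {0, 4, 10} for two blocks of sizes 4 and 6.
// Returns the pair (block number, index within that block).
std::pair<std::size_t, std::size_t>
global_to_local(const std::vector<std::size_t> &block_starts,
                const std::size_t               global_index)
{
  assert(block_starts.size() >= 2);
  assert(global_index >= block_starts.front());
  assert(global_index < block_starts.back());

  // Find the first block start that is strictly larger than the index;
  // the index then belongs to the block just before it.
  const auto it = std::upper_bound(block_starts.begin(),
                                   block_starts.end(),
                                   global_index);
  const std::size_t block =
    static_cast<std::size_t>(it - block_starts.begin()) - 1;
  return std::make_pair(block, global_index - block_starts[block]);
}

// Distribute the global column indices of one row into per-block lists of
// block-local column indices. The input does not need to be sorted.
std::vector<std::vector<std::size_t>>
split_columns_by_block(const std::vector<std::size_t> &block_starts,
                       const std::vector<std::size_t> &global_columns)
{
  assert(block_starts.size() >= 2);
  std::vector<std::vector<std::size_t>> per_block(block_starts.size() - 1);
  for (const std::size_t column : global_columns)
    {
      const auto block_and_local = global_to_local(block_starts, column);
      per_block[block_and_local.first].push_back(block_and_local.second);
    }
  return per_block;
}
```

Each per-block list could then be handed to the sparsity pattern of the corresponding block. The per-entry binary search is the kind of repeated index computation whose cost is being discussed in this thread.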
I tried out the ASPECT plane melt bands benchmark and I don't see a significant difference in timings - with and without this branch I see
for both For step-32 (in 2D) I see 478s on master and 479s with this branch. Here are some timings with master:
and this branch:
I can think of some optimizations which might help but it looks like the extra overhead of this function is only present in 2D (and there it's less than 10% slower).
(force-pushed from d42fc75 to 656d41e)
I rewrote
Update: with some more work the performance is basically the same as the original version.
We can avoid some expensive parts and looping over data more than once by utilizing the fact that the DoFs are sorted, which implies that they are also sorted by block.
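As a rough illustration of why sorted indices help, the sketch below splits a sorted list of global indices into one contiguous range per block in a single pass, with no per-entry search. It is not the code from this PR: the `Range` struct, the function name `partition_sorted_by_block`, and the `block_starts` layout (first global index of each block plus a final entry equal to the total size) are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of positions in the sorted index array.
struct Range
{
  std::size_t begin;
  std::size_t end;
};

// Hypothetical sketch, not deal.II code. Because sorted_indices is sorted,
// all indices belonging to block b form one contiguous run, and the runs
// appear in block order. One forward sweep is therefore enough.
std::vector<Range>
partition_sorted_by_block(const std::vector<std::size_t> &block_starts,
                          const std::vector<std::size_t> &sorted_indices)
{
  assert(block_starts.size() >= 2);
  const std::size_t  n_blocks = block_starts.size() - 1;
  std::vector<Range> ranges(n_blocks);

  std::size_t position = 0;
  for (std::size_t block = 0; block < n_blocks; ++block)
    {
      ranges[block].begin = position;
      // Advance while the current index still lies below the start of the
      // next block.
      while (position < sorted_indices.size() &&
             sorted_indices[position] < block_starts[block + 1])
        ++position;
      ranges[block].end = position;
    }

  // Every index must have been smaller than the total size.
  assert(position == sorted_indices.size());
  return ranges;
}
```

The total work is proportional to the number of indices plus the number of blocks, and each index is visited once. An empty range (`begin == end`) means the row has no entries in that block.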
(force-pushed from 656d41e to 5b2d8e2)
Nice!
I think this is now in a good state to be merged. I'll wait for a little while before merging, so others also get a chance to look at it.
@drwells Thank you for working hard on this and making sure performance stays good! 👍
Part 3/11.