AffineConstraints: remove the block sparsity constraint function. #14398

drwells · 2022-11-06T14:09:18Z

This function predates BlockSparsityPatternBase<T>::add_entries(), which does the same thing (essentially just updating the individual blocks after computing offsets). For good measure I added a test which verifies that we get identical output with constraints and a DoF mask table before and after the switch.

Part 3/11.

kronbichler · 2022-11-07T10:44:49Z

I generally agree with this patch. However, let me note that the original intent of the duplicated code was to gain performance. I wrote this function (or at least the related matrix functions) with the goal to pull out the computation of the index within a block(block_row, block_col) sparsity pattern object once up front, rather than computing it for every row over and over again as the current code alternative would do.

I still support this because it simplifies our code base and because I believe that the block sparsity pattern creation is not a performance-critical component in most codes, and I would often step into the non-block sparsity patterns when it matters. That said, we should hear other opinions on this.

drwells · 2022-11-07T12:53:47Z

Do any of our performance tests set up block matrices? It would be interesting to see a performance comparison.

tjhei · 2022-11-07T13:34:21Z

We make use of BSP in ASPECT, so it would be good to know how much of a regression this PR would bring.

drwells · 2022-11-07T21:29:07Z

I modified the test to run with a lot more mesh refinement and I see, in callgrind, 66.7 million instructions on master and 81.5 million instructions with this for make_sparsity_pattern() - its definitely slower. By comparison, SparsityPattern::copy_from() is about 13 million instructions.

We may be able to make up some of the shortfall later on when I will get rid of all the SparsityPatternType::add() calls.

drwells · 2022-11-07T21:29:27Z

Is there a basic ASPECT benchmark worth looking at? I'd be happy to give it a try.

This function predates BlockSparsityPatternBase<T>::add_entries(), which does the same thing (essentially just updating the individual blocks after computing offsets). For good measure I added a test which verifies that we get identical output with constraints and a DoF mask table before and after the switch.

drwells · 2022-11-13T19:19:15Z

I tried out the aspect plane melt bands benchmark and I don't see a significant difference in timings - with and without this branch I see

+----------------------------------------------+------------+------------+
| Total wallclock time elapsed since start     |       271s |            |
|                                              |            |            |
| Section                          | no. calls |  wall time | % of total |
+----------------------------------+-----------+------------+------------+
| Assemble Stokes system           |         2 |      13.6s |         5% |
| Assemble composition system      |         2 |      13.6s |         5% |
| Assemble temperature system      |         2 |      13.7s |         5% |
| Build Stokes preconditioner      |         2 |      10.8s |         4% |
| Build composition preconditioner |         2 |      1.27s |      0.47% |
| Initialization                   |         1 |     0.149s |         0% |
| Postprocessing                   |         1 |      13.2s |       4.9% |
| Setup dof systems                |         1 |      1.49s |      0.55% |
| Setup initial conditions         |         1 |      4.31s |       1.6% |
| Setup matrices                   |         1 |       6.7s |       2.5% |
| Solve Stokes system              |         2 |       186s |        69% |
| Solve composition system         |         2 |      1.63s |       0.6% |
+----------------------------------+-----------+------------+------------+

for both master and this branch (the results are about the same for each - 271 seconds).

For step-32 (in 2D) I see 478s on master and 479 with this branch. Here's some timings with master:

| make Stokes SP                   |        41 |      9.24s |       1.9% |
| make Stokes prec SP              |        41 |      4.36s |      0.91% |
| make Temp SP                     |        41 |       1.5s |      0.31% |

and this branch:

| make Stokes SP                   |        41 |      9.83s |       2.1% |
| make Stokes prec SP              |        41 |      4.63s |      0.97% |
| make Temp SP                     |        41 |       1.5s |      0.31% |

I can think of some optimizations which might help but it looks like the extra overhead of this function is only present in 2D (and there its less than 10% slower).

drwells · 2022-11-13T21:09:58Z

I rewrote add_entries() to close about 2/3rds of the performance gap. Here are some numbers (in milliseconds) for a benchmark based on the test I added (I ran it 3x in 2D and 3D):

master:
2D: 2640, 2645, 2631
3D: 6433, 6434, 6417

feature:
2D: 3038, 3031, 3047
3D: 7013, 6942, 6998

feature + opt:
2D: 2705, 2713, 2739
3D: 6615, 6536, 6560

feature + opt2:
2D: 2592, 2624, 2640
3D: 6492, 6476, 6477

update: with some more work the performance is basically the same as the original version.

We can avoid some expensive parts and looping over data more than once by utilizing the fact that the DoFs are sorted, which implies that they are also sorted by block.

kronbichler

Nice!

kronbichler · 2022-11-14T09:04:35Z

I think this is now in a good state to be merged. I'll wait for a little while before merging, so others also get a chance to look at it.

tjhei · 2022-11-14T19:09:49Z

@drwells Thank you for working hard on this and making sure performance stays good! 👍

drwells added Enhancement ready to test labels Nov 6, 2022

drwells force-pushed the remove-bsp-constraint-function branch from d42fc75 to 656d41e Compare November 13, 2022 21:08

Optimize BlockSparsityPatternBase::add_entries().

5b2d8e2

We can avoid some expensive parts and looping over data more than once by utilizing the fact that the DoFs are sorted, which implies that they are also sorted by block.

drwells force-pushed the remove-bsp-constraint-function branch from 656d41e to 5b2d8e2 Compare November 13, 2022 21:47

kronbichler approved these changes Nov 14, 2022

View reviewed changes

masterleinad approved these changes Nov 14, 2022

View reviewed changes

masterleinad merged commit 726f95d into dealii:master Nov 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AffineConstraints: remove the block sparsity constraint function. #14398

AffineConstraints: remove the block sparsity constraint function. #14398

drwells commented Nov 6, 2022

kronbichler commented Nov 7, 2022

drwells commented Nov 7, 2022

tjhei commented Nov 7, 2022

drwells commented Nov 7, 2022

drwells commented Nov 7, 2022

drwells commented Nov 13, 2022 •

edited

drwells commented Nov 13, 2022 •

edited

kronbichler left a comment

kronbichler commented Nov 14, 2022

tjhei commented Nov 14, 2022

AffineConstraints: remove the block sparsity constraint function. #14398

AffineConstraints: remove the block sparsity constraint function. #14398

Conversation

drwells commented Nov 6, 2022

kronbichler commented Nov 7, 2022

drwells commented Nov 7, 2022

tjhei commented Nov 7, 2022

drwells commented Nov 7, 2022

drwells commented Nov 7, 2022

drwells commented Nov 13, 2022 • edited

drwells commented Nov 13, 2022 • edited

kronbichler left a comment

Choose a reason for hiding this comment

kronbichler commented Nov 14, 2022

tjhei commented Nov 14, 2022

drwells commented Nov 13, 2022 •

edited

drwells commented Nov 13, 2022 •

edited