Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AffineConstraints: remove the block sparsity constraint function. #14398

Merged
merged 2 commits into from Nov 14, 2022

Conversation

drwells
Copy link
Member

@drwells drwells commented Nov 6, 2022

This function predates BlockSparsityPatternBase<T>::add_entries(), which does the same thing (essentially just updating the individual blocks after computing offsets). For good measure I added a test which verifies that we get identical output with constraints and a DoF mask table before and after the switch.

Part 3/11.

@kronbichler
Copy link
Member

I generally agree with this patch. However, let me note that the original intent of the duplicated code was to gain performance. I wrote this function (or at least the related matrix functions) with the goal to pull out the computation of the index within a block(block_row, block_col) sparsity pattern object once up front, rather than computing it for every row over and over again as the current code alternative would do.

I still support this because it simplifies our code base and because I believe that the block sparsity pattern creation is not a performance-critical component in most codes, and I would often step into the non-block sparsity patterns when it matters. That said, we should hear other opinions on this.

@drwells
Copy link
Member Author

drwells commented Nov 7, 2022

Do any of our performance tests set up block matrices? It would be interesting to see a performance comparison.

@tjhei
Copy link
Member

tjhei commented Nov 7, 2022

We make use of BSP in ASPECT, so it would be good to know how much of a regression this PR would bring.

@drwells
Copy link
Member Author

drwells commented Nov 7, 2022

I modified the test to run with a lot more mesh refinement and I see, in callgrind, 66.7 million instructions on master and 81.5 million instructions with this for make_sparsity_pattern() - its definitely slower. By comparison, SparsityPattern::copy_from() is about 13 million instructions.

We may be able to make up some of the shortfall later on when I will get rid of all the SparsityPatternType::add() calls.

@drwells
Copy link
Member Author

drwells commented Nov 7, 2022

Is there a basic ASPECT benchmark worth looking at? I'd be happy to give it a try.

This function predates BlockSparsityPatternBase<T>::add_entries(), which does
the same thing (essentially just updating the individual blocks after computing
offsets). For good measure I added a test which verifies that we get identical
output with constraints and a DoF mask table before and after the switch.
@drwells
Copy link
Member Author

drwells commented Nov 13, 2022

I tried out the aspect plane melt bands benchmark and I don't see a significant difference in timings - with and without this branch I see

+----------------------------------------------+------------+------------+
| Total wallclock time elapsed since start     |       271s |            |
|                                              |            |            |
| Section                          | no. calls |  wall time | % of total |
+----------------------------------+-----------+------------+------------+
| Assemble Stokes system           |         2 |      13.6s |         5% |
| Assemble composition system      |         2 |      13.6s |         5% |
| Assemble temperature system      |         2 |      13.7s |         5% |
| Build Stokes preconditioner      |         2 |      10.8s |         4% |
| Build composition preconditioner |         2 |      1.27s |      0.47% |
| Initialization                   |         1 |     0.149s |         0% |
| Postprocessing                   |         1 |      13.2s |       4.9% |
| Setup dof systems                |         1 |      1.49s |      0.55% |
| Setup initial conditions         |         1 |      4.31s |       1.6% |
| Setup matrices                   |         1 |       6.7s |       2.5% |
| Solve Stokes system              |         2 |       186s |        69% |
| Solve composition system         |         2 |      1.63s |       0.6% |
+----------------------------------+-----------+------------+------------+

for both master and this branch (the results are about the same for each - 271 seconds).

For step-32 (in 2D) I see 478s on master and 479 with this branch. Here's some timings with master:

| make Stokes SP                   |        41 |      9.24s |       1.9% |
| make Stokes prec SP              |        41 |      4.36s |      0.91% |
| make Temp SP                     |        41 |       1.5s |      0.31% |

and this branch:

| make Stokes SP                   |        41 |      9.83s |       2.1% |
| make Stokes prec SP              |        41 |      4.63s |      0.97% |
| make Temp SP                     |        41 |       1.5s |      0.31% |

I can think of some optimizations which might help but it looks like the extra overhead of this function is only present in 2D (and there its less than 10% slower).

@drwells drwells force-pushed the remove-bsp-constraint-function branch from d42fc75 to 656d41e Compare November 13, 2022 21:08
@drwells
Copy link
Member Author

drwells commented Nov 13, 2022

I rewrote add_entries() to close about 2/3rds of the performance gap. Here are some numbers (in milliseconds) for a benchmark based on the test I added (I ran it 3x in 2D and 3D):

master:
2D: 2640, 2645, 2631
3D: 6433, 6434, 6417

feature:
2D: 3038, 3031, 3047
3D: 7013, 6942, 6998

feature + opt:
2D: 2705, 2713, 2739
3D: 6615, 6536, 6560

feature + opt2:
2D: 2592, 2624, 2640
3D: 6492, 6476, 6477

update: with some more work the performance is basically the same as the original version.

We can avoid some expensive parts and looping over data more than once by
utilizing the fact that the DoFs are sorted, which implies that they are also
sorted by block.
@drwells drwells force-pushed the remove-bsp-constraint-function branch from 656d41e to 5b2d8e2 Compare November 13, 2022 21:47
Copy link
Member

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@kronbichler
Copy link
Member

I think this is now in a good state to be merged. I'll wait for a little while before merging, so others also get a chance to look at it.

@masterleinad masterleinad merged commit 726f95d into dealii:master Nov 14, 2022
@tjhei
Copy link
Member

tjhei commented Nov 14, 2022

@drwells Thank you for working hard on this and making sure performance stays good! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants