Tensor product operations: Use loop unrolling for slow mat-vec #16984
+176
−24
This is the optimization I mentioned in #16970: For the simplex elements, and for good performance of the matrix-free evaluation kernels, we should provide reasonably good code that avoids loop overhead and addition latencies. This PR implements a 4x manual unrolling of the outer loop in the matrix-vector product, together with the respective remainder loops, and adds the option to generate better code for unit-stride array access. In my tests, this PR gives a speedup of 1.6-1.8x for `dim=3`, `degree={2,3}` of the simplex matrix-free operator evaluation.
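For readers less familiar with the technique, here is a minimal sketch of what 4x unrolling of the outer loop with a remainder loop looks like for a plain matrix-vector product. This is not the PR's actual kernel; all names (`matrix_ptr`, `n_rows`, `n_cols`) are placeholders chosen for illustration, and the inner loop accesses each row with unit stride:

```cpp
#include <cstddef>

void unrolled_matvec(const double     *matrix_ptr,
                     const double     *x,
                     double           *y,
                     const std::size_t n_rows,
                     const std::size_t n_cols)
{
  std::size_t i = 0;

  // Main loop: process four rows at once so that four independent
  // accumulators are in flight, hiding the latency of chained additions.
  for (; i + 4 <= n_rows; i += 4)
    {
      double        sum0 = 0., sum1 = 0., sum2 = 0., sum3 = 0.;
      const double *row0 = matrix_ptr + (i + 0) * n_cols;
      const double *row1 = matrix_ptr + (i + 1) * n_cols;
      const double *row2 = matrix_ptr + (i + 2) * n_cols;
      const double *row3 = matrix_ptr + (i + 3) * n_cols;
      for (std::size_t j = 0; j < n_cols; ++j)
        {
          const double xj = x[j];
          sum0 += row0[j] * xj;
          sum1 += row1[j] * xj;
          sum2 += row2[j] * xj;
          sum3 += row3[j] * xj;
        }
      y[i + 0] = sum0;
      y[i + 1] = sum1;
      y[i + 2] = sum2;
      y[i + 3] = sum3;
    }

  // Remainder loop: handle the last n_rows % 4 rows one at a time.
  for (; i < n_rows; ++i)
    {
      double        sum = 0.;
      const double *row = matrix_ptr + i * n_cols;
      for (std::size_t j = 0; j < n_cols; ++j)
        sum += row[j] * x[j];
      y[i] = sum;
    }
}
```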
The code is still not optimal, because the simplex kernels perform a matrix-vector product, whereas it would be better to perform matrix-matrix multiplications. More precisely, one should combine 2 or 3 cell batches in order to amortize the load of `matrix_ptr[0]`, as the current code is bottlenecked by the load of the matrix, and to hide the latency of the chained additions that becomes clearly visible on my AMD systems with `dim=3`, `degree=3`. A rough sketch of this idea is shown below. I have to reason more about what the best strategy could be, and would be happy to discuss the options we have. FYI @dominiktassilostill @nfehn.
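The following is a hypothetical sketch of that direction, not part of this PR: applying the same matrix to the vectors of several cell batches at once, so that each matrix entry is loaded once and reused `n_batches` times, turning the kernel into a small matrix-matrix product. All names are placeholders for illustration:

```cpp
#include <cstddef>

template <std::size_t n_batches>
void matvec_multiple_batches(const double     *matrix_ptr,
                             const double     *x, // n_batches vectors of length n_cols, batch-major
                             double           *y, // n_batches vectors of length n_rows, batch-major
                             const std::size_t n_rows,
                             const std::size_t n_cols)
{
  for (std::size_t i = 0; i < n_rows; ++i)
    {
      double        sums[n_batches] = {};
      const double *row             = matrix_ptr + i * n_cols;
      for (std::size_t j = 0; j < n_cols; ++j)
        {
          // Each matrix entry is loaded once and reused for all batches,
          // amortizing the memory traffic for the matrix.
          const double a_ij = row[j];
          for (std::size_t b = 0; b < n_batches; ++b)
            sums[b] += a_ij * x[b * n_cols + j];
        }
      for (std::size_t b = 0; b < n_batches; ++b)
        y[b * n_rows + i] = sums[b];
    }
}
```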