
Tensor product operations: Use loop unrolling for slow mat-vec #16984

Merged
merged 1 commit into dealii:master from optimize_matvec_kernel on May 14, 2024

Conversation

kronbichler
Member

This is the optimization I mentioned in #16970: To get good performance of the matrix-free evaluation kernels for simplex elements, we should provide reasonably good code that avoids loop overhead and addition latencies. This PR implements a 4x manual unrolling of the outer loop in the matrix-vector product, together with the respective remainder loops, and adds the option to generate better code for unit-stride array access. In my tests, this PR gives a speedup of 1.6-1.8x for dim=3, degree={2,3} of the simplex matrix-free operator evaluation.
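
For illustration, here is a minimal sketch of the unrolling pattern described above. This is not the actual kernel from this PR; the names `matrix`, `in`, `out`, `n_rows`, and `n_cols` are made up for the example. The idea is to accumulate four output rows at once in independent accumulators and to handle the leftover rows in a scalar remainder loop:

```cpp
// Sketch only: 4x unrolled matrix-vector product with a remainder loop.
// 'Number' stands for the scalar or vectorized number type in use.
template <typename Number>
void
matrix_vector_unrolled(const Number      *matrix,
                       const Number      *in,
                       Number            *out,
                       const unsigned int n_rows,
                       const unsigned int n_cols)
{
  unsigned int row = 0;

  // Main loop: process four output rows at once. Four independent
  // accumulators hide the latency of the chained additions, and each
  // unit-stride load of in[col] is reused for four rows.
  for (; row + 4 <= n_rows; row += 4)
    {
      const Number *matrix_0 = matrix + (row + 0) * n_cols;
      const Number *matrix_1 = matrix + (row + 1) * n_cols;
      const Number *matrix_2 = matrix + (row + 2) * n_cols;
      const Number *matrix_3 = matrix + (row + 3) * n_cols;

      Number sum_0 = {}, sum_1 = {}, sum_2 = {}, sum_3 = {};
      for (unsigned int col = 0; col < n_cols; ++col)
        {
          const Number value = in[col];
          sum_0 += matrix_0[col] * value;
          sum_1 += matrix_1[col] * value;
          sum_2 += matrix_2[col] * value;
          sum_3 += matrix_3[col] * value;
        }
      out[row + 0] = sum_0;
      out[row + 1] = sum_1;
      out[row + 2] = sum_2;
      out[row + 3] = sum_3;
    }

  // Remainder loop: up to three leftover rows, handled one at a time.
  for (; row < n_rows; ++row)
    {
      const Number *matrix_row = matrix + row * n_cols;
      Number        sum        = {};
      for (unsigned int col = 0; col < n_cols; ++col)
        sum += matrix_row[col] * in[col];
      out[row] = sum;
    }
}
```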

The code is still not optimal, because the simplices perform a matrix-vector product, whereas it would be better to perform matrix-matrix multiplications. More precisely, one should combine 2 or 3 cell batches in order to amortize the load of matrix_ptr[0], as the current code is bottlenecked by the load of the matrix, and to hide the latency of the chained additions, which becomes clearly visible on my AMD systems with dim=3, degree=3. I have to think more about what the best strategy could be, and would be happy to discuss the options we have.
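
As a rough sketch of that batching idea (hypothetical function, not part of this PR), applying the same matrix to several cell batches at once means each matrix entry is loaded once and reused for all batches, turning the memory-bound matrix-vector product into a small matrix-matrix product:

```cpp
#include <array>

// Sketch only: apply one matrix to 'n_batches' input/output vectors at once.
// The signature and names are hypothetical and only illustrate the idea of
// amortizing the matrix load over several cell batches.
template <typename Number, unsigned int n_batches>
void
matrix_multi_vector(const Number                                 *matrix,
                    const std::array<const Number *, n_batches> &in,
                    const std::array<Number *, n_batches>       &out,
                    const unsigned int                            n_rows,
                    const unsigned int                            n_cols)
{
  for (unsigned int row = 0; row < n_rows; ++row)
    {
      const Number *matrix_row = matrix + row * n_cols;

      std::array<Number, n_batches> sums = {};
      for (unsigned int col = 0; col < n_cols; ++col)
        {
          // Each matrix entry is loaded once and reused for all batches.
          const Number m = matrix_row[col];
          for (unsigned int b = 0; b < n_batches; ++b)
            sums[b] += m * in[b][col];
        }
      for (unsigned int b = 0; b < n_batches; ++b)
        out[b][row] = sums[b];
    }
}
```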

FYI @dominiktassilostill @nfehn.

@kronbichler kronbichler merged commit 7801e2e into dealii:master May 14, 2024
15 of 16 checks passed
@kronbichler kronbichler deleted the optimize_matvec_kernel branch May 14, 2024 03:49