Help compiler exploit hanging-node symmetries by reduced reg loads #13000

Merged
merged 1 commit into dealii:master from the improve_mf_kernel branch on Nov 28, 2021

kronbichler
Member

This is a pure performance optimization. The current vectorized code path of the hanging-node evaluation routines exploits the symmetry in the shape functions between the two subface interpolation matrices. However, we leave it to the compiler to turn this symmetry into fewer loads from memory into registers. For high degrees the compiler does not do so, because too many loads sit between the two places where the data is reused. At least clang-13 does perform the optimization for p=3, which is why I did not notice immediately. Hence, make the loop explicit, similar to what we do in

in0 = in[stride * ind];
in1 = in[stride * (mm - 1 - ind)];
res0 += val0 * in0;
res1 += val1 * in0;
res0 += val1 * in1;
res1 += val0 * in1;
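
For readers who want to try the pattern in isolation, here is a minimal standalone sketch of the paired-load loop. It is not the code from this commit: the names `apply_symmetric` and `apply_naive` are hypothetical, it uses plain `double` instead of vectorized types, and it assumes an n x n row-major matrix `S` with the mirror symmetry `S[i][j] == S[n-1-i][n-1-j]`. Each pair of input values is loaded once and feeds two output rows, which is the reuse the compiler did not find on its own for high degrees.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: plain matrix-vector product, out = S * in.
std::vector<double>
apply_naive(const std::vector<double> &S,
            const std::vector<double> &in,
            const std::size_t          n,
            const std::size_t          stride = 1)
{
  std::vector<double> out(n, 0.0);
  for (std::size_t row = 0; row < n; ++row)
    for (std::size_t ind = 0; ind < n; ++ind)
      out[row] += S[row * n + ind] * in[stride * ind];
  return out;
}

// Hypothetical sketch: same product, but assuming the mirror symmetry
// S[i][j] == S[n-1-i][n-1-j]. Rows `row` and `n-1-row` are computed
// together, so each loaded in0/in1 and val0/val1 is used twice.
std::vector<double>
apply_symmetric(const std::vector<double> &S,
                const std::vector<double> &in,
                const std::size_t          n,
                const std::size_t          stride = 1)
{
  std::vector<double> out(n, 0.0);
  const std::size_t   n_half = n / 2;
  for (std::size_t row = 0; row < (n + 1) / 2; ++row)
    {
      double res0 = 0.0, res1 = 0.0;
      for (std::size_t ind = 0; ind < n_half; ++ind)
        {
          const double val0 = S[row * n + ind];
          const double val1 = S[row * n + (n - 1 - ind)];
          const double in0  = in[stride * ind];
          const double in1  = in[stride * (n - 1 - ind)];
          res0 += val0 * in0;
          res1 += val1 * in0;
          res0 += val1 * in1;
          res1 += val0 * in1;
        }
      if (n % 2 == 1)
        {
          // Middle column for odd n: identical entry in both rows.
          const double val = S[row * n + n_half];
          const double mid = in[stride * n_half];
          res0 += val * mid;
          res1 += val * mid;
        }
      out[row]         = res0;
      out[n - 1 - row] = res1;
    }
  return out;
}
```

For odd n the middle row is paired with itself, so `res0` and `res1` coincide there and the double write is harmless. The floating-point summation order differs from the naive loop, so results may differ by rounding for general inputs.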

@kronbichler kronbichler merged commit 52b85b2 into dealii:master Nov 28, 2021
@kronbichler kronbichler deleted the improve_mf_kernel branch November 28, 2021 09:50