Reduce overhead in calls to tensor product value function #15182

kronbichler · 2023-05-05T08:29:21Z

This is a follow-up to #15137 that reduces the integer-instruction overhead the code has to go through. There are three main changes:

1. We switch from ArrayView to plain pointers in the internal interfaces. This reduces the setup cost by two instructions and saves one register that would otherwise have to be passed around between function calls.
2. The compiler can't see that, for the case stride == 1, the loop termination criterion q + v < n_q_points_scalar is the same as in the outer loop and thus always true. I avoid one additional branch instruction by spelling this out as v < stride && (stride > 1 ? q + v < n_q_points_scalar : true). I admit this is not very readable, and there may be better suggestions for making sure that the data rearrangement, which is visible in profilers, does not incur a loop when all we add is one element.
3. I opted to mark do_interpolate_xy with DEAL_II_ALWAYS_INLINE. We discussed this in "Template loop bounds for flexible evaluate/integrate function" #14972 (comment), but it seems that at least my compiler (clang-16) does a bad job of estimating the cost of moving the variables in and out of registers. Instead, the outer function evaluate_tensor_product_value_and_gradient_shapes should be the one that collects the code. From a code-size perspective, I see no advantage in not inlining, as the only function calling do_interpolate_xy is that outer function.
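To illustrate changes 1 and 3 in isolation, here is a minimal sketch, not the deal.II implementation. The names interpolate_x_sketch, evaluate_xy_sketch and SKETCH_ALWAYS_INLINE are hypothetical stand-ins for do_interpolate_xy, evaluate_tensor_product_value_and_gradient_shapes and DEAL_II_ALWAYS_INLINE. The sketch assumes a GCC-compatible compiler for the always_inline attribute and a simple 2D tensor-product sum with n_shapes shape values per direction. The inner kernel takes raw pointers, and it is forced inline into its single caller.

```cpp
#include <cassert>

// Hypothetical stand-in for DEAL_II_ALWAYS_INLINE; GCC and clang (which also
// defines __GNUC__) accept the always_inline function attribute.
#if defined(__GNUC__)
#  define SKETCH_ALWAYS_INLINE __attribute__((always_inline))
#else
#  define SKETCH_ALWAYS_INLINE
#endif

// Hypothetical inner kernel (role of do_interpolate_xy): it receives plain
// pointers instead of view objects that bundle a pointer with a size.
inline SKETCH_ALWAYS_INLINE double
interpolate_x_sketch(const double *shapes_x,
                     const double *values,
                     const int     n_shapes)
{
  double result = 0.;
  for (int i = 0; i < n_shapes; ++i)
    result += shapes_x[i] * values[i];
  return result;
}

// Hypothetical outer function: sum_j shapes_y[j] * sum_i shapes_x[i] * u(i, j),
// with u stored row by row. It is the only caller of the inner kernel, so
// forcing inlining does not duplicate code at other call sites.
inline double
evaluate_xy_sketch(const double *shapes_x,
                   const double *shapes_y,
                   const double *values,
                   const int     n_shapes)
{
  assert(n_shapes > 0);
  double result = 0.;
  for (int j = 0; j < n_shapes; ++j)
    result += shapes_y[j] *
              interpolate_x_sketch(shapes_x, values + j * n_shapes, n_shapes);
  return result;
}
```

For example, with shapes_x = {1, 2}, shapes_y = {3, 4} and values = {1, 2, 3, 4}, the inner sums are 5 and 11, so the result is 3 * 5 + 4 * 11 = 59.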

bangerth · 2023-05-05T18:37:19Z

include/deal.II/matrix_free/fe_point_evaluation.h

      const std::size_t n_batches =
-        n_q_points_scalar / n_lanes_internal +
-        (n_q_points_scalar % n_lanes_internal > 0 ? 1 : 0);
+        (n_q_points_scalar + n_lanes_internal - 1) / n_lanes_internal;


What you're doing here is rounding up n_q_points_scalar/n_lanes_internal. Can you put this into a comment?
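The rounding-up identity can be checked on its own. The following is a minimal sketch, not deal.II code. The helper names are hypothetical, and it assumes an unsigned size type and that n_points + n_lanes - 1 does not overflow. It compares the new expression with the one removed in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of SIMD batches needed to cover n_points scalar
// points when each batch holds n_lanes points, i.e. ceil(n_points / n_lanes).
// Assumes n_points + n_lanes - 1 does not overflow std::size_t.
inline std::size_t
n_batches_rounded_up(const std::size_t n_points, const std::size_t n_lanes)
{
  assert(n_lanes > 0);
  // Adding n_lanes - 1 before the truncating division rounds the quotient up.
  return (n_points + n_lanes - 1) / n_lanes;
}

// Hypothetical helper reproducing the removed expression: truncated quotient,
// plus one if there is a remainder.
inline std::size_t
n_batches_with_remainder(const std::size_t n_points, const std::size_t n_lanes)
{
  assert(n_lanes > 0);
  return n_points / n_lanes + (n_points % n_lanes > 0 ? 1 : 0);
}
```

With 4 lanes, for instance, 8 points need (8 + 3) / 4 = 2 batches and 9 points need (9 + 3) / 4 = 3. The single division avoids the separate modulo and comparison of the old form.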

bangerth · 2023-05-05T18:42:16Z

include/deal.II/matrix_free/fe_point_evaluation.h

                                 n_shapes,
                                 solution_renumbered);

      if (evaluation_flags & EvaluationFlags::values)
        {
-          for (unsigned int v = 0; v < stride && q + v < n_q_points_scalar; ++v)
+          for (unsigned int v = 0;
+               v < stride && (stride > 1 ? q + v < n_q_points_scalar : true);


This condition (here and below) stumped me for a bit. I think it might be easier to read as

Suggested change

v < stride && (stride > 1 ? q + v < n_q_points_scalar : true);

v < stride && (stride == 1 || (q + v < n_q_points_scalar));
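Why the bound check can be dropped for stride == 1: inside the outer loop q < n_q_points_scalar already holds, and with stride == 1 the inner index v only takes the value 0, so q + v < n_q_points_scalar is always true there. The sketch below assumes an outer loop that advances q by stride. This is a hypothetical reconstruction, not the exact deal.II loop. It checks that the original and the suggested conditions agree whenever q < n_q_points_scalar and that every point is visited exactly once.

```cpp
#include <cassert>

// Original inner-loop condition: the bound check runs even for stride == 1.
inline bool
condition_original(unsigned int v, unsigned int q, unsigned int stride,
                   unsigned int n_q_points_scalar)
{
  return v < stride && q + v < n_q_points_scalar;
}

// Suggested form: for stride == 1, short-circuit evaluation skips the check.
inline bool
condition_suggested(unsigned int v, unsigned int q, unsigned int stride,
                    unsigned int n_q_points_scalar)
{
  return v < stride && (stride == 1 || q + v < n_q_points_scalar);
}

// Hypothetical loop nest: q advances in steps of stride, and the inner loop
// handles at most stride entries, stopping early in the last, partial batch.
// Returns how many inner iterations run in total.
inline unsigned int
count_inner_iterations(unsigned int stride, unsigned int n_q_points_scalar)
{
  assert(stride > 0);
  unsigned int count = 0;
  for (unsigned int q = 0; q < n_q_points_scalar; q += stride)
    for (unsigned int v = 0;
         v < stride && (stride == 1 || q + v < n_q_points_scalar);
         ++v)
      ++count;
  return count;
}
```

For any stride, count_inner_iterations(stride, n) returns n, so the suggested condition neither skips nor repeats points.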

kronbichler · 2023-05-08T06:43:59Z

Thanks for the comments, I adapted according to the suggestions.

kronbichler added the Matrix-free and ready to test labels May 5, 2023

kronbichler force-pushed the reduce_overhead2 branch from 07006fd to 8cfb2f2 May 5, 2023 08:43

bangerth reviewed May 5, 2023


masterleinad approved these changes May 5, 2023


kronbichler added 2 commits May 8, 2023 08:43

Reduce overhead in calls to tensor product value function

2b4a4ea

Remove another ArrayView

b5a8c60

kronbichler force-pushed the reduce_overhead2 branch from f59c98d to b5a8c60 May 8, 2023 06:43

bangerth approved these changes May 9, 2023


bangerth merged commit a321577 into dealii:master May 9, 2023
14 checks passed

kronbichler deleted the reduce_overhead2 branch August 10, 2023 16:39
