Vectorize more work in reductions of vector operations #14254

kronbichler · 2022-09-12T09:22:21Z

Addresses one of the issues mentioned in #14251: The reductions for small sizes involve work on up to n_lanes * 32 - 1 vector items without vectorization. If the vector size is a few hundred, this scalar work becomes clearly visible. We can get this down to less than n_lanes. While in this code, I also restructured the reductions on the temporary outer_results that contain the result of 32 operations: We can perform the associated pairwise summation in a single loop (rather than two nested loops) by writing each temporary result to the end of a (now 2x larger) array and summing until we've reached the end.
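For readers who want to see the single-loop idea in isolation, here is a minimal sketch. It is not the code from this patch: the function name `pairwise_sum_single_loop`, the use of `std::vector<double>`, and the `read`/`write` index names are assumptions made for illustration only. Each pair of entries is summed and appended behind the existing entries, and the loop stops once the read position catches up with the write position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not the deal.II implementation.
// Sums the partial results pairwise using one loop over a buffer that is
// twice as long as the input.
double pairwise_sum_single_loop(const std::vector<double> &partial)
{
  const std::size_t n = partial.size();
  assert(n > 0);

  // First half: the partial results. Second half: space for the pair sums.
  std::vector<double> buffer(2 * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    buffer[i] = partial[i];

  std::size_t read  = 0; // next pair to be summed
  std::size_t write = n; // next free slot at the end of the buffer
  for (; read + 1 < write; read += 2, ++write)
    buffer[write] = buffer[read] + buffer[read + 1];

  // The last entry written (or the single input entry) is the total.
  return buffer[write - 1];
}
```

For four inputs this evaluates (a0 + a1) + (a2 + a3). At most 2n - 1 slots are ever used, which is why a buffer of twice the original size is enough.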

These two actions result in a considerable improvement. I tested for vector size 400 and found that around 20% fewer instructions get issued with this patch.
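The first of these two actions, keeping the non-vectorized remainder below n_lanes items, can be sketched roughly as follows. This is an assumption-laden illustration rather than the patch itself: the lane count is fixed at 4, the lanes are modeled with a plain `std::array` instead of a SIMD type, and the name `lane_blocked_sum` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed lane count for this sketch; real hardware may use 2, 4, 8, ...
constexpr std::size_t n_lanes = 4;
static_assert(n_lanes == 4, "the lane combination below is written for 4 lanes");

// Hypothetical helper for illustration; not the deal.II implementation.
double lane_blocked_sum(const std::vector<double> &v)
{
  // Largest multiple of n_lanes that fits into the vector.
  const std::size_t n_full = (v.size() / n_lanes) * n_lanes;

  // Only a tail shorter than one lane block is left for scalar work.
  assert(v.size() - n_full < n_lanes);

  // Lane-wise accumulation; the inner loop has a fixed trip count and is
  // the part a compiler or a SIMD wrapper class could vectorize.
  std::array<double, n_lanes> lanes{};
  for (std::size_t i = 0; i < n_full; i += n_lanes)
    for (std::size_t l = 0; l < n_lanes; ++l)
      lanes[l] += v[i + l];

  // Combine the lanes pairwise, then add the short scalar tail.
  double result = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
  for (std::size_t i = n_full; i < v.size(); ++i)
    result += v[i];
  return result;
}
```

With 10 entries and 4 lanes, 8 entries go through the lane-wise loop and only 2 are handled one by one.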

Note: This patch changes the order of additions, so I expect that we might need some updates to the result files that are sensitive to roundoff. I think I identified one, but let us wait for the CI machines.
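To illustrate why a changed order of additions can alter test output, here is a small self-contained sketch. The function names and the input values are made up for this example and do not come from any deal.II test. It assumes IEEE double precision with default rounding and no value-changing compiler flags such as -ffast-math.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: plain left-to-right summation.
double sum_left_to_right(const std::vector<double> &v)
{
  double result = 0.0;
  for (const double x : v)
    result += x;
  return result;
}

// Hypothetical helper: pairwise summation of exactly four entries.
double sum_in_pairs_of_four(const std::vector<double> &v)
{
  assert(v.size() == 4);
  return (v[0] + v[1]) + (v[2] + v[3]);
}
```

For the input {1e16, 1.0, -1e16, 1.0} the exact sum is 2, but the two functions round at different points and return different values. Neither ordering is wrong as such; they just produce different roundoff, which is what output files compared with tight tolerances can pick up.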

tamiko · 2022-09-12T14:34:04Z

include/deal.II/lac/vector_operations_internal.h

-          r0 += r1;
-          r2 += r3;
-          result = r0 + r2;
+          result = (r0 + r1) + (r2 + r3);


Much better :-)

tamiko · 2022-09-12T14:38:57Z

@kronbichler The following test fails on the serial configuration: 4265 - lac/bicgstab_large.debug (Failed)

tamiko

This looks good.

Shall we do a full run of the standard configuration of the regression tester on this one before merging?

tamiko · 2022-09-12T14:41:44Z

tests/lac/bicgstab_large.cc

@@ -31,6 +31,7 @@ main()
 {
  initlog();
  deallog << std::setprecision(4);


Would it make sense to drop this std::setprecision or to set it to say 10 digits so that numdiff can do something about roundoff errors?

The problem is that we print a residual that ought to be zero in exact arithmetic, but turns out to be 1e-3 or something (above the absolute/relative threshold of numdiff), because we've scaled the matrix by 1e10. I don't see a simple way to get this back within tolerance, apart from disabling the print of the residual to the logstream.

kronbichler · 2022-09-12T14:45:07Z

Shall we do a full run of the standard configuration of the regression tester on this one before merging?

Yes, here it would be good. 👍

kronbichler added the Linear Algebra and ready to test labels Sep 12, 2022

kronbichler force-pushed the vectorize_more branch from e49e6d1 to 997fdb3 Compare September 12, 2022 11:29

tamiko reviewed Sep 12, 2022

tamiko approved these changes Sep 12, 2022

tamiko reviewed Sep 12, 2022

peterrum approved these changes Sep 12, 2022

kronbichler added 2 commits September 12, 2022 22:15

Vectorize more work in reductions of vector operations

95f9b99

Adjust test output due to roundoff: Disable output of residual

4a25472

kronbichler force-pushed the vectorize_more branch from d95f5d4 to 4a25472 Compare September 12, 2022 20:16

peterrum merged commit 998bb12 into dealii:master Sep 14, 2022

kronbichler deleted the vectorize_more branch September 14, 2022 14:11

kronbichler mentioned this pull request Sep 19, 2022

Vector reductions: Unroll a loop in case of vectorization #14287

Merged
