Question about `seq_exec` compiler optimizations #1630

yencal · 2024-04-17T21:56:36Z

Greetings, could someone please explain what compiler optimizations happen behind the scenes when one uses `seq_exec`?
For a simple 3D finite-difference nested loop, I see that the RAJA version runs about four times faster than the C code.

Here is the C code. The loop body is omitted for brevity, but it is the same as in the RAJA version:

  for (int k = 0; k < nz; ++k ) {
    for (int j = 0; j < ny; ++j ) {
      for (int i = 0; i < nx; ++i ) {
         A[i + nx * (j + ny * k)] = ...

And here is the RAJA version of the same loop:

  using EXEC_POLICY_3D =
    RAJA::KernelPolicy<
      RAJA::statement::For<2, RAJA::seq_exec,      // k
        RAJA::statement::For<1, RAJA::seq_exec,    // j
          RAJA::statement::For<0, RAJA::seq_exec,  // i
            RAJA::statement::Lambda<0>
          >
        >
      >
    >;
  RAJA::kernel<EXEC_POLICY_3D>(
    RAJA::make_tuple( RAJA::TypedRangeSegment<int>(0, nz),
                      RAJA::TypedRangeSegment<int>(0, ny),
                      RAJA::TypedRangeSegment<int>(0, nx) ),

    [=] RAJA_DEVICE ( int k, int j, int i) {
        A[i + nx * (j + ny * k)] = ...

Note that I use the same compiler flags for both codes:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 -Ofast -march=native")

I am using an M3 MacBook.
Thanks


rhornung67 · 2024-04-18T15:56:09Z

I believe that the compiler sees essentially the same source code whether written as a C-style for-loop or using a RAJA kernel exec method with a seq_exec policy. https://github.com/LLNL/RAJA/blob/develop/include/RAJA/policy/sequential/forall.hpp#L65
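To illustrate the point, here is a minimal sketch of what a sequential "forall" amounts to once inlined. It is not RAJA's source: `seq_forall_sketch`, `fill_c_style`, `fill_lambda_style`, and `same_result` are hypothetical names, and the loop body is a placeholder, since the actual finite-difference body was not posted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sequential forall. NOT RAJA's implementation:
// just a plain loop over [begin, end) with no pragmas or annotations.
template <typename Body>
inline void seq_forall_sketch(int begin, int end, Body&& body) {
  for (int i = begin; i < end; ++i) {
    body(i);
  }
}

// C-style triple loop. The right-hand side is a placeholder body.
inline void fill_c_style(std::vector<double>& A, int nx, int ny, int nz) {
  for (int k = 0; k < nz; ++k) {
    for (int j = 0; j < ny; ++j) {
      for (int i = 0; i < nx; ++i) {
        A[i + nx * (j + ny * k)] = i + 2.0 * j + 3.0 * k;
      }
    }
  }
}

// Same loop nest expressed through nested lambdas, as a kernel abstraction would.
inline void fill_lambda_style(std::vector<double>& A, int nx, int ny, int nz) {
  double* a = A.data();
  seq_forall_sketch(0, nz, [=](int k) {
    seq_forall_sketch(0, ny, [=](int j) {
      seq_forall_sketch(0, nx, [=](int i) {
        a[i + nx * (j + ny * k)] = i + 2.0 * j + 3.0 * k;
      });
    });
  });
}

// Returns true when both variants write identical values.
inline bool same_result(int nx, int ny, int nz) {
  assert(nx > 0 && ny > 0 && nz > 0);
  const std::size_t n = static_cast<std::size_t>(nx) * ny * nz;
  std::vector<double> a(n, 0.0), b(n, 0.0);
  fill_c_style(a, nx, ny, nz);
  fill_lambda_style(b, nx, ny, nz);
  return a == b;
}
```

Calling `same_result(64, 64, 64)` checks that the two forms compute the same values. With optimization enabled, a compiler will typically inline the lambdas, so any large timing gap is more likely to come from build settings or code generation details than from the abstraction itself.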

That is, there are no pragmas or other annotations applied in RAJA internals. That said, we often observe cases where RAJA code runs faster than native C-style code, but it is not clear why. However, 4x faster seems extraordinary. Have you compared the assembly code for the two versions?

yencal · 2024-04-19T14:33:19Z

Ok, I will check the assembly code and ensure the cmake flags are propagated accordingly. Thanks
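One possible way to check flag propagation is to compile a small helper into each target and compare what it reports. The sketch below assumes a GCC- or Clang-compatible compiler, which predefines `__OPTIMIZE__` when optimization is on and `__FAST_MATH__` when fast-math is active (as `-Ofast` implies). `BuildInfo`, `build_info`, and `describe_build` are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical helper: records which build settings this translation unit saw.
struct BuildInfo {
  long cplusplus;   // value of __cplusplus, e.g. 201703L for -std=c++17
  bool optimized;   // true if __OPTIMIZE__ is defined (GCC/Clang, -O1 and above)
  bool fast_math;   // true if __FAST_MATH__ is defined (GCC/Clang, -ffast-math)
};

inline BuildInfo build_info() {
  BuildInfo info{__cplusplus, false, false};
#ifdef __OPTIMIZE__
  info.optimized = true;
#endif
#ifdef __FAST_MATH__
  info.fast_math = true;
#endif
  return info;
}

// Formats the settings as one line so both builds can be compared by eye.
inline std::string describe_build() {
  const BuildInfo info = build_info();
  return "__cplusplus=" + std::to_string(info.cplusplus) +
         " optimized=" + (info.optimized ? "yes" : "no") +
         " fast_math=" + (info.fast_math ? "yes" : "no");
}
```

Printing `describe_build()` from both the C and the RAJA executable shows whether they were built with the same language standard and optimization settings. Building with `make VERBOSE=1` also shows the exact compile lines that CMake generated.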

yencal closed this as completed Apr 19, 2024
