[C++] What's the difference between ARROW_SIMD_LEVEL and ARROW_RUNTIME_SIMD_LEVEL #40333

Closed
mapleFU opened this issue Mar 4, 2024 · 25 comments
Labels
Component: C++ Type: usage Issue is a user question

@mapleFU
Member

mapleFU commented Mar 4, 2024

Describe the usage question you have. Please include as many useful details as possible.

In Arrow, we have two SIMD flags in CMake, listed below:

  define_option_string(ARROW_SIMD_LEVEL
                       "Compile-time SIMD optimization level"
                       "DEFAULT" # default to SSE4_2 on x86, NEON on Arm, NONE otherwise
                       "NONE"
                       "SSE4_2"
                       "AVX2"
                       "AVX512"
                       "NEON"
                       "SVE" # size agnostic SVE
                       "SVE128" # fixed size SVE
                       "SVE256" # "
                       "SVE512" # "
                       "DEFAULT")

  define_option_string(ARROW_RUNTIME_SIMD_LEVEL
                       "Max runtime SIMD optimization level"
                       "MAX" # default to max supported by compiler
                       "NONE"
                       "SSE4_2"
                       "AVX2"
                       "AVX512"
                       "MAX")

They were added in these two patches:

  1. c15637d
  2. f8c9c8b

However, ARROW_RUNTIME_SIMD_LEVEL is also checked at compile time. So, what's the difference between these two flags? Or do I misunderstand the meaning of "runtime" here (I take it to mean checking dynamically while the program is running)?

Component(s)

C++

@mapleFU mapleFU added the Type: usage Issue is a user question label Mar 4, 2024
@zanmato1984
Collaborator

Hi @mapleFU, maybe you can see whether this section [1] of the Arrow docs answers your question.

[1] https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_USER_SIMD_LEVEL

@mapleFU
Member Author

mapleFU commented Mar 4, 2024

I guess it doesn't 🤔?

The ARROW_USER_SIMD_LEVEL environment variable controls the dynamic dispatch of instructions, and controls the CpuInfo struct. However, some code is under an ARROW_HAVE_RUNTIME_{} flag [1], and some is under an ARROW_HAVE_{} flag [2]...

  1. #if defined(ARROW_HAVE_RUNTIME_AVX2)
  2. #if defined(ARROW_HAVE_SSE4_2)

@zanmato1984
Collaborator

I guess it doesn't 🤔?

The ARROW_USER_SIMD_LEVEL environment variable controls the dynamic dispatch of instructions, and controls the CpuInfo struct. However, some code is under an ARROW_HAVE_RUNTIME_{} flag [1], and some is under an ARROW_HAVE_{} flag [2]...

  1. #if defined(ARROW_HAVE_RUNTIME_AVX2)
  2. #if defined(ARROW_HAVE_SSE4_2)

Oh I got your point.

@zanmato1984
Collaborator

After some research on the code history, it appears to me that:

  1. The naming similarity between ARROW_SIMD_LEVEL/ARROW_RUNTIME_SIMD_LEVEL and ARROW_HAVE_*/ARROW_HAVE_RUNTIME_* seems to be an (unfortunate?) coincidence. They don't really relate to each other.
  2. The CMake option ARROW_RUNTIME_SIMD_LEVEL seems to have been introduced in ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind #8049 solely to tailor the SIMD level specified by ARROW_SIMD_LEVEL at compile time.
  3. The ARROW_HAVE_RUNTIME_* macros (introduced in ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch #7607) seem to duplicate ARROW_HAVE_* (introduced in ARROW-8227: [C++] Refine SIMD feature definitions #6794) and need some cleanup.

@pitrou I'm not familiar with the full story so please correct me if I get anything wrong.

@pitrou
Member

pitrou commented Mar 4, 2024

The wording could probably be improved, but there are two categories of SIMD optimizations in Arrow:

  1. some optimizations are enabled statically at compile-time; these are governed by ARROW_SIMD_LEVEL. If you compile with ARROW_SIMD_LEVEL=AVX2 and execute on a non-AVX2 CPU, the code will crash.

  2. some optimizations are selected dynamically at runtime; these are governed by ARROW_RUNTIME_SIMD_LEVEL. If you compile with ARROW_RUNTIME_SIMD_LEVEL=AVX2 and execute on a non-AVX2 CPU, a non-AVX2 code path (perhaps SSE2) will be executed. If you compile with ARROW_RUNTIME_SIMD_LEVEL=AVX2 and execute on an AVX2 CPU, an AVX2 code path will be executed (but not AVX512, because for that you would have needed ARROW_RUNTIME_SIMD_LEVEL=AVX512).
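
To make the two categories concrete, here is a minimal sketch. It is not Arrow code: the DEMO_* macros and the function names are hypothetical stand-ins for the ARROW_HAVE_* / ARROW_HAVE_RUNTIME_* guards, and the "code paths" are just strings so the example runs on any machine.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for build-time macros. In a real build these would be
// derived from the CMake options, not defined in the source file.
#define DEMO_HAVE_RUNTIME_AVX2 1  // a runtime-dispatched AVX2 path was compiled in
// #define DEMO_HAVE_AVX2 1       // would mean: the whole build assumes AVX2

// Category 1: chosen statically at compile time. No CPU check happens at run
// time, so real AVX2 instructions here would crash on a non-AVX2 CPU.
std::string StaticPath() {
#if defined(DEMO_HAVE_AVX2)
  return "avx2";
#else
  return "baseline";
#endif
}

// Category 2: the AVX2 path is compiled in only if the build allows it, and it
// is taken only when the CPU check passes at run time.
std::string RuntimePath(bool cpu_supports_avx2) {
#if defined(DEMO_HAVE_RUNTIME_AVX2)
  if (cpu_supports_avx2) return "avx2";
#endif
  (void)cpu_supports_avx2;  // avoid an unused-parameter warning when the macro is off
  return "baseline";
}
```

With DEMO_HAVE_RUNTIME_AVX2 defined and DEMO_HAVE_AVX2 left undefined, RuntimePath falls back safely on an older CPU, while StaticPath never consults the CPU at all.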

@pitrou
Member

pitrou commented Mar 4, 2024

Also, ARROW_USER_SIMD_LEVEL is tied to ARROW_RUNTIME_SIMD_LEVEL. The concrete SIMD level selected is MIN(ARROW_RUNTIME_SIMD_LEVEL, ARROW_USER_SIMD_LEVEL, CPU support).

For example, if you compiled with ARROW_RUNTIME_SIMD_LEVEL=AVX512 and execute on an AVX2 CPU, you will get an AVX2 code path by default. But you can force Arrow to use an SSE4.2 code path by setting ARROW_USER_SIMD_LEVEL=SSE4_2.
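
As a rough illustration of the MIN rule, here is a sketch that is not Arrow's actual implementation. The SimdLevel enum, its ordering, and the EffectiveLevel function are made up for this example, on the assumption that the x86 levels can be ordered from lowest to highest.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ordering of the x86 levels, lowest to highest.
enum class SimdLevel { NONE = 0, SSE4_2 = 1, AVX2 = 2, AVX512 = 3 };

// The level actually used is the lowest of:
//   - the maximum compiled in (ARROW_RUNTIME_SIMD_LEVEL, a CMake option),
//   - the cap requested by the user (ARROW_USER_SIMD_LEVEL, an env var),
//   - what the CPU reports as supported.
SimdLevel EffectiveLevel(SimdLevel compiled_runtime_max, SimdLevel user_cap,
                         SimdLevel cpu_max) {
  return std::min({compiled_runtime_max, user_cap, cpu_max});
}
```

For the example above, EffectiveLevel(AVX512, AVX512, AVX2) gives AVX2, and lowering the user cap to SSE4_2 gives SSE4_2.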

(why it's called "user" I don't remember...)

@mapleFU
Member Author

mapleFU commented Mar 4, 2024

Aha, now I understand. So for some users like pyarrow, the x86 library might be compiled with AVX2 and AVX512 but only use AVX2, and the Arm library is compiled with NEON? And other users who compile the C++ code directly might define these themselves?

@pitrou
Member

pitrou commented Mar 4, 2024

No, it's as I explained in #40333 (comment)

@mapleFU
Member Author

mapleFU commented Mar 4, 2024

Oh, sorry, it seems I was treating all of it as ARROW_RUNTIME_SIMD_LEVEL. So this means we have two kinds of optimizations here. How should we choose between them when adding a SIMD optimization?

@pitrou
Member

pitrou commented Mar 4, 2024

Ideally, all CPU-specific optimizations should be selected at runtime. It's just more work.

@mapleFU
Member Author

mapleFU commented Mar 4, 2024

Thanks! I'll close this for now.

(Reopened because you want to improve this.)

@mapleFU mapleFU closed this as completed Mar 4, 2024
@pitrou
Member

pitrou commented Mar 4, 2024

Perhaps we can find a way to better explain / document these variables? @amoeba @wjones127 Would you have an idea?

@zanmato1984
Collaborator

The wording could probably be improved, but there are two categories of SIMD optimizations in Arrow:

  1. some optimizations are enabled statically at compile-time; these are governed by ARROW_SIMD_LEVEL. If you compile with ARROW_SIMD_LEVEL=AVX2 and execute on a non-AVX2 CPU, the code will crash.
  2. some optimizations are selected dynamically at runtime; these are governed by ARROW_RUNTIME_SIMD_LEVEL. If you compile with ARROW_RUNTIME_SIMD_LEVEL=AVX2 and execute on a non-AVX2 CPU, a non-AVX2 code path (perhaps SSE2) will be executed. If you compile with ARROW_RUNTIME_SIMD_LEVEL=AVX2 and execute on an AVX2 CPU, an AVX2 code path will be executed (but not AVX512, because for that you would have needed ARROW_RUNTIME_SIMD_LEVEL=AVX512).

This is very clear. Thank you.

IIUC, the macro family ARROW_HAVE_* is defined by ARROW_SIMD_LEVEL and the macro family ARROW_HAVE_RUNTIME_* is defined by ARROW_RUNTIME_SIMD_LEVEL. But I'm still a little confused about the relationship between these two families, specifically: is one of them superior to the other?

For example, the function

  void AddSumAvx512AggKernels(ScalarAggregateFunction* func);

and its call site

  #if defined(ARROW_HAVE_RUNTIME_AVX512)
    if (cpu_info->IsSupported(arrow::internal::CpuInfo::AVX512)) {
      AddSumAvx512AggKernels(func.get());
    }
  #endif

seem to be completely controlled by the "runtime" family. Does it mean that, theoretically, all the simd code in arrow can be fully managed by ARROW_RUNTIME_SIMD_LEVEL?

@pitrou
Member

pitrou commented Mar 4, 2024

Does it mean that, theoretically, all the simd code in arrow can be fully managed by ARROW_RUNTIME_SIMD_LEVEL?

I'm not sure what you mean by "theoretically"? It would require implementing runtime selection for the optimizations that are currently enabled statically.

@mapleFU mapleFU reopened this Mar 4, 2024
@assignUser
Member

I think "theoretically" in this context means "there is no technical reason, outside of implementing the runtime selection, for this not to be managed by ARROW_RUNTIME_SIMD_LEVEL".

Which I guess you answered with your previous comment :D

Ideally, all CPU-specific optimizations should be selected at runtime. It's just more work.

@zanmato1984
Collaborator

zanmato1984 commented Mar 4, 2024

I think "theoretically" in this context means "there is no technical reason, outside of implementing the runtime selection, for this not to be managed by ARROW_RUNTIME_SIMD_LEVEL".

Precisely!

Which I guess you answered with your previous comment :D

Ideally, all CPU-specific optimizations should be selected at runtime. It's just more work.

Also precisely!

Thank you both @assignUser @pitrou !

@wjones127
Member

Perhaps we can find a way to better explain / document these variables?

It seems like the only thing that is missing from the docs at https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_USER_SIMD_LEVEL is an explanation of what ARROW_RUNTIME_SIMD_LEVEL does and why one would use it.

Perhaps something like:

In addition to ARROW_SIMD_LEVEL, there is another CMake flag, ARROW_RUNTIME_SIMD_LEVEL. This controls the maximum runtime-selectable SIMD path that is compiled. For example, if set to AVX2 on x86, then all AVX512 code paths will be omitted from the compiled code.

@amoeba
Member

amoeba commented Mar 4, 2024

What's the real use case for ARROW_USER_SIMD_LEVEL? I think putting the example use case right at the top of the env var's text would be an improvement. It looks like we guard all runtime selection with compile-time guards anyway and someone who wants to use Arrow on, say, a <SSE4.2 machine would still need a custom build of Arrow so I'm still a bit confused.

@pitrou
Member

pitrou commented Mar 4, 2024

The use case is being able to compare performance with or without those specific code paths if you have a machine that supports them. In some (very?) rare cases it might also help to work around issues with bad CPU support. (I think we had bugs where some CPU instruction set was supposed to be supported, but the kernel/VM hypervisor didn't handle it correctly)

@wjones127
Member

who wants to use Arrow on, say, a <SSE4.2 machine would still need a custom build of Arrow so I'm still a bit confused.

Yes, but SSE4 goes back to 2008. AVX2 was available in 2013 on Haswell. I'm not sure it would be that surprising you need a custom build to run on a machine built in 2005.

@amoeba
Member

amoeba commented Mar 4, 2024

Thanks @pitrou, I'll put up a PR with this and @wjones127's idea for us to look at.

Yes, but SSE4 goes back to 2008. AVX2 was available in 2013 on Haswell. I'm not sure it would be that surprising you need a custom build to run on a machine built in 2005.

Fair point :)

@mapleFU
Member Author

mapleFU commented Mar 5, 2024

In some (very?) rare cases it might also help to work around issues with bad CPU support.

I agree this is useful. A few days ago I ran into a problem where pdep is slow on AMD Zen 2.

@pitrou
Member

pitrou commented Mar 5, 2024

I agree this is useful. A few days ago I ran into a problem where pdep is slow on AMD Zen 2.

The code is there to help you. See

  bool HasEfficientBmi2() const {
    // BMI2 (pext, pdep) is only efficient on Intel X86 processors.
    return vendor() == Vendor::Intel && IsSupported(BMI2);
  }
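
A caller could use such a check to pick between a BMI2 path and a portable one. The following is only a sketch under assumptions: DemoCpuInfo, its fields, and ChooseBitUnpackPath are hypothetical and merely mirror the shape of the method quoted above. They are not Arrow's CpuInfo API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a CPU info object.
enum class Vendor { Unknown, Intel, AMD };

struct DemoCpuInfo {
  Vendor vendor;
  bool has_bmi2;  // whether the CPU reports BMI2 support

  // Same idea as the quoted method: BMI2 being supported is not enough,
  // the vendor must also be one where pext/pdep are known to be fast.
  bool HasEfficientBmi2() const {
    return vendor == Vendor::Intel && has_bmi2;
  }
};

// Pick a code path by name. A real kernel would dispatch to different
// functions instead of returning a string.
std::string ChooseBitUnpackPath(const DemoCpuInfo& cpu) {
  return cpu.HasEfficientBmi2() ? "bmi2" : "portable";
}
```

Under this sketch, a CPU that reports BMI2 but is not from Intel still gets the portable path, which is the behavior the vendor check above is there to provide.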

@amoeba
Member

amoeba commented Mar 5, 2024

Hi all, I put up a PR to improve the docs around these variables at #40374. Please have a look and leave any and all feedback.

@kou kou changed the title from "What's the difference between ARROW_SIMD_LEVEL and ARROW_RUNTIME_SIMD_LEVEL" to "[C++] What's the difference between ARROW_SIMD_LEVEL and ARROW_RUNTIME_SIMD_LEVEL" Mar 6, 2024
mapleFU pushed a commit that referenced this issue Mar 18, 2024
### Rationale for this change

Conversation in #40333.

### What changes are included in this PR?

Just tweaks to the text in docs/source/cpp/env_vars.rst.

### Are these changes tested?

I rendered them locally.

### Are there any user-facing changes?

Just docs here.
* GitHub Issue: #40333

Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
@mapleFU mapleFU added this to the 16.0.0 milestone Mar 18, 2024
@mapleFU
Member Author

mapleFU commented Mar 18, 2024

Issue resolved by pull request 40374
#40374

@mapleFU mapleFU closed this as completed Mar 18, 2024