GH-37017: [C++] Guard unexpected uses of BMI2 instructions #37610
@github-actions crossbow submit r-binary-packages
@github-actions crossbow submit -g cpp
Revision: 2c30dfe Submitted crossbow builds: ursacomputing/crossbow @ actions-31cd70eda6
Revision: 2c30dfe Submitted crossbow builds: ursacomputing/crossbow @ actions-a3c202cb76
@github-actions crossbow submit -g wheel
Revision: 2c30dfe Submitted crossbow builds: ursacomputing/crossbow @ actions-6019fefdfb
I assume that the remaining errors on r-binary-packages above are unrelated, but I'd appreciate it if someone more competent could double-check. @assignUser or @thisisnic perhaps?
Oh thanks @pitrou, I will check!
If I understand correctly, we compile the `_avx2.cc` files with both AVX2 and BMI2 enabled. Doesn't this mean that any function in any `_avx2.cc` file may contain a `bmi2` operation?
In other words, even if we examine a file and determine that it has no explicit `pext`, `pdep`, etc., wouldn't it still be possible for the compiler to insert those instructions itself as an implicit operation?
Yeah, the remaining errors look unrelated (brotli linking error and actions/checkout failing on centos7...). +1
LGTM
> I'm somewhat surprised we don't just fold AVX2 + efficient BMI2 into a single SIMD level

Historically we didn't use BMI2 at all. It then appeared only for a very special operation in Parquet, while AVX2 can be implicitly enabled in lots of other places. Folding AVX2 + efficient BMI2 into a single SIMD level would have disabled AVX2 on many CPUs (including all recent AMD CPUs). This was manageable, but of course the stealth use of BMI2 in Acero makes the situation more annoying.
Nit: only two
Indeed. But this PR introduces matching runtime guards when calling those functions, so this shouldn't be a problem. Am I missing another concern here?
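As an illustration of what such a runtime guard could look like, here is a minimal sketch. It is not the code from this PR: `CpuFeatures`, `KernelChoice`, and `ChooseKernel` are hypothetical names, and the `has_fast_pext_pdep` flag is an assumed way of modelling "supports BMI2 adequately". Falling back to the scalar path when BMI2 is missing or slow is also an assumption of the sketch.

```cpp
// Hypothetical feature summary; a real build would fill it from CPUID.
struct CpuFeatures {
  bool has_avx2;
  bool has_bmi2;
  bool has_fast_pext_pdep;  // assumed flag: false where PEXT/PDEP are slow
};

enum class KernelChoice { kScalar, kAvx2, kAvx2Bmi2 };

// Pick an implementation for a kernel. `kernel_uses_bmi2` says whether the
// AVX2 variant of the kernel also contains BMI2 instructions.
inline KernelChoice ChooseKernel(const CpuFeatures& cpu, bool kernel_uses_bmi2) {
  if (!cpu.has_avx2) {
    return KernelChoice::kScalar;
  }
  if (!kernel_uses_bmi2) {
    // AVX2 alone is sufficient for this kernel.
    return KernelChoice::kAvx2;
  }
  if (cpu.has_bmi2 && cpu.has_fast_pext_pdep) {
    return KernelChoice::kAvx2Bmi2;
  }
  // AVX2 is present, but the kernel needs BMI2 that is missing or slow.
  return KernelChoice::kScalar;
}
```

Keeping the two checks separate means that kernels relying only on AVX2 stay enabled on CPUs where BMI2 is absent or inefficient.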
(distantly related: #26514 )
Consider the file However, if I understand this PR, it is changing this to compile with That being said, I just tested this file, and several others with avx2 in the name, and adding the
That's not what this PR does. The BMI2 flag should only be passed for three files:
I understand now. I was misreading the cmake changes. Sorry for the trouble. I have no concerns with this change and thank you for figuring this out!
+1
Can we merge this?
Do we need more discussion on this?
No, it's ok to merge since everyone seems to be ok with the approach.
I posted #37623 as a followup to re-enable more code paths.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 50015f0. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
…che#37610)

### Rationale for this change

Some functions introduced with Acero only check for AVX2 availability, but they actually invoke BMI2 instructions. This can have two negative consequences:

* compiling BMI2 intrinsics may fail because BMI2 was not explicitly enabled on the compiler (GH-37017)
* some rare CPUs (Via CPUs perhaps) may support AVX2 but not BMI2; other CPUs by AMD have a very inefficient implementation of some BMI2 instructions

### What changes are included in this PR?

1. Ensure that the suitable compiler flag is passed when compiling code with BMI2 intrinsics
2. Make sure the CPU supports BMI2 adequately before invoking functions featuring BMI2 instructions

### Are these changes tested?

Yes, assuming CI covers enough diversity of target platforms.

### Are there any user-facing changes?

No, but performance might change (positively or negatively) depending on the CPU and platform.

* Closes: apache#37017

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
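The first consequence described in the commit message (intrinsics failing to compile when BMI2 is not enabled) can be sketched with a compile-time guard. This is only an illustration under stated assumptions, not the approach taken in the PR, which passes the compiler flag for the affected files: `ExtractBits` is a hypothetical helper, and the guard relies on the `__BMI2__` macro that GCC and Clang define when BMI2 is enabled.

```cpp
#include <cstdint>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // declares _pext_u64
#endif

// Hypothetical helper, not taken from Arrow: gathers the bits of `value`
// selected by `mask` into the low bits of the result (PEXT semantics).
inline uint64_t ExtractBits(uint64_t value, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
  // Only compiled when BMI2 is enabled for this translation unit.
  return _pext_u64(value, mask);
#else
  // Portable fallback: visit the set bits of the mask from lowest to highest.
  uint64_t result = 0;
  uint64_t out_bit = 1;
  while (mask != 0) {
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (value & lowest) {
      result |= out_bit;
    }
    out_bit <<= 1;
    mask &= mask - 1;  // clear lowest set bit
  }
  return result;
#endif
}
```

A compile-time guard like this only addresses the build failure. The binary still needs a runtime check before calling a BMI2-compiled function on a CPU that may lack the instructions.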