Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37017: [C++] Guard unexpected uses of BMI2 instructions #37610

Merged
merged 1 commit into from
Sep 8, 2023

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Sep 7, 2023

Rationale for this change

Some functions introduced with Acero only check for AVX2 availability, but they actually invoke BMI2 instructions.
This can have two negative consequences:

What changes are included in this PR?

  1. Ensure that the suitable compiler flag is passed when compiling code with BMI2 intrinsics
  2. Make sure the CPU supports BMI2 adequately before invoking functions featuring BMI2 instructions

Are these changes tested?

Yes, assuming CI covers enough diversity of target platforms.

Are there any user-facing changes?

No, but performance might change (positively or negatively) depending on the CPU and platform.

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

⚠️ GitHub issue #37017 has been automatically assigned in GitHub to PR creator.

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

@github-actions crossbow submit r-binary-packages

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

Revision: 2c30dfe

Submitted crossbow builds: ursacomputing/crossbow @ actions-31cd70eda6

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-22.04-cpp-20 Github Actions
test-ubuntu-22.04-cpp-no-threading Github Actions

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

Revision: 2c30dfe

Submitted crossbow builds: ursacomputing/crossbow @ actions-a3c202cb76

Task Status
r-binary-packages Github Actions

@pitrou pitrou marked this pull request as ready for review September 7, 2023 15:16
@pitrou pitrou requested a review from wgtmac as a code owner September 7, 2023 15:16
@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

@github-actions crossbow submit -g wheel

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

Revision: 2c30dfe

Submitted crossbow builds: ursacomputing/crossbow @ actions-6019fefdfb

Task Status
wheel-clean Github Actions
wheel-macos-big-sur-cp310-arm64 Github Actions
wheel-macos-big-sur-cp311-arm64 Github Actions
wheel-macos-big-sur-cp38-arm64 Github Actions
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-mojave-cp310-amd64 Github Actions
wheel-macos-mojave-cp311-amd64 Github Actions
wheel-macos-mojave-cp38-amd64 Github Actions
wheel-macos-mojave-cp39-amd64 Github Actions
wheel-manylinux-2-28-cp310-amd64 Github Actions
wheel-manylinux-2-28-cp310-arm64 Github Actions
wheel-manylinux-2-28-cp311-amd64 Github Actions
wheel-manylinux-2-28-cp311-arm64 Github Actions
wheel-manylinux-2-28-cp38-amd64 Github Actions
wheel-manylinux-2-28-cp38-arm64 Github Actions
wheel-manylinux-2-28-cp39-amd64 Github Actions
wheel-manylinux-2-28-cp39-arm64 Github Actions
wheel-manylinux-2014-cp310-amd64 Github Actions
wheel-manylinux-2014-cp310-arm64 Github Actions
wheel-manylinux-2014-cp311-amd64 Github Actions
wheel-manylinux-2014-cp311-arm64 Github Actions
wheel-manylinux-2014-cp38-amd64 Github Actions
wheel-manylinux-2014-cp38-arm64 Github Actions
wheel-manylinux-2014-cp39-amd64 Github Actions
wheel-manylinux-2014-cp39-arm64 Github Actions
wheel-windows-cp310-amd64 Github Actions
wheel-windows-cp311-amd64 Github Actions
wheel-windows-cp38-amd64 Github Actions
wheel-windows-cp39-amd64 Github Actions

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

I assume that the remaining errors on r-binary-packages above are not related, but I'd appreciate if someone more competent could double-check? @assignUser or @thisisnic perhaps?

@pitrou pitrou requested review from westonpace and bkietz and removed request for wgtmac September 7, 2023 15:36
@assignUser
Copy link
Member

Oh thanks @pitrou I will check!

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly we compile the _avx2.cc files with both AVX2 and BMI2 enabled. Doesn't this meant that any function in any _avx2.cc file may contain a bmi2 operation?

In other words, even if we examine a file and determine it has no explicit pext, pdep, etc. then wouldn't it still be possible for the compiler to insert those instructions itself as an implicit operation?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Sep 7, 2023
@assignUser
Copy link
Member

Yeah the remaining errors look unrelated (brotli linking error and actions/checkout failing on centos7...) +1

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm somewhat surprised we don't just fold AVX2 + efficient BMI2 into a single SIMD level

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Sep 7, 2023
@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

I'm somewhat surprised we don't just fold AVX2 + efficient BMI2 into a single SIMD level

Historically we didn't use BMI2 at all. It then appeared only for a very special operation in Parquet, while AVX2 can be implicitly enabled in lots of other places. Folding AVX2 + efficient BMI2 into a single SIMD level would have disabled AVX2 on many CPUs (including all recent AMD CPUs).

This was manageable, but of course the stealth use of BMI2 in Acero makes the situation more annoying.

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

If I understand correctly we compile the _avx2.cc files with both AVX2 and BMI2 enabled. Doesn't this meant that any function in any _avx2.cc file may contain a bmi2 operation?

Nit: only two _avx2.cc files are concerned, those that use explicit BMI2 intrinsics.

In other words, even if we examine a file and determine it has no explicit pext, pdep, etc. then wouldn't it still be possible for the compiler to insert those instructions itself as an implicit operation?

Indeed. But this PR introduces matching runtime guards when calling those functions, so this shouldn't be a problem.

Am I missing another concern here?

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

(distantly related: #26514 )

@westonpace
Copy link
Member

Am I missing another concern here?

Consider the file level_comparison_avx2.cc. It doesn't contain any explicit bmi2 instructions. Currently, we compile it with -mavx2 -march=haswell and so it also won't contain any implicit bmi2 instructions (inserted by the compiler).

However, if I understand this PR, it is changing this to compile with -mbmi2 -mavx2 -march=haswell. So, even though there are not any explicit bmi2 instructions, isn't the compiler free to insert bmi2 instructions in an optimization pass?

That being said, I just tested this file, and several others with avx2 in the name, and adding the -mbmi2 flag did not change the compiled object file at all. So maybe gcc doesn't do any bmi2-specific optimization or maybe these files just can't take advantage of it.

@pitrou
Copy link
Member Author

pitrou commented Sep 7, 2023

That's not what this PR does. The BMI2 flag should only be passed for three files: parquet/level_conversion_bmi2.cc, arrow/compute/key_map_avx2.cc and arrow/compute/util_avx2.cc. Are you seeing something else?

@westonpace
Copy link
Member

I understand now. I was misreading the cmake changes. Sorry for the trouble. I have no concerns with this change and thank you for figuring this out!

@kou kou changed the title GH-37017: [C++] Guard unexpected uses of BMI2 instructions. GH-37017: [C++] Guard unexpected uses of BMI2 instructions Sep 8, 2023
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Can we merge this?
Do we need more discussion on this?

@pitrou
Copy link
Member Author

pitrou commented Sep 8, 2023

No, it's ok to merge since everyone seems to be ok with the approach.

@pitrou pitrou merged commit 50015f0 into apache:main Sep 8, 2023
40 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label Sep 8, 2023
@pitrou pitrou deleted the gh37017-bmi2 branch September 8, 2023 07:08
@pitrou
Copy link
Member Author

pitrou commented Sep 8, 2023

I posted #37623 as a followup to reenable more code paths.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 50015f0.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…che#37610)

### Rationale for this change

Some functions introduced with Acero only check for AVX2 availability, but they actually invoke BMI2 instructions.
This can have two negative consequences:
* compiling BMI2 intrinsics may fail because BMI2 was not explicitly enabled on the compiler (apachegh-37017)
* some rare CPUs (Via CPUs perhaps) may support AVX2 but not BMI2; other CPUs by AMD have a very inefficient implementation of some BMI2 instructions

### What changes are included in this PR?

1. Ensure that the suitable compiler flag is passed when compiling code with BMI2 intrinsics
2. Make sure the CPU supports BMI2 adequately before invoking functions featuring BMI2 instructions

### Are these changes tested?

Yes, assuming CI covers enough diversity of target platforms.

### Are there any user-facing changes?

No, but performance might change (positively or negatively) depending on the CPU and platform.

* Closes: apache#37017

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…che#37610)

### Rationale for this change

Some functions introduced with Acero only check for AVX2 availability, but they actually invoke BMI2 instructions.
This can have two negative consequences:
* compiling BMI2 intrinsics may fail because BMI2 was not explicitly enabled on the compiler (apachegh-37017)
* some rare CPUs (Via CPUs perhaps) may support AVX2 but not BMI2; other CPUs by AMD have a very inefficient implementation of some BMI2 instructions

### What changes are included in this PR?

1. Ensure that the suitable compiler flag is passed when compiling code with BMI2 intrinsics
2. Make sure the CPU supports BMI2 adequately before invoking functions featuring BMI2 instructions

### Are these changes tested?

Yes, assuming CI covers enough diversity of target platforms.

### Are there any user-facing changes?

No, but performance might change (positively or negatively) depending on the CPU and platform.

* Closes: apache#37017

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][R][Packaging] r-binary-packages macOS High Sierra failed by BMI2 related error
5 participants