Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36685: [R][C++] Fix illegal opcode failure with Homebrew #36705

Merged
merged 43 commits into from
Jul 30, 2023

Conversation

kou
Copy link
Member

@kou kou commented Jul 15, 2023

Rationale for this change

Summary of this problem: #31132 (comment)

Why this problem is happen again? Because I removed ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2" in #36583. The solution we chose by #14342 was forcing to use -O2 for SIMD related code. It works for -DCMAKE_BUILD_TYPE=MinSizeRel but it doesn't work for Homebrew.

Because Homebrew's CC https://github.com/Homebrew/brew/blob/master/Library/Homebrew/shims/super/cc forces to use the same -O flag. The default is -Os. If we specify -O2, Homebrew's CC replaces it to -Os. If we use ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2", Homebrew's CC always use -O2. So the solution we chose by #14342 isn't used for Homebrew.

But Homebrew thinks that ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2" is a workaround. So we need another solution for Homebrew.

Here are candidate solutions:

  1. -DARROW_RUNTIME_SIMD_LEVEL=NONE
  2. Remove ENV.runtime_cpu_detection if Hardware::CPU.intel?

"1. -DARROW_RUNTIME_SIMD_LEVEL=NONE" works because we don't use the runtime SIMD dispatch feature (the problematic feature) entirely.

"2. Remove ENV.runtime_cpu_detection if Hardware::CPU.intel?" works but I don't know why... If ENV.runtime_cpu_detection is called, Homebrew's CC stops replacing -march=*. If we call ENV.runtime_cpu_detection, -march=haswell is used for AVX2 related code and -march=skylake-avx512 is used for AVX512 including BMI2 related code. If we don't call ENV.runtime_cpu_detection, -march=nehalem is always used. (Note that SIMD related flags such as -mbmi2 aren't removed by Homebrew's CC. So I think that SIMD is enabled.) I don't know why but "the one-definition-rule violation" (see the summary for details: #31132 (comment) ) isn't happen.

FYI: CPU info for GitHub Actions macOS hosted-runner:

$ sysctl hw.optional machdep.cpu
hw.optional.adx: 0
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.avx2_0: 0
hw.optional.avx512bw: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512f: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
hw.optional.avx512vl: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.enfstrg: 0
hw.optional.f16c: 1
hw.optional.floatingpoint: 1
hw.optional.fma: 0
hw.optional.hle: 0
hw.optional.mmx: 1
hw.optional.mpx: 0
hw.optional.rdrand: 1
hw.optional.rtm: 0
hw.optional.sgx: 0
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.supplementalsse3: 1
hw.optional.x86_64: 1
machdep.cpu.address_bits.physical: 43
machdep.cpu.address_bits.virtual: 48
machdep.cpu.arch_perf.events: 127
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.fixed_number: 0
machdep.cpu.arch_perf.fixed_width: 0
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.version: 1
machdep.cpu.arch_perf.width: 48
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.size: 256
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.linesize_max: 4096
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.sub_Cstates: 16
machdep.cpu.thermal.ACNT_MCNT: 0
machdep.cpu.thermal.core_power_limits: 0
machdep.cpu.thermal.dynamic_acceleration: 0
machdep.cpu.thermal.energy_policy: 0
machdep.cpu.thermal.fine_grain_clock_mod: 0
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.package_thermal_intr: 0
machdep.cpu.thermal.sensor: 0
machdep.cpu.thermal.thresholds: 0
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.inst.small: 64
machdep.cpu.tlb.shared: 512
machdep.cpu.tsc_ccc.denominator: 0
machdep.cpu.tsc_ccc.numerator: 0
machdep.cpu.xsave.extended_state: 7 832 832 0
machdep.cpu.xsave.extended_state1: 0 0 0 0
machdep.cpu.brand: 0
machdep.cpu.brand_string: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
machdep.cpu.core_count: 3
machdep.cpu.cores_per_package: 4
machdep.cpu.extfamily: 0
machdep.cpu.extfeature_bits: 4967106816
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.extmodel: 3
machdep.cpu.family: 6
machdep.cpu.feature_bits: 18427078393948011519
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH MMX FXSR SSE SSE2 SS HTT SSE3 PCLMULQDQ MON VMX SSSE3 CX16 SSE4.1 SSE4.2 x2APIC POPCNT AES VMM PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_feature_bits: 643 0
machdep.cpu.leaf7_feature_bits_edx: 3154117632
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SMEP ERMS MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.logical_per_package: 4
machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.microcode_version: 1070
machdep.cpu.model: 58
machdep.cpu.processor_flag: 0
machdep.cpu.signature: 198313
machdep.cpu.stepping: 9
machdep.cpu.thread_count: 3
machdep.cpu.vendor: GenuineIntel

What changes are included in this PR?

"1. -DARROW_RUNTIME_SIMD_LEVEL=NONE" because it's straightforward and "2. Remove ENV.runtime_cpu_detection if Hardware::CPU.intel?" may also disable runtime SIMD dispatch implicitly.

This also adds the following debug information for easy to debug in future:

  • CPU information for GitHub Actions runner
  • Homebrew's build logs

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #36685 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Jul 15, 2023
@kou
Copy link
Member Author

kou commented Jul 15, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 16, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 16, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 16, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 16, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 16, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 17, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 17, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou
Copy link
Member Author

kou commented Jul 18, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

@kou kou requested review from westonpace and pitrou July 26, 2023 14:44
@kou
Copy link
Member Author

kou commented Jul 26, 2023

@pitrou @westonpace Could you check the PR description?

@github-actions

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Jul 26, 2023

Here are candidate solutions:

1. `-DARROW_RUNTIME_SIMD_LEVEL=OFF`

2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`

You could pass -DARROW_RUNTIME_SIMD_LEVEL=SSE4_2 on Intel, but that wouldn't change anything concretely, as we don't do any SSE4.2 runtime dispatch (only AVX2 and AVX512).

@pitrou
Copy link
Member

pitrou commented Jul 26, 2023

"-DARROW_RUNTIME_SIMD_LEVEL=OFF" is straightforward but it has performance penalty on SIMD available hosts.

The other solution probably has a performance penalty as well, otherwise it wouldn't work?

(can you check the resulting binary and see if there any AVX2 routines?)

@kou
Copy link
Member Author

kou commented Jul 26, 2023

I can't check it easily... (We need to upload binaries and check them locally.)

I'll use -DARROW_RUNTIME_SIMD_LEVEL=NONE solution because Intel macOS will not be our main targets.

@kou
Copy link
Member Author

kou commented Jul 26, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions

This comment was marked as outdated.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description makes sense to me. I agree that 1 is a simpler approach. We probably shouldn't try to rely on strange build configurations to make runtime SIMD dispatch work. Instead we should have a software solution. Do we want a follow-up PR for a software fix in case someone wanted to try and improve the performance on macos?

There is a lot of debug info in this PR. I think this is probably a good thing (having more info in our build logs can help debugging) but I just want to verify it is intentional?

# TODO(ARROW-16907): apache/arrow@main seems to be installed already
# so this does nothing on a branch/PR
brew install -v --HEAD apache-arrow
brew install -v --HEAD {{ '$(brew --repository homebrew/core)/Formula/apache-arrow.rb' }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify absolute path for a formula to use the local modified version. (We didn't need this with old Homebrew.)
I think that it's related to the "Using JSON files downloaded from formulae.brew.sh for package installation rather than local homebrew/core and homebrew/cask taps." change in Homebrew 4.0.0: https://brew.sh/2023/02/16/homebrew-4.0.0/

We may be able to use HOMEBREW_NO_INSTALL_FROM_API=1 instead of specifying an absolute path. But I think that the absolute path approach is straightforward.

@@ -57,7 +57,8 @@ class ApacheArrow < Formula
fails_with gcc: "5"

def install
# https://github.com/Homebrew/homebrew-core/issues/76537
# This isn't for https://github.com/Homebrew/homebrew-core/issues/76537 .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is deleting the old comment enough? Is there a reason a future reader would think this is related to 76537?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we defer this to another PR?
To be honest, I want to remove ENV.runtime_cpu_detection if Hardware::CPU.intel? entirely or remove if Hardware::CPU.intel?. If I do the change, we can remove this.
But I want to keep this in this PR to clarify we choose "1. -DARROW_RUNTIME_SIMD_LEVEL=NONE instead of "2. Remove ENV.runtime_cpu_detection if Hardware::CPU.intel?".

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 26, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 27, 2023
@kou
Copy link
Member Author

kou commented Jul 27, 2023

@github-actions crossbow submit homebrew-r-brew

@github-actions
Copy link

Revision: 7e688a0

Submitted crossbow builds: ursacomputing/crossbow @ actions-6a66c6840c

Task Status
homebrew-r-brew Github Actions

Copy link
Member Author

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't try to rely on strange build configurations to make runtime SIMD dispatch work. Instead we should have a software solution. Do we want a follow-up PR for a software fix in case someone wanted to try and improve the performance on macos?

I've opened a new issue for it: #36902

There is a lot of debug info in this PR. I think this is probably a good thing (having more info in our build logs can help debugging) but I just want to verify it is intentional?

Yes. It's intentional. I think that we should be able to get the debug information by default. I've also mentioned these changes in the PR description.

@@ -57,7 +57,8 @@ class ApacheArrow < Formula
fails_with gcc: "5"

def install
# https://github.com/Homebrew/homebrew-core/issues/76537
# This isn't for https://github.com/Homebrew/homebrew-core/issues/76537 .
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we defer this to another PR?
To be honest, I want to remove ENV.runtime_cpu_detection if Hardware::CPU.intel? entirely or remove if Hardware::CPU.intel?. If I do the change, we can remove this.
But I want to keep this in this PR to clarify we choose "1. -DARROW_RUNTIME_SIMD_LEVEL=NONE instead of "2. Remove ENV.runtime_cpu_detection if Hardware::CPU.intel?".

# TODO(ARROW-16907): apache/arrow@main seems to be installed already
# so this does nothing on a branch/PR
brew install -v --HEAD apache-arrow
brew install -v --HEAD {{ '$(brew --repository homebrew/core)/Formula/apache-arrow.rb' }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify absolute path for a formula to use the local modified version. (We didn't need this with old Homebrew.)
I think that it's related to the "Using JSON files downloaded from formulae.brew.sh for package installation rather than local homebrew/core and homebrew/cask taps." change in Homebrew 4.0.0: https://brew.sh/2023/02/16/homebrew-4.0.0/

We may be able to use HOMEBREW_NO_INSTALL_FROM_API=1 instead of specifying an absolute path. But I think that the absolute path approach is straightforward.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jul 27, 2023
@kou
Copy link
Member Author

kou commented Jul 28, 2023

If nobody objects this, I'll merge this in a few days.
(I'll also update apache-arrow formula in Homebrew later.)

@kou kou merged commit af23f6a into apache:main Jul 30, 2023
10 checks passed
@kou kou deleted the r-autobrew-openssl branch July 30, 2023 21:11
@kou kou removed the awaiting changes Awaiting changes label Jul 30, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit af23f6a.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…ache#36705)

### Rationale for this change

Summary of this problem: apache#31132 (comment)

Why this problem is happen again? Because I removed `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"` in apache#36583. The solution we chose by apache#14342 was forcing to use `-O2` for SIMD related code. It works for `-DCMAKE_BUILD_TYPE=MinSizeRel` but it doesn't work for Homebrew.

Because Homebrew's CC https://github.com/Homebrew/brew/blob/master/Library/Homebrew/shims/super/cc forces to use the same `-O` flag. The default is `-Os`. If we specify `-O2`, Homebrew's CC replaces it to `-Os`. If we use  `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"`, Homebrew's CC always use `-O2`. So the solution we chose by apache#14342 isn't used for Homebrew.

But Homebrew thinks that `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"` is a workaround. So we need another solution for Homebrew.

Here are candidate solutions:
1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`
2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`

"1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`" works because we don't use the runtime SIMD dispatch feature (the problematic feature) entirely.

"2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`" works but I don't know why... If `ENV.runtime_cpu_detection` is called, Homebrew's CC stops replacing `-march=*`. If we call `ENV.runtime_cpu_detection`, `-march=haswell` is used for AVX2 related code and `-march=skylake-avx512` is used for AVX512 including BMI2 related code. If we don't call `ENV.runtime_cpu_detection`, `-march=nehalem` is always used. (Note that SIMD related flags such as `-mbmi2` aren't removed by Homebrew's CC. So I think that SIMD is enabled.) I don't know why but "the one-definition-rule violation" (see the summary for details: apache#31132 (comment) ) isn't happen.

FYI: CPU info for GitHub Actions macOS hosted-runner:

```console
$ sysctl hw.optional machdep.cpu
hw.optional.adx: 0
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.avx2_0: 0
hw.optional.avx512bw: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512f: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
hw.optional.avx512vl: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.enfstrg: 0
hw.optional.f16c: 1
hw.optional.floatingpoint: 1
hw.optional.fma: 0
hw.optional.hle: 0
hw.optional.mmx: 1
hw.optional.mpx: 0
hw.optional.rdrand: 1
hw.optional.rtm: 0
hw.optional.sgx: 0
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.supplementalsse3: 1
hw.optional.x86_64: 1
machdep.cpu.address_bits.physical: 43
machdep.cpu.address_bits.virtual: 48
machdep.cpu.arch_perf.events: 127
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.fixed_number: 0
machdep.cpu.arch_perf.fixed_width: 0
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.version: 1
machdep.cpu.arch_perf.width: 48
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.size: 256
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.linesize_max: 4096
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.sub_Cstates: 16
machdep.cpu.thermal.ACNT_MCNT: 0
machdep.cpu.thermal.core_power_limits: 0
machdep.cpu.thermal.dynamic_acceleration: 0
machdep.cpu.thermal.energy_policy: 0
machdep.cpu.thermal.fine_grain_clock_mod: 0
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.package_thermal_intr: 0
machdep.cpu.thermal.sensor: 0
machdep.cpu.thermal.thresholds: 0
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.inst.small: 64
machdep.cpu.tlb.shared: 512
machdep.cpu.tsc_ccc.denominator: 0
machdep.cpu.tsc_ccc.numerator: 0
machdep.cpu.xsave.extended_state: 7 832 832 0
machdep.cpu.xsave.extended_state1: 0 0 0 0
machdep.cpu.brand: 0
machdep.cpu.brand_string: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
machdep.cpu.core_count: 3
machdep.cpu.cores_per_package: 4
machdep.cpu.extfamily: 0
machdep.cpu.extfeature_bits: 4967106816
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.extmodel: 3
machdep.cpu.family: 6
machdep.cpu.feature_bits: 18427078393948011519
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH MMX FXSR SSE SSE2 SS HTT SSE3 PCLMULQDQ MON VMX SSSE3 CX16 SSE4.1 SSE4.2 x2APIC POPCNT AES VMM PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_feature_bits: 643 0
machdep.cpu.leaf7_feature_bits_edx: 3154117632
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SMEP ERMS MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.logical_per_package: 4
machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.microcode_version: 1070
machdep.cpu.model: 58
machdep.cpu.processor_flag: 0
machdep.cpu.signature: 198313
machdep.cpu.stepping: 9
machdep.cpu.thread_count: 3
machdep.cpu.vendor: GenuineIntel
```

### What changes are included in this PR?

"1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`" because it's straightforward and  "2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`" may also disable runtime SIMD dispatch implicitly.

This also adds the following debug information for easy to debug in future:
* CPU information for GitHub Actions runner
* Homebrew's build logs

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: apache#36685

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ache#36705)

### Rationale for this change

Summary of this problem: apache#31132 (comment)

Why this problem is happen again? Because I removed `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"` in apache#36583. The solution we chose by apache#14342 was forcing to use `-O2` for SIMD related code. It works for `-DCMAKE_BUILD_TYPE=MinSizeRel` but it doesn't work for Homebrew.

Because Homebrew's CC https://github.com/Homebrew/brew/blob/master/Library/Homebrew/shims/super/cc forces to use the same `-O` flag. The default is `-Os`. If we specify `-O2`, Homebrew's CC replaces it to `-Os`. If we use  `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"`, Homebrew's CC always use `-O2`. So the solution we chose by apache#14342 isn't used for Homebrew.

But Homebrew thinks that `ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O2"` is a workaround. So we need another solution for Homebrew.

Here are candidate solutions:
1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`
2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`

"1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`" works because we don't use the runtime SIMD dispatch feature (the problematic feature) entirely.

"2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`" works but I don't know why... If `ENV.runtime_cpu_detection` is called, Homebrew's CC stops replacing `-march=*`. If we call `ENV.runtime_cpu_detection`, `-march=haswell` is used for AVX2 related code and `-march=skylake-avx512` is used for AVX512 including BMI2 related code. If we don't call `ENV.runtime_cpu_detection`, `-march=nehalem` is always used. (Note that SIMD related flags such as `-mbmi2` aren't removed by Homebrew's CC. So I think that SIMD is enabled.) I don't know why but "the one-definition-rule violation" (see the summary for details: apache#31132 (comment) ) isn't happen.

FYI: CPU info for GitHub Actions macOS hosted-runner:

```console
$ sysctl hw.optional machdep.cpu
hw.optional.adx: 0
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.avx2_0: 0
hw.optional.avx512bw: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512f: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
hw.optional.avx512vl: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.enfstrg: 0
hw.optional.f16c: 1
hw.optional.floatingpoint: 1
hw.optional.fma: 0
hw.optional.hle: 0
hw.optional.mmx: 1
hw.optional.mpx: 0
hw.optional.rdrand: 1
hw.optional.rtm: 0
hw.optional.sgx: 0
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.supplementalsse3: 1
hw.optional.x86_64: 1
machdep.cpu.address_bits.physical: 43
machdep.cpu.address_bits.virtual: 48
machdep.cpu.arch_perf.events: 127
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.fixed_number: 0
machdep.cpu.arch_perf.fixed_width: 0
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.version: 1
machdep.cpu.arch_perf.width: 48
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.size: 256
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.linesize_max: 4096
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.sub_Cstates: 16
machdep.cpu.thermal.ACNT_MCNT: 0
machdep.cpu.thermal.core_power_limits: 0
machdep.cpu.thermal.dynamic_acceleration: 0
machdep.cpu.thermal.energy_policy: 0
machdep.cpu.thermal.fine_grain_clock_mod: 0
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.package_thermal_intr: 0
machdep.cpu.thermal.sensor: 0
machdep.cpu.thermal.thresholds: 0
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.inst.small: 64
machdep.cpu.tlb.shared: 512
machdep.cpu.tsc_ccc.denominator: 0
machdep.cpu.tsc_ccc.numerator: 0
machdep.cpu.xsave.extended_state: 7 832 832 0
machdep.cpu.xsave.extended_state1: 0 0 0 0
machdep.cpu.brand: 0
machdep.cpu.brand_string: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
machdep.cpu.core_count: 3
machdep.cpu.cores_per_package: 4
machdep.cpu.extfamily: 0
machdep.cpu.extfeature_bits: 4967106816
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.extmodel: 3
machdep.cpu.family: 6
machdep.cpu.feature_bits: 18427078393948011519
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH MMX FXSR SSE SSE2 SS HTT SSE3 PCLMULQDQ MON VMX SSSE3 CX16 SSE4.1 SSE4.2 x2APIC POPCNT AES VMM PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_feature_bits: 643 0
machdep.cpu.leaf7_feature_bits_edx: 3154117632
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SMEP ERMS MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.logical_per_package: 4
machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.microcode_version: 1070
machdep.cpu.model: 58
machdep.cpu.processor_flag: 0
machdep.cpu.signature: 198313
machdep.cpu.stepping: 9
machdep.cpu.thread_count: 3
machdep.cpu.vendor: GenuineIntel
```

### What changes are included in this PR?

"1. `-DARROW_RUNTIME_SIMD_LEVEL=NONE`" because it's straightforward and  "2. Remove `ENV.runtime_cpu_detection if Hardware::CPU.intel?`" may also disable runtime SIMD dispatch implicitly.

This also adds the following debug information for easy to debug in future:
* CPU information for GitHub Actions runner
* Homebrew's build logs

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: apache#36685

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R][C++] Homebrew nightly test failing with illegal opcode
3 participants