Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hardware: use Nehalem flags on >= Sierra. #5429

Merged
merged 2 commits into from Dec 21, 2018

Conversation

MikeMcQuaid
Copy link
Member

@MikeMcQuaid MikeMcQuaid commented Dec 20, 2018

See https://en.wikipedia.org/wiki/MacOS_Sierra#System_requirements.

While we're here, copy Hackintosh comment from ENV/std to ENV/super so it's more
obvious that this change won't break those Hackintoshes.

Fixes #5425.

CC @claui @sjackman @fredemmott

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew tests with your changes locally?

@sjackman
Copy link
Member

See my comment at #5425 (comment)
I believe the default flags for Sierra and High Sierra could be -march=core2 -msse4.1

@MikeMcQuaid
Copy link
Member Author

I believe the default flags for Sierra and High Sierra could be -march=core2 -msse4.1

To restate myself: yes, they could be but that's unnecessarily reducing the targeted CPU.

@sjackman
Copy link
Member

I'm not entirely sure what changed in that time that would account for such a big difference.

The Xcode and thus Clang version.

These timing results are all on Linux. My apologies for omitting that important detail. No similar change in compiler occurred on Linux in that time.

@sjackman
Copy link
Member

I believe the default flags for Sierra and High Sierra could be -march=core2 -msse4.1

To restate myself: yes, they could be but that's unnecessarily reducing the targeted CPU.

It sounds as though changing the default to -march=core2 -msse4.1 could affect @claui. For someone who runs Sierra on SSE 4.1 hardware and pours a bottle of an app that uses an SSE 4.2 instruction, that app would crash an with illegal instruction. It'll be rare for sure, but possible.

@MikeMcQuaid
Copy link
Member Author

I’m open to a follow up PR before this lands in a tag that builds everything from source for those people with unsupported configurations (by us or Apple) but I personally would recommend waiting for an actual user report.

@sjackman
Copy link
Member

sjackman commented Dec 20, 2018

@claui On your Penryn system, could you please try changing

core2: "-march=core2",

to

core2: "-march=nehalem -msse4.2",

and brew install --build-bottle hello, and report whether hello works or crashes?
Edit: (check $HOMBREW_LOGS/hello/*.cc to ensure that the change took effect as desired)

@claui
Copy link
Contributor

claui commented Dec 20, 2018

@sjackman Thanks, I’m on it. Will report back in a few minutes.

@claui
Copy link
Contributor

claui commented Dec 20, 2018

@sjackman It prints Hello, world! and terminates with exit code 0. No crash whatsoever!

@sjackman
Copy link
Member

For those interested, @iMichka reports that Homebrew/brew on Linux and Linuxbrew used -Os, and it changed in August to -O2. That may account for the racon difference. See #4773

@fredemmott
Copy link
Contributor

For completeness: I tried -march=core2 -mtune=sandybridge for my HHVM case; the -mtune flag had no noticeable effect. -march is needed to get these speedups.

@claui
Copy link
Contributor

claui commented Dec 20, 2018

@sjackman On disassembling the hello binary, I’ve found that it contains no trace of any opcode from the SSE4.2 extensions.

Any idea on how to trick the compiler into using one of the SSE4.2 instructions?

@sjackman
Copy link
Member

You'd need to find a software app that uses one of the SSE 4.2 instructions. crc32 seems like the most likely candidate. Perhaps openssl.
See https://en.wikipedia.org/wiki/SSE4#SSE4.2

@fredemmott
Copy link
Contributor

fredemmott commented Dec 20, 2018

Adapted from https://docs.microsoft.com/en-us/previous-versions/bb531394(v=vs.120):

#include <stdio.h>
#include <nmmintrin.h>

int main() {
  unsigned int crc = 1;
  unsigned int input = 50000;
  unsigned int res = _mm_crc32_u32(crc, input);
  printf("%u\n", res);
  return 0;
}
$ gcc -march=core2 test.c
test.c:7:22: error: always_inline function '_mm_crc32_u32' requires target feature 'popcnt', but would be inlined into function 'main' that is compiled without support for 'popcnt'
  unsigned int res = _mm_crc32_u32(crc, input);
                     ^
1 error generated.
$ gcc -march=nehalem test.c
$ ./a.out
971731851

The actual CRC32 intrinsic is SSE4.2. The popcnt that it is using is SSE4a (AMD) - Nehalem+ for intel, Barcelona+ for AMD

@claui
Copy link
Contributor

claui commented Dec 20, 2018

@fredemmott Thanks for the code! As expected, your example bails with Illegal instruction: 4.

I’m open to a follow up PR before this lands in a tag that builds everything from source for those people with unsupported configurations (by us or Apple)

@MikeMcQuaid Fair enough. Building from source should be acceptable. (It’s not that users of Penryn Macs are exactly reaching for the stars anyway.) Bottles built for SSE4.2, however, are going to break stuff for those people left and right so I’m definitely going to look into that follow-up PR.

but I personally would recommend waiting for an actual user report.

After hello, the very first formula I built – openssl – bails on me for comparing numbers a few times in a row:

$ brew install --build-bottle openssl
[…]
$ /usr/local/opt/openssl/bin/openssl ecparam -C -name secp256k1
Illegal instruction: 4

This is due to pcmpgtq, an SSE4.2 instruction that compares several numbers simultaneously.

Again I'm afraid I'm not sure I see this being worth the cost of avoiding performance increases for the majority.

You’re right, and I stand corrected. In fact, I’ve been convinced since I learned there was a 5 – 10 % improvement and more on the table.

Copy link
Contributor

@claui claui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said earlier: a 5 – 10 % improvement would have been significant enough to change my mind, and warrant abandoning Penryn. Actually, we’re even dealing with 2×(ish) improvement for real-life workloads so I’ve changed my mind to 👍 of course.

Fallout needs to be mitigated in a separate PR in time before the next tag is cut.

@MikeMcQuaid MikeMcQuaid merged commit 1a78cc7 into Homebrew:master Dec 21, 2018
@MikeMcQuaid
Copy link
Member Author

Thanks for review, folks.

@MikeMcQuaid MikeMcQuaid deleted the nehalem-sierra-bottles branch December 21, 2018 10:29
@sjackman
Copy link
Member

Thanks for the change, Mike. I'd like to do something similar for Linux. I'll think on it.

@MikeMcQuaid
Copy link
Member Author

@sjackman That will need something to avoid bottles being used in that case. Personally I'm unsure whether that would be worth it for the Linuxbrew case given how diverse the systems are in comparison to macOS.

@sjackman
Copy link
Member

sjackman commented Dec 21, 2018

Linux's bottle tag is currently x86_64_linux and targets -march=core2. One approach is to bump the bottle tag to nehalem_linux. The implied _or_later of x86_64_linux_or_later would allow those x86_64_linux bottles to continue to be used by newer systems. The choice of target CPU could be made on a per-tap basis. In Brewsci/bio, I could see a strong argument for bottling for nehalem_linux, since those tools are often run on HPC systems, and can take multiple days to run, so a 2x speed-up saves days of run-time. There's less of an argument for bumping Linuxbrew/core to nehalem_linux.

@fredemmott
Copy link
Contributor

fredemmott commented Dec 21, 2018

Is there support in recipes for “bottle requires $ARCH, automatically build from source on older CPUs”? That would be nice for Linux from a usability perspective

@MikeMcQuaid
Copy link
Member Author

@fredemmott If there's no bottle it'll be built from source so: yep 👍

@fredemmott
Copy link
Contributor

fredemmott commented Dec 21, 2018

I don’t use Linuxbrew, but just to be sure: AIUI macOS bottles are keyed by OS, not microarchitecture; this is different for Linux?

@sjackman
Copy link
Member

We build only a single bottle on Linux. Those bottles currently require x86_64 ≥ Core2 (Penryn) and glibc ≥ 2.23, which can be installed by Linuxbrew if the host's glibc is older. On Linux we could build for multiple microarchitectures, but do not currently.

@lock lock bot added the outdated PR was locked due to age label Jan 20, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants