pacman: Set default x86_64 march to core2 #3229

Merged
merged 1 commit into from Oct 16, 2022


lhmouse
Contributor

@lhmouse lhmouse commented Oct 15, 2022

This allows SSE4.2 and CMPXCHG16B, the same as RHEL 9.

Reference: https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
Signed-off-by: LIU Hao <lh_mouse@126.com>

@jeremyd2019
Member

I am still keeping a couple of machines alive with core2 (one generation older than nehalem, which x86-64-v2 corresponds to on Intel). They do run win10 without issue, even if they are not listed as supported. If there's consensus that these machines are out of luck, that's fine, but I wanted to point out that dropping win7 does not automatically mean these older machines can't run anymore.

@lhmouse
Contributor Author

lhmouse commented Oct 15, 2022

Do you suggest -march=core2 instead? Or something even older than that, such as -march=nocona?

@jeremyd2019
Member

I don't know about AMD. I've dealt with an AMD X2 something that could also run win10, but I don't know how that fits into those Intel codenames. Wikipedia says it has SSE3 but doesn't mention SSSE3.

@lazka
Member

lazka commented Oct 15, 2022

I still think we should try to support the same hardware as Windows does, for the Windows versions we support, unless there is a good reason.

  • With Linux distros, users have a choice of using an older LTS for older hardware. With MSYS2 being rolling, there is no such choice.
  • There are downstream projects which distribute our binaries to Windows users and probably expect them to work everywhere.

edit: that doesn't mean that we shouldn't merge this PR, I just wanted to mention it.

edit2: reqs for 8.1 are "PAE, NX, and SSE2, CMPXCHG16b, PrefetchW, and LAHF/SAHF" http://windows.microsoft.com/en-us/windows-8/system-requirements

@Biswa96
Member

Biswa96 commented Oct 15, 2022

Would the result be the same if a different version of the mingw-w64-crt package were added with x86-64-v2?

@mati865
Collaborator

mati865 commented Oct 15, 2022

No, it affects all binaries/libraries.

@lazka
Member

lazka commented Oct 15, 2022

https://superuser.com/a/941175 has some CPU requirement info if we want to target 8.1

So if I understand it correctly, in theory there are core2 machines that would run 8.1, but it's questionable how many people that affects. The next level up is nehalem, which is equivalent to x86-64-v2, so this looks fine to me, I think, if we decide to drop core2.

@MehdiChinoune
Collaborator

Some statistics:
Steam Hardware Survey:

| CPU feature | % |
| --- | --- |
| SSE2 | 100.00 |
| SSE3 | 100.00 |
| LAHF/SAHF | 100.00 |
| CMPXCHG16B | 99.99 |
| SSSE3 | 99.50 |
| SSE4.1 | 99.24 |
| SSE4.2 | 98.95 |
| FCMOV | 97.64 |
| AES | 96.66 |
| AVX | 95.27 |
| AVX2 | 89.05 |
| HyperThreading | 77.12 |

Firefox Data
In the GPU graph you can observe that most users have GPUs >= Sandy Bridge (Intel Core 2nd Gen).

Arch
30-08-2021: [benchmark image]
29-01-2022: [benchmark image]

@lhmouse
Contributor Author

lhmouse commented Oct 16, 2022

The difference between core2, nocona, nehalem etc. is trivial; I don't prefer any of them. But x86-64 differs a lot from nocona because it doesn't allow _InterlockedCompareExchange128() (observed with clang -fms-extensions, though I think GCC behaves likewise for the __atomic_* builtins).

@lazka
Member

lazka commented Oct 16, 2022

> Arch
> 30-08-2021

I tried to replicate the "openssl" benchmark used there (openssl speed rsa) with different arch values but there is no difference at all: https://gist.github.com/lazka/722b926e1ffe04ba99da7f33c625ef2b

> The difference between core2, nocona, nehalem etc. is trivial; I don't prefer any of them. But, x86-64 is different from nocona a lot because it doesn't allow _InterlockedCompareExchange128() (observed with clang -fms-extensions though, I think GCC behaves likewise on __atomic_* builtins).

Then I'd vote for core2 as it matches win8.1/10 requirements.

@lhmouse fyi, if you run updpkgsums in the directory of the PKGBUILD it will update the checksums and let CI pass.

@lhmouse
Contributor Author

lhmouse commented Oct 16, 2022

Thanks for the information. Updated.

@MehdiChinoune
Collaborator

> I tried to replicate the "openssl" benchmark used there (openssl speed rsa) with different march values but there is no difference at all: https://gist.github.com/lazka/722b926e1ffe04ba99da7f33c625ef2b

Maybe our OpenSSL is already optimized to choose the right CPU architecture at runtime, like OpenBLAS, svt-av1, dav1d, embree ... etc.

@lazka
Member

lazka commented Oct 16, 2022

> Maybe our OpenSSL is already optimized to choose the right CPU architecture at runtime, like OpenBLAS, svt-av1, dav1d, embree ... etc.

yeah, likely that or asm. I was just wondering since they state some improvement here: https://gitlab.archlinux.org/archlinux/rfcs/-/blob/69bda815a157b3593f514d93f961da15ab3c59c1/rfcs/0002-march.rst#L89

@lazka
Member

lazka commented Oct 16, 2022

> Thanks for the information. Updated.

the commit message needs updating too.

This allows builtin atomic operations on 128-bit integers.

Signed-off-by: LIU Hao <lh_mouse@126.com>
@lhmouse
Contributor Author

lhmouse commented Oct 16, 2022

> > Thanks for the information. Updated.
>
> the commit message needs updating too.

Oops, updated too.

@mati865
Collaborator

mati865 commented Oct 16, 2022

> > I tried to replicate the "openssl" benchmark used there (openssl speed rsa) with different march values but there is no difference at all: gist.github.com/lazka/722b926e1ffe04ba99da7f33c625ef2b
>
> Maybe our OpenSSL is already optimized to choose the right CPU architecture at runtime, like OpenBLAS, svt-av1, dav1d, embree ... etc.

OpenSSL doesn't have runtime SIMD feature detection.

@lazka lazka merged commit 2d6af54 into msys2:master Oct 16, 2022
@MehdiChinoune
Collaborator

C-Ray benchmark:

$ wget http://www.phoronix.net/downloads/phoronix-test-suite/benchmark-files/c-ray-1.1.tar.gz
$ tar -xzf c-ray-1.1.tar.gz
$ cd c-ray-1.1
$ export RT_THREADS=$(($(nproc) * 16))

x86-64:

$ export CFLAGS="-march=x86-64 -mtune=generic"
$ cc -o c-ray-mt c-ray-mt.c -lpthread -O3 $CFLAGS
$ ./c-ray-mt.exe -t $RT_THREADS -s 3840x2160 -r 16 -i sphfract -o output.ppm
c-ray-mt v1.1
Rendering took: 235 seconds (235390 milliseconds)

x86-64-v2:

$ export CFLAGS="-march=x86-64-v2 -mtune=generic"
$ cc -o c-ray-mt c-ray-mt.c -lpthread -O3 $CFLAGS
$ ./c-ray-mt.exe -t $RT_THREADS -s 3840x2160 -r 16 -i sphfract -o output.ppm
c-ray-mt v1.1
Rendering took: 218 seconds (218813 milliseconds)

x86-64-v3:

$ export CFLAGS="-march=x86-64-v3 -mtune=generic"
$ cc -o c-ray-mt c-ray-mt.c -lpthread -O3 $CFLAGS
$ ./c-ray-mt.exe -t $RT_THREADS -s 3840x2160 -r 16 -i sphfract -o output.ppm
c-ray-mt v1.1
Rendering took: 159 seconds (159281 milliseconds)

@jeremyd2019
Member

> | CPU feature | % |
> | --- | --- |
> | SSE2 | 100.00 |
> | SSE3 | 100.00 |
> | LAHF/SAHF | 100.00 |
> | CMPXCHG16B | 99.99 |
> | SSSE3 | 99.50 |

I think that is the cut-off of core2. The only thing I'm questioning there is SSSE3, just because some AMD CPUs did not include it, but 99.5% having it is pretty darn good.

@jeremyd2019 jeremyd2019 changed the title pacman: Set default arch to x86-64-v2 pacman: Set default x86_64 march to core2 Oct 16, 2022
6 participants