Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect decected AES-NI: LTC_AMD64_SSE4_1 => LTC_HAS_AES_NI: "ltc/ciphers/aes/aesni.c:74:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes" #97

Closed
VVD opened this issue Oct 4, 2023 · 19 comments

Comments

@VVD
Copy link

VVD commented Oct 4, 2023

FreeBSD 13.2 amd64, CPU Bloomfield and Lynnfield (1st generation of Nehalem 45nm, examples: Core i7 920/930/etc, Core i5 750/760, Core i7 860/870, Xeon X34xx, Xeon E55xx), LLVM 14.0.5, -march=nehalem.

Lynnfield Xeon X3430 with SSE4.1 and without AES-NI:

# cpuid-etallen | grep -iE 'aes|xeon|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = false
      VAES instructions                        = false
   brand = "Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz"
   (synth) = Intel Xeon Processor 3400 (Lynnfield B1) [Nehalem] {Nehalem}, 45nm

Bloomfiled Core i7 930 with SSE4.1 and without AES-NI:

# cpuid-etallen | grep -iE 'aes|i7|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = false
      VAES instructions                        = false
   brand = "Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz"
   (synth) = Intel Core i7-900 (Bloomfield D0) [Nehalem] {Nehalem}, 45nm

Westmere Xeon E5620 with support of the AES-NI:

# cpuid-etallen | grep -iE 'aes|xeon|sse4.1'|head -n5
      SSE4.1 extensions                       = true
      AES instruction                         = true
      VAES instructions                        = false
   brand = "Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz"
   (synth) = Intel Xeon Processor 3600 / 5600 (Westmere-EP B1) [Westmere] {Nehalem}, 32nm

In src/ltc/headers/tomcrypt_private.h turn on AES_NI if detected SSE4.1, but it's incorrect:

#if defined(LTC_AES_NI) && defined(LTC_AMD64_SSE4_1)
#define LTC_HAS_AES_NI
#endif

And have build error:

--- config ---
--- src/liballinone.a ---
--- ltc/ciphers/aes/aesni.o ---
cc -Iltm -Iltc/headers -DLTC_SOURCE -DLTC_NO_TEST -DLTC_NO_PROTOTYPES -DLTM_DESC -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DUSE_THREAD_SAFE_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -DPIC -fPIC -O2 -pipe -march=nehalem -fstack-protector-strong -fno-strict-aliasing   -DARGTYPE=4 -c ltc/ciphers/aes/aesni.c -o ltc/ciphers/aes/aesni.o
--- ltc/ciphers/anubis.o ---
cc -Iltm -Iltc/headers -DLTC_SOURCE -DLTC_NO_TEST -DLTC_NO_PROTOTYPES -DLTM_DESC -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -DUSE_THREAD_SAFE_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -DPIC -fPIC -O2 -pipe -march=nehalem -fstack-protector-strong -fno-strict-aliasing   -DARGTYPE=4 -c ltc/ciphers/anubis.c -o ltc/ciphers/anubis.o
--- ltc/ciphers/aes/aesni.o ---
ltc/ciphers/aes/aesni.c:74:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[4] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:89:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[6] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:108:26: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[8] = rk[0] ^ setup_mix(temp, 3) ^ rcon[i];
                         ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:116:27: error: '__builtin_ia32_aeskeygenassist128' needs target feature aes
         rk[12] = rk[4] ^ setup_mix(temp, 2);
                          ^
ltc/ciphers/aes/aesni.c:27:43: note: expanded from macro 'setup_mix'
#define setup_mix(t, c) _mm_extract_epi32(_mm_aeskeygenassist_si128(t, 0), c)
                                          ^
/usr/lib/clang/14.0.5/include/__wmmintrin_aes.h:136:13: note: expanded from macro '_mm_aeskeygenassist_si128'
  ((__m128i)__builtin_ia32_aeskeygenassist128((__v2di)(__m128i)(C), (int)(R)))
            ^
ltc/ciphers/aes/aesni.c:145:14: error: always_inline function '_mm_aesimc_si128' requires target feature 'aes', but would be inlined into function 'aesni_setup' that is compiled without support for 'aes'
      temp = temp_invert(rk);
             ^
ltc/ciphers/aes/aesni.c:30:24: note: expanded from macro 'temp_invert'
#define temp_invert(k) _mm_aesimc_si128(*((__m128i*)(k)))
                       ^
ltc/ciphers/aes/aesni.c:146:26: error: always_inline function '_mm_aesimc_si128' requires target feature 'aes', but would be inlined into function 'aesni_setup' that is compiled without support for 'aes'
      *((__m128i*) rk) = temp_invert(rrk);
                         ^
ltc/ciphers/aes/aesni.c:30:24: note: expanded from macro 'temp_invert'
#define temp_invert(k) _mm_aesimc_si128(*((__m128i*)(k)))
                       ^
6 errors generated.
*** [ltc/ciphers/aes/aesni.o] Error code 1

make[2]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079/src
1 error

make[2]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079/src
*** [src/liballinone.a] Error code 2

make[1]: stopped in /tmp/work/usr/ports/security/p5-CryptX/work/CryptX-0.079
--- CryptX.c ---
mv CryptX.xsc CryptX.c
@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

Can you please try CryptX-0.80?

@VVD
Copy link
Author

VVD commented Oct 4, 2023

clang detects AES-NI from skylake only and doesn't detect it from westmere to broadwell (sandybridge, ivybridge, haswell):

$ for CPU in westmere sandybridge ivybridge haswell broadwell skylake; do echo $CPU; echo | clang -march=$CPU -dM -E - | grep -i aes; done
westmere
sandybridge
ivybridge
haswell
broadwell
skylake
#define __AES__ 1

On other side it's possible to force turn on AES-NI even if march doesn't support it:

$ echo | clang -march=core2 -maes -dM -E - | grep -i aes
#define __AES__ 1

As a simple workaround we can check #if defined(__AES__1) instead of the SSE4.1.
And also add build option to force add -maes if user know his CPU support it.

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

ping @sjaeckel

@VVD
Copy link
Author

VVD commented Oct 4, 2023

I see -msse4.1 -maes in command line and my -march=nehalem, but it build fine without errors now (0.080).

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

Does the test suite of 0.080 show something like this?

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64  BASE64-URL-SAFE .....
                                       ^^^^^^

@VVD
Copy link
Author

VVD commented Oct 4, 2023

Does the test suite of 0.080 show something like this?

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64  BASE64-URL-SAFE .....
                                       ^^^^^^

How to run it? :-)

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

make test

@VVD
Copy link
Author

VVD commented Oct 4, 2023

On nehalem CPU:

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64
All tests successful.
Files=137, Tests=18568, 29 wallclock secs ( 2.24 usr  0.21 sys + 26.30 cusr  1.99 csys = 30.74 CPU)
Result: PASS

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

It means that AES-SNI support:
1/ was properly compiled
2/ was detected during runtime

In theory you can build a binary with AES-NI support but run it on a CPU lacking AES-SNI support (which will be detected in runtime and fall back to SW implementation of AES)

@sjaeckel
Copy link

sjaeckel commented Oct 4, 2023

In theory you can build a binary with AES-NI support but run it on a CPU lacking AES-SNI support (which will be detected in runtime and fall back to SW implementation of AES)

TBH I never tested that since I don't have any amd64 CPUs w/o AES support, but maybe it's a good time now? ;)

@VVD
Copy link
Author

VVD commented Oct 4, 2023

So it use SW emulation of the AES-NI on my Nehalem?

@sjaeckel
Copy link

sjaeckel commented Oct 4, 2023

Yes, it should fall back to the SW implementation and therefore be a valid test of that functionality.

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

If you see

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it uses HW AES, if you see

# Various others:  ARGTYPE=4  ADLER32  BASE64

it uses sw implementation of AES.

@sjaeckel
Copy link

sjaeckel commented Oct 4, 2023

If you see

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it uses HW AES, if you see

not completely right.

If you see AES-NI it supports HW AES, but maybe uses SW AES if the HW doesn't support it.

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

@sjaeckel I am adding compiler options -msse4.1 -maes for gcc/clang on x86_64. Are both necessary or should I use only -aes?

I am afraid that without -msse4.1 the __SSE4_1__ will not be defined and AES-NI detection in tomcrypt_cfg.h will not work.

@sjaeckel
Copy link

sjaeckel commented Oct 4, 2023

IIRC they're both required since they enable different parts of the API

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

OK, I will keep both. Just FYI

$ echo | gcc -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1

$ echo | gcc -maes -dM -E - | grep -e AES -e SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
#define __AES__ 1

$ echo | gcc -maes -msse4.1 -dM -E - | grep -e AES -e SSE
#define __SSE4_1__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __AES__ 1
#define __SSE3__ 1

@karel-m
Copy link
Contributor

karel-m commented Oct 4, 2023

@VVD to sum it up for you:

If you see in the test suite output this

# Various others:  ARGTYPE=4  ADLER32  AES-NI  BASE64

it is the maximum you can achieve during building the CryptX module. You have built a binary that is ready to use HW AES-NI (or fall back to SW implementation when the CPU running this binary does not support AES-NI).

@VVD
Copy link
Author

VVD commented Oct 4, 2023

Thanks!
0.080 build fine and tests passed on both CPUs with SSE4.1 - with and without AES-NI support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants