Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

32-bit x86 CPUID code returns EBP instead of EBX #51

Closed
bit4 opened this issue Feb 2, 2020 · 1 comment
Closed

32-bit x86 CPUID code returns EBP instead of EBX #51

bit4 opened this issue Feb 2, 2020 · 1 comment

Comments

@bit4
Copy link

bit4 commented Feb 2, 2020

The inline assembly used by 32-bit x86 code needs to save and restore the EBX register because position independent code needs it unfortunately (https://ewontfix.com/18/). Since the CPUID instruction also uses the EBX register to return information, the inline assembly has to copy that data into another register (ESI in this case) before it restores the original EBX.

However, both the cpuid() and the cpuidex() functions contain a typo, and they return whatever is in the EBP register (essentially a random value) instead of EBX. This breaks run-time detection of AVX2 and newer CPU features.

The fix is very simple:

diff --git a/c/blake3_dispatch.c b/c/blake3_dispatch.c
index 7daf43e..b97b7c0 100644
--- a/c/blake3_dispatch.c
+++ b/c/blake3_dispatch.c
@@ -97,7 +97,7 @@ static void cpuid(uint32_t out[4], uint32_t id) {
   __cpuid((int *)out, id);
 #else
 #if defined(__i386__) || defined(_M_IX86)
-  __asm__ __volatile__("pushl %%ebx\ncpuid\nmovl %%ebp, %%esi\npopl %%ebx"
+  __asm__ __volatile__("pushl %%ebx\ncpuid\nmovl %%ebx, %%esi\npopl %%ebx"
                        : "=a"(out[0]), "=S"(out[1]), "=c"(out[2]), "=d"(out[3])
                        : "a"(id));
 #else
@@ -113,7 +113,7 @@ static void cpuidex(uint32_t out[4], uint32_t id, uint32_t sid) {
   __cpuidex((int *)out, id, sid);
 #else
 #if defined(__i386__) || defined(_M_IX86)
-  __asm__ __volatile__("pushl %%ebx\ncpuid\nmovl %%ebp, %%esi\npopl %%ebx"
+  __asm__ __volatile__("pushl %%ebx\ncpuid\nmovl %%ebx, %%esi\npopl %%ebx"
                        : "=a"(out[0]), "=S"(out[1]), "=c"(out[2]), "=d"(out[3])
                        : "a"(id), "c"(sid));
 #else
@sneves sneves closed this as completed in a1c4c4e Feb 2, 2020
@sneves
Copy link
Collaborator

sneves commented Feb 2, 2020

Nicely spotted!

The fix is a bit more involved that what you suggested; I used xchg instead of a push-pop sequence.

kevingoh pushed a commit to ITS-AT-dev/BLAKE3 that referenced this issue Oct 23, 2023
Thanks to bit4 for spotting this bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants