Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 #621

Merged
merged 8 commits into from
Dec 10, 2023

Conversation

aqrit
Copy link
Contributor

@aqrit aqrit commented Dec 2, 2023

Bug: _mm_test_mix_ones_zeros always returned true.
The function wasn't reducing zf and cf to a bool before combining them.

The fix proposed here isn't the most efficient, but at least it is correct.

_mm_testnzc_si128 is an alias for _mm_test_mix_ones_zeros.

Note(s):
The arguments are named incorrectly in the _mm_test_mix_ones_zeros documentation[0].
The second argument is the mask, as per the behavior of _mm_test_mix_ones_zeros with gcc and clang.
This naming error seems to have propagated through both gcc[1] and llvm[2] headers but not to rust[3] headers or sse2neon[4].

[0] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=ptest&techs=SSE_ALL&ig_expand=6902
[1] https://github.com/gcc-mirror/gcc/blob/27ce74fa23c93c1189c301993cd19ea766e6bdb5/gcc/config/i386/smmintrin.h#L94
[2] https://github.com/llvm/llvm-project/blob/70535f5e609f747c28cfef699eefb84581b0aac0/clang/lib/Headers/smmintrin.h#L1130
[3] https://github.com/rust-lang/stdarch/blob/f4528dd6e85d97bb802240d7cd048b6e1bf72540/crates/core_arch/src/x86/sse41.rs#L1149
[4]

FORCE_INLINE int _mm_test_mix_ones_zeros(__m128i a, __m128i mask)

I'm not sure, what all, is wrong here. One error is the use of a logical NOT (`!a[0]`) was instead of a bitwise NOT (`~_a[0]`).
Bug: `_mm_test_mix_ones_zeros` always returned true.
The function wasn't reducing `zf` and `cf` to a bool before combining them.

The fix proposed here isn't the most efficient, but at least it is correct. 

Note(s):
The arguments are named incorrectly in the `_mm_test_mix_ones_zeros` documentation[0].
The second argument is the mask, as per the behavior of `_mm_test_mix_ones_zeros` with gcc and clang.
This naming error seems to have propagated through both gcc[1] and llvm[2] headers but not to rust[3] headers or sse2neon[4].

[0] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=ptest&techs=SSE_ALL&ig_expand=6902
[1] https://github.com/gcc-mirror/gcc/blob/27ce74fa23c93c1189c301993cd19ea766e6bdb5/gcc/config/i386/smmintrin.h#L94
[2] https://github.com/llvm/llvm-project/blob/70535f5e609f747c28cfef699eefb84581b0aac0/clang/lib/Headers/smmintrin.h#L1130
[3] https://github.com/rust-lang/stdarch/blob/f4528dd6e85d97bb802240d7cd048b6e1bf72540/crates/core_arch/src/x86/sse41.rs#L1149
[4] https://github.com/DLTcollab/sse2neon/blob/243e90f654193c97a691b1a53213d091e02eb631/sse2neon.h#L7595
sse2neon.h Outdated Show resolved Hide resolved
tests/impl.cpp Outdated Show resolved Hide resolved
sse2neon.h Show resolved Hide resolved
tests/impl.cpp Outdated
has_ones |= _a[i] & _mask[i];
has_zeros |= ~_a[i] & _mask[i];
}
int32_t result = ((has_ones != 0) & (has_zeros != 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible maybe we can add some edge cases here to test the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where the values are coming from, I'll leave this to you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible maybe we can add some edge cases here to test the behavior?

@howjmay, Can you show relevant code accordingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if there is a test case all the bits of mask is zero would be good. It is a case that force the answer to be zero

@howjmay
Copy link
Contributor

howjmay commented Dec 3, 2023

Thank you for the contribution!😍🥳

@jserv
Copy link
Member

jserv commented Dec 6, 2023

@aqrit, Additionally, I would like to acknowledge your significant contributions, including those to sse2zig, in the sse2neon header. Could you please let me know the preferred name or manner in which you'd like to be credited?

@@ -7592,14 +7592,22 @@ FORCE_INLINE int _mm_test_all_zeros(__m128i a, __m128i mask)
// zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero,
// otherwise return 0.
// https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=mm_test_mix_ones_zero
// Note: Argument names may be wrong in the Intel intrinsics guide.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about this. Why do you think the argument may be wrong. Just curious about the reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd argument is the mask, by behavior.
Clang, GCC, Rust, etc. pass the 2nd argument as b to _mm_testnzc_si128.
Version 3.4.6 of the Intel® Intrinsics Guide names the second argument mask, while the current version 3.6.6 names the first argument mask.

Why is it now wrong in the docs?
I assume it has something to do with AXV512 always having the mask as the first argument.
Which argument is named mask doesn't matter for _mm_test_all_zeros but it does for _mm_test_mix_ones_zeros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the Release notes:

on 01/24/2023
Corrected parameter order of _mm_test_all_zeros and _mm_test_mix_ones_zeros.

I don't know if this person still maintains this or not:
https://old.reddit.com/r/simd/comments/cck1kh/feedback_on_intel_intrinsics_guide/

I don't have a twitter account so I can't really track them down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following compilers also treat the 2nd arg as mask:
MSVC 19.37
ICC 2021.10.0
ICX 2023.2.1

so the docs definitely don't match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it!! Thank you for explaining!!!

@aqrit
Copy link
Contributor Author

aqrit commented Dec 7, 2023

manner in which you'd like to be credited?

I'd prefer to remain uncredited.

@jserv jserv requested a review from howjmay December 9, 2023 05:03
@jserv jserv merged commit e5c5eff into DLTcollab:master Dec 10, 2023
16 checks passed
@jserv
Copy link
Member

jserv commented Dec 10, 2023

Thank @aqrit for contributing. It is time to make a new release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants