[libc++] Use _BitScanForward64 more often, by fixing availability detection, avoiding calling _BitScanForward twice #142000
… src/include/ryu/ryu.h). Use MSVC _BitScanForward64 on _M_AMD64 and _M_ARM64, but not on the _M_ARM. Remove erroneous public #define _LIBCPP_HAS_BITSCAN64 (should be defined for _M_ARM64 but not for _M_ARM).
@llvm/pr-subscribers-libcxx
Author: Eugene Golushkov (eugenegff)
Changes
Use our private _BitScanForward64 for non-MSVC (in src/include/ryu/ryu.h).
Full diff: https://github.com/llvm/llvm-project/pull/142000.diff
3 Files Affected:
diff --git a/libcxx/include/__config b/libcxx/include/__config
index 110450f6e9c51..316800681ec3f 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -222,15 +222,9 @@ _LIBCPP_HARDENING_MODE_DEBUG
# if defined(_MSC_VER) && !defined(__MINGW32__)
# define _LIBCPP_MSVCRT // Using Microsoft's C Runtime library
# endif
-# if (defined(_M_AMD64) || defined(__x86_64__)) || (defined(_M_ARM) || defined(__arm__))
-# define _LIBCPP_HAS_BITSCAN64 1
-# else
-# define _LIBCPP_HAS_BITSCAN64 0
-# endif
# define _LIBCPP_HAS_OPEN_WITH_WCHAR 1
# else
# define _LIBCPP_HAS_OPEN_WITH_WCHAR 0
-# define _LIBCPP_HAS_BITSCAN64 0
# endif // defined(_WIN32)
# if defined(_AIX) && !defined(__64BIT__)
diff --git a/libcxx/include/__cxx03/__config b/libcxx/include/__cxx03/__config
index ef47327d96355..4dac5964ff917 100644
--- a/libcxx/include/__cxx03/__config
+++ b/libcxx/include/__cxx03/__config
@@ -229,9 +229,6 @@ _LIBCPP_HARDENING_MODE_DEBUG
# if defined(_MSC_VER) && !defined(__MINGW32__)
# define _LIBCPP_MSVCRT // Using Microsoft's C Runtime library
# endif
-# if (defined(_M_AMD64) || defined(__x86_64__)) || (defined(_M_ARM) || defined(__arm__))
-# define _LIBCPP_HAS_BITSCAN64
-# endif
# define _LIBCPP_HAS_OPEN_WITH_WCHAR
# endif // defined(_WIN32)
diff --git a/libcxx/src/ryu/d2s.cpp b/libcxx/src/ryu/d2s.cpp
index c0d11107f880b..0cab0a2ba6d62 100644
--- a/libcxx/src/ryu/d2s.cpp
+++ b/libcxx/src/ryu/d2s.cpp
@@ -479,7 +479,7 @@ struct __floating_decimal_64 {
36893488u, 7378697u, 1475739u, 295147u, 59029u, 11805u, 2361u, 472u, 94u, 18u, 3u };
unsigned long _Trailing_zero_bits;
-#if _LIBCPP_HAS_BITSCAN64
+#if !defined(_MSC_VER) || defined(_M_AMD64) || defined(_M_ARM64) // we have own _BitScanForward64 for non-MSVC
(void) _BitScanForward64(&_Trailing_zero_bits, __v.__mantissa); // __v.__mantissa is guaranteed nonzero
#else // ^^^ 64-bit ^^^ / vvv 32-bit vvv
const uint32_t _Low_mantissa = static_cast<uint32_t>(__v.__mantissa);
Ping
@@ -479,7 +479,7 @@ struct __floating_decimal_64 {
36893488u, 7378697u, 1475739u, 295147u, 59029u, 11805u, 2361u, 472u, 94u, 18u, 3u };

unsigned long _Trailing_zero_bits;
-#if _LIBCPP_HAS_BITSCAN64
+#if !defined(_MSC_VER) || defined(_M_AMD64) || defined(_M_ARM64) // we have own _BitScanForward64 for non-MSVC
Can't we just use std::countr_zero? That'd make the code below unnecessary and we can drop this condition altogether.
std::to_chars is C++17 and std::countr_zero is C++20, but std::__countr_zero will probably work in C++17 mode.
The codegen differs on x64 (https://godbolt.org/z/TjeT661eT): std::countr_zero translates to MOV EAX,64 followed by TZCNT (encoded as REP BSF), which handles zero input correctly but may be more expensive than a plain BSF, although the benefits of using the standard function may matter more. _BitScanForward64 translates to BSF without the REP prefix and without the preceding MOV EAX,64. The Arm64 codegen is the same for _BitScanForward64 and std::countr_zero.
I don't know how much performance we are willing to sacrifice in a performance-oriented piece of code. Let's treat this as outside the scope of the current pull request, since the PR also fixes the attempt to use the unavailable _BitScanForward64 on _M_ARM on Windows, without any negative performance impact.
You're comparing apples and oranges here. When using clang the codegen is almost identical, with just a single additional mov. I doubt very much that a single mov makes a significant difference in performance. If you can show a significant difference in performance we can also use __builtin_ctzg directly with a comment. Re. C++20: we're in dylib code, so when a feature was introduced doesn't matter.
Use our private _BitScanForward64 for non-MSVC (in src/include/ryu/ryu.h).
Use MSVC _BitScanForward64 on _M_AMD64 and _M_ARM64, but not on the _M_ARM.
Remove erroneous public #define _LIBCPP_HAS_BITSCAN64 (should be defined for _M_ARM64 but not for _M_ARM).