
[libc++] Use _BitScanForward64 more often, by fixing availability detection, avoiding calling _BitScanForward twice #142000


Open: eugenegff wants to merge 2 commits into main

eugenegff

Use our private _BitScanForward64 on non-MSVC compilers (defined in src/include/ryu/ryu.h).
Use the MSVC _BitScanForward64 intrinsic on _M_AMD64 and _M_ARM64, but not on _M_ARM.
Remove the erroneous public #define _LIBCPP_HAS_BITSCAN64 (it should be defined for _M_ARM64 but not for _M_ARM).

@eugenegff eugenegff requested a review from a team as a code owner May 29, 2025 18:04

@llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) on May 29, 2025
llvmbot (Member) commented May 29, 2025

@llvm/pr-subscribers-libcxx

Author: Eugene Golushkov (eugenegff)

Changes

Use our private _BitScanForward64 on non-MSVC compilers (defined in src/include/ryu/ryu.h).
Use the MSVC _BitScanForward64 intrinsic on _M_AMD64 and _M_ARM64, but not on _M_ARM.
Remove the erroneous public #define _LIBCPP_HAS_BITSCAN64 (it should be defined for _M_ARM64 but not for _M_ARM).


Full diff: https://github.com/llvm/llvm-project/pull/142000.diff

3 Files Affected:

  • (modified) libcxx/include/__config (-6)
  • (modified) libcxx/include/__cxx03/__config (-3)
  • (modified) libcxx/src/ryu/d2s.cpp (+1-1)
diff --git a/libcxx/include/__config b/libcxx/include/__config
index 110450f6e9c51..316800681ec3f 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -222,15 +222,9 @@ _LIBCPP_HARDENING_MODE_DEBUG
 #    if defined(_MSC_VER) && !defined(__MINGW32__)
 #      define _LIBCPP_MSVCRT // Using Microsoft's C Runtime library
 #    endif
-#    if (defined(_M_AMD64) || defined(__x86_64__)) || (defined(_M_ARM) || defined(__arm__))
-#      define _LIBCPP_HAS_BITSCAN64 1
-#    else
-#      define _LIBCPP_HAS_BITSCAN64 0
-#    endif
 #    define _LIBCPP_HAS_OPEN_WITH_WCHAR 1
 #  else
 #    define _LIBCPP_HAS_OPEN_WITH_WCHAR 0
-#    define _LIBCPP_HAS_BITSCAN64 0
 #  endif // defined(_WIN32)
 
 #  if defined(_AIX) && !defined(__64BIT__)
diff --git a/libcxx/include/__cxx03/__config b/libcxx/include/__cxx03/__config
index ef47327d96355..4dac5964ff917 100644
--- a/libcxx/include/__cxx03/__config
+++ b/libcxx/include/__cxx03/__config
@@ -229,9 +229,6 @@ _LIBCPP_HARDENING_MODE_DEBUG
 #    if defined(_MSC_VER) && !defined(__MINGW32__)
 #      define _LIBCPP_MSVCRT // Using Microsoft's C Runtime library
 #    endif
-#    if (defined(_M_AMD64) || defined(__x86_64__)) || (defined(_M_ARM) || defined(__arm__))
-#      define _LIBCPP_HAS_BITSCAN64
-#    endif
 #    define _LIBCPP_HAS_OPEN_WITH_WCHAR
 #  endif // defined(_WIN32)
 
diff --git a/libcxx/src/ryu/d2s.cpp b/libcxx/src/ryu/d2s.cpp
index c0d11107f880b..0cab0a2ba6d62 100644
--- a/libcxx/src/ryu/d2s.cpp
+++ b/libcxx/src/ryu/d2s.cpp
@@ -479,7 +479,7 @@ struct __floating_decimal_64 {
           36893488u, 7378697u, 1475739u, 295147u, 59029u, 11805u, 2361u, 472u, 94u, 18u, 3u };
 
         unsigned long _Trailing_zero_bits;
-#if _LIBCPP_HAS_BITSCAN64
+#if !defined(_MSC_VER) || defined(_M_AMD64) || defined(_M_ARM64) // we have own _BitScanForward64 for non-MSVC
         (void) _BitScanForward64(&_Trailing_zero_bits, __v.__mantissa); // __v.__mantissa is guaranteed nonzero
 #else // ^^^ 64-bit ^^^ / vvv 32-bit vvv
         const uint32_t _Low_mantissa = static_cast<uint32_t>(__v.__mantissa);
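
For readers following the hunk above, the two paths selected by the changed `#if` can be sketched roughly as follows. This is an illustration, not the libc++ source: `scan_forward32`, `trailing_zeros_one_scan`, `trailing_zeros_split`, and `sketch_self_check` are hypothetical names, and a portable loop stands in for the `_BitScanForward` / `_BitScanForward64` intrinsics so the sketch compiles with any compiler. The lines after `#else` are cut off in the diff, so the shape of the split path is an assumption based on the PR title ("avoiding calling _BitScanForward twice").

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-bit bit-scan intrinsic:
// returns the index of the lowest set bit. Precondition: value != 0.
static unsigned scan_forward32(std::uint32_t value) {
    unsigned index = 0;
    while ((value & 1u) == 0) {
        value >>= 1;
        ++index;
    }
    return index;
}

// "64-bit" path: one scan over the whole mantissa.
// Precondition: mantissa != 0 (the diff notes the mantissa is guaranteed nonzero).
static unsigned trailing_zeros_one_scan(std::uint64_t mantissa) {
    unsigned index = 0;
    while ((mantissa & 1u) == 0) {
        mantissa >>= 1;
        ++index;
    }
    return index;
}

// "32-bit" path (assumed shape): scan the low half first and fall back
// to the high half only when the low half is all zeros.
static unsigned trailing_zeros_split(std::uint64_t mantissa) {
    const std::uint32_t low = static_cast<std::uint32_t>(mantissa);
    if (low != 0) {
        return scan_forward32(low);
    }
    const std::uint32_t high = static_cast<std::uint32_t>(mantissa >> 32);
    return 32u + scan_forward32(high);  // high != 0 because mantissa != 0
}

// Usage: both paths must agree for any nonzero input.
static void sketch_self_check() {
    assert(trailing_zeros_split(8u) == trailing_zeros_one_scan(8u));
    assert(trailing_zeros_split(1ull << 40) == trailing_zeros_one_scan(1ull << 40));
}
```

The split path costs an extra branch and, in the worst case, a second scan, which is why picking the single-scan path wherever a 64-bit scan is actually available matters in this hot loop.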

@eugenegff
Author

Ping

@@ -479,7 +479,7 @@ struct __floating_decimal_64 {
           36893488u, 7378697u, 1475739u, 295147u, 59029u, 11805u, 2361u, 472u, 94u, 18u, 3u };
 
         unsigned long _Trailing_zero_bits;
-#if _LIBCPP_HAS_BITSCAN64
+#if !defined(_MSC_VER) || defined(_M_AMD64) || defined(_M_ARM64) // we have own _BitScanForward64 for non-MSVC
Contributor

Can't we just use std::countr_zero? That'd make the code below unnecessary and we can drop this condition altogether.
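
A minimal sketch of what this suggestion might look like, assuming the mantissa is a `uint64_t`. The helper name `trailing_zero_bits` is hypothetical, and the pre-C++20 fallback loop exists only so the sketch compiles under older language modes; it is not part of the proposal.

```cpp
#include <bit>      // std::countr_zero (C++20)
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of trailing zero bits in a 64-bit mantissa.
inline unsigned trailing_zero_bits(std::uint64_t mantissa) {
#if defined(__cpp_lib_bitops)
    // One call, no platform-specific preprocessor condition.
    // std::countr_zero returns the bit width (64 here) for a zero input.
    return static_cast<unsigned>(std::countr_zero(mantissa));
#else
    // Fallback for pre-C++20 modes, matching the zero-input behavior above.
    if (mantissa == 0) {
        return 64u;
    }
    unsigned count = 0;
    while ((mantissa & 1u) == 0) {
        mantissa >>= 1;
        ++count;
    }
    return count;
#endif
}

// Usage: a few spot checks, including the well-defined zero case.
inline void sketch_self_check() {
    assert(trailing_zero_bits(8u) == 3u);
    assert(trailing_zero_bits(0u) == 64u);
}
```

Unlike a raw bit-scan intrinsic, this form has a defined result for zero, which is the behavioral difference the rest of the thread discusses.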

eugenegff (Author) commented Jun 11, 2025

std::to_chars is C++17 and std::countr_zero is C++20, but std::__countr_zero will probably work in C++17 mode.

The codegen differs on x64 (https://godbolt.org/z/TjeT661eT). std::countr_zero translates to MOV EAX,64 followed by TZCNT (encoded as REP BSF), which handles zero input correctly but may be more expensive than a plain BSF, although the benefits of using the standard function may matter more. _BitScanForward64 translates to BSF without the REP prefix and without the preceding MOV EAX,64. On Arm64 the codegen is the same for both _BitScanForward64 and std::countr_zero.

I don't know how much performance we are ready to sacrifice in a performance-oriented chunk of code. Let's treat this as outside the scope of the current pull request, since the PR also fixes the attempt to use the unavailable _BitScanForward64 on _M_ARM on Windows, without any negative performance effect.

Contributor

You're comparing apples and oranges here. When using clang the code gen is almost identical, with just a single additional mov. I doubt very much that a single mov makes a significant difference in performance. If you can show a significant difference in performance we can also use __builtin_ctzg directly with a comment. Re. C++20: we're in dylib code, so when a feature was introduced doesn't matter.
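
The `__builtin_ctzg` alternative mentioned here could be sketched as below. This assumes a compiler that provides the builtin (it is a recent addition to Clang and GCC), so the sketch guards it with `__has_builtin` and falls back to a plain loop otherwise. `SKETCH_HAS_CTZG`, `trailing_zero_bits_builtin`, and `sketch_self_check` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Detect the builtin without breaking compilers that lack __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_ctzg)
#    define SKETCH_HAS_CTZG 1
#  endif
#endif
#ifndef SKETCH_HAS_CTZG
#  define SKETCH_HAS_CTZG 0
#endif

// Hypothetical helper. Precondition: mantissa != 0.
inline unsigned trailing_zero_bits_builtin(std::uint64_t mantissa) {
#if SKETCH_HAS_CTZG
    // The single-argument form is undefined for zero, so the compiler does
    // not have to emit code that handles a zero input.
    return static_cast<unsigned>(__builtin_ctzg(mantissa));
#else
    // Portable fallback so the sketch still compiles everywhere.
    unsigned count = 0;
    while ((mantissa & 1u) == 0) {
        mantissa >>= 1;
        ++count;
    }
    return count;
#endif
}

// Usage: spot checks on nonzero inputs only, per the precondition.
inline void sketch_self_check() {
    assert(trailing_zero_bits_builtin(8u) == 3u);
    assert(trailing_zero_bits_builtin(1ull << 40) == 40u);
}
```

Whether this produces measurably better code than `std::countr_zero` is exactly the open question in the thread; the sketch only shows the shape of the call.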

@philnik777 philnik777 changed the title [Ryu, performance] Use _BitScanForward64 more often, by fixing availability detection, avoiding calling _BitScanForward twice [libc++] Use _BitScanForward64 more often, by fixing availability detection, avoiding calling _BitScanForward twice Jun 11, 2025
@eugenegff eugenegff requested a review from philnik777 June 17, 2025 14:39
3 participants