-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hardware supported popcount breaks on Windows with CPUs that don't support SSE4 #33
Comments
cc: @Izaron |
cc: @Izaron |
What is the plan if the user being CC'd doesn't respond by 1.70 deadlines? Is the change that was committed going to be reverted for 1.70? |
That's the easiest thing to do, unless someone else steps up to resolve it. |
James, since you committed the PR, it's your responsibility to deal with the regression/crash, one way or another. (The gcc __builtin_popcount path is fine, but the msvc-specific section isn't, so there are options, but something must be done.) |
Okay, thanks. |
@pdimov what would your preference be to resolve this:
Right now I would opt for one of the first two, which makes the library no worse than it was in 1.68.0 for MSVC. It's unfortunate the original author @Izaron hasn't responded or taken action. For now we could do (1) and then open an issue for (3). |
(1) is fine if we don't want to invest too much time into it. On the surface silently doing the right thing, as in (3), is better than (2), but I'm not sure if the two tests - on the static guard variable and on the actual static variable - would not defeat the gains from using the popcnt instruction. There is also the option of using an alternate popcount algorithm - the one used by gcc/clang for implementing the builtin when SSE4 isn't available, https://godbolt.org/z/VGP6Z_, https://en.wikipedia.org/wiki/Hamming_weight. This will need to be benchmarked if we choose to use it (on MSVC or in general). |
I was looking at https://github.com/kimwalisch/libpopcnt/ as there is an implementation in there as well, but I haven't done anything else but look. Benchmarks exist there for the decisions that were made to use different algorithms depending on the size of the bitset. |
FWIW, This probably means that it was judged optimal there in the absence of hardware popcnt. |
When using boost v1.69 dynamic_bitset::count() will crash on CPUs without SSE4 (required for popcount) when compiled using MSVC v15.9.4 and run on Windows 7. The was compiled as Release x64
Attached is example project to recreate: PopcountCrash.zip
I've had a quick look at the PR, issue and linked mailing list discussion
It appears there was some concern about handling hardware support for MSVC but I can't see anything in the PR or commit that would handle CPUs without SSE4 (i.e __cpuid check)
The text was updated successfully, but these errors were encountered: