Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardware supported popcount breaks on Windows with CPUs that don't support SSE4 #33

Closed
montekarlos opened this Issue Dec 18, 2018 · 11 comments

Comments

Projects
None yet
4 participants
@montekarlos
Copy link

montekarlos commented Dec 18, 2018

When using boost v1.69 dynamic_bitset::count() will crash on CPUs without SSE4 (required for popcount) when compiled using MSVC v15.9.4 and run on Windows 7. The was compiled as Release x64

Attached is example project to recreate: PopcountCrash.zip

I've had a quick look at the PR, issue and linked mailing list discussion

It appears there was some concern about handling hardware support for MSVC but I can't see anything in the PR or commit that would handle CPUs without SSE4 (i.e __cpuid check)

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Dec 18, 2018

cc: @Izaron

@jeking3 jeking3 added this to the v1.70.0 milestone Dec 18, 2018

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Jan 5, 2019

cc: @Izaron

@glenfe

This comment has been minimized.

Copy link
Member

glenfe commented Feb 20, 2019

What is the plan if the user being CC'd doesn't respond by 1.70 deadlines? Is the change that was committed going to be reverted for 1.70?

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Feb 20, 2019

That's the easiest thing to do, unless someone else steps up to resolve it.

@pdimov

This comment has been minimized.

Copy link

pdimov commented Feb 21, 2019

James, since you committed the PR, it's your responsibility to deal with the regression/crash, one way or another. (The gcc __builtin_popcount path is fine, but the msvc-specific section isn't, so there are options, but something must be done.)

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Feb 21, 2019

Okay, thanks.

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Feb 21, 2019

@pdimov what would your preference be to resolve this:

  1. Disable the MSVC code (easiest).
  2. Add a definition such as BOOST_DYNAMIC_BITSET_HAVE_SSE4 that folks must define in order to enable the code (also easy, requires documentation). Of course, this means they know their code will always run on a modern CPU so it's their choice.
  3. Add static cpuid inspection (ideally C++11 only, so it is threadsafe) and fall back when not present (more difficult, may not have time to do...).

Right now I would opt for one of the first two, which makes the library no worse than it was in 1.68.0 for MSVC. It's unfortunate the original author @Izaron hasn't responded or taken action. For now we could do (1) and then open an issue for (3).

@glenfe

This comment has been minimized.

Copy link
Member

glenfe commented Feb 21, 2019

I suggest #2 and document it accordingly: i.e. The opt-in macro would only affect MSVC; for GCC and Clang using __builtin_popcount unconditionally is fine. (#1 is also acceptable if you don't want to do #2).

@pdimov

This comment has been minimized.

Copy link

pdimov commented Feb 21, 2019

(1) is fine if we don't want to invest too much time into it.

On the surface silently doing the right thing, as in (3), is better than (2), but I'm not sure if the two tests - on the static guard variable and on the actual static variable - would not defeat the gains from using the popcnt instruction.

There is also the option of using an alternate popcount algorithm - the one used by gcc/clang for implementing the builtin when SSE4 isn't available, https://godbolt.org/z/VGP6Z_, https://en.wikipedia.org/wiki/Hamming_weight. This will need to be benchmarked if we choose to use it (on MSVC or in general).

@jeking3

This comment has been minimized.

Copy link
Collaborator

jeking3 commented Feb 21, 2019

I was looking at https://github.com/kimwalisch/libpopcnt/ as there is an implementation in there as well, but I haven't done anything else but look. Benchmarks exist there for the decisions that were made to use different algorithms depending on the size of the bitset.

@pdimov

This comment has been minimized.

Copy link

pdimov commented Feb 21, 2019

FWIW, std::bitset on MSVC uses the same 8 bit table-based implementation as dynamic_bitset: https://godbolt.org/z/YuQqIj

This probably means that it was judged optimal there in the absence of hardware popcnt.

@jeking3 jeking3 closed this in #35 Feb 22, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.