Use BMI2 pdep/pext instructions #6
Comments
I am aware of the BMI2 instruction set and will try to work this in!
I'm implementing this in 1b57576, but without a fallback to a for-based implementation when the BMI2 instruction set is unsupported. It's also hard to detect this at compile time with MSVC.
You can do something like: to allow users on Windows to pass it a flag. I have zero experience with Windows, but MSVC must offer a way to detect
Anyhow, when BMI2 is not supported, the "for-based fallback implementation" of
Yeah, better to fall back on LUT-based or magicbits-based methods. At compile time, you'd think MSVC would have a flag like GCC's At runtime, you can manually check the CPUID bits, but it obviously kills performance to do that for every libmorton call. So for now, I'm just going to use the
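For what it's worth, the cost of a runtime CPUID check can be paid once instead of on every call. Below is a rough sketch of that idea, not libmorton's actual code: `detect_bmi2` and `has_bmi2` are hypothetical names. It relies on BMI2 being reported in CPUID leaf 7, sub-leaf 0, EBX bit 8, on `__cpuid`/`__cpuidex` from `<intrin.h>` under MSVC, and on `__builtin_cpu_supports` under GCC/Clang.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid, __cpuidex
#endif

// Hypothetical helper: asks the CPU whether BMI2 is available.
inline bool detect_bmi2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);                     // EAX = highest supported leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);                // structured extended features
    return (regs[1] & (1 << 8)) != 0;     // EBX bit 8 = BMI2
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("bmi2") != 0;
#else
    return false;                         // not x86: no BMI2
#endif
}

// Hypothetical helper: runs the detection once and caches the answer,
// so later calls only read a static bool.
inline bool has_bmi2() {
    static const bool cached = detect_bmi2();
    return cached;
}
```

A library could call `has_bmi2()` once at startup to pick a function pointer (pdep/pext path versus LUT or magicbits path), so the hot path never touches CPUID. The indirect call still has some cost, which may be why a compile-time switch is more attractive here.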
This is technically correct, but AFAIK all existing CPUs from both Intel and AMD that support AVX2 also support BMI2, so checking for AVX2 compilation on MSVC might be "good enough".
BMI2 provides parallel bit deposit/extract instructions (pdep/pext) that allow efficient encoding/decoding of Morton codes. Something like the following should do the trick. Basically, in 3D one needs 3 calls to the pdep/pext intrinsics to encode/decode a Morton code, one per coordinate.
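A minimal sketch of the three-calls-per-direction idea, under stated assumptions: the names `pdep64`, `pext64`, `morton3d_encode` and `morton3d_decode` are hypothetical (not libmorton's API), each coordinate is limited to 21 bits, and x is assumed to occupy the lowest bit of every 3-bit group. The intrinsics `_pdep_u64`/`_pext_u64` are only used when the compiler reports BMI2 (`__BMI2__` on GCC/Clang; `__AVX2__` as a stand-in on MSVC, as suggested in the thread); otherwise a plain loop emulates them.

```cpp
#include <cassert>
#include <cstdint>

#if (defined(__x86_64__) || defined(_M_X64)) && \
    (defined(__BMI2__) || (defined(_MSC_VER) && defined(__AVX2__)))
#include <immintrin.h>
#define SKETCH_USE_BMI2 1
#else
#define SKETCH_USE_BMI2 0
#endif

// Deposit the low bits of src into the positions of the set bits of mask.
inline std::uint64_t pdep64(std::uint64_t src, std::uint64_t mask) {
#if SKETCH_USE_BMI2
    return _pdep_u64(src, mask);
#else
    std::uint64_t out = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit
        if (src & bit) out |= lowest;
        mask &= mask - 1;                                 // clear that bit
    }
    return out;
#endif
}

// Gather the bits of src selected by mask into the low bits of the result.
inline std::uint64_t pext64(std::uint64_t src, std::uint64_t mask) {
#if SKETCH_USE_BMI2
    return _pext_u64(src, mask);
#else
    std::uint64_t out = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);
        if (src & lowest) out |= bit;
        mask &= mask - 1;
    }
    return out;
#endif
}

// Every third bit, starting at bit 0, 1 and 2 respectively (21 bits each).
constexpr std::uint64_t kMaskX = 0x1249249249249249ULL;
constexpr std::uint64_t kMaskY = kMaskX << 1;
constexpr std::uint64_t kMaskZ = kMaskX << 2;

// Encode: one pdep per coordinate, then OR the three results together.
inline std::uint64_t morton3d_encode(std::uint32_t x, std::uint32_t y,
                                     std::uint32_t z) {
    assert(x < (1u << 21) && y < (1u << 21) && z < (1u << 21));
    return pdep64(x, kMaskX) | pdep64(y, kMaskY) | pdep64(z, kMaskZ);
}

// Decode: one pext per coordinate.
inline void morton3d_decode(std::uint64_t code, std::uint32_t& x,
                            std::uint32_t& y, std::uint32_t& z) {
    x = static_cast<std::uint32_t>(pext64(code, kMaskX));
    y = static_cast<std::uint32_t>(pext64(code, kMaskY));
    z = static_cast<std::uint32_t>(pext64(code, kMaskZ));
}
```

The loop fallback is only there to keep the sketch buildable everywhere; as noted elsewhere in the thread, a LUT or magicbits method is probably the better fallback in practice.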