Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simd leading zero count #2

Closed
stevenhoving opened this issue Jun 2, 2020 · 2 comments
Closed

Simd leading zero count #2

stevenhoving opened this issue Jun 2, 2020 · 2 comments

Comments

@stevenhoving
Copy link

stevenhoving commented Jun 2, 2020

Some time ago I came across an old book of mine (hackers delight). And for some reason I opened the part about zero counting. Showing the extremely smart tricks to a friend. A week ago I did read your article as it was posted on reddit.com/r/cpp. And this morning I saw your comment on the microsoft stl issue (microsoft/STL#870) related to your article. Long story short I did a 1+1 a figured that I could at the very least give you an working simd sample of one of the branchless hackers delight samples.
I did not benchmark this, and I am kinda throwing it over the fence but I guess you would appreciate it anyway.

int count_zeros_leading(unsigned int v)
{
    unsigned a = v & ~(v >> 1);

    auto b = (float)a + 0.5f;
    auto  c = *(int*)&b;
    auto d = c >> 23;
    return 158 - d;
}

__m128i count_zeros_simd_leading(__m128i v)
{
    auto a = _mm_srli_epi32(v, 1);
    auto b = _mm_sub_epi32(_mm_set1_epi32(0xFFFFFFFF), a);
    auto c = _mm_and_si128(v, b);
    auto d = _mm_add_ps(_mm_cvtepi32_ps(c), _mm_set1_ps(0.5));
    auto e = _mm_castps_si128(d);

    auto f = _mm_srli_epi32(e, 23);
    return _mm_sub_epi32(_mm_set1_epi32(158), f);
}

https://godbolt.org/z/Snm4Ld

Update:
This works, but its slow. I think best is to do _mm_cmpeq_epi8 + _mm_movemask_epi8 and count the zeros.

Like this:

inline std::uint64_t get_digit_count_from_numeric_mask(__m128i v)
{
    const auto mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
    const auto count = __lzcnt16(mask);
    return 16 - count;
}
@stevenhoving stevenhoving changed the title Simd leading zero counting Simd leading zero count Jun 2, 2020
@KholdStare
Copy link
Owner

Hey! Thanks for reading and suggesting. The movemask is super useful here, thanks!
I made a commit using movemask, and it's shrunk the runtime of the general benchmark from 2.5ns to 1.9ns! Nice. Now need a fix for that pesky left shift.

I had to use tzcnt instead of lzcnt because a string can contain other digits after the number we care about parsing, so lzcnt could count too many digits. Hence we need to look at trailing bits. E.g. consider parsing from a string like "1, 2, 3, 4, 5, 6". Also, the mask returned earlier from cmplt already has the most significant bits set in each byte, so no need to _mm_cmpeq_epi8(v, _mm_setzero_si128())

Perhaps I should add a CONTRIBUTORS file and add you there.

@stevenhoving
Copy link
Author

I would appreciate that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants