Feature/excess v3 #47
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   87.29%   87.36%   +0.07%
==========================================
  Files          12       13       +1
  Lines        3117     3214      +97
  Branches      612      602      -10
==========================================
+ Hits         2721     2808      +87
- Misses        260      267       +7
- Partials      136      139       +3
@codex please review
Codex Review: Didn't find any major issues. Chef's kiss.
08f5300 to 32355a2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32355a2b0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
total_match = _mm512_maskz_mov_epi8(active, total_match);

const __m512i res = _mm512_maddubs_epi16(total_match, vmult);
const __m256i packed = _mm512_cvtepi16_epi8(res);
Pack LUT bytes with unsigned narrowing
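res encodes two 4-bit position masks per byte via maddubs, so valid byte values frequently exceed 127 (e.g. when the high nibble contains bit 0x8, yielding values >= 128). The narrowing step therefore has to preserve the full 0..255 byte range. _mm512_cvtepi16_epi8 (vpmovwb) truncates each 16-bit lane to its low byte, so it keeps those values intact as long as every lane of res stays within 0..255. The signed-saturating variant _mm512_cvtsepi16_epi8 (vpmovswb) is the one that would clamp such bytes to 127 and clear high-nibble match bits, producing incorrect out masks for reachable targets on AVX-512 builds. If a saturating narrow is preferred here, use the unsigned form (_mm512_cvtusepi16_epi8), which also preserves 0..255.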
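The narrowing behaviour discussed in this comment can be checked without AVX-512 hardware using a scalar model of a single 16-bit lane. The sketch below is only an illustration: the helper names are hypothetical, and it assumes vmult holds (1, 16) multiplier pairs and that each byte of total_match is a 4-bit mask, neither of which is visible in the diff. Per Intel's intrinsics documentation, _mm512_cvtepi16_epi8 (vpmovwb) truncates, _mm512_cvtsepi16_epi8 (vpmovswb) saturates as signed, and _mm512_cvtusepi16_epi8 (vpmovuswb) saturates as unsigned.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of _mm512_maddubs_epi16:
 * unsigned bytes (a0, a1) times signed bytes (b0, b1), summed with
 * signed 16-bit saturation. The (1, 16) multipliers used with it in the
 * usage note are an assumption, not taken from the PR. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1) {
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Models vpmovwb (_mm512_cvtepi16_epi8): keep the low 8 bits of the lane. */
static uint8_t narrow_truncate(int16_t w) {
    return (uint8_t)((uint16_t)w & 0xFFu);
}

/* Models vpmovswb (_mm512_cvtsepi16_epi8): clamp the lane to [-128, 127]. */
static uint8_t narrow_signed_sat(int16_t w) {
    if (w > INT8_MAX) w = INT8_MAX;
    if (w < INT8_MIN) w = INT8_MIN;
    return (uint8_t)(int8_t)w;
}

/* Models vpmovuswb (_mm512_cvtusepi16_epi8): treat the lane as unsigned
 * and clamp it to 255. */
static uint8_t narrow_unsigned_sat(int16_t w) {
    uint16_t u = (uint16_t)w;
    return (uint8_t)(u > 0xFFu ? 0xFFu : u);
}
```

Under these assumptions, a lane built from low nibble 0x3 and high nibble 0x9 holds 0x93. Truncation and unsigned saturation both return 0x93, while signed saturation returns 0x7F. Truncation and unsigned saturation only diverge for lanes above 255 (0x100 becomes 0x00 versus 0xFF), which the assumed nibble inputs never produce.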
Optimized LUT and expand versions of excess algorithm