fix: remove AVX dependency from SSE x1 implementation#64

Merged: potuz merged 4 commits into main from fix/sse-win64-avx-epilog on Mar 26, 2026

potuz commented Mar 26, 2026

Summary

  • Win64 epilog fix: Replace vmovdqa (AVX) with movdqa (SSE) in the XMM register restore of sha256_sse_x1.S. The prolog already used movdqa; the epilog was inconsistent and would fault on CPUs with SSSE3 but no AVX.
  • CPU detection fix: The SSE x1 fallback in hashtree.c had a duplicate bit_AVX check, making it dead code. Changed to bit_SSSE3, which is the actual minimum requirement for the SSE x1 implementation (pshufb, palignr).
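
To illustrate the CPU detection bullet, here is a minimal sketch of how a repeated AVX test makes an SSE fallback unreachable, and how testing SSSE3 instead fixes it. This is not the code in hashtree.c: the enum, the function names, and the generic fallback are hypothetical, and the feature masks are local stand-ins that use the same CPUID leaf 1 ECX bit positions as bit_SSSE3 (bit 9) and bit_AVX (bit 28). The OS-support checks that real AVX detection also needs are left out.

```c
#include <stdint.h>

/* CPUID leaf 1 ECX feature bits, same positions as bit_SSSE3 / bit_AVX. */
#define FEAT_SSSE3 (UINT32_C(1) << 9)
#define FEAT_AVX   (UINT32_C(1) << 28)

/* Hypothetical labels for illustration; not the names used in hashtree.c. */
typedef enum { IMPL_GENERIC, IMPL_SSE_X1, IMPL_AVX_X1 } impl_t;

/* Before: the second branch repeats the AVX test, so it can never be taken. */
static impl_t pick_impl_buggy(uint32_t ecx)
{
    if (ecx & FEAT_AVX)
        return IMPL_AVX_X1;
    if (ecx & FEAT_AVX)          /* duplicate check: dead code */
        return IMPL_SSE_X1;
    return IMPL_GENERIC;
}

/* After: the fallback tests SSSE3, the minimum needed for pshufb/palignr. */
static impl_t pick_impl_fixed(uint32_t ecx)
{
    if (ecx & FEAT_AVX)
        return IMPL_AVX_X1;
    if (ecx & FEAT_SSSE3)
        return IMPL_SSE_X1;
    return IMPL_GENERIC;
}
```

With only the SSSE3 bit set, the buggy selector in this sketch falls through to the generic path, while the fixed one picks the SSE x1 path.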

🤖 Generated with Claude Code

The Win64 epilog in sha256_sse_x1.S used vmovdqa (AVX) to restore
callee-saved XMM registers while the prolog used movdqa (SSE). This
would fault on CPUs with SSSE3 but no AVX. Replace with movdqa to match.

Also fix the CPU detection in hashtree.c where the SSE fallback had a
duplicate bit_AVX check (dead code), replacing it with bit_SSSE3 which
is what the SSE x1 implementation actually requires (pshufb, palignr).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
james-prysm previously approved these changes Mar 26, 2026
@potuz potuz merged commit d4fa8b9 into main Mar 26, 2026
@potuz potuz deleted the fix/sse-win64-avx-epilog branch March 26, 2026 19:11

2 participants