Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use global_asm to include the sse/avx impls #272

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

roblabla
Copy link

This PR replaces uses Rust's global_asm to include the sse2/sse41/avx2/avx512 assembly functions used by blake3. This is useful to reduce the toolchain requirements of this crates' user, which is especially nice for cross-compilation purposes. We no longer require a working C compiler/assembler to get the fast asm version of blake3 on x86_64!

You'll notice this MR commits a certain amount of code crimes to get it working:

  1. It will always take the windows_gnu.S version of the code, as it doesn't require running the preprocessor on it. However, this means it always uses the win64 ABI, instead of the C ABI. As such, the FFI definitions have to be changed to show this (e.g. the extern "win64"). While this works, it feels terrible.
  2. The asm is modified by the build.rs to fixup the assembly so it can be included in global_asm without issues. In particular, the initial .intel_syntax is removed (that's the default in global_asm), and the {s and }s are doubled to escape them (as a {} is used for formatting).
  3. When compiling for apple, we replace .section .text with .text and .section .rodata with .static_data, to make it compile.

While I think most of this is fine, fixing the ABI so we can keep using extern "C" ABIs would be a bit nicer.

@oconnor663
Copy link
Member

oconnor663 commented Nov 26, 2022

Very impressive code crimes :)

I definitely want to do something like this at some point, using global_asm to free the Rust crate from its C compiler dependency. My current thinking is that I'll combine this with a subtantial refactoring of the internal API, to get the same optimizations working on the XOF side like they work on the input side. I'm actually playing with some of this this weekend, on my Thanksgiving flights. For that reason, I haven't been worrying too much about how we build the current implementations.

@mat-gas
Copy link

mat-gas commented Dec 5, 2022

Hi, any update on this? Were more horrible code crimes committed during your tests?

@roblabla
Copy link
Author

roblabla commented Jan 9, 2023

So I took the time to finally fix the build of this branch. This code crime now passes Criminial Investigation (CI), which I believe makes this a Perfect Crime. Agatha Christie would be proud.

The build failures were due to global_asm exporting all the labels - including the data ones. This would cause issue as, for instance, all of sse2/sse41/avx2/avx512 variants of the assembly would define the same BLAKE3_IV_0 label. So the linker would freak out, saying there were duplicate symbols.

Unfortunately, I don't think it's possible to make a local label with global_asm yet (I looked at the list of directives but I couldn't find anything that looked like what I needed). So I just went with the simple fix of giving all of them a unique name (prefixing the kind of acceleration we're using to the name). This was (surprisingly) enough to get CI to turn green.

Also, something that came to mind while working on this. I believe that with global_asm, there will be symbol conflicts when linking multiple version of the crate. To avoid this, either we'd need to embed some kind of version identifier in the symbol name, or we'd need to rely on inline_asm instead of global_asm, but that's a huge can of worms as well.

@poliorcetics
Copy link

Versioning and writing a unit test that fails (and fix the files) when the version is changed and doesn't match the symbols version (via simple search-and-replace in the assembly) file is probably enough to prevent this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants