Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable sha1 and sha2 AArch64 extensions from asm-hashes #97

Merged
merged 3 commits into from
Jan 6, 2020

Conversation

linkmauve
Copy link
Contributor

@linkmauve linkmauve commented Jan 4, 2020

This follows RustCrypto/asm-hashes#10

getauxval() is extremely cheap, as it is just a pointer to the beginning of the stack, so it doesn’t make much sense to use a lazy_static!() to remember the flags (at least I can’t see any difference while profiling).

Sorry about #96, I force-pushed from the wrong repository and can’t reopen the PR due to that…

Here are some benchmarks on my Nintendo Switch:

Without this patch (or without --features=asm):

test sha1_10      ... bench:          66 ns/iter (+/- 0) = 151 MB/s
test sha1_100     ... bench:         497 ns/iter (+/- 1) = 201 MB/s
test sha1_1000    ... bench:       4,799 ns/iter (+/- 16) = 208 MB/s
test sha1_10000   ... bench:      47,646 ns/iter (+/- 134) = 209 MB/s
test sha256_10    ... bench:          98 ns/iter (+/- 0) = 102 MB/s
test sha256_100   ... bench:         815 ns/iter (+/- 4) = 122 MB/s
test sha256_1000  ... bench:       7,980 ns/iter (+/- 76) = 125 MB/s
test sha256_10000 ... bench:      79,373 ns/iter (+/- 793) = 125 MB/s

With this patch:

test sha1_10      ... bench:          32 ns/iter (+/- 1) = 312 MB/s
test sha1_100     ... bench:         165 ns/iter (+/- 1) = 606 MB/s
test sha1_1000    ... bench:       1,480 ns/iter (+/- 5) = 675 MB/s
test sha1_10000   ... bench:      14,246 ns/iter (+/- 26) = 701 MB/s
test sha256_10    ... bench:          30 ns/iter (+/- 0) = 333 MB/s
test sha256_100   ... bench:         167 ns/iter (+/- 0) = 598 MB/s
test sha256_1000  ... bench:       1,554 ns/iter (+/- 7) = 643 MB/s
test sha256_10000 ... bench:      15,231 ns/iter (+/- 88) = 656 MB/s

@newpavlov
Copy link
Member

Thank you for your PRs!

IIUC your assembly code uses instructions from the crypto extension. I would be interested to see it implemented using std intrinsics instead of assembly and compare the resulting performance. It will require usage of a nigthly compiler, so having assembly may be a fine temporary solution, but in future I think we will replace your assembly with a code which does not use the extension instructions.

One thing about which I am not sure is usage of getauxval. It's a Linux-specific function, so at the very least you should gate the code appropriately. Ideally we should use AArch64 analogue of is_x86_feature_detected!, but since we target no_std and ARM, it's not so easy... We will have to wait for rust-lang/rfcs#2725 to do it properly.

One alternative could be to gate your asm on target_feature = "crypto". Users will have to explicitly enable this feature using compiler flags (or by using target-cpu=native) to get the HW acceleration. It would simplify code, but will make it harder to discover this option and will remove the runtime detection, which is not ideal...

@linkmauve
Copy link
Contributor Author

linkmauve commented Jan 5, 2020

I originally wanted to use intrinsics, but I’d like the xmpp-rs project to work on stable too (and be fast at that). I’m fine with doing that work again using intrinsics once a path towards stabilisation of AArch64 intrinsics has been decided. Also note that intrinsics aren’t inherently safer than assembly. What do you mean by “code which does not use the extension instructions”? Something which is approximately as slow as the Rust version?

is_aarch64_feature_detected!() is also unstable (see rust-lang/rust#27731), so I guess I will expose it only on Linux for now, with a TODO to replace it with the macro once the stdsimd feature leaves nightly.

Yeah no, I’d rather have this enabled the same way as on x86, to simplify its usage. We can migrate to the target_feature once both of the above issues have been stabilised.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's go with your approach after adding the appropriate gates and replace it sometime in the future after stabilization of the AArch64 intrinsics.

What do you mean by “code which does not use the extension instructions”? Something which is approximately as slow as the Rust version?

Roughly, yes. IIRC even the current not-so-optimal assembly is a bit faster than the pure Rust version, so I think using OpenSSL assembly will result in a performance boost even when the crypto extension is not available. See RustCrypto/asm-hashes#5 for more details.

sha2/src/aarch64.rs Outdated Show resolved Hide resolved
sha2/src/aarch64.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,11 @@
// From sys/auxv.h
const AT_HWCAP: u64 = 16;
const HWCAP_SHA2: u64 = 64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW is it possible that processor supports SHA-2 instructions, but not SHA-1 and/or AES? I thought they are part of the same crypto extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “crypto extension” is an ARM marketing term for a Cortex-A core bundling these three extensions, and possibly the sha512, sha3, and more in the future.

I’m not aware of any actual CPU exposing e.g. the sha2 flag but not sha1, but given they are different flags I’d rather test them properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using getauxval it does not matter, but during migration to intrinsics we will have to use a more coarse grained target features, which only know about the crypto extension. But I guess since LLVM does not introduce separate finer-grained features it should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work for e.g. the sha3 flag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't quite get how LLVM handles features, e.g. in here they write:

Extensions can now have different meanings based on the base architecture they apply to. For example on AArch64, 'crypto' means different things for v8.{1,2,3}-a than v8.4-a. The former adds 'sha2' and 'aes', the latter adds those and 'sm4' and 'sha3' on top.

I wonder why they use crypto instead of finer-grained sha1, sha2, sha512, sha3 and aes features...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They apparently noticed they made a mistake and now have aliased crypto to sha2 + aes, see llvm-mirror/llvm@6634844#diff-5ff776706657bdcfc45aa7d7159bfeadR45

We should be able to use each feature separately nowadays.

@linkmauve linkmauve force-pushed the aarch64 branch 3 times, most recently from 8737a0f to 1bcb1af Compare January 5, 2020 23:12
@linkmauve
Copy link
Contributor Author

I have also added an ARM worker to the CI, it only fails because there has been no sha1-asm and sha2-asm release containing my code yet (so it wouldn’t work anyway even if you merged it).

@linkmauve linkmauve force-pushed the aarch64 branch 2 times, most recently from 227cc2c to 078c1b4 Compare January 6, 2020 00:55
linkmauve added a commit to linkmauve/asm-hashes that referenced this pull request Jan 6, 2020
This only adds AArch64 support, everything else should be identical, so
we can make a patch release instead of a minor one.

RustCrypto/hashes#97 depends on that release.
@linkmauve
Copy link
Contributor Author

Once RustCrypto/asm-hashes#14 is merged the CI should be green again, after which I think this PR is good to merge. :)

The feature gate is specific to glibc on Linux, using getauxval() to
check the CPU flags, so it’s currently not used on other systems.
Patches welcome. :)

Enables RustCrypto/asm-hashes#10
The feature gate is specific to glibc on Linux, using getauxval() to
check the CPU flags, so it’s currently not used on other systems.
Patches welcome. :)

The sha2 extension only provides SHA-256 instructions, so keep using the
plain Rust implementation for SHA-512 until someone adds sha512
extension assembly (I have no computer to test that with).
@newpavlov newpavlov merged commit 9edb0a5 into RustCrypto:master Jan 6, 2020
@newpavlov
Copy link
Member

FYI: crate updates are published now!

@linkmauve linkmauve deleted the aarch64 branch January 6, 2020 10:06
linkmauve added a commit to linkmauve/rust-websocket that referenced this pull request Jan 6, 2020
This other crate is being maintained, it offers slightly better
performances (when using the `asm` feature, especially [on
AArch64](RustCrypto/hashes#97)).  It also allows
deduplicating SHA-1 crates in cargo-web.
linkmauve added a commit to linkmauve/rust-websocket that referenced this pull request Jan 6, 2020
This other crate is being maintained, it offers better performances
(when using the `asm` feature, especially [on
AArch64](RustCrypto/hashes#97)).  It also allows
deduplicating SHA-1 crates in cargo-web.
linkmauve added a commit to linkmauve/rust-websocket that referenced this pull request Jan 6, 2020
This other crate is being maintained, it offers better performances
(when using the `asm` feature, especially [on
AArch64](RustCrypto/hashes#97)).  It also allows
deduplicating SHA-1 crates in cargo-web.
linkmauve added a commit to linkmauve/cargo-web that referenced this pull request Jan 6, 2020
This other crate is being maintained, and it offers better performances
when using the `asm` feature (especially [on
AArch64](RustCrypto/hashes#97)).

Once websockets-rs/rust-websocket#251 is merged,
it will also allow removing this extra crate from the build.
JohnTitor pushed a commit to actix/actix-web that referenced this pull request Jan 10, 2020
* Replace sha1 dependency with sha-1

This other crate is being maintained, and it offers better performances
when using the `asm` feature (especially [on
AArch64](RustCrypto/hashes#97)).

* Update CHANGES.md with the sha-1 migration

* Add a test for hash_key()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants