Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 64-bit fixslice AES implementation #180

Merged
merged 1 commit into from
Oct 27, 2020

Conversation

peterdettman
Copy link
Contributor

  • protoype: replaces 32-bit in lib.rs, impls.rs

Not familiar enough with rust to attempt configuration settings.

@tarcieri
Copy link
Member

tarcieri commented Oct 27, 2020

@peterdettman it'd be good to preserve both the 32-bit and 64-bit backends, but select which one to use based on the host CPU. You can do something like:

 #[cfg(target_pointer_width = "32")] 
mod fixslice32;
 #[cfg(target_pointer_width = "64")] 
mod fixslice64;

If you're having trouble, the cfg-if crate can help:

https://github.com/alexcrichton/cfg-if

@tarcieri
Copy link
Member

Benchmarks look very nice:

test aes128_encrypt8 ... bench:         626 ns/iter (+/- 81) = 204 MB/s
test aes192_encrypt8 ... bench:         742 ns/iter (+/- 76) = 172 MB/s
test aes256_encrypt8 ... bench:         851 ns/iter (+/- 162) = 150 MB/s

@peterdettman
Copy link
Contributor Author

@tarcieri Yeah, I certainly want both to coexist, but I'd really prefer to leave the cfg stuff to you. It's not just cfg either; the 64-bit version does 4 blocks at a time (instead of 2) and somehow the encrypt_block(s) methods in impl.rs will need to know how many blocks the inner implementation is handling.

@tarcieri
Copy link
Member

tarcieri commented Oct 27, 2020

@peterdettman okay, if you'd like I can try to handle the gating.

How about renaming the existing implementation to fixslice32.rs and then removing the mod fixslice32 directive for it, and I can do an additional pass to wire it back up again?

It's not just cfg either; the 64-bit version does 4 blocks at a time (instead of 2) and somehow the encrypt_block(s) methods in impl.rs will need to know how many blocks the inner implementation is handling.

Can you make them all 8 for now and add a TODO to change ParBlocks in the next release? It's a breaking change.

I think there's also some discussion to be had there. Perhaps we should try to align all of the implementations exposed via the aes crate to be 4-blocks (which is also the ideal number for AES-NI on modern Intel archs).

Edit: opened an issue about changing ParBlocks in the next breaking release: #181

@peterdettman
Copy link
Contributor Author

OK, I've renamed fixslice.rs to fixslice32.rs and lib.rs now switches b/w them on cfg(target_pointer_width = "64"). The fixslice implementations now have a FIXSLICE_BLOCKS constant that impls.rs uses to break up the 8 blocks into 2 or 4 chunks respectively.

@tarcieri tarcieri requested review from tarcieri and newpavlov and removed request for tarcieri October 27, 2020 16:21
@@ -24,6 +24,7 @@ macro_rules! define_aes_impl {
$key_size:ty,
$rounds:expr,
$rounds2:ty,
$fixslice_blocks:expr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the same for all impls it doesn't need to be a macro parameter

@@ -70,7 +71,7 @@ macro_rules! define_aes_impl {

#[inline]
fn encrypt_block(&self, block: &mut Block) {
let mut blocks = [Block::default(); 2];
let mut blocks = [Block::default(); $fixslice_blocks];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the FIXSLICE_BLOCKS constant directly here:

Suggested change
let mut blocks = [Block::default(); $fixslice_blocks];
let mut blocks = [Block::default(); FIXSLICE_BLOCKS];

@@ -85,7 +86,7 @@ macro_rules! define_aes_impl {

#[inline]
fn encrypt_blocks(&self, blocks: &mut ParBlocks) {
for chunk in blocks.chunks_mut(2) {
for chunk in blocks.chunks_mut($fixslice_blocks) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...also here:

Suggested change
for chunk in blocks.chunks_mut($fixslice_blocks) {
for chunk in blocks.chunks_mut(FIXSLICE_BLOCKS) {

@tarcieri
Copy link
Member

tarcieri commented Oct 27, 2020

@peterdettman looks great aside from one nit!

After landing this, I can expand the CI config to run the tests on 32-bit architectures as well as add some cross-based testing on ARM. That way we can ensure CI tests both the 32-bit and 64-bit backends.

- protoype: replaces 32-bit in lib.rs, impls.rs
@peterdettman
Copy link
Contributor Author

Fixed nits and rebased.

@peterdettman
Copy link
Contributor Author

@tarcieri I’d be interested in seeing some arm (32/64) benchmarks for these fixslice implementations.

@tarcieri
Copy link
Member

@peterdettman I'll be looking into benchmarking it on Cortex-A and Cortex-M (32-bit).

Somewhere around here I have some aarch64 devices (RasPi) I could benchmark it on too.

@tarcieri tarcieri merged commit 316fcd6 into RustCrypto:master Oct 27, 2020
@tarcieri tarcieri mentioned this pull request Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants