aes: soft hazmat backend #268

Merged
merged 1 commit into from May 31, 2021

Conversation

tarcieri
Member

@tarcieri tarcieri commented May 30, 2021

The hazmat API provides access to the raw AES cipher round, equivalent inverse cipher round, mix columns, and inverse mix column operations.

This PR wires up support for these operations in the "soft" backend (or more specifically, both the 32-bit and 64-bit fixsliced backends).

It would benefit from a parallel API instead of the single-block one currently provided; however, that's left for future work.

@tarcieri
Member Author

tarcieri commented May 30, 2021

As an initial attempt I tried to implement the hazmat::cipher_round function on top of the 64-bit fixsliced backend.

It's not quite working but it seems close. I'm not quite sure what I'm missing:

---- cipher_round_fips197_vectors stdout ----
thread 'cipher_round_fips197_vectors' panicked at 'assertion failed: `(left == right)`
  left: `[137, 221, 28, 235, 133, 83, 202, 111, 45, 21, 79, 211, 203, 19, 139, 235]`,
 right: `[137, 216, 16, 232, 133, 90, 206, 104, 45, 24, 67, 216, 203, 18, 143, 228]`', aes/tests/hazmat.rs:82:9

(left is actual, right is expected from the test vector)

The deltas in bits look like this (broken down by AES word):

0b0, 0b101, 0b1100, 0b11,
0b0, 0b1001, 0b100, 0b111,
0b0, 0b1101, 0b1100, 0b1011,
0b0, 0b1, 0b100, 0b1111
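
For reference, the deltas above are the byte-wise XOR of the actual and expected outputs. Here is a minimal sketch that reproduces them from the two arrays in the panic message. The helper names `xor_delta` and `format_delta` are made up for illustration and are not part of the `aes` crate.

```rust
/// Byte-wise XOR of two 16-byte blocks. A set bit marks a bit position
/// where the actual output differs from the expected one.
/// (`xor_delta` is a hypothetical helper, not part of the `aes` crate.)
fn xor_delta(actual: &[u8; 16], expected: &[u8; 16]) -> [u8; 16] {
    let mut delta = [0u8; 16];
    for i in 0..16 {
        delta[i] = actual[i] ^ expected[i];
    }
    delta
}

/// Format the deltas in binary, four bytes per line (one line per AES word).
/// (`format_delta` is likewise a hypothetical helper.)
fn format_delta(delta: &[u8; 16]) -> String {
    delta
        .chunks(4)
        .map(|word| {
            word.iter()
                .map(|b| format!("{:#b}", b))
                .collect::<Vec<_>>()
                .join(", ")
        })
        .collect::<Vec<_>>()
        .join(",\n")
}
```

Calling `format_delta(&xor_delta(&left, &right))` on the two arrays from the failure yields the table above. Note that the zero deltas all sit at byte 0 of each word, i.e. row 0 of the state, which is the one row that ShiftRows leaves in place.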

Unfortunately I can't directly compare to FIPS 197 step-by-step due to the key schedule being bitsliced and reordered.

@tarcieri
Member Author

tarcieri commented May 30, 2021

@peterdettman I don't suppose you have any insights here?

For context the intended use case here is Deoxys, or any other construction built on the raw AES round function.

@peterdettman
Contributor

peterdettman commented May 31, 2021

@tarcieri Maybe I can take a closer look tomorrow, but if it's using the existing key schedule I guess the sub_bytes_nots is the problem (remove it). Although a more natural way to write the round function would be sub_bytes/shift_rows_1/mix_columns_0/add_round_key (this will also be the fastest).

Edit: Oh I see it's a supplied round key. Then keep sub_bytes_nots and just use the method order above I think. (the way you have it currently, the round key hasn't been prepared with an inv_shift_rows_1 call).

@tarcieri
Member Author

tarcieri commented May 31, 2021

> if it's using the existing key schedule I guess the sub_bytes_nots is the problem (remove it).

It's not. The goal is to support an AES-NI-like API which can work with the standard FIPS 197-style key schedule (edit: or, more specifically, in the immediate intended use case, Deoxys's key schedule). We have backends working on AES-NI and the ARMv8 Cryptography Extensions. That said...

> Although a more natural way to write the round function would be sub_bytes/shift_rows_1/mix_columns_0/add_round_key (this will also be the fastest).

I swear I tried this before, but if I do that with the inclusion of sub_bytes_nots, i.e.

  • sub_bytes
  • sub_bytes_nots
  • shift_rows_1
  • mix_columns_0
  • add_round_key

...it works! 🎉
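
For readers who want something to compare against, here is a byte-oriented sketch of the same round order (SubBytes, ShiftRows, MixColumns, AddRoundKey) as defined in FIPS 197. It is not the fixsliced code from this PR: the names `gf_mul`, `sbox`, and `cipher_round_ref` are made up for illustration, the S-box is computed from its definition rather than taken from a table, and the bitslicing-specific `sub_bytes_nots` step has no separate counterpart here. It is meant as a slow reference for checking test vectors, not as production code.

```rust
/// Multiply two elements of GF(2^8) modulo the AES polynomial
/// x^8 + x^4 + x^3 + x + 1. (Hypothetical helper for this sketch.)
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut p = 0u8;
    for _ in 0..8 {
        p ^= a & 0u8.wrapping_sub(b & 1); // add `a` if the low bit of `b` is set
        let carry = 0u8.wrapping_sub(a >> 7); // 0xff if the top bit of `a` is set
        a = (a << 1) ^ (0x1b & carry); // multiply `a` by x and reduce
        b >>= 1;
    }
    p
}

/// AES S-box computed from its definition: field inverse, then affine map.
fn sbox(x: u8) -> u8 {
    // x^254 is the multiplicative inverse in GF(2^8), and it maps 0 to 0.
    let mut inv = 1u8;
    for _ in 0..254 {
        inv = gf_mul(inv, x);
    }
    inv ^ inv.rotate_left(1)
        ^ inv.rotate_left(2)
        ^ inv.rotate_left(3)
        ^ inv.rotate_left(4)
        ^ 0x63
}

/// One full AES cipher round on a column-major 16-byte state
/// (byte index = row + 4 * column, as in FIPS 197).
fn cipher_round_ref(state: &mut [u8; 16], round_key: &[u8; 16]) {
    // SubBytes
    for b in state.iter_mut() {
        *b = sbox(*b);
    }
    // ShiftRows: row r is rotated left by r columns.
    let old = *state;
    for c in 0..4 {
        for r in 0..4 {
            state[r + 4 * c] = old[r + 4 * ((c + r) % 4)];
        }
    }
    // MixColumns: each output byte is 2*a[r] + 3*a[r+1] + a[r+2] + a[r+3].
    for c in 0..4 {
        let a = [state[4 * c], state[4 * c + 1], state[4 * c + 2], state[4 * c + 3]];
        for r in 0..4 {
            state[4 * c + r] = gf_mul(a[r], 2)
                ^ gf_mul(a[(r + 1) % 4], 3)
                ^ a[(r + 2) % 4]
                ^ a[(r + 3) % 4];
        }
    }
    // AddRoundKey
    for (s, k) in state.iter_mut().zip(round_key.iter()) {
        *s ^= *k;
    }
}
```

Because SubBytes acts on each byte independently, it commutes with ShiftRows, so this order gives the same result as an implementation that shifts rows first.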

@tarcieri tarcieri force-pushed the aes/soft-hazmat-backend branch 2 times, most recently from 5cfe35f to f831fbe Compare May 31, 2021 16:42
@tarcieri
Member Author

Update: I now have all 4 operations (cipher, equiv inverse cipher, mix columns, and inv mix columns) working on the 64-bit backend.

Gonna do the 32-bit one.

Something else we should definitely consider, especially for performance, is a ParBlocks-based API which accepts an array of round keys. I assume that's useful in Deoxys @zer0x64?

@zer0x64

zer0x64 commented May 31, 2021

You mean an API that can do multiple rounds instead of a single one? I think that would help the compiler a lot with auto-vectorization, especially if it's able to unroll the loop and reuse the SIMD registers instead of reloading/saving them at each iteration.

Another thing we might need to consider is dynamic AES-NI detection, although I'm not sure the best way to do it.

@tarcieri
Member Author

> You mean an API that can do multiple rounds instead of a single one? I think that would help the compiler a lot with auto-vectorization, especially if it's able to unroll the loop and reuse the SIMD registers instead of reloading/saving them at each iteration.

Yep. Each invocation of the soft backend is actually computing 4 blocks in parallel on 64-bit archs (2 blocks on 32-bit ones), so it's pretty wasteful to shoehorn a single-block API on top of it.

I can take a crack at adding a parallel API after I get an initial PoC working.

> Another thing we might need to consider is dynamic AES-NI detection, although I'm not sure the best way to do it.

It's already implemented, and works portably across x86(-64) and ARMv8:

https://github.com/RustCrypto/block-ciphers/blob/master/aes/src/hazmat.rs#L47-L55
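
To illustrate the general idea of runtime detection with a fallback, here is a rough sketch using only the standard library's feature-detection macros. It is not the code at the link above: `has_hw_aes` and `select_backend` are hypothetical names, and the string return value merely stands in for dispatching to a real backend.

```rust
/// Runtime check for hardware AES on x86 / x86-64.
/// (`has_hw_aes` is a hypothetical helper, not the crate's own detection code.)
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_hw_aes() -> bool {
    std::is_x86_feature_detected!("aes")
}

/// Runtime check for the ARMv8 AES instructions on AArch64.
#[cfg(target_arch = "aarch64")]
fn has_hw_aes() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}

/// Every other target always falls back to the portable implementation.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn has_hw_aes() -> bool {
    false
}

/// Pick a backend at runtime. The returned label stands in for calling
/// into the intrinsics-based or the "soft" (fixsliced) implementation.
fn select_backend() -> &'static str {
    if has_hw_aes() {
        "hardware"
    } else {
        "soft"
    }
}
```

A real implementation would typically cache the detection result rather than re-querying it on every call.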

The `hazmat` API provides access to the raw AES cipher round, equivalent
inverse cipher round, mix columns, and inverse mix column operations.

This commit wires up support in the "soft" backend (or more
specifically, both the 32-bit and 64-bit fixsliced backends).

It would benefit from a parallel API instead of what's currently
provided, however that's left for future work.
@tarcieri tarcieri changed the title [WIP] aes: soft hazmat backend aes: soft hazmat backend May 31, 2021
@tarcieri tarcieri marked this pull request as ready for review May 31, 2021 17:06
@tarcieri tarcieri merged commit 758169d into master May 31, 2021
@tarcieri tarcieri deleted the aes/soft-hazmat-backend branch May 31, 2021 17:06
@zer0x64

zer0x64 commented May 31, 2021

Not sure how using 4 parallel blocks would be useful for Deoxys, as each block uses a different set of round keys (the block number is used in the key schedule).

@tarcieri
Member Author

In a prospective API for this, you'd pass in an array of round keys and an array of blocks, and the parallel API could apply a particular round key to a particular block.
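
A rough sketch of what such a signature could look like, assuming a batch of 4 blocks as on the 64-bit backend. Everything here is hypothetical: `Block`, `cipher_round_par`, and the `keyless_round` parameter (which stands in for the backend's SubBytes/ShiftRows/MixColumns pass over the whole batch) are made-up names, and the follow-up PR may well look different.

```rust
/// A single 16-byte AES block or round key. (Hypothetical alias.)
type Block = [u8; 16];

/// Hypothetical parallel round: runs the keyless part of the round over the
/// whole batch, then XORs round key `i` into block `i`, so every block can
/// use its own round key.
fn cipher_round_par<F>(blocks: &mut [Block; 4], round_keys: &[Block; 4], keyless_round: F)
where
    F: Fn(&mut [Block; 4]),
{
    // Stand-in for the backend's SubBytes / ShiftRows / MixColumns on 4 blocks.
    keyless_round(blocks);

    // AddRoundKey, pairing each block with its own round key.
    for (block, key) in blocks.iter_mut().zip(round_keys.iter()) {
        for (b, k) in block.iter_mut().zip(key.iter()) {
            *b ^= *k;
        }
    }
}
```

Since the round keys are only XORed in at the end, a tweakable construction can supply four unrelated keys for four unrelated blocks in one call.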

I can open a PR for it and we can discuss.

@peterdettman
Contributor

@tarcieri Note that there's no real reason to bitslice the round key here, instead it could be applied after the inv_bitslice of the state (and then only needed for the single output block).

@peterdettman
Contributor

@tarcieri (inv_)mix_columns(_0) could also get non-bitsliced implementations for these, if/when it matters.
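
As an illustration of what a non-bitsliced version might look like, here is a sketch of MixColumns and InvMixColumns operating directly on a column-major 16-byte state, following the FIPS 197 definitions. The names `xtime`, `multiples`, `mix_columns_ref`, and `inv_mix_columns_ref` are made up for this example and do not correspond to functions in the crate.

```rust
/// Multiply by x in GF(2^8) modulo the AES polynomial, without branching.
/// (Hypothetical helper for this sketch.)
fn xtime(a: u8) -> u8 {
    (a << 1) ^ (0x1b & 0u8.wrapping_sub(a >> 7))
}

/// Returns [9*a, 11*a, 13*a, 14*a] in GF(2^8), built from repeated `xtime`.
fn multiples(a: u8) -> [u8; 4] {
    let x2 = xtime(a);
    let x4 = xtime(x2);
    let x8 = xtime(x4);
    [x8 ^ a, x8 ^ x2 ^ a, x8 ^ x4 ^ a, x8 ^ x4 ^ x2]
}

/// MixColumns: out[r] = 2*a[r] + 3*a[r+1] + a[r+2] + a[r+3] per column.
fn mix_columns_ref(state: &mut [u8; 16]) {
    for col in state.chunks_exact_mut(4) {
        let a = [col[0], col[1], col[2], col[3]];
        for r in 0..4 {
            let a1 = a[(r + 1) % 4];
            col[r] = xtime(a[r]) ^ (xtime(a1) ^ a1) ^ a[(r + 2) % 4] ^ a[(r + 3) % 4];
        }
    }
}

/// InvMixColumns: out[r] = 14*a[r] + 11*a[r+1] + 13*a[r+2] + 9*a[r+3] per column.
fn inv_mix_columns_ref(state: &mut [u8; 16]) {
    for col in state.chunks_exact_mut(4) {
        let a = [col[0], col[1], col[2], col[3]];
        for r in 0..4 {
            col[r] = multiples(a[r])[3]
                ^ multiples(a[(r + 1) % 4])[1]
                ^ multiples(a[(r + 2) % 4])[2]
                ^ multiples(a[(r + 3) % 4])[0];
        }
    }
}
```

Applying `inv_mix_columns_ref` after `mix_columns_ref` returns the original state, which makes for an easy sanity check.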

@tarcieri
Member Author

tarcieri commented Jun 1, 2021

I'm just about to open a follow-up which adds parallelism and does a bit of cleanup including avoiding bitslicing the round keys. Edit: opened #269.

Not terribly worried about putting too much effort into this API for now. I'd just like to get it PoC'd and working.

In the future, however, it might be interesting to try to use an API like this as the core of the overall implementation, which would get rid of a lot of redundant boilerplate that presently exists in the Aes128/Aes192/Aes256 types and their associated trait impls.

This was referenced Jun 1, 2021
3 participants