Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial rc5 impl #346

Merged
merged 13 commits into from Feb 10, 2023
Merged

initial rc5 impl #346

merged 13 commits into from Feb 10, 2023

Conversation

antonio-dropulic
Copy link
Contributor

Implemented RC5 based on the revised paper. Got test vector at: https://www.ietf.org/archive/id/draft-krovetz-rc6-rc5-vectors-00.txt

Core impl features a generic impl using typenum and generic array. Const generic implementation
without generic expressions doesn't seem appropriate.

I was not sure what api would work best with the cipher crate so i decided to propagate the InOut ptr.
Put all the core fn behind a trait to avoid rewriting generic bounds.

For now only 32, 20, 20 parametrization made public.

rc5/README.md Outdated Show resolved Hide resolved
rc5/README.md Outdated Show resolved Hide resolved
use cipher::{impl_simple_block_encdec, AlgorithmName, KeyInit};
use cipher::{inout::InOut, Block, BlockCipher, KeySizeUser};

pub struct RC5_32_20_16 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of 20 rounds is not obvious. Both OpenSSL and RFC 2040 default to 12 rounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the: https://www.ietf.org/archive/id/draft-krovetz-rc6-rc5-vectors-00.txt

" The original RC5 publication suggested using 12 rounds when w=32 and
16 rounds when w=64. In response to cryptanalysis, the authors
changed the recommendation when w=32 to 16 rounds [RC5sec]."

I'm not sure what should be the defaults. I have no problem changing it to 12. But where should I find the test vectors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to proceed. Should i expose RC5 32, 12, 16 or 32, 16, 16? I implemented RC5 32, 20, 16 since that is the only w=32 test vector i found already generated in the krovetz draft. I suppose it is fine to use the code given there to generate others.

A possible solution would be to implement cipher api for all posible w/r/b. Then we expose some recomended / common configurations through type aliases. The issue I see is covering all the possible configurations with tests. Would it could be okay to test only the recomended configurations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing them as generic parameters makes sense to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll tweak the code to do that. I don't see how to escape repeating the bounds for every impl. The code wont be pretty :D And I still have to add bounds to limit the range of rounds and key bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the generic implementation, but the compiler seems to strugle with optimizing it. I notice a 3x penalty in performance compared to the previous version.

 name                         control ns/iter    generic ns/iter    diff ns/iter   diff %  speedup 
 rc5_32_12_16_decrypt_block   14,611 (560 MB/s)  63,711 (128 MB/s)        49,100  336.05%   x 0.23 
 rc5_32_12_16_decrypt_blocks  14,659 (558 MB/s)  58,842 (139 MB/s)        44,183  301.41%   x 0.25 
 rc5_32_12_16_encrypt_block   11,242 (728 MB/s)  51,916 (157 MB/s)        40,674  361.80%   x 0.22 
 rc5_32_12_16_encrypt_blocks  11,467 (714 MB/s)  51,360 (159 MB/s)        39,893  347.89%   x 0.22 
 rc5_32_16_16_decrypt_block   19,621 (417 MB/s)  77,881 (105 MB/s)        58,260  296.93%   x 0.25 
 rc5_32_16_16_decrypt_blocks  19,800 (413 MB/s)  76,284 (107 MB/s)        56,484  285.27%   x 0.26 
 rc5_32_16_16_encrypt_block   17,708 (462 MB/s)  81,863 (100 MB/s)        64,155  362.29%   x 0.22 
 rc5_32_16_16_encrypt_blocks  17,704 (462 MB/s)  75,664 (108 MB/s)        57,960  327.38%   x 0.23 

Any idea why this might be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used cargo-show-asm to generate the asm of the bench functions in order to provide some clarity on the performance regression.

I've replaced the bench with something easier to generate asm from and looked at the asm from fn encrypt_

#![feature(test)]
extern crate test;

use cipher::generic_array::GenericArray;
use cipher::typenum::U8;
use cipher::BlockEncryptMut;
use cipher::{block_decryptor_bench, block_encryptor_bench};
use rc5::{RC5_32_12_16, RC5_32_16_16};

#[bench]
pub fn encrypt(bh: &mut test::Bencher) {
    use cipher::KeyInit;
    let key = test::black_box(Default::default());
    let mut cipher = RC5_32_12_16::new(&key);

    let mut block: GenericArray<u8, U8> = test::black_box(Default::default());

    bh.iter(|| {
        encrypt_(&mut cipher, &mut block);
    });
    bh.bytes = (block.len()) as u64;
}

#[inline(never)]
pub fn encrypt_(cipher: &mut RC5_32_12_16, block: &mut GenericArray<u8, U8>) {
    cipher.encrypt_block_mut(block);
}

The asm for the non generic code just calls the fn encrypt_block. This function contains the call to
the generic version of fn encrypt, but with fixed generic parameters. The asm for the less performant generic version inlines the entire encrypt algorithm.

I've managed to reproduce the same performance regresion in the non generic code by adding #[inline(always)] to the fn encrypt_block.

It is unclear for me why this happens or how to fix it. Should this pr be turned into a draft until I find a solution or should the non generic version be submitted in the meantime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer inspection i found that the generic version would not inline calls to word functions, while the non generic one would. After adding inline annotations to Word impl the performance is once again comparable.

rc5/src/block_cipher.rs Outdated Show resolved Hide resolved
after 5b84ad4: generic rc5 pefromance regression was noticed.
after analizing the generated asm I noticed that while the generic
top level generic impl does get inlined the actual algorithm impl
differs in the fact that it does not inline Word functions. After
adding inline annotations performance is once again compareable.
@axos88
Copy link

axos88 commented Feb 10, 2023

Can we get this merged and released? Is it usable @antonio-dropulic? I'd very much like to use it, and would be happy to do from your branch until it gets merged.

@tarcieri tarcieri merged commit cbcebe1 into RustCrypto:master Feb 10, 2023
@newpavlov newpavlov mentioned this pull request Jun 23, 2023
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants