Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/crypto: optimize AES footprint #10037

Closed
wants to merge 1 commit into from

Conversation

olegart
Copy link
Contributor

@olegart olegart commented Sep 25, 2018

Contribution description

AES tables on-the-fly calculation instead of using precalculated tables - roughly the same performance while saving ~8 KB code size.

Testing procedure

Passes unittests/tests-crypto. Was benchmarked by this test with xtimer_now_usec timestamps before and after test runs.

STM32L071KBU6 @ 32 MHz
#undef AES_CALCULATE_TABLES
AES encryption: 287 us
AES decryption: 509 us
Code size 29872 bytes

#define AES_CALCULATE_TABLES
AES encryption: 285 us
AES decryption: 558 us
Code size 22296 bytes

STM32L152RET6 @ 32 MHz
#undef AES_CALCULATE_TABLES
AES encryption: 156 us
AES decryption: 274 us
Code size 29600 bytes

#define AES_CALCULATE_TABLES
AES encryption: 165 us
AES decryption: 280 us
Code size 21672 bytes

STM32L451CCU6 @ 80 MHz
#undef AES_CALCULATE_TABLES
AES encryption: 79 us
AES decryption: 143 us
Code size 29840 bytes

#define AES_CALCULATE_TABLES
AES encryption: 81 us
AES decryption: 142 us
Code size 21920 bytes

@x3ro x3ro added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: crypto Area: Cryptographic libraries labels Sep 26, 2018
@x3ro
Copy link
Contributor

x3ro commented Sep 26, 2018

Very cool optimization 👍 In order to fully understand the modifications, I'm curious what reference you used to implement them?

@olegart
Copy link
Contributor Author

olegart commented Sep 26, 2018

@x3ro, I don't quite remember, it was like a year ago, but definitely was inspired by https://eprint.iacr.org/2016/714.pdf. In fact, generating all four T-tables from a single table for encryption and two tables for decryption is a common concept - if you just look at them, you'll see that they can be generated from each other using bit-shifting.

Rest of the implementation is unchanged.

I'm not sure why precomputed tables were used in the first place. May be such implementation is significantly faster on some low-end 8/16-bit MCUs, but then again, most ot them don't have too much flash too.

We looked at other possible optimizations, even tested assembler implementation proposed in the abovementioned paper, but decided effect is negligible - AES is rarely a bottleneck in IoT, so T-tables calculation is a nice balance between footprint and performance.

@tcschmidt
Copy link
Member

@bergzand would you like to plug-test against an established AES implementation?

@x3ro
Copy link
Contributor

x3ro commented Oct 4, 2018

Hey @olegart. I'm sorry to say that I won't be reviewing this PR anymore, as I do not find your behaviour towards other RIOT contributors tolerable. Nobody was born an expert kernel developer, and your personal insults regarding their work are extremely detrimental to furthering the development of talent in this community. I personally recommend A Case Study in Not Being A Jerk in Open Source by Gary Bernhardt.

@olegart
Copy link
Contributor Author

olegart commented Oct 4, 2018

@x3ro do I look like I care?..

You've just insulted a whole professional community by throwing away better code because you don't like its' author. You praise yourself on reading some SJW bullshit and profoundly ignoring documents like IEEE Software Engineering Code of Ethics.

And still, I don't care.

@olegart
Copy link
Contributor Author

olegart commented Oct 4, 2018

(a thought a few minutes later) By the way, a legal warning, although I don't think it's necessary: if someone thinks of doing another PR with the same code, I have to remind you that maintaining my copyright is mandatory.

@maxlapshin
Copy link

@x3ro can you please explain, where exactly here @olegart was untolerable?

@x3ro
Copy link
Contributor

x3ro commented Oct 4, 2018

@x3ro can you please explain, where exactly here @olegart was untolerable?

#10088

@maxlapshin
Copy link

so it is not related with this thread, but just personally you do not want to take a look at his code because of his behaviour in other tickets, right?

@x3ro
Copy link
Contributor

x3ro commented Oct 4, 2018

That is correct. I also do not wish to have a discussion on this thread, my message was merely informational as to why I would not be further engaging in this.

Copy link
Contributor

@jcarrano jcarrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was avoiding a review of this because I know nothing of crypto so I want to make sure I don't screw up. Here is what I did to verify this is correct.

Edit aes.h

First I removed line 41 in aes.h (the hardcoded define should not be there).

Apply semantic patch

I used coccinelle (for the first time) with the following semantic patch (t-table-to-macro.cocci)

expression E;
identifier Te =~ "Te[0-9]";
@@

(
- Te[E]
+ Te(E)
)

@@
expression E;
identifier Td =~ "Td[0-9]";
@@

(
- Td[E]
+ Td(E)
)

Run on the old aes.c to get something that should look more like the new:

git show master:./aes.c > old_aes.c
spatch --sp-file t-table-to-macro.cocci old_aes.c -o new_aes.c

Compare semantic patch with new file

diff -uw  new_aes.c aes.c

This shows that all T(d|e)[0-4] replacements are correct.

Check that the on-the fly calculations are correct.

I wrote the following quick-n-dirty program (test-calculated-tables.c).

/* Compile with and without AES_CALCULATE_TABLES. Output should be unchanged */

/* Use command line: gcc -I../include test-calculated-tables.c */

#include "aes.c"

int main()
{
    int i;

    puts(
#ifdef AES_CALCULATE_TABLES
    "Calculating tables on the fly"
#else
    "Using ROM tables"
#endif /*AES_CALCULATE_TABLES*/
    );

    for (i = 0; i<256; i++) {
        printf("%X %X %X %X %X %X %X %X %X %X\n",
            Te0(i), Te1(i), Te2(i), Te3(i), Te4(i),
            Td0(i), Td1(i), Td2(i), Td3(i), Td4(i)
        );
    }

    return 0;
}

Compile with and without -DAES_CALCULATE_TABLES. Output should be the same except for the first line:

gcc -DAES_CALCULATE_TABLES -pedantic -Wall -I../include test-calculated-tables.c -o new-outp.elf
gcc  -pedantic -Wall -I../include test-calculated-tables.c -o old-outp.elf
./new-outp.elf > CPU_tables.txt
./old-outp.elf > ROM_tables.txt
diff ROM_tables.txt CPU_tables.txt

Should give:

1c1
< Using ROM tables
---
> Calculating tables on the fly

Whats still needs to be done

When aes.h is fixed I'll run the AES tests.

@@ -37,6 +37,8 @@ typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;

/* This controls AES table calculation on the fly */
#define AES_CALCULATE_TABLES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wrap this in a #ifndef AES_CALCULATE_TABLES block so that the user can override it without editing the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define wrapped.

Here is what I did to verify this is correct.

It passes unittests/tests-crypto/tests-crypto-aes.c which means it is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! I just realized what I wrote. I was thinking on defines that take on values, but this is another thing.

According to @cladmi the correct way of handling this is by using a PSEUDOMODULE. I just did it, you can look for it in my branch https://github.com/jcarrano/RIOT/tree/aes-t-table

It passes unittests/tests-crypto/tests-crypto-aes.c which means it is correct.

Sure, but that tests compression and decompression with the same code. I wanted to check that both codes were equivalent (and on the process set a precedent so that in the future when people make big but mechanical changes, I can direct them to this PR to show them how they can help me review it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pseudomodules are ok, except I don't think that precalculated tables nor unrolled cycles is a good default choice. Both are extremely costly in terms of flash memory (I don't even think that anyone ever used FULL_UNROLL option).

Sure, but that tests compression and decompression with the same code

No. It tests actual code against so called test vector — which include both input, encryption key and output. Any AES code under test must provide specified output being given specified key and input.

There are two different test vectors defined, so false positive is near to impossible. They are not NIST recommended vectors, but that should not matter if vectors were calculated properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except I don't think that precalculated tables nor unrolled cycles is a good default choice

Me neither. That's why those features won't be selected unlesss one explicitly selects the pseudomodule. If you just type USEMODULE += crypto then tables will be on-the-fly (I changed the ifdef to ifndef in my branch) and loops rolled.

@jcarrano
Copy link
Contributor

Hey @olegart, are you ok with me pushing the PSEUDOMODULES patch to your branch, or pulling those changes from mine? This is a good PR and I'd like to get it merged.

@jcarrano
Copy link
Contributor

This PR has been merged as #10406. Looking forward to get #10124 in as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: crypto Area: Cryptographic libraries Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants