Skip to content

Commit

Permalink
Revert "Different round keys for columns 0,1 and 2,3 in AesGenerator4R"
Browse files Browse the repository at this point in the history
This reverts commit 8c718b0.
  • Loading branch information
ldmberman committed Apr 6, 2020
1 parent f92973d commit d64fce8
Show file tree
Hide file tree
Showing 5 changed files with 28 additions and 248 deletions.
116 changes: 3 additions & 113 deletions doc/design.md
Expand Up @@ -305,24 +305,17 @@ Using less than 256 MiB of memory is not possible due to the use of tradeoff-res

### 3.1 AesGenerator1R

AesGenerator1R was designed for the fastest possible generation of pseudorandom data to fill the Scratchpad. It takes advantage of hardware accelerated AES in modern CPUs. Only one AES round is performed per 16 bytes of output, which results in throughput exceeding 20 GB/s in most modern CPUs.

AesGenerator1R gives a good output distribution provided that it's initialized with a sufficiently 'random' initial state (see Appendix F).
AesGenerator1R was designed for the fastest possible generation of pseudorandom data to fill the Scratchpad. It takes advantage of hardware accelerated AES in modern CPUs. Only one AES round is performed per 16 bytes of output, which results in throughput exceeding 20 GB/s in most modern CPUs. While 1 AES round is not sufficient for a good distribution of random values, this is not an issue because the purpose is just to initialize the Scratchpad with random non-zero data.

### 3.2 AesGenerator4R

AesGenerator4R uses 4 AES rounds to generate pseudorandom data for Program Buffer initialization. Since 2 AES rounds are sufficient for full avalanche of all input bits [[28](https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf)], AesGenerator4R has excellent statistical properties (see Appendix F) while maintaining very good performance.
AesGenerator4R uses 4 AES rounds to generate pseudorandom data for Program Buffer initialization. Since 2 AES rounds are sufficient for full avalanche of all input bits [[28](https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf)], AesGenerator4R provides an excellent output distribution while maintaining very good performance.

The reversible nature of this generator is not an issue since the generator state is always initialized using the output of a non-reversible hashing function (Blake2b).

### 3.3 AesHash1R

AesHash was designed for the fastest possible calculation of the Scratchpad fingerprint. It interprets the Scratchpad as a set of AES round keys, so it's equivalent to AES encryption with 32768 rounds. Two extra rounds are performed at the end to ensure avalanche of all Scratchpad bits in each lane.

The reversible nature of AesHash1R is not a problem for two main reasons:

* It is not possible to directly control the input of AesHash1R.
* The output of AesHash1R is passed into the Blake2b hashing function, which is not reversible.
AesHash was designed for the fastest possible calculation of the Scratchpad fingerprint. It interprets the Scratchpad as a set of AES round keys, so it's equivalent to AES encryption with 32768 rounds. Two extra rounds are performed at the end to ensure avalanche of all Scratchpad bits in each lane. The output of the AesHash is fed into the Blake2b hashing function to calculate the final PoW hash.

### 3.4 SuperscalarHash

Expand Down Expand Up @@ -483,109 +476,6 @@ This shows that SuperscalaHash has quite low sensitivity to high-order bits and

When calculating a Dataset item, the input of the first SuperscalarHash depends only on the item number. To ensure a good distribution of results, the constants described in section 7.3 of the Specification were chosen to provide unique values of bits 3-53 for *all* item numbers in the range 0-34078718 (the Dataset contains 34078719 items). All initial register values for all Dataset item numbers were checked to make sure bits 3-53 of each register are unique and there are no collisions (source code: [superscalar-init.cpp](../src/tests/superscalar-init.cpp)). While this is not strictly necessary to get unique output from SuperscalarHash, it's a security precaution that mitigates the non-perfect avalanche properties of the randomly generated SuperscalarHash instances.

### F. Statistical tests of RNG

Both AesGenerator1R and AesGenerator4R were tested using the TestU01 library [[30](http://simul.iro.umontreal.ca/testu01/tu01.html)] intended for empirical testing of random number generators. The source code is available in [rng-tests.cpp](../src/tests/rng-tests.cpp).

The tests sample about 200 MB ("SmallCrush" test), 500 GB ("Crush" test) or 4 TB ("BigCrush" test) of output from each generator. This is considerably more than the amounts generated in RandomX (2176 bytes for AesGenerator4R and 2 MiB for AesGenerator1R), so failures in the tests don't necessarily imply that the generators are not suitable for their use case.


#### AesGenerator4R
The generator passes all tests in the "BigCrush" suite when initialized using the Blake2b hash function:

```
$ bin/rng-tests 1
state0 = 67e8bbe567a1c18c91a316faf19fab73
state1 = 39f7c0e0a8d96512c525852124fdc9fe
state2 = 7abb07b2c90e04f098261e323eee8159
state3 = 3df534c34cdfbb4e70f8c0e1826f4cf7
...
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: AesGenerator4R
Number of statistics: 160
Total CPU time: 02:50:18.34
All tests were passed
```


The generator passes all tests in the "Crush" suite even with an initial state set to all zeroes.
```
$ bin/rng-tests 0
state0 = 00000000000000000000000000000000
state1 = 00000000000000000000000000000000
state2 = 00000000000000000000000000000000
state3 = 00000000000000000000000000000000
...
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: AesGenerator4R
Number of statistics: 144
Total CPU time: 00:25:17.95
All tests were passed
```

#### AesGenerator1R

The generator passes all tests in the "Crush" suite when initialized using the Blake2b hash function.

```
$ bin/rng-tests 1
state0 = 67e8bbe567a1c18c91a316faf19fab73
state1 = 39f7c0e0a8d96512c525852124fdc9fe
state2 = 7abb07b2c90e04f098261e323eee8159
state3 = 3df534c34cdfbb4e70f8c0e1826f4cf7
...
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: AesGenerator1R
Number of statistics: 144
Total CPU time: 00:25:06.07
All tests were passed
```

When the initial state is initialized to all zeroes, the generator fails 1 test out of 144 tests in the "Crush" suite:

```
$ bin/rng-tests 0
state0 = 00000000000000000000000000000000
state1 = 00000000000000000000000000000000
state2 = 00000000000000000000000000000000
state3 = 00000000000000000000000000000000
...
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: AesGenerator1R
Number of statistics: 144
Total CPU time: 00:26:12.75
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
12 BirthdaySpacings, t = 3 1 - 4.4e-5
----------------------------------------------
All other tests were passed
```

## References

[1] CryptoNote whitepaper - https://cryptonote.org/whitepaper.pdf
Expand Down
27 changes: 11 additions & 16 deletions doc/specs.md
Expand Up @@ -169,46 +169,41 @@ state0 (16 B) state1 (16 B) state2 (16 B) state3 (16 B)

### 3.3 AesGenerator4R

AesGenerator4R works similar way as AesGenerator1R, except it uses 4 rounds per column. Columns 0 and 1 use a different set of keys than columns 2 and 3.
AesGenerator4R works the same way as AesGenerator1R, except it uses 4 rounds per column:

```
state0 (16 B) state1 (16 B) state2 (16 B) state3 (16 B)
| | | |
AES decrypt AES encrypt AES decrypt AES encrypt
(key0) (key0) (key4) (key4)
(key0) (key0) (key0) (key0)
| | | |
v v v v
AES decrypt AES encrypt AES decrypt AES encrypt
(key1) (key1) (key5) (key5)
(key1) (key1) (key1) (key1)
| | | |
v v v v
AES decrypt AES encrypt AES decrypt AES encrypt
(key2) (key2) (key6) (key6)
(key2) (key2) (key2) (key2)
| | | |
v v v v
AES decrypt AES encrypt AES decrypt AES encrypt
(key3) (key3) (key7) (key7)
(key3) (key3) (key3) (key3)
| | | |
v v v v
state0' state1' state2' state3'
```

AesGenerator4R uses the following 8 round keys:
AesGenerator4R uses the following 4 round keys:

```
key0 = dd aa 21 64 db 3d 83 d1 2b 6d 54 2f 3f d2 e5 99
key1 = 50 34 0e b2 55 3f 91 b6 53 9d f7 06 e5 cd df a5
key2 = 04 d9 3e 5c af 7b 5e 51 9f 67 a4 0a bf 02 1c 17
key3 = 63 37 62 85 08 5d 8f e7 85 37 67 cd 91 d2 de d8
key4 = 73 6f 82 b5 a6 a7 d6 e3 6d 8b 51 3d b4 ff 9e 22
key5 = f3 6b 56 c7 d9 b3 10 9c 4e 4d 02 e9 d2 b7 72 b2
key6 = e7 c9 73 f2 8b a3 65 f7 0a 66 a9 2b a7 ef 3b f6
key7 = 09 d6 7c 7a de 39 58 91 fd d1 06 0c 2d 76 b0 c0
key0 = 5d 46 90 f8 a6 e4 fb 7f b7 82 1f 14 95 9e 35 cf
key1 = 50 c4 55 6a 8a 27 e8 fe c3 5a 5c bd dc ff 41 67
key2 = a4 47 4c 11 e4 fd 24 d5 d2 9a 27 a7 ac 4a 32 3d
key3 = 2a 3a 0c 81 ff ae a9 99 d9 db d3 42 08 db f6 76
```
These keys were generated as:
```
key0, key1, key2, key3 = Hash512("RandomX AesGenerator4R keys 0-3")
key4, key5, key6, key7 = Hash512("RandomX AesGenerator4R keys 4-7")
key0, key1, key2, key3 = Hash512("RandomX AesGenerator4R keys")
```

### 3.4 AesHash1R
Expand Down
38 changes: 13 additions & 25 deletions src/aes_hash.cpp
Expand Up @@ -171,18 +171,10 @@ void fillAes1Rx4(void *state, size_t outputSize, void *buffer) {
template void fillAes1Rx4<true>(void *state, size_t outputSize, void *buffer);
template void fillAes1Rx4<false>(void *state, size_t outputSize, void *buffer);

//AesGenerator4R:
//key0, key1, key2, key3 = Blake2b-512("RandomX AesGenerator4R keys 0-3")
//key4, key5, key6, key7 = Blake2b-512("RandomX AesGenerator4R keys 4-7")

#define AES_GEN_4R_KEY0 0x99e5d23f, 0x2f546d2b, 0xd1833ddb, 0x6421aadd
#define AES_GEN_4R_KEY1 0xa5dfcde5, 0x06f79d53, 0xb6913f55, 0xb20e3450
#define AES_GEN_4R_KEY2 0x171c02bf, 0x0aa4679f, 0x515e7baf, 0x5c3ed904
#define AES_GEN_4R_KEY3 0xd8ded291, 0xcd673785, 0xe78f5d08, 0x85623763
#define AES_GEN_4R_KEY4 0x229effb4, 0x3d518b6d, 0xe3d6a7a6, 0xb5826f73
#define AES_GEN_4R_KEY5 0xb272b7d2, 0xe9024d4e, 0x9c10b3d9, 0xc7566bf3
#define AES_GEN_4R_KEY6 0xf63befa7, 0x2ba9660a, 0xf765a38b, 0xf273c9e7
#define AES_GEN_4R_KEY7 0xc0b0762d, 0x0c06d1fd, 0x915839de, 0x7a7cd609
#define AES_GEN_4R_KEY0 0xcf359e95, 0x141f82b7, 0x7ffbe4a6, 0xf890465d
#define AES_GEN_4R_KEY1 0x6741ffdc, 0xbd5c5ac3, 0xfee8278a, 0x6a55c450
#define AES_GEN_4R_KEY2 0x3d324aac, 0xa7279ad2, 0xd524fde4, 0x114c47a4
#define AES_GEN_4R_KEY3 0x76f6db08, 0x42d3dbd9, 0x99a9aeff, 0x810c3a2a

template<bool softAes>
void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
Expand All @@ -191,16 +183,12 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
const uint8_t* outputEnd = outptr + outputSize;

rx_vec_i128 state0, state1, state2, state3;
rx_vec_i128 key0, key1, key2, key3, key4, key5, key6, key7;
rx_vec_i128 key0, key1, key2, key3;

key0 = rx_set_int_vec_i128(AES_GEN_4R_KEY0);
key1 = rx_set_int_vec_i128(AES_GEN_4R_KEY1);
key2 = rx_set_int_vec_i128(AES_GEN_4R_KEY2);
key3 = rx_set_int_vec_i128(AES_GEN_4R_KEY3);
key4 = rx_set_int_vec_i128(AES_GEN_4R_KEY4);
key5 = rx_set_int_vec_i128(AES_GEN_4R_KEY5);
key6 = rx_set_int_vec_i128(AES_GEN_4R_KEY6);
key7 = rx_set_int_vec_i128(AES_GEN_4R_KEY7);

state0 = rx_load_vec_i128((rx_vec_i128*)state + 0);
state1 = rx_load_vec_i128((rx_vec_i128*)state + 1);
Expand All @@ -210,23 +198,23 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
while (outptr < outputEnd) {
state0 = aesdec<softAes>(state0, key0);
state1 = aesenc<softAes>(state1, key0);
state2 = aesdec<softAes>(state2, key4);
state3 = aesenc<softAes>(state3, key4);
state2 = aesdec<softAes>(state2, key0);
state3 = aesenc<softAes>(state3, key0);

state0 = aesdec<softAes>(state0, key1);
state1 = aesenc<softAes>(state1, key1);
state2 = aesdec<softAes>(state2, key5);
state3 = aesenc<softAes>(state3, key5);
state2 = aesdec<softAes>(state2, key1);
state3 = aesenc<softAes>(state3, key1);

state0 = aesdec<softAes>(state0, key2);
state1 = aesenc<softAes>(state1, key2);
state2 = aesdec<softAes>(state2, key6);
state3 = aesenc<softAes>(state3, key6);
state2 = aesdec<softAes>(state2, key2);
state3 = aesenc<softAes>(state3, key2);

state0 = aesdec<softAes>(state0, key3);
state1 = aesenc<softAes>(state1, key3);
state2 = aesdec<softAes>(state2, key7);
state3 = aesenc<softAes>(state3, key7);
state2 = aesdec<softAes>(state2, key3);
state3 = aesenc<softAes>(state3, key3);

rx_store_vec_i128((rx_vec_i128*)outptr + 0, state0);
rx_store_vec_i128((rx_vec_i128*)outptr + 1, state1);
Expand Down
2 changes: 1 addition & 1 deletion src/tests/benchmark.cpp
Expand Up @@ -353,7 +353,7 @@ int main(int argc, char** argv) {
std::cout << "Calculated result: ";
result.print(std::cout);
if (noncesCount == 1000 && seedValue == 0)
std::cout << "Reference result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1" << std::endl;
std::cout << "Reference result: 669ae4f2e5e2c0d9cc232ff2c37d41ae113fa302bbf983d9f3342879831b4edf" << std::endl;
if (!miningMode) {
std::cout << "Performance: " << 1000 * elapsed / noncesCount << " ms per hash" << std::endl;
}
Expand Down
93 changes: 0 additions & 93 deletions src/tests/rng-tests.cpp

This file was deleted.

0 comments on commit d64fce8

Please sign in to comment.