Replace ISAAC with HC-128 #53

pitdicker · 2017-11-16T12:42:36Z

Okay, this is somewhat funny. After putting quite some effort into optimising our implementation of ISAAC, I now propose we replace it with HC-128.

Just like ISAAC and RC-4 before it, HC-128 is an array based stream cipher that we use as an RNG. Comparison:

Security / predictability

ISAAC is designed to be usable for applications that need a cryptographically secure PRNG. But it does not really meet the current standards for one. This makes ISAAC overkill for applications that just need a statistically good PRNG, and not good enough for anything that needs guarantees for security.

HC-128 is designed by Hongjun Wu, one of the experts in the field. It is well received, and selected as one of the "stream ciphers suitable for widespread adoption" by eSTREAM. A very comprehensive analysis of the current state of attacks / known weaknesses of HC-128 is given in "Some Results On Analysis And Implementation Of HC-128 Stream Cipher" by Shashwat Raizada.

HC-128 has no known weaknesses that are easier to exploit than doing a brute-force search of 2^128. Is makes clear security promises, and also has a clear story around initialization.

Performance

HC-128 is an 32-bit algorithm. It performs usually 30% faster than the 32-bit ISAAC, and mostly the same as the 64-bit ISAAC-64. Only for fill_bytes ISAAC-64 is about 45% faster. Note that this makes it still as fast as our implementation until a week ago.

The results here may improve a little bit, because I have not been able yet to have to compiler optimise out all bounds checks. It now does one per u32.

x86 (all numbers in Mb/s):

PRNG	u32	u64	byte stream
ISAAC (rand 0.3.16)	743	764	432
ISAAC (current)	916	1043	1219
ISAAC-64 (rand 0.3.16)	668	1307	743
ISAAC-64 (current)	937	1410	1451
HC-128	1231	1417	1231

x86_64 (all numbers in Mb/s):

PRNG	u32	u64	byte stream
ISAAC (rand 0.3.16)	826	831	488
ISAAC (current)	980	1118	1434
ISAAC-64 (rand 0.3.16)	937	1874	620
ISAAC-64 (current)	1287	1930	2640
HC-128	1300	1810	1833

For the time it takes to initialise I don't yet have good numbers. With the unoptimised routine from hc128 it takes about as much time as ISAAC-64. It can probably be improved by ~30%.

Memory

ISAAC uses two arrays: one to hold 256 state words, and one to hold 256 results. HC-128 needs two arrays of 512 words to hold its state. As an optimisation we include a 16-word array of results.

Memory usage (excl. counters etc.):

PRNG	state	results	total
ISAAC	256*4 = 1024	256*4 = 1024	2048 bytes
ISAAC-64	256*8 = 2048	256*8 = 2048	4096 bytes
HC-128	25124 = 4096	16*4 = 64	4160 bytes

Overall HC-128 seems like a clear improvement over ISAAC. For ISAAC it is not really clear what its place is in the world. HC-128 is a super-fast cryptographic PRNG that works similar.

The text was updated successfully, but these errors were encountered:

dhardy · 2017-11-16T18:35:37Z

That's a good argument.

The other thing that jumps out from your results is that there is only one reason to use 32-bit Isaac over the 64-bit version: memory usage (which is in any case still poor).

At some point in time rand should probably promise that XYZRng will remain permanently available to avoid breakage where reproducibility is required. Nothing is stabilised yet. It may still be more sensible to add a new generator along-side, update StdRng, and propose removal of ISAAC separately.

dhardy · 2017-12-06T16:13:13Z

HC-128 is at least 13 years old and appears to be designed for 32-bit hardware. Most comparable generators are also 32-bit and of similar age, if not older. Can the arithmetic be parallelised well for modern CPUs? Are there any more recent stream cyphers we should consider?

pitdicker · 2017-12-07T20:09:50Z

Good questions. For cryptographic RNG's it seems sensible to not use something very recent, because there has not been enough time to analyse it. The variant of HC-128 I worked from is from 2009, and has seen a few small improvements. It produces different results from the one in the eSTREAM contest, mostly for performance it seems.

It is written to take good advantage of instruction level parallelism, much more than ISAAC. SIMD does not make sense here, because what it does is mostly reading from different parts of an array. Something like seven times.

The other ciphers from eSTREAM also seem interesting. HC-128 is the said to be clearly fastest one, although I haven't seen benchmarks comparing them. I don't know how optimal our implementation of ChaCha is. LLVM seems to be able to vectorize it... HC128 is 4~6x faster.

It seems all (?) cryptographic RNG's / ciphers get designed for 32 bit, so they work on all sorts of devices well. I don't expect a good 64-bit RNG to get the necessary analysis soon. But I am just an outsider doing some reading... Interestingly HC-128 with just 32-bit arithmetic can rival the performance of 64-bit ISAAC.

I can give updated benchmarks on my implementation of HC-128:
x86 (all numbers in Mb/s):

PRNG	u32	u64	byte stream
ISAAC (rand 0.3.16)	743	764	432
ISAAC (current)	916	1043	1219
ISAAC-64 (rand 0.3.16)	668	1307	743
ISAAC-64 (current)	937	1410	1451
HC-128 (previous)	1231	1417	1231
HC-128	1254	1326	1770

x86_64 (all numbers in Mb/s):

PRNG	u32	u64	byte stream
ISAAC (rand 0.3.16)	826	831	488
ISAAC (current)	980	1118	1434
ISAAC-64 (rand 0.3.16)	937	1874	620
ISAAC-64 (current)	1287	1930	2640
HC-128 (previous)	1300	1810	1833
HC-128	1478	1935	2397

I have hold off making a PR because I use unsafe to do unchecked indexing, and would like to find a trick to avoid bounds checking with just safe code. But with every variant I tried, LLVM just can't figure it out.

I think I will just make a PR and leave that as an exercise for later.

zackw · 2017-12-08T14:57:28Z

I would like to voice my support for making thread_rng a full-strength cryptographic PRNG. I do not have an opinion on which specific algorithm should be used, though.

In another context (backing up the C library's rand with a CSPRNG) someone suggested to me that it would be good to choose the output cache size to make the amortized cost of each call to sample<u32>(Uniform) roughly what the per-call cost with a linear congruential generator would be.

pitdicker · 2017-12-08T15:14:00Z

someone suggested to me that it would be good to choose the output cache size to make the amortized cost of each call to sample(Uniform) roughly what the per-call cost with a linear congruential generator would be.

That is a good suggestion, and already what ISAAC does, and HC-128 to a smaller extend.
ISAAC calculates 256 values at a time, and HC-128 16 at a time.

The small amount of logic to read the next result, and generate new ones if necessary is already 20~40% of the time it takes to generate the next random number.

pitdicker · 2017-12-16T07:34:19Z

I measured the initialization time weeks ago, but forgot to post it.
HC-128 takes about twice as long to do the calculations to set up its table. But because it request fewer bytes from OsRng, it ends up being similar to ISAAC (32-bit variant).

Initialization time (ns):

PRNG	OS	RNG	combined
ChaCha20	1989	17	2006 (for comparison)
HC-128	~~2482~~ 339	4347	4686
ISAAC	2636	2280	4916
ISAAC-64	5007	2418	7425

dhardy · 2017-12-16T09:44:15Z

But 2482 + 4347 >> 4686, and anyway 2482 is not much different that 2636? Those numbers make no sense.

Anyway, I don't think we have to worry too much since thread_rng only seeds rarely.

pitdicker · 2017-12-16T13:24:04Z

Sloppy cope & paste, sorry for the confusion. The column 'OS' is my estimate of the times spend outside the control of the RNG init function, by measuring the difference between initializing from OsRng and XorShiftRng.

The impact of reseeding the RNG in tread_rng is surprisingly high. tread_rng reseeded it every 2^15 generated bytes.
With a total init of time 4686ns, and with 2.733ns to generate an u32, the overhead is:
4686ns / (2^15 / 4 * 2.733ns) * 100% = 21%

This matches the benchmarks surprisingly well.

Maybe it is better to set the reseeding threshold of thread_rng a little higher, like 2^17 bytes. This makes the overhead 4× less, 5%. And reseeding happens every 131 kB.

But I suppose any threshold that is less than the number of rounds it takes to recover the internal state of an RNG ('crack' it) is good. HC-128 claims there is no better algorithm than a brute-force search of 2^128 values. Such numbers do not drop quickly, an improvement that could crack it in half the time would still take 2^127 tries.

A quick look the much weaker (and not actually recognised as cryptographically secure) RC-4 shows it has an attack that can recover the state after 2^26 rounds. I think we can safely say that for an actual cryptographic RNG the number should never drop that low. 2^26 rounds of 4 bytes means reseeding after generating 128 MB. That seems like a nicer compromise to me between not reseeding, and reseeding unnecessary often. With HC-128 running at its best speed it still means reseeding 14 times per second.

dhardy · 2018-01-01T12:57:02Z

Should we also consider Kravatte? I don't know a lot about it, but it's supposed to be a pseudo-random function from the Keccak team.

pitdicker · 2018-01-01T13:05:45Z

Nice find! I think there are quite a few cryptographically good RNG's to choose from.

I saw replacing ISAAC with HC-128 as a simple step because they are so similar. They follow roughly the same method, both are based on indirect array indexing. And they have roughly the seems performance, reportedly much faster than the alternatives. I probably sell it short by saying this, but HC-128 can be seen as an improved variant of ISAAC.

So it seems like a not too controversial change.

Exploring / collecting more CryptoRng's would be interesting though.

dhardy · 2018-01-22T18:18:37Z

@Lokathor can I get you to do some more Rasperry Pi benchmarks? HC-128 is merged into upstream master now; it would be nice to see the result of cargo bench --bench generators on ARM.

Lokathor · 2018-01-22T18:20:23Z

yeah, I'll get to this when I have a free moment.

Lokathor · 2018-01-28T23:28:48Z

pi@retropie:~/dev/rand $ cargo +nightly bench --bench generators
    Finished release [optimized] target(s) in 0.0 secs
     Running target/release/deps/generators-e972b27ec2312ee9

running 31 tests
test gen_bytes_chacha      ... bench:  17,644,512 ns/iter (+/- 398,746) = 58 MB/s
test gen_bytes_hc128       ... bench:   7,899,742 ns/iter (+/- 44,517) = 129 MB/s
test gen_bytes_isaac       ... bench:   4,246,930 ns/iter (+/- 80,216) = 241 MB/s
test gen_bytes_isaac64     ... bench:   3,641,228 ns/iter (+/- 35,539) = 281 MB/s
test gen_bytes_os          ... bench:  24,481,923 ns/iter (+/- 65,632) = 41 MB/s
test gen_bytes_std         ... bench:   4,244,826 ns/iter (+/- 50,662) = 241 MB/s
test gen_bytes_xorshift    ... bench:   2,685,297 ns/iter (+/- 5,213) = 381 MB/s
test gen_u32_chacha        ... bench:      68,660 ns/iter (+/- 292) = 58 MB/s
test gen_u32_hc128         ... bench:      30,628 ns/iter (+/- 222) = 130 MB/s
test gen_u32_isaac         ... bench:      22,280 ns/iter (+/- 95) = 179 MB/s
test gen_u32_isaac64       ... bench:      20,794 ns/iter (+/- 151) = 192 MB/s
test gen_u32_os            ... bench:   1,687,689 ns/iter (+/- 3,675) = 2 MB/s
test gen_u32_std           ... bench:      28,134 ns/iter (+/- 142) = 142 MB/s
test gen_u32_xorshift      ... bench:      10,853 ns/iter (+/- 32) = 368 MB/s
test gen_u64_chacha        ... bench:     141,694 ns/iter (+/- 704) = 56 MB/s
test gen_u64_hc128         ... bench:      56,631 ns/iter (+/- 288) = 141 MB/s
test gen_u64_isaac         ... bench:      42,022 ns/iter (+/- 213) = 190 MB/s
test gen_u64_isaac64       ... bench:      33,237 ns/iter (+/- 1,177) = 240 MB/s
test gen_u64_jitter        ... bench:     108,299 ns/iter (+/- 2,685)
test gen_u64_os            ... bench:   1,693,601 ns/iter (+/- 5,160) = 4 MB/s
test gen_u64_std           ... bench:      47,928 ns/iter (+/- 224) = 166 MB/s
test gen_u64_xorshift      ... bench:      17,547 ns/iter (+/- 1,078) = 455 MB/s
test init_chacha           ... bench:         493 ns/iter (+/- 2)
test init_hc128            ... bench:      40,840 ns/iter (+/- 117)
test init_isaac            ... bench:      16,890 ns/iter (+/- 40)
test init_isaac64          ... bench:      15,887 ns/iter (+/- 179)
test init_jitter           ... bench:     640,362 ns/iter (+/- 10,337)
test init_xorshift         ... bench:         105 ns/iter (+/- 0)
test reseeding_hc128_bytes ... bench:   7,903,153 ns/iter (+/- 32,645) = 129 MB/s
test reseeding_hc128_u32   ... bench:      35,869 ns/iter (+/- 56) = 111 MB/s
test reseeding_hc128_u64   ... bench:      70,728 ns/iter (+/- 147) = 113 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 31 measured; 0 filtered out

dhardy · 2018-01-29T10:21:53Z

Thanks @Lokathor. Looks like @pitdicker was right that ISAAC64 is faster than the 32-bit version everywhere; unfortunately HC-128 doesn't perform well. Comparison on my laptop (Haswell):

$ RUST_VER=nightly cargo bench --bench generators
    Finished release [optimized] target(s) in 0.0 secs
     Running target/release/deps/generators-0cb36d8f63ab1872

running 34 tests
test gen_bytes_chacha12 ... bench:   1,756,129 ns/iter (+/- 178,500) = 583 MB/s
test gen_bytes_chacha20 ... bench:   2,693,987 ns/iter (+/- 249,770) = 380 MB/s
test gen_bytes_chacha8  ... bench:   1,281,627 ns/iter (+/- 47,023) = 798 MB/s
test gen_bytes_hc128    ... bench:     629,247 ns/iter (+/- 81,467) = 1627 MB/s
test gen_bytes_isaac    ... bench:   1,108,569 ns/iter (+/- 31,206) = 923 MB/s
test gen_bytes_isaac64  ... bench:     591,091 ns/iter (+/- 57,612) = 1732 MB/s
test gen_bytes_os       ... bench:   6,163,858 ns/iter (+/- 307,591) = 166 MB/s
test gen_bytes_std      ... bench:     591,572 ns/iter (+/- 56,295) = 1730 MB/s
test gen_bytes_xorshift ... bench:     887,066 ns/iter (+/- 29,150) = 1154 MB/s

pitdicker · 2018-01-30T09:26:07Z

Wow! I did not expect such a large difference between ISAAC and HC-128. I wonder what causes it?

test gen_bytes_hc128       ... bench:   7,899,742 ns/iter (+/- 44,517) = 129 MB/s
test gen_bytes_isaac       ... bench:   4,246,930 ns/iter (+/- 80,216) = 241 MB/s
test gen_bytes_isaac64     ... bench:   3,641,228 ns/iter (+/- 35,539) = 281 MB/s

If the problem is code size or register pressure HC-128 can be optimized some more for this target. But if the performance difference is just because of the larger tables or the number of instructions this is it...

pitdicker · 2018-01-30T09:31:19Z

Looks like @pitdicker was right that ISAAC64 is faster than the 32-bit version everywhere

I should be careful not to claim too much... Actually I didn't notice it before your comment #53 (comment) 😄.

The other thing that jumps out from your results is that there is only one reason to use 32-bit Isaac over the 64-bit version: memory usage (which is in any case still poor).

dhardy · 2018-03-11T11:46:27Z

Implemented: rust-random#277

dhardy mentioned this issue Nov 26, 2017

SeedableRng::Seed and from_hashable #62

Closed

dhardy mentioned this issue Dec 8, 2017

Consider switching the thread RNG to OsRng rust-random/rand#78

Closed

pitdicker mentioned this issue Dec 14, 2017

Add HC-128 RNG #74

Merged

pitdicker mentioned this issue Dec 16, 2017

Add HC-128 rust-random/rand#210

Merged

dhardy added enhancement question labels Jan 2, 2018

pitdicker mentioned this issue Jan 9, 2018

Think about Isaac{,64}Rng rust-random/rand#15

Closed

dhardy mentioned this issue Jan 12, 2018

Tracker: planned changes for 0.5 rust-random/rand#232

Closed

33 tasks

dhardy added apparently solved and removed question labels Jan 22, 2018

pitdicker mentioned this issue Mar 1, 2018

Switch StdRng and thread_rng to HC-128 rust-random/rand#277

Merged

dhardy closed this as completed Mar 11, 2018

sameer mentioned this issue Aug 7, 2018

Use appropriate RNG for version 4 generation uuid-rs/uuid#52

Closed

ramosbugs mentioned this issue Apr 16, 2019

Pin to more recent version of rand ramosbugs/oauth2-rs#57

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace ISAAC with HC-128 #53

Replace ISAAC with HC-128 #53

pitdicker commented Nov 16, 2017

dhardy commented Nov 16, 2017

dhardy commented Dec 6, 2017

pitdicker commented Dec 7, 2017 •

edited

Loading

zackw commented Dec 8, 2017 •

edited

Loading

pitdicker commented Dec 8, 2017

pitdicker commented Dec 16, 2017 •

edited

Loading

dhardy commented Dec 16, 2017

pitdicker commented Dec 16, 2017

dhardy commented Jan 1, 2018

pitdicker commented Jan 1, 2018

dhardy commented Jan 22, 2018

Lokathor commented Jan 22, 2018

Lokathor commented Jan 28, 2018

dhardy commented Jan 29, 2018

pitdicker commented Jan 30, 2018

pitdicker commented Jan 30, 2018

dhardy commented Mar 11, 2018

Replace ISAAC with HC-128 #53

Replace ISAAC with HC-128 #53

Comments

pitdicker commented Nov 16, 2017

Security / predictability

Performance

Memory

dhardy commented Nov 16, 2017

dhardy commented Dec 6, 2017

pitdicker commented Dec 7, 2017 • edited Loading

zackw commented Dec 8, 2017 • edited Loading

pitdicker commented Dec 8, 2017

pitdicker commented Dec 16, 2017 • edited Loading

dhardy commented Dec 16, 2017

pitdicker commented Dec 16, 2017

dhardy commented Jan 1, 2018

pitdicker commented Jan 1, 2018

dhardy commented Jan 22, 2018

Lokathor commented Jan 22, 2018

Lokathor commented Jan 28, 2018

dhardy commented Jan 29, 2018

pitdicker commented Jan 30, 2018

pitdicker commented Jan 30, 2018

dhardy commented Mar 11, 2018

pitdicker commented Dec 7, 2017 •

edited

Loading

zackw commented Dec 8, 2017 •

edited

Loading

pitdicker commented Dec 16, 2017 •

edited

Loading