-
Notifications
You must be signed in to change notification settings - Fork 36.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce wasted pseudorandom bytes in ChaCha20 + various improvements #26153
Conversation
d47e8c2
to
d1aa65b
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
cc @theStack re #25698 & #25712.
@aureleoules want to benchmark here as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.58 | 387,995,472.42 | 0.5% | 16.98 | 5.93 | 2.866 | 0.03 | 0.0% | 0.03 | CHACHA20_1MB |
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.57 | 389,442,111.22 | 0.1% | 16.98 | 5.89 | 2.883 | 0.03 | 0.0% | 0.03 | CHACHA20_1MB |
Master
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.57 | 389,658,756.50 | 0.4% | 17.03 | 5.89 | 2.892 | 0.05 | 0.0% | 0.03 | CHACHA20_1MB |
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.56 | 390,349,081.17 | 0.3% | 17.03 | 5.87 | 2.900 | 0.05 | 0.0% | 0.03 | CHACHA20_1MB |
#25354
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.89 | 346,242,492.84 | 0.4% | 20.13 | 6.62 | 3.039 | 0.08 | 0.0% | 0.03 | CHACHA20_1MB |
ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
2.88 | 346,669,699.46 | 0.6% | 20.13 | 6.59 | 3.052 | 0.08 | 0.0% | 0.03 | CHACHA20_1MB |
Results seem very close to master and definitely faster than #25354.
d1aa65b
to
c8cbfef
Compare
c8cbfef
to
a2cecfe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticed something interesting, this PR is faster than master and #25354 when I tested on linux x86 but not on macOS arm64. When testing on macOS arm64, #25354 is faster:
macOS arm64
pr
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.45 | 687,478,118.34 | 0.1% | 0.02 | `CHACHA20_1MB`
| 1.48 | 676,913,544.12 | 0.4% | 0.01 | `CHACHA20_256BYTES`
| 1.53 | 652,987,660.64 | 0.3% | 0.01 | `CHACHA20_64BYTES`
| 4.12 | 242,489,722.97 | 0.8% | 0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.05 | 488,239,791.48 | 0.0% | 0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 5.16 | 193,768,633.08 | 0.3% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 2.61 | 383,482,499.71 | 1.1% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 8.47 | 118,017,131.11 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 4.27 | 234,365,831.19 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
master
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.48 | 675,393,000.64 | 0.1% | 0.02 | `CHACHA20_1MB`
| 1.51 | 661,164,154.45 | 0.7% | 0.01 | `CHACHA20_256BYTES`
| 1.58 | 632,796,536.80 | 0.4% | 0.01 | `CHACHA20_64BYTES`
| 4.18 | 239,082,464.13 | 0.3% | 0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.09 | 478,583,295.30 | 0.3% | 0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 5.22 | 191,467,070.40 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 2.64 | 379,449,038.71 | 0.5% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 8.32 | 120,243,551.71 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 4.19 | 238,565,070.80 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
25354
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.45 | 689,833,570.39 | 0.1% | 0.02 | `CHACHA20_1MB`
| 1.46 | 683,001,138.22 | 0.5% | 0.01 | `CHACHA20_256BYTES`
| 1.50 | 667,103,331.45 | 0.4% | 0.01 | `CHACHA20_64BYTES`
| 4.11 | 243,166,834.68 | 0.4% | 0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.05 | 488,903,601.82 | 0.1% | 0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 5.10 | 196,238,854.79 | 0.3% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 2.61 | 383,104,484.46 | 1.4% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 8.02 | 124,743,480.76 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 4.05 | 246,740,398.27 | 0.1% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
Linux x86
pr
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 1.99 | 502,338,808.12 | 0.6% | 16.75 | 4.80 | 3.490 | 0.02 | 0.0% | 0.02 | `CHACHA20_1MB`
| 2.03 | 492,836,102.56 | 1.4% | 17.35 | 4.93 | 3.519 | 0.07 | 0.1% | 0.01 | `CHACHA20_256BYTES`
| 2.14 | 466,586,919.27 | 0.4% | 19.14 | 5.33 | 3.591 | 0.23 | 0.0% | 0.01 | `CHACHA20_64BYTES`
| 5.46 | 183,057,234.02 | 0.0% | 51.50 | 13.59 | 3.791 | 0.16 | 0.0% | 0.06 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.73 | 366,170,466.28 | 0.0% | 25.75 | 6.79 | 3.790 | 0.08 | 0.0% | 0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 7.45 | 134,198,234.80 | 0.0% | 70.04 | 18.55 | 3.776 | 1.43 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 3.74 | 267,643,600.56 | 0.0% | 35.06 | 9.30 | 3.770 | 0.71 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 13.01 | 76,836,476.82 | 0.1% | 122.10 | 32.39 | 3.770 | 5.02 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 6.94 | 144,054,781.63 | 0.5% | 61.21 | 16.39 | 3.735 | 2.49 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
master
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.00 | 499,954,466.20 | 0.3% | 16.88 | 4.84 | 3.486 | 0.03 | 0.0% | 0.02 | `CHACHA20_1MB`
| 2.05 | 486,935,198.75 | 1.4% | 17.37 | 4.97 | 3.496 | 0.05 | 0.0% | 0.01 | `CHACHA20_256BYTES`
| 2.22 | 450,220,269.82 | 0.4% | 18.86 | 5.37 | 3.512 | 0.13 | 0.0% | 0.01 | `CHACHA20_64BYTES`
| 5.68 | 176,107,008.87 | 0.2% | 51.76 | 13.70 | 3.777 | 0.19 | 0.0% | 0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.83 | 353,532,983.23 | 0.1% | 25.88 | 6.85 | 3.779 | 0.09 | 0.0% | 0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 7.72 | 129,486,776.95 | 0.2% | 72.62 | 18.74 | 3.875 | 2.06 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 3.89 | 257,245,464.66 | 0.3% | 36.35 | 9.40 | 3.865 | 1.02 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 13.97 | 71,562,220.46 | 0.2% | 135.22 | 33.84 | 3.995 | 7.67 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 7.05 | 141,907,027.48 | 0.4% | 67.76 | 17.04 | 3.977 | 3.81 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
25354
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 1.98 | 505,256,746.75 | 0.1% | 16.98 | 4.80 | 3.540 | 0.05 | 0.0% | 0.02 | `CHACHA20_1MB`
| 2.06 | 484,794,937.08 | 0.2% | 17.46 | 5.00 | 3.493 | 0.08 | 0.0% | 0.01 | `CHACHA20_256BYTES`
| 2.22 | 449,753,059.35 | 0.3% | 18.89 | 5.38 | 3.511 | 0.17 | 0.0% | 0.01 | `CHACHA20_64BYTES`
| 5.66 | 176,546,195.18 | 0.7% | 51.97 | 13.63 | 3.813 | 0.22 | 0.0% | 0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
| 2.81 | 355,375,284.35 | 0.1% | 25.99 | 6.81 | 3.817 | 0.11 | 0.0% | 0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
| 8.04 | 124,406,853.03 | 0.4% | 73.18 | 19.44 | 3.765 | 2.14 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
| 4.03 | 248,427,666.34 | 0.3% | 36.63 | 9.74 | 3.760 | 1.07 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
| 15.09 | 66,259,474.70 | 0.4% | 136.82 | 36.48 | 3.751 | 7.92 | 0.2% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
| 7.53 | 132,846,610.84 | 0.4% | 68.57 | 18.20 | 3.768 | 3.94 | 0.0% | 0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
@jarolrod It's possible that these differences are just compiler/cpu cache randomness. E.g. just having functions be layed out slightly differently in memory can cause a difference in slight timing differences. Objectively, this PR does less work per byte for long messages than master, while 25354 does more work per byte, but that's not always possible to benchmark when you're talking about a difference of a few CPU cycles per 64 bytes. The compiler may in some cases just emit slightly better or slightly worse code, just due to e.g. alignment differences in where the functions end up in the binary and the CPU caching effects. |
Concept ACK |
Rebased. |
4fce8c2
to
a744c87
Compare
ut reACK 511aa4f |
Going to move this forward. @real-or-random would still be great for you to leave any post-merge commentary, if you get the time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some post-merge comments. None of them are critical. Sorry, many of them are not directly related to the changes in this PR, but I felt it's still easier that I add all of them to this review.
@real-or-random thanks for circling around to this post-merge. |
…tion & follow-ups 57cc136 crypto: make ChaCha20::SetKey wipe buffer (Pieter Wuille) da0ec62 tests: miscellaneous hex / std::byte improvements (Pieter Wuille) bdcbc85 fuzz: support std::byte in Consume{Fixed,Variable}LengthByteVector (Pieter Wuille) 7d1cd93 crypto: require key on ChaCha20 initialization (Pieter Wuille) 44c1176 random: simplify FastRandomContext::randbytes using fillrand (Pieter Wuille) 3da636e crypto: refactor ChaCha20 classes to use Span<std::byte> interface (Pieter Wuille) Pull request description: This modernizes the ChaCha20 and ChaCha20Aligned interfaces to be `Span<std::byte>` based, and other improvements. * Modifies all functions and constructors of `ChaCha20` and `ChaCha20Aligned` to be `Span<std::byte>` based (aligning them with `FSChaCha20`, `AEADChaCha20Poly1305`, and `FSChaCha20Poly1305`) * Remove default constructors, to make sure all call sites provide a key (suggested in bitcoin/bitcoin#26153 (comment)) * Wipe key material on rekey for security (suggested in bitcoin/bitcoin#26153 (comment)) * Use `HexStr` on byte vectors in tests (suggested in bitcoin/bitcoin#27993 (comment)) * Support `std::byte` vectors in `ConsumeRandomLengthByteVector` and `ConsumeFixedLengthByteVector`, and use it (suggested in bitcoin/bitcoin#27993 (comment)) * And a few more. While related, I don't see this as a necessary for BIP324. ACKs for top commit: stratospher: ACK 57cc136. theStack: re-ACK 57cc136 Tree-SHA512: 361da4ff003c8465a32eeac0983a8a6f047dbbf5b400168b409c8e3234e79d577fc854e0764389446585da3e12b964c94dd67fc0c9c1d1d092cec296121e05d4
…ollow-ups 57cc136 crypto: make ChaCha20::SetKey wipe buffer (Pieter Wuille) da0ec62 tests: miscellaneous hex / std::byte improvements (Pieter Wuille) bdcbc85 fuzz: support std::byte in Consume{Fixed,Variable}LengthByteVector (Pieter Wuille) 7d1cd93 crypto: require key on ChaCha20 initialization (Pieter Wuille) 44c1176 random: simplify FastRandomContext::randbytes using fillrand (Pieter Wuille) 3da636e crypto: refactor ChaCha20 classes to use Span<std::byte> interface (Pieter Wuille) Pull request description: This modernizes the ChaCha20 and ChaCha20Aligned interfaces to be `Span<std::byte>` based, and other improvements. * Modifies all functions and constructors of `ChaCha20` and `ChaCha20Aligned` to be `Span<std::byte>` based (aligning them with `FSChaCha20`, `AEADChaCha20Poly1305`, and `FSChaCha20Poly1305`) * Remove default constructors, to make sure all call sites provide a key (suggested in bitcoin#26153 (comment)) * Wipe key material on rekey for security (suggested in bitcoin#26153 (comment)) * Use `HexStr` on byte vectors in tests (suggested in bitcoin#27993 (comment)) * Support `std::byte` vectors in `ConsumeRandomLengthByteVector` and `ConsumeFixedLengthByteVector`, and use it (suggested in bitcoin#27993 (comment)) * And a few more. While related, I don't see this as a necessary for BIP324. ACKs for top commit: stratospher: ACK 57cc136. theStack: re-ACK 57cc136 Tree-SHA512: 361da4ff003c8465a32eeac0983a8a6f047dbbf5b400168b409c8e3234e79d577fc854e0764389446585da3e12b964c94dd67fc0c9c1d1d092cec296121e05d4
…arious improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
…arious improvements
…arious improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
…arious improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
…arious improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
…arious improvements 511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille) fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille) 93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille) 62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille) f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille) 5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille) 38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille) 5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl) 12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille) 6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille) e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille) Pull request description: This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors. It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching. I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead. ACKs for top commit: ajtowns: ut reACK 511aa4f dhruv: tACK crACK 511aa4f Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors.
It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.
I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead.