Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce wasted pseudorandom bytes in ChaCha20 + various improvements #26153

Merged
merged 11 commits into from
Feb 15, 2023

Conversation

sipa
Copy link
Member

@sipa sipa commented Sep 21, 2022

This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors.

It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead.

@sipa sipa force-pushed the 202209_chacha20 branch 2 times, most recently from d47e8c2 to d1aa65b Compare September 21, 2022 23:59
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 22, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ajtowns, dhruv
Concept ACK theStack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25325 (Add pool based memory resource by martinus)
  • #24748 (test/BIP324: functional tests for v2 P2P encryption by stratospher)
  • #23169 (fix initialization in FastRandomContext: removes undefined behavior & uninitialized read by martinus)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fanquake
Copy link
Member

cc @theStack re #25698 & #25712.

(by my benchmarking, somewhat faster),

@aureleoules want to benchmark here as well?

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.58 387,995,472.42 0.5% 16.98 5.93 2.866 0.03 0.0% 0.03 CHACHA20_1MB
ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.57 389,442,111.22 0.1% 16.98 5.89 2.883 0.03 0.0% 0.03 CHACHA20_1MB

Master

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.57 389,658,756.50 0.4% 17.03 5.89 2.892 0.05 0.0% 0.03 CHACHA20_1MB
ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.56 390,349,081.17 0.3% 17.03 5.87 2.900 0.05 0.0% 0.03 CHACHA20_1MB

#25354

ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.89 346,242,492.84 0.4% 20.13 6.62 3.039 0.08 0.0% 0.03 CHACHA20_1MB
ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
2.88 346,669,699.46 0.6% 20.13 6.59 3.052 0.08 0.0% 0.03 CHACHA20_1MB

Results seem very close to master and definitely faster than #25354.

src/crypto/chacha20.h Outdated Show resolved Hide resolved
Copy link
Member

@jarolrod jarolrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed something interesting, this PR is faster than master and #25354 when I tested on linux x86 but not on macOS arm64. When testing on macOS arm64, #25354 is faster:

macOS arm64

pr

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.45 |      687,478,118.34 |    0.1% |      0.02 | `CHACHA20_1MB`
|                1.48 |      676,913,544.12 |    0.4% |      0.01 | `CHACHA20_256BYTES`
|                1.53 |      652,987,660.64 |    0.3% |      0.01 | `CHACHA20_64BYTES`
|                4.12 |      242,489,722.97 |    0.8% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.05 |      488,239,791.48 |    0.0% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                5.16 |      193,768,633.08 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                2.61 |      383,482,499.71 |    1.1% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|                8.47 |      118,017,131.11 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                4.27 |      234,365,831.19 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

master

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.48 |      675,393,000.64 |    0.1% |      0.02 | `CHACHA20_1MB`
|                1.51 |      661,164,154.45 |    0.7% |      0.01 | `CHACHA20_256BYTES`
|                1.58 |      632,796,536.80 |    0.4% |      0.01 | `CHACHA20_64BYTES`
|                4.18 |      239,082,464.13 |    0.3% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.09 |      478,583,295.30 |    0.3% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                5.22 |      191,467,070.40 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                2.64 |      379,449,038.71 |    0.5% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|                8.32 |      120,243,551.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                4.19 |      238,565,070.80 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

25354

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.45 |      689,833,570.39 |    0.1% |      0.02 | `CHACHA20_1MB`
|                1.46 |      683,001,138.22 |    0.5% |      0.01 | `CHACHA20_256BYTES`
|                1.50 |      667,103,331.45 |    0.4% |      0.01 | `CHACHA20_64BYTES`
|                4.11 |      243,166,834.68 |    0.4% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.05 |      488,903,601.82 |    0.1% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                5.10 |      196,238,854.79 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                2.61 |      383,104,484.46 |    1.4% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|                8.02 |      124,743,480.76 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                4.05 |      246,740,398.27 |    0.1% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

Linux x86

pr

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.99 |      502,338,808.12 |    0.6% |           16.75 |            4.80 |  3.490 |           0.02 |    0.0% |      0.02 | `CHACHA20_1MB`
|                2.03 |      492,836,102.56 |    1.4% |           17.35 |            4.93 |  3.519 |           0.07 |    0.1% |      0.01 | `CHACHA20_256BYTES`
|                2.14 |      466,586,919.27 |    0.4% |           19.14 |            5.33 |  3.591 |           0.23 |    0.0% |      0.01 | `CHACHA20_64BYTES`
|                5.46 |      183,057,234.02 |    0.0% |           51.50 |           13.59 |  3.791 |           0.16 |    0.0% |      0.06 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.73 |      366,170,466.28 |    0.0% |           25.75 |            6.79 |  3.790 |           0.08 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                7.45 |      134,198,234.80 |    0.0% |           70.04 |           18.55 |  3.776 |           1.43 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                3.74 |      267,643,600.56 |    0.0% |           35.06 |            9.30 |  3.770 |           0.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|               13.01 |       76,836,476.82 |    0.1% |          122.10 |           32.39 |  3.770 |           5.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                6.94 |      144,054,781.63 |    0.5% |           61.21 |           16.39 |  3.735 |           2.49 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

master

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                2.00 |      499,954,466.20 |    0.3% |           16.88 |            4.84 |  3.486 |           0.03 |    0.0% |      0.02 | `CHACHA20_1MB`
|                2.05 |      486,935,198.75 |    1.4% |           17.37 |            4.97 |  3.496 |           0.05 |    0.0% |      0.01 | `CHACHA20_256BYTES`
|                2.22 |      450,220,269.82 |    0.4% |           18.86 |            5.37 |  3.512 |           0.13 |    0.0% |      0.01 | `CHACHA20_64BYTES`
|                5.68 |      176,107,008.87 |    0.2% |           51.76 |           13.70 |  3.777 |           0.19 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.83 |      353,532,983.23 |    0.1% |           25.88 |            6.85 |  3.779 |           0.09 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                7.72 |      129,486,776.95 |    0.2% |           72.62 |           18.74 |  3.875 |           2.06 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                3.89 |      257,245,464.66 |    0.3% |           36.35 |            9.40 |  3.865 |           1.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|               13.97 |       71,562,220.46 |    0.2% |          135.22 |           33.84 |  3.995 |           7.67 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                7.05 |      141,907,027.48 |    0.4% |           67.76 |           17.04 |  3.977 |           3.81 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

25354

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.98 |      505,256,746.75 |    0.1% |           16.98 |            4.80 |  3.540 |           0.05 |    0.0% |      0.02 | `CHACHA20_1MB`
|                2.06 |      484,794,937.08 |    0.2% |           17.46 |            5.00 |  3.493 |           0.08 |    0.0% |      0.01 | `CHACHA20_256BYTES`
|                2.22 |      449,753,059.35 |    0.3% |           18.89 |            5.38 |  3.511 |           0.17 |    0.0% |      0.01 | `CHACHA20_64BYTES`
|                5.66 |      176,546,195.18 |    0.7% |           51.97 |           13.63 |  3.813 |           0.22 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
|                2.81 |      355,375,284.35 |    0.1% |           25.99 |            6.81 |  3.817 |           0.11 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
|                8.04 |      124,406,853.03 |    0.4% |           73.18 |           19.44 |  3.765 |           2.14 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
|                4.03 |      248,427,666.34 |    0.3% |           36.63 |            9.74 |  3.760 |           1.07 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
|               15.09 |       66,259,474.70 |    0.4% |          136.82 |           36.48 |  3.751 |           7.92 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
|                7.53 |      132,846,610.84 |    0.4% |           68.57 |           18.20 |  3.768 |           3.94 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

@sipa
Copy link
Member Author

sipa commented Sep 27, 2022

@jarolrod It's possible that these differences are just compiler/cpu cache randomness. E.g. just having functions be layed out slightly differently in memory can cause a difference in slight timing differences. Objectively, this PR does less work per byte for long messages than master, while 25354 does more work per byte, but that's not always possible to benchmark when you're talking about a difference of a few CPU cycles per 64 bytes. The compiler may in some cases just emit slightly better or slightly worse code, just due to e.g. alignment differences in where the functions end up in the binary and the CPU caching effects.

@theStack
Copy link
Contributor

theStack commented Oct 2, 2022

Concept ACK

@sipa
Copy link
Member Author

sipa commented Oct 19, 2022

Rebased.

@ajtowns
Copy link
Contributor

ajtowns commented Feb 7, 2023

ut reACK 511aa4f

@dhruv
Copy link
Contributor

dhruv commented Feb 7, 2023

tACK crACK 511aa4f

Reviewed git range-diff ceb74b844c aca35e5 511aa4f1c7
Downstream BIP324 branches include 511aa4f with green CI

@fanquake fanquake merged commit 1e0198b into bitcoin:master Feb 15, 2023
@fanquake
Copy link
Member

Going to move this forward. @real-or-random would still be great for you to leave any post-merge commentary, if you get the time.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 15, 2023
Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some post-merge comments. None of them are critical. Sorry, many of them are not directly related to the changes in this PR, but I felt it's still easier that I add all of them to this review.

src/crypto/muhash.cpp Show resolved Hide resolved
src/crypto/chacha20.h Show resolved Hide resolved
src/crypto/chacha20.cpp Show resolved Hide resolved
src/crypto/chacha20.h Show resolved Hide resolved
src/crypto/chacha20.h Show resolved Hide resolved
@fanquake
Copy link
Member

fanquake commented Mar 8, 2023

@real-or-random thanks for circling around to this post-merge.

@sipa sipa mentioned this pull request May 12, 2023
43 tasks
fanquake added a commit to bitcoin-core/gui that referenced this pull request Aug 18, 2023
…tion & follow-ups

57cc136 crypto: make ChaCha20::SetKey wipe buffer (Pieter Wuille)
da0ec62 tests: miscellaneous hex / std::byte improvements (Pieter Wuille)
bdcbc85 fuzz: support std::byte in Consume{Fixed,Variable}LengthByteVector (Pieter Wuille)
7d1cd93 crypto: require key on ChaCha20 initialization (Pieter Wuille)
44c1176 random: simplify FastRandomContext::randbytes using fillrand (Pieter Wuille)
3da636e crypto: refactor ChaCha20 classes to use Span<std::byte> interface (Pieter Wuille)

Pull request description:

  This modernizes the ChaCha20 and ChaCha20Aligned interfaces to be `Span<std::byte>` based, and other improvements.

  * Modifies all functions and constructors of `ChaCha20` and `ChaCha20Aligned` to be `Span<std::byte>` based (aligning them with `FSChaCha20`, `AEADChaCha20Poly1305`, and `FSChaCha20Poly1305`)
  * Remove default constructors, to make sure all call sites provide a key (suggested in bitcoin/bitcoin#26153 (comment))
  * Wipe key material on rekey for security (suggested in bitcoin/bitcoin#26153 (comment))
  * Use `HexStr` on byte vectors in tests (suggested in bitcoin/bitcoin#27993 (comment))
  * Support `std::byte` vectors in `ConsumeRandomLengthByteVector` and `ConsumeFixedLengthByteVector`, and use it (suggested in bitcoin/bitcoin#27993 (comment))
  * And a few more.

  While related, I don't see this as a necessary for BIP324.

ACKs for top commit:
  stratospher:
    ACK 57cc136.
  theStack:
    re-ACK 57cc136

Tree-SHA512: 361da4ff003c8465a32eeac0983a8a6f047dbbf5b400168b409c8e3234e79d577fc854e0764389446585da3e12b964c94dd67fc0c9c1d1d092cec296121e05d4
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 8, 2023
…ollow-ups

57cc136 crypto: make ChaCha20::SetKey wipe buffer (Pieter Wuille)
da0ec62 tests: miscellaneous hex / std::byte improvements (Pieter Wuille)
bdcbc85 fuzz: support std::byte in Consume{Fixed,Variable}LengthByteVector (Pieter Wuille)
7d1cd93 crypto: require key on ChaCha20 initialization (Pieter Wuille)
44c1176 random: simplify FastRandomContext::randbytes using fillrand (Pieter Wuille)
3da636e crypto: refactor ChaCha20 classes to use Span<std::byte> interface (Pieter Wuille)

Pull request description:

  This modernizes the ChaCha20 and ChaCha20Aligned interfaces to be `Span<std::byte>` based, and other improvements.

  * Modifies all functions and constructors of `ChaCha20` and `ChaCha20Aligned` to be `Span<std::byte>` based (aligning them with `FSChaCha20`, `AEADChaCha20Poly1305`, and `FSChaCha20Poly1305`)
  * Remove default constructors, to make sure all call sites provide a key (suggested in bitcoin#26153 (comment))
  * Wipe key material on rekey for security (suggested in bitcoin#26153 (comment))
  * Use `HexStr` on byte vectors in tests (suggested in bitcoin#27993 (comment))
  * Support `std::byte` vectors in `ConsumeRandomLengthByteVector` and `ConsumeFixedLengthByteVector`, and use it (suggested in bitcoin#27993 (comment))
  * And a few more.

  While related, I don't see this as a necessary for BIP324.

ACKs for top commit:
  stratospher:
    ACK 57cc136.
  theStack:
    re-ACK 57cc136

Tree-SHA512: 361da4ff003c8465a32eeac0983a8a6f047dbbf5b400168b409c8e3234e79d577fc854e0764389446585da3e12b964c94dd67fc0c9c1d1d092cec296121e05d4
ogabrielides pushed a commit to ogabrielides/dash that referenced this pull request Nov 15, 2023
…arious improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
ogabrielides added a commit to ogabrielides/dash that referenced this pull request Nov 15, 2023
ogabrielides pushed a commit to ogabrielides/dash that referenced this pull request Nov 17, 2023
…arious improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
ogabrielides pushed a commit to ogabrielides/dash that referenced this pull request Nov 17, 2023
…arious improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
UdjinM6 pushed a commit to ogabrielides/dash that referenced this pull request Nov 17, 2023
…arious improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
PastaPastaPasta pushed a commit to ogabrielides/dash that referenced this pull request Nov 19, 2023
…arious improvements

511aa4f Add unit test for ChaCha20's new caching (Pieter Wuille)
fb243d2 Improve test vectors for ChaCha20 (Pieter Wuille)
93aee8b Inline ChaCha20 32-byte specific constants (Pieter Wuille)
62ec713 Only support 32-byte keys in ChaCha20{,Aligned} (Pieter Wuille)
f21994a Use ChaCha20Aligned in MuHash3072 code (Pieter Wuille)
5d16f75 Use ChaCha20 caching in FastRandomContext (Pieter Wuille)
38eaece Add fuzz test for testing that ChaCha20 works as a stream (Pieter Wuille)
5f05b27 Add xoroshiro128++ PRNG (Martin Leitner-Ankerl)
12ff724 Make unrestricted ChaCha20 cipher not waste keystream bytes (Pieter Wuille)
6babf40 Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 (Pieter Wuille)
e37bcaa Split ChaCha20 into aligned/unaligned variants (Pieter Wuille)

Pull request description:

  This is an alternative to bitcoin#25354 (by my benchmarking, somewhat faster), subsumes bitcoin#25712, and adds additional test vectors.

  It separates the multiple-of-64-bytes-only "core" logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20's new built-in caching.

  I thought about rebasing bitcoin#25712 on top of this, but the changes before are fairly extensive, so redid it instead.

ACKs for top commit:
  ajtowns:
    ut reACK 511aa4f
  dhruv:
    tACK crACK 511aa4f

Tree-SHA512: 3aa80971322a93e780c75a8d35bd39da3a9ea570fbae4491eaf0c45242f5f670a24a592c50ad870d5fd09b9f88ec06e274e8aa3cefd9561d623c63f7198cf2c7
@bitcoin bitcoin locked and limited conversation to collaborators Jul 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.