Several randomness improvements #29625

sipa · 2024-03-11T19:36:49Z

This PR contains a number of vaguely-related improvements to the random module.

The specific changes and more detailed rationale are in the commit messages, but the highlights are:

XoRoShiRo128PlusPlus (previously a test-only RNG) moves to random.h and becomes InsecureRandomContext, which is even faster than FastRandomContext but non-cryptographic. It also gains all the randomness helper functions (randrange, fillrand, ...), making it much more succinct to use.
During tests, all randomness is made deterministic (except for GetStrongRandBytes) but non-repeating (like GetRand() used to be when g_mock_deterministic_tests was used), seeded either with a fixed value or with a random seed (which can be overridden by an environment variable).
Several infrequently used top-level functions (GetRandMillis, GetRandMicros, GetExponentialRand) are converted into member functions of FastRandomContext (and InsecureRandomContext).
GetRand<T>() (without argument) can now return the maximum value of the type (previously e.g. GetRand<uint32_t>() would never return 0xffffffff).

DrahtBot · 2024-03-11T19:36:52Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
#29543 (refactor: Avoid unsigned integer overflow in script/interpreter.cpp by hebasto)
#29480 (Drop log category in SeedStartup by hebasto)
#29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
#26114 (net: Make AddrFetch connections to fixed seeds by mzumsande)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot · 2024-03-12T04:41:53Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22530315845

dergoegge · 2024-03-12T10:09:39Z

Concept ACK

There are a few places in the fuzz tests where this will make it easy to replace FastRandomContext with an InsecureRandomContext, which is beneficial for performance (e.g. the addrman harnesses partially fill the addrman with addresses from rng), and we don't need a cryptographic rng there anyway.

> During tests, all randomness is made deterministic

Great, this should help with #29018

Sjors · 2024-03-12T15:11:44Z

What's the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

dergoegge · 2024-03-12T15:53:35Z

> What's the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

I would expect that switching to a different rng should have no meaningful effect on the corpus itself. The corpus for a particular harness might change but the coverage for the code we intend to test should remain the same. This is because using rng in a fuzz harness only makes sense in very rare cases. It should never be used in a way that can significantly affect the coverage reached, otherwise there is no point in using a coverage-guided fuzzer, we could just pipe /dev/random to our harnesses.

For example, if we need to populate some data that we don't really expect to have an impact on the thing we are testing, we might use rng instead of consuming from the fuzz input (we do this in the p2p transport harnesses to fill message contents, which are essentially irrelevant to the transport logic).

Switching to deterministic rng can cause a corpus' coverage to grow because coverage-guided feedback loops start working more reliably when the code under test is deterministic. This can vary from harness to harness, but we've seen coverage-guided fuzzers find bugs once we've improved on determinism.

DrahtBot · 2024-03-15T19:04:07Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22719690018

sipa · 2024-03-17T15:50:07Z

Ready for review.

sipa · 2024-03-23T13:23:57Z

@Sjors

> What's the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

Let's break it down into cases:

The main "random.h" RNG (GetRand() and friends, default-constructed FastRandomContext objects, ...). This used to be truly random in fuzz tests, and will now be deterministic with a fixed seed. In theory this should have no impact, because the fuzz tests shouldn't be relying on this randomness in the first place. But possibly some do, indirectly, through code that wasn't properly mocked or otherwise kept from using it; in that case, making things deterministic should be a strict improvement, because the fuzzer no longer wastes time chasing the effects of that randomness.
XoRoShiRo128PlusPlus. There are a few fuzz tests (bip324, crypto_chacha20, p2p_transport_serialization, poolresource) that use this very fast deterministic RNG to construct certain data. This PR changes the behavior of some of them by replacing the ad-hoc code that consumes that randomness with general helper functions that become available for all RNGs. In theory this might invalidate part of the fuzz corpus for those tests, but in practice I expect it won't, because the data drawn from those RNGs shouldn't matter much for the test (if it did, it would be drawn from the fuzz input instead).

So overall, it might invalidate a few tests' corpus (but probably not), and for others it should either have no effect or be a strict improvement.

EthanHeilman · 2024-04-05T22:10:26Z

@sipa I plan to do a review of this next week

dergoegge · 2024-04-30T09:18:02Z

src/net_processing.cpp

@@ -5501,7 +5503,7 @@ void PeerManagerImpl::MaybeSendFeefilter(CNode& pto, Peer& peer, std::chrono::mi
            MakeAndPushMessage(pto, NetMsgType::FEEFILTER, filterToSend);
            peer.m_fee_filter_sent = filterToSend;
        }
-        peer.m_next_send_feefilter = GetExponentialRand(current_time, AVG_FEEFILTER_BROADCAST_INTERVAL);
+        peer.m_next_send_feefilter = current_time + FastRandomContext().rand_expo_duration(AVG_FEEFILTER_BROADCAST_INTERVAL);
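The new call site adds an exponentially distributed delay to `current_time`. A minimal sketch of how such a delay can be drawn by inverse-transform sampling, assuming any generator that returns 64 random bits per call (the free function `RandExpoDuration` is hypothetical, not the PR's member function):

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical sketch of an exponentially distributed delay with the given mean.
// Inverse-transform sampling: if u is uniform in [0, 1), then -ln(1 - u) is
// exponentially distributed with mean 1; scale it by `mean`.
template <typename Rng>
std::chrono::microseconds RandExpoDuration(Rng& rng, std::chrono::microseconds mean)
{
    // Keep the top 53 of 64 random bits so u is an exact double in [0, 1).
    const double u = static_cast<double>(rng() >> 11) * 0x1.0p-53;
    const double unscaled = -std::log1p(-u); // -ln(1 - u), finite because u < 1
    return std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::duration<double, std::micro>(unscaled * mean.count()));
}
```

A call site could then read along the lines of `current_time + RandExpoDuration(m_rng, AVG_FEEFILTER_BROADCAST_INTERVAL)`, drawing from an existing generator rather than a freshly constructed one.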


#28558 made PeerManager own a FastRandomContext, so we could (should?) use m_rng here instead (otherwise PeerManager::Options::deterministic_rng still only applies to some of the randomness).

Since this PR kind of makes individual "make this component deterministic" options redundant, we could consider reverting #28558 (not necessarily in this PR)?

I was thinking that in the long run we could break the dependencies between components and the specific rng they use (maybe something like template<RandomNumberGenerator R> class PeerManager { ... }), which would allow more fine grained mocking than a global "make rng deterministic" in tests (e.g. we could have a "rng" type that consumes from a FuzzedDataProvider). I guess this can be done by using globals as well.

sipa · 2024-04-30

> #28558 made PeerManager own a FastRandomContext, so we could (should?) use m_rng here instead (otherwise PeerManager::Options::deterministic_rng still only applies to some of the randomness).

I've changed the PR to reuse PeerManagerImpl::m_rng.

> Since this PR kind of makes individual "make this component deterministic" options redundant, we could consider reverting #28558 (not necessarily in this PR)?

Maybe. I've opted to use it where possible for now as it's a smaller change, and has some (possibly negligible) performance advantage (no need to lock the global RNG mutex to get randomness when you already hold g_msgproc_mutex), but I think that can be reconsidered.

Independently, we may be able to just drop PeerManager::Options::deterministic_rng, relying on global deterministic mode instead.

> I was thinking that in the long run we could break the dependencies between components and the specific rng they use (maybe something like template<RandomNumberGenerator R> class PeerManager { ... }), which would allow more fine grained mocking than a global "make rng deterministic" in tests (e.g. we could have a "rng" type that consumes from a FuzzedDataProvider). I guess this can be done by using globals as well.

Maybe, though that means testing something very different from what we're doing here: testing under conditions where the RNG returns decidedly non-random results. That is different from a deterministic FastRandomContext, which is still cryptographically strong, just deterministic. I don't know for how many things this makes sense.

Rather than making all the useful types of randomness exclusive to FastRandomContext, move them to a separate RandomMixin class where they can be reused by other RNGs. The Curiously Recurring Template Pattern (CRTP) is used for this, so that individual RNG classes can override one or more randomness functions without the runtime cost of virtual functions. Specifically, RNGs are expected to provide only fillrand and rand64, while all others are derived from those:
- randbits
- randrange
- randbytes
- rand32
- rand256
- randbool
- rand_uniform_delay
- rand_uniform_duration
- min(), max(), operator()(), to comply with the C++ URBG concept.

The previous randbits code would, when requesting more randomness than available in its random bits buffer, discard the remaining entropy and generate new. Benchmarks show that it's usually better to first consume the existing randomness and only then generate new bits. This adds some complexity to randbits, but that is outweighed by the reduced need to generate more randomness.
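A sketch of the buffered strategy, assuming the low bits of each 64-bit word are consumed first (a choice made for this sketch). `BufferedBits` is hypothetical, and its `rand64()` is a non-random placeholder so that draws can be counted:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of buffered randbits(): leftover bits from the previous
// 64-bit draw are used first, and only the shortfall comes from a new draw.
class BufferedBits
{
    uint64_t m_bitbuf = 0;  // unused random bits; only the low m_bitbuf_size bits are valid
    int m_bitbuf_size = 0;  // always in [0, 63]

    static uint64_t Mask(int bits) { return bits == 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1; }

public:
    int draws = 0; // number of 64-bit values generated so far

    // Placeholder generator (NOT random) so the example is checkable.
    uint64_t rand64() { ++draws; return 0x0123456789abcdefULL * draws; }

    uint64_t randbits(int bits) // 0 <= bits <= 64
    {
        if (bits <= m_bitbuf_size) {
            // Enough buffered bits: serve entirely from the buffer.
            const uint64_t ret = m_bitbuf & Mask(bits);
            m_bitbuf >>= bits; // bits <= 63 here, so the shift is defined
            m_bitbuf_size -= bits;
            return ret;
        }
        // Use every buffered bit, then take the remaining `need` bits from a fresh draw.
        const int need = bits - m_bitbuf_size;
        const uint64_t gen = rand64();
        const uint64_t ret = m_bitbuf | ((gen & Mask(need)) << m_bitbuf_size);
        m_bitbuf = need == 64 ? 0 : gen >> need; // keep the unused high bits
        m_bitbuf_size = 64 - need;
        return ret;
    }
};
```

Three 40-bit requests consume 120 bits, which fit in two 64-bit draws; discarding leftovers would need three.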

In many cases, the number of bits requested from randbits is known at compile time. Provide a variant of randbits that takes this number as a template parameter, so the compiler can make use of this knowledge. This is used immediately in rand32() and randbool(), and at a few further call sites.
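A minimal sketch of a compile-time bit count, assuming a plain `rand64()` source and no bit buffering; `ConstBitsRng` is hypothetical and its generator is a placeholder Weyl sequence, not a real RNG:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: with the bit count as a template parameter, the range
// check happens at compile time and the shift amount is a constant the
// compiler can fold. Bit buffering is omitted to keep the example short.
struct ConstBitsRng
{
    uint64_t state = 0;
    uint64_t rand64() { return state += 0x9e3779b97f4a7c15ULL; } // placeholder, NOT random

    template <int Bits>
    uint64_t randbits()
    {
        static_assert(Bits >= 0 && Bits <= 64, "Bits must be in [0, 64]");
        if constexpr (Bits == 0) {
            return 0; // no draw needed
        } else {
            return rand64() >> (64 - Bits); // top Bits bits of one draw
        }
    }

    uint32_t rand32() { return static_cast<uint32_t>(randbits<32>()); }
    bool randbool() { return randbits<1>() != 0; }
};
```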

Make use of C++20 functions in XoRoShiRo128PlusPlus.

This is preparation for making it more generally accessible.

Convert XoRoShiRo128PlusPlus into a full RandomMixin-based RNG class, providing all utility functionality that FastRandomContext has. In doing so, it is renamed to InsecureRandomContext, highlighting its non-cryptographic nature. To do this, a fillrand fallback is added to RandomMixin (where it is used by InsecureRandomContext), but FastRandomContext still uses its own fillrand.

The existing code provides two randomness mechanisms for test purposes:
- g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is initialized using either zeros (SeedRand::ZEROS) or environment-provided randomness (SeedRand::SEED).
- g_mock_deterministic_tests, which, if set, controls some (but not all) of the normal randomness output, but then makes it extremely predictable (identical output repeatedly).

Replace these with a single mechanism, which retains the SeedRand modes to control all randomness. There is a new internal deterministic PRNG inside the random module, which is used in GetRandBytes() when in test mode, and which is also used to initialize g_insecure_rand_ctx. This means that during tests, all random numbers are made deterministic. There is one exception, GetStrongRandBytes(), which even in test mode still uses the normal PRNG state.

This probably opens the door to removing a lot of the ad-hoc "deterministic" mode functions littered through the codebase (by simply running relevant tests in SeedRand::ZEROS mode), but this isn't done yet.
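A simplified sketch of a single test-mode switch, assuming `std::mt19937_64` as the deterministic PRNG and `std::random_device` as a stand-in for the normal RNG. `SeedMode`, `SeedForTests`, `GetRand64` and the `RANDOM_CTX_SEED` environment variable name are illustrative assumptions, and the sketch does not model the GetStrongRandBytes() exception:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <optional>
#include <random>
#include <string>

// Hypothetical sketch: once seeded for tests, every draw from the global RNG
// comes from one deterministic generator, so runs are reproducible, yet
// successive draws still differ (unlike a mock that repeats one value).
enum class SeedMode { ZEROS, SEED };

namespace {
std::mutex g_mutex;
std::optional<std::mt19937_64> g_test_rng; // engaged only in test mode
std::random_device g_rd;                    // stand-in for the normal RNG
} // namespace

void SeedForTests(SeedMode mode)
{
    std::lock_guard lock{g_mutex};
    uint64_t seed = 0; // SeedMode::ZEROS
    if (mode == SeedMode::SEED) {
        // Reuse a hex seed from the environment if provided (variable name is
        // an assumption); otherwise draw a fresh one, which a real harness
        // would print so that failures can be reproduced.
        if (const char* env = std::getenv("RANDOM_CTX_SEED")) {
            seed = std::stoull(env, nullptr, 16);
        } else {
            seed = (uint64_t{g_rd()} << 32) | g_rd();
        }
    }
    g_test_rng.emplace(seed);
}

uint64_t GetRand64()
{
    std::lock_guard lock{g_mutex};
    if (g_test_rng) return (*g_test_rng)(); // test mode: deterministic stream
    return (uint64_t{g_rd()} << 32) | g_rd();
}
```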

The existing code uses GetRand(nMax), with a default value for nMax, where nMax is the range of values (not the maximum!) that the output is allowed to take. With the default (the type's maximum), this always misses the last possible value (e.g. GetRand<uint32_t>() will never return 0xffffffff). Fix this by moving most of the functionality into RandomMixin, and by adding a separate RandomMixin::rand function, which returns a value in the entire (non-negative) range of an integer type.
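A sketch contrasting the two behaviors on `uint8_t`, so the whole range can be checked. `OldGetRand` and `NewRand` are hypothetical names, and the modulo reduction in the old-style version is a simplification, not the original implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

// Old style: nMax is a *count* of values, so with the default nMax = max()
// the output lies in [0, max() - 1] and max() itself never appears.
template <typename T, typename Rng>
T OldGetRand(Rng& rng, T nMax = std::numeric_limits<T>::max())
{
    return static_cast<T>(rng() % nMax); // modulo bias ignored for brevity
}

// New style: draw over the entire non-negative range of T, including max().
template <typename T, typename Rng>
T NewRand(Rng& rng)
{
    return static_cast<T>(rng() & std::numeric_limits<T>::max());
}
```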

There are only a few call sites of these throughout the codebase, so move the functionality into FastRandomContext and rewrite all call sites. This requires callers to explicitly construct FastRandomContext objects, which adds some verbosity, but also makes apparent the locations where the code could be improved by reusing a FastRandomContext object.

This simultaneously allows some queries to be redirected to the PeerManagerImpl::m_rng instance rather than a fresh context.

DrahtBot added the CI failed label Mar 12, 2024

This was referenced Mar 12, 2024

net: Make AddrFetch connections to fixed seeds #26114 (Open)

Enable HW-accelerated implementations of SHA256 for MSVC builds #24773 (Closed)

sipa force-pushed the 202403_rand_rework branch 2 times, most recently from b8d2aa9 to 3ad67b0 on March 12, 2024 20:55

This was referenced Mar 13, 2024

scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint #29641 (Draft)

Drop log category in SeedStartup #29480 (Open)

sipa force-pushed the 202403_rand_rework branch 8 times, most recently from c07a68c to b5c10a4 on March 13, 2024 19:32

DrahtBot mentioned this pull request Mar 14, 2024

Make (Read/Write)BinaryFile work with char vector, use AutoFile #29229 (Closed)

sipa force-pushed the 202403_rand_rework branch 5 times, most recently from 019d483 to 1b75d68 on March 14, 2024 20:30

sipa force-pushed the 202403_rand_rework branch 3 times, most recently from 8e3b398 to 036555f on March 15, 2024 19:04

DrahtBot added the CI failed label Mar 15, 2024

DrahtBot removed the CI failed label Mar 15, 2024

DrahtBot mentioned this pull request Mar 18, 2024

fuzz: actually test garbage >64b in p2p transport test #29667 (Merged)

DrahtBot added the Needs rebase label Mar 19, 2024

sipa force-pushed the 202403_rand_rework branch 2 times, most recently from 0707eea to 5904c0c on March 19, 2024 15:37

DrahtBot removed the Needs rebase label Mar 19, 2024

DrahtBot mentioned this pull request Apr 26, 2024

[WIP] Cluster mempool implementation #28676 (Draft)

dergoegge reviewed Apr 30, 2024


sipa added 12 commits April 30, 2024 13:59

random: modernize XoRoShiRo128PlusPlus a bit

165f9bd


random: move XoRoShiRo128PlusPlus into random module

4bbd05f


random: convert GetExponentialRand into rand_expo_delay

8faf79d


random: cleanup order, comments, static

3d88427

tests: make fuzz tests (mostly) deterministic with fixed seed

636e486

sipa force-pushed the 202403_rand_rework branch from 5904c0c to 636e486 on April 30, 2024 19:30

DrahtBot added the CI failed label Apr 30, 2024
