
Seeding PRNGs (traits and bit size) #18

Closed
dhardy opened this issue Oct 22, 2017 · 121 comments

@dhardy
Owner

dhardy commented Oct 22, 2017

Update again:

pub trait SeedRestriction: private::Sealed + Default + AsMut<[u8]> {}

pub trait SeedableRng: Sized {
    type Seed: SeedRestriction;

    fn from_seed(seed: Self::Seed) -> Self;

    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        let size = mem::size_of::<Self::Seed>();
        unsafe {
            let ptr = seed.as_mut().as_mut_ptr() as *mut u8;
            let slice = slice::from_raw_parts_mut(ptr, size);
            rng.try_fill_bytes(slice)?;
        }
        Ok(Self::from_seed(seed))
    }
}

So far we have proposed the following:

  • seeding via a u64 (convenient non-crypto usage)
  • seeding via the full state size (512 bits for ChaCha, 8192 for Isaac, 16384 for Isaac64)
  • seeding from another RNG

If we're seeding from another RNG, there's probably no reason not to fill the entire state space from this source; we are very likely seeding from a CSPRNG which can easily provide as many bytes as we need for little more cost than just copying a smaller set of bytes multiple times.

If, on the other hand, we are providing a seed ourselves, we might want to use the maximum key size possible. Then again, we may well only want to provide a subset, such as a 128-bit or 256-bit key (AES uses these key sizes). So should we add functions to seed from 128-bit and 256-bit keys instead of forcing users to copy their key into a larger zero-initialised array?

Actually, the current implementations allow seeding from u32 or u64 slices of arbitrary length. Keeping this functionality seems sensible.

But why not support seeding from byte slices instead? The only difference is that enforced alignment may allow more efficient copying from a slice of u32 or u64. Considering the user may already have a key in a byte-slice, this enforcement probably does more harm than good (forcing an extra copy or unsafe pointer-coercion).

This leaves me wondering if the current SeedableRng proposal is such a good idea; perhaps instead we should have the following:

pub trait SeedableRng: Rng {
    // when associated constants are allowed:
    const MAX_SEED_BYTES: usize;
    // alternative:
    fn max_seed_bytes() -> usize;
    
    // supports any number of bytes up to max_seed_bytes()
    fn seed_from_slice(&mut self, seed: &[u8]);
    
    // not technically required, but convenient:
    fn seed_from_u64(&mut self, seed: u64);
}

(SeedFromRng may stay a separate trait providing the other type of seeding.)

@burdges

burdges commented Oct 22, 2017

I'd say drop the u64 honestly because afaik the only use case is pulling out less randomness from another PRNG to seed a lot of PRNGs, and that sounds useless even if you care about convenience and speed.

I donno if seeding by the whole internal state always makes much sense either. Yes, you could set the internal state for some permutation like ChaCha or Keccak to produce random numbers, but the normal usage of ChaCha allocates 128 bits to fixed constants, at least a u32 for a counter, and the remainder to the nonce.

I dislike using slice methods since they make users pepper their code with asserts to emulate type checking. It's also bad to encourage unsafe code by not returning the actual type. Afaik, there is never any reason for this trait to be object safe since you only make new Rngs with it, not use existing ones, so why not something intuitive like:

pub trait SeedableRng: Rng {
    type Seed: Rand;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng(rng: &mut Rng) -> Self { Self::from_seed(rng.gen()) }
}

There are uses for object safety in some sort of ReseedableRng trait, so one might imagine something more like :

pub trait ReseedableRng: Rng {
    fn add_entropy_u128(&mut self, entropy: u128); 
    fn reseed_from_rng(&mut self, rng: &mut Rng);
}

@dhardy
Owner Author

dhardy commented Oct 22, 2017

No, the use-case of seeding from u64 was simulations and the like which just want a convenient way of seeding any PRNG from a number and swapping the PRNG. But so long as we standardise seed types somehow, we don't need seed_from_u64 for that (e.g. the byte slice version would also work).

I don't see much use in using ReseedableRng over just replacing with a new instance (like ReseedWithDefault does now) — ?

I'm not sure I see any point in setting the entire internal state either. But using a fixed bit-size like add_entropy_u128 isn't a good choice either, and only allowing a single key size also doesn't feel right.

You missed my point about being able to seed from whatever key size the user has directly (without e.g. copying into SeedableRng::Seed), but maybe that's less important. Using a byte slice is however more flexible for users and not really hard for PRNGs to implement.

I don't see where asserts would need to be used. The API should allow usage with any number of bytes up to the maximum and just ignore extra bytes; the only things that might be considered "wrong" are providing too many bytes (not really important) or not enough bytes (perhaps this makes it easier for users to provide insecure implementations, I don't know).
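
For illustration, a minimal sketch of those semantics; the function name and the 32-byte state size are illustrative, not the proposed API.

fn seed_from_slice(state: &mut [u8; 32], seed: &[u8]) {
    // Copy up to the state's length; extra input bytes are ignored,
    // and a shorter seed leaves the tail of the state zeroed.
    let n = seed.len().min(state.len());
    state[..n].copy_from_slice(&seed[..n]);
}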

@dhardy
Owner Author

dhardy commented Oct 22, 2017

Part of the issue with having a fixed-size associated type Seed, as @burdges proposes, is the question of what the seed type should be for PRNGs with large internal state like ISAAC. As @burdges says, there's little sense in requiring a key as big as the whole internal state, yet choosing anything smaller explicitly prevents users from setting all bits with fresh entropy, which seems a premature decision unless there is strong evidence that more bits of entropy do not make the algorithm more secure.

Although slices aren't ideal, is there a better solution (besides allowing a whole family of seeding functions/traits using different sizes)?

@burdges

burdges commented Oct 22, 2017

The seed size should be determined by the PRNG, normally being 256 bits, but perhaps the type expresses alignment issues. In many usages like data structures, there should be a fatal error if the seed size does not match the seed provided, so this can be done either with type checking or with asserts.

@dhardy
Owner Author

dhardy commented Oct 22, 2017

I disagree; users should be able to use a 128-bit seed with a generator like ChaCha if they wish, or a 256-bit seed, or even a 512-bit seed (if the generator can use it). Not to rule out other sizes (e.g. 192 or 384).

@burdges

burdges commented Oct 22, 2017

I think if a user wants to use a 128-bit seed with ChaCha, then they should explicitly pad the seed with zeros, or call some special method whose documentation indicates this behavior. It's harder to audit code when you need to think about the lengths of all the slices.

I take it Isaac+ has no standard that defines a seeding procedure? I suppose either there is a security argument that some smaller seed plus zeros is fine even from the beginning, or else you seed with the smaller value and run it for one round before using the output. You could provide a from_slice_with_zeros inherent method just for Isaac (sketched below), probably with a panic if the slice exceeds 256 u32s or if the slice is shorter than whatever the minimum is. We could set SeedableRng::Seed to whatever makes SeedableRng::from_rng happy too, maybe running this one throw-away round if that's common. I have not read the security papers on Isaac+ so I don't know the recommendations.
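
A hedged sketch of such a from_slice_with_zeros constructor: ISAAC's 256-word state size is real, but the function name and the minimum length are hypothetical.

fn from_slice_with_zeros(seed: &[u32]) -> [u32; 256] {
    const MIN_WORDS: usize = 8; // hypothetical minimum seed length
    assert!(seed.len() >= MIN_WORDS, "seed shorter than the minimum");
    assert!(seed.len() <= 256, "seed longer than ISAAC's 256-word state");
    // Zero-pad: copy the provided words, leave the rest zeroed.
    let mut state = [0u32; 256];
    state[..seed.len()].copy_from_slice(seed);
    state
}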

I'm not advocating for ReseedableRng per se, but there is no way to make SeedableRng object safe without asking folk to create unseeded Rngs first, and that creates security bugs.

@dhardy
Owner Author

dhardy commented Oct 22, 2017

Point noted about not wanting to check slice lengths, but this is normal practice with large buffers. Anything else makes it hard to avoid extra copy operations, though maybe we shouldn't worry too much about that here.

Uh, probably @burdges you didn't see that we abandoned ISAAC+ (see also here).

No, I'm not aware of a standard seeding routine for ISAAC. The author recommends seeding with the output of a secure cipher (see bottom of page). We could make a seeding routine for some selected key size (maybe 256 bits), I suppose, and either pad with zeros (current code does this) or use another PRNG (e.g. ChaCha) to expand the given 256 bits to the full state size. @pitdicker has just been looking at this.

If we go that route, we either have to prevent direct provision of the full state size or add another function. And then we need to choose which seeding functions we have, e.g. from 64 and 256 bits, maybe also 128 and 512?

@burdges

burdges commented Oct 22, 2017

If you have a long enough slice, then the arrayref crate gives you fixed length arrays without copies, which gives you the correct panic behavior and puts the panic closer to the user's call site than if the panic happens in a library called by a library called by your library.

I'd forgotten about ISAAC+. It sounds like ISAAC users need access to the full state, since the author does not express confidence in his seeding routine, but not necessarily through SeedableRng.

I think you want impl FromIterator<u32> for Isaac for this, so that Isaac::from_iter(my_seed_slice) works where my_seed_slice: &[u32], but also Isaac::from_iter(rng.gen_iter()) seeds by setting the whole state. We can obtain other seeding behaviors like

Isaac::from_iter(my_seed_slice.iter().chain(seed_pad))
Isaac::from_iter(my_seed_slice.iter().cycle().zip(seed_pad).map(|(x,y)| x^y))

where seed_pad is a static nonzero padding slice. In theory, all these calls avoid copies thanks to the effort spent optimizing Rust for Iterators. I'm not bothered by an iterator consuming a short seed without panicking, for several reasons, although maybe if you do the same for ChaChaRng.
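
A self-contained sketch of the FromIterator idea on a toy generator; IsaacToy and its 8-word state are stand-ins, not the real ISAAC.

use std::iter::FromIterator;

struct IsaacToy {
    state: [u32; 8],
}

impl FromIterator<u32> for IsaacToy {
    fn from_iter<I: IntoIterator<Item = u32>>(iter: I) -> Self {
        // Take up to 8 words from the iterator; anything it does not
        // supply stays zero, anything extra is simply not consumed.
        let mut state = [0u32; 8];
        for (slot, word) in state.iter_mut().zip(iter) {
            *slot = word;
        }
        IsaacToy { state }
    }
}

// Usage mirroring the snippets above:
// let rng: IsaacToy = my_seed_slice.iter().cloned().collect();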

@dhardy
Owner Author

dhardy commented Oct 23, 2017

Interesting point. impl FromIterator<T> could replace SeedFromRng entirely, except of course that T would vary (usually between u32 and u64 but maybe not exclusively). This hiding of the underlying type is the core difference between Rng and Iterator.

Of course, it would be possible to set the state of a PRNG like ISAAC using SeedFromRng, but constructing a mock Rng implementation to do that isn't so simple (also the code doing it would be more obscure).

Implementing SeedFromRng, SeedableRng with a fixed key size (e.g. 256-bit), seed_from_u64, and FromIterator<T> for some T does meet all requirements I guess. Hmm, now PRNGs need to implement 4 seeding functions?

Of course, we could throw out seed_from_u64, but since both SeedableRng::Seed and T in FromIterator<T> are implementation-defined, seed_from_u64 is the only uniform deterministic seeding method all PRNGs support (ignoring contrived seeding via SeedFromRng).

I suppose we could fix that T and require all PRNGs to support FromIterator<u32>, but this forces 64-bit generators to deal with alignment and type-coercion issues.

@newpavlov

newpavlov commented Oct 23, 2017

Can't we use something like this?

pub trait SeedableRng: Rng {
    const SeedSize: usize;

    fn from_seed<S: ToSeed<Self::SeedSize>>(seed: S) -> Self;

    fn from_rng(rng: &mut Rng) -> Self {
        let mut buf = [0u8; Self::SeedSize];
    rng.fill_bytes(&mut buf);
        Self::from_seed(buf)
    }
}

pub trait ToSeed<const SeedSize: usize> {
    fn to_seed(&self) -> [u8; SeedSize];
}

And implement ToSeed for numeric types, and for arrays and slices of those numeric types? If the input is larger than needed we truncate it; if smaller, we pad it with zeros. We could also append the number of input bytes to the output of to_seed, so from_seed could e.g. panic if needed. Yes, here we'll do unnecessary copies, but: a) the compiler will be able to optimize some of them away, b) I don't think we should worry about it that much, as the copy cost should be negligible for most applications. A drawback is that we force RNG implementations to work with seeds in the form of fixed-size byte arrays; I'm not sure if that's a big issue or not.

@dhardy
Owner Author

dhardy commented Oct 23, 2017

I don't see much point in that @newpavlov. I didn't really get @burdges' requirement about asserting length, but the only difference between this and passing a slice directly is that it is possible to make some assertions about length in to_seed (though we may not want that).

The problem with this is that it assumes each RNG has an ideal seed length, which is the maximum possible length, and always pads smaller seeds with zeros. This may not be the case; e.g. small seeds may be repeated and/or modified with some magic number, and there may not be an ideal seed size: ISAAC could use a whole 8192 seeding bits, but 256 would often be more reasonable.

@newpavlov

newpavlov commented Oct 23, 2017

The point is that from_seed will be able to consume almost anything (reasonable) users would like to feed it (e.g. from_seed(0x123456u64), from_seed([1u8, 2, 3]), from_seed(&buf[..100]), etc.) without any need for additional methods. Instead of converting the seed into an array of a given size we could convert it into a slice of bytes, so the from_seed implementation could be smarter. But we'll still need something like const OptimalSeedSize: usize; for from_rng.

pub trait SeedableRng: Rng {
    const OptimalSeedSize: usize;

    fn from_seed<S: ToSlice>(seed: S) -> Self;

    fn from_rng(rng: &mut Rng) -> Self {
        let mut buf = [0u8; Self::OptimalSeedSize];
    rng.fill_bytes(&mut buf);
        Self::from_seed(&buf)
    }
}

pub trait ToSlice {
    fn to_seed(&self) -> &[u8];
}

UPD: But a drawback of this approach is that it uses native endianness...

@burdges

burdges commented Oct 23, 2017

Is there no iterator adapter in the itertools crate or wherever that agglomerates or splits numeric types according to a specific endianness? It'd be easy enough to write, but it does need a specified endianness.
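
Such an adapter is indeed short to write; a little-endian sketch (not from itertools, just illustrative):

// Agglomerate a byte iterator into u32s, little-endian,
// zero-padding a trailing partial word.
fn bytes_to_u32_le<I: IntoIterator<Item = u8>>(bytes: I) -> impl Iterator<Item = u32> {
    let mut it = bytes.into_iter().peekable();
    std::iter::from_fn(move || {
        it.peek()?; // stop once the input is exhausted
        let mut word = [0u8; 4];
        for b in word.iter_mut() {
            match it.next() {
                Some(byte) => *b = byte,
                None => break,
            }
        }
        Some(u32::from_le_bytes(word))
    })
}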

I think FromIterator<T> makes lots of sense if a common way to seed is to set the whole internal state, like Isaac's. It's less clear for ChaCha, where standard versions all fix the first 128 bits and the counter comes in between the key and the nonce.

A priori, I'd think lengths should be avoided @newpavlov, favoring actual types [T; N] instead. Firstly, Rust still handles associated types much better than associated constants. Second, someone can have seeds with unusual properties that way, like additional configuration, a custom Rand, or GenericArray. Yes, I'm happy GenericArray will disappear when const-dependent types arrive, but the example stands.

Afaik, we expect tricks like impl<const N: usize, T> Foo for T where T: Bar<Item=[u8; N]> to work eventually, so associated types should be strictly more powerful than associated constants. It'd change if associated constants ever got special treatment in vtables to make them object safe, but that'll be years away if ever. We cannot get object safety here anyways since we need -> Self methods.

@dhardy
Owner Author

dhardy commented Oct 23, 2017

I personally like the FromIterator<T> idea but it's significantly more useful if each RNG implements for the same T.

Should we specify RNGs should implement FromIterator<u64>? Why u64 over u32:

  • most seed sizes will be a multiple of 64 bits, excepting some small 32-bit generators
  • 64-bit generators will often implement next_u32 by throwing away bits, so using u32 would commonly force seeding to waste bits

Even so, this approach is still not ideal. We could alternatively create some wrapper which can easily convert a slice or iterator into an Rng for use with SeedFromRng.
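
A hedged sketch of such a wrapper, assuming only a minimal stand-in trait for the byte-filling part of Rng (names here are illustrative, not rand's API):

// Stand-in for the byte-filling part of Rng.
trait FillBytes {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

// Expose a byte slice as a finite randomness source; panics if the
// seed runs out, mirroring ReadRng-style behaviour.
struct SliceRng<'a>(&'a [u8]);

impl<'a> FillBytes for SliceRng<'a> {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        assert!(self.0.len() >= dest.len(), "seed slice exhausted");
        let (head, tail) = self.0.split_at(dest.len());
        dest.copy_from_slice(head);
        self.0 = tail;
    }
}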

@pitdicker

After reading up on initializing RNGs and tinkering with ISAAC and Xorshift*/Xoroshiro over the last couple of weeks, this is how I understand the problem of initialization:

If there is a high-quality source of randomness available (OsRng) it is basically enough to fill the entire state of the RNG with it and call it a day.

If the source of randomness / seed is not of high quality, the problems begin. There is no way to 'increase the randomness'. The entropy of the seed should be distributed evenly over the RNG's state. But you have to make sure that biases in the seed do not affect the quality of the RNG algorithm.

For simple RNGs the common advice in such a case is to run the weak seed through an algorithm that is completely different from the algorithm used in the RNG. If an RNG relies on multiplies (like LCGs such as PCG) it may be best to run the seed through something using xorshifts. And for xorshift-based RNGs something like SplitMix, a Weyl sequence with a strong output hash, is a good option. But this assumes only statistically biased RNGs are available to initialize from a weak seed; I think any RNG that turns the seed into something statistically unbiased is fine, like PCG and Xorshift*.
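
For reference, SplitMix64 is small enough to quote in full; the constants below are from Vigna's public-domain reference implementation.

// SplitMix64: a Weyl sequence with a strong output hash, handy for
// expanding a weak u64 seed into well-mixed state words.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9e37_79b9_7f4a_7c15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}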

For cryptographic RNGs it is (as everything) slightly more complicated. A weak seed could reduce the effort it takes to brute-force the internal state of the RNG. The result of the initialization should not only distribute the available entropy evenly and without bias over the internal state. It should also not be possible to reconstruct the seed from the initialized state. I think.

The last point is why Bob Jenkins recommends encrypting the initial seed of ISAAC with a strong block cipher, and does not fully trust his own implementation. For us I think it is useful to keep it as-is. For the default use, initializing with OsRng, it does not matter at all; running any initialization routine at all is not strictly necessary. And for use with a manually provided seed, there is a benefit in matching the reference implementation.

For ChaCha we definitely need to look at how it gets initialized. I don't have faith in the current from_seed code (the from_rng part, when used with OsRng, should be fine), especially if the seed is smaller than the internal state. It probably needs some cipher...

Sorry, this does not really fit into the current discussion. But it seems like things to keep in mind.

@burdges

burdges commented Oct 23, 2017

I just noticed that R: FromIterator<T> means iterator.collect() can yield an R, which is perhaps counter-intuitive if R: Rng in general, but maybe not so bad for setting the whole state in Isaac. If this is a problem generally, then an Rng::from_iter(iter: I) method could replace it, where I: IntoIterator<Item=..>.

You could always support FromIterator<T> for multiple T, but.. any given PRNG is specified in terms of one fixed T, even if no seeding algorithm is specified, like u32 for Isaac. Anything else incurs endianness issues when determinism matters, so that T should be the highest priority to minimize confusion.

As for ChaCha, I always liked that Peter Reid's ChaCha makes all types explicit, so I'd likely take SeedableRng::Seed to be a tuple (Key,Nonce), ala

/// Warning: Only 2^32 bytes of output
struct IETFChaChaRng(ChaCha);
impl SeedableRng for IETFChaChaRng {
    type Seed = ([u8; 32], [u8; 16]);
    ...
}
/// Slightly slower start up
struct XChaCha20Rng(ChaCha);
impl SeedableRng for XChaCha20Rng {
    type Seed = ([u8; 32], [u8; 24]);
    ...
}
/// Small nonce sounds fine for an Rng
struct ChaCha20Rng(ChaCha);
impl SeedableRng for ChaCha20Rng {
    type Seed = ([u8; 32], [u8; 8]);
    ...
}
struct ChaCha12Rng(ChaCha);
...
struct ChaCha8Rng(ChaCha);
...

In practice, one should avoid boilerplate by using internal traits or macros to get seek and set_nonce methods too, except for XChaCha for which set_nonce makes no sense.

@dhardy
Owner Author

dhardy commented Oct 24, 2017

It should also not be possible to reconstruct the seed from the initialized state.

@pitdicker I don't understand the logic of this. My understanding of entropy in the case of cryptography is as a measure of how much information a potential attacker doesn't know; e.g. 30 bits of entropy would imply that an attacker could guess that our system is in one of 2^30 states. Assuming the attacker also knows the algorithms used (published in this case), it makes no difference what the initialisation algorithm does, so long as it doesn't lose entropy, the attacker still has 2^30 states to check.

The only other thing to consider, I think, is how much of that entropy gets used in each number output. If there is insufficient mixing, the initial output may have less entropy than given in the seed, i.e. there may be fewer output possibilities than the available entropy should imply; I guess this is the generalisation of what's commonly called bias (the more specific version being a predicate like 'output < max / 2' not having the expected probability).

I'm not sure that "not being able to reconstruct the seed from the internal state" says anything except about the complexity of the algorithm; I don't see why using a reversible hash function here would be a disadvantage?

@pitdicker

pitdicker commented Oct 24, 2017

It should also not be possible to reconstruct the seed from the initialized state.

@pitdicker I don't understand the logic of this.

Yes, I wrote it badly. Let me try again: Suppose a cryptographic RNG is initialized with a 128-bit seed. The full state is 512 bits. What happens if the state is naively initialized with some non-cryptographic function (for example, repeating the seed 4 times)? If you can derive a quarter of the state, you know the entire state. Now you should not be able to derive that quarter anyway. But it seems to me bad initialization would make the cryptographic promises of such an RNG weaker.

I was not thinking about brute-forcing, but about someone exploiting weaknesses in the cryptographic RNG. After a good initialization of a CSPRNG it should not be possible to know anything about one part of the state after figuring out another part. Then that would mean using some irreversible function to construct the state from the seed.

@pitdicker

Just learned something. 'Every CSPRNG should withstand "state compromise extensions". In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation.' (Wikipedia). So it is part of the CSPRNG's job to prevent this.

We don't need to search for "some irreversible function to construct the state from the seed" if the seed may be weak or too short. Just copy the available bytes from the seed into the RNGs state and run one round of the cryptographic RNG.

It still means the current implementation of ChaCha's from_seed should do some extra work.

Note that ISAAC does not seem to offer this guarantee, and therefore has the special initialization function.

@pitdicker

Okay, I had some doubts about initializing RNGs with a seed that is shorter than the state. But they are all resolved now. Initializing with a slice of some arbitrary size should be workable.

One other question:
I think we agree that with from_seed we should always have some initialization function to make sure a weak seed is usable.
What do you think about trusting from_rng and using the seed directly as state as a policy?

@pitdicker

Even so, this approach is still not ideal. We could alternatively create some wrapper which can easily convert a slice or iterator into an Rng for use with SeedFromRng.

Is that not exactly what ReadRng can be used for?

@dhardy
Owner Author

dhardy commented Oct 24, 2017

It still means the current implementation of ChaCha's from_seed should do some extra work.

Why precisely? update should get run the next time any numbers are drawn.

@pitdicker

After the first round the first block of output is based on the weak state. Only after the first round is the state mixed well. So I would run update twice after an initialization we do not completely trust.

@burdges

burdges commented Oct 24, 2017

I'd discourage giving any ChaCha less than a 256 bit key because ChaCha never mixes the internal state. If I understand, you guys are no longer discussing ChaCha though, but another PRNG built with the ChaCha permutation function? You need another name for that like MixingChaChaRng or ChaChaPermRng, not ChaChaRng.

@pitdicker

@burdges I keep messing up anything I say about ChaCha, glad you know more about it.

Still the point stands, one round of ChaCha should be a good enough initialization function to turn a weak seed into an acceptable state for ChaChaRng. But we do have to use the output of that round for the new state.

Of course initializing with a full seed of good-quality random numbers is the only way to initialize ChaCha that can claim to be as secure as the algorithm gets.

I am not sure yet what to think about allowing smaller seeds, but after the discussion today I think it only has consequences for brute-forcing. The algorithm does not get easier to break (with proper initialization).

@dhardy
Owner Author

dhardy commented Oct 24, 2017

Hmm, I thought this was suspicious when I read it. It's not true. ChaCha allows seeking which makes reversal trivial, thus "state compromise" allows past output to be reconstructed easily.

'Every CSPRNG should withstand "state compromise extensions". In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation.'

Looking at Peter Reid's code (mentioned by @burdges earlier) I see three main differences:

  1. Constructors for various combinations of key, nonce and number of rounds. The 20-round versions are possible in rand's ChaCha too (by calling set_counter); the exceptions are 8 & 12 round versions and new_xchacha20, which accepts a 24-byte nonce (16+8) and does a round of permutation on the key (using first 16 bytes for counter & nonce).
  2. Code to XOR output with a byte stream (en-/de-cryption)
  3. End-of-stream error detection, with variable counter size. rand's version is non-standard in that it increments the nonce instead, allowing a ridiculously long 2^134 byte stream. This doesn't imply incompatibility with content encrypted by other libraries, just that content encrypted by rand::ChaCha could potentially be too long for other libraries to decrypt.

I think it would be interesting to build some encryption/decryption code around rand::ChaCha and check compatibility and compare performance with other implementations, but that's a little off-topic.


Regarding ChaCha initialisation, it may be worth adding constructors accepting various key/nonce combinations. But these would be specific to ChaCha.

More generally, encouraging use of ReadRng in combination with SeedFromRng makes a lot of sense. Similar wrappers to make an Rng from u32 and u64 slices may also be useful. But what to do when the seed isn't long enough? Only try_fill allows reporting without a panic, and I wonder whether PRNGs should allow seeding with short seeds via ReadRng? Possibly better not to, but allow an alternative constructor using short seeds.

What about ISAAC, should it just fill its huge state directly from the ReadRng or take only a smaller key and use another PRNG to expand into its space? Probably better to set the whole state directly from ReadRng.

So now we want a general way to expand a short seed. Perhaps we should have something similar to ReadRng which initialises itself from some "small" seed and outputs as many bytes as necessary, to be consumed by the target PRNG's SeedFromRng implementation. How about using ChaCha? It can accept a "key" up to 384 bits if setting the counter and nonce portions too, which we might as well allow; anything smaller can be zero-padded.

So have we argued ourselves to the point that we should drop SeedableRng entirely and just keep SeedFromRng as-is, plus allow specific constructors (e.g. where ChaCha should replicate IETF 7539 12-byte nonce behaviour)?

@burdges

burdges commented Oct 24, 2017

It's not even about security but about standards: If you want to call it literally ChaChaRng then it should be ChaCha20 or maybe IETF ChaCha or XChaCha20, and only XChaCha does an initialization round. Afaik, all these assume almost a 256 bit key, but not uniformly distributed, as it's commonly a curve25519 point.

I'd think a SeedableRng trait with an associated seed type of the form (Key,Nonce) would best suit standardized ChaCha variants, while something like FromIterator<u32> makes sense for Isaac. There is nothing wrong with providing access to the ChaChaRng internal state, but one should not use it in normal usage.

I'd wager the ChaCha permutation operating like Isaac gives plenty of security, but I donno if anybody bothered writing that down, or maybe it follows from DJB's paper. It's plausible other permutations like Keccak come with such an analysis.

@dhardy
Owner Author

dhardy commented Oct 24, 2017

I think the idea was to allow parameterisation with constants, i.e. ChaCha<20>, for configuring the number of rounds. Perhaps this can finally be implemented soon. An extra round of initialisation just needs a specific constructor.

But you posted at the same time as me so I'll assume you didn't reply to my previous post.

@pitdicker

When did you update the first post? I didn't notice it...

If we're seeding from another RNG, there's probably no reason not to fill the entire state space from this source; we are very likely seeding from a CSPRNG which can easily provide as many bytes as we need for little more cost than just copying a smaller set of bytes multiple times.

If, on the other hand, we are providing a seed ourselves, we might want to use the maximum key size possible. Then again, we may well only want to provide a subset, such as a 128-bit or 256-bit key (AES uses these key sizes). So should we add functions to seed from 128-bit and 256-bit keys instead of forcing users to copy their key into a larger zero-initialised array?

Actually, the current implementations allow seeding from u32 or u64 slices of arbitrary length. Keeping this functionality seems sensible.

I don't think we should get too creative with the initialization functions for RNGs, and we should not force other implementers of RNGs to be either. The standard functions in SeedableRng should just implement the author's initialization routines, if sensible. At least for me it is difficult to see the consequences otherwise.

For the proposed seed_from_u64 and from_hashable it is different, because they are only for convenience. Helpful for non-critical uses such as testing. I expect them to produce a working generator, but not to be secure.

I have looked at several randomness libraries in C, and some allow some flexibility in the seed size. This is mostly because of the lack of easy sources of randomness: they support seeding from things like a single u32 (u64 if you are lucky) or from the system time, and 'good' seeding.

ISAAC is a bit special because it supports arbitrary seed sizes. On the other hand its initialization function is viewed as the weakest part of it.

In short I would not like to give users the idea that the seed size is something flexible, and using something else than the maximum is ok for serious uses. And I would not like to force RNG implementations to have to handle it.

@dhardy
Owner Author

dhardy commented Jan 16, 2018

like this could default to true and ask exceptions to specify false

It should definitely be the other way around if there is a default. It's not about deterministic behaviour, it's about stating that the code won't change in the future (in a way that affects output).

When did you update the first post? I didn't notice it...

Several times. I only copied SeedableRng from your PR today; I think I added some kind of summary but that was a long time back. And the second half of that post never got updated (I didn't remove the original content). I believe seed_from_u64 and variable seed sizes are no more (or will be as soon as your PR merges).

@burdges

burdges commented Jan 16, 2018

I think SeedableRng::from_rng should be reproducible given everything else being reproducible.

Why the unsafe in from_rng? That looks needless since you're calling AsMut::<[u8]>::as_mut anyway. It's maybe dangerous to have a ? inside an unsafe block.

Why the SeedRestriction trait? Why not merely require Seed : AsMut<[u8]>? In fact, I'd think specialization could avoid these restrictions entirely:

pub trait SeedableRng: Sized {
    type Seed : Sized;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}
impl<T> SeedableRng for T where T: SeedableRng, T::Seed: Default + AsMut<[u8]> {
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        assert!(seed.as_mut().len() == ::std::mem::size_of::<Self::Seed>());
        rng.try_fill_bytes(seed.as_mut())?;
        Ok(Self::from_seed(seed))
    }
}

If you want to support u16, u32, etc. then use specialization on Deref<Target=[u8]> + DerefMut instead of AsMut<[u8]>.

In any case, I think if a developer just starts writing code using rand and standardized CSPRNGs then that code should usually have a reasonable representation in terms of the standards. At Real World Crypto, I just witnessed folks attempting to disentangle rand's seeding of ChaChaRng, because ZCash's now-running Powers of Tau ceremony uses it, making them dependent upon rand to standardize their trusted setup.

I think there are a couple of relevant seeding modes: (1) seed however the PRNG is defined, or some approximation thereof, (2) seed by filling the PRNG's whole internal state, and (3) hash something to do either (1) or (2), ala from_hashable. Arguably (2) should be made available only via an intrinsic method, not trait methods. I'd say do from_hashable only via free functions, like

use std::hash::{Hash, Hasher, BuildHasher};
fn seed_from_hashable<B, R, H>(data: H) -> R
    where R: SeedableRng, R::Seed: Default + AsMut<[u8]>, H: Hash, B: BuildHasher + Default
{
    let mut seed = R::Seed::default();
    for (i, s) in seed.as_mut().chunks_mut(8).enumerate() {
        // Hash the chunk index together with the data so each chunk differs.
        let mut h = B::default().build_hasher();
        i.hash(&mut h);
        data.hash(&mut h);
        let t = h.finish();
        let l = s.len();
        s.copy_from_slice(&unsafe { std::mem::transmute::<_, [u8; 8]>(t) }[..l]);
    }
    R::from_seed(seed)
}

@dhardy
Owner Author

dhardy commented Jan 16, 2018

I think SeedableRng::from_rng should be reproducible given everything else being reproducible.

The reasoning was (a) to avoid requirements on fixing endianness and (b) to allow flexibility for optimisation later.

Why the unsafe in from_rng?

I wasn't so keen on this either but didn't see a better solution. @pitdicker wrote it.

Why the SeedRestriction trait?

So that users know something or other about Seed in generic code, and so that we can write from_hashable which works with any SeedableRng. But maybe AsMut<[u8]> is enough for this.

It seems crypto PRNGs will probably never directly fill the internal state, or at least we probably won't do that with ISAAC; this makes me question whether we want from_rng any more (although ISAAC still uses more data with from_rng).

@burdges

burdges commented Jan 16, 2018

I see no reason for unsafe in from_rng since the one I wrote above achieves the same ends safely (note the update that panics).

In fact, your from_rng is undefined behavior currently so either it should be replaced, avoiding unsafe, or else SeedRestriction needs to be unsafe.

@burdges

burdges commented Jan 16, 2018

We should figure out if specialization can be used like this:

pub trait SeedableRng: Sized {
    type Seed;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}
impl<T> SeedableRng for T where
  T: SeedableRng,
  T::Seed: Sized + Default + Deref<Target=[u8]> + DerefMut 
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        assert!(seed.deref_mut().len() == ::std::mem::size_of::<Self::Seed>());
        rng.try_fill_bytes(seed.deref_mut())?;
        Ok(Self::from_seed(seed))
    }
}
impl<T> SeedableRng for T where
  T: SeedableRng,
  T::Seed: Sized + Default + Deref<Target=[u32]> + DerefMut 
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        let size = 4 * seed.deref_mut().len();
        assert!(size == ::std::mem::size_of::<Self::Seed>());
        {
            let ptr = seed.deref_mut().as_mut_ptr() as *mut u8;
            let s = unsafe { slice::from_raw_parts_mut(ptr, size) };
            rng.try_fill_bytes(s)?;
        }
        Ok(Self::from_seed(seed))
    }
}

or maybe like this:

pub trait SeedableRng: Sized {
    type Seed;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}

trait SeedArrayBase : Sized {}
impl SeedArrayBase for u8 {}
impl SeedArrayBase for u16 {}
impl SeedArrayBase for u32 {}
impl SeedArrayBase for u64 {}

impl<P,T> SeedableRng for T where
  P: SeedArrayBase,
  T: SeedableRng,
  T::Seed: Sized + Default + Deref<Target=[P]> + DerefMut 
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        use std::mem::size_of;
        let mut seed = Self::Seed::default();
        let size = size_of::<P>() * seed.deref_mut().len();
        assert!(size == size_of::<Self::Seed>());
        {
            let ptr = seed.deref_mut().as_mut_ptr();
            let s = unsafe { slice::from_raw_parts_mut(ptr as *mut u8, size) };
            rng.try_fill_bytes(s)?;
        }
        Ok(Self::from_seed(seed))
    }
}

@burdges

burdges commented Jan 16, 2018

Actually we should simply go with one specialization which probably works.

pub trait SeedableRng: Sized {
    type Seed;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}
impl<T> SeedableRng for T where
  T: SeedableRng,
  T::Seed: Sized + Default + Deref<Target=[u8]> + DerefMut // Or maybe AsMut<[u8]> if that works
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        assert!(seed.deref_mut().len() == ::std::mem::size_of::<Self::Seed>());
        rng.try_fill_bytes(seed.deref_mut())?;
        Ok(Self::from_seed(seed))
    }
}

We should not provide any default conversion from [u32] to [u8] due to endianness issues. ISAAC could use a [u32] like its specification says, but it should provide its own from_rng and do byte-order-correct conversions itself (see the sketch below).
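
A sketch of the kind of byte-order-correct conversion meant here; the sizes are illustrative, the real ISAAC state is larger.

// Convert a byte seed into u32 state words with a fixed (little)
// endianness, so the stream is reproducible across platforms.
fn seed_to_words_le(seed: &[u8; 32]) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (word, chunk) in words.iter_mut().zip(seed.chunks_exact(4)) {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(chunk);
        *word = u32::from_le_bytes(buf);
    }
    words
}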

In this way, a SeedableRng can provide any seed it likes, even &[u8] or !Sized, but must write a fairly trivial from_rng if it does anything strange.

@burdges

burdges commented Jan 16, 2018

As I pointed out above, there is no obvious need for SeedRestriction in writing a from_hashable either. It could seemingly even live in an unrelated crate:

use std::hash::{Hash, Hasher, BuildHasher};

pub trait FromHashable {
    fn from_hashable<B: BuildHasher + Default, H: Hash>(data: H) -> Self;
}
impl<T> FromHashable for T where
  T: Sized + Default + Deref<Target=[u8]> + DerefMut // Or AsMut<[u8]>
{
    default fn from_hashable<B: BuildHasher + Default, H: Hash>(data: H) -> Self {
        let mut r = Self::default();
        assert!(r.deref_mut().len() == ::std::mem::size_of::<Self>());
        for (i, s) in r.deref_mut().chunks_mut(8).enumerate() {
            let mut h = B::default().build_hasher();
            i.hash(&mut h);
            data.hash(&mut h);
            let t = h.finish();
            let l = s.len();  // No NLL yet
            s.copy_from_slice(&unsafe { std::mem::transmute::<_, [u8; 8]>(t) }[..l]);
        }
        r
    }
}

impl<R> FromHashable for R where R: SeedableRng, R::Seed: FromHashable {
    fn from_hashable<B: BuildHasher + Default, H: Hash>(data: H) -> Self {
        Self::from_seed(Self::Seed::from_hashable::<B, H>(data))
    }
}

In this way, you can provide the from_hashable either on the seed or on the Rng directly.

@dhardy
Owner Author

dhardy commented Jan 16, 2018

In fact, your from_rng is undefined behavior currently

What's the UB? Sorry, if you're looking at the code at the top of this issue, it's incomplete; full version (sealing the types supported) is here. The "sealed" stuff is a trick to prevent user extension.

As I remember, you implied that we should only use u8 array seeds; I would prefer going with that because (a) it makes fill_bytes easy to use on any seed and (b) it simplifies endian conversions.

So maybe Seed: Default + AsMut<[u8]> is all the restriction we need? On the other hand, the limited seed lengths listed (4, 8, 12, 16, 24 and 32 bytes) plus maybe a few more (48, 64) should be enough to cover everything, I would think. But the permissive version may be better.

I've been wondering if from_hashable should be separate; at any rate it's not something I'm thinking about merging just yet. BTW we can't use std::hash because there's no guarantee they won't change it and we need stable output.

@burdges

burdges commented Jan 16, 2018

Yeah, it's not undefined behavior per se but it pushes memory safety onto SeedRestriction so that trait should be marked unsafe even if only used internally.

I suppose ChaChaRng would take Seed = [u8; 32]; but permit setting the nonce elsewhere, yes?

It appears my specialization trick fails bizarrely because associated types cannot yet be leveraged for specialization. :(

I'd think one could fix it by splitting off from_rng into another trait, but I'd say do the following for now, with the idea to generalize Seed later once associated types and specialization play nicer together.

pub trait SeedableRng: Sized {
    type Seed: Sized + Default + Deref<Target=[u8]> + DerefMut; // Or AsMut<[u8]>
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        assert!(seed.deref_mut().len() == ::std::mem::size_of::<Self::Seed>());
        rng.try_fill_bytes(seed.deref_mut())?;
        Ok(Self::from_seed(seed))
    }
}
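
For illustration, a toy generator implementing this shape, using the AsMut<[u8]> variant noted in the comment above (plain arrays implement AsMut but not Deref); ToyRng is illustrative, not a real rand PRNG.

// Toy PRNG with a 16-byte seed, interpreted little-endian.
struct ToyRng { s0: u64, s1: u64 }

impl SeedableRng for ToyRng {
    type Seed = [u8; 16];
    fn from_seed(seed: Self::Seed) -> Self {
        let mut a = [0u8; 8];
        let mut b = [0u8; 8];
        a.copy_from_slice(&seed[..8]);
        b.copy_from_slice(&seed[8..]);
        ToyRng { s0: u64::from_le_bytes(a), s1: u64::from_le_bytes(b) }
    }
}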

@newpavlov

newpavlov commented Jan 16, 2018

Yeah, it's not undefined behavior per se but it pushes memory safety onto SeedRestriction so that trait should be marked unsafe even if only used internally.

No, unsafe code always relies on "safe" code, and in this case sealed trait reliably protects users from shooting themselves in the foot, thus the method is perfectly safe.

UPD: Oh, sorry, you mean marking trait unsafe, here I agree with you.

@pitdicker

I suppose ChaChaRng would take Seed = [u8; 32]; but permit setting the nonce elsewhere, yes?

@burdges Can you explain why you need to set a nonce for an RNG?

The implementation of ChaCha in rand just makes the nonce part of the counter, so we end up with a 128-bit counter. From the documentation:

Since the nonce words are used to extend the counter to 128 bits, users wishing to obtain the conventional ChaCha pseudorandom stream associated with a particular nonce can call this function with arguments 0, desired_nonce.
pub fn set_counter(&mut self, counter_low: u64, counter_high: u64)

@burdges

burdges commented Jan 17, 2018

Just because a nonce is part of the chacha standard. I suppose a massive counter has uses too though, so maybe the nonce could be left aside. We might eventually have a general wrapper for treating stream ciphers as CSPRNGs anyways.

@dhardy
Owner Author

dhardy commented Jan 17, 2018

ChaCha is standardised as a stream cipher; we're using it as an RNG. I have nothing against adding additional constructors on ChaCha to allow it to be used for both purposes however. It may be that using from_seed and set_counter is sufficient to get output compatible with the standard but that's not a goal because SeedableRng::from_seed is intended for seeding the RNG, not the stream cipher; and judging by this code it looks like there are in fact multiple standards on how to initialise the cipher.

AsMut<[u8]> is more general than Deref<Target=[u8]> + DerefMut; I think either we should go with the more restrictive version ([u8; N] for specified N for now, maybe generalised to all N later) or the more general version (any T: AsMut<[u8]>). @pitdicker thoughts?

And is there a good reason to keep from_rng at all? There are two reasons:

  • Allows consumption of more data from the source RNG (e.g. ISAAC can set its full memory instead of just a 256-bit block), potentially improving quality. But as already pointed out, the initialisation rounds are still necessary unless we completely trust the source RNG, so likely most PRNGs will either use the default implementation of from_rng or fill a larger buffer but use the same init code. Also, the PRNG's Seed should already be large enough for the specified usage, so consuming more data (and potentially "entropy") likely doesn't gain anything.
  • Allows error handling, both passing on errors from the source RNG and internal errors, such as incompatible seeds (e.g. Xorshift and 0), and allows the user to retry.

So my thoughts are we use @pitdicker's PR except (a) mark SeedRestriction as unsafe, and (b) avoid usage of unsafe code in from_rng, using @burdges code (or similar).

@dhardy
Owner Author

dhardy commented Jan 17, 2018

Sorry, if from_rng is safe we don't need to mark the trait unsafe.

@pitdicker

pitdicker commented Jan 18, 2018

@burdges You are right, there is no unsafe code necessary anymore in from_rng 👍. It was a leftover from previous variants with [u32] and [u64] seeds.

One problem with AsMut<[u8]> and other traits is that they are only defined for arrays with at most 32 elements. Do we want to restrict seeds to a maximum of 256 bits, at least for the next year or so?

Edit: the code without unsafe only works with the AsMut requirement.
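
One way around the 32-element limit, sketched here, is a newtype with hand-written impls (Seed64 is illustrative):

// A 512-bit seed as a newtype, since (pre-const-generics) std only
// provides these impls for arrays of up to 32 elements.
struct Seed64([u8; 64]);

impl Default for Seed64 {
    fn default() -> Self { Seed64([0u8; 64]) }
}

impl AsMut<[u8]> for Seed64 {
    fn as_mut(&mut self) -> &mut [u8] { &mut self.0 }
}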

@burdges

burdges commented Jan 18, 2018

Also I suggested Deref thinking it'd admit/avoid negative reasoning so that [u32] might work, but that comes with endianness issues, so probably pointless.

@pitdicker

Since the experiments with from_hashable I have learned some more, and implemented quite a few small RNGs. from_hashable is a nice idea because it allows seeding with all sorts of types. But it has the disadvantage that it adds quite some non-trivial code, and the ideal hashing algorithm is not clear.

I think again that a seed_from_u64 function is convenient (from_seed is the 'right' function, but not all that easy to use). Simple deterministic initialization for tests, simulations etc. is something we do not support all that well yet.

For cryptographic RNGs the u64 can simply be mapped to the first 8 bytes of the seed. For small RNGs small numbers or patterns can start the RNG in a weak state. But the common way I have seen that handled is to just run the RNG the number of times it approximately takes to escape a weak state, usually about 20 times.

So I would propose to add seed_from_u64 to the SeedableRng trait, with a default implementation. The default implementation will repeat the u64 to fill the seed, initialize the RNG with from_seed, and call next_u32 20 times.

The advantage of adding the function to SeedableRng is that RNG implementations can override it, for example if fewer next_u32 calls suffice (or none, for cryptographic RNGs).
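
A hedged sketch of that proposed default, assuming the SeedableRng shape discussed in this thread plus an Rng bound for the next_u32 warm-up calls; the helper name is illustrative.

fn seed_from_u64_default<R>(seed: u64) -> R
where
    R: SeedableRng + Rng,
    R::Seed: Default + AsMut<[u8]>,
{
    // Repeat the u64 (little-endian) across the whole seed.
    let mut s = R::Seed::default();
    for chunk in s.as_mut().chunks_mut(8) {
        let n = chunk.len();
        chunk.copy_from_slice(&seed.to_le_bytes()[..n]);
    }
    let mut rng = R::from_seed(s);
    // Warm up to escape a potentially weak state.
    for _ in 0..20 {
        rng.next_u32();
    }
    rng
}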

@vks

vks commented Mar 5, 2018

@pitdicker

So I would propose to add seed_from_u64 to the SeedableRng trait, with a default implementation. The default implementation will repeat the u64 to fill the seed, initialize the RNG with from_seed, and call next_u32 20 times.

I don't think 20 times is enough for all RNGs with large state. I would suggest implementing seed_from_u64 using some small RNG like SplitMix64; this seems like a safer option to me.

@pitdicker

Ah, you are right. I forgot about a third category: simple RNGs using large states, like xorshift1024*φ (which you are probably thinking about?). Although I see only very limited use for these, we should support them. Using an RNG to fill the state is definitely the best solution there.

If we add helper functions for all three cases in the impls module, I think we have the common methods to seed from a u64 covered?

Of course the documentation should make very clear that using this in combination with a cryptographic RNG is rarely a good idea.

@dhardy
Owner Author

dhardy commented Mar 5, 2018

For what it's worth, I'm still planning on getting from_hashable into a usable state (probably using a cryptographic hash function), it just hasn't been a priority lately.

I guess both seed_from_u64 and from_hashable could be implemented without touching SeedableRng, by implementing an RNG (SplitMix or the hash function itself used as an RNG) and using from_rng, but it might still be worth adding them to SeedableRng (convenience, at the cost of extra code — this isn't so desirable for rand-core minimality however).

A potential issue with seed_from_u64 is the limited state — @pitdicker you made a good argument for wanting at least 96-128 bits of state when initialising small PRNGs. For some uses this is less important, particularly if another PRNG/hash function is used to modify the seed u64 first.

Ultimately I don't see much use for seed_from_u64 once we have from_hashable — it might be faster but it's unlikely that would be a big deal for most uses.

@dhardy
Owner Author

dhardy commented Mar 5, 2018

Note that we don't need to implement either as a function at all: SomePrng::from_rng(SomeHasher::new(key)) is probably good enough.

@dhardy
Owner Author

dhardy commented Mar 11, 2018

Implemented, apart from suggested additions like seed_from_u64 and from_hashable, but we don't need to keep this issue open for that.
