Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: remove dependency on rand ecosystem #241

Closed
BurntSushi opened this issue Aug 25, 2019 · 37 comments
Closed

RFC: remove dependency on rand ecosystem #241

BurntSushi opened this issue Aug 25, 2019 · 37 comments
Labels

Comments

@BurntSushi
Copy link
Owner

BurntSushi commented Aug 25, 2019

I am no longer happy about depending on the rand crates. There is too much churn, too many crates, and IMO, worst of all, there is no desire to add a minimal version check to their CI. Which means anything that depends on quickcheck in turn cannot reliably have its own minimal version check.

Because I am tired of depending on rand, I have started removing it completely where possible. For example, in walkdir, I've removed quickcheck as a dependency. In ripgrep, I've removed tempfile as a dependency, because it in turn was the only thing bringing rand into ripgrep's dependency tree.

I don't see any other path forward here. I can either continue to grin and bear rand, drop everything that depends on randomness, or figure out how to generate randomness without rand. Specifically, I'd very much like to add a minimal version check back to the regex crate, which catches bugs that happen in practice. (See here and here.) My sense is that there is some design space in the ecosystem for a simple source of randomness that doesn't need to be cryptographically secure, and an API that does not experience significant churn. Certainly, quickcheck does not need a cryptographic random number generator.

With that said, there is some infrastructure in the rand API that is incredibly useful. For example, quickcheck makes heavy use of the Rng::gen method for generating values based on type.

So it seems like if we have something like the Rng trait with with a non-cryptographic RNG, then we'd be probably good to go.

Are there other avenues here? What have I missed? My experience in building infrastructure for randomness is pretty limited, so am I underestimating the difficulty involved here?

Another side to this question is whether any users of quickcheck are leveraging parts of the rand ecosystem that would be difficult or impossible to do if we broke ties with rand.

@BurntSushi BurntSushi changed the title brainstorm: remove dependency on rand ecosystem RFC: remove dependency on rand ecosystem Aug 25, 2019
@elichai
Copy link

elichai commented Aug 25, 2019

One thing you could do is have your own Distribution style trait that you implement for the types you want and then use one of the more minimalistic rand crates like rand_os or even seed directly from /dev/random though that will require some gymnastics for windows support

Edit: that actually sounds like a good thin crate. and if it's explicitly for non-cryptographic uses then it should be pretty easy

@BurntSushi
Copy link
Owner Author

@elichai For the purposes of the quickcheck crate---and honestly, probably many other simple uses---something like the Rng::gen method would also provide a ton of utility. If it doesn't, quickcheck would need to roll all of that itself. Which maybe is the right path forward, but is something to consider.

Also, rand_os still depends on rand_core, moreover, it looks like rand_os is just implemented via the getrandom crate. But the getrandom crate (and therefore, rand_os) seems to be for cryptographic RNGs? So we might not need that at all.

@elichai
Copy link

elichai commented Aug 25, 2019

I'm writing a thin crate now that just gives a replacement to the Rng/RngCore traits.
You could then implement using getrandom and easily get all that rand gives you.

Hope to publish a first release soon and if you like it you can use it :)

@elichai
Copy link

elichai commented Aug 26, 2019

FYI, ThreadLocal is also a secure Rng.
I think you would want something more like

https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush
I might implement that.

@BurntSushi
Copy link
Owner Author

Hope to publish a first release soon and if you like it you can use it

Can you say what your long term maintenance plan is?

FYI, ThreadLocal is also a secure Rng.

I'm not sure I grok this sentence. Could you elaborate?

@elichai
Copy link

elichai commented Aug 26, 2019

About ThreadLocal you can see here https://docs.rs/rand/0.7.0/rand/rngs/struct.ThreadRng.html that it tries to use OsRng is is marked as CryptoRng

About my plans, My hope is to write it in a way that requires little maintenance as the whole point is to keep the code very thin.
But of course i'll fix any bugs coming and add support to new std/core types.

You can see my current work here: https://github.com/elichai/random-rs
would love your feedback, and I hope to get a first release today.

@BurntSushi
Copy link
Owner Author

That's ThreadRng though, not ThreadLocal.

About my plans, My hope is to write it in a way that requires little maintenance as the whole point is to keep the code very thin.
But of course i'll fix any bugs coming and add support to new std/core types.

You can see my current work here: https://github.com/elichai/random-rs
would love your feedback, and I hope to get a first release today.

Thanks! So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of quickcheck. That means quickcheck will inherit any churn that this new crate introduces.

See here for some words I've written about how I evaluate dependencies.

@elichai
Copy link

elichai commented Aug 26, 2019

Right, sorry. confused the names.
And that's my first priority, having a stable API, without breaking support for MSRV, minimum API breaking changes (hopefully even non, but cannot predict the future.)

As I said i'm planing to release today, would be appreciated if you could look/play with it a bit and tell me your thoughts, but for actually depending as you said in your post waiting a bit is good advice, as we're all humans and stuff tend to come up that I might need to fix. but hopefully that will be a very short period.

I'll reiterate, my plan is to have a stable API that doesn't require any changes except adding support for new primitives in the future.
That's why I'm using clippy+rustfmt+miri in the CI. to make sure everything is correct and good. and I test against 15 different rust compiler versions.
Next week I'll have more time and will also invest time writing enough examples so that their tests could be used as API tests to also make sure there's no breakage what so ever.

@dhardy
Copy link

dhardy commented Aug 28, 2019

Sorry if this comes across as somewhat bitter; I'll try not to be. So many people have expectations of what a random number crate should be; the rand lib tries to deliver on many of those goals, but at the end of the day I'm not sure it's possible to satisfy everyone:

  • some users want crypto-grade randomness; some only want small PRNGs
  • many users (e.g. quickcheck) care only about uniformly-distributed values or don't care at all about the distribution, while others do care; for this reason most of the non-uniform distributions have now been moved to rand_distr in the latest release
  • some users want only minimal crates while others complain about micro-craterism and too many dependencies
  • OpenSource developers are frequently encouraged to update little and often; yet others complain about API churn

IMO, worst of all, there is no desire to add a minimal version check to their CI

For what it's worth, I have re-opened rust-random/rand#850. Opinions differ about what it's worth supporting, but perhaps I see some value in this now.


I won't tell people use rand or don't use rand, but if people wish to pitch in with their views on the project, we will try to listen. Whether it is a good fit for quickcheck, I actually don't know.

@dhardy
Copy link

dhardy commented Aug 28, 2019

Also, rand_os still depends on rand_core, moreover, it looks like rand_os is just implemented via the getrandom crate. But the getrandom crate (and therefore, rand_os) seems to be for cryptographic RNGs? So we might not need that at all.

rand_os is basically deprecated at this point; getrandom is for OS-level randomness via syscall or equivalent only (thus not great performance).

FYI, ThreadLocal is also a secure Rng.
I think you would want something more like

See the rand book for a summary of RNGs we supply — there are both small, fast PRNGs and crypto-RNGs. Some are very easy to re-implement; e.g. SeedableRng::seed_from_u64 uses PCG-32 internally. You could use our implementations via rand_core + e.g. rand_pcg or you could just copy them.

@dhardy
Copy link

dhardy commented Aug 28, 2019

So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of quickcheck. That means quickcheck will inherit any churn that this new crate introduces.

This is a very good point, and it doesn't help that to use Rand's Distribution trait you need to depend on the whole of rand. It may be helpful here if that trait moves to rand_core, which is supposed to be more stable — unfortunately, history shows that it is not much more stable, and there are still open questions about more breaking changes to its API.

So again, I don't know what is your best option here.

@BurntSushi
Copy link
Owner Author

@dhardy Thanks! I appreciate your response. The minimal version check is probably my most pressing concern. I have looked into using only rand_core in quickcheck before, but couldn't see how to do it (although I don't remember why).

@BurntSushi
Copy link
Owner Author

@dhardy

but at the end of the day I'm not sure it's possible to satisfy everyone

Yeah, it definitely isn't. These are hard trade offs to balance. I think the most common complaints I hear about rand come down to churn, the size of the dependency tree and the spawling APIs spread out over crates.

Churn is hard to fix, because it requires settling on an API that is fixed for a potentially long period of time. rand has some heavy expectations levied upon it, because there is nothing anyone can do to stop folks from thinking of random number generation as a core and fundamental part of any software ecosystem. Rust and its ecosystem, along with rand, are maturing. While many of the core crates in the ecosystem have settled down and experience few if any breaking changes, rand is still releasing breaking changes at a fairly rapid cadence relative to other crates of similar station. This is made even worse because rand is typically a public dependency, so it's hard for folks to migrate to new releases at their own pace.

I don't know what rand's roadmap looks like and how to balance this trade off. My main aim here is to perhaps lend my voice toward adding even more weight on the side of figuring out how to stabilize soon. I realize this is probably already a priority for y'all, so I don't mean this to sound like I'm accusing you of not prioritizing it.

some users want only minimal crates while others complain about micro-craterism and too many dependencies

Right. This is a tough one to balance too. I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion. And even if everybody agreed with that opinion, the path to fixing it is not clear. But as an example, there probably exists a design in which rand condenses the number of crates in its tree while retaining similarish functionality via the use of cargo features. Maybe the default is to include a bunch of RNGs that are totally split out into separate crates, and if folks want to avoid building them, then they can tweak rand's features themselves. The primary benefit of doing this, IMO, is cohesion. It brings everything together into one place and (IMO) makes it simpler for folks to comprehend. It also has other benefits, such as making dependency review easier. (And I personally hope dependency review becomes a more and more common thing that we do in the Rust ecosystem.)

Of course, I am only representing the benefits. There are of course benefits to splitting things across crates, and in particular, there are downsides to using Cargo features. So most of what I'm saying here is an opinion based on my own sensibilities.

So again, I don't know what is your best option here.

I definitely have a very strong desire to depend on the random crate that everyone else uses. That is a huge benefit that can't really be mitigated by depending on something that is less commonly used. Therefore, I'd like to impress upon you that I did not make this issue lightly, and I only did it after a large amount of frustration on my end began to bubble up and boil over.

@dhardy
Copy link

dhardy commented Aug 28, 2019

@BurntSushi thanks for your response!

While many of the core crates in the ecosystem have settled down and experience few if any breaking changes, rand is still releasing breaking changes at a fairly rapid cadence relative to other crates of similar station.

rand is partly still in flux because its code was left stagnant for a long time after being split out from std lib and after @huonw moved on. But I also get the impression that many people's view is "how hard can random numbers be?", without appreciating how many aspects there are to the subject (at least getrandom is now separate).

I don't know what rand's roadmap looks like and how to balance this trade off.

There isn't actually a lot more planned. There are some open issues regarding API tweaks, but it may be better to minimise churn in these cases. The big issue is really stabilising the rand_core API, which recently had the error type revamped (leaving yet another frustration).

But as an example, there probably exists a design in which rand condenses the number of crates in its tree while retaining similarish functionality via the use of cargo features.

Compile time isn't even a big motivation; as I understand it it's more about having access to the APIs while having less dependency code to review. The current design may have been influenced too much by crypto-nerds.

I half wonder if it would be for the best to re-assimilate rand_core and our required PRNGs back into the rand lib now that getrandom is separate, but realistically I think this is too much churn. In any case, rand v0.7 is in a better place than v0.6, and rust-random/rand#872 may help further.

I definitely have a very strong desire to depend on the random crate that everyone else uses.

If your only public API dependencies are on the RngCore and Distribution traits, then moving the latter into rand_core may be a nice move — though users will still end up using rand anyway unless another crate re-implements all those Distribution impls (u8, Option<(f32, bool)>, [i32; 13] etc.).

@aldanor
Copy link

aldanor commented Aug 28, 2019

Here's another point on the topic from a quant :) What many people don't realize is that stacking multiple uniform RNGs for a bunch of independent variables does not yield a good coverage of a high-dimensional space. This is a very well-known fact e.g. in math finance when performing Monte-Carlo simulations; to achieve good coverage in a sparse space, you would typically use a low-discrepancy sequence. Even in 2-D space, discrepancy is already visible; as dimensionality increases, things get progressively worse.

For instance, you have a tuple (f32, f32, f32, f32), and for each parameter you generate a uniform rng sequence independently; chances are, the volume you'll cover will be very "chunked" with huge visible gaps. It's not immediately obvious - after all, each of the coordinates is perfectly uniform.

Just a (quasi-)random thought - given the purpose of this crate (to cover as much of the search space as possible leaving no gaps), would it make sense to consider using QRngs such as Sobol sequence? As for other distributions like normal, afaik it is possible to make a (quasi-)normal qrng out of a uniform qrng using Box-Muller transform. In reality, would people use anything more sophisticated than uniform / normal in this crate?

Here's how it works), note it's 2-D where things are not that bad. /* removed the image so as not to clutter this thread */

@BurntSushi
Copy link
Owner Author

@aldanor Thanks for the insight, but I don't think that's relevant to this specific issue? Maybe you could open a new issue? This issue isn't about switching rngs, but rather, considering which crates to use.

Also, if you're looking for changes to be made, it would be helpful to leverage your expertise and explain in more simpler terms the changes that would result by switching to a different rng.

Also, note that quickcheck does not strictly use a uniform distribution. It specifically also tries to pick out problem values for specific types.

@aldanor
Copy link

aldanor commented Aug 29, 2019

@BurntSushi

switching to a different rng

My point was that QRngs don't require an Rng at all :) (hence a comment in this thread) As in, they are essentially deterministic and behave better in higher-dimensional spaces (at least the floating-point ones) when you have tons of parameters.

If this topic would be of interest of anyone, I can open a separate issue and list a few thoughts there.

@BurntSushi
Copy link
Owner Author

@aldanor Yes, a separate issue please. The details of how quickcheck generates values is way off topic for this thread I think.

@newpavlov
Copy link

newpavlov commented Aug 29, 2019

@BurntSushi

I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion

As probably one of the main advocates of the current micro-crate approach employed by rand, I would like to defend it a bit. Let's take a look at the low level of current rand stack and its separation of concerns:

  • getrandom: this crate is for retrieving system entropy on as much systems as possible. It's a surprisingly tricky process to do right and efficiently, so I would strongly recommend not to try to roll your own solution in this space. Plus overwriting this crate by a custom one allows to run crates dependent on randomness on "unsupported" targets (e.g. on embedded devices). API is essentially stabilized due to its extreme simplicity. We had some question about the Error type, but I think we've worked them out and we are unlikely to change it after v0.2 release in near future (see trait impl disappeared without bumping major version rust-random/getrandom#94). The main source of potential future breakage is handling of the wasm32-unknown-unknown target and some crate feature changes (e.g. see Feature for using CPU-based RNG rust-random/getrandom#84).
  • rand_core: contains core traits for describing RNG functionality. So crates (including third-party ones) implementing PRNG or hardware RNG can depend only on it, without bringing the whole rand on board, and be immediately compatible crates generic over the RNG trait. It's intentionally very bare-bone and fancier functionality is left for extension traits like rand::Rng. We probably could move some functionality from rand::Rng to RngCore, but where should we draw the line? Some people would argue that range generation is a must have, others would like to see sampling and random string generation. I think we are quite close to API stabilization here, the main question right now is the error type. This crate optionally depends on getrandom for seeding PRNG algorithms from system entropy. We also plan to move OsRng here (it will be feature-gated behind getrandom). BTW note that we even bring parts of byteorder to rand_core, instead of adding +1 dependency.
  • rand_chacha and other RNG crates: these represent algorithm implementations and (potentially) "drivers" for hardware RNGs. They have to depend only on rand_core (we do not recommend enabling getrandom feature). Right now rand_chacha is essentially just a thin wrapper around c2-chacha and arguably it can be deprecated in favor of implementing RngCore trait directly in ChaCha implementation crate(s). These crates essentially can be stabilized right after stabilization of rand_core.

For me it's a very clean and logical separation, which allows incremental stabilization of some rand ecosystem parts and gradual deprecation of others (e.g. like we do with rand_os and rand_jitter), while being friendly to third-party crate developers. Also for some applications this approach significantly reduces amount of code which has to be compiled and reviewed.

I would like to argue that most of the observed churn is not caused directly by the micro-crate design. I think the "explosion" of rand crates is simply the most visible change, so people start to associate our mistakes (which, granted, we done a plenty...) with it. Also when something breaks in a project which uses micro-crate design, often it's some small crate upstream, so the first thought is "damn those micro-crates, again they broke something", while the same breakage as easily could've happened with a monolithic design. So in my opinion we have the good old "correlation does not imply causation" on our hands here.

@dhardy
Copy link

dhardy commented Aug 29, 2019

@aldanor has a point — the optimal distribution of values for quickcheck is not exactly the uniform distribution which the Rand crate is designed for. This reminds me of this request (which has never been implemented but could conceivably be added to the lib). It could therefore be that the optimal approach (numerically) would be to use your own trait and sequence generator in quickcheck. (But against this, libraries now have two different frameworks to implement for randomised generation of values, where this makes any sense.)

Thanks @newpavlov for summarising the status of Rand crates. Personally I am very happy that getrandom is a separate crate now and I also think many of the complaints are simply due to the visibility of "so much stuff for random numbers". Conceivably I think merging rand, rand_core, rand_pcg and rand_chacha could still work, but of course it would be a kludge. The only technical drawback of micro-cratering IMO is that there is more packaging (a bigger problem for linux distros) and scope for packaging errors (e.g. the rand 0.6.5 issue).

@BurntSushi
Copy link
Owner Author

BurntSushi commented Aug 29, 2019

@newpavlov Thanks for the thoughts. This is why this problem is hard: reasonable people can disagree. The design you laid out is perfectly defensible. But there are other designs, hinted at by @dhardy. For example, I would definitely agree with you that getrandom is a nice crate to have that is separate from everything else. But conceivably, a substantial portion of the rand ecosystem could be merged into the rand crate without sacrificing the use cases you've laid out that are currently supported by a micro-crate design. Obviously, this would require leaning more heavily on Cargo features, and those come with their own downsides.

So in my opinion we have the good old "correlation does not imply causation" on our hands here.

Right. The size of a dependency tree is typically just a signal. But I'd like to re-iterate my point above about cohesion. I don't think we can really evaluate crate hierarchies in a technical vacuum. We also need to consider the actual interaction the folks have with crates. This includes everything from trying to understand the aggregate APIs provided by those crates to managing expectations when they see a much larger number of crates added to their tree than they would otherwise expect.

I'm in the process of writing a blog post about dependency selection that will hopefully try to explain my thoughts more clearly, assuming I get it done. It isn't just about rand, but about the wider ecosystem and my own personal struggle with this too. For example, I just removed two dependencies from regex, but in the near future, I have plans to add two more and I don't know how to avoid it.

@newpavlov
Copy link

newpavlov commented Aug 29, 2019

@BurntSushi

Obviously, this would require leaning more heavily on Cargo features, and those come with their own downsides.

Well, in my opinion it's just sweeping complexity under the rug. Increasing number of features also makes it harder to comprehensively test a crate, since number of possible feature combinations raises exponentially. And instead of a clean dependency graph you get a potentially messy grey-box. This is why I believe that instead of introducing a ton of features to std (see rust-lang/rfcs#2663), it will be better to add more "standard" crates like alloc and proc_macro.

One argument with which I agree is that micro-crate design (BTW I think "micro" is a bit too strong in our case) makes life of linux package maintainers more difficult, but I think it this case a better approach would be to develop a solution for packaging a Rust application into a single package with all its upstream dependencies listed in its Cargo.lock (granted, it may require some changes to packaging policies, which may not be easy).

I think it's somewhat funny how number of crates became a main complexity metric in such discussions. Instead of, for example, total LoC or number of groups which maintain your dependencies. I guess the main reason for that is because it's the only metric which you always observe when compiling your project, so people tend to dramatically overestimate its importance. (although there are certain issues with cargo, which make situation a bit worse than it should be, e.g. like downloading unused optional dependencies)

@BurntSushi
Copy link
Owner Author

Well, in my opinion we just sweep complexity under the rug here. Increasing number of features also makes it harder to comprehensively test a crate, since number of possible feature combinations raises exponentially. And instead of a clean dependency tree you get a potentially messy grey-box.

Yes, those are some of the downsides. I mentioned that Cargo features had downsides, so I wasn't trying to sweep them under the rug.

One argument with which I agree is that micro-crate design (BTW I think "micro" is a bit too strong in our case) makes life of linux package maintainers more difficult,

It's not just distros. It's also folks that need to review dependencies, i.e., something like cargo crev, which I am currently experimenting with.

but I think it this case a better approach would be to develop a solution for packaging a Rust application with all its upstream dependencies into a single package.

Sounds like a non-starter to me? Some distros, like Archlinux, do this. But it's not going to fly for Debian, as far as I understand things.

I think it's somewhat funny how number of crates became a main complexity metric in such discussions. Instead of, for example, total LoC or number of maintainer groups which work on your dependencies. I guess the main reason for that is because it's the only metric which you always observe when compiling your project, so people tend to dramatically overestimate its importance.

Yes, I mostly agree. But it doesn't seem funny to me, I guess. It makes perfect sense. As the maintainer of Rust applications, I try to keep my dependency trees under control. Whenever I run cargo update, I generally carefully audit its output and investigate anything that looks strange. If my dependency tree explodes to hundreds of crates, then this process becomes much harder. This is where it gets tricky, because it's often the case that there is no one single source of the explosion. It's an aggregate effect. Which makes it extremely difficult to fix, because it's very hard to convince anyone that it's a problem, because in isolation, it generally isn't.

A layer of abstraction over crates (like "maintenance groups") is perhaps a good idea. Certainly, in some cases, a maintenance group more closely approximates the maintenance burden assumed by relying on dependencies. But I'm pretty sure that will require significant tooling to pull off correctly, nevermind the social work required to do it. I'm not much of a visionary, and I don't have a lot of time to burn on this stuff, so I'm more or less looking for ways of making things better today, using the tools we have. I can't spend too much of my time on what the "ideal" scenario is divorced from the feasibility of it.

I am on the receiving end of this too. I very often hear from folks that they don't want to use regex because they want to avoid the dependency tree it brings in. What am I supposed to do with that? I want people to use my work, so I'm going to try to remove the barriers that users are telling me that are preventing them from using my work. From my perspective, I can either try to educate them that the dependency tree is fine and would exist anyway, or I can try to actually shrink the tree itself. Maybe there are other avenues that involve changing how our dependency trees are surfaced to users, but that's not something I can do now. So my plan for now is to try to do both. (My other idea is to make a lighter regex crate as an alternative, but there are various problems with that approach.)

(although there are certain issues with cargo, which make situation a bit worse than it should be, e.g. like downloading unused optional dependencies)

Yes, it would be very nice to fix this.

huitseeker added a commit to huitseeker/diem that referenced this issue Sep 6, 2019
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
bors-libra pushed a commit to diem/diem that referenced this issue Sep 6, 2019
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand

Closes: #717
Approved by: mimoo
@dhardy
Copy link

dhardy commented Sep 12, 2019

@BurntSushi do you think this thread has served its purpose by now? I don't care much for the Libra project, and minimising dependencies is potentially a laudable goal in its own right, but that others are claiming rand is "antiquated" and using this as evidence is rather odd.

My recommendations:

  1. Open another thread discussing alternative sampling routines (distributions) on technical merit (as mentioned in this post).
  2. If you think it appropriate, open an issue on the rand project with recommendations on policies / changes. (This issue already inspired rand_chacha and reducing dependencies rust-random/rand#872.)

@BurntSushi
Copy link
Owner Author

Yes, I don't understand the commentary from the libra people. Although, they are just doing what I've already done in a few places: ripping out tempfile in favor of a much simpler solution. I think I linked to such things in my initial comment, which is maybe why they are linking this issue? Not sure.

I'm not sure I've quite decided what to do. But I haven't thought about this too much lately. When I circle back around to this, I'll update this ticket with a decision and close it out.

@matklad
Copy link

matklad commented Nov 12, 2020

Couple of thoughts here (admittedly, I've spend only about 5 minutes on catching up the the thread & design):

  • there are a number of "rand, but simple" crates in the ecosystem. However, the recently introduced fastrand crate seems to finally hit the nail on the head in terms of design.
  • Rng::gen is useful, but it seems roughly equivalent to Arbitrary::arbitrary? Ie, quickcheck itself provides API for generated typed arbitrary values?
  • If folks really need access to Rng ecosystem, they can implement RngCore trait for whatever source of randomness we provide them with.

So, it seems to me that we can provide the following simplified API:

pub trait Arbitrary: Clone + Send + 'static {
    fn arbitrary(seed: &mut Seed) -> Self;

    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        empty_shrinker()
    }
}

pub struct Seed { 
    rng: fastrand::Rng,
    size: usize,
}
impl Seed {
    fn rng(&mut self) -> &mut fastrand::Rng { &mut self.rng }
    fn gen<T: Arbitrary>(&mut self) -> T { T::arbitrary(self) }
    fn size(&self) -> usize { self.size }
}

@BurntSushi
Copy link
Owner Author

@matklad That looks plausible. I think the main issue is that it is probably occasionally useful to have access to the rng within an Arbitrary impl. While Arbitrary itself provides a way to generate values of a specific type, it doesn't give you other things, like shuffling a slice. But if shuffling or a small number of other operations are the only things missing, then we could just provide those APIs on Seed. (Although I think I'd probably rename Seed to Rng in your proposal?)

But otherwise, yes, I like the idea. fastrand looks good and about what I had in mind.

@matklad
Copy link

matklad commented Nov 13, 2020

I think the main issue is that it is probably occasionally useful to have access to the rng within an Arbitrary impl.

Yeah, that's why my strawman has rng and gen on Seed -- having those as . methods, rather than :: assoc functions should be more ergonomic. And there are indeed two ways we can implement this -- either add .rng() accessor and expose fast_rand API publicly, or just lift all fastrand's methods onto Seed (if we go this route, it probably makes sense to just vendor fastrand, as it is mostly API, the core rng logic is tiny).

Although I think I'd probably rename Seed to Rng in your proposal?

Yeah, the naming is half-backed. The idea behind Seed is that it seems to me that one needs an unbiased source of randomness (Seed::rng) as well as a way to enable bias for small or pathological inputs (Seed::size). This are somewhat orthogonal concerns, so hence the split into Rng (which is uniform) and Seed (which has skew parameters, like size). If we uplift Rng's API directly only Seed, than it makes sense to name the resulting type Rng.

Additional minor observation: it's nice that Seed parameter is a concrete type in my proposal. That means that for libraries providing Arbitrary impls for their types codegen will happen upstream (as there's no monomorphisation involved), which helps compile times!

@matklad
Copy link

matklad commented Nov 13, 2020

While Arbitrary itself provides a way to generate values of a specific type, it doesn't give you other things, like shuffling a slice.

Hm, re-reading this, I feel that there might be some misunderstanding. Arbitrary has access to those other things, because arbitrary function takes Seed as an argument, and Seed provides API for shuffling and such. We don't need to add extra API to Arbitrary itself.

@BurntSushi
Copy link
Owner Author

BurntSushi commented Nov 13, 2020

Yeah, that's why my strawman has rng and gen on Seed -- having those as . methods, rather than :: assoc functions should be more ergonomic. And there are indeed two ways we can implement this -- either add .rng() accessor and expose fast_rand API publicly, or just lift all fastrand's methods onto Seed (if we go this route, it probably makes sense to just vendor fastrand, as it is mostly API, the core rng logic is tiny).

So looking at the API, I think all we'd probably want is shuffle. So:

pub trait Arbitrary: Clone + Send + 'static {
    fn arbitrary(gen: &mut Gen) -> Self;

    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        empty_shrinker()
    }
}

pub struct Gen { // was 'Seed' in your original comment
    rng: fastrand::Rng,
    size: usize,
}
impl Gen {
    pub fn gen<T: Arbitrary>(&mut self) -> T { T::arbitrary(self) }
    pub fn size(&self) -> usize { self.size }
    pub fn shuffle<T>(&self, slice: &mut [T]) { self.rng.shuffle(slice); }
}

And I think that should do it? Many of the Arbitrary impls inside this crate will probably use more of the fastrand routines, but I don't think we need to expose them since they will be expose by virtue of the Arbitrary impl itself. Note also that I removed the rng method. This way, fastrand is not a public dependency.

I chose Gen because of your note about size. I think Rng would be a defensible name too, but Gen at least indicates there might be something more going on? Note that the name Gen comes from the Haskell quickcheck library.

Hm, re-reading this, I feel that there might be some misunderstanding. Arbitrary has access to those other things, because arbitrary function takes Seed as an argument, and Seed provides API for shuffling and such. We don't need to add extra API to Arbitrary itself.

I think we're on the same page? What I meant was, Arbitrary only gives you the ability to generate random values. That is the majority of the fastrand API right there. But fastrand does provide some other things---I think the only material one is slice shuffling---that we'd also want to make available to folks who write Arbitrary impls. I did not mean to suggest that shuffling needs to be on the Arbitrary trait. But rather, that it be available. And I think the most natural place for it is on your Seed type.

@matklad
Copy link

matklad commented Nov 13, 2020

I don't think we need to expose them since they will be expose by virtue of the Arbitrary impl itself

That's clever and that's the bit I didn't understand! It makes the API surface smaller, and hides public dependency!

And yeah, I agree that shuffle is the only thing we need to expose additionally. fast_rand also provides API to generate an unbiased integer in the given range, which we'd loose with your proposal. But, I think unbiasness is not an important property here, and otherwise the API can be simulated easily enough with %.

Gen names looks perfect.

@BurntSushi
Copy link
Owner Author

Aye. And if we need to bring in more methods from fastrand, it should be as easy as just re-exporting them on Gen. I'd have no problem with that personally.

@BurntSushi
Copy link
Owner Author

Well, fastrand appears to be archived. So it seems unwise to depend on it now. There is also oorandom.

@BurntSushi
Copy link
Owner Author

And it looks like the rand 0.8 release dropped some dependencies (or made more of them optional anyway). If we upgrade and remove rand as a public dependency (using the tactic discussed above), then that would probably address most of my concerns. At that point, I think I could do a quickcheck 1.0 release at least.

BurntSushi added a commit that referenced this issue Dec 27, 2020
This removes the use of the rand_core crate as a public dependency. It
is now an implementation detail.

We achieve this primarily by turning the `Gen` trait into a concrete
type and fixing the fallout.

This does make it impossible for callers to use their own `Gen`
implementations, but it's unclear how often this was being used (if at
all). This does also limit the number of RNG utility routines that
callers have easy access to. However, it should be possible to use
rand's `SeedableRng::from_{rng,seed}` routines to get access to more
general RNGs.

Closes #241
BurntSushi added a commit that referenced this issue Dec 27, 2020
This removes the use of the rand_core crate as a public dependency. It
is now an implementation detail.

We achieve this primarily by turning the `Gen` trait into a concrete
type and fixing the fallout.

This does make it impossible for callers to use their own `Gen`
implementations, but it's unclear how often this was being used (if at
all). This does also limit the number of RNG utility routines that
callers have easy access to. However, it should be possible to use
rand's `SeedableRng::from_{rng,seed}` routines to get access to more
general RNGs.

Closes #241
mxinden added a commit to multiformats/rust-multiaddr that referenced this issue Jun 16, 2021
Pre quickcheck v1.0 rand and quickcheck had to be updated in lock-step,
given that the latter makes use of traits of the former.

This commit decouples the two, only depending on quickcheck directly for
randomness. This will ease the transisition to quickcheck 1.0, see
BurntSushi/quickcheck#241.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants