RFC: remove dependency on `rand` ecosystem #241

Open
BurntSushi opened this issue Aug 25, 2019 · 28 comments

@BurntSushi
Owner

commented Aug 25, 2019

I am no longer happy about depending on the rand crates. There is too much churn, too many crates, and IMO, worst of all, there is no desire to add a minimal version check to their CI. Which means anything that depends on quickcheck in turn cannot reliably have its own minimal version check.

Because I am tired of depending on rand, I have started removing it completely where possible. For example, in walkdir, I've removed quickcheck as a dependency. In ripgrep, I've removed tempfile as a dependency, because it in turn was the only thing bringing rand into ripgrep's dependency tree.

I don't see any other path forward here. I can either continue to grin and bear rand, drop everything that depends on randomness, or figure out how to generate randomness without rand. Specifically, I'd very much like to add a minimal version check back to the regex crate, which catches bugs that happen in practice. (See here and here.) My sense is that there is some design space in the ecosystem for a simple source of randomness that doesn't need to be cryptographically secure, and an API that does not experience significant churn. Certainly, quickcheck does not need a cryptographic random number generator.

With that said, there is some infrastructure in the rand API that is incredibly useful. For example, quickcheck makes heavy use of the Rng::gen method for generating values based on type.

So it seems like if we have something like the Rng trait with a non-cryptographic RNG, then we'd probably be good to go.
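
To make the idea concrete, here is a minimal sketch of type-driven generation on top of a small non-cryptographic generator. Everything in it is an assumption made for illustration: the `Source` and `Random` traits and the `generate` function are hypothetical names, not part of rand or quickcheck, and SplitMix64 is swapped in only because it is short.

```rust
/// Hypothetical minimal source of non-cryptographic randomness.
pub trait Source {
    fn next_u64(&mut self) -> u64;
}

/// SplitMix64: a small, fast generator. Not cryptographically secure.
pub struct SplitMix64(pub u64);

impl Source for SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical stand-in for "generate a value based on its type".
pub trait Random: Sized {
    fn random<S: Source>(src: &mut S) -> Self;
}

impl Random for u64 {
    fn random<S: Source>(src: &mut S) -> Self {
        src.next_u64()
    }
}

impl Random for u8 {
    fn random<S: Source>(src: &mut S) -> Self {
        // Take the top byte of the 64-bit output.
        (src.next_u64() >> 56) as u8
    }
}

impl Random for bool {
    fn random<S: Source>(src: &mut S) -> Self {
        // Use the top bit of the 64-bit output.
        src.next_u64() >> 63 == 1
    }
}

impl<A: Random, B: Random> Random for (A, B) {
    fn random<S: Source>(src: &mut S) -> Self {
        (A::random(src), B::random(src))
    }
}

impl<T: Random> Random for Option<T> {
    fn random<S: Source>(src: &mut S) -> Self {
        if bool::random(src) { Some(T::random(src)) } else { None }
    }
}

/// The requested type drives which implementation runs.
pub fn generate<T: Random, S: Source>(src: &mut S) -> T {
    T::random(src)
}
```

With this in place, `let v: (u64, Option<bool>) = generate(&mut SplitMix64(42));` picks the implementations from the type annotation alone. The bulk of the work in a real replacement would be the long tail of `Random` impls, not the generator.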

Are there other avenues here? What have I missed? My experience in building infrastructure for randomness is pretty limited, so am I underestimating the difficulty involved here?

Another side to this question is whether any users of quickcheck are leveraging parts of the rand ecosystem that would be difficult or impossible to do if we broke ties with rand.

@BurntSushi BurntSushi added the question label Aug 25, 2019

@BurntSushi BurntSushi changed the title brainstorm: remove dependency on `rand` ecosystem RFC: remove dependency on `rand` ecosystem Aug 25, 2019

@elichai

commented Aug 25, 2019

One thing you could do is have your own Distribution-style trait that you implement for the types you want, and then use one of the more minimalistic rand crates like rand_os, or even seed directly from /dev/random, though that will require some gymnastics for Windows support.

Edit: that actually sounds like a good thin crate, and if it's explicitly for non-cryptographic uses then it should be pretty easy.

@BurntSushi

Owner Author

commented Aug 25, 2019

@elichai For the purposes of the quickcheck crate---and honestly, probably many other simple uses---something like the Rng::gen method would also provide a ton of utility. If it doesn't, quickcheck would need to roll all of that itself. Which maybe is the right path forward, but is something to consider.

Also, rand_os still depends on rand_core, moreover, it looks like rand_os is just implemented via the getrandom crate. But the getrandom crate (and therefore, rand_os) seems to be for cryptographic RNGs? So we might not need that at all.

@elichai

commented Aug 25, 2019

I'm writing a thin crate now that just gives a replacement to the Rng/RngCore traits.
You could then implement using getrandom and easily get all that rand gives you.

Hope to publish a first release soon and if you like it you can use it :)

@elichai

commented Aug 26, 2019

FYI, ThreadLocal is also a secure Rng.
I think you would want something more like

https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush
I might implement that.
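
For readers who want a feel for how small such a generator can be, here is a sketch of a 128-bit Lehmer (multiplicative congruential) generator. It rests on assumptions: that the linked post discusses a generator of this shape, and that the multiplier below is the one used there, which should be checked against the post before relying on it. The seeding through SplitMix64 is a choice made for this sketch.

```rust
/// Multiplier assumed to match the linked post; verify before relying on it.
const MULTIPLIER: u128 = 0xda94_2042_e4dd_58b5;

/// One SplitMix64 step, used here only to spread a small seed over 128 bits.
fn splitmix64(x: &mut u64) -> u64 {
    *x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *x;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// 128-bit Lehmer generator. Not cryptographically secure.
pub struct Lehmer64 {
    state: u128,
}

impl Lehmer64 {
    pub fn new(seed: u64) -> Self {
        let mut s = seed;
        let hi = splitmix64(&mut s) as u128;
        let lo = splitmix64(&mut s) as u128;
        // Force the state to be odd so it can never collapse to zero.
        Lehmer64 { state: (hi << 64) | lo | 1 }
    }

    pub fn next_u64(&mut self) -> u64 {
        // Multiply modulo 2^128 and return the high 64 bits.
        self.state = self.state.wrapping_mul(MULTIPLIER);
        (self.state >> 64) as u64
    }
}
```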

@BurntSushi

Owner Author

commented Aug 26, 2019

Hope to publish a first release soon and if you like it you can use it

Can you say what your long term maintenance plan is?

FYI, ThreadLocal is also a secure Rng.

I'm not sure I grok this sentence. Could you elaborate?

@elichai

commented Aug 26, 2019

About ThreadLocal, you can see here https://docs.rs/rand/0.7.0/rand/rngs/struct.ThreadRng.html that it tries to use OsRng and is marked as CryptoRng.

About my plans: my hope is to write it in a way that requires little maintenance, as the whole point is to keep the code very thin.
But of course I'll fix any bugs that come up and add support for new std/core types.

You can see my current work here: https://github.com/elichai/random-rs
would love your feedback, and I hope to get a first release today.

@BurntSushi

Owner Author

commented Aug 26, 2019

That's ThreadRng though, not ThreadLocal.

About my plans: my hope is to write it in a way that requires little maintenance, as the whole point is to keep the code very thin.
But of course I'll fix any bugs that come up and add support for new std/core types.

You can see my current work here: https://github.com/elichai/random-rs
would love your feedback, and I hope to get a first release today.

Thanks! So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of quickcheck. That means quickcheck will inherit any churn that this new crate introduces.

See here for some words I've written about how I evaluate dependencies.

@elichai

commented Aug 26, 2019

Right, sorry, I confused the names.
And that's my first priority: having a stable API, without breaking MSRV support, and with minimal API-breaking changes (hopefully even none, but I cannot predict the future).

As I said, I'm planning to release today. It would be appreciated if you could look at it, play with it a bit, and tell me your thoughts. But for actually depending on it, as you said in your post, waiting a bit is good advice: we're all human, and stuff tends to come up that I might need to fix. Hopefully that will be a very short period.

I'll reiterate: my plan is to have a stable API that doesn't require any changes except adding support for new primitives in the future.
That's why I'm using clippy+rustfmt+miri in the CI, to make sure everything is correct and good, and I test against 15 different Rust compiler versions.
Next week I'll have more time, and I will also invest time in writing enough examples that their tests can be used as API tests, to make sure there's no breakage whatsoever.

huitseeker added a commit to huitseeker/libra that referenced this issue Aug 26, 2019
Remove tempfile as a dependency
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
huitseeker added a commit to huitseeker/libra that referenced this issue Aug 28, 2019
Remove tempfile as a dependency
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
@dhardy

commented Aug 28, 2019

Sorry if this comes across as somewhat bitter; I'll try not to be. So many people have expectations of what a random number crate should be; the rand lib tries to deliver on many of those goals, but at the end of the day I'm not sure it's possible to satisfy everyone:

  • some users want crypto-grade randomness; some only want small PRNGs
  • many users (e.g. quickcheck) care only about uniformly-distributed values or don't care at all about the distribution, while others do care; for this reason most of the non-uniform distributions have now been moved to rand_distr in the latest release
  • some users want only minimal crates while others complain about micro-craterism and too many dependencies
  • Open-source developers are frequently encouraged to update little and often, yet others complain about API churn

IMO, worst of all, there is no desire to add a minimal version check to their CI

For what it's worth, I have re-opened rust-random/rand#850. Opinions differ about what it's worth supporting, but perhaps I see some value in this now.


I won't tell people to use rand or not to use rand, but if people wish to pitch in with their views on the project, we will try to listen. Whether it is a good fit for quickcheck, I actually don't know.

@dhardy

commented Aug 28, 2019

Also, rand_os still depends on rand_core, moreover, it looks like rand_os is just implemented via the getrandom crate. But the getrandom crate (and therefore, rand_os) seems to be for cryptographic RNGs? So we might not need that at all.

rand_os is basically deprecated at this point; getrandom is for OS-level randomness via syscall or equivalent only (thus not great performance).

FYI, ThreadLocal is also a secure Rng.
I think you would want something more like

See the rand book for a summary of RNGs we supply — there are both small, fast PRNGs and crypto-RNGs. Some are very easy to re-implement; e.g. SeedableRng::seed_from_u64 uses PCG-32 internally. You could use our implementations via rand_core + e.g. rand_pcg or you could just copy them.
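
As an illustration of how little code "just copy them" can involve, here is a sketch of the widely published PCG-32 (XSH RR 64/32) algorithm. It is written from the public description of the algorithm, not copied from rand_pcg, and it is not claimed to reproduce what seed_from_u64 does internally; the `Pcg32` name and its constructor are assumptions for this sketch.

```rust
/// PCG XSH RR 64/32: 64 bits of state, 32 bits of output per step.
/// Not cryptographically secure.
pub struct Pcg32 {
    state: u64,
    inc: u64,
}

impl Pcg32 {
    /// `stream` selects one of many independent sequences.
    pub fn new(seed: u64, stream: u64) -> Self {
        // The increment must be odd.
        let mut rng = Pcg32 { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // Linear congruential step on the 64-bit state.
        self.state = old
            .wrapping_mul(6364136223846793005)
            .wrapping_add(self.inc);
        // Output permutation: xorshift, then a data-dependent rotation.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }
}
```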

@dhardy

commented Aug 28, 2019

So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of quickcheck. That means quickcheck will inherit any churn that this new crate introduces.

This is a very good point, and it doesn't help that to use Rand's Distribution trait you need to depend on the whole of rand. It may be helpful here if that trait moves to rand_core, which is supposed to be more stable — unfortunately, history shows that it is not much more stable, and there are still open questions about more breaking changes to its API.

So again, I don't know what is your best option here.

@BurntSushi

Owner Author

commented Aug 28, 2019

@dhardy Thanks! I appreciate your response. The minimal version check is probably my most pressing concern. I have looked into using only rand_core in quickcheck before, but couldn't see how to do it (although I don't remember why).

@BurntSushi

Owner Author

commented Aug 28, 2019

@dhardy

but at the end of the day I'm not sure it's possible to satisfy everyone

Yeah, it definitely isn't. These are hard trade-offs to balance. I think the most common complaints I hear about rand come down to churn, the size of the dependency tree, and the sprawling APIs spread out over crates.

Churn is hard to fix, because it requires settling on an API that is fixed for a potentially long period of time. rand has some heavy expectations levied upon it, because there is nothing anyone can do to stop folks from thinking of random number generation as a core and fundamental part of any software ecosystem. Rust and its ecosystem, along with rand, are maturing. While many of the core crates in the ecosystem have settled down and experience few if any breaking changes, rand is still releasing breaking changes at a fairly rapid cadence relative to other crates of similar station. This is made even worse because rand is typically a public dependency, so it's hard for folks to migrate to new releases at their own pace.

I don't know what rand's roadmap looks like and how to balance this trade off. My main aim here is to perhaps lend my voice toward adding even more weight on the side of figuring out how to stabilize soon. I realize this is probably already a priority for y'all, so I don't mean this to sound like I'm accusing you of not prioritizing it.

some users want only minimal crates while others complain about micro-craterism and too many dependencies

Right. This is a tough one to balance too. I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion. And even if everybody agreed with that opinion, the path to fixing it is not clear. But as an example, there probably exists a design in which rand condenses the number of crates in its tree while retaining similarish functionality via the use of cargo features. Maybe the default is to include a bunch of RNGs that are totally split out into separate crates, and if folks want to avoid building them, then they can tweak rand's features themselves. The primary benefit of doing this, IMO, is cohesion. It brings everything together into one place and (IMO) makes it simpler for folks to comprehend. It also has other benefits, such as making dependency review easier. (And I personally hope dependency review becomes a more and more common thing that we do in the Rust ecosystem.)

Of course, I am only representing the benefits. There are of course benefits to splitting things across crates, and in particular, there are downsides to using Cargo features. So most of what I'm saying here is an opinion based on my own sensibilities.

So again, I don't know what is your best option here.

I definitely have a very strong desire to depend on the random crate that everyone else uses. That is a huge benefit that can't really be mitigated by depending on something that is less commonly used. Therefore, I'd like to impress upon you that I did not make this issue lightly, and I only did it after a large amount of frustration on my end began to bubble up and boil over.

@dhardy

commented Aug 28, 2019

@BurntSushi thanks for your response!

While many of the core crates in the ecosystem have settled down and experience few if any breaking changes, rand is still releasing breaking changes at a fairly rapid cadence relative to other crates of similar station.

rand is partly still in flux because its code was left stagnant for a long time after being split out from std lib and after @huonw moved on. But I also get the impression that many people's view is "how hard can random numbers be?", without appreciating how many aspects there are to the subject (at least getrandom is now separate).

I don't know what rand's roadmap looks like and how to balance this trade off.

There isn't actually a lot more planned. There are some open issues regarding API tweaks, but it may be better to minimise churn in these cases. The big issue is really stabilising the rand_core API, which recently had the error type revamped (leaving yet another frustration).

But as an example, there probably exists a design in which rand condenses the number of crates in its tree while retaining similarish functionality via the use of cargo features.

Compile time isn't even a big motivation; as I understand it it's more about having access to the APIs while having less dependency code to review. The current design may have been influenced too much by crypto-nerds.

I half wonder if it would be for the best to re-assimilate rand_core and our required PRNGs back into the rand lib now that getrandom is separate, but realistically I think this is too much churn. In any case, rand v0.7 is in a better place than v0.6, and rust-random/rand#872 may help further.

I definitely have a very strong desire to depend on the random crate that everyone else uses.

If your only public API dependencies are on the RngCore and Distribution traits, then moving the latter into rand_core may be a nice move — though users will still end up using rand anyway unless another crate re-implements all those Distribution impls (u8, Option<(f32, bool)>, [i32; 13] etc.).

@aldanor

commented Aug 28, 2019

Here's another point on the topic from a quant :) What many people don't realize is that stacking multiple uniform RNGs for a bunch of independent variables does not yield a good coverage of a high-dimensional space. This is a very well-known fact e.g. in math finance when performing Monte-Carlo simulations; to achieve good coverage in a sparse space, you would typically use a low-discrepancy sequence. Even in 2-D space, discrepancy is already visible; as dimensionality increases, things get progressively worse.

For instance, you have a tuple (f32, f32, f32, f32), and for each parameter you generate a uniform rng sequence independently; chances are, the volume you'll cover will be very "chunked" with huge visible gaps. It's not immediately obvious - after all, each of the coordinates is perfectly uniform.

Just a (quasi-)random thought - given the purpose of this crate (to cover as much of the search space as possible leaving no gaps), would it make sense to consider using QRngs such as Sobol sequence? As for other distributions like normal, afaik it is possible to make a (quasi-)normal qrng out of a uniform qrng using Box-Muller transform. In reality, would people use anything more sophisticated than uniform / normal in this crate?

Here's how it works; note it's 2-D, where things are not that bad. /* removed the image so as not to clutter this thread */
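
To illustrate what a low-discrepancy sequence looks like in code, here is a sketch of the Halton sequence, which is swapped in for Sobol purely because it fits in a few lines. The function names are assumptions for this sketch. Note that the points are fully deterministic: the index is the only input.

```rust
/// Radical inverse of `index` in the given base (van der Corput sequence).
/// Mirrors the base-`base` digits of `index` around the radix point.
pub fn van_der_corput(mut index: u32, base: u32) -> f64 {
    let mut result = 0.0;
    let mut fraction = 1.0 / base as f64;
    while index > 0 {
        result += (index % base) as f64 * fraction;
        index /= base;
        fraction /= base as f64;
    }
    result
}

/// 2-D Halton point: one van der Corput sequence per dimension,
/// using coprime bases (here 2 and 3).
pub fn halton_2d(index: u32) -> (f64, f64) {
    (van_der_corput(index, 2), van_der_corput(index, 3))
}
```

In base 2 the first points are 0.5, 0.25, 0.75: each new point lands in the largest remaining gap, which is the property being described above.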

@BurntSushi

Owner Author

commented Aug 28, 2019

@aldanor Thanks for the insight, but I don't think that's relevant to this specific issue? Maybe you could open a new issue? This issue isn't about switching rngs, but rather, considering which crates to use.

Also, if you're looking for changes to be made, it would be helpful to leverage your expertise and explain in simpler terms the changes that would result from switching to a different rng.

Also, note that quickcheck does not strictly use a uniform distribution. It specifically also tries to pick out problem values for specific types.
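
A rough sketch of what "picking out problem values" can mean in practice follows. This is not quickcheck's actual implementation: the `XorShift64` generator, the `PROBLEM_VALUES` list, and the one-in-four ratio are all assumptions chosen for illustration.

```rust
/// Tiny xorshift generator. The seed must be non-zero.
/// Not cryptographically secure.
pub struct XorShift64(pub u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Values that tend to expose bugs in integer code (hypothetical list).
const PROBLEM_VALUES: [i32; 5] = [0, 1, -1, i32::MIN, i32::MAX];

/// Roughly one draw in four returns a known problem value;
/// the rest are taken from the generator's output.
pub fn arbitrary_i32(rng: &mut XorShift64) -> i32 {
    let r = rng.next_u64();
    if r % 4 == 0 {
        let i = ((r >> 8) % PROBLEM_VALUES.len() as u64) as usize;
        PROBLEM_VALUES[i]
    } else {
        // Keep the high 32 bits and reinterpret them as a signed value.
        (r >> 32) as i32
    }
}
```

The resulting distribution is deliberately non-uniform, which is why a generic "uniform value of type T" API only covers part of what a property-testing library needs.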

@aldanor

commented Aug 29, 2019

@BurntSushi

switching to a different rng

My point was that QRngs don't require an Rng at all :) (hence a comment in this thread) As in, they are essentially deterministic and behave better in higher-dimensional spaces (at least the floating-point ones) when you have tons of parameters.

If this topic is of interest to anyone, I can open a separate issue and list a few thoughts there.

@BurntSushi

Owner Author

commented Aug 29, 2019

@aldanor Yes, a separate issue please. The details of how quickcheck generates values is way off topic for this thread I think.

@newpavlov

commented Aug 29, 2019

@BurntSushi

I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion

As probably one of the main advocates of the current micro-crate approach employed by rand, I would like to defend it a bit. Let's take a look at the low level of current rand stack and its separation of concerns:

  • getrandom: this crate is for retrieving system entropy on as many systems as possible. It's a surprisingly tricky process to do correctly and efficiently, so I would strongly recommend not trying to roll your own solution in this space. Plus, overriding this crate with a custom one allows crates that depend on randomness to run on "unsupported" targets (e.g. on embedded devices). The API is essentially stabilized due to its extreme simplicity. We had some questions about the Error type, but I think we've worked them out, and we are unlikely to change it after the v0.2 release in the near future (see rust-random/getrandom#94). The main source of potential future breakage is the handling of the wasm32-unknown-unknown target and some crate feature changes (e.g. see rust-random/getrandom#84).
  • rand_core: contains the core traits for describing RNG functionality. Crates (including third-party ones) implementing a PRNG or hardware RNG can therefore depend only on it, without bringing the whole of rand on board, and be immediately compatible with crates generic over the RNG trait. It's intentionally very bare-bones, and fancier functionality is left to extension traits like rand::Rng. We could probably move some functionality from rand::Rng to RngCore, but where should we draw the line? Some people would argue that range generation is a must-have; others would like to see sampling and random string generation. I think we are quite close to API stabilization here; the main question right now is the error type. This crate optionally depends on getrandom for seeding PRNG algorithms from system entropy. We also plan to move OsRng here (it will be feature-gated behind getrandom). BTW, note that we even bring parts of byteorder into rand_core instead of adding one more dependency.
  • rand_chacha and other RNG crates: these represent algorithm implementations and (potentially) "drivers" for hardware RNGs. They have to depend only on rand_core (we do not recommend enabling the getrandom feature). Right now rand_chacha is essentially just a thin wrapper around c2-chacha, and arguably it could be deprecated in favor of implementing the RngCore trait directly in the ChaCha implementation crate(s). These crates can essentially be stabilized right after the stabilization of rand_core.

For me it's a very clean and logical separation, which allows incremental stabilization of some parts of the rand ecosystem and gradual deprecation of others (e.g. as we do with rand_os and rand_jitter), while being friendly to third-party crate developers. Also, for some applications this approach significantly reduces the amount of code that has to be compiled and reviewed.

I would like to argue that most of the observed churn is not caused directly by the micro-crate design. I think the "explosion" of rand crates is simply the most visible change, so people start to associate our mistakes (of which, granted, we have made plenty...) with it. Also, when something breaks in a project that uses a micro-crate design, it's often some small crate upstream, so the first thought is "damn those micro-crates, again they broke something", while the same breakage could just as easily have happened with a monolithic design. So in my opinion we have the good old "correlation does not imply causation" on our hands here.

@dhardy

commented Aug 29, 2019

@aldanor has a point — the optimal distribution of values for quickcheck is not exactly the uniform distribution which the Rand crate is designed for. This reminds me of this request (which has never been implemented but could conceivably be added to the lib). It could therefore be that the optimal approach (numerically) would be to use your own trait and sequence generator in quickcheck. (But against this, libraries now have two different frameworks to implement for randomised generation of values, where this makes any sense.)

Thanks @newpavlov for summarising the status of Rand crates. Personally I am very happy that getrandom is a separate crate now and I also think many of the complaints are simply due to the visibility of "so much stuff for random numbers". Conceivably I think merging rand, rand_core, rand_pcg and rand_chacha could still work, but of course it would be a kludge. The only technical drawback of micro-cratering IMO is that there is more packaging (a bigger problem for linux distros) and scope for packaging errors (e.g. the rand 0.6.5 issue).

@BurntSushi

Owner Author

commented Aug 29, 2019

@newpavlov Thanks for the thoughts. This is why this problem is hard: reasonable people can disagree. The design you laid out is perfectly defensible. But there are other designs, hinted at by @dhardy. For example, I would definitely agree with you that getrandom is a nice crate to have that is separate from everything else. But conceivably, a substantial portion of the rand ecosystem could be merged into the rand crate without sacrificing the use cases you've laid out that are currently supported by a micro-crate design. Obviously, this would require leaning more heavily on Cargo features, and those come with their own downsides.

So in my opinion we have the good old "correlation does not imply causation" on our hands here.

Right. The size of a dependency tree is typically just a signal. But I'd like to reiterate my point above about cohesion. I don't think we can really evaluate crate hierarchies in a technical vacuum. We also need to consider the actual interaction that folks have with crates. This includes everything from trying to understand the aggregate APIs provided by those crates to managing expectations when they see a much larger number of crates added to their tree than they would otherwise expect.

I'm in the process of writing a blog post about dependency selection that will hopefully try to explain my thoughts more clearly, assuming I get it done. It isn't just about rand, but about the wider ecosystem and my own personal struggle with this too. For example, I just removed two dependencies from regex, but in the near future, I have plans to add two more and I don't know how to avoid it.

@newpavlov

commented Aug 29, 2019

@BurntSushi

Obviously, this would require leaning more heavily on Cargo features, and those come with their own downsides.

Well, in my opinion it's just sweeping complexity under the rug. Increasing the number of features also makes it harder to comprehensively test a crate, since the number of possible feature combinations grows exponentially. And instead of a clean dependency graph you get a potentially messy grey box. This is why I believe that instead of introducing a ton of features to std (see rust-lang/rfcs#2663), it would be better to add more "standard" crates like alloc and proc_macro.

One argument with which I agree is that the micro-crate design (BTW, I think "micro" is a bit too strong in our case) makes life more difficult for Linux package maintainers. But I think in this case a better approach would be to develop a solution for packaging a Rust application into a single package with all its upstream dependencies listed in its Cargo.lock (granted, it may require some changes to packaging policies, which may not be easy).

I think it's somewhat funny how the number of crates became the main complexity metric in such discussions, instead of, for example, total LoC or the number of groups that maintain your dependencies. I guess the main reason is that it's the only metric you always observe when compiling your project, so people tend to dramatically overestimate its importance. (Although there are certain issues with cargo that make the situation a bit worse than it should be, e.g. downloading unused optional dependencies.)

@BurntSushi

Owner Author

commented Aug 29, 2019

Well, in my opinion we just sweep complexity under the rug here. Increasing the number of features also makes it harder to comprehensively test a crate, since the number of possible feature combinations grows exponentially. And instead of a clean dependency tree you get a potentially messy grey box.

Yes, those are some of the downsides. I mentioned that Cargo features had downsides, so I wasn't trying to sweep them under the rug.

One argument with which I agree is that micro-crate design (BTW I think "micro" is a bit too strong in our case) makes life of linux package maintainers more difficult,

It's not just distros. It's also folks that need to review dependencies, i.e., something like cargo crev, which I am currently experimenting with.

but I think in this case a better approach would be to develop a solution for packaging a Rust application with all its upstream dependencies into a single package.

Sounds like a non-starter to me? Some distros, like Archlinux, do this. But it's not going to fly for Debian, as far as I understand things.

I think it's somewhat funny how number of crates became a main complexity metric in such discussions. Instead of, for example, total LoC or number of maintainer groups which work on your dependencies. I guess the main reason for that is because it's the only metric which you always observe when compiling your project, so people tend to dramatically overestimate its importance.

Yes, I mostly agree. But it doesn't seem funny to me, I guess. It makes perfect sense. As the maintainer of Rust applications, I try to keep my dependency trees under control. Whenever I run cargo update, I generally carefully audit its output and investigate anything that looks strange. If my dependency tree explodes to hundreds of crates, then this process becomes much harder. This is where it gets tricky, because it's often the case that there is no one single source of the explosion. It's an aggregate effect. Which makes it extremely difficult to fix, because it's very hard to convince anyone that it's a problem, because in isolation, it generally isn't.

A layer of abstraction over crates (like "maintenance groups") is perhaps a good idea. Certainly, in some cases, a maintenance group more closely approximates the maintenance burden assumed by relying on dependencies. But I'm pretty sure that will require significant tooling to pull off correctly, never mind the social work required to do it. I'm not much of a visionary, and I don't have a lot of time to burn on this stuff, so I'm more or less looking for ways of making things better today, using the tools we have. I can't spend too much of my time on what the "ideal" scenario is divorced from the feasibility of it.

I am on the receiving end of this too. I very often hear from folks that they don't want to use regex because they want to avoid the dependency tree it brings in. What am I supposed to do with that? I want people to use my work, so I'm going to try to remove the barriers that users are telling me that are preventing them from using my work. From my perspective, I can either try to educate them that the dependency tree is fine and would exist anyway, or I can try to actually shrink the tree itself. Maybe there are other avenues that involve changing how our dependency trees are surfaced to users, but that's not something I can do now. So my plan for now is to try to do both. (My other idea is to make a lighter regex crate as an alternative, but there are various problems with that approach.)

(although there are certain issues with cargo, which make situation a bit worse than it should be, e.g. like downloading unused optional dependencies)

Yes, it would be very nice to fix this.

@newpavlov

commented Aug 29, 2019

I wasn't trying to sweep them under the rug

I didn't mean you, I was comparing approaches: many features vs many crates.

It's not just distros. It's also folks that need to review dependencies, i.e., something like cargo crev, which I am currently experimenting with.

It's the other way around, isn't it? Splitting a monolith into smaller crates does not increase the total amount of code (well, give or take some bookkeeping). And for some applications it can dramatically reduce the amount of code that needs reviewing. For example, in RustCrypto I plan to make some types generic over R: RngCore + CryptoRng = OsRng, meaning that instead of depending on the whole of rand, I will depend only on rand_core and getrandom. Crate users will still be able to change the RNG to one of their liking if needed (e.g. to ThreadRng from rand for additional performance). Also, with a tree of smaller crates you have to review only the updated crates instead of the whole monolith (yes, usually you will work with diffs, but it's easier to keep the API surface of a smaller crate in mind).
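
The "generic with a default" pattern mentioned above can be sketched without any external crates. The traits below are local stand-ins that only mimic the shape of RngCore and CryptoRng; `FakeOsRng` and `KeyGenerator` are hypothetical names, and `FakeOsRng` is a counter that is in no way secure. It exists only so the sketch compiles on its own.

```rust
/// Local stand-ins mimicking the shape of rand_core's traits.
pub trait RngCore {
    fn next_u64(&mut self) -> u64;
}
/// Marker trait: "this generator is suitable for cryptography".
pub trait CryptoRng {}

/// Placeholder for an OS-backed generator. NOT secure: it is a counter.
#[derive(Default)]
pub struct FakeOsRng {
    counter: u64,
}

impl RngCore for FakeOsRng {
    fn next_u64(&mut self) -> u64 {
        self.counter = self.counter.wrapping_add(1);
        self.counter
    }
}
impl CryptoRng for FakeOsRng {}

/// Generic over the RNG, with a default so most users never name it.
pub struct KeyGenerator<R: RngCore + CryptoRng = FakeOsRng> {
    rng: R,
}

impl KeyGenerator<FakeOsRng> {
    /// Uses the default generator.
    pub fn new() -> Self {
        KeyGenerator { rng: FakeOsRng::default() }
    }
}

impl<R: RngCore + CryptoRng> KeyGenerator<R> {
    /// Lets callers plug in a generator of their choice.
    pub fn with_rng(rng: R) -> Self {
        KeyGenerator { rng }
    }

    pub fn generate_key(&mut self) -> [u8; 16] {
        let mut key = [0u8; 16];
        // Fill the key eight bytes at a time.
        for chunk in key.chunks_mut(8) {
            chunk.copy_from_slice(&self.rng.next_u64().to_le_bytes());
        }
        key
    }
}
```

The library only needs the two traits in scope, so its dependency footprint stays at the trait crate; swapping in a faster generator is the caller's decision via `with_rng`.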

Making reviews easier was one of the major reasons why I was advocating for such a redesign.

But I'm pretty sure that will require significant tooling to pull off correctly, never mind the social work required to do it.

I think the easiest approach will be to simply calculate the number of crate owners in your dependency tree (i.e. teams will be ignored) by parsing crates.io pages. It will be an approximate upper bound of the desired number. Alternatively, we could analyze repository links to find users/organizations, but not all crates provide them, so it will be a bit less reliable, although probably more precise.

I don't have experience writing cargo sub-commands and I am not particularly interested in doing it right now, so hopefully someone will find this idea interesting enough to perform preliminary experiments. :)
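
As a rough illustration of the counting step only, here is a small sketch. It assumes the owner lists have already been collected somehow (how to fetch them is not covered) and that team entries have already been filtered out; the crate and owner names are made-up placeholders, and distinct_owners and sample are hypothetical helpers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collect the distinct owners across every crate in a dependency tree.
/// `owners_by_crate` maps a crate name to its individual owners.
fn distinct_owners<'a>(
    owners_by_crate: &BTreeMap<&str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    owners_by_crate.values().flatten().copied().collect()
}

/// Made-up data: four crates maintained by three people in total.
fn sample() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut m = BTreeMap::new();
    m.insert("crate-a", vec!["owner-1"]);
    m.insert("crate-b", vec!["owner-2"]);
    m.insert("crate-c", vec!["owner-1", "owner-3"]);
    m.insert("crate-d", vec!["owner-3"]);
    m
}

fn main() {
    let owners_by_crate = sample();
    let owners = distinct_owners(&owners_by_crate);
    println!(
        "{} crates, {} distinct owners",
        owners_by_crate.len(),
        owners.len()
    );
}
```

The number of distinct owners can be much smaller than the number of crates, which is the figure this idea is trying to surface.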

Maybe there are other avenues that involve changing how our dependency trees are surfaced to users, but that's not something I can do now.

One possible idea: during project compilation, cargo could, instead of the default log messages:

   Compiling typenum v1.11.2
   Compiling byteorder v1.3.2
   Compiling byte-tools v0.3.1
   Compiling opaque-debug v0.2.3
   Compiling keccak v0.1.0
   Compiling block-padding v0.1.4
   Compiling generic-array v0.12.3
   Compiling block-buffer v0.7.3
   Compiling digest v0.8.1
   Compiling sha3 v0.8.2

Output something like this by grouping crates belonging to a single "project":

   Compiling typenum v1.11.2
   Compiling byteorder v1.3.2
   Compiling generic-array v0.12.3
   Compiling project crates: rustcrypto (7)

So it will be less scary for most users (of course, users should be able to turn off this behavior). This would require a new field in Cargo.toml for reliable grouping of crates, e.g. project. To prevent hijacking of a project name, we could require that only people who have write access to a crate with the project name are able to upload crates belonging to the project. For crates with a single owner, we could probably use the owner name as a substitute if a project name was not provided for the crate.
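
The grouping itself is simple to sketch. The code below is only an illustration under assumptions: the project field does not exist in Cargo today, Unit, render and sample are hypothetical names, and the assignment of seven crates to "rustcrypto" is inferred from the example output above. It prints each group's summary after the ungrouped crates, which is one possible ordering among several.

```rust
use std::collections::BTreeMap;

/// One compiled crate; `project` models the hypothetical `project` field.
struct Unit {
    name: &'static str,
    version: &'static str,
    project: Option<&'static str>,
}

/// Render compile log lines, collapsing crates that share a project
/// into a single summary line with a count.
fn render(units: &[Unit]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut grouped: BTreeMap<&str, usize> = BTreeMap::new();
    for u in units {
        match u.project {
            Some(p) => *grouped.entry(p).or_insert(0) += 1,
            None => lines.push(format!("   Compiling {} v{}", u.name, u.version)),
        }
    }
    for (project, count) in grouped {
        lines.push(format!("   Compiling project crates: {} ({})", project, count));
    }
    lines
}

/// The ten crates from the log above, with the assumed grouping.
fn sample() -> Vec<Unit> {
    let rc = Some("rustcrypto");
    vec![
        Unit { name: "typenum", version: "1.11.2", project: None },
        Unit { name: "byteorder", version: "1.3.2", project: None },
        Unit { name: "byte-tools", version: "0.3.1", project: rc },
        Unit { name: "opaque-debug", version: "0.2.3", project: rc },
        Unit { name: "keccak", version: "0.1.0", project: rc },
        Unit { name: "block-padding", version: "0.1.4", project: rc },
        Unit { name: "generic-array", version: "0.12.3", project: None },
        Unit { name: "block-buffer", version: "0.7.3", project: rc },
        Unit { name: "digest", version: "0.8.1", project: rc },
        Unit { name: "sha3", version: "0.8.2", project: rc },
    ]
}

fn main() {
    for line in render(&sample()) {
        println!("{}", line);
    }
}
```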

But I am not sure how effective this measure will be.

@BurntSushi

Owner Author

commented Aug 29, 2019

It's the other way around, isn't it?

It depends. Likely true in some cases. But there is overhead associated with doing each individual review and understanding how all of the crates fit together. Especially as reviews become more targeted toward diffs in updates. I'd rather do one ~small review of a monolith than many tiny reviews of updated crates. Obviously this is a bit hand wavy, but my point is that there's a mental burden cost here to consider. (Along with the other points I've raised.)

I think the easiest approach will be to simply calculate the number of crate owners in your dependency tree

Sure. I'm just really trying to avoid taking this discussion into a brainstorming session of how we can "fix" official tooling. From my perspective, fixes at that level are extremely costly. I'm personally trying to stay focused on what I can do today.

(A tool that analyzes crate ownership is nice, but it's not integrated into tooling like your other idea, so it doesn't really do too much to assuage the issue.)

@newpavlov

commented Aug 31, 2019

@BurntSushi FYI: I've published the earlier idea on internals: https://internals.rust-lang.org/t/10895

huitseeker added a commit to huitseeker/libra that referenced this issue Sep 5, 2019
Remove tempfile as a dependency
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
huitseeker added a commit to huitseeker/libra that referenced this issue Sep 6, 2019
Remove tempfile as a dependency
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
bors-libra added a commit to libra/libra that referenced this issue Sep 6, 2019
Remove tempfile as a dependency
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand

Closes: #717
Approved by: mimoo
@dhardy

commented Sep 12, 2019

@BurntSushi do you think this thread has served its purpose by now? I don't care much for the Libra project, and minimising dependencies is potentially a laudable goal in its own right, but that others are claiming rand is "antiquated" and using this as evidence is rather odd.

My recommendations:

  1. Open another thread discussing alternative sampling routines (distributions) on technical merit (as mentioned in this post).
  2. If you think it appropriate, open an issue on the rand project with recommendations on policies / changes. (This issue already inspired rust-random/rand#872.)
@BurntSushi

Owner Author

commented Sep 12, 2019

Yes, I don't understand the commentary from the libra people. Although, they are just doing what I've already done in a few places: ripping out tempfile in favor of a much simpler solution. I think I linked to such things in my initial comment, which is maybe why they are linking this issue? Not sure.

I'm not sure I've quite decided what to do. But I haven't thought about this too much lately. When I circle back around to this, I'll update this ticket with a decision and close it out.
