RFC: remove dependency on `rand` ecosystem #241
Comments
One thing you could do is have your own Edit: that actually sounds like a good thin crate, and if it's explicitly for non-cryptographic uses then it should be pretty easy.
@elichai For the purposes of the quickcheck crate---and honestly, probably many other simple uses---something like the Also,
I'm writing a thin crate now that just gives a replacement for the Rng/RngCore traits. I hope to publish a first release soon, and if you like it you can use it :)
FYI, ThreadLocal is also a secure Rng. https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush
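To give a sense of how small a fast, non-cryptographic generator can be, here is a minimal sketch of a 128-bit Lehmer (multiplicative congruential) generator. The type name `Lehmer64`, the seeding scheme, and the multiplier are assumptions for illustration and should be checked against the linked post before being relied on; this is not code from any crate discussed in this thread.

```rust
/// Sketch of a 128-bit Lehmer (multiplicative congruential) generator.
/// NOT cryptographically secure. The name and seeding are hypothetical.
pub struct Lehmer64 {
    state: u128,
}

impl Lehmer64 {
    /// Build a generator from a 64-bit seed.
    pub fn new(seed: u64) -> Self {
        // A multiplicative generator modulo 2^128 needs an odd state,
        // so shift the seed left and set the lowest bit.
        Lehmer64 {
            state: ((seed as u128) << 1) | 1,
        }
    }

    /// Advance the state and return its upper 64 bits.
    pub fn next_u64(&mut self) -> u64 {
        // Multiplier assumed from commonly circulated "lehmer64" code.
        self.state = self.state.wrapping_mul(0xda94_2042_e4dd_58b5);
        (self.state >> 64) as u64
    }
}
```

With a small seed the first few outputs are small numbers, so a real implementation would probably mix the seed first or discard some initial outputs.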
Can you say what your long term maintenance plan is?
I'm not sure I grok this sentence. Could you elaborate?
About About my plans, my hope is to write it in a way that requires little maintenance, as the whole point is to keep the code very thin. You can see my current work here: https://github.com/elichai/random-rs
That's
Thanks! So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of See here for some words I've written about how I evaluate dependencies.
Right, sorry, I confused the names. As I said, I'm planning to release today. It would be appreciated if you could look at it, play with it a bit, and tell me your thoughts, but for actually depending on it, as you said in your post, waiting a bit is good advice: we're all human, and things tend to come up that I might need to fix. Hopefully that will be a very short period. I'll reiterate: my plan is to have a stable API that doesn't require any changes except adding support for new primitives in the future.
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand
Sorry if this comes across as somewhat bitter; I'll try not to be. So many people have expectations of what a random number crate should be; the
For what it's worth, I have re-opened rust-random/rand#850. Opinions differ about what is worth supporting, but perhaps I see some value in this now. I won't tell people to use rand or not to use rand, but if people wish to pitch in with their views on the project, we will try to listen. Whether it is a good fit for
See the rand book for a summary of the RNGs we supply: there are both small, fast PRNGs and crypto-RNGs. Some are very easy to re-implement; e.g.
This is a very good point, and it doesn't help that to use Rand's So again, I don't know what your best option is here.
@dhardy Thanks! I appreciate your response. The minimal version check is probably my most pressing concern. I have looked into using only rand_core in quickcheck before, but couldn't see how to do it (although I don't remember why).
Yeah, it definitely isn't. These are hard trade offs to balance. I think the most common complaints I hear about Churn is hard to fix, because it requires settling on an API that is fixed for a potentially long period of time. I don't know what
Right. This is a tough one to balance too. I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion. And even if everybody agreed with that opinion, the path to fixing it is not clear. But as an example, there probably exists a design in which Of course, I am only representing the benefits. There are of course benefits to splitting things across crates, and in particular, there are downsides to using Cargo features. So most of what I'm saying here is an opinion based on my own sensibilities.
I definitely have a very strong desire to depend on the random crate that everyone else uses. That is a huge benefit that can't really be mitigated by depending on something that is less commonly used. Therefore, I'd like to impress upon you that I did not make this issue lightly, and I only did it after a large amount of frustration on my end began to bubble up and boil over.
@BurntSushi thanks for your response!
There isn't actually a lot more planned. There are some open issues regarding API tweaks, but it may be better to minimise churn in these cases. The big issue is really stabilising the
Compile time isn't even a big motivation; as I understand it it's more about having access to the APIs while having less dependency code to review. The current design may have been influenced too much by crypto-nerds. I half wonder if it would be for the best to re-assimilate
If your only public API dependencies are on the
Here's another point on the topic from a quant :) What many people don't realize is that stacking multiple uniform RNGs for a bunch of independent variables does not yield good coverage of a high-dimensional space. This is a very well-known fact, e.g. in mathematical finance when performing Monte Carlo simulations; to achieve good coverage in a sparse space, you would typically use a low-discrepancy sequence. Even in 2-D space, discrepancy is already visible; as dimensionality increases, things get progressively worse. For instance, you have a tuple Just a (quasi-)random thought: given the purpose of this crate (to cover as much of the search space as possible, leaving no gaps), would it make sense to consider using QRngs such as a Sobol sequence? As for other distributions like the normal, afaik it is possible to make a (quasi-)normal qrng out of a uniform qrng using the Box-Muller transform. In reality, would people use anything more sophisticated than uniform / normal in this crate? Here's how it works (note it's 2-D, where things are not that bad). /* removed the image so as not to clutter this thread */
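As a rough illustration of the idea above, the following sketch generates a 2-D low-discrepancy sequence and feeds it through the Box-Muller transform. Note that it uses a Halton sequence (radical inverses in bases 2 and 3) rather than the Sobol sequence mentioned above, simply because Halton fits in a few lines; the function names are hypothetical and nothing here is part of quickcheck.

```rust
use std::f64::consts::PI;

/// Radical inverse of `index` in `base` (the van der Corput sequence):
/// the base-`base` digits of `index` mirrored around the radix point.
pub fn radical_inverse(mut index: u64, base: u64) -> f64 {
    let inv_base = 1.0 / base as f64;
    let mut scale = inv_base;
    let mut result = 0.0;
    while index > 0 {
        result += (index % base) as f64 * scale;
        index /= base;
        scale *= inv_base;
    }
    result
}

/// The `index`-th point of the 2-D Halton sequence (bases 2 and 3).
/// Start at index 1, since index 0 maps to the origin.
pub fn halton_2d(index: u64) -> (f64, f64) {
    (radical_inverse(index, 2), radical_inverse(index, 3))
}

/// Box-Muller transform: maps two uniform values, with `u1` strictly
/// greater than zero, to two standard-normal values.
pub fn box_muller(u1: f64, u2: f64) -> (f64, f64) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * PI * u2;
    (r * theta.cos(), r * theta.sin())
}
```

Calling `halton_2d(i)` for `i = 1, 2, 3, ...` fills the unit square deterministically, and passing each point to `box_muller` gives the quasi-normal variant. No random number generator is involved at any point.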
@aldanor Thanks for the insight, but I don't think that's relevant to this specific issue? Maybe you could open a new issue? This issue isn't about switching rngs, but rather, considering which crates to use. Also, if you're looking for changes to be made, it would be helpful to leverage your expertise and explain in simpler terms the changes that would result from switching to a different rng. Also, note that quickcheck does not strictly use a uniform distribution. It specifically also tries to pick out problem values for specific types.
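To make the last point concrete, here is a minimal sketch of what "picking out problem values" could look like for `i32`. This is an illustration only, not quickcheck's actual implementation: the function name, the list of boundary values, and the one-in-eight ratio are all assumptions.

```rust
/// Hypothetical generator for `i32` that mixes boundary values with
/// uniformly distributed ones. `next_u64` is any source of raw random bits.
pub fn arbitrary_i32(mut next_u64: impl FnMut() -> u64) -> i32 {
    // Values that tend to expose bugs: zero, off-by-one, and the extremes.
    const PROBLEM_VALUES: [i32; 5] = [0, 1, -1, i32::MIN, i32::MAX];

    let bits = next_u64();
    if bits % 8 == 0 {
        // Roughly one draw in eight returns a boundary value.
        // The ratio is arbitrary and chosen only for illustration.
        PROBLEM_VALUES[((bits >> 3) % PROBLEM_VALUES.len() as u64) as usize]
    } else {
        // Otherwise use the upper 32 bits as a uniform `i32`.
        (bits >> 32) as i32
    }
}
```

Because the function only needs a closure returning `u64`, it does not care which crate supplies the randomness, which is the property this issue is after.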
My point was that QRngs don't require an Rng at all :) (hence a comment in this thread) As in, they are essentially deterministic and behave better in higher-dimensional spaces (at least the floating-point ones) when you have tons of parameters. If this topic is of interest to anyone, I can open a separate issue and list a few thoughts there.
@aldanor Yes, a separate issue please. The details of how quickcheck generates values are way off topic for this thread, I think.
As probably one of the main advocates of the current micro-crate approach employed by
For me it's a very clean and logical separation, which allows incremental stabilization of some I would like to argue that most of the observed churn is not caused directly by the micro-crate design. I think the "explosion" of
@aldanor has a point: the optimal distribution of values for Thanks @newpavlov for summarising the status of Rand crates. Personally I am very happy that
@newpavlov Thanks for the thoughts. This is why this problem is hard: reasonable people can disagree. The design you laid out is perfectly defensible. But there are other designs, hinted at by @dhardy. For example, I would definitely agree with you that
Right. The size of a dependency tree is typically just a signal. But I'd like to re-iterate my point above about cohesion. I don't think we can really evaluate crate hierarchies in a technical vacuum. We also need to consider the actual interaction the folks have with crates. This includes everything from trying to understand the aggregate APIs provided by those crates to managing expectations when they see a much larger number of crates added to their tree than they would otherwise expect. I'm in the process of writing a blog post about dependency selection that will hopefully try to explain my thoughts more clearly, assuming I get it done. It isn't just about
Well, in my opinion it's just sweeping complexity under the rug. Increasing the number of features also makes it harder to comprehensively test a crate, since the number of possible feature combinations grows exponentially. And instead of a clean dependency graph you get a potentially messy grey box. This is why I believe that instead of introducing a ton of features to One argument with which I agree is that the micro-crate design (BTW I think "micro" is a bit too strong in our case) makes the life of Linux package maintainers more difficult, but I think in this case a better approach would be to develop a solution for packaging a Rust application into a single package with all its upstream dependencies listed in its I think it's somewhat funny how the number of crates became the main complexity metric in such discussions, instead of, for example, total LoC or the number of groups which maintain your dependencies. I guess the main reason is that it's the only metric you always observe when compiling your project, so people tend to dramatically overestimate its importance (although there are certain issues with cargo which make the situation a bit worse than it should be, e.g. downloading unused optional dependencies).
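The exponential growth mentioned above can be seen by enumerating every subset of a feature list: `n` independent features give `2^n` combinations to test. The sketch below does exactly that; the function name and any feature names passed to it are hypothetical, and real crates may have features that imply or exclude one another, which changes the count.

```rust
/// Enumerate every subset of `features` (the power set), 2^n in total.
/// Assumes fewer than 64 features so that the bit mask fits in a `usize`.
pub fn feature_combinations<'a>(features: &[&'a str]) -> Vec<Vec<&'a str>> {
    let n = features.len();
    (0..(1usize << n))
        .map(|mask| {
            features
                .iter()
                .enumerate()
                // Keep feature `i` when bit `i` of the mask is set.
                .filter(|(i, _)| mask & (1usize << *i) != 0)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}
```

Three features already give 8 combinations and ten give 1024, which is why exhaustive feature testing in CI quickly becomes impractical.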
Yes, those are some of the downsides. I mentioned that Cargo features had downsides, so I wasn't trying to sweep them under the rug.
It's not just distros. It's also folks that need to review dependencies, i.e., something like
Sounds like a non-starter to me? Some distros, like Archlinux, do this. But it's not going to fly for Debian, as far as I understand things.
Yes, I mostly agree. But it doesn't seem funny to me, I guess. It makes perfect sense. As the maintainer of Rust applications, I try to keep my dependency trees under control. Whenever I run A layer of abstraction over crates (like "maintenance groups") is perhaps a good idea. Certainly, in some cases, a maintenance group more closely approximates the maintenance burden assumed by relying on dependencies. But I'm pretty sure that will require significant tooling to pull off correctly, never mind the social work required to do it. I'm not much of a visionary, and I don't have a lot of time to burn on this stuff, so I'm more or less looking for ways of making things better today, using the tools we have. I can't spend too much of my time on what the "ideal" scenario is divorced from the feasibility of it. I am on the receiving end of this too. I very often hear from folks that they don't want to use
Yes, it would be very nice to fix this.
I didn't mean you, I was comparing approaches: many features vs many crates.
It's the other way around, isn't it? Splitting monolith into smaller crates does not increase the total amount of code (well, plus-minus some book-keeping). And for some applications it can dramatically reduce amount of code which needs reviewing. For example in RustCrypto I plan to make some types generic over Making reviews easier was one of the major reasons why I was advocating for such re-design.
I think the easiest approach will be to simply calculate the number of crate owners in your dependency tree (i.e. teams will be ignored) by parsing crates.io pages. It will be an approximate upper bound of the desired number. Alternatively, we could analyze repository links to find users/organizations, but not all crates provide them, so it will be a bit less reliable, although probably more precise. I don't have experience writing cargo sub-commands and I am not particularly interested in doing it right now, so hopefully someone will find this idea interesting enough to perform preliminary experiments. :)
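The counting step of that idea is small once the ownership data has been collected. The sketch below assumes the data already exists as a map from crate name to owner logins; how it is gathered (scraping crates.io pages or otherwise) is left out, and the function name and input shape are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collect the distinct owners across a dependency tree.
/// `owners_by_crate` maps a crate name to its owner logins; this input
/// is assumed to have been gathered elsewhere.
pub fn distinct_owners<'a>(
    owners_by_crate: &BTreeMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    owners_by_crate
        .values()
        .flatten() // every owner login of every crate
        .copied()
        .collect() // the set removes owners shared between crates
}
```

The size of the returned set is the approximate upper bound described above: many crates maintained by the same few people collapse into a small number.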
One possible idea: default log messages during project compilation could instead of:
Output something like this by grouping crates belonging to a single "project":
So it will be less scary for most users (of course, users should be able to turn off this behavior). This would require a new field in But I am not sure how effective this measure will be.
It depends. Likely true in some cases. But there is overhead associated with doing each individual review and understanding how all of the crates fit together. Especially as reviews become more targeted toward diffs in updates. I'd rather do one ~small review of a monolith than many tiny reviews of updated crates. Obviously this is a bit hand wavy, but my point is that there's a mental burden cost here to consider. (Along with the other points I've raised.)
Sure. I'm just really trying to avoid taking this discussion into a brainstorming session of how we can "fix" official tooling. From my perspective, fixes at that level are extremely costly. I'm personally trying to stay focused on what I can do today. (A tool that analyzes crate ownership is nice, but it's not integrated into tooling like your other idea, so it doesn't really do too much to assuage the issue.)
@BurntSushi FYI: I've published the earlier idea on internals: https://internals.rust-lang.org/t/10895
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand Closes: #717 Approved by: mimoo
@BurntSushi do you think this thread has served its purpose by now? I don't care much for the Libra project, and minimising dependencies is potentially a laudable goal in its own right, but that others are claiming rand is "antiquated" and using this as evidence is rather odd. My recommendations:
Yes, I don't understand the commentary from the libra people. Although, they are just doing what I've already done in a few places: ripping out I'm not sure I've quite decided what to do. But I haven't thought about this too much lately. When I circle back around to this, I'll update this ticket with a decision and close it out.
I am no longer happy about depending on the `rand` crates. There is too much churn, too many crates, and IMO, worst of all, there is no desire to add a minimal version check to their CI. Which means anything that depends on `quickcheck` in turn cannot reliably have its own minimal version check.
Because I am tired of depending on `rand`, I have started removing it completely where possible. For example, in `walkdir`, I've removed quickcheck as a dependency. In ripgrep, I've removed `tempfile` as a dependency, because it in turn was the only thing bringing `rand` into ripgrep's dependency tree.
I don't see any other path forward here. I can either continue to grin and bear `rand`, drop everything that depends on randomness, or figure out how to generate randomness without `rand`. Specifically, I'd very much like to add a minimal version check back to the `regex` crate, which catches bugs that happen in practice. (See here and here.) My sense is that there is some design space in the ecosystem for a simple source of randomness that doesn't need to be cryptographically secure, and an API that does not experience significant churn. Certainly, quickcheck does not need a cryptographic random number generator.
With that said, there is some infrastructure in the `rand` API that is incredibly useful. For example, quickcheck makes heavy use of the `Rng::gen` method for generating values based on type.
So it seems like if we have something like the `Rng` trait with a non-cryptographic RNG, then we'd be probably good to go.
Are there other avenues here? What have I missed? My experience in building infrastructure for randomness is pretty limited, so am I underestimating the difficulty involved here?
Another side to this question is whether any users of quickcheck are leveraging parts of the `rand` ecosystem that would be difficult or impossible to do if we broke ties with `rand`.
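For a sense of scale, the kind of thing described above (a small trait with type-directed generation on top of a non-cryptographic generator) might look roughly like the sketch below. All names here (`SimpleRng`, `FromRng`, `XorShift64`, `generate`) are hypothetical and are not the `rand` API; the generator is Marsaglia's xorshift64, chosen only because it is short.

```rust
/// Hypothetical minimal source of randomness with one required method.
pub trait SimpleRng {
    fn next_u64(&mut self) -> u64;

    /// Type-directed generation, in the spirit of `Rng::gen`.
    fn generate<T: FromRng>(&mut self) -> T
    where
        Self: Sized,
    {
        T::from_rng(self)
    }
}

/// Hypothetical trait for types that can be built from raw random bits.
pub trait FromRng {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self;
}

impl FromRng for u64 {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        rng.next_u64()
    }
}

impl FromRng for u32 {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (rng.next_u64() >> 32) as u32
    }
}

impl FromRng for bool {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        rng.next_u64() >> 63 == 1
    }
}

impl FromRng for f64 {
    // Use the top 53 bits to get a uniform value in [0, 1).
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Marsaglia's xorshift64. NOT cryptographically secure.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The all-zero state is a fixed point, so substitute a non-zero constant.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        XorShift64 { state }
    }
}

impl SimpleRng for XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}
```

Usage would be along the lines of `let mut rng = XorShift64::new(42); let x: u32 = rng.generate();`. Whether a surface this small is enough for quickcheck's needs is exactly the open question posed in this issue.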