Non-repeatable build / broken dependency propagation #184
This step is expected to be non-reproducible, as the package index will be consulted. Still, the build should succeed if cargo can succeed (definitely point out if this is not the case). Among the search results, this link seems to point in the right direction:
A mechanism I could believe to be at work here is Nix evaluation deciding that a dependency doesn't need a rebuild when it actually does. The compiler then catches the error because the interface hashes are wrong. Put simply, we might be ignoring part of the input when writing the Nix expressions. If this theory is correct, cleaning out the store paths should stop Nix from finding the (broken) cached rlib, and the build should succeed. Can you give this a shot? You can find the paths that might need rebuilding with a workflow like this:
nix-store -q --deriver /nix/store/32d9pc56axxrc2830q5cz1iwifjbar2m-crate-cargo2nix-0.9.0-bin
nix show-derivation /nix/store/ddwgw69slmx1ajdcyjrj2z53c4kfw3ml-crate-cargo2nix-0.9.0.drv | grep '/nix.*crate.*drv'
"/nix/store/ddwgw69slmx1ajdcyjrj2z53c4kfw3ml-crate-cargo2nix-0.9.0.drv": {
"/nix/store/3inhij8bmn3qzpn1m18j2f48bvwlil2r-crate-anyhow-1.0.35.drv": [
"/nix/store/3l1la0qrsgwyzx6nvpxlb8kz8l98jgi0-crate-semver-0.9.0.drv": [
"/nix/store/9lid9ypd34zxyp697ajx92qwn9bkzshf-crate-tera-1.5.0.drv": [
"/nix/store/ixv0163kk08nyjwjk22p2z990z4335wy-crate-cargo-0.48.0.drv": [
"/nix/store/jjkvilw9283576lxysgkyrng2v5myf33-crate-tempfile-3.1.0.drv": [
"/nix/store/js8lidv52qbdflgnjrqbv9hnhairsd66-crate-colorify-0.2.3.drv": [
"/nix/store/l5l4anxvgpj6kxvp9h3g2dgi37glabql-crate-serde-1.0.118.drv": [
"/nix/store/n08g1s2v3kgcir8z06f45hn2a32ybm0k-crate-toml-0.5.7.drv": [
"/nix/store/qyivqx0j3g4gikbf7y9gn1wfan1xl0ds-crate-pathdiff-0.2.0.drv": [
"/nix/store/vgdpa0vxdw1yh88xbgmbdvijhm2mdl95-crate-cargo-platform-0.1.1.drv": [
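The manual workflow above can be wrapped in a small helper. This is a minimal sketch, assuming the `nix show-derivation` JSON layout shown in the listing, where input derivations appear as quoted `/nix/store/...-crate-*.drv` keys; the function name `crate_input_drvs` is hypothetical and not part of cargo2nix or Nix. It only collects the crate derivations. The commented usage shows how their outputs could then be removed with `nix-store --delete`, which refuses to delete paths that are still reachable from a GC root, and how to rebuild without pulling the same outputs back from a binary cache.

```shell
# Hypothetical helper: read `nix show-derivation` output on stdin and print the
# unique crate-*.drv paths it mentions, excluding the derivation itself ($1).
crate_input_drvs() {
  local self="$1"
  grep -o '"/nix/store/[0-9a-z]*-crate-[^"]*\.drv"' \
    | tr -d '"' \
    | grep -vxF "$self" \
    | LC_ALL=C sort -u
}

# Example usage (commented out because it needs a Nix installation):
# out=/nix/store/32d9pc56axxrc2830q5cz1iwifjbar2m-crate-cargo2nix-0.9.0-bin
# drv=$(nix-store -q --deriver "$out")
# for d in $(nix show-derivation "$drv" | crate_input_drvs "$drv"); do
#   # Delete each dependency's cached outputs; fails for paths that are still alive.
#   nix-store --delete $(nix-store -q --outputs "$d")
# done
# # Rebuild without substituting from a binary cache:
# nix-build --option substitute false
```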
To confirm, the build does not succeed. I'll have a go with what you've suggested; thanks for the pointer. It'll be next week before I get a chance.
I'm assuming `cargo build` works but `nix build` fails. Please correct me if that's not the case.
Yes, cargo works fine; nix does not. I can simplify the failure case to:
The resulting error message then appears on a downstream crate:
So here, we have crate …
I can confirm that deleting all store paths, then rebuilding with …
We were seeing CI failures too. In CI, we build 4 sets of derivations for our software, which are …
We were also seeing these failures in …
I'm guessing you've already implemented flushing the CI cache as a workaround for now. Based on the first example, there's strong evidence that the newly generated Cargo.nix did not react to the "modified" Cargo.toml. The question is: how does an inconsequential change to the Cargo.toml result in …
Our workaround was not a subtle one: CI ignores all entries in the binary cache containing our crates. We warm it up by building hello-world with the cache for all cross-compilation targets, to avoid rebuilding the universe. I'm equally baffled. I don't understand the internals of cargo well enough to know where to inspect the metadata.
My leading (only) candidate for this is a single binary crate, which shares many of the dependencies of the failing ones. It's the only override aside from some fairly straightforward sys crates that add their libraries as build inputs. We have:
my_bin = pkgs.rustBuilder.rustLib.makeOverride {
  name = "my_bin";
  overrideAttrs = drv: {
    # Rust flags were being ignored: this is a workaround.
    configureCargo = ''
      mkdir -p .cargo
      cargo_config=${./.cargo/config.toml}
      cp $cargo_config .cargo/$(stripHash $cargo_config)
    '';
  };
};
This is done to set linkage flags for static cross-compilation. I'm unaware of any other way to inject these in cargo2nix; I'd love to be shown to be incorrect.
Building the dependencies for this target uses different flags than building those same dependencies for other targets. If those flags are encoded in the metadata, reusing the dependencies across targets would cause what I'm seeing. I don't have a great handle on what metadata cargo stores, but I'd believe link flags are among it. Does this seem likely? If so, what would be a better method for injecting these Rust flags for this target and its deps?
We've hit this issue as well. We use crates2nix, however.
@Fuuzetsu Is there a related issue in that project? Also, are you doing cross-compilation?
No issue on cargo2nix that I know of. W.r.t. reproduction, sadly it's a closed-source project that had been working well until recently, when we hit this issue. We have a workspace with maybe 20 crates (I'm not at a PC to check), and when we added one more, the new crate started having the problem. I found this ticket by Googling. We are not doing cross-compilation.
Sorry, the first line was meant to say crate2nix, not cargo2nix.
@Fuuzetsu Are you able to share how you invoke crate2nix? My working assumption for this error is that something in the way I'm setting the linkage flags isn't propagating correctly. However, that could be completely unrelated, and there may be a more fundamental bug here. The way this first manifested itself for us was very similar to what you describe: we have 15 or so crates in a workspace (closed source), and adding a new one caused this problem.
We're not doing anything strange with the crates, I think: I am adding things like …
I don't really think it's an issue with how we're invoking the builder, nor with what you're doing. I initially blamed non-determinism in rustc, but that doesn't explain why changing the whitespace in the upstream crate sometimes makes it work... The issue happening when a crate is added is also very strange: they ought to be separate derivations, and I don't understand why they would affect each other. Honestly, I'm a bit lost as to how to deal with this. I was going to examine what's happening during the build in detail when I had some time, but I haven't gotten to that point yet. If we have a way to replicate the issue, then change some whitespace and get a working build, it should presumably be possible to examine what rustc is doing during the build and find the difference...
Oh, I wanted to add something. One thing that differs between, say, my machine (where the build works), a coworker's machine (where the build failed), and CI (where the build failed) is the …
I think you may have something with …
I'm not sure of the exact correlation, because the CI infrastructure was set up out of band from our commit history. Perhaps …
The docs suggest this is not the case, though:
I'm not sure what you mean by …
For now, I'm going to send an MR to nixpkgs to make …
Cargo2nix uses cargo to do the building.
Ah, I see. I'm still not sure what you mean by "this is not the case" in this situation, but as I understand it, this means cargo2nix is probably using the value of 16, as release builds have been non-incremental by default for a few Rust versions now (and Nix builds can't be incremental anyway, so one would hope it uses non-incremental settings regardless). I guess the patch to nixpkgs doesn't matter for cargo2nix, and for your use case you should either patch cargo2nix to set the flag explicitly or set it in Cargo.toml, assuming it gets honoured properly.
This parameter is set to `$NIX_BUILD_CORES` by default. This is standard practice, but there's a suspicion that it can produce broken builds. For some details, see cargo2nix/cargo2nix#184. As a workaround/test, it'd be good if codegen-units could be set to something constant, such as `1`. This PR allows it. Note that the default of `$NIX_BUILD_CORES` is preserved, so this MR causes no change in default behaviour and no rebuilds.
We recently started using codegen-units=1 in our codebase in the hope that it would fix the issue. Sadly, I just hit this again even with codegen-units=1, so it doesn't make the issue go away. I guess this needs some real debugging ;/
As codegen-units=1 did not help, I had to spend some time investigating properly. I suspected …
As this crate was deep in our dependencies, it would only rarely get recompiled, which is why we weren't hitting the issue often: only really when someone was adding crates (and so potentially causing it to be rebuilt due to different resolved versions/features). I submitted a fix at frozenlib/structmeta#1 for our particular instance. @ollie-etl if you still have issues, I'd recommend you try to vet the macros in your (transitive) dependency tree. I suspect you may even have the same crate in scope. Sadly, I don't have an easy way to find which crate an issue comes from. I was effectively hacking …
I suppose there should at least be a tool that prints the dependency tree with crate hashes: you can already see these hashes if you ask …
We need to compare the drvs between the successful and failing builds. In the dependency tree, either some extra drvs show up (alongside the old ones), or else the impurity is non-determinism leaking in, meaning the input-addressed (extensional) store winds up with multiple results at the same Nix path. Spotting how differences propagate through the drvs is a great help.
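One way to spot where the drvs diverge is to dump the full derivation closure of a successful and a failing build (for example with `nix-store -q --requisites <drv>`) and compare only the crate entries. This is a minimal sketch, assuming the `crate-<name>-<version>.drv` naming visible in the listing earlier in this thread; `crate_hashes` and `diff_crate_drvs` are hypothetical helper names. A crate whose drv hash changed shows up as a `-`/`+` pair; crates with identical drvs are omitted.

```shell
# Hypothetical helper: reduce drv paths on stdin to sorted, unique
# "<name-version> <store-hash>" lines, keeping only crate-*.drv entries.
crate_hashes() {
  sed -n 's|^/nix/store/\([0-9a-z]*\)-crate-\(.*\)\.drv$|\2 \1|p' | LC_ALL=C sort -u
}

# Compare two files of drv paths (one per line). Entries only in the first
# file are printed with "-", entries only in the second with "+", grouped by
# crate name so that a changed hash appears as an adjacent -/+ pair.
diff_crate_drvs() {
  local tmp
  tmp=$(mktemp -d) || return 1
  crate_hashes < "$1" > "$tmp/good"
  crate_hashes < "$2" > "$tmp/bad"
  # comm -3 drops common lines and indents second-file-only lines with a tab.
  LC_ALL=C comm -3 "$tmp/good" "$tmp/bad" \
    | awk -F'\t' '{ if ($1 == "") print "+ " $2; else print "- " $1 }' \
    | LC_ALL=C sort -k2,2 -k1,1r
  rm -rf "$tmp"
}

# Example usage (needs Nix; $good_drv and $bad_drv are placeholders):
# nix-store -q --requisites "$good_drv" > good.txt
# nix-store -q --requisites "$bad_drv" > bad.txt
# diff_crate_drvs good.txt bad.txt
```

An empty result means both builds used identical derivations, which points towards non-determinism rather than a change in inputs.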
0.9.1 fixed a behavior where trivial features were leaking down into dependencies. This might actually help here.
After reading more into Fuuzetsu's experience, it's pretty clear that this issue is with Rust: non-deterministic binary outputs. In general, we can't protect against non-determinism. I guess we could recommend an override workflow for the specific crate that always triggers a rebuild. A workaround is the right answer if there's unavoidable non-determinism in a Rust dependency or in some rustc behavior.
Yes, rustc 1.55 seems OK, and the issues there lie with macro crates that produce non-deterministic output. There's a Nix config flag you can set to rebuild all packages multiple times and check their hashes: this can help catch non-deterministic packages ahead of time, hopefully before they go into your binary caches etc. Another option is to override your Rust deps with preferLocalBuild or whatever it is. For us, we are just staying on 1.55 for now.
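For the rebuild-and-compare approach on a single package, Nix can rebuild a path that is already present and compare the result with `nix-build --check`; with `--keep-failed`, a differing rebuild is kept next to the original as `<path>.check`, as in the Nix manual's reproducibility example. To see which files (for instance which `.rlib`) actually differ, a helper like the hypothetical `differing_files` below can be run on the two trees. This is a sketch assuming GNU coreutils (`sha256sum`); it is not part of Nix or cargo2nix.

```shell
# Hypothetical helper: print "<sha256>  <relative path>" for every file under
# $1, sorted by path (assumes GNU coreutils' sha256sum).
tree_hashes() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k2)
}

# Print the relative paths of files that differ, or exist on one side only,
# between two output trees, e.g. a store path and its "<path>.check" rebuild.
differing_files() {
  local tmp
  tmp=$(mktemp -d) || return 1
  tree_hashes "$1" > "$tmp/a"
  tree_hashes "$2" > "$tmp/b"
  # diff prefixes lines unique to each side with "<" or ">"; keep only the path.
  diff "$tmp/a" "$tmp/b" | sed -n 's/^[<>] [0-9a-f]*  //p' | LC_ALL=C sort -u
  rm -rf "$tmp"
}

# Example usage (needs Nix; the attribute name someCrate is hypothetical):
# out=$(nix-build -A someCrate --no-out-link)
# nix-build -A someCrate --check --keep-failed
# differing_files "$out" "$out.check"
```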
@Fuuzetsu Did you see any fixes land as expected?
Yes, there were a bunch. We ran into another issue, but it turned out to be a derive crate producing code from a macro in a non-deterministic way. It looks like we're going to upgrade from 1.55 to 1.57, skipping the broken 1.56. You can see some of the issues being fixed in rust-lang/rust#90301; it's all LLVM stuff.
Then I think it's appropriate to handle this issue as an erratum, noting it in the common issues section of the README but not attempting to hunt it down as a Nix or cargo2nix issue. If there are no objections, I'll include this note in my open PR and close this issue on merge.
Yes, there is nothing on the Nix side that's broken, or that one can do beyond something like "disable binary caches because binaries are randomly not compatible". There's definitely nothing cargo2nix, crate2nix, or anything else can do (except something crazy like enabling preferLocalBuild).
I'm not sure exactly what's up here, but cargo2nix appears broken.
I've been building in CI, running to successful completion. Checking out master and issuing nix-build pulls everything from the binary cache, with no build, as expected.
However:
results in the error below. Cargo is presumably unhappy with the metadata within `encoding_rs`. I haven't bottomed out the cause yet (otherwise there would be an accompanying PR).
Does the fact that this has built successfully indicate an ordering-type bug at play here? Or that cargo is emitting metadata that renders the build invalid?