Cargo.lock considered harmful #327063
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/cargo-lock-considered-harmful/49047/1
To add to the evidence: |
I don’t think this should be prohibited entirely because I think it’s the only option if upstream doesn’t provide a |
I'm currently including a |
When doing the wrong thing is made too easy, only the wrong thing gets done.
I don’t think blocking Nixpkgs on upstream packaging issues is a viable approach, especially ones that primarily only affect us. We wouldn’t have a package set at all. In this case, many upstreams are actively unwilling to maintain a |
The same goes for composer.lock files for PHP packages. There are soooo many upstreams that will never accept a lockfile, no matter how much you talk with them. Perhaps we can still allow lockfiles, with the recommendation that you first talk to upstream to see if they are willing to accept it there. And reduce the tarball size slowly that way?
@emilazy I don't think @superherointj suggested doing that. They suggested prioritising upstream collaboration and only falling back to in-tree workarounds when collaboration fails but have a proper process for that in place. I don't think we need to eradicate Cargo.lock entirely myself. There are likely edge-cases where anything else is simply impossible. It needs to be the exception rather than the norm though; something that 30 packages might do, not 300. @patka-123 w.r.t. composer.lock and friends, also see #327064. I have two further thoughts on possible solutions:
I’m all for upstream collaboration – I’ve opened like 7 upstream PRs during Nixpkgs work in the past month or so – but from my experience of upstream responsivity, I don’t think we can viably have a workflow like “work with upstream, then work around when that fails”. A lot of the time upstreams just take ages to even acknowledge an issue or PR, and even when they do it can take several rounds to achieve mutual understanding and consensus. “Work with upstream, apply a workaround in the meantime, then remove it if upstream collaboration succeeds” is much better for Nixpkgs maintainers and users. I think that making sure that workarounds get removed in a timely fashion when no longer necessary, and aren’t introduced unnecessarily in the first place, are more effective sites of intervention.
A simple way to avoid a lockfile is to ask upstream to add the lockfile. My proposal:
Can we agree with this?
I think that is a reasonable enough expectation for packages with non‐dormant upstreams that haven’t previously expressed an opinion on lock files, when there is no other obstacle to removing the lock file (e.g. Git dependencies or us needing to extensively patch it for our own purposes), yeah. Note that Cargo upstream used to explicitly recommend committing lock files for applications but not libraries, but they have since changed their guidance and it is now considerably more equivocal. So we don’t really have anything slam‐dunk that we can point people to here.
I've written this out more thoroughly in an amendment in #327064 |
I think that the Rust stuff can’t process a fetcher‐produced |
What if Nix were able to import files from zip files? Flakes and such could just download zip files instead of the extracted tree, and Nix could load the files from inside the zip without unpacking. It wouldn't solve the problem, but it could be a little easier on the IOPS, and thus more usable on slower storage.
See the perpetual work‐in‐progress lazy trees work. It’s a hard problem, unfortunately, so we shouldn’t hold our breath for it.
I haven't studied the problem of git dependencies carefully, but maybe this problem could be solved by using some scripts (instead of cargo) to parse Cargo.lock at build time (instead of eval time)?
In #217084 the original plan was to replace every cargoHash with Cargo.lock, but this was not implemented. That PR also lists some benefits of migrating to Cargo.lock.
I think I remember people recommending parsing the For example, with But with Is this something we want to abandon, and either have ecosystem-specific tools, or wait for something like recursive Nix? |
We don't need all the data in the Cargo.lock file. (We could drop the |
The other big reason we vendor Cargo.lock is git dependencies. Is there maybe a way to make them work without dependency vendoring?
Eventually, the way I'd like this to work would be that we have deduplication as well — some big file that defines every version of a Cargo package required for Nixpkgs, that we'd only have to process once. We could even eventually deduplicate semver-compatible packages, since the Rust ecosystem is very good about this. This would mitigate the problem of relying on upstream to update dependency crates for security fixes. But this would require some tooling to keep that file up to date when adding / updating a package.

The quicker fix would be to remove the dependency information as suggested above, which would just require doing the deletion, and modifying the code that checks that the Cargo.lock file matches to allow this.
This would help with filesystem size, but does it help with RAM usage?
I like the idea of One Gigantic Lock, but a directory with one package per file would probably be better than one file, even if less efficient on disk, because we won’t be fighting Git merges constantly. And of course we’d probably still want per‐package local additions/overrides for things like random Git fork dependencies.
If I remember correctly, Flutter currently seems to do this in part, which also allows us to patch these dependencies (otherwise it will become very troublesome)
This is how Arch Linux handles Go, and it seems to be the same for Rust. However, the problem is that people in the Rust ecosystem are already used to vendoring, which may undermine our assumptions about version compatibility.
Yeah, we have prior art with big files: pkgs/development/node-packages/node-packages.nix, which is not really fun to update.
I think you misunderstand. Deleting the version information from Cargo.lock files in Nixpkgs would not change what versions of dependencies are used. The real Cargo.lock file would still be used in the derivation — we just don't need that information at eval time, because the Cargo.lock file would still contain a list of all the packages required for the build, just not which ones depended on which other ones.
@Mic92 I've just tried using the Cargo.lock from the source, i.e. doing `cargoLock.lockFile = "${src}/Cargo.lock";` instead of `cargoLock.lockFile = ./Cargo.lock;`.

Full diff:

```diff
diff --git a/pkgs/applications/misc/faircamp/Cargo.lock b/pkgs/applications/misc/faircamp/Cargo.lock
deleted file mode 100644
index deeaca6b86be..000000000000
--- a/pkgs/applications/misc/faircamp/Cargo.lock
+++ /dev/null
@@ -1,3169 +0,0 @@
-# This file is automatically @generated by Cargo.
-# It is not intended for manual editing.
-version = 3
// -snip-
diff --git a/pkgs/applications/misc/faircamp/default.nix b/pkgs/applications/misc/faircamp/default.nix
index b243dccf9734..9e359f370aea 100644
--- a/pkgs/applications/misc/faircamp/default.nix
+++ b/pkgs/applications/misc/faircamp/default.nix
@@ -27,7 +27,7 @@ rustPlatform.buildRustPackage rec {
   };
   cargoLock = {
-    lockFile = ./Cargo.lock;
+    lockFile = "${src}/Cargo.lock";
     outputHashes = {
       "enolib-0.4.2" = "sha256-FJuWKcwjoi/wKfTzxghobNWblhnKRdUvHOejhpCF7kY=";
     };
```
That's IFD (Import From Derivation), which is not supported in Nixpkgs because |
I've talked with enough upstreams to be fully convinced that they will intentionally make the wrong choice about |
Flutter packages today convert the pubspec.lock to JSON. The same thing could be done with Rust. BTW flet-client-flutter is living proof that the update can be done automatically. Which is better, though: generating and loading Nix files with the data directly, or using something like lib.importJSON?
In fairness we don’t really care about lock files for libraries, as far as I know; i.e. it’s only our leaf‐ish application packages that consume (IMO |
Also, as people realize the benefits of |
I thought about this for longer, and maybe we can set this up in a way that doesn't suck as much. Still, it will require some engineering. If we had a way for users to submit a Cargo.lock file to some sort of pastebin prior to submitting the update to Nixpkgs, then we could have tooling that works like this:

```nix
# stored in a separate file so it's easy to discover with find without having to
# evaluate nix code
cargoDeps = fetchFromNixpkgsCargoLocks ./nix-cargo.lock;
```

For projects that do have a Cargo.lock in their own source tree we wouldn't need the pastebin. In the context of the pull request this would use IFD, but when we merge it to Nixpkgs, we could use something like a merge queue that converts the IFD to a data structure that lives in Nixpkgs.
If the file is 100% auto-generated, we should not have issues with merge conflicts, as we can just delete the whole thing. We would also need to modify our CI tooling to allow IFD from this service that provides our Cargo.lock files.
Of course if we have computed derivations as in NixOS/rfcs#92 |
I have done a similar analysis of the per-file disk usage across Git history:

```shell
git rev-list --objects --filter=object:type=blob origin/master \
  | git cat-file --batch-check='%(objectsize:disk) %(rest)' \
  | awk '{n=split($2,a,"/"); arr[a[n]]+=$1} END {for (i in arr) print arr[i], i}' \
  | sort --numeric-sort --reverse \
  | numfmt --to=iec-i --suffix=B --padding=7 --round=nearest \
  | head -n 50
```

Turns out that

Though I do agree that with regard to tarballs, it's a different story.
Thinking from first principles, it seems the way this should work is that Nixpkgs should define a subset of crates.io packages and use that set as a replacement registry for cargo:

```toml
[source.crates-io]
replace-with = "nixpkgs-flavored-crates-io"

[registries.nixpkgs-flavored-crates-io]
index = "/nix/store/..."
```

So, when building a Rust project, the upstream Cargo.lock is ignored, and instead a new Cargo.lock is generated based on the Nixpkgs crates.io registry replacement. The problems here:
We used to do this and stopped. I don't know why. |
For now, can we detect the cases in which this is being used without need? An empty outputHashes would be one indicator. If we could throw errors in completely useless usages (suggesting using cargoHash instead), that would be a start.
For these, as long as there are no git dependencies, there's still no need to read Cargo.lock from Nix with all the eval penalty that imposes. We can just copy the file into the source tree, which should be very low overhead. And that would mean we can treat empty outputHashes as a robust indicator. (Though there should be an option for out-of-tree projects to allow it.)
Does copying it into the tree in |
You have to use cargoPatches or some other mechanism that works with the FOD. (There's no real reason we shouldn't just apply the whole patch phase in the FOD, except that at first we didn't, so now we can't change it without breaking hashes.)
Introduction

I've been doing a little investigation on the impact of Cargo.lock files because, if you run ncdu against a Nixpkgs checkout, they're usually the largest individual files you come across, and Rust packages are frequently at the top in any given sub-directory.

AFAICT the functionality to import Cargo.lock has existed since May 2021. Usage has exploded since.

Measurements
Next I measured the total disk usage of all Cargo.lock files combined: 24MiB!

Realistically though, anyone who cares about space efficiency in any way will use compression, so I measured again with each Cargo.lock compressed individually.

Further, evidence in #320528 (comment) suggests that handling Cargo.lock adds significant eval overhead: eval time for Nixpkgs via nix-env is ~28% lower if parsing/handling of Cargo.lock files is stubbed out.

Analysis
Just ~300/116231 packages (~0.25%) make up ~6MiB of our ~41MiB compressed Nixpkgs tarball, which is about 15% in relative terms (18.5KiB per package).

For comparison, our hackage-packages.nix containing the entire Hackage package set (18191 packages) is ~2.3MiB compressed (133 Bytes per package).

Breaking down eval time by package reveals that each Cargo.lock takes on average about 76.67 ms to handle/parse.

Discussion
I do not believe that this trend is sustainable, especially given the likely increasing importance of Rust in the coming years. If we had one order of magnitude more Rust packages in Nixpkgs and assumed the same amount of data per package that we currently observe, just the Rust packages alone would take up ~54MiB compressed.

If nothing is done, I could very well see the compressed Nixpkgs tarball bloat beyond 100MiB in just a few years.

Extrapolating eval time does not paint a bright picture either: if we assume one order of magnitude more Cargo.lock packages again, evaluating just those packages would take ~4x as long as evaluating the entire rest of Nixpkgs currently does.

This does not scale.
Solutions

I'm not deep into Rust packaging, but I remember the vendorHash being the predominant pattern a few years ago, which did not have any of these issues as it's just one 32 Byte string literal per package.

Would it be possible to revert back to using vendorHashes again? (At least for packages in Nixpkgs; having Cargo.lock support available for external use is fine.)

What else could be done to mitigate this situation?
Limitations/Future work

Files were compressed individually, adding gzip overhead for each lockfile. One could create a tarball out of all Cargo.lock files and compress it as a whole to mitigate this effect.

I found some Cargo.lock files that have a different name or a prefix/suffix and were not considered.

CC @NixOS/rust