Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetchcargo: use flat tar.gz file for vendored src instead of recursive hash dir #78501

Merged
merged 1 commit into from Feb 10, 2020

Conversation

@bhipple
Copy link
Contributor

@bhipple bhipple commented Jan 26, 2020

This has several advantages:

  1. It takes up significantly less space on disk in-between builds in the nix store.
  2. It plays nicely with hashed mirrors like tarballs.nixos.org, which only
    substitute --flat hashes on single files (not recursive directory hashes).
  3. It's consistent with how simple fetchurl src derivations work.
  4. It provides a stronger abstraction between input src-package and output
    package, e.g., it's harder to accidentally depend on the src derivation at
    runtime by referencing something like ${src}/etc/index.html. Likewise, in
    the store it's harder to get confused with something that is just there as a
    build-time dependency vs. a runtime dependency, since the build-time
    src dependencies are tarred up.

Note that since the binary cache vendor packages are already compressed with xz, it's not more efficient on that end or in terms of network traffic.

Disadvantages are:

  1. It takes slightly longer to untar at the start of a build (~2 seconds).

As currently implemented, this attaches the compacted vendor.tar.gz feature as a
rider on verifyCargoDeps, since both of them are relatively newly implemented
behavior that change the cargoSha256.

If this PR is accepted, I will push forward the remaining rust packages with a
series of treewide PRs to update the cargoSha256s.

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
@bhipple bhipple requested a review from andir as a code owner Jan 26, 2020
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Jan 26, 2020

@GrahamcOfBorg build broot ripgrep

@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Jan 26, 2020

a compressed tar.gz file instead.


## Migration plan

This comment has been minimized.

@bhipple

bhipple Jan 26, 2020
Author Contributor

I have updated the hash on every package that has verifyCargoDeps = true; in this PR, and verified that other rust packages with this set to false are not impacted, e.g., changing 1 character on the hash for ripgrep and re-building it produces an error that shows the old hash.

I have a script to do the rest of them and set it to true everywhere, but figured I'd open the PR first for feedback about whether others agree this is a good idea in general first. If so, once it's merged I can send the treewide follow-up PR that changes all the hashes and sets the attribute to true, then once that's merged I'll send another that removes the boolean.

Copy link
Member

@zimbatm zimbatm left a comment

I think it's a good idea.

Since this is a breaking change to the output hash, can you rename the verifyCargoDeps variable to something else like archiveCargoDeps? That way external users will get an explicit break rather than mysteriously changing hashes.

Since fetchcargo now produces two entirely different outputs, what do you think of creating a new fetchCargoTarball fetcher instead? There would be a bit of duplication but it would also avoid having that logic inter-linked.

@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Jan 29, 2020

Nice, I'll play around with the idea of creating an entirely different fetchcargo helper and leaving the old one modified and see how it looks. Either way I intend to get us back to just having 1 standard way of doing things and hope to delete the legacy behavior once ported, but tactically your suggestion might be superior.

@bhipple bhipple force-pushed the bhipple:feature/fetchcargo-tar-gz branch from 54141c7 to 2452e69 Feb 2, 2020
@bhipple bhipple requested review from LnL7 and Mic92 as code owners Feb 2, 2020
@bhipple bhipple force-pushed the bhipple:feature/fetchcargo-tar-gz branch 2 times, most recently from 970a828 to fdd2460 Feb 2, 2020
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 2, 2020

Ok, per discussion I have refactored to leave the old fetchcargo alone, and implement this as a switch on a new, separate fetchCargoTarball. This does look cleaner, and it makes it clearer what the new implementation looks like once all the legacy stuff is deleted.

Copy link
Contributor Author

@bhipple bhipple left a comment

Building one thing with the new behavior and one thing without:

@GrahamcOfBorg build ripgrep hexdino

As a tip to anyone testing, hexdino appears to be one of the smallest packages with the fewest vendored dependencies to download and compile, so it's great for fast edit -> compile -> test cycles.

install -D $CARGO_CONFIG $name/.cargo/config;
'';

# Build a reproducible tar, per instructions at https://reproducible-builds.org/docs/archives/

This comment has been minimized.

@bhipple

bhipple Feb 2, 2020
Author Contributor

I've run this through a bunch of times and verified that it is indeed reproducible.


cargoFetcher = if legacyCargoFetcher
then fetchcargo
else fetchCargoTarball;

This comment has been minimized.

@bhipple

bhipple Feb 2, 2020
Author Contributor

While we're making some light changes in the default.nix, there is no change in fetchcargo.nix -- just a switch on which implementation to use.

2. Send a treewide Rust PR that sets `legacyCargoFetcher = true;` in all Rust
applications not using this (which is ~200 of them), with a note to
maintainers to delete if updating the package. Change the default in
`buildRustPackage` to false.

This comment has been minimized.

@bhipple

bhipple Feb 2, 2020
Author Contributor

I think the actual implementation is relatively straightforward and the advantages are clear, but I could foresee this causing git merge conflict hell for maintainers. As implemented here and discussed I think this provides the easiest/clearest way for maintainers to resolve cherry-pick conflicts on stable backports, but if anyone has any better guidance LMK.

This comment has been minimized.

@bhipple

bhipple Feb 4, 2020
Author Contributor

A script that accomplishes this is attached at the bottom. I have run it on this implementation and verified that it does not cause any rebuilds or change any hashes. Since it touches 200+ files and causes no rebuilds, my current plan is to let the implementation filter its way through staging -> staging-next -> master, then send the treewide PR to master to reduce merge conflict burden on the maintainers, as mentioned above.

At that point, all of the Rust applications will be explicitly opting into the legacy Cargo fetcher, with a comment to maintainers to delete the attr line on the next update.

#!/usr/bin/env zsh
set -euo pipefail
cd ~/src/nixpkgs || exit 1

sed -i 's|^, legacyCargoFetcher.*|, legacyCargoFetcher ? false|' pkgs/build-support/rust/default.nix

for f in $(rg -l 'cargoSha256 = "' **/*.nix); do
    if rg -q 'legacyCargoFetcher = false;' $f; then
        continue
    fi

    sed -i '/cargoSha256 = "/i   # Please delete this line on next update\n  legacyCargoFetcher = true;\n' $f
done
@bhipple bhipple requested review from grahamc and zimbatm Feb 2, 2020
@ofborg ofborg bot requested a review from misuzu Feb 2, 2020

# Fetcher implementation choice should not be part of the hash in final
# derivation; only the cargoSha256 input matters.
filteredArgs = builtins.removeAttrs args [ "legacyCargoFetcher" ];

This comment has been minimized.

@bhipple

bhipple Feb 4, 2020
Author Contributor

Also added this, which ensures that the default value of legacyCargoFetcher does not impact the hash sent to stdenv.mkDerivation. This lets us do the next step (default it to true, opting out all packages via a treewide sed) without causing any hash changes or package rebuilds, including in packages like cargo that do use the Rust build platform infra but don't use any fetchers (it inherits its vendor dir from rustc).

@bhipple bhipple force-pushed the bhipple:feature/fetchcargo-tar-gz branch from bea97b5 to 6d50ae5 Feb 6, 2020
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 6, 2020

Rebased onto staging to fix a merge conflict, but otherwise unchanged.

@bhipple bhipple mentioned this pull request Feb 6, 2020
0 of 10 tasks complete
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 7, 2020

Thanks for the feedback and review @zimbatm. Anyone else mind taking a look and giving a review? @andir perhaps?

@zimbatm
Copy link
Member

@zimbatm zimbatm commented Feb 7, 2020

@zimbatm
Copy link
Member

@zimbatm zimbatm commented Feb 7, 2020

@bhipple do you see any ways to change the PR so that it's not a mass rebuild? If not then let's merge this to staging.

@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 7, 2020

Unfortunately I don't -- though this does clean the fetcher impl used from the stdenv.mkDerivation, so the follow-ups will not be mass rebuilds.

@bhipple bhipple mentioned this pull request Feb 8, 2020
4 of 10 tasks complete
@FRidh FRidh added this to WIP in Staging via automation Feb 9, 2020
@bhipple bhipple force-pushed the bhipple:feature/fetchcargo-tar-gz branch from 6d50ae5 to 4583d91 Feb 10, 2020
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 10, 2020

I rebased onto staging and cherry-picked in the broot update so I could re-compute the sha, but made no other material changes. I think this will (probably?) still cause git merge conflicts on broot, but as stacked my final commit has the "right" value.

@zimbatm
Copy link
Member

@zimbatm zimbatm commented Feb 10, 2020

should we merge this?

…e hash dir

This has several advantages:

1. It takes up less space on disk in-between builds in the nix store.
2. It uses less space in the binary cache for vendor derivation packages.
3. It uses less network traffic downloading from the binary cache.
4. It plays nicely with hashed mirrors like tarballs.nixos.org, which only
   substitute --flat hashes on single files (not recursive directory hashes).
5. It's consistent with how simple `fetchurl` src derivations work.
6. It provides a stronger abstraction between input src-package and output
   package, e.g., it's harder to accidentally depend on the src derivation at
   runtime by referencing something like `${src}/etc/index.html`. Likewise, in
   the store it's harder to get confused with something that is just there as a
   build-time dependency vs. a runtime dependency, since the build-time
   src dependencies are tarred up.

Disadvantages are:
1. It takes slightly longer to untar at the start of a build.

As currently implemented, this attaches the compacted vendor.tar.gz feature as a
rider on `verifyCargoDeps`, since both of them are relatively newly implemented
behavior that change the `cargoSha256`.

If this PR is accepted, I will push forward the remaining rust packages with a
series of treewide PRs to update the `cargoSha256`s.
@bhipple bhipple force-pushed the bhipple:feature/fetchcargo-tar-gz branch from 4583d91 to 2115a20 Feb 10, 2020
@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 10, 2020

Rebased again to fix broot merge conflict. Yes I'd say let's merge this :)

@zimbatm zimbatm merged commit 15a51f6 into NixOS:staging Feb 10, 2020
Staging automation moved this from WIP to Done Feb 10, 2020
@zimbatm
Copy link
Member

@zimbatm zimbatm commented Feb 10, 2020

please make sure to monitor the staging channel to see if there are any regressions :)

bhipple added a commit to bhipple/nixpkgs that referenced this pull request Feb 13, 2020
Cherry-picked from PR NixOS#78501

This causes no functional change, but does recompile all Rust applications. It's
necessary to get the new fetcher on the 20.03 branch in order to backport
updated Rust packages in the future.

See NixOS#79975 for details.
@bhipple bhipple mentioned this pull request Feb 13, 2020
0 of 10 tasks complete
@petabyteboy
Copy link
Member

@petabyteboy petabyteboy commented Feb 17, 2020

So while updating my systems I ran into a hash mismatch for a third party derivation for shelfie.

The errors were very confusing to me. First I got this:

hash mismatch in fixed-output derivation '/nix/store/6kn0lgbb42pa51i2p41aqhdpb3klz07n-shelfie-0.1.0-vendor.tar.gz':
  wanted: sha256:06nhwxyv2x78gdky5lz9raipx6z764n9ffqwq30x4mfxbsbdwf2x
  got:    sha256:0085z26g6j80nf0zbwqxnbcqyjhxzknvb5wy21fmrrmy4ldrp4fl

So I changed the hash and rebuilt the system, which gave me this error:

builder for '/nix/store/kaxfdgqak5y8hmc3q314x53cs8hgbwy4-shelfie-0.1.0.drv' failed with exit code 1; last 10 log lines:

  ERROR: cargoSha256 is out of date

  Cargo.lock is not the same in shelfie-0.1.0-vendor.tar.gz

  To fix the issue:
  1. Use "1111111111111111111111111111111111111111111111111111" as the cargoSha256 value
  2. Build the derivation and wait it to fail with a hash mismatch
  3. Copy the 'got: sha256:' value back into the cargoSha256 field

But fixing the hash is literally what I just did.
What is going on here?

Edit:
So I followed the instructions again step-by-step and used nix-build instead of nix build this time, and go the following additional output:

building '/nix/store/kaxfdgqak5y8hmc3q314x53cs8hgbwy4-shelfie-0.1.0.drv'...
unpacking sources
unpacking source archive /nix/store/1cmnijialk8md0gphmnaihpgmzlrrayc-n8wf2dci0psz9q2j0n2phsf9ak1d23q2-source
source root is n8wf2dci0psz9q2j0n2phsf9ak1d23q2-source
unpacking source archive /nix/store/6kn0lgbb42pa51i2p41aqhdpb3klz07n-shelfie-0.1.0-vendor.tar.gz
diff: source/Cargo.lock: No such file or directory

ERROR: cargoSha256 is out of date

Cargo.lock is not the same in shelfie-0.1.0-vendor.tar.gz

To fix the issue:
1. Use "1111111111111111111111111111111111111111111111111111" as the cargoSha256 value
2. Build the derivation and wait it to fail with a hash mismatch
3. Copy the 'got: sha256:' value back into the cargoSha256 field

builder for '/nix/store/kaxfdgqak5y8hmc3q314x53cs8hgbwy4-shelfie-0.1.0.drv' failed with exit code 1

What could cause source/Cargo.lock to be missing?

@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 17, 2020

See here: https://github.com/NixOS/nixpkgs/pull/80262/files/f180adc0c1cdba75a5fd540d8a86880e36aade0b#r379944118

Because you're using src = ./.;, the name of the source package is not "source", so it's failing to find the Cargo.lock to compare against the cargoSha256 vendor dir. Can you try rebuilding shelfie with that PR and see if it "just works"?

That's on the staging branch, so cheaper/faster alternatives are to fix it with either the LHS or RHS of a diff like this:
https://github.com/NixOS/nixpkgs/pull/80239/files

E.g., you can either use the legacy cargo fetcher, which won't validate that your src and cargoSha256 are in sync, or you can ln the source dir to be explicitly named source.

@petabyteboy
Copy link
Member

@petabyteboy petabyteboy commented Feb 17, 2020

I see! It works fine after cherry-picking the commit from the PR you mentioned, thanks.

@zimbatm
Copy link
Member

@zimbatm zimbatm commented Feb 17, 2020

one way to force the source name is to use builtins.path like this: src = builtins.path { name = "source"; path = ./.; }

@bhipple
Copy link
Contributor Author

@bhipple bhipple commented Feb 17, 2020

The root "issue" here is that fetchFromGitHub (and a couple other fetchers) will give the fixed-output derivation the name source if no other is provided, which almost none of them do:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/fetchgithub/default.nix#L3

However, since users can provide a different name, and since a couple of the fetchers do, I think it's good practice for us to not rely on that particular implementation detail of fetchFromGitHub. We do have workarounds like ln -s ${name} source, which we can do at an individual package level, but I don't think we can do that at an infra level, since we don't know (nor should we) the given name of the src attribute's package.

Using a glob to be generic seems like the right approach to me philosophically; the trick is thinking of what weird edge conditions might exist and making it robust in #80262

@bhipple bhipple deleted the bhipple:feature/fetchcargo-tar-gz branch May 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.