
prefetch-yarn-deps, fetchYarnDeps: init #140701

Merged: 8 commits into NixOS:master, Oct 20, 2021

Conversation

@yu-re-ka (Contributor) commented Oct 6, 2021

Motivation for this change

Each time we want to add a big application with lots of JS dependencies, someone says it will slow down eval.

Running yarn itself in a fixed-output derivation is relatively high risk (and, depending on who you ask, a misuse of FODs), as changes in yarn might lead to invisible breakage of the FODs.

=> Let's instead only download the tarballs and git repos with a small (~100 line) program.
Afterwards we can run yarn in a proper derivation, just as if we had imported the offline_cache of a yarn2nix yarn.nix (see gitlab, element).
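For illustration, usage in a package could end up looking roughly like this (a sketch only: argument names such as yarnLock and the exact offline-mirror wiring are assumptions, not the final interface):

```nix
{ stdenv, fetchFromGitHub, fetchYarnDeps, yarn, nodejs }:

stdenv.mkDerivation rec {
  pname = "some-js-app";   # hypothetical package
  version = "1.0.0";

  src = fetchFromGitHub {
    owner = "example";
    repo = "some-js-app";
    rev = "v${version}";
    sha256 = "sha256-...";  # placeholder
  };

  # fixed-output derivation containing only the downloaded tarballs/git
  # checkouts, equivalent to a yarn2nix offline_cache
  offlineCache = fetchYarnDeps {
    yarnLock = "${src}/yarn.lock";
    sha256 = "sha256-...";  # placeholder
  };

  nativeBuildInputs = [ yarn nodejs ];

  configurePhase = ''
    export HOME=$(mktemp -d)
    # point yarn at the prefetched store path instead of the network
    yarn config --offline set yarn-offline-mirror ${offlineCache}
    yarn install --offline --frozen-lockfile --ignore-scripts
  '';
}
```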

Inspired by:

Implementation details that need to be worked out:

  • If a download or nix-prefetch-git fails, it should exit immediately
  • We might want to check the hash of tarballs we downloaded. Yarn will do it afterwards anyways, but better catch it earlier.
  • Maybe add the yarn.lock file to the output, so that it can be compared in the proper derivation (like buildRustPackage will tell you that Cargo.lock changed when you forget to update the hash)
    • this is partially redundant since yarn would most likely fail
  • We need to do more checks on the reproducibility of git tarballs on different systems, like the differences in symlink permissions we had in fetchNodeModules: init #128749
  • Write tests
Things done

@roberth (Member) commented Oct 6, 2021

Is this sufficient to perform a proper build?

I would assume you tried, but we'll need to know.

  • Yarn will do it afterwards anyways, but better catch it earlier.

Is this part of the FOD? Otherwise it's ineffective.

  • We might want to check the hash of tarballs we downloaded.

If you check early, you can retry. Also failing early is helpful when fetchYarnDeps takes a long time. It'd be bad when downloads are failing slowly, because that could make the fetcher take a very very long time to fail. It's also irresponsible from a distributed systems perspective. If the load is too high and a service starts to fail, clients should back off. The only (responsible) thing we can do is fail early.

  • Maybe add the yarn.lock file to the output

This is convenient but adds the risk that minute changes to the lockfile break the FOD in a way we can't detect, because we don't have a hash for the lock file. It does not make the FOD more powerful, so I recommend wiring that file into the build independently of the FOD.
Granted, in many cases you could probably guarantee that it does not change by itself, but that doesn't take into account the human factor. We do need to update dependencies and expressions every now and then, which shouldn't require a deeper understanding of this fetcher than necessary.

  • Maybe add the yarn.lock file to the output [...] this is partially redundant since yarn would most likely fail

It must do so in the FOD, otherwise we can't get any info out about the mismatch, except a useless combined hash.

  • Write tests

Use https://nixos.org/manual/nixpkgs/unstable/#sec-pkgs-invalidateFetcherByDrvHash for success cases.
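A success case could look roughly like this (a sketch; the argument names are assumptions):

```nix
{ invalidateFetcherByDrvHash, fetchYarnDeps }:

{
  # invalidateFetcherByDrvHash bakes the derivation hash into the name, so the
  # fetch is actually re-run (and its output hash re-checked) whenever the
  # fetcher implementation changes
  tests.simple = invalidateFetcherByDrvHash fetchYarnDeps {
    yarnLock = ./tests/yarn.lock;  # hypothetical fixture
    sha256 = "sha256-...";         # placeholder
  };
}
```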

Testing failures is harder, but can be done with nixosTest.

"passthru.tests" on functions via OfBorg is an open problem. callPackage produces a callable attrset (.__functor) so we can just // { tests = ... } onto that, but nix-build doesn't let you build it, whereas the experimental nix build does. OfBorg follows the nix-build behavior. See #136022

@yu-re-ka (Contributor, Author) commented Oct 6, 2021

Is this sufficient to perform a proper build?

I would assume you tried, but we'll need to know.

It is sufficient to build an offline_cache equivalent to one provided by a yarn2nix yarn.nix. I have tested with some big yarn.lock files of applications packaged in nixpkgs, including hedgedoc, which has some git dependencies.

@yu-re-ka (Contributor, Author) commented Oct 6, 2021

I have implemented hash checking and ensured that it will always exit with a non-zero code when something goes wrong.

As a demo, I have converted gitlab, hedgedoc and element-desktop to fetchYarnDeps.

@kloenk (Member) commented Oct 6, 2021

hedgedoc.offlineCache and element-desktop.offlineCache build on aarch64-darwin.

@happysalada (Contributor)

I really like this personally. I'm running a nixpkgs-review just to confirm all is good.
This is looking like a really good short-term solution to our problem of code generation.

This is obviously outside the scope of this PR, but there are two yarn.lock parsers in pure Nix:
one in IOHK's yarn2nix fork, and one I made as well, based on parser combinators.

I think in the future we can think about parsing the lock file in Nix. I'm just starting the conversation here to see how people feel about that.

@yu-re-ka (Contributor, Author) commented Oct 7, 2021

@happysalada from my point of view there are multiple use cases that have really different requirements:

  • Third party repositories
    For CI on third party repositories, importing a yarn.lock file must "just work" without extra steps.
    We can use builtins.fetchGit and IFD. We don't have to care about the eval time so much.
    This is where your yarn.lock parser would be useful.

  • Nixpkgs
    Here we can do extra steps with internet access manually or as part of an update script. We only package a new release every so often.
    But we have to be careful with eval time and the size of files added to the nixpkgs repository, and we cannot use builtins.fetchGit or IFD.

So in my opinion it's really nice to have a yarn.lock parser in pure Nix, but right now it's not useful for packaging things in nixpkgs, because it would have worse evaluation time than pre-converted yarn.nix files, and we would still need to check big yarn.lock files into nixpkgs.
This may look different in the future, depending on how recursive Nix will work.
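To make the contrast concrete, the third-party / CI flavour could look roughly like this (a sketch; parseYarnLock stands in for a hypothetical pure-Nix parser like the ones mentioned above):

```nix
{ pkgs, parseYarnLock }:   # parseYarnLock is hypothetical

let
  src = builtins.fetchGit {
    url = "https://example.com/some-app.git";          # hypothetical repo
    rev = "0000000000000000000000000000000000000000";  # placeholder
  };
  # the lock file is read and parsed at evaluation time; fine for a single
  # project's CI, but too expensive for nixpkgs-wide evaluation
  deps = parseYarnLock (builtins.readFile "${src}/yarn.lock");
in
  map ({ url, sha256, ... }: pkgs.fetchurl { inherit url sha256; }) deps
```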

@roberth I'm not sure if I understand your other comments

Yarn will do it afterwards anyways, but better catch it earlier.

Is this part of the FOD? Otherwise it's ineffective.

I don't understand why it would be ineffective to check outside of the FOD whether the FOD generated a nonsense offline cache, but anyway, this is solved now: it will exit immediately if one download fails or gives the wrong hash.

We might want to check the hash of tarballs we downloaded.

If you check early, you can retry. Also failing early is helpful when fetchYarnDeps takes a long time. It'd be bad when downloads are failing slowly, because that could make the fetcher take a very very long time to fail. It's also irresponsible from a distributed systems perspective. If the load is too high and a service starts to fail, clients should back off. The only (responsible) thing we can do is fail early.

Again, this is solved by checking hashes and exiting early.

Maybe add the yarn.lock file to the output

This is convenient but adds the risk that minute changes to the lockfile break the FOD in a way we can't detect, because we don't have a hash for the lock file. It does not make the FOD more powerful, so I recommend wiring that file into the build independently of the FOD.
Granted, in many cases you could probably guarantee that it does not change by itself, but that doesn't take into account the human factor. We do need to update dependencies and expressions every now and then, which shouldn't require a deeper understanding of this fetcher than necessary.

This is how it's supposed to work:
[screenshot]
I assume that we will use the same yarn.lock source for fetchYarnDeps and the main drv.

The risk I am talking about is: someone updates the src but forgets to update the FOD hash. Nix will not attempt to build the FOD again, because the output for the old FOD hash is cached. No check inside the FOD could help against this, because it would simply not be run.


The question I am asking is: Do we need additional protection from this like fetchCargoTarball has?
Currently it will only fail if yarn fails because the offlineCache does not have all the resources it needs. But if, for example, one dependency is removed from yarn.lock, there could be an invisible breakage, because the old offlineCache is still sufficient to perform the main build and is still in the binary cache.

@yu-re-ka yu-re-ka mentioned this pull request Oct 7, 2021
@happysalada (Contributor)

I agree with you that a pure Nix parser won't be of much use in nixpkgs. In order not to be IFD, the lock file would need to be committed, and that's not great. An FOD sounds like a much better solution (in my opinion too).

Another package suggestion I had if you are interested in testing more is https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/applications/video/mirakurun/default.nix

In investigating yarn and the madness around it, we found a couple of places that are badly broken. For example, in the 'react' lock https://github.com/facebook/react/blob/main/yarn.lock#L12604, a version of 'react-is' that fits the semver is not included. I'm not saying that the FOD should be mindful of these, but just that we might have problems with this approach on some packages. Personally, I think that as long as we can use this approach, we should still go for it.

@happysalada (Contributor)

I couldn't finish the nixpkgs-review, because git hung forever while trying to build (it's not related, but I'm afraid I won't be able to verify with my machine).

Regarding the problem of updating the source but not the sha256, perhaps the solution should just be to have update scripts for these packages? Of course, an error can still be made, but it would reduce the potential for such mistakes.

@mohe2015 (Contributor) commented Oct 8, 2021

Result of nixpkgs-review pr 140701 run on x86_64-linux

6 packages built:
  • element-desktop
  • element-desktop-wayland
  • gitlab
  • gitlab-ee
  • hedgedoc
  • prefetch-yarn-deps

yu-re-ka pushed a commit to yu-re-ka/nix-update that referenced this pull request Oct 8, 2021
yu-re-ka added a commit to yu-re-ka/nix-update that referenced this pull request Oct 8, 2021
@yu-re-ka (Contributor, Author) commented Oct 9, 2021

Actually, I will remove the "convert xyz to fetchYarnDeps" changes again; we can do those in separate PRs. For now I just want to add fetchYarnDeps with this PR.
I think happysalada is right: update scripts will mostly prevent these kinds of human errors.

@yu-re-ka (Contributor, Author) commented Oct 12, 2021

I think this is in a pretty good state, with update scripts and all.
@Ma27 could you take a look at the hedgedoc and element changes?

@roberth (Member) commented Oct 13, 2021

Is there any reason not to copy yarn.lock over to the FOD and check against that when building in the proper derivation? That way we could warn the user about an outdated FOD hash that is cached locally but would not be reproducible for other users.

If the lock file changes subtly, you'll get a hash mismatch, telling you nothing about the cause of the problem. Such a change could be caused by anything like a formatter, git crlf handling, editors appending or removing final newlines. As usual with FOD arguments, the error will only appear after some time, or when run by someone else, removing the bad error message from its context.

The check would be useless, since the value of offlineCache.yarnLock would point to the new src, even when the FOD hash is outdated.

I think you're right.

Maybe there's a middle ground where the lock is in a second output. Having two hashes means you can tell which one broke, but it also leads to another slight chance of human error and more implementation complexity.

Another way to improve the error context is to incorporate a lock source hash (hash of source path or hash of FOD out path or regular drv hash, depending on how we get the lock file) in the FOD name. This refetches immediately, allowing the error to be caught while making the change, but this will also lead to unexpected rebuilds and possibly unnecessary resource consumption if the hash is not a unique identifier, as is the case when it's neither a local source nor an independently fetched file.
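A sketch of that last idea, inside the package expression (assuming the fetcher forwards a name argument, with yarnLockFile standing in for wherever the lock file comes from):

```nix
# embedding a fingerprint of the lock file in the FOD name changes the store
# path whenever yarn.lock changes, forcing an immediate refetch and surfacing
# a stale hash right away
offlineCache = fetchYarnDeps {
  yarnLock = yarnLockFile;
  name = "offline-cache-${builtins.substring 0 12
    (builtins.hashFile "sha256" yarnLockFile)}";
  sha256 = "sha256-...";  # placeholder
};
```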

I can't think of a solution that ticks all the boxes including a good error message when the lock file changes subtly. I don't think this scenario will be as frequent as forgetting to update the hash, so adding the lock file to the fixed output seems best.

@Ma27 (Member) commented Oct 13, 2021

FYI before this gets merged, I'd also like to review it — at least the element changes, however I don't think I'll get to it before the weekend :)

@yu-re-ka (Contributor, Author) commented Oct 13, 2021

If the lock file changes subtly, you'll get a hash mismatch, telling you nothing about the cause of the problem. Such a change could be caused by anything like a formatter, git crlf handling, editors appending or removing final newlines. As usual with FOD arguments, the error will only appear after some time, or when run by someone else, removing the bad error message from its context.

But this going unnoticed is exactly what we are preventing by comparing the yarn.lock in the FOD with the yarn.lock from the current src. If the yarn.lock is edited (for example a newline is appended), the main derivation will tell you that there is a mismatch and this change will not even make it onto master (or will be directly apparent).
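A minimal sketch of that check in the consuming derivation (assuming the FOD copies the lock file to its output as yarn.lock):

```nix
postConfigure = ''
  # fail loudly if the prefetched cache was built from a different yarn.lock,
  # i.e. someone updated src but forgot to update the fetchYarnDeps hash
  if ! cmp -s ${offlineCache}/yarn.lock yarn.lock; then
    echo "yarn.lock in offlineCache does not match the one in src" >&2
    exit 1
  fi
'';
```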

FYI before this gets merged, I'd also like to review it — at least the element changes, however I don't think I'll get to it before the weekend :)

Of course, we'll wait for your review :)

@roberth (Member) commented Oct 13, 2021

Is there any reason not to copy yarn.lock over to the FOD

If the lock file changes subtly, you'll get a hash mismatch, telling you nothing about the cause of the problem. Such a change could be caused by anything like a formatter, git crlf handling, editors appending or removing final newlines. As usual with FOD arguments, the error will only appear after some time, or when run by someone else, removing the bad error message from its context.

But this going unnoticed is exactly what we are preventing by comparing the yarn.lock in the FOD with the yarn.lock from the current src. If the yarn.lock is edited (for example a newline is appended), the main derivation will tell you that there is a mismatch and this change will not even make it onto master (or will be directly apparent).

The only problem is that the FOD may not be buildable. It relies on an FOD with the same output having been built before on the same host or in a cache. Without a preexisting FOD output, the error will not be informative. I hope we are correct to assume that this is unlikely to happen in practice.

As said, I'm ok with adding the lock file to the FOD output. It's a good trade-off.

@Ma27 (Member) left a comment

Without a preexisting FOD output, the error will not be informative. I hope we are correct to assume that it's unlikely to happen in practice

Isn't that a common issue with FODs?

While I'd personally prefer to generate fixed-output drvs while reading a lock-file at eval time (despite increased eval-times, see e.g. https://github.com/edolstra/import-cargo), I acknowledge that this is pretty complex for Yarn's lockfiles.

Since this also avoids relying on tools such as Yarn that bring in potential impurities or carry a risk of painful upgrades in case something changes internally (e.g. the issues we have with Go 1.17), I think that this is a good solution for us, hence 👍

@roberth (Member) commented Oct 17, 2021

Without a preexisting FOD output, the error will not be informative. I hope we are correct to assume that it's unlikely to happen in practice

This is quite a unique fetcher because we're in a position to validate the individual files. It lets us give an error message that leads to the cause of the problem, rather than "we downloaded the interweb again and this time it's different". However, this does not apply to the lock file itself if we add it to the output, because we don't have a hash of that.

Isn't that a common issue with FODs?

Yes, it is, and that's why I would normally recommend against complex FODs. This one can provide good error messages, unless the lock file is in the output, in which case it can also produce a bad error message: a hash mismatch with no clue as to what has changed. Someone who knows this fetcher really well might be able to speculate what went wrong, but it's bad UX.
As mentioned, the choice of putting the lock in the FOD output is a trade-off and I think the more common mistake is to forget to update the hash, so we should prioritize that and accept that in rare cases, a lock file mismatch can occur.

@yu-re-ka (Contributor, Author)

Anything more left to do? It would be nice to get this done by the end of the week.

@happysalada (Contributor) left a comment

I can't think of anything else

@yu-re-ka yu-re-ka merged commit 2d1057f into NixOS:master Oct 20, 2021
@yu-re-ka yu-re-ka deleted the feature/fetch-yarn branch October 20, 2021 09:39
@mohe2015 (Contributor)

Would it be possible for an empty hash to not produce "error: fetchYarnDeps requires a hash", but instead produce a hash mismatch, like many other fetchers do?
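For reference, the behaviour being asked for is the usual placeholder-hash workflow other fetchers support (a sketch; whether fetchYarnDeps accepts it depends on the implementation):

```nix
offlineCache = fetchYarnDeps {
  yarnLock = "${src}/yarn.lock";
  # an all-zero placeholder such as lib.fakeSha256 makes the build fail with a
  # hash mismatch that prints the correct hash, instead of an eval-time error
  sha256 = lib.fakeSha256;
};
```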

@talyz talyz mentioned this pull request Oct 28, 2021
@Izorkin Izorkin mentioned this pull request Nov 6, 2021
yu-re-ka added a commit to yu-re-ka/nix-update that referenced this pull request Sep 6, 2023