CA drv mass rebuild in parallel #3819

Open
Ericson2314 opened this issue Jul 15, 2020 · 12 comments
Comments

@Ericson2314
Member

Is your feature request related to a problem? Please describe.

We often do mass rebuilds where many outputs shouldn't change, e.g. when we tweak some corner case in bash. It's a bummer to wait for everything to rebuild in the usual order.

Describe the solution you'd like

I would love to tell Nix to assume no outputs will change, so we can do the mass rebuild completely in parallel!

Nix would be given the new and old DRV closures, align them, and then for every pair kick off a build where the old resolved inputs (which are CA drvs) are plugged into the new drv to "hypothetically" resolve it. Then all the drvs are built in parallel.

If at any point the hypothesis doesn't hold, that's OK: we'll just have done some extra work, and then proceed as we normally would have.
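
The pre-resolution step can be sketched in Python. This is only an illustration under stated assumptions, not anything Nix implements: the `Drv` type, the alignment of old and new closures by name, and the functions `speculative_resolve` and `hypothesis_held` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Drv:
    """Hypothetical stand-in for an unresolved derivation."""
    name: str
    recipe: str               # builder, args, env, ... collapsed into one string
    inputs: Tuple[str, ...]   # names of the input derivations


def speculative_resolve(
    new_drvs: Dict[str, Drv],
    old_outputs: Dict[str, str],
) -> Dict[str, Tuple[str, Tuple[str, ...]]]:
    """Plug the OLD output paths into the NEW derivations.

    Old and new closures are aligned by name (an assumption of this sketch).
    Every returned job is fully resolved, so all jobs can start at once.
    """
    jobs = {}
    for name, drv in new_drvs.items():
        if all(dep in old_outputs for dep in drv.inputs):
            guessed = tuple(old_outputs[dep] for dep in drv.inputs)
            jobs[name] = (drv.recipe, guessed)
    return jobs


def hypothesis_held(
    drv: Drv,
    old_outputs: Dict[str, str],
    actual_outputs: Dict[str, str],
) -> bool:
    """True if every input really produced the path that was guessed."""
    return all(actual_outputs[dep] == old_outputs[dep] for dep in drv.inputs)
```

A job whose hypothesis did not hold is simply thrown away, and that derivation is rebuilt in the usual order with its real inputs.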

Describe alternatives you've considered

Not sure there is one.

Additional context

RFC NixOS/rfcs#62

@7c6f434c
Member

I am not sure I understand the proposal: CA paths are supposed to be used via resolved paths during the builds, so if content doesn't change the rev-dep doesn't even need a rebuild. And the paths that can be easily relocated to a new path are probably expected to be declared CA…

@Ericson2314
Member Author

Ericson2314 commented Jul 22, 2020

@7c6f434c The idea is that the derivations changed but the result didn't, e.g. if someone changes a comment in Nixpkgs' pkgs/stdenv/generic/setup.sh.

Everything still needs to be rebuilt for the sake of purity, but if we assume that no outputs change with those rebuilds, we can rebuild everything in parallel.

@7c6f434c
Member

If someone changes a comment in setup.sh, all the path references in all the derivations change, so we need some kind of good-enough path rewriting?

I guess the idea is that you do not assume the path does not change at all, but that the path-rewriting procedure handles it correctly. I guess this could work, although then we really need to check where the deepest non-bit-reproducible builds in the dep-tree are (I heard the installation image is 99% reproducible, so maybe it is actually fine).

@Ericson2314
Member Author

@7c6f434c With this plan there is no rewriting; it is just a matter of speculative building. If some output path really does end up different, then the speculative build is of no use. Simple as that.

@7c6f434c
Member

7c6f434c commented Jul 22, 2020 via email

@Ericson2314
Member Author

OK, so let's imagine that beforehand we have:

  • stdenv -> path0
  • glibc[*stdenv] -> glibc[path0] -> path1
  • bash[*stdenv, *glibc] -> bash[path0, path1] -> path2

Then we do:

  • stdenvNext -> path3

Our speculative execution is doing in parallel:

  • glibc[*stdenvNext] => glibc[path3] -> path4
  • bash[*stdenvNext, *glibc] => bash[path3, path1] -> path5

If we find that path4 = path1, then we have:

  • bash[*stdenvNext, *glibcNext] => bash[path3, path4] = bash[path3, path1] -> path5

otherwise path4 != path1 and we build bash again:

  • bash[*stdenvNext, *glibcNext] => bash[path3, path4] -> path6

Note that -> is the fundamental relation between resolved CA derivations and outputs we must write down, while => is the inductive relation generated by substituting inputs according to ->.

The trick is that we guess what -> will be so we can "pre-resolve" a derivation and start building. We never store our -> guesses, and the pre-resolved derivations are "just data" so there's nothing to invalidate.
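
To make the two outcomes concrete, here is a toy Python model of the example above. It is a sketch under stated assumptions, not Nix code: `speculative_rebuild` and the two lookup tables are hypothetical, and the `->` relation is faked with a dictionary instead of a real build.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

Resolved = Tuple[str, Tuple[str, ...]]  # (recipe name, resolved input paths)


def speculative_rebuild(
    realise: Callable[[Resolved], str],
    stdenv_next_out: str,
    old_glibc_out: str,
) -> Tuple[str, str, bool]:
    """Return (glibc output, bash output, whether the bash guess was reused)."""
    glibc_drv: Resolved = ("glibc", (stdenv_next_out,))
    # Guess that glibc's output will not change and pre-resolve bash with it.
    bash_guess: Resolved = ("bash", (stdenv_next_out, old_glibc_out))

    # Both builds start at once; bash does not wait for glibc.
    with ThreadPoolExecutor(max_workers=2) as pool:
        glibc_future = pool.submit(realise, glibc_drv)
        bash_future = pool.submit(realise, bash_guess)
        glibc_out = glibc_future.result()
        bash_guess_out = bash_future.result()

    # Resolve bash for real (`=>`) now that glibc's actual output is known.
    bash_drv: Resolved = ("bash", (stdenv_next_out, glibc_out))
    if bash_drv == bash_guess:
        return glibc_out, bash_guess_out, True
    # The guess was wrong: discard the speculative output and build again.
    return glibc_out, realise(bash_drv), False


# Toy `->` tables (hypothetical): resolved derivation -> output path.
holds = {
    ("glibc", ("path3",)): "path1",          # path4 == path1
    ("bash", ("path3", "path1")): "path5",
}
fails = {
    ("glibc", ("path3",)): "path4",          # path4 != path1
    ("bash", ("path3", "path1")): "path5",
    ("bash", ("path3", "path4")): "path6",
}

print(speculative_rebuild(holds.__getitem__, "path3", "path1"))
# ('path1', 'path5', True)
print(speculative_rebuild(fails.__getitem__, "path3", "path1"))
# ('path4', 'path6', False)
```

Note that the guess itself is never written back into either table; only outputs of derivations that were actually built are recorded.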

@7c6f434c
Member

So let me try to formulate the set of parallel builds (which I think is currently underspecified).

We have some direct changes and input-path changes. We include in the rebuild set all such derivations that no dependency paths from them to direct-change derivations contain any «run-time dependency on non-CA derivation» edges. Is that right?

By induction and assumption, all build-time dependencies are functionally equivalent and not referenced, and all run-time dependencies have the same rewritten path.
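
One possible reading of that rebuild set, written as a small Python sketch. Everything here is an assumption made for illustration: the graph encoding, the `blocking` flag for a «run-time dependency on non-CA derivation» edge, the choice to leave the directly changed derivations out of the result, and the package names `legacy-lib` and `legacy-tool`. The graph is assumed to be acyclic.

```python
from functools import lru_cache
from typing import Dict, List, Set, Tuple

# name -> [(dependency name, edge is a run-time dependency on a non-CA drv)]
Graph = Dict[str, List[Tuple[str, bool]]]


def parallel_rebuild_set(graph: Graph, changed: Set[str]) -> Set[str]:
    """Derivations affected by `changed` that can be built speculatively."""

    @lru_cache(maxsize=None)
    def reaches(node: str) -> bool:
        # Some dependency path leads from `node` to a directly changed drv.
        return node in changed or any(
            reaches(dep) for dep, _ in graph.get(node, [])
        )

    @lru_cache(maxsize=None)
    def tainted(node: str) -> bool:
        # Some path from `node` to a changed drv crosses a blocking edge.
        return any(
            reaches(dep) and (blocking or tainted(dep))
            for dep, blocking in graph.get(node, [])
        )

    return {
        node
        for node in graph
        if node not in changed and reaches(node) and not tainted(node)
    }


graph: Graph = {
    "stdenv": [],
    "glibc": [("stdenv", False)],
    "bash": [("stdenv", False), ("glibc", False)],
    "legacy-lib": [("stdenv", False)],        # hypothetical non-CA drv
    "legacy-tool": [("legacy-lib", True)],    # run-time dep on a non-CA drv
}
print(sorted(parallel_rebuild_set(graph, {"stdenv"})))
# ['bash', 'glibc', 'legacy-lib']
```

Under this reading `legacy-tool` is excluded because its path to `stdenv` crosses a blocking edge, so it would wait for the ordinary build order.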

@Ericson2314
Member Author

Yes. If the assumption holds and we don't have to do any fallback building, all run-time dep paths stay the same, but the build-time deps can change (stdenv in this example).

@7c6f434c
Member

Maybe add a description of the set of rebuilds to try to the issue overview then?

@FRidh
Member

FRidh commented Jul 27, 2020

If I understand correctly, I imagine having this will push us to filter parts that we do not think are relevant in order to avoid rebuilds.

@Ericson2314
Member Author

Ericson2314 commented Jul 27, 2020

Yes. One subtle thing is that in my list of steps I first rebuilt stdenv, which is different, and then did the magic step. But we don't want to rely on things already being built; ideally it should be fully declarative. The more general approach is to indicate which derivations are expected to produce different output, and which derivations should produce the same output given certain assumptions about their inputs. From these rules (note that rules of the second form are inductive), Nix can figure out the rest.

I might add that a very similar inductive process is used to solve the "please ignore the difference between cross and native builds with the same host platform" problem.

@stale

stale bot commented Feb 12, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Feb 12, 2021
@stale stale bot removed the stale label Dec 5, 2022