Split build.cc -- new version of #3098 #4114

Ericson2314 · 2020-10-06T18:01:11Z

Opening this because I think @edolstra's opinion has changed, and this would certainly be useful to me for incremental debug builds and for being able to page less code into my head at a time.

I merged each of the original commits in succession, and basically re-copied code as needed. There were no interesting conflicts.

CC @regnat

Ericson2314 · 2020-10-06T18:11:46Z

If this is wanted, I hope we can merge it quickly, because it will bit-rot very fast :).

thufschmitt · 2020-10-07T08:27:42Z

> If this is wanted, I hope we can merge it quickly, because it will bit-rot very fast :).

Too late ;)

edolstra · 2020-10-07T09:53:36Z

The other issues are that it breaks history (which I do actually use), invalidates existing PRs (e.g. #3600, which BTW moves UserLock into a separate file because it was largely rewritten in that PR anyway) and makes cherry-picking changes to the maintenance branch a lot harder.

Ericson2314 · 2020-10-09T01:26:54Z

So git has recently been horrendous with conflicts for me with this, I'll admit. I ended up manually computing a diff, manually partitioning the hunks based on which new file they change, and applying them to fix conflicts. The modified patches at least applied cleanly, but that is still a rather bad process.

Still, I think we should do this.

The diff cost is temporary; going forward, PRs made after this will have no problem.
As @edolstra says:

> it makes cherry-picking changes to the maintenance branch a lot harder

But we're soon going to release 3.0, so this is also one of the best times to make cherry-picking harder. I'm quite convinced this is something we will end up doing at some point anyway, as the scheduler is hard to modify/grok right now and many changes around scheduling (especially with remote builders) are desired. If so, we might as well pay the cost now, when it's cheaper.
After 3.0, and by the time floating CA derivations become stable, I think the current flurry of activity around libnixstore will subside. As above, I want to get the big invasive changes "out of the way" before then.

In particular, there are some big scheduler changes we'll need/want to do in the short term that I'd rather do after this:

support building only drvs that lack substitutes (aka nix-build-uncached) #3946: This seems quite popular, but my idea of doing nix-build --store $remote_store --builders auto requires making scheduled ensurePath work for stores beyond LocalStore so that they can use the builders. This is a pretty hefty refactor that builds very nicely on this one, as it further separates scheduling from building.
Build loops trying to substitute when output is missing #3964: My solution for that is to split derivation goals into "plain goals", for which a method isn't yet chosen, and "build goals", for which no substitution is attempted and wantedOutputs isn't needed. This is also a pretty major change to the scheduler that benefits from separating out these concepts.
Trust map substituting. This is needed so we don't have to download all the build-time dependencies of things just to figure out the resolved derivation we look up (e.g. we don't want to download GCC just to figure out whether we can download bash). The "build goal" vs "plain goal" distinction from above might also come in handy, as dealing with trust maps and DRV resolving can be another planning aspect of plain goals and not clutter up building.
Derivations that build derivations: I decided this is the best way to implement [RFC 0040] "Ret-cont" recursive Nix rfcs#40. Just producing them is easy and is done in Derivations can output "text-hashed" data #3959, but depending on those built derivations means changes to inputDrvs and the scheduler.

Hope this helps motivate this refactor :).

edolstra · 2020-10-09T15:44:26Z

> The diff cost is temporary

It's actually a permanent cost: the history becomes permanently polluted. It's too bad git isn't better at following history across these kinds of refactors... Although apparently there is a very hacky way: https://devblogs.microsoft.com/oldnewthing/20190916-00/?p=102892

I'm not sure it's really worth it though. Incremental build times probably won't go down by much (since build/derivation-goal.cc is still huge) and non-incremental build times will go up.

Ericson2314 · 2020-10-09T16:35:23Z

> Although apparently there is a very hacky way: https://devblogs.microsoft.com/oldnewthing/20190916-00/?p=102892

I'm more than happy to redo this with that trick.

[FWIW, it doesn't even feel "very hacky" to me. The underlying issue is that git only stores states, not patches, and computes patches as needed. In those circumstances, it may well be better to hand-hold git with a careful history than to increase the complexity of the diff calculation, as one would pay for the latter with every computed diff.

There are patch-theory-oriented version control systems like Darcs and https://pijul.org/ that solve this better by making the patch, rather than the state, authoritative, but I don't think you are proposing that we switch to one of those at this time :).]

> I'm not sure it's really worth it though. Incremental build times probably won't go down by much (since build/derivation-goal.cc is still huge) and non-incremental build times will go up.

Well, to me build times are not the main point of this. Rather, it's about structuring the code better so that the subsequent refactors are easier to understand and maintain. If we were to make all those changes in the single build.cc, I'm worried we could easily hit, say, 11,000 lines. Surely the large file and the combination of concerns (building vs. scheduling) are already hard to maintain, and not something we'd want to make worse?

(Lastly, didn't you experiment with a way to do unity builds? We could always use that for non-incremental builds and so never pay a performance penalty for splitting .cc files.)

Mathnerd314 · 2020-10-10T02:52:33Z

> It's too bad git isn't better at following history

There is also the -C flag. Trying it on the foods example from that post, -C4 gives full history (because peas is 4 characters and he splits up the file by line). Here the chunks are larger and the default threshold of 40 characters mostly works, although the merge commits do show up occasionally. -C10 is a little slower but resolved every nontrivial line of code I looked at. And git gui blame uses -C by default. So overall the history issue seems like a non-issue to me, although I guess that doesn't apply to GitHub or to third-party blame tools missing the option.
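To see the -C behaviour described above without any hand-holding history, here is a minimal sketch on a throwaway repository where a naive split moves lines into a new file in a single commit. The file names, contents and the blamed_commits helper are hypothetical; -C10 simply mirrors the threshold mentioned in the comment. Plain git blame should credit the moved lines to the split commit, while git blame -C10 should follow them back to the original commit, since -C looks for lines copied from other files modified in the same commit.

```shell
#!/usr/bin/env bash
# Naive split: one commit moves lines from build.cc into a new worker.cc.
# All file names and contents here are hypothetical.
set -eo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

printf '%s\n' \
    '// goal: track the state of one derivation build' \
    '// goal: wake up waiters when finished' \
    '// worker: run the main scheduling loop' \
    '// worker: wait for child processes' > build.cc
git add build.cc
git commit -q -m 'Original build.cc'
base=$(git rev-parse HEAD)

# Move the worker lines into worker.cc in one commit (no rename trick).
grep '^// worker' build.cc > worker.cc
grep -v '^// worker' build.cc > build.cc.tmp
mv build.cc.tmp build.cc
git add build.cc worker.cc
git commit -q -m 'Split worker.cc out of build.cc'
split=$(git rev-parse HEAD)

# Print the distinct commits git blame credits for worker.cc.
# Extra arguments (e.g. -C10) are passed through to git blame.
blamed_commits() {
    git blame --porcelain "$@" -- worker.cc \
        | grep -E '^[0-9a-f]{40,} ' | cut -d' ' -f1 | sort -u
}

echo "plain blame: $(blamed_commits)"      # the split commit
echo "blame -C10:  $(blamed_commits -C10)"  # the original commit
```

As the thread notes, this only helps where the blame tool passes -C; GitHub's blame view is not controlled by this flag.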

Regarding the merge commits, is there a reason for them? Rebasing patches is based on the "patch-theory-oriented" view and I have found it much easier to resolve conflicts when working with patches. And of course the commit history looks cleaner when rebasing instead of merging.


Ericson2314 · 2020-10-11T16:50:14Z

OK, I used the Raymond Chen method for now, but the old version is at https://github.com/obsidiansystems/nix/commits/split_build_cc_old, so we can switch methods and recover @regnat's breakdown of the big split into iterated splits if we like.

@Mathnerd314

> although I guess that doesn't apply to GitHub or third-party blame tools missing the option.

Sadly, that might matter.

> Regarding the merge commits, is there a reason for them? Rebasing patches is based on the "patch-theory-oriented" view and I have found it much easier to resolve conflicts when working with patches. And of course the commit history looks cleaner when rebasing instead of merging.

Since git will not store nice patches that show how the file is split, when a PR that predates the now-mainlined split is rebased, git will have a hard time figuring out what to do. (Is there a -C flag for patch applying, let alone git rebase?) The merge trick basically forces git to track the files properly despite having lousy information per delta, by breaking up the interesting patch into boring steps.
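The "boring steps" pattern visible in this PR's commit list (a "Rename to hand-hold git (…)" commit followed by a "Trim …" commit on a per-file branch, then one merge of all the branches) can be sketched on a toy repository. This is a minimal sketch assuming a hypothetical two-part build.cc split into goal.cc and worker.cc; the contents, branch names and the split_off helper are made up for illustration and are not the actual Nix history. Because each rename commit is a 100% rename, git blame can follow every surviving line back to the original file without any -C flag.

```shell
#!/usr/bin/env bash
# Toy version of "rename on a branch, trim, then merge" for splitting a file.
# build.cc, goal.cc, worker.cc and split_off are hypothetical names.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# One file holding two unrelated parts.
printf '%s\n' 'goal line 1' 'goal line 2' 'worker line 1' 'worker line 2' > build.cc
git add build.cc
git commit -q -m 'Original build.cc'
base=$(git rev-parse HEAD)

# split_off NEW PATTERN: on a fresh branch from $base, rename build.cc to NEW
# in a commit of its own (a pure rename git cannot miss), then keep only the
# lines matching PATTERN in a separate "trim" commit.
split_off() {
    git checkout -q -b "split-${1%.cc}" "$base"
    git mv build.cc "$1"
    git commit -q -m "Rename to hand-hold git ($1)"
    grep "$2" "$1" > "$1.tmp"
    mv "$1.tmp" "$1"
    git commit -q -am "Trim $1"
}
split_off goal.cc '^goal'
split_off worker.cc '^worker'

# Merge all split branches at once (an octopus merge). Every branch deletes
# build.cc and adds a different new file, so nothing conflicts.
git checkout -q -b split_build_cc "$base"
git merge -q --no-edit split-goal split-worker

# Every line should still be blamed on the original commit.
for f in goal.cc worker.cc; do
    git blame -s "$f"
done
```

The trim commits are where git's diff output can degrade for large deletions, which is the problem reported later in this thread for 3bab1c5; the blame chain itself still runs through the pure rename commits.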


Mathnerd314 · 2020-10-12T20:07:07Z

> Regarding the merge commits

I was talking about the old version: master...obsidiansystems:split_build_cc_old. If you rebased you'd end up with only 4 commits, instead of 17. But handling all the open PRs nicely is more important than clean commit history.

> Is there a -C flag for patch applying, let alone git rebase?

There is actually git apply -C, but it's not useful here. And git rebase/merge use an interface into diff that only supports rename detection, although with enough work copy detection could be added. So the Raymond Chen method seems to be the most effective at present.

Ericson2314 · 2020-10-12T20:17:07Z

In other news, I added another round of this, splitting build.hh, so now each of the new .cc files has a corresponding new .hh. I hope this makes the components more "genuinely separated".

edolstra · 2020-10-13T13:57:34Z

@Ericson2314 Thanks, looks great now!

Ericson2314 · 2020-10-13T14:22:54Z

Thank you! It means a lot :).

Ericson2314 · 2020-10-13T17:56:03Z

Bad news :(

Having done more merges since this was merged, I'm noticing that Raymond Chen's method isn't working as well as it should. The issue is that 3bab1c5, despite being a very simple removal of code, shows up as hot garbage.

On the command line, one can demonstrate this by comparing

git show --patience 3bab1c5bb0a56f850a7bc1bacc9f974b108cf601

with

git show 3bab1c5bb0a56f850a7bc1bacc9f974b108cf601

Evidently the sheer amount of deleted code is fooling git, and it starts aligning blank lines instead, with chaos naturally ensuing.

One thing I could attempt is to make a new branch off the parent of that commit, deleting the code little by little so git wouldn't get lost, and then merging that back in.

Also, my original offer to fix merge conflicts for others still stands.

Very sorry this latest version of the history wasn't the panacea I thought it would be.

Mathnerd314 · 2020-10-13T19:57:27Z

There are a few settings that can be tweaked: git merge -X diff-algorithm=patience -X find-renames=<n>. But I don't have a merge handy to test.
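For reference, this is how those strategy options are passed. It is a minimal sketch on a hypothetical toy repository: an "old PR" branch edits build.cc while mainline has a pure rename commit to worker.cc, and the merge carries the edit across the rename. The 50% threshold is an arbitrary choice for the sketch (it matches git's default rename threshold), and this toy does not reproduce the 3bab1c5 problem, so whether these settings help on the real Nix history is not established here.

```shell
#!/usr/bin/env bash
# A pre-split branch edits build.cc while mainline renames it to worker.cc;
# merge with the -X options from the comment. All names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

printf '%s\n' 'worker line 1' 'worker line 2' 'worker line 3' > build.cc
git add build.cc
git commit -q -m 'Original build.cc'
base=$(git rev-parse HEAD)

# The old PR: edits line 2 of build.cc.
git checkout -q -b old-pr "$base"
printf '%s\n' 'worker line 1' 'worker line 2 (fixed)' 'worker line 3' > build.cc
git commit -q -am 'Fix line 2'

# Mainline: a pure rename commit, as in the hand-holding history.
git checkout -q -b mainline "$base"
git mv build.cc worker.cc
git commit -q -m 'Rename to hand-hold git (worker.cc)'

# Merge using the patience diff algorithm and an explicit rename threshold.
git merge -q --no-edit -X diff-algorithm=patience -X find-renames=50% old-pr

cat worker.cc
```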

Ericson2314 · 2020-10-13T20:20:24Z

Oh whew! That makes me feel a lot better. Thanks so much, @Mathnerd314.

Ericson2314 mentioned this pull request Oct 6, 2020

Split build.cc #3098

Closed

thufschmitt approved these changes Oct 7, 2020


Ericson2314 added 21 commits October 11, 2020 16:38

aef44cb Split out commonChildInit
428536f Prepare for build/* files
fc72cb0 Rename to hand-hold git (build.hh)
a4f0fec Trim build.hh
9629290 Rename to hand-hold git (build/derivation-goal.cc)
3bab1c5 Trim build/derivation-goal.cc
184bfc3 Rename to hand-hold git (build/goal.cc)
819fe84 Trim build/goal.cc
f0b8987 Rename to hand-hold git (build/hook-instance.cc)
159054f Trim build/hook-instance.cc
dc5225c Rename to hand-hold git (build/local-store-build.cc)
4bdff7d Trim build/local-store-build.cc
3633b35 Rename to hand-hold git (build/substitution-goal.cc)
d24ffe0 Trim build/substitution-goal.cc
904e315 Rename to hand-hold git (build/worker.cc)
eed53ed Trim build/worker.cc
d0004bf Rename to hand-hold git (lock.hh)
dbc5886 Trim lock.hh
e0be041 Rename to hand-hold git (lock.cc)
bcb67e1 Trim lock.cc
8cc510f Merge branches 'build-1', 'build-2', 'build-3', 'build-4', 'build-5', 'build-6', 'build-7' and 'build-8' into split_build_cc

Ericson2314 force-pushed the split_build_cc branch from a56e2ba to 8cc510f on October 11, 2020 16:48

Ericson2314 mentioned this pull request Oct 12, 2020

Build loops trying to substitute when output is missing #3964

Closed

Ericson2314 added 14 commits October 12, 2020 17:07

5a97621 Prepare for build/*.hh headers
f709996 Change .cc files to use split build headers
4eb8c69 Rename to hand-hold git (derivation-goal.hh)
2ce7269 Trim derivation-goal.hh
0d0e345 Rename to hand-hold git (goal.hh)
8067d32 Trim goal.hh
10b749a Rename to hand-hold git (hook-instance.hh)
d585b4c Trim hook-instance.hh
e77a234 Rename to hand-hold git (substitution-goal.hh)
3ffa354 Trim substitution-goal.hh
0e23062 Rename to hand-hold git (worker.hh)
542972f Trim worker.hh
d334fd4 Merge branches 'split_build_hh-0', 'split_build_hh-1', 'split_build_hh-2', 'split_build_hh-3' and 'split_build_hh-4' into split_build_cc
a73959e Merge remote-tracking branch 'upstream/master' into split_build_cc

edolstra merged commit 2653801 into NixOS:master Oct 13, 2020

Ericson2314 deleted the split_build_cc branch October 13, 2020 14:22

thufschmitt mentioned this pull request Nov 4, 2020

Don't explicitely pattern-match on the various outputs #4112

Closed

nh2 mentioned this pull request Nov 25, 2020

Tell in detail why remote building fails on -v #3927

Open
