Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle BuildResult data definition, make state machine clearer, introduce SingleDrvOutputs #6312

Merged
merged 4 commits into from
Apr 17, 2023

Conversation

Ericson2314
Copy link
Member

@Ericson2314 Ericson2314 commented Mar 25, 2022

Motivation

There are two changes going on here: changing BuildResult and changing the goal state machine logic.

Introduce Worker::makeGoal

This takes a DerivedPath so the caller doesn't need to care about which sort of goal does what.

BuildResult changes

Make BuildResult be like it was before, and instead made KeyedBuildResult be a subclass wit the path. Only buildPathsWithResults returns KeyedBuildResults, everything else just becomes like it was before, where the "key" is unambiguous from context.

The idea here is that storing the path when it is already unambiguous from context denormalizes our data model, making two sources of truth that must be kept in sync. This isn't ideal.

This is especially notable for DerivationGoal, where two DerivedPath requests may correspond to the same DerivationGoal. Before, the BuildResult of that goal (stored, then returned) would just be the first one to create that goal. That is non-deterministic!

KeyedBuildResult is useful in the return type of KeyedBuildResult, but this is because it is conceptually returning an mapping of DerivedPath keys to BuildResult values. We could, for example, also use std::map<BuildResult, BuildResult> for this purpose (and probably also have the arguments be a std::set not std::vector for symmetry), in which case we would here also not need KeyedBuildResult.

State machine changes

The needRestart, retrySubstitution, and retriedSubstitution bools are replaced with enums. The enum cases and conditional logic are exhaustively documented to try to make the code easier to understand.

An especially thorny case with the old code was that needRestart was set to true if the outputs changed for a goal that had progressed to building, but needRestart would never be read again. This is an in fact a good thing operationally, as a build will produce all outputs and so restarting is not necessary. But this (a) being how it worked, and (b) being a good thing was quite obfuscated just reading the code!

The new code and comments call out this case explicitly.

Introduce SingleDrvOutputs

In many cases we are dealing with a collection of realisations, they are all outputs of the same derivation. In that case, we don't need "derivation hashes modulos" to be part of our map key, because the output names alone will be unique. Those hashes are still part of the realisation proper, so we aren't loosing any information, we're just "normalizing our schema" by narrowing the "primary key".

Besides making our data model a bit "tighter" this allows us to avoid a double for loop in DerivationGoal::waiteeDone. The inner for loop was previously just to select the output we cared about without knowing its hash. Now we can just select the output by name directly.

Context

Protocol non-changes

Note that neither protocol is changed as part of this: we are still transferring DrvOutputs over the wire for BuildResults. I would only consider revising this once #6223 is merged, and we can mention protocol versions inside factored-out serialization logic. Until then it is better not change anything because it would come a the cost of code reuse.

Initializers

Unfortunately, we need to avoid C++20 strictness on designated initializers.

(BTW https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html
this offers some new syntax for this use-case. Hopefully this will be
adopted and we can eventually use it.)

No having that yet, maybe it would be better to not make KeyedBuildResult a subclass to just avoid this.

History of this PR

In #6311 (comment), I thought that since derivation goals' wanted outputs can "grow" due to overlapping dependencies (See DerivationGoal::addWantedOutputs, called by Worker::makeDerivationGoalCommon), the previous bug fix had an unfortunate side effect of causing more pointless rebuilds.

In particular, I was worried about this situation:

  1. Goal made from DerivedPath::Built { foo, OutputsSpec::Names { a } }.

  2. Goal gives up on on substituting, starts building.

  3. Goal made from DerivedPath::Built { foo, OutputsSpec::Names { b } }, in fact is just modified original goal.

  4. Though the goal had gotten as far as building, so all outputs were going to be produced, addWantedOutputs no longer knows that and so the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually basically have all our goals "planned out" before we start doing anything, but with CA derivation goals and especially RFC 92, where drv resolution means goals are created after some building is completed, it would be more likely to happen.

@edolstra, on the other hand, thought this was not an issue because even though needsRestart = true would be set, it would never be read again. Eelco might be right, but nonetheless this is still very confusing storing "needs restart" but in fact neither needing nor wanting a restart!

So the first thing I did was restore the clearing of wantedOutputs we used to do, and then filter the outputs in buildPathsWithResults to only get the ones we care about.

The KeyedBuildResult change technically was already needed due to the "modified derivational goal with second request" issue described in the motivation section, but become more urgent with this modification of wantedOutputs.

Eelco however (if I recalled correctly) in real time meeting (so no paper trail) also didn't like modifying wantedOutputs upon falling back to doing a build (as we did before), because we should only modify it in response to new requests --- actual wants --- and not because we are "incidentally" building all the outptus beyond what may have been requested.

That's a fair point, and the alternative is to replace the boolean soup with proper enums: Instead of modifying wantedOuputs som more, we'll modify needsRestart to indicate we are passed the need. That lead me to the bool -> enum changes, and then I did the substitution ones in addition too for consistency.

"Keyed" data types in general.

I think separating the "primary key" field(s) from the other fields is good practical in general. (I would like to do the same thing for ValidPathInfo, for example.) Besides the example in the motivation, there are probably other cases where we would like to do std::map<Key, ThingWithoutKey> where the "without keys" is necessary to not contain duplicate keys and just precludes the possibility of those duplicate keys being out of sync.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or incompatible change: updated release notes

@Ericson2314
Copy link
Member Author

Failure is spurious.

@Ericson2314
Copy link
Member Author

@edolstra For reference, the wantedOutputs clearing is something you did in 8d8d47a. Maybe looking at the commit will jog your memory on whether you were thinking of the same issue I was when you wrote that.

@Ericson2314 Ericson2314 changed the title Make KeyedBuildResult, put back BuildResult the way it was before Make KeyedBuildResult, BuildResult like before, and fix bug another way Mar 25, 2022
@Ericson2314 Ericson2314 changed the title Make KeyedBuildResult, BuildResult like before, and fix bug another way Fix race conditition causing other work, shuffle BuildResult data definition Mar 25, 2022
@Ericson2314 Ericson2314 changed the title Fix race conditition causing other work, shuffle BuildResult data definition Fix race condition causing other work, shuffle BuildResult data definition Mar 25, 2022
@Ericson2314 Ericson2314 changed the title Fix race condition causing other work, shuffle BuildResult data definition Fix race condition causing extra work, shuffle BuildResult data definition Mar 25, 2022
@edolstra
Copy link
Member

edolstra commented Apr 1, 2022

Though the goal had gotten as far as building, so all outputs were
going to be produced, addWantedOutputs no longer knows that and so
the goal is flagged to be restarted.

Why would it be restarted? The only place where we restart is in DerivationGoal::outputsSubstitutionTried(). Once the build has actually started, it builds all outputs and it won't be restarted, as far as I can tell.

@Ericson2314
Copy link
Member Author

Fair enough, but I think setting a needs-restart flag that is never read is at least as confusing as changing the wanted outputs.

I still prefer the way it was before, because the metadata on the accurately reflects what it is doing.

@stale stale bot added the stale label Oct 1, 2022
@Ericson2314 Ericson2314 changed the title Fix race condition causing extra work, shuffle BuildResult data definition Shuffle BuildResult data definition, make state machine clearer Feb 14, 2023
@stale stale bot removed the stale label Feb 14, 2023
@roberth roberth self-assigned this Feb 15, 2023
@roberth
Copy link
Member

roberth commented Feb 26, 2023

Data model improvement and worker protocol no-ops lgtm.
Just need to look into the src/libstore/build/entry-points.cc changes still.

@fricklerhandwerk fricklerhandwerk added the store Issues and pull requests concerning the Nix store label Mar 3, 2023
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting some improvements to buildPathsWithResults.
Another issue I had was the heavy abbreviation of local variables, making the code harder to understand. I'm already suggesting some things that can be factored out, so perhaps it won't be as much of a problem anymore.

req/reqs: this is fine
g will only span 3 lines, but why not goal?
reqs2 -> state as mentioned
gp: only used once, but goalPtr is still easier to read
pbp: might go away?
bp: contains multiple outputs. We've already established that that's an unnecessary grouping in most cases, so we'll probably revisit that anyway.
bos: move to goal method, change to results.

src/libstore/build/entry-points.cc Outdated Show resolved Hide resolved
},
}, br.raw());
}, req.raw());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This visit could be moved into worker.
DerivedPath is a very fundamental thing, so I think this would be appropriate.

make is a lie, but that's out of scope for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goal is also a lie. MutableWorkUnit? Not sure if that name helps though. Let's not move the Goal posts for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit of #4628 is I think just what you want! Added it here.

});

auto pbp = std::get_if<DerivedPath::Built>(&req);
if (!pbp) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we could return a BuildResult for opaque paths too. Substituted, AlreadyValid, NoSubstituters are appropriate statuses.

However, that might be somewhat of a breaking change. Maybe add a new method in a followup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh we are, that continue is after making a KeyedBuildResult.

This is a sign to make the continue is too tricky, so I am just doing a regular if block on the rest.

Comment on lines 87 to 90
/* Because goals are in general shared between derived paths
that share the same derivation, we need to filter their
results to get back just the results we care about.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aforementioned lies make this non-obvious. Short of that, I think we can simplify this a bit

        /* Goals are mutable shared state, that may contain more
           outputs than we requested.
         */

ie only the weird part. However, I think this code should be moved.

Part of the problem is the gp->buildResult is too easy. How about we isolate the nastiness of mutable state by encapsulating gp? Then we can provide a method that requests the right outputs and make this mistake impossible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 1486 to 1508
if (!useDerivation) return;
auto & fullDrv = *dynamic_cast<Derivation *>(drv.get());

auto * dg = dynamic_cast<DerivationGoal *>(&*waitee);
if (!dg) return;

auto outputs = fullDrv.inputDrvs.find(dg->drvPath);
if (outputs == fullDrv.inputDrvs.end()) return;

for (auto & outputName : outputs->second) {
auto buildResult = dg->getBuildResult(DerivedPath::Built {
.drvPath = dg->drvPath,
.outputs = OutputsSpec::Names { outputName },
});
if (buildResult.success()) {
for (auto & [output, realisation] : buildResult.builtOutputs) {
inputDrvOutputs.insert_or_assign(
{ bfd->drvPath, output.outputName },
{ dg->drvPath, output.outputName },
realisation.outPath);
}
}
}
}
Copy link
Member Author

@Ericson2314 Ericson2314 Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This became more awkward after buildResult was encapsulated because one cannot easily index builtOutputs without knowing the hash modulo / fixed output hash.

I think what this means is that

DrvPaths builtOutputs;

is the wrong field for BuildResult; instead of that std::map<DrvOutput, Realisation> it should be

std::map<std::string, Realisation> builtOutputs;

since we are only returning realizations for a single top-level derivation.

Realisation.id.drvHash gets us that hash, that was removed from the map key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed one more commit which addresses this awkwardness, getting rid of the double for loop.

@Ericson2314 Ericson2314 changed the title Shuffle BuildResult data definition, make state machine clearer Shuffle BuildResult data definition, make state machine clearer, introduce SingleDrvOutputs Apr 14, 2023
src/libstore/build/derivation-goal.cc Outdated Show resolved Hide resolved
src/libstore/build/derivation-goal.hh Outdated Show resolved Hide resolved
Comment on lines 163 to 176
void write(const Store & store, Sink & to, const BuildResult & res)
{
worker_proto::write(store, to, res.path);
to
<< res.status
<< res.errorMsg
<< res.timesBuilt
<< res.isNonDeterministic
<< res.startTime
<< res.stopTime;
worker_proto::write(store, to, res.builtOutputs);
DrvOutputs builtOutputs;
for (auto & [output, realisation] : res.builtOutputs)
builtOutputs.insert_or_assign(realisation.id, realisation);
worker_proto::write(store, to, builtOutputs);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a breaking change to the protocol?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err my intent was to not change the protocol, converting it to the old way. (I should add that the commit message and PR description!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The top line is changed in a previous commit; that is the difference between BuildResult and KeyedBuildResult. The latter is now used everywhere needed to preserve the protocol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what its worth, after #6223 is merged the factored out serialization logic can use protocol versions, and then I can deduplicate this stuff much more aggressively :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really register the other changes to the write functions. This makes sense. Does the daemon compatibility test pass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Yes it does!

Ericson2314 and others added 4 commits April 15, 2023 11:01
This takes a `DerivedPath` so the caller doesn't need to care about
which sort of goal does what.
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with CA derivation goals and especially RFC 92, where *drv
resolution* means goals are created after some building is completed, it
is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.

-----

Unfortunately, we need to avoid C++20 strictness on designated
initializers.

(BTW
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html
this offers some new syntax for this use-case. Hopefully this will be
adopted and we can eventually use it.)

No having that yet, maybe it would be better to not make
`KeyedBuildResult` a subclass to just avoid this.

Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
If my memory is correct, @edolstra objected to modifying `wantedOutputs`
upon falling back to doing a build (as we did before), because we should
only modify it in response to new requests --- *actual* wants --- and
not because we are "incidentally" building all the outptus beyond what
may have been requested.

That's a fair point, and the alternative is to replace the boolean soup
with proper enums: Instead of modifying `wantedOuputs` som more, we'll
modify `needsRestart` to indicate we are passed the need.
In many cases we are dealing with a collection of realisations, they are
all outputs of the same derivation. In that case, we don't need
"derivation hashes modulos" to be part of our map key, because the
output names alone will be unique. Those hashes are still part of the
realisation proper, so we aren't loosing any information, we're just
"normalizing our schema" by narrowing the "primary key".

Besides making our data model a bit "tighter" this allows us to avoid a
double `for` loop in `DerivationGoal::waiteeDone`. The inner `for` loop
was previously just to select the output we cared about without knowing
its hash. Now we can just select the output by name directly.

Note that neither protocol is changed as part of this: we are still
transferring `DrvOutputs` over the wire for `BuildResult`s. I would only
consider revising this once NixOS#6223 is merged, and we can mention protocol
versions inside factored-out serialization logic. Until then it is
better not change anything because it would come a the cost of code
reuse.
@roberth roberth merged commit 3f9589f into NixOS:master Apr 17, 2023
@Ericson2314 Ericson2314 deleted the keyed-build-result branch April 17, 2023 16:21
break;
case RetrySubstitution::YesNeed:
// Should not be able to reach this state from here.
assert(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it has been reached

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
store Issues and pull requests concerning the Nix store
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants