Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make buildPathsWithResults() only return info on wanted outputs #6311

Merged
merged 6 commits into from
Mar 25, 2022

Conversation

edolstra
Copy link
Member

This fixes a regression in nix build (mentioned here) where it would print all outputs when building, but only the wanted outputs otherwise. Now it only prints the wanted outputs, as before.

It also fixes a potential infinite loop when retrying substitutions after an incomplete closure.

This avoids an infinite loop in the final test in
tests/binary-cache.sh. I think this was only not triggered previously
by accident (because we were clearing wantedOutputs in between).
/* At least one of the output paths could not be
produced using a substitute. So we have to build instead. */
void DerivationGoal::gaveUpOnSubstitution()
{
/* Make sure checkPathValidity() from now on checks all
outputs. */
wantedOutputs.clear();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see any reason why checkPathValidity() needs to check all outputs. In any case, messing with wantedOutputs seems dirty...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is dirty, but the idea I think (I did not write that but remember reading it IIRC) that since we are now doing the build, we will be making all afresh outputs, and we should fail if the derivation did not produce all the outputs even if we already had the missing outputs before.

Without this, we run the risk of some wonky non-deterministic derivation that produces a different subset of the outputs each time not failing as it should.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, another reason is that if during a build we discover we in fact need other outputs, it is nice to know we don't need to restart a build that got so upgraded. See DerivationGoal::addWantedOutputs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A consequence of this is that I think buildPathsWithResults is going to need to filter the results based on the derived path. I will make a PR removing the derived path from the build result stored by the derivation goal, because I no longer think it is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, what outputs get build/registered (or whose absence causes the build to fail) is unaffected by wantedOutputs or initialOutputs.*.wanted. So I think it should be okay. Restarts are only needed in the substitution case, not when we're actually building.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edolstra On what basis do you think this?

See Worker::makeDerivationGoalCommon. If a new goal needs to be created that overlaps with an existing goal, this function is called. This may occur at any point over the derivation goal's lifecycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this in the other PR too, sigh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think what? I'm not sure I understand the problem...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the long PR description in #6312

@edolstra edolstra added this to the nix-2.8 milestone Mar 24, 2022
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Mar 25, 2022
In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`), the
`DerivedPath` in the `BuildResult` in `DerivationGoal` cannot be
trusted; it is merely the *first* `DerivedPath` for which this goal was
originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else just
becomes like it was before, where the "key" is unambiguous from context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above to someday overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Mar 25, 2022
In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`), the
`DerivedPath` in the `BuildResult` in `DerivationGoal` cannot be
trusted; it is merely the *first* `DerivedPath` for which this goal was
originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else just
becomes like it was before, where the "key" is unambiguous from context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above to someday overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Mar 25, 2022
In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`), the
`DerivedPath` in the `BuildResult` in `DerivationGoal` cannot be
trusted; it is merely the *first* `DerivedPath` for which this goal was
originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else just
becomes like it was before, where the "key" is unambiguous from context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
Co-authored-by: John Ericson <git@JohnEricson.me>
@edolstra edolstra merged commit 16cf1e6 into NixOS:master Mar 25, 2022
@edolstra edolstra deleted the return-wanted-paths branch March 25, 2022 14:44
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Mar 25, 2022
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with content addressed goals and especially RFC 92, where
*drv resolution* means goals are created after some building is
completed, it is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Mar 25, 2022
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with CA derivation goals and especially RFC 92, where *drv
resolution* means goals are created after some building is completed, it
is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
@Ericson2314 Ericson2314 mentioned this pull request Mar 31, 2022
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-27/18433/1

Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Feb 24, 2023
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with CA derivation goals and especially RFC 92, where *drv
resolution* means goals are created after some building is completed, it
is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 14, 2023
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with CA derivation goals and especially RFC 92, where *drv
resolution* means goals are created after some building is completed, it
is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.

-----

Unfortunately, we need to avoid C++20 strictness on designated
initializers.

(BTW
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html
this offers some new syntax for this use-case. Hopefully this will be
adopted and we can eventually use it.)

No having that yet, maybe it would be better to not make
`KeyedBuildResult` a subclass to just avoid this.

Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 15, 2023
…er way

In NixOS#6311 (comment), I
realized since derivation goals' wanted outputs can "grow" due to
overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called
by `Worker::makeDerivationGoalCommon`), the previous bug fix had an
unfortunate side effect of causing more pointless rebuilds.

In paticular, we have this situation:

1. Goal made from `DerivedPath::Built { foo, {a} }`.

2. Goal gives on on substituting, starts building.

3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just
   modified original goal.

4. Though the goal had gotten as far as building, so all outputs were
   going to be produced, `addWantedOutputs` no longer knows that and so
   the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually
basically have all our goals "planned out" before we start doing
anything, but with CA derivation goals and especially RFC 92, where *drv
resolution* means goals are created after some building is completed, it
is more likely to happen.

So the first thing to do was restore the clearing of `wantedOutputs` we
used to do, and then filter the outputs in `buildPathsWithResults` to
only get the ones we care about.

But fix also has its own side effect in that the `DerivedPath` in the
`BuildResult` in `DerivationGoal` cannot be trusted; it is merely the
*first* `DerivedPath` for which this goal was originally created.

To remedy this, I made `BuildResult` be like it was before, and instead
made `KeyedBuildResult` be a subclass wit the path. Only
`buildPathsWithResults` returns `KeyedBuildResult`s, everything else
just becomes like it was before, where the "key" is unambiguous from
context.

I think separating the "primary key" field(s) from the other fields is
good practical in general anyways. (I would like to do the same thing
for `ValidPathInfo`.) Among other things, it allows constructions like
`std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and
just precludes the possibility of those duplicate keys being out of
sync.

We might leverage the above someday to overload `buildPathsWithResults`
to take a *set* of return a *map* per the above.

-----

Unfortunately, we need to avoid C++20 strictness on designated
initializers.

(BTW
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html
this offers some new syntax for this use-case. Hopefully this will be
adopted and we can eventually use it.)

No having that yet, maybe it would be better to not make
`KeyedBuildResult` a subclass to just avoid this.

Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants