Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow depending on content-addressed derivations #3776

Open
wants to merge 19 commits into
base: master
from

Conversation

@regnat
Copy link
Contributor

regnat commented Jul 2, 2020

Builds on top of #3740, and extends it to allow depending on ca-derivations, with a proper cutoff in case two different derivations in the build chain produce the same output (see the added test)

@Ericson2314
Copy link
Member

Ericson2314 commented Jul 2, 2020

Between the first and #3740, This does seem like a bunch code doubling down on the "real and fake outputs" approach, which I do worry will be a source of bugs.

Now that derivation outputs are fully parsed, I'd like to exploit that, going from

struct DerivationOutput
{
    StorePath path;
    std::optional<FixedOutputHash> hash; /* hash used for expected hash computation */
};

to

struct DerivationOutputLegacy {
    // StorePath path;
    // regnat pointed out that when we depend on a ca-derivation,
    // we cannot calculate downstream paths until the upstream one is built.
    std::optional<StorePath> path;
};
struct DerivationOutputCAFixed {
    /* hash used for expected hash computation */
    FixedOutputHash hash;
    // no need to store output store path, as it can be computed on the fly.
};
struct DerivationOutputCAFloating {
    /* hash used for ca path computation */
    HashType hashType;
};
typedef std::variant<
    DerivationOutputLegacy,
    DerivationOutputCAFixed,
    DerivationOutputCAFloating
> DerivationOutput;

/* accessors to make it easier to handle multiple cases at once. */
std::optional<StorePath> outputPath(DerivationOutput) { ... }
std::optional<HashType> outputHashAlgo(DerivationOutput) { ... }

I really do think we can make that change, fix all the type errors "on autopilot', and be quite close to a working version.


As a first, preparatory step, we can do:

struct DerivationOutputLegacy {
    StorePath path;
};
struct DerivationOutputCAFixed {
    /* hash used for expected hash computation */
    FixedOutputHash hash;
    // no need to store output store path, as it can be computed on the fly.
};
typedef std::variant<
    DerivationOutputLegacy,
    DerivationOutputCAFixed
> DerivationOutput;

/* accessors to make it easier to handle multiple cases at once. */
StorePath outputPath(DerivationOutput) { ... }

I think we'll also need to make derivations store their own name.

which is exactly isomorphic to today, to get the churn out of the way, and then add new new variant in a separate PR.

@Ericson2314
Copy link
Member

Ericson2314 commented Jul 2, 2020

https://github.com/NixOS/nix/pull/3754/files should also help. Every time in that PR I do something like

FixedOutputInfo {
    { .method = ..., .hash = ... },
    {} // empty references
}

the {} is practically a Mad Lib beginning to be filed in with richer reference data.

regnat added 4 commits Jun 18, 2020
Move the logic from the `nix make-content-addressed` nix command to a
new method in the `Store` class
This is required so that we can call LocalStore::makeContentAddressed on
paths that haven't been registered
Very limited atm:

- Only `nix-build` works properly (`nix-store`, `nix build`, etc.. will
  print and create a symlink to a non-existent output path)
- It isn't possible to depend on a content-addressed derivation
@regnat regnat force-pushed the tweag:ca-derivations-dependencies branch from d32357a to 75a06ee Jul 3, 2020
regnat added 3 commits Jul 3, 2020
Otherwise `resolveDerivation` changes the name of the derivation (and as
a consequence its hash)
Otherwise substituted or short-circuited paths are not properly
registered
@regnat regnat force-pushed the tweag:ca-derivations-dependencies branch from 75a06ee to bab7345 Jul 3, 2020
bool isModified = resolveDerivation(worker.store, *(dynamic_cast<Derivation *>(drv.get())));
// Define the actual outputs of the build
for (auto & i: drv->outputs)
actualOutputs.emplace(i.first, i.second.path);

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

It's confusing to me that here actualOutputs is initialized to what appears to be the non-actual outputs.

This comment has been minimized.

@Ericson2314

Ericson2314 Jul 7, 2020 Member

I think this is intentional, but this is the sort of foot-gun I want to avoid by making DerivationOutput use variants in my comment above.

// Replace the output path by the new CA one
// FIXME: Why can't this be done in one step?
actualOutputs.erase(i.first);
actualOutputs.emplace(i.first, info.path);

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

This can be replaced by insert_or_assign().

@@ -4248,6 +4286,7 @@ void DerivationGoal::done(BuildResult::Status status, std::optional<Error> ex)
mcRunningBuilds.reset();

if (result.success()) {
registerOutputMappings();

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

Why is this done here rather than in registerOutputs()? To me done() is a function that is only intended to interact with Worker to set error states and trigger dependent goals, so it shouldn't have side-effects like registering stuff in the database.

This comment has been minimized.

@regnat

regnat Jul 7, 2020 Author Contributor

There might be a better place for this indeed.
I didn't put it in registerOutputs because we want it to be called even when no build took place (either because we substituted the output or because we've short-cutted the build after resolving the derivation).

@@ -79,8 +79,6 @@ ref<FSAccessor> LocalFSStore::getFSAccessor()

void LocalFSStore::narFromPath(const StorePath & path, Sink & sink)
{
if (!isValidPath(path))
throw Error("path '%s' is not valid", printStorePath(path));

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

I think this check should stay since narFromPath is supposed to apply to valid paths only. For arbitrary paths you should use dumpPath().

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

If the issue is that build.cc is calling Store::makeContentAddressed() on an invalid path, then maybe that function should take a separate argument for the path (so it can use dumpPath()).

@@ -576,6 +576,17 @@ void LocalStore::checkDerivationOutputs(const StorePath & drvPath, const Derivat
}
}

void LocalStore::linkDeriverToPath(const StorePath deriver, const string outputName, const StorePath output)

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

This should be const StorePath & deriver etc. Otherwise we're passing by value which is less efficient.

This comment has been minimized.

@regnat

regnat Jul 7, 2020 Author Contributor

Oh forgot these, thanks

* (as given by `queryDerivationOutputMap`)
* **if this one differs from the one written in the derivation**
*/
bool resolveDerivation(Store & store, Derivation & drv);

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

What's the boolean return value?

This comment has been minimized.

@regnat

regnat Jul 7, 2020 Author Contributor

Oh my bad I forgot to document it.

It serves to indicate whether the resolving actually modified the derivation or not (to know whether to retry the substitution). But I'm not very happy with it, so I'd be happy if you have any suggestion on improving this

auto outputEnvVar = drv.env.find(i.first);
if (outputEnvVar != drv.env.end())
outputEnvVar->second = store.printStorePath(outPath);
debug(format("Rewrote output %1% to %2%")

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

format is obsolete, please do debug("rewrite output '%s' to '%s'", arg1, arg2).

for (auto & i : drv.outputs) {
// XXX: There's certainly a better and less error-prone way
// of getting the name than to look it up in the drv environment
string name = ParsedDerivation(StorePath::dummy, drv).getStringAttr("name").value_or("");

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

This can be lifted out of the for loop.

const StringMap & extraRewrites
)
{
bool hasSelfReference = info.references.find(info.path) != info.references.end();

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

=> bool hasSelfReference = info.references.count(info.path).

/* Create a new path similar to the one referred to by `info`, but that's
* content-addressed.
*
* This doesn't alter the original path

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

This should document whether info needs to denote a valid path.

std::optional<StorePath> deriver;
std::optional<string> outputname;
Comment on lines 117 to 118

This comment has been minimized.

@Ericson2314

Ericson2314 Jul 7, 2020 Member

Is the intent that we can have outputname whenever we have deriver, except for backwards compat with old infos?

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

BTW it looks like this field is not actually used anywhere.

This comment has been minimized.

@regnat

regnat Jul 9, 2020 Author Contributor

Indeed, I've added it for an experiment and forgot to remove it. This field doesn't really make any sense (even the deriver field should probably be deprecated as a store path can now have several derivers)

This comment has been minimized.

@Ericson2314

Ericson2314 Jul 9, 2020 Member

Yeah once we upload the trust mapping, we'll probably want to key it by the derivation not the output path, which also helps different signatures sign for different derivers.

@@ -115,6 +115,7 @@ struct ValidPathInfo
{
StorePath path;
std::optional<StorePath> deriver;
std::optional<string> outputname;

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

It seems incorrect to have an outputName field in ValidPathInfo. A CA store path can be produced by differently named output of different derivations (e.g. the "foo" output of derivation A could produce the same contents as the "bar" output of derivation B).

state->stmtAddDerivationOutput.use()
(queryValidPathId(*state, deriver))
(outputName)
(printStorePath(output))

This comment has been minimized.

@edolstra

edolstra Jul 7, 2020 Member

output is an "actual" output here, right? In that case, doesn't this cause confusion wrt the meaning of the DerivationOutputs table?

regnat added 6 commits Jul 9, 2020
narFromPath expects a valid path as input (and making it more general
would be complicated as it's a general method of the `Store` class), so
we just inline the bit of `makeContentAddressed` that we need (which is
bigger than what I though. Maybe I'm trying to retro-fit functional
idioms for which cpp isn't designed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.