-
-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow building content-addressed derivations with hydra - reprise #1228
Conversation
Allow building (floating) content-addressed derivations with Hydra. There's some UI artifacts here and there, but everything should work fine otherwise
This is something that Nix should do internally as part of the build loop, but since we're partially reimplementing it, we need to also reimplement this logic
This hadn't been caught anywhere until now as all the tests run with `destStore==localStore`
Otherwise ca-derivations are always assumed to be the same because their output path is empty
Unless it's also the local store, this will likely fail because this call requires having the derivation handy, which the dest store generally won't have
The dest store doesn't need to have the derivation, so rather first ask the local store for all the built outputs, and directly pass these to `getBuildOutput`
Fix a bug introduced by bb12e9b
Ensure that if a path is present locally but not in the destination store, then we properly copy it
Looks like it has never been used because the file was ill-named
Make it follow the same APIs and conventions as the rest of the testsuite
Only ca derivations might have an empty `path` field (because this one will only be known at a later stage)
That way it can be reused more generically
This causes the `Output store paths` field to always appear empty (regardless of whether the drv is CA or not), which isn't good
Once the build is done, update the `BuildOutputs` table to add the output paths of the derivation
- Test that the `build` page loads correctly - Test that after a succesful build, all the corresponding outputs have their `path` correctly set
The previous logic was (probably) crashing hydra-queue-runner if 1. The `ca-derivation` feature wasn't enabled 2. An output of the current step was present on the local store but not on the dest store The reason for that is that the code was trying to call `localStore->queryRealisation` which only works if `ca-derivations` is enabled
Make it so that hydra still correctly shortcuts input-addressed derivations, even when the `ca-derivations` experimental feature is enabled. (The substitution of content-addressed derivations doesn't work properly in hydra yet)
Not much to document, but at least it’s there :)
Otherwise it’s a bit tricky to use it as a binary cache
- Only call `tryResolve` when we now that we can do it safely - Make sure that we always copy *all* the inputs of the derivation This is missing a nice test, but it’s also a bit of a pain to write, so it’ll wait a bit.
Fix a small mistake that caused an assertion failure in such a case
It’s possible that their inputs don’t have a properly registered `realisation` (coming from an older Nix), so don’t assume that
Makes the whole thing marginally easier to read
Now `BuildStepOutputs` has a new boolean column `contentAddressed`. Also regenerated the relative Perl.
Thanks to @dasJ (who helped me a lot with Perl though he also raised the issue 😅) I improved the UI. Non-CA derivations are shown exactly as before, instead for CA derivations the following is shown: While it's building: After: In the end I chose to add a new row for the step outputs, initially I tried doing this querying the store from Perl to understand if a derivation was CA, I think that this way is much cleaner. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-33/20048/1 |
Never mind. I thought it started to evaluate but it somehow ran into a timeout and now I get the same error message again. |
Trying this PR out in prod (terrible idea I know) and it seems like there is no database migration for it yet The following manual migration seems to work for me: ALTER TABLE BuildOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ADD contentAddressed BOOLEAN NOT NULL DEFAULT 'f'; In my very limited testing it seems to work, including with remote building of regular |
TBH I have no idea how to debug this (and however I didn't try to replicate it yet).
Ooops...sorry, I should have written to not do so, I thought about it but then I forgot. The reason is that I wanted to be sure about the ultimate changes in the tables before. Sorry again. However I'm super happy to see that there is so much interest into this. |
The chat doesn’t seem to be public unfortunately |
localStore->registerDrvOutput(realisation); | ||
destStore->registerDrvOutput(realisation); | ||
|
||
// Also register the unresolved one | ||
auto unresolvedRealisation = realisation; | ||
unresolvedRealisation.signatures.clear(); | ||
unresolvedRealisation.id.drvHash = outputHashes.at(outputId.outputName); | ||
localStore->registerDrvOutput(unresolvedRealisation); | ||
destStore->registerDrvOutput(unresolvedRealisation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remote-building CA Derivations seems to fail here because the destStore already has the realisation registered. incorrect, see below. My hydra has this patch applied although you can probably remove the try {} catch for the localStore and pattern match the error to make sure it is about a constraint failure when inserting into Realisations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, also removing the try/catch for the local store and pattern matching the other errors.
Do you already know how can we check that the error is caused by a constraint failure?
However I would prefer leaving the final word to @thufschmitt about this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know enough about nix’s c++ code to know how to match a specific error Will try changing the catch type to SQLiteError
also it seems i was wrong about the “remove the localStore try catch blocks” because the error only seems to occur on the local parts my bad
Jul 02 10:02:54 nas hydra-queue-runner[1352608]: Failed registering realisation locally: error: executing SQLite statement '
Jul 02 10:02:54 nas hydra-queue-runner[1352608]: insert into Realisations (drvPath, outputName, outputPath, signatures)
Jul 02 10:02:54 nas hydra-queue-runner[1352608]: values ('sha256:4768517dfe0afebd04abc714af41ddd08785e3067abd98471ad2b608e8448a1f', 'out', (select id from ValidPaths where path = '/nix/store/cna07xr4yfh8ck0q00ibbpgqgmcmxrf4-system-units'), '')
Jul 02 10:02:54 nas hydra-queue-runner[1352608]: ;': constraint failed (in '/nix/var/nix/db/db.sqlite') (already exists?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying this out made me discover that the issue is not that the realisation already exists but that the out output is not copied over for some reason
SubstitutablePathInfos infos; | ||
localStore->querySubstitutablePathInfos({{*path, {}}}, infos); | ||
localStore->querySubstitutablePathInfos({{*maybePath, {}}}, infos); | ||
if (infos.size() == 1) | ||
avail++; | ||
} | ||
} | ||
|
||
if (missing.size() == avail) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that copying dependencies to remote builders does not work as I am seeing error messages in the form
Jul 02 08:01:15 nas hydra-queue-runner[765078]: possibly transient failure building ‘/nix/store/r486r6s30ikrqdsb2rrrl5kb1jxhccr9-etc.drv’ on ‘build-pc’: error: some dependencies of '/nix/store/r486r6s30ikrqdsb2rrrl5kb1jxhccr9-etc.drv' are missing
Having success with remote building and this patch: DarkKirb@b3eedbf I am not sure if copying step->drv->inputSrcs is needed. I don’t know why copying the derivation outputs back to the machine running hydra-queue-runner needs to be explicit, I think it was working before even though this particular code was not touched? |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/how-to-make-nixpkgs-more-eco-friendly-use-less-resources/20976/12 |
did you find any time to look into or what could I do to debug this? |
Using this together with DarkKirb@b3eedbf, I've experienced hydra-queue-runner silently hanging inside |
Do you have a branch with all of these combined? |
https://github.com/t184256/hydra/commits/nix-ca-reprise (d9c0957) is what I ended up running when I mashed it all up together. But consider these just tips for those who want to try this PR in its current form to save them some debugging time; a proper rebase of it onto the latest master would be the a better solution. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/219 |
Based on the above comments fron @t184256 I merged this PR into the current master, rebased two patches that no longer applied as SuperSandro2000@e43b298 and SuperSandro2000@c2685b4 and applied the other commits done by @DarkKirb . See https://github.com/SuperSandro2000/hydra/commits/ca-derivations-reprise for full history. I am locally applying it via the following patches I append in an overlay. patches = [
# ca-derivation support
(let
patch = fetchpatch {
url = "https://github.com/NixOS/hydra/pull/1228.diff";
excludes = [ "t/**" ];
hash = "sha256-wPiMSgETENmFtmITV41/EAo9yar0aPpcZXkQium3zf4=";
};
in runCommand "" {
nativeBuildInputs = [ patchutils ];
} /* bash */ ''
filterdiff -p1 -x src/hydra-queue-runner/build-remote.cc ${patch} > $out
filterdiff -p1 -i src/hydra-queue-runner/build-remote.cc --hunks=x2 ${patch} |
sed -e 's/worker_proto::read(\*localStore\, from\, Phantom<DrvOutputs> {});/WorkerProto::Serialise<DrvOutputs>::read(\*localStore\, rconn);/' >> $out
'')
# based on https://github.com/t184256/hydra/commit/a5ff881c96061841dfc82a264e7dd5201a973512
(fetchpatch {
url = "https://github.com/SuperSandro2000/hydra/commit/e43b298a57f85ecb89e0a17d1ba11b9dd0871ca5.patch";
hash = "sha256-ad/s2tZytsAy1rZkgMAim0JkvoVVSWSIyYERF3eYvts=";
})
# based on https://github.com/t184256/hydra/commit/359479b27948be1543b483979232d0fb84b08330
(fetchpatch {
url = "https://github.com/SuperSandro2000/hydra/commit/c2685b4bca917c1f6c077e359d55c1b121ee8fe2.patch";
hash = "sha256-9mCT+y6fRyAs/O7SKJP6dzhmjnKWEVe3R72TeaP4DFk=";
})
(fetchpatch {
url = "https://github.com/NixOS/hydra/commit/9004ba944a556365d75fc8c45cd6b70c384c1f9a.patch";
hash = "sha256-jPW1tvZejpBTSnPI4xfMj7gPfS7R41XU2YQ/fiPb6As=";
})
(fetchpatch {
url = "https://github.com/DarkKirb/hydra/commit/51c2319f653df748adfbc7a5fa7440205ad884df.patch";
hash = "sha256-jO2kSh/hnkect0IdhauMdsSeMZ10/U/K+xYfrznQ4bA=";
})
]; |
Bugs I have found so far:
|
Has someone a patch floating around for fetching .doi's aka |
Found one over at edolstra/nix-serve#21 and ported that to hydra SuperSandro2000@9e528b2 I didn't e2e test this yet but in the browser the json was displayed 🎉 |
@aciceri is it possible to allow maintainers to push to this branch? Otherwise I can close it an open a new PR. |
@Ericson2314 Apparently this is not possible since the fork is owned by an organization and not an user. If you send me the list of maintainers I can manually give them permissions (to the entire repo I fear, but it wouldn't be a problem since we always used it only for this PR). Feel free to write me at Otherwise I can create a personal fork and allow maintainers to push. But unless I can do this without deleting the PR (but simply editing the branch) it would be useless and you could directly do it. |
OK in that case I think I will close this, and reopen the original #875, because I can push to that one. All history is preserved so everyone's contributions fixing conflicts with this is remembered :) |
This is an attempt to merge #875 by @thufschmitt into current
master
.I don't have much experience with Nix internals nor languages involved here, but I tried my best to understand what I was merging and doing the correct fixes where needed, I hope it makes all sense.
All tests except
evaluator/evaluate-oom-job.t
are passing, and that one is also failing onmaster
on my machine so probably the cause is not related to this PR.I've been experimenting building some derivations from
nixpkgs
and everything seems to work but probably there are some corner cases which I've not considered.@Ericson2314 @SuperSandro2000 IIRC you were interested into this, maybe you can help me finding these corner cases.