Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow building content-addressed derivations with hydra - reprise #1228

Closed
wants to merge 40 commits into from

Conversation

aciceri
Copy link
Member

@aciceri aciceri commented Jun 27, 2022

This is an attempt to merge #875 by @thufschmitt into current master.

I don't have much experience with Nix internals nor languages involved here, but I tried my best to understand what I was merging and doing the correct fixes where needed, I hope it makes all sense.

All tests except evaluator/evaluate-oom-job.t are passing, and that one is also failing on master on my machine so probably the cause is not related to this PR.

I've been experimenting building some derivations from nixpkgs and everything seems to work but probably there are some corner cases which I've not considered.

@Ericson2314 @SuperSandro2000 IIRC you were interested into this, maybe you can help me finding these corner cases.

thufschmitt and others added 30 commits January 19, 2022 16:34
Allow building (floating) content-addressed derivations with Hydra.

There's some UI artifacts here and there, but everything should work
fine otherwise
This is something that Nix should do internally as part of the build
loop, but since we're partially reimplementing it, we need to also
reimplement this logic
This hadn't been caught anywhere until now as all the tests run with
`destStore==localStore`
Otherwise ca-derivations are always assumed to be the same because their
output path is empty
Unless it's also the local store, this will likely fail because this
call requires having the derivation handy, which the dest store
generally won't have
The dest store doesn't need to have the derivation, so rather first ask
the local store for all the built outputs, and directly pass these to
`getBuildOutput`
Ensure that if a path is present locally but not in the destination
store, then we properly copy it
Looks like it has never been used because the file was ill-named
Make it follow the same APIs and conventions as the rest of the
testsuite
Only ca derivations might have an empty `path` field (because this one
will only be known at a later stage)
That way it can be reused more generically
This causes the `Output store paths` field to always appear empty
(regardless of whether the drv is CA or not), which isn't good
Once the build is done, update the `BuildOutputs` table to add the
output paths of the derivation
- Test that the `build` page loads correctly
- Test that after a succesful build, all the corresponding outputs have
  their `path` correctly set
The previous logic was (probably) crashing hydra-queue-runner if
1. The `ca-derivation` feature wasn't enabled
2. An output of the current step was present on the local store but not on
  the dest store

The reason for that is that the code was trying to call
`localStore->queryRealisation` which only works if `ca-derivations` is
enabled
Make it so that hydra still correctly shortcuts input-addressed
derivations, even when the `ca-derivations` experimental feature is
enabled.
(The substitution of content-addressed derivations doesn't work properly
in hydra yet)
Not much to document, but at least it’s there :)
Otherwise it’s a bit tricky to use it as a binary cache
- Only call `tryResolve` when we now that we can do it safely
- Make sure that we always copy *all* the inputs of the derivation

This is missing a nice test, but it’s also a bit of a pain to write, so
it’ll wait a bit.
Fix a small mistake that caused an assertion failure in such a case
It’s possible that their inputs don’t have a properly registered
`realisation` (coming from an older Nix), so don’t assume that
Makes the whole thing marginally easier to read
Now `BuildStepOutputs` has a new boolean column `contentAddressed`.
Also regenerated the relative Perl.
@aciceri
Copy link
Member Author

aciceri commented Jun 30, 2022

Thanks to @dasJ (who helped me a lot with Perl though he also raised the issue 😅) I improved the UI. Non-CA derivations are shown exactly as before, instead for CA derivations the following is shown:

While it's building:
image
image
^ Note that here the usual line "Output store paths" is not shown.
image

After:
image
image
^ Now that line it's shown again.
image

In the end I chose to add a new row for the step outputs, initially I tried doing this querying the store from Perl to understand if a derivation was CA, I think that this way is much cleaner.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-33/20048/1

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Jun 30, 2022

in job ‘kibana’:
error: not an absolute path: 'ARRAY(0x58d7868)/0wfc6i459l68cx55y5iljyk0dnxz8m43-nixos-system-kibana-22.05.20220626.cd90e77.drv'

I think this was caused by having ca-derivations disabled in the nix-daemon due to some priority problem on my end. I think the error message should be improved here and/or errors should properly be handled to prevent other people from running into this.

Never mind. I thought it started to evaluate but it somehow ran into a timeout and now I get the same error message again.

@DarkKirb
Copy link
Contributor

DarkKirb commented Jul 1, 2022

Trying this PR out in prod (terrible idea I know) and it seems like there is no database migration for it yet

The following manual migration seems to work for me:

ALTER TABLE BuildOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ALTER COLUMN path DROP NOT NULL;
ALTER TABLE BuildStepOutputs ADD contentAddressed BOOLEAN NOT NULL DEFAULT 'f';

In my very limited testing it seems to work, including with remote building of regular and CA derivations (which is something that the old patch had issues with) 👍

@aciceri
Copy link
Member Author

aciceri commented Jul 1, 2022

Never mind. I thought it started to evaluate but it somehow ran into a timeout and now I get the same error message again.

TBH I have no idea how to debug this (and however I didn't try to replicate it yet).

Trying this PR out in prod (terrible idea I know) and it seems like there is no database migration for it yet

Ooops...sorry, I should have written to not do so, I thought about it but then I forgot. The reason is that I wanted to be sure about the ultimate changes in the tables before. Sorry again.

However I'm super happy to see that there is so much interest into this.
I think that tracking everything as comments here is better but I've also created a matrix room if you prefer.

@DarkKirb
Copy link
Contributor

DarkKirb commented Jul 1, 2022

The chat doesn’t seem to be public unfortunately

Comment on lines +563 to +571
localStore->registerDrvOutput(realisation);
destStore->registerDrvOutput(realisation);

// Also register the unresolved one
auto unresolvedRealisation = realisation;
unresolvedRealisation.signatures.clear();
unresolvedRealisation.id.drvHash = outputHashes.at(outputId.outputName);
localStore->registerDrvOutput(unresolvedRealisation);
destStore->registerDrvOutput(unresolvedRealisation);
Copy link
Contributor

@DarkKirb DarkKirb Jul 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remote-building CA Derivations seems to fail here because the destStore already has the realisation registered. incorrect, see below. My hydra has this patch applied although you can probably remove the try {} catch for the localStore and pattern match the error to make sure it is about a constraint failure when inserting into Realisations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, also removing the try/catch for the local store and pattern matching the other errors.
Do you already know how can we check that the error is caused by a constraint failure?

However I would prefer leaving the final word to @thufschmitt about this part.

Copy link
Contributor

@DarkKirb DarkKirb Jul 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know enough about nix’s c++ code to know how to match a specific error Will try changing the catch type to SQLiteError

also it seems i was wrong about the “remove the localStore try catch blocks” because the error only seems to occur on the local parts my bad

Jul 02 10:02:54 nas hydra-queue-runner[1352608]: Failed registering realisation locally: error: executing SQLite statement '
Jul 02 10:02:54 nas hydra-queue-runner[1352608]:                        insert into Realisations (drvPath, outputName, outputPath, signatures)
Jul 02 10:02:54 nas hydra-queue-runner[1352608]:                        values ('sha256:4768517dfe0afebd04abc714af41ddd08785e3067abd98471ad2b608e8448a1f', 'out', (select id from ValidPaths where path = '/nix/store/cna07xr4yfh8ck0q00ibbpgqgmcmxrf4-system-units'), '')
Jul 02 10:02:54 nas hydra-queue-runner[1352608]:                        ;': constraint failed (in '/nix/var/nix/db/db.sqlite') (already exists?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying this out made me discover that the issue is not that the realisation already exists but that the out output is not copied over for some reason

SubstitutablePathInfos infos;
localStore->querySubstitutablePathInfos({{*path, {}}}, infos);
localStore->querySubstitutablePathInfos({{*maybePath, {}}}, infos);
if (infos.size() == 1)
avail++;
}
}

if (missing.size() == avail) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that copying dependencies to remote builders does not work as I am seeing error messages in the form

Jul 02 08:01:15 nas hydra-queue-runner[765078]: possibly transient failure building ‘/nix/store/r486r6s30ikrqdsb2rrrl5kb1jxhccr9-etc.drv’ on ‘build-pc’: error: some dependencies of '/nix/store/r486r6s30ikrqdsb2rrrl5kb1jxhccr9-etc.drv' are missing

@aciceri
Copy link
Member Author

aciceri commented Jul 2, 2022

The chat doesn’t seem to be public unfortunately

Try again please.

However I just added a new "Content addressed" field in the details.
image

@DarkKirb
Copy link
Contributor

DarkKirb commented Jul 2, 2022

Having success with remote building and this patch: DarkKirb@b3eedbf

I am not sure if copying step->drv->inputSrcs is needed. I don’t know why copying the derivation outputs back to the machine running hydra-queue-runner needs to be explicit, I think it was working before even though this particular code was not touched?

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-make-nixpkgs-more-eco-friendly-use-less-resources/20976/12

@SuperSandro2000
Copy link
Member

in job ‘kibana’:
error: not an absolute path: 'ARRAY(0x58d7868)/0wfc6i459l68cx55y5iljyk0dnxz8m43-nixos-system-kibana-22.05.20220626.cd90e77.drv'

did you find any time to look into or what could I do to debug this?

@t184256
Copy link
Contributor

t184256 commented Apr 20, 2023

Using this together with DarkKirb@b3eedbf, I've experienced hydra-queue-runner silently hanging inside getBuiltOutputs; turns out I was missing the magic update from 44e1eff. With these two and some trivial cherry-picking I finally have a hydra that builds content-addressed derivations along with normal ones + can do it on remote builders + uses a recent Nix! Thanks a lot for this updated MR, can't wait for it to land.

@SuperSandro2000
Copy link
Member

Do you have a branch with all of these combined?

@t184256
Copy link
Contributor

t184256 commented Apr 21, 2023

https://github.com/t184256/hydra/commits/nix-ca-reprise (d9c0957) is what I ended up running when I mashed it all up together. But consider these just tips for those who want to try this PR in its current form to save them some debugging time; a proper rebase of it onto the latest master would be the a better solution.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/219

@SuperSandro2000
Copy link
Member

Based on the above comments fron @t184256 I merged this PR into the current master, rebased two patches that no longer applied as SuperSandro2000@e43b298 and SuperSandro2000@c2685b4 and applied the other commits done by @DarkKirb . See https://github.com/SuperSandro2000/hydra/commits/ca-derivations-reprise for full history. I am locally applying it via the following patches I append in an overlay.

patches = [
  # ca-derivation support
  (let
    patch = fetchpatch {
      url = "https://github.com/NixOS/hydra/pull/1228.diff";
      excludes = [ "t/**" ];
      hash = "sha256-wPiMSgETENmFtmITV41/EAo9yar0aPpcZXkQium3zf4=";
    };
  in runCommand "" {
    nativeBuildInputs = [ patchutils ];
  } /* bash */ ''
    filterdiff -p1 -x src/hydra-queue-runner/build-remote.cc ${patch} > $out
    filterdiff -p1 -i src/hydra-queue-runner/build-remote.cc --hunks=x2 ${patch} |
    sed -e 's/worker_proto::read(\*localStore\, from\, Phantom<DrvOutputs> {});/WorkerProto::Serialise<DrvOutputs>::read(\*localStore\, rconn);/' >> $out
  '')
  # based on https://github.com/t184256/hydra/commit/a5ff881c96061841dfc82a264e7dd5201a973512
  (fetchpatch {
    url = "https://github.com/SuperSandro2000/hydra/commit/e43b298a57f85ecb89e0a17d1ba11b9dd0871ca5.patch";
    hash = "sha256-ad/s2tZytsAy1rZkgMAim0JkvoVVSWSIyYERF3eYvts=";
  })
  # based on https://github.com/t184256/hydra/commit/359479b27948be1543b483979232d0fb84b08330
  (fetchpatch {
    url = "https://github.com/SuperSandro2000/hydra/commit/c2685b4bca917c1f6c077e359d55c1b121ee8fe2.patch";
    hash = "sha256-9mCT+y6fRyAs/O7SKJP6dzhmjnKWEVe3R72TeaP4DFk=";
  })
  (fetchpatch {
    url = "https://github.com/NixOS/hydra/commit/9004ba944a556365d75fc8c45cd6b70c384c1f9a.patch";
    hash = "sha256-jPW1tvZejpBTSnPI4xfMj7gPfS7R41XU2YQ/fiPb6As=";
  })
  (fetchpatch {
    url = "https://github.com/DarkKirb/hydra/commit/51c2319f653df748adfbc7a5fa7440205ad884df.patch";
    hash = "sha256-jO2kSh/hnkect0IdhauMdsSeMZ10/U/K+xYfrznQ4bA=";
  })
];

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Nov 26, 2023

Bugs I have found so far:

  • /evals is just empty, nothing is logged I had a faulty nginx location regex
  • when remote builders are used that do not have ca-derivations enabled, hydra just loops forever trying to use them
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: checking the queue for builds > 0...
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: loading build 10725 (sandro:nixos-config:nixos.hydrogen-simd.system)
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]: queue monitor: error:
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]:        … while loading build 10725:
Nov 26 03:21:03 hydrogen hydra-queue-runner[1401522]:        error: experimental Nix feature 'ca-derivations' is disabled; use '--extra-experimental-features ca-derivations' to override

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Dec 4, 2023

Has someone a patch floating around for fetching .doi's aka https://hydra.example.de/realisations/sha256:a62128132508a3a32eef651d6467695944763602f226ac630543e947d9feb140!out.doi over a http binary cache?

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Dec 4, 2023

Found one over at edolstra/nix-serve#21 and ported that to hydra SuperSandro2000@9e528b2

I didn't e2e test this yet but in the browser the json was displayed 🎉

@Ericson2314
Copy link
Member

Ericson2314 commented Dec 4, 2023

@aciceri is it possible to allow maintainers to push to this branch? Otherwise I can close it an open a new PR.

@aciceri
Copy link
Member Author

aciceri commented Dec 4, 2023

@Ericson2314 Apparently this is not possible since the fork is owned by an organization and not an user.

If you send me the list of maintainers I can manually give them permissions (to the entire repo I fear, but it wouldn't be a problem since we always used it only for this PR). Feel free to write me at @aciceri:nixos.dev.

Otherwise I can create a personal fork and allow maintainers to push. But unless I can do this without deleting the PR (but simply editing the branch) it would be useless and you could directly do it.

@Ericson2314
Copy link
Member

OK in that case I think I will close this, and reopen the original #875, because I can push to that one.

All history is preserved so everyone's contributions fixing conflicts with this is remembered :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants