hydra.nixos.org nixpkgs/nixos jobs coordination #117495

Closed
FRidh opened this issue Mar 24, 2021 · 27 comments

@FRidh
Member

FRidh commented Mar 24, 2021

Purpose

Inform one another about Hydra jobs that need to be (re)started, aborted, or otherwise adjusted. Note that these kinds of things often go via IRC as well; this is just another channel. Also, anyone who needs a jobset can request it here.

Added the channel-blocker label so it will show up on status.nixos.org.

cc @roberth @vcunat @mweinelt

@FRidh FRidh changed the title hydra.nixos.org jobs coordination hydra.nixos.org nixpkgs/nixos jobs coordination Mar 24, 2021
@FRidh
Member Author

FRidh commented Mar 24, 2021

I've aborted the hardening-flags jobs because there are, IMO, more important jobs. #104091 (comment)

I am thinking of pausing the staging* builds because of the upcoming openssl CVE fix #117191

@worldofpeace
Contributor

Perhaps we should enable discussions in nixpkgs. Maybe it's better suited for this

@vcunat
Member

vcunat commented Mar 24, 2021

Maybe we could've done this on the discourse forum :-) GH discussions sound like a replacement for that, though I don't have any experience with them yet.

@mweinelt
Member

The openssl updates are in master and release-20.09. Evaluations have been triggered and now we're waiting for the update to reach the channels, before other jobs will be restarted again.

#117588
#117589

@vcunat
Member

vcunat commented Mar 25, 2021

Someone's cancelled the first release-20.09-small evaluation that contains the updated openssl? EDIT: now someone restarted them.

@grahamc
Member

grahamc commented Mar 25, 2021

I've canceled all jobs except for release-20.09-small to get that channel update out as quickly as possible. I'm not sure of the cause yet, but "bump to front" doesn't seem to bump to front. I'll do 20.09 after that, then unstable-small.

@dasJ
Member

dasJ commented Mar 25, 2021

"Bump to front" only works when new jobs are to be dispatched iirc :/ I usually bump the jobs to front and restart the queue runner to force it to redispatch all jobs (with the correct order this time). Kills all builds but yeah…

@vcunat
Member

vcunat commented Mar 25, 2021

I don't know... in this state where almost all is cancelled, we get most of the build farm idling. EDIT: well, it should only take a few hours until -small is finished.

@grahamc
Member

grahamc commented Mar 26, 2021

Good point, @vcunat. Now that the 20.09-small build is making good progress, to get more builds running on the unused capacity I restarted the unstable-small jobset. But this shouldn't overwhelm anything and will hopefully keep the 20.09-small jobset highly prioritized.

@vcunat
Member

vcunat commented Mar 26, 2021

I started a couple 20.09 darwin jobs as well, as there are none in the nixos jobsets.

@grahamc
Member

grahamc commented Mar 26, 2021

20.09-small finished its builds: https://hydra.nixos.org/eval/1658031

@grahamc
Member

grahamc commented Mar 26, 2021

unstable-small is also very near completion. I'm going to wait for it to complete, then restart 20.09.

@grahamc
Member

grahamc commented Mar 26, 2021

I restarted the 20.09 jobs.

@grahamc
Member

grahamc commented Mar 26, 2021

nixos:unstable-small:tested finished: https://hydra.nixos.org/eval/1658000

@grahamc
Member

grahamc commented Mar 26, 2021

I've just bumped the 20.09 jobs to the front of the queue, restarted hydra-queue-runner, and started hydra-evaluator. Hopefully hydra keeps churning on 20.09 as the priority overnight. With that, I'm heading to bed.

@vcunat
Member

vcunat commented Mar 26, 2021

I canceled the 20.03 jobs (for now). Those jobsets still have a high number of shares configured; I think we should lower them significantly, as 20.03 isn't officially supported anymore. EDIT: I did that later.

@wamserma
Member

There are still some strange timeouts on the 20.09 build: https://hydra.nixos.org/build/140063634

@vcunat
Member

vcunat commented Mar 26, 2021

Occasional timeouts like that do happen. It works locally so I restarted it.

@FRidh
Member Author

FRidh commented Mar 26, 2021

I've aborted the haskell-updates jobs since the mass rebuild after the openssl update is still ongoing cc @peti

@vcunat
Member

vcunat commented Mar 26, 2021

Well, it was based atop, so perhaps it was targeting merge before the real rebuild of master with new openssl happens.

@grahamc
Member

grahamc commented Mar 26, 2021

20.09 has about 7,000 jobs left, but tested has passed. I'm inclined to cancel all remaining jobs, let the channel advance, and then restart the jobs to backfill the cache. Any opinions? I'll do it in ~30min unless I hear otherwise.

@peti
Member

peti commented Mar 26, 2021

> I've aborted the haskell-updates jobs since the mass rebuild after the openssl update is still ongoing cc @peti

I wish you hadn't. How am I supposed to do the weekly merge tonight without any results from Hydra?

@grahamc
Member

grahamc commented Mar 26, 2021

Sorry, it may be challenging to get nixos-unstable caught up in time.

@SuperSandro2000
Member

#113747 (comment)

@vcunat
Member

vcunat commented Mar 28, 2021

All four supported NixOS channels have updated and contain the new openssl 🎉
@peti: so I restarted the last haskell-updates evaluation.

x86_64-darwin still needs a few days to catch up, I think. I assume that nixpkgs-20.09-darwin should go first.

@vcunat
Member

vcunat commented Mar 30, 2021

nixpkgs-20.09-darwin updated as well. I think we're basically in a normal regime now.

@vcunat vcunat closed this as completed Mar 30, 2021