Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test everything before merging with merge queue + hercules-ci #7674

Open
roberth opened this issue Jan 23, 2023 · 28 comments
Open

Test everything before merging with merge queue + hercules-ci #7674

roberth opened this issue Jan 23, 2023 · 28 comments
Assignees
Labels
contributor-experience Developer experience for Nix contributors feature Feature request or proposal with-tests Issues related to testing. PRs with tests have some priority

Comments

@roberth
Copy link
Member

roberth commented Jan 23, 2023

Is your feature request related to a problem? Please describe.

Prevent issues like #7669 by building the whole flake before merge.

Describe the solution you'd like

Maybe Theophane was right. I could add a bors.toml after an admin installs Hercules CI and bors on this repo. Instead of "enable auto-merge", we'd comment bors r+ to indicate a positive review and let bors merge it as appropriate. This way we build a larger set and we get the guarantee that the merge builds. (informally: the no rocket science rule)
Bors also allows the delegation of review (merge) rights to contributors, and it does not suffer from the GitHub Actions limitation on builds triggered by actions.

Describe alternatives you've considered

Extend the actions, but they seem slow. Doesn't help with NixOS tests either. I don't know a ton about actions, whereas the above is easy and you'll get amazing support.

Additional context
Add any other context or screenshots about the feature request here.

Priorities

Add 👍 to issues you find important.

@roberth roberth added feature Feature request or proposal contributor-experience Developer experience for Nix contributors with-tests Issues related to testing. PRs with tests have some priority labels Jan 23, 2023
@Ericson2314
Copy link
Member

This sounds good. I have been repeatedly bitten by GitHub Actions' poor coverage and they are too slow anyways. I would also like to add more cross-version daemon and ssh:// integration testing, and as discussed many months ago that was blocked by github actions capacity limits.

@Ericson2314 Ericson2314 mentioned this issue Jan 25, 2023
7 tasks
@Ericson2314
Copy link
Member

We have had a number of instances of things being broken and that found after the fact. This is very stressful! (At least I feel under pressure when it turns out I broke something by accident.)

It would be much nicer if PRs were auto-merged only after all jobs passed, so yes please let's do this.

@edolstra
Copy link
Member

edolstra commented Feb 2, 2023

I don't want to make the Nix CI / release process dependent on yet another CI system. We already have GitHub actions and Hydra. Ideally we would build PRs on Hydra (which also has the advantage that it's not a proprietary tool/infrastructure).

@Ericson2314
Copy link
Member

I don't care which CI it is, as long as we build PRs complete, so I would be fine if hydra.nixos.org did it. I would say we can even drop GitHub actions at that point.

@roberth roberth mentioned this issue Feb 7, 2023
14 tasks
@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1

@Ericson2314
Copy link
Member

#7748 (comment) More issues caught by Hydra but not GitHub Actions. We really need completely PR CI.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Mar 27, 2023

Discussed in the Nix team meeting:

  • @thufschmitt: in favor of trying it; can roll it back if it doesn't work out
    • consideration: using Hydra may block us on @grahamc, using Hercules may block us on @roberth
      • prefer someone who's on the Nix team and immediately available
  • @edolstra: can it replace the GitHub Actions?
    • @roberth: may need development to build forks
    • @edolstra: don't want to deal with three CI systems
    • @Ericson2314: it's good to try all three before getting rid of one
  • @edolstra: another consideration is build time
    • don't want a trivial documentation PR to take hours
    • @roberth: currenlty runs on an almost-dedicated machine, should not be the bottleneck
      • can re-evaluate if needed, but would be surprised
    • @edolstra: what about aarch64-linux
      • @roberth: could also provide that
      • @thufschmitt: is that something the NixOS Foundation could fund?
  • @edolstra: how does it affect the PR workflow? does it block merging on failure?
    • @roberth: the merge button changes to "add to merge queue" and behaves like the "auto-merge" button
  • agreement: idea approved, assigned to @edolstra to set it up with help of @roberth
    • will experiment it for ~1 month and re-evaluate

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-27-nix-team-meeting-minutes-44/26759/1

@roberth
Copy link
Member Author

roberth commented Mar 27, 2023

I need to make some adjustments to smoothen the transition, but we can get the repo admin operations out of the way and start using merge queues.

Start using merge queues

  1. A repo admin follows the steps in this paragraph https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#managing-a-merge-queue

  2. For the required statuses, we can just use the actions for now.

Install Hercules CI

But not enable it yet.

  1. A repo admin installs Hercules CI on the repo: https://github.com/apps/hercules-ci/installations/new
  2. Anyone with write access disables the project in https://hercules-ci.com/github/NixOS/nix, because configuration is necessary for it to succeed. (right under Settings)

@roberth tasks

When the admin tasks are done, my goal is to enable it without causing any interference and address anything that comes up with high priority.

Meanwhile I have the following tasks

  • add a feature that allows to ignore unconfigured branches, as the project is too heavy on those for a smooth transition
  • make adjustments so that the project can be configured in a way that does not build the ccache stuff, and succeeds for unconfigured branches.
  • configure more hardware, or only build x86_64-linux

Known performance issue:

The first build may be slower than subsequent builds. This is something I plan to improve in the coming weeks.

@alyssais
Copy link
Member

alyssais commented Mar 27, 2023

I think adding Hercules should have more justification than "Hercules is run by Robert, and Hydra is run by Graham". Especially since, as I understand it, Hercules is proprietary. If we want something changed in Hercules, we're entirely dependent on Robert to do that in a way that we're not with Hydra — anybody can do the implementation work, and in the worst case scenario anybody can run Hydra.

Edit: to expand further after some more research (infra team correct me if I got anything wrong), Hydra has an infrastructure team behind it. These are the people with full access (and there are more with partial access). These are the people who can commit to the Hydra repo.

@roberth
Copy link
Member Author

roberth commented Mar 27, 2023

@alyssais
This isn't proposing a policy; just adding a service, that can be complemented or replaced by something better in a few clicks.
There's no buy-in like with Jenkins vs Travis before; it's all just Nix underneath.
It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this.
You're free to help out in any way you want, and I hope you spend your time well, regardless of what you choose to do.

@alyssais
Copy link
Member

It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this.

I agree, I just want to help the team come to an informed decision, as the minutes of the meeting suggested that the team might be information that I understand to be inaccurate. (Specifically, that Hydra could mean being blocked waiting for Graham.)

@thufschmitt
Copy link
Member

@alyssais yes, that log line is a bit of a simplification of the overall discussion (although I think I quite literally said it at some point as a tongue-in-cheek summary).

The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.

@alyssais
Copy link
Member

The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.

Thanks for the background — with that perspective it makes a lot more sense indeed than what the meeting notes said.

@alyssais
Copy link
Member

we'd have to spawn a dedicated instance

Wait, I missed this — why?

@Ericson2314
Copy link
Member

@alyssais Basically concerns about unreviewed PRs getting CI runs. (c.f. ofborg vs hydra.nixos.org.)

@Ericson2314
Copy link
Member

In both cases, I would like the solve the problem of having separate build farms without resorting to completely separate technology, but that will also take a bunch of volunteer effort that hasn't yet materialized.

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-05-05-nix-team-meeting-minutes-52/27893/1

@alyssais
Copy link
Member

alyssais commented May 6, 2023

Bors is now deprecated. (although I'm guessing using the GitHub feature instead of Bors wouldn't change much about this.)

@roberth roberth changed the title Test everything before merging with bors + hercules-ci Test everything before merging with merge queue + hercules-ci May 7, 2023
@DavHau
Copy link
Member

DavHau commented Jan 2, 2024

For the mean time, could we just throw some money at the problem and upgrade to larger github runners?

Even nixos VM tests can be enabled. They got KVM support recently.

It would be great if this issue could be prioritized. Quick high quality feedback stands at the core of an efficient development workflow and right now the feedback is neither complete nor quick.

@Mic92
Copy link
Member

Mic92 commented Jan 2, 2024

I use mergify to replace bors i.e. in nixos-hardware: NixOS/nixos-hardware#820

  • From my experience it has the advantage that it prints the status on the same page, which is less confusing to contributor when pull requests are not merged.
  • It's not open-source but I don't think this buys us much given that github is also not open-source - there is not real vendor lock in for a merge queue as it could be swapped out with something else any day.

@zowoq also added that github's merge queues can now also be enabled for public organizations.

@jkarni
Copy link

jkarni commented Jul 11, 2024

I hope this isn't stepping on anyone's toes, but I kind of accidentally ran garnix on a commit on my fork of this repo (I have it enabled globally) and without further configuration already half the outputs worked (and many of the failures seem related to ccache). The load on our system was pretty minimal - we can definitely support enabling it in this repo if that'd be helpful.

(Note: I started garnix - not sure if that presents a conflict of interests here.)

@fricklerhandwerk
Copy link
Contributor

@jkarni that would be great! We recently discussed this on the team and would gladly run multiple setups side by side to see what works best. @Ericson2314 seemed to be particularly interested.

@roberth
Copy link
Member Author

roberth commented Jul 11, 2024

I kind of accidentally

Let's be honest, of course a small voice in me wants your competing product not to be here, and probably you've had a small voice tell you that this could be helpful for yours. That's fine; of course it's mutually beneficial between Nix and our respective products.

As you can probably tell from the age of this issue I haven't been able to push forward on the few blockers that would let Hercules CI do the whole thing, as its focus has so far been on CI/CD for trustworthy contributions, and not PRs. (Specifically because the HCI architecture has one set of machines and one cache per organization.)
I suppose that's where garnix could help out, by building PRs, including those from forks?

@Mic92
Copy link
Member

Mic92 commented Jul 11, 2024

I also want to throw my horse into the CI race here :)

I also used both hercules-ci and garnix before and they work great. I wanted to support some more architectures using my own hardware and more importantly be able to test pull requests (limitation of hercules-ci), which is why I started buildbot-nix instead.
buildbot-nix is an opensource library/configuration on top of buildbot. Buildbot is a mature 20+ year old CI framework used in many big opensource projects: Python,[6] WebKit,[7] LLVM,[8] Blender,[9] ReactOS,[10

Buildbot-nix extends buildbot to build nix projects with zero configuration. The project itself is small (3K lines). It currently supports native GitHub/Gitea integration. You can see what this looks like here on my nix fork: Mic92#2

If you want to test buildbot, I already want have the github app installed into nixos-wiki-infra. All it needs is setting the build-with-buildbot github topic and pinging me on matrix, so I start the project re-scan.

Of course I can understand if you prefer hercules-ci as this is what Robert develops, especially if it helps Robert to justify his countless hours of work on Nix.

@Mic92
Copy link
Member

Mic92 commented Jul 11, 2024

Will hercules-ci supports windows/*BSD in the future btw? This would be nice to test our upcoming platforms....

@Ericson2314
Copy link
Member

I very much dislike GitHub actions :) if we need to test Nix on "vanilla distros" (e.g. test installers) we should use non-NixOS VM tests.

@Mic92
Copy link
Member

Mic92 commented Jul 14, 2024

Not sure about windows but cloud-config works well with the nixos test framework. In system-manager we use it with ubuntu: https://github.com/numtide/system-manager/blob/4d33bfa43cc067dd6ea6a0e41115f27b50cd5a35/test/nix/modules/default.nix#L123 https://github.com/numtide/nix-vm-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor-experience Developer experience for Nix contributors feature Feature request or proposal with-tests Issues related to testing. PRs with tests have some priority
Projects
Status: 🏁 Review
Development

No branches or pull requests

10 participants