
Start gating Bodhi updates of CoreOS-owned packages on Fedora CoreOS tests #1617

Closed
jlebon opened this issue Nov 21, 2023 · 9 comments
Labels
jira for syncing to jira

Comments

@jlebon
Member

jlebon commented Nov 21, 2023

We recently added ResultsDB reporting for Bodhi tests in coreos/coreos-ci#50. The next step would be gating updates on those results.

Before we propose any wider gating to the community, we need to make sure that it works well. Let's start with gating our own packages first to get a feel for it.

Gating is done using Greenwave, which decides whether a "subject" (e.g. Bodhi update) has passed or not based on test results from ResultsDB and policies described in YAML files. The global policy for Fedora is at https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/greenwave/templates/fedora.yaml.
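As a rough illustration of what "passed" means here, the following is a minimal Python sketch of a Greenwave-style decision. It assumes a heavily simplified policy model: a `Policy` is just a decision context plus a list of required test names, only a `"PASSED"` outcome counts as success, and waived tests count as satisfied. Greenwave's real YAML policy format and evaluation rules are richer than this. The `Policy` class and `decide` function are hypothetical names, not Greenwave's API.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical, heavily simplified stand-in for one Greenwave policy."""
    decision_context: str
    required_tests: list[str] = field(default_factory=list)


def decide(policies, decision_context, results, waived=frozenset()):
    """Return (satisfied, unsatisfied_tests) for one subject (e.g. a Bodhi update).

    `results` maps a test case name to its latest outcome, as it might be read
    from ResultsDB. Only "PASSED" counts as success in this sketch; tests named
    in `waived` are treated as satisfied, mirroring a Bodhi waiver.
    """
    unsatisfied = []
    for policy in policies:
        if policy.decision_context != decision_context:
            continue  # this policy does not apply to the query
        for test in policy.required_tests:
            if test not in waived and results.get(test) != "PASSED":
                unsatisfied.append(test)
    return not unsatisfied, unsatisfied


# Hypothetical policy: gate CoreOS critpath updates on the cosa build-and-test job.
POLICIES = [Policy("bodhi_update_push_stable_coreos_critpath", ["coreos.cosa.build-and-test"])]
```

For example, `decide(POLICIES, "bodhi_update_push_stable_coreos_critpath", {"coreos.cosa.build-and-test": "FAILED"})` returns `(False, ["coreos.cosa.build-and-test"])`, while the same query with a `"PASSED"` result, or with that test waived, returns `(True, [])`.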

@jlebon jlebon added the jira for syncing to jira label Nov 21, 2023
@jlebon jlebon self-assigned this Nov 21, 2023
@jlebon
Member Author

jlebon commented Nov 21, 2023

The common way to gate a group of packages in the global policy is using comps groups. This is what the critical path decision contexts in the global policy refer to.

So we could create a new comps group, e.g. critical-path-coreos, with the packages we want to gate. But depending on how wide a net we'll want to cast in the future, this can get cumbersome to maintain. E.g. at the limit, if we want to apply gating to all the packages in FCOS, we'd have to keep the comps definition in sync with the packages in the lockfiles.
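To make the maintenance burden concrete, here is a minimal Python sketch of the kind of drift check that keeping a comps group in sync with lockfiles would require. It assumes the standard comps layout (`<group>` with an `<id>` and `<packagereq>` entries) and that the lockfile package names are already available as a plain list. The function names and the sample group contents are hypothetical.

```python
import xml.etree.ElementTree as ET


def comps_group_packages(comps_xml: str, group_id: str) -> set[str]:
    """Return the package names listed in one comps group, without resolving deps."""
    root = ET.fromstring(comps_xml)
    for group in root.iter("group"):
        if group.findtext("id") == group_id:
            return {req.text.strip() for req in group.iter("packagereq") if req.text}
    raise KeyError(f"group {group_id!r} not found in comps")


def comps_drift(comps_xml: str, group_id: str, lockfile_packages) -> tuple[list[str], list[str]]:
    """Compare a comps group with the package names taken from a lockfile.

    Returns (missing_from_comps, stale_in_comps), both sorted.
    """
    in_comps = comps_group_packages(comps_xml, group_id)
    locked = set(lockfile_packages)
    return sorted(locked - in_comps), sorted(in_comps - locked)


# Hypothetical comps snippet for illustration only.
SAMPLE_COMPS = """<comps>
  <group>
    <id>critical-path-coreos</id>
    <packagelist>
      <packagereq type="mandatory">rpm-ostree</packagereq>
      <packagereq type="mandatory">ignition</packagereq>
    </packagelist>
  </group>
</comps>"""
```

With a lockfile containing `rpm-ostree` and `ostree`, `comps_drift(SAMPLE_COMPS, "critical-path-coreos", ["rpm-ostree", "ostree"])` reports `ostree` as missing from comps and `ignition` as stale; a check like this would have to run (and be acted on) every time the lockfiles change.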

Another approach is using gating.yaml in dist-git, but managing all those files across all those repos is also cumbersome. It also means that packagers are free to modify them to stop gating on those tests, which is both good and bad. (Note that forcing a specific update through, even with failing CI, can always be done in Bodhi using waivers.)

@AdamWill

AdamWill commented Nov 21, 2023

Just one thing it'd be best to note explicitly: there is rather more to it than just "you can put any comps group you like in a greenwave policy". Rather, the critical path groups are fairly "special".

We have a script which runs daily and calculates "full" versions of specific comps groups - currently, all the critical path groups and core, which is also considered to be a critpath group. That is, it resolves the dependencies for the comps groups and includes all of those too. It writes the results out as JSON files (one per release). Bodhi reads those files, and uses the data to figure out whether each update contains packages from each critical path group. It exposes that information in the web UI and also via its API (so test systems can use it in deciding whether to schedule tests). It also uses that information in constructing its greenwave queries: it adds a decision context to the query for each critical path group the update contains a package from.

In practice, extending this mechanism to another group isn't too complicated: you put the group in comps, add it to the CRITPATH_GROUPS list in critpath.py, and add the appropriate decision context stuff to the fedora.yaml greenwave policy file. But it helps to know all this is going on in the background, and you do need to make the change to critpath.py, otherwise your group won't "work".
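The Bodhi side of this flow (read the per-release JSON, find which critpath groups an update touches, add one decision context per group) can be sketched in Python as follows. This is a minimal sketch under assumptions: the JSON layout `{"rpm": {group: [package, ...]}}` follows the ".rpm.coreos" key mentioned later in this thread, and the context naming follows the `bodhi_update_push_testing_coreos_critpath` example given there. Neither is taken from Bodhi's actual code, and both function names are hypothetical.

```python
import json


def critpath_groups_for_update(critpath_json: str, update_packages) -> list[str]:
    """Return the sorted critpath groups that contain any package in the update.

    Assumes a per-release JSON shaped like {"rpm": {group: [package, ...]}}.
    """
    groups = json.loads(critpath_json)["rpm"]
    wanted = set(update_packages)
    return sorted(name for name, members in groups.items() if wanted & set(members))


def decision_contexts(groups, request: str = "stable") -> list[str]:
    """One Greenwave decision context per critpath group (hypothetical naming)."""
    return [f"bodhi_update_push_{request}_{group}_critpath" for group in groups]


# Illustrative data: libcap appears in "standard" because of dependency resolution.
CRITPATH = json.dumps({"rpm": {"coreos": ["rpm-ostree"], "standard": ["systemd", "libcap"]}})
```

For an update containing `rpm-ostree` and `libcap`, `critpath_groups_for_update(CRITPATH, ["rpm-ostree", "libcap"])` gives `["coreos", "standard"]`, and `decision_contexts(["coreos"], "testing")` gives `["bodhi_update_push_testing_coreos_critpath"]`. A group only "works" if it shows up in this JSON, which is why the change to critpath.py matters.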

@AdamWill

I guess I should highlight that "dependency" thing a bit, too. As far as this system is concerned, all dependencies of packages in a critpath group are part of that group. So e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2023-ff5bd8445c, a libcap update, is considered part of just about every critpath group by Bodhi, even though libcap is not explicitly listed in any comps group. This is because lots of very core stuff requires libcap.

So you need to be aware that if you go ahead and use the current critpath mechanism - create a critical-path-coreos group, list a handful of packages in it, add a greenwave policy for it, and add it to critpath.py - the gating would apply to a lot more than just that handful of packages; it will apply to all their dependencies too.

I have considered introducing the concept of a 'non-resolving' group for this kinda case (e.g. to run and gate on more extensive openQA tests on anaconda updates without running them on every dependency of anaconda), but I haven't gotten around to actually doing it yet.
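The difference between today's resolving groups and the proposed 'non-resolving' group comes down to whether the group is expanded to its transitive dependency closure. Below is a minimal Python sketch of that expansion as a breadth-first walk. It assumes a hypothetical, already-computed `requires` map from each package to its direct dependencies. The real script resolves dependencies from repository metadata, which this does not attempt.

```python
from collections import deque


def expand_group(members, requires, resolve=True):
    """Return the set of packages a critpath group covers.

    `requires` maps a package name to the packages it directly depends on
    (a hypothetical, pre-resolved dependency graph). With resolve=True every
    transitive dependency is pulled in; resolve=False models a
    'non-resolving' group that covers only the listed packages.
    """
    covered = set(members)
    if not resolve:
        return covered
    queue = deque(members)
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, ()):
            if dep not in covered:  # also guards against dependency cycles
                covered.add(dep)
                queue.append(dep)
    return covered


# Illustrative graph: libcap is reachable from anaconda through systemd.
REQUIRES = {
    "anaconda": ["python3", "systemd"],
    "systemd": ["libcap"],
    "python3": ["glibc"],
    "libcap": ["glibc"],
}
```

Here `expand_group(["anaconda"], REQUIRES)` covers `anaconda`, `python3`, `systemd`, `libcap` and `glibc`, so a libcap update would be gated as part of the group, whereas `expand_group(["anaconda"], REQUIRES, resolve=False)` covers only `anaconda`.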

@AdamWill

I suppose another direction we could extend in is to make the critpath.py script capable of parsing more than just comps; I suppose we could make it parse other input formats, like whatever FCOS uses, and expose that data to Bodhi too, for Bodhi to use the same way it uses critpath comps group data.

Any way we go with this, we may have to consider de-emphasizing the "critical path" relationship, I guess, though that involves doing some conceptual thinking (I wrote a thread on devel@ about this, back when I set this whole mechanism up).

@jlebon
Member Author

jlebon commented Nov 22, 2023

Thanks for the additional info!

> I suppose another direction we could extend in is to make the critpath.py script capable of parsing more than just comps; I suppose we could make it parse other input formats, like whatever FCOS uses, and expose that data to Bodhi too, for Bodhi to use the same way it uses critpath comps group data.

That sounds like a potential path forward. For this first phase, maybe the simplest approach is to host a .coreos-critpath.json file in https://github.com/coreos/fedora-coreos-config, which would list the few packages we want to gate on, and critpath.py could just merge it into its output JSON under the .rpm.coreos key (as is; without any dependency resolution)? IIUC, that would translate into e.g. bodhi_update_push_testing_coreos_critpath and bodhi_update_push_stable_coreos_critpath decision contexts in the policy file.
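The proposed merge step could look roughly like the following minimal Python sketch. It assumes that the hypothetical .coreos-critpath.json is a plain JSON array of package names and that critpath.json keeps its groups under the "rpm" key. The `merge_coreos_list` function is illustrative only and is not part of critpath.py. The list is deduplicated and sorted but otherwise added as is, with no dependency resolution.

```python
import copy
import json


def merge_coreos_list(critpath: dict, coreos_list_json: str, key: str = "coreos") -> dict:
    """Return a copy of `critpath` with the CoreOS package list stored under rpm.<key>.

    Assumes the hypothetical .coreos-critpath.json is a plain JSON array of
    package names. Unlike the comps-derived groups, no dependency resolution
    is done, so only the listed packages end up gated.
    """
    packages = json.loads(coreos_list_json)
    if not isinstance(packages, list) or not all(isinstance(p, str) for p in packages):
        raise ValueError("expected a JSON array of package names")
    merged = copy.deepcopy(critpath)  # leave the caller's data untouched
    merged.setdefault("rpm", {})[key] = sorted(set(packages))
    return merged
```

For example, merging `'["rpm-ostree", "ignition", "rpm-ostree"]'` into `{"rpm": {"standard": ["systemd"]}}` yields `{"rpm": {"standard": ["systemd"], "coreos": ["ignition", "rpm-ostree"]}}`, leaving the original dict unchanged.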

Whenever we're ready to turn on gating more broadly, we could decide to expand that list or just update critpath.py to read the full packageset from the lockfiles (if we want to expand it to all packages in FCOS).

> Any way we go with this, we may have to consider de-emphasizing the "critical path" relationship, I guess, though that involves doing some conceptual thinking (I wrote a thread on devel@ about this, back when I set this whole mechanism up).

Could you link to that thread? I suppose what you mean here is that the whole process is very geared towards critical path comps groups and gating and we're sort of hijacking those mechanisms for broader purposes? In the smaller-scoped case, keeping the critpath nomenclature seems appropriate, even if it's not sourced from comps. E.g. the list in https://github.com/coreos/coreos-ci/blob/bc8b4aceebf45f0663d48246fce7d500f0c02c50/jobs/bodhi-trigger.Jenkinsfile#L11-L27 can be considered an "FCOS critical path" group. (These are the packages we currently trigger Bodhi tests for.) If we do expand it to all packages in FCOS, I agree it's stretching it.

I'm happy to help with this de-emphasis as I work on this if we think it's worthwhile.

@AdamWill

AdamWill commented Nov 22, 2023

> I suppose what you mean here is that the whole process is very geared towards critical path comps groups and gating and we're sort of hijacking those mechanisms for broader purposes?

Yeah, more or less. Even without this CoreOS stuff, we're kinda stretching it at present. The initial idea of the "critical path" concept was that it defined the packages absolutely necessary for a system to boot and work. The purpose was to put stricter requirements on updates containing those packages - they require more karma or a longer wait in updates-testing than regular updates.

But then when we set up openQA update testing I decided to use "is it critpath?" as a heuristic for whether to run tests on updates - because it was convenient, and more or less mapped to what we wanted to test. Since then we've kinda stretched the critpath definition a bit primarily just to get things tested and gated by openQA, e.g. pulling FreeIPA into the 'critpath' definition for Server. Last year I tweaked the formal critpath definition in the wiki to retcon this: https://fedoraproject.org/w/index.php?title=Critical_path_package&diff=686714&oldid=599546

So we have a bit of wiggle room there; in particular, you as the CoreOS edition owners can pretty much just declare whatever you like to be on the CoreOS critical path. But still, the further we stretch away from the initial conception of what the 'critical path' was for, the more we might wanna look at separating it from this 'what gets tested and gated' concept, I guess.

The thread was this one.

Your path forward sounds like it should work, yeah. Any time you want to propose a change to the critpath.py script go ahead and I'll refresh my memory on exactly how the bits plug together, and review it. Just note that per the above, so long as we keep this tie to the 'critical path' concept, anything you mark 'critical path' so it'll be gated also gets the more onerous requirements to be pushed stable. In practical terms, changing that would require some changes to Bodhi, as it'd need a new concept...

jlebon added a commit to jlebon/rpm-ostree that referenced this issue Nov 29, 2023
As part of Bodhi gating, we need to deal with source RPM names and not
binary RPMs. E.g. Bodhi CI messages include Koji NVRs, not binary RPMs.
Having to do the translation between SRPM and RPM is expensive because
you need either the repodata or to e.g. bring up a VM and query the rpmdb.

Instead, just have rpm-ostree output the SRPM name as a metadata field
in the output lockfile. cosa already saves this and outputs it in the
build dir so it ends up in S3 and is much easier to work with.

Obviously there's some redundancy there since every binary RPM from
the same source RPM will have the same field. But it doesn't seem worth
extending the schema for it instead of just working with it as is.

Related: coreos/fedora-coreos-tracker#1617
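With the SRPM name recorded per package, going from a Koji NVR in a Bodhi message to the affected binary RPMs becomes a simple lookup. Below is a minimal Python sketch. It assumes a hypothetical lockfile layout in which each entry under "packages" has a "metadata" dict with a "sourcerpm" field holding the SRPM name. The actual field name and layout written by rpm-ostree may differ. Splitting the NVR from the right works because version and release cannot contain hyphens, while package names can.

```python
def parse_nvr(nvr: str) -> tuple[str, str, str]:
    """Split a Koji NVR like 'rpm-ostree-2023.11-1.fc39' into (name, version, release)."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def binary_rpms_for_build(lockfile: dict, nvr: str) -> list[str]:
    """Return the binary RPM names in a lockfile that were built from the given NVR's SRPM.

    Assumes a hypothetical layout: {"packages": {rpm: {"metadata": {"sourcerpm": srpm}}}}.
    """
    srpm_name, _, _ = parse_nvr(nvr)
    return sorted(
        name
        for name, entry in lockfile.get("packages", {}).items()
        if entry.get("metadata", {}).get("sourcerpm") == srpm_name
    )


# Illustrative lockfile data: two binary RPMs share the rpm-ostree SRPM.
LOCKFILE = {
    "packages": {
        "rpm-ostree": {"metadata": {"sourcerpm": "rpm-ostree"}},
        "rpm-ostree-libs": {"metadata": {"sourcerpm": "rpm-ostree"}},
        "ostree": {"metadata": {"sourcerpm": "ostree"}},
    }
}
```

Here `binary_rpms_for_build(LOCKFILE, "rpm-ostree-2023.11-1.fc39")` returns `["rpm-ostree", "rpm-ostree-libs"]`, without needing repodata or a booted VM. The repeated `sourcerpm` value across sibling binary RPMs is the redundancy the commit message mentions.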
@jlebon
Member Author

jlebon commented Feb 1, 2024

Progress on this!

A lot of the code there will likely look very different depending on how we want to approach this once we're ready to gate a lot more packages (see the commit message in the releng PR; I'm also hoping to do a more detailed braindump on this, with options, possibly in a Change Proposal draft).

@jlebon
Member Author

jlebon commented Feb 14, 2024

> Progress on this!
> * coreos-ci#57 ("Add bodhi-testing.yaml and support specifying tests to run") adds a new bodhi-testing.yaml, which is used both to determine which RPMs to test and which RPMs should be gated
> * pagure.io/releng/pull-request/11926 makes critpath.py read bodhi-testing.yaml and create a new critical-path-coreos group in critpath.json for Bodhi
> * pagure.io/fedora-infra/ansible/pull-request/1758 makes the packages in critical-path-coreos require the coreos.cosa.build-and-test test to have passed

OK cool, all those patches are in now! It'll take some time to propagate to Bodhi, but if it all works, we should see in the next Bodhi update we do (for any package in this list) that the coreos.cosa.build-and-test test is marked as required in the Bodhi UI.

@jlebon
Member Author

jlebon commented Feb 26, 2024

This is done now! Look at e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2024-27f330f546. You can see in the "Automated Tests" tab that the coreos.cosa.build-and-test test has an asterisk next to it, which means that it's a required test.

Let's discuss follow-up steps in a separate ticket.

@jlebon jlebon closed this as completed Feb 26, 2024