Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short-lived package branches for backported fixes #79

Closed
bgilbert opened this issue Nov 7, 2018 · 41 comments
Closed

Short-lived package branches for backported fixes #79

bgilbert opened this issue Nov 7, 2018 · 41 comments
Assignees
Labels
kind/design releng Related to Fedora Release Engineering team/input

Comments

@bgilbert
Copy link
Contributor

bgilbert commented Nov 7, 2018

Fixes in out-of-cycle releases may take one of two paths:

  1. An updated package taken directly from Fedora
  2. A minimal fix applied to the package version already present in the affected stream

See #72 for details. For path 1, it should be sufficient to update the package manifest (#77). For path 2, we'll also need a way to branch the package to backport the fix. Such branches will be short-lived; the maximum lifetime is the testing release cadence plus the stable release cadence, currently 4 weeks in total.

@dustymabe
Copy link
Member

Thanks for opening this issue. I think we'll need to discuss this with some of the releng crowd to see if this is easily doable today with our existing toolset/policies or if we need to investigate changes.

@dustymabe dustymabe added the releng Related to Fedora Release Engineering team/input label Dec 12, 2018
@bgilbert bgilbert added this to Proposed in Fedora CoreOS preview via automation Jan 22, 2019
@bgilbert bgilbert moved this from Proposed to Selected in Fedora CoreOS preview Jan 22, 2019
@dustymabe
Copy link
Member

OK after brainstorming a bit some options that we'd like to explore:

  1. make a branch in dist-git each time we need to do this

This unfortunately might be surprising and unwanted by the real package maintainers.

  1. make a namespace in dist-git for coreos

i.e. src.fp.o/rpms/coreos/systemd - we'd fork a package when we needed to do a backport
and backport branches live in our namespace. The negative here is that rpms exist that
don't have branches in normal dist-git so it might be hard to find for a user that doesn't
know that the coreos namespace exists.

  1. somehow use modularity for this??

Not sure if it makes sense but maybe we could use modularity here.

@mohanboddu
Copy link

OK after brainstorming a bit some options that we'd like to explore:

  1. make a branch in dist-git each time we need to do this

This unfortunately might be surprising and unwanted by the real package maintainers.

I prefer this, and probably we can look at adding branch acls in case you dont want other maintainers making changes to your branch.

  1. make a namespace in dist-git for coreos

i.e. src.fp.o/rpms/coreos/systemd - we'd fork a package when we needed to do a backport
and backport branches live in our namespace. The negative here is that rpms exist that
don't have branches in normal dist-git so it might be hard to find for a user that doesn't
know that the coreos namespace exists.

  1. somehow use modularity for this??

Not sure if it makes sense but maybe we could use modularity here.

I guess this is not useful if you need to make changes to platform related packages, since they cannot be modularized.

@dustymabe
Copy link
Member

  1. make a branch in dist-git each time we need to do this

I talked with @mattdm and he also said he preferred this option:

@dustymabe
Copy link
Member

At this point we should determine a proper naming strategy for these branches that we create and decide how to use them. For example do we do one of:

  • create a new branch each time we do a backport (branch name in scm would be coreos-backport-$date)
  • use the same branch every time (branch name in scm would be coreos-backports)

In the first case we can get a bunch of branches that pollute the repo over time (because we can't delete branches). In the later case the git history of the branch in the repo is ugly.

@bgilbert
Copy link
Contributor Author

It's hypothetically possible that we'll need up to five branches at once: one for stable, one for testing, one for next, and one each for the testing and next development branches.

@dustymabe
Copy link
Member

I know you used the word hypothetically but would that ever really happen? I would think that we'd have a single backport that we tag into each of those streams rather than having 5 different branches where a backport is needed. Practically we could have more than one backport that we need at a time (one for next and one for stable/testing) but more than two probably shouldn't exist, right?. In most cases we should be taking the latest package from Fedora (that has the fix applied), which wouldn't require a special branch/build, right?

@bgilbert
Copy link
Contributor Author

Five simultaneous backports is a bit much. I wouldn't be surprised by three, though. Suppose stable, testing, and next each have a different version of some package (e.g. the kernel) and we need to backport a security fix to all of them.

The situation with the development branches is a bit different. Those wouldn't be single-cycle backports, but cases where there's a fix for the Fedora package that we haven't been able to merge back for some reason. Hopefully that doesn't happen much, if ever.

@dustymabe
Copy link
Member

Five simultaneous backports is a bit much. I wouldn't be surprised by three, though. Suppose stable, testing, and next each have a different version of some package (e.g. the kernel) and we need to backport a security fix to all of them.

Some assumptions I'm making:

  1. there is always a version of an rpm in fedora that has the fix in it, but it may have more than just the fix (i.e. fix+extra).
  2. In the case of a backport we're saying that we just want the fix and no extra.

With those assumptions I understand stable because we need to be able to backport just fix and no extra. I understand next because we could have that same problem there, but for a newer set of packages. I don't understand testing. I would think that testing could pick up the already patched version from the normal fedora package (fix+extra) without any problem.

@bgilbert
Copy link
Contributor Author

For both next and testing, we're allowed to promote fix+extra without baking it anywhere else first. (Presumably it passes CI, of course.) But we generally won't want to promote the extra to testing mid-cycle, because that means it'll get less than a full bake before promoting to stable. next has no such downstream, so thinking about it further, we probably will want to just take fix+extra there.

@dustymabe
Copy link
Member

next has no such downstream, so thinking about it further, we probably will want to just take fix+extra there.

OK. Does that mean we agree we probably don't need another backport branch for next?

For both next and testing, we're allowed to promote fix+extra without baking it anywhere else first. (Presumably it passes CI, of course.) But we generally won't want to promote the extra to testing mid-cycle, because that means it'll get less than a full bake before promoting to stable.

Agree on the But we generally won't want to promote the extra to testing mid-cycle part, but wouldn't we just be promoting the same backported rpm to both stable and testing.. i.e. new systemd cve comes out. We create a backport branch and then build systemd-x.y.z.rpm. Then we include systemd-x.y.z.rpm in both stable and testing. In this example we still only need one backport branch I think. I might be missing something

@bgilbert
Copy link
Contributor Author

Does that mean we agree we probably don't need another backport branch for next?

I think so, yeah.

We create a backport branch and then build systemd-x.y.z.rpm. Then we include systemd-x.y.z.rpm in both stable and testing. In this example we still only need one backport branch I think.

Using the kernel as the example: we might have 4.20.10 in stable, 4.20.14 in testing, and 4.20.16 in testing-devel. When applying fix, we'd want 4.20.10+fix in stable, 4.20.14+fix in testing, and probably 4.20.17 (including the fix) in testing-devel.

systemd and other userspace packages move more slowly, so we might frequently have systemd 239-12 in both testing and stable. But if we have 239-12 in testing and 239-10 in stable, and if their patchsets are significantly different, we might want to backport to each of those separately.

@dustymabe
Copy link
Member

Using the kernel as the example: we might have 4.20.10 in stable, 4.20.14 in testing, and 4.20.16 in testing-devel. When applying fix, we'd want 4.20.10+fix in stable, 4.20.14+fix in testing, and probably 4.20.17 (including the fix) in testing-devel.

Why couldn't 4.20.17 (including the fix) go straight to testing if it passes CI?

@bgilbert
Copy link
Contributor Author

Why couldn't 4.20.17 (including the fix) go straight to testing if it passes CI?

Because then it'd have less than two weeks of bake time before promoting to stable, possibly as little as a day or two.

@dustymabe
Copy link
Member

Why couldn't 4.20.17 (including the fix) go straight to testing if it passes CI?

Because then it'd have less than two weeks of bake time before promoting to stable, possibly as little as a day or two.

But in this example stable is receiving 4.20.10+fix (i.e. just the backport), but I see you are saying the next to be promoted stable would get the new kernel with only a few days of soak time. Could we reset the clock on testing if we chose to?

@bgilbert
Copy link
Contributor Author

Could we reset the clock on testing if we chose to?

As stream promotion is currently defined, no. We could choose to change that, of course. But I'd tend to think a predictable cadence for stable promotions is more valuable than avoiding the occasional backport.

@dustymabe
Copy link
Member

so in this case we could have the following:

  • day 0: release stable, testing
  • day 10: release stable, testing with fixed backport
  • day 14: release stable, testing again

Is that accurate? Seems like if we have out of band updates we could elect to skip the next one depending on where it landed.

@bgilbert
Copy link
Contributor Author

That's accurate, yes.

With CL we do, in fact, have some slop. We'll sometimes perform a scheduled release a day or two early or late to coincide with a security update. That's in part because releases still require a lot of manual work (which hopefully won't be true for FCOS) and in part to reduce the number of node reboots. We also might delay a release because of a late-breaking regression.

For FCOS we likewise may not want to hold ourselves to a precise date & time for each scheduled release. But we can allow ourselves some flexibility without causing that to slip the schedule for future releases.

@dustymabe
Copy link
Member

At this point we should determine a proper naming strategy for these branches that we create and decide how to use them. For example do we do one of:

  • create a new branch each time we do a backport (branch name in scm would be coreos-backport-$date)

  • use the same branch every time (branch name in scm would be coreos-backports)

In the first case we can get a bunch of branches that pollute the repo over time (because we can't delete branches). In the later case the git history of the branch in the repo is ugly.

Considering the discussion that followed #79 (comment) I'll propose we:

  • create a fedora-coreos-backports-A branch to use. If we ever need two backports for a given package then we create a fedora-coreos-backports-B branch (bikeshed welcome)
  • when we run a build we create a tag in the git repo (i feel like the build tools should do this automatically since NVR uniqueness is enforced in koji)
  • We re-use fedora-coreos-backports-A whenever we need a backport - this way we don't pollute the git repo with a bunch of branches that people may not like.

@bgilbert
Copy link
Contributor Author

@dustymabe SGTM. Re naming, fedora- seems redundant and I'd probably s/backports/backport/. So maybe coreos-backport-A?

@mattdm
Copy link

mattdm commented Mar 26, 2019

I agree that fedora- seems redundant — I think Dusty is thinking about distinguishing from the centos branches that will soon land in dist-git. But as I've mentioned, I think for Fedora CoreOS we want to leave open the possibility of actually building from some of those CentOS branches where necessary, or at least using those as the source for some of the backports. So, I actually think that's all fine. :)

@dustymabe dustymabe self-assigned this Jun 13, 2019
@dustymabe
Copy link
Member

created fesco issue for discussion there: https://pagure.io/fesco/issue/2152

@Conan-Kudo
Copy link

Conan-Kudo commented Jun 21, 2019

I'll be honest, this seems like you're doing it wrong. You guys are creating all kinds of weird workflows that not only don't make sense, but add confusion to packaging and pressure on the infrastructure.

Your concept of multiple update streams for backports is more or less pointless in the context of Fedora CoreOS. You're providing nothing of value by doing weird specific backport things, and it presents a view that you are only in Fedora because you must, rather than actually wanting to be working within the community.

@Conan-Kudo
Copy link

Conan-Kudo commented Jun 21, 2019

I'll also point out that this really didn't come up before with Fedora Atomic Host for the years it has existed, and you guys are already envisioning things that they never needed for Fedora CoreOS (which, still hasn't had a formal release and looks an awful lot like vaporware...).

You seem to think Fedora CoreOS is going to something different from FAH, and I really don't think that'll be the case. If anything, it'll likely be less than what FAH could do, since there will be no OpenShift nor Kubernetes for Fedora CoreOS (Kubernetes is tied to OpenShift versions in Fedora right now, and OpenShift upstream doesn't give a damn about Fedora, so we don't even have OpenShift 4.x in Rawhide).

As it stands today, there's nothing you need this for.

@bgilbert
Copy link
Contributor Author

@Conan-Kudo For more context, see the design document section on release streams. The reason we need a stream promotion process is the same reason that CoreOS Container Linux needed one: in order for automatic updates to be viable, they must not break users' systems. CI can't catch everything, so we need a way to let users test a new release before it's pushed out to the majority of machines. We want them to do that in production, on say 1-2% of their fleet. But conversely, for any of this to be viable in production, we can't wait 2-4 weeks for security fixes to promote through the testing stream to stable. So, net result, we need to provide out-of-cycle security support for both testing and stable channels, and we may sometimes need to do that by backporting patches. (The kernel is the most obvious case where simply shipping the current version of a package might not be safe.)

That basic approach worked pretty well for CoreOS Container Linux and its user community. We've adapted it to work better with Fedora workflows, and will continue to make adjustments as we gain more experience. But I agree that we haven't done a good job yet of explaining our thinking to the broader Fedora community. I'll be writing up a Fedora Magazine article within the next few weeks that should help explain the basic goals of the distro and why we're doing things the way we are.

@Conan-Kudo
Copy link

Conan-Kudo commented Jun 21, 2019

@bgilbert But you have the advantage of being able to produce OSTrees from content in updates-testing. There's literally no reason to wait "weeks" for security fixes. If your tooling still can't pull in select packages from specific repos, you should probably fix that. :)

And I disagree about out-of-cycle security support, especially on the testing stream. You might need to cherry-pick something from updates-testing for the stable stream, but I don't envision any case where you need to do cherry-picks for testing ones. What would you cherry-pick from? They're already in updates-testing...

@bgilbert
Copy link
Contributor Author

@Conan-Kudo The testing and stable streams are both on a two-week update cadence. The purpose of testing is to ship a fixed snapshot for long enough to qualify ("bake") the package set to be promoted to stable. If we churned testing frequently, it wouldn't serve that purpose. So we want to be careful about what kinds of out-of-cycle changes we make to it.

@Conan-Kudo
Copy link

@bgilbert They should be on an overlapping two week cadence, so that the transition from testing to stable happens as part of the second week. Thus, it should really be a 1~1.5 week baking period.

However, I think two weeks is too much for the streams. A week should be sufficient, given that it is enough for updates to push through in regular Fedora.

@bgilbert
Copy link
Contributor Author

@Conan-Kudo The stream cadences aren't contractual and we might change them after we gain some experience. We'll see how things go. 😁

@Conan-Kudo
Copy link

I'm not convinced that you need to do targeted backporting, and I'm also not sure this is particularly necessary for Fedora CoreOS.

In fact, I suspect you're going to cause more problems with this model because you're twisting and turning things into ways that aren't really desirable or supportable by the wider community.

I worry that what you're trying to do here is also going to further marginalize the community in favor of just being a "lab for RHEL". We already get plenty of people outside of Fedora foisting that moniker when it isn't true, let's not actually make it true.

@dustymabe
Copy link
Member

@Conan-Kudo. I understand your concerns. I have some concerns myself. The model we built for atomic host was a very passive process on top of existing Fedora processes and it worked pretty well. However, we have a new opportunity here with Fedora CoreOS to merge two communities (which means we need to take some of both) and bring users to to Fedora that would have otherwise not been here. In order to get to automated upgrades we needed to make some changes, including automated testing, but also locking on a package set (more controlled than bodhi) and letting the community report issues on those locked package sets (testing stream) before we ship to stable. We're certainly not trying to hurt Fedora. This is an attempt to bridge two communities together. We'll certainly make some mistakes and we hope to learn from them.

@Conan-Kudo
Copy link

@dustymabe In that scenario, don't you only need the ability to compose repos to feed in as inputs for FCOS? Because Koji keeps every build forever, it shouldn't be an issue to do more flexible compositions. No need to do weird things in Dist-Git.

@dustymabe
Copy link
Member

dustymabe commented Jun 26, 2019

@dustymabe In that scenario, don't you only need the ability to compose repos to feed in as inputs for FCOS?

Indeed we are doing that with the coreos-pool koji tag.

Because Koji keeps every build forever, it shouldn't be an issue to do more flexible compositions. No need to do weird things in Dist-Git.

Since we will have a stable stream that lags fedora we'll need to be able to backport security issues or major regresssions we find to the version of the package that is in the stable stream (i.e. deliver just the fix). 99% of the time we will just be able to use the rpm that was already built by the maintainer in fedora because versions don't typically change in the middle of a release (with a few exceptions). In some cases the version in Fedora will have been bumped and we'll elect to do a small backport instead of taking the new rpm version (minimizing risk to the stable stream).

In order to do this backport we'll need to apply a change to dist-git and do a build. Tag that build into our tag and update our lockfiles to include it. We don't want to stomp on existing branches of the package in dist-git, so we're requesting new branches for this purpose.

@Conan-Kudo
Copy link

In some cases the version in Fedora will have been bumped and we'll elect to do a small backport instead of taking the new rpm version (minimizing risk to the stable stream).

I don't think this is worth it for Fedora CoreOS. At all. No matter how you present this, you're basically saying you want to do something completely different from what the maintainer(s) would do for that package. This should not fly in Fedora, and I'd say you should invest in doing controlled validation of pulling in updated packages instead.

@dustymabe
Copy link
Member

I don't think this is worth it for Fedora CoreOS. At all. No matter how you present this, you're basically saying you want to do something completely different from what the maintainer(s) would do for that package.

That's not my interpretation of what we're asking. My interpretation is that we're saying:

  • We need a fix for kernel-F, but you (maintainer) have already moved on to kernel-G. Instead of asking you (maintainer) to build us a kernel-F-1 with the fix we'll be glad to do the work ourselves since we're the ones asking for it."

The other branch just makes it a designated sanctioned place to do it.

@dustymabe
Copy link
Member

dustymabe commented Jun 28, 2019

This was discussed in the FESCO meeting today. During the discussion and in the ticket we determined that it is feasible to use tags to push commits into a repo. The approval by FESCO is:

  * AGREED: Let's use tags. The exact workflow is up to the CoreOS team.
    (+8, 0, -0)  (contyk, 15:47:52)

So we use the following workflow:

git checkout commitxyz
add fix && git commit
git tag tagname
git push HEAD:refs/tags/tagname
fedpkg --release f30 --debug --verbose build 

There are a few remaining items left. I'll briefly mention them and enumerate them below that:

fedpkg build needs to support building from a tag. It works today if you comment out some code, but it would be nice if the tool supported it.

We should establish a pattern for naming of these tags and also a pattern for marking the resulting rpms. I suggest we use:

  • we always use a unique tag and never overwrite a tag once a build has been run
  • distgit tag name: coreos-backport-YYYYMMDD
  • rpm NVR: includes coreosbackport in the release field (i.e. kernel-4.20.14-200.coreosbackport.fc30)

Once we have established this pattern we should create a document at https://docs.fedoraproject.org/en-US/fesco/ under Policy Documents and reference the pattern and also the fesco ticket where it was approved that we could do this.

We should then socialize this decision and the document with devel@ list to raise awareness of the backports in case a maintainer sees them.

In order to achieve our goals we'll also need write access to all of the repos where we'll want to do a backport. The easiest way to achieve this is by getting some of our team members to be proven packagers where you can push to any dist-git repo.

Enumerating those:

  • RFE/patch to fedpkg to support building from a tag
  • Establish pattern for naming of tags and rpms that we'll use for the backports
  • Add policy document to fesco docs
  • Send email to devel@ list with summary of backports and link to FESCO policy doc
  • Get some team members with proven packager ability

@sinnykumari
Copy link
Contributor

sinnykumari commented Jul 1, 2019

* we always use a unique tag and never overwrite a tag once a build has been run

* distgit tag name: coreos-backport-YYYYMMDD

We may also want to include Counter (e.g. coreos-backport-YYYYMMDD.$counter) in case we end up doing additional backport related fix on same day after tag has already been pushed.

@bgilbert
Copy link
Contributor Author

bgilbert commented Jul 4, 2019

git tag tagname

Lightweight tags rather than annotated tags, then? That's not the typical Git workflow (though I could see an argument for it here). Annotated tags would also let us record e.g. the FCOS release for which the tag is intended.

From git-tag(1):

Annotated tags are meant for release while lightweight tags are meant
for private or temporary object labels. For this reason, some git
commands for naming objects (like git describe) will ignore lightweight
tags by default.

never overwrite a tag once a build has been run

We shouldn't overwrite a tag once it has been pushed, because Git doesn't refresh local copies of tags during a fetch.

NVR: includes coreosbackport in the release field (i.e. kernel-4.20.14-200.coreosbackport.fc30)

That feels clunky. Can we just use coreos?

@dustymabe
Copy link
Member

git tag tagname

Lightweight tags rather than annotated tags, then?

I'm +1 for annotated tags assuming there isn't any technicaly reason they won't work. My steps above were mostly for proof of concept documentation of how to push a commit+tag at the same time (i.e. the commit isn't already in the git repo).

never overwrite a tag once a build has been run

We shouldn't overwrite a tag once it has been pushed, because Git doesn't refresh local copies of tags during a fetch.

That works too. I was thinking of a situation where we pushed the tag then realized we needed a slight modificaiton before we ran the build.

NVR: includes coreosbackport in the release field (i.e. kernel-4.20.14-200.coreosbackport.fc30)

That feels clunky. Can we just use coreos?

works for me

@dustymabe
Copy link
Member

In practice we haven't needed this. I propose we close it and revisit in the future if the need arises.

@bgilbert
Copy link
Contributor Author

Yes, I agree.

@bgilbert bgilbert closed this as not planned Won't fix, can't repro, duplicate, stale Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/design releng Related to Fedora Release Engineering team/input
Projects
No open projects
Development

No branches or pull requests

6 participants