Major release and update cycle for Fedora CoreOS #22

Closed
sinnykumari opened this issue Aug 1, 2018 · 29 comments

@sinnykumari
Contributor

For Fedora Atomic Host, we do an Atomic Host release along with each Fedora major release. This is followed by a new release every two weeks with updated content from Fedora updates, which lets users receive updated (including security fixes) and tested content every two weeks. For a major CVE fix, we make an exception and do an in-between release.

For FCOS, as far as I know, our first official release will be around the Fedora 30 release, based on packages built and tagged for f30 (correct me if I am wrong). It would be nice to discuss and define how frequently we will do releases in between with updated content.

@bgilbert
Contributor

bgilbert commented Aug 1, 2018

As part of this, we should also discuss the stream structure.

Container Linux

On Container Linux, we have these channels:

  • alpha: Updated frequently. Contains the latest versions of software, including (for Docker and the kernel) versions we don't necessarily intend to promote to other channels.
  • beta: Updated less frequently. Essentially a stable release candidate. Consists of a release branch promoted from alpha, probably with changes.
  • stable: Updated less frequently than beta, but more frequently than a Fedora release. Promoted from beta, often with fixes.

Not every branch promotes to stable; half of all alpha branches and half of all beta branches are not promoted. Serious bugfixes and security fixes are backported to existing release branches via out-of-cycle releases.

We encourage users to run some production nodes on alpha and some on beta in order to help catch regressions. In particular, this means that serious fixes must be applied to all channels.

The channel names are not especially descriptive and should not be carried forward without some thought; in particular, beta is not merely a promoted alpha.

Fedora CoreOS

We'll probably want to sync with the Fedora release cycle, even if it's not mandatory. If we branched from Rawhide independently from the rest of Fedora, we'd end up responsible for backporting fixes, which is too much work.

There's still value in maintaining a multiple-stream structure, however. If we don't have something like a beta channel where users can get an early preview of updates, we're more likely to push regressions to stable. Avoiding this is especially important with an automatically-updating production OS.

Straw proposal

  • next: Preview of the next Fedora release when that has stabilized "enough". We'd want to define the cutover point: Bodhi enablement? the Fedora alpha release? During the rest of the cycle, next would track testing.
  • testing: Periodic snapshot of the current Fedora release + updates, perhaps every two weeks.
  • stable: Promotion of a snapshot that's baked in testing for two weeks, including any needed fixes.

Out-of-cycle backports of important security fixes and bugfixes would also occur. If a fix is important enough to backport directly to stable for an out-of-cycle release, it should also be backported to testing.

I've excluded updates-testing from the above, on the theory that packages retracted from updates-testing are considered not to have been deployed, so packages will not carry workarounds for any resulting breakage. I'm not sure whether that's the right idea, though.

I'm also not sure of a reasonable cadence for stable.

@dustymabe
Member

Thanks for a well-thought-out reply @bgilbert. I didn't get the notification and just now stumbled upon it (see the GH notifications issue I've been having).

A few comments:

  • I like the idea of next.
  • testing sounds like the current FAH stable (AKA fedora/28/x86_64/atomic-host). Is that right?
  • should we version our refs (i.e. include major version in the name) like we have done in the past for FAH? There is a long discussion here about what a "rolling" release could have looked like for FAH.
  • There will also be the rawhide ref that will exist if people want to go real leading edge.
  • are you suggesting we don't advertise the updates-testing ref to the users?

@bgilbert
Contributor

bgilbert commented Aug 3, 2018

testing sounds like the current FAH stable (AKA fedora/28/x86_64/atomic-host). Is that right?

AIUI yes.

should we version our refs (i.e. include major version in the name) like we have done in the past for FAH? There is a long discussion here about what a "rolling" release could have looked like for FAH.

No. Users shouldn't have to think about Fedora versions or to manually update between them.

Relatedly, though, it's worth considering how we assign version numbers to trees, as a friendly name for the user. <Fedora release>.<snapshot serial number>.<patch release>?
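A minimal sketch of what that friendly version could look like, assuming each of the three components is a non-negative integer and that versions compare component by component. `TreeVersion` and its methods are hypothetical names, not existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TreeVersion:
    """Hypothetical friendly name for a tree: <Fedora release>.<snapshot serial>.<patch release>."""

    fedora_release: int  # Fedora major version the tree is based on
    snapshot: int        # serial number of the periodic snapshot within that release
    patch: int           # patch release number on top of that snapshot

    @classmethod
    def parse(cls, text: str) -> "TreeVersion":
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected <release>.<snapshot>.<patch>, got {text!r}")
        release, snapshot, patch = (int(p) for p in parts)
        return cls(release, snapshot, patch)

    def __str__(self) -> str:
        return f"{self.fedora_release}.{self.snapshot}.{self.patch}"
```

Because the generated ordering compares the fields as integers, 29.10.0 sorts after 29.9.5 and every tree built from Fedora 30 sorts after every Fedora 29 tree, which a plain string comparison would get wrong.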

There will also be the rawhide ref that will exist if people want to go real leading edge.

Is there value to continuing that? In the proposal above, FCOS won't ever be tracking Rawhide or attempting to keep it working.

are you suggesting we don't advertise the updates-testing ref to the users?

That's the proposal. I don't know whether it's the right idea, and I'd love to hear opinions.

@mscherer

mscherer commented Aug 6, 2018

So what would be the ABI/API stability promise?

If I am using next/testing/stable, what breakage should I expect on each, how often, and how much effort will it take to fix it?

@ajeddeloh
Contributor

ajeddeloh commented Aug 6, 2018

For API stability: With CL it's always been "Stable unless otherwise noted" which is something we should continue. Bugs aside, there should never be breaking changes without an announcement and deprecation window. Users should not have to worry about updates (but should run a bit of beta/whatever we call the prerelease channel in their clusters to detect upcoming bugs).

For ABI stability: You shouldn't care. Run your stuff in containers. Running on the host directly is unsupported. We will break you and not care if you run on the host directly.

@mscherer

mscherer commented Aug 7, 2018

Running in a container does not remove the dependency on an ABI. I remember an article that made the rounds about how containers were not as portable as we thought, because /proc is exposed and used for detecting SELinux, etc., which triggered non-standard code paths in Ubuntu containers on RHEL. The conclusion was that we can't really abstract the kernel away. Even today, that's a point pushed by RH folks, e.g. http://crunchtools.com/portability-not-compatibility/

So if I am told "there is no ABI promise", how can I, as a user, be sure that stuff will not break? And if that's not a problem, can someone tell RH why they claim otherwise and tell me I should care?

@bgilbert
Contributor

bgilbert commented Aug 7, 2018

Fedora CoreOS is a community software project. There is no ironclad way to "be sure that stuff will not break". As @ajeddeloh said, the best approach is running some of your nodes on preproduction streams and reporting bugs you encounter.

The kernel's ABI stability rules are also not ironclad, but they generally work pretty well. If you need stronger ABI guarantees than the upstream kernel can provide, neither Container Linux nor Fedora CoreOS nor any of the other Fedora editions are likely to meet your needs.

@bgilbert
Contributor

bgilbert commented Aug 7, 2018

To @ajeddeloh's point, we'll need ways to carry downstream reversions of breaking changes in Fedora. Eventually there will be something like another cgroups v2 that we'll have to work around.

@ajeddeloh
Contributor

I was thinking of ABI more in terms of the C/C++ ABI (don't link against our stuff and run it on the host, please), but the kernel ABI is a good point. I don't think there's anything we can really do there other than say "run a little preprod if you want to catch bugs before they're a problem." We're not going to ship ancient or out-of-date kernels. That being said, breakage from the kernel, systemd, etc. is 99% bugs, not intentional changes.

@mscherer

But Fedora has a process to get the kernel tested before it is pushed, with the updates-testing system, and a policy not to break ABI in a stable release (as much as possible; that's not perfect, but people get annoyed when it happens).

https://fedoraproject.org/wiki/Updates_Policy

And my understanding is that no such promise is on the table, and @bgilbert explicitly said in this thread that users shouldn't care about versions. That's a goal I agree with, but if we rely on Fedora releases as the basis and switch every 6 months, that is where things will break and where I think people would need to care about versions.

So while Fedora does not promise much, as a user I know when to expect breakage, and I can wait before upgrading because there is a period when I can use both releases. I have 6 months to do the upgrade, which is fine for me.

My understanding of the proposal is that, as a user, I would have 2 weeks to fix anything I see broken on my side in testing before it gets to stable, which is a much shorter timeframe.

And as for the kernel ABI, while it is stable as long as the configuration does not change, we may also change the configuration in the future. So no matter how strict the kernel devs are, our packagers might not be as strict.

And API/ABI also covers things like the Docker API, switching from Docker to podman/buildah, etc. Admins shouldn't rely on the binary ABI and run things on the host, for sure, but in the end they also need to interact with the host somehow, using an API. Again, if you say "this can break at any time", that is not very compelling to me as a user.

@cgwalters
Member

cgwalters commented Aug 20, 2018

Again, if you say "this can break at any time", that is not very compelling to me as a user.

If you look at how Container Linux has been managed, some things on the host have been removed, such as fleet with a year-long deprecation window. That's two Fedora release cycles.

I am sure that if FCOS made any such changes (e.g. dropping docker) it would make total sense to do so in concert with a Fedora major window.

As far as the rest of your comment goes: nothing is perfect, but the basic premise here is that the value of containerization far outweighs concerns about corner-case kernel ABIs affecting apps.

@bgilbert
Contributor

But Fedora has a process to get the kernel tested before it is pushed, with the updates-testing system, and a policy not to break ABI in a stable release (as much as possible; that's not perfect, but people get annoyed when it happens).

Stable Fedora releases update to new kernels with new ABIs all the time. Kernel 4.17 will go EOL soon, so Fedora 28 will bump to 4.18.

My understanding of the proposal is that, as a user, I would have 2 weeks to fix anything I see broken on my side in testing before it gets to stable, which is a much shorter timeframe.

You'd have two weeks to report regressions that reach testing. If you're running a few testing nodes in production, as you should, hopefully that will be enough time to catch issues. If you want additional time to test packages from the following version of Fedora, that's what next is for.

@ajeddeloh
Copy link
Contributor

ajeddeloh commented Aug 24, 2018

Does Fedora have planned breaking changes that are announced anywhere? If so we should probably work with Fedora to either announce those explicitly for FCOS (since it's supposed to be "set and forget*") or ensure that the change won't impact users.
* except when we email you

@dustymabe
Copy link
Member

Does Fedora have planned breaking changes that are announced anywhere?

Fedora does have a "changes" process where large changes are announced and the implications of changes are considered. I'm sure not everything that changes gets reported, but it's a good way to at least socialize changes that we would like to make or to also monitor them so we are aware of changes that will affect us.

Here are links to all the Fedora changes for the last 10 releases.

@bgilbert
Contributor

Adding meeting label.

@bgilbert bgilbert added the meeting topics for meetings label Sep 19, 2018
@dustymabe
Member

OK, we discussed this in the meeting today and agreed to revisit it in a week. I had a few comments today.

Benjamin's original proposal:

  • next: Preview of the next Fedora release when that has stabilized "enough". We'd want to define the cutover point: Bodhi enablement? the Fedora alpha release? During the rest of the cycle, next would track testing.
  • testing: Periodic snapshot of the current Fedora release + updates, perhaps every two weeks.
  • stable: Promotion of a snapshot that's baked in testing for two weeks, including any needed fixes.

Benjamin mentioned that these are the refs he expects people to be able to run in production, with stable being the most conservative and next the least conservative.

development refs

We voiced the need for some "development" refs that essentially follow packages from the updates-testing Fedora yum repos, or are built nightly from the testing branch, or something similar. So we'd probably add a few more refs to the above proposal, but not publicize them.

need for backports

One concern I raised is that if testing is composed only of packages from the Fedora stable yum repo, and we only release the stable ref after things have been in testing for two weeks, then there is possibly a month between the time a package reaches Bodhi stable and the time it reaches the stable ref in FCOS.

Tangent: we could explore not considering Bodhi at all and just using our automated tests to gate packages going into FCOS, which would speed things up a bit.

The above relationship between stable and testing is OK, but it raises the need for backports for security issues. I'd strongly prefer that we not build our own kernels (or container runtime, glibc, curl, etc.) with backported patches for the stable ref. Any other ideas on how we can manage this appropriately? We will probably discuss this point in next week's meeting.

@cgwalters
Member

This ticket links into a whole lot of other things; there's deep questions here around how much we hook into/diverge from the current Fedora package process.

The proposal at the top seems to basically aim to be "single stream" - I am broadly in favor of this, although I think practical realities are going to force us into at least having refs for each underlying major or so?

@ajeddeloh
Contributor

I think "single stream" (or rather "triple stream") is an absolute necessity. Updates are automatic and invisible to users (hopefully). Having separate refs for streams based on different fedora releases breaks that model (not that we can't work around that, but more that it conceptually "feels" different if we do).

@bgilbert
Contributor

@cgwalters Why do you think we'd need separate refs for each major?

@bgilbert
Contributor

bgilbert commented Oct 2, 2018

Backports

@dustymabe Do you have any sense of how many updates in updates-testing are never promoted to updates? I wouldn't want to lose whatever degree of validation we get from bodhi and the karma process.

My use of the word "backport" may have made the proposal sound scarier than it is. The situation I had in mind is a significant kernel security fix. In that case our options are a) accept an entirely new stable kernel, including perhaps 100-200 unrelated patches, or b) cherry-pick the relevant patches to the stable kernel we're already shipping. I'm arguing for option (b). There's no real backporting work to do; the patch will almost always apply cleanly. Option (a) would entail pushing a new kernel directly to our stable ref, thus risking the sorts of regressions that often come with new stable kernels.

When needed, we can use the same process for other packages as well, e.g. curl or docker. The key point is that, once we have the tooling to support backports, we can choose on a case-by-case basis whether to backport or accept an update from upstream Fedora. I don't expect the backport load to be especially heavy: in Container Linux today, many low-grade security fixes only go into the alpha channel and relatively few are backported to stable.

@bgilbert
Contributor

bgilbert commented Oct 2, 2018

Kernel releases

There's another complication not mentioned above. Most packages only receive major updates between Fedora releases. In the above proposal, the next stream collects those updates to be tested before they reach the testing and stable streams. However, that mechanism won't help for the kernel, which receives major updates within a single Fedora release. With the proposal above, new kernels would land directly in testing, and then would promote to stable after only two weeks, which doesn't seem like enough time to avoid regressions.

Proposal: have next track the rawhide kernel after the kernel has stabilized "enough", perhaps -rc6 or so. This would occur whether or not next is currently tracking the next Fedora. That would give us a couple extra RCs and a few stable point releases to test the new kernel before it reaches the testing stream.

@dustymabe
Member

I'm arguing for option (b). There's no real backporting work to do; the patch will almost always apply cleanly

That makes me feel better. Let's discuss this at the community meeting tomorrow and try to bring in the kernel team after that so we can come up with a plan.

@mattdm

mattdm commented Oct 2, 2018

Dusty asked me to weigh in, but I generally don't have anything to add except I think what bgilbert is suggesting makes a lot of sense to me.

I do think having a stream with updates-testing enabled is going to be important, even if it's just not widely publicized. Otherwise, it's going to be hard to actually test the relevant packages in place in CoreOS, making it hard to get them out of testing with reasonable certainty.

I'm kind of thinking this should be "next", in fact. Some notable recent problems with dnf aside, it's usually the case that distro-sync handles the case of pulled updates fine — I've personally been running with updates-testing enabled for years. I think (right, @cgwalters?) that with rpm-ostree, going backwards (due to pulled updates) should be safe in almost all cases.

@cgwalters
Copy link
Member

@cgwalters Why do you think we'd need separate refs for each major?

Sorry I meant to follow up here. Basically I was more thinking for development purposes ("Let's test the new systemd in rawhide") or whatever. Exposing to users would be a distinct thing. One option is to have a separate ostree repo for this too.

@bgilbert
Contributor

bgilbert commented Oct 5, 2018

We discussed this more at the meeting Wednesday. Here's the current state:

Proposal

Production refs

Fedora CoreOS will have several refs for use on production machines. At any given time, each ref will be downstream of a particular Fedora branch, and will consist of a snapshot of Fedora packages plus occasionally a backported fix.

  • testing: Periodic snapshot of the current Fedora release plus updates.
  • stable: Promotion of a testing release, including any needed fixes.
  • next:
    1. After Bodhi is enabled for the upcoming Fedora release, tracks that release; before then, tracks testing.
    2. After the upcoming kernel release has reached rc6 and before it goes final, tracks the rawhide kernel. After the kernel goes final and before it is included in the tracked Fedora release, tracks the kernel from updates-testing.
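The two rules for next amount to a small decision function. The sketch below is illustrative only: `CycleState`, `next_sources`, and the source labels are hypothetical names, and the rc6 threshold is the one given in the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleState:
    """Hypothetical view of where the Fedora and upstream kernel cycles stand."""

    bodhi_enabled_for_upcoming: bool   # Bodhi enabled for the upcoming Fedora release?
    upcoming_kernel_rc: Optional[int]  # N while the upcoming kernel is at -rcN, else None
    upcoming_kernel_final: bool        # has the upcoming kernel been released?
    kernel_in_tracked_release: bool    # does next's tracked Fedora release ship it yet?


def next_sources(state: CycleState) -> tuple[str, str]:
    """Return (package source, kernel source) for the next stream."""
    # Rule 1: follow the upcoming Fedora release once Bodhi is enabled for it,
    # otherwise track whatever testing is shipping.
    packages = "upcoming-release" if state.bodhi_enabled_for_upcoming else "testing"

    # Rule 2: the kernel can run ahead of the rest of the package set.
    if state.upcoming_kernel_final and not state.kernel_in_tracked_release:
        kernel = "updates-testing"  # final, but not yet in the tracked release
    elif not state.upcoming_kernel_final and (state.upcoming_kernel_rc or 0) >= 6:
        kernel = "rawhide"          # rc6 or later: stabilized "enough"
    else:
        kernel = packages           # no override: kernel comes with the packages
    return packages, kernel
```

Keeping the two rules independent reflects the proposal's point that the kernel override applies whether or not next is currently tracking the upcoming Fedora release.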

All of these refs will be unversioned, in the sense that their names will not include the current Fedora major version. The stream cadences are not contractual, but will initially have two weeks between releases. The stream maintenance policies are also not contractual and may evolve from those described above, but changes will preserve the use cases and intended stability of each stream.

Users will be encouraged to run most of their production systems on stable, and a few percent of their systems on each of next and testing to catch regressions before they reach stable.
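One hypothetical way for an operator to follow this advice is to assign nodes to streams deterministically by hashing the hostname. Neither the percentages nor `stream_for_node` are part of the proposal; they only illustrate a stable split in which most nodes stay on stable.

```python
import hashlib

# Hypothetical split: "a few percent" of nodes on each preview stream.
PREVIEW_SHARES = {"next": 0.02, "testing": 0.03}


def stream_for_node(hostname: str, shares: dict[str, float] = PREVIEW_SHARES) -> str:
    """Deterministically assign a node to a stream; all remaining nodes run stable."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Turn the first 6 bytes of the hash into a fraction in [0, 1).
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    cumulative = 0.0
    for stream, share in shares.items():
        cumulative += share
        if fraction < cumulative:
            return stream
    return "stable"
```

Hashing the hostname rather than choosing randomly at provisioning time keeps a node on the same stream when it is reprovisioned, so the preview fleet stays a consistent slice of production.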

Development refs

There will also be some additional unversioned refs for the convenience of Fedora CoreOS developers. These will be public, but won't be exposed to users in the same way as production refs: they might be in a different repo, or in the same repo but not listed in the summary file. None of these are contractual; they might go away if we don't find them useful.

  • rawhide: Nightly snapshot of rawhide.
  • bodhi-updates: Nightly snapshot of Bodhi updates for the Fedora release currently tracked by testing.
  • bodhi-updates-testing: Nightly snapshot of Bodhi updates-testing for the Fedora release currently tracked by testing.

Out-of-cycle releases

Due to the promotion structure described above, stable can contain packages that are as much as four weeks out of date. Sometimes, however, there will be an important bugfix or security fix that cannot wait a month to reach stable (or two weeks to reach next or testing). In that case, the fix will be incorporated into out-of-cycle releases on affected streams. These releases will not affect the regular promotion schedules; for example, a fix might sit in testing for only a few days before it is promoted to stable.
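The four-week figure follows from the two-week cadences. A sketch, assuming testing snapshots are cut every two weeks, a snapshot includes packages that landed the same day, and each testing release is promoted to stable exactly two weeks later; these are simplifications, and `stable_arrival` is a hypothetical name.

```python
from datetime import date, timedelta

# Initial cadence from the proposal; explicitly not contractual.
CADENCE = timedelta(weeks=2)


def stable_arrival(landed: date, first_snapshot: date) -> date:
    """Date a package that landed in Fedora updates on `landed` reaches stable
    through the regular cycle (no out-of-cycle release)."""
    # Index of the first testing snapshot cut on or after `landed`
    # (ceiling division; never earlier than the first snapshot).
    periods = max(0, -((first_snapshot - landed) // CADENCE))
    snapshot = first_snapshot + periods * CADENCE
    # That testing release is promoted to stable one cadence later.
    return snapshot + CADENCE
```

A package that lands the day after a snapshot waits 13 days for the next snapshot and 14 more in testing, 27 days in total. Out-of-cycle releases exist to short-circuit that delay when a fix cannot wait.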

A fix can take one of two forms:

  1. An updated package taken directly from Fedora
  2. A minimal fix applied to the package version already present in the affected stream

We'll need infrastructure for both approaches, and the ability to choose between them on a case-by-case basis. Option 1 is cleaner and easier, but may not always be safe. Option 2 is especially useful for the kernel, where we'll want to fix individual bugs without pushing an entire stable kernel update directly to the stable stream.

If a fix is important enough for an out-of-cycle stable release, other affected release streams should be updated as well.

In some cases it may make sense to apply a fix to testing but not issue an out-of-cycle release, allowing the fix to be picked up automatically when testing promotes to stable.
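The out-of-cycle policy can be summarized as a small planning function. Everything here is illustrative: `plan_fix`, `FixPlan`, the `urgent` flag, and defaulting the kernel to a minimal fix are hypothetical simplifications of decisions the proposal leaves to case-by-case judgment.

```python
from dataclasses import dataclass

PRODUCTION_STREAMS = ("next", "testing", "stable")


@dataclass
class FixPlan:
    form: str                # "fedora-update" (option 1) or "minimal-fix" (option 2)
    out_of_cycle: list[str]  # streams that get an immediate out-of-cycle release
    scheduled: list[str]     # streams that pick the fix up on the regular cycle


def plan_fix(package: str, affected: set[str], urgent: bool,
             fedora_update_is_safe: bool) -> FixPlan:
    """Hypothetical encoding of the out-of-cycle policy described above."""
    unknown = affected - set(PRODUCTION_STREAMS)
    if unknown:
        raise ValueError(f"not production streams: {sorted(unknown)}")
    streams = [s for s in PRODUCTION_STREAMS if s in affected]

    # Option 1 when the newer Fedora package is safe to ship as-is, option 2
    # otherwise. Always choosing option 2 for the kernel is an assumption.
    use_update = fedora_update_is_safe and package != "kernel"
    form = "fedora-update" if use_update else "minimal-fix"

    if urgent:
        # A fix that cannot wait is released out of cycle on every affected stream.
        return FixPlan(form, out_of_cycle=streams, scheduled=[])
    # Otherwise it rides the normal schedule; a fix applied to testing reaches
    # stable automatically when testing is promoted.
    return FixPlan(form, out_of_cycle=[], scheduled=streams)
```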

Deprecation

Because production refs are unversioned, users will seamlessly upgrade between Fedora major releases, so compatibility must be maintained. Removal of functionality will require explicitly announced deprecations, potentially with long deprecation windows.

@bgilbert
Contributor

I've updated #22 (comment) to reflect the comments from last week's meeting:

  • State that stream definitions, not just cadences, are non-contractual.
  • Rename the updates and updates-testing development refs to bodhi-updates and bodhi-updates-testing.

@lucab lucab removed the meeting topics for meetings label Oct 18, 2018
@bgilbert
Contributor

PR in #72.

@dustymabe
Member

PR in #72.

Nice, and close to merging.

I think the only other thing was that we needed to set up a session with the kernel devs to socialize our backporting strategy documented above. Should we set up some time for that?

@bgilbert
Contributor

bgilbert commented Nov 9, 2018

I think the only other thing was that we needed to set up a session with the kernel devs to socialize our backporting strategy documented above.

Filed as #80. Design doc merged in #72; closing.


8 participants