Rolling back doesn't work well with automatic updates #247

Closed
mike-nguyen opened this issue Aug 7, 2019 · 10 comments

@mike-nguyen (Member) commented Aug 7, 2019

Expected behavior: disabling automatic updates and running rpm-ostree rollback successfully rolls back to the previous deployment.
Actual behavior: automatic updates are difficult to disable, because the rollback removes the file that disables them.

Issue:
I provisioned an FCOS system running 30.20190725.0, and when the system booted, it automatically updated and rebooted into 30.20190801.0. 🍾 automatic updates!

Let's say I needed to manually roll back to 30.20190725 for some reason. I disabled updates by creating /etc/zincati/config.d/90-disable-auto-updates.toml with the contents:

[updates]
enabled = false

I then ran rpm-ostree rollback and rebooted. The problem is that rpm-ostree rollback rolls back to an /etc/zincati/config.d without 90-disable-auto-updates.toml, so the system will automatically update again as soon as it boots.

On FCOS:
[core@localhost ~]$ cat /etc/zincati/config.d/90-disable-auto-updates.toml
[updates]
enabled = false
[core@localhost ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190801.0 (2019-08-01T13:54:21Z)
                    Commit: a9c8d66d3628d1b9b4c4690777e8b730d08329b4359410cb410a2003296af1ca
              GPGSignature: Valid signature by F1D8EC98F241AAF20DF69420EF3C111FCFC659B9

  ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190725.0 (2019-07-25T18:54:22Z)
                    Commit: 8b79877efa7ac06becd8637d95f8ca83aa385f89f383288bf3c2c31ca53216c7
              GPGSignature: (unsigned)
[core@localhost ~]$ sudo rpm-ostree rollback
Moving '8b79877efa7ac06becd8637d95f8ca83aa385f89f383288bf3c2c31ca53216c7.0' to be first deployment
Bootloader updated; bootconfig swap: yes; deployment count change: 0
Downgraded:
  container-selinux 2:2.111.0-1.fc30 -> 2:2.107-1.git453b816.fc30
  containers-common 1:0.1.37-2.fc30 -> 1:0.1.37-0.gite079f9d.fc30
  glib2 2.60.6-1.fc30 -> 2.60.5-1.fc30
  iptables 1.8.2-3.fc30 -> 1.8.2-1.fc30
  iptables-libs 1.8.2-3.fc30 -> 1.8.2-1.fc30
  iptables-nft 1.8.2-3.fc30 -> 1.8.2-1.fc30
  iptables-services 1.8.2-3.fc30 -> 1.8.2-1.fc30
  kernel 5.1.20-300.fc30 -> 5.1.18-300.fc30
  kernel-core 5.1.20-300.fc30 -> 5.1.18-300.fc30
  kernel-modules 5.1.20-300.fc30 -> 5.1.18-300.fc30
  libldb 1.5.5-1.fc30 -> 1.5.4-1.fc30
  libnftnl 1.1.3-1.fc30 -> 1.1.1-6.fc30
  libsmbclient 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  libssh 0.9.0-5.fc30 -> 0.8.7-1.fc30
  libwbclient 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  nftables 1:0.9.1-2.fc30 -> 1:0.9.0-5.fc30
  openssh 8.0p1-5.fc30 -> 8.0p1-4.fc30
  openssh-clients 8.0p1-5.fc30 -> 8.0p1-4.fc30
  openssh-server 8.0p1-5.fc30 -> 8.0p1-4.fc30
  samba-client-libs 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  samba-common 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  samba-common-libs 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  samba-libs 2:4.10.6-0.fc30 -> 2:4.10.5-1.fc30
  selinux-policy 3.14.3-42.fc30 -> 3.14.3-41.fc30
  selinux-policy-targeted 3.14.3-42.fc30 -> 3.14.3-41.fc30
  skopeo 1:0.1.37-2.fc30 -> 1:0.1.37-0.gite079f9d.fc30
  sqlite-libs 3.26.0-6.fc30 -> 3.26.0-5.fc30
  vim-minimal 2:8.1.1749-1.fc30 -> 2:8.1.1713-1.fc30
  whois-nls 5.5.0-1.fc30 -> 5.4.3-1.fc30
Removed:
  libssh-config-0.9.0-5.fc30.noarch
  systemd-container-241-9.gitb67ecf2.fc30.x86_64
Added:
  bridge-utils-1.6-3.fc30.x86_64
Run "systemctl reboot" to start a reboot
[core@localhost ~]$ sudo systemctl reboot
[core@localhost ~]$ Connection to 192.168.122.153 closed by remote host.
Connection to 192.168.122.153 closed.
$ ssh core@192.168.122.153
Warning: Permanently added '192.168.122.153' (ECDSA) to the list of known hosts.
Fedora 30.20190725.0 (CoreOS preview)
Tracker: https://github.com/coreos/fedora-coreos-tracker
Preview release: breaking changes may occur

Last login: Tue Aug  6 21:25:55 2019 from 192.168.122.1
[core@localhost ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
  ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190801.0 (2019-08-01T13:54:21Z)
                    Commit: a9c8d66d3628d1b9b4c4690777e8b730d08329b4359410cb410a2003296af1ca
              GPGSignature: Valid signature by F1D8EC98F241AAF20DF69420EF3C111FCFC659B9
                      Diff: 29 upgraded, 1 removed, 2 added

● ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190725.0 (2019-07-25T18:54:22Z)
                    Commit: 8b79877efa7ac06becd8637d95f8ca83aa385f89f383288bf3c2c31ca53216c7
              GPGSignature: (unsigned)

  ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190801.0 (2019-08-01T13:54:21Z)
                    Commit: a9c8d66d3628d1b9b4c4690777e8b730d08329b4359410cb410a2003296af1ca
              GPGSignature: Valid signature by F1D8EC98F241AAF20DF69420EF3C111FCFC659B9
[core@localhost ~]$ Connection to 192.168.122.153 closed by remote host.
Connection to 192.168.122.153 closed.
$ ssh core@192.168.122.153
Warning: Permanently added '192.168.122.153' (ECDSA) to the list of known hosts.
Fedora 30.20190801.0 (CoreOS preview)
Tracker: https://github.com/coreos/fedora-coreos-tracker
Preview release: breaking changes may occur

Last login: Wed Aug  7 17:12:53 2019 from 192.168.122.1
[core@localhost ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190801.0 (2019-08-01T13:54:21Z)
                    Commit: a9c8d66d3628d1b9b4c4690777e8b730d08329b4359410cb410a2003296af1ca
              GPGSignature: Valid signature by F1D8EC98F241AAF20DF69420EF3C111FCFC659B9

  ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 30.20190725.0 (2019-07-25T18:54:22Z)
                    Commit: 8b79877efa7ac06becd8637d95f8ca83aa385f89f383288bf3c2c31ca53216c7
              GPGSignature: (unsigned)
[core@localhost ~]$ ls /etc/zincati/config.d/

@bgilbert suggested possibly only auto-updating to a given build once. @arithx mentioned a related issue regarding reboots with active interactive sessions: #239

@bgilbert changed the title from "Manually rolling back doesn't work well with automatic updates" to "Rolling back doesn't work well with automatic updates" on Aug 7, 2019
@bgilbert (Contributor) commented Aug 7, 2019

Generalizing this issue a bit, we should also avoid update loops in the case that a failed update is automatically rolled back.

@lucab (Contributor) commented Aug 7, 2019

I believe there is quite a bit of design work hidden in here to define the semantics and flows (both human and automatic) of declaring a release "locally not appreciated".

We definitely don't want the user to have to disable the auto-update logic. Instead, the system should take care not to follow again edges to releases that the node did not appreciate.

There is probably a similar but trickier case of just rebooting into an older deployment. I haven't tried that yet, but I would expect Zincati to cause the same havoc as originally reported here.

@dustymabe (Member) commented:

Could we give Zincati some state information? There are a few pieces of state that would be useful (a rough sketch follows the list):

  1. rolled back from (tells us that a rollback occurred and what version we rolled back from)

     In this case we probably should disable auto-updating to the one we rolled back from.

  2. whether we want to continue automatically updating

     If we are manually rolled back by the user, then maybe we should disable auto-updating?

  3. whether we booted into the non-default deployment

     In this case maybe we should just disable auto-updating, as it's not clear what the user wants us to do?
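
A minimal sketch of such a state record in Rust, with hypothetical field names and persistence location (this is not actual Zincati code):

/// Hypothetical per-node update state, persisted across reboots
/// (e.g. somewhere under /var/lib/zincati/).
#[derive(Debug, Default)]
struct UpdateState {
    /// 1. Version we rolled back from, if a rollback occurred.
    rolled_back_from: Option<String>,
    /// 2. Whether automatic updates should continue.
    continue_auto_updates: bool,
    /// 3. Whether this boot is into the non-default deployment.
    booted_non_default: bool,
}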

@bgilbert (Contributor) commented Aug 7, 2019

In this case we probably should disable auto-updating to the one we rolled back from

I was thinking Zincati would just keep track of updates it's applied, and refuse to install the same update twice. That sidesteps the question of tracking rollbacks per se.

If we are manually rolled back by the user, then maybe we should disable auto-updating?

There's a very high bar to automatically disabling updates. We should basically never do it without an excellent reason.

We might want to allow the user to manually blacklist certain versions (not ranges!) without updating to them first, but that's a bit different from this bug. Maybe we should just add a command-line mode that adds/removes a version from the replay list described above?
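
A minimal sketch of that replay-list idea, with hypothetical names (not Zincati's actual API); a command-line mode could simply wrap ban/unban:

use std::collections::HashSet;

/// Hypothetical replay list: commit checksums of updates already applied
/// (or manually banned), persisted across reboots.
struct ReplayList {
    applied: HashSet<String>,
}

impl ReplayList {
    /// Refuse any candidate update that was installed (or banned) before.
    fn allows(&self, candidate_checksum: &str) -> bool {
        !self.applied.contains(candidate_checksum)
    }

    /// Record an update once it has been deployed.
    fn record(&mut self, checksum: String) {
        self.applied.insert(checksum);
    }

    /// Manually ban a specific version without installing it first,
    /// or lift such a ban again.
    fn ban(&mut self, checksum: String) {
        self.applied.insert(checksum);
    }

    fn unban(&mut self, checksum: &str) {
        self.applied.remove(checksum);
    }
}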

@cgwalters (Member) commented:

Offhand idea: have Zincati "register" in such a way that rpm-ostree knows it's being driven via zincati.service. Any changes that come from elsewhere (rollback/upgrade/deploy) are rejected as long as that service is active.

So if you want to roll back, you clearly need to opt out of Zincati.

@bgilbert (Contributor) commented Aug 8, 2019

So if you want to roll back, you clearly need to opt out of Zincati.

I don't think that's what we actually want, though. We should discourage users from disabling updates, while still giving them the functionality they need, such as avoiding a particular bad release.

@lucab (Contributor) commented Aug 8, 2019

@bgilbert basically nailed it. But I think we may all be talking about the same thing, just with different terms, so I'll try to rephrase all the comments above:

  • the user should NOT be forced to manually disable auto-updates; if this occurs, it's likely a UX bug to be fixed
  • Zincati should be aware of previously deployed (or perhaps only rolled-back?) upgrades, and filter them out from the set of possible updates
  • Zincati should detect a manual non-default-deployment boot, and temporarily stun itself until the next "normal" boot (a rough check is sketched after this list)
  • Zincati should announce itself to rpm-ostree, once there is a shared protocol for doing that (see "Strengthen notion of 'update driver' in rpm-ostree status", rpm-ostree#1747)
  • wishlist: the user should be able to express preferences over the set of available update targets provided by Cincinnati
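
A rough sketch of that non-default-boot check, shelling out to rpm-ostree status --json (the function name is hypothetical; this is not Zincati's actual code):

use serde_json::Value;
use std::error::Error;
use std::process::Command;

/// Report whether the booted deployment is also the default (first) one;
/// if it is not, the agent could hold off updates until the next normal boot.
fn booted_is_default() -> Result<bool, Box<dyn Error>> {
    let out = Command::new("rpm-ostree")
        .args(["status", "--json"])
        .output()?;
    let status: Value = serde_json::from_slice(&out.stdout)?;
    let deployments = status["deployments"]
        .as_array()
        .ok_or("no deployments in rpm-ostree status output")?;
    // rpm-ostree lists the default deployment for the next boot first.
    Ok(deployments
        .first()
        .and_then(|d| d["booted"].as_bool())
        .unwrap_or(false))
}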

@jlebon has some ongoing work on deployments history at coreos/rpm-ostree#1813, so I'll wait for him to come back to brainstorm on if/how to integrate that here.

Additionally, Zincati only provides auto-update hints on top of rpm-ostree, and the user is still free to manually rollback/upgrade/deploy without having to disable Zincati. We will likely suggest stopping it, though, when performing manual transactions, to avoid unexpected results from conflicting actions going through in parallel.

@bgilbert (Contributor) commented Aug 8, 2019

Zincati should detect a manual non-default-deployment boot, and temporarily stun itself until the next "normal" boot

A regression would then require manual intervention twice: once to roll back, and once to start updates again.

@lucab (Contributor) commented Aug 26, 2019

So, we had additional chats around this topic and decided to go, in the short term, with a slightly different strategy on the Zincati side.
To avoid interference with rollbacks and non-default boots, Zincati will look for deployments already available locally and filter them out of the set of possible targets. That avoids immediate auto-updates in both the auto-rollback and manual-boot cases, thus covering the initial report.
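
A rough illustration of that filter in Rust (hypothetical names; the real implementation is tracked in coreos/zincati#111):

/// Drop update targets whose checksum matches a deployment already present
/// on disk (e.g. the rollback deployment), so that a rollback or a manual
/// non-default boot does not trigger an immediate re-update.
fn filter_targets(candidates: Vec<String>, local_deployments: &[String]) -> Vec<String> {
    candidates
        .into_iter()
        .filter(|checksum| !local_deployments.contains(checksum))
        .collect()
}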

I've split this task out as coreos/zincati#111.

@lucab (Contributor) commented May 19, 2020

This has been fixed with Zincati 0.0.6, which is already in all channels. Closing.

@lucab closed this as completed May 19, 2020