Skip to content

Conversation

keszybz
Copy link
Contributor

@keszybz keszybz commented Jul 21, 2015

While testing offline updates I noticed that reboot was scheduled twice when the packagekit offline update failed (see patch one for logs). Two fixes to make sure that either failure is returned or a reboot/shutdown in scheduled.

@keszybz
Copy link
Contributor Author

keszybz commented Jul 21, 2015

Hm, still discussing this on the mailing list. I think the patches are an improvement over status quo, but you might want to wait a bit to see if the semantics change.

@hughsie
Copy link
Collaborator

hughsie commented Jul 27, 2015

Both patches seem sane to me, but please keep me in the loop on the mailing list discussion. FWIW, fwupd also uses the offline update mechanism too.

@keszybz
Copy link
Contributor Author

keszybz commented Jul 27, 2015

On Mon, Jul 27, 2015 at 08:55:23AM -0700, Richard Hughes wrote:

Both patches seem sane to me, but please keep me in the loop on the mailing list discussion.

There hasn't been much discussion on systemd-devel yet. @poettering
is recently back from vacation, and people are working on a release.
Hopefully the subject will be picked up after that.

FWIW, fwupd also uses the offline update mechanism too.

It most likely only works as a race condition then :)

@Conan-Kudo
Copy link
Member

@keszybz Is this still something that needs to be dealt with?

@keszybz
Copy link
Contributor Author

keszybz commented Jul 12, 2016

Yeah, those two patches seem still relevant.

@Conan-Kudo
Copy link
Member

@keszybz Could you rebase the commits against master? And what happened with the mailing list discussion?

@kalev
Copy link
Collaborator

kalev commented Jan 30, 2017

@keszybz Do you have a link to the mailing list discussion that I could read?

@keszybz
Copy link
Contributor Author

keszybz commented Mar 23, 2017

Rebased and updated with a commit that should fix https://bugzilla.redhat.com/show_bug.cgi?id=1430920. Please have a look and test.

And what happened with the mailing list discussion?

There were a few discussions. Anyway, the "official" recommendations are all gathered in https://www.freedesktop.org/software/systemd/man/systemd.offline-updates.html. If anything is missing, issues/PRs against that page in the systemd bug tracker will be the most effective.

@kalev
Copy link
Collaborator

kalev commented Mar 24, 2017

Awesome, thanks for updating these, Zbigniew! I'll go and apply the 3rd patch downstream in Fedora to get the blocker fix in.

I'm a bit unsure about the first patch, "pk-offline-update: only try to shutdown or reboot on success". If the user explicitly wanted to shut down how is it a good idea to then reboot on failure? I think I'd prefer to shut down in that case as well.

Let's say a user has hacked the whole day on systemd at home and her eyes are bleeding and she wants to go to bed. She unplugs the computer and goes to turn it off. When doing that she notices that GNOME Shell offers updates as well. She chooses "Install pending software updates" and clicks Power Off and walks away. Next morning, she puts the computer in her bag, knowing it's fully updated and nearly fully charged as well, and heads off to give a presentation. When there, the computer doesn't turn on ... and she realizes that it must have rebooted instead of shutting down and it was actually powered on all nighty and the battery is fully drained now.

I'd like to avoid that if possible :)

@keszybz
Copy link
Contributor Author

keszybz commented Mar 26, 2017

If the user explicitly wanted to shut down how is it a good idea to then reboot on failure? I think I'd prefer to shut down in that case as well.

True, if you requested a shutdown, you probably want a shutdown even if the upgrade fails.

Existing code was subject to a race: a reboot or a shutdown was scheduled, but the unit has OnFailure=reboot.target, so it might be possible for the shutdown to be overridden by a reboot anyway. So it seems the right solution is to issue a big warning on failure, and schedule a shutdown or reboot, and exit with success, to avoid the OnFailure handling. If this approach is acceptable, I'll rework the patch.

@kalev
Copy link
Collaborator

kalev commented Mar 26, 2017

That sounds great to me! Thanks for working on this.

keszybz added 2 commits March 31, 2017 15:17
The unit file contains OnFailure=reboot.target, so if we return
non-zero for any reason, a reboot will be scheduled. This creates a
race condition where our request to shutdown might be overridden by
the request to reboot. To avoid the issue, always return "success"
after we have scheduled the shutdown or reboot ourselves.

However, if we fail to we fail to communicate with systemd, return
error so that systemd's failure handling is triggered. Should not
happen to often, but this makes things marginally more robust, since
a reboot is better than a hang.

Double request example:
Jul 20 21:55:00 fedora22 pk-offline-update[657]: failed to update system: cannot download Packages/g/gstreamer1-1.4.5-1.fc2
...
Jul 20 21:55:16 fedora22 systemd[1]: Trying to enqueue job reboot.target/start/replace
Jul 20 21:55:16 fedora22 systemd[1]: Enqueued job reboot.target/start as 658
Jul 20 21:55:16 fedora22 systemd[1]: packagekit-offline-update.service failed.
...
Jul 20 21:55:11 fedora22 systemd[1]: packagekit-offline-update.service: main process exited, code=exited, status=1/FAILURE
Jul 20 21:55:11 fedora22 systemd[1]: packagekit-offline-update.service changed running -> failed
Jul 20 21:55:11 fedora22 systemd[1]: Unit packagekit-offline-update.service entered failed state.
Jul 20 21:55:11 fedora22 systemd[1]: Triggering OnFailure= dependencies of packagekit-offline-update.service.
Jul 20 21:55:16 fedora22 systemd[1]: Job system-update.target/stop finished, result=done
- Change description to "Update ...", because systemd uses the
  Description when starting and stopping units, and "Starting Update
  the OS…" better than "Starting Updates the OS…".

- Add After=dbus.socket. a54c5ae added Requires=, but Requires=
  without After= is useless for socket units.

- Add DefaultDependencies=no to drop the dependency on basic.target,
  and add Requires=sysinit.target, After=sysinit.target
  systemd-journald.socket, Before=shutdown.target system-update.target
  to re-add the dependencies that are necessary.

  Because of the dependency on dbus.socket, which starts dbus.service,
  we still have a dependency on basic.target. (As soon as a connection
  to the socket is made, dbus.service will be started, and since
  dbus.service has DefaultDependencies=yes, we'll wait for
  basic.target to be reached.) But we might be started without our
  symlink being in place and exit without doing anything. In that case
  basic.target might not be needed. This allows other upgrade
  mechanism which do not require dbus to proceed.

- Change OnFailure to FailureAction — more modern and nicer syntax, and
  the target is started irreplaceably, which should be more robust.

- Change to Type=oneshot. The default is Type=simple, which means that
  systemd considers the unit started immediately, which means that it
  races with the system-update-cleanup.service which is started after
  system-update.target is reached.

  https://bugzilla.redhat.com/show_bug.cgi?id=1430920
@keszybz
Copy link
Contributor Author

keszybz commented Mar 31, 2017

Updated. I squashed the first two commits together, they make more sense as one step now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants