Zincati refuses to update after manually staging a deployment #1691
We run k8s on Fedora CoreOS with Zincati and fleetlock and hit a similar issue.
Can you also add the output of
JFI: a couple of our nodes update from Looks like the problem exists only in
One way this could happen is if you manually do e.g.
Maybe, but normally the reboot is triggered by Zincati immediately after `rpm-ostree upgrade`, and fleetlock takes care of the locking (one node at a time). As I mentioned, this only happens on
Txn FinalizeDeployment on /org/projectatomic/rpmostree1/fedora_coreos failed: Staged deployment already unlocked
Txn FinalizeDeployment on /org/projectatomic/rpmostree1/fedora_coreos failed: Staged deployment already unlocked
Txn FinalizeDeployment on /org/projectatomic/rpmostree1/fedora_coreos failed: Staged deployment already unlocked
We can confirm this - we installed nrpe on provisioned hosts with this systemd unit:
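The actual unit contents didn't make it into this report. As a purely hypothetical sketch (unit name, package, and stamp path are all assumptions), a unit following the os-extensions docs pattern could look like:

```ini
# /etc/systemd/system/rpm-ostree-install-nrpe.service (hypothetical name)
[Unit]
Description=Layer nrpe with rpm-ostree
Wants=network-online.target
After=network-online.target
# Only run once per provisioned host
ConditionPathExists=!/var/lib/rpm-ostree-install-nrpe.stamp

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rpm-ostree install --apply-live --allow-inactive nrpe
ExecStart=/bin/touch /var/lib/rpm-ostree-install-nrpe.stamp

[Install]
WantedBy=multi-user.target
```

The `--apply-live` here is what later comments tie to the bug: it leaves a staged-but-unlocked deployment behind.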
Removing this unit results in a newly provisioned host rebooting as expected after applying the update to the latest CoreOS version. Is this in some way a not-(yet-)supported use case? Or is this indeed expected to work and might it get fixed soon?
I'm guessing the Though, maybe not. Why would it only happen on fleetlock-based Zincati update systems and not on all systems with Zincati enabled?
This is shown on our test node for that command:
We will, for now, just drop the nrpe installation via rpm-ostree and run it in a container instead.
The issue is likely that the staged deployment created by rpm-ostree is unlocked, and Zincati doesn't like that. Possibly the docs at https://docs.fedoraproject.org/en-US/fedora-coreos/os-extensions/ were written before this was tightened in Zincati. One workaround is to add
If the docs as written today put the system in a state that can't be upgraded, that really isn't great. Can we get someone to confirm?
I've just run into this myself. I'm following the advice from https://docs.fedoraproject.org/en-US/fedora-coreos/os-extensions/ to layer in some extra packages on the first Ignition boot: https://github.com/samcday/home-cluster/blob/main/control-plane/config.bu#L93 I was specifically testing Zincati with fleetlock, so I deliberately re-provisioned a node with the previous stable version ( So my
And Zincati is complaining with:
Now that we've stabilized and made public the deployment finalization APIs, let's use them. This also fixes an issue where, when creating a locked deployment using the legacy API (i.e. touching the `/run/ostree/staged-deployment-locked` file before calling the staging API), if a staged deployment already exists, libostree would just nuke the lockfile (this behaviour was introduced in ostreedev/ostree#3077). In theory the legacy API (via the lockfile) should keep working, but the core issue is that there's no way for libostree to know whether the lockfile is carried-over state or was freshly created for the current invocation. So let's not try to salvage the legacy API and just move over to the new one. We already have finalization tests; they will now test that the new API functions correctly, but stop looking for the legacy lockfile. We could instead inspect the staged deployment GVariant, but these checks were redundant anyway, since the tests verify the finalization by actually rebooting and/or not using `finalize-deployment --allow-unlocked`. Fixes: coreos/fedora-coreos-tracker#1691
This should be fixed by coreos/rpm-ostree#4939. Short-term hacks around this:
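One such hack, as a sketch only (it assumes the unlocked staged deployment came from a manual `rpm-ostree install`/`--apply-live` and that it is acceptable to throw it away), is to drop the pending deployment and let Zincati stage the update again with its own finalization lock:

```shell
# Sketch of a short-term workaround (assumption: discarding the manually
# staged deployment is acceptable; Zincati will re-stage the update itself).
if command -v rpm-ostree >/dev/null 2>&1; then
  sudo rpm-ostree cleanup --pending        # discard the unlocked staged deployment
  sudo systemctl restart zincati.service   # let Zincati stage and finalize anew
else
  echo "rpm-ostree not available on this host"
fi
```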
There is a bug that causes nodes started with --apply-live to not be able to update via zincati [1]. We can add it back once the bug is taken care of. [1] coreos/fedora-coreos-tracker#1691
Describe the bug
We've been using CoreOS for a while now, but always had to disable Zincati to ensure uninterrupted availability of the container hosts. We are now trying to set up Zincati with the lock-based update strategy, but while the update does seem to get downloaded and "staged" (?), the system never reboots to apply the update.
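For context, the lock-based strategy we enabled follows the shape documented for Zincati's fleet_lock strategy; a minimal sketch (the base URL is a placeholder, not our actual endpoint) looks like:

```toml
# /etc/zincati/config.d/55-updates-strategy.toml (conventional file name)
[updates]
strategy = "fleet_lock"

[updates.fleet_lock]
# Placeholder: point at your fleetlock deployment
base_url = "http://fleetlock.example.com:8080"
```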
I'm not very familiar with the inner workings of rpm-ostree and Zincati. Is there some documentation about how exactly "deployments", "staging" and "locks" work together within ostree? Maybe I can then find more information related to the shown error:
Staged deployment already unlocked
Reproduction steps
Expected behavior
The system reboots and applies the outstanding update.
Actual behavior
The system never reboots and continuously logs:
Txn FinalizeDeployment on /org/projectatomic/rpmostree1/fedora_coreos failed: Staged deployment already unlocked
System details
Additional information