Upgrades to 123.0.0 can fail after unneeded kubelet restart #3827

Open
gdemonet opened this issue Jul 26, 2022 · 0 comments
Labels: kind:bug, topic:lifecycle, topic:salt

gdemonet commented Jul 26, 2022

Component: salt

What happened:

During a 3-node upgrade, where multiple registry "replicas" are configured but only the bootstrap node (192.168.1.100 in this example) has the 123.0.0 archive, the rolling update of kube-apiserver fails on node-2 (192.168.1.102) with:

```
salt.exceptions.CommandExecutionError: Check availability of package container-selinux failed:
[...]
http://192.168.1.102:8080/metalk8s-123.0.0/redhat/7/metalk8s-epel-el7/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
[...]
http://192.168.1.101:8080/metalk8s-123.0.0/redhat/7/metalk8s-epel-el7/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
```

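To make the failure mode concrete, a hypothetical pre-flight check (not part of MetalK8s) could probe each registry replica for the target version's repository metadata, using the hosts and URL layout from the traceback above; in the scenario described, only the bootstrap mirror would answer 200:

```python
import urllib.error
import urllib.request

# Hosts and version taken from this example setup; the URL layout
# mirrors the one in the traceback above.
MIRRORS = ["192.168.1.100", "192.168.1.101", "192.168.1.102"]
REPOMD_URL = (
    "http://{host}:8080/metalk8s-123.0.0/redhat/7/"
    "metalk8s-epel-el7/repodata/repomd.xml"
)

for host in MIRRORS:
    url = REPOMD_URL.format(host=host)
    try:
        with urllib.request.urlopen(url, timeout=5):
            print(f"{host}: OK")
    except urllib.error.HTTPError as exc:
        # Mirrors that do not have the 123.0.0 archive answer 404 here.
        print(f"{host}: HTTP {exc.code}")
    except urllib.error.URLError as exc:
        print(f"{host}: unreachable ({exc.reason})")
```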
Analysis:

The issue is caused by two main problems:

  • When running metalk8s.orchestrate.apiserver to perform the rolling upgrade, kubelet restarts because of incomplete logic in cri.wait_pod (fixed in salt: Handle duplicates in cri.wait_pod #3828). This marks the repositories-bootstrap Pod as not ready and removes it from the endpoints before the upgrade runs on node-2. At that point, node-2 sees no mirror serving the 123.0.0 version of metalk8s-epel, which causes the failure (see the sketch after this list).
  • There should never be a situation where this registry "HA setup" is in an inconsistent state before an upgrade runs; we need to implement proper management of the replicas.
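For the first bullet, here is a minimal illustrative sketch of the kind of duplicate handling #3828 introduces. This is not the actual MetalK8s cri execution module; it assumes entries shaped like the items of crictl pods -o json output, where the same Pod name can match both a stale sandbox and the current one:

```python
def select_current_sandbox(pod_entries):
    """Pick the most recently created sandbox among duplicates.

    `pod_entries` is assumed to be a list of dicts shaped like the items
    of `crictl pods -o json`: {"metadata": {"name": ...},
    "createdAt": <nanoseconds, as a string>, "state": "SANDBOX_READY" | ...}.
    """
    if not pod_entries:
        return None
    return max(pod_entries, key=lambda entry: int(entry["createdAt"]))


def pod_is_ready(pod_entries):
    """Check readiness of the *current* sandbox only."""
    current = select_current_sandbox(pod_entries)
    return current is not None and current["state"] == "SANDBOX_READY"
```

Waiting for all matching entries to become ready never succeeds while a stale sandbox lingers, and waiting for any entry can latch onto the stale one; keying the readiness check on the newest sandbox avoids both pitfalls.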