cloud-api-adaptor ds rolling update non-disruptively #1322

Closed · huoqifeng opened this issue Aug 11, 2023 · 2 comments · Fixed by #1323

Comments

huoqifeng (Contributor) commented Aug 11, 2023

Follow-up to issue #1240.
The proposed solution is to leverage the DaemonSet rolling-update feature in the CR and add additional probes, so that all PeerPod instances on a given node are confirmed running before the update moves on to the next node. Diagram:
[Diagram: DaemonSet controller rolls the cloud-api-adaptor update across nodes, waiting for the probe on each node before proceeding]

The DaemonSet controller performs a rolling update of the cloud-api-adaptor ds, with steps as below (a configuration sketch follows the list):

  1. Update the cloud-api-adaptor pod on Node-1
  2. cloud-api-adaptor pod recreated and started up on Node-1
  3. PeerPod VM-1 on Node-1 recreated and running
  4. Probes return ready (via readiness probe, startup probe, or container lifecycle hook)
  5. Update the cloud-api-adaptor pod on Node-2
  6. cloud-api-adaptor pod recreated and started up on Node-2
  7. PeerPod VM-2 on Node-2 recreated and running
  8. Probes return ready (via readiness probe, startup probe, or container lifecycle hook)
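
A minimal sketch of the corresponding DaemonSet configuration, expressed with the Kubernetes Go API types; the maxUnavailable value, probe path, port, and timings are illustrative assumptions, not the actual settings:

```go
// A minimal sketch, assuming the Kubernetes API types; values are
// illustrative, not the actual implementation.
package rollout

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// caaUpdateConfig returns an update strategy that touches one node at a time
// and a readiness probe that gates each step on the per-node probe service.
func caaUpdateConfig() (appsv1.DaemonSetUpdateStrategy, corev1.Probe) {
	maxUnavailable := intstr.FromInt(1) // update one node's cloud-api-adaptor pod at a time

	strategy := appsv1.DaemonSetUpdateStrategy{
		Type: appsv1.RollingUpdateDaemonSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDaemonSet{
			MaxUnavailable: &maxUnavailable,
		},
	}

	// The DaemonSet controller only moves to the next node once this probe
	// reports ready, i.e. once the PeerPods on the current node are running.
	readiness := corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/ready",             // hypothetical path
				Port: intstr.FromInt(8000), // hypothetical port
			},
		},
		PeriodSeconds:    10,
		FailureThreshold: 3,
	}
	return strategy, readiness
}
```

With maxUnavailable set to 1, the controller recreates the pod on one node and waits for it to report Ready before it touches the next node.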

In this way, we avoid a complete downtime window across all nodes (cloud-api-adaptor pods upgraded but the corresponding PeerPod instances not yet recreated). So we might:

  • Add an HTTP service for the probe in the cloud-api-adaptor container
  • Add probes in the cloud-api-adaptor ds yaml
  • Implement the probe service to check the status of all pod instances whose runtimeClass and nodeName fields match (see the sketch after this list)
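
A minimal sketch of what that probe service could look like, assuming client-go; the handler path, listening port, NODE_NAME environment variable (injected via the downward API), and the kata-remote runtime class name are illustrative assumptions rather than the actual implementation:

```go
// A minimal sketch of the probe service, assuming client-go; names, the
// listening port, and the runtime class value are assumptions.
package probe

import (
	"net/http"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Serve exposes /ready, which succeeds only when every peer pod scheduled to
// this node is in the Running phase.
func Serve(client kubernetes.Interface) error {
	nodeName := os.Getenv("NODE_NAME") // assumed to be injected via the downward API

	http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		// List all pods on this node via a field selector on spec.nodeName.
		pods, err := client.CoreV1().Pods("").List(r.Context(), metav1.ListOptions{
			FieldSelector: "spec.nodeName=" + nodeName,
		})
		if err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		for _, pod := range pods.Items {
			// Only consider peer pods, i.e. pods whose runtimeClass matches.
			if pod.Spec.RuntimeClassName == nil || *pod.Spec.RuntimeClassName != "kata-remote" {
				continue
			}
			if pod.Status.Phase != corev1.PodRunning {
				http.Error(w, "peer pod not running: "+pod.Name, http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return http.ListenAndServe(":8000", nil)
}
```

The readiness probe in the DaemonSet yaml would then point at this endpoint, so a node only counts as updated once its peer pods are running again.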
huoqifeng (Contributor, Author) commented

@jtumber-ibm @stevenhorsman @bpradipt @liudalibj @surajssd @jensfr This is the proposed solution based on the discussion in the weekly meeting on Wednesday; could you please give your comments on it?

@huoqifeng huoqifeng changed the title cloud-api-adaptor ds rolling update cloud-api-adaptor ds rolling update non-disruptively Aug 11, 2023
katexochen (Contributor) commented

As far as I understand, the proposal currently requires two nodes to be able to update. I think it would be beneficial if we could manage to update single-node clusters. While most production clusters likely consist of multiple nodes, for testing, especially in CI, it's much easier to run on a single-node cluster.

Would it be possible to update pods on the same node one by one to the new CAA version? A very naive approach:

  • Have two instances of the pod running on the same node
  • Start the new-version CAA pod next to the old CAA pod (changes are probably required to make that possible)
  • For one pod at a time, terminate it and recreate it under the new version

huoqifeng added a commit to huoqifeng/cloud-api-adaptor that referenced this issue Aug 22, 2023
Fixes: confidential-containers#1322

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
huoqifeng added a commit that referenced this issue Aug 22, 2023
Fixes: #1322

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
wainersm pushed a commit to wainersm/cc-cloud-api-adaptor that referenced this issue Sep 5, 2023
Fixes: confidential-containers#1322

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
lysliu pushed a commit to lysliu/cloud-api-adaptor that referenced this issue Nov 9, 2023
Fixes: confidential-containers#1322

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>