
install: add probes for caa ds rolling update #1323

Merged: 1 commit, merged Aug 22, 2023


huoqifeng
Contributor

Fixes: #1322

@huoqifeng huoqifeng temporarily deployed to external August 11, 2023 08:17 — with GitHub Actions Inactive
@huoqifeng
Contributor Author

Deployed nginx as PeerPods on a Kubernetes cluster with 2 worker nodes, then performed a rolling update of the cloud-api-adaptor DaemonSet. The output from two consoles looked like this:

  • The cloud-api-adaptor DaemonSet pods:
# kubectl -n confidential-containers-system get po --watch | grep cloud-api-adaptor
cloud-api-adaptor-daemonset-8whkj                 1/1     Running             0          13m
cloud-api-adaptor-daemonset-v8q6c                 0/1     ContainerCreating   0          3s
cloud-api-adaptor-daemonset-v8q6c                 0/1     Running             0          4s
cloud-api-adaptor-daemonset-v8q6c                 0/1     Running             0          41s
cloud-api-adaptor-daemonset-v8q6c                 1/1     Running             0          41s
cloud-api-adaptor-daemonset-8whkj                 1/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-98fk4                 0/1     Pending             0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     Pending             0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     ContainerCreating   0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     Running             0          3s
cloud-api-adaptor-daemonset-98fk4                 0/1     Running             0          41s
cloud-api-adaptor-daemonset-98fk4                 1/1     Running             0          41s
  • The nginx pods:
$ kubectl get po --watch
NAME                    READY   STATUS    RESTARTS      AGE
nginx-daemonset-5srp8   1/1     Running   2 (12m ago)   13h
nginx-daemonset-5tcgz   1/1     Running   2 (12m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   1/1     Running   3 (38s ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   1/1     Running   3 (41s ago)   13h

@huoqifeng huoqifeng changed the title [WIP] install: add probes for caa ds rolling update install: add probes for caa ds rolling update Aug 16, 2023
Member

@liudalibj liudalibj left a comment


LGTM. thanks @huoqifeng

Contributor

@tumberino tumberino left a comment


small nits

pkg/probe/checker.go: 2 review threads (outdated, resolved)
Contributor

@tumberino tumberino left a comment


LGTM, thanks @huoqifeng

I haven't had a chance to actually test this, and I'm on holiday from tomorrow (so I won't formally approve), but the code makes sense and looks good.

@huoqifeng
Contributor Author

huoqifeng commented Aug 17, 2023

Ran another test as follows:

  • Created a Service:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  ports:
    - name: port80
      port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: nginx
$ kubectl get svc |grep nginx
nginx-service   NodePort    172.21.128.171   <none>        80:31551/TCP   16h
  • The corresponding pods run on different nodes:
# kubectl get po nginx-deployment-7796887b88-gnzx8 -o yaml |grep nodeName
  nodeName: 10.244.64.11
# kubectl get po nginx-deployment-7796887b88-ltkq5 -o yaml |grep nodeName
  nodeName: 10.244.64.6
  • Created a busybox pod via:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    app: busybox
spec:
  containers:
    - name: busybox
      command: ['sh', '-c', 'while true; do date; sleep 30; done']
      image: radial/busyboxplus:curl
  • Exec'd into the busybox pod and prepared a test script:
# kubectl exec -it busybox -- /bin/sh
[ root@busybox:/ ]$ cat test.sh 
#!/bin/sh
while true; do
    curl 172.21.128.171:80
    sleep 1  # Sleep for 1 second
done
  • Updated the cloud-api-adaptor DaemonSet and monitored its pods:
# kubectl -n confidential-containers-system get po --watch | grep cloud-api-adaptor
cloud-api-adaptor-daemonset-ftmzt                 0/1     Running   0          6s
cloud-api-adaptor-daemonset-j9bxr                 1/1     Running   0          24m
cloud-api-adaptor-daemonset-ftmzt                 0/1     Running   0          40s
cloud-api-adaptor-daemonset-ftmzt                 1/1     Running   0          40s
cloud-api-adaptor-daemonset-j9bxr                 1/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-5cv68                 0/1     Pending       0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     Pending       0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     ContainerCreating   0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     Running             0          2s
cloud-api-adaptor-daemonset-5cv68                 0/1     Running             0          40s
cloud-api-adaptor-daemonset-5cv68                 1/1     Running             0          40s
  • Ran the test in the busybox pod:
[ root@busybox:/ ]$ sh test.sh 

Logs:
test-log.txt

@huoqifeng
Contributor Author

Files used for testing:

  1. nginx deploy and svc (rename it to *.yaml)
    nginx-deploy-svc.txt
  2. busybox (rename it to *.yaml)
    busybox.txt
  3. test script (rename it to *.sh)
    test.txt

@@ -139,6 +140,7 @@ func (cfg *daemonConfig) Setup() (cmd.Starter, error) {
var config cmd.Config = &daemonConfig{}

func main() {
go probe.Start()
Member


What happens if probe.Start fails? Does CAA need to bail out as well?
Also, if CAA fails to start due to an error after the probe has been started, does the goroutine need to be killed?

Contributor Author


A failed startupProbe causes the container to be killed and restarted. Per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/:

If the startup probe never succeeds, the container is killed after 300s and subject to the pod's restartPolicy
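The 300s figure in that quote comes from the example on the same Kubernetes page, which uses `failureThreshold: 30` and `periodSeconds: 10`; the values set on the CAA DaemonSet in this PR are not shown in this thread. With those example values, and ignoring `initialDelaySeconds`, the startup budget works out as:

```latex
t_{\max} \approx \mathtt{failureThreshold} \times \mathtt{periodSeconds} = 30 \times 10\,\mathrm{s} = 300\,\mathrm{s}
```

Until the startup probe succeeds, the liveness and readiness checks are disabled; after it succeeds once, they take over.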

Member


It would be good to add a comment in the code to help new devs understand why the error from the goroutine is not handled.
Something like: "We don't handle the error from the probe goroutine, since if the goroutine fails to start, the probe configured as part of the deployment will fail and the CAA container will be killed and restarted."

pkg/probe/checker.go: review thread (outdated, resolved)
pkg/probe/probe.go: review thread (outdated, resolved)
Fixes: confidential-containers#1322

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
Member

@bpradipt bpradipt left a comment


/lgtm
Thanks @huoqifeng

@huoqifeng huoqifeng merged commit b3c073a into confidential-containers:main Aug 22, 2023
12 of 13 checks passed
Successfully merging this pull request may close these issues.

cloud-api-adaptor ds rolling update non-disruptively
4 participants