install: add probes for caa ds rolling update #1323

huoqifeng · 2023-08-11T08:17:53Z

huoqifeng · 2023-08-15T01:53:15Z

Deployed an nginx PeerPods deployment on a k8s cluster with 2 workers, then performed a rolling update of the cloud-api-adaptor ds. The logs from the two consoles look like this:

The cloud-api-adaptor ds:

# kubectl -n confidential-containers-system get po --watch | grep cloud-api-adaptor
cloud-api-adaptor-daemonset-8whkj                 1/1     Running             0          13m
cloud-api-adaptor-daemonset-v8q6c                 0/1     ContainerCreating   0          3s
cloud-api-adaptor-daemonset-v8q6c                 0/1     Running             0          4s
cloud-api-adaptor-daemonset-v8q6c                 0/1     Running             0          41s
cloud-api-adaptor-daemonset-v8q6c                 1/1     Running             0          41s
cloud-api-adaptor-daemonset-8whkj                 1/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-8whkj                 0/1     Terminating         0          13m
cloud-api-adaptor-daemonset-98fk4                 0/1     Pending             0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     Pending             0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     ContainerCreating   0          0s
cloud-api-adaptor-daemonset-98fk4                 0/1     Running             0          3s
cloud-api-adaptor-daemonset-98fk4                 0/1     Running             0          41s
cloud-api-adaptor-daemonset-98fk4                 1/1     Running             0          41s

The nginx pods:

$ kubectl get po --watch
NAME                    READY   STATUS    RESTARTS      AGE
nginx-daemonset-5srp8   1/1     Running   2 (12m ago)   13h
nginx-daemonset-5tcgz   1/1     Running   2 (12m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   0/1     Error     2             13h
nginx-daemonset-5srp8   1/1     Running   3 (38s ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2 (13m ago)   13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   0/1     Error     2             13h
nginx-daemonset-5tcgz   1/1     Running   3 (41s ago)   13h
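
To read the watch output above more easily, the following Go sketch extracts, for each pod, the AGE at which its READY column first showed all containers ready. The helper names `isReady` and `firstReadyAge` are made up for illustration and are not part of this PR; the sample lines are copied from the cloud-api-adaptor log above.

```go
package main

import (
	"fmt"
	"strings"
)

// isReady reports whether a READY column value such as "1/1" means that
// every container in the pod is ready. (Hypothetical helper.)
func isReady(col string) bool {
	ready, total, ok := strings.Cut(col, "/")
	return ok && total != "0" && ready == total
}

// firstReadyAge scans `kubectl get po --watch` lines and returns, per pod,
// the AGE column of the first line on which the pod was fully ready.
// AGE is always the last column, even when RESTARTS reads "2 (12m ago)".
func firstReadyAge(lines []string) map[string]string {
	ages := make(map[string]string)
	for _, line := range lines {
		f := strings.Fields(line)
		if len(f) < 5 || f[0] == "NAME" {
			continue // skip the header and malformed lines
		}
		name, age := f[0], f[len(f)-1]
		if _, seen := ages[name]; !seen && isReady(f[1]) {
			ages[name] = age
		}
	}
	return ages
}

func main() {
	lines := []string{
		"cloud-api-adaptor-daemonset-v8q6c   0/1   ContainerCreating   0   3s",
		"cloud-api-adaptor-daemonset-v8q6c   0/1   Running             0   4s",
		"cloud-api-adaptor-daemonset-v8q6c   0/1   Running             0   41s",
		"cloud-api-adaptor-daemonset-v8q6c   1/1   Running             0   41s",
	}
	fmt.Println(firstReadyAge(lines)["cloud-api-adaptor-daemonset-v8q6c"]) // prints 41s
}
```

In the log above, both new adaptor pods first report 1/1 at an age of 41s, and the old pod on the second node is only terminated after the first new pod is ready.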

liudalibj

LGTM. thanks @huoqifeng

pkg/probe/probe.go

tumberino

small nits

pkg/probe/checker.go

tumberino

LGTM, thanks @huoqifeng

I haven't had a chance to actually test this and I'm on holiday from tomorrow (so I won't formally approve) but the code makes sense and looks good.

huoqifeng · 2023-08-17T01:29:02Z

Ran another test, as follows.

Created a service:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  ports:
    - name: port80
      port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: nginx

$ kubectl get svc |grep nginx
nginx-service   NodePort    172.21.128.171   <none>        80:31551/TCP   16h

The corresponding pods are on different nodes:

# kubectl get po nginx-deployment-7796887b88-gnzx8 -o yaml |grep nodeName
  nodeName: 10.244.64.11
# kubectl get po nginx-deployment-7796887b88-ltkq5 -o yaml |grep nodeName
  nodeName: 10.244.64.6

Created a busybox pod via:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    app: busybox
spec:
  containers:
    - name: busybox
      command: ['sh', '-c', 'while true; do date; sleep 30; done']
      image: radial/busyboxplus:curl

Exec'ed into the busybox pod and prepared a test script:

# kubectl exec -it busybox /bin/sh
[ root@busybox:/ ]$ cat test.sh 
#!/bin/bash
while true; do
    curl 172.21.128.171:80
    sleep 1  # Sleep for 1 second
done

Updated the cloud-api-adaptor ds and monitored the pods:

# kubectl -n confidential-containers-system get po --watch | grep cloud-api-adaptor
cloud-api-adaptor-daemonset-ftmzt                 0/1     Running   0          6s
cloud-api-adaptor-daemonset-j9bxr                 1/1     Running   0          24m
cloud-api-adaptor-daemonset-ftmzt                 0/1     Running   0          40s
cloud-api-adaptor-daemonset-ftmzt                 1/1     Running   0          40s
cloud-api-adaptor-daemonset-j9bxr                 1/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-j9bxr                 0/1     Terminating   0          24m
cloud-api-adaptor-daemonset-5cv68                 0/1     Pending       0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     Pending       0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     ContainerCreating   0          0s
cloud-api-adaptor-daemonset-5cv68                 0/1     Running             0          2s
cloud-api-adaptor-daemonset-5cv68                 0/1     Running             0          40s
cloud-api-adaptor-daemonset-5cv68                 1/1     Running             0          40s

Ran the test in the busybox pod:

[ root@busybox:/ ]$ sh test.sh

Logs:
test-log.txt

huoqifeng · 2023-08-17T01:43:21Z

Files used for testing:

nginx deploy and svc (rename it to *.yaml)
nginx-deploy-svc.txt
busybox (rename it to *.yaml)
busybox.txt
test script (rename it to *.sh)
test.txt

bpradipt · 2023-08-17T09:07:51Z

cmd/cloud-api-adaptor/main.go

@@ -139,6 +140,7 @@ func (cfg *daemonConfig) Setup() (cmd.Starter, error) {
 var config cmd.Config = &daemonConfig{}

 func main() {
+	go probe.Start()


What happens if probe.Start fails? Does CAA need to bail out as well?
Also, if CAA fails to start because of an error after the probe has been started, does the goroutine need to be killed?

A failed startupProbe will cause the container to be killed and restarted, per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

If the startup probe never succeeds, the container is killed after 300s and subject to the pod's restartPolicy

It would be good to add a comment in the code to help new devs understand why the error from the goroutine is not handled.
Something like: "We don't handle errors from the probe goroutine, since if the goroutine fails to start, the probe configured as part of the deployment will fail and the CAA container will be killed and restarted."

pkg/probe/checker.go

pkg/probe/probe.go

Fixes: confidential-containers#1322 Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>

bpradipt

/lgtm
Thanks @huoqifeng

huoqifeng temporarily deployed to external August 11, 2023 08:17 — with GitHub Actions Inactive

huoqifeng force-pushed the readiness branch from dfb02fb to 689a280 Compare August 14, 2023 02:59

huoqifeng had a problem deploying to external August 14, 2023 02:59 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from 689a280 to 1766e82 Compare August 14, 2023 03:03

huoqifeng had a problem deploying to external August 14, 2023 03:03 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from 1766e82 to e7874aa Compare August 14, 2023 03:22

huoqifeng had a problem deploying to external August 14, 2023 03:22 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from e7874aa to dd99dc1 Compare August 14, 2023 03:45

huoqifeng had a problem deploying to external August 14, 2023 03:45 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from dd99dc1 to fbee8eb Compare August 14, 2023 05:50

huoqifeng had a problem deploying to external August 14, 2023 05:50 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from fbee8eb to dc0447f Compare August 14, 2023 05:56

huoqifeng had a problem deploying to external August 14, 2023 05:56 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from dc0447f to 6ed4a55 Compare August 14, 2023 07:01

huoqifeng had a problem deploying to external August 14, 2023 07:01 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from 6ed4a55 to 6b5ce56 Compare August 14, 2023 12:07

huoqifeng had a problem deploying to external August 14, 2023 12:07 — with GitHub Actions Failure

huoqifeng had a problem deploying to external August 15, 2023 01:51 — with GitHub Actions Failure

huoqifeng had a problem deploying to external August 15, 2023 08:20 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from f0a5124 to e945e08 Compare August 15, 2023 08:47

huoqifeng had a problem deploying to external August 15, 2023 08:48 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from e945e08 to 9b2f94e Compare August 16, 2023 05:12

huoqifeng had a problem deploying to external August 16, 2023 05:12 — with GitHub Actions Failure

huoqifeng requested review from bpradipt, snir911, liudalibj and stevenhorsman August 16, 2023 07:24

huoqifeng changed the title ~~[WIP] install: add probes for caa ds rolling update~~ install: add probes for caa ds rolling update Aug 16, 2023

liudalibj approved these changes Aug 16, 2023

huoqifeng requested a review from tumberino August 16, 2023 09:12

tumberino reviewed Aug 16, 2023

pkg/probe/probe.go (resolved)

tumberino reviewed Aug 16, 2023

pkg/probe/checker.go (outdated, resolved)

pkg/probe/checker.go (outdated, resolved)

huoqifeng had a problem deploying to external August 16, 2023 15:01 — with GitHub Actions Failure

huoqifeng force-pushed the readiness branch from a114bc9 to c12473d Compare August 16, 2023 15:06

huoqifeng had a problem deploying to external August 16, 2023 15:06 — with GitHub Actions Failure

tumberino reviewed Aug 16, 2023

bpradipt reviewed Aug 17, 2023

pkg/probe/checker.go (outdated, resolved)

bpradipt reviewed Aug 21, 2023

pkg/probe/probe.go (outdated, resolved)

huoqifeng requested a review from yoheiueda August 22, 2023 01:56

huoqifeng had a problem deploying to external August 22, 2023 08:15 — with GitHub Actions Failure

huoqifeng had a problem deploying to external August 22, 2023 11:31 — with GitHub Actions Failure

install: add probes for caa ds rolling update

f1a9294

Fixes: confidential-containers#1322 Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>

huoqifeng force-pushed the readiness branch from 24985e7 to f1a9294 Compare August 22, 2023 11:36

huoqifeng had a problem deploying to external August 22, 2023 11:36 — with GitHub Actions Failure

bpradipt approved these changes Aug 22, 2023

huoqifeng merged commit b3c073a into confidential-containers:main Aug 22, 2023
12 of 13 checks passed

liudalibj mentioned this pull request Sep 7, 2023

Add additional check in cloud-api-adaptor startup probe #1417

Closed
