Attempt to update kata-runtime to point to the main version #1596

Closed
stevenhorsman opened this issue Nov 22, 2023 · 14 comments · Fixed by #1754

@stevenhorsman
Member

As part of the merge to main effort, we have kata-containers/kata-containers#7046, which adds the remote hypervisor feature to the kata runtime. Once this is merged we should test whether the CAA can re-vendor on it and see what issues there are. We also know that as part of these changes we want to remove the gogoprotobuf workaround

cloud-api-adaptor/go.mod

Lines 162 to 164 in eb1b368

// The following line is a workaround for the issue described in https://github.com/containerd/ttrpc/issues/62
// We can remove this workaround when Kata stops using github.com/gogo/protobuf
replace google.golang.org/genproto => google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8
and align with kata, where we had to do the same thing.
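
For reference, the re-vendor plus workaround removal should be roughly this shape (a sketch only, assuming the kata runtime module path stays github.com/kata-containers/kata-containers/src/runtime and that it is run from the directory containing go.mod):

# point the runtime dependency at main, drop the genproto pin, then tidy and build
go get github.com/kata-containers/kata-containers/src/runtime@main
go mod edit -dropreplace google.golang.org/genproto
go mod tidy
go build ./...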

@stevenhorsman stevenhorsman self-assigned this Nov 22, 2023
@stevenhorsman
Member Author

After a lot of compilation issues, I've got it all compiling now and have built a CAA OCI image from it, but when testing it doesn't work:

Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               35m                 default-scheduler  Successfully assigned default/alpine to sh-libvirt-s390x-e2e-22-04-test-4
  Warning  FailedCreatePodSandBox  7s (x160 over 35m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: remote hypervisor call failed: ttrpc: closed: unknown

so I've clearly broken something during the change.
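
A few checks that might narrow down where the ttrpc connection is being closed (a sketch; the socket path and daemonset name are the ones used later in this thread and may differ on other setups):

# is the CAA daemon up and logging anything when the sandbox is created?
kubectl logs -n confidential-containers-system ds/cloud-api-adaptor-daemonset --tail=100
# on the worker node: does the remote hypervisor socket the shim dials exist?
ls -l /run/peerpod/hypervisor.sock
# any shim/runtime errors from containerd around the failure?
sudo journalctl -u containerd --since "10 min ago" | grep -i kata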

@stevenhorsman
Member Author

Beraldo has recommended re-generating the hypervisor protos with ttrpc rather than grpc, so I've created a branch that has that change in kata-runtime: https://github.com/kata-containers/kata-containers/compare/main...stevenhorsman:hypervisor-ttrpc?expand=1
I'm now reworking the changes to undo some of those related to the ttrpc -> grpc change.
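
For anyone following along, the regeneration is roughly of this shape (a sketch only: the plugin installs and the proto path are assumptions, and the actual change is in the branch linked above):

# install the protobuf and ttrpc code generators, then regenerate the hypervisor service
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install github.com/containerd/ttrpc/cmd/protoc-gen-go-ttrpc@latest
protoc --go_out=. --go-ttrpc_out=. \
        src/runtime/protocols/hypervisor/hypervisor.proto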

@stevenhorsman
Member Author

I've updated my branch to use my fork of the kata runtime with the ttrpc changes and I think I get further now, as when I try to start a peer pod the error is:

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               7m20s                  default-scheduler  Successfully assigned default/nginx-secret-pod to peer-pods-worker-0
  Warning  FailedCreatePodSandBox  114s (x26 over 7m20s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd container: create prepare snapshot dir: failed to create temp dir: stat /var/lib/containerd-nydus/snapshots: no such file or directory: unknown
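
That path suggests the runtime is being pointed at the nydus snapshotter, but the snapshotter itself isn't installed or running on the worker. Some checks that may help (a sketch; the service and socket names are the upstream nydus-snapshotter defaults, not necessarily how it gets deployed here):

# is the snapshotter daemon running and its gRPC socket present?
systemctl status nydus-snapshotter || true
ls -l /run/containerd-nydus/containerd-nydus-grpc.sock
# is a nydus proxy snapshotter registered with containerd?
sudo ctr plugins ls | grep -i snapshotter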

@stevenhorsman
Member Author

We've been tracking the status of this in a Slack thread. A rough summary is:

  • After the initial ttrpc switch we still had issues, so we re-generated the kata hypervisor protos and they seemed to be working better
  • This has let us get to the point that the kata-runtime can talk to the CAA, as this bit of the logs shows:
2023/11/29 13:48:42 [adaptor/proxy] CreateContainer: containerID:18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804
2023/11/29 13:48:42 [adaptor/proxy]     mounts:
2023/11/29 13:48:42 [adaptor/proxy]         destination:/proc source:proc type:proc
2023/11/29 13:48:42 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2023/11/29 13:48:42 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2023/11/29 13:48:42 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2023/11/29 13:48:42 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2023/11/29 13:48:42 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2023/11/29 13:48:42 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804-5c2378f47edc3280-resolv.conf type:bind
2023/11/29 13:48:42 [adaptor/proxy]     annotations:
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-shares: 2
2023/11/29 13:48:42 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-quota: 0
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.container-type: sandbox
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-memory: 0
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: 77aae7ac-3160-4e79-9381-74931196d7b1
2023/11/29 13:48:42 [adaptor/proxy]         nerdctl/network-namespace: /var/run/netns/cni-459f5739-6f42-6c74-6592-c33541e1cfd4
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-period: 100000
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx_77aae7ac-3160-4e79-9381-74931196d7b1
2023/11/29 13:48:42 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: 18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804
2023/11/29 13:48:42 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_sandbox
2023/11/29 13:48:42 [adaptor/proxy] Pulling image separately not support on main
2023/11/29 13:48:43 [adaptor/proxy] CreateContainer fails: rpc error: code = Internal desc = failed to mount /run/kata-containers/shared/containers/18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804/rootfs to /run/kata-containers/18a7d2fd310f9d92b2ebf495f50bd5f87a3d8a40bd048432493742f456325804/rootfs, with error: ENOENT: No such file or directory

It also shows that nydus options aren't being set. This is for at least two reasons:

  • kata-deploy in main doesn't support setting the containerd config runtime's snapshotter at the moment (a sketch of the manual workaround is below).
  • None of the nydus mount related changes have been merged into main yet.
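
A sketch of the manual workaround for the first point, until kata-deploy can do it: add a per-runtime snapshotter to the existing kata-remote entry in the containerd config and restart containerd (the exact section name assumes containerd 1.7's CRI plugin layout):

# merge this stanza into the existing kata-remote runtime entry in
# /etc/containerd/config.toml:
#
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-remote]
#     snapshotter = "nydus"
#
# then restart containerd so it is picked up
sudo systemctl restart containerd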

I think we are now blocked on these items before we can go much further, but hopefully my kata runtime PR: https://github.com/kata-containers/kata-containers/pull/8520/commits can be merged in the meantime.

@stevenhorsman
Member Author

We've managed to get some of the steps required for this upstreamed now.

There are remaining issues that will need resolving before we can be unblocked here (and there might be more after).

@stevenhorsman
Member Author

Just an update on this: now that the agent supports image pull on the guest, I've done a bunch of PoC work to push us further in the right direction. The current state is that nydus_snapshotter isn't putting the correct annotation into the storage driver for us to pull on the guest. Fabiano is also seeing this with the local hypervisor, so hopefully we can work it out between us...

@stevenhorsman
Member Author

The problem we were hitting is noted as Issue 4 here: kata-containers/kata-containers#8407 (comment)

I ran the following script, kindly provided by Fabiano, on the worker:

	test_images_to_remove=(
		"docker.io/rancher/mirrored-pause"
		"registry.k8s.io/pause"
		"quay.io/sjenning/nginx"
		"quay.io/prometheus/busybox"
		"quay.io/confidential-containers/test-images"
	)

	ctr_args=""
	if [ "${KUBERNETES}" = "k3s" ]; then
		ctr_args="--address	 /run/k3s/containerd/containerd.sock "
	fi
	ctr_args+="--namespace k8s.io"
	ctr_command="sudo -E ctr ${ctr_args}"
	for related_image in "${test_images_to_remove[@]}"; do
		# We need to delete related image
		image_list=($(${ctr_command} i ls -q |grep "$related_image" |awk '{print $1}'))
		if [ "${#image_list[@]}" -gt 0 ]; then
			for image in "${image_list[@]}"; do
				${ctr_command} i remove "$image"
			done
		fi
		# We need to delete related content of image
		IFS="/" read -ra parts <<< "$related_image"; 
		repository="${parts[0]}";     
		image_name="${parts[1]}";
		formatted_image="${parts[0]}=${parts[-1]}"
		image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
		if [ "${#image_contents[@]}" -gt 0 ]; then
			for content in "${image_contents[@]}"; do
				${ctr_command} content rm "$content"
			done
		fi
	done

and after that the image pull on the guest worked and the container is up and running:

2024/04/11 09:50:38 [adaptor/proxy]     storages:
2024/04/11 09:50:38 [adaptor/proxy]         mount_point:/run/kata-containers/7ce95eeeb93faef7640d7640c0d46ddd5128b4c81904478fc80ab945fedc4b56/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          76s

so we just need a way to do this more easily on the worker node...
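
One way to make that less manual is to push the cleanup script out to every worker from the machine with the kubeconfig (a sketch: clean-image-cache.sh is a hypothetical name for the script above, and the kcli scp/ssh invocation is an assumption):

for node in $(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o jsonpath='{.items[*].metadata.name}'); do
        # copy the script to the worker and run it there
        kcli scp clean-image-cache.sh "${node}":/tmp/clean-image-cache.sh
        kcli ssh "${node}" "bash /tmp/clean-image-cache.sh"
done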

@stevenhorsman
Member Author

stevenhorsman commented Apr 19, 2024

OK, I'm going to try to describe how to reliably and reproducibly set up a dev environment for testing this. I'm using libvirt with a kcli cluster on a pretty chunky 16 vCPU, 32GB RAM VM, but I'm not sure it strictly needs to be that large. I will also do some steps, like pushing the podvm image, that are just so people following along can use my images and save some time:

  • Clone my repo (edit this to point at your own fork if you have a branch based on mine)
export GOPATH="${HOME}/go"
cloud_api_adaptor_repo="github.com/confidential-containers/cloud-api-adaptor"
cloud_api_adaptor_dir="${GOPATH}/src/${cloud_api_adaptor_repo}"
mkdir -p $(dirname "${cloud_api_adaptor_dir}")
git clone -b main "https://${cloud_api_adaptor_repo}.git" "${cloud_api_adaptor_dir}"
pushd $cloud_api_adaptor_dir/src/cloud-api-adaptor
git remote add sh https://github.com/stevenhorsman/cloud-api-adaptor.git
git fetch sh
git checkout -b kata-runtime-bump sh/kata-runtime-bump
  • Setup libvirt and create a kcli cluster
./libvirt/config_libvirt.sh
./libvirt/kcli_cluster.sh create
export KUBECONFIG=$HOME/.kcli/clusters/peer-pods/auth/kubeconfig
  • Install some pre-reqs
sudo snap install yq
yq --version
echo "Install docker"
sudo snap install docker
sudo systemctl start snap.docker.dockerd
sudo systemctl enable snap.docker.dockerd
  • Build and publish a podvm image (you can skip this and use mine if you want)
make podvm-builder podvm-binaries podvm-image
docker image tag quay.io/confidential-containers/podvm-generic-ubuntu-amd64 quay.io/stevenhorsman/podvm-generic-ubuntu-amd64
docker image tag quay.io/confidential-containers/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc
docker login quay.io
docker push quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc
  • Download the podvm qcow2
pushd podvm
./hack/download-image.sh ghcr.io/confidential-containers/podvm-generic-ubuntu-amd64:ci-pr1754 . -o podvm.qcow2
popd
  • Prepare the libvirt podvm volume
export IMAGE="${PWD}/podvm/podvm.qcow2"
ls -al $IMAGE
virsh -c qemu:///system vol-create-as --pool default --name podvm-base.qcow2 --capacity 20G --allocation 2G --prealloc-metadata --format qcow2
virsh -c qemu:///system vol-upload --vol podvm-base.qcow2 $IMAGE --pool default --sparse
virsh -c qemu:///system vol-info --pool default podvm-base.qcow2
  • Install the operator and libvirt CAA
export LIBVIRT_IP="192.168.122.1"
export SSH_KEY_FILE="id_rsa"
./libvirt/install_operator.sh

You should now have pods:

# kubectl get pods -n confidential-containers-system
NAME                                              READY   STATUS              RESTARTS   AGE
cc-operator-controller-manager-767d88bbb4-9ppb6   2/2     Running             0          2m42s
cc-operator-daemon-install-6kxb5                  0/1     ContainerCreating   0          103s
cc-operator-pre-install-daemon-zsvk6              1/1     Running             0          2m21s
cloud-api-adaptor-daemonset-dvskj                 1/1     Running             0          2m42s
peerpodconfig-ctrl-caa-daemon-v6mvd               1/1     Running             0          2m21s

Note: with the new operator peer pods approach we now have two CAA pods - the normal one and the peerpodconfig-ctrl one. This needs resolving, but in the short term we will edit both to use our CAA image.

  • Create a new CAA image (this is optional - you can use mine: quay.io/stevenhorsman/cloud-api-adaptor:dev-c7c48d677c2c1f4c7f4085c4e663d4a10daf1b2f-dirty)
docker login quay.io
registry=quay.io/stevenhorsman make image
  • Edit both CAA daemonsets to use our new quay.io/stevenhorsman/cloud-api-adaptor:dev-c7c48d677c2c1f4c7f4085c4e663d4a10daf1b2f-dirty image (a non-interactive alternative is sketched below)
kubectl edit ds/peerpodconfig-ctrl-caa-daemon -n confidential-containers-system
kubectl edit ds/cloud-api-adaptor-daemonset -n confidential-containers-system
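
A non-interactive alternative to the two edit commands (a sketch; it assumes the CAA container is the first container in both daemonset pod templates):

CAA_IMAGE="quay.io/stevenhorsman/cloud-api-adaptor:dev-c7c48d677c2c1f4c7f4085c4e663d4a10daf1b2f-dirty"
for ds in peerpodconfig-ctrl-caa-daemon cloud-api-adaptor-daemonset; do
        kubectl -n confidential-containers-system patch ds "${ds}" --type=json \
                -p "[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/image\", \"value\": \"${CAA_IMAGE}\"}]"
done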
  • Log into the worker nodes and clean the pause image cache:
kcli ssh peer-pods-worker-0

test_images_to_remove=(
                "docker.io/rancher/mirrored-pause"
                "registry.k8s.io/pause"
                "quay.io/sjenning/nginx"
                "quay.io/prometheus/busybox"
                "quay.io/confidential-containers/test-images"
        )

        ctr_args=""
        if [ "${KUBERNETES}" = "k3s" ]; then
                ctr_args="--address      /run/k3s/containerd/containerd.sock "
        fi
        ctr_args+="--namespace k8s.io"
        ctr_command="sudo -E ctr ${ctr_args}"
        for related_image in "${test_images_to_remove[@]}"; do
                # We need to delete related image
                image_list=($(${ctr_command} i ls -q |grep "$related_image" |awk '{print $1}'))
                if [ "${#image_list[@]}" -gt 0 ]; then
                        for image in "${image_list[@]}"; do
                                ${ctr_command} i remove "$image"
                        done
                fi
                # We need to delete related content of image
                IFS="/" read -ra parts <<< "$related_image";
                repository="${parts[0]}";
                image_name="${parts[1]}";
                formatted_image="${parts[0]}=${parts[-1]}"
                image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
                if [ "${#image_contents[@]}" -gt 0 ]; then
                        for content in "${image_contents[@]}"; do
                                ${ctr_command} content rm "$content"
                        done
                fi
        done
  • Test it by creating a peer pod:
echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        io.containerd.cri.runtime-handler: kata-remote
    spec:
      runtimeClassName: kata-remote
      containers:
      - image: nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279
        name: nginx
' | kubectl apply -f -

After waiting a little while you should see the pod running:

# kubectl get pods --watch
NAME                     READY   STATUS              RESTARTS   AGE
nginx-5bb58f7796-lqvr5   0/1     ContainerCreating   0          11s
nginx-5bb58f7796-lqvr5   1/1     Running             0          57s
  • We can check the CAA logs to see that it was pulled correctly (depending on the ordering you might need to check the peerpodconfig-ctrl version of the CAA ds):
# kubectl logs -f ds/cloud-api-adaptor-daemonset -n confidential-containers-system
+ exec cloud-api-adaptor libvirt -uri 'qemu+ssh://root@192.168.122.1/system?no_verify=1' -data-dir /opt/data-dir -pods-dir /run/peerpod/pods -network-name default -pool-name default -disable-cvm -socket /run/peerpod/hypervisor.sock
cloud-api-adaptor version v0.8.2-dev
  commit: c7c48d677c2c1f4c7f4085c4e663d4a10daf1b2f-dirty
  go: go1.21.9
cloud-api-adaptor: starting Cloud API Adaptor daemon for "libvirt"
2024/04/19 16:14:46 [adaptor/cloud] Cloud provider external plugin loading is disabled, skipping plugin loading
2024/04/19 16:14:46 [adaptor/cloud/libvirt] libvirt config: &libvirt.Config{URI:"qemu+ssh://root@192.168.122.1/system?no_verify=1", PoolName:"default", NetworkName:"default", DataDir:"/opt/data-dir", DisableCVM:true, VolName:"podvm-base.qcow2", LaunchSecurity:"", Firmware:"/usr/share/edk2/ovmf/OVMF_CODE.fd"}
2024/04/19 16:14:46 [adaptor/cloud/libvirt] Created libvirt connection
2024/04/19 16:14:46 [adaptor] server config: &adaptor.ServerConfig{TLSConfig:(*tlsutil.TLSConfig)(0xc00071bc80), SocketPath:"/run/peerpod/hypervisor.sock", CriSocketPath:"", PauseImage:"", PodsDir:"/run/peerpod/pods", ForwarderPort:"15150", ProxyTimeout:300000000000, AAKBCParams:"", EnableCloudConfigVerify:false}
2024/04/19 16:14:46 [util/k8sops] initialized PeerPodService
2024/04/19 16:14:46 [probe/probe] Using port: 8000
2024/04/19 16:14:46 [adaptor] server started
...
2024/04/19 16:15:25 [probe/probe] All PeerPods standup. we do not check the PeerPods status any more.
2024/04/19 16:18:46 [podnetwork] routes on netns /var/run/netns/cni-f63410de-a7a4-08a5-2c26-3b51f4146cbc
2024/04/19 16:18:46 [podnetwork]     0.0.0.0/0 via 10.244.1.1 dev eth0
2024/04/19 16:18:46 [podnetwork]     10.244.0.0/16 via 10.244.1.1 dev eth0
2024/04/19 16:18:46 [adaptor/cloud] Credentials file is not in a valid Json format, ignored
2024/04/19 16:18:46 [adaptor/cloud] stored /run/peerpod/pods/54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432/daemon.json
2024/04/19 16:18:46 [adaptor/cloud] create a sandbox 54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432 for pod nginx-5bb58f7796-lqvr5 in namespace default (netns: /var/run/netns/cni-f63410de-a7a4-08a5-2c26-3b51f4146cbc)
2024/04/19 16:18:46 [adaptor/cloud/libvirt] LaunchSecurityType: None
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Checking if instance (podvm-nginx-5bb58f7796-lqvr5-54fb9aba) exists
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Uploaded volume key /var/lib/libvirt/images/podvm-nginx-5bb58f7796-lqvr5-54fb9aba-root.qcow2
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Create cloudInit iso
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Uploading iso file: podvm-nginx-5bb58f7796-lqvr5-54fb9aba-cloudinit.iso
2024/04/19 16:18:46 [adaptor/cloud/libvirt] 45056 bytes uploaded
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Volume ID: /var/lib/libvirt/images/podvm-nginx-5bb58f7796-lqvr5-54fb9aba-cloudinit.iso
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Create XML for 'podvm-nginx-5bb58f7796-lqvr5-54fb9aba'
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Creating VM 'podvm-nginx-5bb58f7796-lqvr5-54fb9aba'
2024/04/19 16:18:46 [adaptor/cloud/libvirt] Starting VM 'podvm-nginx-5bb58f7796-lqvr5-54fb9aba'
2024/04/19 16:18:47 [adaptor/cloud/libvirt] VM id 3
2024/04/19 16:19:32 [adaptor/cloud/libvirt] Instance created successfully
2024/04/19 16:19:32 [adaptor/cloud/libvirt] created an instance podvm-nginx-5bb58f7796-lqvr5-54fb9aba for sandbox 54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:32 [util/k8sops] nginx-5bb58f7796-lqvr5 is now owning a PeerPod object
2024/04/19 16:19:32 [adaptor/cloud] created an instance podvm-nginx-5bb58f7796-lqvr5-54fb9aba for sandbox 54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:32 [tunneler/vxlan] vxlan ppvxlan1 (remote 192.168.122.17:4789, id: 555000) created at /proc/1/task/15/ns/net
2024/04/19 16:19:32 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/15/ns/net
2024/04/19 16:19:33 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-f63410de-a7a4-08a5-2c26-3b51f4146cbc
2024/04/19 16:19:33 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-f63410de-a7a4-08a5-2c26-3b51f4146cbc
2024/04/19 16:19:33 [adaptor/proxy] Listening on /run/peerpod/pods/54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432/agent.ttrpc
2024/04/19 16:19:33 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2024/04/19 16:19:33 [adaptor/proxy] Trying to establish agent proxy connection to 192.168.122.17:15150
2024/04/19 16:19:33 [adaptor/proxy] established agent proxy connection to 192.168.122.17:15150
2024/04/19 16:19:33 [adaptor/cloud] agent proxy is ready
2024/04/19 16:19:33 [adaptor/proxy] CreateSandbox: hostname:nginx-5bb58f7796-lqvr5 sandboxId:54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:33 [adaptor/proxy]     storages:
2024/04/19 16:19:33 [adaptor/proxy]         mountpoint:/run/kata-containers/sandbox/shm source:shm fstype:tmpfs driver:ephemeral
2024/04/19 16:19:33 [adaptor/proxy] CreateContainer: containerID:54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:33 [adaptor/proxy]     mounts:
2024/04/19 16:19:33 [adaptor/proxy]         destination:/proc source:proc type:proc
2024/04/19 16:19:33 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2024/04/19 16:19:33 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2024/04/19 16:19:33 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2024/04/19 16:19:33 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2024/04/19 16:19:33 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/19 16:19:33 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432-c4a81fd2ad7f9069-resolv.conf type:bind
2024/04/19 16:19:33 [adaptor/proxy]     annotations:
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-period: 100000
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.container-type: sandbox
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: b6c142b8-b0fe-4c31-b986-08bec12d1672
2024/04/19 16:19:33 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_sandbox
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx-5bb58f7796-lqvr5_b6c142b8-b0fe-4c31-b986-08bec12d1672
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx-5bb58f7796-lqvr5
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: 54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:33 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-memory: 0
2024/04/19 16:19:33 [adaptor/proxy]         nerdctl/network-namespace: /var/run/netns/cni-f63410de-a7a4-08a5-2c26-3b51f4146cbc
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-shares: 2
2024/04/19 16:19:33 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-quota: 0
2024/04/19 16:19:33 [adaptor/proxy]     storages:
2024/04/19 16:19:33 [adaptor/proxy]         mount_point:/run/kata-containers/54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432/rootfs source:pause fstype:overlay driver:image_guest_pull
2024/04/19 16:19:33 [adaptor/proxy] StartContainer: containerID:54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:36 [adaptor/proxy] CreateContainer: containerID:75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3
2024/04/19 16:19:36 [adaptor/proxy]     mounts:
2024/04/19 16:19:36 [adaptor/proxy]         destination:/proc source:proc type:proc
2024/04/19 16:19:36 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2024/04/19 16:19:36 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2024/04/19 16:19:36 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2024/04/19 16:19:36 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2024/04/19 16:19:36 [adaptor/proxy]         destination:/sys/fs/cgroup source:cgroup type:cgroup
2024/04/19 16:19:36 [adaptor/proxy]         destination:/etc/hosts source:/run/kata-containers/shared/containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3-369ac679030043fe-hosts type:bind
2024/04/19 16:19:36 [adaptor/proxy]         destination:/dev/termination-log source:/run/kata-containers/shared/containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3-c8bd42dfb10de530-termination-log type:bind
2024/04/19 16:19:36 [adaptor/proxy]         destination:/etc/hostname source:/run/kata-containers/shared/containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3-fe964c9d0abee194-hostname type:bind
2024/04/19 16:19:36 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3-c4d1259d92f0bc28-resolv.conf type:bind
2024/04/19 16:19:36 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/19 16:19:36 [adaptor/proxy]         destination:/var/run/secrets/kubernetes.io/serviceaccount source:/run/kata-containers/shared/containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3-72c2763bcd815f2d-serviceaccount type:bind
2024/04/19 16:19:36 [adaptor/proxy]     annotations:
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.container-name: nginx
2024/04/19 16:19:36 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx-5bb58f7796-lqvr5
2024/04/19 16:19:36 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: b6c142b8-b0fe-4c31-b986-08bec12d1672
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: 54fb9abac7fd17bb3e9d31379c3e9430107a7c98dba74bd3450789b1256d7432
2024/04/19 16:19:36 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2024/04/19 16:19:36 [adaptor/proxy]     storages:
2024/04/19 16:19:36 [adaptor/proxy]         mount_point:/run/kata-containers/75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3/rootfs source:docker.io/library/nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279 fstype:overlay driver:image_guest_pull
2024/04/19 16:19:41 [adaptor/proxy] StartContainer: containerID:75407d23b84d1889a1a32cdfe812ce369de18b049dbc8657e058d11d71c1d3d3
  • This shows peer pods working using a code base built just on kata-containers main. You can now clean up the nginx peer pod:
kubectl delete deployment nginx

@stevenhorsman
Member Author

After chatting to Pradipta we've realised that we need to change and remove all of the CAA install/kustomize installation of the caa-pod now that the peerpodconfig-ctrl is deploying it. There are a lot of references to it, so I'm trying to go through, unpick them and provide alternatives...

@stevenhorsman
Member Author

stevenhorsman commented Apr 23, 2024

After a bunch of updates to resolve the double CAA ds, the latest instructions are a bit simpler:

I'm going to try to describe how to reliably and reproducibly set up a dev environment for testing this. I'm using libvirt with a kcli cluster on a pretty chunky 16 vCPU, 32GB RAM VM, but I'm not sure it strictly needs to be that large. I will also do some steps, like pushing the podvm image, that are just so people following along can use my images and save some time:

  • Clone my repo (edit this to point at your own fork if you have a branch based on mine)
export GOPATH="${HOME}/go"
cloud_api_adaptor_repo="github.com/confidential-containers/cloud-api-adaptor"
cloud_api_adaptor_dir="${GOPATH}/src/${cloud_api_adaptor_repo}"
mkdir -p $(dirname "${cloud_api_adaptor_dir}")
git clone -b main "https://${cloud_api_adaptor_repo}.git" "${cloud_api_adaptor_dir}"
pushd $cloud_api_adaptor_dir/src/cloud-api-adaptor
git remote add sh https://github.com/stevenhorsman/cloud-api-adaptor.git
git fetch sh
git checkout -b kata-runtime-bump sh/kata-runtime-bump
  • Setup libvirt and create a kcli cluster
./libvirt/config_libvirt.sh
./libvirt/kcli_cluster.sh create
export KUBECONFIG=$HOME/.kcli/clusters/peer-pods/auth/kubeconfig
  • Install some pre-reqs
echo "Install docker"
sudo snap install docker
sudo systemctl start snap.docker.dockerd
sudo systemctl enable snap.docker.dockerd
  • Build and publish a podvm image (you can skip this and use mine if you want)
make podvm-builder podvm-binaries podvm-image
docker image tag quay.io/confidential-containers/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc
docker login quay.io
docker push quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:c392670e2401a08956ed4f52c4152209617bcde872a71b0a3b87da86b8fad2dc
  • Download the podvm qcow2
pushd podvm
./hack/download-image.sh ghcr.io/confidential-containers/podvm-generic-ubuntu-amd64:ci-pr1754 . -o podvm.qcow2
popd
  • Prepare the libvirt podvm volume
export IMAGE="${PWD}/podvm/podvm.qcow2"
ls -al $IMAGE
virsh -c qemu:///system vol-create-as --pool default --name podvm-base.qcow2 --capacity 20G --allocation 2G --prealloc-metadata --format qcow2
virsh -c qemu:///system vol-upload --vol podvm-base.qcow2 $IMAGE --pool default --sparse
virsh -c qemu:///system vol-info --pool default podvm-base.qcow2
  • Create a new CAA image (this is optional - you can use mine: ghcr.io/confidential-containers/cloud-api-adaptor:ci-pr1754-dev)
docker login quay.io
registry=quay.io/stevenhorsman make image
  • Install the operator and libvirt CAA
export CAA_IMAGE="ghcr.io/confidential-containers/cloud-api-adaptor:ci-pr1754-dev"
export SSH_KEY_FILE="id_rsa"
./libvirt/install_operator.sh

After waiting a little while, you should now have pods:

# kubectl get pods -n confidential-containers-system
NAME                                              READY   STATUS              RESTARTS   AGE
cc-operator-controller-manager-7b6c5f84bf-nlqjt   2/2     Running             0          81s
cc-operator-daemon-install-mxxlk                  0/1     ContainerCreating   0          23s
cc-operator-pre-install-daemon-sx4p8              1/1     Running             0          44s
peerpodconfig-ctrl-caa-daemon-j7chc               1/1     Running             0          44s
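
To double check that the daemonset picked up the image set via CAA_IMAGE before carrying on (a sketch):

kubectl -n confidential-containers-system get ds peerpodconfig-ctrl-caa-daemon \
        -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'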
  • Log into the worker nodes and clean the pause image cache:
kcli ssh peer-pods-worker-0

test_images_to_remove=(
        "docker.io/rancher/mirrored-pause"
        "registry.k8s.io/pause"
        "quay.io/sjenning/nginx"
        "quay.io/prometheus/busybox"
        "quay.io/confidential-containers/test-images"
)

ctr_args=""
if [ "${KUBERNETES}" = "k3s" ]; then
        ctr_args="--address      /run/k3s/containerd/containerd.sock "
fi
ctr_args+="--namespace k8s.io"
ctr_command="sudo -E ctr ${ctr_args}"
for related_image in "${test_images_to_remove[@]}"; do
        # We need to delete related image
        image_list=($(${ctr_command} i ls -q |grep "$related_image" |awk '{print $1}'))
        if [ "${#image_list[@]}" -gt 0 ]; then
                for image in "${image_list[@]}"; do
                        ${ctr_command} i remove "$image"
                done
        fi
        # We need to delete related content of image
        IFS="/" read -ra parts <<< "$related_image";
        repository="${parts[0]}";
        image_name="${parts[1]}";
        formatted_image="${parts[0]}=${parts[-1]}"
        image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
        if [ "${#image_contents[@]}" -gt 0 ]; then
                for content in "${image_contents[@]}"; do
                        ${ctr_command} content rm "$content"
                done
        fi
done

exit
  • Test it by creating a peer pod:
echo '
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.containerd.cri.runtime-handler: kata-remote
  labels:
    app: nginx
spec:
  runtimeClassName: kata-remote
  containers:
    - name: nginx
      image: nginx
' | kubectl apply -f -
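
Rather than watching the pod, you can also block until it is ready (a sketch; the timeout is an arbitrary choice to cover the podvm boot):

kubectl wait --for=condition=Ready pod/nginx --timeout=10m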

After waiting a little while you should see the pod running:

# kubectl get pods --watch
NAME                     READY   STATUS              RESTARTS   AGE
nginx-5bb58f7796-lqvr5   0/1     ContainerCreating   0          11s
nginx-5bb58f7796-lqvr5   1/1     Running             0          57s
  • We can check the CAA logs to see that it was pulled correctly:
$ kubectl logs ds/peerpodconfig-ctrl-caa-daemon -n confidential-containers-system
+ exec cloud-api-adaptor libvirt -uri 'qemu+ssh://root@192.168.122.1/system?no_verify=1' -data-dir /opt/data-dir -pods-dir /run/peerpod/pods -network-name default -pool-name default -disable-cvm -socket /run/peerpod/hypervisor.sock
cloud-api-adaptor version v0.8.2-dev
  commit: c7c48d677c2c1f4c7f4085c4e663d4a10daf1b2f-dirty
  go: go1.21.9
cloud-api-adaptor: starting Cloud API Adaptor daemon for "libvirt"
2024/04/23 11:05:47 [adaptor/cloud] Cloud provider external plugin loading is disabled, skipping plugin loading
2024/04/23 11:05:47 [adaptor/cloud/libvirt] libvirt config: &libvirt.Config{URI:"qemu+ssh://root@192.168.122.1/system?no_verify=1", PoolName:"default", NetworkName:"default", DataDir:"/opt/data-dir", DisableCVM:true, VolName:"podvm-base.qcow2", LaunchSecurity:"", Firmware:"/usr/share/edk2/ovmf/OVMF_CODE.fd"}
2024/04/23 11:05:47 [adaptor/cloud/libvirt] Created libvirt connection
2024/04/23 11:05:47 [adaptor] server config: &adaptor.ServerConfig{TLSConfig:(*tlsutil.TLSConfig)(0xc0007a0280), SocketPath:"/run/peerpod/hypervisor.sock", CriSocketPath:"", PauseImage:"", PodsDir:"/run/peerpod/pods", ForwarderPort:"15150", ProxyTimeout:300000000000, AAKBCParams:"", EnableCloudConfigVerify:false}
2024/04/23 11:05:47 [util/k8sops] initialized PeerPodService
2024/04/23 11:05:47 [probe/probe] Using port: 8000
2024/04/23 11:05:47 [adaptor] server started
2024/04/23 11:06:01 [podnetwork] routes on netns /var/run/netns/cni-9b568317-31fb-db0a-0a67-c78d789342e0
2024/04/23 11:06:01 [podnetwork]     0.0.0.0/0 via 10.244.1.1 dev eth0
2024/04/23 11:06:01 [podnetwork]     10.244.0.0/16 via 10.244.1.1 dev eth0
2024/04/23 11:06:01 [adaptor/cloud] Credentials file is not in a valid Json format, ignored
2024/04/23 11:06:01 [adaptor/cloud] stored /run/peerpod/pods/f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f/daemon.json
2024/04/23 11:06:01 [adaptor/cloud] create a sandbox f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f for pod nginx in namespace default (netns: /var/run/netns/cni-9b568317-31fb-db0a-0a67-c78d789342e0)
2024/04/23 11:06:01 [adaptor/cloud/libvirt] LaunchSecurityType: None
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Checking if instance (podvm-nginx-f5a5be92) exists
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Uploaded volume key /var/lib/libvirt/images/podvm-nginx-f5a5be92-root.qcow2
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Create cloudInit iso
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Uploading iso file: podvm-nginx-f5a5be92-cloudinit.iso
2024/04/23 11:06:01 [adaptor/cloud/libvirt] 45056 bytes uploaded
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Volume ID: /var/lib/libvirt/images/podvm-nginx-f5a5be92-cloudinit.iso
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Create XML for 'podvm-nginx-f5a5be92'
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Creating VM 'podvm-nginx-f5a5be92'
2024/04/23 11:06:01 [adaptor/cloud/libvirt] Starting VM 'podvm-nginx-f5a5be92'
2024/04/23 11:06:02 [adaptor/cloud/libvirt] VM id 7
2024/04/23 11:06:23 [adaptor/cloud/libvirt] Instance created successfully
2024/04/23 11:06:23 [adaptor/cloud/libvirt] created an instance podvm-nginx-f5a5be92 for sandbox f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:23 [util/k8sops] nginx is now owning a PeerPod object
2024/04/23 11:06:23 [adaptor/cloud] created an instance podvm-nginx-f5a5be92 for sandbox f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:23 [tunneler/vxlan] vxlan ppvxlan1 (remote 192.168.122.254:4789, id: 555000) created at /proc/1/task/13/ns/net
2024/04/23 11:06:23 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/13/ns/net
2024/04/23 11:06:23 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-9b568317-31fb-db0a-0a67-c78d789342e0
2024/04/23 11:06:23 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-9b568317-31fb-db0a-0a67-c78d789342e0
2024/04/23 11:06:23 [adaptor/proxy] Listening on /run/peerpod/pods/f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f/agent.ttrpc
2024/04/23 11:06:23 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2024/04/23 11:06:23 [adaptor/proxy] Trying to establish agent proxy connection to 192.168.122.254:15150
2024/04/23 11:06:23 [adaptor/proxy] established agent proxy connection to 192.168.122.254:15150
2024/04/23 11:06:23 [adaptor/cloud] agent proxy is ready
2024/04/23 11:06:23 [adaptor/proxy] CreateSandbox: hostname:nginx sandboxId:f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:23 [adaptor/proxy]     storages:
2024/04/23 11:06:23 [adaptor/proxy]         mountpoint:/run/kata-containers/sandbox/shm source:shm fstype:tmpfs driver:ephemeral
2024/04/23 11:06:25 [adaptor/proxy] CreateContainer: containerID:f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:25 [adaptor/proxy]     mounts:
2024/04/23 11:06:25 [adaptor/proxy]         destination:/proc source:proc type:proc
2024/04/23 11:06:25 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2024/04/23 11:06:25 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2024/04/23 11:06:25 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2024/04/23 11:06:25 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2024/04/23 11:06:25 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/23 11:06:25 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f-cb97a11b2b31b71e-resolv.conf type:bind
2024/04/23 11:06:25 [adaptor/proxy]     annotations:
2024/04/23 11:06:25 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_sandbox
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: b3ddf27f-a678-4a2b-932e-f286f16c7d03
2024/04/23 11:06:25 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.container-type: sandbox
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-shares: 2
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-quota: 0
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx_b3ddf27f-a678-4a2b-932e-f286f16c7d03
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-memory: 0
2024/04/23 11:06:25 [adaptor/proxy]         io.kubernetes.cri.sandbox-cpu-period: 100000
2024/04/23 11:06:25 [adaptor/proxy]         nerdctl/network-namespace: /var/run/netns/cni-9b568317-31fb-db0a-0a67-c78d789342e0
2024/04/23 11:06:25 [adaptor/proxy]     storages:
2024/04/23 11:06:25 [adaptor/proxy]         mount_point:/run/kata-containers/f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f/rootfs source:pause fstype:overlay driver:image_guest_pull
2024/04/23 11:06:25 [adaptor/proxy] StartContainer: containerID:f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:27 [adaptor/proxy] CreateContainer: containerID:d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8
2024/04/23 11:06:27 [adaptor/proxy]     mounts:
2024/04/23 11:06:27 [adaptor/proxy]         destination:/proc source:proc type:proc
2024/04/23 11:06:27 [adaptor/proxy]         destination:/dev source:tmpfs type:tmpfs
2024/04/23 11:06:27 [adaptor/proxy]         destination:/dev/pts source:devpts type:devpts
2024/04/23 11:06:27 [adaptor/proxy]         destination:/dev/mqueue source:mqueue type:mqueue
2024/04/23 11:06:27 [adaptor/proxy]         destination:/sys source:sysfs type:sysfs
2024/04/23 11:06:27 [adaptor/proxy]         destination:/sys/fs/cgroup source:cgroup type:cgroup
2024/04/23 11:06:27 [adaptor/proxy]         destination:/etc/hosts source:/run/kata-containers/shared/containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8-116ec96e0f318291-hosts type:bind
2024/04/23 11:06:27 [adaptor/proxy]         destination:/dev/termination-log source:/run/kata-containers/shared/containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8-b1e1de20d924b941-termination-log type:bind
2024/04/23 11:06:27 [adaptor/proxy]         destination:/etc/hostname source:/run/kata-containers/shared/containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8-416109fa9381e6d1-hostname type:bind
2024/04/23 11:06:27 [adaptor/proxy]         destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8-cec22eb0353bc28d-resolv.conf type:bind
2024/04/23 11:06:27 [adaptor/proxy]         destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/23 11:06:27 [adaptor/proxy]         destination:/var/run/secrets/kubernetes.io/serviceaccount source:/run/kata-containers/shared/containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8-0134a9e1026d5c80-serviceaccount type:bind
2024/04/23 11:06:27 [adaptor/proxy]     annotations:
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.sandbox-id: f5a5be92bf37cab72c9bba602b047f33ce62a0d9ca4c5f9f4d6e4dc838ae3e0f
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.image-name: docker.io/library/nginx:latest
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.container-name: nginx
2024/04/23 11:06:27 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.sandbox-name: nginx
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.sandbox-uid: b3ddf27f-a678-4a2b-932e-f286f16c7d03
2024/04/23 11:06:27 [adaptor/proxy]         io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.container-type: container
2024/04/23 11:06:27 [adaptor/proxy]         io.kubernetes.cri.sandbox-namespace: default
2024/04/23 11:06:27 [adaptor/proxy]     storages:
2024/04/23 11:06:27 [adaptor/proxy]         mount_point:/run/kata-containers/d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
2024/04/23 11:06:31 [adaptor/proxy] StartContainer: containerID:d666dd01a271f4b3030851e55c39edc037570f3e7f5c148ba5e0b31820eccff8
  • This shows peer pods working using a code base built just on kata-containers main. You can now clean up the nginx peer pod:
kubectl delete pod nginx

@beraldoleal
Member

Hi @stevenhorsman, I can confirm this is working with a containerd cluster. I was able to reproduce it.

Unfortunately not true with cri-o. :(

$ kubectl logs pod/cc-operator-pre-install-daemon-nmmhs -n confidential-containers-system
INSTALL_COCO_CONTAINERD: false
INSTALL_OFFICIAL_CONTAINERD: false
INSTALL_VFIO_GPU_CONTAINERD: false
INSTALL_NYDUS_SNAPSHOTTER: true
ERROR: cri-o is not yet supported 

It looks like the operator does not fully support cri-o yet.
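
For anyone checking what their cluster is running before following the steps above, the CONTAINER-RUNTIME column distinguishes containerd from cri-o:

kubectl get nodes -o wide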

@stevenhorsman
Member Author

A small update here - I've created new images with the 3.4 release of kata-containers:
quay.io/stevenhorsman/cloud-api-adaptor:dev-040e9bef8fdb4e2ed94cf68ae04e89076f7b3249
quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:9def86b85afa60d06146608c1f27c74a0568534540f0cc2d60c08cb046cfa0db

@stevenhorsman
Member Author

stevenhorsman commented May 1, 2024

This is going okay and I have some e2e passes locally, but I can't get them running in the e2e CI, as pull_request_target only uses the workflow from the main branch, so we first have to get the workflow changes that should make this work merged into main; I've spun up #1828 to do this.

I also can't fully test it on my fork as it runs on an Azure runner that I can't access.

@stevenhorsman
Member Author

The workflow PR #1828 has been merged, so hopefully if I rebase the kata-runtime-bump branch on that we might be able to get some e2e tests running in the PR.

@stevenhorsman stevenhorsman linked a pull request May 15, 2024 that will close this issue