libvirt: support s390x cluster #1597

Merged: 1 commit into confidential-containers:main from libvirt-s390x on Nov 27, 2023

Conversation

@liudalibj (Member) commented Nov 23, 2023

fixes #1598

  • create s390x cluster with libvirt
  • show e2e-test result for s390x libvirt cluster

@liudalibj force-pushed the libvirt-s390x branch 5 times, most recently from 6a46210 to 8eb704a on November 23, 2023
@stevenhorsman (Member) left a comment

Running through this I've hit:

time="2023-11-23T11:01:55Z" level=info msg="Cluster provisioning"
F1123 11:01:55.014896   18134 env.go:369] Setup failure: Storage pool 'default' not found. It should be created beforehand
FAIL	github.com/confidential-containers/cloud-api-adaptor/test/e2e	0.028s

which might be related to the comment I added?

@stevenhorsman (Member)

> Running through this I've hit: Setup failure: Storage pool 'default' not found. It should be created beforehand ... which might be related to the comment I added?

When I manually ran:

sudo virsh pool-define-as default dir - - - - "/var/lib/libvirt/images"
sudo virsh pool-build default
sudo virsh pool-start default
sudo setfacl -m "u:${USER}:rwx" /var/lib/libvirt/images
sudo adduser "$USER" libvirt
sudo setfacl -m "u:${USER}:rwx" /var/run/libvirt/libvirt-sock

before make test-e2e, which was my old process, then it got past this error.
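
For reference, the pool setup above can be double-checked with standard virsh queries (a quick sanity check, not part of the original steps):

# confirm the 'default' pool now exists and is active
sudo virsh pool-info default
# optionally have it start automatically on boot
sudo virsh pool-autostart default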

@stevenhorsman (Member)

I hit another error:

Using pool default
Grabbing image ubuntu2204 from url https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-s390x.img
Image ubuntu2204 already there.Leaving...
Using 192.168.122.253 as api_ip
Using keepalived virtual_router_id 187
Deploying Vms...
Hypervisor not compatible with nesting. Skipping
Traceback (most recent call last):
  File "/usr/local/bin/kcli", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 5364, in cli
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1819, in create_generic_kube
    create_kube(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1798, in create_kube
    result = config.create_kube(cluster, kubetype, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2663, in create_kube
    result = self.create_kube_generic(cluster, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2688, in create_kube_generic
    return kubeadm.create(self, plandir, cluster, overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cluster/kubeadm/__init__.py", line 182, in create
    result = config.plan(plan, inputfile=f'{plandir}/bootstrap.yml', overrides=data)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2133, in plan
    result = self.create_vm(name, profilename, overrides=currentoverrides, customprofile=profile, k=z,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 953, in create_vm
    result = k.create(name=name, virttype=virttype, plan=plan, profile=profilename, flavor=flavor,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/providers/kvm/__init__.py", line 1344, in create
    conn.defineXML(vmxml)
  File "/usr/local/lib/python3.10/dist-packages/libvirt.py", line 4441, in defineXML
    raise libvirtError('virDomainDefineXML() failed')
libvirt.libvirtError: unsupported configuration: ps2 is not supported by this QEMU binary
F1123 11:05:15.541985   18749 env.go:369] Setup failure: exit status 1
FAIL	github.com/confidential-containers/cloud-api-adaptor/test/e2e	2.129s

so I'm not sure if I have missed a package install?

@liudalibj (Member, Author)

> I hit another error:
> libvirt.libvirtError: unsupported configuration: ps2 is not supported by this QEMU binary
> so I'm not sure if I have missed a package install?

Please check that "-P arch=s390x" is added to the kcli create kube ... command.

@liudalibj (Member, Author) commented Nov 23, 2023

I tested with kcli-99.0.202311221029:

root@liudali-z-test:~# pip3 install kcli
Collecting kcli
  Using cached kcli-99.0.202311221029-py3-none-any.whl (1.5 MB)
Requirement already satisfied: jinja2 in /usr/lib/python3/dist-packages (from kcli) (2.10.1)
Requirement already satisfied: argcomplete in /usr/local/lib/python3.8/dist-packages (from kcli) (3.1.6)
Requirement already satisfied: libvirt-python>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from kcli) (9.9.0)
Requirement already satisfied: prettytable in /usr/local/lib/python3.8/dist-packages (from kcli) (3.9.0)
Requirement already satisfied: PyYAML in /usr/lib/python3/dist-packages (from kcli) (5.3.1)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.8/dist-packages (from prettytable->kcli) (0.2.12)
Installing collected packages: kcli
Successfully installed kcli-99.0.202311221029
root@liudali-z-test:~# kcli version
version: 99.0 commit: 841a950 2023/11/22 Available Updates: False
root@liudali-z-test:~#

And installed the required packages with:

sudo DEBIAN_FRONTEND=noninteractive apt-get update -y > /dev/null
sudo DEBIAN_FRONTEND=noninteractive apt-get install git make python3-pip genisoimage qemu-kvm libvirt-daemon-system libvirt-dev cpu-checker -y
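
As a quick sanity check after installing (kvm-ok ships with the cpu-checker package listed above; shown here for reference):

# verify hardware virtualization is available and /dev/kvm is usable
sudo kvm-ok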

@stevenhorsman

@liudalibj (Member, Author)

My env: Ubuntu 20.04 VPC Z VSI

root@liudali-z-test:~# /usr/bin/qemu-system-s390x --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
root@liudali-z-test:~# uname -a
Linux liudali-z-test 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 14:00:33 UTC 2022 s390x s390x s390x GNU/Linux
root@liudali-z-test:~#

@stevenhorsman (Member)

Hey DaLi,

I'm still hitting an issue with the storage pool after creating a brand new zVSI to test this on.
My machine set-up (which is Ubuntu 22.04):

/usr/bin/qemu-system-s390x --version
QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.15)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
root@sh-libvirt-s390x-e2e-test-2:~/go/src/github.com/confidential-containers/cloud-api-adaptor# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
# uname -a
Linux sh-libvirt-s390x-e2e-test-2 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:23:03 UTC 2023 s390x s390x s390x GNU/Linux
root@sh-libvirt-s390x-e2e-test-2:~/go/src/github.com/confidential-containers/cloud-api-adaptor# kcli version
version: 99.0 commit: 841a950 2023/11/22 Available Updates: False

The error I get is:

time="2023-11-23T12:16:49Z" level=info msg="Do setup"
time="2023-11-23T12:16:49Z" level=info msg="Cluster provisioning"
F1123 12:16:49.806353   17796 env.go:369] Setup failure: Storage pool 'default' not found. It should be created beforehand

When I manually check, it looks like it hasn't been created:

# sudo virsh pool-info default
error: failed to get pool 'default'
error: Storage pool not found: no storage pool with matching name 'default'

When I ran ./libvirt/kcli_cluster.sh create manually it just creates the pool and continues; unfortunately that output isn't shown in the e2e run, so I'll do some more work to try to enable that debug output.

@stevenhorsman (Member)

> Please check that "-P arch=s390x" is added to the kcli create kube ... command.

Running kcli_cluster.sh create manually I'm still hitting the same error, and it's passing in the arguments correctly:

 kcli create kube generic -P domain=kata.com -P pool=default -P ctlplanes=1 -P workers=1 -P network=default -P image=ubuntu2204 -P sdn=flannel -P nfs=false -P disk_size=20 -P version=1.26.7 -P arch=s390x -P multus=false -P autolabeller=false peer-pods
Using 192.168.122.253 as api_ip
Using keepalived virtual_router_id 8
Deploying Vms...
Hypervisor not compatible with nesting. Skipping
Traceback (most recent call last):
  File "/usr/local/bin/kcli", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 5364, in cli
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1819, in create_generic_kube
    create_kube(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1798, in create_kube
    result = config.create_kube(cluster, kubetype, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2663, in create_kube
    result = self.create_kube_generic(cluster, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2688, in create_kube_generic
    return kubeadm.create(self, plandir, cluster, overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cluster/kubeadm/__init__.py", line 182, in create
    result = config.plan(plan, inputfile=f'{plandir}/bootstrap.yml', overrides=data)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2133, in plan
    result = self.create_vm(name, profilename, overrides=currentoverrides, customprofile=profile, k=z,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 953, in create_vm
    result = k.create(name=name, virttype=virttype, plan=plan, profile=profilename, flavor=flavor,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/providers/kvm/__init__.py", line 1344, in create
    conn.defineXML(vmxml)
  File "/usr/local/lib/python3.10/dist-packages/libvirt.py", line 4441, in defineXML
    raise libvirtError('virDomainDefineXML() failed')
libvirt.libvirtError: unsupported configuration: ps2 is not supported by this QEMU binary

@stevenhorsman (Member)

> The error I get is: Setup failure: Storage pool 'default' not found. It should be created beforehand

@liudalibj - I've found the reason for this. That error and storage pool check are done as part of CreateVPC, which runs before CreateCluster, where your script changes would run. I've discussed with Wainer whether it would be more appropriate to move the storage pool checks into the podvm upload stage, so that might be a solution to this issue?

@liudalibj (Member, Author)

> @liudalibj - I've found the reason for this. That error and storage pool check are done as part of CreateVPC, which runs before CreateCluster, where your script changes would run.

Ah, got it. I use the kcli_cluster.sh create command to verify the script on my Z VSI, to make sure the script works as expected, and I delete the cluster with kcli_cluster.sh delete before I run make test-e2e for libvirt.

In your case, you didn't run the kcli_cluster.sh create command first; you directly ran make test-e2e, so it reports Setup failure: Storage pool 'default' not found. It should be created beforehand.
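
For reference, the sequence that works here is to create the cluster first (which also creates the pool) and only then run the tests; a minimal sketch, assuming the repository root as the working directory:

# create the cluster (sets up the 'default' pool if missing)
./libvirt/kcli_cluster.sh create
# run the e2e tests against the existing cluster
make test-e2e CLOUD_PROVIDER=libvirt
# tear the cluster down afterwards
./libvirt/kcli_cluster.sh delete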

@wainersm (Member)

Hi @liudalibj @stevenhorsman!

Notice that the storage pool is not necessarily the same for the e2e tests (i.e. the pool the podvm image will be pushed to) and for kcli (i.e. the pool where kcli will build the node images).

workflows/e2e_libvirt.yaml sets up the test environment, so when libvirt/kcli_cluster.sh is executed by e2e all the requirements are met. With some of the changes proposed here, half of the test environment setup is in libvirt/kcli_cluster.sh and half is somewhere else. So my suggestion is to extract the commands of workflows/e2e_libvirt.yaml into a script (maybe Ansible?) and leave any environment setup (creating pools, downloading images, etc.) out of libvirt/kcli_cluster.sh, i.e. that script does only what it is supposed to do: create/destroy the cluster. That script (or Ansible playbook) with the environment setup can then be used by different workflows as well as by developers. Makes sense?

@huoqifeng (Contributor)

> My suggestion is to extract the commands of workflows/e2e_libvirt.yaml into a script (maybe Ansible?) and leave any environment setup out of libvirt/kcli_cluster.sh, so that script does only what it is supposed to do: create/destroy the cluster.

And make test-e2e also calls libvirt/kcli_cluster.sh, in https://github.com/confidential-containers/cloud-api-adaptor/blob/main/test/provisioner/provision_libvirt.go#L105

@liudalibj (Member, Author)

> My suggestion is to extract the commands of workflows/e2e_libvirt.yaml into a script (maybe Ansible?) and leave any environment setup out of libvirt/kcli_cluster.sh, so that script does only what it is supposed to do: create/destroy the cluster.

Good idea. I created a new script, config_libvirt.sh; users/workflows can use it to configure libvirt and kcli.
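
For reference, a minimal sketch of what such a config script can look like, assembled from the package and storage pool commands shown earlier in this thread (the actual libvirt/config_libvirt.sh in this PR may differ):

#!/bin/bash
set -o errexit -o pipefail

# Install libvirt/QEMU and the tooling kcli needs (from the earlier comments).
sudo DEBIAN_FRONTEND=noninteractive apt-get update -y > /dev/null
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y git make python3-pip genisoimage \
    qemu-kvm libvirt-daemon-system libvirt-dev cpu-checker
pip3 install kcli

# Create and start the 'default' storage pool that the e2e provisioner expects.
sudo virsh pool-define-as default dir - - - - "/var/lib/libvirt/images"
sudo virsh pool-build default
sudo virsh pool-start default

# Give the current user access to libvirt and the images directory.
sudo setfacl -m "u:${USER}:rwx" /var/lib/libvirt/images
sudo adduser "$USER" libvirt
sudo setfacl -m "u:${USER}:rwx" /var/run/libvirt/libvirt-sock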

@liudalibj (Member, Author)

> Running kcli_cluster.sh create manually I'm still hitting the same error, and it's passing in the arguments correctly:
> libvirt.libvirtError: unsupported configuration: ps2 is not supported by this QEMU binary

I reproduced this issue on an Ubuntu 22.04 s390x VSI, so it seems that the QEMU binary does not support ps2; this needs a fix on the kcli side.
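
For context: the error comes from libvirt validating the generated domain XML against the QEMU binary's capabilities; s390x QEMU has no PS/2 bus, so any input device with bus='ps2' in the XML makes virDomainDefineXML() fail with exactly this message. On a host where a domain was successfully defined, the generated input devices can be inspected with (a sketch; replace <domain> with an actual VM name):

virsh dumpxml <domain> | grep -i ps2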

@liudalibj (Member, Author)

> My suggestion is to extract the commands of workflows/e2e_libvirt.yaml into a script ... and leave any environment setup out of libvirt/kcli_cluster.sh ...
> And make test-e2e also calls libvirt/kcli_cluster.sh, in https://github.com/confidential-containers/cloud-api-adaptor/blob/main/test/provisioner/provision_libvirt.go#L105

Yeah, libvirt/kcli_cluster.sh will now only focus on creating/deleting the cluster; installing packages and configuring libvirt are put into the script libvirt/config_libvirt.sh.

liudalibj added a commit to liudalibj/kcli that referenced this pull request Nov 23, 2023
- 'ps2 is not supported by this QEMU binary'

confidential-containers/cloud-api-adaptor#1597 (comment)

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
@liudalibj (Member, Author)

> ps2 is not supported by this QEMU binary

Created a fix PR: karmab/kcli#623

@stevenhorsman (Member) left a comment

I've re-tried on a 20.04 zVSI and have a few suggestions. Manually editing the scripts, I'm able to get further in the process, but the e2e tests are failing at the operator install phase, with:

# kubectl get pods -A
NAMESPACE                        NAME                                              READY   STATUS              RESTARTS        AGE
confidential-containers-system   cc-operator-controller-manager-857f844f7d-97n7g   2/2     Running             1 (4m22s ago)   5m24s
confidential-containers-system   cc-operator-daemon-install-t4v7p                  0/1     ContainerCreating   0               4m36s

The describe events on that pod just seem to suggest that the image is taking >5 mins to pull:

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               4m58s  default-scheduler  Successfully assigned confidential-containers-system/cc-operator-daemon-install-t4v7p to peer-pods-worker-0
  Warning  FailedCreatePodSandBox  4m58s  kubelet            Failed to create pod sandbox: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory"
  Normal   Pulling                 4m43s  kubelet            Pulling image "quay.io/confidential-containers/runtime-payload-ci:kata-containers-8de1f8e19f858134ba455a7c04edcb21d8bcf6b1"

I'm hoping that's a network issue with quay or something and I'll try again later

@stevenhorsman (Member)

> Yeah, libvirt/kcli_cluster.sh will now only focus on creating/deleting the cluster; installing packages and configuring libvirt are put into the script libvirt/config_libvirt.sh.

The separate config script is really helpful to me when doing local tests. @wainersm are you ok with this approach?

karmab pushed a commit to karmab/kcli that referenced this pull request Nov 23, 2023
- 'ps2 is not supported by this QEMU binary'

confidential-containers/cloud-api-adaptor#1597 (comment)

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
@stevenhorsman (Member)

> I'm able to get further in the process, but the e2e tests are failing at the operator install phase; the describe events just seem to suggest that the image is taking >5 mins to pull. I'm hoping that's a network issue with quay or something and I'll try again later.

Hey DaLi, just to give you an update on this. I had the same problem this morning on Ubuntu 22.04, but then had a thought that the profile I was using might be the issue. I was running it on a cz2-4x8 zVSI, which is the same size I use for x86, but I checked and saw you were using bz2-8x32, so I've retried with that and got some e2e tests running now...

@liudalibj force-pushed the libvirt-s390x branch 2 times, most recently from fe27b47 to 72001c2 on November 24, 2023
@stevenhorsman (Member) left a comment

I'd like to see @wainersm review this before it's merged, but I've tried it out on an 8 vCPU, 32GB RAM s390x VM and all the tests pass. Thanks for all the great work @liudalibj!

@wainersm (Member)

Hi @liudalibj ! First, thanks for accepting my suggestion!

I tried it out locally on my x86_64 machine, running with make test-e2e CLOUD_PROVIDER=libvirt.

I got this error:

error: 'node-role.kubernetes.io/worker' already has a value (), and --overwrite is false
F1124 11:19:01.736625  286991 env.go:369] Setup failure: exit status 1
FAIL    github.com/confidential-containers/cloud-api-adaptor/test/e2e   579.220s
FAIL
make: *** [Makefile:95: test-e2e] Error 1

That I fixed with:

index fc2caff..01f3cd4 100755
--- a/libvirt/kcli_cluster.sh
+++ b/libvirt/kcli_cluster.sh
@@ -81,8 +81,8 @@ create () {
        fi
        workers=$(kubectl get nodes -o name --no-headers | grep 'worker')
        for worker in $workers; do
-               kubectl label "$worker" node.kubernetes.io/worker=
-               kubectl label "$worker" node-role.kubernetes.io/worker=
+               kubectl label --overwrite "$worker" node.kubernetes.io/worker=
+               kubectl label --overwrite "$worker" node-role.kubernetes.io/worker=
        done
 
        # Ensure that system pods are running or completed.

Having that the e2e tests passed on my machine \o/

- create s390x cluster with libvirt
- show e2e-test result for s390x libvirt cluster

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>

follow the review comment, use a new script to config libvirt
@liudalibj (Member, Author)

> I tried it out locally on my x86_64 machine, running with make test-e2e CLOUD_PROVIDER=libvirt. I got this error: error: 'node-role.kubernetes.io/worker' already has a value (), and --overwrite is false ... That I fixed with the kubectl label --overwrite change above. Having that the e2e tests passed on my machine \o/

Thanks @wainersm, I updated the script based on your comment.

@huoqifeng (Contributor) left a comment

LGTM

@liudalibj (Member, Author)

@wainersm do you have any new comments? If not, I would like to merge this PR tomorrow.

@wainersm (Member) left a comment

Thanks @liudalibj !

@wainersm merged commit 88fc551 into confidential-containers:main on Nov 27, 2023. 21 checks passed.