Failed to add HA masters in KubeSphere v3.0.0 #4066
Hi Team. Can anyone help with this?
[root@master2 ~]# systemctl status kubelet
Jul 18 18:07:39 master2 kubelet[11569]: W0718 18:07:39.561495 11569 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
But the etcd cluster works.
@whaifang Can you share the error log from adding the new nodes?
@RolandMa1986 We added master nodes, not worker nodes, and there is no error log when adding the new master nodes; it tells me the process completed successfully. It seems the LB configuration is not updated in ~/.kube/config: the server still refers to master0, not the LB server. So I think the kk command does not complete all the steps for adding HA masters; please help confirm this.
What kind of LB are you using for the kube-apiserver? The config shows your LB address is
@RolandMa1986 Our LB server is 9.112.254.207, set up with nginx. We think the LB configuration should be updated automatically when the kk command adds HA masters, but it still points to master0 (9.30.222.112). Please note: our original cluster had only one master (master0), so we had not configured an LB server at that time. Has anyone successfully set up HA masters with my process?
/etc/hosts
/etc/kubernetes/kubelet.conf:
nginx.conf:
worker_processes auto;
stream {
}
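The nginx.conf above was cut off after `stream {`. For reference, a minimal nginx TCP load balancer for the kube-apiserver could look like the sketch below; the three master IPs come from the node list in this issue, while the timeouts and listen port are illustrative assumptions, not the reporter's actual config:

```nginx
worker_processes auto;
stream {
    upstream kube_apiserver {
        # master IPs from this cluster (adjust to yours)
        server 9.30.222.112:6443;
        server 9.30.181.110:6443;
        server 9.112.254.38:6443;
    }
    server {
        listen 6443;               # must match controlPlaneEndpoint.port
        proxy_pass kube_apiserver;
        proxy_connect_timeout 1s;  # illustrative values
        proxy_timeout 10m;
    }
}
```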
You can edit the kubelet.conf manually and use lb.kubesphere.local as the kube-apiserver address.
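A minimal sketch of that edit, assuming the kubeconfig still points at master0's IP (9.30.222.112) as the logs in this issue show. It is demonstrated on a sample file; on a real node the target is /etc/kubernetes/kubelet.conf, backed up first, followed by `systemctl restart kubelet`:

```shell
# Sample standing in for /etc/kubernetes/kubelet.conf (illustrative content).
cat > kubelet.conf <<'EOF'
clusters:
- cluster:
    server: https://9.30.222.112:6443
  name: cluster.local
EOF
# Point the kubelet at the LB endpoint instead of master0.
sed -i 's|server: https://9.30.222.112:6443|server: https://lb.kubesphere.local:6443|' kubelet.conf
grep 'server:' kubelet.conf
```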
@RolandMa1986 Hi, I have already updated kubelet.conf manually to use lb.kubesphere.local as the kube-apiserver address. The nodes are Ready now, but another error is thrown and many system pods are still crashing. Sometimes we get the error below. These two pods are still ContainerCreating; some volume folders do not exist. So what I mean is: I think the kk command does not complete all the steps for adding HA masters. Could you please check this? Has anyone successfully set up HA masters with my process?
@RolandMa1986 Any update on this? Hoping for a good idea from you!
We haven't seen an upgrade failure similar to yours before, so you may have to troubleshoot it yourself. We can only give you advice and suggestions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions. |
This issue is being automatically closed due to inactivity. |
Hi Team.
Our original cluster based on kubesphere-all-v2.1.1 is as follows:
We upgraded to v3.0.0 according to https://v3-0.docs.kubesphere.io/docs/upgrade/upgrade-with-kubekey/
Then we added master1 and master2 on v3.0.0 to set up HA according to https://v3-0.docs.kubesphere.io/docs/installing-on-linux/cluster-operation/add-new-nodes/
config.yaml
spec:
hosts:
roleGroups:
etcd:
master:
worker:
controlPlaneEndpoint:
domain: lb.kubesphere.local
address: 9.112.254.207
port: 6443
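One precondition worth verifying before `./kk add nodes` runs: every node must resolve lb.kubesphere.local to the LB address (on a node, `getent hosts lb.kubesphere.local` checks this). A small offline sketch of the same check against a sample hosts file; the address is the controlPlaneEndpoint above, and `hosts.sample` is a stand-in for /etc/hosts:

```shell
# Sample /etc/hosts entry for the control-plane endpoint.
cat > hosts.sample <<'EOF'
9.112.254.207 lb.kubesphere.local
EOF
# Print the address the domain maps to; empty output would mean the entry is missing.
awk '$2 == "lb.kubesphere.local" { print $1 }' hosts.sample
```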
But it fails after we execute: ./kk add nodes -f config.yaml
[root@master0 conf]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0 Ready master 4d1h v1.17.9 9.30.222.112 CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://18.9.7
master1 NotReady master 3d21h v1.17.9 9.30.181.110 CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://18.9.7
master2 NotReady master 3d21h v1.17.9 9.112.254.38 CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
node1 Ready worker 4d1h v1.17.9 9.30.181.117 CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://18.9.7
node2 Ready worker 4d1h v1.17.9 9.30.223.102 CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://18.9.7
master1 and master2 are NotReady.
[root@master1 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2021-07-14 23:38:32 PDT; 20h ago
Docs: http://kubernetes.io/docs/
Main PID: 19529 (kubelet)
CGroup: /system.slice/kubelet.service
└─19529 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.1 --node-ip=9.30.181.110 --hostname-override=master1
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.511205 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.599211 19529 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://9.30.222.112:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster1&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.611363 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.711571 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.799528 19529 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://9.30.222.112:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.811756 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:24 master1 kubelet[19529]: E0715 20:21:24.911945 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:25 master1 kubelet[19529]: E0715 20:21:25.012102 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:25 master1 kubelet[19529]: E0715 20:21:25.112244 19529 kubelet.go:2264] node "master1" not found
Jul 15 20:21:25 master1 kubelet[19529]: E0715 20:21:25.199911 19529 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get https://9.30.222.112:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Jul 15 20:21:25 master1 kubelet[19529]: E0715 20:21:25.212422 19529 kubelet.go:2264] node "master1" not found
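A related pitfall when moving to an LB endpoint: the x509 errors above can also occur when the apiserver serving certificate does not list the new endpoint in its SANs. On a node you could fetch and inspect the live certificate with `echo | openssl s_client -connect 9.30.222.112:6443 2>/dev/null | openssl x509 -noout -text`. The offline sketch below generates a throwaway certificate purely to show how to read the SAN list; the DNS name and IP are the ones from this issue:

```shell
# Throwaway self-signed cert carrying the SANs a repaired apiserver cert
# would need (requires OpenSSL 1.1.1+ for -addext / -ext).
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem \
  -days 1 -subj '/CN=kube-apiserver' \
  -addext 'subjectAltName=DNS:lb.kubesphere.local,IP:9.112.254.207'
# Inspect the SAN extension; a cert missing the LB entries fails with
# exactly the kind of TLS errors shown in the kubelet log above.
openssl x509 -in cert.pem -noout -ext subjectAltName
```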
[root@master0 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2021-07-15 01:15:42 PDT; 19h ago
Docs: http://kubernetes.io/docs/
Main PID: 5625 (kubelet)
Tasks: 27
Memory: 82.8M
CGroup: /system.slice/kubelet.service
└─5625 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=kubesphere/pause:3.1 --node-ip=9.30.222.112 --hostname-override=master0
Jul 15 20:21:28 master0 kubelet[5625]: E0715 20:21:28.027224 5625 pod_workers.go:191] Error syncing pod f6b1e75f-8491-47f0-8b9e-463893f25770 ("redis-6fd6c6d6f9-44ktp_kubesphere-system(f6b1e75f-8491-47f0-8b9e-463893f25770)"), skipping: unmounted volumes=[redis-pvc], unattached volumes=[redis-pvc default-token-l7tkq]: timed out waiting for the condition
Jul 15 20:21:33 master0 kubelet[5625]: E0715 20:21:33.027318 5625 kubelet.go:1681] Unable to attach or mount volumes for pod "openldap-0_kubesphere-system(72e73a3f-8188-4005-a4d9-a5184e9c99f8)": unmounted volumes=[openldap-pvc], unattached volumes=[openldap-pvc default-token-l7tkq]: timed out waiting for the condition; skipping pod
Jul 15 20:21:33 master0 kubelet[5625]: E0715 20:21:33.027354 5625 pod_workers.go:191] Error syncing pod 72e73a3f-8188-4005-a4d9-a5184e9c99f8 ("openldap-0_kubesphere-system(72e73a3f-8188-4005-a4d9-a5184e9c99f8)"), skipping: unmounted volumes=[openldap-pvc], unattached volumes=[openldap-pvc default-token-l7tkq]: timed out waiting for the condition
Jul 15 20:21:39 master0 kubelet[5625]: E0715 20:21:39.027303 5625 pod_workers.go:191] Error syncing pod 3c9b52a0-edc9-470c-9c68-ab744a33a2c6 ("ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"), skipping: failed to "StartContainer" for "ks-controller-manager" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-controller-manager pod=ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"
Jul 15 20:21:40 master0 kubelet[5625]: E0715 20:21:40.027521 5625 pod_workers.go:191] Error syncing pod b5cf6111-08d1-4747-8f65-38600059bd00 ("ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"), skipping: failed to "StartContainer" for "ks-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-apiserver pod=ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"
Jul 15 20:21:52 master0 kubelet[5625]: E0715 20:21:52.027367 5625 pod_workers.go:191] Error syncing pod b5cf6111-08d1-4747-8f65-38600059bd00 ("ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"), skipping: failed to "StartContainer" for "ks-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-apiserver pod=ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"
Jul 15 20:21:52 master0 kubelet[5625]: E0715 20:21:52.028017 5625 pod_workers.go:191] Error syncing pod 3c9b52a0-edc9-470c-9c68-ab744a33a2c6 ("ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"), skipping: failed to "StartContainer" for "ks-controller-manager" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-controller-manager pod=ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"
Jul 15 20:21:54 master0 kubelet[5625]: W0715 20:21:54.219451 5625 volume_linux.go:45] Setting volume ownership for /var/lib/kubelet/pods/659c68b1-d99a-48e3-9bf6-d8a04d082577/volumes/kubernetes.io~secret/dns-autoscaler-token-pr4p5 and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see kubernetes/kubernetes#69699
Jul 15 20:22:03 master0 kubelet[5625]: E0715 20:22:03.027313 5625 pod_workers.go:191] Error syncing pod 3c9b52a0-edc9-470c-9c68-ab744a33a2c6 ("ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"), skipping: failed to "StartContainer" for "ks-controller-manager" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-controller-manager pod=ks-controller-manager-5b7f8cbd6c-h75zk_kubesphere-system(3c9b52a0-edc9-470c-9c68-ab744a33a2c6)"
Jul 15 20:22:05 master0 kubelet[5625]: E0715 20:22:05.027315 5625 pod_workers.go:191] Error syncing pod b5cf6111-08d1-4747-8f65-38600059bd00 ("ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"), skipping: failed to "StartContainer" for "ks-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=ks-apiserver pod=ks-apiserver-869f56b578-98fdg_kubesphere-system(b5cf6111-08d1-4747-8f65-38600059bd00)"
These pods are in CrashLoopBackOff:
kube-system kube-apiserver-master1 0/1 CrashLoopBackOff 729 3d22h
kube-system openebs-localpv-provisioner-77fbd6858d-xk5w6 0/1 CrashLoopBackOff 203 4d1h
kube-system openebs-ndm-5tp84 0/1 CrashLoopBackOff 203 4d1h
kube-system openebs-ndm-operator-59c75c96fc-jpnt9 0/1 CrashLoopBackOff 204 4d1h
kube-system openebs-ndm-sndrv 0/1 CrashLoopBackOff 204 4d1h
kubesphere-monitoring-system kube-state-metrics-5c466fc7b6-lwbf8 2/3 CrashLoopBackOff 219 3d22h
kubesphere-monitoring-system notification-manager-deployment-7ff95b7544-r7n87 0/1 CrashLoopBackOff 219 3d22h
kubesphere-monitoring-system notification-manager-deployment-7ff95b7544-vq78c 0/1 CrashLoopBackOff 219 3d22h
kubesphere-monitoring-system notification-manager-operator-5cbb58b756-xbk8l 1/2 Error 219 3d22h
kubesphere-monitoring-system prometheus-operator-78c5cdbc8f-sbdqb 1/2 CrashLoopBackOff 219 3d22h
kubesphere-system ks-apiserver-869f56b578-98fdg 0/1 CrashLoopBackOff 215 3d22h
kubesphere-system ks-controller-manager-5b7f8cbd6c-h75zk 0/1 CrashLoopBackOff 215 3d22h
kubesphere-system ks-installer-85854b8c8-6l8gh 0/1 CrashLoopBackOff 202 3d22h
kubesphere-system ks-upgrade-wlcqq 0/1 Completed 0 3d22h
kubesphere-system openldap-0 0/1 ContainerCreating 2 4d1h
kubesphere-system redis-6fd6c6d6f9-44ktp 0/1 ContainerCreating 2 4d1h
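A small triage sketch for a list like the one above: filter the CrashLoopBackOff pods and print the log command to run next for each. It is demonstrated on a sample of this output; on the cluster you would produce `pods.sample` with `kubectl get pods -A --no-headers > pods.sample` instead of the heredoc:

```shell
# Sample of the pod list above (normally: kubectl get pods -A --no-headers).
cat > pods.sample <<'EOF'
kube-system        kube-apiserver-master1         0/1  CrashLoopBackOff   729  3d22h
kubesphere-system  ks-apiserver-869f56b578-98fdg  0/1  CrashLoopBackOff   215  3d22h
kubesphere-system  openldap-0                     0/1  ContainerCreating  2    4d1h
EOF
# For every CrashLoopBackOff pod, print the command that shows why the last
# container instance died (--previous selects the crashed instance's logs).
awk '$4 == "CrashLoopBackOff" { printf "kubectl -n %s logs %s --previous\n", $1, $2 }' pods.sample
```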
**Why did we encounter so many errors setting up HA masters with the kk command in v3.0.0? Have you tested a case like this successfully?**