k3s crashes if I do not start the cloud-provider #10068

Closed
deitch opened this issue May 6, 2024 · 1 comment
deitch commented May 6, 2024

Environmental Info:
K3s Version:

k3s version v1.29.4+k3s1 (94e29e2e)
go version go1.21.9

Node(s) CPU architecture, OS, and Version:

Linux controller-01 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

(Ubuntu 22.04 on x86_64)

Cluster Configuration:

Single-node cluster (for simplicity of reproducing the issue)

Describe the bug:

k3s crashes every ~90 seconds with the following error:

May 06 08:54:40 controller-01 k3s[4158]: time="2024-05-06T08:54:40Z" level=fatal msg="network policy controller timed out waiting for node.cloudprovider.kubernetes.io/uninitialized taint to be removed from Node controller-01: timed out waiting for the condition"

Steps To Reproduce:

BIND_IP=<my IP>
export INSTALL_K3S_EXEC="\
    --bind-address ${BIND_IP} \
    --advertise-address ${BIND_IP} \
    --node-ip ${BIND_IP} \
    --tls-san ${BIND_IP} \
    --disable-cloud-controller \
    --kubelet-arg cloud-provider=external \
    --disable=servicelb \
    --cluster-init"
curl -sfL https://get.k3s.io | sh -
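If needed, you can confirm the installer picked up the flags (this assumes the default systemd unit written by the get.k3s.io script):

$ systemctl cat k3s | grep -A 8 ExecStart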

The install is fine. Because I added --disable-cloud-controller --kubelet-arg cloud-provider=external, k3s reasonably expects that I will run an external cloud controller, which will remove the node.cloudprovider.kubernetes.io/uninitialized taint.
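The taint in question can be inspected with kubectl; the value in the comment below is the standard taint applied when the kubelet runs with cloud-provider=external, not output captured from this cluster:

$ kubectl describe node controller-01 | grep Taints
# expected: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule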

Expected behavior:

I expect it to keep functioning. The node continues to have the taint, but k3s should not crash because of that (which, ironically, makes it harder to actually run the cloud controller that would remove the taint).

Actual behavior:

Crash every ~90 seconds.

Additional context / logs:

As described above. k3s is behaving correctly insofar as it expects that taint to be removed; the problem is that it crashes rather than simply logging the condition.

By contrast, run full Kubernetes (e.g. via kubeadm or "the hard way") with --cloud-provider=external: the node will carry the same taint, but the apiserver will not crash.
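As a stopgap while no cloud controller is running, the taint can also be removed by hand; this is a workaround sketch (a real cloud controller would do the same after initializing the node), not a fix for the crash itself:

$ kubectl taint nodes controller-01 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-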

fmoral2 commented May 31, 2024

Validated on Version:

$ k3s version v1.30.1+k3s-7a0ea3c9 (7a0ea3c9)
 

Environment Details

Infrastructure: Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu, AMD

Cluster Configuration:
1 node

config:


sudo bash -c 'cat <<EOF >> /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: 644
secrets-encryption: true
cluster-init: true
selinux: true
kubelet-arg:
  - cloud-provider=external
disable-cloud-controller: true
disable: servicelb
EOF'

Steps to validate the fix

  1. Start k3s with the config above
  2. Validate that it does not crash
  3. Validate nodes and pods (commands sketched below)
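A minimal way to run those checks (standard systemd and kubectl invocations; paths assume a default k3s install):

$ sudo systemctl status k3s --no-pager
$ sudo journalctl -u k3s | grep "level=fatal"
$ sudo k3s kubectl get nodes,pods -A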

Reproduction Issue:

k3s version v1.29.5+k3s1 (4e53a323)

$ journalctl -xeu k3s.service | grep "timed"
May 31 12:01:41 ip-  k3s[4434]: time="2024-05-31T12:01:41Z" level=fatal msg="network policy controller timed out waiting for node.cloudprovider.kubernetes.io/uninitialized taint to be removed from Node  : timed out waiting for the condition"

$ systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2024-05-31 12:04:28 UTC; 5s ago
       Docs: https://k3s.io
    Process: 4934 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 4936 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 4937 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 4938 ExecStart=/usr/local/bin/k3s server --cluster-init --token secret --write-kubeconfig-mode=644 (code=exited, status=1/FAILURE)
   Main PID: 4938 (code=exited, status=1/FAILURE)
        CPU: 29.239s
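To watch the ~90-second crash loop from this side (standard systemd tooling; NRestarts counts the unit's automatic restarts):

$ systemctl show k3s -p NRestarts
$ journalctl -u k3s -f | grep "level=fatal"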


Validation Results:

$ journalctl -xeu k3s.service | grep "timed"
<no logs>

$ systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-05-31 12:11:14 UTC; 3min 1s ago
       Docs: https://k3s.io
    Process: 6502 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 6504 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 6505 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)

$ k get nodes,pods -A
NAME                   STATUS   ROLES                       AGE   VERSION
node/ip-   Ready    control-plane,etcd,master   4m    v1.30.1+k3s-7a0ea3c9

NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-576bfc4dc7-qmzhb                  0/1     Pending   0          3m43s
kube-system   pod/helm-install-traefik-crd-6kdnn            0/1     Pending   0          3m44s
kube-system   pod/helm-install-traefik-tlcb2                0/1     Pending   0          3m44s
kube-system   pod/local-path-provisioner-75bb9ff978-zl25s   0/1     Pending   0          3m43s
kube-system   pod/metrics-server-557ff575fb-qkqgj           0/1     Pending   0          3m43s
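The Pending pods are consistent with the node.cloudprovider.kubernetes.io/uninitialized taint still being in place (no cloud controller was deployed during this validation), which is exactly the scenario the fix targets: k3s stays up instead of crashing. One way to confirm the taint is still present:

$ kubectl get nodes -o jsonpath='{.items[*].spec.taints}'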

fmoral2 closed this as completed May 31, 2024