
network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady... #284

Closed
max-rocket-internet opened this issue Jan 9, 2019 · 42 comments

@max-rocket-internet (Contributor) commented Jan 9, 2019

EKS: v1.11.5
CNI: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.3.0
AMI: amazon-eks-node-1.11-v20181210 (ami-0a9006fb385703b54)

We are still seeing these CNI errors in pod events. e.g.

Events:
  Type     Reason           Age               From                                                Message
  ----     ------           ----              ----                                                -------
  Warning  NetworkNotReady  5s (x3 over 35s)  kubelet, ip-10-0-26-197.eu-west-1.compute.internal  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

[root@ip-10-0-25-4 ~]# /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1223  100  1223    0     0   1223      0  0:00:01 --:--:--  0:00:01 1194k
[five more successful downloads; identical curl progress meters omitted]
curl: (7) Failed to connect to localhost port 10255: Connection refused
@nxf5025 commented Jan 11, 2019

Hitting this as well with the same setup as above.

@nak3 (Contributor) commented Jan 12, 2019

I tried to run /opt/cni/bin/aws-cni-support.sh on the node with pod aws-node-hhtrt but I get this error:

The second one is the same as #285.

The line in the script should be updated to something like command -v kubectl > /dev/null && kubectl get --kubeconfig=/var/lib/kubelet/kubeconfig --raw=/api/v1/pods.
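A sketch of what that guarded call could look like (the kubeconfig path is taken from the comment above; the /tmp output path is illustrative, not the support script's real destination):

```shell
#!/bin/sh
# Only query the API server if kubectl is available, instead of curling the
# kubelet's read-only port 10255, which may be disabled (as in the report above).
if command -v kubectl > /dev/null; then
  kubectl get --kubeconfig=/var/lib/kubelet/kubeconfig --raw=/api/v1/pods \
    > /tmp/pods.json
else
  echo "kubectl not found; skipping pod listing" >&2
fi
```

The guard means the script degrades gracefully instead of aborting on nodes where kubectl is absent.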

@mogren mogren added the bug label Mar 15, 2019
@max-rocket-internet (Contributor, Author)

Still seeing this now and again:

  Warning  FailedCreatePodSandBox  7m33s                  kubelet, ip-10-0-25-88.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2c600b4c1a8f344393f614e04706dc428ba1467ca67cb1169674807bd830646d" network for pod "ingress02-nginx-ingress-controller-dr84q": NetworkPlugin cni failed to set up pod "ingress02-nginx-ingress-controller-dr84q_default" network: add cmd: failed to assign an IP address to container

@tiffanyfay (Contributor)

@max-rocket-internet Hey, are you still hitting the issue of network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized?

With the newer CNI the support script should work since it addresses the change for that kubelet port.

Have you seen network: add cmd: failed to assign an IP address to container again? A fix for this went into v1.4 (#367).

For either of these, if you have, what CNI version are you using? Thanks!

@TarekAS commented Jun 10, 2019

I'm on EKS 1.12.7 and CNI 1.3.3. This error happened on most of my nodes for about 10 minutes and then seemingly resolved itself, right after I re-deployed my ASGs through CloudFormation.
I'm using the latest AMI as of this date.

@tiffanyfay Do you have any insight on why this could happen?

@stephenmuss

Recording this here in case it helps others.

I had a MutatingWebhookConfiguration hanging around that was no longer relevant and there were no pods available to service it. This was stopping nodes from becoming Ready. The kubelet logs and describe node messages had the exact same error as recorded here.

In my case, running kubectl delete MutatingWebhookConfiguration <name> and then restarting one of the kubelets caused all nodes to become healthy/ready.
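Assuming a stale webhook is the suspect, one way to hunt for it (the webhook name my-old-webhook is a placeholder):

```shell
# List admission webhook configurations; a webhook whose backing service or
# pods no longer exist can block pod creation and node readiness.
kubectl get mutatingwebhookconfigurations
# Inspect which service a suspicious webhook points at:
kubectl get mutatingwebhookconfiguration my-old-webhook \
  -o jsonpath='{.webhooks[*].clientConfig.service.namespace}/{.webhooks[*].clientConfig.service.name}'
# If that service is gone, delete the stale configuration:
kubectl delete mutatingwebhookconfiguration my-old-webhook
```

These commands need a live cluster, so treat them as a checklist rather than a script.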

@Pharb commented Jun 27, 2019

I also had a similar issue today with EKS 1.12 and CNI plugin version 1.4.1.
I also got KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized on a few nodes for about 10-20 minutes, then it worked again.

I didn't find any more debugging info, and the nodes were replaced by the cluster autoscaler. Is there anything I should look out for if this happens again?

@xrl commented Jul 7, 2019

I have been seeing this problem on pods that start immediately after the node comes up. If I delete the pods and have them "try again", they get their IPs and there are no warnings. Could this be solved through a node readiness change?

@jahoward

Just had the same issue with 1.11.9. The CNI networking failed on one of two new nodes, so the failed node never joined the cluster. A reboot from the AWS Console got it working.

@jahoward

The first two warnings I see are:

Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.517655    4537 cni.go:172] Unable to update cni config: No networks found in /etc/
Jul 26 08:18:06 ip-10-2-118-4.ap-southeast-2.compute.internal kubelet[4537]: W0726 08:18:06.521509    4537 cni.go:172] Unable to update cni config: No networks found in /etc/

@schahal commented Oct 28, 2019

Just had the same issue with 1.11.9. The cni networking failed on one of two new nodes so the failed node never joined the cluster. A reboot from the AWS Console got it working

This is the workaround we use as well.

Our environment where we ran into it:

k8s: Kubernetes v1.13.11-eks-5876d6
cni plugin: amazon-k8s-cni:v1.5.3

This has happened only a couple times over half a year (so on older versions too), so it's difficult for us to reproduce.

@maltekrupa

EKS: v1.14.7-eks-1861c5
CNI: amazon-k8s-cni:v1.5.3
AMI: amazon-eks-node-1.14-v20190927 (ami-0e21bc066a9dbabfa)

Same problem on multiple EKS clusters. New VMs cannot join the cluster.

Kubelet error on the nodes:

Oct 29 07:29:27 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: W1029 07:29:27.735403    3727 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 29 07:29:28 ip-10-1-21-123.eu-central-1.compute.internal kubelet[3727]: E1029 07:29:28.262822    3727 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Events:

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

@mogren (Contributor) commented Oct 30, 2019

@temal- Thanks for the report. If the cni binary and config file are missing, ipamd must have failed to start correctly on the new node. There are a few possible causes: either the calls to the EC2 control plane got throttled and timed out, or there are no more ENIs or IPs available in the subnet. If you could get the log files from ipamd on a node that has this issue, it would be extremely helpful.

(A comprehensive log collector script: amazon-eks-ami/log-collector-script)
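To check the subnet-exhaustion cause mentioned above, the free-IP count can be queried directly (the subnet ID is a placeholder):

```shell
# If this number is at or near zero, ipamd cannot allocate ENIs/IPs and the
# CNI config on the node never gets written.
aws ec2 describe-subnets --subnet-ids subnet-0123456789abcdef0 \
  --query 'Subnets[0].AvailableIpAddressCount' --output text
```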

@maltekrupa

@mogren Thanks for the quick reply.
I think it was related to an ongoing linkerd installation. Sadly, I couldn't reproduce the error afterwards (which is strange, because it happened on two different clusters) and therefore wasn't able to run the collector script. If the issue appears again, I'll come back here with more information.

@s-tokutake commented Nov 1, 2019

We are trying to create a new cluster using eksctl and hit the same error. The cluster is created successfully, but the nodes do not become Ready. Details below.

  • EKS: version 1.14

  • CNI: amazon-k8s-cni:v1.5.3

  • AMI: amazon-eks-node-1.14-v20190927 (ami-02e124a380df41614)

  • The cluster is created in an existing VPC, in subnets with sufficient IPs.

  • Kubelet errors on the nodes:

W1101 03:31:48.212631    3705 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
E1101 03:31:48.430668    3705 kubelet.go:2172] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  • Ran sudo bash eks-log-collector.sh on the node, but all log files in the ipamd directory are empty.

@mak-1-sim commented Nov 2, 2019

(Quoting @s-tokutake's report above.)

EKS : 1.13
CNI: amazon-k8s-cni:v1.5.3
AMI: ami-0619d38218e46ef86

Got the same issue while updating the EKS cluster: freshly created worker nodes can't reach the Ready state. Rolling back from amazon-k8s-cni:v1.5.3 to amazon-k8s-cni:v1.5.1 resolved the issue.

UPD:
My main issue turned out to be a mess with the SG rules between the worker node group and the control plane. After fixing the SG rules, everything looks fine with both CNI 1.5.1 and 1.5.3.
Don't forget to check and edit the control plane SG inbound and outbound rules:
Inbound: port 443 from the worker node SG
Outbound: port 443 and ports 1025 - 65535 to the worker node SG

It is really strange, though, that without all the needed rules on the control plane SG, new worker nodes still reached the Ready state with CNI 1.5.1.
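Expressed with the AWS CLI, the rules above might look roughly like this (security group IDs are placeholders; verify against the official EKS security group requirements before applying):

```shell
# Control plane SG inbound: HTTPS from the worker node SG.
aws ec2 authorize-security-group-ingress --group-id sg-CONTROLPLANE \
  --protocol tcp --port 443 --source-group sg-WORKERS
# Control plane SG outbound: HTTPS plus the ephemeral range to the worker SG.
aws ec2 authorize-security-group-egress --group-id sg-CONTROLPLANE \
  --protocol tcp --port 443 --source-group sg-WORKERS
aws ec2 authorize-security-group-egress --group-id sg-CONTROLPLANE \
  --protocol tcp --port 1025-65535 --source-group sg-WORKERS
```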

@s-tokutake commented Nov 6, 2019

  • Created a cluster and downgraded v1.5.3 to v1.5.1 (kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/ffaf737145ab3262b7afd0ddbdf613a2174f30dd/config/v1.5/aws-k8s-cni.yaml), but the issue was not resolved.
  • The /etc/cni/ directory does not exist on the node.

@hhamalai commented Nov 6, 2019

Had the same issue as @hardcorexcat; rolling back to v1.5.1 resolved it too, and after a while upgrading back to v1.5.3 also works with new nodes. Clusters are created with terraform-aws-eks.

@ixjosemi commented Nov 7, 2019

Hi all,

Here is my case:

Environment

EKS: v1.14
CNI: v1.5.3
AMI: ami-082bb518441d3954c

I just created a fresh EKS cluster with 2 worker nodes; they join the cluster but stay in NotReady status. After downgrading the CNI from v1.5.3 to v1.5.1, the workers reach Ready status, but when checking the subnet IPs that should be assigned to the workers, only one IP is in use.

Regards,
Josemi.

@mluscon commented Nov 23, 2019

Hi, I am hitting the same issue with an EKS cluster created by Terraform.

@onprema commented Dec 5, 2019

Just in case someone comes across this while using a g4dn family instance on AWS: I was stuck on this for a while because the version of the CNI plugin I was using didn't support that instance family. After upgrading the CNI plugin it worked. https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html
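Before upgrading, it helps to confirm which CNI version the cluster is actually running; a common check is:

```shell
# Print the aws-node daemonset's image line, e.g. ...amazon-k8s-cni:v1.5.5
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni
```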

@Erokos commented Dec 8, 2019

For the past few days I've been experimenting with EKS cluster creation. I'm using terraform, actually a terraform module similar to the popular community module.
What I've observed:

Clusters below version 1.14 have no problems with worker nodes reaching the "Ready" state. I'm using the latest CNI version: amazon-k8s-cni:v1.5.5
BUT,
no matter what I try when creating 1.14 clusters, the worker nodes stay in the "NotReady" state even though I've applied the aws-auth-cm.yaml configmap and the latest CNI version. Upon closer look (kubectl describe node <node_name>) I see an error that the CNI is uninitialized; looking at the running pods (kubectl get pods -n kube-system) I can see the core-dns pods stuck in a "Pending" state and the aws-node pods crashing every few seconds.
I've then taken some steps to see if I could fix it:
a) Downgraded the CNI version to 1.5.3. This got the nodes to the "Ready" state but didn't fix the problem: the core-dns pods were now constantly in "ContainerCreating" status and the aws-node pods behaved the same. Upgrading the CNI back to 1.5.5 didn't change anything.
b) Next I tried creating a 1.13 cluster with nodes using a 1.14 kubernetes AMI. The nodes joined the cluster without any problems and were ready. I then upgraded the cluster version, which resulted in a working 1.14 cluster with the nodes joined and ready. HOWEVER, if I increased the number of nodes in an auto scaling group, the new nodes had the same old problem of never becoming ready, no matter what I tried.

To sum up, I've decided to use a 1.13 cluster, in which I see no problems with nodes using a 1.14 AMI, in hopes of this problem being fixed in the near future.

Epilogue: I'm using a full 1.13 cluster because every once in a while a worker node would briefly become "NotReady" and then revert to Ready after a few seconds. Very strange behaviour.

@ppaepam commented Dec 11, 2019

I have experienced the same as @Erokos: 1.13 works, but with 1.14 the nodes fail to become Ready.

I don't think the issue is related to the AWS VPC CNI, because I tried replacing it with Calico and got the same problem: the CNI pod (aws-node or calico-node) cannot connect to 10.100.0.1, which is the kubernetes service ClusterIP.

@ppaepam commented Dec 13, 2019

Coming from AWS support: it is possible that the issue is caused by changes to the security group requirements for worker nodes [1] introduced in EKS platform version 3 [2].

Another possible cause is my old AWS provider; I use 1.60.0.

Hope this helps

[1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

@mogren (Contributor) commented Mar 4, 2020

Hi @ppaepam, is this still an issue?

@SarasaGunawardhana

I am also having this issue. I tried adding worker nodes via CloudFormation.

@SarasaGunawardhana commented Mar 21, 2020

I fixed this issue by upgrading the Kubernetes components. I had the same problem in my AWS EKS cluster, so I ran the commands below using the eksctl CLI tool:

eksctl utils update-kube-proxy --name Your_Cluster_Name --approve
eksctl utils update-aws-node --name Your_Cluster_Name --approve
eksctl utils update-coredns --name Your_Cluster_Name --approve

@mogren (Contributor) commented Apr 22, 2020

This issue contains a mix of CNI versions and EKS cluster versions. I think @ppaepam and @SarasaGunawardhana are both right; if anyone has similar issues, please open a new issue to track that specific case.

@mogren mogren closed this as completed Apr 22, 2020
@mlachmish

I experienced this issue after updating EKS to version 1.16, and @SarasaGunawardhana's commands did the trick for me.

@Alien2150

@mlachmish also struggling with it. Thanks for the confirmation :)

@brianstorti

Leaving this here as this issue was the first result on Google.

The problem for me was that my kube-proxy daemonset was using the --resource-container flag, which was removed in Kubernetes 1.16, resulting in this "cni config uninitialized" error and nodes getting stuck in the NotReady state.

I had to manually edit this daemonset and remove the flag ($ kubectl edit ds kube-proxy -n kube-system).

For reference, this is the daemonset command I'm using now, with kube-proxy 1.16.8:

      - command:
        - /bin/sh
        - -c
        - kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
          --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
          1>>/var/log/kube-proxy.log 2>&1

@brankerd

Thank you @SarasaGunawardhana, this just worked for me.

@Erokos commented Jun 11, 2020

(Quoting @ppaepam's comment above.)

Just to verify: I've recently created a 1.15 cluster with an additional security group for the EKS control plane and have had no problems. Before, my EKS module used to assign the default VPC security group to the control plane, and that worked for 1.13.
Thanks to all of you.

@ugurarpaci

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

These logs still occur on some occasions.

@ghost commented Oct 21, 2020

This just occurred for me when upgrading from EKS 1.14 to 1.15 and the CNI from 1.6.0 to 1.7.5.

No matter what we did, 1.7.5 would not put nodes into a Ready state. Our solution (for now) was to revert the daemonset back to 1.6.0.

End state: cluster upgraded to 1.15.11 but AWS CNI is still at 1.6.0

@aksharj commented Dec 4, 2020

Hi All,

I am still facing the issue while trying to update from 1.14 to 1.15. I am doing the upgrade from the AWS Console.

The cluster version upgraded successfully, but on the nodes I am seeing the same error: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Any help on how to workaround this would be really great, Thanks.

@jayanthvn (Contributor)
Hi @aksharj

Can you please try the suggestion by @max-rocket-internet? Please see this - #284 (comment)

Thank you!

@fsh905 commented Dec 20, 2020

(Quoting @brianstorti's comment above.)

I tried this method, but kube-proxy still could not start properly. Then I followed this tutorial: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html

The pod security policy admission controller is enabled on Amazon EKS clusters running Kubernetes version 1.13 or later. If you're upgrading your cluster to Kubernetes version 1.13 or later, ensure that the proper pod security policies are in place before you update to avoid any issues. You can check for the default policy with the following command:

So I installed the default pod security policy (install psp), followed the "What you need to do before upgrading to 1.16" section of the tutorial to update my kube-proxy, and everything is OK!

@tmehlinger commented Feb 5, 2021

For others still running into this: it just happened to me, with a much simpler solution.

If you're using eksctl and restricting access to the public API endpoint (either with eksctl utils set-public-access-cidrs or the vpc.publicAccessCIDRs config file option), don't forget to enable private endpoint access. If you don't, your nodes will be unable to connect to the API server and time out trying to retrieve CNI configuration.
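Assuming eksctl, the combination described above might be applied like this (cluster name and CIDR are placeholders):

```shell
# Restrict the public endpoint, but ALSO enable the private endpoint so that
# nodes inside the VPC can still reach the API server for CNI configuration.
eksctl utils update-cluster-endpoints --cluster my-cluster \
  --private-access=true --public-access=true --approve
eksctl utils set-public-access-cidrs --cluster my-cluster 203.0.113.0/24 --approve
```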

@argyrodagdileli
Okay, for anyone that might still face this issue: in my case, we had a cluster that was initially set up a while ago at version 1.14 and later upgraded to 1.16.
I was able to deploy x86 nodes without any issue.
However, when I tried to deploy arm64 node groups, they would fail with the error described in this issue.

After running the steps that @SarasaGunawardhana mentioned

I fixed this issue by upgrading Kubernetes components. I had the same problem in my AWS EKS cluster. So ran below commands using eksctl CLI tool.

eksctl utils update-kube-proxy --name Your_Cluster_Name --approve
eksctl utils update-aws-node --name Your_Cluster_Name --approve
eksctl utils update-coredns --name Your_Cluster_Name --approve

I noticed that the aws-node pod was still using the 1.6.xx CNI version, while another cluster I had, also running kubernetes v1.16, was running CNI v1.7.xx.

So after further troubleshooting I had to manually upgrade the CNI addon on my cluster.
I also had to modify the daemonset configuration for kube-proxy as instructed here.

@charlescurt

I am still getting this error on EKS 1.22.

I hit it when switching my AMI from ami-06bdb0d00ff41dc6d to ami-0abfb3be33c196cbf.

None of the fixes above have worked. Is there a configuration I am missing when moving to a newer AMI?

@dumlutimuralp

(Quoting @charlescurt's comment above.)

Have you checked that you have the AmazonEKS_CNI_Policy attached to the EKS worker node role?
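A quick way to check that with the AWS CLI (the role name is a placeholder):

```shell
# AmazonEKS_CNI_Policy should appear here; without it, ipamd cannot call the
# EC2 APIs it needs to attach ENIs and assign IPs.
aws iam list-attached-role-policies --role-name my-eks-node-role \
  --query 'AttachedPolicies[].PolicyName' --output text
```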
