
Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloads to error with "failed to assign an IP address to container" #330

Closed
nithu0115 opened this issue Feb 22, 2019 · 24 comments
Labels
priority/P0 Highest priority. Someone needs to actively work on this.

@nithu0115
Contributor

nithu0115 commented Feb 22, 2019

Issue description

Pods are failing to create with a sandbox error. We are using the latest CNI (1.3.0) and there are sufficient IPs available for pods.

Analysis

I went through the kubelet and aws-node logs. This happens as soon as a worker node joins the cluster, if there are pending workloads that need to be scheduled the moment the node goes Ready. I see the kubelet failing to start pods because no IP address is available, which tallies with the ipamd daemon not yet having IP addresses in its warm pool.
This is what I have found so far: as soon as the aws-node CNI was initialized at 2019-01-29T01:57:01Z, the kubelet handed pod networking over to the aws-node CNI to assign private IP addresses. However, the aws-node CNI took over 7 seconds to add IP addresses to its warm pool, which happened around 2019-01-29T01:57:08Z. Pods created during those 7 seconds failed because they could not get an IP address, with the error "failed to assign an IP address to container".

Below are the last occurrences of the error complaining about "failed to assign an IP address to container".

Kubelet logs

cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.022505    5122 cni.go:260] Error adding network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.022519    5122 cni.go:228] Error while adding to cni network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377159    5122 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377186    5122 kuberuntime_sandbox.go:56] CreatePodSandbox for pod "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377197    5122 kuberuntime_manager.go:646] createPodSandbox for pod "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377238    5122 pod_workers.go:186] Error syncing pod 2d3b0c4c-2369-11e9-910a-02b33f273b20 ("workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)"), skipping: failed to "CreatePodSandbox" for "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" with CreatePodSandboxError: "CreatePodSandbox for pod \"workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a\" network for pod \"workbench-rules-cc7764ccb-stjc8\": NetworkPlugin cni failed to set up pod \"workbench-rules-cc7764ccb-stjc8_placeable-regression\" network: add cmd: failed to assign an IP address to container"
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: I0129 01:57:08.377298    5122 server.go:460] Event(v1.ObjectReference{Kind:"Pod", Namespace:"placeable-regression", Name:"workbench-rules-cc7764ccb-stjc8", UID:"2d3b0c4c-2369-11e9-910a-02b33f273b20", APIVersion:"v1", ResourceVersion:"106928974", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container`

ipamd logs

2019-01-29T01:57:07Z [INFO] Send DelNetworkReply: IPv4Addr , DeviceNumber: 0, err: datastore: unknown pod
2019-01-29T01:57:08Z [INFO] Received AddNetwork for NS /proc/7846/ns/net, Pod workbench-rules-cc7764ccb-stjc8, NameSpace regression, Container b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a, ifname eth0
2019-01-29T01:57:08Z [DEBUG] AssignIPv4Address: IP address pool stats: total:0, assigned 0
@nithu0115 nithu0115 changed the title Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing initial works to error with "failed to assign an IP address to container" Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloadss to error with "failed to assign an IP address to container" Feb 22, 2019
@nithu0115 nithu0115 changed the title Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloadss to error with "failed to assign an IP address to container" Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloads to error with "failed to assign an IP address to container" Feb 22, 2019
@mogren
Contributor

mogren commented Feb 27, 2019

Related to #282

@mogren mogren added this to the v1.4 milestone Mar 1, 2019
@mogren mogren added the priority/P0 Highest priority. Someone needs to actively work on this. label Mar 1, 2019
@mogren
Contributor

mogren commented Mar 29, 2019

I've added some improved logging and a potential fix in #367. Punting this issue to 1.5 until I can make sure it has been resolved.

@mogren mogren modified the milestones: v1.4, v1.5 Mar 29, 2019
@mogren mogren modified the milestones: v1.5, v1.6 Jun 5, 2019
@lutierigb
Contributor

Possible workaround:

The only thing I can think of is tainting the node from the very beginning with a NoSchedule effect, thus preventing pods from being scheduled. The aws-node (CNI plugin) and kube-proxy pods have tolerations, so they will be scheduled regardless. Later on, a few seconds or minutes after the node has joined the cluster and once aws-node is running on it, the taint can be removed, allowing other pods to be scheduled.

To taint the node, one can use the kubelet --register-with-taints flag. In the EC2 user data it looks like this:

/etc/eks/bootstrap.sh <clustername> --kubelet-extra-args "--register-with-taints=bootstrapping=true:NoSchedule"

Later on, to manually remove the taint, use:

kubectl taint node ip-xxxxxxx.compute.internal bootstrapping:NoSchedule- 

One could think of adding a delay in the user data and removing the taint from there; the problem is that the node doesn't have permission to remove taints, so the taint has to be removed manually by the operator, as in the sketch below.
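For example, from a machine with cluster admin access, the operator can wait for the new node's aws-node pod to become Ready before removing the taint. A minimal sketch, assuming the DaemonSet pods carry the k8s-app=aws-node label used by the stock manifest and that the node name is known up front:

# Hypothetical operator-side helper; the node name below is a placeholder.
NODE=ip-xxxxxxx.compute.internal

# Poll until the aws-node pod scheduled on this node reports Ready.
until kubectl get pods -n kube-system -l k8s-app=aws-node \
    --field-selector spec.nodeName=$NODE \
    -o jsonpath='{.items[0].status.containerStatuses[0].ready}' 2>/dev/null | grep -q true; do
  sleep 5
done

# Remove the bootstrap taint so regular workloads can be scheduled on the node.
kubectl taint node $NODE bootstrapping:NoSchedule-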

@nithu0115
Contributor Author

nithu0115 commented Jun 25, 2019

Thanks @lutierigb for the workaround.

I did a little research on the issue. The best approach to avoid the manual steps would be to add a gRPC health check at 127.0.0.1:50051 and change the manifest to add a liveness/readiness probe.
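As a quick way to validate the idea, one can check from the node whether ipamd has started accepting connections on that port (a rough sketch, assuming nc is available on the host):

# Succeeds only once ipamd's gRPC server is listening on localhost:50051.
nc -z 127.0.0.1 50051 && echo "ipamd gRPC endpoint is up"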

@mogren any thoughts?

@mogren
Contributor

mogren commented Jun 25, 2019

@nithu0115 Sounds good to me, worth testing. Thanks!

@edmorley
Contributor

Hi! We're hitting this issue quite regularly with CNI plugin version 1.4.1, since our bursty Job workloads mean that node addition in the presence of a backlog of pending pods is a common occurrence, increasing the chance we hit the race condition.

Notably, this issue became more of a problem for us when we upgraded to Kubernetes 1.13, since our Job definition specifies a restartPolicy of Never, which means pods that encounter the race condition no longer get retried, due to a 1.13 regression (kubernetes/kubernetes/issues/79398).

I tested briefly using CNI plugin 1.5.0 (under Kubernetes 1.13) and couldn't seem to reproduce the race condition, though I didn't investigate much further for now, since we're going to stick with Kubernetes 1.12 until the Kubernetes regression is fixed anyway.

@dng-dev

dng-dev commented Jul 9, 2019

For me, the liveness/readiness probe provided by @nithu0115 is not working:
spec.template.spec.containers[0].livenessProbe.successThreshold: Invalid value: 2: must be 1
After fixing the threshold, the health check fails with Liveness probe failed: dial tcp 172.19.39.52:50051: connect: connection refused. I applied the 1.5 manifest.

@stijndehaes

@dng-dev This works:

readinessProbe:
  tcpSocket:
    host: 127.0.0.1
    port: 50051
  initialDelaySeconds: 1
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 1
  failureThreshold: 3

You need to set the host to 127.0.0.1, since ipamd only listens on localhost.

@stijndehaes

I will make a PR

@dng-dev

dng-dev commented Jul 9, 2019

@stijndehaes thanks for the advice, I also found that the fix is to set the host. But some aws-node pods stay unhealthy. It's a new EKS cluster.
pod/aws-node-2zwc7 0/1 CrashLoopBackOff 4 3m44s

Events:
  Type     Reason     Age                    From                                                   Message
  ----     ------     ----                   ----                                                   -------
  Normal   Scheduled  5m3s                   default-scheduler                                      Successfully assigned kube-system/aws-node-2zwc7 to ip-172-19-83-9.eu-central-1.compute.internal
  Normal   Pulling    5m1s                   kubelet, ip-172-19-83-9.eu-central-1.compute.internal  pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.0"
  Normal   Pulled     4m48s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.0"
  Normal   Created    4m45s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Created container
  Normal   Started    4m45s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Started container
  Warning  Unhealthy  4m38s (x3 over 4m40s)  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Liveness probe failed: dial tcp 127.0.0.1:50051: connect: connection refused
  Warning  Unhealthy  0s (x225 over 4m43s)   kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Readiness probe failed: dial tcp 127.0.0.1:50051: connect: connection refused

@stijndehaes

@dng-dev not sure what is happening. I am having no problems at the moment, but I am using 1.4.1. Have you looked into the logs?

@nithu0115
Contributor Author

@dng-dev, I had unreferenced commits in my local repository which were adding confusion.
With a tcpSocket probe, the kubelet only opens a socket to the gRPC server, which is not very meaningful, since it cannot read a response body.

I am still testing whether tcpSocket liveness and readiness probes are the better approach. I have run into similar problems where the aws-node pod takes longer than usual to start and then goes into CrashLoopBackOff with a liveness probe failed message.

@stijndehaes

My PR is not enough. I just did a test launching 100 pods and adding an extra 8 machines, and still got errors on some of the nodes.

@stijndehaes

Maybe we should use this: https://github.com/grpc-ecosystem/grpc-health-probe/ ?
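For illustration, a readiness probe built on that binary could look like the sketch below. It assumes the aws-node image would ship a grpc_health_probe binary (the path here is hypothetical) and that ipamd registers the standard grpc.health.v1 health service; neither is the case in the current manifest, so this is only a sketch:

readinessProbe:
  exec:
    command:
      # Hypothetical binary path; it would have to be added to the aws-node image.
      - /app/grpc_health_probe
      - -addr=127.0.0.1:50051
  initialDelaySeconds: 1
  periodSeconds: 1
  failureThreshold: 3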

@stijndehaes

OK, adding a long timeout does not work, as Kubernetes does not care whether this pod is ready or not. The real problem is that k8s thinks the CNI is correctly installed before it is actually running; see #282.

@nithu0115
Contributor Author

@stijndehaes @dng-dev @uthark I wrote a quick fix by adding a healthz endpoint to ipamd and a client that polls it: https://raw.githubusercontent.com/nithu0115/amazon-vpc-cni-k8s/342626c1c885b63254cd114f3e82fe91ab63eb03/config/v1.5/aws-k8s-cni.yaml

Would you mind giving it a spin and seeing if this helps?

@ericlarssen-wf

ericlarssen-wf commented Jul 30, 2019

I tried deploying this fix and got new errors, specifically on new hosts:

Failed CNI request: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

Error received from DelNetwork grpc call for pod td-agent-daemonset-cvt8z namespace internal container c1fab7216e28f6173afcb52d2fc0b13e5b7b7b42fd887192f03ddd772a1228c0: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

@kimi1987

kimi1987 commented Aug 6, 2019

Hi, I am seeing the exact same error with Job pods. They get stuck in the ContainerCreating state.

#330 (comment)

@mogren
Contributor

mogren commented Aug 6, 2019

@kimi1987 @ericlarssen-wf Can you try with the recently released v1.5.2?

@ericlarssen-wf

lgtm, was able to verify that v1.5.2 worked

@kimi1987

kimi1987 commented Aug 8, 2019

@mogren I'll try the new version today and will let you know.

@kimi1987

kimi1987 commented Aug 9, 2019

@kimi1987 @ericlarssen-wf Can you try with the recently released v1.5.2?

Tried the new version and I don't see the issue for now. I'll do some more runs. Also, the AWS docs say the new version is 1.5.2, but I created a new cluster today and had to manually change the container image to 1.5.2; it was 1.5.1.

@mogren
Contributor

mogren commented Aug 9, 2019

@kimi1987 Thanks for testing. Yes, the default for new clusters is still v1.5.1. I'm working on a v1.5.3 with some additional fixes for some rare cases.

@mogren
Contributor

mogren commented Aug 22, 2019

v1.5.3 has a fix for this on new nodes, and the fix has been merged back to master.

@mogren mogren closed this as completed Aug 22, 2019