
Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloads to error with "failed to assign an IP address to container" #330

Closed
nithu0115 opened this issue Feb 22, 2019 · 24 comments
Labels
priority/P0 Highest priority. Someone needs to actively work on this.

@nithu0115
Contributor

nithu0115 commented Feb 22, 2019

Issue description

Pods are failing to create with a sandbox error. We are using the latest CNI (1.3.0) and there are sufficient IPs available for pods.

Analysis

I went through the kubelet and aws-node logs. This happens as soon as a worker node joins the cluster, if there are pending workloads that need to be scheduled the moment the node goes Ready. I see the kubelet failing to start pods because no IP address is available, which tallies with the ipamd daemon not yet having IP addresses in its warm pool.
This is what I have found so far: as soon as the aws-node CNI was initialized at 2019-01-29T01:57:01Z, the kubelet handed pod networking over to the aws-node CNI to assign private IP addresses. However, the aws-node CNI took over 7 seconds to add IP addresses to its warm pool, which happened around 2019-01-29T01:57:08Z. Pods created during those 7 seconds failed because they could not get an IP address, with the error "failed to assign an IP address to container".

Below are the last occurrences of the error complaining about "failed to assign an IP address to container".

Kubelet logs

cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.022505    5122 cni.go:260] Error adding network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.022519    5122 cni.go:228] Error while adding to cni network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377159    5122 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377186    5122 kuberuntime_sandbox.go:56] CreatePodSandbox for pod "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377197    5122 kuberuntime_manager.go:646] createPodSandbox for pod "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: E0129 01:57:08.377238    5122 pod_workers.go:186] Error syncing pod 2d3b0c4c-2369-11e9-910a-02b33f273b20 ("workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)"), skipping: failed to "CreatePodSandbox" for "workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)" with CreatePodSandboxError: "CreatePodSandbox for pod \"workbench-rules-cc7764ccb-stjc8_placeable-regression(2d3b0c4c-2369-11e9-910a-02b33f273b20)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a\" network for pod \"workbench-rules-cc7764ccb-stjc8\": NetworkPlugin cni failed to set up pod \"workbench-rules-cc7764ccb-stjc8_placeable-regression\" network: add cmd: failed to assign an IP address to container"
Jan 29 01:57:08 ip-10-77-10-10.ec2.internal kubelet[5122]: I0129 01:57:08.377298    5122 server.go:460] Event(v1.ObjectReference{Kind:"Pod", Namespace:"placeable-regression", Name:"workbench-rules-cc7764ccb-stjc8", UID:"2d3b0c4c-2369-11e9-910a-02b33f273b20", APIVersion:"v1", ResourceVersion:"106928974", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a" network for pod "workbench-rules-cc7764ccb-stjc8": NetworkPlugin cni failed to set up pod "workbench-rules-cc7764ccb-stjc8_placeable-regression" network: add cmd: failed to assign an IP address to container`

ipamd logs

2019-01-29T01:57:07Z [INFO] Send DelNetworkReply: IPv4Addr , DeviceNumber: 0, err: datastore: unknown pod
2019-01-29T01:57:08Z [INFO] Received AddNetwork for NS /proc/7846/ns/net, Pod workbench-rules-cc7764ccb-stjc8, NameSpace regression, Container b849b621ba3c224c1b42912252ded857d2a798d031d8f81065d2b1736d07771a, ifname eth0
2019-01-29T01:57:08Z [DEBUG] AssignIPv4Address: IP address pool stats: total:0, assigned 0
@nithu0115 nithu0115 changed the title Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing initial works to error with "failed to assign an IP address to container" Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloadss to error with "failed to assign an IP address to container" Feb 22, 2019
@nithu0115 nithu0115 changed the title Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloadss to error with "failed to assign an IP address to container" Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloads to error with "failed to assign an IP address to container" Feb 22, 2019
@mogren
Contributor

mogren commented Feb 27, 2019

Related to #282

@mogren mogren added this to the v1.4 milestone Mar 1, 2019
@mogren mogren added the priority/P0 Highest priority. Someone needs to actively work on this. label Mar 1, 2019
@mogren
Contributor

mogren commented Mar 29, 2019

I've added some improved logging and a potential fix in #367. Punting this issue to 1.5 until I can make sure it has been resolved.

@mogren mogren modified the milestones: v1.4, v1.5 Mar 29, 2019
@mogren mogren modified the milestones: v1.5, v1.6 Jun 5, 2019
@lutierigb
Contributor

Possible workaround:

The only thing I can think of is tainting the node from the very beginning with a NoSchedule effect, thus preventing pods from being scheduled. The aws-node (CNI plugin) and kube-proxy pods have tolerations, so they will be scheduled regardless. Later on, a few seconds or minutes after the node has joined the cluster and once aws-node is running on it, the taint can be removed, allowing other pods to be scheduled.

To taint the node, one can use the kubelet --register-with-taints flag. In the EC2 user data it looks like this:

/etc/eks/bootstrap.sh <clustername> --kubelet-extra-args "--register-with-taints=bootstrapping=true:NoSchedule"

Later on, to manually remove the taint, use:

kubectl taint node ip-xxxxxxx.compute.internal bootstrapping:NoSchedule- 

One could think of adding a delay in the user data and removing the taint from there; the problem is that the node doesn't have permission to remove taints, so the taint has to be removed manually by the operator, as in the sketch below.
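For example, from a machine with cluster admin access, the operator can wait for the new node's aws-node pod to become Ready before removing the taint. A minimal sketch, assuming the DaemonSet pods carry the k8s-app=aws-node label used by the stock manifest and that the node name is known up front:

# Hypothetical operator-side helper; the node name below is a placeholder.
NODE=ip-xxxxxxx.compute.internal

# Poll until the aws-node pod scheduled on this node reports Ready.
until kubectl get pods -n kube-system -l k8s-app=aws-node \
    --field-selector spec.nodeName=$NODE \
    -o jsonpath='{.items[0].status.containerStatuses[0].ready}' 2>/dev/null | grep -q true; do
  sleep 5
done

# Remove the bootstrap taint so regular workloads can be scheduled on the node.
kubectl taint node $NODE bootstrapping:NoSchedule-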

@nithu0115
Contributor Author

nithu0115 commented Jun 25, 2019

Thanks @lutierigb for the workaround.

I did a little research on the issue. The best approach to avoid the manual steps would be to add a gRPC health check at 127.0.0.1:50051 and change the manifest to add a liveness/readiness probe.
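As a quick way to validate the idea, one can check from the node whether ipamd has started accepting connections on that port (a rough sketch, assuming nc is available on the host):

# Succeeds only once ipamd's gRPC server is listening on localhost:50051.
nc -z 127.0.0.1 50051 && echo "ipamd gRPC endpoint is up"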

@mogren any thoughts?

@mogren
Contributor

mogren commented Jun 25, 2019

@nithu0115 Sounds good to me, worth testing. Thanks!

@edmorley
Contributor

Hi! We're hitting this issue quite regularly with CNI plugin version 1.4.1, since our bursty Job workloads mean that node addition in the presence of a backlog of pending pods is a common occurrence, increasing the chance we hit the race condition.

Notably, this issue became more of a problem for us when we upgraded to Kubernetes 1.13, since our Job definition specifies a restartPolicy of Never, which means pods that encounter the race condition no longer get retried, due to a 1.13 regression (kubernetes/kubernetes/issues/79398).

I tested briefly using CNI plugin 1.5.0 (under Kubernetes 1.13) and couldn't seem to reproduce the race condition, though I didn't investigate much further for now, since we're going to stick with Kubernetes 1.12 until the Kubernetes regression is fixed anyway.

@dng-dev

dng-dev commented Jul 9, 2019

For me, the liveness/readiness probe provided by @nithu0115 is not working:
spec.template.spec.containers[0].livenessProbe.successThreshold: Invalid value: 2: must be 1
After fixing the threshold, the health check fails with Liveness probe failed: dial tcp 172.19.39.52:50051: connect: connection refused. I applied the 1.5 manifest.

@stijndehaes

@dng-dev This works:

readinessProbe:
  tcpSocket:
    host: 127.0.0.1
    port: 50051
  initialDelaySeconds: 1
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 1
  failureThreshold: 3

You need to set the host to 127.0.0.1, since ipamd only listens on localhost.

@stijndehaes

I will make a PR

@dng-dev

dng-dev commented Jul 9, 2019

@stijndehaes thanks for the advice, I also found that the fix is to set the host. But some aws-node pods stay unhealthy. It's a new EKS cluster.
pod/aws-node-2zwc7 0/1 CrashLoopBackOff 4 3m44s

Events:
  Type     Reason     Age                    From                                                   Message
  ----     ------     ----                   ----                                                   -------
  Normal   Scheduled  5m3s                   default-scheduler                                      Successfully assigned kube-system/aws-node-2zwc7 to ip-172-19-83-9.eu-central-1.compute.internal
  Normal   Pulling    5m1s                   kubelet, ip-172-19-83-9.eu-central-1.compute.internal  pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.0"
  Normal   Pulled     4m48s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.0"
  Normal   Created    4m45s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Created container
  Normal   Started    4m45s                  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Started container
  Warning  Unhealthy  4m38s (x3 over 4m40s)  kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Liveness probe failed: dial tcp 127.0.0.1:50051: connect: connection refused
  Warning  Unhealthy  0s (x225 over 4m43s)   kubelet, ip-172-19-83-9.eu-central-1.compute.internal  Readiness probe failed: dial tcp 127.0.0.1:50051: connect: connection refused

@stijndehaes

@dng-dev not sure what is happening. I am having no problems at the moment, but I am using 1.4.1. Have you looked into the logs?

@nithu0115
Contributor Author

@dng-dev, I had unreferenced commits in my local repository which were adding confusion.
With a tcpSocket probe, the kubelet only opens a socket to the gRPC server, which is not very meaningful, since it cannot read a response body.

I am still testing whether tcpSocket liveness and readiness probes are the better approach. I have run into similar problems where the aws-node pod takes longer than usual to start and then goes into CrashLoopBackOff with a liveness probe failed message.

@stijndehaes

My PR is not enough. I just did a test launching 100 pods and adding an extra 8 machines, and still got errors on some of the nodes.

@stijndehaes

Maybe we should use this: https://github.com/grpc-ecosystem/grpc-health-probe/ ?
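For illustration, a readiness probe built on that binary could look like the sketch below. It assumes the aws-node image would ship a grpc_health_probe binary (the path here is hypothetical) and that ipamd registers the standard grpc.health.v1 health service; neither is the case in the current manifest, so this is only a sketch:

readinessProbe:
  exec:
    command:
      # Hypothetical binary path; it would have to be added to the aws-node image.
      - /app/grpc_health_probe
      - -addr=127.0.0.1:50051
  initialDelaySeconds: 1
  periodSeconds: 1
  failureThreshold: 3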

@stijndehaes

OK, adding a long timeout does not work, as Kubernetes does not care whether this pod is ready or not. The real problem is that k8s thinks the CNI is correctly installed before it is actually running; see #282.

@nithu0115
Contributor Author

@stijndehaes @dng-dev @uthark I wrote a quick fix by adding a healthz endpoint to ipamd and a client that polls it: https://raw.githubusercontent.com/nithu0115/amazon-vpc-cni-k8s/342626c1c885b63254cd114f3e82fe91ab63eb03/config/v1.5/aws-k8s-cni.yaml

Would you mind giving it a spin and seeing if this helps?

@ericlarssen-wf

ericlarssen-wf commented Jul 30, 2019

I tried deploying this fix and got new errors, specifically on new hosts:

Failed CNI request: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

Error received from DelNetwork grpc call for pod td-agent-daemonset-cvt8z namespace internal container c1fab7216e28f6173afcb52d2fc0b13e5b7b7b42fd887192f03ddd772a1228c0: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"

@kimi1987

kimi1987 commented Aug 6, 2019

Hi, I am seeing the exact same error with Job pods. They get stuck in the ContainerCreating state.

#330 (comment)

@mogren
Contributor

mogren commented Aug 6, 2019

@kimi1987 @ericlarssen-wf Can you try with the recently released v1.5.2?

@ericlarssen-wf

lgtm, was able to verify that v1.5.2 worked

@kimi1987

kimi1987 commented Aug 8, 2019

@mogren I'll try the new version today and will let you know.

@kimi1987

kimi1987 commented Aug 9, 2019

@kimi1987 @ericlarssen-wf Can you try with the recently released v1.5.2?

Tried the new version and I don't see the issue for now. I'll do some more runs. Also, the AWS docs say the new version is 1.5.2, but I created a new cluster today and had to manually change the container image to 1.5.2; it was 1.5.1.

@mogren
Contributor

mogren commented Aug 9, 2019

@kimi1987 Thanks for testing. Yes, the default for new clusters is still v1.5.1. I'm working on a v1.5.3 with some additional fixes for some rare cases.

@mogren
Contributor

mogren commented Aug 22, 2019

v1.5.3 has a fix for this on new nodes, and the fix has been merged back to master.

@mogren mogren closed this as completed Aug 22, 2019