
Pod IP was configured to Node's eth0 when Pods were created on a Node in parallel #934

Closed
tnqn opened this issue Jul 9, 2020 · 2 comments · Fixed by #933
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

tnqn (Member) commented Jul 9, 2020

Describe the bug
On @alex-vmw's cluster, some short-lived Pods did not get their IPs, which caused connectivity issues.
It was observed that the missing Pod IPs were somehow configured on the Node's eth0, and there were a number of error logs like:

E0707 01:05:30.751839       1 server.go:422] Failed to configure interfaces for container 596bd949477355f1199a5dfe06cb33b8c8cf219421b68c8606e0a09a68b786c0: failed to configure IP address for container 596bd949477355f1199a5dfe06cb33b8c8cf219421b68c8606e0a09a68b786c0: error switching to ns /host/proc/32618/ns/net: Error switching to ns /host/proc/32618/ns/net: invalid argument

E0707 00:55:34.973015       1 pod_configuration.go:242] Failed to advertise IP address for container efb3f9340201bc330e1f009b21d5801ca8fcddcf54e9ecc9f9db6cc32f13dfe9: /host/proc/3808/ns/net is not a valid network namespace: failed to Statfs "/host/proc/3808/ns/net": no such file or directory

E0707 01:45:46.128598       1 server.go:422] Failed to configure interfaces for container bd391dfba8a1e7423213b3088f20c2bcbfb42c36b11d7a6ec3fea353aab77606: failed to create veth devices for container bd391dfba8a1e7423213b3088f20c2bcbfb42c36b11d7a6ec3fea353aab77606: error switching to ns /proc/1/task/3667/ns/net: Error switching to ns /proc/1/task/3667/ns/net: bad file descriptor

And there were FD leaks after Pods were created and deleted:

antrea-ag 32229 root   23u     sock                0,8      0t0 212567577 protocol: PACKET
antrea-ag 32229 root   27u     sock                0,8      0t0 212637082 protocol: PACKET
antrea-ag 32229 root   32u     sock                0,8      0t0 212616399 protocol: PACKET
antrea-ag 32229 root   35u     sock                0,8      0t0 212611716 protocol: PACKET
antrea-ag 32229 root   48u     sock                0,8      0t0 212637275 protocol: PACKET

The issue didn't happen when v0.5.1 was used.

To Reproduce
Run a loop that schedules a number of Pods to one Node in parallel; the IP misconfiguration may appear after 30~50 rounds, while the FD leak can happen immediately.

for i in `seq 1 100`; do
  echo "round $i"
  kubectl apply -f pods.yaml
  sleep 15
  kubectl delete -f pods.yaml
  sleep 15
done

pods.yaml:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: test
  name: test-1
spec:
  nodeName: <NODE NAME>
  containers:
  - command:
    - sleep
    - "1"
    image: busybox
    name: container
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Never
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: test
  name: test-2
spec:
  nodeName: <NODE NAME>
  containers:
  - command:
    - sleep
    - "1"
    image: busybox
    name: container
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Never
---
# Create 20 such Pods
...

Expected
None of the above errors should happen.

Versions:
  • Antrea version (Docker image tag): 0.7.2
tnqn added the kind/bug label Jul 9, 2020
tnqn (Member, Author) commented Jul 9, 2020

Root cause

GratuitousArpOverIface in "github.com/j-keck/arping" is not thread-safe, as it uses package-level global variables to keep the socket and Sockaddr:
https://github.com/j-keck/arping/blob/1a9d8d0d4f43e32ed4df967f1b3fd67cee09f38c/arping_linux.go#L9-L10
When batch processing CNI ADD requests, a race condition can occur and lead to FD leaks and mismatches, as the goroutine sending gratuitous ARPs may accidentally close and release another goroutine's FDs.

It didn't happen in v0.5.1 because, with that version, sending gratuitous ARPs actually failed most of the time.

antoninbas (Contributor) commented:
I wanted to thank @alex-vmw for all the help he provided in troubleshooting this issue and the time he spent on this.
