Describe the bug
It happened on @alex-vmw's cluster that some short-lived Pods didn't get their IPs, causing connectivity issues.
It was observed that the missing Pod IPs were somehow configured on the Node's eth0, and there were a number of error logs like:
E0707 01:05:30.751839 1 server.go:422] Failed to configure interfaces for container 596bd949477355f1199a5dfe06cb33b8c8cf219421b68c8606e0a09a68b786c0: failed to configure IP address for container 596bd949477355f1199a5dfe06cb33b8c8cf219421b68c8606e0a09a68b786c0: error switching to ns /host/proc/32618/ns/net: Error switching to ns /host/proc/32618/ns/net: invalid argument
E0707 00:55:34.973015 1 pod_configuration.go:242] Failed to advertise IP address for container efb3f9340201bc330e1f009b21d5801ca8fcddcf54e9ecc9f9db6cc32f13dfe9: /host/proc/3808/ns/net is not a valid network namespace: failed to Statfs "/host/proc/3808/ns/net": no such file or directory
E0707 01:45:46.128598 1 server.go:422] Failed to configure interfaces for container bd391dfba8a1e7423213b3088f20c2bcbfb42c36b11d7a6ec3fea353aab77606: failed to create veth devices for container bd391dfba8a1e7423213b3088f20c2bcbfb42c36b11d7a6ec3fea353aab77606: error switching to ns /proc/1/task/3667/ns/net: Error switching to ns /proc/1/task/3667/ns/net: bad file descriptor
And there were FD leaks after Pods were created and deleted.
To Reproduce
Run a loop that schedules a number of Pods to one Node in parallel. The IP misconfiguration may appear after 30~50 rounds; the FD leak can happen immediately.
for i in `seq 1 100`; do
  echo "round $i"
  kubectl apply -f pods.yaml
  sleep 15
  kubectl delete -f pods.yaml
  sleep 15
done
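To observe the FD leak while the loop runs, it is enough to watch the agent process's open descriptor count between rounds. A small helper, assuming a Linux /proc filesystem (the `antrea-agent` process name in the example is an assumption about the deployment):

```shell
# Count the open file descriptors of a process by PID; a count that
# keeps growing across rounds of the loop above indicates a leak.
count_fds() {
  ls "/proc/$1/fd" | wc -l
}
# Example (process name is an assumption):
#   count_fds "$(pidof antrea-agent)"
```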
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: test
  name: test-1
spec:
  nodeName: <NODE NAME>
  containers:
  - command:
    - sleep
    - "1"
    image: busybox
    name: container
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Never
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: test
  name: test-2
spec:
  nodeName: <NODE NAME>
  containers:
  - command:
    - sleep
    - "1"
    image: busybox
    name: container
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Never
---
# Create 20 such Pods
...
Expected
None of the above errors should happen.
Versions:
Please provide the following information:
Antrea version (Docker image tag): 0.7.2
GratuitousArpOverIface in "github.com/j-keck/arping" is not thread-safe: it uses package-level variables to hold the socket and Sockaddr: https://github.com/j-keck/arping/blob/1a9d8d0d4f43e32ed4df967f1b3fd67cee09f38c/arping_linux.go#L9-L10
When batch-processing CNI ADD requests, a race condition can occur and lead to FD leaks and mismatches, because a goroutine sending gratuitous ARPs may accidentally close and release FDs belonging to other goroutines.
It didn't happen in v0.5.1 because, with that version, sending gratuitous ARPs actually failed most of the time.