Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue GARP not being sent #796

Merged
merged 1 commit into from
Jun 5, 2020
Merged

Fix the issue GARP not being sent #796

merged 1 commit into from
Jun 5, 2020

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 4, 2020

On Linux, to flush stale ARP cache of network, antrea-agent is supposed
to send gratuitous ARP packets when configuring network for a Pod.
However, the routine was broken when refactoring the
"advertiseContainerAddr" method for Windows support. The action was
executed in a new spawned goroutine within the context of "NetNS.Do",
then it was actually executed after NetNS had switched to the target
container NS and had switched back to the host NS.

This patch fixes it by executing the whole routine in a goroutine, and
fixes the ticker leak problem. It also adds an e2e test to verify there
are at least 3 ARP packets received by OVS after a Pod is up to avoid
this being broken again as not sending GARP may not lead to connectivity
test failure stably if there are no IP reuse before ARP cache expiry.

For #785

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

@tnqn
Copy link
Member Author

tnqn commented Jun 4, 2020

/test-all

antoninbas
antoninbas previously approved these changes Jun 4, 2020
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

defer teardownTest(t, data)

podName := "busybox"
nodeName := masterNodeName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the master?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the test not limited by node number.
Now I see some tests are using workerNodeName(1) without checking node number, changed to workerNodeName(1) too.

test/e2e/basic_test.go Outdated Show resolved Hide resolved
t.Fatalf("Error when waiting for IP for Pod '%s': %v", podName, err)
}

cmd := []string{"ovs-ofctl", "dump-flows", "br-int", fmt.Sprintf("table=10,arp,arp_spa=%s", podIP)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that by that time the GARPS have not been sent yet, since it is an asynchronous operation in the agent?

Copy link
Member Author

@tnqn tnqn Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I was wondering why the test failed on Jenkin CI only and it got only 1 ARP packet. Maybe it's because of this. I added a sleep and let's see if it's stable now.

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lzhecheng
Copy link
Contributor

/test-windows-conformance

On Linux, to flush stale ARP cache of network, antrea-agent is supposed
to send gratuitous ARP packets when configuring network for a Pod.
However, the routine was broken when refactoring the
"advertiseContainerAddr" method for Windows support. The action was
executed in a new spawned goroutine within the context of "NetNS.Do",
then it was actually executed after NetNS had switched to the target
container NS and had switched back to the host NS.

This patch fixes it by executing the whole routine in a goroutine, and
fixes the ticker leak problem. It also adds an e2e test to verify there
are at least 3 ARP packets received by OVS after a Pod is up to avoid
this being broken again as not sending GARP may not lead to connectivity
test failure stably if there are no IP reuse before ARP cache expiry.
@tnqn
Copy link
Member Author

tnqn commented Jun 5, 2020

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn tnqn merged commit 3b5326b into antrea-io:master Jun 5, 2020
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Jun 5, 2020
On Linux, to flush stale ARP cache of network, antrea-agent is supposed
to send gratuitous ARP packets when configuring network for a Pod.
However, the routine was broken when refactoring the
"advertiseContainerAddr" method for Windows support. The action was
executed in a new spawned goroutine within the context of "NetNS.Do",
then it was actually executed after NetNS had switched to the target
container NS and had switched back to the host NS.

This patch fixes it by executing the whole routine in a goroutine, and
fixes the ticker leak problem. It also adds an e2e test to verify there
are at least 3 ARP packets received by OVS after a Pod is up to avoid
this being broken again as not sending GARP may not lead to connectivity
test failure stably if there are no IP reuse before ARP cache expiry.
antoninbas pushed a commit that referenced this pull request Jun 5, 2020
On Linux, to flush stale ARP cache of network, antrea-agent is supposed
to send gratuitous ARP packets when configuring network for a Pod.
However, the routine was broken when refactoring the
"advertiseContainerAddr" method for Windows support. The action was
executed in a new spawned goroutine within the context of "NetNS.Do",
then it was actually executed after NetNS had switched to the target
container NS and had switched back to the host NS.

This patch fixes it by executing the whole routine in a goroutine, and
fixes the ticker leak problem. It also adds an e2e test to verify there
are at least 3 ARP packets received by OVS after a Pod is up to avoid
this being broken again as not sending GARP may not lead to connectivity
test failure stably if there are no IP reuse before ARP cache expiry.
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
On Linux, to flush stale ARP cache of network, antrea-agent is supposed
to send gratuitous ARP packets when configuring network for a Pod.
However, the routine was broken when refactoring the
"advertiseContainerAddr" method for Windows support. The action was
executed in a new spawned goroutine within the context of "NetNS.Do",
then it was actually executed after NetNS had switched to the target
container NS and had switched back to the host NS.

This patch fixes it by executing the whole routine in a goroutine, and
fixes the ticker leak problem. It also adds an e2e test to verify there
are at least 3 ARP packets received by OVS after a Pod is up to avoid
this being broken again as not sending GARP may not lead to connectivity
test failure stably if there are no IP reuse before ARP cache expiry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants