Add ARP/NDP responders to external IP assigner #3318

Merged
jianjuns merged 1 commit into antrea-io:main from the arp_ndp_socket branch on Feb 23, 2022

Contributor

@hty690 hty690 commented Feb 15, 2022

Apart from assigning IPs to the dummy interface, this commit also creates
raw sockets and listens for ARP request packets (IPv4) and Neighbor Solicitation
packets (IPv6). This fixes the issue that Egress cannot work in IPv6 mode,
as the system does not reply to Neighbor Solicitations from external
interfaces if the IP is assigned to the dummy interface. The IP assigner
skips managing the dummy device if dummyDeviceName is empty. This prevents
the kernel from creating a local route, which has a higher priority than the routes
installed by antrea-proxy in proxyAll mode.

Signed-off-by: Tianyi Huang <hty690@126.com>

@hty690 hty690 force-pushed the arp_ndp_socket branch 2 times, most recently from ea5bd69 to c8a5073 Compare February 15, 2022 13:48
pkg/agent/ipassigner/virtual/arp_responder.go (outdated)
// See the License for the specific language governing permissions and
// limitations under the License.

package virtual
Contributor

But what if we need both to bind the IP to an interface and to reply to ARP (e.g. to fix Egress IPv6 support)? Should we make IP binding and ARP replies two configurable features of a single IP assigner?

Contributor Author

I have merged the assigners into one to fix the Egress issue as well.

@hty690 hty690 force-pushed the arp_ndp_socket branch 2 times, most recently from 1561e81 to f1f3f5c Compare February 16, 2022 07:55
@hty690 hty690 changed the title Fix ServiceExternalIP feature cannot work with antrea-proxy in proxyAll mode Add ARP/NDP responders to external IP assigner Feb 16, 2022
github.com/mdlayher/arp v0.0.0-20191213142603-f72070a231fc
github.com/mdlayher/ethernet v0.0.0-20190606142754-0394541c37b7
github.com/mdlayher/ndp v0.0.0-20210831201139-f982b8766fb5
github.com/mdlayher/raw v0.0.0-20211126142749-4eae47f3d54b
Contributor

Why do we import those packages when we have already implemented some of these functions in agent/util/arping and agent/util/ndp?

Contributor Author

@hty690 hty690 Feb 17, 2022

This feature needs more than sending GARP or NS packets over the network interface. We also need to listen for ARP and NS requests and answer them accordingly. Without these packages, we would have to manage the raw sockets, parse the Ethernet headers, etc. ourselves.
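
A minimal sketch of the per-request decision such a responder makes, using only the standard library. The `arpRequest` and `responder` types and the `shouldReply` method are hypothetical names for illustration, not the PR's actual code; a real responder would read the requests from a raw socket through an ARP library.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// arpRequest is a hypothetical, already-parsed ARP request. A real responder
// would read these fields from a raw socket through an ARP library.
type arpRequest struct {
	SenderIP net.IP
	TargetIP net.IP
}

// responder is a hypothetical type that answers only for the IPs added to it.
type responder struct {
	mu  sync.Mutex
	mac net.HardwareAddr
	ips map[string]struct{}
}

func newResponder(mac net.HardwareAddr) *responder {
	return &responder{mac: mac, ips: make(map[string]struct{})}
}

// AddIP starts answering requests that target ip.
func (r *responder) AddIP(ip net.IP) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ips[ip.String()] = struct{}{}
}

// RemoveIP stops answering requests that target ip.
func (r *responder) RemoveIP(ip net.IP) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ips, ip.String())
}

// shouldReply reports whether req targets an IP this responder owns and,
// if so, returns the MAC address a reply would carry.
func (r *responder) shouldReply(req arpRequest) (net.HardwareAddr, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.ips[req.TargetIP.String()]; !ok {
		return nil, false
	}
	return r.mac, true
}

func main() {
	mac, err := net.ParseMAC("02:00:00:00:00:01")
	if err != nil {
		panic(err)
	}
	r := newResponder(mac)
	r.AddIP(net.ParseIP("192.0.2.10"))

	for _, target := range []string{"192.0.2.10", "192.0.2.11"} {
		req := arpRequest{SenderIP: net.ParseIP("192.0.2.1"), TargetIP: net.ParseIP(target)}
		_, ok := r.shouldReply(req)
		fmt.Printf("request for %s: reply=%v\n", target, ok)
	}
}
```

Requests for any IP that was never added are ignored, so the responder answers only for the external IPs it manages.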

Contributor

@jianjuns jianjuns left a comment

In commit message:

if the IP were assigned to the dummy interface.

were -> is

This avoids the kernel creating local routes for the external IP, which have

"creates a local route", "which has"

@@ -173,3 +215,14 @@ func (a *ipAssigner) AssignedIPs() sets.String {
// Return a copy.
return a.assignedIPs.Union(nil)
}

// Run starts the IP assigner.
Contributor

starts ARP responder and NDP responder?

if err := a.loadIPAddresses(); err != nil {
return nil, fmt.Errorf("error when loading IP addresses from the system: %v", err)
if ipv4 != nil {
arpResponder, err := responder.NewARPResponder(externalInterface)
Contributor

As ARP is not always needed, can we pass a boolean to enable ARP responder? We can add another for NDP too.

Alternatively, we could skip enabling ARP when dummyDeviceName is not "".

Contributor Author

updated.

Contributor

I do not see that you changed this?

if ipv6 != nil {
ndpResponder, err := responder.NewNDPResponder(externalInterface)
if err != nil {
return nil, fmt.Errorf("failed to create ARP responder for link %s: %v", externalInterface.Name, err)
Contributor

ARP -> NDP

if dummyDeviceName != "" {
dummyDevice, err := ensureDummyDevice(dummyDeviceName)
if err != nil {
return nil, fmt.Errorf("error when ensuring dummy device exist: %v", err)
Contributor

exist -> exists

ensuring -> checking

Member

ensureDummyDevice creates the dummy device if it doesn't exist, not just checks its existence

return r.iface.Name
}

func (r *arpResponder) AssignIP(ip net.IP) error {
Contributor

Probably rename to "AddIP", "RemoveIP"? It is not really about assignment.

Contributor Author

update. thanks.

}

// Gratuitous sends a gratuitous ARP packet for the IP.
func (r *arpResponder) Gratuitous(ip net.IP) error {
Contributor

How about naming it Advertise()?

Contributor Author

update. thanks.

if err != nil {
return err
}
return r.conn.WriteTo(pkt, ethernet.Broadcast)
Contributor

Probably we should send out multiple GARPs. We can do it in another PR.

Contributor Author

I am not clear about the need for multiple GARPs. Could you provide more context on this?

Member

Similar to

ticker := time.NewTicker(50 * time.Millisecond)
defer ticker.Stop()
count := 0
for {
	// Send gratuitous ARP to network in case of stale mappings for this IP address
	// (e.g. if a previous - deleted - Pod was using the same IP).
	if err := arping.GratuitousARPOverIface(targetIP, iface); err != nil {
		klog.Warningf("Failed to send gratuitous ARP #%d: %v", count, err)
	}
	count++
	if count == 3 {
		break
	}
	<-ticker.C
}

Contributor Author

Thanks for the pointer. Do we need this for external IPs to send multiple GARPs to the transport interface?

if pkt.Operation != arp.OperationRequest {
return nil
}
if _, ok := r.assignedIPs[pkt.TargetIP.String()]; !ok {
Contributor

Need to protect the map?

Contributor

If it is just a set, we can use a thread-safe type. @tnqn might have a recommendation.

Member

I suggested using sets.String, but it's not thread-safe either. It still needs a lock, but that's a common pattern in our code.

Contributor

If we do not need a lock for other reasons, we can use sync.Map (which should work well for a set, where there is no need to read the value).
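
The two options discussed in this thread can be sketched side by side. This is an illustration only: `lockedSet` and `syncMapSet` are hypothetical names, and a plain `map[string]struct{}` stands in for `sets.String`.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedSet is a string set guarded by a mutex, mirroring the
// "set plus lock" pattern. The map stands in for sets.String.
type lockedSet struct {
	mu    sync.RWMutex
	items map[string]struct{}
}

func newLockedSet() *lockedSet {
	return &lockedSet{items: make(map[string]struct{})}
}

func (s *lockedSet) Insert(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[k] = struct{}{}
}

func (s *lockedSet) Delete(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.items, k)
}

func (s *lockedSet) Has(k string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.items[k]
	return ok
}

// syncMapSet uses sync.Map as a set: only the keys matter, and the
// stored value is an empty struct that is never read.
type syncMapSet struct {
	m sync.Map
}

func (s *syncMapSet) Insert(k string) { s.m.Store(k, struct{}{}) }
func (s *syncMapSet) Delete(k string) { s.m.Delete(k) }
func (s *syncMapSet) Has(k string) bool {
	_, ok := s.m.Load(k)
	return ok
}

func main() {
	locked := newLockedSet()
	lockFree := &syncMapSet{}

	// Insert from several goroutines to show both are safe for concurrent use.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ip := fmt.Sprintf("192.0.2.%d", i)
			locked.Insert(ip)
			lockFree.Insert(ip)
		}(i)
	}
	wg.Wait()

	fmt.Println(locked.Has("192.0.2.3"), lockFree.Has("192.0.2.3"))
}
```

The mutex version is the better fit when the same lock must also protect other state, which is the consistency argument made later in the review.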

@codecov-commenter

codecov-commenter commented Feb 17, 2022

Codecov Report

Merging #3318 (4918998) into main (15de4cf) will decrease coverage by 7.32%.
The diff coverage is 55.92%.

@@            Coverage Diff             @@
##             main    #3318      +/-   ##
==========================================
- Coverage   60.95%   53.62%   -7.33%     
==========================================
  Files         266      374     +108     
  Lines       26508    51235   +24727     
==========================================
+ Hits        16159    27477   +11318     
- Misses       8597    21345   +12748     
- Partials     1752     2413     +661     
Flag Coverage Δ
e2e-tests 50.93% <34.04%> (?)
integration-tests 35.79% <ø> (?)
kind-e2e-tests 47.38% <29.08%> (-0.46%) ⬇️
unit-tests 41.76% <60.30%> (+0.12%) ⬆️

Impacted Files Coverage Δ
pkg/agent/ipassigner/ip_assigner_linux.go 52.55% <43.83%> (-7.72%) ⬇️
pkg/agent/ipassigner/responder/ndp_responder.go 47.05% <47.05%> (ø)
pkg/agent/ipassigner/responder/arp_responder.go 70.58% <70.58%> (ø)
...g/agent/controller/serviceexternalip/controller.go 80.34% <78.78%> (-0.52%) ⬇️
pkg/agent/controller/egress/egress_controller.go 69.16% <100.00%> (+30.35%) ⬆️
pkg/agent/cniserver/pod_configuration_linux.go 26.31% <0.00%> (-40.36%) ⬇️
pkg/controller/ipam/antrea_ipam_controller.go 48.71% <0.00%> (-31.57%) ⬇️
pkg/agent/flowexporter/connections/conntrack.go 45.71% <0.00%> (-30.48%) ⬇️
pkg/controller/networkpolicy/endpoint_querier.go 61.46% <0.00%> (-29.97%) ⬇️
.../agent/flowexporter/priorityqueue/priorityqueue.go 63.29% <0.00%> (-29.31%) ⬇️
... and 339 more

@hty690 hty690 force-pushed the arp_ndp_socket branch 5 times, most recently from 8de0057 to 72f5bf1 Compare February 17, 2022 10:46
pkg/agent/ipassigner/ip_assigner_linux.go (outdated)
if err := func() error {
a.mutex.Lock()
defer a.mutex.Unlock()
if a.isIPAssigned(ip) {
Member

In Egress, multiple Egresses may share the same static Egress IP, so two workers could be assigning the same IP concurrently. Without the previous lock, the following code may be executed twice for an IP. Is that OK?

Contributor Author

As the responders are thread-safe, do you think it is enough to add checks on the error codes of netlink.AddrAdd/netlink.AddrDel?

Member

I wonder whether a race condition can happen between AssignIP and UnassignIP when an IP is released from one resource and reallocated to another.
Suppose worker1 wants to remove an IP while worker2 wants to add it, and worker1 starts first:

  1. worker1 sees the IP is assigned and starts removing it without holding the lock.
  2. worker2 sees the IP is assigned and returns directly.
  3. worker1 removes the IP.

But it seems that if worker2 starts first, worker1 may still remove the IP even with the lock. This is not an issue when there is a localIPDetector, as removing the IP could trigger a resync. As that is no longer the case for Service external IP, maybe it needs to track a reference count for such cases, as well as hold the lock. Then:
If worker1 starts first:

  1. worker1 sees the IP is assigned and starts removing it.
  2. worker2 waits for the lock.
  3. worker1 removes the IP.
  4. worker2 gets the lock and adds the IP back.

If worker2 starts first:

  1. worker2 sees the IP is assigned, increases the reference count, and returns.
  2. worker1 sees the IP is assigned, decreases the reference count, and returns.

The reference count can be derived from the set of entity names associated with the IP.
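
A minimal sketch of reference counting through a set of owner names, as suggested above. The `ipOwners` type and its `Acquire`/`Release` methods are hypothetical and are not the controller's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// ipOwners is a hypothetical tracker that maps each IP to the set of
// entity names currently using it. The set size is the reference count.
type ipOwners struct {
	mu     sync.Mutex
	owners map[string]map[string]struct{}
}

func newIPOwners() *ipOwners {
	return &ipOwners{owners: make(map[string]map[string]struct{})}
}

// Acquire records owner as a user of ip. It returns true only when the IP
// had no owner before, i.e. when the caller should actually assign it.
func (o *ipOwners) Acquire(ip, owner string) bool {
	o.mu.Lock()
	defer o.mu.Unlock()
	set, ok := o.owners[ip]
	if !ok {
		set = make(map[string]struct{})
		o.owners[ip] = set
	}
	first := len(set) == 0
	set[owner] = struct{}{}
	return first
}

// Release removes owner from ip. It returns true only when the last owner
// was removed, i.e. when the caller should actually unassign the IP.
func (o *ipOwners) Release(ip, owner string) bool {
	o.mu.Lock()
	defer o.mu.Unlock()
	set, ok := o.owners[ip]
	if !ok {
		return false
	}
	if _, held := set[owner]; !held {
		return false
	}
	delete(set, owner)
	if len(set) == 0 {
		delete(o.owners, ip)
		return true
	}
	return false
}

func main() {
	o := newIPOwners()
	fmt.Println(o.Acquire("192.0.2.10", "svc-a")) // first owner: assign the IP
	fmt.Println(o.Acquire("192.0.2.10", "svc-b")) // already assigned
	fmt.Println(o.Release("192.0.2.10", "svc-a")) // svc-b still owns it
	fmt.Println(o.Release("192.0.2.10", "svc-b")) // last owner gone: unassign
}
```

Keying the count on names rather than a bare integer makes repeated syncs of the same entity idempotent: acquiring twice under the same name does not inflate the count.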

Contributor Author

I have added the reference count mechanism for the agent Service external IP controller.

Member

The issue could still happen without the lock when it's used in EgressController. Suppose Egress A is removed and its IP is allocated to Egress B; worker1 handles the removal of A, and worker2 handles the sync of B.

  1. worker1 sees IP is assigned and starts removing it without holding the lock.
  2. worker2 sees IP is already assigned and returns directly.
  3. worker1 removes the IP, which triggers resync of B.
  4. worker2 resyncs B, sees IP is already assigned and returns directly.

Contributor Author

Makes sense. I have reverted the changes so that the assign/unassign operations are atomic. Thanks!
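
A minimal sketch of what "atomic" means here: the check and the change happen under one lock, so a concurrent worker cannot act on a stale "already assigned" result. The `assigner` type is hypothetical, and the `applied` counter stands in for the real system change (such as adding or deleting the address).

```go
package main

import (
	"fmt"
	"sync"
)

// assigner is a hypothetical IP assigner whose check-and-change steps
// run under a single mutex.
type assigner struct {
	mu       sync.Mutex
	assigned map[string]struct{}
	applied  int // number of simulated system changes
}

func newAssigner() *assigner {
	return &assigner{assigned: make(map[string]struct{})}
}

// AssignIP assigns ip if it is not assigned yet and reports whether it
// changed anything. The lock is held across both the check and the change.
func (a *assigner) AssignIP(ip string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, ok := a.assigned[ip]; ok {
		return false
	}
	a.applied++ // simulated system change, still under the lock
	a.assigned[ip] = struct{}{}
	return true
}

// UnassignIP removes ip if it is assigned and reports whether it changed
// anything. It uses the same lock, so it cannot interleave with AssignIP.
func (a *assigner) UnassignIP(ip string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, ok := a.assigned[ip]; !ok {
		return false
	}
	a.applied++ // simulated system change, still under the lock
	delete(a.assigned, ip)
	return true
}

func main() {
	a := newAssigner()

	// Many workers assign the same IP; exactly one performs the change.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.AssignIP("192.0.2.10")
		}()
	}
	wg.Wait()

	fmt.Println("system changes applied:", a.applied)
}
```

Because a worker that sees the IP as assigned can only do so after any in-flight removal has finished, the "sees assigned, returns, then the IP disappears" interleaving described above cannot occur within the assigner itself.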

pkg/agent/ipassigner/ip_assigner_linux.go (outdated)
defer a.mutex.Unlock()

if !a.assignedIPs.Has(ip) {
if !a.isIPAssigned(ip) {
Member

ditto, there may be a race condition between Assign and Unassign as well.

}

// Advertise sends a gratuitous ARP packet for the IP.
func (r *arpResponder) Advertise(ip net.IP) error {
Member

Should AddIP call Advertise directly, since by design assigning an IP should advertise it as well? That could make IPAssigner simpler.

Contributor Author

Changed accordingly. Thanks.

Member

Then it can be private? And we can remove the IP version check as it's an internal method whose argument has been checked by public methods.

@hty690 hty690 force-pushed the arp_ndp_socket branch 3 times, most recently from 42e1912 to 3e427cd Compare February 18, 2022 07:27

klog.InfoS("Assigned IP to ARP responder", "ip", ip, "interface", r.iface.Name)
err := r.Advertise(ip)
if err != nil {
klog.Warningf("failed to advertise for IP %s: %v", ip, err)
Member

Why not log this as an error?

return r.iface.Name
}

func (r *ndpResponder) Advertise(ip net.IP) error {
Member

ditto

@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Feb 18, 2022
iface *net.Interface
conn *arp.Client
assignedIPs sets.String
mutex sync.Mutex
Contributor

sync.Map is indeed a simpler choice here, but I know you need a mutex in ndpResponder for other reasons, so maybe you prefer to use the same way for both.

Contributor Author

Yes. I think the mutex here is better for consistency.

func (a *ipAssigner) Run(ch <-chan struct{}) {
// Start the ARP responder only when the dummy device is not created. The kernel will handle ARP requests
// for IPs assigned to the dummy devices by default.
// TODO: Check arp_ignore sysctl parameter of the transport interface to determin whether to start
Contributor

determin -> determine

the arp_ignore sysctl parameter

Contributor Author

thanks!

@@ -416,21 +437,38 @@ func TestUpdateService(t *testing.T) {
},
expectError: false,
},
{
name: "Service updated external IP and local Node selected but other Service still owns the assigned IP",
Member

good test case

Member

@tnqn tnqn left a comment

LGTM, thanks

@hty690
Contributor Author

hty690 commented Feb 22, 2022

/test-ipv6-all
/test-ipv6-only-all

@hty690
Contributor Author

hty690 commented Feb 22, 2022

/test-integration

@jianjuns
Contributor

/test-e2e

@antrea-io antrea-io deleted a comment from tnqn Feb 23, 2022
@hty690
Contributor Author

hty690 commented Feb 23, 2022

/test-ipv6-e2e
/test-ipv6-only-e2e

@jianjuns jianjuns merged commit a87ef55 into antrea-io:main Feb 23, 2022
@hty690 hty690 deleted the arp_ndp_socket branch February 24, 2022 00:35
bangqipropel pushed a commit to bangqipropel/antrea that referenced this pull request Mar 2, 2022
Labels
action/release-note Indicates a PR that should be included in release notes.

5 participants