Neighbor entry is installed on every NIC on a MultiNIC node #28660

Closed
2 tasks done
liuyuan10 opened this issue Oct 17, 2023 · 3 comments · Fixed by #28782
Assignees
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps.

Comments

@liuyuan10
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When Cilium runs on a MultiNIC node with enable-l2-neigh-discovery, a neighbor entry is installed for every node on every NIC, even if there is no route to the node through that NIC:

$ ip neigh | grep extern_learn
10.128.0.1 dev eth0 lladdr 42:01:0a:80:00:01 managed extern_learn REACHABLE
10.128.0.51 dev eth2 managed extern_learn INCOMPLETE
10.128.0.49 dev eth1 managed extern_learn INCOMPLETE
10.128.0.47 dev eth1 managed extern_learn INCOMPLETE
10.128.0.51 dev eth3 managed extern_learn INCOMPLETE
10.128.0.48 dev eth1 managed extern_learn INCOMPLETE
10.128.0.49 dev eth2 managed extern_learn INCOMPLETE
10.128.0.48 dev eth2 managed extern_learn INCOMPLETE
10.128.0.47 dev eth2 managed extern_learn INCOMPLETE
10.128.0.49 dev eth3 managed extern_learn INCOMPLETE
10.128.0.48 dev eth3 managed extern_learn INCOMPLETE
10.128.0.47 dev eth3 managed extern_learn INCOMPLETE
10.128.0.51 dev eth1 managed extern_learn INCOMPLETE
$ ip r
default via 10.128.0.1 dev eth0 proto dhcp src 10.128.0.118 metric 1024
10.128.0.0/20 via 10.128.0.1 dev eth0 proto dhcp src 10.128.0.118 metric 1024
10.128.0.1 dev eth0 proto dhcp scope link src 10.128.0.118 metric 1024
10.240.2.0/24 via 10.240.2.1 dev eth1 proto dhcp src 10.240.2.5 metric 1024
10.240.2.1 dev eth1 proto dhcp scope link src 10.240.2.5 metric 1024
10.241.4.0/24 via 10.241.4.1 dev eth2 proto dhcp src 10.241.4.49 metric 1024
10.241.4.1 dev eth2 proto dhcp scope link src 10.241.4.49 metric 1024
10.242.0.0/24 via 10.242.0.1 dev eth3 proto dhcp src 10.242.0.15 metric 1024
10.242.0.1 dev eth3 proto dhcp scope link src 10.242.0.15 metric 1024
169.254.123.0/24 dev docker0 proto kernel scope link src 169.254.123.1 linkdown
169.254.169.254 via 10.128.0.1 dev eth0 proto dhcp src 10.128.0.118 metric 1024
169.254.169.254 dev eth2 proto dhcp scope link src 10.241.4.49 metric 1024
169.254.169.254 dev eth1 proto dhcp scope link src 10.240.2.5 metric 1024
169.254.169.254 dev eth3 proto dhcp scope link src 10.242.0.15 metric 1024

From the code, we only install the entry if there is a "route" to the node on the NIC. But from what I tested on the node, the lookup always returns a route:

$ ip route get to 10.128.0.48 dev eth2
10.128.0.48 dev eth2 src 10.241.4.49 uid 20162
    cache

So a neighbor entry is added for each node on each NIC. Because ARP resolution fails, the kernel keeps trying to refresh these entries constantly; I see a retry from the kernel every 3 seconds.
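
To quantify the symptom, one could tally the neighbor entries per device and state. The following is a minimal Go sketch, assuming the `ip neigh` text layout shown above (`<ip> dev <name> ... <STATE>`); `countNeigh` and `neighKey` are illustrative names and not part of Cilium.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// neighKey groups neighbor entries by device and neighbor state.
type neighKey struct {
	Dev   string
	State string
}

// countNeigh tallies `ip neigh` lines per (device, state).
// Assumption: each line looks like "<ip> dev <name> ... <STATE>".
func countNeigh(output string) map[neighKey]int {
	counts := make(map[neighKey]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Skip anything that is not at least "<ip> dev <name> <STATE>".
		if len(f) < 4 || f[1] != "dev" {
			continue
		}
		counts[neighKey{Dev: f[2], State: f[len(f)-1]}]++
	}
	return counts
}

func main() {
	// A few lines copied from the report above.
	sample := `10.128.0.1 dev eth0 lladdr 42:01:0a:80:00:01 managed extern_learn REACHABLE
10.128.0.51 dev eth2 managed extern_learn INCOMPLETE
10.128.0.49 dev eth1 managed extern_learn INCOMPLETE
10.128.0.51 dev eth1 managed extern_learn INCOMPLETE`

	counts := countNeigh(sample)

	// Sort keys so the summary is printed in a stable order.
	keys := make([]neighKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Dev != keys[j].Dev {
			return keys[i].Dev < keys[j].Dev
		}
		return keys[i].State < keys[j].State
	})
	for _, k := range keys {
		fmt.Printf("%s %s %d\n", k.Dev, k.State, counts[k])
	}
}
```

On the full output above, every INCOMPLETE entry would fall under eth1, eth2, or eth3, none of which has a route to 10.128.0.0/20.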

Cilium Version

1.13.6

Kernel Version

6.1.42

Kubernetes Version

v1.28.1

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@liuyuan10 liuyuan10 added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Oct 17, 2023
@borkmann
Member

Cc @ysksuzuki could you take a look? On multi-nic we should only install it where the actual route is.

@ysksuzuki
Member

@borkmann Sure!

@ysksuzuki ysksuzuki self-assigned this Oct 18, 2023
@ysksuzuki
Member

ysksuzuki commented Oct 21, 2023

ip route get returns a route even if the destination is unreachable from the specified interface. So, I think we need to run netlink.RouteList and check if there is a route to the destination from the specified link.

routes, err := netlink.RouteGetWithOptions(nodeIP, &netlink.RouteGetOptions{Oif: link.Attrs().Name})

EDIT: We might be able to use the fibmatch flag. Will test it.

$ sudo ip netns add netns1
$ sudo ip netns add netns2
$ sudo ip link add name veth1 type veth peer name veth2
$ sudo ip link add name veth3 type veth peer name veth4
$ sudo ip link set veth2 netns netns1
$ sudo ip link set veth4 netns netns2
$ sudo ip addr add 10.0.0.1/24 dev veth1
$ sudo ip addr add 10.0.1.1/24 dev veth3
$ sudo ip link set veth1 up
$ sudo ip link set veth3 up
$ sudo ip netns exec netns1 ip addr add 10.0.0.2/24 dev veth2
$ sudo ip netns exec netns1 ip link set veth2 up
$ sudo ip netns exec netns2 ip addr add 10.0.1.2/24 dev veth4
$ sudo ip netns exec netns2 ip link set veth4 up
$ ip r
10.0.0.0/24 dev veth1 proto kernel scope link src 10.0.0.1 
10.0.1.0/24 dev veth3 proto kernel scope link src 10.0.1.1 
$ ping -I veth1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.0.1 veth1: 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.057 ms

$ ping -I veth3 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.1.1 veth3: 56(84) bytes of data.
From 10.0.1.1 icmp_seq=1 Destination Host Unreachable
From 10.0.1.1 icmp_seq=2 Destination Host Unreachable
From 10.0.1.1 icmp_seq=3 Destination Host Unreachable

$ ip route get 10.0.0.1
local 10.0.0.1 dev lo src 10.0.0.1 uid 1000 
    cache <local> 

# `ip route get` returns a route even if the destination is unreachable from the specified interface 
$ ip route get 10.0.0.1 oif veth3
10.0.0.1 dev veth3 src 10.0.1.1 uid 1000 
    cache 

# With fibmatch flag, it returns full fib lookup matched route
$ ip route get fibmatch 10.0.0.1 oif veth1
local 10.0.0.1 dev veth1 proto kernel scope host src 10.0.0.1

$ ip route get fibmatch 10.0.0.1 oif veth3
RTNETLINK answers: No route to host
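
The difference between the two lookups can be illustrated with a toy model. This is only a sketch of the behaviour observed in the transcript above, not the kernel's algorithm and not Cilium's code: `route`, `lookup`, `demoFIB`, and `demoDst` are hypothetical names, and the table holds just the two connected routes from the veth setup.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// route is a hypothetical, minimal stand-in for a routing table entry.
type route struct {
	Prefix netip.Prefix
	Dev    string
}

var errNoRoute = errors.New("no route to host")

// Connected routes from the veth setup above, and the peer address in netns1.
var (
	demoFIB = []route{
		{Prefix: netip.MustParsePrefix("10.0.0.0/24"), Dev: "veth1"},
		{Prefix: netip.MustParsePrefix("10.0.1.0/24"), Dev: "veth3"},
	}
	demoDst = netip.MustParseAddr("10.0.0.2")
)

// lookup models a route query pinned to the output interface oif.
// It is a simplification that reproduces the two outcomes seen above.
func lookup(fib []route, dst netip.Addr, oif string, fibMatch bool) (route, error) {
	var best route
	found := false
	for _, r := range fib {
		// Only entries that leave through oif and cover dst are candidates.
		if r.Dev != oif || !r.Prefix.Contains(dst) {
			continue
		}
		// Keep the longest matching prefix.
		if !found || r.Prefix.Bits() > best.Prefix.Bits() {
			best, found = r, true
		}
	}
	if found {
		return best, nil
	}
	if fibMatch {
		// fibmatch reports the miss, like "No route to host" above.
		return route{}, errNoRoute
	}
	// Without fibmatch a host route on oif is returned anyway,
	// like "10.0.0.1 dev veth3 ..." above.
	return route{Prefix: netip.PrefixFrom(dst, dst.BitLen()), Dev: oif}, nil
}

func main() {
	for _, oif := range []string{"veth1", "veth3"} {
		for _, fm := range []bool{false, true} {
			r, err := lookup(demoFIB, demoDst, oif, fm)
			fmt.Printf("oif=%s fibmatch=%t route=%v err=%v\n", oif, fm, r, err)
		}
	}
}
```

With `fibMatch` set, the lookup for `10.0.0.2` via `veth3` fails, so a caller could skip installing a neighbor entry on that device. Without it, every device yields a route.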

ysksuzuki added a commit to ysksuzuki/cilium that referenced this issue Oct 25, 2023
This PR fixes the issue that neighbor entries can be installed on devices
where no route to the destination host exists.

`netlink.RouteGetWithOptions` (equivalent to `ip route get`) with `oif`
returns a route even if the destination is unreachable from the specified interface.
Because of this, neighbor entries can be installed on all devices.

`netlink.RouteGetWithOptions` with the `FIBMatch` option (equivalent to the `fibmatch` flag)
returns the route matched by the full FIB lookup. It returns `No route to host`
if the destination is not routable from the specified device.
This PR adds the `FIBMatch` option to avoid installing the unnecessary entries.

Note:

- With IPv6, it returns `Network is unreachable` if the destination
  is unreachable, with or without fibmatch.
- With the `FIBMatch` option, `netlink.RouteGetWithOptions` returns the `MultiPath` field
  if there are multiple paths to the destination. Without `FIBMatch`, it returns the `GW` field.
  See the examples below.

```
// FIBMatch: false
netlink.Route{ GW: 8.8.8.250, MultiPath: [] }
netlink.Route{ GW: 9.9.9.250, MultiPath: [] }

// FIBMatch: true
netlink.Route{ GW: <nil>, MultiPath: [{Ifindex: 1218 Weight: 1 Gw: 9.9.9.250 Flags: []},
                                      {Ifindex: 1220 Weight: 1 Gw: 8.8.8.250 Flags: []}]}
netlink.Route{ GW: <nil>, MultiPath: [{Ifindex: 1218 Weight: 1 Gw: 9.9.9.250 Flags: []},
                                      {Ifindex: 1220 Weight: 1 Gw: 8.8.8.250 Flags: []}]}
```
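
A caller that needs the next-hop addresses would have to handle both shapes. The sketch below uses hypothetical stand-in types that mirror only the fields printed above; they are not the `netlink` library's types, and `gateways` is an illustrative helper.

```go
package main

import (
	"fmt"
	"net"
)

// nexthop and route are hypothetical stand-ins that carry only the
// fields printed in the example above (GW, MultiPath, Ifindex, Gw).
type nexthop struct {
	Ifindex int
	Gw      net.IP
}

type route struct {
	GW        net.IP
	MultiPath []nexthop
}

// gateways returns every gateway a route carries, whichever field holds it.
func gateways(r route) []net.IP {
	if len(r.MultiPath) > 0 {
		gws := make([]net.IP, 0, len(r.MultiPath))
		for _, nh := range r.MultiPath {
			gws = append(gws, nh.Gw)
		}
		return gws
	}
	if r.GW != nil {
		return []net.IP{r.GW}
	}
	// Neither field is set: treat the route as having no gateway.
	return nil
}

var (
	// Shape shown above for FIBMatch: false.
	demoSingle = route{GW: net.ParseIP("8.8.8.250")}
	// Shape shown above for FIBMatch: true.
	demoMulti = route{MultiPath: []nexthop{
		{Ifindex: 1218, Gw: net.ParseIP("9.9.9.250")},
		{Ifindex: 1220, Gw: net.ParseIP("8.8.8.250")},
	}}
)

func main() {
	fmt.Println(gateways(demoSingle))
	fmt.Println(gateways(demoMulti))
}
```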

`ip route get` examples

```
$ ip route
10.0.0.0/24 dev veth1 proto kernel scope link src 10.0.0.1
10.0.1.0/24 dev veth3 proto kernel scope link src 10.0.1.1

$ ping -I veth1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.0.1 veth1: 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.057 ms

$ ping -I veth3 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.1.1 veth3: 56(84) bytes of data.
From 10.0.1.1 icmp_seq=1 Destination Host Unreachable
From 10.0.1.1 icmp_seq=2 Destination Host Unreachable
From 10.0.1.1 icmp_seq=3 Destination Host Unreachable

$ ip route get 10.0.0.1 oif veth1
local 10.0.0.1 dev lo table local src 10.0.0.1 uid 1000
    cache <local>

// `ip route get` returns a route even if the destination is unreachable
// from the specified interface
$ ip route get 10.0.0.1 oif veth3
10.0.0.1 dev veth3 src 10.0.1.1 uid 1000
    cache

// With fibmatch flag, it returns full fib lookup matched route
$ ip route get fibmatch 10.0.0.1 oif veth1
local 10.0.0.1 dev veth1 proto kernel scope host src 10.0.0.1

$ ip route get fibmatch 10.0.0.1 oif veth3
RTNETLINK answers: No route to host
```

Fixes: cilium#28660

Signed-off-by: Yusuke Suzuki <ysuzuki4112@gmail.com>
ysksuzuki added a commit to ysksuzuki/cilium that referenced this issue Oct 27, 2023
squeed pushed a commit that referenced this issue Nov 10, 2023
sujoshua pushed a commit to sujoshua/cilium that referenced this issue Nov 16, 2023
pjablonski123 pushed a commit to pjablonski123/cilium that referenced this issue Dec 15, 2023

3 participants