Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Re-introduce ICMPv6 NS responder on from-netdev #30837

Merged
merged 3 commits into from Mar 1, 2024

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Feb 19, 2024

This reverts commit 6580714, to fix the breakage of "IPv6 NS responder for pod" introduced by #12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve #14509. To not revive #14509, this commit also passes through ICMPv6 NS if the target is native node IP (eth0's addr). By letting stack take care of those NS-for-node-IP packets, we managed to:

  1. Solve IPV6 access to node is lost after installing cilium with ipv6 enabled #14509 again, but in a way keeping NS responder. The cause of IPV6 access to node is lost after installing cilium with ipv6 enabled #14509 was NS responder always generates ND whose source IP is "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass those NS-for-node-IP packets to stack, the ND response would naturally have "node_ip" as source.

  2. Avoid the fib_lookup failure mentioned at bpf: Re-introduce ICMPv6 NS responder on from-netdev #30837 (comment).

This PR also adds bpf unit test to cover IPv6 NS responder feature.

Fixes: #30926

Fixes an IPv6 issue that cilium doesn't respond to Neighbor Solicitation targeting the pods on same node.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2024
@jschwinger233
Copy link
Member Author

/test

@jschwinger233
Copy link
Member Author

/test

@jschwinger233 jschwinger233 added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Feb 19, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2024
@jschwinger233 jschwinger233 added kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 19, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2024
@jschwinger233 jschwinger233 added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. feature/ipv6 Relates to IPv6 protocol support labels Feb 19, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2024
@jschwinger233 jschwinger233 added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. backport/author The backport will be carried out by the author of the PR. labels Feb 19, 2024
@jschwinger233 jschwinger233 force-pushed the gray/fix-ipv6-proxy branch 3 times, most recently from 01c4b18 to 0bf3a8c Compare February 20, 2024 09:27
@jschwinger233
Copy link
Member Author

/test

@jschwinger233
Copy link
Member Author

/test

@jschwinger233
Copy link
Member Author

ci-e2e keeps failing on north-south-loadbalancing but I couldn't make a local repro: https://github.com/cilium/cilium/actions/runs/7971912812/job/21798175918

Adding pwru gh action to see if I can get something useful.

@jschwinger233
Copy link
Member Author

/ci-e2e

@jschwinger233
Copy link
Member Author

jschwinger233 commented Feb 23, 2024

ci-e2e keeps failing on north-south-loadbalancing but I couldn't make a local repro: https://github.com/cilium/cilium/actions/runs/7971912812/job/21798175918

Adding pwru gh action to see if I can get something useful.

This turns out to be a kernel specific bug.

c0dfeb0 (cc @julianwiedmann ) added a testcase for 5.4 with KPR=on + routing=vxlan, revealed this hidden bug which was accidentally fixed on 1.14, but now come back again when I intended to simply revert #25329 .

The scenario is NS connectivity, we are curling from outside to node_ip:node_port. The key part is when first TCP reply with syn+ack arrives at cilium_vxlan, it is supposed to be rev-NAT-ed to the original tuple, then to do a bpf_fib_lookup followed by bpf_redirect.

With or without bpf NS responder makes a difference for v6 neighbor system:

  1. without bpf NS responder, when curling from outside, the ICMP6 NS for node_ip from outside will be handled by stack, as a result an ipv6 neigh entry will be recorded on the node.
  2. with bpf NS responder, the NS for node_ip from outside is taken care of by bpf, and kernel doesn't get a chance to generate a v6 neigh entry for outside source ip.

Then it's going to affect bpf_fib_lookup:

  1. without bpf NS responder, bpf_fib_lookup returns 0, because kernel has the neigh record;
  2. with bpf NS responder, bpf_fib_lookup returns NO_NEIGH, and
    1. for kernel 5.10+, we have HAVE_FIB_INDEX to tolerate NO_NEIGH, so that cilium can call the bpf_redirecet_neigh to make things right
    2. for kernel 5.4, without HAVE_FIB_INDEX, cilium turns to use DIRECT_ROUTING_DEV_IFINDEX as fallback, which unfortunately is set to the index of eth1 instead of eth0 for some reasons. So skb ends up being redirected to a wrong netdev, breaking the connectivity.

#27642 is going to be helpful once we have the necessary kernel patch backport.

For now, I'm going to work another solution: bring back bpf NS responder only if the NS is asking for a pod; if NS is asking for the node IP, just hand it over to stack.

@jschwinger233
Copy link
Member Author

/ci-e2e

@jschwinger233
Copy link
Member Author

/ci-e2e

@jschwinger233
Copy link
Member Author

jschwinger233 commented Feb 23, 2024

CI was happy: https://github.com/cilium/cilium/actions/runs/8018397688/job/21904227831

Will add more bpf unit test and polish commit messages for review

jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 5, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.15 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
@julianwiedmann julianwiedmann added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 5, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.15 in 1.15.2 Mar 5, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.15 in 1.15.2 Mar 5, 2024
joestringer pushed a commit that referenced this pull request Mar 5, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.15 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve #14509.
To not revive #14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve #14509 again, but in a way keeping NS responder. The cause of
   #14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at #30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Mar 5, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.15 to Backport done to v1.15 in 1.15.2 Mar 5, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport done to v1.15 in 1.15.2 Mar 5, 2024
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 6, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 6, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 6, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
@jschwinger233 jschwinger233 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport done to v1.15 in 1.15.2 Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 6, 2024
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 7, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Mar 7, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
cilium#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve cilium#14509.
To not revive cilium#14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve cilium#14509 again, but in a way keeping NS responder. The cause of
   cilium#14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at cilium#30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
julianwiedmann pushed a commit that referenced this pull request Mar 8, 2024
[ upstream commit: dc9dfd7 ]

[ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have
  ext_err parameter, so icmp6_host_handle() doesn't need to accept
  it either. ]

This reverts commit 6580714, to fix
the breakage of "IPv6 NS responder for pod" introduced by
#12086 (bpf: Reply NA when recv ND for local IPv6 endpoints).

6580714 was merged to solve #14509.
To not revive #14509, this commit also passes through ICMPv6 NS if the
target is native node IP (eth0's addr). By letting stack take care of
those NS-for-node-IP packets, we managed to:

1. Solve #14509 again, but in a way keeping NS responder. The cause of
   #14509 was NS responder always generates ND whose source IP is
   "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass
   those NS-for-node-IP packets to stack, the ND response would
   naturally have "node_ip" as source.

2. Avoid the fib_lookup failure mentioned at #30837 (comment).

icmp6_host_handle() also has a new parameter `handle_ns` to control if
we want NS responder to be active. If it is called from `to-netdev` code
path, handle_ns is set to false. This is suggested by julianwiedmann.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Mar 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.14 in 1.14.8 Mar 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport done to v1.14 in 1.14.8 Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.14 This issue affects v1.14 branch backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. feature/ipv6 Relates to IPv6 protocol support kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.14.8
Backport done to v1.14
1.15.2
Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

Cilium doesn't send neighbor advertisements in response to neighbor solicitiations for pod IPv6 IPs
2 participants