
Change cilium_host IPv6 address #24208

Merged (4 commits, Apr 13, 2023)

Conversation

@jschwinger233 (Member) commented Mar 7, 2023

Originally the IPv6 address of cilium_host was set to the node's native address. This commit instead assigns it an IPv6 address allocated from IPAM, the same way the IPv4 address of cilium_host is set.

Fixes: #23445
Fixes: #21954
Fixes: #23461

Change the cilium_host IPv6 address to the node router IPv6 instead of the native node IPv6, and fix several related IPv6 issues.

Signed-off-by: Zhichuan Liang gray.liang@isovalent.com

@maintainer-s-little-helper: (comment marked as resolved)

The maintainer-s-little-helper bot added the labels dont-merge/needs-sign-off (The author needs to add signoff to their commits before merge.) and dont-merge/needs-release-note-label (The author needs to describe the release impact of these changes.) on Mar 7, 2023, and removed the dont-merge/needs-sign-off label on Mar 8, 2023.
@jschwinger233 (Member, Author) commented Mar 8, 2023

One thing left to do is kind of tricky and I may need some help. @julianwiedmann @brb

The problem is that after simply changing the cilium_host IPv6, cilium-agent can no longer update the route table.

In a two-node cluster, I expect two crucial routes on each host:

$cidr_local via $ipv6_cilium_host dev cilium_host
$cidr_remote via $ipv6_cilium_host dev cilium_host mtu 1450

But there is nothing.

Cilium-agent's journal log shows failure to update routes:

Mar 08 03:36:42 k8s1 cilium-agent[10142]: {Ifindex: 10 Dst: fd04::1:0/112 Src: fd00::b Gw: fd04::8dd1 Flags: [] Table: 0 Realm: 0}
Mar 08 03:36:42 k8s1 cilium-agent[10142]: level=warning msg="Unable to update route" error="invalid argument" interface=cilium_host local="fd00::b" nexthop="fd04::8dd1" prefix="{fd04::1:0 ffffffffffffffffffffffffffff0000}" subsys=linux-datapath
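As a side note, the prefix in that warning is printed as an address plus a raw hex netmask; a quick check (illustrative Python, not part of the agent) confirms it is the /112 from the route above:

```python
import ipaddress

# The agent log prints the prefix as "{fd04::1:0 ffffffffffffffffffffffffffff0000}".
# Convert the raw hex netmask to a prefix length and rebuild the CIDR.
mask = bytes.fromhex("ffffffffffffffffffffffffffff0000")
prefix_len = bin(int.from_bytes(mask, "big")).count("1")
net = ipaddress.IPv6Network(("fd04::1:0", prefix_len))

assert prefix_len == 112
assert str(net) == "fd04::1:0/112"
```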

If I try to add the route with the ip(8) command line, e.g. ip -6 r a fd04::0:0/112 dev cilium_host via fd04::1:d01b, the kernel rejects the request with Error: Gateway can not be a local address. I believe this is the same error cilium-agent hits.

I looked into the kernel source; the error seems to come from here: https://github.com/torvalds/linux/blob/v6.1-rc3/net/ipv6/route.c#L3438-L3441
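In other words, the check rejects any route whose gateway is one of the machine's own addresses. A minimal sketch of that rule (illustrative Python using the addresses from this thread, not a reimplementation of the kernel code):

```python
import ipaddress

def gateway_ok(gateway, local_addrs):
    """Mimic the kernel rule: a route's gateway must not be a local address."""
    gw = ipaddress.IPv6Address(gateway)
    return gw not in {ipaddress.IPv6Address(a) for a in local_addrs}

local = {"fd04::1:d01b", "fd00::b"}           # addresses owned by this node
assert not gateway_ok("fd04::1:d01b", local)  # via a local addr -> EINVAL
assert gateway_ok("fd04::8dd1", local)        # via a non-local addr -> accepted
```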

I worked out a workaround: temporarily delete the IPv6 from cilium_host and add it back after installing the routes, something like this:

# temporarily delete ipv6
ip a d $ipv6 dev cilium_host

# add route for $ipv6, or kernel will complain "no route to host" when adding route for $cidr
ip r a $ipv6 dev cilium_host

# now we can add route for $cidr
ip r a $cidr_local dev cilium_host via $ipv6
ip r a $cidr_remote dev cilium_host via $ipv6 

# finally we add back $ipv6
ip a a $ipv6 dev cilium_host

But I don't think this approach is acceptable: cilium-agent maintains the route table periodically, so even if we succeed in adding the routes once, the errors return as soon as the IPv6 is added back to cilium_host.

@brb (Member) commented Mar 8, 2023

@jschwinger233 👋 A few suggestions / notes:

  • Have you tried dropping the via $GW?
  • What is the scope of the v6 addrs? Global?
  • The route installation is done in pkg/datapath/linux/node.go

@jschwinger233 (Member, Author) commented Mar 8, 2023

Hi @brb, thanks for the suggestions. You got up so early - -

> Have you tried dropping the via $GW?

Yes, still failed. Wait, let me confirm.

> What is the scope of the v6 addrs? Global?

global

@brb (Member) commented Mar 8, 2023

It's still early morning here 😅, but the following seems to work for the $cidr_local via $ipv6_cilium_host dev cilium_host case:

# ip -6 -o r
...
fd00:10:244:1::/64 dev cilium_host proto kernel metric 256 pref medium

# ip -6 -o a
...
3: cilium_host    inet6 fd00:10:244:1::3851/64 scope global \       valid_lft forever preferred_lft forever

# ip r g fd00:10:244:1::5629 (LOCAL POD IP)
fd00:10:244:1::5629 from :: dev cilium_host proto kernel src fd00:10:244:1::3851 metric 256 pref medium
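This matches the kernel's source-address selection for on-link routes: with a plain device route, the source is taken from the interface address whose prefix covers the destination. A small illustrative Python check using the addresses from the output above:

```python
import ipaddress

# cilium_host carries fd00:10:244:1::3851/64, so any pod in that /64 is
# reached on-link and the kernel picks that address as the source.
cilium_host = ipaddress.IPv6Interface("fd00:10:244:1::3851/64")
pod_ip = ipaddress.IPv6Address("fd00:10:244:1::5629")

assert pod_ip in cilium_host.network  # on-link -> src fd00:10:244:1::3851
```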

@jschwinger233 (Member, Author)

@brb Yes, I know it's 8am where you are and I usually don't get up at that time myself; sorry if that came across as sarcasm.

I just tried ip r a $ipv6 dev cilium_host and it seems to work. A shame I didn't come up with such an obvious idea myself; thanks for the insight, let me revise cilium-agent.

@brb (Member) commented Mar 8, 2023

Please keep in mind that I haven't tried the remote-CIDR case.

@jschwinger233 (Member, Author)

> Please keep in mind that I haven't tried the remote-CIDR case.

For the remote CIDR, we have to specify src $native_ipv6 to make the connection between host and remote pod work.

@jschwinger233 jschwinger233 changed the title [WIP] Change cilium_host IPv6 address Change cilium_host IPv6 address Mar 8, 2023
@jschwinger233 jschwinger233 marked this pull request as ready for review March 8, 2023 10:05
@jschwinger233 jschwinger233 requested review from a team as code owners March 8, 2023 10:05
@brb (Member) commented Mar 8, 2023

I guess the runtime tests failed because the eth0 v6 addr was used instead of the cilium_host one:

/home/jenkins/workspace/Cilium-PR-Runtime-net-next/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:515
Expected command: docker exec -i  app2 curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://[f00d::1:1]:80/public -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'" 
To succeed, but it failed:
Exitcode: 28 
Err: Process exited with status 28
Stdout:
 	 time-> DNS: '0.000015()', Connect: '0.000000',Transfer '0.000000', total '5.001701'
Stderr:

This is set by https://github.com/cilium/cilium/blob/master/test/runtime/lb.go#L386


UPDATE: sent the PR to remove the test cases #24245 (we are planning to get rid of the runtime tests).

@brb (Member) commented Mar 8, 2023

The ci-datapath failure looks legit too (https://github.com/cilium/cilium/actions/runs/4363707535/jobs/7632661159):

command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 http://[fc00:f853:ccd:e793::5]:32460" failed: command terminated with exit code 28

It should be this test case https://github.com/cilium/cilium-cli/blob/master/connectivity/tests/service.go#L137.

The relevant configuration which was used:

            name: '8'
            kernel: 'bpf-next-main'
            kube-proxy: 'iptables'
            kpr: 'disabled'
            tunnel: 'geneve'
            endpoint-routes: 'true'

@jschwinger233 (Member, Author)

> (quoting the full Mar 8 comment above: routes via $ipv6_cilium_host can't be installed because the kernel rejects a local address as gateway, and the temporary delete-and-re-add workaround is not acceptable)

@NikAleksandrov Hi Nikolay, could you shed some light on this issue? I realized the via $gw is essential to pass the connectivity test, but the kernel won't allow it for IPv6. IPv4 has a similar route, 10.10.0.0/24 via 10.10.0.1 dev cilium_host, instead of 10.10.0.0/24 dev cilium_host, so the more general question is: what's the key difference between those two patterns?

@jschwinger233 (Member, Author) commented Mar 9, 2023

> The ci-datapath failure looks legit too (https://github.com/cilium/cilium/actions/runs/4363707535/jobs/7632661159)

@brb I can reproduce this error locally, and it seems related to the via $gw issue. ~~Once I add the routes with via, the no-policies-extra/pod-to-local-nodeport test passes.~~

Now I'm looking into how these two route forms influence the L2 and L3 headers differently. I'll let you know about any progress.

@jschwinger233 (Member, Author) commented Mar 9, 2023

@brb Got some clues.

The datapath is lxc -> stack -> cilium_host, and it's the kernel stack where both DNAT and SNAT happen: DNAT replaces the destination IP with the real upstream pod IP, and SNAT replaces the source IP with $ipv6_cilium_host. Compared to the IPv4 datapath this seems fine, but at present Cilium can't handle a TCP segment whose source IP is $ipv6_cilium_host.

Let me elaborate.

From my observation, after a TCP SYN with IPv6 header $ipv6_cilium_host -> $ipv6_upstream_pod is sent to cilium_host, cilium_host replies with a SYN-ACK whose IPv6 header is $ipv6_upstream_pod -> $ipv6_native, instead of $ipv6_upstream_pod -> $ipv6_cilium_host. That's where the TCP connection breaks.

This wasn't a problem before because $ipv6_cilium_host == $ipv6_native, but now we have to take care of it.
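To make the breakage concrete, here is the tuple mismatch in miniature (illustrative Python; the addresses are placeholder values standing in for the names in the comment, not real ones):

```python
# Outgoing SYN after SNAT: the source is the cilium_host address.
ipv6_cilium_host = "fd04::1:d01b"  # placeholder for $ipv6_cilium_host
ipv6_native = "fd00::b"            # placeholder for $ipv6_native
ipv6_upstream_pod = "fd04::aaaa"   # placeholder for $ipv6_upstream_pod

syn = {"src": ipv6_cilium_host, "dst": ipv6_upstream_pod}
# Observed reply: destination is the native address, not the SNAT source.
reply = {"src": ipv6_upstream_pod, "dst": ipv6_native}

# The reply's dst no longer matches the SYN's src, so the TCP tuple
# doesn't line up and the connection breaks.
assert reply["dst"] != syn["src"]
```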

@NikAleksandrov

> (quoting the full comment above, including the question about via $gw vs. device-only routes)

Hi, using via $gw for this case IMO mostly influences the choice of source address (as Joe also mentioned over Slack): it guarantees the right address is chosen. I can't guarantee that's the reason it's used, though, because I haven't looked into what would change and what exactly relies on it. Since you have a test that breaks when it's not used, I'd look into what changes and where the packet gets dropped.
@jschwinger233 does dropping via $gw break the IPv4 case as well?

@dylandreimerink (Member)

/test-runtime

@dylandreimerink dylandreimerink merged commit 03e9414 into cilium:master Apr 13, 2023
57 checks passed
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Apr 27, 2023
Previously we didn't have any network interface whose IPv6 was set to
the router IPv6, so special handling was required for ICMPv6 echo
requests whose destination address is the router IPv6, as the kernel
couldn't handle such an ICMPv6 echo; this is no longer the case since
PR cilium#24208 was merged: cilium_host now has the router IPv6 instead
of the native IPv6, making the kernel capable of handling this ICMPv6
traffic.

Therefore, this commit removes the code for this special handling,
including the functions icmp6_send_echo_reply,
tail_icmp6_send_echo_reply, and __icmp6_send_echo_reply.

The macros SKIP_ICMPV6_ECHO_HANDLING and CILIUM_CALL_SEND_ICMP6_ECHO_REPLY
are also obsolete and deleted.

Deleting the macro CILIUM_CALL_SEND_ICMP6_ECHO_REPLY leaves a gap in the
number sequence; we don't renumber the other macros, in order to pass
CI, as the K8sUpdates test suite would otherwise fail with
"migrate-svc restart count values do not match".

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
pchaigno pushed a commit that referenced this pull request Apr 27, 2023
(same commit message as above)
jschwinger233 added a commit to jschwinger233/cilium that referenced this pull request Jun 7, 2023
Prior to 1.14, Cilium set the cilium_host IPv6 addr to the same one as
the native iface, but cilium#24208 replaced the native IPv6 with one
allocated from IPAM. That change breaks the downgrade path due to
failures installing CIDR routes.

To fix the downgrade path, the ideal way would be to delete the stale
IPv6 on cilium_host whenever that IPv6 came from IPAM; but in practice
we don't have a reliable way to tell whether an IPv6 came from IPAM,
given the complications of multi-pool IPAM, ENI IPAM, and so on.

Therefore, this commit deletes any global-scope IPv6 on cilium_host as
long as the address is not the one we want. We believe this is so far
the most robust way to make sure stale addresses are gone.

Fixes: cilium#25938

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
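The cleanup rule this commit message describes can be sketched as follows (illustrative Python, not the actual Go implementation; the helper name and addresses are hypothetical):

```python
import ipaddress

def stale_addrs(addrs, wanted):
    """Return every non-link-local (i.e. global scope) IPv6 on the
    device that is not the address we want to keep."""
    want = ipaddress.IPv6Address(wanted)
    return [a for a in addrs
            if not ipaddress.IPv6Address(a).is_link_local
            and ipaddress.IPv6Address(a) != want]

# fe80::1 is link-local and kept; fd00::b is a stale global-scope addr.
assert stale_addrs(["fe80::1", "fd00::b", "fd04::8dd1"], "fd04::8dd1") == ["fd00::b"]
```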
dylandreimerink pushed a commit that referenced this pull request Jun 8, 2023
(same commit message as above)
Labels
feature/ipv6: Relates to IPv6 protocol support.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/minor: This PR changes functionality that users may find relevant to operating Cilium.
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
8 participants