New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: remove special handle for ICMPv6 echo targeting router IPv6 #24921
Conversation
eacecbb
to
9f26194
Compare
9f26194
to
aad3aca
Compare
c0d2992
to
a651fbc
Compare
@@ -78,7 +78,6 @@ enum { | |||
|
|||
#define CILIUM_CALL_DROP_NOTIFY 1 | |||
#define CILIUM_CALL_ERROR_NOTIFY 2 | |||
#define CILIUM_CALL_SEND_ICMP6_ECHO_REPLY 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we just leave a gap here, or change all the macro definitions below.
For example, we change #define CILIUM_CALL_HANDLE_ICMP6_NS 4
to #define CILIUM_CALL_HANDLE_ICMP6_NS 3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if that can cause issues on upgrade. Maybe @ti-mo knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping! Unfortunately, I think we may need to renumber to cause a change in CILIUM_CALL_SIZE, which will force map migration to recreate the prog array from scratch. That is, CILIUM_CALL_SIZE needs to differ from whatever 1.13 version is used during the upgrade test. v1.13.2 has CILIUM_CALL_SIZE = 40
, so map migration will still take place if we drop it back down to 42 on main. (To be clear, we'll be fine if we don't touch CILIUM_CALL_SIZE
and leave a gap here as well.)
#20691 is relevant here, we should be able to get that done over the coming weeks. This should remove another potential cause of missed tail calls and undefined behaviour during minor upgrades when tail call maps are currently being re-used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your information @ti-mo , that was very helpful. Just one thing to confirm, does changing CILIUM_CALL_SIZE
break the upgrade test? The test suite Tests upgrade and downgrade from a Cilium stable image to master
failed because migrate-svc had restarted unexpectedly, do you have any clue where I can look into?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To pass upgrade test suite, I leave a gap here with additional comments to explain what happened here lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it fail on every run with the change to CILIUM_CALL_SIZE
included? The run you linked seems to fail on downgrade. Either way, the prog array gets recreated when switching between main/master and 1.13, so there shouldn't be any missed tail calls.
Looking at the cilium-metrics-list.md
files in the sysdump seems to corroborate this, there are no cilium_drop_
entries with reason=Missed tail call
(these are normally echoed in the test stdout), but there are other drop reasons that might be of interest in finding the cause.
You may be seeing a non-tail call-related flake that may or may not be caused by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your help Timo, I ran K8sUpdates locally with new CILIUM_CALL_SIZE
for 3 times and I'm afraid K8sUpdates fails on every run LOL
a651fbc
to
cd2316b
Compare
@@ -78,7 +78,6 @@ enum { | |||
|
|||
#define CILIUM_CALL_DROP_NOTIFY 1 | |||
#define CILIUM_CALL_ERROR_NOTIFY 2 | |||
#define CILIUM_CALL_SEND_ICMP6_ECHO_REPLY 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if that can cause issues on upgrade. Maybe @ti-mo knows?
/test |
cd2316b
to
c87407a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
c87407a
to
27766bf
Compare
721d0cd
to
96fac18
Compare
@@ -78,7 +78,15 @@ enum { | |||
|
|||
#define CILIUM_CALL_DROP_NOTIFY 1 | |||
#define CILIUM_CALL_ERROR_NOTIFY 2 | |||
#define CILIUM_CALL_SEND_ICMP6_ECHO_REPLY 3 | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, this is a bit much for a comment, as I can just git-blame to see what previous MACRO was there.
Tho, I don't really feel to strongly about it, I think you could just say the bits about the gap, and the concerns around the backporting.
This is a total nit, so feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, love when code gets simpler 👍
96fac18
to
085fdf7
Compare
/test |
Previously we didn't have any network interface whose IPv6 was set to router IPv6, so the special handle for ICMPv6 echo whose destination address is router IPv6 was required, as kernel couldn't handle such ICMPv6 echo; but it's no more the case since PR cilium#24208 was merged, now cilium_host has router IPv6 instead of native IPv6, making kernel capable of handling this ICMPv6. Therefore, this commit removes code relevant to this special handle, including several functions such as icmp6_send_echo_reply, tail_icmp6_send_echo_reply, __icmp6_send_echo_reply. Macro SKIP_ICMPV6_ECHO_HANDLING and CILIUM_CALL_SEND_ICMP6_ECHO_REPLY are also obsolete and deleted. Deletion of macro CILIUM_CALL_SEND_ICMP6_ECHO_REPLY leaves a gap in the sequence of numbers, and we don't renumber the other macros in order to pass the CI tests, otherwise the K8sUpdates test suite would fail due to "migrate-svc restart count values do not match" Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
085fdf7
to
cc8c5c0
Compare
/test |
/test-1.25-5.4 |
/test-1.27-net-next Don't know why there are 3 CI jobs not triggered by /test |
/test-1.27-net-next |
/test-1.26-4.19 |
The failed CI job for k8s-1.27-kernel-net-next (test-1.27-net-next) is expected because we reverted the 1.27 support. |
Does this fix IPv6 Neighbour Solicitation #16285? When I deploy an EKS IPv6-only cluster, I see connectivity issues between pods with Cilium 1.13.3. But when I deploy Cilium 1.14.0-snapshot.2, everything works fine. |
Previously we don't have any network interface whose ipv6 is set to router ipv6, so the special handle for icmp6 echo whose destination address is router ipv6 was required, as kernel couldn't handle such icmp6 echo; but it's no more the case since PR #24208 was merged, now cilium_host has router ipv6 instead of native ipv6, making kernel capable of dealing with this icmp6.
Therefore, this commit removes code relevant to this special handle, including several functions such as icmp6_send_echo_reply, tail_icmp6_send_echo_reply, __icmp6_send_echo_reply.
Macro SKIP_ICMPV6_ECHO_HANDLING and CILIUM_CALL_SEND_ICMP6_ECHO_REPLY are also redundant and deleted.
Signed-off-by: Zhichuan Liang gray.liang@isovalent.com