Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update vishvananda/netlink #31671

Merged
merged 1 commit into from
May 29, 2024
Merged

Conversation

foyerunix
Copy link

Hello,

Fixes: #30744

When queried for the routes of an unknow IP family the kernel return the routes for all IP families.
This end up returning the IPv4 routes when IPv6 is disabled at boot time. Therefore calling DeleteRouteTable with family = AF_INET6 will delete IPv4 routes too, breaking the DNS proxy.

I didn't include all the new code from the library yet as the contribution guide state that:

If the PR is considerably large (e.g. with more than 200 lines changed and/or more than 6 commits), consider whether there is a good way to split the PR into smaller PRs that can be merged more incrementally.

Unfortunately I don't see how to split the whole library update into smaller chunks. I also can't guarantee that the updated library code, if included fully, won't cause any regression. Therefore I prefer to run the CI with my changes only first.

Fix DNS proxy regression from Cilium 1.15 on IPv4 only nodes

Best Regards,

@foyerunix foyerunix requested a review from a team as a code owner March 29, 2024 08:58
@foyerunix foyerunix requested a review from sayboras March 29, 2024 08:58
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 29, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 29, 2024
@sayboras sayboras added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 30, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 30, 2024
@maintainer-s-little-helper
Copy link

Commit 84473ba does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 30, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 30, 2024
@sayboras
Copy link
Member

I misclicked the update branch button, which adds one merge commit for main branch. I did a force push to your remote branch to rectify it, additionally, we can merge 2 commits into one.

@sayboras
Copy link
Member

/test

@sayboras sayboras force-pushed the update-vishvananda-netlink branch 2 times, most recently from 39c2daf to e984789 Compare March 30, 2024 01:00
@sayboras
Copy link
Member

/test

@sayboras
Copy link
Member

This change in vendor looks good to me, but no harm to have another check from others who involve in triaging the mentioned issue.

@julianwiedmann julianwiedmann added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. feature/ipv6 Relates to IPv6 protocol support needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Apr 2, 2024
@julianwiedmann julianwiedmann requested a review from rgo3 April 2, 2024 10:17
@julianwiedmann julianwiedmann added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Apr 2, 2024
@sayboras sayboras removed their request for review April 2, 2024 12:48
@rgo3
Copy link
Contributor

rgo3 commented May 13, 2024

I've been triggering CI here from time to time over the last two weeks but also can't really pinpoint a clear area that's breaking or deduce a simple reproducer. But its definitely multiple tests that keep failing, not just upgrade/downgrade tests. Potentially you could try to bisect the commits between current commit that cilium pulls from the netlink library and your latest fix. Happy to keep running CI with the revisions we need to test. Once we have a commit in the library that breaks cilium, it hopefully will be easier to locate why that change breaks so many tests.

@rgo3
Copy link
Contributor

rgo3 commented May 15, 2024

I think vishvananda/netlink@916f968 is the commit in the netlink lib that is causing the troubles for this PR. At least bisecting lead me to that commit. Can't really say what exactly the problematic part is that causes the issues in CI, but the code added to catch EAGAIN and retry seems problematic regardless for cilium as that creates a busy loop. Will update this PR if I make any further progress with debugging.

@georglauterbach
Copy link

Thank you very much @rgo3! ❤️

@julianwiedmann
Copy link
Member

Note that if we want to unblock this sooner, one option is to switch over to our https://github.com/cilium/netlink fork again and revert the offending patch there.

@julianwiedmann
Copy link
Member

@julianwiedmann what's the state on this? #30744 is a serious blocker - as of now, you cannot use the DNS proxy on an IPv4-only cluster. Running the proxy on an IPv4-only cluster is, I guess, rather common (for those that do not want nor need IPv6).

If IPv4-only is a common environment that folks care about, then I think it would be great to reflect this via contributions towards #30327. Otherwise we'll most likely continue to break it once in a while 🤷.

@georglauterbach
Copy link

If IPv4-only is a common environment that folks care about, then I think it would be great to reflect this via contributions towards #30327. Otherwise we'll most likely continue to break it once in a while 🤷.

I understand, fair point. Sadly, I have neither the time nor the Go experience to do that :( Hopefully someone can tackle this :)

@foyerunix foyerunix force-pushed the update-vishvananda-netlink branch 2 times, most recently from d0f28c0 to d098868 Compare May 28, 2024 13:09
@foyerunix
Copy link
Author

foyerunix commented May 28, 2024

Hello,

Someone reverted the offending commit in the netlink library: vishvananda/netlink#976

I just rebased the branch and updated the library to the latest version.

@julianwiedmann let's CI this please ! 🙂

Best Regards.

@julianwiedmann
Copy link
Member

/test

@julianwiedmann
Copy link
Member

Looks much better :).

The ci-e2e-upgrade is a known flake, so just the go module thing that needs sorting out. Over to sig/vendor.

@julianwiedmann julianwiedmann removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label May 28, 2024
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution and endless effort to drive this one through.

@sayboras
Copy link
Member

Can you help to re-run go mod tidy and go mod vendor to sync up vendor dir?

$ git status            
On branch update-vishvananda-netlink
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    vendor/github.com/vishvananda/netlink/cmd/ipset-test/main.go

When queried for the routes of an unknown IP family the
kernel return the routes for all IP families.

This end up returning the IPv4 routes when IPv6 is disabled
at boot time. Therefore calling DeleteRouteTable with
family = AF_INET6 will delete IPv4 routes too, breaking
the DNS proxy.

Fixes: cilium#30744

Signed-off-by: Foyer Unix <foyerunix@foyer.lu>
@foyerunix
Copy link
Author

Hello,

I think the issue should be solved. Not being a Go developer myself, I didn't know the existence of go mod vendor, therefore I tried to do the process manually.

Best Regards.

@sayboras
Copy link
Member

/test

@sayboras sayboras added this pull request to the merge queue May 29, 2024
Merged via the queue into cilium:main with commit 6ed0098 May 29, 2024
63 of 64 checks passed
@joamaki joamaki mentioned this pull request May 30, 2024
6 tasks
@joamaki joamaki added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels May 30, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.15 in 1.15.6 May 30, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels May 31, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.15 in 1.15.6 May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. feature/ipv6 Relates to IPv6 protocol support kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

[1.15] DNS proxy doesn't get traffic when ipv6 is disabled in kernel
6 participants