Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctmap: do not call InitMapInfo() in init() #15590

Merged
merged 1 commit into from Apr 9, 2021
Merged

Conversation

kkourt
Copy link
Contributor

@kkourt kkourt commented Apr 7, 2021

PR #14721 introduced changes that removed the
cilium_snat_v{4,6}_external maps if NodePort is not enabled.

Issue #15141 was attributed to the above PR, where controllers were
failing with:

error=Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory name=bpf-map-sync-cilium_snat_v4_external

In an attempt to address #15141, #15175 added a nodeport argument to
InitMapInfo() so that the cilium_snat_v{4,6}_external maps are not
created if NodePort is not enabled.

PR #15141 did not fix the issue: #15337 (comment)

This PR takes another shot at fixing #15141 by removing a call of
InitMapInfo from init(), where the (new) nodeport argument is always
true.

see the commit message for more details

Fixes: cac5218
Fixes: d639905
Fixes: #15141
Fixes: #15337

Signed-off-by: Kornilios Kourtis kornilios@isovalent.com

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Apr 7, 2021
@kkourt
Copy link
Contributor Author

kkourt commented Apr 7, 2021

test-me-please

@brb
Copy link
Member

brb commented Apr 7, 2021

This will fix #15337, right?

@kkourt kkourt force-pushed the pr/kkourt/ctmapinit branch 4 times, most recently from 8381ed9 to e50fd3e Compare April 7, 2021 16:15
@kkourt kkourt added the release-note/misc This PR makes changes that have no direct user impact. label Apr 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 7, 2021
@kkourt
Copy link
Contributor Author

kkourt commented Apr 7, 2021

This will fix #15337, right?

That's the intention, yes.

@kkourt
Copy link
Contributor Author

kkourt commented Apr 7, 2021

test-me-please

@kkourt kkourt force-pushed the pr/kkourt/ctmapinit branch 2 times, most recently from f877ec0 to 722000a Compare April 8, 2021 13:45
@kkourt
Copy link
Contributor Author

kkourt commented Apr 8, 2021

test-me-please

@kkourt
Copy link
Contributor Author

kkourt commented Apr 8, 2021

test-me-please

PR #14721 introduced changes that removed the
cilium_snat_v{4,6}_external maps if NodePort is not enabled.

Issue #15141 was attributed to the above PR, where controllers were
failing with:
> error=Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory  name=bpf-map-sync-cilium_snat_v4_external

In an attempt to fix #15141, #15175 added a nodeport argument to
InitMapInfo() so that the cilium_snat_v{4,6}_external maps are not
created if NodePort is not enabled.

PR #15141 did not fix the issue: #15337 (comment)

This PR takes another shot at fixing #15141 by removing a call of
InitMapInfo from init(), where the (new) nodeport argument is always
true.

Not that in init(), option.Config.EnableNodePort is not properly set
yet, so we cannot pass the config option because it would always be
false.

For this change to properly work, this patch also adds explicit
InitMapInfo() calls since removing init() means that this function is
called in contexts such as tests and the cli.

Fixes: cac5218
Fixes: d639905
Fixes: #15141
Fixes: #15337

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt
Copy link
Contributor Author

kkourt commented Apr 8, 2021

test-me-please

@kkourt
Copy link
Contributor Author

kkourt commented Apr 9, 2021

test-gke

@kkourt
Copy link
Contributor Author

kkourt commented Apr 9, 2021

retest-net-next

@kkourt
Copy link
Contributor Author

kkourt commented Apr 9, 2021

retest-runtime

@kkourt
Copy link
Contributor Author

kkourt commented Apr 9, 2021

There are currently two failures: one for gke-stable with a "Build Scaling cluster failed" and another one that looks like a flake: #15621. So, I will mark the PR as Ready for review.

@kkourt kkourt marked this pull request as ready for review April 9, 2021 13:27
@kkourt kkourt requested a review from a team April 9, 2021 13:27
@kkourt kkourt requested review from a team as code owners April 9, 2021 13:27
@kkourt kkourt requested a review from jibi April 9, 2021 13:27
@brb
Copy link
Member

brb commented Apr 12, 2021

@kkourt I think there is still one case where we might wrongly initialize the maps. We do ctmap.InitMapInfo() before finishKubeProxyReplacement() which might disable the NodePort BPF.

@kkourt
Copy link
Contributor Author

kkourt commented Apr 12, 2021

I guess you mean finishKubeProxyReplacementInit()

Here's the order of events:

I think if the maps are initialized before we disable NodePort BPF, then it would mean that we will not delete the maps. So I think that the map controller error might not trigger. On the other hand, this would mean that for the case were NodePort is disabled late (i.e., in finishKubeProxyReplacementInit) we do not delete the snat maps.

@kkourt
Copy link
Contributor Author

kkourt commented Apr 12, 2021

I would suggest that, assuming this PR fixes the faulting map controller, we should tackle the issue mentioned by @brb
by restructuring initialization #13573 and properly deal with the various dependencies.

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.6 Apr 13, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.6 Apr 13, 2021
This was referenced Apr 13, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.6 Apr 14, 2021
@joestringer joestringer added this to Needs backport from master in 1.8.10 Apr 20, 2021
@joestringer joestringer removed this from Needs backport from master in 1.8.9 Apr 20, 2021
@joestringer joestringer moved this from Needs backport from master to Backport done to v1.8 in 1.8.10 May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.8.10
Backport done to v1.8
1.9.6
Backport done to v1.9
9 participants