Removing disabled maps causes controllers to fail #15141

Closed
aanm opened this issue Mar 1, 2021 · 5 comments · Fixed by #15175 or #15590
Labels
kind/bug This is a bug in the Cilium logic.

Comments

Member

aanm commented Mar 1, 2021

Removing disabled maps in

    for _, m := range maps {
        p := path.Join(bpf.MapPrefixPath(), m)
        if _, err := os.Stat(p); !os.IsNotExist(err) {
            ms.RemoveMapPath(p)
        }
    }

Causes the controllers to fail:

2021-02-26T14:24:45.460845518Z level=debug msg="Controller func execution time: 15.392µs" name=bpf-map-sync-cilium_snat_v4_external subsys=controller uuid=946f5dda-bd86-44d5-aa8d-7d5ee12daaed
2021-02-26T14:24:45.460882480Z level=debug msg="Controller run failed" consecutiveErrors=1 error="Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory" name=bpf-map-sync-cilium_snat_v4_external subsys=controller uuid=946f5dda-bd86-44d5-aa8d-7d5ee12daaed

as seen in our CI https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/799/testReport/junit/Suite-k8s-1/20/K8sServicesTest_Checks_service_across_nodes_Checks_ClusterIP_Connectivity/

This change seems to have been introduced by cac5218

cc @brb @pchaigno @mazzy89

Contributor

mazzy89 commented Mar 1, 2021

Thanks @aanm for reporting this. I will take a look and write a fix accordingly.

Member

pchaigno commented Mar 2, 2021

@brb If cilium_snat_v4_external is only used when BPF NodePort is enabled, should we create the userspace counterparts only when EnableNodePort is true?

ipv4Nat, ipv6Nat := nat.GlobalMaps(option.Config.EnableIPv4,
and
global4Map, global6Map := nat.GlobalMaps(v4, v6)

Contributor

mazzy89 commented Mar 2, 2021

Thanks, @pchaigno, for taking care of this. I was looking at it this morning, but you were faster than me 😝

Member

brb commented Mar 3, 2021

> If cilium_snat_v4_external is only used when BPF NodePort is enabled, should we create the userspace counterparts only when EnableNodePort is true?

@pchaigno Yep.
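
The idea agreed on here could look roughly like the sketch below. It is only an illustration: `natMap` and the three-argument `globalMaps` are hypothetical stand-ins, not the actual `nat.GlobalMaps` signature, and the IPv6 map name is inferred from the `cilium_snat_v{4,6}_external` pattern.

```go
package main

import "fmt"

// natMap is a hypothetical stand-in for the userspace object that wraps a
// pinned BPF map; it is not the Cilium type.
type natMap struct{ name string }

// globalMaps returns the userspace SNAT map objects to create. Both
// results are nil when nodePort is false, so nothing (including any sync
// controller) would be set up for maps the datapath does not use.
func globalMaps(ipv4, ipv6, nodePort bool) (v4, v6 *natMap) {
	if !nodePort {
		return nil, nil
	}
	if ipv4 {
		v4 = &natMap{name: "cilium_snat_v4_external"}
	}
	if ipv6 {
		v6 = &natMap{name: "cilium_snat_v6_external"}
	}
	return v4, v6
}

func main() {
	// BPF NodePort disabled: no userspace objects are created.
	v4, v6 := globalMaps(true, true, false)
	fmt.Println(v4 == nil, v6 == nil)

	// BPF NodePort enabled with IPv4 only.
	v4, v6 = globalMaps(true, false, true)
	fmt.Println(v4.name, v6 == nil)
}
```

Callers would then have to tolerate nil maps, which is the main cost of gating creation this way.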

Member

tklauser commented Mar 5, 2021

Hit this in #15213 test runs, I think: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/861/

17:28:14 STEP: Cilium is not ready yet: controllers are failing: cilium-agent 'cilium-jznc4': controller bpf-map-sync-cilium_snat_v4_external is failing: Exitcode: 0 
Stdout:
 	 KVStore:                Ok   Disabled
	 Kubernetes:             Ok   1.20 (v1.20.4) [linux/amd64]
	 Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
	 KubeProxyReplacement:   Disabled   
	 Cilium:                 Ok   1.9.90 (v.1.9.90-r.d55bb42)
	 NodeMonitor:            Listening for events on 3 CPUs with 64x4096 of shared memory
	 IPAM:                   IPv4: 4/255 allocated from 10.0.0.0/24, IPv6: 4/255 allocated from fd02::/120
	 BandwidthManager:       Disabled
	 Host Routing:           Legacy
	 Masquerading:           IPTables [IPv4: Enabled, IPv6: Enabled]
	 Controller Status:      29/30 healthy
	   Name                                          Last success   Last error   Count   Message
	   bpf-map-sync-cilium_lxc                       5s ago         never        0       no error                                                                                         
	   bpf-map-sync-cilium_snat_v4_external          never          3s ago       5       Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory 
[...]

pchaigno added a commit to pchaigno/cilium that referenced this issue Mar 9, 2021
The IPv4 and IPv6 SNAT maps are only used if BPF NodePort is enabled.
Commit cac5218 ("datapath: remove SNAT maps entries when kube-proxy is
enabled") removed the maps on startup if BPF NodePort is disabled.

We were however still creating them regardless of the BPF NodePort
status. The creation started a controller which then fails once the
actual map is removed. This commit fixes the issue by not creating the
userspace object, including the controller, for the SNAT maps when BPF
NodePort is disabled.

Fixes: cac5218 ("datapath: remove SNAT maps entries when kube-proxy is
enabled")
Fixes: cilium#15141
Signed-off-by: Paul Chaignon <paul@cilium.io>
joestringer pushed a commit that referenced this issue Mar 10, 2021
pchaigno added a commit to pchaigno/cilium that referenced this issue Mar 10, 2021
[ upstream commit 602e5ce ]

aanm pushed a commit that referenced this issue Mar 10, 2021
[ upstream commit 602e5ce ]

joestringer pushed a commit that referenced this issue Mar 10, 2021
[ upstream commit 602e5ce ]

kkourt added a commit that referenced this issue Apr 8, 2021
PR #14721 introduced changes that removed the
cilium_snat_v{4,6}_external maps if NodePort is not enabled.

Issue #15141 was attributed to the above PR, where controllers were
failing with:
> error=Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory  name=bpf-map-sync-cilium_snat_v4_external

In an attempt to fix #15141, #15175 added a nodeport argument to
InitMapInfo() so that the cilium_snat_v{4,6}_external maps are not
created if NodePort is not enabled.

PR #15175 did not fix the issue: #15337 (comment)

This PR takes another shot at fixing #15141 by removing a call of
InitMapInfo from init(), where the (new) nodeport argument is always
true.

Note that in init(), option.Config.EnableNodePort is not properly set
yet, so we cannot pass the config option because it would always be
false.

For this change to work properly, this patch also adds explicit
InitMapInfo() calls, since removing the call from init() means that the
function is no longer called implicitly in contexts such as tests and
the CLI.

Fixes: cac5218
Fixes: d639905
Fixes: #15141
Fixes: #15337

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
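
The ordering problem described in this commit message can be sketched as follows. Go runs every package's `init()` before `main()`, so a global that is filled in by flag or config parsing still holds its zero value at that point. The `config` type, `parseConfig`, and `initMapInfo` here are hypothetical stand-ins and not the Cilium definitions.

```go
package main

import "fmt"

// config is a hypothetical stand-in for a global options struct that is
// only populated once flags or a config file have been parsed.
type config struct{ EnableNodePort bool }

var (
	cfg         config
	initSawFlag bool // what init() observed for EnableNodePort
)

// init runs before main, so cfg still holds its zero value here.
func init() {
	initSawFlag = cfg.EnableNodePort
}

// parseConfig simulates the agent reading its configuration at startup.
func parseConfig() {
	cfg.EnableNodePort = true
}

// initMapInfo is an explicit setup function: callers decide when to call
// it and can therefore pass the parsed option.
func initMapInfo(nodePort bool) []string {
	maps := []string{"cilium_lxc"}
	if nodePort {
		maps = append(maps, "cilium_snat_v4_external")
	}
	return maps
}

func main() {
	parseConfig()
	fmt.Println("value seen in init():", initSawFlag)
	fmt.Println("value seen after parsing:", cfg.EnableNodePort)
	fmt.Println("maps registered:", initMapInfo(cfg.EnableNodePort))
}
```

The trade-off is the one the commit message mentions: once setup moves out of `init()`, every entry point (agent, tests, CLI) has to call the explicit function itself.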
pchaigno pushed a commit that referenced this issue Apr 9, 2021
jibi pushed a commit that referenced this issue Apr 13, 2021
[ upstream commit e83dd53 ]

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
qmonnet pushed a commit that referenced this issue Apr 14, 2021
[ upstream commit e83dd53 ]

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
christarazi pushed a commit to christarazi/cilium that referenced this issue Apr 28, 2021
[ upstream commit e83dd53 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
ti-mo pushed a commit that referenced this issue May 4, 2021
[ upstream commit e83dd53 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>