SIGSEGV on startup when installNoConntrackIptablesRules is true #32607

Open
2 of 3 tasks
Jean-Daniel opened this issue May 17, 2024 · 2 comments

Jean-Daniel commented May 17, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I tried to enable installNoConntrackIptablesRules and restarted the agents to apply the change, but the agents started crash-looping with the following stack trace.

time="2024-05-17T13:57:03Z" level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
time="2024-05-17T13:57:04Z" level=info msg="regenerating all endpoints" reason= subsys=endpoint-manager
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x21e4c0b]

goroutine 1 [running]:
github.com/cilium/cilium/pkg/datapath/iptables.(*Manager).installRules(0xc000a6fe10, {0x3956220, 0xb})
	/go/src/github.com/cilium/cilium/pkg/datapath/iptables/iptables.go:1720 +0x6ab
github.com/cilium/cilium/pkg/datapath/iptables.(*Manager).doInstallRules(0xc000a6fe10, {0x3956220, 0xb}, 0x1, 0x1)
	/go/src/github.com/cilium/cilium/pkg/datapath/iptables/iptables.go:1594 +0x125
github.com/cilium/cilium/pkg/datapath/iptables.(*Manager).InstallRules(0x0?, {0x3f594a8, 0xc0028a3110}, {0x3956220, 0xb}, 0xf8?, 0x39?)
	/go/src/github.com/cilium/cilium/pkg/datapath/iptables/iptables.go:1564 +0x14c
github.com/cilium/cilium/pkg/datapath/loader.(*Loader).Reinitialize(0x3940f19?, {0x3f59438, 0xc0008a1f40}, {0x3f5caf0, 0xc00113c000}, {{0x0, 0x0}, 0x0, {0x0, 0x0}, ...}, ...)
	/go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:473 +0x1c19
github.com/cilium/cilium/daemon/cmd.(*Daemon).init(0xc00113c000)
	/go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:254 +0x6b7
github.com/cilium/cilium/daemon/cmd.newDaemon({0x3f59438, 0xc0008a1f40}, 0xc000cea560, 0xc001edfb00)
	/go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:956 +0x5b05
github.com/cilium/cilium/daemon/cmd.newDaemonPromise.func1({0x3578b60, 0x496e00})
	/go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1698 +0x66
github.com/cilium/cilium/pkg/hive/cell.Hook.Start(...)
	/go/src/github.com/cilium/cilium/pkg/hive/cell/lifecycle.go:45
github.com/cilium/cilium/pkg/hive/cell.(*DefaultLifecycle).Start(0xc000a76b70, {0x3f594a8?, 0xc00055f340?})
	/go/src/github.com/cilium/cilium/pkg/hive/cell/lifecycle.go:108 +0x337
github.com/cilium/cilium/pkg/hive.(*Hive).Start(0xc000537900, {0x3f594a8, 0xc00055f340})
	/go/src/github.com/cilium/cilium/pkg/hive/hive.go:310 +0xf9
github.com/cilium/cilium/pkg/hive.(*Hive).Run(0xc000537900)
	/go/src/github.com/cilium/cilium/pkg/hive/hive.go:210 +0x73
github.com/cilium/cilium/daemon/cmd.NewAgentCmd.func1(0xc000bea400?, {0x3940fc9?, 0x4?, 0x3940e35?})
	/go/src/github.com/cilium/cilium/daemon/cmd/root.go:39 +0x17b
github.com/spf13/cobra.(*Command).execute(0xc000be8300, {0xc0001be010, 0x1, 0x1})
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:987 +0xaa3
github.com/spf13/cobra.(*Command).ExecuteC(0xc000be8300)
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:1039
github.com/cilium/cilium/daemon/cmd.Execute(0xc000537900?)
	/go/src/github.com/cilium/cilium/daemon/cmd/root.go:79 +0x13
main.main()
	/go/src/github.com/cilium/cilium/daemon/main.go:14 +0x57

Cilium Version

Cilium 1.15.5

Image versions         hubble-ui          quay.io/cilium/hubble-ui:v0.13.0@sha256:7d663dc16538dd6e29061abd1047013a645e6e69c115e008bee9ea9fef9a6666: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.13.0@sha256:1e7657d997c5a48253bb8dc91ecee75b63018d16ff5e5797e5af367336bc8803: 1
                       hubble-relay       quay.io/cilium/hubble-relay:v1.15.5@sha256:1d24b24e3477ccf9b5ad081827db635419c136a2bd84a3e60f37b26a38dd0781: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.15.5@sha256:f5d3d19754074ca052be6aac5d1ffb1de1eb5f2d947222b5f10f6d97ad4383e8: 2
                       cilium             quay.io/cilium/cilium:v1.15.5@sha256:4ce1666a73815101ec9a4d360af6c5b7f1193ab00d89b7124f8505dee147ca40: 10

Kernel Version

Linux worker-1.cluster 5.15.0-107-generic #117-Ubuntu SMP Fri Apr 26 12:26:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Server Version: v1.29.5

Regression

No response

Sysdump

agent-not-ready-taint-key                         node.cilium.io/agent-not-ready
annotate-k8s-node                                 true
arping-refresh-period                             30s
auto-direct-node-routes                           true
bgp-secrets-namespace                             kube-system
bpf-lb-acceleration                               disabled
bpf-lb-external-clusterip                         false
bpf-lb-map-max                                    65536
bpf-lb-sock                                       true
bpf-map-dynamic-size-ratio                        0.0025
bpf-policy-map-max                                16384
bpf-root                                          /sys/fs/bpf
cgroup-root                                       /run/cilium/cgroupv2
cilium-endpoint-gc-interval                       5m0s
cluster-id                                        0
cluster-name                                      default
cluster-pool-ipv4-cidr                            10.128.0.0/16
cluster-pool-ipv4-mask-size                       24
cluster-pool-ipv6-cidr                            2a0c:b641:9b0:3c78::/104
cluster-pool-ipv6-mask-size                       120
cni-exclusive                                     true
cni-log-file                                      /var/run/cilium/cilium-cni.log
controller-group-metrics                          write-cni-file sync-host-ips sync-lb-maps-with-k8s-services
custom-cni-conf                                   false
debug                                             false
devices                                           knet-+
dnsproxy-enable-transparent-mode                  true
egress-gateway-reconciliation-trigger-interval    1s
enable-auto-protect-node-port-range               true
enable-bgp-control-plane                          true
enable-bpf-clock-probe                            false
enable-endpoint-health-checking                   true
enable-health-check-loadbalancer-ip               false
enable-health-check-nodeport                      true
enable-health-checking                            true
enable-hubble                                     true
enable-ipv4                                       true
enable-ipv4-big-tcp                               false
enable-ipv4-masquerade                            false
enable-ipv6                                       true
enable-ipv6-big-tcp                               false
enable-ipv6-masquerade                            false
enable-k8s-networkpolicy                          true
enable-k8s-terminating-endpoint                   true
enable-l2-neigh-discovery                         true
enable-l7-proxy                                   true
enable-local-redirect-policy                      false
enable-masquerade-to-route-source                 false
enable-metrics                                    true
enable-policy                                     default
enable-remote-node-identity                       true
enable-sctp                                       false
enable-svc-source-range-check                     true
enable-vtep                                       false
enable-well-known-identities                      false
enable-xt-socket-fallback                         false
external-envoy-proxy                              false
hubble-disable-tls                                false
hubble-export-file-max-backups                    5
hubble-export-file-max-size-mb                    10
hubble-listen-address                             :4244
hubble-socket-path                                /var/run/cilium/hubble.sock
hubble-tls-cert-file                              /var/lib/cilium/tls/hubble/server.crt
hubble-tls-client-ca-files                        /var/lib/cilium/tls/hubble/client-ca.crt
hubble-tls-key-file                               /var/lib/cilium/tls/hubble/server.key
identity-allocation-mode                          crd
identity-gc-interval                              15m0s
identity-heartbeat-timeout                        30m0s
install-no-conntrack-iptables-rules               false
ipam                                              cluster-pool
ipam-cilium-node-update-rate                      15s
k8s-client-burst                                  20
k8s-client-qps                                    10
kube-proxy-replacement                            true
kube-proxy-replacement-healthz-bind-address       
max-connected-clusters                            255
mesh-auth-enabled                                 true
mesh-auth-gc-interval                             5m0s
mesh-auth-queue-size                              1024
mesh-auth-rotated-identities-queue-size           1024
monitor-aggregation                               medium
monitor-aggregation-flags                         all
monitor-aggregation-interval                      5s
node-port-bind-protection                         true
nodes-gc-interval                                 5m0s
operator-api-serve-addr                           127.0.0.1:9234
operator-prometheus-serve-addr                    :9963
preallocate-bpf-maps                              false
procfs                                            /host/proc
prometheus-serve-addr                             :9962
proxy-connect-timeout                             2
proxy-idle-timeout-seconds                        60
proxy-max-connection-duration-seconds             0
proxy-max-requests-per-connection                 0
proxy-prometheus-port                             9964
proxy-xff-num-trusted-hops-egress                 0
proxy-xff-num-trusted-hops-ingress                0
remove-cilium-node-taints                         true
routing-mode                                      native
service-no-backend-response                       reject
set-cilium-is-up-condition                        true
set-cilium-node-taints                            true
sidecar-istio-proxy-image                         cilium/istio_proxy
skip-cnp-status-startup-clean                     false
synchronize-k8s-nodes                             true
tofqdns-dns-reject-response-code                  refused
tofqdns-enable-dns-compression                    true
tofqdns-endpoint-max-ip-per-hostname              50
tofqdns-idle-connection-grace-period              0s
tofqdns-max-deferred-connection-deletes           10000
tofqdns-proxy-response-max-delay                  100ms
unmanaged-pod-watcher-interval                    15
vtep-cidr                                         
vtep-endpoint                                     
vtep-mac                                          
vtep-mask                                         
write-cni-conf-when-ready                         /host/etc/cni/net.d/05-cilium.conflist

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Jean-Daniel Jean-Daniel added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels May 17, 2024
@Jean-Daniel
Author

This is because IPv4NativeRoutingCIDR is nil in the config (as masquerading is disabled).

I see that the main branch is using podsCIDR := state.localNodeInfo.ipv4NativeRoutingCIDR instead of getting it from the shared config, so it may already be fixed in the next major release.

@lmb lmb added sig/agent Cilium agent related. area/iptables Impacts how Cilium interacts with iptables. labels May 21, 2024
@pippolo84
Member

pippolo84 commented May 21, 2024

Hi @Jean-Daniel, thanks for the bug report. I think what you pointed out is correct: we are missing a check for a nil (or empty) IPv4 native routing CIDR before installing the iptables NOTRACK rules. 👍

@pippolo84 pippolo84 self-assigned this May 21, 2024
pippolo84 added a commit to pippolo84/cilium that referenced this issue May 21, 2024
In case IPv4NativeRoutingCIDR is left unspecified, the related config
option will be nil. To avoid panicking, check for this case before
converting the CIDR to a string. Moreover, do not try to run the
iptables command to install the NOTRACK rules if the resulting string is
empty.

Fixes: cilium#32607

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 added a commit to pippolo84/cilium that referenced this issue May 23, 2024
In case IPv4NativeRoutingCIDR is left unspecified, the related config
option will be nil. To avoid panicking, check for this case before
converting the CIDR to a string. Moreover, do not try to run the
iptables command to install the NOTRACK rules if the resulting string is
empty.

Related: cilium#32607

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
julianwiedmann pushed a commit that referenced this issue May 27, 2024
In case IPv4NativeRoutingCIDR is left unspecified, the related config
option will be nil. To avoid panicking, check for this case before
converting the CIDR to a string. Moreover, do not try to run the
iptables command to install the NOTRACK rules if the resulting string is
empty.

Related: #32607

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
julianwiedmann pushed a commit that referenced this issue May 27, 2024
In case IPv4NativeRoutingCIDR is left unspecified, the related config
option will be nil. To avoid panicking, check for this case before
converting the CIDR to a string. Moreover, do not try to run the
iptables command to install the NOTRACK rules if the resulting string is
empty.

Fixes: #32607

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>