Enabling wellKnownIdentities on v1.11.0 causes the agent to crash #18436

Closed
2 tasks done
mvisonneau opened this issue Jan 11, 2022 · 4 comments
Labels
  • backport-done/1.11: The backport for Cilium 1.11.x for this PR is done.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.

Comments

@mvisonneau
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When upgrading from v1.10.4 to v1.11.0, I ran into the following issue on the Cilium agent:

{
    "error": "maximum ID must be greater than minimum ID: configured max 65535, min 65536",
    "level": "fatal",
    "msg": "Unable to initialize Identity Allocator with backend crd",
    "subsys": "identity-cache"
}

After some troubleshooting, I traced the cause to the following parameter:

enable-well-known-identities: "true"

Disabling the feature appears to resolve the issue.

Cilium Version

v1.11.0

Kernel Version

5.11.0-1023-aws

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

cilium-sysdump-20220111-090535.zip

Relevant log output

# When enabled
level=info msg="Started gops server" address="127.0.0.1:9890" subsys=daemon
level=info msg="Memory available for map entries (0.003% of 1961787392B): 4904468B" subsys=config
level=info msg="option bpf-nat-global-max set by dynamic sizing to 131072" subsys=config
level=info msg="option bpf-neigh-global-max set by dynamic sizing to 131072" subsys=config
level=info msg="option bpf-sock-rev-map-max set by dynamic sizing to 65536" subsys=config
{"error":"maximum ID must be greater than minimum ID: configured max 65535, min 65536","level":"fatal","msg":"Unable to initialize Identity Allocator with backend crd","subsys":"identity-cache"}

# When disabled
level=info msg="Started gops server" address="127.0.0.1:9890" subsys=daemon
level=info msg="Memory available for map entries (0.003% of 16500850688B): 41252126B" subsys=config
level=info msg="option bpf-nat-global-max set by dynamic sizing to 144744" subsys=config
level=info msg="option bpf-neigh-global-max set by dynamic sizing to 144744" subsys=config
level=info msg="option bpf-sock-rev-map-max set by dynamic sizing to 72372" subsys=config

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@mvisonneau mvisonneau added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Jan 11, 2022
@pchaigno pchaigno added the kind/community-report This was reported by a user in the Cilium community, eg via Slack. label Jan 11, 2022
@pchaigno
Member

I think this is caused by this line:

MinimalAllocationIdentity = NumericIdentity((1 << ClusterIDShift) * option.Config.ClusterID)

which was introduced by 7112706 (in #17589). That code was then fixed in #18148, and the fix was backported to v1.11, so I'd expect this to be resolved in the upcoming v1.11.1 release. You should already be able to test the v1.11 development image.

cc @ArthurChiao

@pchaigno pchaigno removed the needs/triage This issue requires triaging to establish severity and next steps. label Jan 11, 2022
@ArthurChiao
Contributor

Yes, I think #18148 should have fixed the problem.

@aanm aanm added the kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. label Jan 12, 2022
@aanm aanm modified the milestone: 1.11.1 Jan 12, 2022
@aanm aanm added the backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. label Jan 12, 2022
@norrissw

norrissw commented Jan 13, 2022

While testing a cluster-mesh setup against 1.11.0, I encountered the same error. I installed with helm using these flags:

  --set etcd.managed=true \
  --set etcd.k8sService=true \
  --set identityAllocationMode=kvstore \
  --namespace kube-system \
  --set cluster.id=1 \
  --set cluster.name=demo1 \
  --set eni.enabled=false \
  --set tunnel=vxlan \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDR="x.x.x.x/xx" \
  --set ipam.operator.clusterPoolIPv4MaskSize=24 \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=x.x.aws.com \
  --set k8sServicePort=443 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

And the error:

level=info msg="Initializing ClusterMesh routing" path=/var/lib/cilium/clustermesh/ subsys=daemon
level=fatal msg="Unable to initialize Identity Allocator with backend kvstore" error="maximum ID must be greater than minimum ID: configured max 65535, min 65536" subsys=identity-cache

I tried disabling enable-well-known-identities: "true", but that didn't help (perhaps it can't be disabled in a cluster-mesh setup? I'm not familiar enough yet to know). I haven't dug too deep yet; I figured this was similar to the original report, yet different enough, that it was worth reporting. If I get anywhere digging, I'll update here.

Also worth noting my k8s, kernel and cilium versions are identical to what was posted originally.

@aanm
Member

aanm commented Jan 19, 2022

Cilium v1.11.1 has been released, please open a new GH issue if you are facing this problem again. Thank you!

@aanm aanm closed this as completed Jan 19, 2022

5 participants