
CI: ConformanceGKE: Error: Unable to enable Hubble: timeout while waiting for status to become successful: context deadline exceeded #25468

Closed
christarazi opened this issue May 16, 2023 · 3 comments
Labels
area/CI: Continuous Integration testing issue or flake
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments


christarazi commented May 16, 2023

CI failure


Run cilium hubble enable --chart-directory=install/kubernetes/cilium --relay-image=quay.io/cilium/hubble-relay-ci:6a364a41f3768f427f55cf10c57de076788727e7 --relay-version=6a364a41f3768f427f55cf10c57de076788727e7
🔑 Found CA in secret cilium-ca
ℹ️  helm template --namespace kube-system cilium "install/kubernetes/cilium" --version 1.14.0 --set agentNotReadyTaintKey=ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready,cluster.id=0,cluster.name=cilium-cilium-4988504724,clustermesh.apiserver.image.repository=quay.io/cilium/clustermesh-apiserver-ci,clustermesh.apiserver.image.tag=6a364a41f3768f427f55cf10c57de076788727e7,clustermesh.apiserver.image.useDigest=false,cni.binPath=/home/kubernetes/bin,encryption.nodeEncryption=false,extraConfig.monitor-aggregation=none,hubble.enabled=true,hubble.relay.enabled=true,hubble.relay.image.override=quay.io/cilium/hubble-relay-ci:6a364a41f3768f427f55cf10c57de076788727e7,hubble.relay.image.repository=quay.io/cilium/hubble-relay-ci,hubble.relay.image.tag=6a364a41f3768f427f55cf10c57de076788727e7,hubble.relay.image.useDigest=false,image.repository=quay.io/cilium/cilium-ci,image.tag=6a364a41f3768f427f55cf10c57de076788727e7,image.useDigest=false,ipv4NativeRoutingCIDR=10.72.0.0/14,kubeProxyReplacement=disabled,loadBalancer.l7.backend=envoy,nodeinit.enabled=true,nodeinit.reconfigureKubelet=true,nodeinit.removeCbrBridge=true,operator.image.repository=quay.io/cilium/operator,operator.image.suffix=-ci,operator.image.tag=6a364a41f3768f427f55cf10c57de076788727e7,operator.image.useDigest=false,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tls.ca.cert=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNGRENDQWJxZ0F3SUJBZ0lVQk1MY0hwZlllYkVpWlNpQzI2SWd6R3FkdjZVd0NnWUlLb1pJemowRUF3SXcKYURFTE1Ba0dBMVVFQmhNQ1ZWTXhGakFVQmdOVkJBZ1REVk5oYmlCR2NtRnVZMmx6WTI4eEN6QUpCZ05WQkFjVApBa05CTVE4d0RRWURWUVFLRXdaRGFXeHBkVzB4RHpBTkJnTlZCQXNUQmtOcGJHbDFiVEVTTUJBR0ExVUVBeE1KClEybHNhWFZ0SUVOQk1CNFhEVEl6TURVeE5qQTJNalF3TUZvWERUSTRNRFV4TkRBMk1qUXdNRm93YURFTE1Ba0cKQTFVRUJoTUNWVk14RmpBVUJnTlZCQWdURFZOaGJpQkdjbUZ1WTJselkyOHhDekFKQmdOVkJBY1RBa05CTVE4dwpEUVlEVlFRS0V3WkRhV3hwZFcweER6QU5CZ05WQkFzVEJrTnBiR2wxYlRFU01CQUdBMVVFQXhNSlEybHNhWFZ0CklFTkJNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVRRUdBMExxMGYybjNxZGhpdVpqbHFmWWkKc2VVWWZETGN6emN0a1ExOXdEbExxRWpIR213dkxUeUUrVWx6a2hUMG5aM2V0Y2FKYVZRakN4T0g2NEJob2FOQwpNRUF3RGdZRFZSMFBBUUgvQkFRREFnRUdNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGS0lpClVpZG41Y3dsaThkWXBCdVJFbktBak5zZk1Bb0dDQ3FHU000OUJBTUNBMGdBTUVVQ0lIeDB0N3lhNlh4VUJuUDUKTmYzYUtReGJBbjlBazFmRGtjbDZqTE9URFZRdkFpRUErZkY3akRPTXYzNXVmZHVURHhUU2dBa0E2Snh4dzFNYgpSMnczQVVjbVVROD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=,tls.ca.key=[--- REDACTED WHEN PRINTING TO TERMINAL (USE --redact-helm-certificate-keys=false TO PRINT) ---],tls.secretsBackend=k8s,tunnel=vxlan
✨ Patching ConfigMap cilium-config to enable Hubble...
🚀 Creating ConfigMap for Cilium version 1.14.0...
♻️  Restarted Cilium pods
⌛ Waiting for Cilium to become ready before deploying other Hubble component(s)...
🚀 Creating Peer Service...
✨ Generating certificates...
🔑 Generating certificates for Relay...
✨ Deploying Relay...
⌛ Waiting for Hubble to be installed...

    /¯¯\
 /¯¯\__/¯¯\    Cilium:          OK
 \__/¯¯\__/    Operator:        1 errors
 /¯¯\__/¯¯\    Hubble Relay:    1 errors, 1 warnings
 \__/¯¯\__/    ClusterMesh:     disabled
    \__/

Deployment        cilium-operator    Desired: 1, Unavailable: 1/1
DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Deployment        hubble-relay       Desired: 1, Unavailable: 1/1
Containers:       cilium             Running: 2
                  cilium-operator    Running: 1
                  hubble-relay       Pending: 1
Cluster Pods:     9/10 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium-ci:6a364a41f3768f427f55cf10c57de076788727e7: 2
                  cilium-operator    quay.io/cilium/operator-generic-ci:6a364a41f3768f427f55cf10c57de076788727e7: 1
                  hubble-relay       quay.io/cilium/hubble-relay-ci:6a364a41f3768f427f55cf10c57de076788727e7: 1
Errors:           hubble-relay       hubble-relay                     1 pods of Deployment hubble-relay are not ready
                  cilium-operator    cilium-operator                  1 pods of Deployment cilium-operator are not ready
Warnings:         hubble-relay       hubble-relay-7d494fc494-rbdw9    pod is pending

Error: Unable to enable Hubble: timeout while waiting for status to become successful: context deadline exceeded
Error: Process completed with exit code 1.
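
(Editorial aside, not part of the CI output: a minimal debugging sketch for when this flake reproduces and direct access to the affected cluster is available. The label selectors are the ones the Cilium Helm chart normally applies and are an assumption here, not taken from this run.)

```sh
# Sketch only: assumes kubectl access to the affected GKE cluster and the
# default labels set by the Cilium Helm chart.

# Why is the cilium-operator Deployment reporting an unavailable replica?
kubectl -n kube-system get pods -l io.cilium/app=operator -o wide
kubectl -n kube-system describe deployment cilium-operator

# Previous container logs often hold the fatal error (see the comment below).
kubectl -n kube-system logs -l io.cilium/app=operator --previous --tail=50

# Why is the hubble-relay pod stuck in Pending?
kubectl -n kube-system describe pod -l k8s-app=hubble-relay
```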

cilium-sysdumps.zip

https://github.com/cilium/cilium/actions/runs/4988504724/jobs/8931313859

Seen in #23208

christarazi added the area/CI and ci/flake labels on May 16, 2023
christarazi (Member, Author) commented:

Looking at the sysdump, the Operator hit a fatal error for the following reason:

level=fatal msg="Unable to init cluster-pool allocator" error="cluster-pool-ipv6-cidr must be provided when using ClusterPool" subsys=cilium-operator-generic
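
That operator flag is backed by the cluster-pool-ipv6-cidr key in the cilium-config ConfigMap, so the dumped ConfigMap is a natural cross-check. A quick grep over the extracted sysdump could confirm which IPAM and IPv6 settings were actually configured; a sketch only, since the file-name patterns inside the archive are assumptions, not verified against this particular sysdump:

```sh
# Assumption: file names inside cilium-sysdumps.zip follow the usual
# "cilium-configmap*.yaml" / "logs-cilium-operator-*.log" patterns.
unzip -o cilium-sysdumps.zip -d sysdump/

# Which IPAM mode and IPv6/CIDR settings were actually in cilium-config?
rg -i 'ipam|enable-ipv6|cluster-pool-ip(v4|v6)-cidr' sysdump/ -g '*configmap*'
```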

Looking at the code, this error should only occur when IPv6 is enabled:

❯ rg enable-ipv6
logs-cilium-z9d45-cilium-agent-20230516-063558.log
133:2023-05-16T06:29:51.087962212Z level=info msg="  --enable-ipv6='false'" subsys=daemon
134:2023-05-16T06:29:51.087967915Z level=info msg="  --enable-ipv6-big-tcp='false'" subsys=daemon
135:2023-05-16T06:29:51.087973375Z level=info msg="  --enable-ipv6-masquerade='true'" subsys=daemon
136:2023-05-16T06:29:51.087978608Z level=info msg="  --enable-ipv6-ndp='false'" subsys=daemon

logs-cilium-operator-648cb9874c-nqpw7-cilium-operator-20230516-063558.log
33:2023-05-16T06:33:20.740188459Z level=info msg="  --enable-ipv6='true'" subsys=cilium-operator-generic

logs-cilium-operator-648cb9874c-nqpw7-cilium-operator-20230516-063558-prev.log
33:2023-05-16T06:33:20.740188459Z level=info msg="  --enable-ipv6='true'" subsys=cilium-operator-generic

logs-cilium-hbfwl-cilium-agent-20230516-063558.log
133:2023-05-16T06:29:52.208668360Z level=info msg="  --enable-ipv6='false'" subsys=daemon
134:2023-05-16T06:29:52.208674291Z level=info msg="  --enable-ipv6-big-tcp='false'" subsys=daemon
135:2023-05-16T06:29:52.208679719Z level=info msg="  --enable-ipv6-masquerade='true'" subsys=daemon
136:2023-05-16T06:29:52.208684868Z level=info msg="  --enable-ipv6-ndp='false'" subsys=daemon

Weird. So the Agent is configured with IPv6 disabled, but the Operator is configured with it enabled. This must be some sort of bug with the installation / Helm values.
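
One way to narrow that down would be to re-render the chart with the same values the CI job used and see what enable-ipv6 ends up as in the generated cilium-config, then compare against the flags the operator printed at startup. A sketch under that assumption; the --set string below is truncated to two illustrative values rather than the full list from the log, and the CA material is omitted:

```sh
# Re-render the chart the same way the CI job did (full --set list from the
# log above is elided here; CA cert/key redacted).
helm template --namespace kube-system cilium install/kubernetes/cilium \
  --set hubble.enabled=true,hubble.relay.enabled=true \
  | rg 'enable-ipv6'

# Compare with what the operator says it was started with (from the sysdump):
rg 'enable-ipv6' logs-cilium-operator-*-cilium-operator-*.log
```

If the rendered ConfigMap says enable-ipv6: "false" while the operator logs "true", the discrepancy is more likely in how the operator picks up its configuration than in the Helm values themselves.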

github-actions bot commented:

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label on Jul 16, 2023
github-actions bot commented:

This issue has not seen any activity since it was marked stale.
Closing.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 31, 2023