CI: K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing: Ipsec default interface lookup failed #16699

Closed
pchaigno opened this issue Jun 30, 2021 · 3 comments
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

@pchaigno
Member

https://jenkins.cilium.io/job/cilium-master-k8s-1.19-kernel-5.4/367/testReport/junit/Suite-k8s-1/19/K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing/
c05d5c34_K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing.zip

Stacktrace

/home/jenkins/workspace/cilium-master-k8s-1.19-kernel-5.4/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:518
Timed out after 240.000s.
Timeout while waiting for Cilium to become ready
Expected
    <*errors.errorString | 0xc000aafe50>: {
        s: "only 0 of 2 desired pods are ready",
    }
to be nil
/home/jenkins/workspace/cilium-master-k8s-1.19-kernel-5.4/src/github.com/cilium/cilium/test/k8sT/assertionHelpers.go:126

The Cilium agent is failing because of this error:

2021-06-29T13:32:57.637154932Z level=debug msg="Found default route on node {Ifindex: 2 Dst: <nil> Src: 10.0.2.15 Gw: 10.0.2.2 Flags: [] Table: 254}" subsys=route
2021-06-29T13:32:57.637312330Z level=debug msg="Found default route on node {Ifindex: 3 Dst: <nil> Src: <nil> Gw: <nil> Flags: [] Table: 254}" subsys=route
2021-06-29T13:32:57.637321149Z level=fatal msg="Ipsec default interface lookup failed, consider \"encrypt-interface\" to manually configure interface." error="IPv4/IPv6 have different link indices" subsys=daemon
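
The fatal error follows from the two debug lines before it: one default route sits on link index 2 and the other on link index 3, presumably one per address family. A minimal Go sketch of that kind of check, assuming the lookup requires the IPv4 and IPv6 default routes to share a single link. The `defaultRoute` type and the `defaultIfindex` function are hypothetical names for illustration, not Cilium's implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// defaultRoute is a hypothetical, simplified view of a default route,
// keeping only the fields that appear in the debug log.
type defaultRoute struct {
	Ifindex int
	Gw      string
}

var errNoDefaultRoute = errors.New("no default route found")

// defaultIfindex returns the link index used by the default routes.
// A nil argument means no default route exists for that address family.
// When both families have a default route, they must agree on the link.
func defaultIfindex(v4, v6 *defaultRoute) (int, error) {
	switch {
	case v4 == nil && v6 == nil:
		return 0, errNoDefaultRoute
	case v4 == nil:
		return v6.Ifindex, nil
	case v6 == nil:
		return v4.Ifindex, nil
	case v4.Ifindex != v6.Ifindex:
		return 0, fmt.Errorf("IPv4/IPv6 have different link indices (%d vs %d)",
			v4.Ifindex, v6.Ifindex)
	}
	return v4.Ifindex, nil
}
```

Called with the two routes from the log (`Ifindex: 2` and `Ifindex: 3`), this sketch returns an error rather than picking one of the links, which matches the behavior the agent reports.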

We don't have the bugtools output (to check the routes) because the agents are failing. Known facts:

  • It's not failing consistently.
  • It's only failing on k8s 1.19, Linux 5.4.
@pchaigno added the area/CI and ci/flake labels Jun 30, 2021
@pchaigno self-assigned this Jun 30, 2021
@pchaigno
Member Author

pchaigno commented Jul 2, 2021

Please post here if you see this happen again. We should have more logs now that #16700 is merged.

@pchaigno
Member Author

This issue is happening from time to time because kubectl.GetPrivateIface() sometimes returns an empty interface name without error:

time="2021-06-29T13:29:33Z" level=debug msg="running command: helm template --validate /home/jenkins/workspace/cilium-master-k8s-1.19-kernel-5.4/src/github.com/cilium/cilium/install/kubernetes/cilium --namespace=kube-system  --set k8sServiceHost=192.168.36.11  --set hubble.relay.image.tag=504e19e956e41c8126c030897a7cf4de2d959671  --set etcd.leaseTTL=30s  --set nodePort.enableHealthCheck=false  --set hubble.listenAddress=:4244  --set bpf.masquerade=true  --set k8s.requireIPv4PodCIDR=true  --set preflight.image.useDigest=false  --set image.tag=504e19e956e41c8126c030897a7cf4de2d959671  --set bandwidthManager=false  --set operator.image.useDigest=false  --set nativeRoutingCIDR=10.0.0.0/8  --set hubble.relay.image.useDigest=false  --set ipv6.enabled=true  --set hubble.relay.image.repository=quay.io/cilium/hubble-relay-ci  --set hubble.enabled=true  --set bpf.preallocateMaps=false  --set kubeProxyReplacement=disabled  --set encryption.ipsec.interface=  --set logSystemLoad=true  --set image.repository=quay.io/cilium/cilium-ci  --set operator.image.repository=quay.io/cilium/operator  --set tunnel=disabled  --set autoDirectNodeRoutes=true  --set hostFirewall=false  --set encryption.enabled=true  --set ipam.operator.clusterPoolIPv6PodCIDR=fd02::/112  --set image.useDigest=false  --set preflight.image.tag=504e19e956e41c8126c030897a7cf4de2d959671  --set debug.enabled=true  --set enableCnpStatusUpdates=true  --set preflight.image.repository=quay.io/cilium/cilium-ci  --set operator.image.tag=504e19e956e41c8126c030897a7cf4de2d959671  --set operator.image.suffix=-ci  --set k8sServicePort=6443  --set devices=  --set ipv4.enabled=true  --set pprof.enabled=true  > cilium-168d0fbc64de4248.yaml"

Because of that, the Helm value encryption.ipsec.interface is rendered empty and the agent falls back to the IPsec interface detection logic. That logic fails to detect the interface in the test VM, as expected.

So it appears the additional information dump in #16700 isn't actually useful. #16863 reverts that code and instead prints more useful information in case of kubectl.GetPrivateIface() failure.

@pchaigno removed their assignment Sep 2, 2021
@brb added the sig/datapath label May 6, 2022
@github-actions

github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions bot added the stale label Jul 9, 2022
@pchaigno closed this as completed Jul 9, 2022