CI: K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running #13552

Closed
pchaigno opened this issue Oct 14, 2020 · 11 comments
Labels: area/CI, area/proxy, ci/flake, stale

pchaigno (Member) commented Oct 14, 2020

Stacktrace

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:461
cilium pre-flight checks failed
Expected
    <*errors.errorString | 0xc000523410>: {
        s: "Cilium validation failed: 4m0s timeout expired: Last polled error: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-v8b2c': Exitcode: 255 \nErr: exit status 255\nStdout:\n \t \nStderr:\n \t Error: Cannot get status/probe: Put \"http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe\": context deadline exceeded\n\t \n\t command terminated with exit code 255\n\t \n",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/src/github.com/cilium/cilium/test/k8sT/assertionHelpers.go:107
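
For context on the failure mode: the error above is the CLI reporting that an HTTP PUT to `/v1beta/status/probe` on the agent's health unix socket (`/var/run/cilium/health.sock`) did not answer before its deadline. Below is a minimal sketch of issuing the same probe with a timeout, just to show where "context deadline exceeded" comes from; this is not Cilium's health client, only the socket path and URL path are taken from the error message, and the 5s timeout is arbitrary:

```go
// Minimal sketch, not Cilium's health client: send the same probe request
// over the health unix socket with a deadline. If the endpoint hangs, the
// request fails with "context deadline exceeded", as in the error above.
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

func main() {
	const sock = "/var/run/cilium/health.sock" // path taken from the error message

	client := &http.Client{
		Transport: &http.Transport{
			// Route every request to the unix socket instead of TCP.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", sock)
			},
		},
	}

	// Arbitrary 5s deadline for illustration; the CI check polls for 4m0s overall.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The host part of the URL is ignored because the dialer above overrides it.
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		"http://localhost/v1beta/status/probe", nil)
	if err != nil {
		panic(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("probe failed:", err) // e.g. "... context deadline exceeded"
		return
	}
	defer resp.Body.Close()
	fmt.Println("probe status:", resp.Status)
}
```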

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
⚠️  Number of "context deadline exceeded" in logs: 16
Number of "level=error" in logs: 0
⚠️  Number of "level=warning" in logs: 6
Number of "Cilium API handler panicked" in logs: 0
⚠️  Number of "Goroutine took lock for more than" in logs: 7
Top 3 errors/warnings:
Session affinity for host reachable services needs kernel 5.7.0 or newer to work properly when accessed from inside cluster: the same service endpoint will be selected from all network namespaces on the host.
BPF bandwidth manager needs kernel 5.0 or newer. Disabling the feature.
Unable to update ipcache map entry on pod add
Cilium pods: [cilium-htxg2 cilium-v8b2c]
Netpols loaded: 
CiliumNetworkPolicies loaded: 
Endpoint Policy Enforcement:
Pod                           Ingress   Egress
grafana-54dbdc987-hgv4n                 
prometheus-6ff848df8b-5klz7             
coredns-7964865f77-t6r8z                
Cilium agent 'cilium-htxg2': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 17 Failed 0
Cilium agent 'cilium-v8b2c': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 21 Failed 0

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Kernel/3445/testReport/junit/Suite-k8s-1/18/K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running/
0892951b_K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running.zip

This test failing then caused two subsequent tests to fail with "failed due to BeforeAll failure":

Suite-k8s-1.18.K8sChaosTest Restart with long lived connections TCP connection is not dropped when cilium restarts
Suite-k8s-1.18.K8sChaosTest Restart with long lived connections L3/L4 policies still work while Cilium is restarted
pchaigno added the area/CI and ci/flake labels Oct 14, 2020
pchaigno added this to To Do (1.8, 1.9 - Rare Flakes) in CI Force Oct 14, 2020
tklauser (Member) commented Nov 9, 2020

Hit during "K8sPolicyTest Multi-node policy test validates fromEntities policies with remote-node identity disabled Allows from all hosts with cnp fromEntities host policy" in the 1.7 backport #13950

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-K8s/3682/

pchaigno (Member, Author) commented Feb 3, 2021

Happened again in #14797:
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.19/110/testReport/junit/Suite-k8s-1/17/K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running/
198f744c_K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running.zip

The logs for the CrashLoopBackOff cilium-agent pod contain this fatal error:

2021-02-03T11:07:01.202509486Z level=fatal msg="Error while creating daemon" error="listen tcp :45113: bind: address already in use" subsys=daemon

Looks like an issue with the DNS proxy. /cc @jrajahalme
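
For reference on the fatal itself: "listen tcp :45113: bind: address already in use" is the error Go's net package returns when the requested TCP port is still held, for example by a listener left over from the previous agent/proxy instance. A minimal sketch that reproduces the error and shows how it can be detected; this is illustrative only, not the agent's proxy code, and port 45113 is simply copied from the log line above:

```go
// Minimal sketch, not the agent's proxy code: bind the same TCP port twice to
// reproduce "bind: address already in use" and show how it can be detected.
// Port 45113 is copied from the fatal log line above.
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

func main() {
	// First listener takes the port, standing in for a stale listener that
	// survived from the previous agent/proxy instance.
	l1, err := net.Listen("tcp", ":45113")
	if err != nil {
		fmt.Println("first bind failed:", err)
		return
	}
	defer l1.Close()

	// Second bind fails exactly the way the restarting agent does.
	if _, err := net.Listen("tcp", ":45113"); errors.Is(err, syscall.EADDRINUSE) {
		fmt.Println("second bind failed:", err) // "... bind: address already in use"
	} else {
		fmt.Println("unexpected result:", err)
	}
}
```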

pchaigno added the area/proxy label Feb 3, 2021
stale bot commented Jun 23, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Jun 23, 2021
stale bot commented Jul 11, 2021

This issue has not seen any activity since it was marked stale. Closing.

stale bot closed this as completed Jul 11, 2021
CI Force automation moved this from To Do (1.8, 1.9 - Rare Flakes) to Fixed / Done Jul 11, 2021
CI Force automation moved this from Fixed / Done to In Progress (Cilium) Oct 13, 2021
errordeveloper (Contributor) commented:

Re-opening since it reoccurred in #17567.

pchaigno (Member, Author) commented:

> Re-opening since it reoccurred in #17567.
>
> * [job](https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.9/373/)

@errordeveloper The "K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running" test passed in this Jenkins job (see console at 14:41:03). Did you mean to link to something else?

errordeveloper (Contributor) commented:

@pchaigno I was looking at this:

15:43:34  • Failure in Spec Setup (BeforeEach) [150.139 seconds]
15:43:34  K8sChaosTest
15:43:34  /home/jenkins/workspace/Cilium-PR-K8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:478
15:43:34    Restart with long lived connections
15:43:34    /home/jenkins/workspace/Cilium-PR-K8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:478
15:43:34      TCP connection is not dropped when cilium restarts [BeforeEach]
15:43:34      /home/jenkins/workspace/Cilium-PR-K8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
15:43:34  
15:43:34      Netperf cannot be deployed
[2021-10-12T14:43:34.716Z]     Expected command: kubectl apply --force=false -f /home/jenkins/workspace/Cilium-PR-K8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/manifests/netperf-deployment.yaml 
[2021-10-12T14:43:34.716Z]     To succeed, but it failed:
[2021-10-12T14:43:34.716Z]     Exitcode: -1 
[2021-10-12T14:43:34.716Z]     Err: signal: killed
[2021-10-12T14:43:34.716Z]     Stdout:
[2021-10-12T14:43:34.716Z]      	 pod/netperf-server created
[2021-10-12T14:43:34.716Z]     	 pod/netperf-client created
[2021-10-12T14:43:34.716Z]     	 
[2021-10-12T14:43:34.716Z]     Stderr:
[2021-10-12T14:43:34.716Z]      	 
[2021-10-12T14:43:34.716Z]     
15:43:34  
15:43:34      /home/jenkins/workspace/Cilium-PR-K8s-1.17-kernel-4.9/src/github.com/cilium/cilium/test/k8sT/Chaos.go:187
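
One note on reading that output: "Exitcode: -1" together with "Err: signal: killed" is how Go's `os/exec` reports a command that was killed by a signal; one common cause is a command timeout in the test harness killing the process. A minimal sketch reproducing that exact reporting; this is an assumed setup, not the actual test helper, with `sleep 30` standing in for the slow `kubectl apply` and an arbitrary 1s timeout:

```go
// Minimal sketch, not the actual test helper: a command killed because it
// outlived its context deadline is reported by os/exec with exit code -1 and
// the error "signal: killed", matching the Ginkgo output above.
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()

	// `sleep 30` stands in for the slow `kubectl apply`.
	cmd := exec.CommandContext(ctx, "sleep", "30")
	err := cmd.Run() // the process is killed (SIGKILL) once the deadline passes

	// ExitCode() is -1 when the process was terminated by a signal.
	fmt.Println("Exitcode:", cmd.ProcessState.ExitCode())
	fmt.Println("Err:", err) // "signal: killed"
}
```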

I assumed it's to do with this issue, as 'TCP connection is not dropped when cilium restarts' is mentioned above... should this be a separate issue?

pchaigno (Member, Author) commented:

> should this be a separate issue?

I think so. Neither the test name nor the error message matches the present flake report.

pchaigno removed the stale label Nov 5, 2021
github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Jul 9, 2022
pchaigno closed this as completed Jul 9, 2022
CI Force automation moved this from In Progress (Cilium) to Fixed / Done Jul 9, 2022