CI: K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications #17590

Closed
ti-mo opened this issue Oct 13, 2021 · 2 comments
Labels
area/CI: Continuous Integration testing issue or flake
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments


ti-mo commented Oct 13, 2021

Test output:

15:20:01 STEP: Performing K8s service preflight check
15:20:03 STEP: Waiting for cilium-operator to be ready
FAIL: unable to retrieve all nodes with 'kubectl get nodes -o json | jq '.items | length'': Exitcode: -1 
Err: signal: killed
Stdout:
 	 2
	 
Stderr:
 	 

=== Test Finished at 2021-10-12T15:20:13Z====
15:20:13 STEP: Running JustAfterEach block for EntireTestsuite K8sDatapathConfig
FAIL: Found 1 io.cilium/app=operator logs matching list of errors that must be investigated:
level=error
===================== TEST FAILED =====================
15:20:25 STEP: Running AfterFailed block for EntireTestsuite K8sDatapathConfig
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE           NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
	 cilium-monitoring   grafana-5747bcc8f9-ftfkh           1/1     Running   0          2m16s   10.0.1.21       k8s2   <none>           <none>
	 cilium-monitoring   prometheus-655fb888d7-qhrxl        1/1     Running   0          2m16s   10.0.1.146      k8s2   <none>           <none>
	 kube-system         cilium-cqjw7                       1/1     Running   0          64s     192.168.36.12   k8s2   <none>           <none>
	 kube-system         cilium-operator-687c69586d-77fxn   1/1     Running   0          64s     192.168.36.11   k8s1   <none>           <none>
	 kube-system         cilium-operator-687c69586d-xgpj5   0/1     Error     0          64s     192.168.36.12   k8s2   <none>           <none>
...

Haven't seen a Pod in status Error before, but there are operator logs. They indicate:

2021-10-12T15:20:09.775659439Z level=debug msg="Controller func execution time: 1.468µs" name=update-cilium-nodes-pod-cidr subsys=controller uuid=9e125a1b-7b5c-453c-93c9-dfcb51923a65
2021-10-12T15:20:13.244328728Z error retrieving resource lock kube-system/cilium-operator-resource-lock: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cilium-operator-resource-lock": context deadline exceeded
2021-10-12T15:20:13.244492883Z Failed to release lock: resource name may not be empty
2021-10-12T15:20:13.244701389Z level=error msg="error retrieving resource lock kube-system/cilium-operator-resource-lock: Get \"https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cilium-operator-resource-lock\": context deadline exceeded" subsys=klog
2021-10-12T15:20:13.244716419Z level=info msg="Leader election lost" operator-id=k8s2-xBerCvjWDi subsys=cilium-operator-generic
2021-10-12T15:20:13.244884133Z level=info msg="failed to renew lease kube-system/cilium-operator-resource-lock: timed out waiting for the condition" subsys=klog
2021-10-12T15:20:13.244898894Z level=error msg="Failed to release lock: resource name may not be empty" subsys=klog

Looks like the k8s apiserver became unresponsive and caused a cascading failure: the operator on k8s2 couldn't renew its leader-election lease before the deadline, logged a level=error message (which is what trips the log check above), and exited, leaving its Pod in Error status.
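
For context on that log sequence: the operator's leadership is standard client-go Lease-based leader election on kube-system/cilium-operator-resource-lock (the object named in the logs). The sketch below is not the cilium-operator code, just a minimal client-go example with an illustrative identity and illustrative timing values, showing how a stalled apiserver produces exactly this failure mode: the Lease update misses the renew deadline with "context deadline exceeded", OnStoppedLeading fires ("Leader election lost"), and the process exits.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The same Lease object the operator logs complain about:
	// kube-system/cilium-operator-resource-lock.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Namespace: "kube-system",
			Name:      "cilium-operator-resource-lock",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			// Illustrative identity; the logs above show one of the
			// form "<node>-<random>", e.g. k8s2-xBerCvjWDi.
			Identity: "example-operator-identity",
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		// If the apiserver cannot serve the Lease get/update within this
		// window, renewal fails with "context deadline exceeded" and the
		// candidate gives up leadership.
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Operator work would run here while the lease is held.
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// Mirrors "Leader election lost" in the logs: exiting here
				// is consistent with the replica on k8s2 showing up in
				// Error status in the pod listing above.
				log.Println("leader election lost, exiting")
				os.Exit(1)
			},
		},
	})
}
```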

Zip: test_results_Cilium-PR-K8s-1.21-kernel-4.9_1603_BDD-Test-PR.zip

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/1603/
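
The earlier preflight failure (the `kubectl get nodes -o json | jq '.items | length'` pipeline ending with "signal: killed" even though it printed 2) would fit the same picture if the apiserver was slow enough for the harness to presumably time the command out. As a rough equivalent, here is a minimal client-go version of that node-count check with an explicit deadline; the kubeconfig handling is illustrative and not how the test framework actually invokes it:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location (illustrative;
	// the real test shells out to kubectl | jq instead).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Bound the call: if the apiserver does not answer in time, this
	// surfaces as a deadline error instead of a node count.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatalf("unable to retrieve all nodes: %v", err)
	}
	fmt.Println(len(nodes.Items)) // the check expects 2 on this two-node cluster
}
```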

ti-mo added the area/CI and ci/flake labels Oct 13, 2021
github-actions bot commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Feb 22, 2022

brb commented May 6, 2022

Haven't seen this failure in a while. Closing.

brb closed this as completed May 6, 2022
brb added the sig/datapath label May 6, 2022