
[CI]: K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master #6307

Closed
aanm opened this issue Nov 27, 2018 · 12 comments · Fixed by #12869
Labels
area/CI (Continuous Integration testing issue or flake), needs/triage (This issue requires triaging to establish severity and next steps.), priority/medium (This is considered important, but not urgent.)
Milestone
1.4-bugfix

Comments

@aanm
Member

aanm commented Nov 27, 2018

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:383
Cannot curl app1-service
Expected command: kubectl exec -n default app2-85b74b9c79-ztpb6 -- curl -s -D /dev/stderr --fail --connect-timeout 3 --max-time 8 http://app1-service/public -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'" 
To succeed, but it failed:
Exitcode: 28 
Stdout:
 	 time-> DNS: '0.000000()', Connect: '0.000000',Transfer '0.000000', total '3.516777'
Stderr:
 	 command terminated with exit code 28
	 

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/k8sT/Updates.go:216

e413417a_K8sUpdates_Tests_upgrade_and_downgrade_from_a_Cilium_stable_image_to_master.zip

2nd time:

da2b440e_K8sUpdates_Tests_upgrade_and_downgrade_from_a_Cilium_stable_image_to_master.zip

@aanm added the priority/medium, area/CI and needs/triage labels Nov 27, 2018
@aanm added this to the 1.4-bugfix milestone Nov 27, 2018
@aanm added this to Needs triage in CI Failures via automation Nov 27, 2018
@aanm added this to Proposed in 1.4 via automation Nov 27, 2018
@aanm moved this from Needs triage to Daily / Always in CI Failures Nov 29, 2018
@aanm
Member Author

aanm commented Nov 29, 2018

Happened here as well: #6331

@tgraf
Member

tgraf commented Jan 28, 2019

Dup of #6730

@tgraf closed this as completed Jan 28, 2019
1.4 automation moved this from Proposed to Done Jan 28, 2019
@raybejjani
Contributor

master: https://jenkins.cilium.io/job/cilium-ginkgo/job/cilium/job/master/2763
test_results_master_2763_BDD-Test-PR.zip

I've seen this one other time, and it may be distinct from the test failing overall:

23:54:23  cmd: kubectl get pods -o wide --all-namespaces
23:54:23  Exitcode: 0 
23:54:23  Stdout:
23:54:23   	 NAMESPACE     NAME                                    READY     STATUS     RESTARTS   AGE       IP              NODE
23:54:23  	 kube-system   cilium-etcd-b245mtxhtk                  1/1       Running    0          34m       10.10.0.76      k8s1
23:54:23  	 kube-system   cilium-etcd-hf74w4zsdd                  1/1       Unknown    0          34m       10.10.1.41      k8s2
23:54:23  	 kube-system   cilium-etcd-operator-77d4ddf8c6-5skxb   1/1       Unknown    0          36m       192.168.36.12   k8s2
23:54:23  	 kube-system   cilium-etcd-whh4wxpk6v                  1/1       Unknown    0          34m       10.10.1.77      k8s2
23:54:23  	 kube-system   cilium-v9rl6                            1/1       NodeLost   0          27m       192.168.36.12   k8s2
23:54:23  	 kube-system   cilium-z5kwm                            0/1       Running    6          26m       192.168.36.11   k8s1
23:54:23  	 kube-system   etcd-k8s1                               1/1       Running    0          41m       192.168.36.11   k8s1
23:54:23  	 kube-system   etcd-operator-65476dd78f-2c8zz          1/1       Unknown    0          35m       10.10.1.251     k8s2
23:54:23  	 kube-system   kube-apiserver-k8s1                     1/1       Running    0          41m       192.168.36.11   k8s1
23:54:23  	 kube-system   kube-controller-manager-k8s1            1/1       Running    0          41m       192.168.36.11   k8s1
23:54:23  	 kube-system   kube-dns-f4d788bb7-rw7xm                3/3       Unknown    0          42m       10.10.1.253     k8s2
23:54:23  	 kube-system   kube-proxy-6b7zk                        1/1       Running    0          42m       192.168.36.11   k8s1
23:54:23  	 kube-system   kube-proxy-7wr62                        1/1       NodeLost   0          36m       192.168.36.12   k8s2
23:54:23  	 kube-system   kube-scheduler-k8s1                     1/1       Running    0          41m       192.168.36.11   k8s1

The output above shows k8s2 as completely lost. I'm not sure what to make of this, and the only logs available are test-specific (nothing for cilium-v9rl6 or for the kubelet).

@raybejjani reopened this Apr 23, 2019
@jrajahalme
Member

jrajahalme commented May 9, 2019

Failed on https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/12211/execution/node/132/log/. Looks like the traffic test started even though some endpoints were NOT ready?

STEP: Performing Cilium preflight check
STEP: Validate that endpoints are ready before making any connection
STEP: Making L7 requests between endpoints
=== Test Finished at 2019-05-09T20:06:56Z====
===================== TEST FAILED =====================
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE       IP              NODE
	 default       app1-6f5f7bd649-4m944                   0/1       Running   0          3m        10.10.0.115     k8s1
	 default       app1-6f5f7bd649-8pnr4                   0/1       Running   0          3m        10.10.0.56      k8s1
	 default       app2-5c44ff87c-t9gcs                    1/1       Running   0          3m        10.10.0.228     k8s1
	 default       app3-579cbb5fcd-q8b7v                   1/1       Running   0          3m        10.10.0.125     k8s1
	 default       migrate-svc-client-7zht2                1/1       Running   0          3m        10.10.1.177     k8s2
	 default       migrate-svc-client-dvn82                1/1       Running   0          3m        10.10.1.136     k8s2
	 default       migrate-svc-client-tshpn                1/1       Running   0          3m        10.10.1.61      k8s2
	 default       migrate-svc-client-v7d6j                1/1       Running   0          3m        10.10.0.84      k8s1
	 default       migrate-svc-client-zzthh                1/1       Running   0          3m        10.10.0.175     k8s1
	 default       migrate-svc-server-7c2zv                1/1       Running   0          3m        10.10.1.138     k8s2
	 default       migrate-svc-server-d76hw                1/1       Running   0          3m        10.10.0.63      k8s1
	 default       migrate-svc-server-rx2xw                1/1       Running   0          3m        10.10.1.84      k8s2
	 kube-system   cilium-cl4ws                            1/1       Running   0          49s       192.168.36.11   k8s1
	 kube-system   cilium-etcd-mnbb98rjbf                  1/1       Running   0          4m        10.10.0.200     k8s1
	 kube-system   cilium-etcd-operator-77d4ddf8c6-595r7   1/1       Running   0          4m        192.168.36.12   k8s2
	 kube-system   cilium-etcd-qcjkndw7mk                  1/1       Running   0          3m        10.10.1.243     k8s2
	 kube-system   cilium-etcd-vm2wzkklc7                  1/1       Running   0          4m        10.10.1.51      k8s2
	 kube-system   cilium-n8d6f                            1/1       Running   0          49s       192.168.36.12   k8s2
	 kube-system   cilium-operator-5fb956fc65-ppntk        1/1       Running   0          1m        10.10.0.217     k8s1
	 kube-system   etcd-k8s1                               1/1       Running   0          27m       192.168.36.11   k8s1
	 kube-system   etcd-operator-65476dd78f-s9lmd          1/1       Running   0          4m        10.10.1.35      k8s2
	 kube-system   kube-apiserver-k8s1                     1/1       Running   0          27m       192.168.36.11   k8s1
	 kube-system   kube-controller-manager-k8s1            1/1       Running   0          27m       192.168.36.11   k8s1
	 kube-system   kube-dns-f4d788bb7-xjjmg                3/3       Running   0          4m        10.10.1.25      k8s2
	 kube-system   kube-proxy-v4xwj                        1/1       Running   0          28m       192.168.36.11   k8s1
	 kube-system   kube-proxy-z8c7z                        1/1       Running   0          19m       192.168.36.12   k8s2
	 kube-system   kube-scheduler-k8s1                     1/1       Running   0          27m       192.168.36.11   k8s1

@jrajahalme
Member

@aanm Do you know what the significance is of the pod listing's READY column reporting "0/1"? Apparently the cilium state for these endpoints is "ready" and the test is happy to proceed, but then fails. I.e., we wait for the cilium endpoint list to show "ready", but not for the k8s pod READY to be "n/n".

@aanm
Member Author

aanm commented May 9, 2019

@aanm Do you know what the significance is of the pod listing's READY column reporting "0/1"? Apparently the cilium state for these endpoints is "ready" and the test is happy to proceed, but then fails. I.e., we wait for the cilium endpoint list to show "ready", but not for the k8s pod READY to be "n/n".

@jrajahalme ebc77ec
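
To illustrate the distinction discussed above, the two readiness signals could be checked roughly like this (a minimal sketch with hypothetical pod names and label selectors, not the actual test helpers; the real change is in the commit aanm links above):

# Cilium's view: endpoint state as reported by the agent (what the test waited for).
# "cilium-cl4ws" stands in for whichever cilium agent pod runs on the node.
kubectl -n kube-system exec cilium-cl4ws -- cilium endpoint list

# Kubernetes' view: the pod READY column, driven by the kubelet's readiness probes
# (what the test did not wait for). "id=app1" is a placeholder label selector.
kubectl -n default get pods -l id=app1 -o wide

# Also waiting for the k8s READY condition could look like:
kubectl -n default wait --for=condition=Ready pod -l id=app1 --timeout=120s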

@stale

stale bot commented Jul 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Jul 8, 2019
@stale

stale bot commented Jul 23, 2019

This issue has not seen any activity since it was marked stale. Closing.

stale bot closed this as completed Jul 23, 2019
@jrajahalme reopened this Jul 28, 2020
stale bot removed the stale label Jul 28, 2020
@jrajahalme
Member

#12548 had the same failure, with the same symptom as in the description. CoreDNS was in CrashLoopBackOff with these logs:

2020-07-28T19:04:49.845705843Z .:53
2020-07-28T19:04:49.845742463Z 2020/07/28 19:04:49 [INFO] CoreDNS-1.2.2
2020-07-28T19:04:49.845746522Z 2020/07/28 19:04:49 [INFO] linux/amd64, go1.11, eb51e8b
2020-07-28T19:04:49.845749987Z CoreDNS-1.2.2
2020-07-28T19:04:49.845753393Z linux/amd64, go1.11, eb51e8b
2020-07-28T19:04:49.845756755Z 2020/07/28 19:04:49 [INFO] plugin/reload: Running configuration MD5 = ffcc993a37738c0d6dd423fdb6ad81b0
2020-07-28T19:04:56.028383249Z 10.0.0.49:48566 - [28/Jul/2020:19:04:56 +0000] 59749 "A IN app1-service.default.svc.cluster.local. udp 80 false 4096" NOERROR qr,aa,rd,ra 134 0.000136696s
2020-07-28T19:04:56.453196361Z 10.0.0.49:54759 - [28/Jul/2020:19:04:56 +0000] 2004 "A IN app1-service.default.svc.cluster.local. udp 80 false 4096" NOERROR qr,rd,ra 134 0.000084368s
2020-07-28T19:04:56.848148387Z 2020/07/28 19:04:56 [FATAL] plugin/loop: Seen "HINFO IN 2264449761600304678.7082390303545810211." more than twice, loop detected

jrajahalme added a commit that referenced this issue Aug 13, 2020
Update k8s 1.12 coredns deployment to image tag 1.2.6 to get the bug
fix for loop detection getting confused due to retries
(coredns/coredns#2391).

Fixes: #6307
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
aanm pushed a commit that referenced this issue Aug 13, 2020
Update k8s 1.12 coredns deployment to image tag 1.2.6 to get the bug
fix for loop detection getting confused due to retries
(coredns/coredns#2391).

Fixes: #6307
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
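
For reference, the fix in the commits above bumps the CoreDNS image used by the k8s 1.12 test manifests to 1.2.6, which contains the loop-detection fix from coredns/coredns#2391. A minimal sketch of that kind of bump, assuming the stock upstream CoreDNS Deployment and image path rather than the exact test manifest:

# Hypothetical example: bump the CoreDNS image to 1.2.6 (deployment name and image path are assumptions).
kubectl -n kube-system set image deployment/coredns coredns=k8s.gcr.io/coredns:1.2.6

# Wait for the rollout and confirm the plugin/loop FATAL no longer appears in the logs.
kubectl -n kube-system rollout status deployment/coredns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

Per the issue header, this issue was ultimately fixed by #12869.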