CI: RuntimeFQDNPolicies toFQDNs populates toCIDRSet (data from proxy) L3-dependent L7/HTTP with toFQDN updates proxy policy #16724

Closed
aanm opened this issue Jul 1, 2021 · 4 comments · Fixed by #16769
Labels: area/CI (Continuous Integration testing issue or flake), ci/flake (This is a known failure that occurs in the tree. Please investigate me!)

aanm commented Jul 1, 2021

Test Name

RuntimeFQDNPolicies toFQDNs populates toCIDRSet (data from proxy) L3-dependent L7/HTTP with toFQDN updates proxy policy

Failure Output

FAIL: Cannot access to "http://world1.cilium.test" when it should work

Stacktrace

/home/jenkins/workspace/Cilium-PR-Runtime-4.9/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:518
Cannot access to "http://world1.cilium.test" when it should work
Expected command: docker exec -i  app1 curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://world1.cilium.test -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'" 
To succeed, but it failed:
Exitcode: 28 
Err: Process exited with status 28
Stdout:
 	 time-> DNS: '0.003208()', Connect: '0.000000',Transfer '0.000000', total '5.001691'
Stderr:
 	 

/home/jenkins/workspace/Cilium-PR-Runtime-4.9/runtime-gopath/src/github.com/cilium/cilium/test/runtime/fqdn.go:995
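
The curl summary above can be read mechanically. Exit code 28 is curl's operation-timeout code, and the `-w` line reports a fast name lookup with an empty `%{remote_ip}` and a zero connect time, which suggests the request stalled before a TCP connection was established and then hit the 5-second `--connect-timeout`. A minimal Python sketch of that reading, assuming the `-w` format used by this test; `parse_curl_timing` and `describe` are hypothetical helper names, not part of the Cilium test suite:

```python
import re

# Matches the -w format string used by the test command above.
TIMING_RE = re.compile(
    r"DNS: '(?P<dns>[\d.]+)\((?P<ip>[^)]*)\)', "
    r"Connect: '(?P<connect>[\d.]+)',"
    r"Transfer '(?P<transfer>[\d.]+)', "
    r"total '(?P<total>[\d.]+)'"
)

def parse_curl_timing(line):
    """Extract the timing fields from one line of curl -w output."""
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError("unrecognized timing line")
    return {
        "dns": float(m["dns"]),
        "remote_ip": m["ip"],
        "connect": float(m["connect"]),
        "transfer": float(m["transfer"]),
        "total": float(m["total"]),
    }

def describe(exit_code, timing, connect_timeout=5.0):
    """Heuristic only: flags a timeout that happened before TCP connect."""
    never_connected = not timing["remote_ip"] and timing["connect"] == 0.0
    if exit_code == 28 and never_connected and timing["total"] >= connect_timeout:
        return "timed out before a TCP connection was established"
    return "no heuristic matched"

# Timing line copied from the failure output above.
line = "time-> DNS: '0.003208()', Connect: '0.000000',Transfer '0.000000', total '5.001691'"
timing = parse_curl_timing(line)
print(describe(28, timing))
```

This only classifies the symptom. It does not say whether the connection was dropped by policy, by the proxy, or somewhere else.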

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 1
No errors/warnings found in logs
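
The summary above is a set of substring counts over the agent logs. The "Goroutine took lock for more than" message was seen once, yet the run still reports no errors or warnings, so that pattern apparently is counted but not treated as a failure here. A rough Python sketch of this kind of count, using an invented two-line log sample; `count_patterns` is a hypothetical helper and not the test suite's actual implementation:

```python
# Patterns copied from the summary lines above.
PATTERNS = [
    "context deadline exceeded",
    "level=error",
    "level=warning",
    "Cilium API handler panicked",
    "Goroutine took lock for more than",
]

def count_patterns(log_text, patterns=PATTERNS):
    """Count non-overlapping occurrences of each pattern in the log text."""
    return {p: log_text.count(p) for p in patterns}

# Invented sample; the real agent log lines are not shown in this issue.
sample_log = (
    'level=info msg="Goroutine took lock for more than 0.1 seconds"\n'
    'level=info msg="Policy imported"\n'
)

counts = count_patterns(sample_log)
for pattern, n in counts.items():
    print(f'Number of "{pattern}" in logs: {n}')
```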


Standard Error

15:47:17 STEP: Running BeforeEach block for EntireTestsuite RuntimeFQDNPolicies toFQDNs populates toCIDRSet (data from proxy)
15:47:17 STEP: Clearing fqdn cache: FQDN proxy cache cleared

15:47:17 STEP: Testing connectivity to "http://world1.cilium.test"
15:47:17 STEP: Importing the policy
15:47:17 STEP: Setting up policy: /home/vagrant/go/src/github.com/cilium/cilium/test/policy_be588b11.json
15:47:18 STEP: Trying curl connection to "http://world1.cilium.test" without DNS request
15:47:23 STEP: Testing connectivity to "http://world1.cilium.test"
FAIL: Cannot access to "http://world1.cilium.test" when it should work
Expected command: docker exec -i  app1 curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://world1.cilium.test -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'" 
To succeed, but it failed:
Exitcode: 28 
Err: Process exited with status 28
Stdout:
 	 time-> DNS: '0.003208()', Connect: '0.000000',Transfer '0.000000', total '5.001691'
Stderr:
 	 

=== Test Finished at 2021-06-28T15:47:28Z====
15:47:28 STEP: Running JustAfterEach block for EntireTestsuite RuntimeFQDNPolicies
===================== TEST FAILED =====================
15:47:29 STEP: Running AfterFailed block for EntireTestsuite RuntimeFQDNPolicies
10.15.150.83 app3
10.15.75.173 app2
10.15.9.42 app1
10.15.194.77 httpd3
10.15.112.92 httpd2
10.15.116.166 httpd1
172.17.0.5 bind
172.18.0.7 OutsideHttpd3
172.18.0.6 OutsideHttpd2
172.18.0.5 OutsideHttpd1
172.18.0.4 WorldHttpd1
172.18.0.3 WorldHttpd3
172.18.0.2 WorldHttpd2
172.17.0.4 cilium-etcd
172.17.0.3 cilium-consul
172.17.0.2 registry

cmd: sudo cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv6                 IPv4            STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                     
	 412        Disabled           Disabled          4          reserved:health               f00d::a0f:0:0:69af   10.15.70.250    ready   
	 532        Disabled           Disabled          35613      container:app=test            f00d::a0f:0:0:4d83   10.15.116.166   ready   
	                                                            container:id.httpd1                                                        
	                                                            container:id.service1                                                      
	 1200       Disabled           Disabled          8768       container:app=test            f00d::a0f:0:0:ce7d   10.15.75.173    ready   
	                                                            container:id.app2                                                          
	 1444       Disabled           Enabled           2864       container:app=test            f00d::a0f:0:0:df59   10.15.9.42      ready   
	                                                            container:id.app1                                                          
	 1539       Disabled           Disabled          1          reserved:host                                                      ready   
	 3047       Disabled           Disabled          59520      container:app=test            f00d::a0f:0:0:92e5   10.15.150.83    ready   
	                                                            container:id.app3                                                          
	 3135       Disabled           Disabled          62133      container:app=test            f00d::a0f:0:0:d906   10.15.194.77    ready   
	                                                            container:id.httpd3                                                        
	                                                            container:id.service1                                                      
	 3795       Disabled           Disabled          42356      container:app=test            f00d::a0f:0:0:c3ee   10.15.112.92    ready   
	                                                            container:id.httpd2                                                        
	                                                            container:id.service1                                                      
	 
Stderr:
 	 

cmd: sudo cilium policy get
Exitcode: 0 
Stdout:
 	 [
	   {
	     "endpointSelector": {
	       "matchLabels": {
	         "container:id.app1": ""
	       }
	     },
	     "egress": [
	       {
	         "toPorts": [
	           {
	             "ports": [
	               {
	                 "port": "53",
	                 "protocol": "ANY"
	               }
	             ],
	             "rules": {
	               "dns": [
	                 {
	                   "matchName": "world1.cilium.test"
	                 },
	                 {
	                   "matchPattern": "*.cilium.test"
	                 }
	               ]
	             }
	           }
	         ]
	       },
	       {
	         "toPorts": [
	           {
	             "ports": [
	               {
	                 "port": "80",
	                 "protocol": "TCP"
	               }
	             ],
	             "rules": {
	               "http": [
	                 {
	                   "method": "GET"
	                 }
	               ]
	             }
	           }
	         ],
	         "toFQDNs": [
	           {
	             "matchName": "world1.cilium.test"
	           },
	           {
	             "matchPattern": "*.cilium.test"
	           }
	         ]
	       }
	     ],
	     "labels": [
	       {
	         "key": "L3-dependent L7 with toFQDN",
	         "source": ""
	       }
	     ]
	   }
	 ]
	 Revision: 4
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================
15:47:31 STEP: Running AfterEach for block EntireTestsuite RuntimeFQDNPolicies
15:47:32 STEP: Running AfterEach for block EntireTestsuite

[[ATTACHMENT|6e1c5143_RuntimeFQDNPolicies_toFQDNs_populates_toCIDRSet_(data_from_proxy)_L3-dependent_L7-HTTP_with_toFQDN_updates_proxy_policy.zip]]
15:47:32 STEP: Running AfterAll block for EntireTestsuite RuntimeFQDNPolicies toFQDNs populates toCIDRSet (data from proxy)
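
The policy dumped in the log above allows DNS lookups on port 53 and HTTP GET on port 80 for names selected by `matchName: world1.cilium.test` or `matchPattern: *.cilium.test`. To illustrate why `world1.cilium.test` should be covered by both selectors, here is a simplified Python approximation of the wildcard matching. It assumes `*` stands for zero or more DNS label characters and never crosses a dot. That is an assumption made for illustration and not Cilium's actual matcher; `pattern_to_regex` and `allowed` are hypothetical names.

```python
import re

# Assumption: "*" matches zero or more DNS label characters, never a dot.
LABEL_CHARS = "[-a-zA-Z0-9_]*"

def pattern_to_regex(pattern):
    """Translate a wildcard pattern into an anchored regular expression."""
    parts = [re.escape(p) for p in pattern.lower().split("*")]
    return re.compile("^" + LABEL_CHARS.join(parts) + "$")

# Selectors copied from the toFQDNs section of the policy dump.
TO_FQDNS = [
    {"matchName": "world1.cilium.test"},
    {"matchPattern": "*.cilium.test"},
]

def allowed(name, selectors=TO_FQDNS):
    """Return True if any selector covers the given DNS name."""
    name = name.lower().rstrip(".")
    for sel in selectors:
        match_name = sel.get("matchName")
        if match_name and match_name.lower() == name:
            return True
        match_pattern = sel.get("matchPattern")
        if match_pattern and pattern_to_regex(match_pattern).match(name):
            return True
    return False

print(allowed("world1.cilium.test"))
print(allowed("cilium.test"))
```

Under these assumptions the name used by the test is selected, so the timeout is unlikely to come from the selectors themselves not matching.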


ZIP Links:


https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/5123/artifact/6e1c5143_RuntimeFQDNPolicies_toFQDNs_populates_toCIDRSet_(data_from_proxy)_L3-dependent_L7-HTTP_with_toFQDN_updates_proxy_policy.zip/6e1c5143_RuntimeFQDNPolicies_toFQDNs_populates_toCIDRSet_(data_from_proxy)_L3-dependent_L7-HTTP_with_toFQDN_updates_proxy_policy.zip
https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/5123/artifact/test_results_Cilium-PR-Runtime-4.9_5123_BDD-Test-PR.zip/test_results_Cilium-PR-Runtime-4.9_5123_BDD-Test-PR.zip

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/5123/

aanm added the area/CI and ci/flake labels on Jul 1, 2021

aanm commented Jul 1, 2021

Assigned Tom based on #16662 (comment) and #16662 (comment)


aanm commented Jul 1, 2021

I tried to reproduce this flake by running [1] for 2 hours but couldn't hit it. Unfortunately, I later realized that I had tested on net-next rather than 4.9, although I'm not sure the kernel version would cause this test to fail.

[1]

# Rerun the focused test until ginkgo exits non-zero.
while ginkgo --focus="RuntimeFQDNPolicies toFQDNs populates toCIDRSet.*L3-dependent L7/HTTP with toFQDN updates proxy policy" \
    -noColor -v -- \
    -cilium.holdEnvironment=true -cilium.provision=false -cilium.provision-k8s=false \
    -cilium.showCommands=true -cilium.skipLogs=true \
    -cilium.SSHConfig="vagrant ssh-config 843f6fd"; do :; done
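
The loop in [1] reruns the focused test until it fails. The same idea with an explicit time budget could be sketched in Python as below. `run_until_failure` is a hypothetical helper, and the command in the demo is a stand-in that always exits with status 1 rather than the real ginkgo invocation.

```python
import subprocess
import sys
import time

def run_until_failure(cmd, budget_seconds, run=subprocess.run, clock=time.monotonic):
    """Rerun cmd until it exits non-zero or the time budget runs out.

    Returns (attempts, failed), where failed is True if a run exited non-zero.
    """
    deadline = clock() + budget_seconds
    attempts = 0
    while clock() < deadline:
        attempts += 1
        result = run(cmd)
        if result.returncode != 0:
            return attempts, True
    return attempts, False

# Stand-in command that fails immediately; substitute the ginkgo invocation
# from [1] and a budget such as 2 * 60 * 60 seconds for a real run.
demo_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
attempts, failed = run_until_failure(demo_cmd, budget_seconds=60)
print(f"attempts={attempts} failed={failed}")
```

Returning the attempt count makes it easier to report how many clean runs preceded a failure, which is useful when estimating how rare a flake is.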


pchaigno commented Jul 5, 2021

According to the CI dashboard, this could have the same root cause as #16713 (comment), since it started failing around the same time (June 24th):
[image: CI dashboard screenshot]


pchaigno commented Jul 5, 2021

Please post here if you hit this again. The next failures should have more information thanks to #16748.
