This repository has been archived by the owner on May 6, 2020. It is now read-only.

e2e failures when curling for 502 #552

Closed
vdice opened this issue Mar 22, 2016 · 8 comments

@vdice
Member

vdice commented Mar 22, 2016

https://ci.deis.io/job/workflow-beta1-test-e2e is consistently catching the following errors when running tests in which the app URL is curled and a 502 is expected. Specifically, no response is received from the curl command within the specified 10s timeout:

12:02:46 ------------------------------
12:02:46 Processes with a deployed app can scale up and down 
12:02:46   scales to 0
12:02:46   /go/src/github.com/deis/workflow-e2e/vendor/github.com/onsi/ginkgo/extensions/table/table_entry.go:46
12:02:48 $ deis login http://deis.10.131.255.59.xip.io --username=test-81 --password=asdf1234
12:02:48 Logged in as test-81
12:02:48 $ deis ps:scale web=0 --app=test-305436544
12:02:48 Scaling processes... but first, coffee!
12:02:49 ...o...o...odone in 1s
12:02:49 === test-305436544 Processes
12:02:49 $ deis ps:list --app=test-305436544
12:02:49 === test-305436544 Processes
12:02:49 $ curl -sL -w "%{http_code}\\n" "http://test-305436544.10.131.255.59.xip.io" -o /dev/null
12:02:59 
12:02:59 • Failure [12.913 seconds]
12:02:59 Processes
12:02:59 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:166
12:02:59   with a deployed app
12:02:59   /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:165
12:02:59     can scale up and down
12:02:59     /go/src/github.com/deis/workflow-e2e/vendor/github.com/onsi/ginkgo/extensions/table/table.go:96
12:02:59       scales to 0 [It]
12:02:59       /go/src/github.com/deis/workflow-e2e/vendor/github.com/onsi/ginkgo/extensions/table/table_entry.go:46
12:02:59 
12:02:59       Timed out after 10.000s.
12:02:59       Got stuck at:
12:02:59           
12:02:59       Waiting for:
12:02:59           502
12:02:59 
12:02:59       /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:83
12:02:59 ------------------------------

Full set looks like:

12:17:18 [Fail] Processes with a deployed app can scale up and down [It] scales to 0 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:83
12:17:18 
12:17:18 [Fail] Processes with a deployed app can scale up and down [It] scales to 0 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:83
12:17:18 
12:17:18 [Fail] Processes with a deployed app can restart processes [It] restarts all of 0 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:149
12:17:18 
12:17:18 [Fail] Processes with a deployed app can restart processes [It] restarts all of 0 by type 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:149
12:17:18 
12:17:18 [Fail] Processes with a deployed app can restart processes [It] restarts all of 0 by wrong type 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/ps_test.go:149
12:17:18 
12:17:18 [Fail] Builds with a logged-in user with a deployed app [It] can create a build from an existing image ("deis pull") 
12:17:18 /go/src/github.com/deis/workflow-e2e/tests/tests_suite_test.go:380
12:17:18 

Logs from the most recent build hitting these can be found here

@vdice vdice added this to the v2.0-beta1 milestone Mar 22, 2016
@vdice vdice changed the title e2e failures in ps:scale to 0, restart all of 0 e2e failures when curling for 502 Mar 22, 2016
@helgi
Contributor

helgi commented Mar 22, 2016

Logs look fine to me. Are all of those failures cases where you were waiting for a 502 but did not get it within the timeout period?

There are a few of these peppered around the router logs:

2016/03/22 18:05:33 [error] 20#0: *328 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.128.1.8, server: ~^test-305436544.(?.+)$, request: "GET / HTTP/1.1", upstream: "http://10.131.243.231:80/", host: "test-305436544.10.131.255.59.xip.io"

@helgi
Contributor

helgi commented Mar 23, 2016

This looks like a duplicate of #551 - Can you verify @vdice?

@vdice
Member Author

vdice commented Mar 23, 2016

I think so but will have to defer to @slack -- the nuances between the two issues have me a little lost.

@slack
Member

slack commented Mar 23, 2016

So routeability checking from #551 won't help in this particular case.

Here is what is happening:

  1. scale an app to X
  2. scale an app to 0
  3. IMMEDIATELY curl the app with the expectation that we will see a 502

We are asserting via a router 502 that we have set the replica count to 0.

Two things appear to be happening:

  1. the 5XX delivered is not deterministic between k8s versions
  2. when we make the curl command we are probably getting stuck during the curl TCP handshake, most likely between the router and a pod that is going away

I think that's why we are hitting the 10s timeout: curl is doing a bunch of TCP retries and never sees a 5XX.

@smothiki and @vdice are working on changing these sets of tests so that we wait after the scale down event, validate the replica count and then make the curl. That, paired with the post-beta router change in deis/router#153, should allow us to have a more stable set of ps tests.

@vdice
Member Author

vdice commented Mar 23, 2016

Thanks, @slack. By the way, the test change you referenced is no longer scheduled. For beta, we are marking said tests as Pending (see deis/workflow-e2e#123) with intentions to re-enable after said router/other changes post-beta.

@slack
Member

slack commented Mar 23, 2016

Sounds good!

@vdice
Member Author

vdice commented Mar 31, 2016

Just a refresh on this issue. It still exists; it is just not visible because the tests that would otherwise experience this behavior are marked Pending or commented out.

(See https://github.com/deis/workflow-e2e/blob/master/tests/ps_test.go#L161-L163, https://github.com/deis/workflow-e2e/blob/master/tests/ps_test.go#L90-L92 and https://github.com/deis/workflow-e2e/blob/master/tests/builds_test.go#L143)

@vdice
Member Author

vdice commented Apr 14, 2016

This behavior has not been seen lately; closing.

@vdice vdice closed this as completed Apr 14, 2016