bug(tests): wait for router to reload before testing config change #119

slack · 2016-03-22T05:17:34Z

Test: "can set and list environment variables"
Jenkins job: https://ci.deis.io/job/workflow-beta1-test-e2e/17/

It looks like we are hitting the app before the router has had a chance to reload, causing a 404:

00:15:33.957   can set and list environment variables
00:15:33.958   /go/src/github.com/deis/workflow-e2e/tests/config_test.go:55
00:15:35.220 $ deis login http://deis.10.187.248.89.xip.io --username=test-81 --password=asdf1234
00:15:35.273 Logged in as test-81
00:15:35.278 $ deis apps:create test-23966460 
00:15:35.464 Creating Application... ...���done, created test-23966460
00:15:35.468 Git remote deis added
00:15:35.468 remote available at ssh://git@deis.10.187.248.89.xip.io:2222/test-23966460.git
00:15:35.478 $ GIT_SSH=/tmp/deis-workflow-home943620212/.ssh/git-ssh git push deis master

[REMOVED]

00:15:46.424 remote: running [git gc] in directory /home/git/test-23966460.git        
00:15:46.430 To ssh://git@deis.10.187.248.89.xip.io:2222/test-23966460.git
00:15:46.430  * [new branch]      master -> master
00:15:46.430 $ deis config:set POWERED_BY=midi-chlorians
00:15:49.709 Creating config... ...���o..���.o.���..o���...���o..���.o.���..o���...���done
00:15:49.709 
00:15:49.745 === test-23966460 Config
00:15:49.745 POWERED_BY      midi-chlorians
00:15:49.761 $ deis config:list -a test-23966460
00:15:49.793 === test-23966460 Config
00:15:49.793 POWERED_BY      midi-chlorians
00:15:49.803 $ curl -sL "http://test-23966460.10.187.248.89.xip.io"; echo
00:15:49.874 <html>
00:15:49.874 <head><title>404 Not Found</title></head>
00:15:49.875 <body bgcolor="white">
00:15:49.875 <center><h1>404 Not Found</h1></center>
00:15:49.875 <hr><center>nginx/1.9.6</center>
00:15:49.875 </body>
00:15:49.875 </html>
00:15:49.875

Looking at the router logs, there is a reload, but we immediately see the 404 later in the logs:

[22/Mar/2016:04:35:09 +0000] - 10.184.0.9 - - - 201 - "POST /v2/apps/ HTTP/1.1" - 522 - "-" - "Deis Client v2.0.0-dev" - "~^deis\x5C.(?<domain>.+)$" - 10.187.244.132:80 - deis.10.187.248.89.xip.io - 0.175 - 0.175
2016/03/22 04:35:16 WARNING: Field "router.deis.io/certificates" value "" does not satisfy constraint /(?i)^((([a-z0-9]+(-[a-z0-9]+)*)|((\*\.)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}):([a-z0-9]+(-[a-z0-9]+)*)(\s*,\s*)?)+$/ -- skipping this field and using default value "map[]".
2016/03/22 04:35:16 INFO: Router configuration has changed in k8s.
2016/03/22 04:35:16 INFO: Reloading nginx...
2016/03/22 04:35:16 INFO: nginx reloaded.
[22/Mar/2016:04:35:23 +0000] - 10.184.0.9 - - - 201 - "POST /v2/apps/test-23966460/config/ HTTP/1.1" - 597 - "-" - "Deis Client v2.0.0-dev" - "~^deis\x5C.(?<domain>.+)$" - 10.187.244.132:80 - deis.10.187.248.89.xip.io - 3.261 - 3.261
[22/Mar/2016:04:35:23 +0000] - 10.184.0.9 - - - 200 - "GET /v2/apps/test-23966460/config/ HTTP/1.1" - 585 - "-" - "Deis Client v2.0.0-dev" - "~^deis\x5C.(?<domain>.+)$" - 10.187.244.132:80 - deis.10.187.248.89.xip.io - 0.033 - 0.034
[22/Mar/2016:04:35:23 +0000] - 10.184.0.9 - - - 200 - "GET /v2/apps/test-23966460/config/ HTTP/1.1" - 595 - "-" - "Deis Client v2.0.0-dev" - "~^deis\x5C.(?<domain>.+)$" - 10.187.244.132:80 - deis.10.187.248.89.xip.io - 0.022 - 0.023
[22/Mar/2016:04:35:23 +0000] - 10.184.0.9 - - - 404 - "GET / HTTP/1.1" - 322 - "-" - "curl/7.35.0" - "_" - - - test-23966460.10.187.248.89.xip.io - - - 0.000

There are further router reloads at (but it is difficult to tell what the router knows about at any given moment):

2016/03/22 04:37:06 INFO: Router configuration has changed in k8s.
2016/03/22 04:39:46 INFO: Router configuration has changed in k8s.
2016/03/22 04:39:56 INFO: Router configuration has changed in k8s.
2016/03/22 04:40:26 INFO: Router configuration has changed in k8s.

vdice · 2016-03-22T16:26:27Z

I am conflicted b/c I want the tests to be reliable/dependable but I also am hesitant to add too many test 'workarounds' in instances like these because here, for example, the cli has returned and a user would reasonably assume they can begin interacting with it immediately (w/o any wait). Maybe a compromise would be add this as well as a //TODO to revisit/look into a potential router or other issue? (Specifically around Looking at the router logs, there is a reload, but we immediately see the 404 later in the logs)

slack · 2016-03-22T17:40:39Z

I think if we set an application HEALTHCHECK_URL for the app, controller won't return to the calling component until that check passes. Can you verify @helgi

Without a healthcheck we would need to have controller validate that the app and router are up by guessing.

helgi · 2016-03-22T17:56:09Z

Setting HEALTHCHECK_URL means we are injecting our own instead of using k8s default ones that "do stuff" (not much really) and longer timeouts and other things are put in place.

Controller will always verify that k8s has green lit all the probes, the default ones or ours.

Right now the controller has no insight into when the router is ready and if that hostname is routable. The router is on a 10 second loop I believe to wait for new namespaces and routable information to come up

slack · 2016-03-22T18:06:33Z

Understand the router issue. The only way to fix that is to also have controller hit the app through the router as the final check (most likely to the HEALTHCHECK_URL).

We need to define the no-check behavior for k8s. I'm betting it just leans on Pod state.

I still think HEALTHCHECK_URL in test is a "good idea" (and will push the total loop > 10s so router will have a chance to reload).

helgi · 2016-03-22T18:09:11Z

Given we ditch the concept of a platform domain as a known entity by Controller then this is not the quickest of fixes. If we can figure out how we want to derive that or if we want to hit the router IP (using xip.io) by querying k8s about it via the controller then that's a doable check and something we used to do in v1.

I'm looking at the no-check behaviour now

slack · 2016-03-22T18:10:05Z

Hitting the router svc address with the appropriate Host header would be sufficient.

helgi · 2016-03-22T18:11:04Z

Yeah, that's fair. Please open up a ticket for it since I do agree it's a good idea, if nothing else than a last check after all of k8s has gone green based on our own checks.

helgi · 2016-03-22T18:34:30Z

For the no-check version see http://kubernetes.io/docs/user-guide/pod-states/

LivenessProbe: indicates whether the container is live, i.e. still running. The LivenessProbe hints to the kubelet when a container is unhealthy. If the LivenessProbe fails, the kubelet will kill the container and the container will be subjected to it’s RestartPolicy. The default state of Liveness before the initial delay is Success. The state of Liveness for a container when no probe is provided is assumed to be Success.

ReadinessProbe: indicates whether the container is ready to service requests. If the ReadinessProbe fails, the endpoints controller will remove the pod’s IP address from the endpoints of all services that match the pod. Thus, the ReadinessProbe is sometimes useful to signal to the endpoints controller that even though a pod may be running, it should not receive traffic from the proxy (e.g. the container has a long startup time before it starts listening or the container is down for maintenance). The default state of Readiness before the initial delay is Failure. The state of Readiness for a container when no probe is provided is assumed to be Success.

So no probes = assumed state for both is Success

helgi · 2016-04-04T16:49:50Z

This shouldn't be required anymore if the controller changes are doing their thing

mboersma · 2016-04-08T15:57:36Z

Is this time.Sleep call still needed as a workaround for test stability?

slack · 2016-04-13T17:23:50Z

Closing this, probably not needed anymore.

bug(tests): wait for router to reload before testing config change

5130287

slack added in progress awaiting review and removed in progress labels Mar 22, 2016

slack self-assigned this Mar 22, 2016

slack mentioned this pull request Mar 22, 2016

Validate routeability through deis-router upon application creation. deis/controller#551

Closed

mboersma added the needs rebase label Apr 8, 2016

slack closed this Apr 13, 2016

slack removed the awaiting review label Apr 13, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug(tests): wait for router to reload before testing config change #119

bug(tests): wait for router to reload before testing config change #119

slack commented Mar 22, 2016

vdice commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

helgi commented Mar 22, 2016

helgi commented Apr 4, 2016

mboersma commented Apr 8, 2016

slack commented Apr 13, 2016

bug(tests): wait for router to reload before testing config change #119

bug(tests): wait for router to reload before testing config change #119

Conversation

slack commented Mar 22, 2016

vdice commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

slack commented Mar 22, 2016

helgi commented Mar 22, 2016

helgi commented Mar 22, 2016

helgi commented Apr 4, 2016

mboersma commented Apr 8, 2016

slack commented Apr 13, 2016