local proxying: set a default resolver ttl of 10s #54

risicle · 2019-03-06T14:40:06Z

This is needed because internal ("private") PaaS routes are load-balanced simply on a DNS trick and our blue-green deployment procedure may make an instance disappear at any time with little notice.

While I'd love to be able to test this without merging, that slightly defeats the point because it's a problem with deployment & releasing that this is trying to fix, and we need to go through a proper release cycle to see if it works.

Note I've added this rule for connections to our frontend apps too even though we don't yet use internal routes for them - this is because I anticipate someone trying to switch them too sooner or later and I'd rather they don't have to rediscover this problem from scratch then.

risicle · 2019-03-06T14:51:39Z

To elaborate on the "DNS trick" used by cf, it should in most cases "just work" because their DNS server responds with a 0s ttl, which compliant clients should take to mean "re-lookup every time", but nginx notoriously takes to mean "whatever". So that's why we're only seeing it as a problem in the router.

katstevens · 2019-03-06T15:23:57Z

templates/api.j2

@@ -3,6 +3,8 @@
 server {
    listen 80;
    server_name api.*;
+    # valid= value must be significantly less than the amount of guard time we have in our blue-green deployment process
+    resolver {{ resolver_ip }} valid=10s;


Don't we already have a line like this in the top level nginx.conf.j2? Would it be better to replace the 300s value there than do each app separately?

risicle · Mar 6, 2019

That would also have an effect on e.g. requests forwarded to S3 - are we sure we want to do that?

katstevens · Mar 6, 2019

Ah gotcha. OK!
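For readers following the thread, here is a minimal sketch of how a per-server `resolver ... valid=` override is commonly combined with a variable `proxy_pass` so that nginx looks the upstream name up again at request time. The resolver address, the `$api_upstream` variable, and the `api.apps.internal:8080` route are all hypothetical placeholders, not values taken from this repository, and the actual templates may wire things up differently.

```nginx
server {
    listen 80;
    server_name api.*;

    # Assumed resolver address; the real template fills this in from {{ resolver_ip }}.
    # valid=10s overrides the TTL in the DNS response, so cached answers expire after 10 seconds.
    resolver 10.0.0.2 valid=10s;

    location / {
        # Hypothetical internal route. Putting the upstream in a variable makes nginx
        # resolve the hostname at request time through the resolver above, instead of
        # once when the configuration is loaded.
        set $api_upstream "http://api.apps.internal:8080";
        proxy_pass $api_upstream;
    }
}
```

Scoping the override to the `server` block, as in the diff above, leaves the top-level 300s resolver setting in place for other destinations such as S3.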

lfdebrux · 2019-03-06T15:38:08Z

Can you deploy to test without merging then restore before merging to test the deployment process? Just to check that the config isn't going to break routing?

katstevens

Let's see what happens (have you got a revert PR at the ready?)

risicle · 2019-03-06T15:45:00Z

@lfdebrux I think that would be tricky to say the least.

local proxying: set a default resolver ttl of 10s

a5df538

this is needed because internal ("private") PaaS routes are load-balanced simply on a DNS trick and our blue-green deployment procedure may make an instance disappear at any time with little notice

katstevens reviewed Mar 6, 2019

katstevens approved these changes Mar 6, 2019

risicle merged commit bba4293 into master Mar 6, 2019

risicle deleted the ris-api-proxy-pass-resolver-timeout branch March 6, 2019 15:45

risicle mentioned this pull request Mar 6, 2019

Revert "local proxying: set a default resolver ttl of 10s" #55

Merged
