Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local proxying: set a default resolver ttl of 10s #54

Merged
merged 1 commit into from
Mar 6, 2019

Conversation

risicle
Copy link
Contributor

@risicle risicle commented Mar 6, 2019

https://trello.com/c/H53K45It

This is needed because internal ("private") PaaS routes are load-balanced simply on a DNS trick and our blue-green deployment procedure may make an instance disappear at any time with little notice.

While I'd love to be able to test this without merging, that slightly defeats the point because it's a problem with deployment & releasing that this is trying to fix, and we need to go through a proper release cycle to see if it works.

Note I've added this rule for connections to our frontend apps too even though we don't yet use internal routes for them - this is because I anticipate someone trying to switch them too sooner or later and I'd rather they don't have to rediscover this problem from scratch then.

this is needed because internal ("private") PaaS routes are load-balanced
simply on a DNS trick and our blue-green deployment procedure may make an
instance disappear at any time with little notice
@risicle
Copy link
Contributor Author

risicle commented Mar 6, 2019

To elaborate on the "DNS trick" used by cf, it should in most cases "just work" because their DNS server responds with a 0s ttl, which compliant clients should take to mean "re-lookup every time", but nginx notoriously takes to mean "whatever". So that's why we're only seeing it as a problem in the router.

@@ -3,6 +3,8 @@
server {
listen 80;
server_name api.*;
# valid= value must be significantly less than the amount of guard time we have in our blue-green deployment process
resolver {{ resolver_ip }} valid=10s;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already have a line like this in the top level nginx.conf.j2? Would it be better to replace the 300s value there than do each app separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also have an effect on e.g. requests forwarded to S3 - are we sure we want to do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha. OK!

@lfdebrux
Copy link
Contributor

lfdebrux commented Mar 6, 2019

Can you deploy to test without merging then restore before merging to test the deployment process? Just to check that the config isn't going to break routing?

Copy link
Contributor

@katstevens katstevens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what happens (have you got a revert PR at the ready?)

@risicle
Copy link
Contributor Author

risicle commented Mar 6, 2019

@lfdebrux I think that would be tricky to say the least.

@risicle risicle merged commit bba4293 into master Mar 6, 2019
@risicle risicle deleted the ris-api-proxy-pass-resolver-timeout branch March 6, 2019 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants