Make /healthcheck/ always return 200 because CloudFront is weird #1516

Closed
toolness opened this Issue Mar 23, 2017 · 12 comments

@toolness
Contributor

toolness commented Mar 23, 2017

On calc.gsa.gov right now, we haven't set up the Site object, which means that the page's source contains the following post-#1407:

<link rel="canonical" href="https://example.com/">

And that means that pasting CALC's URL into sites like Facebook does the following:

[Image: pasted image at 2017_03_23 08_00 am]

😞

@toolness

Contributor

toolness commented Mar 23, 2017

Er, also, /healthcheck/ should verify that request.META['HTTP_HOST'] is the same as the current Site's domain property. I meant to add that to #1407 but ... wait, no, I did add it!

Ummm and curl -I 'https://calc.gsa.gov/healthcheck/' gives me:

HTTP/1.1 500 Internal Server Error

because the payload is:

{  
   "version":"2.5.1",
   "is_database_synchronized":true,
   "canonical_url":"https://example.com/healthcheck/",
   "request_url":"https://calc.gsa.gov/healthcheck/",
   "canonical_url_matches_request_url":false,
   "rq_jobs":0
}
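For illustration, the check that produces this payload could be sketched roughly as below. This is a framework-free approximation, not CALC's actual code: `build_healthcheck` and its parameters are hypothetical, the `https://` scheme is hard-coded as an assumption, and only the payload keys are taken from the output above.

```python
def build_healthcheck(request_host, site_domain, path="/healthcheck/",
                      version="2.5.1", is_database_synchronized=True,
                      rq_jobs=0):
    """Return (status_code, payload) for a health check.

    Hypothetical helper: request_host stands in for
    request.META['HTTP_HOST'] and site_domain for the current Site's
    domain property.
    """
    canonical_url = f"https://{site_domain}{path}"
    request_url = f"https://{request_host}{path}"
    matches = canonical_url == request_url
    payload = {
        "version": version,
        "is_database_synchronized": is_database_synchronized,
        "canonical_url": canonical_url,
        "request_url": request_url,
        "canonical_url_matches_request_url": matches,
        "rq_jobs": rq_jobs,
    }
    # Assumption: a failed boolean check maps to a 500, as in the curl
    # output above; rq_jobs is reported but does not affect the status.
    ok = is_database_synchronized and matches
    return (200 if ok else 500), payload
```

With the default Site still in place, `build_healthcheck("calc.gsa.gov", "example.com")` yields a 500 and a payload whose `canonical_url_matches_request_url` is false.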

@jseppi is New Relic not currently checking /healthcheck/ or something?

@jseppi

Contributor

jseppi commented Mar 23, 2017

No, we don't have New Relic set up to monitor /healthcheck/. Will add it now!

@jseppi

Contributor

jseppi commented Mar 23, 2017

Ugh, wait, we actually did have it set up. Unsure why it wasn't alerting... looking into it.

@jseppi

Contributor

jseppi commented Mar 23, 2017

Theory that @toolness and I just came up with: When we should be getting a 500 error, CloudFront is instead serving a cached version of its last known good response.

@jseppi

Contributor

jseppi commented Mar 23, 2017

🤦‍♀️
From: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html

If your origin returns a 5xx error code, CloudFront serves the object even though it has expired. For the duration of the error caching minimum TTL (five minutes by default), CloudFront continues to respond to viewer requests by serving the object from the edge cache.

@toolness

Contributor

toolness commented Mar 23, 2017

I actually vaguely recall reading this a few months ago but I didn't (and still don't) actually understand what it means. Does it basically mean that CF will be serving the last-good response?

CF is so confusing...

@toolness

Contributor

toolness commented Mar 23, 2017

Um, so I guess another alternative is just to have /healthcheck/ always return 200, but include an "is_everything_ok" key in the payload, and we check against that instead of the doggone response code.

bleh.
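A minimal sketch of that alternative, assuming the server side simply folds its boolean checks into one "is_everything_ok" flag. `healthcheck_response` is a hypothetical name, and the rule that every boolean check must be true is an assumption, not something settled in this thread.

```python
import json


def healthcheck_response(checks):
    """Always return HTTP 200 and report overall health in the body."""
    payload = dict(checks)
    # Hypothetical rule: every boolean-valued check must be True.
    # Non-boolean values such as rq_jobs are passed through untouched.
    payload["is_everything_ok"] = all(
        value for value in checks.values() if isinstance(value, bool)
    )
    return 200, json.dumps(payload)
```

The status code no longer carries the health signal here, so whatever consumes the endpoint has to parse the JSON body.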

@jseppi

Contributor

jseppi commented Mar 23, 2017

There are a lot of situations described on that linked page, but from what I can gather, CF keeps serving the last-good response, regardless of expiration headers, when the origin's response is 5xx. Interestingly, it looks like if it is 4xx, it will actually return the error, but not the requested "object".
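To make that reading concrete, here is a toy model of the behavior as described in this comment. It is not CloudFront's implementation: `ToyEdgeCache` is a made-up class, it ignores TTLs entirely, and it only encodes the two cases mentioned here (serve the last-good object on 5xx, pass a 4xx through).

```python
class ToyEdgeCache:
    """Toy serve-stale-on-5xx model; not how CloudFront is implemented."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin  # callable returning (status, body)
        self.last_good = None

    def get(self):
        status, body = self.fetch_origin()
        if status >= 500 and self.last_good is not None:
            # Origin is failing but an earlier object is cached: serve it.
            return self.last_good
        if status < 400:
            # Remember successful responses as the last-good object.
            self.last_good = (status, body)
        # 4xx (or a 5xx with nothing cached) goes back to the viewer as-is.
        return status, body
```

In this model, if the origin answers 200 once and then starts answering 500, every later `get()` still returns the cached 200, which would explain why a status-code monitor sitting behind the CDN stays quiet.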

@jseppi

Contributor

jseppi commented Mar 23, 2017

This is so wacky

@jseppi

Contributor

jseppi commented Mar 23, 2017

Yeah, always returning 200 is the only thing that seems like it will work. Which is goofy IMO, but 🤷‍♀️
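On the monitoring side, that would mean judging health from the body rather than the status code. A rough sketch, assuming the payload gains the "is_everything_ok" key proposed earlier in this thread; `is_healthy` is a hypothetical helper, not part of New Relic or CALC.

```python
import json


def is_healthy(status_code, body):
    """Decide health from the JSON body instead of the status code alone."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all (for example an HTML error page): treat as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("is_everything_ok") is True
```

A missing key counts as unhealthy here, so a stale payload from before the key existed would not pass.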

@toolness

Contributor

toolness commented Mar 23, 2017

Ok I will bust a PR soon!

toolness changed the title from "Add a test to production test suite ensuring value of <link rel="canonical">" to "Make /healthcheck/ always return 200 because CloudFront is weird" on Mar 23, 2017

@jseppi

Contributor

jseppi commented Mar 23, 2017
