RFC to remove our Google Cloud CDN backup #136
bilbof
left a comment
Excellent. Might be worth linking to the original decision / explanation for why we have 3 CDNs, if there is such a decision document.
9d08a79 to 761b31f
ChrisBAshton
left a comment
LGTM 👍
bilbof
left a comment
Thanks, great work
The certificate in use on the CDN expired in May 2020.
The Google Cloud Storage bucket the CDN is pointing to is currently private.
This is intended behaviour to save money in case someone/thing decides to crawl against the Google bucket. As part of the playbook for activating the Google CDN, we would mark the bucket as public and re-point the www.gov.uk DNS to the Google Cloud CDN fronting the bucket.
The main issue we have is getting the certificate issued before the live traffic switch.
It's interesting that we don't bother with this for the CloudFront / S3 backup. What's to stop something from crawling that?
In any case, I think it's still worth mentioning this as something that's broken. If we had documentation on how to fail over which included making the bucket public, that would be fine. But we don't.
Umm.. my understanding is that the S3 buckets should be protected, see here: https://github.com/alphagov/govuk-aws/blob/master/terraform/projects/infra-mirror-bucket/mirror-read-policy.tf
I don't know what policy they had on the CloudFront side as I didn't do that work. My understanding was that only office IPs would be allowed to use the CloudFront endpoint during testing, and that we would widen the whitelist once it is made operational.
barrucadu
left a comment
Having a second backup CDN seems mad, especially if we didn't notice that it was broken (expired certificate) for ~10 months.
Putting in the work to fix this and make it testable seems far less valuable than spending the same time improving other parts of GOV.UK... or just on other non-reliability-related work too.
fredericfran-gds
left a comment
It's unlikely we would ever use Google Cloud CDN, or even AWS CloudFront, since it would have so little functionality that the higher powers would pile on pressure to fix Fastly etc.
brucebolt
left a comment
This looks good to me.
I have a couple of questions:
- Have we confirmed Fastly do not use any AWS infrastructure behind the scenes? If they do, an AWS outage that takes Fastly offline could render the AWS option unusable at the same time.
- I notice assets (mostly images) on www.amazon.co.uk have an x-cache header that indicates they use Fastly. Have we considered why Amazon are using a competitor and not using their own service? I'm hoping this is just down to better performance, but it may be worth researching if any information about this is public.
The infra cost is quite small, but the cost of fixing it (including the opportunity cost) is unacceptably high.
aa8774e to 17f7833
No - I can't find any evidence one way or another. Their careers page asks for experience of bare metal, and AWS / Azure / GCP, so it's possible they're using a mix of stuff. We'd need to know a fair bit about their internal infra to know if AWS outages and Fastly outages would be correlated. I guess we could ask them on Slack, but it feels like a bit of a weird question. Their infrastructure "at the edge" (the POPs) is definitely not AWS as they're in different locations to AWS' data centers :)
Fastly's definitely a better product than CloudFront 😅. Amazon has a history of using technically superior products instead of AWS
brucebolt
left a comment
Thanks @richardTowers. I couldn't find any further information on either of my questions either, so this all sounds good to me.
himal-mandalia
left a comment
LGTM
The CDN has now been removed.
✨ Rendered ✨
Deadline: 2021-03-17