RFC to remove our Google Cloud CDN backup #136

Merged
merged 2 commits into from
Mar 16, 2021

Conversation

richardTowers
Contributor

@richardTowers richardTowers commented Feb 25, 2021

Rendered

Deadline: 2021-03-17

Base automatically changed from master to main February 25, 2021 13:52
Contributor

@bilbof bilbof left a comment

Excellent. Might be worth linking to the original decision / explanation for why we have 3 CDNs, if there is such a decision document.

rfc-136-remove-our-backup-cdn-in-gcp.md
@richardTowers richardTowers marked this pull request as ready for review March 10, 2021 14:06
@richardTowers richardTowers changed the title WIP - RFC to remove our Google Cloud CDN backup RFC to remove our Google Cloud CDN backup Mar 10, 2021
Contributor

@ChrisBAshton ChrisBAshton left a comment

LGTM 👍

Contributor

@bilbof bilbof left a comment

Thanks, great work


The certificate in use on the CDN expired in May 2020.

The Google Cloud Storage bucket the CDN is pointing to is currently private.

This is intended behaviour to save money in case someone/thing decides to crawl against the Google bucket. As part of the playbook for activating the Google CDN, we would mark the bucket as public and re-point the www.gov.uk DNS to the Google Cloud CDN fronting the bucket.

The main issue we have is getting the certificate issued before the live traffic switch.
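
As a rough illustration of the activation steps described above (valid certificate, public bucket, then the DNS switch), here is a minimal sketch of a preflight check. The `BackupCdnState` type, its field names, and `failover_blockers` are hypothetical and do not come from any existing GOV.UK playbook or tooling; the ordering simply reflects the point that the certificate has to be in place before live traffic is switched.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BackupCdnState:
    """Hypothetical snapshot of the backup CDN; field names are illustrative."""
    bucket_is_public: bool
    certificate_not_after: datetime
    dns_points_at_backup: bool


def failover_blockers(state: BackupCdnState, now: datetime) -> list[str]:
    """List, in order, what still has to happen before the backup can serve traffic."""
    blockers = []
    # The certificate must be valid before any live traffic is switched over.
    if state.certificate_not_after <= now:
        blockers.append("issue a valid certificate")
    # The bucket is kept private day to day, so it has to be opened up first.
    if not state.bucket_is_public:
        blockers.append("make the bucket public")
    # Re-pointing DNS is the live traffic switch, so it comes last.
    if not state.dns_points_at_backup:
        blockers.append("re-point www.gov.uk DNS at the backup CDN")
    return blockers
```

An empty list would mean the backup is already serving traffic; anything else is the remaining work, in the order it would need doing.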

Contributor Author

It's interesting that we don't bother with this for the CloudFront / S3 backup. What's to stop something from crawling that?

In any case, I think it's still worth mentioning this as something that's broken. If we had documentation on how to fail over which included making the bucket public, that would be fine. But we don't.

Umm.. my understanding is that the S3 buckets should be protected, see here: https://github.com/alphagov/govuk-aws/blob/master/terraform/projects/infra-mirror-bucket/mirror-read-policy.tf

I don't know what policy they had on the CloudFront side as I didn't do that work. My thinking was that only office IPs would be allowed to use the CloudFront endpoint during testing, and that we would widen the whitelist once it was made operational.

Contributor

@barrucadu barrucadu left a comment

Having a second backup CDN seems mad, especially if we didn't notice that it was broken (expired certificate) for ~10 months.

Putting in the work to fix this and make it testable seems far less valuable than spending the same time improving other parts of GOV.UK... or just on other non-reliability-related work too.
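
For context on how an expired certificate could be caught sooner, here is a minimal sketch of an expiry check using only the Python standard library. It is an illustration, not something GOV.UK is known to run: the function names are hypothetical, and any hostname passed in is up to the caller.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Days between `now` and a certificate's notAfter timestamp, rounded down.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "May 17 12:00:00 2020 GMT". A negative result means expired.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by `hostname`.

    Raises ssl.SSLCertVerificationError if the certificate is already expired
    or otherwise invalid, which is itself a signal worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `fetch_not_after` for each CDN endpoint and alert when `days_until_expiry` drops below some threshold, or when the handshake fails outright.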


@fredericfran-gds fredericfran-gds left a comment

It's unlikely we would ever use Google Cloud CDN, or even AWS CloudFront, since a backup will have so little functionality that the higher powers will pile on pressure to fix Fastly etc.

Member

@brucebolt brucebolt left a comment

This looks good to me.

I have a couple of questions:

  • Have we confirmed Fastly do not use any AWS infrastructure behind the scenes? (If so, an AWS outage that takes Fastly offline could render the AWS option unusable at the same time.)
  • I notice assets (mostly images) on www.amazon.co.uk have an x-cache header that indicates they use Fastly. Have we considered why Amazon are using a competitor and not using their own service? I'm hoping this is just down to better performance, but it may be worth researching if any information about this is public.
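
As an aside on the header observation above, here is a minimal sketch of guessing a CDN from response headers. The rules are heuristics based on commonly observed responses (CloudFront naming itself in `X-Cache` or `Via`, Fastly setting `X-Served-By` to a value starting with `cache-`); they are assumptions, not guarantees from either vendor, and `guess_cdn` is a hypothetical helper.

```python
def guess_cdn(headers: dict[str, str]) -> str:
    """Guess which CDN served a response from its headers (heuristic only)."""
    # Header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    # Assumption: CloudFront identifies itself in X-Cache or Via.
    if "cloudfront" in lowered.get("x-cache", "") or "cloudfront" in lowered.get("via", ""):
        return "cloudfront"
    # Assumption: Fastly cache nodes appear in X-Served-By as "cache-...".
    if lowered.get("x-served-by", "").startswith("cache-"):
        return "fastly"
    return "unknown"
```

A result of `"unknown"` only means none of these rules matched; it says nothing about whether a CDN is in use.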

rfc-136-remove-our-backup-cdn-in-gcp.md (outdated)
The infra cost is quite small, but the cost of fixing it (including the
opportunity cost) is unacceptably high.
@richardTowers
Contributor Author

Have we confirmed Fastly do not use any AWS infrastructure behind the scenes? (If so, an AWS outage that takes Fastly offline could render the AWS option unusable at the same time.)

No - I can't find any evidence one way or the other. Their careers page asks for experience of bare metal, and AWS / Azure / GCP, so it's possible they're using a mix of stuff. We'd need to know a fair bit about their internal infra to know if AWS outages and Fastly outages would be correlated. I guess we could ask them on Slack, but it feels like a bit of a weird question.

Their infrastructure "at the edge" (the POPs) is definitely not AWS as they're in different locations to AWS' data centers :)

I notice assets (mostly images) on www.amazon.co.uk have an x-cache header that indicates they use Fastly. Have we considered why Amazon are using a competitor and not using their own service? I'm hoping this is just down to better performance, but it may be worth researching if any information about this is public.

Fastly's definitely a better product than CloudFront 😅. Amazon has a history of using technically superior products instead of AWS :trollface:.

Member

@brucebolt brucebolt left a comment

Thanks @richardTowers. I couldn't find any further information on either of my questions either, so this all sounds good to me.

@himal-mandalia himal-mandalia self-assigned this Mar 16, 2021
@himal-mandalia himal-mandalia self-requested a review March 16, 2021 15:44

@himal-mandalia himal-mandalia left a comment

LGTM

@richardTowers richardTowers merged commit 1f81c13 into main Mar 16, 2021
@richardTowers richardTowers deleted the remove-the-gcp-backup-cdn branch March 16, 2021 15:44
@richardTowers
Contributor Author

The CDN has now been removed.

9 participants