Investigate setting "Requester Pays" on the cache S3 bucket #277

Closed
delroth opened this issue Sep 13, 2023 · 9 comments · Fixed by #299

Comments

@delroth
Contributor

delroth commented Sep 13, 2023

The NixOS cache S3 bucket is currently publicly accessible. However, the foundation gets charged for any access to that bucket: bandwidth whenever it is not accessed from AWS us-east, and operations always. This causes the following problems (non-exhaustive list):

  • A malicious actor could bleed money from the foundation.
  • A non-malicious actor could write code accessing the S3 bucket (e.g. to perform a batch analysis of some sort) and accidentally end up charging all the analysis cost to the foundation. Them being shielded from the costs also means they have no incentive to make their analysis more efficient.

AWS supports a feature called "Requester Pays" for public S3 buckets, which charges the cost of the request (bandwidth + ops) to the requester. The requester needs to authenticate their request to the bucket so charges can be accounted properly, but otherwise the contents are still publicly readable.
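
For illustration, a signed Requester Pays read can be sketched with only the Python standard library. This is a minimal sketch of AWS Signature Version 4 for a single GET, not the code used in this rollout: the function name `sign_requester_pays_get`, the example object key `nix-cache-info`, and the credentials (the placeholder pair used in AWS documentation) are assumptions for the example. The `x-amz-request-payer: requester` header is the opt-in that accepts the charges, and here it is part of the signed headers.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_requester_pays_get(bucket, key, region, access_key, secret_key, now=None):
    """Return (url, headers) for a SigV4-signed GET that opts in to Requester Pays."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    path = quote("/" + key, safe="/")
    payload_hash = hashlib.sha256(b"").hexdigest()  # a GET has an empty body

    headers = {
        "host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
        "x-amz-request-payer": "requester",  # consent to being billed
    }
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    # Canonical request: method, path, (empty) query, headers, signed names, body hash.
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return f"https://{host}{path}", headers


# Placeholder credentials from AWS documentation; they are not valid.
url, headers = sign_requester_pays_get(
    "nix-cache", "nix-cache-info", "us-east-1",
    "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
```

The returned URL and headers could be passed to `urllib.request.Request(url, headers=headers)`. In practice an AWS SDK or the AWS CLI does the signing; the sketch only shows where the opt-in header sits in an authenticated request.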

It's currently unclear whether this can be made to work with Fastly - for example, whether a bucket can be both "Requester Pays" for unknown users, and "Owner Pays" (current config) for certain identities. This would need some digging into AWS docs, or asking someone who knows more about S3 settings :)

@delroth
Contributor Author

delroth commented Sep 13, 2023

I suspect Fastly could be made to work using something similar to https://docs.fastly.com/en/guides/amazon-s3#using-an-amazon-s3-private-bucket to authenticate requests with foundation-owned credentials.

@zimbatm
Member

zimbatm commented Sep 30, 2023

This has become a priority because I suspect somebody is pulling directly from the bucket, and the invoice keeps increasing.

image

On the Fastly side, the requests are relatively constant over time:

image

@delroth
Contributor Author

delroth commented Sep 30, 2023

Suggested pre-rollout plan:

  1. Create a test bucket set up with similar perms to the cache bucket (RO to everyone). Make it Requester Pays.
  2. Create an IAM user to use for our Fastly config, attached to the Foundation's billing account.
  3. Create a test Fastly service to front that test bucket. Use the Fastly docs for private buckets (linked in my initial comment) to set up authn. Figure out how to add the magic header that grants consent for billing (x-amz-request-payer). Ensure test bucket access works.
  4. Figure out a rollback plan: is removing Requester Pays on the bucket enough? If so, document how. If authn also needs to be removed on the Fastly side (e.g. if it confuses AWS when it's used on a bucket that's anonymous / not Requester Pays anymore), also document this.
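
As a sketch of what steps 1 and 4 toggle: the S3 request-payment setting of a bucket is a small XML document whose `Payer` element is either `Requester` or `BucketOwner`. The helper below only builds that document, with the standard library; the function name `request_payment_body` is hypothetical, and sending it (with the AWS CLI, an SDK, or Terraform) is left out. Whether switching back to `BucketOwner` is sufficient as a rollback is exactly what step 4 needs to confirm.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def request_payment_body(payer: str) -> str:
    """Build the RequestPaymentConfiguration XML for a bucket.

    payer="Requester" turns Requester Pays on,
    payer="BucketOwner" restores the default (owner pays).
    """
    if payer not in ("Requester", "BucketOwner"):
        raise ValueError(f"unsupported payer: {payer!r}")
    root = ET.Element("RequestPaymentConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "Payer").text = payer
    return ET.tostring(root, encoding="unicode")


enable_body = request_payment_body("Requester")      # rollout
rollback_body = request_payment_body("BucketOwner")  # candidate rollback
```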

For the actual rollout, I need to look into what options Fastly provides for gradual rollouts of service config changes. If we're fine with short interruptions of service and we're confident in the rollback procedure, we could also just push the change globally and monitor (I don't really expect this change to cause any non-binary issues - either it works or it doesn't...).

There might be broken pieces to pick up after the fact if other parts of the infra rely on anonymous S3 bucket access, which will break. Channel scripts would be my best guess. We should audit beforehand, but I don't think it's necessarily worth trying to run them against a test environment - we can always fix them after the fact (at a small cost to channel bump latency) or roll back.

@zimbatm if you're fine with me leading this I'll need Fastly access. Can you provision this?

@zimbatm
Member

zimbatm commented Oct 2, 2023

Sounds good. I also need to give you access to the Terraform bucket. Once you get Fastly to a point where it's authenticated against S3, we can toggle it on/off with no problem.

I'm also going to post an announcement on Discourse to warn people of the change.

@delroth
Contributor Author

delroth commented Oct 3, 2023

Initial research using my personal S3/Fastly accounts:

  • Fastly's sample VCL to authenticate S3<->Fastly requests via signing seems to work fine (https://docs.fastly.com/en/guides/amazon-s3#using-an-amazon-s3-private-bucket)
    • Note: the host format in the VCL needs to match how the origin is configured. I hit some issues because I was using .s3.amazonaws.com when this VCL snippet expects .s3.REGION.amazonaws.com.
    • Note: x-amz-request-payer doesn't need to be set in my case because the IAM user for Fastly I'm using is owned by the same org that owns the bucket. If we set things up differently in the prod NixOS AWS environment we'll need to add that bit to the VCL
  • IAM user needs no special permissions to be able to authenticate an S3 request
  • S3 doesn't seem to care if authentication is provided when Requester Pays is off.
  • Access continues to work after Requester Pays is on!
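
To make the host-format note above concrete, here is a small sketch that builds the regional virtual-hosted-style host and checks whether a configured origin uses it. The helper names `regional_s3_host` and `is_regional_host` are hypothetical, and the check is a simple pattern match that ignores other S3 endpoint variants (dual-stack, FIPS, and so on).

```python
import re

# <bucket>.s3.<region>.amazonaws.com, where the region contains no dots.
_REGIONAL_HOST = re.compile(r"[a-z0-9.-]+\.s3\.[a-z0-9-]+\.amazonaws\.com")


def regional_s3_host(bucket: str, region: str) -> str:
    """Return the regional host for a bucket, e.g. for use as the origin host."""
    return f"{bucket}.s3.{region}.amazonaws.com"


def is_regional_host(host: str) -> bool:
    """True if the host uses the regional form rather than the global one."""
    return _REGIONAL_HOST.fullmatch(host) is not None


origin = regional_s3_host("delroth-test-bucket", "us-east-1")
```

A check like this could run before deploying a config so that an origin configured as `BUCKET.s3.amazonaws.com` is caught early rather than surfacing as a signing failure.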

Result: I've got s3://delroth-test-bucket set to Requester Pays + world readable. https://test-bucket.delroth.net/index.html is fronted by Fastly and works fine, authenticated with an IAM user that has no permissions. Direct unauthenticated bucket access (https://delroth-test-bucket.s3.us-east-1.amazonaws.com/index.html) fails, but S3 CLI access works for a user with no permissions.


Conclusions:

  • We can already go ahead and create the IAM user we'll use for Fastly. It needs no permissions, just needs to be owned by the org we want to be billed.
  • We can create a 2nd "clone" Fastly service to test the configuration against the prod bucket, since authenticated requests should work against the nix-cache S3 bucket even currently.
    • This might be something we want to keep long term anyway, e.g. a permanent https://cache-test.nixos.org/ where we can experiment with Fastly settings.
  • What we want to do seems to be working fine in practice, concept is validated.

@delroth
Contributor Author

delroth commented Oct 3, 2023

Quoting @zimbatm: "I'm also going to post an announcement on Discourse to warn people of the change."

Draft comms. Feel free to use as is / edit as you want, or I can post it myself if it looks good to you.

Disabling anonymous direct S3 access to the NixOS cache

If you are not maintaining software which uses AWS S3 directly to access the NixOS cache contents, you can stop reading now. This does not impact any access through the cache CDN, e.g. https://cache.nixos.org/ and does not impact Nix/NixOS end-users.

The NixOS cache is hosted on Amazon S3 and its contents are publicly readable to anyone. However, any access to the cache currently results in costs to the NixOS Foundation. We've recently noticed that this might represent a non-trivial portion of the infrastructure costs. As a countermeasure, we will be implementing the following change:

  • Accessing the nix-cache S3 bucket will require authentication. The contents remain world-readable and reading them requires no special authorization, but you'll need an AWS account and requests will need to be properly signed with your credentials.
  • Additionally, we will be enabling the Requester Pays option on the S3 bucket. This means that the costs incurred by direct access to the nix-cache S3 bucket will be charged to the AWS user who sent the request, not the NixOS Foundation. This requires specific opt-in configuration so it shouldn't take anyone by surprise.

This change will take effect on: 2023-10-XX.

Summary of actions required:

  • If you use https://nix-cache.s3.amazonaws.com/ or https://nix-cache.s3.us-east-1.amazonaws.com/ directly: use https://cache.nixos.org/ instead. You can also use signed HTTP requests with the x-amz-request-payer header set.
  • If you use s3://nix-cache via a programmatic client or the S3 CLI, make sure that your client has AWS credentials, and configure it to use x-amz-request-payer (docs).
  • If you are in neither of these cases: you should not be impacted.
  • If you switch to x-amz-request-payer: estimate the costs and ensure you'll be able to pay the bill that will now be charged to you!
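
For the cost estimate in the last point, a rough sketch of the arithmetic follows. The two components (per-request charges and bandwidth) are the ones named earlier in this thread; the function name `estimate_monthly_cost` and the prices in the example are placeholders, not current AWS pricing, so substitute the numbers from the AWS pricing page for your region. Per the first comment in this issue, bandwidth is not charged for access from AWS us-east, in which case `egress_gb` would be 0.

```python
from decimal import Decimal


def estimate_monthly_cost(get_requests, egress_gb,
                          price_per_1000_gets, price_per_gb_egress):
    """Estimate a monthly bill (in the currency of the prices) for direct reads.

    Prices are passed as strings so Decimal keeps them exact.
    """
    request_cost = Decimal(get_requests) / 1000 * Decimal(price_per_1000_gets)
    transfer_cost = Decimal(egress_gb) * Decimal(price_per_gb_egress)
    return request_cost + transfer_cost


# PLACEHOLDER prices, for illustration only.
example = estimate_monthly_cost(
    get_requests=10_000_000,
    egress_gb=500,
    price_per_1000_gets="0.0004",
    price_per_gb_egress="0.09",
)
```

With these placeholder numbers the requests contribute 4 and the bandwidth 45, for a total of 49; the bandwidth term dominates, which is why batch jobs running outside the bucket's region deserve a careful estimate first.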

@delroth
Contributor Author

delroth commented Oct 27, 2023

I've got the Fastly setup working on cache-staging.nixos.org - next steps:

  1. Maybe tomorrow: push the Fastly VCL to cache.nixos.org if we don't notice any other problem on cache-staging.nixos.org
  2. Announce the change with a (suggested) 1 week heads up.
  3. Set Requester Pays on the bucket and hope again nothing explodes.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/disabling-anonymous-direct-s3-access-to-the-nixos-cache/34697/1

@delroth
Contributor Author

delroth commented Oct 27, 2023
