[Short Term Strategy and Priorities] Migration of S3 Bucket Payments to Foundation #82

Closed
refroni opened this issue May 16, 2023 · 21 comments

@refroni
Contributor

refroni commented May 16, 2023

We might have to move the cache.nixos.org S3 bucket payment to the Foundation in the near future. We need to find a sustainable way to keep it running.

@refroni refroni self-assigned this May 16, 2023
@zimbatm
Member

zimbatm commented Jun 4, 2023

refroni changed the title from "Migration of S3 Bucket Payments to Foundation" to "Migration of S3 Bucket Payments to Foundation [Short Term Resolution Thread]" Jun 4, 2023
refroni changed the title from "Migration of S3 Bucket Payments to Foundation [Short Term Resolution Thread]" to "Migration of S3 Bucket Payments to Foundation [Short Term Priorities]" Jun 4, 2023
@refroni
Contributor Author

refroni commented Jun 4, 2023

Listing below all the options that have been brought up or are being explored. Please add anything else worth discussing, including alternative options.

Thank you to joepie91 and raitobezarius for helping put this initial list together from the matrix/discourse discussions:

  1. S3 with partial/full sponsorship from AWS (sponsor dependency)
  2. S3 with "intelligent tiering" (cost reduction by automatically moving 'cold' data to Glacier, AIUI); exact savings unknown with current data but likely significant
  3. Cloudflare R2: $15/TB storage plus ‘operation fees’, free traffic; possibly sponsorable
  4. Backblaze B2: $5/TB storage, $10/TB traffic, no minimum storage, supposedly free migration from S3
  5. Wasabi: $6/TB storage, free traffic up to 100%-of-data egress, 90 days minimum storage
  6. Storj: $4/TB storage plus ‘segment fee’, $7/TB traffic, no minimum storage, supposedly free migration from S3, unknown reliability of underlying 'decentralized' storage suppliers
  7. Telnyx: $2.30/TB storage plus 'operation fees', free traffic
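
To make the per-TB storage prices in the list easier to compare, here is a rough sketch that multiplies each quoted price by a bucket size. It assumes the prices are per TB per month (the list does not state the period) and uses the ~500 TB figure mentioned elsewhere in this thread. Operation and segment fees, traffic charges, and minimum-storage terms are deliberately ignored, so these are storage-only list prices, not quotes. S3 itself is left out because no per-TB price is given here.

```python
# Storage prices in USD per TB, copied from the list above.
# Assumption: prices are per month; the list does not state the period.
STORAGE_USD_PER_TB = {
    "Cloudflare R2": 15.00,
    "Backblaze B2": 5.00,
    "Wasabi": 6.00,
    "Storj": 4.00,
    "Telnyx": 2.30,
}


def monthly_storage_cost(size_tb: float) -> dict[str, float]:
    """Storage-only cost per provider; ignores operation, segment and traffic fees."""
    return {
        name: round(price * size_tb, 2)
        for name, price in STORAGE_USD_PER_TB.items()
    }


if __name__ == "__main__":
    # ~500 TB is the bucket size mentioned elsewhere in this thread.
    costs = monthly_storage_cost(500)
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{name:14s} ${cost:>10,.2f}")
```

Traffic is the larger unknown: several options advertise free egress while others charge per TB, so a real comparison would need the monthly egress volume, which is not given in this list.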

refroni changed the title from "Migration of S3 Bucket Payments to Foundation [Short Term Priorities]" to "[Short Term Resolution and Priorities] Migration of S3 Bucket Payments to Foundation" Jun 5, 2023
refroni changed the title from "[Short Term Resolution and Priorities] Migration of S3 Bucket Payments to Foundation" to "[Short Term Strategy and Priorities] Migration of S3 Bucket Payments to Foundation" Jun 5, 2023
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672/96

@zhaofengli
Member

Note that with Backblaze B2, there are no egress fees if the data is proxied through Cloudflare [1].

@dinvlad

dinvlad commented Jun 5, 2023

And also no fees via Fastly, I believe, since they're all part of the same alliance.

@zimbatm
Member

zimbatm commented Jun 5, 2023

My impression is that only (1) and (3) are realistic short-term. Since egress is free in (3), we can then move laterally to a better long-term solution.

(2) can be removed as Intelligent Tiering is already turned on. EDIT: See https://github.com/NixOS/nixos-org-configurations/blob/f27bdec45066d828dba681cc0d2655a4ad8edb0e/terraform/cache.tf#L9-L16

(4) I believe B2 is mainly designed with backup scenarios in mind: it's optimized for low storage costs for data that only occasionally needs to be retrieved. I know that Domen tried it out for Cachix, and it wasn't reliable enough then. It might work if the narinfos are stored separately, but that requires more design work.

(5), (6) and (7): we don't know how reliable those are.

@fleaz

fleaz commented Jun 5, 2023

2. S3 with "intelligent tiering" (cost reduction by automatically moving 'cold' data to glacier, AIUI), exact savings unknown with current data but likely significant

Just a heads-up regarding "Intelligent Tiering": simply activating it incurs costs: $0.0025 per 1,000 objects per month for monitoring access frequency, and $0.01 per 1,000 objects that get moved to a different class. So depending on the access pattern of the files, this could even increase your costs compared to leaving everything in the default storage class.

@fleaz

fleaz commented Jun 5, 2023

and it wasn't reliable enough then

What do you mean by "not reliable enough" exactly? I assume they did not lose data.
If it's about performance, that can probably be ignored up to a certain degree, given the heavy use of Fastly in front of it?

@zimbatm
Member

zimbatm commented Jun 5, 2023

When building your own config, the derivations most likely don't exist in the cache, but Nix will still ask the cache because it can't tell the difference in advance. Because it's a new hash/path, the request will always go through the CDN and hit upstream. Because Nix waits for the cache's reply before deciding to build locally, the latency SLA affects how fast the build happens. And because Nix will hang/retry/fail if the cache returns a 5xx response, uptime is also important. So while 90+% of the requests are cached, there is a small percentage that can never be cached, and it also matters for the user experience.
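
The lookup behaviour described here can be sketched as a small decision function. This is a simplified model for illustration, not Nix's actual implementation: the `Action` names are made up, and treating a cache miss as HTTP 404 is an assumption about how the backend reports missing paths.

```python
from enum import Enum


class Action(Enum):
    SUBSTITUTE = "download from cache"
    BUILD_LOCALLY = "build locally"
    RETRY = "retry, then fail"


def decide(status: int) -> Action:
    """Map the HTTP status of a cache lookup to what the client does next.

    Simplified model: a real client also handles timeouts, redirects, etc.
    """
    if 200 <= status < 300:
        return Action.SUBSTITUTE      # path exists in the cache
    if status == 404:
        return Action.BUILD_LOCALLY   # assumed miss signal: nothing to fetch
    if 500 <= status < 600:
        return Action.RETRY           # backend trouble blocks the build
    raise ValueError(f"unhandled status {status}")
```

The point of the model is that the client cannot choose `BUILD_LOCALLY` until the miss response has arrived, so every uncached lookup pays the backend's full latency, and a 5xx stalls the build instead of falling through to a local build.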

If we had two backends, one for the narinfos and one for the NAR files, then we could store the NAR files in B2/Storj/... while still providing a better uptime and latency SLA for the narinfo files. It's an interesting avenue, but I don't know if we can pull that off in the short term.
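
A minimal sketch of what such a split could look like at the routing layer. The backend labels are hypothetical, and the path layout (`<hash>.narinfo` at the top level, archives under `nar/`) is an assumption about how the cache is organised, not something specified in this thread.

```python
# Hypothetical backend labels, for illustration only.
LOW_LATENCY = "low-latency-backend"   # small metadata, needs good uptime
BULK = "bulk-backend"                 # large archives, cheaper storage


def pick_backend(path: str) -> str:
    """Choose a backend for a request path (assumed cache layout)."""
    path = path.lstrip("/")
    if path.endswith(".narinfo"):
        return LOW_LATENCY
    if path.startswith("nar/"):
        return BULK
    # Anything else (e.g. small top-level metadata files) stays on the
    # low-latency side by default.
    return LOW_LATENCY
```

With this split, only the bulk side would inherit the latency and availability characteristics of the cheaper storage provider.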

@nh2

nh2 commented Jun 5, 2023

Please add in anything that might be of interest

@refroni Since I cannot edit your post #82 (comment) directly, making edit suggestions here, perhaps you could add them:

  • 8. Self-host on Hetzner-dedicated+Ceph: $2.3/TB storage, $0.15/TB traffic, run by community infra team

(Edit: Discourse link for this suggestion.)

Telnyx: $2.30/TB storage plus 'operation fees', free traffic;

  • Mention that this is based on Filecoin blockchain. It claims "low latency" but I cannot find concrete latency numbers, nor how the low latency is added on top of Filecoin.

Also, I think there should be a post that summarises our options of making a transfer-out of the S3 data cheaper than $32k, if needed:

  • AWS Snowball (suggested here), potentially cutting cost by 3x
  • Deduplicating backup before transfer-out (suggested here), potentially cutting costs by 2.5x
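
As a quick check on what those factors would mean in dollars, here is a small sketch applying each one to the $32k estimate. It treats the two options independently; whether they can be combined, and whether the savings would compound, is not established in this thread.

```python
BASELINE_USD = 32_000  # transfer-out estimate quoted above


def reduced_cost(baseline: float, factor: float) -> float:
    """Cost after an 'N times cheaper' reduction."""
    return round(baseline / factor, 2)


snowball_usd = reduced_cost(BASELINE_USD, 3)    # "cutting cost by 3x"
dedup_usd = reduced_cost(BASELINE_USD, 2.5)     # "cutting costs by 2.5x"

if __name__ == "__main__":
    print(f"Snowball:      ${snowball_usd:,.2f}")
    print(f"Deduplication: ${dedup_usd:,.2f}")
```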

@RaitoBezarius
Member

Please add in anything that might be of interest

@refroni Since I cannot edit your post #82 (comment) directly, making edit suggestions here, perhaps you could add them:

  • 8. Self-host on Hetzner-dedicated+Ceph: $2.3/TB storage, $0.15/TB traffic, run by community infra team

I'd dare to say this is not very short-term actionable :P.

@AmineChikhaoui
Member

(2) can be removed as Intelligent Tiering is already turned on. EDIT: See https://github.com/NixOS/nixos-org-configurations/blob/f27bdec45066d828dba681cc0d2655a4ad8edb0e/terraform/cache.tf#L9-L16

@zimbatm That's just a lifecycle and not intelligent tiering. Intelligent tiering would monitor and automatically move across storage classes without needing a lifecycle afaik.
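
To illustrate the distinction, here are two lifecycle configurations written as plain Python dicts in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule IDs and day counts are made up and are not taken from the linked Terraform file. The first rule moves objects to a fixed class after a set age (Standard - Infrequent Access, which is what the linked rule is identified as later in this thread); the second hands objects to the Intelligent-Tiering class, after which S3 decides tier moves based on observed access.

```python
# Fixed, age-based transition: objects go to one named class after N days.
# "Days": 365 is a placeholder, not the value used in the real config.
lifecycle_to_infrequent_access = {
    "Rules": [{
        "ID": "example-age-based-transition",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": 365, "StorageClass": "STANDARD_IA"}],
    }]
}

# Hand-off to Intelligent-Tiering: S3 then monitors access patterns itself
# (and charges the per-object monitoring fee for doing so).
lifecycle_to_intelligent_tiering = {
    "Rules": [{
        "ID": "example-intelligent-tiering",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
    }]
}


def target_classes(config: dict) -> set[str]:
    """Collect the storage classes a lifecycle configuration moves objects to."""
    return {
        t["StorageClass"]
        for rule in config["Rules"]
        for t in rule.get("Transitions", [])
    }
```

A lifecycle rule is only one way to get objects into Intelligent-Tiering; objects can also be uploaded directly with that storage class.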

@7c6f434c
Member

7c6f434c commented Jun 5, 2023

I'd dare to say this is not very short-term actionable.

Well, there is a multi-node Ceph test in NixOS tests — is doing the same thing naively going to lose performance or reliability too? (But the very first question is to figure out the level of redundancy, sure)

@nh2

nh2 commented Jun 5, 2023

I'd dare to say this is not very short-term actionable :P.

@RaitoBezarius Why not?

My understanding is that "short-term" means in line with the "Deadline - Aiming for July 1st" from the Discourse post.

As mentioned on Discourse, the company I co-founded uses Ceph-on-NixOS for hosting our production data.

It is very feasible to buy 500 TB as 3 Hetzner SX servers right now, enable the corresponding NixOS modules, and start transferring data to it e.g. tomorrow.

When I posted this suggestion, I had in mind that this setup, plus the transfer of the ~500 TB from S3 to this cluster, would be finished before the above deadline -- thus short-term.

So I think it makes sense to add it to the list of approaches to discuss.

@RaitoBezarius
Member

I'd dare to say this is not very short-term actionable :P.

@RaitoBezarius Why not?

My understanding is that "short-term" means in line with the "Deadline - Aiming for July 1st" from the Discourse post.

As mentioned on Discourse, the company I co-founded uses Ceph-on-NixOS for hosting our production data.

It is very feasible to buy 500 TB as 3 Hetzner SX servers right now, enable the corresponding NixOS modules, and start transferring data to it e.g. tomorrow.

When I posted this suggestion, I had in mind that this setup, plus the transfer of the ~500 TB from S3 to this cluster, would be finished before the above deadline -- thus short-term.

So I think it makes sense to add it to the list of approaches to discuss.

I mean, would you explicitly join the sysadmin efforts to maintain such a cluster in the long term? If so, yes, this is a valid short-term proposal.

But the current infra team cannot take this load.

@nh2

nh2 commented Jun 5, 2023

would you explicitly join the sysadmin efforts to maintain such a cluster in the long term? If so, yes, this is a valid short-term proposal.

But the current infra team cannot take this load.

Yes, I would join those efforts, provided that there is a reasonable number of co-sysadmins to share maintenance with, so that the load on each individual is low. I would also be happy to share my existing knowledge regarding setup and Ceph operations.

Of course there could also be the option to spend some of the current $9k/month to pay a sysadmin for e.g. a few hours per month, or the twice-yearly NixOS upgrades.

@refroni
Contributor Author

refroni commented Jun 7, 2023

Adding this to the general options for review as well. I think it's fair to assume this is something we would also want to consider for the longer term, and it can be explored further with the Infra team.

@endgame

endgame commented Jun 8, 2023

(2) can be removed as Intelligent Tiering is already turned on. EDIT: See https://github.com/NixOS/nixos-org-configurations/blob/f27bdec45066d828dba681cc0d2655a4ad8edb0e/terraform/cache.tf#L9-L16

That lifecycle rule is not moving objects into Intelligent Tiering, it's moving them to the Standard - Infrequent Access storage class. But it is probably getting most of the benefit of Intelligent Tiering without the automation charges. As I said on Discourse:

Every time I've looked at S3 Intelligent Tiering in my own work, the $0.0025 per 1,000 objects automation fee makes me nervous. According to @edolstra, there are 667M objects in the cache.nixos.org bucket, so you'd be paying $1667.50/month in automation fees, and 3/4 of the bucket is already in the Infrequent Access tier by some mechanism or other. So Intelligent Tiering needs to move a lot of stuff to cheaper storage classes to come out ahead (or we only turn it on for large NARs, or something).

It is possible that we'd get some cost savings by moving even older stuff into Glacier Instant Retrieval, but I'm not sure whether that's going to immediately bite us when we want to move the entire bucket somewhere else.
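
The automation-fee arithmetic quoted above can be reproduced with a short sketch. The per-1,000-object rates are the ones cited in this thread (monitoring here, transitions in an earlier comment) and may not match current AWS pricing. The "every object moved once" case is a hypothetical worst case for illustration, not a measured figure.

```python
# Rates as cited in this thread; check current AWS pricing before relying on them.
MONITORING_USD_PER_1K_OBJECTS = 0.0025   # charged per month
TRANSITION_USD_PER_1K_OBJECTS = 0.01     # charged per object moved


def monitoring_fee(objects: int) -> float:
    """Monthly Intelligent-Tiering monitoring fee for a given object count."""
    return round(objects / 1000 * MONITORING_USD_PER_1K_OBJECTS, 2)


def transition_fee(objects_moved: int) -> float:
    """One-off fee for moving objects to a different storage class."""
    return round(objects_moved / 1000 * TRANSITION_USD_PER_1K_OBJECTS, 2)


OBJECTS_IN_BUCKET = 667_000_000  # figure attributed to @edolstra above

if __name__ == "__main__":
    print(f"monitoring per month: ${monitoring_fee(OBJECTS_IN_BUCKET):,.2f}")
    # Hypothetical worst case: every object transitions once.
    print(f"one-off transitions:  ${transition_fee(OBJECTS_IN_BUCKET):,.2f}")
```

Restricting Intelligent Tiering to large NARs, as suggested, would shrink the object count fed into `monitoring_fee` while keeping most of the bytes eligible for cheaper tiers.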

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/s3-update-and-recap-of-community-call/28942/1

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/1

@thufschmitt
Member

Closing in favor of #86 since the “short term” is taken care of
