Data Mirroring #2

Open
ghost opened this issue Feb 15, 2016 · 25 comments

Comments

@ghost

ghost commented Feb 15, 2016

  • Who?
  • How?
@bookt-jacob

S3 for storage w/ CloudFront for CDN? A long expiration in the CDN should be very effective. Having the data at rest in S3 reduces HTTP requests to the backend servers and should prove much simpler than mirrors.
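
For illustration, a minimal sketch of that upload flow in Python with boto3; the bucket name, key layout, and max-age are assumptions, not anything decided here:

```python
# Hedged sketch: push a mod archive to a hypothetical S3 bucket with a
# long Cache-Control header so CloudFront keeps it cached at the edge.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "ExampleMod-1.0.zip",               # local file (illustrative name)
    "spacedock-files",                  # hypothetical bucket
    "mods/example/ExampleMod-1.0.zip",  # hypothetical key layout
    ExtraArgs={
        "ContentType": "application/zip",
        "CacheControl": "public, max-age=31536000",  # ~1 year at the CDN
    },
)
```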

@ghost
Author

ghost commented Feb 15, 2016

The upside of using mirrors, though, is that we can defray hosting costs.

Our budget so far is $0.

@ghost ghost mentioned this issue Feb 15, 2016
@GenPage

GenPage commented Feb 15, 2016

I can provide rudimentary mirrors through DigitalOcean in all regions at no cost.

@ghost ghost added the enhancement label Feb 15, 2016
@ghost ghost added this to the Back Online! milestone Feb 15, 2016
@Vekseid

Vekseid commented Feb 15, 2016

I can throw several TB/month into the pool.

@ghost ghost removed this from the Back Online! milestone Feb 15, 2016
@SpaceTeph

Linux distros use rsync for mirror synchronization - here is how Arch Linux does it. A similar process would let almost anyone with an HTTP server running somewhere donate bandwidth and storage, which is often easier than donating money.
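
To make that concrete, a mirror operator could run something like the following periodically from cron. A minimal sketch, assuming a hypothetical rsync endpoint rsync://mirror.spacedock.example/files/ and local path:

```python
# Mirror-sync sketch, meant to be run periodically (e.g. hourly from cron).
# The rsync endpoint and local directory are placeholders, not real hosts.
import subprocess

subprocess.run(
    [
        "rsync",
        "-rtlvH",          # recurse; keep mtimes, symlinks, hard links; verbose
        "--delete-after",  # drop files removed upstream, after the transfer
        "--safe-links",    # ignore symlinks that point outside the tree
        "rsync://mirror.spacedock.example/files/",
        "/srv/spacedock-mirror/",
    ],
    check=True,
)
```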

@pjf

pjf commented Feb 15, 2016

I have to run and give a talk, but the Internet Archive is happy to host freely distributable content on their servers, which includes all FOSS/CC licensed KSP mods. They have an S3-alike API that's described here.

Yes, the Internet Archive is super-awesome. <3
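
To sketch what that could look like: the IA's S3-like API accepts a plain HTTP PUT with a "LOW access:secret" authorization header. The item identifier, file name, and credentials below are placeholders:

```python
# Hedged sketch of a single-file upload to the Internet Archive's
# S3-like API. Credentials and the item identifier are placeholders.
import requests

ACCESS, SECRET = "IA_ACCESS_KEY", "IA_SECRET_KEY"

with open("ExampleMod-1.0.zip", "rb") as f:
    resp = requests.put(
        "https://s3.us.archive.org/spacedock-mirror/ExampleMod-1.0.zip",
        data=f,
        headers={
            "authorization": "LOW {}:{}".format(ACCESS, SECRET),
            "x-archive-auto-make-bucket": "1",  # create the item if it doesn't exist
        },
    )
resp.raise_for_status()
```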

@ghost
Author

ghost commented Feb 15, 2016

I can donate some of my bandwidth for a mirror. I can't promise much speed, but I am willing to help.

@NecroBones

Internet Archive might be really great for this.

For our own mirroring options, I can spare some bandwidth too. Not on the order of what KerbalStuff was using by itself, but I have unused quota each month on my hosting, since my own websites use less than 5% of what I'm permitted, last I looked. If we get enough mirrors involved, each one's bandwidth requirement would be fairly small.

@Ristellise

GitHub Pages could store it well.
EDIT: I will be hosting all OLD downloads on GitHub, so stay tuned.

@NecroBones

I took a look at my Linode account, and I'd have to increase my plan to have enough disk storage for all of the current file data (since there's 62 GB of it). In terms of monthly bandwidth allowance, I have tons of room to spare. It's the disk that's really tight. I'll hold off from doing anything until we know whether we need the mirrors.

@dries007

I'm offering up part of my unlimited 250 Mbit dedicated server in Europe (it's in Roubaix, France).
It only has 3x 110 GB SSDs, so I can't provide a full mirror (there is other stuff running, mostly Minecraft servers), but it still might come in handy.

@brandonwamboldt

I have 500 GB of hard drive space with a 1 Gbps uplink on a dedicated server (with CloudFlare as a CDN in front of it). Would love to help out as well.

@sebneira

I'll be working on the best solution for this case, as it seems that we have lots of people willing to offer mirrors + the Internet Archive.

@phmayo have you come to any conclusions?

@sebneira sebneira self-assigned this Feb 17, 2016
@ghost
Author

ghost commented Feb 17, 2016

Using the IA requires work on the backend, either in the website or in an uploading cron job. @ThomasKerman and VITAS have plenty to do as it is for the moment.

So, much as I want this, we need to focus on getting an easy way for mirrors to be activated and made available first, preferably without any impact on CKAN at all. If that isn't possible, well, we'll deal with that when it's time. Making SD resilient to failure is our top priority, so we don't get another outage like on Monday.

rsync would be the low-hanging fruit. Maybe something like Syncthing, which requires manual intervention but goes a little easier on the bandwidth.

For a mirror, 100 GB of storage and 10 TB of transfer should be plenty to get us started.

@sebneira

@phmayo I just had a conversation with VITAS and arrived at a pretty nice design for the solution; it's also not a priority right now. I'll put together a workflow over the next few days.

Will look into Syncthing, thanks for sharing!

@NecroBones

rsync is pretty easy, of course. Another possibility is to roll our own process that pushes out whole files as new ones are added, with a nightly (or every 48 hours, or whatever) rsync to catch anything that was missed or dropped.
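
A rough sketch of that hybrid approach, using the watchdog library to notice new files; push_to_mirrors() and the mirror list are invented for illustration, and a cron'd full rsync would still run nightly as the safety net:

```python
# Hybrid push sketch: rsync each newly created file to every mirror right
# away; a periodic full rsync (not shown) catches anything missed.
import subprocess
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

# Hypothetical mirror destinations, reachable over ssh.
MIRRORS = ["mirror1.example:/srv/spacedock/", "mirror2.example:/srv/spacedock/"]

def push_to_mirrors(path):
    for mirror in MIRRORS:
        subprocess.run(["rsync", "-t", path, mirror], check=True)

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            push_to_mirrors(event.src_path)

observer = Observer()
observer.schedule(NewFileHandler(), "/srv/spacedock/files", recursive=True)
observer.start()
observer.join()  # blocks forever; run under a supervisor in practice
```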

@ghost ghost added this to the Content Delivery milestone Feb 17, 2016
@oliverde8

Hi, you might wish to check https://about.maniacdn.net/. It is a community-driven CDN built for another game, and the sources are public. It uses rsync to sync the files, and anyone can contribute with their server.

@dries007

How about web caching? It's a solution that requires no control over the mirror server and no cron jobs or daemons.

@sebneira sebneira removed their assignment Feb 18, 2016
@sebneira

The main concern here is the following: anyone could poison files with ease, as they are not signed.

@dries007 I don't believe that would be a solution, as it wouldn't help with transfer bandwidth or with having distributed data in case of failure.

@dries007

Well, you are setting up a deliberate man-in-the-middle structure, but how hard would it be to add hashes to the page so that at least the people who want to can check them? That is, assuming you are only serving the download files out of the CDN and not the main page as well.
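
Generating those hashes server-side would be cheap. A minimal sketch, with an illustrative file name, that could publish a SHA-256 next to each download link:

```python
# Compute a SHA-256 checksum for a download so users can verify copies
# fetched from any mirror or cache. The file name is illustrative.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("ExampleMod-1.0.zip"))  # shown beside the download link
```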

@sikian I disagree. I'll be using nginx as an example here, but I'm pretty sure you'd be able to apply this to most web server software:
You can configure nginx to serve stale content in case of a timeout or an HTTP 5xx error, which would allow cached content to be served when the backend fails; and if the cache is configured properly, that will cover the most requested content, so it'd keep you going.
If you enable caching on /static/ and on the mods, you take most of the bandwidth issues away right there. Or am I wrong? (I've not implemented this on any larger scale.)
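
As a concrete, untested example, the nginx side could look roughly like this; the cache zone name, paths, sizes, and upstream are all assumptions:

```nginx
# Hedged sketch: cache static assets and mod downloads, and keep serving
# stale copies when the backend times out or returns a 5xx error.
proxy_cache_path /var/cache/nginx/spacedock levels=1:2
                 keys_zone=spacedock:50m max_size=100g inactive=30d;

server {
    listen 80;
    server_name mirror.example.com;  # placeholder

    location ~ ^/(static|mod)/ {
        proxy_pass http://spacedock-backend;  # hypothetical upstream
        proxy_cache spacedock;
        proxy_cache_valid 200 30d;
        proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
    }
}
```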

@oliverde8

@dries007 Not really; the bandwidth issue is about the network. Your server would still need to answer the same number of requests and send the same amount of data.

@sikian I understand. I am a trusting person, but I can see where that would be going.

@dries007

@oliverde8 If multiple people have multiple caches running, you can distribute the load, and the main server would only have to supply the user-specific data plus any new files the caches don't have yet.
This is basically what CloudFlare does, right? Except that we know better which files are long-term cacheable and which ones are session- or page-view-specific.

@brandonwamboldt

@dries007 FYI, you can configure caching rules in CloudFlare to tell it what to store short term or long term (it will honor standard Cache-Control headers).

@dries007

@brandonwamboldt I thought that was premium only, good to know.

@brandonwamboldt

@dries007 There is a limit for the free account (although it will always follow cache headers, so you can just set them up via Nginx/Apache). However, if SpaceDock goes with CF, I've volunteered to sponsor the premium plan.
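
For reference, setting those headers in nginx takes only a few lines; the location paths below are assumptions about SpaceDock's URL layout:

```nginx
# Hedged sketch: long-lived Cache-Control on versioned mod archives,
# no caching on dynamic pages. Paths are placeholders.
location /mod/ {
    add_header Cache-Control "public, max-age=31536000";  # versioned zips rarely change
}

location / {
    add_header Cache-Control "no-cache";  # dynamic pages revalidate each time
}
```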
