Asset server for older uploads #2500
Comments
Just a note that this can be done with very few code changes. Offload the old files to an asset server, mount the remote directory with sshfs, then use unionfs to merge the remote and local image storage into a single read-only directory, and use that directory in the webserver config and when checking for file existence. The downside is high internal traffic: older images will first be pulled from the asset server to the main server, then delivered to the user.
posts actually has bit flags, so it'd be trivial to add a flag for whether a post has been archived.
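A minimal sketch of the bit-flag idea, in Python for illustration only. The flag names and bit positions here are hypothetical and not taken from the actual schema; the point is just that an extra bit can be set and tested without touching the other flags.

```python
from enum import IntFlag


class PostFlags(IntFlag):
    # Hypothetical names and bit positions; the real flags may differ.
    SOME_EXISTING_FLAG = 1 << 0
    ARCHIVED = 1 << 1


def mark_archived(bit_flags: int) -> int:
    """Return bit_flags with the (assumed) ARCHIVED bit set."""
    return int(bit_flags | PostFlags.ARCHIVED)


def is_archived(bit_flags: int) -> bool:
    """True if the (assumed) ARCHIVED bit is set."""
    return bool(bit_flags & PostFlags.ARCHIVED)
```

Setting the bit leaves every other flag untouched, so existing rows would not need any change until a post is actually archived.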
Servers have around 350GB free so it's time to start thinking about this. I think the immediate step is to start hosting older files out of S3. The files are already there; the ACL just has to be set and they have to be brought out of Glacier. It is a bit pricey, but accesses should be infrequent, and not having the headache of maintaining a separate asset server (and all the infrastructure and development that entails) is probably worth it. The oldest 100k posts should be hosted out of S3. I will see how that affects costs and make further decisions from there.
Why not just host everything out of S3 and configure nginx as a simple caching proxy in front of it? That way everything is handled transparently: unpopular images naturally fall out of the cache, but they're automatically pulled in from S3 when they're accessed again. That seems simpler than handling archival at the app level. The image host doesn't necessarily need to be a separate server, just a separate subdomain (which would be beneficial regardless).
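To illustrate the behaviour a caching proxy gives you, here is a rough Python sketch of a size-bounded read-through cache with least-recently-used eviction. It is a toy model of the idea, not how nginx is implemented; ReadThroughCache and fetch_origin are made-up names, and fetch_origin stands in for a request to S3.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy model: serve from cache, fall back to the origin on a miss."""

    def __init__(self, fetch_origin: Callable[[str], bytes], max_bytes: int):
        self._fetch = fetch_origin      # hypothetical stand-in for an S3 GET
        self._max_bytes = max_bytes
        self._items = OrderedDict()     # key -> bytes, oldest access first
        self._size = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            # Cache hit: mark as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: pull from the origin and store the result.
        data = self._fetch(key)
        self._items[key] = data
        self._size += len(data)
        # Evict least recently used entries until we fit again,
        # always keeping the entry that was just fetched.
        while self._size > self._max_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)
        return data
```

Popular images keep getting moved to the end of the queue and stay cached, while images nobody requests drift to the front and are dropped once space runs out.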
That's actually a really good idea. I was able to implement a proof of concept on Testbooru. I guess the downsides are this:
There are some pretty strong advantages though:
I think it's possible to do this gradually by configuring Nginx to proxy only if the file doesn't exist locally.
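The gradual approach boils down to a local-first lookup. A small Python sketch of that decision, assuming a hypothetical local image root and a hypothetical bucket URL (neither is taken from the real setup):

```python
from pathlib import Path
from typing import Tuple


def resolve_image(local_root: Path, relative_path: str,
                  origin_base_url: str) -> Tuple[str, str]:
    """Decide where an image request should be served from.

    Returns ("local", filesystem path) if the file is still on disk,
    otherwise ("proxy", URL under the assumed origin).
    """
    local_file = local_root / relative_path
    if local_file.is_file():
        return ("local", str(local_file))
    # File has been migrated (or never existed): hand off to the origin.
    return ("proxy", f"{origin_base_url.rstrip('/')}/{relative_path}")
```

With this rule, posts can be migrated in batches: deleting the local copy is what switches a file over to being proxied, and nothing else has to change at the same time.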
From https://danbooru.donmai.us/forum_topics/9127?page=167#forum_post_128781:
These URLs return
with this error:
<Error>
  <Code>InvalidObjectState</Code>
  <Message>The operation is not valid for the object's storage class</Message>
  <RequestId>329863F0B4A14EA0</RequestId>
  <HostId>Fcqbtntbkk6aOQUhrlYDt4AAdkWrQtsCLtMxtb7KEtgCuoH7RwQSVetuVuIfcgBvoivMtJhFLVU=</HostId>
</Error>
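S3 returns InvalidObjectState when an object is requested while it is still in an archived storage class, which fits the earlier note that the files have to be brought out of Glacier. As a sketch, a migration or monitoring script could spot these responses by parsing the error body; the function names below are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def s3_error_code(body: str) -> Optional[str]:
    """Extract the <Code> value from an S3-style XML error body, if any."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML, so not an S3 error document
    if root.tag != "Error":
        return None
    code = root.findtext("Code")
    return code.strip() if code else None


def needs_restore(body: str) -> bool:
    """True if the error says the object's storage class blocks the read."""
    return s3_error_code(body) == "InvalidObjectState"
```

Any file for which needs_restore is true would have to be restored before it can be served through the proxy.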
Yeah, I guess we kind of forgot that the app assumes the file is always physically present and runs ImageMagick queries on every save. There's also the APNG detector, which likely broke as well. Actually, I have a feeling it'll bring us more problems in the future. Can't this be configured transparently at the OS filesystem level instead of in nginx, via SSHFS for example, or maybe using some smarter remote FS?
The plan will be to migrate older posts instead of making this happen all at once during upload. So I think the file existence check is good enough.
This actually already resulted in a problem. Since |
I don't see why auto-tags should ever be removed. If the file itself never changes, then properties like whether it's animated or not will never change.
Eventually the primary web servers will run out of hard drive space. Older images should be offloaded onto asset servers to free up space for newer content.