Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce storage manager concept #3577

Merged
merged 10 commits into from
Mar 27, 2018

Conversation

evazion
Copy link
Member

@evazion evazion commented Mar 21, 2018

This introduces a concept of storage managers for dealing with the storage of uploaded files. This incidentally fixes #3569, but the main goal is to clean up how archived files are dealt with.

This PR makes it possible to configure how uploads are stored: locally, via SFTP, or via S3. This also makes it possible to fully store all files externally, so that the webservers can be independent of the asset servers.

The basic idea is to introduce a StorageManager class with store and delete methods. Subclasses implement three different storage methods: local filesystem, SFTP, or S3. Each method supports flat or hierarchical directory structures. The SFTP method supports multiple hosts, which allows you to store files to both sonohara and hijiribe at the same time.

There is a fourth "hybrid" method, which lets you choose the method to use based on the post's ID and type (preview, sample, or original file). In this way it's possible to support complex configurations like Danbooru's, where files are stored across different servers with different URL structures. Here is how you would replicate Danbooru's current setup.

The basic idea itself is simple enough, but integrating it with the upload process turned out to be somewhat involved. This was mainly because of assumptions throughout the code that files are stored locally on the webserver. In short:

  • Autotagging of animated GIFs/PNGs was moved from during post editing to during upload. This assumed that files were available on the local webserver, which we don't want to assume. Rechecking the file on every edit isn't very efficient anyway.

  • Ugoira .webm conversion was made synchronous instead of asynchronous. Doing it asynchronously created multiple problems: it assumed the .zip file was available locally, it could break backups and iqdb submissions (since the conversion may not be finished yet when these things run), and it required hacky file existence checks in various places to detect whether the conversion was finished and the images were ready to use.

  • Backups were made synchronous instead of asynchronous. Again, this assumed that uploaded files were present on the local webserver. It also meant that if the job worker went down, uploads would be accepted without being backed up.

  • The signature for Post.delete_files had to be changed. Important: this means existing file deletion jobs need to be cancelled ASAP. These jobs wouldn't work right anyway, since in many cases they're trying to delete files that have since been archived.

  • /posts.json now returns absolute URLs in all cases. Right now it returns absolute URLs for some posts and relative URLs for others, which clients have to detect and fix in order to download files. This change may break scrapers. However, it's not hard to fix and makes the API easier to deal with.

  • The upload process was reworked to pass around temp files, instead of writing preview/samples directly to the local filesystem. This simplified some things.

  • URL building was moved to StorageManager, since the URL structure depends on how files are stored.

  • The backup system was changed to use the storage management system. This means you can choose between backing up files to a local directory, to SFTP, or to S3.

Making backups and ugoira conversions synchronous may slow down uploads. Ultimately I think it'd be better to make the whole upload process asynchronous, but that's outside the scope of this.

* Remove `initialize_status` (status already defaults to pending in database)
* Remove `has_dimensions?` (always returns true)
* Remove `async_conversion?` (dead code)
* Remove `validate_file_exists` (unneeded checks)
* Simplify `calculate_dimensions`
* Merge `file_header_to_content_type` with `content_type_to_file_ext` (content type isn't used elsewhere)
Move animated_gif / animated_png autotagging to take place during
uploading, instead of during tag editing. We can't generally assume the
file will be present on the local filesystem after uploading.
Refactors the upload process to pass around temp files, rather than
passing around file paths and directly writing output to the local
filesystem. This way we can pass the storage manager the preview /
sample / original temp files, so it can deal with storage itself.

* Change Download::File#download! to return a temp file.

* Change DanbooruImageResizer and PixivUgoiraConverter to accept/return
  temp files instead of file paths.

* Change Upload#generate_resizes to return temp files for previews and samples.

* Change Upload#generate_resizes to generate ugoira .webm samples
  synchronously instead of asynchronously.
* Perform backups synchronously inside `distribute_files` instead of
  asynchronously in `queue_backup`. Asynchronous backups assumed that
  files are stored on the local filesystem, which isn't true in general.

* Remove obsolete backup service classes.
* Remove file_path_for, cropped_file_url (dead code)
* Remove complete_preview_file_url (preview_file_url now returns absolute links)
* Remove `file_name` (only used for Download link in sidebar)
@r888888888
Copy link
Collaborator

Is this good to merge now?

@evazion
Copy link
Member Author

evazion commented Mar 27, 2018

Yeah, I'm finished with it, should be good to go.

@r888888888 r888888888 merged commit ce7e313 into danbooru:master Mar 27, 2018
@evazion evazion deleted the feat-storage-manager branch April 14, 2018 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Post replacement breaks posts whose images are hosted on raikou2.donmai.us
2 participants