Introduce storage manager concept #3577

evazion · 2018-03-21T02:30:52Z

This introduces a concept of storage managers for dealing with the storage of uploaded files. This incidentally fixes #3569, but the main goal is to clean up how archived files are dealt with.

This PR makes it possible to configure how uploads are stored: locally, via SFTP, or via S3. This also makes it possible to fully store all files externally, so that the webservers can be independent of the asset servers.

The basic idea is to introduce a StorageManager class with store and delete methods. Subclasses implement three different storage methods: local filesystem, SFTP, or S3. Each method supports flat or hierarchical directory structures. The SFTP method supports multiple hosts, which allows you to store files to both sonohara and hijiribe at the same time.

There is a fourth "hybrid" method, which lets you choose the method to use based on the post's ID and type (preview, sample, or original file). In this way it's possible to support complex configurations like Danbooru's, where files are stored across different servers with different URL structures. Here is how you would replicate Danbooru's current setup.

The basic idea itself is simple enough, but integrating it with the upload process turned out to be somewhat involved. This was mainly because of assumptions throughout the code that files are stored locally on the webserver. In short:

Autotagging of animated GIFs/PNGs was moved from during post editing to during upload. This assumed that files were available on the local webserver, which we don't want to assume. Rechecking the file on every edit isn't very efficient anyway.
Ugoira .webm conversion was made synchronous instead of asynchronous. Doing it asynchronously created multiple problems: it assumed the .zip file was available locally, it could break backups and iqdb submissions (since the conversion may not be finished yet when these things run), and it required hacky file existence checks in various places to detect whether the conversion was finished and the images were ready to use.
Backups were made synchronous instead of asynchronous. Again, this assumed that uploaded files were present on the local webserver. It also meant that if the job worker went down, uploads would be accepted without being backed up.
The signature for Post.delete_files had to be changed. Important: this means existing file deletion jobs need to be cancelled ASAP. These jobs wouldn't work right anyway, since in many cases they're trying to delete files that have since been archived.
/posts.json now returns absolute URLs in all cases. Right now it returns absolute URLs for some posts and relative URLs for others, which clients have to detect and fix in order to download files. This change may break scrapers. However, it's not hard to fix and makes the API easier to deal with.
The upload process was reworked to pass around temp files, instead of writing preview/samples directly to the local filesystem. This simplified some things.
URL building was moved to StorageManager, since the URL structure depends on how files are stored.
The backup system was changed to use the storage management system. This means you can choose between backing up files to a local directory, to SFTP, or to S3.

Making backups and ugoira conversions synchronous may slow down uploads. Ultimately I think it'd be better to make the whole upload process asynchronous, but that's outside the scope of this.

* Remove `initialize_status` (status already defaults to pending in database) * Remove `has_dimensions?` (always returns true) * Remove `async_conversion?` (dead code) * Remove `validate_file_exists` (unneeded checks) * Simplify `calculate_dimensions` * Merge `file_header_to_content_type` with `content_type_to_file_ext` (content type isn't used elsewhere)

Move animated_gif / animated_png autotagging to take place during uploading, instead of during tag editing. We can't generally assume the file will be present on the local filesystem after uploading.

Refactors the upload process to pass around temp files, rather than passing around file paths and directly writing output to the local filesystem. This way we can pass the storage manager the preview / sample / original temp files, so it can deal with storage itself. * Change Download::File#download! to return a temp file. * Change DanbooruImageResizer and PixivUgoiraConverter to accept/return temp files instead of file paths. * Change Upload#generate_resizes to return temp files for previews and samples. * Change Upload#generate_resizes to generate ugoira .webm samples synchronously instead of asynchronously.

* Perform backups synchronously inside `distribute_files` instead of asynchronously in `queue_backup`. Asynchronous backups assumed that files are stored on the local filesystem, which isn't true in general. * Remove obsolete backup service classes.

* Remove file_path_for, cropped_file_url (dead code) * Remove complete_preview_file_url (preview_file_url now returns absolute links) * Remove `file_name` (only used for Download link in sidebar)

r888888888 · 2018-03-27T00:28:10Z

Is this good to merge now?

evazion · 2018-03-27T02:36:46Z

Yeah, I'm finished with it, should be good to go.

evazion added 10 commits March 20, 2018 19:49

Add storage managers (local, sftp, s3, hybrid).

b0c7d9c

uploads: autotag animated_gif/png during upload.

60dcfbf

Move animated_gif / animated_png autotagging to take place during uploading, instead of during tag editing. We can't generally assume the file will be present on the local filesystem after uploading.

posts: use storage manager to build file urls.

6d0d1a3

posts: use storage manager to delete files.

b7f7187

posts: remove unused file path / url methods.

41a4ff1

* Remove file_path_for, cropped_file_url (dead code) * Remove complete_preview_file_url (preview_file_url now returns absolute links) * Remove `file_name` (only used for Download link in sidebar)

tests: fix upload tests.

d089be9

tests: add storage manager tests.

619a205

r888888888 merged commit ce7e313 into danbooru:master Mar 27, 2018

This was referenced Apr 2, 2018

Post replacement: Inappropriate ioctl for device #3593

Closed

Asynchronous uploads #3606

Closed

evazion deleted the feat-storage-manager branch April 14, 2018 03:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce storage manager concept #3577

Introduce storage manager concept #3577

evazion commented Mar 21, 2018 •

edited

r888888888 commented Mar 27, 2018

evazion commented Mar 27, 2018

Introduce storage manager concept #3577

Introduce storage manager concept #3577

Conversation

evazion commented Mar 21, 2018 • edited

r888888888 commented Mar 27, 2018

evazion commented Mar 27, 2018

evazion commented Mar 21, 2018 •

edited