
FEATURE: Initial implementation of direct S3 uploads with uppy and stubs #13787

Merged

merged 28 commits into main from feature/direct-s3-upload-groundwork-with-uppy

Jul 27, 2021

Conversation

@martin-brennan (Contributor) commented Jul 20, 2021

This PR adds several pieces needed for direct S3 uploads using uppy. These changes are still not the default: the hidden enable_experimental_image_uploader and enable_direct_s3_uploads settings must both be turned on for any of this code to be used, and even then only the User Card Background on the user profile actually uses uppy-image-uploader. To test this, turn both settings on and go to http://localhost:3000/u/me/preferences/profile

A new ExternalUploadStub model and database table is introduced in this pull request. It keeps track of uploads that are uploaded to a temporary location in S3 by the direct-to-S3 code; these temporary objects are eventually deleted either a) when the direct upload is completed, or b) after a certain period of not being used.
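As a rough illustration, a stub of this kind mainly needs the temp object key, the uploading user, and a timestamp for expiry. The field names, key format, and one-hour window below are assumptions for the sketch, not taken from the actual model:

```ruby
require "securerandom"

# Hypothetical, simplified stand-in for the ExternalUploadStub record.
# Field names, the key layout, and the expiry window are illustrative
# assumptions, not the real schema.
ExternalUploadStub = Struct.new(:key, :original_filename, :created_by_id, :created_at, keyword_init: true) do
  def self.create_for(user_id, filename)
    new(
      key: "temp/#{SecureRandom.hex(16)}/#{filename}", # temp location in the bucket
      original_filename: filename,
      created_by_id: user_id,
      created_at: Time.now
    )
  end

  # Stubs whose uploads were never completed become eligible for cleanup.
  def expired?(now = Time.now)
    now - created_at > 60 * 60 # assumed one-hour window
  end
end

stub = ExternalUploadStub.create_for(1, "avatar.png")
stub.expired?                  # => false
stub.expired?(Time.now + 7200) # => true
```

A background job could then periodically delete S3 objects belonging to expired stubs.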

Starting a direct S3 upload

When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new generate-presigned-put endpoint in UploadsController. This generates an S3 key in the temp folder inside the correct bucket path, along with any metadata from the client side (e.g. the SHA1 checksum described below). It also creates an ExternalUploadStub that stores the temp object key and the details of the file being uploaded.
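Schematically, the key and metadata handed to the presigner might be assembled like this. The "temp/" prefix, key layout, and metadata header name are illustrative assumptions; the actual presigned URL is produced by the AWS SDK:

```ruby
require "securerandom"

# Illustrative only: how a temp object key and checksum metadata might be
# assembled before asking the AWS SDK for a presigned PUT URL. The key
# layout and header name are assumptions for this sketch.
def temp_object_key(filename)
  "temp/#{Time.now.strftime('%Y/%m')}/#{SecureRandom.hex(16)}/#{filename}"
end

def presign_metadata(client_sha1)
  # S3 stores user-defined metadata under x-amz-meta-* headers; the
  # browser-computed checksum rides along for later verification.
  client_sha1 ? { "x-amz-meta-sha1-checksum" => client_sha1 } : {}
end

temp_object_key("photo.heic") # e.g. "temp/2021/07/3f9a…/photo.heic"
```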

Once the client side has this URL, uppy uploads the file directly to S3 using the presigned URL. When the upload finishes we move to the next stage.

Completing a direct S3 upload

Once the upload to S3 is done, we call the new complete-external-upload route with the unique identifier of the ExternalUploadStub created earlier. Only the user who created the stub can complete the external upload. One of two paths is then followed via the ExternalUploadManager.

  1. If the object in S3 is too large (currently 100 MB, defined by ExternalUploadManager::DOWNLOAD_LIMIT), we do not download it or generate its SHA1 checksum. Instead we create the Upload record via UploadCreator, copy the object to its final destination on S3, and then delete the initial temp file. Several modifications to UploadCreator have been made to accommodate this.

  2. If the object in S3 is small enough, we download it and compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by Ruby. The browser checksum is stored as metadata on the S3 object and is generated via the UppyChecksum plugin. Keep in mind that some browsers will not generate it due to compatibility or other issues.

    We then follow the normal UploadCreator path, with one exception: to avoid re-uploading the file, if UploadCreator makes no changes to it (such as resizing), we follow the same copy-and-delete-temp path that we use for files that are too large.

In either case, we finally return the serialized upload record to the client.
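The two completion paths above can be sketched roughly like this. This is a minimal, self-contained illustration; complete_external_upload and its return values are hypothetical placeholders, not the real ExternalUploadManager API, which also creates the Upload record and copies/deletes the S3 objects:

```ruby
require "digest"

DOWNLOAD_LIMIT = 100 * 1024 * 1024 # 100 MB, mirroring ExternalUploadManager::DOWNLOAD_LIMIT

# Hypothetical helper showing only the branching logic described above.
def complete_external_upload(size:, client_sha1:, file_contents:)
  if size > DOWNLOAD_LIMIT
    # Path 1: too large to download, so the temp object is copied to its
    # final destination server-side and checksum verification is skipped.
    :copied_without_verification
  else
    # Path 2: download and verify the browser-computed SHA1. Some
    # browsers never produce one, so a missing checksum is tolerated.
    actual_sha1 = Digest::SHA1.hexdigest(file_contents)
    if client_sha1 && client_sha1 != actual_sha1
      :checksum_mismatch
    else
      :verified
    end
  end
end

complete_external_upload(size: 200 * 1024 * 1024, client_sha1: nil, file_contents: "")
# => :copied_without_verification
```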

Several possible errors along the way are also handled by UploadsController.

Also in this PR is some refactoring of displayErrorForUpload to handle both uppy and jQuery file uploader errors.

```diff
 DistributedMutex.synchronize("upload_#{user_id}_#{@filename}") do
   # We need to convert HEIFs early because FastImage does not consider them as images
-  if convert_heif_to_jpeg?
+  if convert_heif_to_jpeg? && !external_upload_too_big
```
@martin-brennan (Contributor, author):

I'm not sure I like all these checks for !external_upload_too_big; I may refactor this in future. There is just so little we can do for these huge external uploads that we haven't downloaded.

A contributor replied:

Yea I have the same feeling here. I wonder if we should just split it out into two different methods or just two different code paths.

@martin-brennan martin-brennan marked this pull request as ready for review July 22, 2021 04:47
@martin-brennan martin-brennan changed the title WIP: Initial implementation of direct S3 uploads FEATURE: Initial implementation of direct S3 uploads with uppy and stubs Jul 22, 2021
@eviltrout (Contributor) left a comment:

This looks good - I only made minor comments. The usual warning applies: large commits like this should be merged with caution when you are around to support them.

Files with review comments (all resolved):
app/assets/javascripts/discourse/app/lib/uploads.js
app/controllers/uploads_controller.rb
config/locales/server.en.yml
config/routes.rb
lib/file_store/s3_store.rb
@tgxworld (Contributor) left a comment:

The general approach looks good to me and the changes here are safe since it is still hidden behind a site setting 👍

@martin-brennan martin-brennan merged commit b500949 into main Jul 27, 2021
@martin-brennan martin-brennan deleted the feature/direct-s3-upload-groundwork-with-uppy branch July 27, 2021 22:42