
FEATURE: Initial implementation of direct S3 uploads with uppy and stubs #13787

Merged

merged 28 commits into main from feature/direct-s3-upload-groundwork-with-uppy

Jul 27, 2021

Conversation

@martin-brennan (Contributor) commented Jul 20, 2021

This PR adds several pieces needed for direct S3 uploads using uppy. These changes are still not the default: the hidden enable_experimental_image_uploader and enable_direct_s3_uploads settings must both be turned on for any of this code to be used, and even then only the User Card Background on the user profile actually uses uppy-image-uploader. To test this, turn both settings on and go to http://localhost:3000/u/me/preferences/profile

A new ExternalUploadStub model and database table is introduced in this pull request. It keeps track of uploads that are uploaded to a temporary location in S3 by the direct-to-S3 code; these temporary objects are eventually deleted either a) when the direct upload is completed, or b) after a certain period of not being used.
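As a rough illustration, a stub of this kind mainly needs the temp object key, the uploading user, and a timestamp for expiry. The field names, key format, and one-hour window below are assumptions for the sketch, not taken from the actual model:

```ruby
require "securerandom"

# Hypothetical, simplified stand-in for the ExternalUploadStub record.
# Field names, the key layout, and the expiry window are illustrative
# assumptions, not the real schema.
ExternalUploadStub = Struct.new(:key, :original_filename, :created_by_id, :created_at, keyword_init: true) do
  def self.create_for(user_id, filename)
    new(
      key: "temp/#{SecureRandom.hex(16)}/#{filename}", # temp location in the bucket
      original_filename: filename,
      created_by_id: user_id,
      created_at: Time.now
    )
  end

  # Stubs whose uploads were never completed become eligible for cleanup.
  def expired?(now = Time.now)
    now - created_at > 60 * 60 # assumed one-hour window
  end
end

stub = ExternalUploadStub.create_for(1, "avatar.png")
stub.expired?                  # => false
stub.expired?(Time.now + 7200) # => true
```

A background job could then periodically delete S3 objects belonging to expired stubs.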

Starting a direct S3 upload

When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new generate-presigned-put endpoint in UploadsController. This generates an S3 key in the temp folder inside the correct bucket path, along with any metadata from the client side (e.g. the SHA1 checksum described below). It also creates an ExternalUploadStub that stores the temp object key and the details of the file being uploaded.
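Schematically, the key and metadata handed to the presigner might be assembled like this. The "temp/" prefix, key layout, and metadata header name are illustrative assumptions; the actual presigned URL is produced by the AWS SDK:

```ruby
require "securerandom"

# Illustrative only: how a temp object key and checksum metadata might be
# assembled before asking the AWS SDK for a presigned PUT URL. The key
# layout and header name are assumptions for this sketch.
def temp_object_key(filename)
  "temp/#{Time.now.strftime('%Y/%m')}/#{SecureRandom.hex(16)}/#{filename}"
end

def presign_metadata(client_sha1)
  # S3 stores user-defined metadata under x-amz-meta-* headers; the
  # browser-computed checksum rides along for later verification.
  client_sha1 ? { "x-amz-meta-sha1-checksum" => client_sha1 } : {}
end

temp_object_key("photo.heic") # e.g. "temp/2021/07/3f9a…/photo.heic"
```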

Once the client side has this URL, uppy uploads the file directly to S3 using the presigned URL. When the upload finishes we move to the next stage.

Completing a direct S3 upload

Once the upload to S3 is done, we call the new complete-external-upload route with the unique identifier of the ExternalUploadStub created earlier. Only the user who created the stub can complete the external upload. One of two paths is then followed via the ExternalUploadManager.

  1. If the object in S3 is too large (currently 100 MB, defined by ExternalUploadManager::DOWNLOAD_LIMIT), we do not download it or generate its SHA1 checksum. Instead we create the Upload record via UploadCreator, copy the object to its final destination on S3, and then delete the initial temp file. Several modifications to UploadCreator have been made to accommodate this.

  2. If the object in S3 is small enough, we download it and compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by Ruby. The browser checksum is stored as metadata on the S3 object and is generated via the UppyChecksum plugin. Keep in mind that some browsers will not generate it due to compatibility or other issues.

    We then follow the normal UploadCreator path, with one exception: to avoid re-uploading the file, if UploadCreator makes no changes to it (such as resizing), we follow the same copy-and-delete-temp path that we use for files that are too large.

In either case, we finally return the serialized upload record to the client.
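The two completion paths above can be sketched roughly like this. This is a minimal, self-contained illustration; complete_external_upload and its return values are hypothetical placeholders, not the real ExternalUploadManager API, which also creates the Upload record and copies/deletes the S3 objects:

```ruby
require "digest"

DOWNLOAD_LIMIT = 100 * 1024 * 1024 # 100 MB, mirroring ExternalUploadManager::DOWNLOAD_LIMIT

# Hypothetical helper showing only the branching logic described above.
def complete_external_upload(size:, client_sha1:, file_contents:)
  if size > DOWNLOAD_LIMIT
    # Path 1: too large to download, so the temp object is copied to its
    # final destination server-side and checksum verification is skipped.
    :copied_without_verification
  else
    # Path 2: download and verify the browser-computed SHA1. Some
    # browsers never produce one, so a missing checksum is tolerated.
    actual_sha1 = Digest::SHA1.hexdigest(file_contents)
    if client_sha1 && client_sha1 != actual_sha1
      :checksum_mismatch
    else
      :verified
    end
  end
end

complete_external_upload(size: 200 * 1024 * 1024, client_sha1: nil, file_contents: "")
# => :copied_without_verification
```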

Several possible errors along the way are also handled by UploadsController.

Also in this PR is some refactoring of displayErrorForUpload to handle both uppy and jQuery file uploader errors.

```diff
 DistributedMutex.synchronize("upload_#{user_id}_#{@filename}") do
   # We need to convert HEIFs early because FastImage does not consider them as images
-  if convert_heif_to_jpeg?
+  if convert_heif_to_jpeg? && !external_upload_too_big
```
@martin-brennan (Contributor, author):

I'm not sure I like all these checks for !external_upload_too_big; I may refactor this in future. There is just so little we can do for these huge external uploads that we haven't downloaded.

A contributor replied:

Yea I have the same feeling here. I wonder if we should just split it out into two different methods or just two different code paths.

@martin-brennan martin-brennan marked this pull request as ready for review July 22, 2021 04:47
@martin-brennan martin-brennan changed the title WIP: Initial implementation of direct S3 uploads FEATURE: Initial implementation of direct S3 uploads with uppy and stubs Jul 22, 2021
@eviltrout (Contributor) left a comment:

This looks good - I only made minor comments. The usual warning applies: large commits like this should be merged with caution when you are around to support them.

Files with review comments (all resolved):
app/assets/javascripts/discourse/app/lib/uploads.js
app/controllers/uploads_controller.rb
config/locales/server.en.yml
config/routes.rb
lib/file_store/s3_store.rb
@tgxworld (Contributor) left a comment:

The general approach looks good to me and the changes here are safe since it is still hidden behind a site setting 👍

@martin-brennan martin-brennan merged commit b500949 into main Jul 27, 2021
@martin-brennan martin-brennan deleted the feature/direct-s3-upload-groundwork-with-uppy branch July 27, 2021 22:42