✨ Storage: allow creation of multipart presigned upload links #3021

Merged

@sanderegg sanderegg commented May 5, 2022

What do these changes do?

This PR brings the long-awaited multipart upload to the storage micro-service. Multipart uploads make it possible to upload files of up to 5 TB to S3 by splitting the data into smaller parts (chunks).
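
To make the chunking concrete, here is a minimal Python sketch of how a file size could be turned into S3 parts. It relies only on the published S3 multipart limits (at most 10,000 parts, parts between 5 MiB and 5 GiB except the last one, objects up to 5 TiB). The 10 MiB preferred part size and the helper names `compute_part_size`/`split_into_parts` are illustrative assumptions, not the storage service's actual policy.

```python
from typing import List, Tuple

# S3 multipart upload limits (AWS documentation)
MIN_PART_SIZE = 5 * 1024**2    # 5 MiB, applies to every part except the last
MAX_PART_SIZE = 5 * 1024**3    # 5 GiB
MAX_NUM_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TiB

# hypothetical preferred chunk size, not the service's real setting
PREFERRED_PART_SIZE = 10 * 1024**2


def compute_part_size(file_size: int, preferred: int = PREFERRED_PART_SIZE) -> int:
    """Part size of at least `preferred`, grown until the file fits in 10,000 parts."""
    if not 0 <= file_size <= MAX_OBJECT_SIZE:
        raise ValueError(f"file size {file_size} is outside the S3 limits")
    # integer ceil(file_size / MAX_NUM_PARTS): smallest size keeping parts <= 10,000
    needed = -(-file_size // MAX_NUM_PARTS)
    part_size = max(preferred, needed, MIN_PART_SIZE)
    if part_size > MAX_PART_SIZE:
        raise ValueError("no valid part size for this file")
    return part_size


def split_into_parts(file_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) triples; S3 part numbers start at 1."""
    return [
        (number, offset, min(part_size, file_size - offset))
        for number, offset in enumerate(range(0, file_size, part_size), start=1)
    ]
```

With 10 MiB parts, a 25 MiB file becomes three parts (10, 10 and 5 MiB); only files larger than 10,000 × 10 MiB (about 98 GiB) force bigger parts.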

Highlights:

  • Storage can now create multiple presigned upload links for a single file (process described below)
  • Frontend uses multipart upload
  • Exporter uses multipart upload
  • Copying uses multipart upload
  • NOTE: api-server is NOT using multipart upload yet
  • NOTE: computational backend is NOT using multipart upload yet
  • node ports now use pydantic settings (STORAGE_ENDPOINT was replaced by STORAGE_HOST/STORAGE_PORT)
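
With STORAGE_HOST/STORAGE_PORT replacing STORAGE_ENDPOINT, the storage base URL is assembled from two settings. The real node-ports code uses pydantic settings; the following is only a stdlib stand-in to illustrate the idea. The default values (`storage`, `8080`) and the `base_url` helper are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    # stdlib stand-in for the pydantic settings; defaults are hypothetical
    host: str = "storage"
    port: int = 8080
    vtag: str = "v0"  # API version prefix, as in /v0/locations/...

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "StorageSettings":
        # read STORAGE_HOST/STORAGE_PORT, falling back to the class defaults
        env = os.environ if env is None else env
        return cls(
            host=env.get("STORAGE_HOST", cls.host),
            port=int(env.get("STORAGE_PORT", cls.port)),
        )

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}/{self.vtag}"
```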

Upload process

  • PUT /v0/locations/{location_id}/files/{file_id} now takes an optional query parameter file_size
    • if file_size is not set, the request behaves as before (backward compatible; ONLY for legacy dynamic services, and will be deprecated as soon as these services are removed)
    • if file_size is set to 0, the new data structure is returned, but it still contains a single link (limited to 5 GB)
    • if file_size is set to the real size of the file, the new data structure is returned with several upload links (S3 allows multipart uploads from 10MB upwards)
  • POST /v0/locations/{location_id}/files/{file_id}:complete tells storage that the file upload is complete
    • it is required for multipart uploads, since S3 must be called to assemble all the chunks back into one file (this can take up to several minutes)
    • for single presigned links and S3 direct links it is not strictly necessary, but it is good practice
    • the request returns a future_id
  • POST /locations/{location_id}/files/{file_id}:complete/futures/{future_id} reports whether the completion task has finished
    • once the state of that future is OK, the file is available for use by osparc
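
The upload process above can be sketched from the client side as follows. This is a hedged illustration, not the frontend's or node ports' implementation: `StorageAPI` is a hypothetical wrapper around the three endpoints, the shape of its return values (a list of links plus a chunk size) and the `FAILED` state are assumptions, and `put_chunk` stands for an HTTP PUT to a presigned link that returns the part's ETag (S3 needs each part's number and ETag to assemble a multipart upload).

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class UploadedPart:
    number: int  # 1-based part number, as S3 expects
    e_tag: str   # ETag returned by S3 for the uploaded part


class StorageAPI(Protocol):
    # hypothetical wrapper around the storage endpoints described above
    def get_upload_links(self, file_id: str, file_size: int) -> Tuple[List[str], int]: ...
    def complete_upload(self, file_id: str, parts: Sequence[UploadedPart]) -> str: ...
    def get_future_state(self, file_id: str, future_id: str) -> str: ...


def upload_file(
    api: StorageAPI,
    file_id: str,
    data: bytes,
    put_chunk: Callable[[str, bytes], str],
    *,
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # 1. PUT .../files/{file_id}?file_size=N -> presigned links and chunk size
    urls, chunk_size = api.get_upload_links(file_id, len(data))
    if len(urls) * chunk_size < len(data):
        raise ValueError("the returned links cannot hold the whole file")

    # 2. upload each chunk to its own link and keep the returned ETag
    parts = []
    for number, url in enumerate(urls, start=1):
        chunk = data[(number - 1) * chunk_size : number * chunk_size]
        parts.append(UploadedPart(number=number, e_tag=put_chunk(url, chunk)))

    # 3. POST .../files/{file_id}:complete -> future_id
    future_id = api.complete_upload(file_id, parts)

    # 4. poll .../:complete/futures/{future_id} until its state is OK
    waited = 0.0
    while (state := api.get_future_state(file_id, future_id)) != "OK":
        if state == "FAILED":  # hypothetical failure state
            raise RuntimeError(f"completion of {file_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"completion of {file_id} still {state} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

Polling with a timeout matters here because assembling a large multipart upload on S3 can take several minutes.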

Multipart uploads:

  • IMPORTANT NOTE: AWS charges the client for a multipart upload (i.e. for the parts already stored) even if the file is not completely uploaded YET. Therefore the dsm_cleaner background task is extended to monitor expired multipart uploads and abort them: if an upload is not completed within 1 hour, it is forcibly aborted.
  • to this end, the file_meta_data table got one additional column, upload_id
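
The expiry rule enforced by the dsm_cleaner can be illustrated with a small selection function. This is a sketch under assumptions: `PendingUpload` and `abort_expired_uploads` are hypothetical names, the `created_at` timestamp is an assumed field, and the `abort` callback stands in for the real abort call (with boto3 this would be the S3 client's `abort_multipart_upload(Bucket=..., Key=..., UploadId=...)`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional

# expiry used by the cleaner according to the PR description
UPLOAD_EXPIRATION = timedelta(hours=1)


@dataclass(frozen=True)
class PendingUpload:
    # hypothetical view of a file_meta_data row with a non-null upload_id
    file_id: str
    upload_id: str
    created_at: datetime  # assumed timezone-aware (UTC) creation timestamp


def abort_expired_uploads(
    uploads: Iterable[PendingUpload],
    abort: Callable[[PendingUpload], None],
    now: Optional[datetime] = None,
    expiration: timedelta = UPLOAD_EXPIRATION,
) -> List[PendingUpload]:
    """Abort every multipart upload older than `expiration`; return the aborted ones."""
    now = now or datetime.now(timezone.utc)
    expired = [u for u in uploads if now - u.created_at > expiration]
    for upload in expired:
        # with boto3 this could be:
        # s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload.upload_id)
        abort(upload)
    return expired
```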

Related issue/s

How to test

Checklist

@sanderegg sanderegg added this to the Macarons +1 milestone May 5, 2022
@sanderegg sanderegg self-assigned this May 5, 2022
codecov bot commented May 5, 2022

Codecov Report

Merging #3021 (e379255) into master (aed99b7) will increase coverage by 0.8%.
The diff coverage is 85.3%.

@@           Coverage Diff            @@
##           master   #3021     +/-   ##
========================================
+ Coverage    80.4%   81.3%   +0.8%     
========================================
  Files         722     723      +1     
  Lines       30705   30933    +228     
  Branches     3971    4001     +30     
========================================
+ Hits        24706   25164    +458     
+ Misses       5206    4961    -245     
- Partials      793     808     +15     
Flag Coverage Δ
integrationtests 64.4% <75.4%> (+13.8%) ⬆️
unittests 78.2% <68.5%> (+0.1%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...simcore_postgres_database/models/file_meta_data.py 100.0% <ø> (ø)
...s/settings-library/src/settings_library/storage.py 0.0% <0.0%> (ø)
...src/simcore_service_api_server/api/routes/files.py 43.4% <0.0%> (-1.8%) ⬇️
...2/src/simcore_service_director_v2/core/settings.py 96.1% <ø> (-0.1%) ⬇️
...es/dynamic_sidecar/docker_service_specs/sidecar.py 80.4% <ø> (ø)
.../src/simcore_service_api_server/modules/storage.py 52.5% <50.0%> (-0.1%) ⬇️
...rage/src/simcore_service_storage/utils_handlers.py 88.0% <50.0%> (-3.4%) ⬇️
.../src/simcore_service_webserver/storage_handlers.py 70.1% <60.8%> (+6.8%) ⬆️
...re-sdk/src/simcore_sdk/node_ports_v2/port_utils.py 70.0% <71.4%> (+0.3%) ⬆️
...service_storage/datcore_adapter/datcore_adapter.py 35.8% <75.0%> (+2.1%) ⬆️
... and 69 more

@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch 2 times, most recently from cc813e6 to 0563ecf (May 6, 2022 14:30)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch 3 times, most recently from 23cb29f to c32a2dd (May 19, 2022 11:39)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch 4 times, most recently from d9616e2 to 0b10a96 (May 25, 2022 10:17)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch from 0b10a96 to 91de02b (May 29, 2022 21:34)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch from 177c324 to 392f425 (May 30, 2022 12:54)
@sanderegg sanderegg changed the title from "✨ Add creation of multipart upload links" to "✨ Refactoring of storage/allow creation of multipart presigned upload links" (May 30, 2022)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch from b930aab to 40b4773 (June 1, 2022 16:45)
sonarcloud bot commented Jun 2, 2022

Please retry analysis of this Pull-Request directly on SonarCloud.

@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch 4 times, most recently from 502a42d to 81e43ea (June 9, 2022 05:40)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch 3 times, most recently from 1c3b411 to c51a157 (June 13, 2022 10:00)
@sanderegg sanderegg force-pushed the enhancement/allow_multipart_links branch from ad656f3 to e379255 (July 6, 2022 08:57)
sonarcloud bot commented Jul 6, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 4 (rating A)

Coverage: no information
Duplication: 0.0%

@sanderegg sanderegg merged commit 3f88ad3 into ITISFoundation:master Jul 6, 2022
@sanderegg sanderegg deleted the enhancement/allow_multipart_links branch August 26, 2022 10:16
Labels
None yet
Projects
None yet
4 participants