Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨♻️ Storage refactoring step3 #3144

Merged

Conversation

sanderegg
Copy link
Member

@sanderegg sanderegg commented Jun 27, 2022

What do these changes do?

pre-requisite to #3021

This PR brings the step 3 of storage refactoring. It is the biggest step of the refactoring and replaces a lot of the parts in storage micro-service.

Highlights:

  • ♻️ removed minio library, replaced by aioboto3/aiobotocore
  • ♻️ moto now used for testing S3 instead of docker container of minio
  • 🐛✨ add option to explicitly abort an upload and revert to the previous file if any instead of bluntly deleting the files (issue when using RClone in the node_ports)
  • ♻️ uses pydantic-based request validators
  • ♻️ handlers code fully separated by routes
  • ♻️ clean up of file_meta_data table in postgres (👋 file_uuid, project_name, node_name, file_name, user_name, raw_file_path, display_file_path)
  • ✨ introduction of the DSM cleaner, periodically checks if the file_meta_data table contains expired uploads (e.g. uploads that were created at some time but never completed, these are then removed from the table to maintain sync)
  • ♻️ completely separated DB code from infamous dsm.py module
  • ♻️ completely separated dsm.py for simcore and datcore following the factory pattern to simplify
  • ♻️ tests are now using complete application and run from REST API endpoint
  • 🔥 removed all funny not so funny test code in production code...
  • 🔥 removed usage of attrs library

Related issue/s

How to test

Checklist

@sanderegg sanderegg added this to the Diolkos milestone Jun 27, 2022
@sanderegg sanderegg self-assigned this Jun 27, 2022
@codecov
Copy link

codecov bot commented Jun 27, 2022

Codecov Report

Merging #3144 (2bc4d08) into master (2bc4d08) will not change coverage.
The diff coverage is n/a.

❗ Current head 2bc4d08 differs from pull request most recent head 84142e9. Consider uploading reports for the commit 84142e9 to get more accurate results

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #3144   +/-   ##
======================================
  Coverage    77.3%   77.3%           
======================================
  Files         591     591           
  Lines       24266   24266           
  Branches     3148    3148           
======================================
  Hits        18761   18761           
  Misses       4923    4923           
  Partials      582     582           
Flag Coverage Δ
unittests 77.2% <0.0%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@sanderegg sanderegg force-pushed the enhancement/storage_refactoring_step3 branch 3 times, most recently from 604d2f2 to 9fe0ae0 Compare June 28, 2022 19:04
@sanderegg sanderegg marked this pull request as ready for review June 28, 2022 19:04
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!
Thx for upgrading so well storage!

@sanderegg sanderegg force-pushed the enhancement/storage_refactoring_step3 branch 2 times, most recently from 8e3100c to b859bf9 Compare June 29, 2022 20:52
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Glad we have so many tests on storage.
Please consider my questions and suggestions below.

services/storage/src/simcore_service_storage/s3_client.py Outdated Show resolved Hide resolved
Comment on lines +3 to +16
# Rationale:
- for each upload an entry is created in the file_meta_data table in the database
- then an upload link (S3/HTTP URL) is created through S3 backend and sent back to the client
- the client shall upload the file and then notify DSM of completion
- upon completion the corresponding entry in file_meta_data is updated:
- the file_size of the uploaded file is set
- the upload_expiration_date is set to null

# DSM cleaner:
- runs at an interval
- list the entries that are expired in the database by checking "upload_expires_at" column
- tries to update from S3 the database first, if that fails:
- removes the entries in the database that are expired:
- removes the entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Could this also be a good opportunity to clean up the S3 storage?

When people remove nodes, sometimes there is trash left in the S3 storage, this is especially valid for bing/long running studies.

When the DSM cleaner ends it could also check if files in S3 are no longer being used (or are no longer referenced) and remove them.
Maybe such a step could be triggered every X times the DSM cleaner finishes, since it might take some time to check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yes, but in a next iteration. let's keep it simple now

services/storage/src/simcore_service_storage/exceptions.py Outdated Show resolved Hide resolved
services/storage/tests/unit/test_handlers_files.py Outdated Show resolved Hide resolved
@sanderegg sanderegg force-pushed the enhancement/storage_refactoring_step3 branch from d6c2bed to 84142e9 Compare June 30, 2022 10:24
@sonarcloud
Copy link

sonarcloud bot commented Jun 30, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants