feat(dataset): external storage backend #3323

Merged
m-alisafaee merged 7 commits into develop from 3284-external-storage-backend on Mar 20, 2023

m-alisafaee (Contributor) commented Feb 16, 2023

Description

Removes external files. Replaces is_external with linked for linked files. Adds an external storage backend for external files. Stores file sizes when files are added to datasets.

TODO

  • Set minimum version to 2.4.0 since we add new fields to the metadata.

Fixes #3284
Fixes #3279

coveralls (Collaborator) commented Feb 16, 2023

Pull Request Test Coverage Report for Build 4449216907

  • 376 of 507 (74.16%) changed or added relevant lines in 34 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+86.0%) to 85.991%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| renku/core/dataset/providers/factory.py | 9 | 10 | 90.0% |
| renku/core/dataset/providers/s3.py | 14 | 15 | 93.33% |
| renku/core/workflow/activity.py | 0 | 1 | 0.0% |
| renku/infrastructure/database.py | 5 | 6 | 83.33% |
| renku/infrastructure/storage/factory.py | 1 | 2 | 50.0% |
| renku/command/dataset.py | 4 | 6 | 66.67% |
| renku/command/format/dataset_files.py | 34 | 36 | 94.44% |
| renku/core/dataset/providers/azure.py | 17 | 19 | 89.47% |
| renku/core/interface/storage.py | 8 | 10 | 80.0% |
| renku/domain_model/dataset.py | 22 | 25 | 88.0% |

Totals
Change from base Build 4447217940: 86.0%
Covered Lines: 25615
Relevant Lines: 29788
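
The percentages in this report follow directly from the line counts it quotes. As a quick sanity check, here is a minimal sketch in Python; the helper name `coverage_percent` is illustrative only and is not part of Coveralls or renku:

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return covered/relevant as a percentage; 0.0 when nothing is relevant."""
    if relevant == 0:
        return 0.0
    return 100.0 * covered / relevant


# Line counts copied from the report above.
changed = coverage_percent(376, 507)      # changed or added relevant lines
overall = coverage_percent(25615, 29788)  # covered vs. relevant lines overall

print(f"changed lines: {changed:.2f}%")   # 74.16%
print(f"overall:       {overall:.3f}%")   # 85.991%
```

Both values match the figures reported for the changed lines and for overall coverage.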

💛 - Coveralls

m-alisafaee force-pushed the 3284-external-storage-backend branch 5 times, most recently from c04bcfd to b8538a3, on March 1, 2023 16:14
m-alisafaee force-pushed the 3284-external-storage-backend branch 9 times, most recently from fc04ebf to 7aa6786, on March 3, 2023 21:33
m-alisafaee force-pushed the 3284-external-storage-backend branch 9 times, most recently from bfa20a7 to e8bec96, on March 14, 2023 00:22
m-alisafaee force-pushed the 3284-external-storage-backend branch from e8bec96 to 5d06c59 on March 14, 2023 08:28
m-alisafaee marked this pull request as ready for review on March 14, 2023 08:28
m-alisafaee requested a review from a team as a code owner on March 14, 2023 08:28

Panaetius (Member) left a comment

Should we separate renku.core.dataset.providers to make it clear what is for storage and what is just for add? It's getting a bit crowded in there.

And I'm a bit torn on the internal naming going from external to cloud, since it's not all cloud-based now. That could easily lead to misunderstandings. I like external or remote more.

Review threads (outdated, resolved):
  • renku/command/checks/datasets.py
  • renku/core/dataset/providers/api.py

m-alisafaee (Contributor, Author) commented

> Should we separate renku.core.dataset.providers to make it clear what is for storage and what is just for add? It's getting a bit crowded in there.

Some providers are used both for storage and for add (S3, Azure, ...), and I'm not sure how we would handle those. I believe renku.core.dataset.providers isn't the best place for providers anymore. We should move them out of the dataset module and put them directly under core. Maybe also rename them to data-provider (since we use provider in other places).

> And I'm a bit torn on the internal naming going from external to cloud, since it's not all cloud-based now. That could easily lead to misunderstandings. I like external or remote more.

I changed it to cloud because I've noticed people using this term. Remote makes more sense, although it's a bit overused. Let's discuss this in tomorrow's meeting and decide on the terminology.

Panaetius previously approved these changes on Mar 17, 2023
m-alisafaee merged commit 2a461d4 into develop on Mar 20, 2023
35 of 37 checks passed
m-alisafaee deleted the 3284-external-storage-backend branch on March 20, 2023 10:52
Labels: None yet
Projects: Status: Done
Development

Successfully merging this pull request may close these issues.

  • Support external files as a storage backend
  • Store file sizes for datasets with cloud storage
3 participants