feat: DIA-2092: Support reading multiple tasks from each file in a cloud storage #7334
Conversation
🔍 Existing Issues For Review: Your pull request is modifying functions with the following pre-existing issues:
📄 File: label_studio/io_storages/base_models.py
Codecov Report. Attention: Patch coverage is
Additional details and impacted files:
```diff
@@            Coverage Diff             @@
##           develop    #7334      +/-   ##
===========================================
+ Coverage    78.40%   78.53%   +0.12%
===========================================
  Files          193      195       +2
  Lines        15624    15767     +143
===========================================
+ Hits         12250    12382     +132
- Misses        3374     3385      +11
```
force-pushed from dcd8a58 to af4547e
pakelley left a comment:
I don't see anything blocking, lgtm
- [x] allow reading multiple tasks from the same file during import storage sync in `scan_and_create_links`
- [x] update `get_data` for all import storage types to return `list[dict]` instead of `dict` (see the sketch after this list)
- [x] handle blob URLs?
  - Currently we still assume only one task per file when blob URLs are used. Changing this would have a larger blast radius: we would have to read the URL at import time to determine how many tasks it contains, invent a task format that points to an object within a URL, and then read the URL n times for n tasks during normal usage unless it's cached somehow. "Use blob URLs" means "Treat every bucket object as a source file", so it doesn't make sense to do this.
- [x] do the same for LSE S3 and GCS WIF import storage
- [x] add test coverage. Currently most test coverage for io_storages is in a single Tavern file; each import storage class definitely needs its own coverage for multiple tasks per key, so this will be fairly lengthy (a sketch of one such test follows below).
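
To make the new contract concrete, here is a minimal, self-contained sketch of the `get_data` / `scan_and_create_links` interplay described above. It uses an in-memory dict as a stand-in for a real bucket; the class name, `iterkeys`, and the link bookkeeping are simplified illustrations, not the actual `base_models.py` code.

```python
import json


class InMemoryImportStorage:
    """Toy stand-in for an import storage class, backed by an in-memory
    'bucket' instead of S3/GCS/Azure. Only the method names (get_data,
    scan_and_create_links) follow the PR description."""

    def __init__(self, bucket: dict[str, str]):
        self.bucket = bucket
        self.links: list[tuple[dict, str]] = []  # (task_data, key) pairs

    def iterkeys(self):
        yield from self.bucket

    def get_data(self, key: str) -> list[dict]:
        """Return a list of task dicts for one storage key.

        Before this PR the method returned a single dict; wrapping the
        single-task case in a one-element list keeps old files working,
        while a JSON array now yields one task per element.
        """
        parsed = json.loads(self.bucket[key])
        return parsed if isinstance(parsed, list) else [parsed]

    def scan_and_create_links(self) -> None:
        # One link per task; several links may now share the same key.
        for key in self.iterkeys():
            for task_data in self.get_data(key):
                self.links.append((task_data, key))


storage = InMemoryImportStorage({
    "single.json": '{"text": "one task"}',
    "batch.json": '[{"text": "task 1"}, {"text": "task 2"}]',
})
storage.scan_and_create_links()
assert len(storage.links) == 3  # 1 + 2 tasks across two keys
```

The key design point is that single-task files stay valid: a top-level JSON object is normalized to a one-element list, so existing buckets sync exactly as before.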
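And a hedged sketch of what per-backend coverage could look like, reusing `InMemoryImportStorage` from the sketch above. In the real suite each storage class (S3, GCS, Azure, etc.) would get its own equivalent with its client mocked out; the structure here is illustrative, not the PR's actual tests.

```python
import pytest


@pytest.mark.parametrize(
    "content,expected_tasks",
    [
        ('{"text": "solo"}', 1),                # legacy single-task file
        ('[{"text": "a"}, {"text": "b"}]', 2),  # new multi-task file
    ],
)
def test_get_data_task_count(content, expected_tasks):
    storage = InMemoryImportStorage({"key.json": content})
    tasks = storage.get_data("key.json")
    # get_data now always returns a list of task dicts,
    # regardless of whether the file held one task or many.
    assert isinstance(tasks, list)
    assert len(tasks) == expected_tasks
    assert all(isinstance(t, dict) for t in tasks)
```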