@matt-bernstein
Contributor

@matt-bernstein matt-bernstein commented Apr 4, 2025

[x] Allow reading multiple tasks from the same file during import storage sync in scan_and_create_links.
[x] Update get_data for all import storage types to return list[dict] instead of dict.
[x] Handle blob URLs? Decided against it: with blob URLs we still assume only 1 task per file. Changing this would have a larger blast radius. We would have to read the URL at import time to determine how many tasks are in it, make up a task format for pointing to an object within a URL, and read the URL n times for n tasks during normal usage unless it is cached somehow. "Use blob URLs" is the same setting as "Treat every bucket object as a source file", so multiple tasks per file doesn't make sense there.
[x] Do the same for the LSE S3 and GCS WIF import storages.
[x] Add test coverage. Currently most test coverage for io_storages is in a single Tavern file. Each import storage class needs its own coverage for multiple tasks per key, so this will be fairly lengthy.
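
To illustrate the list[dict] contract described above, here is a minimal sketch, not the code in this PR. The names parse_tasks and iter_tasks, the (key, row_index) pairing, and the error wording are assumptions made up for the example; the real get_data and scan_and_create_links live on the storage model classes and may differ.

```python
import json
from typing import Any, Iterator


def parse_tasks(raw: bytes, key: str) -> list[dict[str, Any]]:
    """Normalize one file's JSON payload to a list of task dicts.

    Hypothetical helper: a single JSON object becomes a one-item list,
    a JSON array of objects is returned as-is, anything else is rejected.
    """
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError(f"Error on key {key}: file is not valid JSON") from exc

    if isinstance(payload, dict):
        return [payload]
    if isinstance(payload, list) and all(isinstance(item, dict) for item in payload):
        return payload
    raise ValueError(
        f"Error on key {key}: expected a JSON object or a list of JSON objects"
    )


def iter_tasks(files: dict[str, bytes]) -> Iterator[tuple[str, int, dict[str, Any]]]:
    """Yield (key, row_index, task) so each task in a file can be linked separately.

    `files` stands in for a bucket listing; the row index is an assumed way
    to tell apart several tasks that come from the same key.
    """
    for key, raw in files.items():
        for row_index, task in enumerate(parse_tasks(raw, key)):
            yield key, row_index, task
```

With this shape, a caller that previously expected one dict per key would iterate over the returned list instead, and a file holding a single task still works unchanged.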

@github-actions github-actions bot added the feat label Apr 4, 2025
@sentry

sentry bot commented Apr 4, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: label_studio/io_storages/base_models.py

| Function | Unhandled Issue | Event Count |
| _scan_and_create_links | ValueError: Storage status (in_progress) must be QUEUED to move it IN_PROGRESS io_storages.base... | 8 |
| _scan_and_create_links | ValueError: Error on key object-detection/processed_videos.json: For GCS your JSON file must be a dictionary with one task. ... | 3 |


@netlify

netlify bot commented Apr 4, 2025

Deploy Preview for label-studio-storybook canceled.

Name Link
🔨 Latest commit 9629f22
🔍 Latest deploy log https://app.netlify.com/sites/label-studio-storybook/deploys/681043e8dc4ba50008996df8

@netlify

netlify bot commented Apr 4, 2025

Deploy Preview for label-studio-docs-new-theme canceled.

Name Link
🔨 Latest commit 9629f22
🔍 Latest deploy log https://app.netlify.com/sites/label-studio-docs-new-theme/deploys/681043e80c87550008cb1289

@netlify

netlify bot commented Apr 4, 2025

Deploy Preview for heartex-docs canceled.

Name Link
🔨 Latest commit 9629f22
🔍 Latest deploy log https://app.netlify.com/sites/heartex-docs/deploys/681043e80c87550008cb1285

@matt-bernstein matt-bernstein requested a review from pakelley April 4, 2025 21:00
@codecov

codecov bot commented Apr 4, 2025

Codecov Report

Attention: Patch coverage is 86.74033% with 24 lines in your changes missing coverage. Please review.

Project coverage is 78.53%. Comparing base (665f1b5) to head (9629f22).
Report is 16 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| label_studio/io_storages/localfiles/models.py | 30.00% | 7 Missing ⚠️ |
| label_studio/io_storages/redis/models.py | 62.50% | 6 Missing ⚠️ |
| ..._studio/io_storages/tests/test_multitask_import.py | 94.00% | 3 Missing ⚠️ |
| label_studio/io_storages/azure_blob/models.py | 77.77% | 2 Missing ⚠️ |
| label_studio/io_storages/base_models.py | 80.00% | 2 Missing ⚠️ |
| label_studio/io_storages/gcs/models.py | 81.81% | 2 Missing ⚠️ |
| label_studio/io_storages/s3/models.py | 86.66% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7334      +/-   ##
===========================================
+ Coverage    78.40%   78.53%   +0.12%     
===========================================
  Files          193      195       +2     
  Lines        15624    15767     +143     
===========================================
+ Hits         12250    12382     +132     
- Misses        3374     3385      +11     
Flag Coverage Δ
pytests 78.53% <86.74%> (+0.12%) ⬆️


@matt-bernstein matt-bernstein changed the title from "feat: DIA-2902: Support reading multiple tasks from each file in a cloud storage" to "feat: DIA-2092: Support reading multiple tasks from each file in a cloud storage" Apr 15, 2025
Contributor

@pakelley pakelley left a comment

I don't see anything blocking, lgtm

@matt-bernstein
Contributor Author

matt-bernstein commented Apr 29, 2025

/fm sync

Workflow run

@matt-bernstein matt-bernstein enabled auto-merge (squash) April 29, 2025 03:09
@matt-bernstein
Contributor Author

matt-bernstein commented Apr 29, 2025

/fm sync

Workflow run

@matt-bernstein matt-bernstein requested a review from hakan458 April 29, 2025 13:37
@matt-bernstein matt-bernstein merged commit ec3c97f into develop Apr 29, 2025
49 checks passed
@robot-ci-heartex robot-ci-heartex deleted the fb-dia-2092 branch April 29, 2025 16:51