Issue syncing tasks from S3 #3150

Open
Ben-Epstein opened this issue Oct 26, 2022 · 4 comments
Labels: import, problem (bug or something isn't working), storages (External / Cloud storage connections)

Ben-Epstein commented Oct 26, 2022

Describe the bug

To Reproduce
Working with text classification using a single CSV file with 2 columns: text and label

I created a project with my template

project = ls.start_project(title="new_project", label_config=label_config)

then connected the storage to S3

project.connect_s3_import_storage(bucket="<REDACTED>", prefix="<REDACTED>", regex_filter=".*train.csv", use_blob_urls=False, aws_access_key_id="<REDACTED>", aws_secret_access_key="<REDACTED>")

then synced storage

project.sync_storage("s3", 8)

Expected behavior
I expected my CSV file to be imported as multiple tasks in Label Studio, one per row.

Current behavior
Instead, I see a single task in Label Studio whose content is the presigned URL of the file. I set up CORS as instructed.

Screenshots
image

Environment (please complete the following information):

  • OS: macOS
  • Label Studio Version: 1.6.0

Additional context
I also tried a JSON file instead of a CSV file, with the same result. I might be missing something about how to import text tasks, but I can't find anything else in the docs. Thanks!

UPDATE:
If I set valueType="url" in the label config, the data is imported, but all rows of the CSV come in as a single task instead of one task per row.
How can I have the file parsed into individual tasks?

as CSV
image

as JSON
image

@Ben-Epstein Ben-Epstein changed the title Issue syncing tasks to S3 Issue syncing tasks from S3 Oct 26, 2022
@makseq makseq added the import, problem, and storages labels Nov 3, 2022

makseq commented Nov 3, 2022

Unfortunately, this is not supported yet. The only way is to use multiple CSV files, one per task, together with valueType="url".

Related:
#3061
#2662
#1960
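
The one-file-per-task workaround described above could be prepared locally with something like the following. This is only a minimal sketch, not an official Label Studio tool: the function name `split_csv_into_task_files`, the `task_000000.json` naming scheme, and the `{"data": {...}}` wrapper around each row are assumptions made for illustration. The resulting files would still have to be uploaded to the bucket prefix separately.

```python
import csv
import json
from pathlib import Path


def split_csv_into_task_files(csv_path, out_dir):
    """Write one JSON file per CSV row and return the list of paths written.

    Hypothetical helper: the file naming and the {"data": row} layout are
    assumptions for illustration, not something confirmed in this thread.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the header row (e.g. text,label) as the keys.
        for i, row in enumerate(csv.DictReader(f)):
            task_path = out / f"task_{i:06d}.json"
            task_path.write_text(json.dumps({"data": row}), encoding="utf-8")
            written.append(task_path)
    return written
```

For example, `split_csv_into_task_files("train.csv", "tasks/")` would produce one file per row of the two-column CSV from the report; the storage's `regex_filter` would then need to match those files instead of `.*train.csv`.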

oleksandr-svystun commented

Hi @makseq,
are there any updates regarding this issue?


makseq commented Mar 16, 2023

@Polarnatt No updates. Import storages only allow one JSON/CSV file per task.

infinity811 commented

Hi @makseq, is there any further update? I am still facing this issue.
