multimodal label-studio export reader & doc #2615
Job PR-2615-51ee391 is done.
Thanks for the PR, left a few comments.
@sxjscience Please let us know what else is missing from the current PR. Thank you.
@@ -0,0 +1,173 @@
# Label-Studio Export file reader
Please use `_` in the file name, e.g., `LabelStudio_export_file_reader.md`.
params:
- path: str: the path of the exported file
- data_columns: list[str]: the key/column names of the data
Can you give a specific example of what `data_columns` and `label_columns` look like?
I will document the specific usage of these two params in the docs and add the doc link in the code in the next commit. Is that acceptable, or should I also provide examples as a comment in the source code?
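For illustration, here is a minimal sketch of what the two params might look like for a JSON-MIN image-classification export. The record keys (`"image"`, `"choice"`, `"annotator"`) depend on the labeling config and are assumptions, and `select_columns` is a hypothetical helper, not the reader's actual code.

```python
from typing import Any, Dict, List


def select_columns(
    records: List[Dict[str, Any]],
    data_columns: List[str],
    label_columns: List[str],
) -> List[Dict[str, Any]]:
    """Keep only the data and label keys of each exported record (hypothetical helper)."""
    wanted = data_columns + label_columns
    rows = []
    for record in records:
        missing = [col for col in wanted if col not in record]
        if missing:
            raise KeyError(f"record {record.get('id')} lacks columns {missing}")
        rows.append({col: record[col] for col in wanted})
    return rows


# Assumed JSON-MIN records: "image" holds the file reference, "choice" the class label.
records = [
    {"id": 1, "image": "/data/upload/1/cat.jpg", "choice": "Cat", "annotator": 1},
    {"id": 2, "image": "/data/upload/1/dog.jpg", "choice": "Dog", "annotator": 1},
]
rows = select_columns(records, data_columns=["image"], label_columns=["choice"])
```

Passing `rows` to `pandas.DataFrame(rows)` would then give a frame with one data column and one label column.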
if len(label_studio_json) == 0:
    raise ValueError("ERROR: empty export file")
if "annotations" in label_studio_json[0]:
Why `label_studio_json[0]`? How many elements are there in `label_studio_json`? Usually we want to avoid hard-coded values like index 0 unless there is a good reason.
When exporting annotations from Label-Studio, "JSON" and "JSON-MIN" are two different options that share the same file extension (".json"), and both produce a list of annotation dicts. Examples of each are at https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks and https://labelstud.io/guide/export.html#JSON-MIN; JSON-MIN is essentially a simplified version of JSON.
This line checks whether the export file is in "JSON" or "JSON-MIN" format. Currently I use a simple check on whether the first element (an annotation) has the key "annotations", which exists in JSON but not in JSON-MIN. I'm still looking for a better way to distinguish them, and any suggestions are welcome.
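One way to avoid depending on index 0 is to check every element and reject files that mix both shapes. The sketch below relies only on the rule stated above (full JSON has an "annotations" key, JSON-MIN does not); `detect_export_format` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def detect_export_format(tasks: List[Dict[str, Any]]) -> str:
    """Classify a loaded Label-Studio export as "JSON" or "JSON-MIN" (hypothetical helper)."""
    if len(tasks) == 0:
        raise ValueError("ERROR: empty export file")
    # Assumed rule: only the full JSON export nests results under "annotations".
    flags = ["annotations" in task for task in tasks]
    if all(flags):
        return "JSON"
    if not any(flags):
        return "JSON-MIN"
    raise ValueError("ERROR: export file mixes JSON and JSON-MIN entries")


full_export = json.loads('[{"id": 1, "annotations": [], "data": {"image": "a.jpg"}}]')
min_export = json.loads('[{"id": 1, "image": "a.jpg", "choice": "Cat"}]')
print(detect_export_format(full_export))  # JSON
print(detect_export_format(min_export))  # JSON-MIN
```

Scanning the whole list costs one pass over the export, which is negligible next to loading the file, and it turns a silently wrong guess into an explicit error.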
else:
    split_lst = s.split("/")
    if split_lst[2] == "local-files":
Again, how many elements are there in `split_lst`, and what does `split_lst[2]` represent?
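Suppose Label-Studio references files by URLs such as `/data/local-files/?d=<relative path>` (local storage) and `/data/upload/<project id>/<file name>` (uploads). Then `s.split("/")` on a local-storage URL yields `['', 'data', 'local-files', '?d=...']`, so index 2 is the storage type. The sketch below avoids the magic index by parsing the URL with `urllib.parse`. The URL layouts are assumptions, and `resolve_local_path` and `base_dir` are hypothetical names.

```python
import os
from urllib.parse import parse_qs, urlparse


def resolve_local_path(s: str, base_dir: str = "") -> str:
    """Map a Label-Studio file URL to a local path (hypothetical helper)."""
    parsed = urlparse(s)
    # "/data/local-files/?d=x" -> ["data", "local-files"]; empty segments are dropped.
    parts = [p for p in parsed.path.split("/") if p]
    if parts[:2] == ["data", "local-files"]:
        # Assumed: local-storage files carry their relative path in the "d" query parameter.
        rel_path = parse_qs(parsed.query).get("d", [""])[0]
        return os.path.join(base_dir, rel_path)
    if parts[:2] == ["data", "upload"]:
        # Assumed layout for uploaded files: /data/upload/<project id>/<file name>.
        return os.path.join(base_dir, *parts[2:])
    # Anything else (e.g. an http(s) URL) is returned unchanged.
    return s
```

Comparing named path segments makes the intent readable and keeps short or unexpected URLs from raising `IndexError`.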
from autogluon.multimodal.utils import LabelStudioReader
# initialize LabelStudioReader with default localhost host
ls=LabelStudioReader()
Needs extra whitespace, e.g., `ls = LabelStudioReader()`. There are many similar instances below; please add whitespace accordingly.
"""
Usage:
Try to put the docstring inside the `LabelStudioReader` class.
Actually, should we rename
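A minimal skeleton of the docstring placement the reviewer suggests, with the usage notes moved into the class docstring, could look like this. The `host` parameter and its `http://localhost:8080` default (Label-Studio's usual local address) are assumptions inferred from the "default localhost host" comment, not the PR's actual signature.

```python
class LabelStudioReader:
    """Read Label-Studio export files and prepare AutoGluon MultiModal input.

    Usage:
        from autogluon.multimodal.utils import LabelStudioReader

        # initialize LabelStudioReader with default localhost host
        ls = LabelStudioReader()
    """

    def __init__(self, host: str = "http://localhost:8080") -> None:
        # Assumed default: a Label-Studio server running locally on port 8080.
        self.host = host


print(LabelStudioReader.__doc__.splitlines()[0])
```

With the docstring on the class, `help(LabelStudioReader)` and generated API docs pick up the usage notes automatically.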
Job PR-2615-f886e8e is done.
LGTM, we can fill in more details later. @zhiqiangdon @cheungdaven @FANGAreNotGnu @suzhoum @yongxinw Please help review this PR. It enables AG to train on data labeled with Label-Studio, which is quite a useful feature.
LGTM
LGTM. Awesome feature!
This tool helps users transform annotation data exported from the data labeling platform Label-Studio (https://labelstud.io/) into a pandas DataFrame for AutoGluon MultiModal input. This enables a Label-Studio to AutoGluon workflow: label the data in Label-Studio, then feed it to AutoGluon with a few lines of code to adjust the data.
So far, three task templates are supported: image classification (image), named entity recognition (text), and user-customized templates. Other templates are a work in progress.
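As a rough illustration of the "few lines of code to adjust the data" step for the NER template, the sketch below turns one JSON-MIN NER record into a row whose entities are a JSON string of `entity_group`/`start`/`end` dicts. Several things are assumptions: the record shape (a `"label"` list whose spans carry `"labels"`), the target format, and the column names `text_snippet`/`entity_annotations`. `ner_record_to_row` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def ner_record_to_row(
    record: Dict[str, Any], text_key: str = "text", label_key: str = "label"
) -> Dict[str, str]:
    """Convert one assumed JSON-MIN NER record into a text/entities row (hypothetical helper)."""
    entities = [
        # Assumed: each labeled span lists its tag(s) under "labels"; keep the first one.
        {"entity_group": span["labels"][0], "start": span["start"], "end": span["end"]}
        for span in record.get(label_key, [])
    ]
    # Serialize the entity list as a JSON string, one string per row.
    return {"text_snippet": record[text_key], "entity_annotations": json.dumps(entities)}


record = {
    "id": 7,
    "text": "Alice lives in Paris",
    "label": [
        {"start": 0, "end": 5, "text": "Alice", "labels": ["PER"]},
        {"start": 15, "end": 20, "text": "Paris", "labels": ["LOC"]},
    ],
}
row = ner_record_to_row(record)
```

Collecting such rows for every record and passing them to `pandas.DataFrame` would yield the two-column training frame.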
Documentation for this feature is included in this PR.
Description of changes:
- `from_labelstudio.py` to `autogluon/multimodal/src/autogluon/multimodal/utils`
- `label-studio-export-reader` to `autogluon/examples/automm`

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.