multimodal label-studio export reader & doc #2615

MountPOTATO · 2022-12-30T17:01:13Z

This tool helps users transform annotation data exported from the data labeling platform Label-Studio (https://labelstud.io/) into a pandas DataFrame that can be used as AutoGluon multimodal input. This lets users build a Label-Studio to AutoGluon workflow: label the data in Label-Studio, then feed it to AutoGluon with a few lines of code to adjust the data.
So far 3 task templates are available: image classification (image), named entity recognition (text), and user-customized templates. Other templates are WIP.
Documentation for this feature is included in this PR.
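To illustrate the kind of transformation described above, here is a minimal sketch and not the PR's implementation. It assumes a JSON-MIN-style export in which each record is a flat dict; the records, the `export_to_dataframe` helper, and the column names `image` and `choice` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical JSON-MIN-style records (one flat dict per labeled task).
export = [
    {"id": 1, "image": "/data/upload/1/cat.jpg", "choice": "cat"},
    {"id": 2, "image": "/data/upload/1/dog.jpg", "choice": "dog"},
]


def export_to_dataframe(records, data_columns, label_columns):
    """Keep only the requested data and label keys of each record.

    Hypothetical helper: the actual LabelStudioReader API may differ.
    """
    if not records:
        raise ValueError("empty export file")
    columns = list(data_columns) + list(label_columns)
    # Missing keys become None so that every row has the same columns.
    rows = [{col: rec.get(col) for col in columns} for rec in records]
    return pd.DataFrame(rows, columns=columns)


df = export_to_dataframe(export, data_columns=["image"], label_columns=["choice"])
```

The resulting DataFrame has one row per labeled task, with the data column first and the label column last.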

Description of changes:

Add from_labelstudio.py to autogluon/multimodal/src/autogluon/multimodal/utils
Add a documentation folder label-studio-export-reader to autogluon/examples/automm

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2022-12-31T12:09:27Z

Job PR-2615-51ee391 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2615/51ee391/index.html

bryanyzhu

Thanks for the PR, left a few comments.

@sxjscience Please let us know what else is missing from the current PR, thank you.

bryanyzhu · 2023-01-03T03:46:23Z

examples/automm/label-studio-export-reader/Label-Studio Export file reader.md

@@ -0,0 +1,173 @@
+# Label-Studio Export file reader
+


Please use _ in file name, e.g., LabelStudio_export_file_reader.md

bryanyzhu · 2023-01-03T03:49:48Z

multimodal/src/autogluon/multimodal/utils/from_labelstudio.py

+
+    params:
+    - path: str: the path of the exported file
+    - data_columns: list[str]: the key/column names of the data


Can you give a specific example of what data_columns and label_columns look like?

I will document the specific usage of these two params in the docs and reference the doc link in the code in the next commit. Is that acceptable, or should I also provide examples as a comment in the source code?

bryanyzhu · 2023-01-03T03:51:08Z

multimodal/src/autogluon/multimodal/utils/from_labelstudio.py

+        if len(label_studio_json) == 0:
+            raise ValueError("ERROR: empty export file")
+
+        if "annotations" in label_studio_json[0]:


Why label_studio_json[0]? How many elements are there in label_studio_json? Usually we want to avoid hard coded stuff, like index 0, unless there is good reason.

When exporting annotations from Label-Studio, "JSON" and "JSON-MIN" are two different options that share the file extension ".json"; both contain a list of annotation dicts. (Examples can be seen at https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks and https://labelstud.io/guide/export.html#JSON-MIN; basically, JSON-MIN is a simplified version of JSON.)
This line checks whether the export file is "JSON" or "JSON-MIN". Currently I use a simple check on whether one of the elements (an annotation) has the key "annotations", which exists in JSON but not in JSON-MIN. I'm still looking for a better way to distinguish them, and any suggestions are welcome.
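One way to avoid relying on index 0 is to inspect every record. The following is a minimal sketch, assuming (as described above) that the key "annotations" appears in every "JSON" record and in no "JSON-MIN" record; `detect_export_format` is a hypothetical helper and is not part of this PR.

```python
def detect_export_format(records):
    """Return "JSON" or "JSON-MIN" for a parsed Label-Studio export.

    Hypothetical helper: checks all records instead of only the first one.
    """
    if not isinstance(records, list) or not records:
        raise ValueError("empty or malformed export file")
    has_annotations = ["annotations" in rec for rec in records]
    if all(has_annotations):
        return "JSON"
    if not any(has_annotations):
        return "JSON-MIN"
    # Some records have the key and some do not: refuse to guess.
    raise ValueError("export file mixes JSON and JSON-MIN style records")
```

Raising on a mixed file is a design choice of this sketch; it surfaces a corrupted or hand-edited export early instead of silently picking one format.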

bryanyzhu · 2023-01-03T03:55:04Z

multimodal/src/autogluon/multimodal/utils/from_labelstudio.py

+
+        else:
+            split_lst = s.split("/")
+            if split_lst[2] == "local-files":


Again, how many elements are there in split_lst and what does split_lst[2] represent?
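For context, the indexing question can be sidestepped by parsing the string as a URL. This is a sketch only, assuming the string has the form `/data/local-files/?d=<relative path>` (in which case `s.split("/")` yields `["", "data", "local-files", ...]`, so index 2 is the "local-files" segment); `local_file_path` is a hypothetical helper, and the example paths are made up.

```python
from urllib.parse import parse_qs, urlparse


def local_file_path(url):
    """Return the value of the `d` query parameter for a local-files URL.

    Hypothetical helper: returns None when the URL is not of the assumed
    /data/local-files/?d=<relative path> form.
    """
    parsed = urlparse(url)
    # Drop empty segments so leading and trailing slashes do not shift indices.
    parts = [p for p in parsed.path.split("/") if p]
    if parts[:2] == ["data", "local-files"]:
        values = parse_qs(parsed.query).get("d")
        return values[0] if values else None
    return None
```

Comparing named path segments rather than a fixed index keeps the check valid whether or not the string carries a scheme and host prefix.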

bryanyzhu · 2023-01-03T03:57:43Z

examples/automm/label-studio-export-reader/Label-Studio Export file reader.md

+from autogluon.multimodal.utils import LabelStudioReader
+
+# initialize LabelStudioReader with default localhost host
+ls=LabelStudioReader() 


Need extra whitespace around the operator, e.g., ls = LabelStudioReader(). There are many instances like this below; please add whitespace accordingly.

sxjscience · 2023-01-03T04:07:41Z

multimodal/src/autogluon/multimodal/utils/from_labelstudio.py

+
+
+"""
+Usage:


Try to put the docstring inside the "LabelStudioReader" class.

multimodal/src/autogluon/multimodal/utils/from_labelstudio.py

sxjscience · 2023-01-04T03:56:37Z

Actually, should we rename utils/from_labelstudio.py to utils/labelstudio.py ?

github-actions · 2023-01-05T07:27:56Z

Job PR-2615-f886e8e is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2615/f886e8e/index.html

bryanyzhu

LGTM, we can fill in more details later. @zhiqiangdon @cheungdaven @FANGAreNotGnu @suzhoum @yongxinw Please help review this PR. It enables AG to train on data labeled with LabelStudio, which is quite a useful feature.

sxjscience

LGTM

zhiqiangdon

LGTM. Awesome feature!

MountPOTATO added 6 commits December 31, 2022 00:46

feat: add label-studio export reader & doc

9768120

fix: PEP8 reformat

b91d9b8

fix: lint_check suggestions correct

1547a0c

fix: other small lint issues

564edf3

fix: sort packages import order

7eeebab

Update from_labelstudio.py

51ee391

bryanyzhu reviewed Jan 3, 2023

sxjscience reviewed Jan 3, 2023

MountPOTATO added 3 commits January 5, 2023 13:44

fix: naming and code details

7a8cd23

fix: prefix bug

8d246e8

Update label_studio.py

f886e8e

bryanyzhu approved these changes Jan 9, 2023

bryanyzhu requested review from zhiqiangdon, cheungdaven, FANGAreNotGnu, suzhoum and yongxinw January 9, 2023 04:15

sxjscience approved these changes Jan 9, 2023

zhiqiangdon approved these changes Jan 9, 2023

sxjscience merged commit 21d3ad7 into autogluon:master Jan 12, 2023
