How to train and annotate on custom dataset #12

harsh2ai · 2022-08-05T09:11:00Z

@gwkrsrch
It's a great project, but I have a couple of questions on how to annotate my custom dataset.
I have 10K images with text on them, and I want to extract different categories from them, such as price, object count, product name, and product description. Is there any tool to do so? If not, how can it be done?

VictorAtPL · 2022-08-06T12:10:15Z

hey @harsh2ai

For my dataset, I used Label Studio to annotate the training, validation, and test data: https://labelstud.io/templates/optical_character_recognition.html

Based on the output of Label Studio, I had to prepare the JSON expected by donut:

{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
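As a rough illustration of the line format above, here is a minimal Python sketch that builds one such line per image. The helper names (`make_metadata_line`, `write_metadata`), the output path, and the example field names are assumptions made for illustration; they are not part of donut or Label Studio. The point to notice is that `ground_truth` is itself a JSON string nested inside the outer JSON object, so it is serialized twice.

```python
import json


def make_metadata_line(file_name, gt_parse):
    """Build one JSON line in the shape shown above (hypothetical helper)."""
    # Inner object: serialized first, because "ground_truth" holds a string.
    ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
    # Outer object: json.dumps escapes the inner quotes automatically.
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth},
        ensure_ascii=False,
    )


def write_metadata(records, out_path):
    """Write one line per (file_name, gt_parse) pair (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for file_name, gt_parse in records:
            f.write(make_metadata_line(file_name, gt_parse) + "\n")


# Illustrative record; the field names are taken from the question,
# the values are made up.
example_line = make_metadata_line(
    "image_0.png",
    {"product_name": "Example tea", "price": "4.99"},
)
print(example_line)
```

Converting a Label Studio export into the `gt_parse` dictionaries depends on the labeling template used, so that step is left out here.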

gwkrsrch · 2022-08-13T07:29:36Z

Hi, many thanks to @VictorAtPL for the comments.
The tool introduced above seems suitable for the annotation. On the other hand, since donut can be trained without bounding box information, labeling with a simpler tool would also be an option. For example, in some simple IE tasks, it would be fine to create the target ground-truth JSON directly with a text editor. Hope this helps :)
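When the ground-truth file is written by hand, a quick sanity check can catch quoting mistakes before training. The sketch below assumes the one-JSON-object-per-line format quoted earlier in this thread; `check_metadata_lines` is a hypothetical helper, not a donut function.

```python
import json


def check_metadata_lines(lines):
    """Return (line_number, problem) pairs for lines that do not parse
    into the expected shape (hypothetical helper)."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
            # "ground_truth" must be a string that is itself valid JSON.
            ground_truth = json.loads(record["ground_truth"])
        except (json.JSONDecodeError, KeyError, TypeError) as error:
            problems.append((number, f"{type(error).__name__}: {error}"))
            continue
        if "file_name" not in record:
            problems.append((number, "missing file_name"))
        if not isinstance(ground_truth, dict) or "gt_parse" not in ground_truth:
            problems.append((number, "missing gt_parse"))
    return problems


# Usage with two hand-written lines: the second lacks "ground_truth".
sample = [
    '{"file_name": "a.png", "ground_truth": "{\\"gt_parse\\": {\\"price\\": \\"1\\"}}"}',
    '{"file_name": "b.png"}',
]
for number, problem in check_metadata_lines(sample):
    print(f"line {number}: {problem}")
```

To check a file on disk, the same function can be given an open file object, since iterating over it yields lines.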

gwkrsrch closed this as completed Aug 13, 2022

qustions mentioned this issue Sep 7, 2022

How to train and annotate on custom dataset #46

Open
