Training For Document Information Extraction #36

Open
benugopal opened this issue Aug 27, 2022 · 0 comments

@benugopal

It's a great project, and I want to try out the OCR-free approach.
I have three questions related to training:

  1. We need to create ground truth for the training, test, and validation sets. Is there a tool for performing the annotations so that the input matches the training requirements?

  2. For training, I think OCR is needed to create the ground-truth data. If so, how is the text extracted during inference?

  3. I see that the ground truth must provide a dictionary hierarchy of classes. Can I use my own classes and a custom hierarchy in the ground truth? For example:
    {
      "gt_parse": {
        "Item": [
          {
            "Description": "SPGTHY BOLOGNASE",
            "Quantity": "1",
            "Price": "58,000"
          },
          {
            "Description": "SPGTHY BOLOGNASE",
            "Quantity": "1",
            "Price": "58,000"
          }
        ],
        "Total": {"value": "20"},
        "Sub_Total": {"value": "50"},
        "Number": {"value": "80"}
      }
    }
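The three questions all depend on what a ground-truth record looks like. Below is a minimal Python sketch. It assumes (this is not stated in the issue) that each training example is one line of a JSON Lines file pairing an image file name with the `gt_parse` object serialized as a string. It also assumes the model learns to emit the hierarchy as a flat sequence of start/end tags. The helper names `make_record` and `json_to_tokens`, the field names `file_name` and `ground_truth`, the file name `receipt_0001.jpg`, and the `<s_key>`/`</s_key>`/`<sep/>` tag scheme are hypothetical illustrations, not the project's confirmed API. Because the keys are plain strings, the custom classes from the example (`Item`, `Total`, `Sub_Total`, `Number`) pass through unchanged.

```python
import json

# Ground truth using the custom classes from the example above.
gt_parse = {
    "Item": [
        {"Description": "SPGTHY BOLOGNASE", "Quantity": "1", "Price": "58,000"},
        {"Description": "SPGTHY BOLOGNASE", "Quantity": "1", "Price": "58,000"},
    ],
    "Total": {"value": "20"},
    "Sub_Total": {"value": "50"},
    "Number": {"value": "80"},
}


def make_record(file_name: str, parse: dict) -> str:
    """Build one JSON Lines record (hypothetical layout): the image file name
    plus the ground truth, serialized as a JSON string under "ground_truth"."""
    ground_truth = json.dumps({"gt_parse": parse}, ensure_ascii=False)
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth}, ensure_ascii=False
    )


def json_to_tokens(obj) -> str:
    """Flatten a nested dict/list into a tag sequence (hypothetical scheme):
    each dict key wraps its value in <s_key>...</s_key>, list elements are
    joined by <sep/>, and leaf values are emitted as plain text."""
    if isinstance(obj, dict):
        return "".join(f"<s_{k}>{json_to_tokens(v)}</s_{k}>" for k, v in obj.items())
    if isinstance(obj, list):
        return "<sep/>".join(json_to_tokens(item) for item in obj)
    return str(obj)


if __name__ == "__main__":
    # One line per image would be appended to a metadata file like this.
    print(make_record("receipt_0001.jpg", gt_parse))
    # The target sequence the model would be trained to generate.
    print(json_to_tokens(gt_parse))
```

Under these assumptions, inference would need no OCR step. The model would generate a tag sequence like the one printed above directly from the image, and a parser that inverts `json_to_tokens` would rebuild the dictionary.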
    

Could you please provide some guidance?
