[AutoMM] Add document classification pipeline #2765
Great feature and tutorial, thanks.
docs/tutorials/multimodal/multimodal_prediction/document_classification.md
multimodal/src/autogluon/multimodal/data/preprocess_dataframe.py
docs/tutorials/multimodal/multimodal_prediction/document_classification.md
LGTM! Awesome new feature! We may need to sync up with Zihan on the OCR design.
LGTM.
LGTM with a few minor comments.
@cheungdaven Feel free to merge when you think it's ready.
Description of changes:
This pull request adds a document classification pipeline that classifies scanned document images into categories. Specifically:
(1) documents are represented as images;
(2) an OCR pipeline extracts their text and layout information;
(3) document foundation models (document transformers) such as LayoutLM and LayoutLMv* serve as the backbone, which can be fine-tuned on document classification datasets.
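To make the hand-off between the OCR step and the document transformer concrete, here is a minimal sketch of how OCR output might be turned into model inputs. It is not the code from this PR: `OcrWord`, `normalize_box`, and `build_model_inputs` are hypothetical names, and the OCR result is hard-coded. The only convention assumed from outside this PR is that LayoutLM-style models expect bounding boxes scaled to a 0-1000 grid.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Pixel-space bounding box: (x0, y0, x1, y1).
Box = Tuple[int, int, int, int]


@dataclass
class OcrWord:
    """One word as an OCR engine might report it (hypothetical structure)."""
    text: str
    box: Box


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def build_model_inputs(words: List[OcrWord], width: int, height: int) -> Dict[str, list]:
    """Pair each recognized word with its normalized layout box."""
    return {
        "words": [w.text for w in words],
        "boxes": [normalize_box(w.box, width, height) for w in words],
    }


# Stand-in for step (2): hard-coded OCR output for a 200x100 pixel page.
ocr_words = [
    OcrWord("Invoice", (10, 10, 90, 30)),
    OcrWord("Total", (20, 60, 70, 80)),
]
inputs = build_model_inputs(ocr_words, width=200, height=100)
print(inputs)
```

In a real pipeline the words and boxes would come from an OCR engine, and a tokenizer would then expand each word-level box to its sub-word tokens before the backbone in step (3) consumes them.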
We added
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.