[AutoMM] Add document classification pipeline #2765

cheungdaven · 2023-01-26T04:49:57Z

Description of changes:
This pull request adds a document classification pipeline that classifies scanned document images into categories. Specifically:
(1) documents are represented as images;
(2) an OCR pipeline extracts their text and layout information;
(3) document foundation models (document transformers) such as LayoutLM and LayoutLMv* serve as the backbone, which can be fine-tuned on document classification datasets.
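As an illustration of the layout information in step (2): the Hugging Face LayoutLM-family models expect each word's bounding box on a 0-1000 grid relative to the page size. The sketch below shows that rescaling on hand-written OCR output. The function name `normalize_box` and the sample words are hypothetical, and the PR does not state that DocumentProcessor performs the step in exactly this way.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 grid."""
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so that boxes slightly outside the page stay in range.
    return [min(max(v, 0), 1000) for v in scaled]


# Hypothetical OCR output: (word, pixel box) pairs for an 800x1000 page.
ocr_words = [("Invoice", (50, 40, 210, 80)), ("Total", (600, 900, 700, 940))]
page_width, page_height = 800, 1000

normalized = [
    (word, normalize_box(box, page_width, page_height))
    for word, box in ocr_words
]
```

The words and their normalized boxes would then be tokenized together and passed to the backbone.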

We added:

DocumentProcessor: handles document processing, e.g., OCR and the generation of text, layout, and image features.
DocumentTransformer: a classification wrapper for document foundation models. Text-focused models are also supported.
A tutorial and a unit test for document classification.
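A minimal sketch of how the pipeline might be driven through AutoMM's `MultiModalPredictor`. The hyperparameter key `model.document_transformer.checkpoint_name`, the checkpoint name, and the default column name `label` are assumptions not confirmed by this PR description, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def build_fit_kwargs(
    checkpoint: str = "microsoft/layoutlm-base-uncased",
    time_limit: Optional[int] = 120,
) -> Dict[str, Any]:
    """Assemble keyword arguments for MultiModalPredictor.fit.

    The hyperparameter key below is an assumption about how the
    document transformer backbone is selected.
    """
    return {
        "hyperparameters": {
            "model.document_transformer.checkpoint_name": checkpoint,
        },
        "time_limit": time_limit,
    }


def train_document_classifier(train_df, label_column: str = "label", **overrides):
    """Fit a document classifier on a DataFrame of image paths and labels."""
    # Imported lazily so the helper above works without AutoGluon installed.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label_column)
    predictor.fit(train_data=train_df, **{**build_fit_kwargs(), **overrides})
    return predictor
```

Here `train_df` is assumed to be a pandas DataFrame with one column of document image paths and one label column; `predictor.predict(test_df)` would then return the predicted categories.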

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2023-01-26T08:38:09Z

Job PR-2765-05acc33 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/05acc33/index.html

github-actions · 2023-01-26T22:56:02Z

Job PR-2765-dd29319 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/dd29319/index.html

github-actions · 2023-01-27T04:03:24Z

Job PR-2765-cb50221 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/cb50221/index.html

bryanyzhu

Great feature and tutorial, thanks.

docs/tutorials/multimodal/multimodal_prediction/document_classification.md

multimodal/src/autogluon/multimodal/data/preprocess_dataframe.py

multimodal/src/autogluon/multimodal/data/process_document.py

multimodal/src/autogluon/multimodal/data/infer_types.py

multimodal/src/autogluon/multimodal/constants.py

multimodal/src/autogluon/multimodal/data/process_document.py

docs/tutorials/multimodal/multimodal_prediction/document_classification.md

github-actions · 2023-01-30T20:53:23Z

Job PR-2765-be3db78 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/be3db78/index.html

multimodal/tests/unittests/others/test_doc_classification.py

github-actions · 2023-01-31T23:12:01Z

Job PR-2765-7c87a17 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/7c87a17/index.html

multimodal/src/autogluon/multimodal/data/infer_types.py

github-actions · 2023-02-01T01:02:54Z

Job PR-2765-db34ed3 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/db34ed3/index.html

github-actions · 2023-02-01T01:17:30Z

Job PR-2765-d095b6a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/d095b6a/index.html

zhiqiangdon

LGTM! Awesome new feature! We may need to sync up with Zihan on the OCR design.

sxjscience

LGTM.

github-actions · 2023-02-01T21:26:15Z

Job PR-2765-3336abc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/3336abc/index.html

github-actions · 2023-02-01T23:39:51Z

Job PR-2765-f154a60 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/f154a60/index.html

github-actions · 2023-02-02T00:03:30Z

Job PR-2765-598a05d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/598a05d/index.html

github-actions · 2023-02-02T00:30:34Z

Job PR-2765-e1b6c11 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/e1b6c11/index.html

FANGAreNotGnu

LGTM with a few minor comments.

multimodal/src/autogluon/multimodal/data/infer_types.py

multimodal/src/autogluon/multimodal/data/utils.py

github-actions · 2023-02-02T00:58:27Z

Job PR-2765-1d97c17 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/1d97c17/index.html

github-actions · 2023-02-02T01:56:10Z

Job PR-2765-fe711d4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/fe711d4/index.html

github-actions · 2023-02-02T02:43:34Z

Job PR-2765-e6343bc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/e6343bc/index.html

github-actions · 2023-02-02T02:55:28Z

Job PR-2765-c46d7cf is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2765/c46d7cf/index.html

sxjscience · 2023-02-02T05:49:54Z

@cheungdaven Feel free to merge when you think it's ready.

Ubuntu and others added 6 commits January 25, 2023 23:02

Add document classification pipeline

937fbd2

reuse image processing functions

e4109d4

add unit test

33b3b52

Update setup.py

2847a18

fix import error

50fa16d

fic ci issue

05acc33

update document processor

dd29319

Ubuntu and others added 2 commits January 27, 2023 01:23

add document classification tutorial

3dfbe9f

Update test_doc_classification.py

cb50221

cheungdaven requested review from sxjscience and zhiqiangdon January 27, 2023 03:30

cheungdaven changed the title ~~[WIP] Add document classification pipeline~~ [AutoMM] Add document classification pipeline Jan 27, 2023

cheungdaven requested review from bryanyzhu, FANGAreNotGnu, suzhoum, yongxinw and Harry-zzh January 27, 2023 04:05

bryanyzhu reviewed Jan 27, 2023

zhiqiangdon reviewed Jan 28, 2023

multimodal/src/autogluon/multimodal/data/infer_types.py (outdated, resolved)

zhiqiangdon reviewed Jan 28, 2023

multimodal/src/autogluon/multimodal/constants.py (resolved)

zhiqiangdon reviewed Jan 28, 2023

multimodal/src/autogluon/multimodal/data/process_document.py (outdated, resolved)

Harry-zzh reviewed Jan 28, 2023

docs/tutorials/multimodal/multimodal_prediction/document_classification.md (outdated, resolved)

Merge branch 'master' into doc_submit

be3db78

zhiqiangdon reviewed Jan 31, 2023

multimodal/tests/unittests/others/test_doc_classification.py (outdated, resolved)

address feedback

7c87a17

zhiqiangdon reviewed Feb 1, 2023

multimodal/src/autogluon/multimodal/data/infer_types.py (outdated, resolved)

zhiqiangdon reviewed Feb 1, 2023

multimodal/src/autogluon/multimodal/data/infer_types.py (outdated, resolved)

Ubuntu and others added 3 commits February 1, 2023 18:50

add document_image

dfab260

remove document from allowable_label_types

91dedec

Merge branch 'autogluon:master' into doc_submit

02fc323

zhiqiangdon approved these changes Feb 1, 2023

sxjscience approved these changes Feb 1, 2023

Merge branch 'autogluon:master' into doc_submit

3336abc

Ubuntu and others added 3 commits February 1, 2023 22:09

update index.rst

f154a60

Merge branch 'autogluon:master' into doc_submit

e67cf49

update model list

598a05d

cheungdaven added the "model list checked" label (You have updated the model list after modifying multimodal unit tests/docs) Feb 1, 2023

Ubuntu added 2 commits February 1, 2023 22:59

update model list

e1b6c11

disable onnix support for the moment

1d97c17

add document folder to workflow

fe711d4

FANGAreNotGnu approved these changes Feb 2, 2023

multimodal/src/autogluon/multimodal/data/infer_types.py (resolved)

multimodal/src/autogluon/multimodal/data/utils.py (outdated, resolved)

multimodal/src/autogluon/multimodal/data/utils.py (outdated, resolved)

cheungdaven and others added 2 commits February 1, 2023 17:20

Merge branch 'master' into doc_submit

e6343bc

address feedback

c46d7cf

cheungdaven merged commit f3719e1 into autogluon:master Feb 2, 2023
