feat: process images #11

qued · 2023-01-11T04:42:47Z

Adds the ability to process images as single-page documents. Image processing follows mostly the same path as PDF processing, with two differences: it uses a null layout (an image has no PDF layout to provide), and it does not filter layout text blocks by bounding box to discover text (there are no layout text blocks to filter).
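To illustrate the null-layout branch described above, here is a minimal, self-contained sketch. It is not the code from this PR: `TextBlock`, `is_inside`, `extract_text`, and the `ocr` callback are hypothetical names, and falling back to OCR when the layout is null is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (x1, y1, x2, y2) in page coordinates
BBox = Tuple[float, float, float, float]


@dataclass
class TextBlock:
    """Hypothetical stand-in for a text block read from a PDF's embedded layout."""

    bbox: BBox
    text: str


def is_inside(inner: BBox, outer: BBox) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (
        inner[0] >= outer[0]
        and inner[1] >= outer[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def extract_text(
    region: BBox,
    layout: Optional[List[TextBlock]],
    ocr: Callable[[BBox], str],
) -> str:
    """Get the text for a detected region of a page.

    PDF case: `layout` holds text blocks, so keep those inside the region.
    Image case: `layout` is None, so there is nothing to filter and the
    region is handed to the (assumed) OCR callback instead.
    """
    if layout is None:
        return ocr(region)
    return " ".join(block.text for block in layout if is_inside(block.bbox, region))
```

With a layout, only blocks that fall inside the region contribute text; with `layout=None`, the bounding-box filter is skipped entirely, which mirrors the behavior the description gives for images.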

Testing:

Run this code in the unstructured-inference environment:

from unstructured_inference.inference.layout import DocumentLayout

doc = DocumentLayout.from_image_file('sample-docs/loremipsum.png')
print(doc.pages[0].elements)

MthwRobinson

LGTM! Longer term, curious if there are cases where it may make sense to go straight to PaddleOCR instead to save on compute time. This looks great for the detectron2 case.

test_unstructured_inference/inference/test_layout.py

qued added 5 commits January 10, 2023 16:39

process image to single page document

cebe2f9

Add tests

c576360

Add sample image

9156391

Clean up test

eef8a31

Add test for from_image_file

ceb8770

qued requested a review from MthwRobinson January 11, 2023 04:42

Update changelog

34a7799

MthwRobinson approved these changes Jan 11, 2023


qued added 2 commits January 11, 2023 08:55

Remove commented out code

75e5b16

Remove unused code path

84e4d81

MthwRobinson reviewed Jan 11, 2023


test_unstructured_inference/inference/test_layout.py (outdated, resolved)

Add test for jpg and sample jpg

2e82987

qued merged commit b8238fe into main Jan 11, 2023

qued deleted the feat/process-images branch January 11, 2023 20:27
