feat: better image support #71

qued · 2023-03-22T15:10:51Z

Updated the OCR logic to be aware of image elements.

LayoutParser only deals with text objects, so this PR removes LayoutParser from the internals and replaces the components. (LayoutParser remains a dependency because of detectron2.)

Testing:

Run:

from unstructured_inference.inference.layout import DocumentLayout

doc = DocumentLayout.from_file('sample-docs/loremipsum-flat.pdf')

doc.pages[0].elements should contain elements with the text of the document.
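A minimal sketch of how the check above could be scripted. The helper name `page_texts` and the stand-in document are hypothetical, and the `.text` attribute on elements is an assumption based on the description above, not something confirmed by this PR. The stand-in lets the helper run without the library or the sample PDF installed.

```python
from types import SimpleNamespace


def page_texts(doc, page_index=0):
    """Collect the non-empty text of every element on one page.

    Assumes (not confirmed by the PR) that each element exposes its
    content through a `.text` attribute, which may be None or blank.
    """
    page = doc.pages[page_index]
    texts = []
    for element in page.elements:
        text = getattr(element, "text", None)
        if text and text.strip():
            texts.append(text.strip())
    return texts


# Hypothetical stand-in with the same shape as the object described above:
# doc.pages[0].elements is a list of objects carrying text.
fake_doc = SimpleNamespace(
    pages=[
        SimpleNamespace(
            elements=[
                SimpleNamespace(text="Lorem ipsum"),
                SimpleNamespace(text=None),  # e.g. an element with no text
                SimpleNamespace(text="  dolor sit amet "),
            ]
        )
    ]
)

texts = page_texts(fake_doc)
```

With the library installed, the same helper could be applied to the object returned by `DocumentLayout.from_file('sample-docs/loremipsum-flat.pdf')`, and a non-empty result would indicate that the page produced text elements.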

mallorih

LGTM

benjats07

LGTM

ajjimeno · 2023-03-23T10:53:34Z

LGTM!

In the function at line 158 of the file linked below, I think that if no model name is passed, get_model() will raise an exception.

https://github.com/Unstructured-IO/unstructured-inference/blob/f23a13af897d8e7348b7948bd4b237f5bd0722e6/unstructured_inference/inference/layout.py

qued added 10 commits March 20, 2023 10:56

Pull out layoutparser

e489191

Change how ocr is done

a5cbe9a

New model output type

97ba9e4

Update tests

59b6256

Add image-only pdf for testing

d0c9815

Add missing docstrings.

8a170b4

Update deps and add docstring linting

c6e4375

Merge branch 'main' into feat/better-image-support

65ed2ce

Update version

701fa04

Remove note that no longer applies

ceafc4d

qued requested review from MthwRobinson, benjats07 and mallorih March 22, 2023 15:10

mallorih approved these changes Mar 22, 2023

benjats07 approved these changes Mar 22, 2023

qued merged commit 4a52922 into main Mar 25, 2023

qued deleted the feat/better-image-support branch March 25, 2023 23:06
