Feat: improve image extraction by supporting all types of image elements detected by detection models #286

christinestraub · 2023-11-15T19:43:09Z

Closes #285.

Summary

support extracting elements with types Picture and Figure
add a class ElementType for the element type constants and use the constants to replace element type strings
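
To illustrate the two changes above, here is a minimal sketch of a constants class and an image-type filter. Only the name ElementType and the type names Picture and Figure come from this PR. The other members, the IMAGE_ELEMENT_TYPES set, both helper functions, and the dict-based element shape are hypothetical and may not match the actual implementation.

```python
class ElementType:
    """String constants for element types (the member list is an assumption)."""

    IMAGE = "Image"
    PICTURE = "Picture"
    FIGURE = "Figure"
    TEXT = "Text"


# Hypothetical grouping of every type that should be treated as an image.
IMAGE_ELEMENT_TYPES = frozenset(
    {ElementType.IMAGE, ElementType.PICTURE, ElementType.FIGURE}
)


def is_image_element(element_type: str) -> bool:
    """Return True if the detected type is one of the image-like types."""
    return element_type in IMAGE_ELEMENT_TYPES


def select_image_elements(elements: list[dict]) -> list[dict]:
    """Keep only elements whose "type" field is image-like (illustrative shape)."""
    return [el for el in elements if is_image_element(el.get("type", ""))]
```

Comparing against named constants rather than string literals means a misspelled type name raises an AttributeError instead of silently matching nothing.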

Testing

from unstructured_inference.inference.layout import DocumentLayout

# Process the sample PDF with image extraction enabled.
doc = DocumentLayout.from_file(
    filename="algebra-graph-level1-1.pdf",
    extract_images_in_pdf=True,
)


benjats07

LGTM
In the future we may explore the possibility of simply renaming Picture and Figure to Image to simplify this.

christinestraub added 4 commits November 15, 2023 11:09

feat: support extracting elements with types Picture and Figure

ab42c06

feat: add constants for the element types

7257232

chore: update changelog & version

de150ce

chore: update .gitignore

09b4af9

christinestraub requested review from badGarnet, benjats07, cragwolfe, qued and yuming-long November 15, 2023 19:47

christinestraub added 2 commits November 16, 2023 13:25

Merge branch 'main' into feat/285-improve-image-extraction

e1e88f2

# Conflicts: # CHANGELOG.md

chore: update version

b251473

cragwolfe reviewed Nov 16, 2023


CHANGELOG.md (review comment outdated and resolved)

chore: update dev version to non-dev release

c8683d4

benjats07 approved these changes Nov 16, 2023


cragwolfe merged commit f35b830 into main Nov 16, 2023

cragwolfe deleted the feat/285-improve-image-extraction branch November 16, 2023 22:16

4 participants