enhancement: allow setting image block crop padding parameter #2415
@christine Do I need to define some additional param to increase the padding?
Do the
@Coniferish To see a difference in the image, you'll need to set two environment variables
Updated the
Coniferish left a comment:
LGTM once we get that docker test passing :)
Want to check something before approval
@christinestraub, one last question: I want to make sure that this should NOT update the
I didn't expect to update the
I think the primary intent of this PR concerns the saved image artifacts, so we do not need to update the .text in this PR. It's a good callout, though, and worth tracking in a separate GitHub issue. There is a fair bit of nuance: it seems fine to extend the OCR .text, but we also want to be careful not to duplicate other text if the bbox overlaps another element. Alternatively, don't allow a bbox overlap in the first place for OCR, and only extend to min(EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD, distance_to_next_bbox).
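The overlap-avoiding alternative suggested above, capping the horizontal pad at min(EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD, distance_to_next_bbox), could look roughly like the sketch below. It is a hypothetical helper, not code from this PR or from unstructured. It assumes boxes are (x1, y1, x2, y2) tuples in pixels, and that only neighbors which overlap the target vertically and sit entirely to its left or right should limit the padding.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
BBox = Tuple[float, float, float, float]


def _overlaps_vertically(a: BBox, b: BBox) -> bool:
    """True if the two boxes share some vertical extent (same text line band)."""
    return a[1] < b[3] and b[1] < a[3]


def capped_horizontal_pad(target: BBox, others: List[BBox], max_pad: float) -> Tuple[float, float]:
    """Return (left_pad, right_pad), each limited to min(max_pad, gap to nearest neighbor).

    Neighbors that overlap the target horizontally (neither fully left nor
    fully right of it) are ignored in this sketch.
    """
    left_pad = right_pad = max_pad
    for other in others:
        if other == target or not _overlaps_vertically(target, other):
            continue
        if other[2] <= target[0]:  # neighbor lies entirely to the left
            left_pad = min(left_pad, target[0] - other[2])
        elif other[0] >= target[2]:  # neighbor lies entirely to the right
            right_pad = min(right_pad, other[0] - target[2])
    return left_pad, right_pad


if __name__ == "__main__":
    target = (100, 100, 200, 150)
    neighbors = [(50, 110, 90, 140), (215, 90, 300, 160), (0, 300, 400, 350)]
    print(capped_horizontal_pad(target, neighbors, 40))  # (10, 15)
```

Here the left neighbor is 10 px away and the right one 15 px away, so both caps fall below the 40 px maximum; the third box lies below the target and does not constrain it.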
cragwolfe left a comment:
LGTM!
Closes #2320.
Summary
In certain circumstances, adjusting the image block crop padding can improve image block extraction by preventing extracted image blocks from being clipped.
Testing
Set the environment variables EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD and EXTRACT_IMAGE_BLOCK_CROP_VERTICAL_PAD (e.g. EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD = 40, EXTRACT_IMAGE_BLOCK_CROP_VERTICAL_PAD = 20).
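To illustrate what the two variables do, the sketch below reads them from the environment and expands an image block's crop box by the horizontal padding on the left and right and the vertical padding on the top and bottom. The helper names read_crop_padding and pad_crop_box are hypothetical, not unstructured's API. Treating an unset variable as 0 and clamping the padded box to the page image bounds are also assumptions, not confirmed library behavior.

```python
import os
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
BBox = Tuple[int, int, int, int]


def read_crop_padding() -> Tuple[int, int]:
    """Read (horizontal, vertical) padding; assumes an unset variable means 0."""
    horizontal = int(os.environ.get("EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD", "0"))
    vertical = int(os.environ.get("EXTRACT_IMAGE_BLOCK_CROP_VERTICAL_PAD", "0"))
    return horizontal, vertical


def pad_crop_box(bbox: BBox, padding: Tuple[int, int], image_size: Tuple[int, int]) -> BBox:
    """Grow bbox by the padding on each side, clamped to the page image (assumption)."""
    x1, y1, x2, y2 = bbox
    h_pad, v_pad = padding
    width, height = image_size
    return (
        max(0, x1 - h_pad),
        max(0, y1 - v_pad),
        min(width, x2 + h_pad),
        min(height, y2 + v_pad),
    )


if __name__ == "__main__":
    # Same values as the testing example above.
    os.environ["EXTRACT_IMAGE_BLOCK_CROP_HORIZONTAL_PAD"] = "40"
    os.environ["EXTRACT_IMAGE_BLOCK_CROP_VERTICAL_PAD"] = "20"
    padding = read_crop_padding()
    print(pad_crop_box((100, 200, 300, 400), padding, (1700, 2200)))  # (60, 180, 340, 420)
```

With these values, each side of an image block grows by 40 px horizontally and 20 px vertically. That extra margin is what keeps content near the detected box edges from being clipped.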