
@christinestraub commented Dec 4, 2023

Summary

This PR is the second part of the "image extraction" refactor, which moves image extraction from the unstructured-inference repo to the unstructured repo. The first part, done in Unstructured-IO/unstructured-inference#299, removes the image extraction code from unstructured-inference. This PR adds the logic to support extracting images to unstructured.

Testing

git clone -b refactor/remove_image_extraction_code --single-branch https://github.com/Unstructured-IO/unstructured-inference.git && cd unstructured-inference && pip install -e . && cd ../

from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="example-docs/embedded-images.pdf",
    strategy="hi_res",
    extract_images_in_pdf=True,
)

print("\n\n".join([str(el) for el in elements]))
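To check the test output for image elements instead of reading every printed element, one option is to group the elements by category. This is a minimal sketch, assuming only that each element exposes a `category` string (in unstructured, image elements report `"Image"`) and a readable `str()`. The `FakeElement` class and the `summarize_by_category` helper are hypothetical stand-ins so the example runs without a PDF; they are not part of this PR.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FakeElement:
    """Hypothetical stand-in for an unstructured element (only the fields used here)."""
    category: str
    text: str

    def __str__(self) -> str:
        return self.text


def summarize_by_category(elements: Iterable) -> Tuple[Counter, List[str]]:
    """Count elements per category and collect the text of the image elements.

    Assumes each element has a `category` attribute and a meaningful str().
    """
    counts: Counter = Counter()
    image_texts: List[str] = []
    for el in elements:
        counts[el.category] += 1
        if el.category == "Image":
            image_texts.append(str(el))
    return counts, image_texts


if __name__ == "__main__":
    # Hypothetical elements standing in for partition_pdf(...) output.
    sample = [
        FakeElement("Title", "Embedded images"),
        FakeElement("Image", "Figure 1 text"),
        FakeElement("NarrativeText", "Some body text."),
        FakeElement("Image", "Figure 2 text"),
    ]
    counts, image_texts = summarize_by_category(sample)
    print(dict(counts))
    print(image_texts)
```

With real output, the `elements` list from the test snippet could be passed in place of `sample`; a count of zero for `"Image"` would suggest that no images were extracted.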

cragwolfe pushed a commit to Unstructured-IO/unstructured-inference that referenced this pull request Dec 5, 2023
### Summary
This PR is the first part of the "image extraction" refactor, which moves
image extraction from the unstructured-inference repo to the unstructured
repo. This PR removes all "image extraction" related code from the
unstructured-inference repo and works together with the unstructured
refactor PR, Unstructured-IO/unstructured#2201.

### Note
The ingest test won't pass until we merge the unstructured refactor PR
(Unstructured-IO/unstructured#2201).