Skip text drawn outside the MediaBox #413

Merged
merged 1 commit into from Dec 24, 2021

@yob (Owner) commented Dec 24, 2021

When characters are rendered off the page, don't include them in the extracted text.

Ideally this would use the CropBox rather than the MediaBox, but I don't have easy access to that in PageLayout, and some coming refactors will make that easier to achieve. This is a good start.

I don't have a sample PDF to use in an integration test, so I've added a pending spec.
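
As a rough illustration of the idea described above (not the code merged in this PR), the filter can be sketched as a bounds check of each text run's origin against the page's MediaBox. The `Run` struct and the `inside_box?` and `visible_runs` helpers are hypothetical names made up for this sketch; they are not part of pdf-reader's API, and the real change in PageLayout may differ in detail.

```ruby
# Hypothetical stand-in for a positioned run of text. This is NOT the
# pdf-reader class; it only carries what the sketch needs.
Run = Struct.new(:x, :y, :text)

# mediabox is [llx, lly, urx, ury] in PDF points, the same shape as a
# page's /MediaBox entry.
def inside_box?(run, mediabox)
  llx, lly, urx, ury = mediabox
  # Normalise the corners in case they are given in an unexpected order.
  min_x, max_x = [llx, urx].minmax
  min_y, max_y = [lly, ury].minmax
  run.x.between?(min_x, max_x) && run.y.between?(min_y, max_y)
end

# Keep only the runs whose origin falls on the page.
def visible_runs(runs, mediabox)
  runs.select { |run| inside_box?(run, mediabox) }
end

mediabox = [0, 0, 612, 792] # US Letter, in points
runs = [
  Run.new(72, 720, "Hello"),        # on the page
  Run.new(-50, 720, "hidden left"), # left of the page
  Run.new(72, 900, "hidden above"), # above the page
]

puts visible_runs(runs, mediabox).map(&:text).join(" ")
```

This sketch only tests the origin of each run, so a run that starts on the page and extends past its edge is kept whole. Swapping in the CropBox later would only change which rectangle is passed in.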

@yob yob merged commit 5c752c3 into main Dec 24, 2021
@yob yob deleted the skip-characters-outside-crobox branch December 24, 2021 02:14