Fix assertion failure on PDF pages with only an LTFigure and no LTTextBox/Line components #49

Merged
0xabu merged 2 commits into master from issue48 on Nov 28, 2021

0xabu (Owner) commented Nov 27, 2021

Fix a crash on pages that contain both annotations and text, where the text is part of an LTFigure component rather than an LTTextLine/LTTextBox. In this case, the prior logic failed to assign a sequence number to the annotations on the page, which tripped the assertion reported in #48. The new logic is not perfect (all annotations inside a "figure" effectively share the same sequence position on the page), but it should be good enough given that such PDFs appear to be rare.
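
To make the trade-off concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: `Item`, `Annotation`, `overlaps`, and `assign_sequence` are hypothetical stand-ins for the layout tree and the annotation objects, and the overlap-based matching is an assumption made for illustration. The sequence counter advances once per top-level text box or figure, so every annotation matched to a character inside a figure ends up with the same sequence number.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Item:
    """Hypothetical layout node; kind is "figure", "textbox", or "char"."""
    kind: str
    bbox: Box
    children: List["Item"] = field(default_factory=list)


@dataclass
class Annotation:
    """Hypothetical annotation; pageseq stays None until it is matched."""
    bbox: Box
    pageseq: Optional[int] = None


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def assign_sequence(items: List[Item], annots: List[Annotation]) -> None:
    """Walk the layout tree in order and give each annotation a sequence number."""
    seq = 0

    def visit(item: Item, top_level: bool) -> None:
        nonlocal seq
        # Only top-level containers advance the counter, so everything
        # nested inside one figure shares that figure's number.
        if top_level and item.kind in ("textbox", "figure"):
            seq += 1
        if item.kind == "char":
            for annot in annots:
                if annot.pageseq is None and overlaps(annot.bbox, item.bbox):
                    annot.pageseq = seq
        for child in item.children:
            visit(child, False)

    for item in items:
        visit(item, True)


# A page whose only text is two characters inside a single figure.
figure = Item("figure", (0, 0, 100, 100), [
    Item("char", (10, 10, 20, 20)),
    Item("char", (30, 10, 40, 20)),
])
first = Annotation((5, 5, 25, 25))
second = Annotation((28, 5, 45, 25))
assign_sequence([figure], [first, second])
print(first.pageseq, second.pageseq)
```

In this sketch both annotations receive a non-zero sequence number even though the page has no text boxes or lines, at the cost of losing their relative order within the figure.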

Also, improve the warning message for unsupported annotations.

Fixes #48

…mponents

issue #48 demonstrates a PDF where all text is chars within a figure, and there
are no lines/boxes
@0xabu 0xabu merged commit a0e6b41 into master Nov 28, 2021
@0xabu 0xabu deleted the issue48 branch November 28, 2021 00:11
Development

Successfully merging this pull request may close these issues.

Bug: assert self._pageseq != 0