Fix and enhance PDF metadata reading #242

Merged
3 commits merged into main from fix/metadata-read
Jul 5, 2022

Conversation

J08nY (Member) commented Jul 5, 2022

This fixes an issue where we closed the PDF file too early: if the metadata contained IndirectObjects, PyPDF2 then failed to read them because the file was already closed. It also adds a search for hyperlinks in the PDF metadata and a function for parsing PDF creation dates (useful for future analysis).
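
For readers unfamiliar with the date format involved, here is a minimal sketch of parsing a PDF creation date. It is not the function added in this PR; the name `parse_pdf_date` and its return-`None`-on-failure behavior are assumptions made for illustration. It relies only on the date string layout from the PDF specification, `D:YYYYMMDDHHmmSS` followed by an optional `Z` or a `+HH'mm'` style offset, where everything after the year is optional.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Everything after the year is optional in a PDF date string.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?P<tz>Z|(?P<sign>[+-])(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?$"
)


def parse_pdf_date(value: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: parse a PDF date string, or return None if it is malformed."""
    if not value:
        return None
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    parts = match.groupdict()
    try:
        tzinfo = None
        if parts["tz"] == "Z":
            tzinfo = timezone.utc
        elif parts["sign"]:
            offset = timedelta(
                hours=int(parts["tzh"]), minutes=int(parts["tzm"] or 0)
            )
            tzinfo = timezone(-offset if parts["sign"] == "-" else offset)
        # Missing month and day default to 1, missing time fields to 0.
        return datetime(
            int(parts["year"]),
            int(parts["month"] or 1),
            int(parts["day"] or 1),
            int(parts["hour"] or 0),
            int(parts["minute"] or 0),
            int(parts["second"] or 0),
            tzinfo=tzinfo,
        )
    except ValueError:
        # Out-of-range fields such as month 13 end up here.
        return None
```

Returning `None` rather than raising is a design choice for bulk analysis, where one malformed date in a large set of PDFs should not abort the whole run.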

J08nY added 2 commits July 5, 2022 18:17
Closing the file object before reading things off of the metadata
object leads to "seek of closed file" errors.
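
The failure mode described in this commit message can be reproduced without PyPDF2. The sketch below is an illustration under stated assumptions: `LazyValue` is a hypothetical stand-in for an indirect object that reads its data from the underlying stream only when accessed, and `io.BytesIO` stands in for the opened PDF file. It is not PyPDF2's API.

```python
import io


class LazyValue:
    """Hypothetical stand-in for an indirect object: data is read only on access."""

    def __init__(self, stream: io.BytesIO, offset: int, length: int) -> None:
        self._stream = stream
        self._offset = offset
        self._length = length

    def resolve(self) -> bytes:
        # Needs the stream to still be open at the time of the call.
        self._stream.seek(self._offset)
        return self._stream.read(self._length)


def read_title_buggy(data: bytes) -> bytes:
    with io.BytesIO(data) as stream:
        title = LazyValue(stream, 0, 5)
    # The stream is closed here, so resolving raises ValueError.
    return title.resolve()


def read_title_fixed(data: bytes) -> bytes:
    with io.BytesIO(data) as stream:
        title = LazyValue(stream, 0, 5)
        # Resolve while the stream is still open, then let it close.
        return title.resolve()
```

The general pattern is to convert every lazily resolved value into a plain Python object inside the `with` block, so that nothing returned to the caller still depends on the open file.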
J08nY force-pushed the fix/metadata-read branch 2 times, most recently from 215dce8 to 67d94eb, July 5, 2022 17:01
J08nY merged commit 69b934d into main Jul 5, 2022
adamjanovsky deleted the fix/metadata-read branch July 27, 2022 10:27