ROB: Flate decoding for streams with faulty tail bytes #3332
Some FLATE-encoded streams written by early Adobe Distiller / Pitstop versions have additional CR bytes appended in the PDF, and these faulty tail bytes are counted in the Length value of the stream dictionary. Decoding then fails later on. This is solved by removing tail bytes one at a time until decoding succeeds.
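The tail-stripping idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `decompress_with_tail_fallback` and the cap `max_tail_bytes` are hypothetical, and the way the sample stream is damaged (last checksum byte altered, one CR appended) is only an assumed stand-in for what the affected files contain.

```python
import zlib


def decompress_with_tail_fallback(data: bytes, max_tail_bytes: int = 8) -> bytes:
    """Inflate data; on failure, retry with up to max_tail_bytes cut off the end.

    Hypothetical helper for illustration; the cap of 8 is an assumption.
    """
    try:
        # Normal path: well-formed streams never reach the fallback.
        return zlib.decompress(data)
    except zlib.error as first_error:
        # Fallback only: drop 1, 2, 3, ... tail bytes until inflating works.
        for cut in range(1, min(max_tail_bytes, len(data) - 1) + 1):
            try:
                # A decompression object tolerates a stream whose end is missing.
                return zlib.decompressobj().decompress(data[:-cut])
            except zlib.error:
                continue
        # Nothing helped: report the original error.
        raise first_error


payload = b"BT /F1 12 Tf (Hello) Tj ET\n" * 20
good = zlib.compress(payload)
# Simulated damage (assumption): alter the last checksum byte, then append a CR.
faulty = good[:-1] + bytes([good[-1] ^ 0xFF]) + b"\r"
recovered = decompress_with_tail_fallback(faulty)
```

Keeping the loop inside the `except` branch means intact streams pay no extra cost, and bounding the number of removed bytes keeps the retry cheap even for large streams.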
faulty_stream_tail_example 1.pdf
I added one sample file and will create a test from it later. I have another sample PDF, but it is too large to attach here ...
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff

| | main | #3332 | +/- |
|---|---|---|---|
| Coverage | 96.73% | 96.73% | |
| Files | 53 | 53 | |
| Lines | 9054 | 9060 | +6 |
| Branches | 1675 | 1676 | +1 |
| Hits | 8758 | 8764 | +6 |
| Misses | 177 | 177 | |
| Partials | 119 | 119 | |
My first paste of the new code was wrong, so I corrected it and moved it from the "try" block into the "except" block, because it is a fallback in case of an error and not a new general approach to decoding.
Sorry for the second commit; I made a mistake while copying the code. It is correct now.
simplified code + created test
…ertelgmg/pypdf into patch-FLATEperformance
I have implemented the suggestion to simplify the code, as requested above.
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
To make the asset download more robust, the test PDF and the expected data have been moved into the test function body. The drawback is that a higher timeout is now needed, which makes this test less precise as a performance check.
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
Thanks for the PR and your patience.
Thanks for your support, Stefan. It was not easy for me because of my lack of experience with your project and the tough style rules.
## What's new

### Performance Improvements (PI)
- Performance optimization for LZW decoding (#3329) by @henningkoertelgmg

### Robustness (ROB)
- Flate decoding for streams with faulty tail bytes (#3332) by @henningkoertelgmg
- dc_creator could be a Bag as well (#3333) by @stefan6419846
- Handle tree being NullObject when retrieving named destinations (#3331) by @stefan6419846

### Maintenance (MAINT)
- Move inline-image mappings to constants (#3328) by @stefan6419846

[Full Changelog](5.6.1...5.7.0)