
ROB: Flate decoding for streams with faulty tail bytes #3332



Merged (13 commits) on Jun 27, 2025


henningkoertelgmg (Contributor)

Some Flate-encoded streams written by early versions of Adobe Distiller / PitStop contain additional CR bytes, and these faulty tail bytes are counted in the /Length entry of the stream dictionary. As a result, decoding fails later. This is solved by removing tail bytes one at a time until decoding succeeds.
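A minimal Python sketch of the trimming idea. `strict_inflate`, `decode_trimming_tail` and the limit `max_cut=8` are hypothetical names and values, not pypdf's code. To reproduce a faulty tail with synthetic data, the sketch assumes a strict decoder that treats leftover bytes after the end of the zlib stream as an error; the actual failure in the affected PDFs may look different.

```python
import zlib
from typing import Callable


def strict_inflate(data: bytes) -> bytes:
    """Hypothetical strict decoder: fail on truncated streams or leftover tail bytes."""
    d = zlib.decompressobj()
    out = d.decompress(data)
    if not d.eof:
        raise zlib.error("incomplete or truncated stream")
    if d.unused_data:
        raise zlib.error(f"{len(d.unused_data)} unexpected byte(s) after end of stream")
    return out


def decode_trimming_tail(
    data: bytes,
    decode: Callable[[bytes], bytes] = strict_inflate,
    max_cut: int = 8,  # assumed upper bound on the number of faulty tail bytes
) -> bytes:
    """Decode `data`; on failure drop one more byte from the end and retry."""
    last_error: Exception = zlib.error("no decoding attempt made")
    # cut == 0 tries the data as-is; the data is never trimmed down to nothing.
    for cut in range(max(0, min(max_cut, len(data) - 1)) + 1):
        try:
            return decode(data[: len(data) - cut])
        except zlib.error as exc:
            last_error = exc
    raise last_error


payload = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n" * 100
faulty = zlib.compress(payload) + b"\r\r"  # two stray CR bytes counted in /Length
assert decode_trimming_tail(faulty) == payload
```

Each retry decodes the whole stream again, so capping the number of trimmed bytes keeps the cost bounded for streams that are broken for some other reason.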

henningkoertelgmg (Contributor Author) commented Jun 25, 2025

Attachments: faulty_stream_tail_example 1.pdf, decoded.dat.txt

I added one sample file and will create a test from it later. I have another sample PDF, but it is too large to attach here.


codecov bot commented Jun 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.73%. Comparing base (7c3db03) to head (8b0c38a).
Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #3332   +/-   ##
=======================================
  Coverage   96.73%   96.73%           
=======================================
  Files          53       53           
  Lines        9054     9060    +6     
  Branches     1675     1676    +1     
=======================================
+ Hits         8758     8764    +6     
  Misses        177      177           
  Partials      119      119           
```


stefan6419846 changed the title from "PI: FLATE decoding for streams with faulty tail bytes" to "ROB: Flate decoding for streams with faulty tail bytes" on Jun 25, 2025
The first paste of the new code was wrong, so I corrected it and moved it from the "try" block into the "except" block, because it is a fallback in case of an error, not a general new approach to decoding.
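A sketch of that placement, under stated assumptions: `strict_inflate` is a hypothetical stand-in for the real decoder that rejects leftover tail bytes, and `flate_decode` and `max_cut=8` are illustrative, not pypdf's API. Well-formed streams return from the "try" block after a single decode call; the trimming loop only runs after a failure, and if it does not help, the original error is re-raised.

```python
import zlib


def strict_inflate(data: bytes) -> bytes:
    """Hypothetical strict decoder: fail on truncated streams or leftover tail bytes."""
    d = zlib.decompressobj()
    out = d.decompress(data)
    if not d.eof or d.unused_data:
        raise zlib.error("stream is truncated or has extra tail bytes")
    return out


def flate_decode(data: bytes, max_cut: int = 8) -> bytes:
    try:
        # Normal path: well-formed streams decode in one call, at no extra cost.
        return strict_inflate(data)
    except zlib.error as original_error:
        # Fallback, only reached after a failure: drop 1..max_cut bytes from the tail.
        for cut in range(1, min(max_cut, len(data) - 1) + 1):
            try:
                return strict_inflate(data[: len(data) - cut])
            except zlib.error:
                continue
        # Trimming did not help: report the error from the untouched data.
        raise original_error
```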
henningkoertelgmg (Contributor Author):

Sorry for the second commit; I made a mistake while copying the code. Now it is correct.

stefan6419846 added the "needs-test" label (A test should be added before this PR is merged.) on Jun 26, 2025
henningkoertelgmg (Contributor Author) commented Jun 27, 2025

I have simplified the code as requested above.
A test has been added; it now covers two aspects: readability of the faulty stream and good performance.
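The actual test uses the attached sample PDF. As a self-contained illustration of the same two checks, here is a pytest-style sketch on synthetic data; the payload, the three stray CR bytes, the 2-second budget and all helper names are assumptions, not the PR's test.

```python
import time
import zlib


def strict_inflate(data: bytes) -> bytes:
    # Hypothetical strict decoder: fail on truncated streams or leftover tail bytes.
    d = zlib.decompressobj()
    out = d.decompress(data)
    if not d.eof or d.unused_data:
        raise zlib.error("stream is truncated or has extra tail bytes")
    return out


def flate_decode(data: bytes, max_cut: int = 8) -> bytes:
    # Hypothetical decoder under test: plain decode first, tail trimming as fallback.
    try:
        return strict_inflate(data)
    except zlib.error as original_error:
        for cut in range(1, min(max_cut, len(data) - 1) + 1):
            try:
                return strict_inflate(data[: len(data) - cut])
            except zlib.error:
                continue
        raise original_error


def test_faulty_tail_stream_is_readable_and_fast() -> None:
    # Synthetic stand-in for a large faulty stream: about 2 MB of content, 3 stray CRs.
    payload = b"0 0 m 100 100 l S\n" * 120_000
    faulty = zlib.compress(payload) + b"\r\r\r"

    start = time.perf_counter()
    decoded = flate_decode(faulty)
    elapsed = time.perf_counter() - start

    assert decoded == payload  # readability
    assert elapsed < 2.0, f"decoding took {elapsed:.2f}s"  # performance, generous budget
```

Timing only the decode call keeps the budget independent of how long it takes to fetch test assets; the budget is deliberately generous so the check stays stable on slow CI machines.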

Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
stefan6419846 and others added 2 commits June 27, 2025 12:14
To make the asset download more robust, the test PDF and the expected data have been moved into the test function body. The downside is that a higher timeout is now needed, which makes the performance check in this test less precise.
henningkoertelgmg and others added 2 commits June 27, 2025 12:46
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
stefan6419846 (Collaborator) left a comment

Thanks for the PR and your patience.

stefan6419846 removed the "needs-test" label (A test should be added before this PR is merged.) on Jun 27, 2025
stefan6419846 merged commit ed645ca into py-pdf:main on Jun 27, 2025
16 checks passed
henningkoertelgmg deleted the patch-FLATEperformance branch on June 27, 2025 11:20
henningkoertelgmg (Contributor Author):

Thanks for your support, Stefan. It was not easy for me because of my lack of experience with your project and its strict style rules.

stefan6419846 added a commit that referenced this pull request Jun 29, 2025
## What's new

### Performance Improvements (PI)
- Performance optimization for LZW decoding (#3329) by @henningkoertelgmg

### Robustness (ROB)
- Flate decoding for streams with faulty tail bytes (#3332) by @henningkoertelgmg
- dc_creator could be a Bag as well (#3333) by @stefan6419846
- Handle tree being NullObject when retrieving named destinations (#3331) by @stefan6419846

### Maintenance (MAINT)
- Move inline-image mappings to constants (#3328) by @stefan6419846

[Full Changelog](5.6.1...5.7.0)