1.0.21 causes previously-consumable PDFs to fail now with RangeError #191

Open
rdunlop opened this issue Mar 29, 2021 · 6 comments · May be fixed by #215

Comments

@rdunlop

rdunlop commented Mar 29, 2021

I suspect that the input PDF that I'm dealing with is invalid...but I wanted to mention that it was working in 1.0.20, but no longer in 1.0.21.

The PDF appears to have an invalid stream defined near the end of my file (relevant part here):

8 0 obj\r<</Length 2200\r/Type\r/Metadata\r/Subtype \r/XML>>\rstream\rendstream\rendobj\r9 0 obj\r<< /Keywords()\r/Creator(HP Scan) \r/CreationDate(D:20210326163700-08'00')\r/ModDate(D:20210326163700-08'00')\r/Author ()\r/Producer (HP Scan Extended Application)\r/Title ()\r/Subject ()\r>>\rendobj\rxref\r0 10\r0000000000 65535 f \r0000000009 00000 n \r0000522282 00000 n \r0000522379 00000 n \r0000522588 00000 n \r0000522646 00000 n \r0000522697 00000 n \r0000522746 00000 n \r0000522892 00000 n \r0000522972 00000 n \rtrailer\r<<\r/Size 10\r/Root 5 0 R\r/Info 6 0 R\r/Info 7 0 R\r/Info 8 0 R\r/Info 9 0 R\r>>\rstartxref\r523171\r%%EOF\r

(pretty printed):

8 0 obj
<</Length 2200
/Type
/Metadata
/Subtype 
/XML>>
stream
endstream
endobj
9 0 obj
<< /Keywords()
/Creator(HP Scan) 
/CreationDate(D:20210326163700-08'00')
/ModDate(D:20210326163700-08'00')
/Author ()
/Producer (HP Scan Extended Application)
/Title ()
/Subject ()
>>
endobj
xref
0 10
0000000000 65535 f 
0000000009 00000 n 
0000522282 00000 n 
0000522379 00000 n 
0000522588 00000 n 
0000522646 00000 n 
0000522697 00000 n 
0000522746 00000 n 
0000522892 00000 n 
0000522972 00000 n 
trailer
<<
/Size 10
/Root 5 0 R
/Info 6 0 R
/Info 7 0 R
/Info 8 0 R
/Info 9 0 R
>>
startxref
523171
%%EOF

As you can see, the Length is 2200, but there are not 2200 bytes left in the file, so the statement @scanner.pos += out.last[:Length].to_i - 2
([here](https://github.com/boazsegev/combine_pdf/blob/b966e703fd897ff50832d3823e74791099b82ca3/lib/combine_pdf/parser.rb#L364)) raises a RangeError.
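To make the failure mode concrete, here is a minimal sketch of what happens when a scanner position is advanced by a declared Length that overshoots the end of the input. The helper name `skip_by_declared_length` and the shortened `BROKEN_TAIL` string are hypothetical and only illustrate the idea; this is not the actual combine_pdf parser code.

```ruby
require 'strscan'

# Hypothetical helper (not part of combine_pdf): jump past a stream body
# using only the declared /Length, in the spirit of the line quoted above.
def skip_by_declared_length(data, declared_length)
  scanner = StringScanner.new(data)
  scanner.skip_until(/stream\r/) # land just after the "stream" keyword
  scanner.pos += declared_length - 2
  :skipped
rescue RangeError
  # StringScanner refuses a position beyond the end of the input.
  :range_error
end

# Shortened stand-in for the broken object 8 shown above (an assumption,
# not the real file contents).
BROKEN_TAIL = "8 0 obj\r<</Length 2200>>\rstream\rendstream\rendobj\r"

puts skip_by_declared_length(BROKEN_TAIL, 2200) # declared length overshoots
puts skip_by_declared_length(BROKEN_TAIL, 10)   # a length that fits
```

The first call should end up in the `rescue` branch, because 2200 bytes are not available after the `stream` keyword; the second stays inside the string and is skipped normally.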

I am opening this ticket because I'm 90% sure that this is an invalid PDF, but I wanted to mention it out loud that the change introduced in 1.0.21 is (to me) a regression in capability. I recognize that #184 is a related issue.

For now, I've resolved my issue by reverting to 1.0.20. Not ideal, but sufficient for my purposes for now.

@boazsegev
Owner

Hi @rdunlop ,

Thank you for opening this issue. I totally understand your concern, and I myself was debating this change for this very reason.

This isn't about a performance optimization. I would much rather be able to read malformed PDF files than run faster...

...however, as I explained in #185, this is required to accommodate properly authored PDF files that are allowed to contain PDF-like markers in their stream data (e.g., a PDF explaining how PDF data looks might contain the PDF endstream keyword). Issue #184 referenced such a valid PDF file as an example.

The choice was either to continue failing on valid PDF files or to patch in a way that limited support for malformed PDF files... I guess there's a way to support both variations, I just didn't see it at the time (though I see it now, it might have a performance penalty).
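One way to picture "supporting both variations" is to trust the declared Length whenever an `endstream` keyword actually follows it, and only otherwise fall back to searching for the keyword. The sketch below is an assumption about how that could look, not the approach the maintainer had in mind and not necessarily what any linked PR does; `read_stream_body` is a hypothetical name.

```ruby
# Hypothetical helper: return the stream body that begins at byte offset
# `start`. Prefer the declared /Length (valid files); fall back to a keyword
# search only when the declared length cannot be right (malformed files).
def read_stream_body(data, start, declared_length)
  data = data.b # binary copy, so every index below is a byte offset
  stop = start + declared_length
  if stop <= data.bytesize
    tail = data.byteslice(stop, 16) || ''
    # Valid case: "endstream" follows the declared span, so any
    # "endstream" text inside the body is left untouched.
    return data.byteslice(start, declared_length) if tail.match?(/\A\s*endstream/)
  end
  # Malformed case: the extra search is the likely performance cost.
  marker = data.index('endstream', start)
  marker ? data[start...marker] : nil
end

valid = "stream\ra endstream b\rendstream\r"
puts read_stream_body(valid, 7, 13).inspect

malformed = "stream\rendstream\rendobj\r"
puts read_stream_body(malformed, 7, 2200).inspect
```

In the valid example the body itself contains the word `endstream`, yet it is returned whole because the declared length is honored first. In the malformed example the declared length runs past the end of the data, so the fallback search yields an empty body instead of raising.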

I'm short on time, but if you want to submit a PR that prefers valid PDF files and supports some sort of handling for malformed PDF files, that would be great.

Cheers,
Boaz Segev.

@RBIII

RBIII commented Jul 12, 2022

This issue happened for me as well. The PR seems to fix it, @boazsegev.

@stiaannel

Have there been any updates on this ticket or #205 yet, on whether it will be merged or not? @boazsegev

@JrmKrb

JrmKrb commented Jun 14, 2023

Thanks for the PR. Is there any way to get this fix merged, @boazsegev?

@AdrienQuilletKelio

Sorry to bump that PR, but we are experiencing the same bug in production!
A fix would be greatly appreciated :)

@julitrows

Still alive in 1.0.26
