Error: Incorrectly parsed object contents #59

Closed
kevinswartz opened this issue Jan 2, 2019 · 5 comments

@kevinswartz

Hi Guys,
I'm seeing an error parsing this PDF. I've attached it, and the relevant portion of the stack trace is below.

I see this with both v0.4.0 and v0.6.0.

error loading pdf-lib Error: Incorrectly parsed object contents
    at mo (pdf-lib.min.js:1)
    at rI (pdf-lib.min.js:1)
    at pdf-lib.min.js:1
    at cI (pdf-lib.min.js:1)
    at Object.parse (pdf-lib.min.js:1)

parsing error.pdf

Thanks for your help!
Kevin

@Hopding
Owner

Hopding commented Jan 4, 2019

Hello @kevinswartz.

I'll take a look at this over the weekend. It might be a parser bug. Or the PDF might be invalid in some way, and the parser might need to be updated to handle it anyway. One thing that would help is if you can reproduce the stack trace using the non-minified version of pdf-lib. That will make the stack trace a lot more intelligible.

Note that the non-minified version is available from the same CDN as the minified version, just without the .min in the extension: https://unpkg.com/pdf-lib/dist/pdf-lib.js

@kevinswartz
Author

Here's the un-minified stack. Thanks for your help!

error loading pdf-lib Error: Incorrectly parsed object contents
    at error (pdf-lib.js:8569)
    at parseIndirectObj (pdf-lib.js:46942)
    at parseLinearization (pdf-lib.js:47117)
    at parseDocument (pdf-lib.js:47204)
    at PDFParser.parse (pdf-lib.js:47241)
    at Function.PDFDocumentFactory.load (pdf-lib.js:47363)

@Hopding
Owner

Hopding commented Jan 10, 2019

I took a look at the PDF you shared. It is technically invalid according to the PDF spec. It uses invalid stream delimiters. Opening up the file in Vim reveals the following:

%PDF-1.3^M1 0 obj^M<</Type /XObject /Subtype /Image /Name /Im1 /Width 1700
/Height 2200 /Length 80693/ColorSpace /DeviceRGB /BitsPerComponent 8 
/Filter [ /DCTDecode ] >> stream^MÿØÿà^@^PJFIF^@^A^A^A^@È^@È^@^@ÿÛ^@C^@^H^F^F^G^F^E^H^G^G^G   ^H

The important thing here is

>> stream^MÿØÿ

The >> marks the end of a dictionary. The stream keyword is supposed to be followed by either a line feed, or a carriage return and then a line feed, but not by a carriage return alone.

From the PDF spec:

7.3.8 Stream Objects
The keyword stream that follows the stream dictionary shall be followed by an end-of-line marker consisting of either a CARRIAGE RETURN and a LINE FEED or just a LINE FEED, and not by a CARRIAGE RETURN alone.

However, this PDF violates this requirement. The ^M you see above represents the carriage return that's in the file. The stream data (ÿØÿ, the first bytes of the JPEG image) begins immediately after it, with no line feed in between.
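
For anyone who wants to check their own files for the same problem, here is a rough diagnostic sketch. It is not part of pdf-lib; `findLoneCrStreamKeywords` and `matchesAt` are names made up for illustration. It scans raw bytes for a `stream` keyword (skipping `endstream`) that is followed by a carriage return with no line feed. It is a naive byte scan, so it can report false positives if binary stream data happens to contain the same bytes.

```javascript
const CR = 0x0d;
const LF = 0x0a;
const KEYWORD = [0x73, 0x74, 0x72, 0x65, 0x61, 0x6d]; // "stream"
const END = [0x65, 0x6e, 0x64]; // "end"

// True if `pattern` occurs in `bytes` starting exactly at `pos`.
function matchesAt(bytes, pos, pattern) {
  if (pos < 0 || pos + pattern.length > bytes.length) return false;
  return pattern.every((b, i) => bytes[pos + i] === b);
}

// Hypothetical helper: returns the offsets of every `stream` keyword that is
// followed by a CARRIAGE RETURN alone (invalid per section 7.3.8).
function findLoneCrStreamKeywords(bytes) {
  const offsets = [];
  for (let i = 0; i + KEYWORD.length <= bytes.length; i++) {
    if (!matchesAt(bytes, i, KEYWORD)) continue;
    if (matchesAt(bytes, i - END.length, END)) continue; // part of "endstream"
    const after = i + KEYWORD.length;
    if (bytes[after] === CR && bytes[after + 1] !== LF) offsets.push(i);
  }
  return offsets;
}

// One byte per character, so "\xff" becomes 0xFF.
const toBytes = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));

// A tiny sample shaped like the start of the attached file.
const text = '%PDF-1.3\r1 0 obj\r<</Length 3>> stream\r\xff\xd8\xff\rendstream\rendobj\r';
console.log(findLoneCrStreamKeywords(toBytes(text)));
```

The function accepts any `Uint8Array`, so in Node it should also work on the `Buffer` returned by `fs.readFileSync`.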

pdf-lib's parser doesn't check for this scenario, since it shouldn't happen. But now that I have an example of such a PDF, I'll add a check in the parser for this to fix the error.
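
To give a rough idea of what such a check could look like, here is a minimal sketch. This is not the actual pdf-lib code; `streamDataStart` is a hypothetical helper that takes the index of the first byte after the `stream` keyword and returns the index where the stream data begins, accepting the two valid delimiters plus the invalid lone carriage return.

```javascript
const CR = 0x0d;
const LF = 0x0a;

// Hypothetical helper, not pdf-lib's implementation. `offset` is the index of
// the first byte after the `stream` keyword.
function streamDataStart(bytes, offset) {
  if (bytes[offset] === CR && bytes[offset + 1] === LF) return offset + 2; // CR LF (valid)
  if (bytes[offset] === LF) return offset + 1; // LF (valid)
  if (bytes[offset] === CR) return offset + 1; // lone CR (invalid, tolerated)
  throw new Error('Expected an end-of-line marker after the stream keyword');
}

// One byte per character, so "\xff" becomes 0xFF.
const toBytes = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));

// Same shape as the attached file: `stream`, a lone CR, then the JPEG data.
const header = '>> stream';
const sample = toBytes(header + '\r\xff\xd8\xff');
const start = streamDataStart(sample, header.length);
console.log(sample[start].toString(16)); // prints "ff", the first JPEG byte
```

One caveat with this approach: if a file uses a lone carriage return and its stream data happens to start with a 0x0A byte, the sketch would treat the pair as CR LF and skip one byte too many. Cross-checking against the /Length entry in the stream dictionary is one way a parser might resolve that ambiguity.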

@Hopding
Owner

Hopding commented Jan 10, 2019

@kevinswartz the check was added in #64.

I cut prerelease 0.6.1-rc2 with the fix.

You can install this prerelease with npm:

npm install pdf-lib@0.6.1-rc2

It's also available on unpkg:

Please try it out and let me know if it works for you!

@kevinswartz
Author

Thanks @Hopding, it looks fixed to me!
