Seeing a PDF hang - no error - while loading #95

Closed
matthopson opened this issue Apr 23, 2019 · 4 comments

@matthopson

Hi,

We're seeing an issue with one PDF in our testing. This was a randomly downloaded PDF, and we haven't seen this issue with any other PDF we've tested. We tried tracking the bug down in the parser, but had to move on. I wonder if you'd be able to offer any insight?

The PDF can be sourced here:
http://downloadcenter.samsung.com/content/UM/201903/20190326104351182/ENG_US_MUSATSCR-2.0.2.pdf

Some things of note:

  • Running it through a validator shows it's missing embedded fonts and metadata
  • Looking at the file info, it looks like it was created in InDesign - so... it could just be the way the file was made.

In our case, it'd be helpful if we were at least getting an error kicked back - but it looks more like it's stuck in a loop.

Thanks for any info you can offer!

@Hopding
Owner

Hopding commented Apr 28, 2019

Hello @matthopson!

I spent some time investigating this today. It turns out that the parser does eventually finish. However, it takes a very long time to do so - 1300 seconds (about 22 minutes), to be precise! This is caused by an interplay between this particular PDF and a flaw in pdf-lib's parser.

This is a very large PDF, and it also contains some strange objects: several very large arrays filled almost entirely with null values (e.g. [ null null null null ... null ]). This is quite uncommon (in fact, it's the first time I've seen a PDF like this). And it just so happens that pdf-lib's parser doesn't efficiently parse large arrays containing primarily boolean, hex string, or null values.
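
For illustration only: the thread does not show the parser code or say exactly what the flaw was, so the sketch below is not pdf-lib's implementation. It shows one common way a parser can become slow on arrays made of many tiny tokens (copying the remaining input after each token) next to an index-based scan that copies nothing. All function names here are hypothetical.

```javascript
// Hypothetical illustration only: this is NOT pdf-lib's actual parser code.
const SPACE = 0x20;
const OPEN = 0x5b; // "["
const CLOSE = 0x5d; // "]"
const NULL_BYTES = [0x6e, 0x75, 0x6c, 0x6c]; // "null"

// True if `bytes` contains the keyword "null" starting at index `at`.
function isNullAt(bytes, at) {
  return NULL_BYTES.every((b, i) => bytes[at + i] === b);
}

// Slow variant: Uint8Array.prototype.slice copies the remaining input after
// every token, so n tokens cost on the order of n^2 byte copies.
function countNullsByCopying(bytes) {
  if (bytes[0] !== OPEN) throw new Error('expected "["');
  let rest = bytes.slice(1);
  let count = 0;
  for (;;) {
    let skip = 0;
    while (rest[skip] === SPACE) skip += 1;
    if (rest[skip] === CLOSE) return count;
    if (!isNullAt(rest, skip)) throw new Error('expected "null" or "]"');
    count += 1;
    rest = rest.slice(skip + 4); // copies everything that is left
  }
}

// Linear variant: advance an index into the original buffer, copy nothing.
function countNullsByCursor(bytes) {
  if (bytes[0] !== OPEN) throw new Error('expected "["');
  let pos = 1;
  let count = 0;
  for (;;) {
    while (bytes[pos] === SPACE) pos += 1;
    if (bytes[pos] === CLOSE) return count;
    if (!isNullAt(bytes, pos)) throw new Error('expected "null" or "]"');
    count += 1;
    pos += 4;
  }
}

// Build "[ null null ... null ]" with n entries, encoded as bytes.
function makeNullArray(n) {
  return new TextEncoder().encode('[ ' + 'null '.repeat(n) + ']');
}
```

Both functions return the same count for the same input. Timing them on something like makeNullArray(200000) should show the gap between the two approaches, though the exact numbers depend on the machine and JavaScript engine.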

Anyways, I was able to fix this in #99. With these changes, the parser finishes in just 13 seconds (0.22 minutes) - 99% faster!

I just cut prerelease 0.6.2-rc1 with the fix.

You can install this prerelease with npm:

npm install pdf-lib@0.6.2-rc1

It's also available on unpkg:

Please try it out and let me know if it works for you!

@Hopding
Owner

Hopding commented Apr 28, 2019

@matthopson I should also mention that when working with large PDF files like this, it's a good idea to disable object streams when saving them:

const pdfBytes = PDFDocumentWriter.saveToBytes(pdfDoc, { useObjectStreams: false });

Some PDF readers perform poorly when displaying large PDFs saved with object streams. Saving your documents this way helps avoid that issue.

@matthopson
Author

Thanks Andrew! I can confirm that this fixes the long processing time for that PDF! I really appreciate your help with this, and all your work on this library in general.

@Hopding
Owner

Hopding commented May 5, 2019

Version 0.6.2 is now published. It contains the fix for this issue. The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@0.6.2

It's also available on unpkg:

@Hopding Hopding closed this as completed May 5, 2019
Hopding added a commit that referenced this issue Jun 30, 2019
Hopding added a commit that referenced this issue Aug 30, 2021