Seeing a PDF hang - no error - while loading #95

matthopson · 2019-04-23T21:16:23Z

Hi,

We're seeing an issue with one PDF in our testing. This was a randomly-downloaded PDF, and we've not seen this issue with any other PDF we've tested. We tried tracking the bug down in the parser, but had to move on. I wonder if you'd be able to offer any insight?

The PDF can be sourced here:
http://downloadcenter.samsung.com/content/UM/201903/20190326104351182/ENG_US_MUSATSCR-2.0.2.pdf

Some things of note:

Running it through a validator shows it's missing embedded fonts and metadata
Looking at the file info, it looks like it was created in InDesign - so... it could just be the way the file was made.

In our case, it'd be helpful if we were at least getting an error kicked back - but it looks more like it's stuck in a loop.

Thanks for any info you can offer!

Hopding · 2019-04-28T18:41:13Z

Hello @matthopson!

I spent some time investigating this today. It turns out that the parser does eventually finish. However, it takes a very long time to do so - 1300 seconds (roughly 22 minutes), to be precise! This is caused by an interplay between this particular PDF and a flaw in pdf-lib's parser.

This is a very large PDF. And it also contains some strange objects. Specifically, several very large arrays filled almost entirely with null values (e.g. [ null null null null ... null ]). This is quite uncommon (in fact, it's the first time I've seen a PDF like this). And it just so happens that pdf-lib's parser doesn't efficiently parse large arrays containing primarily boolean, hex string, or null values.
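
To give a feel for how the order in which value types are tried can matter for an array like this, here is a toy sketch. It is not pdf-lib's actual parser, and it is only an assumption that the real slowdown follows this general pattern. The `MATCHERS` table, the `parseArray` function, and the attempt counter are all hypothetical names made up for illustration. The sketch counts how many matcher attempts are needed when null is tried last versus first.

```javascript
// Toy model only: this is NOT pdf-lib's parser. All names are hypothetical.
// Each matcher is a sticky regex (the `y` flag), so it only matches at `lastIndex`.
const MATCHERS = {
  number: /[+-]?\d+(?:\.\d+)?/y,
  name: /\/[^\s\[\]<>()\/]+/y,
  hexString: /<[0-9a-fA-F\s]*>/y,
  bool: /true|false/y,
  null: /null/y,
};

const isWhitespace = (ch) =>
  ch === ' ' || ch === '\n' || ch === '\r' || ch === '\t';

// Parses a flat array such as "[ null 1 /Foo ]", trying matchers in `order`.
// Returns the parsed tokens plus the number of matcher attempts that were made.
function parseArray(input, order) {
  let pos = 0;
  let attempts = 0;
  const values = [];

  const skipWhitespace = () => {
    while (pos < input.length && isWhitespace(input[pos])) pos += 1;
  };

  skipWhitespace();
  if (input[pos] !== '[') throw new Error('Expected "[" at start of array');
  pos += 1;
  skipWhitespace();

  while (pos < input.length && input[pos] !== ']') {
    let matched = false;
    for (const type of order) {
      const regex = MATCHERS[type];
      regex.lastIndex = pos; // anchor the sticky regex at the current offset
      attempts += 1;
      const result = regex.exec(input);
      if (result !== null) {
        values.push({ type, text: result[0] });
        pos += result[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unrecognized token at offset ${pos}`);
    skipWhitespace();
  }

  if (input[pos] !== ']') throw new Error('Unterminated array');
  return { values, attempts };
}

// An array of 10,000 nulls, similar in spirit to the ones described above.
const bigNullArray = `[ ${'null '.repeat(10000)}]`;

const nullLast = parseArray(bigNullArray, ['number', 'name', 'hexString', 'bool', 'null']);
const nullFirst = parseArray(bigNullArray, ['null', 'bool', 'hexString', 'number', 'name']);

console.log(nullLast.attempts, nullFirst.attempts); // prints 50000 10000
```

In this toy each failed attempt is cheap, so the difference is only a factor of five. If each failed attempt were expensive, the same ordering effect could dominate the total parse time for a large array of nulls.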

Anyways, I was able to fix this in #99. With these changes, the parser finishes in just 13 seconds (0.22 minutes) - 99% faster!

I just cut prerelease 0.6.2-rc1 with the fix.

You can install this prerelease with npm:

npm install pdf-lib@0.6.2-rc1

It's also available on unpkg:

Please try it out and let me know if it works for you!

Hopding · 2019-04-28T21:12:31Z

@matthopson I should also mention that when working with large PDF files like this, it's a good idea to disable object streams when saving them:

const pdfBytes = PDFDocumentWriter.saveToBytes(pdfDoc, { useObjectStreams: false });

Some PDF readers perform poorly when displaying large PDFs saved with object streams. Saving your documents this way helps avoid that issue.

matthopson · 2019-04-29T20:09:32Z

Thanks Andrew! I can confirm that this fixes the long processing time for that PDF! I really appreciate your help with this, and all your work on this library in general.

Hopding · 2019-05-05T00:38:08Z

Version 0.6.2 is now published. It contains the fix for this issue. The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@0.6.2

It's also available on unpkg:

Hopding mentioned this issue Apr 28, 2019

Change parsing order #99

Merged

Hopding closed this as completed May 5, 2019

Hopding added a commit that referenced this issue Jun 30, 2019

Add parser tests for #95 and #119

540e534

Hopding added a commit that referenced this issue Aug 30, 2021

Add parser tests for #95 and #119

f028276
