Writing text to specific pdf seems to break the structure #78

Closed
kevinswartz opened this issue Feb 26, 2019 · 9 comments

Comments

@kevinswartz

Hi @Hopding ,
I have a file here that I'm able to view without issue in pdf.js. Once I write some text to it via pdf-lib, pdf.js can no longer open the file and reports the error "Invalid PDF Structure". I've attached the PDFs from before and after the write. Do you have any ideas about how to write the text differently so this doesn't happen? These files are non-production.
Thanks again!
file_before.pdf
file_after.pdf

@kevinswartz
Author

kevinswartz commented Mar 1, 2019

I have another file with the same problem (I think). Attaching it here!
before.pdf
after.pdf

Edit: I was using v0.6.1-rc4 when I generated this file

@Hopding
Owner

Hopding commented Mar 10, 2019

Hello @kevinswartz.

I took a look at this today. Something about the source document seems to be causing pdf-lib to miscalculate the offsets for the cross-reference table. This is likely a bug in the PDFDocumentWriter, which means the problem will arise just by opening and saving the document with pdf-lib - whether you make any modifications or not.

You can sort of work around the problem by saving the document without object streams:

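import { PDFDocumentWriter } from 'pdf-lib';

// `pdfDoc` is a document already loaded via PDFDocumentFactory.load(...)

// With Object Streams
const bytesWithStreams = PDFDocumentWriter.saveToBytes(pdfDoc);

// Without Object Streams
const bytesWithoutStreams = PDFDocumentWriter.saveToBytes(pdfDoc, { useObjectStreams: false });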

Acrobat was able to open the documents you shared after saving with useObjectStreams: false.

Of course, this doesn't actually fix the bug. So I'll continue looking into this and let you know what I find.

@kevinswartz
Author

Thanks @Hopding ,
I can confirm that this fixes what I was seeing with both of these files. Are there any other consequences to saving with useObjectStreams: false? What is that really doing? Thanks!

@Hopding
Owner

Hopding commented Mar 12, 2019

@kevinswartz The only real benefit to using object streams is that it makes the resulting PDF file a bit smaller. Many PDF libraries don't support object streams at all, and only write PDFs without them.

PDF files contain a structure known as a Cross Reference Table (since PDF v1.0). This table records the byte offset of each object in the document, which allows fast random access to objects in large PDF files. These tables tend to get corrupted a lot, so most readers are able to reconstruct them without any perceptible change in the reader's performance.
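To make the "byte offsets" idea concrete, here is a minimal sketch that serializes two toy objects and writes a classic cross-reference table for them. The helper name `writeWithXrefTable` is hypothetical and is not part of pdf-lib; the sketch also assumes ASCII-only content and object numbers that run contiguously from 1.

```javascript
// Sketch only: `writeWithXrefTable` is a hypothetical helper, not a pdf-lib API.
// All content here is ASCII, so string length equals byte length.
function writeWithXrefTable(bodies) {
  let out = '%PDF-1.4\n';
  const offsets = [];
  bodies.forEach((body, i) => {
    offsets.push(out.length); // byte offset where object i + 1 starts
    out += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });

  const xrefOffset = out.length;  // where the table itself starts
  const size = bodies.length + 1; // +1 for the free entry of object 0
  out += `xref\n0 ${size}\n`;
  out += '0000000000 65535 f \n'; // object 0: head of the free list
  for (const offset of offsets) {
    // Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, flag.
    out += `${String(offset).padStart(10, '0')} 00000 n \n`;
  }
  out += `trailer\n<< /Size ${size} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefOffset}\n%%EOF\n`;
  return { pdf: out, offsets, xrefOffset };
}

const { pdf, offsets, xrefOffset } = writeWithXrefTable([
  '<< /Type /Catalog /Pages 2 0 R >>',
  '<< /Type /Pages /Kids [] /Count 0 >>',
]);
console.log(offsets);
console.log(pdf.slice(offsets[1], offsets[1] + 7)); // "2 0 obj"
```

A reader jumps to the `startxref` offset, parses the table, and can then seek straight to any object. If a single offset is wrong, the reader lands in the middle of some other object, which is the kind of failure reported in this issue.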

However, if the file is saved with object streams, then Cross Reference Streams are used instead of Cross Reference Tables. Cross Reference Streams were introduced in a later PDF version (v1.5, alongside object streams). For whatever reason, not as many readers are able to reconstruct corrupted Cross Reference Streams (e.g. Google Chrome can, but Mac's Preview and Adobe Acrobat apparently cannot).
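One common way to reconstruct a damaged table is to ignore it and scan the file for `N G obj` headers. The sketch below is a deliberately naive version of that idea, with `rebuildOffsets` as a hypothetical name; real readers are more careful (for example, about text inside streams that merely looks like a header). It may also hint at why object streams are harder to recover: objects packed inside an object stream have no `obj` header in the raw file, so a scan like this cannot see them.

```javascript
// Naive sketch: recover object offsets by scanning for "N G obj" at line starts.
// `rebuildOffsets` is a hypothetical helper, not taken from any real reader.
function rebuildOffsets(pdfText) {
  const found = new Map();
  for (const m of pdfText.matchAll(/^(\d+) (\d+) obj\b/gm)) {
    // A later definition of the same object number replaces an earlier one.
    found.set(Number(m[1]), m.index);
  }
  return found;
}

// A toy file whose cross-reference table has been destroyed.
const damaged = [
  '%PDF-1.4',
  '1 0 obj',
  '<< /Type /Catalog /Pages 2 0 R >>',
  'endobj',
  '2 0 obj',
  '<< /Type /Pages /Kids [] /Count 0 >>',
  'endobj',
  'xref',
  '(garbage where the table used to be)',
].join('\n');

const rebuilt = rebuildOffsets(damaged);
console.log(rebuilt.get(1)); // 9
console.log(damaged.startsWith('2 0 obj', rebuilt.get(2))); // true
```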

@kevinswartz
Author

Thanks @Hopding! Good information. We might stop using object streams if it means better compatibility.

@Hopding
Owner

Hopding commented May 3, 2019

Hello there @kevinswartz!

I was able to find and fix the issue causing this in #101. Some of the logic used to write out the cross reference tables and streams was incorrect. In particular, the code assumed that all PDFs would have an object with an ID of 1. This resulted in offset miscalculations if, in fact, no such object existed.
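The thread does not show the actual pdf-lib code, so the following is only an illustration of how this class of bug can arise. A cross-reference table is written in subsections, each declaring a first object number and a count. If a writer emits one subsection and assumes the numbers are contiguous, a missing object 1 shifts every later offset onto the wrong object number. The names `toSubsections` and `assignedNumbers` are hypothetical.

```javascript
// Hypothetical helpers (not pdf-lib's code): group the object numbers that
// actually exist into subsections of consecutive numbers.
function toSubsections(objectNumbers) {
  const sorted = [...new Set(objectNumbers)].sort((a, b) => a - b);
  const sections = [];
  for (const num of sorted) {
    const last = sections[sections.length - 1];
    if (last && num === last.first + last.count) {
      last.count += 1; // extends the current run
    } else {
      sections.push({ first: num, count: 1 }); // gap: start a new subsection
    }
  }
  return sections;
}

// For each entry, in written order, the object number a reader will assign to it.
function assignedNumbers(sections) {
  return sections.flatMap(({ first, count }) =>
    Array.from({ length: count }, (_, i) => first + i));
}

// Object 0 is the head of the free list; this document has no object 1.
const present = [0, 2, 3, 4];

// Naive writer: a single subsection starting at 0, assuming no gaps.
const naive = [{ first: 0, count: present.length }];
console.log(assignedNumbers(naive)); // [0, 1, 2, 3]: entries after 0 are off by one

// Gap-aware writer: one subsection per run of consecutive numbers.
const grouped = toSubsections(present);
console.log(grouped); // [{ first: 0, count: 1 }, { first: 2, count: 3 }]
console.log(assignedNumbers(grouped)); // [0, 2, 3, 4]
```

With the naive header, the offset written for object 2 is read as belonging to object 1, and so on down the table, so every lookup lands on the wrong object.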

I just cut prerelease 0.6.2-rc3 with the fix.

You can install this prerelease with npm:

npm install pdf-lib@0.6.2-rc3

It's also available on unpkg.

Please try it out and let me know if it works for you!

@kevinswartz
Author

Thanks! I'll check it out.

@Hopding
Owner

Hopding commented May 5, 2019

Version 0.6.2 is now published. It contains the fix for this issue. The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@0.6.2

It's also available on unpkg.

(@kevinswartz if you find that you're still having trouble with this after using the new release, please go ahead and reopen this issue.)

Hopding closed this as completed May 5, 2019
@kevinswartz
Author

Thanks @Hopding! Looks like that fixes the issue.
