[Feature Request]: PDF should decrease in file size after calling removePage #140

Closed
jdbaculard opened this issue Jul 23, 2019 · 8 comments

Comments

@jdbaculard

Hi

I use pdf-lib with Node.js. I have a 2000-page PDF file that is 13 MB.
I remove 1900 pages (with pdfDoc.removePage(10), pdfDoc.removePage(11), pdfDoc.removePage(12), ...). This works, which is great, but the file size stays the same even though only 100 pages remain in the modified file.

Why?

Is there a solution? Thanks

@Hopding
Owner

Hopding commented Jul 25, 2019

Hello @jdbaculard!

The removePage method does not delete a page's objects from the document. It only removes the reference to the page from the page tree, leaving the page's objects intact. The reason for this is that a page can actually be reused. If the objects themselves were deleted, then removing a single page might have unintended side effects elsewhere in the document. And even if it weren't used multiple times, you may wish to reinsert the page after removing it (for example, when reordering pages).
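The difference between removing a reference and deleting an object can be sketched with a toy model. This is only an illustration, not pdf-lib's internal code: the names makeToyDoc, removePageRef, deletePageObject, and savedSize are all hypothetical, and the model assumes that saving writes every stored object whether or not the page tree still points at it.

```javascript
// Toy model only: a "document" is a map of numbered objects plus a page tree
// listing which objects are pages. This is NOT how pdf-lib is implemented.
function makeToyDoc(pageCount, bytesPerPage) {
  const objects = new Map();
  const pageTree = [];
  for (let i = 0; i < pageCount; i++) {
    objects.set(i, { kind: 'page', size: bytesPerPage });
    pageTree.push(i);
  }
  return { objects, pageTree };
}

// Mirrors the behavior described above: drop the reference, keep the object.
function removePageRef(doc, index) {
  doc.pageTree.splice(index, 1);
}

// Hypothetical "deletePage": drop the reference and the object itself.
// It ignores the case where the page is referenced more than once.
function deletePageObject(doc, index) {
  const [ref] = doc.pageTree.splice(index, 1);
  doc.objects.delete(ref);
}

// Assumed save behavior: every stored object is written, referenced or not.
function savedSize(doc) {
  let total = 0;
  for (const obj of doc.objects.values()) total += obj.size;
  return total;
}

const doc = makeToyDoc(5, 100);
removePageRef(doc, 0);
console.log(doc.pageTree.length, savedSize(doc)); // 4 500
deletePageObject(doc, 0);
console.log(doc.pageTree.length, savedSize(doc)); // 3 400
```

In this model the page count drops after removePageRef but the saved size does not, which matches the symptom reported in the question.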

That being said, I can certainly see how it would be desirable to actually delete a page's objects from the document, in addition to removing its reference from the page tree. I think it would make sense to add a new method, PDFDocument.deletePage that does this.

But, in the meantime, there is a workaround: you can copy the pages you want to keep into a new document object. For example, if you want to keep pages 1, 4, and 75:

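import { PDFDocument } from 'pdf-lib'

// Load the original and create an empty document to receive the kept pages
const originalDoc = await PDFDocument.load(...)
const modifiedDoc = await PDFDocument.create()

// copyPages takes zero-based indices: these are pages 1, 4, and 75
const pages = await modifiedDoc.copyPages(originalDoc, [0, 3, 74])
pages.forEach(page => modifiedDoc.addPage(page))

// Save the new document, not the original
const pdfBytes = await modifiedDoc.save()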

NOTE: This code sample is based on the new v1.0.0 API. This is currently in beta. However, it is stable at this point and will graduate from beta to stable this weekend. So feel free to use it now.
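When far more pages are removed than kept, or the other way around, the index list for copyPages can be computed instead of typed by hand. The helper below is a minimal sketch; indicesToKeep is a hypothetical name and is not part of pdf-lib. It assumes zero-based indices, as in the example above.

```javascript
// Hypothetical helper (not part of pdf-lib): given the total page count and
// the zero-based indices to drop, return the zero-based indices to keep,
// in their original order.
function indicesToKeep(pageCount, indicesToRemove) {
  const drop = new Set(indicesToRemove);
  const keep = [];
  for (let i = 0; i < pageCount; i++) {
    if (!drop.has(i)) keep.push(i);
  }
  return keep;
}

// Drop the 2nd and 3rd pages of a 6-page document.
console.log(indicesToKeep(6, [1, 2])); // [ 0, 3, 4, 5 ]
```

The returned array could then be passed as the second argument to copyPages in place of the literal list.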

I hope this helps. Please let me know if you have any additional questions!

@jdbaculard
Author

Thank you for your reply.
The copy method does not work well for me, because each copied page keeps its own copy of the logo and ends up at 200 KB. The resulting file is 20 MB for 100 pages, which is much larger than the original 2000-page file.

Can I test the deletePage method in the beta?

Congratulations for your work.

@Hopding
Owner

Hopding commented Jul 26, 2019

@jdbaculard Have you tried the page copying approach using the beta version? The current version (0.6.4) has an API flaw that results in duplicate objects being created (hence the large document size). But this has been fixed in the beta version. For example, in 0.6.4 the logo will be copied 2000 times. But in v1.0.0-beta.3 the logo will only be copied once.
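Why duplicated copies inflate the output can be shown with another toy model. This is only a sketch of the general idea of remembering what has already been copied; it is not a description of how either pdf-lib version is implemented, and all names and sizes here are made up for illustration.

```javascript
// Toy model: three small pages that all reference one large shared logo.
const logo = { id: 'logo', size: 200 };
const sourcePages = ['p1', 'p2', 'p3'].map((id) => ({
  id,
  size: 1,
  resources: [logo],
}));

// Naive copy: every page brings along its own fresh copy of each resource.
function copyWithoutMemo(pages) {
  const out = [];
  for (const page of pages) {
    out.push({ ...page });
    for (const res of page.resources) out.push({ ...res });
  }
  return out;
}

// Memoized copy: each source object is copied at most once, keyed by its id.
function copyWithMemo(pages) {
  const copied = new Map();
  const copyOnce = (obj) => {
    if (!copied.has(obj.id)) copied.set(obj.id, { ...obj });
    return copied.get(obj.id);
  };
  for (const page of pages) {
    copyOnce(page);
    page.resources.forEach(copyOnce);
  }
  return [...copied.values()];
}

// Total size of all objects that would be written to the target document.
const totalSize = (objs) => objs.reduce((sum, o) => sum + o.size, 0);

console.log(totalSize(copyWithoutMemo(sourcePages))); // 603
console.log(totalSize(copyWithMemo(sourcePages))); // 203
```

In the naive version the output grows with the number of pages times the logo size, which is the pattern reported above for 0.6.4.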

The beta version does not currently have a deletePage method. This method hasn’t yet been implemented. And it will probably be at least a few weeks until I have time to work on it myself (of course, others are welcome to work on it and submit a PR).

@jdbaculard
Author

Great! It works!
Thanks

@sdfereday

@Hopding Grand way of removing pages 👍 Just one thing I noticed (can raise an issue if that's preferred), but after creating the new document it seems to be missing a couple of entries in its catalog that the original document had (such as AcroForm, StructTreeRoot, etc).

Presumably document creation leaves that part up to you? So if I wanted to copy across extra catalog entries from the original, that's allowed?

@Hopding
Owner

Hopding commented Dec 28, 2019

Reopening this to track it as a feature request.

@sdfereday Just realized that my comment on #159 (comment) isn't applicable as it doesn't delete unused page objects.

@Hopding reopened this Dec 28, 2019
@Hopding changed the title from "PDF not decrease file size after removePage" to "[Feature Request]: PDF should decrease in file size after calling removePage" Jan 1, 2020
@kevinswartz

Hey guys, just wanted to voice my support for this feature. This would help us out a lot. Thanks for your great work on this!

@Hopding
Owner

Hopding commented Sep 24, 2021

Added this to the roadmap for tracking: #998.
