Links are lost after combining PDFs #341

Closed
vekunz opened this issue Jan 27, 2020 · 6 comments

Comments

@vekunz

vekunz commented Jan 27, 2020

Hi, I use pdf-lib to combine multiple PDFs. One of the PDFs contains links, such as a table of contents, that point to other pages of the same PDF. The problem is that these links are lost after combining the PDFs with pdf-lib.
Is there a way to preserve the links?

My code:

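// `files` is an array of already-loaded PDFDocument instances.
const pdfDoc = await PDFDocument.create();
for (const file of files) {
    // Collect the index of every page in the donor document.
    const indices = [];
    for (let i = 0; i < file.getPageCount(); i++) {
        indices.push(i);
    }
    const pages = await pdfDoc.copyPages(file, indices);

    // Append the copied pages to the merged document in order.
    for (const page of pages) {
        pdfDoc.addPage(page);
    }
}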

Edit: I found out that the links are stored as "Named Destinations" in the PDF. The PDF is version 1.4. One option would be to add the destinations after merging, but then I would need a way to add them to the PDF manually.

@Hopding
Owner

Hopding commented Feb 9, 2020

Hello @vekunz!

As you noted, the links do not work after the pages are merged because the links reference Named Destinations. Named Destinations are stored under the /Dests entry of the document's catalog. Unfortunately, the current page copying code does not copy anything from the donor document that isn't accessible from the page via a chain of indirect references. And most of the resources listed under the catalog are not accessible in this way.
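
To illustrate the limitation, here is a toy model rather than pdf-lib's actual copying code. It treats a document as a graph of objects and collects what is reachable from one page. The object names and the `reachableFrom` helper are hypothetical, and the graph leaves out details such as a page's /Parent entry.

```typescript
// Toy object graph: each object lists the ids of the objects it references.
type ObjectGraph = Record<string, string[]>;

// Collect every object reachable from `start` by following references.
function reachableFrom(start: string, graph: ObjectGraph): Set<string> {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const id = queue.shift()!;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of graph[id] ?? []) queue.push(next);
  }
  return seen;
}

// Hypothetical donor document. The catalog owns /Dests; no page points to it.
const donor: ObjectGraph = {
  catalog: ['pages', 'dests'],
  pages: ['page1', 'page2'],
  page1: ['contents1', 'linkAnnot'],
  page2: ['contents2'],
  linkAnnot: [], // refers to its target by *name*, not by object reference
  dests: ['page2'], // name -> [page2, fit type, ...]
  contents1: [],
  contents2: [],
};

const copied = reachableFrom('page1', donor);
console.log(copied.has('linkAnnot')); // true: the annotation travels with the page
console.log(copied.has('dests')); // false: the name it relies on is left behind
```

In this model the link annotation is copied, but the dictionary that resolves its destination name is not, which matches the behaviour reported above.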

This limitation has come up before in #159 and #218. I would like to see this issue resolved, but haven't had any time to work on it. I'd be open to discussing a solution with anybody interested in implementing a fix for copying catalog entries between documents!

@SteffenLanger

Hi @Hopding,

Thanks for your great work!

I'd like to support you in copying catalog entries between documents. I'm new to PDF's internal workings but am a quick learner. I started researching the format and feel like I've got a good overview.

Since you know about pdf-lib best, do you have any suggestions for implementing this feature? My first (uneducated) guess would be:

  1. Find the catalog entries in the original document.
  2. Copy all catalog entries related to links to the new document.

@Hopding
Owner

Hopding commented Sep 24, 2021

Added this to the roadmap for tracking: #998.

@oleteacher

Wonderful lib! I know this is an old, closed issue, but links still do not work on merge in the latest release. Hoping for support in the future.

@rajashree23

#1609

Any updates on getting internal links to work?

@Ludevik

Ludevik commented Apr 24, 2024

This is how I post-process multiple documents after using copyPages.

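import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Key of the named-destinations dictionary in the document catalog.
function getLinksPDFName(): PDFName {
  return PDFName.of('Dests');
}

// Maps each source page's ref tag to the ref of its copy in the destination.
// Assumes every page of every source was copied, in order, and that ref tags
// do not collide across sources (tags from different documents can be equal).
function mapSourceToTargetPages(
  sources: PDFDocument[],
  destination: PDFDocument,
): Record<string, PDFRef> {
  const result: Record<string, PDFRef> = {};
  const sourcePages = sources.flatMap(source => source.getPages());
  const destinationPages = destination.getPages();
  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = destinationPages[i].ref;
  }
  return result;
}

export function copyLinks(sources: PDFDocument[], target: PDFDocument): void {
  const targetLinksDict = PDFDict.withContext(target.context);
  // Collect the /Dests entries of every source that has them.
  sources
    .map(source => source.context.lookupMaybe(source.catalog.get(getLinksPDFName()), PDFDict))
    .filter((links): links is PDFDict => links != null)
    .forEach(links =>
      links.entries().forEach(([destName, destValue]) => targetLinksDict.set(destName, destValue)),
    );
  // Point each destination at the copied page instead of the source page.
  // Assumes every value is a direct array whose first element is a page ref.
  const pagesMapping = mapSourceToTargetPages(sources, target);
  (targetLinksDict.values() as PDFArray[]).forEach(array => {
    const currentPageRef = array.get(0) as PDFRef;
    array.set(0, pagesMapping[currentPageRef.tag]);
  });

  // Register the merged dictionary and reference it from the target catalog.
  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(getLinksPDFName(), destinationDestsRef);
}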

How it works:

  • copy entries from Dests from all sources to new dictionary
  • fix references (links) in target Dests because copied pages have different PDFRef
  • register dictionary in the target's context
  • set Dests reference in target's catalog to dictionary
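
The reference-fixing step can also be sketched on plain data, independent of pdf-lib. This is a variation on the approach above, not the same code: it builds the page mapping per source document, so two sources that happen to use the same object number for a page do not overwrite each other. The `Ref`, `Dest`, `SourceDoc` types and the `mergeDests` function are hypothetical stand-ins for illustration.

```typescript
// Plain-data stand-ins (hypothetical, not pdf-lib types).
type Ref = string; // object reference tag such as "3 0 R"
type Dest = [Ref, ...Array<string | number | null>]; // [page, fit type, args...]
interface SourceDoc {
  pages: Ref[]; // page refs in page order
  dests: Record<string, Dest>; // named destinations from the catalog
}

// Assumes every page of every source was copied to the target, in order.
function mergeDests(sources: SourceDoc[], targetPages: Ref[]): Map<string, Dest> {
  const merged = new Map<string, Dest>();
  let offset = 0; // index of the current source's first page in the target
  for (const source of sources) {
    // Per-source mapping, so equal tags in different sources cannot collide.
    const pageMap = new Map<Ref, Ref>();
    source.pages.forEach((ref, i) => {
      const targetRef = targetPages[offset + i];
      if (targetRef !== undefined) pageMap.set(ref, targetRef);
    });
    for (const [name, dest] of Object.entries(source.dests)) {
      const [page, ...rest] = dest;
      const mapped = pageMap.get(page);
      if (mapped === undefined) continue; // target page was not copied
      // First definition of a name wins. Renaming duplicates would also
      // require rewriting the link annotations, which is not attempted here.
      if (merged.has(name)) continue;
      merged.set(name, [mapped, ...rest]);
    }
    offset += source.pages.length;
  }
  return merged;
}

// Both sources use the tag "3 0 R" for a page, yet they map to different targets.
const docA: SourceDoc = { pages: ['3 0 R', '5 0 R'], dests: { toc: ['5 0 R', 'Fit'] } };
const docB: SourceDoc = { pages: ['3 0 R'], dests: { intro: ['3 0 R', 'XYZ', 0, 792, null] } };
const mergedDests = mergeDests([docA, docB], ['10 0 R', '11 0 R', '12 0 R']);
console.log(mergedDests.get('toc')); // first element is now '11 0 R'
console.log(mergedDests.get('intro')); // first element is now '12 0 R'
```

Destination names that clash between sources remain an open problem in this sketch, because the links in the later document would resolve to the earlier document's destination.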
