Processing through pdf-lib causes xref/corrupt message #454

Closed
jackwshepherd opened this issue May 22, 2020 · 5 comments

Comments

@jackwshepherd

jackwshepherd commented May 22, 2020

Hello

Thanks once again for supporting such a fantastic library. I've always had trouble with the below PDF.

https://tinyurl.com/y9cwewbz

Whenever I process it through pdf-lib, the output opens in every PDF reader except Adobe Reader. I think it's quite important that it works in Adobe Reader.

When I run qpdf on it, I get the below error:

WARNING: /Users/me/Downloads/corrupt.pdf: file is damaged
WARNING: /Users/me/Downloads/corrupt.pdf (offset 6711353): xref not found
WARNING: /Users/me/Downloads/corrupt.pdf: Attempting to reconstruct cross-reference table
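
For anyone who hits the same warning: "xref not found" generally means the byte offset recorded after the `startxref` keyword at the end of the file does not point at a cross-reference section. A minimal sketch for checking this in Node is shown here. The function name `checkStartXref` and its return shape are made up for illustration; it is not part of pdf-lib or qpdf, and it only looks at the last trailer in the file.

```javascript
// Check whether the offset recorded after `startxref` points at a
// cross-reference section. `pdfBytes` is a Node Buffer holding the whole file.
// Hypothetical helper for illustration only.
function checkStartXref(pdfBytes) {
  // The trailer sits at the very end of the file, so only scan the tail.
  const tail = pdfBytes
    .subarray(Math.max(0, pdfBytes.length - 2048))
    .toString("latin1");
  const match = /startxref\s+(\d+)\s+%%EOF\s*$/.exec(tail);
  if (!match) {
    return { ok: false, reason: "no startxref/%%EOF trailer found" };
  }

  const offset = Number(match[1]);
  if (offset >= pdfBytes.length) {
    return { ok: false, offset, reason: "offset is past the end of the file" };
  }

  // A classic table starts with the keyword `xref`; a cross-reference
  // stream starts with an object header such as `12 0 obj`.
  const head = pdfBytes.subarray(offset, offset + 32).toString("latin1");
  const ok = /^xref\b/.test(head) || /^\d+\s+\d+\s+obj\b/.test(head);
  return ok
    ? { ok: true, offset }
    : { ok: false, offset, reason: "no xref table or stream at this offset" };
}
```

Running it on the bytes of the file that qpdf complains about (for example, the result of `fs.readFileSync`) shows whether the recorded offset is the problem or whether the damage is elsewhere.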

My code is below, in case relevant.

Thanks a lot

// Get the bundle
const bundle = req.project.bundles[req.bundleKey];
// Get list of all entries
const entries = bundle.sections.reduce(
  (acc, section) => [...acc, ...section.entries],
  []
);

// Initiate PDFDocument object
let mergedPdf = await PDFDocument.create();
const font = await mergedPdf.embedFont(StandardFonts.HelveticaBold);
const paginationFont = await mergedPdf.embedFont(
  StandardFonts.CourierBold
);

// First, put the cover page on
/*  const cover = await readFileAsync("./tmp/cover.pdf");
const coverPdf = await PDFDocument.load(cover);
const coverPage = await mergedPdf.copyPages(coverPdf, [0]);


coverPage[0].drawText(req.project.name, {
  x: 50,
  y: 75,
  font,
  size: 16,
  color: rgb(0.26, 0.25, 0.24),
});

// Put the name of the bundle
coverPage[0].drawText(req.project.bundles[req.bundleKey].name, {
  x: 50,
  y: 50,
  font,
  size: 12,
  color: rgb(0.41, 0.4, 0.37),
});

mergedPdf.addPage(coverPage[0]);*/

// Get dividers for future usage
//  const divider = await readFileAsync("./tmp/divider.pdf");

// Set page count at 1
let pageNumber = 1;

// Array of PDFDocument objects for all files to be merged
for (const key in entries) {
  const entry = entries[key];
  if (!entry.uploaded) return false;
  const filename = `./tmp/${entry.uploaded.replace("/", "")}`;

  // Download file to the tmp folder
  await downloadFile(filename, entry.uploaded);

  // First, add the divider
  /*    const dividerPdf = await PDFDocument.load(divider);
  const dividerPage = await mergedPdf.copyPages(dividerPdf, [0]);
  const dividerNo = parseInt(parseInt(key) + 1).toString();
  dividerPage[0].drawText(dividerNo, {
    x: 560,
    y: 710,
    font,
    size: 14,
    color: rgb(0.22, 0.22, 0.22),
  });
  dividerPage[0].drawText(parseFilename(entry.name, key), {
    x: 50,
    y: 710,
    font,
    size: 14,
    maxWidth: 350,
    color: rgb(0.22, 0.22, 0.22),
  });
  mergedPdf.addPage(dividerPage[0]);*/

  // Now load into a PDFDocument object
  const file = await readFileAsync(filename);
  const entryPdf = await PDFDocument.load(file);

  const copiedPages = await mergedPdf.copyPages(
    entryPdf,
    entryPdf.getPageIndices()
  );

  // Now add each page into the merged PDF
  copiedPages.forEach((page) => {
    const width = page.getWidth();
    const bottomRight = width - 20;
    // Draw a background circle for the number
    page.drawCircle({
      x: bottomRight,
      y: 20,
      size: 15,
      color: rgb(1, 1, 1),
    });
    // Put the number on the page
    page.drawText(pageNumber.toString(), {
      x:
        pageNumber < 10
          ? bottomRight - 3
          : pageNumber < 100
          ? bottomRight - 7
          : bottomRight - 11,
      y: 16,
      font: paginationFont,
      size: 12,
      color: rgb(0, 0, 0),
    });
    // Increment the page number
    pageNumber = pageNumber + 1;

    // And add the page in
    mergedPdf.addPage(page);
  });

  // Now delete them
  await unlinkAsync(filename);
}

// Finalise the pdf file
const finalPdf = await mergedPdf.save();

const uploadPdf = Buffer.from(finalPdf);

// Now write to the file sync
//await writeFileAsync("./tmp/merged.pdf", finalPdf);

// Set filename for the bundle
const bundleFilenameLong = `${bundle._id.toString()}.pdf`;
const bundleFilenameShort = `${req.project.name} - ${bundle.name}`;

// Now upload to s3
const upload = await new Promise((resolve, reject) => {
  s3.putObject(
    {
      Bucket: keys.awsBucket,
      Key: bundleFilenameLong,
      Body: uploadPdf,
    },
    (err, url) => {
      if (err) reject(err);
      else resolve(url);
    }
  );
});
const url = await getFile(bundleFilenameLong, bundleFilenameShort);

return res.status(200).send(url);
@Hopding
Owner

Hopding commented May 22, 2020

Hello @jackwshepherd!

I’m unable to view the document via the link you shared. Can you please upload the PDF directly to this thread as an attachment?

@jackwshepherd
Author

@Hopding - I just emailed this to you. Thank you :)

@Hopding
Owner

Hopding commented May 24, 2020

I just published 1.6.1-rc1. It contains a fix for this issue (see #458). The problem seems to be that the PDF you are copying from contains a number that is too large to fit into a 64-bit integer. Presumably, this causes Acrobat to throw a fit and not render the document. Let me know how this version works for you!
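
To make the idea concrete, here is a rough sketch of what capping an oversized number while parsing could look like. It is an illustration only and not pdf-lib's actual code. The name `parseCappedNumber` is hypothetical, and the use of `Number.MAX_SAFE_INTEGER` as the ceiling is an assumption: this thread does not say which limit the fix applies, only that the original value was too large for a 64-bit integer.

```javascript
// Parse a numeric token from a PDF and clamp it to the range of integers
// JavaScript can represent exactly. Illustration only: the function name and
// the choice of Number.MAX_SAFE_INTEGER as the limit are assumptions.
function parseCappedNumber(token) {
  // Accept plain PDF-style numbers: optional sign, digits, optional fraction.
  if (!/^[+-]?(\d+\.?\d*|\.\d+)$/.test(token)) {
    throw new Error(`Not a numeric token: ${token}`);
  }
  const value = Number(token);
  if (value > Number.MAX_SAFE_INTEGER) return Number.MAX_SAFE_INTEGER;
  if (value < Number.MIN_SAFE_INTEGER) return Number.MIN_SAFE_INTEGER;
  return value;
}
```

With a cap like this, a token made of dozens of digits is written back out as a value that a strict reader can still parse, at the cost of no longer matching the original number.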

You can install this new version with npm:

npm install pdf-lib@1.6.1-rc1

It's also available on unpkg, as well as jsDelivr.

@Hopding
Owner

Hopding commented May 25, 2020

@jackwshepherd Version 1.6.1 is now published. It introduces a capNumbers option that resolves this issue. You can use it like so:

const pdfDoc = await PDFDocument.load(bytes, { capNumbers: true });
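
In a merge loop like the one in the original post, the option would go on every `PDFDocument.load` call for the source files. A minimal sketch, assuming pdf-lib 1.6.1 or later: the helper name `mergeWithCappedNumbers` is hypothetical, and `PDFDocument` is passed in as an argument so the sketch does not assume how pdf-lib is imported in your project.

```javascript
// Merge several PDFs, loading each source with `capNumbers` enabled.
// `PDFDocument` is the class exported by pdf-lib (1.6.1 or later);
// `sources` is an array of byte buffers, one per input file.
// Hypothetical helper for illustration only.
async function mergeWithCappedNumbers(PDFDocument, sources) {
  const merged = await PDFDocument.create();
  for (const bytes of sources) {
    // Cap oversized numbers while parsing each source document.
    const src = await PDFDocument.load(bytes, { capNumbers: true });
    const pages = await merged.copyPages(src, src.getPageIndices());
    pages.forEach((page) => merged.addPage(page));
  }
  // Resolves to the bytes of the merged document.
  return merged.save();
}
```

It could be called as `await mergeWithCappedNumbers(PDFDocument, [fileA, fileB])`, where `fileA` and `fileB` are buffers read from disk.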

The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@1.6.1

It's also available on unpkg, as well as jsDelivr.

Hopding closed this as completed May 25, 2020
@jackwshepherd
Author

Thank you once again :)
