
Trying to parse invalid object #369

Closed
antoinerousseau opened this issue Feb 29, 2020 · 19 comments · Fixed by #370

Comments

@antoinerousseau

When I open my beautiful PDF with this lib, all is great.
But because the file is big, I compressed it using smallpdf.com, and now when I open the compressed version, I get these warnings:

Trying to parse invalid object: {"line":140,"column":0,"offset":33170})
Invalid object ref: 9 0 R
Trying to parse invalid object: {"line":301,"column":0,"offset":65943})
Invalid object ref: 21 0 R
Trying to parse invalid object: {"line":309,"column":0,"offset":65969})
Invalid object ref: 22 0 R
Trying to parse invalid object: {"line":317,"column":0,"offset":65995})
Invalid object ref: 23 0 R

Any idea why? Is this an issue if I just use drawText?

@Hopding
Owner

Hopding commented Mar 1, 2020

Hello @antoinerousseau!

The warnings will not cause you any issues. pdf-lib's parser failed to parse those objects, but that's because they are a very specific type of empty object that doesn't do anything in your document anyway. Regardless, I fixed the parser bug in #370. So when the next release of pdf-lib goes out, you will no longer see these warnings.
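To make the "empty object" case concrete, here is a toy TypeScript sketch. It is not pdf-lib's actual parser. It assumes, based on the "whitespace-like characters" mentioned in this thread, that the offending objects are an `N G obj` header followed only by whitespace and then `endobj`, and it shows a tolerant parse that maps the empty body to `null` instead of failing. `parseIndirectObject` and `IndirectObject` are hypothetical names made up for illustration.

```typescript
// Toy model of an indirect object ("<num> <gen> obj ... endobj").
// Hypothetical illustration only; this is NOT how pdf-lib's PDFParser is written.
interface IndirectObject {
  objectNumber: number;
  generation: number;
  body: string | null; // null when only whitespace sits between "obj" and "endobj"
}

// Captures the object number, the generation number, and the raw body text.
const INDIRECT_OBJECT = /^\s*(\d+)\s+(\d+)\s+obj([\s\S]*?)endobj\s*$/;

function parseIndirectObject(source: string): IndirectObject {
  const match = INDIRECT_OBJECT.exec(source);
  if (match === null) {
    throw new Error(`Not an indirect object: ${JSON.stringify(source)}`);
  }
  const body = match[3].trim();
  return {
    objectNumber: Number(match[1]),
    generation: Number(match[2]),
    // A strict parser would reject the empty body; this sketch maps it to null.
    body: body === '' ? null : body,
  };
}

console.log(parseIndirectObject('9 0 obj\n   \nendobj').body); // null
console.log(parseIndirectObject('21 0 obj\n<< /Type /Page >>\nendobj').body); // << /Type /Page >>
```

Mapping an empty body to `null` is just one reasonable choice for a lenient reader; a stricter tool might prefer to report the object and skip it.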

@antoinerousseau
Author

antoinerousseau commented Mar 1, 2020

Wow, so fast, thanks!
So I wonder: are those extra whitespace-like characters useless? If so, I guess smallpdf.com could further reduce the output size of its PDFs ;)

@Hopding
Owner

Hopding commented Mar 1, 2020

Haha, indeed. They are useless as far as I can tell. Though, even if they aren't, smallpdf.com could eliminate the extra whitespace in them. But they only contribute a few bytes to the total size of the document, so I guess it's probably more effort than it's worth for them.

@yakupteke

Hi,
I have the same issue.

Trying to parse invalid object: {"line":342,"column":6,"offset":132428})
Invalid object ref: 389 0 R
Trying to parse invalid object: {"line":393,"column":6,"offset":137107})
Invalid object ref: 390 0 R

How can I solve this issue?
Thanks.

@antoinerousseau
Author

antoinerousseau commented Mar 4, 2020

@yakupteke did you read this thread?...

So when the next release of pdf-lib goes out, you will no longer see these warnings.

@Hopding
Owner

Hopding commented Mar 7, 2020

@antoinerousseau Version 1.4.1 is now published. It contains the fix for this issue (@yakupteke it will hopefully fix your issue as well). The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@1.4.1

It's also available on unpkg:

As well as jsDelivr:

@yakupteke

Thanks a lot.

@deammer

deammer commented Apr 13, 2020

Hi @Hopding, I seem to be running into the same issue when I delete a page from some PDF files, like this one. I'm using 1.4.1 and seeing the console warnings below:

Trying to parse invalid object: {"line":24,"column":6,"offset":5055}) PDFParser.js:213:12
Invalid object ref: 4 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":152,"column":6,"offset":21555}) PDFParser.js:213:12
Invalid object ref: 8 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":273,"column":6,"offset":40714}) PDFParser.js:213:12
Invalid object ref: 12 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":390,"column":6,"offset":55361}) PDFParser.js:213:12
Invalid object ref: 16 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":439,"column":6,"offset":61912}) PDFParser.js:213:12
Invalid object ref: 20 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":504,"column":6,"offset":69247}) PDFParser.js:213:12
Invalid object ref: 24 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":566,"column":6,"offset":77542}) PDFParser.js:213:12
Invalid object ref: 28 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":641,"column":6,"offset":99802}) PDFParser.js:213:12
Invalid object ref: 34 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":700,"column":6,"offset":106641}) PDFParser.js:213:12
Invalid object ref: 38 0 R PDFParser.js:215:12
Trying to parse invalid object: {"line":755,"column":6,"offset":112760})

Do you have any idea why that would be happening?

Sample relevant code, nothing too wild:
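import axios from 'axios'
import { PDFDocument } from 'pdf-lib'

const deleteFirstPage = async (pdfUrl: string) => {
  const pdfBytes: ArrayBuffer = await axios
    .get(pdfUrl, {
      responseType: 'arraybuffer'
    })
    .then(res => res.data as ArrayBuffer)

  // Named pdfDoc to avoid shadowing the browser's global `document`
  const pdfDoc = await PDFDocument.load(pdfBytes)
  // removePage takes a zero-based index, so 0 removes the first page
  pdfDoc.removePage(0)

  const updatedPdfBytes = await pdfDoc.save()

  return new Blob([updatedPdfBytes], { type: 'application/pdf' })
}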

@kimsean

kimsean commented Apr 14, 2020

I just installed this package. I'm using 1.4.1 and I still get the invalid object parsing error. Any idea? Thanks

@Hopding
Owner

Hopding commented Apr 16, 2020

@deammer It looks like the problem you're encountering is due to the fact that your PDF is encrypted. pdf-lib does not currently support encrypted documents.

Just to be clear, when I run the following script:

import { PDFDocument } from 'pdf-lib';

(async () => {
  const url = 'https://www.uscis.gov/sites/default/files/files/form/i-130.pdf';
  const i130Bytes = await fetch(url).then((res) => res.arrayBuffer());
  const pdfDoc = await PDFDocument.load(i130Bytes);
})();

It produces the following output:

Trying to parse invalid object: {"line":24,"column":6,"offset":5055})
Invalid object ref: 4 0 R
...
Trying to parse invalid object: {"line":3862,"column":6,"offset":563722})
Invalid object ref: 408 0 R
Error: Input document to `PDFDocument.load` is encrypted. You can use `PDFDocument.load(..., { ignoreEncryption: true })` if you wish to load the document anyways.
    at new EncryptedPDFError (/Users/user/github/pdf-lib/scratchpad/build/src/api/errors.js:11:24)
    at new PDFDocument (/Users/user/github/pdf-lib/scratchpad/build/src/api/PDFDocument.js:52:19)
    at Function.<anonymous> (/Users/user/github/pdf-lib/scratchpad/build/src/api/PDFDocument.js:123:47)
    at step (/Users/user/github/pdf-lib/node_modules/tslib/tslib.js:139:27)
    at Object.next (/Users/user/github/pdf-lib/node_modules/tslib/tslib.js:120:57)
    at fulfilled (/Users/user/github/pdf-lib/node_modules/tslib/tslib.js:110:62)

Notice the encryption error. The ignoreEncryption flag is explained here. It doesn't actually do anything but suppress the error. I only added it to the library for backwards compatibility reasons. In hindsight it probably should not have been added.
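Since the warnings above are printed before the `EncryptedPDFError` is thrown, it can help to check for encryption up front. The sketch below is a rough heuristic of my own, not pdf-lib's logic: an encrypted PDF's trailer (or cross-reference stream dictionary) carries an `/Encrypt` entry, so scanning the raw bytes for that token flags most encrypted files. It can give false positives if the token happens to appear in some other uncompressed data. `looksEncrypted` is a hypothetical helper name.

```typescript
// Heuristic sketch (an assumption, not pdf-lib's behavior): look for the
// "/Encrypt" token anywhere in the raw file bytes.
const ENCRYPT_TOKEN: number[] = Array.from('/Encrypt', (ch) => ch.charCodeAt(0));

function looksEncrypted(pdfBytes: Uint8Array): boolean {
  // Naive byte-by-byte substring search; fine for a one-off pre-check.
  for (let i = 0; i + ENCRYPT_TOKEN.length <= pdfBytes.length; i++) {
    let matched = true;
    for (let j = 0; j < ENCRYPT_TOKEN.length; j++) {
      if (pdfBytes[i + j] !== ENCRYPT_TOKEN[j]) {
        matched = false;
        break;
      }
    }
    if (matched) return true;
  }
  return false;
}
```

For example, if `looksEncrypted(new Uint8Array(i130Bytes))` returns true, you can expect `PDFDocument.load` to throw unless `ignoreEncryption: true` is passed, and that flag only suppresses the error rather than decrypting anything.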

@Hopding
Owner

Hopding commented Apr 16, 2020

@kimsean Can you share the document that is causing the error?

@pedro-surf

pedro-surf commented Oct 17, 2020

Still getting this error.
I'm on latest (1.11.2)
[screenshot]

Link for the PDF file: https://www.icloud.com/iclouddrive/0de0B7Vuvr6gTeYYAn4c9GC_Q#large

Errors:
[screenshot]

@tranghuyen215

Same for me.

@dbaq

dbaq commented Jul 2, 2021

Hi @Hopding, do you mind reopening this issue?

I am on "pdf-lib": "1.16.0" and I am still seeing this problem (even for non-encrypted PDFs).

Thank you.

@ccoddington

Same as dbaq, on 1.16.0:
[screenshot]

@JMAmimacom

The same here!

bs-pdf.entry.js:16547 Trying to parse invalid object: {"line":12386,"column":19174,"offset":178542}) PDFParser.tryToParseInvalidIndirectObject @ bs-pdf.entry.js:16547 (anonymous) @ bs-pdf.entry.js:16590 step @ bs-pdf.entry.js:102 (anonymous) @ bs-pdf.entry.js:83 rejected @ bs-pdf.entry.js:74 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 fulfilled @ bs-pdf.entry.js:73 Promise.then (async) step @ bs-pdf.entry.js:75 (anonymous) @ bs-pdf.entry.js:76 __awaiter @ bs-pdf.entry.js:72 PDFParser.parseIndirectObjects @ bs-pdf.entry.js:16569 (anonymous) @ bs-pdf.entry.js:16669 step @ bs-pdf.entry.js:102 (anonymous) @ bs-pdf.entry.js:83 (anonymous) @ bs-pdf.entry.js:76 __awaiter @ bs-pdf.entry.js:72 PDFParser.parseDocumentSection @ bs-pdf.entry.js:16666 (anonymous) @ bs-pdf.entry.js:16434 step @ bs-pdf.entry.js:102 (anonymous) @ bs-pdf.entry.js:83 (anonymous) @ bs-pdf.entry.js:76 __awaiter @ bs-pdf.entry.js:72 PDFParser.parseDocument @ bs-pdf.entry.js:16421 (anonymous) @ bs-pdf.entry.js:23008 step @ bs-pdf.entry.js:102 (anonymous) @ bs-pdf.entry.js:83 (anonymous) @ bs-pdf.entry.js:76 __awaiter @ bs-pdf.entry.js:72 PDFDocument.load @ bs-pdf.entry.js:22997 downloadPDF @ bs-pdf.entry.js:25776 onClick @ 
bs-pdf.entry.js:25794 bs-pdf.entry.js:16549 Invalid object ref: 36 0 R

@Hopding
Owner

Hopding commented Oct 1, 2021

If anybody is having trouble with this (for non-encrypted PDFs), please create a new bug report.
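When filing such a report, it may help to capture the exact warnings programmatically. The sketch below assumes the "Trying to parse invalid object" and "Invalid object ref" lines are emitted through `console.warn` (consistent with how they show up in the logs above, but still an assumption). `collectWarnings` and `Outcome` are hypothetical names, not part of pdf-lib.

```typescript
// Result of running a task: either its value or its error, plus any warnings.
type Outcome<T> =
  | { ok: true; result: T; warnings: string[] }
  | { ok: false; error: unknown; warnings: string[] };

// Hypothetical helper: run an async task and record everything passed to
// console.warn while it runs. Warnings from unrelated code running at the same
// time are captured too, so use it in isolation (e.g. a small repro script).
async function collectWarnings<T>(task: () => Promise<T>): Promise<Outcome<T>> {
  const warnings: string[] = [];
  const originalWarn = console.warn;
  console.warn = (...args: unknown[]): void => {
    warnings.push(args.map((arg) => String(arg)).join(' '));
  };
  try {
    return { ok: true, result: await task(), warnings };
  } catch (error) {
    // Keep warnings printed before a failure (e.g. before an EncryptedPDFError).
    return { ok: false, error, warnings };
  } finally {
    console.warn = originalWarn; // always restore the real console.warn
  }
}
```

For example, `const outcome = await collectWarnings(() => PDFDocument.load(pdfBytes));` gives you `outcome.warnings` to paste into the report, whether or not loading succeeded.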

@DouglasHFonseca

DouglasHFonseca commented Apr 15, 2022

Does anyone know a way around this error?

Same error here:

pdf: https://drive.google.com/file/d/14K67CUc5fZ3XPbXYlHQPUW3PXufxPRVZ/view?usp=sharing

@ruiyongsheng

ruiyongsheng commented May 11, 2022

@Hopding
Hi, I'm using "pdf-lib": "^1.17.1" together with Puppeteer, whose page.pdf([options]) returns a Promise which resolves with the PDF buffer:

// Generate the PDF with Puppeteer
const buffer = await page.pdf(options);
// Give the buffer to pdf-lib
const pdfDoc = await PDFDocument.load(buffer);

error log:

err message: Failed to parse PDF document (line:47350 col:464825 offset=8891): Failed to parse invalid PDF object 
err stack:  Error: Failed to parse PDF document (line:47350 col:464825 offset=8891): Failed to parse invalid PDF object 
at PDFInvalidObjectParsingError.PDFParsingError [as constructor] (/node_modules/pdf-lib/cjs/core/errors.js:229:24) 
at new PDFInvalidObjectParsingError (/node_modules/pdf-lib/cjs/core/errors.js:262:24) 
at PDFParser.tryToParseInvalidIndirectObject (/node_modules/pdf-lib/cjs/core/parser/PDFParser.js:181:19) 
at PDFParser.<anonymous> (/node_modules/pdf-lib/cjs/core/parser/PDFParser.js:209:30)

I'm not sure whether this is a bug or a concurrency problem. It doesn't happen every time; I found it in the error log. I look forward to your help, thank you very much!
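If the failure only shows up intermittently, one thing worth ruling out (an assumption about the cause, not a diagnosis) is that the buffer passed to `PDFDocument.load` is incomplete. A cheap pre-flight check is to confirm that the bytes start with the `%PDF-` header and that `%%EOF` appears near the end. `looksLikeCompletePdf` is a hypothetical helper; since Node's `Buffer` is a `Uint8Array` subclass, the value returned by `page.pdf()` can be passed to it directly.

```typescript
// Hypothetical pre-flight check, not part of pdf-lib or Puppeteer.
// Decodes bytes [start, end) as Latin-1 so they can be compared with ASCII markers.
function latin1(bytes: Uint8Array, start: number, end: number): string {
  let out = '';
  for (let i = Math.max(0, start); i < Math.min(end, bytes.length); i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

function looksLikeCompletePdf(bytes: Uint8Array): boolean {
  // A well-formed PDF begins with a header such as "%PDF-1.4".
  if (latin1(bytes, 0, 5) !== '%PDF-') return false;
  // "%%EOF" should sit at (or very near) the end of the file; scanning the
  // last 1024 bytes tolerates trailing newlines or padding.
  const tail = latin1(bytes, bytes.length - 1024, bytes.length);
  return tail.indexOf('%%EOF') !== -1;
}
```

For example, call `looksLikeCompletePdf(buffer)` before `PDFDocument.load(buffer)` and log the buffer length whenever it returns false. Passing this check does not prove the file is valid; it only rules out the most obvious truncation.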
