Add getAttachments to PDFDocument #1242

Open: wants to merge 1 commit into base: master
19 changes: 19 additions & 0 deletions README.md
@@ -63,6 +63,7 @@
- [Embed PDF Pages](#embed-pdf-pages)
- [Embed Font and Measure Text](#embed-font-and-measure-text)
- [Add Attachments](#add-attachments)
- [Extract Attachments](#extract-attachments)
- [Set Document Metadata](#set-document-metadata)
- [Read Document Metadata](#read-document-metadata)
- [Set Viewer Preferences](#set-viewer-preferences)
@@ -108,6 +109,7 @@
- Set viewer preferences
- Read viewer preferences
- Add attachments
- Extract attachments

## Motivation

@@ -756,6 +758,23 @@ const pdfBytes = await pdfDoc.save()
// • Rendered in an <iframe>
```

### Extract Attachments

If you load a PDF that has `cars.csv` as an attachment, you can use the
following to extract the attachments:

<!-- prettier-ignore -->
```js
const pdfDoc = await PDFDocument.load(...)
const attachments = pdfDoc.getAttachments()
const csv = attachments.find(({ name }) => name === 'cars.csv')
fs.writeFileSync(csv.name, csv.data)
```

> NOTE: If you are building a PDF file with this library, any attachments you've
> added won't be returned by `getAttachments` until after you call `save` on the
> document.
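
The README snippet above calls `find` and then reads `csv.name` directly. If the loaded PDF has no `cars.csv` attachment, `find` returns `undefined` and the write fails with an unhelpful error. A minimal sketch of a guarded lookup follows. The `Attachment` interface, the `requireAttachment` helper, and the sample data are hypothetical. They only assume the `{ name, data }` shape this PR returns, with `data` taken to be a `Uint8Array`.

```typescript
// Hypothetical shape of one entry returned by getAttachments() in this PR.
interface Attachment {
  name: string;
  data: Uint8Array; // assumption: decoded bytes are a Uint8Array
}

// Return the attachment with the given name, or throw a descriptive error
// instead of letting a later `csv.name` access fail on `undefined`.
function requireAttachment(
  attachments: readonly Attachment[],
  name: string,
): Attachment {
  const match = attachments.find((attachment) => attachment.name === name);
  if (match === undefined) {
    const available = attachments.map((a) => a.name).join(', ') || '(none)';
    throw new Error(`No attachment named "${name}"; available: ${available}`);
  }
  return match;
}

// Stand-in data; in real use this array would come from pdfDoc.getAttachments().
const sampleAttachments: Attachment[] = [
  { name: 'cars.csv', data: new Uint8Array([109, 97, 107, 101]) },
];
const csv = requireAttachment(sampleAttachments, 'cars.csv');
```

With a guard like this, the result can be passed to `fs.writeFileSync(csv.name, csv.data)` without a non-null assertion.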

### Set Document Metadata

_This example produces [this PDF](assets/pdfs/examples/set_document_metadata.pdf)_.
45 changes: 45 additions & 0 deletions src/api/PDFDocument.ts
@@ -21,6 +21,10 @@ import {
  PDFCatalog,
  PDFContext,
  PDFDict,
  PDFArray,
  decodePDFRawStream,
  PDFStream,
  PDFRawStream,
  PDFHexString,
  PDFName,
  PDFObjectCopier,
@@ -900,6 +904,47 @@ export default class PDFDocument {
    this.embeddedFiles.push(embeddedFile);
  }

  private getRawAttachments() {
    if (!this.catalog.has(PDFName.of('Names'))) return [];
    const Names = this.catalog.lookup(PDFName.of('Names'), PDFDict);

    if (!Names.has(PDFName.of('EmbeddedFiles'))) return [];
    const EmbeddedFiles = Names.lookup(PDFName.of('EmbeddedFiles'), PDFDict);

    if (!EmbeddedFiles.has(PDFName.of('Names'))) return [];
    const EFNames = EmbeddedFiles.lookup(PDFName.of('Names'), PDFArray);

    const rawAttachments = [];
    for (let idx = 0, len = EFNames.size(); idx < len; idx += 2) {
      const fileName = EFNames.lookup(idx) as PDFHexString | PDFString;
      const fileSpec = EFNames.lookup(idx + 1, PDFDict);
      rawAttachments.push({ fileName, fileSpec });
    }

    return rawAttachments;
  }
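
The loop in `getRawAttachments` steps through the `Names` array two entries at a time because the array is flat: each file name is followed immediately by its file specification. As a rough illustration of that pairing, the sketch below does the same walk over a plain array. The `toPairs` helper and the sample values are hypothetical, and the odd-length check is an addition that the PR's loop does not perform.

```typescript
// Split a flat [key0, value0, key1, value1, ...] array into pairs, mirroring
// the `idx += 2` loop above. Plain arrays stand in for PDFArray here.
function toPairs<K, V>(
  flat: ReadonlyArray<K | V>,
): Array<{ key: K; value: V }> {
  // Extra guard (not in the PR): a trailing key without a value is an error.
  if (flat.length % 2 !== 0) {
    throw new Error(`Expected an even number of entries, got ${flat.length}`);
  }
  const pairs: Array<{ key: K; value: V }> = [];
  for (let idx = 0; idx < flat.length; idx += 2) {
    pairs.push({ key: flat[idx] as K, value: flat[idx + 1] as V });
  }
  return pairs;
}

// Stand-in entries: file names followed by placeholder file specs.
const pairs = toPairs<string, { ef: string }>([
  'cat_riding_unicorn.jpg',
  { ef: 'stream-1' },
  'us_constitution.pdf',
  { ef: 'stream-2' },
]);
```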

  /**
   * Get all attachments that are embedded in this document.
   *
   * > **NOTE:** If you build a document with this library, this won't return
   * > anything until you call [[save]] on the document.
   *
   * @returns Array of attachments with name and data
   */
  getAttachments() {
    const rawAttachments = this.getRawAttachments();
    return rawAttachments.map(({ fileName, fileSpec }) => {
      const stream = fileSpec
        .lookup(PDFName.of('EF'), PDFDict)
        .lookup(PDFName.of('F'), PDFStream) as PDFRawStream;
      return {
        name: fileName.decodeText(),
        data: decodePDFRawStream(stream).decode(),
      };
    });
  }
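
The NOTE in the doc comment follows from where the two methods look. `attach` queues the file (the `this.embeddedFiles.push(embeddedFile)` line in the context above), while `getAttachments` reads only from the catalog. The toy model below illustrates that ordering under the assumption that queued files reach the catalog during `save`. `ToyDocument` and `ToyAttachment` are hypothetical names and are not part of pdf-lib.

```typescript
interface ToyAttachment {
  name: string;
  data: Uint8Array;
}

// Toy model only, not pdf-lib's implementation. `pending` plays the role of
// the queued embedded files, `catalog` the role of the document's name tree.
class ToyDocument {
  private pending: ToyAttachment[] = [];
  private catalog: ToyAttachment[] = [];

  // Queue the file; nothing is written to the catalog yet.
  attach(data: Uint8Array, name: string): void {
    this.pending.push({ name, data });
  }

  // Flush queued files into the catalog (assumed to happen on save).
  save(): void {
    this.catalog.push(...this.pending);
    this.pending = [];
  }

  // Read only from the catalog, so queued files are invisible until save().
  getAttachments(): ToyAttachment[] {
    return this.catalog.map(({ name, data }) => ({ name, data }));
  }
}

const doc = new ToyDocument();
doc.attach(new Uint8Array([1, 2, 3]), 'cars.csv');
const beforeSave = doc.getAttachments();
doc.save();
const afterSave = doc.getAttachments();
```

In this model `beforeSave` is empty and `afterSave` holds the one queued file, which matches the behavior the round-trip test asserts before and after `save`.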

  /**
   * Embed a font into this document. The input data can be provided in multiple
   * formats:
142 changes: 142 additions & 0 deletions tests/api/PDFDocument.spec.ts
@@ -37,6 +37,9 @@ const normalPdfBytes = fs.readFileSync('assets/pdfs/normal.pdf');
const withViewerPrefsPdfBytes = fs.readFileSync(
  'assets/pdfs/with_viewer_prefs.pdf',
);
const hasAttachmentPdfBytes = fs.readFileSync(
  'assets/pdfs/examples/add_attachments.pdf',
);

describe(`PDFDocument`, () => {
  describe(`load() method`, () => {
@@ -573,4 +576,143 @@ describe(`PDFDocument`, () => {
      expect(pdfDoc.defaultWordBreaks).toEqual(srcDoc.defaultWordBreaks);
    });
  });

  describe(`attach() method`, () => {
    it(`Saves to the same value after attaching a file`, async () => {
      const pdfDoc1 = await PDFDocument.create({ updateMetadata: false });
      const pdfDoc2 = await PDFDocument.create({ updateMetadata: false });

      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );

      await pdfDoc1.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc1.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      await pdfDoc2.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc2.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      const savedDoc1 = await pdfDoc1.save();
      const savedDoc2 = await pdfDoc2.save();

      expect(savedDoc1).toEqual(savedDoc2);
    });
  });

  describe(`getAttachments() method`, () => {
    it(`Can read attachments from an existing pdf file`, async () => {
      const pdfDoc = await PDFDocument.load(hasAttachmentPdfBytes);
      const attachments = pdfDoc.getAttachments();
      expect(attachments.length).toEqual(2);
      const jpgAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'cat_riding_unicorn.jpg',
      )!;
      const pdfAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'us_constitution.pdf',
      )!;
      expect(pdfAttachmentExtractedBytes).toBeDefined();
      expect(jpgAttachmentExtractedBytes).toBeDefined();
      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );
      expect(jpgAttachmentBytes).toEqual(
        Buffer.from(jpgAttachmentExtractedBytes.data),
      );
      expect(pdfAttachmentBytes).toEqual(
        Buffer.from(pdfAttachmentExtractedBytes.data),
      );
    });

    it(`Saves to the same value after round tripping`, async () => {
      const pdfDoc1 = await PDFDocument.create({ updateMetadata: false });
      const pdfDoc2 = await PDFDocument.create({ updateMetadata: false });

      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );

      await pdfDoc1.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc1.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      // This is the currently documented behavior before save has been called
      const noAttachments = pdfDoc1.getAttachments();
      expect(noAttachments).toEqual([]);

      const savedDoc1 = await pdfDoc1.save();
      const attachments = pdfDoc1.getAttachments();
      const jpgAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'cat_riding_unicorn.jpg',
      )!;
      const pdfAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'us_constitution.pdf',
      )!;

      await pdfDoc2.attach(
        jpgAttachmentExtractedBytes.data,
        'cat_riding_unicorn.jpg',
        {
          mimeType: 'image/jpeg',
          description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
          creationDate: new Date('2019/12/01'),
          modificationDate: new Date('2020/04/19'),
        },
      );

      await pdfDoc2.attach(
        pdfAttachmentExtractedBytes.data,
        'us_constitution.pdf',
        {
          mimeType: 'application/pdf',
          description: 'Constitution of the United States 🇺🇸🦅',
          creationDate: new Date('1787/09/17'),
          modificationDate: new Date('1992/05/07'),
        },
      );

      const savedDoc2 = await pdfDoc2.save();
      expect(savedDoc1).toEqual(savedDoc2);
    });
  });
});
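
The tests above wrap the extracted `data` in `Buffer.from(...)` so that it can be compared with the `Buffer` returned by `fs.readFileSync`. For callers outside Node, where `Buffer` may not exist, a byte-wise comparison is one alternative. The sketch below uses a hypothetical `bytesEqual` helper and assumes the extracted data is a `Uint8Array`.

```typescript
// Compare two byte arrays element by element. Hypothetical helper, not part
// of this PR or of pdf-lib.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Stand-in values for original file bytes and extracted attachment bytes.
const originalBytes = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
const extractedBytes = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
const sameContent = bytesEqual(originalBytes, extractedBytes);
```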