[Feature Request]: Embed files #229

Closed
zimt28 opened this issue Nov 1, 2019 · 5 comments

@zimt28

zimt28 commented Nov 1, 2019

I want to generate ZUGFeRD invoices. ZUGFeRD is a German standard for XML files with a certain structure, embedded in a PDF/A-3 file.

I'll create feature requests for both requirements, but here I'm asking for attachment embedding support. I couldn't find this in any other JS PDF library and think it's something quite a few people might want.

@Hopding
Owner

Hopding commented Dec 22, 2019

Hello @zimt28! This certainly seems like an interesting and useful feature. It shouldn't be terribly difficult to implement either. The PDF specification documents file attachments in sections 7.11.4 (Embedded File Streams) and 12.5.6.15 (File Attachment Annotations).

@Hopding
Owner

Hopding commented Dec 22, 2019

To be clear, creating a file attachment is already possible to do with pdf-lib. It just entails creating particular dictionaries and streams, and wiring them up in the right way. pdf-lib already provides access to the low-level primitives necessary to do this. So this feature would just involve creating a nice API to abstract away the nitty gritty of what's involved here.

@Hopding Hopding changed the title from "[Feature Request] Embed files" to "[Feature Request]: Embed files" on Jan 1, 2020
@sebastinez
Contributor

Hello @Hopding, I would like to contribute to this library with this feature.

I've read the sections you mentioned in the PDF specification, and tried some different things, but I'm kind of lost with all the dictionaries and streams of the library.

It seems that an embedded file consists of two objects:

A file specification dictionary (the descriptor), which defines the EF entry:
<</Desc()/EF<</F 15 0 R>>/F(ExampleFileName.docx)/Type/Filespec/UF(ExampleFileName.docx)>>

And a FlateDecode stream, which seems to hold the file contents:
<</DL 4664/Filter/FlateDecode/Length 4274/Params<</CheckSum<1411DA765B6BE9B8DF0EEA5B8C4B02BD>/CreationDate(D:20200521183112-03'00')/ModDate(D:20200406151931-03'00')/Size 4664>>/Subtype/application#2Fvnd.openxmlformats-officedocument.wordprocessingml.document>>stream
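The `/Subtype` value in the stream dictionary above is the file's MIME type written as a PDF name, which is why the `/` appears as `#2F`. A minimal sketch of that escaping follows, assuming ASCII-only input; `encodePdfName` is a hypothetical helper written for illustration and is not part of pdf-lib.

```typescript
// Printable ASCII characters that still need escaping inside a PDF name:
// the PDF delimiter characters plus '#' itself.
const SPECIAL_CHARS = '()<>[]{}/%#';

// Hypothetical helper: writes a string as a PDF name object, replacing any
// character that is not a regular printable character with '#' + two hex digits.
function encodePdfName(raw: string): string {
  let out = '/';
  for (let i = 0; i < raw.length; i++) {
    const code = raw.charCodeAt(i);
    if (code > 0x7e) {
      // Assumption: this sketch only deals with ASCII input such as MIME types.
      throw new Error('characters above 0x7E are not handled in this sketch');
    }
    const ch = raw.charAt(i);
    const isRegular = code >= 0x21 && SPECIAL_CHARS.indexOf(ch) === -1;
    if (isRegular) {
      out += ch;
    } else {
      out += '#' + (code < 16 ? '0' : '') + code.toString(16).toUpperCase();
    }
  }
  return out;
}

const subtype = encodePdfName(
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
);
console.log(subtype);
```

For the MIME type shown in the stream dictionary, this yields the same `application#2Fvnd...` form that appears after `/Subtype`.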

Could you help me out a bit? A hint about which low-level primitives I should look at, and eventually how to wire them up, would be great.
I'm learning the PDF specification as I go, but I'm really new to this topic.

Thank you very much!

@Hopding
Owner

Hopding commented May 22, 2020

Hello @sebastinez!

Adding file embedding functionality to pdf-lib would be fantastic. I look forward to a PR!

Here's a basic working example that creates a new document and attaches an image file to it. Let me know if you have any questions.

import {
  PDFDocument,
  PDFName,
  PDFDict,
  PDFArray,
  PDFString,
  PDFHexString,
  PDFRef,
} from 'pdf-lib';

const createEmbeddedFile = (
  pdfDoc: PDFDocument,
  file: Uint8Array,
  fileName: string,
  mimeType: string,
) => {
  // Compress the file bytes into a FlateDecode stream marked as an embedded file.
  const embeddedFileStream = pdfDoc.context.flateStream(file, {
    Type: 'EmbeddedFile',
    Subtype: PDFName.of(mimeType),
  });
  const embeddedFileStreamRef = pdfDoc.context.register(embeddedFileStream);

  // File specification dictionary; its EF entry points at the stream above.
  const fileSpecDict = pdfDoc.context.obj({
    Type: 'Filespec',
    F: PDFString.of(fileName),
    UF: PDFHexString.fromText(fileName),
    EF: { F: embeddedFileStreamRef },
  });
  const fileSpecDictRef = pdfDoc.context.register(fileSpecDict);

  return fileSpecDictRef;
};

const insertEmbeddedFile = (
  pdfDoc: PDFDocument,
  fileName: string,
  fileSpecRef: PDFRef,
) => {
  // Walk catalog -> Names -> EmbeddedFiles -> Names, creating each level if missing.
  if (!pdfDoc.catalog.has(PDFName.of('Names'))) {
    pdfDoc.catalog.set(PDFName.of('Names'), pdfDoc.context.obj({}));
  }
  const Names = pdfDoc.catalog.lookup(PDFName.of('Names'), PDFDict);

  if (!Names.has(PDFName.of('EmbeddedFiles'))) {
    Names.set(PDFName.of('EmbeddedFiles'), pdfDoc.context.obj({}));
  }
  const EmbeddedFiles = Names.lookup(PDFName.of('EmbeddedFiles'), PDFDict);

  if (!EmbeddedFiles.has(PDFName.of('Names'))) {
    EmbeddedFiles.set(PDFName.of('Names'), pdfDoc.context.obj([]));
  }
  const EFNames = EmbeddedFiles.lookup(PDFName.of('Names'), PDFArray);

  // The array holds alternating entries: a name key, then its file specification.
  EFNames.push(PDFHexString.fromText(fileName));
  EFNames.push(fileSpecRef);
};

(async () => {
  const pdfDoc = await PDFDocument.create();

  const page = pdfDoc.addPage();
  page.drawText('This is a document with an attachment', { x: 100, y: 700 });

  const jpgUrl = 'https://pdf-lib.js.org/assets/cat_riding_unicorn.jpg';
  const jpgBuffer = await fetch(jpgUrl).then((res) => res.arrayBuffer());

  const file = new Uint8Array(jpgBuffer);
  const fileName = 'cat_riding_unicorn.jpg';
  const mimeType = 'image/jpeg';

  const fileSpecRef = createEmbeddedFile(pdfDoc, file, fileName, mimeType);
  insertEmbeddedFile(pdfDoc, fileName, fileSpecRef);

  const pdfBytes = await pdfDoc.save();
})();

@sebastinez sebastinez mentioned this issue May 22, 2020
@Hopding
Owner

Hopding commented May 30, 2020

Version 1.7.0 is now published with this feature (thanks @sebastinez!). The full release notes are available here.

You can install this new version with npm:

npm install pdf-lib@1.7.0
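
With that release installed, the manual dictionary wiring shown earlier in this thread should no longer be necessary. The sketch below assumes the `attach` method on `PDFDocument` that the release introduced, along with its `mimeType` and `description` options. The XML payload, the `invoice.xml` file name, and the `buildPdfWithAttachment` wrapper are hypothetical and used only for illustration.

```typescript
import { PDFDocument } from 'pdf-lib';

// Hypothetical payload, used only for illustration.
const xml = '<?xml version="1.0"?><invoice><total>42.00</total></invoice>';

async function buildPdfWithAttachment(): Promise<Uint8Array> {
  const pdfDoc = await PDFDocument.create();

  const page = pdfDoc.addPage();
  page.drawText('This is a document with an attachment', { x: 100, y: 700 });

  // Convert the ASCII-only sample string to bytes.
  const bytes = new Uint8Array(xml.length);
  for (let i = 0; i < xml.length; i++) {
    bytes[i] = xml.charCodeAt(i);
  }

  // Attach the bytes under the given file name.
  await pdfDoc.attach(bytes, 'invoice.xml', {
    mimeType: 'application/xml',
    description: 'Sample XML attachment',
  });

  return pdfDoc.save();
}
```

Note that this covers only the attachment half of the original request. Producing a PDF/A-3 file is a separate requirement that attaching a file does not address.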

It's also available on unpkg and jsDelivr.

@Hopding Hopding closed this as completed May 30, 2020