

Dragon edited this page Jun 3, 2026 · 1 revision

Internals: Signatures and PDF/A

This page covers the internal implementation of long-term validation (DSS / LTV), strict ETSI.CAdES signatures, and PDF/A archival conformance in phppdf.

Digital signature internals

Long-term validation (DSS / LTV)

Document::enableLtv() makes the document's signatures long-term validatable by embedding their validation material in a Document Security Store and covering it with a document timestamp. It is written as incremental revisions appended after every signature: first a DSS revision, then (when a Tsa is given) a /DocTimeStamp revision whose ByteRange covers the DSS.

  • /DSS is an indirect dictionary referenced from the catalog, carrying /Certs, /CRLs and /OCSPs arrays of raw-DER stream objects (one stream per certificate, CRL and OCSP response; empty arrays are omitted). Only the global DSS is written. There is no per-signature /VRI sub-dictionary, because its key is the SHA-1 of a signature's /Contents, which is not known until after signing; a global DSS is sufficient for validators and is what modern signers emit.
  • CRL or OCSP revocation. Material is collected through an injectable ValidationDataSource (the same seam shape as TsaClient). HttpCrlValidationDataSource reads each certificate's CRL distribution point and fetches the CRL. HttpOcspValidationDataSource reads each certificate's AIA responder and fetches the OCSP response; the OCSP request is a single SHA-1 CertID with no nonce, built by OcspRequestBuilder over an injectable OcspClient seam. StaticValidationDataSource supplies material that the caller obtained itself, which is also how the test suite runs offline.
  • Subfilter independent. This is Adobe-style LTV: it is the presence of validation material in the DSS plus a covering document timestamp that makes the file long-term validatable, not the CMS subfilter. It works over both the default adbe.pkcs7.detached and the strict ETSI.CAdES.detached signatures (see below).
  • Archival (B-LTA). enableLtv($source, $timestamp, $timestampCertificateChains) also collects, through the same ValidationDataSource, the chain + revocation of the certificate that signs the covering /DocTimeStamp, and merges it into the DSS. The archive timestamp then protects validation material that includes its own TSA certificate's revocation, so the whole construct - signature and timestamp - validates offline from the embedded DSS (PAdES-B-LTA). Renewal (stacking further archive timestamps years later, each preceded by a DSS update for the prior timestamp's TSA) is out of scope.
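
As a rough illustration of the /DSS shape described in the list above, the sketch below renders a DSS dictionary from lists of object numbers and omits any empty array. The function renderDssDictionary() and its parameters are hypothetical and are not taken from phppdf; generation numbers are assumed to be 0.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not phppdf's API): renders a /DSS dictionary.
 * Each list holds object numbers of raw-DER stream objects.
 *
 * @param list<int> $certs
 * @param list<int> $crls
 * @param list<int> $ocsps
 */
function renderDssDictionary(array $certs, array $crls, array $ocsps): string
{
    $entries = [];
    foreach (['Certs' => $certs, 'CRLs' => $crls, 'OCSPs' => $ocsps] as $key => $objectNumbers) {
        if ($objectNumbers === []) {
            continue; // empty arrays are omitted, as described above
        }
        // Assumption: every stream object has generation number 0.
        $refs = array_map(static fn (int $n): string => $n . ' 0 R', $objectNumbers);
        $entries[] = '/' . $key . ' [' . implode(' ', $refs) . ']';
    }

    return '<< ' . implode(' ', $entries) . ' >>';
}

// Two certificates, no CRLs, one OCSP response.
echo renderDssDictionary([12, 13], [], [14]), "\n";
// prints: << /Certs [12 0 R 13 0 R] /OCSPs [14 0 R] >>
```

Because no /VRI entry is involved, nothing in this dictionary depends on the bytes of any signature's /Contents.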

Strict ETSI.CAdES signatures

sign(..., format: SignatureFormat::EtsiCadesDetached) (and the same on addSignature()) emits /SubFilter /ETSI.CAdES.detached with a CMS SignedData built by hand, because PHP's openssl_cms_sign cannot inject signed attributes. CadesSigner assembles the DER with the Der toolkit.

  • Signed attributes. CmsSignedAttributes builds the three CAdES attributes: contentType (id-data), messageDigest (SHA-256 of the ByteRange content), and signingCertificateV2 (ESS, RFC 5035) - an ESSCertIDv2 whose certHash is sha256(signerCertDer) plus an IssuerSerial, binding the signature to the exact signer certificate. The SHA-256 hashAlgorithm is the ESSCertIDv2 DEFAULT and is omitted.
  • Sign vs embed (RFC 5652 5.4). The attributes are signed under an EXPLICIT SET OF tag (0x31) but embedded in the SignerInfo under the [0] IMPLICIT tag (0xA0), over the same content; the SET OF elements are DER-sorted ascending bytewise.
  • SignerInfo is v1 with issuerAndSerialNumber, digestAlgorithm SHA-256, signatureAlgorithm rsaEncryption; the signature is RSA-SHA256 over the SET OF form (openssl_sign). RSA keys only.
  • Composition. The existing SignatureTimestamper works unchanged on the hand-built CMS and adds the RFC 3161 token as an unsigned attribute, so CAdES + a Tsa gives PAdES-B-T; enableLtv() on top is the strict path toward B-LT.
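
The sign-versus-embed rule from the list above can be sketched in a few lines. This is a minimal illustration, not the CadesSigner code: derLength() and encodeSignedAttributes() are hypothetical names, and the two input attributes are placeholder DER SEQUENCEs rather than real CAdES attributes.

```php
<?php
declare(strict_types=1);

/** DER definite-length octets for a content length (short or long form). */
function derLength(int $length): string
{
    if ($length < 0x80) {
        return chr($length);
    }
    $bytes = '';
    while ($length > 0) {
        $bytes = chr($length & 0xFF) . $bytes;
        $length >>= 8;
    }

    return chr(0x80 | strlen($bytes)) . $bytes;
}

/**
 * Hypothetical helper: takes already DER-encoded attributes and returns both
 * encodings of the same content, one to sign and one to embed.
 *
 * @param list<string> $attributesDer
 * @return array{toSign: string, toEmbed: string}
 */
function encodeSignedAttributes(array $attributesDer): array
{
    // DER SET OF: elements sorted ascending, compared bytewise.
    usort($attributesDer, static fn (string $a, string $b): int => strcmp($a, $b));
    $body = implode('', $attributesDer);
    $tail = derLength(strlen($body)) . $body;

    return [
        'toSign'  => "\x31" . $tail, // EXPLICIT SET OF tag, input to the signature
        'toEmbed' => "\xA0" . $tail, // [0] IMPLICIT tag, stored in the SignerInfo
    ];
}

// Placeholder attributes, deliberately passed in the wrong order.
$encoded = encodeSignedAttributes(["\x30\x03\x02\x01\x02", "\x30\x03\x02\x01\x01"]);
echo bin2hex($encoded['toSign']), "\n";  // 310a30030201013003020102
echo bin2hex($encoded['toEmbed']), "\n"; // a00a30030201013003020102
```

Only the first byte differs between the two forms. In a real signer the toSign bytes would then be passed to openssl_sign() with OPENSSL_ALGO_SHA256, while the toEmbed bytes go into the SignerInfo.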

PDF/A archival conformance

Document::enablePdfA(PdfALevel $level) makes the output comply with ISO 19005 (PDF/A-2 or PDF/A-3). Four levels are available:

  • PdfALevel::A2B - PDF/A-2b (basic): correct visual reproduction.
  • PdfALevel::A2U - PDF/A-2u (unicode): A2b plus ToUnicode maps on every font (already satisfied by custom embedded fonts; standard fonts are prohibited anyway).
  • PdfALevel::A3B / A3U - PDF/A-3 (ISO 19005-3): the A-2 levels plus support for embedded associated files.

Use Document::attachFile(string $bytes, string $name, AFRelationship $relationship = AFRelationship::Data, string $mime = 'application/octet-stream', ?string $description = null, ?DateTimeImmutable $modDate = null) to attach a file; the most common use case is a Factur-X or ZUGFeRD e-invoice XML embedded alongside the human-readable PDF. Attachments are rejected at PDF/A-2 (the guard throws); use A3B or A3U.

Namespace: src/PdfA/

  • PdfALevel - pure (non-backed) enum with one case per level (A2B, A2U, A3B, A3U). Helper methods part() (2 or 3), conformance() (B or U), allowsEmbeddedFiles(), and requiresUnicode() derive everything the emitters need from the case.
  • OutputIntent - builds the intent object and its profile object as a pair. The /OutputIntent dictionary carries /Type /OutputIntent, /S /GTS_PDFA1, /OutputConditionIdentifier (sRGB IEC61966-2.1), /Info (the same condition string), and /DestOutputProfile pointing at the IccProfileStream. The ICC bytes are FlateDecode-compressed (gzcompress, level 9) here before being handed to the stream.
  • IccProfileStream - wraps the bundled resources/icc/sRGB.icc (a 588-byte littleCMS sRGB profile) in a FlateDecode stream object with /N 3 (three color components). The same profile object is used for every PDF/A document produced in a session.
  • PdfAConformanceGuard - called by output() before serialization; throws PdfException when any of four prohibited conditions holds:
    • a non-embedded standard font (Helvetica, Times, Courier, etc.) is in use - every font must be embedded via registerFontFamily();
    • encryption is configured;
    • document JavaScript (addDocumentScript) is present;
    • appended revisions (addSignature / addDocumentTimestamp / enableLtv) are present.
    An additional guard rejects attachFile() calls when the PDF/A part is 2 (attachments are only permitted at part 3).
  • AFRelationship - pure (non-backed) enum for the /AFRelationship key on a file specification: Source, Data, Alternative, Supplement, Unspecified (Data is the Factur-X default). pdfName() maps each case to its PDF name.
  • AttachedFile - final readonly value object holding the filename, raw byte string, AFRelationship, MIME type string, an optional description, and the modification date (DateTimeImmutable, passed in so serialization is deterministic). It throws on an empty name or empty MIME type.
  • EmbeddedFileStream - the stream-object wrapper for one embedded file: given the prepared dictionary and the raw bytes, it appends /Length and the uncompressed body. The dictionary it receives is an /EmbeddedFile stream with /Subtype set to the MIME type as a PDF name (the raw / is encoded to #2F by Name::of) and /Params carrying /Size, /ModDate, and a /CheckSum (MD5 hex).
  • EmbeddedFileEmitter - builds the two indirect objects per attachment (the /Filespec dictionary then the EmbeddedFileStream), numbered sequentially from a given first object number, and returns both the objects and the filespec references. The filespec carries /Type /Filespec, /F, /UF, /AFRelationship, /EF, and /Desc (omitted when null). outputWithMetadata() calls it after outlines and before the output intent, then injects the filespec references as the catalog /AF array and as /EmbeddedFiles name tree entries (keyed by filename). That name tree is merged with the existing /JavaScript name tree via the shared withNames() helper, so there is always a single /Names dictionary.
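
To make the /EmbeddedFile dictionary described above more concrete, here is a minimal sketch of how the MIME type could become a PDF name and how /Size and /CheckSum could be derived from the raw bytes. Both functions are hypothetical stand-ins (pdfName() is not phppdf's Name::of), /ModDate is left out for brevity, and writing the checksum as a PDF hex string is an assumption.

```php
<?php
declare(strict_types=1);

/** Hypothetical PDF name encoder: escapes delimiters and non-printables as #XX. */
function pdfName(string $raw): string
{
    $out = '/';
    foreach (str_split($raw) as $char) {
        $code = ord($char);
        $isRegular = $code >= 0x21 && $code <= 0x7E
            && strpbrk($char, '#/()<>[]{}%') === false;
        $out .= $isRegular ? $char : sprintf('#%02X', $code);
    }

    return $out;
}

/** Hypothetical builder for the stream dictionary of one embedded file. */
function embeddedFileDictionary(string $bytes, string $mime): string
{
    return sprintf(
        '<< /Type /EmbeddedFile /Subtype %s /Params << /Size %d /CheckSum <%s> >> >>',
        pdfName($mime),  // "application/xml" becomes /application#2Fxml
        strlen($bytes),  // size of the uncompressed body in bytes
        md5($bytes)      // MD5 of the raw bytes, as 32 hex digits
    );
}

echo embeddedFileDictionary('<invoice/>', 'application/xml'), "\n";
```

The slash in the MIME type has to be escaped because a raw / would otherwise start a new name token.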

How enablePdfA() wires into serialization

Calling enablePdfA() does three things that all take effect at output() time:

  1. Forces the metadata output path - the XMP packet, the /Info dictionary, and the document /ID pair are always written (normally /ID and full XMP are optional).
  2. Injects /OutputIntents - an [OutputIntent] array is added to the catalog referencing the sRGB ICC profile stream.
  3. Passes the level to XmpWriter - the XMP serializer prepends a pdfaid RDF description block (xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/") carrying <pdfaid:part> (2 or 3, from PdfALevel::part()) and <pdfaid:conformance> (B or U). The rest of the XMP packet (dc:, xmp:, pdf: namespaces) is emitted unchanged, so the null (non-PDF/A) path stays byte-identical.
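
The third step can be sketched as follows. This is an illustration under assumptions, not the XmpWriter source: the enum is a cut-down re-creation of PdfALevel with only part() and conformance(), pdfaidDescription() is a hypothetical function name, and the exact whitespace and attribute layout of the real packet may differ.

```php
<?php
declare(strict_types=1);

// Cut-down sketch of a pure enum with the two helpers the XMP block needs.
enum PdfALevel
{
    case A2B;
    case A2U;
    case A3B;
    case A3U;

    public function part(): int
    {
        return match ($this) {
            self::A2B, self::A2U => 2,
            self::A3B, self::A3U => 3,
        };
    }

    public function conformance(): string
    {
        return match ($this) {
            self::A2B, self::A3B => 'B',
            self::A2U, self::A3U => 'U',
        };
    }
}

/** Hypothetical helper: builds the pdfaid RDF description for a level. */
function pdfaidDescription(PdfALevel $level): string
{
    return sprintf(
        '<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
        . '<pdfaid:part>%d</pdfaid:part>'
        . '<pdfaid:conformance>%s</pdfaid:conformance>'
        . '</rdf:Description>',
        $level->part(),
        $level->conformance()
    );
}

echo pdfaidDescription(PdfALevel::A3B), "\n";
```

Deriving both values from the enum case keeps the XMP identification consistent with the level passed to enablePdfA().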

The PDF header is already %PDF-1.7, which satisfies the PDF/A-2 and PDF/A-3 version requirement.

Why the u variant is essentially free

PDF/A-2u requires a valid ToUnicode CMap on every font. Custom embedded fonts (the only fonts allowed in a PDF/A document) already carry a /ToUnicode stream built by FontEngine during subsetting - the same stream that makes copy-paste work correctly. No extra work is needed to satisfy the Unicode conformance level.

Validation oracle

The e2e golden test tests/Golden/PdfA2ConformanceTest.php is a single data-provided test covering three flavours (2b, 2u, 3b). Each case rebuilds the matching byte-identity fixture document (PdfA2bTest, PdfA2uTest, PdfA3bTest expose a static buildDocument()), pipes the output to veraPDF --flavour <flavour>, and asserts isCompliant="true" in the veraPDF XML report. The 3b case attaches an XML file via attachFile(), exercising the embedded-file path. The test auto-skips when either the bundled JRE or the veraPDF CLI jar is absent. The fact that attachFile() throws at PDF/A-2 is covered separately by the unit tests.
