Internals: Signatures and PDF/A
This page covers the internal implementation of long-term validation (DSS / LTV), strict ETSI.CAdES signatures, and PDF/A archival conformance in phppdf.
Document::enableLtv() makes the document's signatures long-term validatable by embedding their validation material in a Document Security Store (DSS) and covering it with a document timestamp. It is written as incremental revisions appended after all existing signatures: first a DSS revision, then (when a Tsa is given) a /DocTimeStamp revision whose ByteRange covers the DSS.
- /DSS is an indirect dictionary referenced from the catalog, carrying /Certs, /CRLs and /OCSPs arrays of raw-DER stream objects (one stream per certificate, CRL and OCSP response; empty arrays are omitted). Only the global DSS is written, with no per-signature /VRI sub-dictionary: a /VRI key is the SHA-1 hash of a signature's /Contents, which is not known until after signing. A global DSS is sufficient for validators and is what modern signers emit.
- CRL or OCSP revocation. Material is collected through an injectable ValidationDataSource (the same seam shape as TsaClient). HttpCrlValidationDataSource reads each certificate's CRL distribution point and fetches the CRL. HttpOcspValidationDataSource reads each certificate's AIA (Authority Information Access) OCSP responder and fetches the OCSP response; the OCSP request carries a single SHA-1 CertID and no nonce, and is built by OcspRequestBuilder over an injectable OcspClient seam. StaticValidationDataSource supplies material the caller obtained itself, which is how the test suite runs offline.
- Subfilter independent. This is Adobe-style LTV: what makes the file long-term validatable is the validation material in the DSS plus a covering document timestamp, not the CMS subfilter. It works over both the default adbe.pkcs7.detached and the strict ETSI.CAdES.detached signatures (see below).
- Archival (B-LTA). enableLtv($source, $timestamp, $timestampCertificateChains) also collects, through the same ValidationDataSource, the chain and revocation data of the certificate that signs the covering /DocTimeStamp, and merges them into the DSS. The archive timestamp then protects validation material that includes its own TSA certificate's revocation data, so the whole construct (signature and timestamp) validates offline from the embedded DSS (PAdES-B-LTA). Renewal, i.e. stacking further archive timestamps years later, each preceded by a DSS update for the prior timestamp's TSA, is out of scope.
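To make the DSS layout above concrete, here is a minimal PHP sketch that serializes the DSS objects as plain strings. buildDss() and its return shape are hypothetical (not phppdf's API), and the incremental revision's catalog update, xref section and trailer are deliberately left out.

```php
<?php
// Hypothetical sketch, not phppdf's implementation: serialize the objects of a
// global /DSS (one raw-DER stream per certificate, CRL and OCSP response).
// The catalog update, xref section and trailer of the revision are not shown.

/**
 * @param list<string> $certs DER certificates
 * @param list<string> $crls  DER CRLs
 * @param list<string> $ocsps DER OCSP responses
 * @return array{objects: array<int,string>, dssRef: string}
 */
function buildDss(array $certs, array $crls, array $ocsps, int $firstObj): array
{
    $objects = [];      // object number => serialized indirect object
    $next = $firstObj;  // next free object number

    // Write one stream object per DER blob and return their references.
    $streamRefs = function (array $blobs) use (&$objects, &$next): array {
        $refs = [];
        foreach ($blobs as $der) {
            $num = $next++;
            $objects[$num] = sprintf(
                "%d 0 obj\n<< /Length %d >>\nstream\n%s\nendstream\nendobj\n",
                $num,
                strlen($der),
                $der
            );
            $refs[] = "$num 0 R";
        }
        return $refs;
    };

    $entries = [];
    foreach (['Certs' => $certs, 'CRLs' => $crls, 'OCSPs' => $ocsps] as $key => $blobs) {
        if ($blobs !== []) { // empty arrays are omitted entirely
            $entries[] = "/$key [" . implode(' ', $streamRefs($blobs)) . ']';
        }
    }

    $dssNum = $next; // the DSS dictionary itself follows its streams
    $objects[$dssNum] = "$dssNum 0 obj\n<< " . implode(' ', $entries) . " >>\nendobj\n";

    return ['objects' => $objects, 'dssRef' => "$dssNum 0 R"];
}
```

With one certificate, no CRL and one OCSP response starting at object 10, the sketch writes streams 10 and 11 and the dictionary << /Certs [10 0 R] /OCSPs [11 0 R] >> as object 12.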
sign(..., format: SignatureFormat::EtsiCadesDetached) (and likewise addSignature()) emits /SubFilter /ETSI.CAdES.detached with a CMS SignedData built by hand, because PHP's openssl_cms_sign cannot inject custom signed attributes such as signingCertificateV2. CadesSigner assembles the DER with the Der toolkit.
- Signed attributes. CmsSignedAttributes builds the three CAdES attributes: contentType (id-data), messageDigest (SHA-256 of the ByteRange content), and signingCertificateV2 (ESS, RFC 5035). The latter is an ESSCertIDv2 whose certHash is sha256(signerCertDer), plus an IssuerSerial, binding the signature to the exact signer certificate. SHA-256 is the ESSCertIDv2 DEFAULT hashAlgorithm, so that field is omitted.
- Sign vs embed (RFC 5652, section 5.4). The attributes are signed under an EXPLICIT SET OF tag (0x31) but embedded in the SignerInfo under the [0] IMPLICIT tag (0xA0), with identical content bytes; the SET OF elements are DER-sorted in ascending bytewise order.
- SignerInfo is v1 with issuerAndSerialNumber, digestAlgorithm SHA-256 and signatureAlgorithm rsaEncryption; the signature is RSA-SHA256 over the SET OF form (openssl_sign). Only RSA keys are supported.
- Composition. The existing SignatureTimestamper works unchanged on the hand-built CMS, adding the RFC 3161 token as an unsigned attribute, so CAdES plus a Tsa gives PAdES-B-T; enableLtv() on top is the strict path toward B-LT.
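The attribute encoding and the 0x31/0xA0 tag swap can be illustrated with a few hand-rolled DER helpers. This is a sketch under stated assumptions: the helper names are hypothetical rather than phppdf's Der toolkit or CmsSignedAttributes, only single-byte tags are handled, and signingCertificateV2 leaves out the IssuerSerial that the real attribute carries.

```php
<?php
// Hypothetical DER helpers, not phppdf's Der toolkit: they show how signed
// attributes are encoded, sorted as a DER SET OF, and re-tagged.

/** DER definite length: short form below 128, long form otherwise. */
function derLength(int $len): string
{
    if ($len < 0x80) {
        return chr($len);
    }
    $bytes = '';
    while ($len > 0) {
        $bytes = chr($len & 0xFF) . $bytes;
        $len >>= 8;
    }
    return chr(0x80 | strlen($bytes)) . $bytes;
}

/** Tag-length-value with a single-byte tag. */
function tlv(int $tag, string $value): string
{
    return chr($tag) . derLength(strlen($value)) . $value;
}

/** OBJECT IDENTIFIER from dotted form (first two arcs assumed to fit one byte). */
function oid(string $dotted): string
{
    $arcs = array_map('intval', explode('.', $dotted));
    $body = chr(40 * $arcs[0] + $arcs[1]);
    foreach (array_slice($arcs, 2) as $arc) {
        $chunk = chr($arc & 0x7F); // last base-128 digit, high bit clear
        $arc >>= 7;
        while ($arc > 0) {
            $chunk = chr(0x80 | ($arc & 0x7F)) . $chunk; // earlier digits, high bit set
            $arc >>= 7;
        }
        $body .= $chunk;
    }
    return tlv(0x06, $body);
}

/** Attribute ::= SEQUENCE { attrType OID, attrValues SET OF value } */
function attribute(string $type, string $valueDer): string
{
    return tlv(0x30, oid($type) . tlv(0x31, $valueDer));
}

/** signingCertificateV2 value; IssuerSerial is left out here for brevity. */
function signingCertificateV2(string $certDer): string
{
    // ESSCertIDv2 { certHash }: hashAlgorithm omitted because SHA-256 is the DEFAULT.
    $essCertId = tlv(0x30, tlv(0x04, hash('sha256', $certDer, true)));
    // SigningCertificateV2 { certs SEQUENCE OF ESSCertIDv2 }
    return tlv(0x30, tlv(0x30, $essCertId));
}

/** Sort as a DER SET OF, then emit the to-be-signed and the embedded forms. */
function signedAttributes(array $attributes): array
{
    sort($attributes, SORT_STRING); // binary-safe ascending bytewise order
    $content = implode('', $attributes);
    return [
        'toSign'   => tlv(0x31, $content), // EXPLICIT SET OF: input to openssl_sign()
        'embedded' => tlv(0xA0, $content), // [0] IMPLICIT: stored in the SignerInfo
    ];
}
```

Because both forms share the same content bytes, a signer can sign toSign and then store embedded in the SignerInfo without re-encoding anything; in this sketch the contentType attribute (OID 1.2.840.113549.1.9.3, value id-data 1.2.840.113549.1.7.1) sorts ahead of the longer messageDigest attribute (1.2.840.113549.1.9.4).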
Document::enablePdfA(PdfALevel $level) makes the output comply with ISO 19005 (PDF/A-2 or PDF/A-3). Four levels are available:
- PdfALevel::A2B - PDF/A-2b (basic): correct visual reproduction.
- PdfALevel::A2U - PDF/A-2u (Unicode): A-2b plus ToUnicode maps on every font (already satisfied by custom embedded fonts; standard fonts are prohibited anyway).
- PdfALevel::A3B / PdfALevel::A3U - PDF/A-3b / PDF/A-3u (ISO 19005-3): the corresponding A-2 level plus support for embedded associated files.
Use Document::attachFile(string $bytes, string $name, AFRelationship $relationship = AFRelationship::Data, string $mime = 'application/octet-stream', ?string $description = null, ?DateTimeImmutable $modDate = null) to attach a file; the most common use case is a Factur-X or ZUGFeRD e-invoice XML embedded alongside the human-readable PDF. Attachments are rejected at PDF/A-2 (the guard throws); use A3B or A3U.
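A possible shape for PdfALevel's helper methods, written as a PHP 8.1 enum together with the PDF/A-2 attachment check. Only the case and method names come from this page; the method bodies and the hypothetical assertCanAttach() guard are assumptions, and the sketch throws RuntimeException where phppdf throws PdfException.

```php
<?php
// Hypothetical re-creation of the PdfALevel helpers described on this page;
// the case and method names come from the page, the bodies are assumptions.
enum PdfALevel
{
    case A2B;
    case A2U;
    case A3B;
    case A3U;

    /** ISO 19005 part number. */
    public function part(): int
    {
        return match ($this) {
            self::A2B, self::A2U => 2,
            self::A3B, self::A3U => 3,
        };
    }

    /** Conformance letter written to pdfaid:conformance. */
    public function conformance(): string
    {
        return match ($this) {
            self::A2B, self::A3B => 'B',
            self::A2U, self::A3U => 'U',
        };
    }

    /** Embedded (associated) files are only permitted at part 3. */
    public function allowsEmbeddedFiles(): bool
    {
        return $this->part() === 3;
    }

    /** The "u" levels require ToUnicode maps on every font. */
    public function requiresUnicode(): bool
    {
        return $this->conformance() === 'U';
    }
}

// Hypothetical guard, assumed to run only when PDF/A is enabled.
function assertCanAttach(PdfALevel $level): void
{
    if (!$level->allowsEmbeddedFiles()) {
        throw new RuntimeException(sprintf(
            'attachFile() requires PDF/A-3, got PDF/A-%d%s',
            $level->part(),
            strtolower($level->conformance())
        ));
    }
}
```

Deriving every property from the case keeps the emitters free of level-specific branching: they ask the enum, not the enum's name.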
- PdfALevel - pure (non-backed) enum with one case per level (A2B, A2U, A3B, A3U). The helper methods part() (2 or 3), conformance() (B or U), allowsEmbeddedFiles() and requiresUnicode() derive everything the emitters need from the case.
- OutputIntent - builds the intent object and its profile object as a pair. The /OutputIntent dictionary carries /Type /OutputIntent, /S /GTS_PDFA1, /OutputConditionIdentifier (sRGB IEC61966-2.1), /Info (the same condition string) and /DestOutputProfile pointing at the IccProfileStream. The ICC bytes are FlateDecode-compressed here (gzcompress, level 9) before being handed to the stream.
- IccProfileStream - wraps the bundled resources/icc/sRGB.icc (a 588-byte littleCMS sRGB profile) in a FlateDecode stream object with /N 3 (three color components). The same profile object is used for every PDF/A document produced in a session.
- PdfAConformanceGuard - called by output() before serialization; throws PdfException for any of four prohibited conditions:
  - a non-embedded standard font (Helvetica, Times, Courier, etc.) is in use; every font must be embedded via registerFontFamily();
  - encryption is configured;
  - document JavaScript (addDocumentScript) is present;
  - appended revisions (addSignature / addDocumentTimestamp / enableLtv) are present.
  An additional guard rejects attachFile() calls when the PDF/A part is 2 (attachments are only permitted at part 3).
- AFRelationship - pure (non-backed) enum for the /AFRelationship key on a file specification: Source, Data, Alternative, Supplement, Unspecified (Data is the Factur-X default). pdfName() maps each case to its PDF name.
- AttachedFile - final readonly value object holding the filename, the raw byte string, the AFRelationship, the MIME type string, an optional description and the modification date (a DateTimeImmutable, passed in so serialization is deterministic). It throws on an empty name or an empty MIME type.
- EmbeddedFileStream - the stream-object wrapper for one embedded file: given the prepared dictionary and the raw bytes, it appends /Length and the uncompressed body. The dictionary it receives is an /EmbeddedFile stream dictionary with /Subtype set to the MIME type as a PDF name (Name::of encodes the raw / as #2F) and /Params carrying /Size, /ModDate and a /CheckSum (MD5 hex).
- EmbeddedFileEmitter - builds two indirect objects per attachment (the /Filespec dictionary, then the EmbeddedFileStream), numbered sequentially from a given first object number, and returns both the objects and the filespec references. The filespec carries /Type /Filespec, /F, /UF, /AFRelationship, /EF and /Desc (omitted when null). outputWithMetadata() calls it after outlines and before the output intent, then injects the filespec references as the catalog /AF array and as /EmbeddedFiles name-tree entries keyed by filename. That name tree is merged with the existing /JavaScript name tree via the shared withNames() helper, so there is always a single /Names dictionary.
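The name escaping and /Params layout described for EmbeddedFileStream can be sketched as follows. pdfName(), pdfDate() and embeddedFileObject() are hypothetical helpers, not phppdf's Name::of or EmbeddedFileStream, and the UTC "Z" date form is an assumption, since this page does not say how /ModDate is formatted.

```php
<?php
// Hypothetical sketch, not phppdf's Name::of or EmbeddedFileStream: escape a
// MIME type as a PDF name and serialize one /EmbeddedFile stream object.

/** PDF name: bytes outside '!'..'~', the delimiters and '#' become #XX. */
function pdfName(string $raw): string
{
    $out = '/';
    for ($i = 0, $n = strlen($raw); $i < $n; $i++) {
        $ch = $raw[$i];
        $ord = ord($ch);
        $escape = $ord < 0x21 || $ord > 0x7E || str_contains('()<>[]{}/%#', $ch);
        $out .= $escape ? sprintf('#%02X', $ord) : $ch;
    }
    return $out;
}

/** PDF date string in UTC, e.g. D:20240102030405Z (assumed format). */
function pdfDate(DateTimeImmutable $date): string
{
    return 'D:' . $date->setTimezone(new DateTimeZone('UTC'))->format('YmdHis') . 'Z';
}

/** One /EmbeddedFile stream object with an uncompressed body. */
function embeddedFileObject(int $num, string $bytes, string $mime, DateTimeImmutable $modDate): string
{
    $dict = sprintf(
        '<< /Type /EmbeddedFile /Subtype %s /Params << /Size %d /ModDate (%s) /CheckSum <%s> >> /Length %d >>',
        pdfName($mime),
        strlen($bytes),
        pdfDate($modDate),
        md5($bytes),   // hex-encoded MD5 of the file bytes
        strlen($bytes) // body is stored uncompressed, so /Length equals /Size
    );
    return "$num 0 obj\n$dict\nstream\n$bytes\nendstream\nendobj\n";
}
```

For example, pdfName('text/xml') yields /text#2Fxml, the form /Subtype takes for a Factur-X XML attachment.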
Calling enablePdfA() does three things that all take effect at output() time:
- Forces the metadata output path: the XMP packet, the /Info dictionary and the document /ID pair are always written (normally /ID and full XMP are optional).
- Injects /OutputIntents: an [OutputIntent] array referencing the sRGB ICC profile stream is added to the catalog.
- Passes the level to XmpWriter: the XMP serializer prepends a pdfaid RDF description block (xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/") carrying <pdfaid:part> (2 or 3, from PdfALevel::part()) and <pdfaid:conformance> (B or U). The rest of the XMP packet (dc:, xmp: and pdf: namespaces) is emitted unchanged, so the null (non-PDF/A) path stays byte-identical.
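A minimal sketch of the PDF/A pieces added at output time: the FlateDecode ICC stream, the /OutputIntent dictionary and the pdfaid description block. The function names and object numbers are hypothetical, not phppdf's OutputIntent, IccProfileStream or XmpWriter, and the XMP fragment is shown without the surrounding x:xmpmeta / rdf:RDF packet it would sit in.

```php
<?php
// Hypothetical sketch, not phppdf's OutputIntent / IccProfileStream / XmpWriter.

/** ICC profile as a FlateDecode stream object with three color components. */
function iccStreamObject(int $num, string $iccBytes): string
{
    $data = gzcompress($iccBytes, 9); // zlib format (RFC 1950), which FlateDecode expects
    return sprintf(
        "%d 0 obj\n<< /N 3 /Filter /FlateDecode /Length %d >>\nstream\n%s\nendstream\nendobj\n",
        $num,
        strlen($data),
        $data
    );
}

/** /OutputIntent dictionary referencing the ICC stream object. */
function outputIntentDict(int $iccNum): string
{
    $condition = 'sRGB IEC61966-2.1';
    return "<< /Type /OutputIntent /S /GTS_PDFA1 /OutputConditionIdentifier ($condition)"
        . " /Info ($condition) /DestOutputProfile $iccNum 0 R >>";
}

/** pdfaid description block, placed inside the packet's rdf:RDF element. */
function pdfaidDescription(int $part, string $conformance): string
{
    return '<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
        . "<pdfaid:part>$part</pdfaid:part>"
        . "<pdfaid:conformance>$conformance</pdfaid:conformance>"
        . '</rdf:Description>';
}
```

Keeping the pdfaid block as a separate, prepended description is what lets the rest of the XMP output stay untouched when no level is set.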
The PDF header is already %PDF-1.7, which satisfies the PDF/A-2 and PDF/A-3 version requirement.
PDF/A-2u requires a valid ToUnicode CMap on every font. Custom embedded fonts (the only fonts allowed in a PDF/A document) already carry a /ToUnicode stream built by FontEngine during subsetting - the same stream that makes copy-paste work correctly. No extra work is needed to satisfy the Unicode conformance level.
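To show what such a /ToUnicode stream contains, here is a sketch of a CMap for 2-byte glyph codes (as used with Identity-H encoding). It is hypothetical and not FontEngine's code; the glyph-to-code-point map is assumed to come from the subsetter, and code points above U+FFFF are written as UTF-16BE surrogate pairs.

```php
<?php
// Hypothetical sketch, not FontEngine's code: build a ToUnicode CMap mapping
// 2-byte glyph codes (Identity-H style) to the Unicode text they render.

/** UTF-16BE hex for one code point, using a surrogate pair above U+FFFF. */
function utf16beHex(int $cp): string
{
    if ($cp < 0x10000) {
        return sprintf('%04X', $cp);
    }
    $v = $cp - 0x10000;
    return sprintf('%04X%04X', 0xD800 | ($v >> 10), 0xDC00 | ($v & 0x3FF));
}

/** @param array<int,int> $glyphToCodePoint glyph code => Unicode code point */
function toUnicodeCMap(array $glyphToCodePoint): string
{
    ksort($glyphToCodePoint);
    $lines = [
        '/CIDInit /ProcSet findresource begin',
        '12 dict begin',
        'begincmap',
        '/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def',
        '/CMapName /Adobe-Identity-UCS def',
        '/CMapType 2 def',
        '1 begincodespacerange',
        '<0000> <FFFF>',
        'endcodespacerange',
    ];
    // A bfchar section may hold at most 100 entries, so emit them in chunks.
    foreach (array_chunk($glyphToCodePoint, 100, true) as $chunk) {
        $lines[] = count($chunk) . ' beginbfchar';
        foreach ($chunk as $glyph => $cp) {
            $lines[] = sprintf('<%04X> <%s>', $glyph, utf16beHex($cp));
        }
        $lines[] = 'endbfchar';
    }
    array_push($lines, 'endcmap', 'CMapName currentdict /CMap defineresource pop', 'end', 'end');
    return implode("\n", $lines) . "\n";
}
```

A viewer that extracts text looks up each shown glyph code in these bfchar entries, which is why the same stream serves both copy-paste and the PDF/A-2u requirement.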
The end-to-end golden test tests/Golden/PdfA2ConformanceTest.php is a single data-provider-driven test covering three flavours (2b, 2u, 3b). Each case rebuilds the matching byte-identity fixture document (PdfA2bTest, PdfA2uTest and PdfA3bTest each expose a static buildDocument()), pipes the output to veraPDF --flavour <flavour>, and asserts isCompliant="true" in the veraPDF XML report. The 3b case attaches an XML file via attachFile(), exercising the embedded-file path. The test auto-skips when the bundled JRE or the veraPDF CLI jar is absent. The unit tests separately cover attachFile() throwing at PDF/A-2.
MIT licensed. Source on GitHub - if phppdf helps you, you can buy me a coffee.