Tagged PDF
A tagged PDF carries a logical structure tree alongside its visual content: a parallel tree of semantic elements (paragraphs, headings, figures, tables, lists) that describes what the content is and in what order it should be read. Assistive technology such as screen readers uses this tree to present a document in a meaningful reading order, and it is the basis for PDF/UA (ISO 14289) accessibility conformance.
By default phppdf emits untagged PDFs. Call $doc->enableTagging() before save() / output() and the library builds the structure tree for you and emits all the tagged-PDF plumbing automatically as you draw with the high-level API.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->enableTagging('en-US'); // turn on tagging, set the document language

$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->cell(w: 80, h: 10, text: 'A tagged paragraph.');

$doc->save('tagged.pdf');
```

`enableTagging()` is fluent (it returns the Document) and opt-in. If you never call it, the output is byte-for-byte identical to an untagged build: the tagging feature adds nothing to the file.
The single argument to enableTagging() is an optional BCP-47 language tag (for example 'en-US', 'fr', 'de-CH'). It sets the catalog /Lang, which tells a reader the default natural language of the document's text. Pass null (or omit it) to leave the language unset:
```php
$doc->enableTagging();        // tagged, no /Lang
$doc->enableTagging('fr-FR'); // tagged, /Lang (fr-FR)
```

An invalid tag throws a PdfException.
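The exact validation rules phppdf applies to the language tag are not spelled out here. As a rough illustration of what a well-formedness check for simple tags can look like, the sketch below accepts a 2-3 letter primary language, an optional 4-letter script, and an optional region. The function name `looksLikeLanguageTag` is hypothetical and not part of phppdf, and the pattern covers only a subset of BCP-47 (no variants, extensions, or private-use subtags).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of phppdf: a simplified well-formedness
 * check covering language[-Script][-REGION] only.
 */
function looksLikeLanguageTag(string $tag): bool
{
    // 2-3 letter language, optional 4-letter script,
    // optional region (2 letters or 3 digits). Case-insensitive.
    $pattern = '/\A[a-z]{2,3}(-[a-z]{4})?(-([a-z]{2}|[0-9]{3}))?\z/i';

    return preg_match($pattern, $tag) === 1;
}

var_dump(looksLikeLanguageTag('en-US'));   // bool(true)
var_dump(looksLikeLanguageTag('de-CH'));   // bool(true)
var_dump(looksLikeLanguageTag('english')); // bool(false)
```

A check like this could run in application code before calling `enableTagging()`, so that a malformed tag from user input is reported early instead of surfacing as an exception from the library.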
Once tagging is on, the high-level drawing calls are mapped to structure elements automatically. You do not place tags by hand:
| API call | Structure element(s) |
|---|---|
| `cell()` text | `<P>` (paragraph) |
| `image()` | `<Figure>` |
| `table()` | `<Table>`, `<TR>` rows, `<TH>` header cells, `<TD>` body cells |
| `markdown()` heading | `<H1>` .. `<H6>` |
| `markdown()` paragraph / code block | `<P>` |
| `markdown()` list | `<L>` with `<LI>` items, each wrapping an `<LBody>` |
```php
use DragonOfMercy\PhpPdf\Color;
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
use DragonOfMercy\PhpPdf\Table\Column;
use DragonOfMercy\PhpPdf\Table\TableBorders;
use DragonOfMercy\PhpPdf\Table\TableStyle;
use DragonOfMercy\PhpPdf\TextAlign;

$doc = new Document();
$doc->enableTagging('en-US');

$page = $doc->addPage();
$page->setFont(Font::helvetica(), 11.0);

$page->table(
    columns: [
        Column::of('name', 'Article')->fill(),
        Column::of('price', 'Price')->width(30.0)->align(TextAlign::RIGHT),
    ],
    rows: [
        ['name' => 'Coffee', 'price' => '5.00'],
        ['name' => 'Croissant', 'price' => '3.60'],
    ],
    x: 20.0, y: 30.0, width: 170.0,
    style: TableStyle::default()
        ->withBorder(TableBorders::GRID)
        ->withHeader(fill: Color::gray(238), bold: true),
);

$doc->save('tagged-table.pdf');
```

The header row's cells are tagged `<TH>` and the body cells `<TD>`, nested under `<TR>` rows inside a single `<Table>`.
```php
$doc = new Document();
$doc->enableTagging('en-US');

$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->markdown("# Title\n\nA paragraph.\n\n- one\n- two");

$doc->save('tagged-markdown.pdf');
```

The heading becomes `<H1>`, the paragraph `<P>`, and the bullet list an `<L>` with two `<LI>` / `<LBody>` items.
When tagging is enabled the serializer adds, in one deterministic pass:
- `/StructTreeRoot` in the catalog - the root of the logical structure tree, holding every `<P>` / `<Figure>` / `<Table>` / heading / list element built while you drew the page.
- `/MarkInfo <</Marked true>>` in the catalog, and the optional document `/Lang`.
- Marked-content sequences. Each tagged run of page content is wrapped in a `BDC` / `EMC` pair carrying a marked-content identifier (MCID).
- A ParentTree linking every MCID back to its structure element, plus a `/StructParents` entry on each page.
- `/Tabs /S` on each page, so the tab order follows structure order.
The result is a well-formed tagged PDF (validated structurally and qpdf-clean). For a document that also validates as conformant PDF/UA-1, use enablePdfUA() (below).
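To make the marked-content sequences and the ParentTree from the list above more concrete, here is a toy model in plain PHP. It does not use phppdf and does not reflect its internals; `TaggedPageSketch` and its methods are invented for illustration. It wraps each run of content-stream operators in a `BDC` / `EMC` pair with a fresh MCID and records which structure element owns that MCID. This is a simplification: in a real file the ParentTree is a number tree keyed by each page's `/StructParents` value, and the entry for a page is an array of structure-element references indexed by MCID.

```php
<?php
declare(strict_types=1);

/** Toy model of one page's tagged content; all names are illustrative only. */
final class TaggedPageSketch
{
    /** @var list<string> content-stream fragments in drawing order */
    private array $stream = [];

    /** @var array<int, string> MCID => structure type that owns it */
    private array $parents = [];

    /** Wrap $operators in a marked-content sequence and return its MCID. */
    public function tag(string $structType, string $operators): int
    {
        $mcid = count($this->parents); // MCIDs are unique within the page, starting at 0
        $this->parents[$mcid] = $structType;
        $this->stream[] = sprintf(
            "/%s <</MCID %d>> BDC\n%s\nEMC",
            $structType,
            $mcid,
            $operators
        );

        return $mcid;
    }

    public function contentStream(): string
    {
        return implode("\n", $this->stream);
    }

    /** @return array<int, string> */
    public function parentTree(): array
    {
        return $this->parents;
    }
}

$page = new TaggedPageSketch();
$page->tag('H1', 'BT /F1 18 Tf (Title) Tj ET');
$page->tag('P', 'BT /F1 12 Tf (A paragraph.) Tj ET');

echo $page->contentStream(), "\n";
```

The point of the two-way bookkeeping is that a reader can walk the structure tree to find page content, and can also start from a piece of page content (via its MCID) and find the structure element it belongs to.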
enableTagging() builds a well-formed structure tree, but a conformant accessible PDF must do more: mark every non-structural mark as an artifact, give every figure alternate text, embed all fonts, advertise its title, and so on. Document::enablePdfUA() turns all of that on. It implies enableTagging() (everything above still applies) and additionally makes the output validate as PDF/UA-1 (ISO 14289-1), checked with the veraPDF ua1 profile.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->metadata()->title('Quarterly report');                 // required (see below)
$doc->registerFontFamily('Body', regular: 'DejaVuSans.ttf'); // embedded font required
$doc->enablePdfUA('en-US');

$page = $doc->addPage();
$page->setFont(Font::custom('Body'), 12);
$page->markdown("# Quarterly report\n\nRevenue grew 12% this quarter.");
$page->image('chart.png', x: 20, y: 80, w: 60, h: 40, alt: 'Revenue chart, Q1 to Q4');

$doc->save('accessible.pdf');
```

`enablePdfUA()` is fail-fast: at `output()` a conformance guard throws a PdfException with an actionable message rather than emit a non-conformant file. You must:
- Embed every font. The non-embeddable standard-14 fonts (Helvetica, Times, ...) are rejected, exactly as under PDF/A. Register a real font with `registerFontFamily()` and use it via `Font::custom()`.
- Set a document title with `$doc->metadata()->title('...')`. PDF/UA shows the title (not the file name) in the window bar.
- Give every figure alternate text. Pass `image(alt: '...')`. For a purely decorative image, pass `image(decorative: true)` instead - it is marked as an artifact and needs no alt text.
- Not skip heading levels. Going from `#` straight to `###` (H1 to H3) throws; keep headings contiguous.
- Not use link annotations yet. Conformant link tagging is a later phase; until then a UA document with links is rejected (use `enableTagging()` if you need links without UA conformance).
```php
$page->image('logo.png', x: 20, y: 20, w: 30, h: 12, alt: 'Acme Corporation'); // <Figure> with /Alt
$page->image('divider.png', x: 20, y: 40, w: 170, h: 1, decorative: true);     // /Artifact, skipped by AT
```

Passing both `alt:` and `decorative: true` is a contradiction and throws.
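Because the guard only fires at `output()`, it can be convenient to check the heading rule earlier, for example on Markdown coming from users. The following is a minimal sketch of such a pre-flight check in plain PHP. `firstSkippedHeading` is a hypothetical helper, not part of phppdf, and it assumes that the first heading must be an H1, which this page does not state explicitly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check, not part of phppdf.
 * Returns the index of the first heading that skips a level, or null if none does.
 *
 * @param list<int> $levels heading levels in document order (1 = H1 ... 6 = H6)
 */
function firstSkippedHeading(array $levels): ?int
{
    // Assumption: "no heading yet" counts as level 0, so the first heading must be H1.
    $previous = 0;

    foreach ($levels as $index => $level) {
        if ($level > $previous + 1) {
            return $index; // jumped more than one level deeper
        }
        $previous = $level;
    }

    return null;
}

var_dump(firstSkippedHeading([1, 2, 3, 2, 3])); // NULL
var_dump(firstSkippedHeading([1, 3]));          // int(1)
```

Moving back up by any number of levels (H3 to H1, say) is allowed in this sketch; only a jump of more than one level deeper is flagged.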
Beyond the tagging plumbing, `enablePdfUA()` also emits:

- Artifact marking. Cell fills and borders, table chrome, Markdown list markers (the bullet / number glyphs), and all header and footer content are wrapped in `/Artifact` marked content, so assistive technology skips them and no real content is left untagged.
- Figure `/Alt` from `image(alt: ...)`, and `/Scope /Column` on `<TH>` table-header cells.
- `/ViewerPreferences <</DisplayDocTitle true>>` so readers show the title.
- An XMP `/Metadata` stream is always emitted, carrying the `pdfuaid:part` identifier that marks the file as PDF/UA.
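For readers unfamiliar with the `pdfuaid:part` identifier mentioned above, this sketch shows roughly what that fragment of an XMP packet looks like. It is illustrative only: `pdfUaIdentification` is a made-up function, the surrounding `rdf:RDF` wrapper (which declares the `rdf` prefix) is omitted, and the exact XMP that phppdf writes may be laid out differently.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only, not phppdf's serializer: builds the rdf:Description
 * element that carries the PDF/UA identifier inside an XMP packet.
 */
function pdfUaIdentification(int $part = 1): string
{
    // The pdfuaid namespace URI is the one used for PDF/UA identification.
    return <<<XML
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>{$part}</pdfuaid:part>
    </rdf:Description>
    XML;
}

echo pdfUaIdentification(), "\n";
```

A validator such as veraPDF looks for this identifier to decide which PDF/UA part the file claims to conform to.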
enablePdfUA() makes documents built from cell(), table(), markdown(), and image() validate as PDF/UA-1. A few areas are still deferred:
- Link / annotation tagging. Conformant links (`<Link>` structure elements with `OBJR` references and per-annotation `/StructParent`) are the next phase; until then `enablePdfUA()` rejects a document that has link annotations.
- Image-in-table cells are not individually tagged.
- Encryption / signatures with UA are not validated against veraPDF; the guard does not reject them, but conformance of an encrypted or signed UA document is unverified.
- PDF/A level "a". Tagging composes with PDF/A-2 and PDF/A-3 (levels b and u); tagged PDF/A (level "a") builds on this work and is a later phase.
The library ships golden tests (tests/Golden/TaggingGoldenTest.php) that render tagged cell, figure, table, Markdown, and a full UA document, each qpdf --check-validated. A dedicated test (tests/Golden/VeraPdfUa1Test.php) renders the UA document and runs it through the veraPDF ua1 profile, asserting isCompliant. The qpdf and veraPDF checks auto-skip when the tools are absent. Run them with:
```bash
cd build/
vendor/bin/phpunit tests/Golden/TaggingGoldenTest.php tests/Golden/VeraPdfUa1Test.php
```

MIT licensed. Source on GitHub - if phppdf helps you, you can buy me a coffee.