Tagged PDF
A tagged PDF carries a logical structure tree alongside its visual content: a parallel tree of semantic elements (paragraphs, headings, figures, tables, lists) that describes what the content is and in what order it should be read. Assistive technology such as screen readers uses this tree to present a document in a meaningful reading order, and it is the basis for PDF/UA (ISO 14289) accessibility conformance.
By default phppdf emits untagged PDFs. Call $doc->enableTagging() before save() / output() and the library builds the structure tree for you and emits all the tagged-PDF plumbing automatically as you draw with the high-level API.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->enableTagging('en-US'); // turn on tagging, set the document language
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->cell(w: 80, h: 10, text: 'A tagged paragraph.');
$doc->save('tagged.pdf');
```

`enableTagging()` is fluent (returns the `Document`) and opt-in. When you do not call it, the output is byte-for-byte identical to an untagged build; nothing tagging-related is added to the file.
The single argument to enableTagging() is an optional BCP-47 language tag (for example 'en-US', 'fr', 'de-CH'). It sets the catalog /Lang, which tells a reader the default natural language of the document's text. Pass null (or omit it) to leave the language unset:
```php
$doc->enableTagging();        // tagged, no /Lang
$doc->enableTagging('fr-FR'); // tagged, /Lang (fr-FR)
```

An invalid tag throws a `PdfException`.
Once tagging is on, the high-level drawing calls are mapped to structure elements automatically. You do not place tags by hand:
| API call | Structure element(s) |
|---|---|
| `cell()` text | `<P>` (paragraph) |
| `image()` | `<Figure>` |
| `table()` | `<Table>`, `<TR>` rows, `<TH>` header cells, `<TD>` body cells |
| `markdown()` heading | `<H1>` .. `<H6>` |
| `markdown()` paragraph / code block | `<P>` |
| `markdown()` list | `<L>` with `<LI>` items, each wrapping an `<LBody>` |
```php
use DragonOfMercy\PhpPdf\Color;
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
use DragonOfMercy\PhpPdf\Table\Column;
use DragonOfMercy\PhpPdf\Table\TableBorders;
use DragonOfMercy\PhpPdf\Table\TableStyle;
use DragonOfMercy\PhpPdf\TextAlign;

$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 11.0);

$page->table(
    columns: [
        Column::of('name', 'Article')->fill(),
        Column::of('price', 'Price')->width(30.0)->align(TextAlign::RIGHT),
    ],
    rows: [
        ['name' => 'Coffee', 'price' => '5.00'],
        ['name' => 'Croissant', 'price' => '3.60'],
    ],
    x: 20.0, y: 30.0, width: 170.0,
    style: TableStyle::default()
        ->withBorder(TableBorders::GRID)
        ->withHeader(fill: Color::gray(238), bold: true),
);

$doc->save('tagged-table.pdf');
```

The header row's cells are tagged `<TH>` and the body cells `<TD>`, nested under `<TR>` rows inside a single `<Table>`.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->markdown("# Title\n\nA paragraph.\n\n- one\n- two");
$doc->save('tagged-markdown.pdf');
```

The heading becomes `<H1>`, the paragraph `<P>`, and the bullet list an `<L>` with two `<LI>` / `<LBody>` items.
When tagging is enabled the serializer adds, in one deterministic pass:
- `/StructTreeRoot` in the catalog - the root of the logical structure tree, holding every `<P>` / `<Figure>` / `<Table>` / heading / list element built while you drew the page.
- `/MarkInfo <</Marked true>>` in the catalog, and the optional document `/Lang`.
- Marked-content sequences. Each tagged run of page content is wrapped in a `BDC`/`EMC` pair carrying a marked-content identifier (MCID).
- A ParentTree linking every MCID back to its structure element, plus a `/StructParents` entry on each page.
- `/Tabs /S` on each page, so the tab order follows structure order.
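If you want a quick sanity check that the catalog-level entries listed above made it into a saved file, a plain byte scan is often enough. The sketch below is not part of phppdf; `findTaggingMarkers()` is a hypothetical helper, and it assumes the catalog and page dictionaries are written as uncompressed objects. It deliberately does not look for `BDC`/`EMC`, because page content streams are usually compressed.

```php
<?php
declare(strict_types=1);

/**
 * Report which tagged-PDF markers appear as plain text in raw PDF bytes.
 *
 * Assumption: the dictionaries are not stored inside a compressed object
 * stream. If they are, expand the file first (for example with qpdf --qdf).
 *
 * @return array<string, bool> marker name => whether it was found
 */
function findTaggingMarkers(string $pdfBytes): array
{
    // "~" is the regex delimiter so the PDF name slashes need no escaping.
    $patterns = [
        'StructTreeRoot' => '~/StructTreeRoot\b~',
        'MarkInfo'       => '~/MarkInfo\b~',
        'Marked true'    => '~/Marked\s+true\b~',
        'Lang'           => '~/Lang\s*[(<]~',   // literal "(...)" or hex "<...>" string
        'StructParents'  => '~/StructParents\b~',
        'Tabs /S'        => '~/Tabs\s*/S\b~',
    ];

    $found = [];
    foreach ($patterns as $name => $pattern) {
        $found[$name] = preg_match($pattern, $pdfBytes) === 1;
    }

    return $found;
}

// A hand-written catalog fragment, used here instead of a real file.
$sample = "1 0 obj\n<< /Type /Catalog /StructTreeRoot 5 0 R "
    . "/MarkInfo <</Marked true>> /Lang (en-US) >>\nendobj\n";

$markers = findTaggingMarkers($sample);
```

For a real document, pass the bytes of the saved file instead of `$sample`. A `false` entry only means the marker was not visible as plain text; `qpdf --check` or veraPDF remain the authoritative checks.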
The result is a well-formed tagged PDF (validated structurally and qpdf-clean). For a document that also validates as conformant PDF/UA-1, use enablePdfUA() (below).
`enableTagging()` builds a well-formed structure tree, but a conformant accessible PDF must do more: mark all non-structural content as an artifact, give every figure alternate text, embed all fonts, advertise its title, and so on. `Document::enablePdfUA()` turns all of that on. It implies `enableTagging()` (everything above still applies) and additionally makes the output validate as PDF/UA-1 (ISO 14289-1), checked with the veraPDF ua1 profile.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->metadata()->title('Quarterly report');                 // required (see below)
$doc->registerFontFamily('Body', regular: 'DejaVuSans.ttf'); // embedded font required
$doc->enablePdfUA('en-US');
$page = $doc->addPage();
$page->setFont(Font::custom('Body'), 12);
$page->markdown("# Quarterly report\n\nRevenue grew 12% this quarter.");
$page->image('chart.png', x: 20, y: 80, w: 60, h: 40, alt: 'Revenue chart, Q1 to Q4');
$doc->save('accessible.pdf');
```

`enablePdfUA()` is fail-fast: at `output()` a conformance guard throws a `PdfException` with an actionable message rather than emitting a non-conformant file. You must:
- Embed every font. The non-embeddable standard-14 fonts (Helvetica, Times, ...) are rejected, exactly as under PDF/A. Register a real font with `registerFontFamily()` and use it via `Font::custom()`.
- Set a document title with `$doc->metadata()->title('...')`. A PDF/UA document shows its title (not the file name) in the viewer's window bar.
- Give every figure alternate text. Pass `image(alt: '...')`. For a purely decorative image, pass `image(decorative: true)` instead - it is marked as an artifact and needs no alt text.
- Do not skip heading levels. Going from `#` straight to `###` (H1 to H3) throws; keep headings contiguous.
- Tag your links. Make hyperlinks with `cell(link: ...)` (which tags them - see below), not the low-level `Page::link()` area annotation. A UA document that contains an untagged `Page::link()` is rejected.
```php
$page->image('logo.png', x: 20, y: 20, w: 30, h: 12, alt: 'Acme Corporation'); // <Figure> with /Alt
$page->image('divider.png', x: 20, y: 40, w: 170, h: 1, decorative: true);     // /Artifact, skipped by AT
```

Passing both `alt:` and `decorative: true` is a contradiction and throws.
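Because the heading-level rule above only surfaces when the guard runs, it can be handy to lint Markdown before handing it to `markdown()`. The following is a minimal standalone sketch, not the library's own guard: `findSkippedHeadingLevels()` is a hypothetical helper that only understands ATX (`#`) headings, does not exclude `#` lines inside fenced code blocks, and does not flag a document whose first heading is deeper than H1, since this page does not say how the guard treats that case.

```php
<?php
declare(strict_types=1);

/**
 * Find ATX headings that jump more than one level deeper than the heading
 * before them (for example H1 followed directly by H3).
 *
 * @return list<array{line: int, from: int, to: int}>
 */
function findSkippedHeadingLevels(string $markdown): array
{
    $skips = [];
    $previous = null; // level of the last heading seen, null before the first one

    foreach (preg_split('/\R/', $markdown) as $index => $line) {
        // Up to three leading spaces, one to six "#", then a space, tab, or end of line.
        if (preg_match('/^ {0,3}(#{1,6})(?:[ \t]|$)/', $line, $m) !== 1) {
            continue;
        }

        $level = strlen($m[1]);
        if ($previous !== null && $level > $previous + 1) {
            $skips[] = ['line' => $index + 1, 'from' => $previous, 'to' => $level];
        }
        $previous = $level;
    }

    return $skips;
}

foreach (findSkippedHeadingLevels("# Title\n\n### Details") as $skip) {
    printf("line %d: H%d -> H%d\n", $skip['line'], $skip['from'], $skip['to']);
}
// prints "line 3: H1 -> H3"
```

Moving back up the hierarchy (H3 followed by H2) is not reported, because only a jump to a deeper level can skip one.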
A UA-conformant link must be reachable from the structure tree: a <Link>
element holding the link text and an object reference (/OBJR) to the
annotation, which in turn carries a description. cell(link: ...) produces all
of that in one call - it draws the text, makes the cell box clickable, and (when
tagging is on) tags it as a <Link>.
```php
use DragonOfMercy\PhpPdf\Outline\Link;
use DragonOfMercy\PhpPdf\Outline\Destination;

// External hyperlink. linkAlt becomes the annotation /Contents; it defaults to the cell text.
$page->cell(w: 70, h: 8, text: 'Visit example.com', link: Link::url('https://example.com'), linkAlt: 'Example home page');

// Internal jump to another page.
$page->cell(w: 70, h: 8, text: 'Back to the cover', link: Link::destination(Destination::page(0)));
```

The `link:` parameter also works without tagging (it just makes the cell clickable). `linkAlt` without `link`, or `link` with no width/height, throws.
The low-level Page::link(x, y, w, h, $link) (a bare clickable rectangle with
no associated text) is not tagged and is rejected under enablePdfUA(); use
cell(link: ...) for accessible links.
When tagging is enabled, inline hyperlinks in Markdown (the [text](url) syntax)
are also tagged automatically as <Link> structure elements - no extra API call
is needed beyond enableTagging() or enablePdfUA(). Each link carries an
/OBJR, a /StructParent, and a /Contents set to the link's visible text
(CommonMark link titles are not yet used for /Contents). The underline rule
drawn under the link text is marked as an /Artifact. Documents rendered with
markdown() containing inline links validate under veraPDF PDF/UA-1.
A link that wraps across lines produces a single <Link> element with one
annotation rectangle per line.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->metadata()->title('Report');
$doc->registerFontFamily('Body', regular: 'DejaVuSans.ttf');
$doc->enablePdfUA('en-US');
$page = $doc->addPage();
$page->setFont(Font::custom('Body'), 12);
$page->markdown("See the [project wiki](https://github.com/dragonofmercy/php-pdf/wiki) for full documentation.");
$doc->save('accessible-links.pdf');
```

With tagging off the output is unchanged: links become plain area annotations, identical to the output before this feature was added.
Image hyperlinks are tagged on both surfaces when tagging is enabled. `Page::image(link: ..., linkAlt: ...)` and Markdown block image links (`[![alt](src)](url)`) both tag the result as a `<Link>` structure element wrapping the image `<Figure>`. The annotation carries an `/OBJR`, a `/StructParent`, and `/Contents` set to `linkAlt` (falling back to the image alt when `linkAlt` is not supplied). The `<Figure>` carries `/Alt` from the image alt text. Documents with image links validate under veraPDF PDF/UA-1.
```php
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
use DragonOfMercy\PhpPdf\Outline\Link;

$doc = new Document();
$doc->metadata()->title('Report');
$doc->registerFontFamily('Body', regular: 'DejaVuSans.ttf');
$doc->enablePdfUA('en-US');
$page = $doc->addPage();
$page->setFont(Font::custom('Body'), 12);

// Direct API: a clickable image tagged as <Link> + <Figure>
$page->image('chart.png', x: 20, y: 40, w: 60, h: 40,
    alt: 'Revenue chart',
    link: Link::url('https://example.com'),
    linkAlt: 'Open the full report');

$doc->save('accessible-image-link.pdf');
```

With tagging off, a plain area-link annotation is used (unchanged from before). Two constraints apply: a decorative image (`decorative: true`) cannot carry a link (it throws), and CommonMark link titles are not used for `/Contents` (use `linkAlt` on the direct API, or accept the image alt as the fallback in Markdown).
- Artifact marking. Cell fills and borders, table chrome, Markdown list markers (the bullet / number glyphs), and all header and footer content are wrapped in `/Artifact` marked content, so assistive technology skips them and no real content is left untagged.
- Figure `/Alt` from `image(alt: ...)`, and `/Scope /Column` on `<TH>` table-header cells.
- `/ViewerPreferences <</DisplayDocTitle true>>` so readers show the title.
- An XMP `/Metadata` stream is always emitted, carrying the `pdfuaid:part` identifier that marks the file as PDF/UA.
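The `pdfuaid:part` identifier mentioned above can be read back without a PDF parser when the XMP packet is stored as plain text. This is a rough sketch under that assumption (whether phppdf leaves the `/Metadata` stream uncompressed is not stated here); `findPdfUaPart()` is a hypothetical helper, not a library function. XMP allows the property to be written either as an element or as an attribute, so the pattern accepts both.

```php
<?php
declare(strict_types=1);

/**
 * Extract the pdfuaid:part value from XMP text, in element form
 * (<pdfuaid:part>1</pdfuaid:part>) or attribute form (pdfuaid:part="1").
 * Returns null when no such property is visible in the given bytes.
 */
function findPdfUaPart(string $bytes): ?int
{
    $pattern = '~<pdfuaid:part>\s*(\d+)\s*</pdfuaid:part>'
        . '|pdfuaid:part\s*=\s*["\'](\d+)["\']~';

    if (preg_match($pattern, $bytes, $m) !== 1) {
        return null;
    }

    // Group 1 is filled by the element form, group 2 by the attribute form.
    return (int) ($m[1] !== '' ? $m[1] : $m[2]);
}

// Hand-written XMP fragment standing in for the bytes of a saved file.
$xmp = '<rdf:Description xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">'
    . '<pdfuaid:part>1</pdfuaid:part></rdf:Description>';

$part = findPdfUaPart($xmp);
```

A result of `1` corresponds to PDF/UA-1. The identifier is only a claim of conformance; validating the claim is what the veraPDF ua1 profile is for.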
enablePdfUA() makes documents built from cell(), table(), markdown(), and image() validate as PDF/UA-1. A few areas are still deferred:
- Area links. The low-level `Page::link()` area annotation (a bare clickable rectangle with no associated text) is not tagged and is rejected under `enablePdfUA()`. Use `cell(link: ...)` for accessible text links and `image(link: ...)` for accessible image links.
- Image-in-table cells are not individually tagged.
- Encryption / signatures with UA are not validated against veraPDF; the guard does not reject them, but conformance of an encrypted or signed UA document is unverified.
The library ships golden tests (tests/Golden/TaggingGoldenTest.php) that render tagged cell, figure, table, Markdown, a full UA document, and a UA document with tagged links, each qpdf --check-validated. A dedicated test (tests/Golden/VeraPdfUa1Test.php) renders the UA documents (including the one with links) and runs them through the veraPDF ua1 profile, asserting isCompliant. The qpdf and veraPDF checks auto-skip when the tools are absent. Run them with:
```bash
cd build/
vendor/bin/phpunit tests/Golden/TaggingGoldenTest.php tests/Golden/VeraPdfUa1Test.php
```

MIT licensed. Source on GitHub.