Tagged PDF
A tagged PDF carries a logical structure tree alongside its visual content: a parallel tree of semantic elements (paragraphs, headings, figures, tables, lists) that describes what the content is and in what order it should be read. Assistive technology such as screen readers uses this tree to present a document in a meaningful reading order, and it is the basis for PDF/UA (ISO 14289) accessibility conformance.
By default phppdf emits untagged PDFs. Call $doc->enableTagging() before save() / output() and the library builds the structure tree for you and emits all the tagged-PDF plumbing automatically as you draw with the high-level API.
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
$doc = new Document();
$doc->enableTagging('en-US'); // turn on tagging, set the document language
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->cell(w: 80, h: 10, text: 'A tagged paragraph.');
$doc->save('tagged.pdf');
enableTagging() is fluent (it returns the Document) and opt-in. If you do not call it, the output is byte-for-byte identical to an untagged build: with tagging off, nothing extra is written to the file.
The single argument to enableTagging() is an optional BCP-47 language tag (for example 'en-US', 'fr', 'de-CH'). It sets the catalog /Lang, which tells a reader the default natural language of the document's text. Pass null (or omit it) to leave the language unset:
$doc->enableTagging(); // tagged, no /Lang
$doc->enableTagging('fr-FR'); // tagged, /Lang (fr-FR)
An invalid tag throws a PdfException.
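If the language tag comes from user input or configuration, it can help to pre-check it before calling enableTagging(). The sketch below is a standalone, simplified well-formedness test and not the library's own validation: the helper name looksLikeLanguageTag is hypothetical, and the pattern covers only the common language[-script][-region] shape rather than the full BCP-47 grammar.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of phppdf): rough shape check for
 * language[-script][-region] tags such as 'fr', 'en-US', 'zh-Hant-TW'.
 * It does not implement the complete BCP-47 grammar.
 */
function looksLikeLanguageTag(string $tag): bool
{
    // 2-3 letter language, optional 4-letter script,
    // optional region (2 letters or 3 digits).
    $pattern = '/^[A-Za-z]{2,3}(?:-[A-Za-z]{4})?(?:-(?:[A-Za-z]{2}|[0-9]{3}))?$/D';

    return preg_match($pattern, $tag) === 1;
}

var_dump(looksLikeLanguageTag('de-CH'));      // bool(true)
var_dump(looksLikeLanguageTag('english_US')); // bool(false)
```

A tag that passes this check can still be rejected by the library, so the PdfException should be handled either way.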
Once tagging is on, the high-level drawing calls are mapped to structure elements automatically. You do not place tags by hand:
| API call | Structure element(s) |
|---|---|
| cell() text | <P> (paragraph) |
| image() | <Figure> |
| table() | <Table>, <TR> rows, <TH> header cells, <TD> body cells |
| markdown() heading | <H1> .. <H6> |
| markdown() paragraph / code block | <P> |
| markdown() list | <L> with <LI> items, each wrapping an <LBody> |
use DragonOfMercy\PhpPdf\Color;
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
use DragonOfMercy\PhpPdf\Table\Column;
use DragonOfMercy\PhpPdf\Table\TableBorders;
use DragonOfMercy\PhpPdf\Table\TableStyle;
use DragonOfMercy\PhpPdf\TextAlign;
$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 11.0);
$page->table(
    columns: [
        Column::of('name', 'Article')->fill(),
        Column::of('price', 'Price')->width(30.0)->align(TextAlign::RIGHT),
    ],
    rows: [
        ['name' => 'Coffee', 'price' => '5.00'],
        ['name' => 'Croissant', 'price' => '3.60'],
    ],
    x: 20.0, y: 30.0, width: 170.0,
    style: TableStyle::default()
        ->withBorder(TableBorders::GRID)
        ->withHeader(fill: Color::gray(238), bold: true),
);
$doc->save('tagged-table.pdf');
The header row's cells are tagged <TH> and the body cells <TD>, nested under <TR> rows inside a single <Table>.
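To make the nesting concrete, the following standalone sketch prints the element outline that the description above implies for a table with two columns and two body rows. It does not read a PDF or call phppdf; the function expectedTableOutline is a hypothetical illustration of the tree shape only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of phppdf): lists the structure elements
 * for one header row plus $bodyRows body rows, indented by nesting depth.
 *
 * @return list<string>
 */
function expectedTableOutline(int $columns, int $bodyRows): array
{
    $lines = ['Table'];

    // Header row: one TR holding a TH per column.
    $lines[] = '  TR';
    for ($c = 0; $c < $columns; $c++) {
        $lines[] = '    TH';
    }

    // Body rows: one TR per row holding a TD per column.
    for ($r = 0; $r < $bodyRows; $r++) {
        $lines[] = '  TR';
        for ($c = 0; $c < $columns; $c++) {
            $lines[] = '    TD';
        }
    }

    return $lines;
}

echo implode("\n", expectedTableOutline(2, 2)), "\n";
```

For the Coffee / Croissant example this yields one Table, three TR rows, two TH cells, and four TD cells.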
$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->markdown("# Title\n\nA paragraph.\n\n- one\n- two");
$doc->save('tagged-markdown.pdf');
The heading becomes <H1>, the paragraph <P>, and the bullet list an <L> with two <LI> / <LBody> items.
When tagging is enabled the serializer adds, in one deterministic pass:
- /StructTreeRoot in the catalog: the root of the logical structure tree, holding every <P> / <Figure> / <Table> / heading / list element built while you drew the page.
- /MarkInfo <</Marked true>> in the catalog, and the optional document /Lang.
- Marked-content sequences. Each tagged run of page content is wrapped in a BDC/EMC pair carrying a marked-content identifier (MCID).
- A ParentTree linking every MCID back to its structure element, plus a /StructParents entry on each page.
- /Tabs /S on each page, so the tab order follows structure order.
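The relationship between marked-content sequences and the ParentTree can be sketched in a few lines. This is a conceptual, standalone illustration of the PDF mechanism and not phppdf's internal code: the class MarkedContentSketch, its methods, and the string labels standing in for structure-element object references are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of one page: hands out MCIDs in drawing order and
 * remembers which structure element owns each one.
 */
final class MarkedContentSketch
{
    private int $nextMcid = 0;

    /** @var array<int, string> MCID => label of the owning structure element */
    private array $parentTree = [];

    /** Wraps raw content-stream operators in a BDC/EMC pair with a fresh MCID. */
    public function wrap(string $tag, string $elementLabel, string $content): string
    {
        $mcid = $this->nextMcid++;
        $this->parentTree[$mcid] = $elementLabel;

        return sprintf("/%s <</MCID %d>> BDC\n%s\nEMC", $tag, $mcid, $content);
    }

    /** @return array<int, string> */
    public function parentTree(): array
    {
        return $this->parentTree;
    }
}

$page = new MarkedContentSketch();
echo $page->wrap('H1', 'heading-1', 'BT /F1 18 Tf (Title) Tj ET'), "\n";
echo $page->wrap('P', 'paragraph-1', 'BT /F1 12 Tf (A paragraph.) Tj ET'), "\n";
print_r($page->parentTree());
```

Here the heading receives MCID 0 and the paragraph MCID 1. In a real file the ParentTree entries are indirect references to structure elements rather than strings.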
The result is a well-formed tagged PDF: it is validated structurally and passes qpdf --check.
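One way to run the same kind of check on your own output is to call qpdf --check from PHP. The sketch below is an assumption-laden example, not part of the library: the function names qpdfAvailable and isQpdfClean are hypothetical, it relies on exec() being enabled, and it treats only exit code 0 as clean.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: true when a qpdf executable can be started from PATH. */
function qpdfAvailable(): bool
{
    $output = [];
    $exitCode = 1;
    exec('qpdf --version 2>&1', $output, $exitCode);

    return $exitCode === 0;
}

/**
 * Hypothetical helper: runs `qpdf --check` on a file.
 * Returns null when qpdf is missing (so callers can skip the check),
 * true when qpdf exits with 0, and false otherwise.
 */
function isQpdfClean(string $path): ?bool
{
    if (!qpdfAvailable()) {
        return null;
    }

    $output = [];
    $exitCode = 1;
    exec('qpdf --check ' . escapeshellarg($path) . ' 2>&1', $output, $exitCode);

    return $exitCode === 0;
}

$result = isQpdfClean('tagged.pdf');
echo $result === null ? "qpdf not found, skipped\n" : ($result ? "clean\n" : "problems reported\n");
```

Returning null when qpdf is absent mirrors the skip-when-missing behaviour a test suite would want, so a missing tool is not reported as a failure.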
Tagging ships in phases. This is Phase 1: it produces a sound structure tree for the common content types, but it is not yet PDF/UA-1 conformant. The following are deliberately deferred to a later phase:
- No alternate text yet. Figures (<Figure>) and links carry no /Alt. Because PDF/UA-1 requires alternate text on non-text content, the output is not yet UA-1 conformant.
- Links are not tagged yet. There is no <Link> structure element or OBJR linking annotations into the tree.
- Decorative content is not marked as /Artifact yet. Headers and footers, table borders and fills, and list markers are not yet flagged as artifacts.
- Page-break and image-in-table edges. A block that flows across a page break has its marked content on the starting page only, and an image placed inside a table cell is not tagged.
- PDF/A level "a". Tagging composes with PDF/A-2 and PDF/A-3 (levels b and u); tagged PDF/A (level "a") is a later phase.
These items - alternate text, link tagging, artifacts, and full PDF/UA-1 conformance - are the next steps on the accessibility roadmap.
The library ships a golden test (tests/Golden/TaggingGoldenTest.php) that renders tagged cell, figure, table, and Markdown documents and runs each through qpdf --check. The qpdf checks auto-skip when qpdf is not on PATH. Run them with:
cd build/
vendor/bin/phpunit tests/Golden/TaggingGoldenTest.php
MIT licensed.