Tagged PDF

Dragon edited this page Jun 5, 2026 · 6 revisions

Tagged PDF and Accessibility

A tagged PDF carries a logical structure tree alongside its visual content: a parallel tree of semantic elements (paragraphs, headings, figures, tables, lists) that describes what the content is and in what order it should be read. Assistive technology such as screen readers uses this tree to present a document in a meaningful reading order, and it is the basis for PDF/UA (ISO 14289) accessibility conformance.

By default phppdf emits untagged PDFs. Call $doc->enableTagging() before save() / output(). The library then builds the structure tree as you draw with the high-level API and emits all the tagged-PDF plumbing automatically.

Enabling tagging

use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;

$doc = new Document();
$doc->enableTagging('en-US');        // turn on tagging, set the document language

$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);
$page->cell(w: 80, h: 10, text: 'A tagged paragraph.');

$doc->save('tagged.pdf');

enableTagging() is fluent (returns the Document) and opt-in. When you do not call it, output is byte-for-byte identical to an untagged build: the tagging feature adds nothing to the file unless you enable it.

Document language

The single argument to enableTagging() is an optional BCP-47 language tag (for example 'en-US', 'fr', 'de-CH'). It sets the catalog /Lang, which tells a reader the default natural language of the document's text. Pass null (or omit it) to leave the language unset:

$doc->enableTagging();          // tagged, no /Lang
$doc->enableTagging('fr-FR');   // tagged, /Lang (fr-FR)

Passing an invalid tag throws a PdfException.
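
To fail early with your own error message, you can check the rough shape of a tag before passing it in. The helper below is a hypothetical sketch, not part of phppdf, and it is not the library's validation logic: it only accepts a 2-3 letter primary language subtag followed by optional hyphen-separated alphanumeric subtags, which is a simplification of BCP 47. The library may accept or reject tags differently.

```php
<?php
declare(strict_types=1);

/**
 * Rough shape check for a language tag (hypothetical helper, not a phppdf API).
 * Accepts "en", "en-US", "zh-Hant-TW"; rejects "", "en_US", "english-".
 * This is a simplified pattern and does not implement the full BCP 47 grammar.
 */
function looksLikeLanguageTag(string $tag): bool
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*\z/', $tag) === 1;
}
```

A caller could run this check on user-supplied input and only then call enableTagging(), still keeping a try/catch around the call for whatever the library itself rejects.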

What gets tagged

Once tagging is on, the high-level drawing calls are mapped to structure elements automatically. You do not place tags by hand:

| API call | Structure element(s) |
|---|---|
| cell() text | <P> (paragraph) |
| image() | <Figure> |
| table() | <Table>, <TR> rows, <TH> header cells, <TD> body cells |
| markdown() heading | <H1> .. <H6> |
| markdown() paragraph / code block | <P> |
| markdown() list | <L> with <LI> items, each wrapping an <LBody> |

Table example

use DragonOfMercy\PhpPdf\Color;
use DragonOfMercy\PhpPdf\Document;
use DragonOfMercy\PhpPdf\Font;
use DragonOfMercy\PhpPdf\Table\Column;
use DragonOfMercy\PhpPdf\Table\TableBorders;
use DragonOfMercy\PhpPdf\Table\TableStyle;
use DragonOfMercy\PhpPdf\TextAlign;

$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 11.0);

$page->table(
    columns: [
        Column::of('name', 'Article')->fill(),
        Column::of('price', 'Price')->width(30.0)->align(TextAlign::RIGHT),
    ],
    rows: [
        ['name' => 'Coffee',    'price' => '5.00'],
        ['name' => 'Croissant', 'price' => '3.60'],
    ],
    x: 20.0, y: 30.0, width: 170.0,
    style: TableStyle::default()
        ->withBorder(TableBorders::GRID)
        ->withHeader(fill: Color::gray(238), bold: true),
);

$doc->save('tagged-table.pdf');

The header row's cells are tagged <TH> and the body cells <TD>, nested under <TR> rows inside a single <Table>.

Markdown example

$doc = new Document();
$doc->enableTagging('en-US');
$page = $doc->addPage();
$page->setFont(Font::helvetica(), 12);

$page->markdown("# Title\n\nA paragraph.\n\n- one\n- two");

$doc->save('tagged-markdown.pdf');

The heading becomes <H1>, the paragraph <P>, and the bullet list an <L> with two <LI> / <LBody> items.
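
The nesting described above can be pictured with a small standalone model. This is an illustration only: the [tag, children] array shape and the outline() helper are made up for this sketch and say nothing about how phppdf stores the tree internally. Whether the library places a grouping element between the root and these elements is also not covered here.

```php
<?php
declare(strict_types=1);

// Illustrative model of the structure tree for the Markdown example.
// Each node is a [tag, children] pair; this is NOT phppdf's internal format.
$tree = ['StructTreeRoot', [
    ['H1', []],
    ['P', []],
    ['L', [
        ['LI', [['LBody', []]]],
        ['LI', [['LBody', []]]],
    ]],
]];

/**
 * Walk the tree depth-first and return one indented line per element.
 * Depth-first order is the logical reading order of the content.
 */
function outline(array $node, int $depth = 0): array
{
    [$tag, $children] = $node;
    $lines = [str_repeat('  ', $depth) . $tag];
    foreach ($children as $child) {
        $lines = array_merge($lines, outline($child, $depth + 1));
    }
    return $lines;
}

echo implode("\n", outline($tree)), "\n";
```

The printed outline lists the heading first, then the paragraph, then the list with each item's body nested two levels below it.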

What enableTagging() emits at output time

When tagging is enabled the serializer adds, in one deterministic pass:

  1. /StructTreeRoot in the catalog - the root of the logical structure tree, holding every <P> / <Figure> / <Table> / heading / list element built while you drew the page.
  2. /MarkInfo <</Marked true>> in the catalog, and the optional document /Lang.
  3. Marked-content sequences. Each tagged run of page content is wrapped in a BDC / EMC pair carrying a marked-content identifier (MCID).
  4. A ParentTree linking every MCID back to its structure element, plus a /StructParents entry on each page.
  5. /Tabs /S on each page, so the tab order follows structure order.

The result is a well-formed tagged PDF: its structure is checked by the library's tests, and qpdf --check reports no errors.
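
For a quick sanity check of the catalog- and page-level entries listed above, you can scan a saved file for the key names. The helper below is a hypothetical sketch, not a phppdf API. It assumes the catalog and page dictionaries are written as plain, uncompressed objects; if they live in compressed object streams the scan finds nothing, in which case normalizing the file first with qpdf --qdf --object-streams=disable helps. BDC / EMC operators are not checked because page content streams are usually compressed.

```php
<?php
declare(strict_types=1);

/**
 * Report which tagged-PDF key names appear as plain text in raw PDF bytes.
 * Hypothetical helper: a substring scan, not a PDF parser, so a "true"
 * only means the key name occurs somewhere in the file.
 *
 * @return array<string, bool> key name => whether it was found
 */
function findTaggingMarkers(string $pdfBytes): array
{
    $markers = ['/StructTreeRoot', '/MarkInfo', '/ParentTree', '/StructParents', '/Tabs'];
    $found = [];
    foreach ($markers as $marker) {
        $found[$marker] = str_contains($pdfBytes, $marker);
    }
    return $found;
}
```

To use it, read a saved file with file_get_contents(), confirm the read succeeded, and pass the bytes to findTaggingMarkers(). For an untagged build every value should be false.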

Current limitations and roadmap

Tagging ships in phases. This is Phase 1: it produces a sound structure tree for the common content types, but it is not yet PDF/UA-1 conformant. The following are deliberately deferred to a later phase:

  • No alternate text yet. Figures (<Figure>) and links carry no /Alt. Because PDF/UA-1 requires alternate text on non-text content, the output is not yet UA-1 conformant.
  • Links are not tagged yet. There is no <Link> structure element or OBJR linking annotations into the tree.
  • Decorative content is not marked as /Artifact yet. Headers and footers, table borders and fills, and list markers are not yet flagged as artifacts.
  • Page-break and image-in-table edges. A block that flows across a page break has its marked content on the starting page only, and an image placed inside a table cell is not tagged.
  • PDF/A level "a". Tagging composes with PDF/A-2 and PDF/A-3 (levels b and u); tagged PDF/A (level "a") is a later phase.

These items - alternate text, link tagging, artifacts, and full PDF/UA-1 conformance - are the next steps on the accessibility roadmap.

Validation

The library ships a golden test (tests/Golden/TaggingGoldenTest.php) that renders tagged cell, figure, table, and Markdown documents and runs each through qpdf --check. The qpdf checks auto-skip when qpdf is not on PATH. Run them with:

cd build/
vendor/bin/phpunit tests/Golden/TaggingGoldenTest.php
