Pure-PHP DOCX (Office Open XML) library: bidirectional HTML ↔ DOCX conversion, fluent programmatic builder, variable detection, round-trip-safe AST. No external dependencies beyond standard PHP extensions.
Read this in other languages: English · Русский · 中文 · Deutsch
- Features
- Requirements
- Installation
- Quick start
- HTML → DOCX
- Programmatic builder API
- DOCX → HTML (Reader)
- Headers, footers & watermarks
- Variable detection
- Length helpers
- AST overview
- Round-trip
- Architecture
- Development
- License
- HTML → DOCX writer — full set of typical layout elements (paragraphs/headings/tables/lists/images/links/fields), inline-style resolution, custom heading registry.
- DOCX → HTML reader — parses arbitrary Word/Pages/LibreOffice documents into a typed AST, then serialises back to HTML with inline styles. Style cascade (docDefaults → named → direct), theme colors, numbering reconstruction, vMerge/gridSpan collapse, watermark detection (VML + DrawingML).
- Fluent programmatic builder —
DocumentBuilderwith closure scopes for nested structures (tables, lists, headers). - Variable detection — MERGEFIELD, SDT content controls, configurable
text patterns (
{{x}},${x},%x%). - Multi-header/footer — default / first-page / even-pages variants
with automatic
<w:titlePg/>and<w:evenAndOddHeaders/>plumbing. - Round-trip safe — read DOCX → AST → write DOCX produces a valid document; bytes-level differences are limited to whitespace/ordering.
- PHP 8.2+ —
readonlyvalue-objects, named arguments, constructor promotion, enums. - Zero composer dependencies.
Tracked changes, comments, embedded charts, OLE objects, footnotes/endnotes, SmartArt, math equations (OMML), form fields, custom XML parts.
- PHP 8.2+
ext-zip,ext-dom,ext-mbstring
composer require dskripchenko/php-docxuse Dskripchenko\PhpDocx\Html\Converter;
use Dskripchenko\PhpDocx\Writer\Word2007Writer;
$html = <<<HTML
<h1>Invoice #42</h1>
<p>Total: <strong>500 USD</strong></p>
<table>
<tr><th>Item</th><th>Qty</th></tr>
<tr><td>Widget</td><td>2</td></tr>
</table>
<p>Page <page-number/> of <page-total/></p>
HTML;
$doc = (new Converter)->fromHtml($html);
file_put_contents('invoice.docx', (new Word2007Writer)->write($doc));use Dskripchenko\PhpDocx\Build\DocumentBuilder;
use Dskripchenko\PhpDocx\Element\ListFormat;
DocumentBuilder::new()
->watermark('DRAFT')
->header(fn ($h) => $h->paragraph('Acme Inc.'))
->footer(fn ($f) => $f->paragraph(fn ($p) => $p
->text('Page ')->pageNumber()->text(' of ')->totalPages()
))
->heading(1, 'Invoice #42')
->paragraph(fn ($p) => $p
->text('Customer: ')->bold('Acme Co.')
->lineBreak()
->text('ID: ')->mergeField('CustomerID')
)
->table(fn ($t) => $t
->columns(fn ($c) => $c->widthCm(8), fn ($c) => $c->widthCm(3))
->headerRow(['Item', 'Qty'])
->row(['Widget', '2'])
)
->orderedList(fn ($l) => $l
->format(ListFormat::LowerLetter)
->item('Net 30 terms')
->item('Free shipping')
)
->toFile('invoice.docx');use Dskripchenko\PhpDocx\Reader\DocxReader;
use Dskripchenko\PhpDocx\Reader\DocxPackageReader;
use Dskripchenko\PhpDocx\Reader\VariableDetector;
use Dskripchenko\PhpDocx\Html\Serializer;
$bytes = file_get_contents('input.docx');
$document = (new DocxReader)->read($bytes);
$pkg = (new DocxPackageReader)->read($bytes);
$variables = (new VariableDetector)->detect($pkg);
$imported = (new Serializer)->serialize($document, $variables);
echo $imported->bodyHtml;
echo $imported->headerHtml;
echo $imported->footerHtml;
echo $imported->watermarkText;
$imported->pageSettings;
$imported->variables;
$imported->media;Input HTML must use inline styles only (no <style> blocks). Use a
CSS-inliner upstream if needed.
| Category | HTML tags |
|---|---|
| Text blocks | <p>, <h1..h6>, <div>, <pre>, <blockquote> |
| Inline marks | <strong>/<b>, <em>/<i>, <u>, <s>/<del>, <sup>, <sub>, <mark> |
| Code/teletype | <code>, <kbd>, <samp>, <var>, <cite>, <dfn>, <q>, <small> |
| Links | <a href> external, <a href="#anchor"> internal, <a id> bookmarks |
| Images | <img src="data:image/...;base64,..."> |
| Tables | <table>, <thead>/<tbody>, <tr>, <th>/<td>, <colgroup>/<col>, <caption>, colspan, rowspan |
| Lists | <ul>, <ol type="a/A/i/I" start="N">, <li value="N">, <dl>/<dt>/<dd> |
| Custom tags | <page-number/>, <page-total/>, <current-date format="...">, <page-break> |
| Layout | <hr>, <br>, <figure>/<figcaption> |
The converter understands style="…" properties:
- Run-level:
font-family,font-size,font-weight,font-style,text-decoration,color,background-color - Paragraph-level:
text-align,margin,text-indent,line-height,border,padding - Table-level:
width,border,border-collapse - Cell-level:
width,padding,border,vertical-align,background-color
<p>Page <page-number/> of <page-total/></p>
<p>Generated on <current-date format="dd.MM.yyyy"/></p>These become OOXML field codes (<w:fldSimple w:instr="PAGE">).
use Dskripchenko\PhpDocx\Style\StyleRegistry;
use Dskripchenko\PhpDocx\Style\RunStyle;
use Dskripchenko\PhpDocx\Style\ParagraphStyle;
use Dskripchenko\PhpDocx\Style\Alignment;
$styles = (new StyleRegistry)
->heading(1, new RunStyle(sizeHalfPoints: 44, bold: true), new ParagraphStyle(alignment: Alignment::Center))
->heading(2, new RunStyle(sizeHalfPoints: 28, bold: true));
$writer = new Word2007Writer($styles);The Build namespace provides a fluent API for assembling DOCX
documents block by block, finalising to the same immutable AST that the
HTML pipeline produces.
Entry point. Accumulates body, header/footer, watermark, page setup.
use Dskripchenko\PhpDocx\Build\DocumentBuilder;
use Dskripchenko\PhpDocx\Style\PageSetup;
use Dskripchenko\PhpDocx\Style\PaperSize;
use Dskripchenko\PhpDocx\Style\Orientation;
$doc = DocumentBuilder::new()
->pageSetup(new PageSetup(
paperSize: PaperSize::A4,
orientation: Orientation::Portrait,
))
->watermark('CONFIDENTIAL')
->heading(1, 'Report')
->paragraph('Body')
->build(); // → Document AST
$bytes = DocumentBuilder::new()->paragraph('Hi')->toBytes();
$count = DocumentBuilder::new()->paragraph('Hi')->toFile('out.docx');Inside ->paragraph(fn ($p) => …):
->paragraph(fn ($p) => $p
->text('plain ')
->bold('bold ')
->italic('italic ')
->underline('under ')
->strike('strike ')
->sup('super')->text('script ')
->sub('sub')->text('script ')
->styled('red', fn ($s) => $s->color('ff0000')->bold())
->lineBreak()
->link('https://example.com', 'website')
->internalLink('section1', 'go to section 1')
->bookmark('anchor1', 'anchor target')
->pageNumber()
->totalPages()
->currentDate('yyyy-MM-dd')
->mergeField('CustomerName')
->image($img)
->imageFromFile('/path/to/logo.png', widthPx: 150, altText: 'Logo')
)Paragraph-level styling:
->paragraph(fn ($p) => $p
->alignCenter() // or alignRight()/alignJustify()
->indentMm(left: 20, firstLine: 10)
->spacingPt(before: 6, after: 12)
->text('Indented & spaced')
)use Dskripchenko\PhpDocx\Build\{TableBuilder, TableRowBuilder, TableCellBuilder, ColumnBuilder};
->table(fn (TableBuilder $t) => $t
->caption('Sales 2026')
->column(fn (ColumnBuilder $c) => $c->widthCm(6))
->column(fn (ColumnBuilder $c) => $c->widthCm(3))
->widthPercent(100)
->alignCenter()
->cellMarginsMm(2)
->headerRow(['Item', 'Price'])
->row(['Apple', '10 USD'])
->row(fn (TableRowBuilder $r) => $r
->cell('Banana')
->cell(fn (TableCellBuilder $c) => $c
->backgroundColor('ffeb3b')
->valignCenter()
->paragraph(fn ($p) => $p->bold('20 USD'))
)
)
)Spans and merges:
->row(fn ($r) => $r
->cell(fn ($c) => $c->gridSpan(2)->paragraph('Wide header'))
)
->row(fn ($r) => $r
->cell(fn ($c) => $c->rowSpan(2)->paragraph('Tall'))
->cell('right')
)use Dskripchenko\PhpDocx\Build\ListBuilder;
use Dskripchenko\PhpDocx\Element\ListFormat;
->bulletList(fn (ListBuilder $l) => $l
->item('First')
->item('Second', fn ($n) => $n
->item('Nested A')
->item('Nested B')
)
)
->orderedList(fn (ListBuilder $l) => $l
->format(ListFormat::LowerLetter) // a, b, c
->startAt(3)
->item('item starts at "c"')
)Used inside ->styled(text, fn (RunStyleBuilder) => …) or standalone via
RunStyleBuilder::new()->…->build().
RunStyleBuilder::new()
->bold()
->italic()
->underline()
->strike()
->color('ff0000')
->backgroundColor('eeeeee')
->highlight('yellow')
->fontFamily('Arial')
->fontSizePt(14.5)
->build();Convert common units to OOXML twips (1 twip = 1/20 pt). Used wherever a twip int is expected.
use Dskripchenko\PhpDocx\Build\Length;
Length::pt(12); // 240
Length::mm(20); // 1134
Length::cm(2.5); // 1417
Length::inch(0.5); // 720
Length::px(100); // 1500 (CSS px @ 96 DPI)Most builders expose unit-aware shortcuts:
- TableBuilder:
widthPt/widthMm/widthCm/widthInches,cellMarginsMm/cellMarginsPt - TableCellBuilder:
widthPt/Mm/Cm/Inches,paddingMm/Pt/Cm/Inches - ColumnBuilder:
widthPt/Mm/Cm/Inches/Px - ParagraphBuilder:
indentMm/Cm/Pt/Inches,spacingPt/Mm - RunStyleBuilder:
fontSizePt
use Dskripchenko\PhpDocx\Reader\DocxReader;
$document = (new DocxReader)->read(file_get_contents('input.docx'));
// → Document (AST)This runs the full pipeline: package unpack → styles resolve → body/header/footer parsing → vMerge/list reconstruction → image extraction → watermark detection → page setup.
If you need the raw OOXML parts:
use Dskripchenko\PhpDocx\Reader\DocxPackageReader;
$pkg = (new DocxPackageReader)->read($bytes);
$pkg->documentXml; // \DOMDocument
$pkg->stylesXml; // ?\DOMDocument
$pkg->numberingXml; // ?\DOMDocument
$pkg->themeXml; // ?\DOMDocument
$pkg->settingsXml; // ?\DOMDocument
$pkg->headers; // array<path, \DOMDocument>
$pkg->footers; // array<path, \DOMDocument>
$pkg->media; // array<path, bytes>
$pkg->documentRelationships(); // list<Relationship>
$pkg->resolveDocumentRel('rId7'); // Relationshipuse Dskripchenko\PhpDocx\Html\Serializer;
$imported = (new Serializer)->serialize($document, $variables);
// ImportedDocument:
$imported->bodyHtml; // string
$imported->headerHtml; // ?string
$imported->footerHtml; // ?string
$imported->watermarkText; // ?string
$imported->pageSettings; // PageSetup
$imported->variables; // list<DetectedVariable>
$imported->media; // array<filename, bytes>HTML output uses inline styles only — re-loadable into the same library
via Html\Converter::fromHtml($imported->bodyHtml).
Three header/footer types are supported per section: default, first
(title page), even (even pages). Word automatically renders the right
one based on page number.
DocumentBuilder::new()
->header(fn ($h) => $h->paragraph('Default header'))
->firstHeader(fn ($h) => $h->paragraph('Cover page'))
->evenHeader(fn ($h) => $h->paragraph(fn ($p) => $p
->text('Page ')->pageNumber()
))
->footer(fn ($f) => $f->paragraph('© 2026 Acme'))
->firstFooter(fn ($f) => $f->paragraph('Confidential'))
->evenFooter(fn ($f) => $f->paragraph('Even footer'))
->paragraph('Body')
->toFile('with-headers.docx');The writer automatically:
- emits
<w:titlePg/>insectPrwhen first-page header/footer is set - emits
word/settings.xmlwith<w:evenAndOddHeaders/>when even header/footer is set
DocumentBuilder::new()
->watermark('DRAFT')
->paragraph('Body')
->toFile('with-watermark.docx');Renders as a 45°-rotated VML text shape on every page.
Scans an imported DOCX for three kinds of variables:
- MERGEFIELD — Word mail-merge native, both simple
<w:fldSimple>and complex<w:fldChar>form. - SDT content controls —
<w:sdt>with<w:tag w:val="...">. - Text patterns — configurable regexes (defaults:
{{name}},${name},%name%).
use Dskripchenko\PhpDocx\Reader\VariableDetector;
$pkg = (new DocxPackageReader)->read($bytes);
$detector = new VariableDetector; // defaults
// Or with custom regexes:
$detector = new VariableDetector(['/\[\[(\w+)\]\]/']);
$variables = $detector->detect($pkg);
foreach ($variables as $v) {
echo "{$v->name} ({$v->source->value})";
echo " placeholder='{$v->placeholder}'";
echo " sample='{$v->sampleValue}'\n";
}Detection runs across body + all headers + all footers. Results are
deduplicated by (source, name).
See Length helpers above. Conversion table:
| Unit | Twips | Pt | Notes |
|---|---|---|---|
| 1 twip | 1 | 0.05 | OOXML native |
| 1 pt | 20 | 1 | typography |
| 1 mm | ~57 | 2.83 | metric |
| 1 cm | ~567 | 28.35 | metric |
| 1 inch | 1440 | 72 | imperial |
| 1 px | 15 | 0.75 | CSS @ 96 DPI |
All elements live under Dskripchenko\PhpDocx\Element namespace.
| Element | Type | Notes |
|---|---|---|
Document |
root | { section: Section, watermarkText: ?string } |
Section |
container | { body, header, footer, pageSetup, firstHeader, firstFooter, evenHeader, evenFooter } |
Paragraph |
BlockElement | { children: InlineElement[], style: ParagraphStyle, headingLevel: ?int } |
Run |
InlineElement | { text: string, style: RunStyle } |
Hyperlink |
InlineElement | { href: ?string, anchor: ?string, children: InlineElement[] } |
Bookmark |
InlineElement | { name: string, children: InlineElement[] } |
Image |
both | { binary, format, widthEmu, heightEmu, altText } |
Field |
InlineElement | { instruction: string, style: RunStyle } |
LineBreak, PageBreak, HorizontalRule |
both | marker elements |
Table |
BlockElement | { rows: TableRow[], style, caption, gridColumnsTwips } |
TableRow |
element | { cells: TableCell[], isHeader, heightTwips } |
TableCell |
element | { children: BlockElement[], style: CellStyle } |
ListNode |
BlockElement | { items: ListItem[], ordered, format, startAt } |
ListItem |
element | { children: InlineElement[], nestedList: ?ListNode } |
Styles live under Dskripchenko\PhpDocx\Style:
RunStyle— font, weight, italic, color, size, highlight, …ParagraphStyle— alignment, indents, spacing, bordersCellStyle— width, padding, borders, valign, gridSpan, rowSpanTableStyle— width, borders, alignment, cell margins, layoutPageSetup,PaperSize,Orientation,Alignment,VerticalAlign,BorderStyle,Border,BorderSet
$bytes1 = file_get_contents('original.docx');
$doc = (new DocxReader)->read($bytes1);
$bytes2 = (new Word2007Writer)->write($doc);
file_put_contents('roundtrip.docx', $bytes2);The library targets semantic round-trip safety, not byte equality — content, structure and styling survive, but XML ordering and whitespace may differ.
In-scope round-trip features:
- Paragraphs/headings with all run formatting
- Tables with
vMerge/gridSpanreconstruction - Lists (bullet/decimal/letter/roman) with arbitrary nesting
- Images with EMU sizes and alt text
- Hyperlinks (external + internal anchors) and bookmarks
- Headers/footers (default/first/even) and watermarks
- Field codes (PAGE, NUMPAGES, DATE, MERGEFIELD)
- Page setup (size, orientation, margins)
Out-of-scope features are silently dropped (footnotes, comments, equations, etc.).
HTML (inline styles)
│
▼ Html\Converter
Document (AST) ◀──── DocumentBuilder (programmatic)
│
▼ Writer\Word2007Writer
DOCX bytes
▲
│ Reader\DocxReader
Document (AST)
│
▼ Html\Serializer
ImportedDocument (bodyHtml, headerHtml, footerHtml, variables, media)
The same Document AST is shared by HTML conversion, programmatic
construction and DOCX reading — every entry/exit point operates on
typed value-objects.
composer install
composer test # phpunit suite (~340 tests)
composer stan # phpstan level 8MIT — see LICENSE.