Reading Existing PDFs
Dragon edited this page Jun 10, 2026
PdfReader opens an existing (non-encrypted) PDF and exposes its object graph and page tree. It is a low-level API: it does not render or edit anything by itself, but it is the foundation the upcoming template-import and modify-existing-PDF features build on.
use DragonOfMercy\PhpPdf\Reader\PdfReader;
$reader = PdfReader::fromFile('invoice.pdf'); // or PdfReader::fromBytes($bytes)
$reader->version(); // "1.7" (catalog /Version overrides the header)
$reader->pageCount(); // 3
$page = $reader->page(1); // 1-based, ReadPage

- Classic cross-reference tables and cross-reference streams (PDF 1.5+), including PNG/TIFF predictor encodings.
- Incremental updates: /Prev revision chains are walked and merged (the newest revision wins), including hybrid-reference files (/XRefStm).
- Object streams (/ObjStm): compressed objects are extracted transparently.
- Stream filters needed for document structure: FlateDecode, ASCIIHexDecode, ASCII85Decode, RunLengthDecode (with /DecodeParms). Image filters (DCT, JPX, CCITT, JBIG2) are not decoded - image streams stay opaque.
- Real-world quirks: junk before the %PDF- header, slightly wrong xref offsets (a recovery scan looks around the recorded position), a wrong stream /Length (fallback scan for endstream), and a missing %%EOF.
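To give a feel for what one of the listed structural filters involves, here is a minimal, self-contained sketch of RunLengthDecode as the PDF specification defines it. The function name runLengthDecode is hypothetical and the code is not taken from phppdf; the library's own decoder may be organized differently.

```php
<?php

/**
 * Decode RunLengthDecode data as defined by the PDF specification.
 * Hypothetical helper for illustration; not part of phppdf's API.
 */
function runLengthDecode(string $data): string
{
    $out = '';
    $i = 0;
    $n = strlen($data);

    while ($i < $n) {
        $length = ord($data[$i++]);

        if ($length === 128) {
            break; // end-of-data marker
        }

        if ($length < 128) {
            // Literal run: copy the next $length + 1 bytes unchanged.
            $count = $length + 1;
            $out .= substr($data, $i, $count);
            $i += $count;
        } else {
            // Repeated run: the next byte is repeated 257 - $length times.
            if ($i >= $n) {
                break; // truncated input: stop instead of reading past the end
            }
            $out .= str_repeat($data[$i++], 257 - $length);
        }
    }

    return $out;
}
```

For example, the bytes 0x02 "abc" 0xFE "d" 0x80 describe a three-byte literal run followed by "d" repeated three times, so the decoded result is "abcddd".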
page(int $n) returns a ReadPage with the page's inherited attributes already resolved (PDF inheritance through the /Pages tree):
$page = $reader->page(1);
$page->mediaBox; // [llx, lly, urx, ury] in points, corner-normalized
$page->cropBox; // same shape, or null
$page->box(); // CropBox when present, else MediaBox
$page->rotate; // 0 / 90 / 180 / 270
$page->resources; // the resolved /Resources dictionary, or null
$page->contents; // list of references to the page's content stream(s)
$page->dict; // the raw page dictionary

$catalog = $reader->catalog(); // the document catalog dictionary
$trailer = $reader->trailer(); // merged trailer across revisions
$object = $reader->object(12); // payload of object 12 (lazy, cached)
$value = $reader->resolve($maybeReference); // follow reference chains
$bytes = $reader->decodeStream($stream); // apply a stream's /Filter chain

Objects are returned as the library's internal PDF object model (dictionaries, arrays, names, numbers, strings, streams). Resolution is lazy and cached; circular references and over-deep chains throw a PdfParseException.
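The guard against circular and over-deep reference chains can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: the Ref class, the resolveChain function, the depth limit of 32, and the use of RuntimeException in place of the library's PdfParseException. It also leaves out caching. It requires PHP 8.1 or later.

```php
<?php

/** Hypothetical stand-in for an indirect reference such as "12 0 R". */
final class Ref
{
    public function __construct(public readonly int $objectNumber) {}
}

/**
 * Follow a chain of references until a direct value is reached.
 *
 * @param array<int, mixed> $objects object number => payload (may itself be a Ref)
 */
function resolveChain(mixed $value, array $objects, int $maxDepth = 32): mixed
{
    $seen = [];

    while ($value instanceof Ref) {
        $n = $value->objectNumber;

        if (isset($seen[$n])) {
            throw new RuntimeException("Circular reference at object $n");
        }
        if (count($seen) >= $maxDepth) {
            throw new RuntimeException("Reference chain deeper than $maxDepth");
        }

        $seen[$n] = true;
        // The PDF specification treats a reference to an undefined object as null.
        $value = $objects[$n] ?? null;
    }

    return $value;
}
```

With objects 1 -> 2 -> 3 -> 'value', resolveChain(new Ref(1), $objects) returns 'value', while a pair of objects that point at each other raises the exception instead of looping forever.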
- Encrypted PDFs are rejected at fromFile() / fromBytes() with a clear PdfException. Decrypt the file first (e.g. qpdf --decrypt).
- Malformed input throws PdfParseException with the byte offset and what was expected.
- LZWDecode and full reconstruction of severely broken files (rebuilding the xref by scanning) are not supported yet.
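One possible way to script the decrypt-first step is to build the qpdf command line from PHP. The helper below is a hypothetical sketch, not part of phppdf: the function name and parameters are assumptions, and it only assembles the command string using qpdf's --decrypt and --password options.

```php
<?php

/**
 * Build a shell command that writes a decrypted copy of a PDF using qpdf.
 * Hypothetical helper; the name and parameters are illustrative only.
 */
function qpdfDecryptCommand(string $input, string $output, string $password = ''): string
{
    $parts = ['qpdf', '--decrypt'];

    if ($password !== '') {
        // Only needed when the file cannot be opened with an empty password.
        $parts[] = '--password=' . escapeshellarg($password);
    }

    // Quote both paths so spaces and shell metacharacters are passed through safely.
    $parts[] = escapeshellarg($input);
    $parts[] = escapeshellarg($output);

    return implode(' ', $parts);
}
```

The resulting string can be passed to exec(); after checking the exit code, open the decrypted output file with PdfReader::fromFile().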
MIT licensed. Source on GitHub - if phppdf helps you, you can buy me a coffee.