Reading Existing PDFs
PdfReader opens an existing PDF (including encrypted ones) and exposes its object graph and page tree. It is a low-level API: it does not render or edit anything by itself, but it is the foundation for template import (Document::importPdf() / Page::template()) and the higher-level modify workflow (PdfEditor::open()).
use DragonOfMercy\PhpPdf\Reader\PdfReader;
$reader = PdfReader::fromFile('invoice.pdf'); // non-encrypted
$reader = PdfReader::fromFile('protected.pdf', 'pass'); // user or owner password
$reader = PdfReader::fromBytes($bytes); // from a string
$reader = PdfReader::fromBytes($bytes, 'pass'); // from bytes, with password
$reader->version(); // "1.7" (catalog /Version overrides the header)
$reader->pageCount(); // 3
$reader->isEncrypted(); // true when the source was encrypted
$page = $reader->page(1); // 1-based, ReadPage

- Classic cross-reference tables and cross-reference streams (PDF 1.5+), including PNG/TIFF predictor encodings.
- Incremental updates: /Prev revision chains are walked and merged (the newest revision wins), including hybrid-reference files (/XRefStm).
- Object streams (/ObjStm): compressed objects are extracted transparently.
- Stream filters needed for document structure: FlateDecode, ASCIIHexDecode, ASCII85Decode, RunLengthDecode (with /DecodeParms). Image filters (DCT, JPX, CCITT, JBIG2) are not decoded - image streams stay opaque.
- Real-world quirks: junk before the %PDF- header, slightly wrong xref offsets (a recovery scan looks around the recorded position), a wrong stream /Length (fallback scan for endstream), and a missing %%EOF.
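
To make the last quirk in the list above concrete, here is a minimal sketch of what a fallback scan for endstream can look like. It is not phppdf's actual implementation: the function name readStreamData() and its parameters are hypothetical, and it assumes the caller already knows the byte offset where the stream data begins.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative helper (hypothetical, not part of phppdf): return the raw
 * bytes of a stream whose data begins at $start. The declared /Length is
 * trusted only if the "endstream" keyword actually follows the data.
 */
function readStreamData(string $bytes, int $start, int $declaredLength): ?string
{
    $end = $start + $declaredLength;
    if ($declaredLength >= 0 && $end <= strlen($bytes)) {
        // An optional end-of-line may sit between the data and the keyword.
        $tail = ltrim(substr($bytes, $end, 11), "\r\n");
        if (strncmp($tail, 'endstream', 9) === 0) {
            return substr($bytes, $start, $declaredLength);
        }
    }

    // Fallback: /Length is wrong, so scan forward for the keyword instead.
    $pos = strpos($bytes, 'endstream', $start);
    if ($pos === false) {
        return null; // no terminator at all: give up
    }

    // Drop the single end-of-line marker that precedes the keyword.
    $data = substr($bytes, $start, $pos - $start);
    return preg_replace('/(\r\n|\r|\n)$/D', '', $data);
}
```

The scan is only a fallback because binary stream data may itself contain the bytes "endstream", in which case scanning would cut the stream short. That is why the sketch checks the declared /Length first.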
page(int $n) returns a ReadPage with the page's inherited attributes already resolved (PDF inheritance through the /Pages tree):
$page = $reader->page(1);
$page->mediaBox; // [llx, lly, urx, ury] in points, corner-normalized
$page->cropBox; // same shape, or null
$page->box(); // CropBox when present, else MediaBox
$page->rotate; // 0 / 90 / 180 / 270
$page->resources; // the resolved /Resources dictionary, or null
$page->contents; // list of references to the page's content stream(s)
$page->dict; // the raw page dictionary

$catalog = $reader->catalog(); // the document catalog dictionary
$trailer = $reader->trailer(); // merged trailer across revisions
$object = $reader->object(12); // payload of object 12 (lazy, cached)
$value = $reader->resolve($maybeReference); // follow reference chains
$bytes = $reader->decodeStream($stream); // apply a stream's /Filter chain

Objects are returned as the library's internal PDF object model (dictionaries, arrays, names, numbers, strings, streams). Resolution is lazy and cached; circular references and over-deep chains throw a PdfParseException.
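
The guard against circular and over-deep reference chains can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: the Ref class, the resolveChain() function, the plain array used as an object table, and the depth limit of 32 are hypothetical and do not mirror the library's internal object model, and it throws a generic RuntimeException rather than PdfParseException.

```php
<?php
declare(strict_types=1);

/** Toy stand-in for an indirect reference such as "12 0 R" (hypothetical type). */
final class Ref
{
    /** @var int */
    public $num;

    public function __construct(int $num)
    {
        $this->num = $num;
    }
}

/**
 * Follow a chain of references through $objects (object number => value)
 * until a non-reference value is reached.
 *
 * @param mixed             $value
 * @param array<int, mixed> $objects
 * @return mixed
 */
function resolveChain($value, array $objects, int $maxDepth = 32)
{
    $seen = []; // object numbers already visited on this chain
    while ($value instanceof Ref) {
        if (isset($seen[$value->num])) {
            throw new RuntimeException("Circular reference at object {$value->num}");
        }
        if (count($seen) >= $maxDepth) {
            throw new RuntimeException("Reference chain deeper than {$maxDepth}");
        }
        $seen[$value->num] = true;
        // A reference to an undefined object resolves to null.
        $value = $objects[$value->num] ?? null;
    }
    return $value;
}
```

Tracking visited object numbers catches cycles as soon as a chain revisits an object, while the separate depth limit bounds the work on long but acyclic chains.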
PdfReader transparently decrypts files protected with the Standard security handler: RC4 40-bit, RC4 128-bit, AES-128, and AES-256.
The $password argument is optional:
- Omit it (or pass null) for PDFs with an empty user password - permissions-only encryption opens without a password.
- Supply a password to open a user-password or owner-password protected file. The library tries it first as the user password, then as the owner password.
- A wrong or missing password throws a PdfException with a clear message.
// Permissions-only encryption - no password needed
$reader = PdfReader::fromFile('report.pdf');
// User or owner password
$reader = PdfReader::fromFile('report.pdf', 'secret');
// Check whether the source was encrypted
if ($reader->isEncrypted()) {
// ...
}

Editing encrypted PDFs is not yet supported. PdfEditor::open() / PdfEditor::fromBytes() reject encrypted sources with a clear message. Use PdfReader (or Document::importPdf()) to read the content, then build a new document if edits are needed.
- Malformed input throws PdfParseException with the byte offset and what was expected.
- LZWDecode and full reconstruction of severely broken files (rebuilding the xref by scanning) are not supported yet.
MIT licensed. Source on GitHub - if phppdf helps you, you can buy me a coffee.