Reading Existing PDFs

Dragon edited this page Jun 12, 2026 · 4 revisions

PdfReader opens an existing PDF (including encrypted ones) and exposes its object graph and page tree. It is a low-level API: it does not render or edit anything by itself, but it is the foundation for template import (Document::importPdf() / Page::template()) and the higher-level modify workflow (PdfEditor::open()).

use DragonOfMercy\PhpPdf\Reader\PdfReader;

$reader = PdfReader::fromFile('invoice.pdf');            // non-encrypted
$reader = PdfReader::fromFile('protected.pdf', 'pass');  // user or owner password
$reader = PdfReader::fromBytes($bytes);                  // from a string
$reader = PdfReader::fromBytes($bytes, 'pass');          // from bytes, with password

$reader->version();          // "1.7" (catalog /Version overrides the header)
$reader->pageCount();        // 3
$reader->isEncrypted();      // true when the source was encrypted
$page = $reader->page(1);    // 1-based, ReadPage

What it understands

  • Classic cross-reference tables and cross-reference streams (PDF 1.5+), including PNG/TIFF predictor encodings.
  • Incremental updates: /Prev revision chains are walked and merged (the newest revision wins), including hybrid-reference files (/XRefStm).
  • Object streams (/ObjStm): compressed objects are extracted transparently.
  • Stream filters needed for document structure: FlateDecode, ASCIIHexDecode, ASCII85Decode, RunLengthDecode (with /DecodeParms). Image filters (DCTDecode, JPXDecode, CCITTFaxDecode, JBIG2Decode) are not decoded; image streams stay opaque.
  • Real-world quirks: junk before the %PDF- header, slightly wrong xref offsets (a recovery scan looks around the recorded position), a wrong stream /Length (fallback scan for endstream), and a missing %%EOF.
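
The PNG predictor handling mentioned for cross-reference streams can be illustrated with a small standalone sketch. This is not the library's code: undoPngUpPredictor() is a hypothetical helper that reverses PNG row filtering for filter types 0 (None) and 2 (Up) only, and it assumes /Columns counts one-byte samples. Up is the type typically written for xref streams with /Predictor 12.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): undo PNG row filtering
 * for filter types 0 (None) and 2 (Up).
 * $columns is the row width in bytes, without the leading filter-type byte.
 */
function undoPngUpPredictor(string $data, int $columns): string
{
    if ($data === '') {
        return '';
    }
    $rowLength = $columns + 1; // one filter-type byte precedes each row
    if ($columns < 1 || strlen($data) % $rowLength !== 0) {
        throw new InvalidArgumentException('Data length is not a whole number of rows');
    }

    $previous = str_repeat("\x00", $columns); // the row above the first row is all zeros
    $output = '';

    foreach (str_split($data, $rowLength) as $row) {
        $filterType = ord($row[0]);
        $current = substr($row, 1);

        if ($filterType === 2) {
            // Up: each byte stores the difference to the byte directly above it.
            for ($i = 0; $i < $columns; $i++) {
                $current[$i] = chr((ord($current[$i]) + ord($previous[$i])) & 0xFF);
            }
        } elseif ($filterType !== 0) {
            throw new RuntimeException("Filter type $filterType is not handled in this sketch");
        }

        $output .= $current;
        $previous = $current;
    }

    return $output;
}

// Three 4-byte xref rows, each stored as a delta against the row above.
$encoded = "\x02\x01\x00\x10\x00" . "\x02\x00\x00\x05\x00" . "\x02\x00\x01\x00\x00";
echo bin2hex(undoPngUpPredictor($encoded, 4)), "\n"; // prints 010010000100150001011500
```

Sub, Average, and Paeth rows (types 1, 3, 4) are left out to keep the sketch short; a complete decoder would need them as well.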

Pages

page(int $n) returns a ReadPage with the page's inherited attributes already resolved (PDF inheritance through the /Pages tree):

$page = $reader->page(1);
$page->mediaBox;    // [llx, lly, urx, ury] in points, corner-normalized
$page->cropBox;     // same shape, or null
$page->box();       // CropBox when present, else MediaBox
$page->rotate;      // 0 / 90 / 180 / 270
$page->resources;   // the resolved /Resources dictionary, or null
$page->contents;    // list of references to the page's content stream(s)
$page->dict;        // the raw page dictionary
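
As an illustration of what "corner-normalized" means and how a box combines with the rotation value, here is a minimal standalone sketch. normalizeBox() and displayedSize() are hypothetical helpers written for this example, not library functions; they operate on plain arrays shaped like the mediaBox and cropBox values above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return [llx, lly, urx, ury] regardless of which
 * two opposite corners the PDF happened to store.
 */
function normalizeBox(array $box): array
{
    [$x1, $y1, $x2, $y2] = $box;
    return [min($x1, $x2), min($y1, $y2), max($x1, $x2), max($y1, $y2)];
}

/** Hypothetical helper: width and height in points as displayed, after /Rotate. */
function displayedSize(array $box, int $rotate): array
{
    [$llx, $lly, $urx, $ury] = normalizeBox($box);
    $width = $urx - $llx;
    $height = $ury - $lly;
    // Quarter turns (90, 270) swap the two dimensions.
    return ($rotate % 180 === 0) ? [$width, $height] : [$height, $width];
}

$box = normalizeBox([612, 792, 0, 0]); // corners stored upper-right first
[$w, $h] = displayedSize($box, 90);
printf("%s x %s pt\n", $w, $h);        // prints "792 x 612 pt"
```

With a real ReadPage, the array passed in would presumably be the result of $page->box() and the angle $page->rotate.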

Raw object access

$catalog = $reader->catalog();                 // the document catalog dictionary
$trailer = $reader->trailer();                 // merged trailer across revisions
$object  = $reader->object(12);                // payload of object 12 (lazy, cached)
$value   = $reader->resolve($maybeReference);  // follow reference chains
$bytes   = $reader->decodeStream($stream);     // apply a stream's /Filter chain
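
To make "apply a stream's /Filter chain" concrete, the following standalone sketch decodes data through filters in the order they are listed, which is the order the PDF format prescribes for decoding. It is an assumption-laden illustration rather than the library's implementation: applyFilterChain(), asciiHexDecode(), and inflate() are hypothetical names, only two filters are covered, and PHP 8.0+ with the zlib extension is assumed.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: decode ASCIIHexDecode data (whitespace ignored, '>' ends the data). */
function asciiHexDecode(string $data): string
{
    $end = strpos($data, '>');
    if ($end !== false) {
        $data = substr($data, 0, $end);
    }
    $hex = (string) preg_replace('/\s+/', '', $data);
    if (strlen($hex) % 2 === 1) {
        $hex .= '0'; // an odd trailing digit is read as if followed by 0
    }
    $bytes = @hex2bin($hex);
    if ($bytes === false) {
        throw new RuntimeException('Invalid hex data');
    }
    return $bytes;
}

/** Hypothetical helper: FlateDecode payloads are zlib streams. */
function inflate(string $data): string
{
    $out = @gzuncompress($data);
    if ($out === false) {
        throw new RuntimeException('Invalid zlib data');
    }
    return $out;
}

/** Apply each named filter in turn, first to last. */
function applyFilterChain(string $data, array $filters): string
{
    foreach ($filters as $filter) {
        $data = match ($filter) {
            'ASCIIHexDecode' => asciiHexDecode($data),
            'FlateDecode'    => inflate($data),
            default          => throw new RuntimeException("Unsupported filter: $filter"),
        };
    }
    return $data;
}

// Build a doubly encoded payload, then decode it again.
$encoded = bin2hex(gzcompress('BT /F1 12 Tf ET')) . '>';
echo applyFilterChain($encoded, ['ASCIIHexDecode', 'FlateDecode']), "\n"; // prints "BT /F1 12 Tf ET"
```
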

Objects are returned as the library's internal PDF object model (dictionaries, arrays, names, numbers, strings, streams). Resolution is lazy and cached; circular references and excessively deep reference chains throw a PdfParseException.
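
The guard against circular and overly deep reference chains can be pictured with a standalone sketch. Everything here is hypothetical and chosen for illustration: references are modelled as ['ref' => n] arrays, the object table is a plain PHP array, the depth limit of 32 is arbitrary, and a RuntimeException stands in for the library's own exception. PHP 8.0+ is assumed for the mixed type.

```php
<?php
declare(strict_types=1);

const MAX_REFERENCE_DEPTH = 32; // arbitrary limit chosen for this sketch

/**
 * Hypothetical helper: follow ['ref' => n] placeholders through $objects
 * until a non-reference value is found.
 */
function resolveValue(mixed $value, array $objects): mixed
{
    $seen = [];
    $depth = 0;

    while (is_array($value) && array_key_exists('ref', $value)) {
        $number = $value['ref'];
        if (isset($seen[$number])) {
            throw new RuntimeException("Circular reference at object $number");
        }
        if (++$depth > MAX_REFERENCE_DEPTH) {
            throw new RuntimeException('Reference chain is too deep');
        }
        $seen[$number] = true;
        // In PDF, a reference to an undefined object reads as the null object.
        $value = $objects[$number] ?? null;
    }

    return $value;
}

$objects = [
    1 => ['ref' => 2],
    2 => ['ref' => 3],
    3 => 'Catalog',
    4 => ['ref' => 5],
    5 => ['ref' => 4], // 4 and 5 point at each other
];

echo resolveValue(['ref' => 1], $objects), "\n"; // prints "Catalog"
```

Tracking visited object numbers catches true cycles immediately, while the depth counter bounds long but acyclic chains.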

Encrypted PDFs

PdfReader transparently decrypts files protected with the Standard security handler: RC4 40-bit, RC4 128-bit, AES-128, and AES-256.

The $password argument is optional:

  • Omit it (or pass null) for PDFs with an empty user password; such permissions-only encryption opens without a password.
  • Supply a password to open a user-password or owner-password protected file. The library tries it first as the user password, then as the owner password.
  • A wrong or missing password throws a PdfException with a clear message.

// Permissions-only encryption: no password needed
$reader = PdfReader::fromFile('report.pdf');

// User or owner password
$reader = PdfReader::fromFile('report.pdf', 'secret');

// Check whether the source was encrypted
if ($reader->isEncrypted()) {
    // ...
}

Editing encrypted PDFs is not yet supported. PdfEditor::open() / PdfEditor::fromBytes() reject encrypted sources with a clear message. Use PdfReader (or Document::importPdf()) to read the content, then build a new document if edits are needed.

Limits

  • Malformed input throws PdfParseException with the byte offset and what was expected.
  • LZWDecode and full reconstruction of severely broken files (rebuilding the xref by scanning) are not supported yet.
