Reading Existing PDFs

PdfReader opens an existing PDF (including encrypted ones) and exposes its object graph and page tree. It is a low-level API: it does not render or edit anything by itself, but it is the foundation for template import (Document::importPdf() / Page::template()) and the higher-level modify workflow (PdfEditor::open()).

use DragonOfMercy\PhpPdf\Reader\PdfReader;

$reader = PdfReader::fromFile('invoice.pdf');            // non-encrypted
$reader = PdfReader::fromFile('protected.pdf', 'pass');  // user or owner password
$reader = PdfReader::fromBytes($bytes);                  // from a string
$reader = PdfReader::fromBytes($bytes, 'pass');          // from bytes, with password

$reader->version();          // "1.7" (catalog /Version overrides the header)
$reader->pageCount();        // 3
$reader->isEncrypted();      // true when the source was encrypted
$page = $reader->page(1);    // 1-based; returns a ReadPage

What it understands

Classic cross-reference tables and cross-reference streams (PDF 1.5+), including PNG/TIFF predictor encodings.
Incremental updates: /Prev revision chains are walked and merged (the newest revision wins), including hybrid-reference files (/XRefStm).
Object streams (/ObjStm): compressed objects are extracted transparently.
Stream filters needed for document structure: FlateDecode, ASCIIHexDecode, ASCII85Decode, and RunLengthDecode, with /DecodeParms applied where present. Image filters (DCTDecode, JPXDecode, CCITTFaxDecode, JBIG2Decode) are not decoded; image streams stay opaque.
Real-world quirks: junk before the %PDF- header, slightly wrong xref offsets (a recovery scan looks around the recorded position), a wrong stream /Length (fallback scan for endstream), and a missing %%EOF.
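
To make the "PNG predictor" point concrete: a cross-reference stream is usually Flate-compressed with /Predictor 12, so after inflating, each row carries a leading filter-type byte that has to be undone. The sketch below is an illustration of that step only, not the library's implementation. The function names undoPngPredictor() and paethPredictor() are hypothetical, and the code assumes one byte per pixel (8 bits, 1 colour component), which is the usual case for xref streams.

```php
<?php
/**
 * Undo PNG row prediction on bytes that have already been inflated.
 * Hypothetical helper for illustration; assumes 1 byte per pixel.
 *
 * @param string $data    rows of (1 filter-type byte + $columns data bytes)
 * @param int    $columns data bytes per row (for an xref stream: sum of /W)
 */
function undoPngPredictor(string $data, int $columns): string
{
    if ($columns < 1) {
        throw new InvalidArgumentException('columns must be at least 1');
    }
    $rowLength = $columns + 1;
    $previous  = array_fill(0, $columns, 0); // the row "above" the first row is all zeros
    $output    = '';

    for ($offset = 0; $offset + $rowLength <= strlen($data); $offset += $rowLength) {
        $type = ord($data[$offset]);
        $row  = [];
        for ($i = 0; $i < $columns; $i++) {
            $raw    = ord($data[$offset + 1 + $i]);
            $left   = $i > 0 ? $row[$i - 1] : 0;
            $up     = $previous[$i];
            $upLeft = $i > 0 ? $previous[$i - 1] : 0;
            switch ($type) {
                case 0: $predicted = 0; break;                                   // None
                case 1: $predicted = $left; break;                               // Sub
                case 2: $predicted = $up; break;                                 // Up
                case 3: $predicted = intdiv($left + $up, 2); break;              // Average
                case 4: $predicted = paethPredictor($left, $up, $upLeft); break; // Paeth
                default:
                    throw new InvalidArgumentException("Unknown PNG filter type $type");
            }
            $row[$i] = ($raw + $predicted) & 0xFF;
        }
        $output  .= pack('C*', ...$row);
        $previous = $row;
    }
    return $output;
}

/** Paeth predictor as defined by the PNG specification. */
function paethPredictor(int $a, int $b, int $c): int
{
    $p  = $a + $b - $c;
    $pa = abs($p - $a);
    $pb = abs($p - $b);
    $pc = abs($p - $c);
    if ($pa <= $pb && $pa <= $pc) {
        return $a;
    }
    return $pb <= $pc ? $b : $c;
}
```

With the Up filter (type 2), each row stores only the difference from the row above, which is why consecutive xref entries with similar offsets compress so well.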

Pages

page(int $n) returns a ReadPage with the page's inherited attributes already resolved (PDF inheritance through the /Pages tree):

$page = $reader->page(1);
$page->mediaBox;    // [llx, lly, urx, ury] in points, corner-normalized
$page->cropBox;     // same shape, or null
$page->box();       // CropBox when present, else MediaBox
$page->rotate;      // 0 / 90 / 180 / 270
$page->resources;   // the resolved /Resources dictionary, or null
$page->contents;    // list of references to the page's content stream(s)
$page->dict;        // the raw page dictionary
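
As a small illustration of how these fields combine, the helper below computes the size a page is displayed at, swapping width and height for 90 and 270 degree rotations. displaySize() is a hypothetical function, not part of the library; it relies only on box() and rotate as described above and accepts any object that exposes them.

```php
<?php
/**
 * Displayed page size in points, taking /Rotate into account.
 * Hypothetical helper: $page is anything exposing box() and a rotate property,
 * such as the ReadPage described above.
 *
 * @return array{0: int|float, 1: int|float} [width, height]
 */
function displaySize(object $page): array
{
    [$llx, $lly, $urx, $ury] = $page->box(); // CropBox when present, else MediaBox
    $width  = $urx - $llx;
    $height = $ury - $lly;

    // 90 and 270 turn the page on its side, so the two dimensions swap.
    return ($page->rotate % 180 === 90) ? [$height, $width] : [$width, $height];
}

// Possible usage with a reader (pages are 1-based):
// for ($n = 1; $n <= $reader->pageCount(); $n++) {
//     [$w, $h] = displaySize($reader->page($n));
// }
```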

Raw object access

$catalog = $reader->catalog();                 // the document catalog dictionary
$trailer = $reader->trailer();                 // merged trailer across revisions
$object  = $reader->object(12);                // payload of object 12 (lazy, cached)
$value   = $reader->resolve($maybeReference);  // follow reference chains
$bytes   = $reader->decodeStream($stream);     // apply a stream's /Filter chain

Objects are returned as the library's internal PDF object model (dictionaries, arrays, names, numbers, strings, streams). Resolution is lazy and cached; circular references and over-deep chains throw a PdfParseException.
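
The guard against circular and over-deep chains can be pictured roughly as follows. This is a self-contained sketch of the idea, not the library's code: SketchRef, resolveChain(), the plain-array object table, and the depth limit of 32 are all assumptions made for the example, and it throws RuntimeException where the library is documented to throw PdfParseException.

```php
<?php
/** Hypothetical stand-in for an indirect reference such as "12 0 R". */
final class SketchRef
{
    public int $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }
}

/**
 * Follow a chain of references until a non-reference value is reached.
 *
 * @param array $objects  object number => value (which may itself be a SketchRef)
 * @param mixed $value    a value or a SketchRef
 * @param int   $maxDepth assumed limit on chain length
 * @return mixed
 */
function resolveChain(array $objects, $value, int $maxDepth = 32)
{
    $seen = [];
    while ($value instanceof SketchRef) {
        if (isset($seen[$value->id])) {
            throw new RuntimeException("Circular reference at object {$value->id}");
        }
        if (count($seen) >= $maxDepth) {
            throw new RuntimeException('Reference chain is too deep');
        }
        $seen[$value->id] = true;
        // A reference to a missing object resolves to null in this sketch.
        $value = $objects[$value->id] ?? null;
    }
    return $value;
}
```

Tracking visited object numbers catches loops immediately, while the depth limit bounds the work done on long but non-circular chains.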

Encrypted PDFs

PdfReader transparently decrypts files protected with the Standard security handler: RC4 40-bit, RC4 128-bit, AES-128, and AES-256.

The $password argument is optional:

Omit it (or pass null) for PDFs with an empty user password; permissions-only encryption opens without a password.
Supply a password to open a user-password or owner-password protected file. The library tries it first as the user password, then as the owner password.
A wrong or missing password throws a PdfException with a clear message.

// Permissions-only encryption - no password needed
$reader = PdfReader::fromFile('report.pdf');

// User or owner password
$reader = PdfReader::fromFile('report.pdf', 'secret');

// Check whether the source was encrypted
if ($reader->isEncrypted()) {
    // ...
}
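
When the password is not known up front, one option is to try a short list of candidates, starting with null for permissions-only files. The helper below is a hypothetical sketch: openWithCandidates() is not part of the library, and it catches the generic Exception because the namespace of PdfException is not shown on this page. In real code, narrowing the catch to the library's exception class would avoid masking unrelated errors.

```php
<?php
/**
 * Try each candidate password in order and return the first reader that opens.
 * Hypothetical helper: $open is any callable that takes ?string and either
 * returns a reader object or throws.
 *
 * @param callable $open       e.g. fn (?string $pw) => PdfReader::fromFile($path, $pw)
 * @param array    $candidates passwords to try, in order (null means "no password")
 */
function openWithCandidates(callable $open, array $candidates): object
{
    $lastError = null;
    foreach ($candidates as $password) {
        try {
            return $open($password);
        } catch (Exception $e) {
            // Assumption: a failure here means the password did not fit.
            $lastError = $e;
        }
    }
    // Keep the last failure as the previous exception for debugging.
    throw new RuntimeException('No candidate password opened the file', 0, $lastError);
}

// Possible usage:
// $reader = openWithCandidates(
//     fn (?string $pw) => PdfReader::fromFile('report.pdf', $pw),
//     [null, 'secret']
// );
```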

Editing encrypted PDFs is not yet supported. PdfEditor::open() / PdfEditor::fromBytes() reject encrypted sources with a clear message. Use PdfReader (or Document::importPdf()) to read the content, then build a new document if edits are needed.

Limits

Malformed input throws a PdfParseException that reports the byte offset and what was expected there.
LZWDecode and full reconstruction of severely broken files (rebuilding the xref by scanning) are not supported yet.
