[![CI](https://github.com/Velli20/safe-pdf/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/Velli20/safe-pdf/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

Safe-PDF is a modular PDF reader and renderer written in Rust.

The project is still pre-alpha and under active development. APIs change quickly and rendering coverage is incomplete; for now, the repository's main value lies in its architecture, with strict safety constraints, clear layering, and extension points for new renderers or non-rendering PDF tooling.

## Overview

Safe-PDF is organized as a Cargo workspace under `crates/`, with examples and build tooling alongside it.

At a high level, the pipeline looks like this:

```text
PDF bytes
-> pdf-tokenizer
-> pdf-parser
-> pdf-object
-> pdf-document
-> pdf-page
-> pdf-content-stream / pdf-content-stream-operators
-> pdf-canvas
-> pdf-renderer
-> pdf-graphics-skia / pdf-graphics-femtovg
```
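
The first stage of this pipeline turns raw bytes into tokens. The following is a minimal sketch of that idea, assuming a tiny subset of PDF syntax: integers, names, bare keywords, `[`, `]`, `<<`, `>>`, and `%` comments. Strings, hex strings, reals, and `#xx` name escapes are out of scope. The `Token` enum and `tokenize` function are hypothetical and are not the `pdf-tokenizer` API.

```rust
/// Hypothetical token type covering a small subset of PDF syntax.
#[derive(Debug, PartialEq)]
pub enum Token {
    Integer(i64),
    Name(String),    // `/Type` -> Name("Type")
    Keyword(String), // `obj`, `R`, `true`, ...
    ArrayStart,
    ArrayEnd,
    DictStart,
    DictEnd,
}

fn is_whitespace(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\r' | b'\n' | b'\x0c' | b'\0')
}

fn is_delimiter(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

/// Sketch only: splits `input` into tokens and returns `None` on syntax it does not handle.
/// Uses `get` instead of indexing, in the spirit of the workspace lints.
pub fn tokenize(input: &[u8]) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while let Some(&b) = input.get(pos) {
        match b {
            _ if is_whitespace(b) => pos += 1,
            b'%' => {
                // Comments run to the end of the line and produce no token.
                while matches!(input.get(pos), Some(&c) if c != b'\n' && c != b'\r') {
                    pos += 1;
                }
            }
            b'[' => { tokens.push(Token::ArrayStart); pos += 1; }
            b']' => { tokens.push(Token::ArrayEnd); pos += 1; }
            b'<' if input.get(pos + 1) == Some(&b'<') => { tokens.push(Token::DictStart); pos += 2; }
            b'>' if input.get(pos + 1) == Some(&b'>') => { tokens.push(Token::DictEnd); pos += 2; }
            b'/' => {
                // A name runs until the next whitespace or delimiter byte.
                let start = pos + 1;
                pos = start;
                while matches!(input.get(pos), Some(&c) if !is_whitespace(c) && !is_delimiter(c)) {
                    pos += 1;
                }
                let raw = input.get(start..pos)?;
                tokens.push(Token::Name(String::from_utf8_lossy(raw).into_owned()));
            }
            _ if is_delimiter(b) => return None, // strings, hex strings, etc. are not handled here
            _ => {
                // Regular characters form either an integer or a bare keyword.
                let start = pos;
                while matches!(input.get(pos), Some(&c) if !is_whitespace(c) && !is_delimiter(c)) {
                    pos += 1;
                }
                let text = std::str::from_utf8(input.get(start..pos)?).ok()?;
                tokens.push(match text.parse::<i64>() {
                    Ok(n) => Token::Integer(n),
                    Err(_) => Token::Keyword(text.to_string()),
                });
            }
        }
    }
    Some(tokens)
}
```

For example, `<< /Type /Page /Kids [3 0 R] >>` comes out as a dictionary start, three names, an array holding `3`, `0`, and the keyword `R`, and then the closing delimiters. Assembling these tokens into objects and references is the parser's job.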

## Architecture

### Core pipeline

- `pdf-tokenizer`: lexical analysis for PDF byte streams.
- `pdf-parser`: syntax-level parsing for objects, streams, xref data, headers, and related structures.
- `pdf-object`: in-memory PDF object model, including dictionaries, streams, trailers, versions, and object resolution support.
- `pdf-document`: document loading, decryption/encryption handling, object stream support, and high-level document access.
- `pdf-page`: page tree traversal, resource lookup, forms, patterns, shadings, external graphics state, and page-level caches.
- `pdf-content-stream`: content stream parsing and operator stream handling.
- `pdf-content-stream-operators`: trait-based operator categories used to dispatch path, text, color, graphics-state, clipping, shading, image, and marked-content operations.
- `pdf-canvas`: stateful PDF drawing engine that interprets page content against a generic backend.
- `pdf-renderer`: page rendering orchestration, plus recording-canvas based page caching and replay.
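
One job of the object layer is resolving indirect references such as `3 0 R`. The sketch below shows the idea under simple assumptions. `Object`, `ObjectStore`, and `resolve` are hypothetical stand-ins, not the actual `pdf-object` types. Objects are keyed by (object number, generation number). A dangling reference resolves to null, which is how the PDF specification treats references to undefined objects. A depth limit stops reference cycles from looping forever.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified PDF object model.
#[derive(Debug, PartialEq)]
pub enum Object {
    Null,
    Integer(i64),
    Name(String),
    Array(Vec<Object>),
    Dictionary(HashMap<String, Object>),
    Reference(u32, u16), // (object number, generation number)
}

/// Returned when a chain of references is longer than `MAX_DEPTH` (e.g. a cycle).
#[derive(Debug, PartialEq)]
pub struct ReferenceTooDeep;

const MAX_DEPTH: usize = 32;
static NULL: Object = Object::Null;

/// Hypothetical object table, as might be built from the xref data.
#[derive(Default)]
pub struct ObjectStore {
    objects: HashMap<(u32, u16), Object>,
}

impl ObjectStore {
    pub fn insert(&mut self, number: u32, generation: u16, obj: Object) {
        self.objects.insert((number, generation), obj);
    }

    /// Follows `Reference` chains until a direct object is reached.
    pub fn resolve<'a>(&'a self, obj: &'a Object) -> Result<&'a Object, ReferenceTooDeep> {
        let mut current = obj;
        for _ in 0..MAX_DEPTH {
            match current {
                Object::Reference(number, generation) => {
                    // Undefined objects are treated as null rather than as an error.
                    current = self.objects.get(&(*number, *generation)).unwrap_or(&NULL);
                }
                direct => return Ok(direct),
            }
        }
        Err(ReferenceTooDeep)
    }
}
```

With this design, callers never see a `Reference` after calling `resolve`. Malformed files then surface as a `Result` rather than a panic.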

### Supporting crates

- `pdf-filter`: PDF stream filters such as ASCII85, ASCIIHex, LZW, predictors, and CCITT Fax support.
- `pdf-decode`: sample decoding helpers and indexed/ranged decode utilities.
- `pdf-image`: image XObject and inline-image handling.
- `pdf-color-space`: PDF color space parsing and conversion support, including Indexed, ICCBased, Separation, DeviceN, Lab, CalGray, and CalRGB.
- `pdf-function`: sampled, stitching, exponential interpolation, and PostScript-backed PDF functions.
- `pdf-shading`: shading model parsing and paint generation for gradients and mesh-based shadings.
- `pdf-font`: font decoding and mapping support for Type 0, Type 1, TrueType, Type 3, encodings, widths, CMaps, and ToUnicode handling.
- `pdf-postscript`: PostScript parser and calculator used by higher-level PDF functionality.
- `pdf-graphics`: shared geometry, color, path, transform, and rendering data structures.
- `pdf-object-collection`: utility collection support for PDF objects.
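
To show what a stream filter does, here is a minimal sketch of ASCIIHexDecode as the PDF specification describes it. Whitespace is ignored, `>` marks the end of data, and a final odd digit is treated as if it were followed by `0`. The function name and error type are hypothetical and are not the `pdf-filter` API.

```rust
/// Hypothetical error: a byte that is neither a hex digit, whitespace, nor `>`.
#[derive(Debug, PartialEq)]
pub struct InvalidHexDigit(pub u8);

fn hex_value(b: u8) -> Result<u8, InvalidHexDigit> {
    match b {
        b'0'..=b'9' => Ok(b - b'0'),
        b'a'..=b'f' => Ok(b - b'a' + 10),
        b'A'..=b'F' => Ok(b - b'A' + 10),
        other => Err(InvalidHexDigit(other)),
    }
}

/// Sketch of ASCIIHexDecode: two hex digits per output byte.
pub fn ascii_hex_decode(input: &[u8]) -> Result<Vec<u8>, InvalidHexDigit> {
    let mut out = Vec::with_capacity(input.len() / 2);
    let mut high: Option<u8> = None; // first nibble of a byte still waiting for its pair
    for &b in input {
        if b == b'>' {
            break; // end-of-data marker
        }
        if b.is_ascii_whitespace() || b == 0 {
            continue; // PDF whitespace, including NUL, is skipped
        }
        let nibble = hex_value(b)?;
        match high.take() {
            Some(h) => out.push((h << 4) | nibble),
            None => high = Some(nibble),
        }
    }
    if let Some(h) = high {
        out.push(h << 4); // odd number of digits: pad the last one with 0
    }
    Ok(out)
}
```

Real filters are often chained, with the output of one decoder feeding the next. The same decode-to-bytes signature also fits filters like LZW or ASCII85.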

### Backend layer

- `pdf-graphics-skia`: Skia-backed `CanvasBackend` implementation.
- `pdf-graphics-femtovg`: FemtoVG-backed `CanvasBackend` implementation.
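
A backend can be pictured as a trait that supplies drawing primitives. The sketch below is hypothetical. The `DrawBackend` trait, its methods, and the `SvgPathBackend` type are illustrative only and do not reflect the real `CanvasBackend` signature. The sketch shows how a backend that emits SVG path data could sit where Skia or FemtoVG normally would.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a backend trait: a handful of path primitives.
pub trait DrawBackend {
    fn move_to(&mut self, x: f32, y: f32);
    fn line_to(&mut self, x: f32, y: f32);
    fn close_path(&mut self);
    fn fill(&mut self, rgb: [u8; 3]);
}

/// Toy backend that turns each filled path into an SVG `<path>` element.
#[derive(Default)]
pub struct SvgPathBackend {
    current: String,
    pub output: String,
}

impl DrawBackend for SvgPathBackend {
    fn move_to(&mut self, x: f32, y: f32) {
        let _ = write!(self.current, "M{x} {y} "); // writing to a String cannot fail
    }
    fn line_to(&mut self, x: f32, y: f32) {
        let _ = write!(self.current, "L{x} {y} ");
    }
    fn close_path(&mut self) {
        self.current.push('Z');
    }
    fn fill(&mut self, [r, g, b]: [u8; 3]) {
        // Take the accumulated path data and emit it as one element.
        let d = std::mem::take(&mut self.current);
        let _ = write!(self.output, r#"<path d="{}" fill="rgb({r},{g},{b})"/>"#, d.trim_end());
    }
}

/// Drawing code written against the trait works with any backend.
pub fn draw_triangle(backend: &mut impl DrawBackend) {
    backend.move_to(0.0, 0.0);
    backend.line_to(10.0, 0.0);
    backend.line_to(5.0, 8.0);
    backend.close_path();
    backend.fill([255, 0, 0]);
}
```

Swapping `SvgPathBackend` for a GPU-backed type would not change `draw_triangle`. That separation is what keeps the PDF pipeline backend-agnostic.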

Two design points matter most if you are evaluating the repo:

- `CanvasBackend` in `pdf-canvas` keeps PDF interpretation separate from the concrete graphics engine. A backend supplies drawing primitives; the PDF pipeline stays backend-agnostic.
- `pdf-content-stream-operators` exposes trait-based operator handling, so the same parsed operator stream can drive a renderer, recorder, analyzer, or extraction tool without rewriting the parser.
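
Here is a hypothetical sketch of trait-based operator dispatch. One operator stream is fed once to a handler that counts path operations and once to a handler that collects text. The `Operator` enum, `OperatorHandler` trait, `dispatch` function, and both handlers are illustrative names, not the actual `pdf-content-stream-operators` API.

```rust
/// Hypothetical parsed operators (a tiny subset of PDF content-stream operators).
pub enum Operator {
    MoveTo(f32, f32), // `x y m`
    LineTo(f32, f32), // `x y l`
    Fill,             // `f`
    ShowText(String), // `(...) Tj`
}

/// Handlers override only the callbacks they care about; the rest are no-ops.
pub trait OperatorHandler {
    fn move_to(&mut self, _x: f32, _y: f32) {}
    fn line_to(&mut self, _x: f32, _y: f32) {}
    fn fill(&mut self) {}
    fn show_text(&mut self, _text: &str) {}
}

/// The dispatch loop is written once and reused by every handler.
pub fn dispatch(ops: &[Operator], handler: &mut dyn OperatorHandler) {
    for op in ops {
        match op {
            Operator::MoveTo(x, y) => handler.move_to(*x, *y),
            Operator::LineTo(x, y) => handler.line_to(*x, *y),
            Operator::Fill => handler.fill(),
            Operator::ShowText(t) => handler.show_text(t),
        }
    }
}

/// An analyzer: counts line segments and fills, ignoring text.
#[derive(Default)]
pub struct PathStats {
    pub segments: usize,
    pub fills: usize,
}

impl OperatorHandler for PathStats {
    fn line_to(&mut self, _x: f32, _y: f32) {
        self.segments += 1;
    }
    fn fill(&mut self) {
        self.fills += 1;
    }
}

/// An extraction tool: collects shown text, ignoring geometry.
#[derive(Default)]
pub struct TextCollector {
    pub text: Vec<String>,
}

impl OperatorHandler for TextCollector {
    fn show_text(&mut self, text: &str) {
        self.text.push(text.to_string());
    }
}
```

Neither handler touches the parser. Adding a renderer, a tracer, or a metrics collector only means implementing the trait one more time.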

## Workspace Layout

```text
safe-pdf/
├── crates/ # Core workspace crates
├── examples/ # Native and web-facing demos
├── fuzz/ # Fuzz targets for parser-oriented testing
├── xtask/ # Build automation, including emscripten packaging
├── Cargo.toml
└── README.md
```

The workspace members are `crates/*`, `examples`, and `xtask`.

## Rendering Model

The current rendering path is intentionally layered:

1. Parse a document into structured PDF objects and pages.
2. Resolve page resources such as fonts, forms, patterns, images, shadings, and color spaces.
3. Parse content streams into PDF operators.
4. Execute those operators against `PdfCanvas`.
5. Forward low-level draw calls into a chosen `CanvasBackend`.
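
Steps 4 and 5 can be sketched as a small stateful interpreter. It tracks the current transformation matrix (CTM) across `q`/`Q` (save/restore) and `cm` (concatenate matrix), and it forwards device-space coordinates to a backend. `Canvas`, `Backend`, and `Matrix` are hypothetical simplifications, not the real `PdfCanvas` or `CanvasBackend` types. A real graphics state carries much more than the CTM.

```rust
/// PDF-style affine matrix `[a b c d e f]`, applied to row vectors `[x y 1]`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix { pub a: f32, pub b: f32, pub c: f32, pub d: f32, pub e: f32, pub f: f32 }

impl Matrix {
    pub const IDENTITY: Matrix = Matrix { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: 0.0, f: 0.0 };

    /// Returns `self × o`: `self` is applied to a point first, then `o`.
    pub fn then(self, o: Matrix) -> Matrix {
        Matrix {
            a: self.a * o.a + self.b * o.c,
            b: self.a * o.b + self.b * o.d,
            c: self.c * o.a + self.d * o.c,
            d: self.c * o.b + self.d * o.d,
            e: self.e * o.a + self.f * o.c + o.e,
            f: self.e * o.b + self.f * o.d + o.f,
        }
    }

    pub fn apply(self, x: f32, y: f32) -> (f32, f32) {
        (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)
    }
}

/// Hypothetical backend: receives device-space points only.
pub trait Backend {
    fn line_to(&mut self, x: f32, y: f32);
}

pub struct Canvas<B: Backend> {
    backend: B,
    ctm: Matrix,
    saved: Vec<Matrix>,
}

impl<B: Backend> Canvas<B> {
    pub fn new(backend: B) -> Self {
        Self { backend, ctm: Matrix::IDENTITY, saved: Vec::new() }
    }
    /// `q`: push the current state.
    pub fn save(&mut self) {
        self.saved.push(self.ctm);
    }
    /// `Q`: pop the saved state; an unbalanced `Q` is ignored instead of panicking.
    pub fn restore(&mut self) {
        if let Some(m) = self.saved.pop() {
            self.ctm = m;
        }
    }
    /// `cm`: the new matrix is applied to points before the existing CTM.
    pub fn concat(&mut self, m: Matrix) {
        self.ctm = m.then(self.ctm);
    }
    /// `l`: map the point from user space to device space, then forward it.
    pub fn line_to(&mut self, x: f32, y: f32) {
        let (dx, dy) = self.ctm.apply(x, y);
        self.backend.line_to(dx, dy);
    }
    pub fn into_backend(self) -> B {
        self.backend
    }
}
```

Because the backend only ever sees transformed points, it does not need to know about PDF's graphics-state rules.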

`pdf-renderer` also supports recording a page into a `RecordingCanvas`, then replaying it later. That cache is resolution-independent and backend-agnostic, which is useful for page reuse, navigation, and future prefetch strategies.
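
Record-then-replay can be sketched as a list of drawing commands stored in page units and replayed later at whatever scale the target needs. `Command`, `Sink`, `Recording`, and `replay` are hypothetical names. The real `RecordingCanvas` in `pdf-renderer` may store and replay commands differently.

```rust
/// Hypothetical recorded command, stored in page units rather than pixels.
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    MoveTo(f32, f32),
    LineTo(f32, f32),
    Fill([u8; 3]),
}

/// Minimal interface that commands are recorded into or replayed into (hypothetical).
pub trait Sink {
    fn command(&mut self, cmd: Command);
}

/// A recording is itself a sink, so a page can be "drawn" into it once.
#[derive(Default)]
pub struct Recording {
    commands: Vec<Command>,
}

impl Sink for Recording {
    fn command(&mut self, cmd: Command) {
        self.commands.push(cmd);
    }
}

impl Recording {
    /// Replays the cached commands into any sink, scaling coordinates on the way,
    /// so one recording can serve several zoom levels or backends.
    pub fn replay(&self, scale: f32, sink: &mut dyn Sink) {
        for cmd in &self.commands {
            let scaled = match *cmd {
                Command::MoveTo(x, y) => Command::MoveTo(x * scale, y * scale),
                Command::LineTo(x, y) => Command::LineTo(x * scale, y * scale),
                Command::Fill(rgb) => Command::Fill(rgb),
            };
            sink.command(scaled);
        }
    }
}
```

The expensive work of parsing and interpreting the page happens once. Each later replay is a linear pass over the recorded commands.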

## Examples And Features

The `examples` workspace member contains the currently supported demos.

- `cargo run --example skia --features skia -- examples/assets/webgl.pdf`: opens the native Skia viewer.
- `cargo run --example femtovg --features femtovg`: runs the FemtoVG prototype renderer.
- `cargo xtask emscripten --features skia-wasm`: builds the web target and copies artifacts into `examples/web/dist/`.
- `cargo xtask emscripten --features skia-wasm --serve --port 8080`: builds and serves the web example locally.

Feature flags in `examples/Cargo.toml`:

- `skia`: native Skia viewer with `winit` and `glutin`.
- `skia-wasm`: emscripten/web build without the native windowing stack.
- `femtovg`: FemtoVG renderer with `wgpu`.
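
These flags are ordinary Cargo features, so code can branch on them at compile time. The sketch below shows one way an example could report its backend using `cfg!`. The function is hypothetical and is not taken from `examples/`.

```rust
/// Hypothetical helper: reports which rendering backend this build was compiled with.
/// `cfg!(feature = "...")` expands to a compile-time `true` or `false`.
pub fn selected_backend() -> &'static str {
    if cfg!(feature = "skia") {
        "skia (native window via winit + glutin)"
    } else if cfg!(feature = "skia-wasm") {
        "skia (emscripten/web)"
    } else if cfg!(feature = "femtovg") {
        "femtovg (wgpu)"
    } else {
        "none: rebuild with --features skia, skia-wasm, or femtovg"
    }
}
```

In practice, whole modules are more often gated with `#[cfg(feature = "skia")]`, so that a backend's dependencies are not compiled at all when its feature is off.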

Sample PDFs for experiments and debugging live in `examples/assets/`.

## Development

Common commands:

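```sh
# Run all tests, one crate's tests, or only tests whose names match a filter
cargo test
cargo test -p pdf-parser
cargo test -p pdf-parser -- test_name
# Type-check, lint, and format the workspace
cargo check
cargo clippy --workspace
cargo fmt
# Build, then run, the native Skia viewer on a sample PDF
cargo build --example skia --features skia
cargo run --example skia --features skia -- examples/assets/webgl.pdf
# Build the emscripten/web target
cargo xtask emscripten --features skia-wasm
# Run the parse_object fuzz target (requires cargo-fuzz and a nightly toolchain)
cargo fuzz run parse_object
```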

Important workspace constraints:

- `unsafe_code` is forbidden workspace-wide.
- `unwrap` and `expect` are denied in non-test code.
- Indexing and slicing are denied by Clippy policy.
- Errors are expected to propagate through `Result` rather than through panic paths.

These constraints are not cosmetic. They shape the implementation style across the entire repo.
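
Under these lints, code that would normally index a slice or call `unwrap` has to use checked access and propagate errors instead. The sketch below shows that style on a hypothetical helper that reads a big-endian `u16` at an offset, a common need when parsing binary font tables. `ReadError` and `read_u16_be` are illustrative names, not APIs from the repo.

```rust
use std::fmt;

/// Hypothetical error type for out-of-range reads.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    OutOfBounds { offset: usize, len: usize },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::OutOfBounds { offset, len } => {
                write!(f, "read of 2 bytes at offset {offset} exceeds buffer of {len} bytes")
            }
        }
    }
}

impl std::error::Error for ReadError {}

/// Reads a big-endian u16 without indexing, slicing syntax, `unwrap`, or `expect`.
pub fn read_u16_be(data: &[u8], offset: usize) -> Result<u16, ReadError> {
    let out_of_bounds = || ReadError::OutOfBounds { offset, len: data.len() };
    // `checked_add` avoids overflow near `usize::MAX`; `get` returns `None` instead of panicking.
    let end = offset.checked_add(2).ok_or_else(out_of_bounds)?;
    match data.get(offset..end) {
        Some(&[hi, lo]) => Ok(u16::from_be_bytes([hi, lo])),
        _ => Err(out_of_bounds()),
    }
}
```

A truncated or hostile file then turns into an error value that callers can report or skip. It never aborts the process.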

## License

Safe-PDF is licensed under the MIT License. See [LICENSE](LICENSE).

The repository also embeds Roboto and Roboto Mono font assets under the SIL Open Font License 1.1. See `crates/pdf-font/assets/OFL.txt` for details.