diff --git a/README.md b/README.md
index 36db14e..2c5e9b2 100644
--- a/README.md
+++ b/README.md
@@ -3,185 +3,141 @@
 [![CI](https://github.com/Velli20/safe-pdf/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/Velli20/safe-pdf/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+Safe-PDF is a modular PDF reader and renderer written in Rust.
-A **PDF reader and renderer** written in Rust. `safe-pdf` provides a robust, memory-safe, and extensible foundation for working with PDF files, ideal for both end-users and developers.
-Status: this project is under active, heavy development (pre-alpha; APIs may change without notice). Contributions, feedback, and issue reports are very welcome.
-
----
-
-## 🚏 System Architecture
-
-`safe-pdf` is organized as a modular Rust monorepo, with each core PDF concept implemented as a separate crate.
-
-### Key Components
-
-- **pdf-tokenizer**: Lexical analysis of PDF byte streams into tokens.
-- **pdf-parser**: Syntactic parsing of tokens into PDF objects and structures.
-- **pdf-object**: In-memory representation of all PDF object types (dictionaries, arrays, streams, etc.).
-- **pdf-document**: High-level API for loading, validating, and traversing PDF documents.
-- **pdf-page**: Page tree, page objects, and resource management.
-- **pdf-content-stream**: Parsing and dispatching PDF drawing/text operators.
-- **pdf-canvas**: Abstracts 2D drawing operations, delegating to a backend.
-- **pdf-graphics, pdf-graphics-skia, pdf-graphics-femtovg**: Rendering backends for different graphics engines.
-- **pdf-renderer**: Handles rendering of PDF pages using a chosen backend.
-- **pdf-font**: Font parsing, encoding, glyph access (Type1/TrueType/Type3).
-
-### Data Flow
-
-1. **Input**: PDF file bytes
-2. **Tokenization**: `pdf-tokenizer` → tokens
-3. **Parsing**: `pdf-parser` → PDF objects
-4. **Object Model**: `pdf-object` → in-memory structure
-5. **Document API**: `pdf-document` → high-level access
-6. **Page Handling**: `pdf-page` → page tree, resources
-7. **Content Stream**: `pdf-content-stream` → operator dispatch
-8. **Canvas Abstraction**: `pdf-canvas` → drawing commands
-9. **Rendering**: `pdf-graphics-*` → pixels on screen or image
-
-### Module Interactions
-
-```mermaid
-flowchart TD
-    A[PDF File] --> B[pdf-tokenizer]
-    B --> C[pdf-parser]
-    C --> D[pdf-object]
-    D --> E[pdf-document]
-    E --> F[pdf-page]
-    F --> G[pdf-content-stream]
-    G --> H[pdf-canvas]
-    H --> I[pdf-graphics-skia/femtovg]
-    I --> J[Display/Output]
-```
+The project is still pre-alpha and under active development. APIs change quickly, rendering coverage is incomplete, and the current value of the repo is its architecture: strict safety constraints, clear layering, and extension points for new renderers or non-rendering PDF tooling.
----
+## Overview
-## 🗂️ Module Structure
+Safe-PDF is organized as a Cargo workspace under `crates/`, with examples and build tooling alongside it.
-Project directory layout:
+At a high level, the pipeline looks like this:
 ```text
-safe-pdf/
-├── AGENTS.md
-├── Cargo.toml
-├── README.md
-├── crates/
-│   ├── pdf-canvas/  # 2D drawing abstraction and stateful canvas API
-│   ├── pdf-content-stream/  # PDF content stream operator parsing and dispatch
-│   ├── pdf-document/  # High-level PDF document API
-│   ├── pdf-font/  # Font parsing and management
-│   ├── pdf-graphics/  # Common graphics types (color, transform, etc.)
-│   ├── pdf-graphics-femtovg/  # FemtoVG rendering backend
-│   ├── pdf-graphics-skia/  # Skia rendering backend
-│   ├── pdf-object/  # PDF object model (dictionaries, arrays, etc.)
-│   ├── pdf-page/  # Page tree, page objects, resources
-│   ├── pdf-parser/  # PDF syntax parser
-│   ├── pdf-postscript/  # (Optional) PostScript support
-│   ├── pdf-renderer/  # High-level rendering orchestration
-│   ├── pdf-tokenizer/  # Tokenizer for PDF byte streams
-├── examples/  # Example applications (Skia, FemtoVG)
-│   ├── skia.rs
-│   ├── femtovg.rs
-│   └── web/  # Web viewer (index.html + dist/ artifacts)
-└── target/  # Build output
+PDF bytes
+  -> pdf-tokenizer
+  -> pdf-parser
+  -> pdf-object
+  -> pdf-document
+  -> pdf-page
+  -> pdf-content-stream / pdf-content-stream-operators
+  -> pdf-canvas
+  -> pdf-renderer
+  -> pdf-graphics-skia / pdf-graphics-femtovg
 ```
----
-
-## 🧩 Design Decisions & Patterns
-
-- **Layered, Decomposed Architecture**: Clear separation between tokenization, parsing, object modeling, document semantics, page/resource resolution, operator dispatch, and rendering keeps concerns orthogonal and testable.
-- **Monorepo of Focused Crates**: Each conceptual layer lives in its own crate (e.g. `pdf-tokenizer`, `pdf-parser`, `pdf-object`), enabling incremental compilation, targeted benchmarks, and reuse in non-rendering contexts (indexers, validators, analyzers).
-- **Pluggable Operator Handling**: Content stream operators are dispatched via traits, so you can substitute a renderer with: (a) a metrics collector, (b) a static analyzer, or (c) a custom export (SVG, canvas, etc.) without forking core logic.
-- **Backend Agnosticism via `CanvasBackend`**: Rendering pipelines interact only with an abstract canvas; Skia / FemtoVG backends demonstrate how GPU / vector engines can be integrated with minimal glue.
-- **Error Handling Discipline**: Rich domain errors (using `thiserror` inside crates) + `Result` everywhere; workspace Clippy configuration forbids `unwrap` / `expect`, reducing accidental panics.
-- **Safety First (`unsafe_code` forbidden)**: The workspace lints disallow `unsafe` by default. Any future exception must be narrowly scoped and justified in docs.
-- **Extensible Font System**: `pdf-font` isolates font decoding (Type1 / TrueType / Type3 WIP) so shaping / caching strategies can evolve independently of rendering.
-- **Predictable Rendering Pipeline**: `PdfRenderer` orchestrates: page resource resolution → content stream execution → backend drawing; easy insertion points for caching or preflight stages.
-- **Testing Strategy**: Unit tests live close to logic in each crate; cross-crate integration & rendering behavioral tests will accumulate in a higher-level test harness (planned) to diff raster/command outputs.
-- **Instrumentation Friendly**: Because operator visitation is trait-based, adding logging / tracing / telemetry layers does not require modifying PDF interpretation logic.
-- **Minimal Global State**: State (graphics, text, resources) is threaded explicitly through contexts to simplify future concurrency and parallel page rendering.
-- **Clarity Over Cleverness**: Prefer small, explicit functions and well-named types over macro indirection; easier for contributors new to PDF internals.
-
+## Architecture
-## 🚀 Quick Start
+### Core pipeline
-Clone the repository and run tests:
+- `pdf-tokenizer`: lexical analysis for PDF byte streams.
+- `pdf-parser`: syntax-level parsing for objects, streams, xref data, headers, and related structures.
+- `pdf-object`: in-memory PDF object model, including dictionaries, streams, trailers, versions, and object resolution support.
+- `pdf-document`: document loading, decryption/encryption handling, object stream support, and high-level document access.
+- `pdf-page`: page tree traversal, resource lookup, forms, patterns, shadings, external graphics state, and page-level caches.
+- `pdf-content-stream`: content stream parsing and operator stream handling.
+- `pdf-content-stream-operators`: trait-based operator categories used to dispatch path, text, color, graphics-state, clipping, shading, image, and marked-content operations.
+- `pdf-canvas`: stateful PDF drawing engine that interprets page content against a generic backend.
+- `pdf-renderer`: page rendering orchestration, plus recording-canvas based page caching and replay.
-```bash
-git clone https://github.com/Velli20/safe-pdf.git
-cd safe-pdf
-cargo test
-```
----
-
-## 🖥️ Running the Examples
+### Supporting crates
-The `examples/` workspace member contains runnable showcase applications. Currently two rendering backends are available behind feature flags: Skia (OpenGL) and FemtoVG (wgpu).
+- `pdf-filter`: PDF stream filters such as ASCII85, ASCIIHex, LZW, predictors, and CCITT Fax support.
+- `pdf-decode`: sample decoding helpers and indexed/ranged decode utilities.
+- `pdf-image`: image XObject and inline-image handling.
+- `pdf-color-space`: PDF color space parsing and conversion support, including Indexed, ICCBased, Separation, DeviceN, Lab, CalGray, and CalRGB.
+- `pdf-function`: sampled, stitching, exponential interpolation, and PostScript-backed PDF functions.
+- `pdf-shading`: shading model parsing and paint generation for gradients and mesh-based shadings.
+- `pdf-font`: font decoding and mapping support for Type 0, Type 1, TrueType, Type 3, encodings, widths, CMaps, and ToUnicode handling.
+- `pdf-postscript`: PostScript parser and calculator used by higher-level PDF functionality.
+- `pdf-graphics`: shared geometry, color, path, transform, and rendering data structures.
+- `pdf-object-collection`: utility collection support for PDF objects.
-### Example Assets
+### Backend layer
-Sample PDFs used for experimentation live in `examples/assets`:
+- `pdf-graphics-skia`: Skia-backed `CanvasBackend` implementation.
+- `pdf-graphics-femtovg`: FemtoVG-backed `CanvasBackend` implementation.
-```
-examples/assets/
-  Gradients.pdf
-  PlaygroundMDN.pdf
-  RadialGradientFills.pdf
-  W3Schools.pdf
-  is.pdf
-  test6.pdf
-  webgl.pdf
-```
+Two design points matter most if you are evaluating the repo:
-### Run the Skia Backend (interactive viewer)
+- `CanvasBackend` in `pdf-canvas` keeps PDF interpretation separate from the concrete graphics engine. A backend supplies drawing primitives; the PDF pipeline stays backend-agnostic.
+- `pdf-content-stream-operators` exposes trait-based operator handling, so the same parsed operator stream can drive a renderer, recorder, analyzer, or extraction tool without rewriting the parser.
-This launches an OpenGL + Skia window. Pass the path to a PDF (relative or absolute) as the final argument. Use Up / Down arrow keys to change pages.
+## Workspace Layout
-```bash
-cargo run --example skia --features skia -- examples/assets/webgl.pdf
+```text
+safe-pdf/
+├── crates/  # Core workspace crates
+├── examples/  # Native and web-facing demos
+├── fuzz/  # Fuzz targets for parser-oriented testing
+├── xtask/  # Build automation, including emscripten packaging
+├── Cargo.toml
+└── README.md
 ```
-### Run the FemtoVG Backend (WIP)
+The workspace members are `crates/*`, `examples`, and `xtask`.
-FemtoVG + wgpu prototype (may be less feature complete):
+## Rendering Model
-```bash
-cargo run --example femtovg --features femtovg
-```
+The current rendering path is intentionally layered:
-The FemtoVG example embeds a small PDF internally (see `examples/femtovg.rs`). You can adapt it to load external files similarly to `examples/skia.rs`.
+1. Parse a document into structured PDF objects and pages.
+2. Resolve page resources such as fonts, forms, patterns, images, shadings, and color spaces.
+3. Parse content streams into PDF operators.
+4. Execute those operators against `PdfCanvas`.
+5. Forward low-level draw calls into a chosen `CanvasBackend`.
-### WebAssembly (Emscripten) Viewer
+`pdf-renderer` also supports recording a page into a `RecordingCanvas`, then replaying it later. That cache is resolution-independent and backend-agnostic, which is useful for page reuse, navigation, and future prefetch strategies.
-Build the WASM target and copy artifacts into `examples/web/dist/`:
+## Examples And Features
-```bash
-cargo xtask emscripten --features skia-wasm
-```
+The `examples` workspace member contains the currently supported demos.
-Serve the web viewer (from `examples/web/`) and open the printed URL:
+- `cargo run --example skia --features skia -- examples/assets/webgl.pdf`
+  Opens the native Skia viewer.
+- `cargo run --example femtovg --features femtovg`
+  Runs the FemtoVG prototype renderer.
+- `cargo xtask emscripten --features skia-wasm`
+  Builds the web target and copies artifacts into `examples/web/dist/`.
+- `cargo xtask emscripten --features skia-wasm --serve --port 8080`
+  Builds and serves the web example locally.
-```bash
-cargo xtask emscripten --features skia-wasm --serve --port 8082
-```
+Feature flags in `examples/Cargo.toml`:
-### Adding a New Backend
+- `skia`: native Skia viewer with `winit` and `glutin`.
+- `skia-wasm`: emscripten/web build without the native windowing stack.
+- `femtovg`: FemtoVG renderer with `wgpu`.
-Implement the `CanvasBackend` trait (see `pdf-canvas`) and create a new crate similar to `pdf-graphics-skia`. Then expose it behind a feature flag in `examples/Cargo.toml` so it can be opt-in at runtime.
+Sample PDFs for experiments and debugging live in `examples/assets/`.
----
+## Development
-## 🤝 Contributing
+Common commands:
-Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines, or open an issue to discuss your ideas.
+```sh
+cargo test
+cargo test -p pdf-parser
+cargo test -p pdf-parser -- test_name
+cargo check
+cargo clippy --all --workspace
+cargo fmt
+cargo build --example skia --features skia
+cargo run --example skia --features skia -- examples/assets/webgl.pdf
+cargo xtask emscripten --features skia-wasm
+cargo fuzz run parse_object
+```
----
+Important workspace constraints:
-## 📄 License
+- `unsafe_code` is forbidden workspace-wide.
+- `unwrap` and `expect` are denied in non-test code.
+- indexing and slicing are denied by Clippy policy.
+- errors are expected to propagate through `Result` rather than panic paths.
-This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
+These constraints are not cosmetic. They shape the implementation style across the entire repo.
-SPDX-License-Identifier: MIT
+## License
-This project embeds the Roboto and Roboto Mono fonts, which are licensed under the SIL Open Font License 1.1. See `crates/pdf-font/assets/OFL.txt` for details.
+Safe-PDF is licensed under the MIT License. See [LICENSE](LICENSE).
+The repository also embeds Roboto and Roboto Mono font assets under the SIL Open Font License 1.1. See `crates/pdf-font/assets/OFL.txt` for details.
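
The new README text describes two ideas that a small example may make more concrete: PDF interpretation talks only to a `CanvasBackend`-style trait, and a recorded page can be replayed into another backend later. The sketch below illustrates that shape in Rust under loud assumptions: `Backend`, `DrawCall`, `Recorder`, `Counter`, and `interpret` are hypothetical names invented for illustration and are not the safe-pdf API, and the toy interpreter understands only the PDF path operators `m`, `l`, and `f`. It avoids `unsafe`, `unwrap`, `expect`, and indexing, in the spirit of the workspace constraints listed in the diff.

```rust
// Hypothetical sketch: none of these names come from the safe-pdf crates.

/// A low-level drawing command, standing in for whatever a real backend receives.
#[derive(Debug, Clone, PartialEq)]
pub enum DrawCall {
    MoveTo(f32, f32),
    LineTo(f32, f32),
    Fill,
}

/// Stand-in for a `CanvasBackend`-style trait: the interpreter only ever
/// talks to this interface, never to a concrete graphics engine.
pub trait Backend {
    fn draw(&mut self, call: &DrawCall);
}

/// Backend that records calls instead of drawing them.
#[derive(Default)]
pub struct Recorder {
    pub calls: Vec<DrawCall>,
}

impl Backend for Recorder {
    fn draw(&mut self, call: &DrawCall) {
        self.calls.push(call.clone());
    }
}

impl Recorder {
    /// Replay every recorded call, in order, into any other backend.
    pub fn replay(&self, target: &mut dyn Backend) {
        for call in &self.calls {
            target.draw(call);
        }
    }
}

/// Backend that only counts calls, e.g. for an analysis tool.
#[derive(Default)]
pub struct Counter {
    pub count: usize,
}

impl Backend for Counter {
    fn draw(&mut self, _call: &DrawCall) {
        self.count += 1;
    }
}

/// Toy interpreter for a whitespace-separated operator stream.
/// Numbers are pushed as operands; `m`, `l`, and `f` consume them.
/// Malformed input is reported through `Result` rather than a panic.
pub fn interpret(stream: &str, backend: &mut dyn Backend) -> Result<(), String> {
    let mut operands: Vec<f32> = Vec::new();
    for token in stream.split_whitespace() {
        if let Ok(number) = token.parse::<f32>() {
            operands.push(number);
            continue;
        }
        let call = match token {
            "m" | "l" => {
                // Operands were pushed as x then y, so y is popped first.
                let y = operands.pop().ok_or("missing y operand")?;
                let x = operands.pop().ok_or("missing x operand")?;
                if token == "m" {
                    DrawCall::MoveTo(x, y)
                } else {
                    DrawCall::LineTo(x, y)
                }
            }
            "f" => DrawCall::Fill,
            other => return Err(format!("unsupported operator: {other}")),
        };
        backend.draw(&call);
    }
    Ok(())
}
```

As a usage sketch, interpreting `0 0 m 10 5 l f` into a `Recorder` and then calling `replay` on a `Counter` forwards the same three calls to the second backend, which is roughly how one recorded page could feed both a renderer and an analyzer without re-parsing.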