pdf-toolbox

A collection of tools for processing PDF files

Features

Written in Haskell
Parsing on demand. You don't need to parse the entire PDF file, or load it into memory, just to extract one image
Different levels of abstraction. You can inspect the high-level (catalog, page tree, pages) or low-level (xref, trailer, objects) structure of a PDF file, and even switch between levels of detail on the fly.
Extremely fast and memory efficient when you need to inspect only part of the document
Reasonably fast and memory efficient in the general case
Text extraction with exact glyph positions. This can be used, e.g., to implement text selection and copying in a PDF viewer
Full support of xref streams and object streams
Supports editing of PDF files (incremental updates)
Basic support for generating PDF files
Encrypted PDF documents are partially supported
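Parsing on demand is possible because a PDF file carries a cross-reference (xref) table that maps each object number to the byte offset where that object starts, so a reader can seek straight to one object instead of parsing everything before it. The standalone sketch below illustrates the idea for a classic (non-stream) xref subsection, where each entry holds a 10-digit byte offset, a 5-digit generation number, and `n` (in use) or `f` (free). It uses only `base`; the names `XrefEntry`, `parseEntry`, `parseSubsection` and `lookupOffset` are hypothetical and are not part of the pdf-toolbox API, and xref streams (which pdf-toolbox also supports) are not covered.

```haskell
-- A standalone sketch (not pdf-toolbox code): parse one classic xref
-- subsection and look up where an object starts in the file.
import Data.Char (isDigit)

-- | One xref entry: byte offset (for free entries, the next free object
-- number), generation number, and whether the entry is in use.
data XrefEntry = XrefEntry
  { xrefOffset :: Int
  , xrefGen    :: Int
  , xrefInUse  :: Bool
  } deriving (Show, Eq)

-- | Parse an entry line such as "0000000017 00000 n".
-- Fields are split on whitespace; the spec's fixed 20-byte layout
-- is not checked here.
parseEntry :: String -> Maybe XrefEntry
parseEntry line =
  case words line of
    [off, gen, [flag]]
      | all isDigit off, all isDigit gen, flag `elem` "nf" ->
          Just (XrefEntry (read off) (read gen) (flag == 'n'))
    _ -> Nothing

-- | Parse a subsection: a header "first count" followed by `count`
-- entry lines. Returns (object number, entry) pairs.
parseSubsection :: [String] -> Maybe [(Int, XrefEntry)]
parseSubsection [] = Nothing
parseSubsection (header : rest) =
  case words header of
    [first, count] | all isDigit first, all isDigit count -> do
      let n = read count
      entries <- mapM parseEntry (take n rest)
      -- Reject sections that are shorter than their header claims.
      if length entries == n
        then Just (zip [read first ..] entries)
        else Nothing
    _ -> Nothing

-- | Byte offset of an in-use object, if the table has an entry for it.
lookupOffset :: Int -> [(Int, XrefEntry)] -> Maybe Int
lookupOffset objNum table =
  case lookup objNum table of
    Just e | xrefInUse e -> Just (xrefOffset e)
    _                    -> Nothing

main :: IO ()
main = do
  let section =
        [ "0 3"
        , "0000000000 65535 f"
        , "0000000017 00000 n"
        , "0000000081 00000 n"
        ]
  case parseSubsection section of
    Nothing    -> putStrLn "malformed xref section"
    Just table -> do
      -- Object 2 starts at byte 81, so a reader could seek there directly.
      print (lookupOffset 2 table)  -- Just 81
      -- Object 0 is the head of the free list, not a real object.
      print (lookupOffset 0 table)  -- Nothing
```

Only the requested object then needs to be parsed, which is what keeps partial inspection cheap.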
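Exact glyph positions are what make text selection possible: once every glyph carries a bounding box, selecting text amounts to keeping the glyphs whose boxes intersect the rectangle the user dragged. The sketch below shows that idea in isolation. The `Rect` and `GlyphBox` types and the `selectText` function are hypothetical stand-ins, not the types pdf-toolbox returns, and all coordinates are assumed to be in one consistent page space.

```haskell
-- A standalone sketch: select the text of glyphs that overlap a rectangle.
-- Rect and GlyphBox are hypothetical types, not pdf-toolbox's own.

-- | Axis-aligned rectangle given by its minimum and maximum corners.
data Rect = Rect
  { rMinX, rMinY, rMaxX, rMaxY :: Double
  } deriving (Show, Eq)

-- | A glyph's bounding box together with the text it maps to.
data GlyphBox = GlyphBox
  { gbBox  :: Rect
  , gbText :: String
  } deriving (Show, Eq)

-- | True when two rectangles intersect (touching edges count).
overlaps :: Rect -> Rect -> Bool
overlaps a b =
  rMinX a <= rMaxX b && rMinX b <= rMaxX a &&
  rMinY a <= rMaxY b && rMinY b <= rMaxY a

-- | Concatenate the text of every glyph overlapping the selection,
-- keeping the order in which the glyphs were extracted.
selectText :: Rect -> [GlyphBox] -> String
selectText sel = concatMap gbText . filter (overlaps sel . gbBox)

main :: IO ()
main = do
  let glyphs =
        [ GlyphBox (Rect 10 100 16 112) "H"
        , GlyphBox (Rect 16 100 22 112) "i"
        , GlyphBox (Rect 30 100 36 112) "!"
        ]
      -- The selection covers the first two glyphs but not the third.
      selection = Rect 0 95 25 115
  putStrLn (selectText selection glyphs)  -- prints "Hi"
```

A real viewer would usually also order glyphs by line and handle rotated text, which this sketch ignores.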
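An incremental update leaves the original bytes untouched and appends the changed objects, a new xref section covering only those objects, and a new trailer whose `/Prev` entry points at the previous xref section; the final `startxref` value is the byte offset of the newly appended xref. The sketch below computes such an appended section for a single replaced object. It illustrates the file layout from the PDF specification and is not pdf-toolbox's API: `appendUpdate` and its parameters are hypothetical, offsets assume ASCII content (one `Char` per byte), and a real update would also copy the remaining entries of the previous trailer, whereas only `/Root` is carried over here.

```haskell
-- A standalone sketch (not pdf-toolbox code) of the bytes an incremental
-- update appends to a PDF file. Assumes ASCII content: one Char == one byte.
import Text.Printf (printf)

-- | Build the text to append after the original file.
--   origLen  : length of the original file in bytes
--   prevXref : offset of the previous xref section (its startxref value)
--   size     : /Size for the new trailer (highest object number + 1)
--   rootRef  : the /Root reference copied from the previous trailer
--   objNum   : number of the object being replaced (generation 0 assumed)
--   body     : the object's new contents
appendUpdate :: Int -> Int -> Int -> String -> Int -> String -> String
appendUpdate origLen prevXref size rootRef objNum body =
  objText ++ xrefText ++ trailerText
  where
    -- A leading newline guards against an original file with no final EOL,
    -- so the object header itself starts one byte after the old end.
    objText   = "\n" ++ show objNum ++ " 0 obj\n" ++ body ++ "\nendobj\n"
    objStart  = origLen + 1
    xrefStart = origLen + length objText
    -- One subsection with one 20-byte entry: 10-digit offset, space,
    -- 5-digit generation, space, 'n', then a two-byte CR LF.
    xrefText =
      "xref\n" ++ show objNum ++ " 1\n"
        ++ printf "%010d 00000 n\r\n" objStart
    trailerText =
      "trailer\n<< /Size " ++ show size ++ " /Root " ++ rootRef
        ++ " /Prev " ++ show prevXref ++ " >>\n"
        ++ "startxref\n" ++ show xrefStart ++ "\n%%EOF\n"

main :: IO ()
main = putStr (appendUpdate 1000 900 6 "1 0 R" 5 "(updated text)")
```

Because earlier bytes are never rewritten, offsets stored in older xref sections stay valid, and readers resolve each object through the newest section that mentions it.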

Still on the TODO list

Linearized PDF files
Higher-level API for incremental updates and PDF generation

Examples

(See also the examples and viewer directories.)

Inspect the high-level structure:

import Control.Monad
import Pdf.Document

main :: IO ()
main =
  withPdfFile "input.pdf" $ \pdf -> do
    encrypted <- isEncrypted pdf
    when encrypted $ do
      ok <- setUserPassword pdf defaultUserPassword
      unless ok $
        fail "need password"
    doc <- document pdf
    catalog <- documentCatalog doc
    rootNode <- catalogPageNode catalog
    count <- pageNodeNKids rootNode
    print count
    -- the first page of the document
    page <- pageNodePageByNum rootNode 0
    -- extract text
    txt <- pageExtractText page
    print txt
    -- ...

Yuras/pdf-toolbox

Folders and files

Latest commit

History

Repository files navigation

pdf-toolbox

Features

Still in TODO list

Examples

About

Topics

Resources

Stars

Watchers

Forks

Languages