pdfout is a lightweight PDF toolkit written in C. It is heavily based on the mupdf library.
The functionality is organized into small subcommands. You can
dump and modify outlines, page labels and keys of the information dictionary. These support JSON for input and output.
extract text
repair broken cross-reference tables
Download and checkout the mupdf submodule:
user $ git clone git://github.com/amba/pdfout.git && cd pdfout
Check out the mupdf submodule:
user $ ./make.pl submodules
Apart from mupdf, there are no thirdparty dependencies.
Build and install with
user $ ./make.pl
user $ ./make.pl --install --prefix=$HOME/local
See the docs for full details.
Now you can run
user $ pdfout --describe-commands
to list the available subcommands.
So far, the docs mostly cover the build and internals. More docs on the commands are currently being written.
pdfout is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Copyright (C) 2015-2016 AUTHORS
See the COPYRIGHT file for full details on used libraries and code derived from other projects.