Extract Perma links from a PDF document and look up the URLs they archive

This program uses pdftotext to extract the text of a PDF file, finds the Perma links in that text, and then uses Perma's public API to look up the URL each link originally archived. It exports a CSV file listing each Perma link alongside its archived URL.
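The pipeline described above can be sketched in Python roughly as follows. This is not the program's actual code, and several details are assumptions to check against Perma's API documentation: the link pattern (`perma.cc/` followed by two four-character groups joined by a hyphen), the endpoint `https://api.perma.cc/v1/public/archives/{guid}/`, and a JSON response carrying the original address in a `url` field. All function names are hypothetical.

```python
import csv
import io
import json
import re
import subprocess
import urllib.request

# Assumption: Perma links look like https://perma.cc/ABCD-1234.
PERMA_RE = re.compile(r"https?://perma\.cc/([A-Za-z0-9]{4}-[A-Za-z0-9]{4})\b")

# Assumption: Perma's public archive endpoint; verify against the API docs.
API_TEMPLATE = "https://api.perma.cc/v1/public/archives/{guid}/"


def pdf_to_text(pdf_path):
    """Run pdftotext, asking it to write the extracted text to stdout ("-")."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def find_perma_guids(text):
    """Return the unique Perma GUIDs in the text, in order of first appearance."""
    return list(dict.fromkeys(PERMA_RE.findall(text)))


def lookup_url(guid):
    """Fetch the archived URL for one GUID (assumes a JSON "url" field)."""
    with urllib.request.urlopen(API_TEMPLATE.format(guid=guid)) as resp:
        return json.load(resp)["url"]


def rows_to_csv(rows):
    """Render (guid, url) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["perma_link", "url"])
    for guid, url in rows:
        writer.writerow([f"https://perma.cc/{guid}", url])
    return buf.getvalue()


def convert(pdf_path, csv_path, lookup=lookup_url):
    """Extract GUIDs from the PDF, look each one up, and write the CSV."""
    guids = find_perma_guids(pdf_to_text(pdf_path))
    rows = [(guid, lookup(guid)) for guid in guids]
    # newline="" keeps the csv module's own line endings intact.
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        out.write(rows_to_csv(rows))
```

Passing `lookup` as a parameter keeps the network call separate, so the text extraction and CSV steps can be tried offline with a stub lookup function.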


You will need pdftotext, which is part of Poppler and ships in various packages: on a Mac, try brew install poppler; on Linux, install poppler-utils.
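To confirm that pdftotext is installed and on your PATH, a small check with the standard library's shutil.which is enough. This is a sketch; require_tool is a hypothetical helper, not part of this program.

```python
import shutil


def require_tool(name):
    """Return the full path of a command-line tool, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name} not found on PATH; on a Mac try 'brew install poppler', "
            "on Linux install poppler-utils"
        )
    return path
```

Calling require_tool("pdftotext") returns something like /usr/bin/pdftotext when Poppler is installed, and fails with an install hint otherwise.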

There are various ways of setting up a Python virtual environment. One is to install python3-venv (the package name on Debian and Ubuntu), then run

python3 -m venv env
source env/bin/activate
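To check that the environment is actually active before installing anything, you can compare sys.prefix with sys.base_prefix: inside a venv the first points at the environment and the second at the base Python installation. A minimal sketch; in_virtualenv is a hypothetical helper, not part of this program.

```python
import sys


def in_virtualenv():
    """True when running inside a venv such as one created by python3 -m venv."""
    # sys.prefix is the environment; sys.base_prefix is the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active" if in_virtualenv() else "no virtualenv active")
```

Run with the python found after source env/bin/activate, this should print "virtualenv active".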

Once you've activated the virtual environment, install required packages and the program itself like this:

pip install -r requirements.txt
pip install --editable .

At this point, running

pdf-perma-urls yourfile.pdf

should produce yourfile.csv.
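The only naming rule stated here is that yourfile.pdf yields yourfile.csv. Assuming the CSV keeps the PDF's stem and directory, the mapping is a single pathlib call; csv_path_for is a hypothetical helper for illustration, not the program's own code.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return a path with the same directory and stem as the PDF, ending in .csv."""
    return Path(pdf_path).with_suffix(".csv")
```

For example, csv_path_for("papers/brief.pdf") gives Path("papers/brief.csv").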
