Quick and dirty extraction of BibTeX data from a folder of PDF files
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
COPYING
README
bibfrompdfs.py

README

bibfrompdfs extraction tool

Copyright 2017-2018 Nicolas Bruot (https://www.bruot.org/hp/)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bibfrompdfs is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

bibfrompdfs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with bibfrompdfs.  If not, see <http://www.gnu.org/licenses/>.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


For updates, see:

https://github.com/bruot/bibfrompdfs/

##########################
# INTRODUCTION AND USAGE #
##########################

bibfrompdfs is a Python script to build a BibTeX file from PDF articles.  It takes as an input a folder containing the PDFs, and searches for DOI links inside them to then fetch bibliography data online.  The output is a single .bib file in the same folder as the PDF files.

Usage:

  python bibfrompdfs.py FOLDER


################
# INSTALLATION #
################

bibfrompdfs works with Python 2 and 3.  It requires the textract package corresponding to your Python distribution.  It can be installed for example with pip:

  pip install textract

Alternatively, you can get textract from:

https://github.com/deanmalmgren/textract

textract 1.6.1 does not support Python 3, so if you want to use Python 3, you should get a more recent version of textract or download the master branch from the link above.

You also need the pdftotext tool.

On Debian, make sure the poppler-utils package is installed.

On Windows:

  - Download the Xpdf tools from

      http://www.xpdfreader.com/download.html

  - Extract the files into, e.g. C:\bin\xpdf-tools-win-4.00.

  - Add the binaries directory to your PATH environment variable.  On Windows 10: open an explorer, right-click on "This PC" and click "Properties". In the new window, click "Advanced settings", then "Environment Variables...", double-click on either the user or system "Path" variable, and add an entry, like "C:\bin\xpdf-tools-win-4.00\bin64".