A framework for context-based citation recommendation experiments
Latest commit 8b3cd15 May 24, 2018

Minerva

Minerva is an open framework for context-based citation recommendation experiments.

If you want to run an experiment in natural language processing or information retrieval using a corpus of scientific papers, you may need to do some or all of the following:

  • Annotate citations in the running text
  • Parse text from the References/Bibliography section
  • Find the document for a particular reference inside your corpus
  • Split sentences in the text (often less trivial than you expect)
  • Deal with the XML schema of the corpus
  • Extract position-relevant text from the document:
    • The sentences around a citation token
    • The full paragraph containing a reference to a figure
    • The abstract
    • All sentences in which a particular reference is cited
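
To make the "sentences around a citation token" task concrete, here is a minimal sketch in plain Python. It does not use Minerva's API: the `<CIT id=N />` placeholder format, the function names, and the naive regex sentence splitter are all assumptions made for illustration, and real scientific text needs a more robust splitter.

```python
import re

# Hypothetical citation placeholder, e.g. "<CIT id=3 />" (an assumption,
# not necessarily the token format Minerva uses).
CIT_PATTERN = re.compile(r"<CIT id=(\d+) />")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace
    and an uppercase letter or a citation token. Abbreviations such as
    "et al. Smith" will fool it."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z<])", text)
    return [p.strip() for p in parts if p.strip()]


def citation_contexts(text, cit_id, window=1):
    """Return one context string per sentence citing `cit_id`, made of that
    sentence plus `window` sentences on each side."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if str(cit_id) in CIT_PATTERN.findall(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            contexts.append(" ".join(sentences[start:end]))
    return contexts


text = (
    "Parsing is hard. Smith showed this <CIT id=1 />. "
    "We extend that work. Other approaches exist <CIT id=2 />. They differ."
)
print(citation_contexts(text, 1, window=1))
```

Setting `window=0` returns only the citing sentences themselves, which covers the "all sentences in which a particular reference is cited" case.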

If you are unlucky and need to use a corpus that has not already been converted to a machine-readable representation (e.g. XML), you may also need to:

  • Fetch/download a number of files (normally PDFs)
  • Convert these PDF files into a structured representation
  • Clean up the output of that conversion
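
As an illustration of the clean-up step, the sketch below applies a few common heuristics to text extracted from a PDF. It is a hypothetical helper, not part of Minerva, and it assumes the conversion tool has already produced plain text with hard line breaks.

```python
import re


def clean_extracted_text(raw):
    """Heuristic clean-up of plain text extracted from a PDF (illustrative only)."""
    # Drop lines that contain nothing but a page number.
    lines = [ln for ln in raw.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
    text = "\n".join(lines)
    # Rejoin words hyphenated across a line break ("recom-\nmendation").
    # Caveat: this also merges genuine hyphenated compounds split at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace single line breaks inside a paragraph with spaces,
    # keeping blank lines as paragraph separators.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


raw = "Citation recom-\nmendation is a\nhard task.\n12\n\nNew paragraph."
print(clean_extracted_text(raw))
```

Heuristics like these are corpus-specific, so the patterns usually need tuning against the output of whichever converter is used.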

Minerva aims to make all of this as easy as possible by providing built-in solutions for many of these tasks and wrappers around existing tools that handle many of the others.

Installing Minerva

to be continued...